Build ontology-aware, Wikidata-aligned knowledge graphs from raw text using LLMs
Knowledge Graphs (KGs) provide structured, verifiable representations of knowledge, enabling fact grounding and empowering large language models (LLMs) with up-to-date, real-world information. However, creating high-quality KGs from open-domain text is challenging due to issues like redundancy, inconsistency, and lack of alignment with formal ontologies.
Wikontic is a multi-stage pipeline for constructing ontology-aligned KGs from unstructured text using LLMs and Wikidata. It extracts candidate triples from raw text, then refines them through ontology-based typing, schema validation, and entity deduplication—resulting in compact, semantically coherent graphs.
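
For intuition, the first stage amounts to prompting an LLM for structured triples. The sketch below uses the OpenAI Python client with an assumed prompt, model name, and JSON output schema; it is illustrative only, not the exact prompt used by Wikontic's `LLMTripletExtractor`.

```python
# Minimal sketch of candidate-triple extraction with an LLM (illustrative only).
# The prompt, model name, and JSON schema below are assumptions, not Wikontic's
# actual prompts from utils/openai_utils.py.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_triples(text: str, model: str = "gpt-4o-mini") -> list[dict]:
    """Ask the model for (subject, relation, object) triples as a JSON object."""
    prompt = (
        "Extract knowledge-graph triples from the text below. Respond with a "
        'JSON object of the form {"triples": [{"subject": "...", '
        '"relation": "...", "object": "..."}]}.\n\n' + text
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request valid JSON back
    )
    return json.loads(response.choices[0].message.content)["triples"]

triples = extract_triples(
    "Marie Curie was born in Warsaw and won the Nobel Prize in Physics."
)
# e.g. [{"subject": "Marie Curie", "relation": "place of birth",
#        "object": "Warsaw"}, ...]
```

The later stages then type, validate, and deduplicate these raw candidates against the Wikidata ontology.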
The repository is organized as follows:

- `preprocessing/constraint-preprocessing.ipynb`: Jupyter notebook for collecting constraint rules from Wikidata (the validation sketch after this list illustrates how such rules are used).
- `utils/`: utilities for LLM-based triple extraction and alignment with Wikidata ontology rules.
- `utils/openai_utils.py`: `LLMTripletExtractor` class for LLM-based triple extraction.
- `utils/ontology_mappings/`: JSON files containing ontology mappings from Wikidata.
- `utils/structured_inference_with_db.py`: `StructuredInferenceWithDB` class with triple extraction and QA functions.
- `utils/structured_aligner.py`: `Aligner` class for ontology alignment and entity name refinement.
- `utils/inference_with_db.py`: `InferenceWithDB` class with triple extraction and QA functions.
- `utils/dynamic_aligner.py`: `Aligner` class for entity and relation name refinement.
- `inference_and_eval/`: scripts for building KGs for the MuSiQue and HotpotQA datasets and evaluating QA performance.
- `analysis/`: notebooks with downstream analysis of the resulting KGs.
- `pages/` and `Wikontic.py`: code for the web service for knowledge graph extraction and visualization.
- `Dockerfile`: for building a containerized web service.
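
Conceptually, the collected constraint rules act as type checks on candidate triples. The sketch below is a toy illustration assuming a simplified rule format; the actual rules and their schema come from `preprocessing/constraint-preprocessing.ipynb`, and the helper here is hypothetical.

```python
# Toy illustration of ontology-based triple validation. This rule format is
# hypothetical; the real constraint rules are collected from Wikidata by
# preprocessing/constraint-preprocessing.ipynb.

# Wikidata-style type constraints per property (cf. Wikidata's
# "subject type constraint" and "value type constraint").
CONSTRAINTS = {
    "P19": {                          # place of birth
        "subject_types": {"Q5"},      # human
        "object_types": {"Q486972"},  # human settlement
    },
}

def is_valid(triple: dict, entity_types: dict[str, set]) -> bool:
    """Keep a candidate triple only if its subject and object carry
    types allowed by the property's constraint rule."""
    rule = CONSTRAINTS.get(triple["property"])
    if rule is None:
        return False  # no rule known for this property: reject the candidate
    subject_ok = entity_types.get(triple["subject"], set()) & rule["subject_types"]
    object_ok = entity_types.get(triple["object"], set()) & rule["object_types"]
    return bool(subject_ok and object_ok)

# "Marie Curie (Q7186) -- place of birth (P19) --> Warsaw (Q270)"
types = {"Q7186": {"Q5"}, "Q270": {"Q486972", "Q5119"}}
print(is_valid({"subject": "Q7186", "property": "P19", "object": "Q270"}, types))  # True
```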
To run the web service locally:

- Set up the ontology and KG databases: `./setup_db.sh`
- Launch the web service: `streamlit run Wikontic.py`
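
Alternatively, the bundled `Dockerfile` lets you run the service in a container. The image tag below is illustrative, and the port mapping assumes Streamlit's default port 8501.

```bash
# Build the image and run the containerized web service.
# "wikontic" is an illustrative tag; Streamlit listens on 8501 by default.
docker build -t wikontic .
docker run -p 8501:8501 wikontic
```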
Enjoy building knowledge graphs with Wikontic!


