The lexical-graph package provides a framework for automating the construction of a hierarchical lexical graph from unstructured data, and composing question-answering strategies that query this graph when answering user questions.
- Built-in graph store support for Amazon Neptune Analytics, Amazon Neptune Database, and Neo4j.
- Built-in vector store support for Neptune Analytics, Amazon OpenSearch Serverless, Amazon S3 Vectors, and Postgres with the pgvector extension.
- Built-in support for foundation models (LLMs and embedding models) on Amazon Bedrock.
- Easily extended to support additional graph and vector stores and model backends.
- Multi-tenancy: multiple separate lexical graphs can share the same underlying graph and vector stores.
- Continuous ingest and batch extraction (using Bedrock batch inference) modes.
- Versioned updates: update source documents and query the state of the graph and vector stores at a point in time.
- Quickstart AWS CloudFormation templates for Neptune Database, OpenSearch Serverless, and Amazon Aurora PostgreSQL.
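Point-in-time querying can be pictured as giving each document version a validity interval and returning only the versions whose interval contains the requested time. The sketch below is a conceptual illustration only: it assumes integer timestamps and uses hypothetical `Version`, `upsert`, and `state_at` helpers. It is not the lexical-graph's API or storage model.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    doc_id: str
    text: str
    valid_from: int                  # time at which this version was ingested (hypothetical field)
    valid_to: Optional[int] = None   # None while this version is current


def upsert(versions: List[Version], doc_id: str, text: str, ts: int) -> List[Version]:
    """Close the current version of doc_id (if any) at ts and append a new current version."""
    updated = [
        replace(v, valid_to=ts) if v.doc_id == doc_id and v.valid_to is None else v
        for v in versions
    ]
    updated.append(Version(doc_id, text, valid_from=ts))
    return updated


def state_at(versions: List[Version], ts: int) -> Dict[str, str]:
    """Return the text of every document version that was current at time ts."""
    return {
        v.doc_id: v.text
        for v in versions
        if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to)
    }
```

Old versions are closed rather than deleted, which is what makes it possible to ask what the data looked like at an earlier time.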
The lexical-graph requires Python 3.10 or greater and pip.
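If you want a script to fail early on an unsupported interpreter, a minimal guard might look like the following. `check_python` and `MIN_PYTHON` are hypothetical names for illustration, not part of the package.

```python
import sys
from typing import Sequence

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def check_python(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == '__main__':
    if not check_python():
        sys.exit(f'lexical-graph requires Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or greater')
```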
Install from the latest release tag:

```bash
$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v3.16.2.zip#subdirectory=lexical-graph
```

Or install from the main branch to get the latest changes:

```bash
$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/heads/main.zip#subdirectory=lexical-graph
```
If you're running on AWS, you must run your application in an AWS Region that offers the Amazon Bedrock foundation models used by the lexical graph (see the configuration section of the documentation for the default models), and you must enable access to these models before running any part of the solution.
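One way to confirm that a Region offers the models you plan to use is to compare your model IDs against the output of the Bedrock `ListFoundationModels` API. The sketch below assumes boto3 and AWS credentials are available; `find_missing_models` and `models_missing_in_region` are hypothetical helpers, and the model IDs you pass in are up to you (see the configuration documentation for the defaults).

```python
from typing import Iterable, List, Mapping


def find_missing_models(required_ids: Iterable[str],
                        model_summaries: Iterable[Mapping]) -> List[str]:
    """Return the required model IDs that do not appear in the Bedrock model summaries."""
    available = {summary['modelId'] for summary in model_summaries}
    return [model_id for model_id in required_ids if model_id not in available]


def models_missing_in_region(required_ids: Iterable[str], region_name: str) -> List[str]:
    """List Bedrock foundation models in a Region and report which required IDs are absent."""
    import boto3  # imported here so find_missing_models can be used without boto3

    bedrock = boto3.client('bedrock', region_name=region_name)
    summaries = bedrock.list_foundation_models()['modelSummaries']
    return find_missing_models(required_ids, summaries)
```

This only checks that Bedrock lists a model in the Region. It does not confirm that model access has been enabled for your account, and inference profile IDs may not appear in this list.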
You will need to install additional dependencies for specific graph and vector store backends:

```bash
# Amazon OpenSearch Serverless
$ pip install opensearch-py llama-index-vector-stores-opensearch

# Postgres with pgvector
$ pip install psycopg2-binary pgvector

# Neo4j
$ pip install neo4j
```

Pass a connection string to `GraphStoreFactory.for_graph_store()` or `VectorStoreFactory.for_vector_store()` to select a backend:

| Store | Connection string |
|---|---|
| Neptune Analytics (graph) | `neptune-graph://<graph-id>` |
| Neptune Database (graph) | `neptune-db://<hostname>`, or any hostname ending in `.neptune.amazonaws.com` |
| Neo4j (graph) | `bolt://`, `bolt+ssc://`, `bolt+s://`, `neo4j://`, `neo4j+ssc://`, or `neo4j+s://` URLs |
| OpenSearch Serverless (vector) | `aoss://<url>` |
| Neptune Analytics (vector) | `neptune-graph://<graph-id>` |
| pgvector (vector) | constructed via `PGVectorIndexFactory` |
| S3 Vectors (vector) | constructed via `S3VectorIndexFactory` |
| Dummy / no-op | `None` or any unrecognised string; falls back to `DummyGraphStore` / `DummyVectorIndex` |
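To make the graph-store rows of the table concrete, here is a hypothetical `classify_graph_store` function that maps a connection string to a backend label using the same prefix and hostname rules. It is an illustrative sketch only: it is not how `GraphStoreFactory` is implemented, and the real factory may parse strings differently.

```python
from typing import Optional

NEO4J_SCHEMES = ('bolt://', 'bolt+ssc://', 'bolt+s://',
                 'neo4j://', 'neo4j+ssc://', 'neo4j+s://')
NEPTUNE_DB_SUFFIX = '.neptune.amazonaws.com'


def classify_graph_store(conn: Optional[str]) -> str:
    """Map a connection string to a backend label, following the graph rows of the table."""
    if conn is None:
        return 'dummy'
    if conn.startswith('neptune-graph://'):
        return 'neptune-analytics'
    if conn.startswith('neptune-db://'):
        return 'neptune-database'
    if conn.startswith(NEO4J_SCHEMES):
        return 'neo4j'
    # Bare Neptune Database hostnames: strip any scheme, path, and port before matching
    host = conn.split('://', 1)[-1].split('/', 1)[0].split(':', 1)[0]
    if host.endswith(NEPTUNE_DB_SUFFIX):
        return 'neptune-database'
    return 'dummy'  # unrecognised strings fall back to a no-op store
```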
The following example loads four web pages and builds a lexical graph in Neptune Database, with the vector index in OpenSearch Serverless:

```python
from graphrag_toolkit.lexical_graph import LexicalGraphIndex
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

# requires pip install llama-index-readers-web
from llama_index.readers.web import SimpleWebPageReader


def run_extract_and_build():
    # Open the graph and vector stores as context managers
    with (
        GraphStoreFactory.for_graph_store(
            'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
        ) as graph_store,
        VectorStoreFactory.for_vector_store(
            'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
        ) as vector_store
    ):
        graph_index = LexicalGraphIndex(
            graph_store,
            vector_store
        )

        doc_urls = [
            'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
        ]

        # Fetch each page as plain text and attach its source URL as metadata
        docs = SimpleWebPageReader(
            html_to_text=True,
            metadata_fn=lambda url: {'url': url}
        ).load_data(doc_urls)

        # Run extraction over the documents and build the lexical graph
        graph_index.extract_and_build(docs, show_progress=True)


if __name__ == '__main__':
    run_extract_and_build()
```

The following example queries the same graph and vector stores using traversal-based search:

```python
from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory


def run_query():
    with (
        GraphStoreFactory.for_graph_store(
            'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
        ) as graph_store,
        VectorStoreFactory.for_vector_store(
            'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
        ) as vector_store
    ):
        # Create a query engine that uses traversal-based search
        query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
            graph_store,
            vector_store
        )

        response = query_engine.query('''What are the differences between Neptune Database
                                         and Neptune Analytics?''')

        print(response.response)


if __name__ == '__main__':
    run_query()
```

This project is licensed under the Apache-2.0 License.