Skip to content

Latest commit

 

History

History

README.md

Lexical Graph

The lexical-graph package provides a framework for automating the construction of a hierarchical lexical graph from unstructured data, and composing question-answering strategies that query this graph when answering user questions.

Features

Installation

The lexical-graph requires Python 3.10 or greater and pip.

Install from the latest release tag:

$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v3.16.2.zip#subdirectory=lexical-graph

Or install from the main branch to get the latest changes:

$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/heads/main.zip#subdirectory=lexical-graph

If you're running on AWS, you must run your application in an AWS region containing the Amazon Bedrock foundation models used by the lexical graph (see the configuration section in the documentation for details on the default models used), and must enable access to these models before running any part of the solution.

Additional dependencies

You will need to install additional dependencies for specific graph and vector store backends:

Amazon OpenSearch Serverless

$ pip install opensearch-py llama-index-vector-stores-opensearch

Postgres with pgvector

$ pip install psycopg2-binary pgvector

Neo4j

$ pip install neo4j

Connection strings

Pass a connection string to GraphStoreFactory.for_graph_store() or VectorStoreFactory.for_vector_store() to select a backend:

Store Connection string
Neptune Analytics (graph) neptune-graph://<graph-id>
Neptune Database (graph) neptune-db://<hostname> or any hostname ending .neptune.amazonaws.com
Neo4j (graph) bolt://, bolt+ssc://, bolt+s://, neo4j://, neo4j+ssc://, or neo4j+s:// URLs
OpenSearch Serverless (vector) aoss://<url>
Neptune Analytics (vector) neptune-graph://<graph-id>
pgvector (vector) constructed via PGVectorIndexFactory
S3 Vectors (vector) constructed via S3VectorIndexFactory
Dummy / no-op None or any unrecognised string — falls back to DummyGraphStore / DummyVectorIndex

Example of use

Indexing

from graphrag_toolkit.lexical_graph import LexicalGraphIndex
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

# requires pip install llama-index-readers-web
from llama_index.readers.web import SimpleWebPageReader

def run_extract_and_build():

    with (
        GraphStoreFactory.for_graph_store(
            'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
        ) as graph_store,
        VectorStoreFactory.for_vector_store(
            'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
        ) as vector_store
    ):

        graph_index = LexicalGraphIndex(
            graph_store,
            vector_store
        )

        doc_urls = [
            'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
            'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
        ]

        docs = SimpleWebPageReader(
            html_to_text=True,
            metadata_fn=lambda url:{'url': url}
        ).load_data(doc_urls)

        graph_index.extract_and_build(docs, show_progress=True)

if __name__ == '__main__':
    run_extract_and_build()

Querying

from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

def run_query():

    with (
        GraphStoreFactory.for_graph_store(
            'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
        ) as graph_store,
        VectorStoreFactory.for_vector_store(
            'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
        ) as vector_store
    ):

        query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
            graph_store,
            vector_store
        )

        response = query_engine.query('''What are the differences between Neptune Database
                                         and Neptune Analytics?''')

        print(response.response)

if __name__ == '__main__':
    run_query()

Documentation

License

This project is licensed under the Apache-2.0 License.