Solanum lycopersicum (tomato) is a plant of major agronomic interest and an increasingly studied organism.
However, knowledge about tomato is scattered across many different databases. Bringing this information together in one place could help biologists find and connect what is known about the organism more quickly.
Here, I developed a knowledge graph (KG) for S. lycopersicum using BioCypher.
The KG integrates several input databases, described in the following table:

| Database | Description |
|---|---|
| Sol Genomics Network | Genome |
| miRBase | microRNA and precursor |
| PlantTFDB | TF identification |
| PlantRegMap | TF-target interaction |
| TarDB | microRNA-transcript interaction |
| DPMIND | microRNA-transcript interaction |
| PNRD | microRNA-transcript interaction |
| STRING | protein-protein interaction |
| Planteome | Terms associated with genes |
| Mercator4 | Pathways associated with genes |
| OMA | Gene to A. thaliana gene mapping |
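
To make the integration step more concrete, here is a minimal sketch of how one row of a microRNA-transcript interaction source could be turned into an edge. It assumes BioCypher's tuple convention for edges, `(id, source, target, label, properties)`. The function name, the row format, the edge label and the identifiers are hypothetical placeholders and are not taken from this repository.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed edge shape: (edge_id, source_id, target_id, label, properties)
Edge = Tuple[str, str, str, str, Dict[str, str]]


def mirna_target_edges(rows: Iterable[Dict[str, str]], source_db: str) -> Iterator[Edge]:
    """Yield one edge tuple per microRNA-transcript row (hypothetical row format)."""
    for row in rows:
        mirna = row["mirna"]
        transcript = row["transcript"]
        # Prefix the edge id with the database name so the same pair
        # reported by several databases stays distinguishable.
        edge_id = f"{source_db}:{mirna}-{transcript}"
        yield (edge_id, mirna, transcript, "mirna_targets_transcript", {"source": source_db})


# Placeholder identifiers, for illustration only.
rows = [{"mirna": "mirna_A", "transcript": "transcript_1"}]
edges = list(mirna_target_edges(rows, "TarDB"))
```

Keeping the source database in the edge properties is one way to merge TarDB, DPMIND and PNRD without losing track of where each interaction came from.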

Once you have cloned the repository, install the dependencies with Poetry:

    poetry install

Then create the knowledge graph. The databases must be downloaded before the graph is built, so run the download script first:

    poetry shell
    python scripts/download_databases.py
    python create_knowledge_graph.py

If everything runs smoothly, you can start the Docker container 🐳

NB: You can exit the Poetry shell by typing `exit`.
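
Because the download has to finish before the graph is built, the two steps can also be chained from a small Python wrapper that stops at the first failure. This is only a sketch: the script paths are the ones used in the commands above, while `STEPS`, `run_steps` and the `runner` parameter are hypothetical names introduced here.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script paths taken from the commands above; run from the repository root.
STEPS: List[List[str]] = [
    [sys.executable, "scripts/download_databases.py"],
    [sys.executable, "create_knowledge_graph.py"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[..., object] = subprocess.run,
) -> int:
    """Run each command in order and return how many completed successfully.

    Execution stops at the first command that exits with a non-zero status,
    so the graph is never built on top of a failed download.
    """
    completed = 0
    for cmd in steps:
        try:
            runner(list(cmd), check=True)
        except subprocess.CalledProcessError:
            break
        completed += 1
    return completed
```

Calling `run_steps(STEPS)` inside the Poetry environment returns `2` when both scripts succeed, and a smaller number otherwise.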

After downloading the files and making sure the graph can be built, start the Neo4j database with Docker:

    docker compose up -d

You can then connect to and browse the Neo4j instance at localhost:7474. No authentication is needed; just press Connect.
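
If you want to check from a script that the Neo4j browser is reachable before opening it, a small standard-library probe is enough. The sketch below only tests that something answers over HTTP at the address given above; the function name `neo4j_browser_is_up` is hypothetical, and it does not run any query against the graph.

```python
import urllib.error
import urllib.request


def neo4j_browser_is_up(url: str = "http://localhost:7474", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

The container may need a few seconds after `docker compose up -d` before the probe returns `True`, so retrying in a short loop can be useful.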

To shut down the containers:

    docker compose down -v

Note that `-v` also removes the associated volumes.

You can find all the details on the graph construction here