Knows is a user-friendly command-line tool for generating property graphs: graphs whose nodes and edges carry labels and key-value properties, the data model used by many graph databases. Knows supports multiple output formats, schema files, and basic visualization, making it a practical choice for researchers, educators, and data enthusiasts.
- Customizable Graph Generation: Tailor your graphs by specifying the number of nodes and edges.
- Diverse Output Formats: Export graphs in formats like GraphML, YARS-PG 5.0, CSV, Cypher, GEXF, GML, JSON, and others.
- Flexible Output Options: Display results in the console, redirect them, or save them directly to a file.
- Integrated Graph Visualization: Conveniently visualize your graphs in SVG, PNG, JPG, or PDF format.
- Intuitive Command-Line Interface (CLI): A user-friendly CLI for streamlined graph generation and visualization.
- Docker Compatibility: Deploy Knows in Docker containers for a consistent and isolated runtime environment.
- Selectable Properties: Choose which node and edge properties should be generated.
- Custom Schema Support: Define custom node/edge types and properties using GQL-inspired (ISO/IEC 39075) JSON schema files. Includes JSON Schema for validation. Schemas support many data types (String, Int, Float, Date, enums, and more), symmetric edge properties (for mutual relationships), computed node properties (like degree), and type constraints. See SCHEMA.md for full documentation.
- Reproducible Graphs: Ensure deterministic output by setting the -s/--seed option, regardless of the selected output format.
Note on reproducibility: The -s/--seed option makes the random aspects of graph generation deterministic within the same software environment. Results may still differ across versions of Python or dependencies.
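The effect of a seed can be illustrated with a minimal Python sketch. This is not Knows' own code: the function name `random_edges` and the edge-picking strategy are assumptions made purely for illustration. It only shows the general idea that a seeded random generator yields the same sequence of choices on every run in the same environment.

```python
import random


def random_edges(node_count, edge_count, seed=None):
    """Pick random directed edges between nodes N1..Nn (illustrative only)."""
    rng = random.Random(seed)  # a private generator, seeded for repeatability
    nodes = [f"N{i}" for i in range(1, node_count + 1)]
    edges = []
    for _ in range(edge_count):
        # sample() returns two distinct nodes, so source != target
        source, target = rng.sample(nodes, 2)
        edges.append((source, target))
    return edges


first = random_edges(5, 4, seed=43)
second = random_edges(5, 4, seed=43)
print(first == second)  # True: same seed, same sequence of choices
```

Without a seed (`seed=None`), the generator is initialized from system entropy, so two runs will usually differ.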
Built-in graph structure:
- Generates graphs with specified or random nodes and edges.
- Creates directed graphs.
- Nodes are labeled Person with unique IDs (N1, N2, N3, ..., Nn).
- Nodes feature firstName and lastName properties by default.
- Edges are labeled knows and include strength [1..100] and lastMeetingDate [1955-01-01..2025-06-28] properties by default.
- Additional node properties:
  - favoriteColor
  - company
  - job
  - phoneNumber
  - postalAddress
  - friendCount (actual node degree, i.e. the number of unique connections)
  - preferredContactMethod [inPerson, email, postalMail, phone, textMessage, videoCall, noPreference]
- Additional edge properties:
  - lastMeetingCity
  - meetingCount [1..10000]
- Edges connect randomly selected nodes, avoiding cycles.
- If edges connect the same nodes in both directions, the paired edges share lastMeetingCity, lastMeetingDate, and meetingCount values.
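The structure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Knows' actual implementation: the function name `build_graph`, the dictionary layout, and the sampling strategy are all hypothetical, only a few of the listed properties are generated, and the sketch rules out only self-loops and duplicate directed edges. It shows two of the listed behaviours: edges in opposite directions between the same two nodes share lastMeetingDate and meetingCount, and friendCount is computed from the node's unique connections.

```python
import random
from datetime import date, timedelta

FIRST_DAY = date(1955, 1, 1)
LAST_DAY = date(2025, 6, 28)


def build_graph(node_count, edge_count, seed=None):
    """Return (nodes, edges) for a small directed Person/knows graph."""
    max_edges = node_count * (node_count - 1)
    if edge_count > max_edges:
        raise ValueError("too many edges for this number of nodes")

    rng = random.Random(seed)
    nodes = {f"N{i}": {"label": "Person"} for i in range(1, node_count + 1)}
    node_ids = list(nodes)
    day_span = (LAST_DAY - FIRST_DAY).days

    edges = {}   # (source, target) -> edge properties
    shared = {}  # unordered node pair -> properties shared by both directions
    while len(edges) < edge_count:
        source, target = rng.sample(node_ids, 2)  # distinct nodes: no self-loops
        if (source, target) in edges:
            continue  # this directed edge already exists, try again
        pair = frozenset((source, target))
        if pair not in shared:
            shared[pair] = {
                "lastMeetingDate": FIRST_DAY + timedelta(days=rng.randint(0, day_span)),
                "meetingCount": rng.randint(1, 10000),
            }
        edges[(source, target)] = {
            "label": "knows",
            "strength": rng.randint(1, 100),  # not shared between directions
            **shared[pair],
        }

    # friendCount: number of unique neighbours, regardless of edge direction
    for node_id, props in nodes.items():
        outgoing = {t for s, t in edges if s == node_id}
        incoming = {s for s, t in edges if t == node_id}
        props["friendCount"] = len(outgoing | incoming)
    return nodes, edges


nodes, edges = build_graph(4, 5, seed=43)
print(len(nodes), len(edges))  # prints: 4 5
```

Keying the shared values by an unordered pair (`frozenset`) is one simple way to make both directions agree; a real generator may handle this differently.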
You can define custom graph structures using schema files. See SCHEMA.md for details and examples.
You can install Knows from PyPI, run it with Docker, or run it from the source code.
- Installation:
pip install knows[draw]
The draw extra installs the matplotlib and scipy libraries for graph visualization. You can omit [draw] if you don't need visualization and svg output generation.
- Running Knows:
knows [options]
- Pull Image:
docker pull lszeremeta/knows
- Run Container:
docker run --rm lszeremeta/knows [options]
- Build Image (locally, instead of pulling):
docker build -t knows .
- Run Container (locally built image):
docker run --rm knows [options]
See the Docker examples in the Practical Examples section.
- Clone Repository:
git clone git@github.com:lszeremeta/knows.git
cd knows
- Install Requirements:
pip install .[draw]
- Execute Knows:
python -m knows [options]
The -d/--draw option requires Tkinter.
- Ubuntu:
sudo apt update
sudo apt install python3-tk
See Installing Tkinter on Ubuntu for details.
- macOS (Homebrew):
brew install python3
brew install python-tk
See Installing Tkinter on macOS for details.
- Windows: Tkinter is normally installed by default with Python, so no additional steps are required.
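Before using -d/--draw, it can help to confirm that Tkinter is importable in the Python environment that runs Knows. The following is a small, generic check; the function name `tkinter_available` is made up for this example and is not part of Knows.

```python
def tkinter_available():
    """Return True if the tkinter module can be imported, False otherwise."""
    try:
        import tkinter  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if tkinter_available():
    print("Tkinter is available.")
else:
    print("Tkinter is missing; install it before using -d/--draw.")
```

A successful import does not guarantee that a display is available (for example inside a container), so an interactive window may still fail to open.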
usage: knows [-h] [-n NODES] [-e EDGES] [-s SEED] [-v]
[-f {yarspg,graphml,csv,cypher,gexf,gml,svg,png,jpg,pdf,adjacency_list,multiline_adjacency_list,edge_list,json}]
[--schema FILE]
[-np [{firstName,lastName,company,job,phoneNumber,favoriteColor,postalAddress,preferredContactMethod,friendCount} ...]]
[-ep [{strength,lastMeetingCity,lastMeetingDate,meetingCount} ...]] [-ap] [-d] [-l N] [--no-limit] [--hide-info]
[output]
Available options may vary depending on the version. To display all available options with their descriptions, use knows -h.
- output: Optional path to save the graph. For CSV format, two files will be created: *_nodes.csv and *_edges.csv.
- -h, --help: Show help message and exit.
- -n NODES, --nodes NODES: Number of nodes in the graph. Selected randomly if not specified.
- -e EDGES, --edges EDGES: Number of edges in the graph. Selected randomly if not specified.
- -s SEED, --seed SEED: Seed for random number generation to ensure reproducible results (also between various output formats).
- -v, --version: Show program version and exit.
- -f {yarspg,graphml,csv,cypher,gexf,gml,svg,png,jpg,pdf,adjacency_list,multiline_adjacency_list,edge_list,json}, --format {yarspg,graphml,csv,cypher,gexf,gml,svg,png,jpg,pdf,adjacency_list,multiline_adjacency_list,edge_list,json}: Format to output the graph. Default: yarspg. The svg, png, jpg and pdf formats are for simple graph visualization.
- --schema FILE: Path to JSON schema file defining custom node/edge types and properties. When specified, overrides the -np, -ep, and -ap options. GQL-inspired schema format (ISO/IEC 39075). See SCHEMA.md for details.
- -np [{firstName,lastName,company,job,phoneNumber,favoriteColor,postalAddress,friendCount,preferredContactMethod} ...], --node-props [{firstName,lastName,company,job,phoneNumber,favoriteColor,postalAddress,friendCount,preferredContactMethod} ...]: Space-separated node properties. Available: firstName, lastName, company, job, phoneNumber, favoriteColor, postalAddress, preferredContactMethod, friendCount. Ignored when --schema is used.
- -ep [{strength,lastMeetingCity,lastMeetingDate,meetingCount} ...], --edge-props [{strength,lastMeetingCity,lastMeetingDate,meetingCount} ...]: Space-separated edge properties. Available: strength, lastMeetingCity, lastMeetingDate, meetingCount. Ignored when --schema is used.
- -ap, --all-props: Use all available node and edge properties. Ignored when --schema is used.
- -d, --draw: Show interactive graph window. Requires Tkinter. May not work in Docker.
- -l N, --limit N: Maximum nodes to display (default: 50). Shows a subgraph centered on the most connected nodes.
- --no-limit: Show full graph without node limit.
- --hide-info: Hide node count info (e.g., 50/200 nodes) from output.
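When Knows is driven from a script, the options above can be assembled into an argument list. The sketch below is a hypothetical helper (`build_knows_command` is not part of Knows) that uses only the -n, -e, -s and -f flags and the optional output path exactly as listed; it builds and prints the command rather than running it.

```python
import shlex


def build_knows_command(nodes=None, edges=None, seed=None, fmt=None, output=None):
    """Assemble a knows argument list from a few documented options."""
    command = ["knows"]
    if nodes is not None:
        command += ["-n", str(nodes)]
    if edges is not None:
        command += ["-e", str(edges)]
    if seed is not None:
        command += ["-s", str(seed)]
    if fmt is not None:
        command += ["-f", fmt]
    if output is not None:
        command.append(output)  # positional output path goes last
    return command


command = build_knows_command(nodes=3, edges=2, seed=43, fmt="csv")
print(shlex.join(command))  # knows -n 3 -e 2 -s 43 -f csv
```

Assuming the knows executable is on PATH, the resulting list could be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell quoting issues with output paths.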
- Create a random graph in YARS-PG 5.0 format and show it:
knows
# or
docker run --rm lszeremeta/knows
- Create a 100-node, 70-edge graph in GraphML format:
knows -n 100 -e 70 -f graphml > graph.graphml
# or
knows -n 100 -e 70 -f graphml graph.graphml
# or
docker run --rm lszeremeta/knows -n 100 -e 70 -f graphml > graph.graphml
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 100 -e 70 -f graphml /data/graph.graphml
- Create a random graph in CSV format and save to files (nodes are written to standard output, edges to standard error):
knows -f csv > nodes.csv 2> edges.csv
# or
knows -f csv graph.csv
# or
docker run --rm lszeremeta/knows -f csv > nodes.csv 2> edges.csv
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -f csv /data/graph.csv
The commands that take an output path (graph.csv) create graph_nodes.csv and graph_edges.csv.
- Create a 50-node, 20-edge graph in Cypher format:
knows -n 50 -e 20 -f cypher > graph.cypher
# or
knows -n 50 -e 20 -f cypher graph.cypher
# or
docker run --rm lszeremeta/knows -n 50 -e 20 -f cypher > graph.cypher
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 50 -e 20 -f cypher /data/graph.cypher
- Create a 100-node, 50-edge graph in YARS-PG format:
knows -n 100 -e 50 > graph.yarspg
# or
knows -n 100 -e 50 graph.yarspg
# or
docker run --rm lszeremeta/knows -n 100 -e 50 > graph.yarspg
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 100 -e 50 /data/graph.yarspg
- Create, save, and visualize a 100-node, 50-edge graph in SVG:
knows -n 100 -e 50 -f svg -d > graph.svg
# or
knows -n 100 -e 50 -f svg -d graph.svg
- Create and save a 70-node, 50-edge graph in SVG:
knows -n 70 -e 50 -f svg > graph.svg
# or
knows -n 70 -e 50 -f svg graph.svg
# or
docker run --rm lszeremeta/knows -n 70 -e 50 -f svg > graph.svg
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 70 -e 50 -f svg /data/graph.svg
- Create and save a 10-node, 5-edge graph in PNG:
knows -n 10 -e 5 -f png > graph.png
# or
knows -n 10 -e 5 -f png graph.png
# or
docker run --rm lszeremeta/knows -n 10 -e 5 -f png > graph.png
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 10 -e 5 -f png /data/graph.png
- Create a graph in JSON format:
knows -f json > graph.json
# or
knows -f json graph.json
# or
docker run --rm lszeremeta/knows -f json > graph.json
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -f json /data/graph.json
- Create a graph with custom properties (20 nodes, 10 edges) and show it:
knows -n 20 -e 10 -np firstName favoriteColor job -ep lastMeetingCity
# or
docker run --rm lszeremeta/knows -n 20 -e 10 -np firstName favoriteColor job -ep lastMeetingCity
- Create a graph with all possible properties in YARS-PG format and save it to file:
knows -ap > graph.yarspg
# or
knows -ap graph.yarspg
# or
docker run --rm lszeremeta/knows -ap > graph.yarspg
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -ap /data/graph.yarspg
- Generate a reproducible graph in CSV by setting a seed:
knows -n 3 -e 2 -s 43 -f csv
# or
docker run --rm lszeremeta/knows -n 3 -e 2 -s 43 -f csv
Running the command again with the same seed will produce an identical graph, provided the environment and dependencies remain unchanged.
- Generate the same graph as above but in YARS-PG format:
knows -n 3 -e 2 -s 43
# or
docker run --rm lszeremeta/knows -n 3 -e 2 -s 43
- Generate a graph using a custom schema file:
knows -n 10 -e 15 --schema schema-examples/employee_schema.json
# or
knows -n 10 -e 15 --schema schema-examples/employee_schema.json -f cypher > employees.cypher
# or with Docker (using built-in example schemas)
docker run --rm lszeremeta/knows --schema /app/schema-examples/employee_schema.json -n 10 -e 15
docker run --rm -v "$(pwd)":/data lszeremeta/knows --schema /app/schema-examples/employee_schema.json -n 10 -e 15 -f cypher /data/employees.cypher
See SCHEMA.md for full schema documentation and more examples.
- Visualize a large graph with custom node limit:
knows -n 500 -e 300 -f svg -l 100 > graph.svg
# or
docker run --rm lszeremeta/knows -n 500 -e 300 -f svg -l 100 > graph.svg
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 500 -e 300 -f svg -l 100 /data/graph.svg
This limits the visualization to 100 nodes (default is 50), centered on the most connected nodes.
- Visualize the full graph without node limit:
knows -n 200 -e 150 -f png --no-limit > graph.png
# or
docker run --rm lszeremeta/knows -n 200 -e 150 -f png --no-limit > graph.png
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 200 -e 150 -f png --no-limit /data/graph.png
- Create visualization without node count info:
knows -n 300 -e 200 -f svg --hide-info > graph.svg
# or
docker run --rm lszeremeta/knows -n 300 -e 200 -f svg --hide-info > graph.svg
# or
docker run --rm -v "$(pwd)":/data lszeremeta/knows -n 300 -e 200 -f svg --hide-info /data/graph.svg
Note: On Windows PowerShell, replace $(pwd) with ${PWD}. On Windows Command Prompt, use %cd%.
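For scripts that consume the CSV output, the two file names can be derived from the output path. The sketch below assumes, based on the graph.csv example above, that the extension is dropped and _nodes.csv or _edges.csv is appended to the base name. The helper names `csv_output_paths` and `read_rows` are hypothetical, and no particular column layout is assumed.

```python
import csv
from pathlib import Path


def csv_output_paths(output):
    """Derive the node and edge file paths for a CSV output path."""
    path = Path(output)
    nodes_path = path.with_name(f"{path.stem}_nodes.csv")
    edges_path = path.with_name(f"{path.stem}_edges.csv")
    return nodes_path, edges_path


def read_rows(csv_path):
    """Read every row of a CSV file as a list of strings."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


nodes_path, edges_path = csv_output_paths("graph.csv")
print(nodes_path, edges_path)  # graph_nodes.csv graph_edges.csv
```

After running `knows -f csv graph.csv`, calling `read_rows(nodes_path)` and `read_rows(edges_path)` would load both files for further processing.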
Your ideas and contributions can make Knows even better! If you're new to open source, read How to Contribute to Open Source and CONTRIBUTING.md.
Knows is licensed under the MIT License.
