Google Data Commons is a structured, linked data repository that aggregates datasets from many public sources and organizes them into a single Knowledge Graph (KG). This project demonstrates how to use the Google Data Commons API for data retrieval, exploration, and visualization.
The tutorials in this repository provide step-by-step guidance on:
- Understanding Knowledge Graphs and their structure.
- Accessing and querying Google Data Commons using the Python API.
- Retrieving statistical variables, entity relationships, and geospatial data.
- Using Python and Pandas to manipulate and analyze data.
- Visualizing interconnected relationships in the Knowledge Graph.
Contents:
- Installation
- Understanding Data Commons
- Tutorials & Notebooks
- Concepts & API Methods
- How to Use the Tutorials
- Example Queries & Code Snippets
- Future Enhancements
To get started, install the necessary dependencies:
```bash
pip install datacommons datacommons_pandas pygraphviz
```

If using Google Colab, you may also need:

```bash
!apt-get install -y graphviz libgraphviz-dev pkg-config
!pip install datacommons pygraphviz
```

Google Data Commons is built as a Knowledge Graph (KG), which organizes information as nodes (entities) and edges (relationships). These entities can include:
- Countries, States, and Cities
- Population and Demographic Statistics
- Environmental Data (e.g., Carbon Emissions)
- Economic Indicators (e.g., Median Income, Unemployment Rates)
✅ Unified Data Model - Combines multiple datasets into a single knowledge graph.
✅ Pre-processed Data - Reduces the need for manual data cleaning.
✅ APIs for Access - Retrieve structured data programmatically using Python or REST API.
✅ Graph Structure - Allows relational queries (e.g., "Nearby Places", "Contained In").
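For example, a "Contained In" relational query for Florida's parent regions uses the containedInPlace property. A minimal sketch with the Python client installed above:

```python
import datacommons as dc

# "Contained In" relational query: which regions contain Florida?
parents = dc.get_property_values(['geoId/12'], 'containedInPlace')
print(parents)  # e.g., {'geoId/12': ['country/USA', ...]}
```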
More details: [Google Data Commons](https://datacommons.org)
This repository contains three interactive Jupyter Notebooks demonstrating different aspects of the API:
| Notebook | Description |
|---|---|
| Data Commons Basics | Introduction to the API, retrieving properties, and exploring the graph structure. |
| Exploring Interconnectivity | Advanced queries using get_triples(), get_places_in(), and visualizing networks. |
| Building Emissions Dataset | Retrieving environmental data (CO₂ emissions, methane levels) and constructing time-series datasets. |
This project explores key Data Commons API methods related to Knowledge Graphs, statistical variables, and geospatial data.
| Concept | Description | Example API Method |
|---|---|---|
| Knowledge Graph (KG) | Graph-based data model storing relationships between entities. | get_triples(dcids) |
| Entity (Node) | A single object in the KG (e.g., country, city, person). | get_property_values(dcids, prop) |
| DCID (Identifier) | Unique identifier assigned to each entity. | "geoId/06" for California |
| Relationships (Edges) | Connections between nodes (e.g., "located in", "nearby"). | get_triples(['geoId/12']) |
| Statistical Variable | Measurable data points (e.g., population, GDP). | get_stat_series(place, stat_var) |
| Nearby Places | Closest geographic locations to a given entity. | get_property_values(['geoId/06'], 'nearbyPlaces') |
| Places In | Child places of a given type within a region. | get_places_in(['geoId/06'], 'County') |
| Contained In | Finds parent regions of a given entity. | get_property_values(['geoId/12'], 'containedInPlace') |
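A quick illustration of these methods together (a minimal sketch; the DCIDs follow the examples in the table):

```python
import datacommons as dc

# Parent regions of Florida (the "Contained In" edge)
print(dc.get_property_values(['geoId/12'], 'containedInPlace'))

# Counties contained in California
counties = dc.get_places_in(['geoId/06'], 'County')
print(len(counties['geoId/06']), 'counties found')
```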
Follow these steps to explore Google Data Commons using this repository:
Start with Data_Commons_Tutorial_Draft_v1.ipynb to understand the API basics.
```python
import datacommons as dc

dcid = 'geoId/12'  # Florida
# get_property_values() takes a single property per call
names = dc.get_property_values([dcid], 'name')
latitudes = dc.get_property_values([dcid], 'latitude')
print(names, latitudes)
```

This returns:

```
{'geoId/12': ['Florida']} {'geoId/12': ['27.994402']}
```
Use get_stat_series() to fetch emissions data for US states.

```python
# Annual non-biogenic CO2 emissions for Florida, as a {date: value} dict
series = dc.get_stat_series('geoId/12', 'Annual_Emissions_CarbonDioxide_NonBiogenic')
print(series)
```
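If you prefer a pandas-native workflow, the companion datacommons_pandas package (installed earlier) can return the same series as a pandas Series; a minimal sketch:

```python
import datacommons_pandas as dcpd

# Same statistical variable, returned as a pandas Series indexed by date
co2 = dcpd.build_time_series('geoId/12', 'Annual_Emissions_CarbonDioxide_NonBiogenic')
print(co2.tail())
```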
Use NetworkX to generate a graph of entity relationships.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()

# get_triples() returns {dcid: [(subject, predicate, object), ...]}
triples = dc.get_triples(['geoId/12'])
for s, p, o in triples['geoId/12']:
    G.add_edge(s, o, label=p)  # store the predicate as an edge label

nx.draw(G, with_labels=True)
plt.show()
```

➡️ This will generate a graph visualization showing relationships for Florida.
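For larger neighborhoods the default spring layout quickly becomes unreadable. Since pygraphviz is already a dependency, one option is a hierarchical Graphviz layout (a sketch reusing G from the snippet above; assumes Graphviz is installed, as in the Colab setup):

```python
# Hierarchical layout via Graphviz's 'dot' engine (requires pygraphviz)
pos = nx.nx_agraph.graphviz_layout(G, prog='dot')
nx.draw(G, pos, with_labels=True, node_size=300, font_size=8)
plt.show()
```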
We plan to expand this project with:
✅ Interactive Visualizations - Implement Plotly Dash for real-time data exploration.
✅ Automated Data Collection - Fetch and update emissions data dynamically.
✅ Integration with Google BigQuery - Combine Data Commons with BigQuery for large-scale analysis.
✅ Geospatial Analysis - Use GeoPandas for mapping emissions trends.