Nastiiasaenko/Data-Commons-

Google Data Commons API - Research & Exploration

Overview

Google Data Commons provides a structured and linked data repository that aggregates datasets from various sources and organizes them into a Knowledge Graph (KG). This project demonstrates how to effectively use the Google Data Commons API for data retrieval, exploration, and visualization.

The tutorials in this repository provide step-by-step guidance on:

  • Understanding Knowledge Graphs and their structure.
  • Accessing and querying Google Data Commons using the Python API.
  • Retrieving statistical variables, entity relationships, and geospatial data.
  • Using Python and Pandas to manipulate and analyze data.
  • Visualizing interconnected relationships in the Knowledge Graph.

Installation

To get started, install the necessary dependencies:

pip install datacommons datacommons_pandas pygraphviz networkx matplotlib

If using Google Colab, you may also need:

!apt-get install -y graphviz libgraphviz-dev pkg-config
!pip install datacommons pygraphviz
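
After installing, a quick import check can confirm the environment is ready before opening the notebooks. The sketch below is only an illustration: the helper name missing_packages is hypothetical, and the list assumes each import name matches its pip name. It also includes networkx and matplotlib, which the visualization step imports.

```python
from importlib.util import find_spec

# Import names are assumed to match the pip package names;
# networkx and matplotlib are needed for the visualization step.
REQUIRED = ["datacommons", "datacommons_pandas", "pygraphviz",
            "networkx", "matplotlib"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

An empty result means every package in the list can be imported.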

Understanding Data Commons

Google Data Commons is built as a Knowledge Graph (KG), which organizes information as nodes (entities) and edges (relationships). These entities can include:

  • Countries, States, and Cities
  • Population and Demographic Statistics
  • Environmental Data (e.g., Carbon Emissions)
  • Economic Indicators (e.g., Median Income, Unemployment Rates)

Key Features:

  • Unified Data Model - Combines multiple datasets into a single knowledge graph.
  • Pre-processed Data - Reduces the need for manual cleaning and schema alignment.
  • APIs for Access - Retrieve structured data programmatically using Python or the REST API.
  • Graph Structure - Allows relational queries (e.g., "Nearby Places", "Contained In").
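
To make the node-and-edge model concrete, here is a minimal sketch that stores a few facts as (subject, predicate, object) triples and answers two relational questions over them. The triples are hand-written for illustration rather than fetched from the API, and the helper names outgoing and incoming are hypothetical.

```python
from collections import defaultdict

# Hand-written illustrative triples (subject, predicate, object);
# not fetched from the API.
TRIPLES = [
    ("geoId/12", "name", "Florida"),
    ("geoId/12", "typeOf", "State"),
    ("geoId/12", "containedInPlace", "country/USA"),
    ("geoId/06", "name", "California"),
    ("geoId/06", "containedInPlace", "country/USA"),
]

def outgoing(triples, node):
    """Group a node's outgoing edges as {predicate: [objects]}."""
    edges = defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == node:
            edges[predicate].append(obj)
    return dict(edges)

def incoming(triples, node, predicate):
    """List subjects that point at `node` through `predicate`."""
    return [s for s, p, o in triples if o == node and p == predicate]

print(outgoing(TRIPLES, "geoId/12"))
print(incoming(TRIPLES, "country/USA", "containedInPlace"))
```

The first call lists everything stated about Florida. The second walks the "Contained In" edge backwards to find the places recorded as being inside the USA.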

More details: Google Data Commons

Tutorials & Notebooks

This repository contains three interactive Jupyter Notebooks demonstrating different aspects of the API:

| Notebook | Description |
| --- | --- |
| Data Commons Basics | Introduction to the API, retrieving properties, and exploring the graph structure. |
| Exploring Interconnectivity | Advanced queries using get_triples(), get_places_in(), and visualizing networks. |
| Building Emissions Dataset | Retrieving environmental data (CO₂ emissions, methane levels) and constructing time-series datasets. |
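
As a rough idea of what "constructing time-series datasets" can look like, the sketch below flattens per-place series into long-format rows. It assumes each series is a dictionary mapping date strings to values. The helper name build_rows is hypothetical, and the numbers are placeholders, not real emissions data.

```python
def build_rows(series_by_place, stat_var):
    """Flatten {place: {date: value}} into sorted long-format rows."""
    rows = []
    for place in sorted(series_by_place):
        for date in sorted(series_by_place[place]):
            rows.append({
                "place": place,
                "stat_var": stat_var,
                "date": date,
                "value": series_by_place[place][date],
            })
    return rows

# Placeholder numbers for illustration only; not real emissions data.
sample = {
    "geoId/12": {"2011": 2.0, "2010": 1.0},
    "geoId/06": {"2010": 3.0},
}
rows = build_rows(sample, "Annual_Emissions_CarbonDioxide_NonBiogenic")
for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed straight to pandas.DataFrame(rows) for further analysis.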

Concepts & API Methods

This project explores key Data Commons API methods related to Knowledge Graphs, statistical variables, and geospatial data.

| Concept | Description | Example API Method |
| --- | --- | --- |
| Knowledge Graph (KG) | Graph-based data model storing relationships between entities. | get_triples(dcids) |
| Entity (Node) | A single object in the KG (e.g., country, city, person). | get_property_values(dcids, prop) |
| DCID (Identifier) | Unique identifier assigned to each entity. | "geoId/06" for California |
| Relationships (Edges) | Connections between nodes (e.g., "located in", "nearby"). | get_triples(['geoId/12']) |
| Statistical Variable | A measurable quantity (e.g., population, GDP). | get_stat_series(place, stat_var) |
| Places In | Places of a given type contained in an entity (e.g., counties in California). | get_places_in(['geoId/06'], 'County') |
| Contained In | Finds parent regions of a given entity. | get_property_values(['geoId/12'], 'containedInPlace') |
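
Both DCIDs used in this README ("geoId/06" for California, "geoId/12" for Florida) consist of the prefix "geoId/" followed by the state's two-digit FIPS code. The sketch below builds such identifiers on the assumption that this pattern holds for US states in general. The helper name state_dcid is hypothetical and is not part of the datacommons library.

```python
def state_dcid(fips_code):
    """Build a US state DCID from a FIPS code, zero-padded to two digits."""
    code = int(fips_code)
    if not 1 <= code <= 99:
        raise ValueError(f"not a two-digit FIPS code: {fips_code!r}")
    return f"geoId/{code:02d}"

print(state_dcid(6))     # California
print(state_dcid("12"))  # Florida
```

Zero-padding matters here: "geoId/6" and "geoId/06" are different strings, and only the padded form matches the identifier shown in the table.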

How to Use the Tutorials

Follow these steps to explore Google Data Commons using this repository:

1️⃣ Run the First Notebook

Start with Data_Commons_Tutorial_Draft_v1.ipynb to understand the API basics.

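import datacommons as dc

dcid = 'geoId/12'  # Florida
# get_property_values() takes a single property name per call
for prop in ['name', 'latitude', 'longitude']:
    print(prop, dc.get_property_values([dcid], prop))

Each call returns a dictionary keyed by DCID, with that property's values in a list. For the name property, the dictionary is:

{'geoId/12': ['Florida']}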
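
When several properties are needed for the same entities, the per-property lookups can be merged into one record per DCID. The sketch below assumes the lookup function takes a list of DCIDs and a single property name and returns a dictionary of the form {dcid: [values]}. The names collect_properties and fake_fetch are hypothetical, and fake_fetch is an offline stand-in with hand-written data so the example runs without network access.

```python
def collect_properties(dcids, props, fetch):
    """Merge one-property-per-call lookups into {dcid: {prop: [values]}}.

    `fetch(dcids, prop)` must return {dcid: [values]}.
    """
    merged = {dcid: {} for dcid in dcids}
    for prop in props:
        result = fetch(dcids, prop)
        for dcid in dcids:
            # Missing entries become an empty list rather than an error.
            merged[dcid][prop] = result.get(dcid, [])
    return merged

def fake_fetch(dcids, prop):
    """Offline stand-in with hand-written data, for illustration only."""
    data = {("geoId/12", "name"): ["Florida"]}
    return {dcid: data.get((dcid, prop), []) for dcid in dcids}

record = collect_properties(["geoId/12"], ["name", "typeOf"], fake_fetch)
print(record)
```

To query the live graph instead, pass dc.get_property_values as the fetch argument in place of fake_fetch.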

2️⃣ Retrieve Statistical Data

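Use get_stat_series() to fetch a time series for one place and one statistical variable. The call below requests non-biogenic CO₂ emissions for Florida and returns a dictionary mapping dates to values.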

dc.get_stat_series('geoId/12', 'Annual_Emissions_CarbonDioxide_NonBiogenic')
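
Because the series comes back as a dictionary of dates and values, a little post-processing makes it easier to read. The sketch below orders the points by date and computes the change from one observation to the next. The helper names are hypothetical, and the sample values are placeholders rather than real emissions figures.

```python
def sorted_points(series):
    """Return (date, value) pairs ordered by date string."""
    return sorted(series.items())

def changes(series):
    """Return (date, delta) pairs: each value minus the previous one."""
    points = sorted_points(series)
    return [(d2, v2 - v1) for (_, v1), (d2, v2) in zip(points, points[1:])]

# Placeholder values for illustration only; not real emissions figures.
sample = {"2012": 90.0, "2010": 100.0, "2011": 95.0}
print(sorted_points(sample))
print(changes(sample))
```

Sorting by the date string works because ISO-style dates such as "2010" or "2010-06" sort chronologically when they share the same format. With pandas installed, pd.Series(sample).sort_index() gives an equivalent ordered view.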

3️⃣ Visualize the Knowledge Graph

Use NetworkX to generate a graph of entity relationships.

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
triples = dc.get_triples(['geoId/12'])
for s, p, o in triples['geoId/12']:
    G.add_edge(s, o, label=p)

nx.draw(G, with_labels=True)
plt.show()

➡️ (This will generate a graph visualization showing relationships for Florida.)
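
A node such as Florida can have many triples, and drawing all of them tends to produce a cluttered picture. One option is to keep only selected predicates before building the graph. The sketch below does that filtering in plain Python. The helper name triples_to_edges is hypothetical, and the sample triples are hand-written rather than fetched from the API.

```python
def triples_to_edges(triples, predicates):
    """Keep triples whose predicate is in `predicates` and return
    (subject, object, {"label": predicate}) tuples without duplicates."""
    edges, seen = [], set()
    for subject, predicate, obj in triples:
        key = (subject, predicate, obj)
        if predicate in predicates and key not in seen:
            seen.add(key)
            edges.append((subject, obj, {"label": predicate}))
    return edges

# Hand-written triples for illustration; not fetched from the API.
sample = [
    ("geoId/12", "name", "Florida"),
    ("geoId/12", "containedInPlace", "country/USA"),
    ("geoId/12", "containedInPlace", "country/USA"),
]
edges = triples_to_edges(sample, {"containedInPlace"})
print(edges)
```

The returned tuples are in the form accepted by G.add_edges_from(edges). To show the predicate names on the plot, compute a layout with pos = nx.spring_layout(G), draw with nx.draw(G, pos, with_labels=True), and then call nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label")).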

References & Documentation

Data Commons API Docs

Knowledge Graph Explorer

Google Data Commons

Future Enhancements

We plan to expand this project with:

✅ Interactive Visualizations - Implement Plotly Dash for real-time data exploration.

✅ Automated Data Collection - Fetch and update emissions data dynamically.

✅ Integration with Google BigQuery - Combine Data Commons with BigQuery for large-scale analysis.

✅ Geospatial Analysis - Use GeoPandas for mapping emissions trends.
