A comprehensive Python client library for the GDELT (Global Database of Events, Language, and Tone) project.
- Unified Interface: Single client covering all 6 REST APIs, 3 database tables, and NGrams dataset
- Version Normalization: Transparent handling of GDELT v1/v2 differences with normalized output
- Resilience: Automatic fallback to BigQuery when APIs fail or hit rate limits
- Modern Python: Python 3.11+, async-first, Pydantic models, type hints throughout
- Streaming: Generator-based iteration for large datasets with memory efficiency
- Developer Experience: Clear errors, progress indicators, comprehensive lookups
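
To make the streaming bullet concrete, here is a minimal sketch of generator-based iteration in plain `asyncio`. It does not use py-gdelt itself: `PAGES`, `fetch_chunk`, and `stream_records` are hypothetical stand-ins for a paged data source and are not part of the library's API.

```python
import asyncio
from collections.abc import AsyncIterator

# Hypothetical in-memory "remote" data, split into pages of up to 3 records.
PAGES = [[1, 2, 3], [4, 5, 6], [7]]


async def fetch_chunk(page: int) -> list[int]:
    """Pretend to download one page; returns [] past the last page."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return PAGES[page] if page < len(PAGES) else []


async def stream_records() -> AsyncIterator[int]:
    """Yield records one at a time, holding only one page in memory."""
    page = 0
    while chunk := await fetch_chunk(page):
        for record in chunk:
            yield record
        page += 1


async def total() -> int:
    """Consume the stream and keep only a running sum."""
    running = 0
    async for record in stream_records():
        running += record  # process each record as it arrives
    return running


result = asyncio.run(total())
```

Because the consumer only ever sees one record and the producer only ever holds one page, peak memory stays bounded by the page size rather than by the dataset size.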
Installation:

    # Basic installation
    pip install gdelt-py

    # With BigQuery support (quoted so shells such as zsh do not expand the brackets)
    pip install "gdelt-py[bigquery]"

    # With all optional dependencies
    pip install "gdelt-py[bigquery,pandas]"

Quick start:

    import asyncio
    from datetime import date, timedelta

    from py_gdelt import GDELTClient
    from py_gdelt.filters import (
        BroadcastNGramsFilter,
        DateRange,
        EventFilter,
        GEGFilter,
        GQGFilter,
        VGKGFilter,
    )


    async def main() -> None:
        async with GDELTClient() as client:
            # Query recent events
            yesterday = date.today() - timedelta(days=1)
            event_filter = EventFilter(
                date_range=DateRange(start=yesterday, end=yesterday),
                actor1_country="USA",
            )
            result = await client.events.query(event_filter)
            print(f"Found {len(result)} events")

            # Query Visual GKG (image analysis)
            vgkg_filter = VGKGFilter(
                date_range=DateRange(start=yesterday),
                domain="cnn.com",
            )
            images = await client.vgkg.query(vgkg_filter)

            # Query TV NGrams (word frequencies from TV)
            tv_filter = BroadcastNGramsFilter(
                date_range=DateRange(start=yesterday),
                station="CNN",
                ngram_size=1,
            )
            ngrams = await client.tv_ngrams.query(tv_filter)

            # Query Graph Datasets (quotes, entities, frontpage links)
            gqg_filter = GQGFilter(date_range=DateRange(start=yesterday))
            quotes = await client.graphs.query_gqg(gqg_filter)

            geg_filter = GEGFilter(date_range=DateRange(start=yesterday))
            async for entity in client.graphs.stream_geg(geg_filter):
                print(f"{entity.name}: {entity.entity_type}")


    # "async with" is only valid inside a coroutine, so run everything via asyncio.run()
    asyncio.run(main())

- Events - Structured event data (who did what to whom, when, where)
- Mentions - Article mentions of events over time
- GKG - Global Knowledge Graph (themes, entities, tone, quotations)
- NGrams - Word and phrase occurrences in articles (Jan 2020+)
- VGKG - Visual GKG (image annotations via Cloud Vision API)
- TV-GKG - Television GKG (closed caption analysis from TV broadcasts)
- TV NGrams - Word frequencies from TV closed captions
- Radio NGrams - Word frequencies from radio transcripts
- Graph Datasets - GQG, GEG, GFG, GGG, GEMG, GAL (see below)
- DOC 2.0 - Full-text article search and discovery
- GEO 2.0 - Geographic analysis and mapping
- Context 2.0 - Sentence-level contextual search
- TV 2.0 - Television news closed caption search
- TV AI 2.0 - AI-enhanced visual TV search (labels, OCR, faces)
- LowerThird 🏗️ - TV chyron/lower-third text search
- TVV 🏗️ - TV Visual channel inventory
- GKG GeoJSON v1 🏗️ - Legacy geographic GKG API
- GQG - Global Quotation Graph (extracted quotes with context)
- GEG - Global Entity Graph (NER via Cloud NLP API)
- GFG - Global Frontpage Graph (homepage link tracking)
- GGG - Global Geographic Graph (location co-mentions)
- GDG 🏗️ - Global Difference Graph (article change detection)
- GEMG - Global Embedded Metadata Graph (meta tags, JSON-LD)
- GRG 🏗️ - Global Relationship Graph (subject-verb-object triples)
- GAL - Article List (lightweight article metadata)
- CAMEO - Event classification codes and Goldstein scale
- Themes - GKG theme taxonomy
- Countries - Country code conversions (FIPS ↔ ISO)
- Ethnic/Religious Groups - Group classification codes
- GCAM 🏗️ - 2,300+ emotional/thematic dimensions
- Image Tags 🏗️ - Cloud Vision labels for DOC API
- Languages 🏗️ - Supported language codes
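
The country-code conversion mentioned in the lookup list matters because GDELT data mixes FIPS 10-4 codes with ISO 3166-1 codes, and the two schemes disagree on many two-letter values. The following is a minimal sketch using a small hand-written table; it is not the library's lookup API, and `FIPS_TO_ISO3`, `fips_to_iso3`, and `iso3_to_fips` are hypothetical names chosen for illustration.

```python
# Illustrative subset of FIPS 10-4 -> ISO 3166-1 alpha-3 mappings.
FIPS_TO_ISO3 = {
    "US": "USA",
    "UK": "GBR",  # FIPS uses UK, while ISO alpha-2 uses GB
    "GM": "DEU",  # FIPS GM is Germany, not Gambia
    "CH": "CHN",  # FIPS CH is China, not Switzerland
    "JA": "JPN",
    "AS": "AUS",  # FIPS AS is Australia
}

# Reverse table, built once so both directions stay consistent.
ISO3_TO_FIPS = {iso: fips for fips, iso in FIPS_TO_ISO3.items()}


def fips_to_iso3(code: str) -> str:
    """Convert a FIPS 10-4 country code to ISO alpha-3 (case-insensitive)."""
    key = code.strip().upper()
    if key not in FIPS_TO_ISO3:
        raise ValueError(f"Unknown FIPS country code: {code!r}")
    return FIPS_TO_ISO3[key]


def iso3_to_fips(code: str) -> str:
    """Convert an ISO alpha-3 country code to FIPS 10-4 (case-insensitive)."""
    key = code.strip().upper()
    if key not in ISO3_TO_FIPS:
        raise ValueError(f"Unknown ISO alpha-3 country code: {code!r}")
    return ISO3_TO_FIPS[key]
```

Raising on unknown codes, rather than returning the input unchanged, keeps a mistyped or wrong-scheme code from silently flowing into a filter.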
| Data Type | API | BigQuery | Raw Files | Time Range | Fallback |
|---|---|---|---|---|---|
| Articles (fulltext) | DOC 2.0 | - | - | Rolling 3 months | - |
| Article geography | GEO 2.0 | - | - | Rolling 7 days | - |
| Sentence context | Context 2.0 | - | - | Rolling 72 hours | - |
| TV captions | TV 2.0 | - | - | Jul 2009+ | - |
| TV visual/AI | TV AI 2.0 | - | - | Jul 2010+ | - |
| TV chyrons 🏗️ | LowerThird | - | - | Aug 2017+ | - |
| Events v2 | - | ✓ | ✓ | Feb 2015+ | ✓ |
| Events v1 | - | ✓ | ✓ | 1979 - Feb 2015 | ✓ |
| Mentions | - | ✓ | ✓ | Feb 2015+ | ✓ |
| GKG v2 | - | ✓ | ✓ | Feb 2015+ | ✓ |
| GKG v1 | - | ✓ | ✓ | Apr 2013 - Feb 2015 | ✓ |
| Web NGrams | - | ✓ | ✓ | Jan 2020+ | ✓ |
| VGKG | - | ✓ | ✓ | Dec 2015+ | ✓ |
| TV-GKG | - | ✓ | ✓ | Jul 2009+ | ✓ |
| TV NGrams | - | - | ✓ | Jul 2009+ | - |
| Radio NGrams | - | - | ✓ | 2017+ | - |
| GQG | - | - | ✓ | Jan 2020+ | - |
| GEG | - | - | ✓ | Jul 2016+ | - |
| GFG | - | - | ✓ | Mar 2018+ | - |
| GGG | - | - | ✓ | Jan 2020+ | - |
| GEMG | - | - | ✓ | Jan 2020+ | - |
| GAL | - | - | ✓ | Jan 2020+ | - |

🏗️ = Work in progress - coming in future releases
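
The Fallback column in the table describes a try-one-source-then-another pattern. Below is a minimal sketch of that pattern in plain `asyncio`, assuming the primary source signals failure by raising an exception. It is not how py-gdelt implements fallback internally; `SourceUnavailable`, `with_fallback`, `download_files`, and `query_bigquery` are hypothetical names used only for illustration.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class SourceUnavailable(Exception):
    """Hypothetical error: the primary source failed or was rate limited."""


async def with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
) -> tuple[str, T]:
    """Try the primary source; on SourceUnavailable, use the fallback.

    Returns which source answered together with its result.
    """
    try:
        return "primary", await primary()
    except SourceUnavailable:
        return "fallback", await fallback()


async def download_files() -> list[str]:
    # Stand-in for a raw-file download that hits a rate limit.
    raise SourceUnavailable("HTTP 429: rate limited")


async def query_bigquery() -> list[str]:
    # Stand-in for a BigQuery query returning the same records.
    return ["event-1", "event-2"]


source, rows = asyncio.run(with_fallback(download_files, query_bigquery))
```

Catching only a dedicated exception type means programming errors in the primary path still surface instead of being masked by the fallback.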

All I/O operations are async by default for optimal performance:

    async with GDELTClient() as client:
        articles = await client.doc.query(doc_filter)

Synchronous wrappers are available for compatibility:

    with GDELTClient() as client:
        articles = client.doc.query_sync(doc_filter)

Process large datasets without loading everything into memory:

    async with GDELTClient() as client:
        async for event in client.events.stream(event_filter):
            process(event)  # Memory-efficient

Pydantic models throughout with full type hints:

    event: Event = result[0]
    # Compare against None: a Goldstein score of 0.0 is valid but falsy
    assert event.goldstein_scale is not None  # Type-checked

Flexible configuration via environment variables, TOML files, or programmatic settings:

    settings = GDELTSettings(
        timeout=60,
        max_retries=5,
        cache_dir=Path("/custom/cache"),
    )

    async with GDELTClient(settings=settings) as client:
        ...

Full documentation available at: https://rbozydar.github.io/py-gdelt/

Contributions are welcome! See Contributing Guide for details.
MIT License - see LICENSE file for details.