grokipedia-py

Zero-dependency Python client for extracting structured content from Grokipedia pages.

Install

pip install grokipedia-py

uv pip install git+https://github.com/caentzminger/grokipedia-py.git
uv add "grokipedia-py @ git+https://github.com/caentzminger/grokipedia-py.git"

Quickstart

from grokipedia import from_url

page = from_url("https://grokipedia.com/page/13065923")

print(page.title)
print(page.slug)
print(page.intro_text)
print(page.infobox[:3])
print(page.lead_figure)
print([section.title for section in page.sections])
first_media = next(
    (
        subsection.media
        for section in page.sections
        for subsection in section.subsections
        if subsection.media
    ),
    [],
)
print(first_media[:1])
print(len(page.references))
print(page.links[:5])
print(page.metadata.keywords)
print(page.markdown[:500])
print(page.to_json(indent=2))
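
The quickstart only looks one level down, at section.subsections. Because sections form a tree with nested subsections, a recursive walk may be more convenient. The sketch below uses SectionStub, a hypothetical stand-in that assumes only the title, subsections, and media attributes used above; it is not the library's Section type, and the sample tree is made up.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class SectionStub:
    """Hypothetical stand-in for the library's Section type (not its real definition)."""

    title: str
    subsections: list["SectionStub"] = field(default_factory=list)
    media: list[str] = field(default_factory=list)  # real media items are likely richer objects


def walk_sections(
    sections: Iterable[SectionStub], depth: int = 0
) -> Iterator[tuple[int, SectionStub]]:
    """Yield (depth, section) pairs in document order, parents before children."""
    for section in sections:
        yield depth, section
        yield from walk_sections(section.subsections, depth + 1)


# Made-up sample tree, only for illustration.
tree = [
    SectionStub(
        "History",
        subsections=[
            SectionStub(
                "Origins",
                subsections=[SectionStub("Early drafts", media=["figure-1.png"])],
            )
        ],
    ),
    SectionStub("Usage"),
]

outline = [(depth, section.title) for depth, section in walk_sections(tree)]
all_media = [item for _, section in walk_sections(tree) for item in section.media]
```

Assuming the real Section objects expose the same three attributes, walk_sections(page.sections) should work unchanged.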

Parse raw HTML without network access:

from grokipedia import from_html

# html is a str holding previously fetched or saved page HTML.
page = from_html(html, source_url="https://grokipedia.com/page/13065923")

Resolve a page from a title:

from grokipedia import page

page_obj = page('"Hello, World!" program')

Search for page URLs:

from grokipedia import search

results = search("hello world")
print(results[:5])

If this returns [], try:

results = search("hello world", respect_robots=False)

As of February 18, 2026, https://grokipedia.com/robots.txt disallows /api/, and /search is mostly client-rendered HTML, so a search that respects robots.txt can come back empty.

Use the class-based API with sitemap manifest caching:

from grokipedia import Grokipedia

wiki = Grokipedia(verbose=True)
result = wiki.page("The C Programming Language")
matches = wiki.search("programming language")

# Lazy sitemap lookup + cached child sitemap manifests.
url = wiki.find_page_url('"Hello, World!" program')
manifest = wiki.refresh_manifest()
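
To illustrate what "lazy lookup with cached child sitemap manifests" can mean, here is a minimal standard-library sketch. It is not the library's implementation: ManifestCache, parse_locs, and fake_load are hypothetical names, and the sample XML and example.com URLs are made up. A child sitemap is parsed only when first requested and reused afterwards.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_locs(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]


class ManifestCache:
    """Hypothetical cache: loads and parses each child sitemap at most once."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load  # callable that returns sitemap XML for a URL
        self._cache: dict[str, list[str]] = {}
        self.loads = 0  # counts real loads, to make the caching visible

    def urls_for(self, sitemap_url: str) -> list[str]:
        if sitemap_url not in self._cache:  # lazy: nothing is loaded until asked for
            self.loads += 1
            self._cache[sitemap_url] = parse_locs(self._load(sitemap_url))
        return self._cache[sitemap_url]


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page/A</loc></url>
  <url><loc>https://example.com/page/B</loc></url>
</urlset>"""


def fake_load(url: str) -> str:
    """Stand-in for a network fetch; always returns the sample document."""
    return SAMPLE_SITEMAP


cache = ManifestCache(fake_load)
first = cache.urls_for("https://example.com/sitemap-1.xml")
second = cache.urls_for("https://example.com/sitemap-1.xml")  # served from the cache
```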

Logging

The library uses Python's standard logging module (logger namespace: grokipedia).

import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("grokipedia").setLevel(logging.DEBUG)
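
If you would rather not touch the root logger, one option is to attach a handler to the grokipedia namespace only. The sketch below uses just the standard logging module; attach_grokipedia_handler is a hypothetical helper, and "grokipedia.demo" is a made-up child logger name used only to show that records propagate up to the namespace handler.

```python
import io
import logging


def attach_grokipedia_handler(stream, level: int = logging.DEBUG) -> logging.Logger:
    """Send records from the "grokipedia" namespace to a dedicated stream."""
    logger = logging.getLogger("grokipedia")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


buffer = io.StringIO()
attach_grokipedia_handler(buffer)

# Made-up child logger; it inherits the DEBUG level and propagates to the handler above.
logging.getLogger("grokipedia.demo").debug(
    "fetching %s", "https://grokipedia.com/page/13065923"
)
captured = buffer.getvalue()
```

In practice you would pass sys.stderr or an open log file instead of the in-memory buffer.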

Development & CI

The project has no runtime dependencies (dependencies = []) and relies only on the standard library at runtime. Development tasks run through the Justfile:

just setup
just fmt-py
just lint-py
just typecheck
just test
just ci

Robots behavior

from_url() enforces robots.txt by default.

respect_robots=True (default): validate robots.txt before fetching a page.
search() first tries /api/full-text-search, then falls back to parsing the /search HTML.
allow_robots_override=False (default): strict mode.
    If robots.txt is unavailable or malformed, the library fails closed with RobotsUnavailableError.
    If the URL is disallowed, the library raises RobotsDisallowedError.

You can bypass robots enforcement by setting either:

respect_robots=False, or
allow_robots_override=True
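
The rules above can be sketched with the standard library's urllib.robotparser. This is an illustration under stated assumptions, not the library's code: check_robots and the two *Stub exceptions are hypothetical stand-ins, the robots.txt text is a made-up excerpt, and only the "unavailable" case is modeled (robotparser silently ignores malformed lines).

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


class RobotsUnavailableStub(Exception):
    """Stand-in for RobotsUnavailableError."""


class RobotsDisallowedStub(Exception):
    """Stand-in for RobotsDisallowedError."""


def check_robots(
    robots_txt: str | None,
    url: str,
    *,
    user_agent: str = "*",
    respect_robots: bool = True,
    allow_robots_override: bool = False,
) -> None:
    """Raise if the URL may not be fetched; return None when it may."""
    if not respect_robots or allow_robots_override:
        return  # either flag bypasses enforcement
    if robots_txt is None:
        # Fail closed: no robots.txt means no fetch.
        raise RobotsUnavailableStub("robots.txt could not be retrieved")
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        raise RobotsDisallowedStub(url)


def outcome(*args, **kwargs) -> str:
    """Small helper that reports the result of check_robots as a string."""
    try:
        check_robots(*args, **kwargs)
    except (RobotsUnavailableStub, RobotsDisallowedStub) as exc:
        return type(exc).__name__
    return "allowed"


# Made-up excerpt modeled on the /api/ rule mentioned earlier.
ROBOTS_TXT = "User-agent: *\nDisallow: /api/\n"
PAGE_URL = "https://grokipedia.com/page/13065923"
API_URL = "https://grokipedia.com/api/full-text-search"
```

Passing the robots.txt text in as an argument keeps the check testable without network access.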

Data model

from_url() and from_html() return a Page with the following fields:

url
slug
title
intro_text
infobox (InfoboxField list for dt/dd fact rows)
lead_figure (LeadFigure from the top figure image/caption when present)
sections (Section tree with nested subsections; each section includes indexed media)
references (Reference list)
links (ordered unique links extracted from the main article)
metadata (PageMetadata, including optional keywords)

Page also includes:

lede_text (alias of intro_text)
lead_media (alias of lead_figure)
markdown
to_dict() / to_json()
from_dict() / from_json()
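
The to_dict() / to_json() and from_dict() / from_json() pairs suggest a lossless round trip. The sketch below shows that pattern on MiniPage, a hypothetical stand-in with a handful of the fields listed above. It is not the real Page class: treating from_dict() and from_json() as classmethods is an assumption, and the sample values are made up.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class MiniPage:
    """Hypothetical, much smaller stand-in for Page."""

    url: str
    slug: str
    title: str
    intro_text: str = ""
    links: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)

    def to_json(self, indent: int | None = None) -> str:
        return json.dumps(self.to_dict(), indent=indent)

    @classmethod
    def from_dict(cls, data: dict) -> MiniPage:
        return cls(**data)

    @classmethod
    def from_json(cls, text: str) -> MiniPage:
        return cls.from_dict(json.loads(text))


# Made-up sample values, only for illustration.
original = MiniPage(
    url="https://grokipedia.com/page/13065923",
    slug="example-slug",
    title="Example title",
    intro_text="Example intro.",
    links=["https://example.com/a", "https://example.com/b"],
)
restored = MiniPage.from_json(original.to_json(indent=2))
```

A round trip like this is handy for caching parsed pages on disk and reloading them without refetching.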

Exceptions

All library exceptions inherit from GrokipediaError.

FetchError
HttpStatusError
PageNotFoundError
RobotsUnavailableError
RobotsDisallowedError
ParseError
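
Because every exception shares one base class, callers can handle specific cases first and fall back to GrokipediaError. The sketch below defines local stand-ins with the names listed above so that it runs without the package. Placing each class directly under GrokipediaError is an assumption, and describe_failure and raiser are hypothetical helpers.

```python
# Local stand-ins mirroring the names above; the real classes live in the
# grokipedia package and their exact hierarchy is not specified here.
class GrokipediaError(Exception): ...
class FetchError(GrokipediaError): ...
class HttpStatusError(GrokipediaError): ...
class PageNotFoundError(GrokipediaError): ...
class RobotsUnavailableError(GrokipediaError): ...
class RobotsDisallowedError(GrokipediaError): ...
class ParseError(GrokipediaError): ...


def describe_failure(action) -> str:
    """Run action() and map any library error to a short message."""
    try:
        action()
    except RobotsDisallowedError as exc:
        return f"blocked by robots.txt: {exc}"
    except PageNotFoundError as exc:
        return f"no such page: {exc}"
    except GrokipediaError as exc:
        # Catch-all for the remaining library errors.
        return f"library error ({type(exc).__name__}): {exc}"
    return "ok"


def raiser(exc: Exception):
    """Build a zero-argument callable that raises exc (for demonstration only)."""
    def action() -> None:
        raise exc
    return action


messages = [
    describe_failure(raiser(RobotsDisallowedError("/api/"))),
    describe_failure(raiser(PageNotFoundError("missing"))),
    describe_failure(raiser(ParseError("bad markup"))),
    describe_failure(lambda: None),
]
```

With the real package installed, the same try/except shape would wrap a call such as from_url(...), importing the exception classes from grokipedia instead of defining them locally.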
