THIS IS EXPERIMENTAL SOFTWARE

[WARNING]

I wrangled this tool with AI because reading Culturally Responsive Computing for my course, in full, would be too painful. Literally. My brain would melt. It is a quarter knowledge, with verbose, age-old human pain as the remainder. I don't have space for all the violins in my head.

aqe has turned out quite useful so far, though, and it gave me ideas on how to scale this. It is agent-friendly ;) Just like any CLI.

The tested parts work, and wrangling is ongoing. See Project Status.

Keep building,

NiXLiM

[WARNING]

Academic Quote Extractor (aqe)

A Go CLI application for extracting relevant quotes from academic documents with Harvard-style citations. Designed for students who need to find quotable passages for essays and research papers.

How It Works

AQE uses a hybrid RAG (Retrieval-Augmented Generation) architecture:

Ingest -- Parse PDF, DOCX, or TXT documents via Docling, chunk them hierarchically, generate vector embeddings via Ollama, and store verbatim text in SQLite.
Extract -- Given a research topic, perform hybrid BM25 + vector search in Weaviate, send top candidates to Claude for relevance scoring, and save results.
Export -- Output saved extractions as Markdown (with blockquotes and bibliography), JSON, or BibTeX.

Zero hallucination guarantee: The LLM returns only chunk IDs and relevance scores. Quote text is always retrieved verbatim from SQLite -- never generated by the LLM.
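
The guarantee above can be sketched in a few lines of Go. This is a minimal illustration, not AQE's actual code: the `ScoredChunk` and `Quote` types, the `resolveQuotes` function, and the in-memory map standing in for SQLite are all hypothetical names chosen for the example. The threshold of 60 mirrors the "relevance >= 60" line in the quick example output.

```go
package main

import (
	"fmt"
	"sort"
)

// ScoredChunk is all the LLM is allowed to return: an ID and a score, no text.
// (Hypothetical type for illustration.)
type ScoredChunk struct {
	ChunkID   string
	Relevance int // 0-100
}

// Quote pairs a score with text that came from storage, not from the model.
type Quote struct {
	ChunkID   string
	Relevance int
	Text      string
}

// resolveQuotes looks up each scored chunk ID in the verbatim store.
// IDs the store does not know are dropped, so an ID the model made up
// can never turn into quote text. Results are sorted by relevance, highest first.
func resolveQuotes(scored []ScoredChunk, store map[string]string, minRelevance int) []Quote {
	quotes := make([]Quote, 0, len(scored))
	for _, s := range scored {
		if s.Relevance < minRelevance {
			continue
		}
		text, ok := store[s.ChunkID]
		if !ok {
			continue // unknown ID: skip rather than trust the model
		}
		quotes = append(quotes, Quote{ChunkID: s.ChunkID, Relevance: s.Relevance, Text: text})
	}
	sort.SliceStable(quotes, func(i, j int) bool {
		return quotes[i].Relevance > quotes[j].Relevance
	})
	return quotes
}

func main() {
	// The map stands in for the SQLite chunk table (an assumption for this sketch).
	store := map[string]string{
		"c1": "Algorithms can reflect the biases of their creators.",
		"c2": "Chapter 3 begins on the next page.",
	}
	// "c9" does not exist in the store and is silently dropped.
	scored := []ScoredChunk{{"c2", 10}, {"c1", 92}, {"c9", 99}}
	for _, q := range resolveQuotes(scored, store, 60) {
		fmt.Printf("%d/100 %q\n", q.Relevance, q.Text)
	}
}
```

The point of the design is that the model's output is treated only as a set of keys and numbers; every character of quote text is read from storage.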

Quick Example

# Start services
docker compose up -d
docker exec -it ollama ollama pull nomic-embed-text

# Build
go build -o aqe ./cmd/aqe

# Ingest a document with metadata
./aqe ingest "my-paper.pdf" \
  --title "Culturally Responsive Computing" \
  --author "Walton, Devan J." \
  --year 2024
# => Processing: my-paper.pdf
# => Ingested 1 documents, 2920 chunks

# Extract quotes on a topic (saves with an auto-assigned ID)
./aqe extract "cultural bias in technology and algorithms"
# => Searching for relevant quotes...
# => Found 50 candidate chunks
# => Scoring relevance with Claude...
# =>
# => Extraction #1: "cultural bias in technology and algorithms"
# => Retrieved 20 quotes (relevance >= 60)
# =>
# => Quote 1 (Relevance: 92/100)
# => ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# => "Despite their seemingly objective nature, algorithms can, and often do,
# =>  reflect the biases of their creators..."
# =>
# => — (Walton, 2024)
# => ...
# => Saved as extraction #1. Export with: aqe export 1

# List saved extractions to see IDs
./aqe list
# => Saved Extractions:
# =>
# =>   #1: "cultural bias in technology and algorithms"
# =>       20 quotes | 2026-01-30
# =>       Export: aqe export 1

# Export using the extraction ID from above
./aqe export 1 --format markdown --output quotes.md
# => Exported to quotes.md
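
In the example above, the author is passed as "Walton, Devan J." and the citation is printed as (Walton, 2024). The Go sketch below shows one way such an in-text citation could be derived. It is an illustration only: `inTextCitation` is a hypothetical name, and the fallbacks for a missing author ("Anon.") and a missing year ("n.d.") are common Harvard conventions assumed here, not behavior confirmed for aqe.

```go
package main

import (
	"fmt"
	"strings"
)

// inTextCitation builds a Harvard-style in-text citation such as "(Walton, 2024)".
// It assumes the author is stored as "Surname, Given names".
// Assumed fallbacks: "Anon." for an empty author, "n.d." for a missing year.
func inTextCitation(author string, year int) string {
	surname := strings.TrimSpace(author)
	if i := strings.Index(surname, ","); i >= 0 {
		surname = strings.TrimSpace(surname[:i]) // keep only the part before the comma
	}
	if surname == "" {
		surname = "Anon."
	}
	if year <= 0 {
		return fmt.Sprintf("(%s, n.d.)", surname)
	}
	return fmt.Sprintf("(%s, %d)", surname, year)
}

func main() {
	fmt.Println(inTextCitation("Walton, Devan J.", 2024)) // (Walton, 2024)
}
```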

Documentation

User Quickstart -- Install prerequisites, start services, and run your first extraction in minutes.
Developer Quickstart -- Set up the development environment, understand the architecture, run tests, and contribute.
CLI Reference -- Complete reference for all commands, flags, output formats, error handling, and worked examples.
Project Status -- Known limitations, untested features, incomplete tasks, hardcoded values, and areas needing work.
Contributors -- Project contributors and how to contribute.

Architecture Diagrams

Implementation Architecture -- As-built component layout showing Go packages, Docker services, and data flow.
Implemented Flow -- Sequence diagram of the actual implemented data flow for all phases (ingest, extract, list, export, meta fix), annotated with real code paths and known fallbacks.

CLI Commands

| Command | Description |
| --- | --- |
| `aqe ingest <path>` | Parse and index documents for quote extraction |
| `aqe extract <topic>` | Find relevant quotes for a research topic |
| `aqe export <id>` | Export a saved extraction (Markdown, JSON, BibTeX) |
| `aqe list` | List all saved extractions |
| `aqe meta fix` | Interactively fix missing document metadata |
| `aqe status` | Show infrastructure and database status |

Run ./aqe --help or ./aqe <command> --help for built-in usage. See CLI Reference for the full reference with examples and error handling.

Architecture Overview

                          +------------------+
                          |   CLI (Cobra)    |
                          +--------+---------+
                                   |
              +--------------------+--------------------+
              |                    |                     |
     +--------v-------+  +--------v--------+  +--------v--------+
     |   Ingest Flow  |  |  Extract Flow   |  |  Export Flow     |
     +--------+-------+  +--------+--------+  +--------+--------+
              |                    |                     |
  +-----------+----------+   +----+----+          +-----+-----+
  | Docling   | Python   |   |Weaviate |          |  SQLite   |
  | (parsing) | Chunker  |   |(search) |          |  (data)   |
  +-----------+----------+   +----+----+          +-----+-----+
                                  |
                            +-----+------+
                            | Claude CLI |
                            | (scoring)  |
                            +------------+

Services (Docker):

Docling -- Document parsing (PDF, DOCX, TXT) with layout analysis
Weaviate -- Vector database with hybrid BM25 + semantic search
Ollama -- Local embedding generation (nomic-embed-text, 768 dimensions)

Embedded:

SQLite -- Stores documents, chunks, extractions, and quote text
Claude CLI -- Relevance scoring and explanation generation
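
To make the Ollama role concrete, here is a rough Go sketch of requesting one embedding from a local Ollama server and checking that it has 768 dimensions. It uses Ollama's `/api/embeddings` endpoint with a `model` and `prompt` field; the base URL `http://localhost:11434` is Ollama's default port and is an assumption about your setup. The function names `embed` and `decodeEmbedding` are hypothetical, and AQE's own client may be structured differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// embedRequest and embedResponse mirror the JSON used by Ollama's
// /api/embeddings endpoint.
type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embedResponse struct {
	Embedding []float64 `json:"embedding"`
}

// decodeEmbedding parses a response body and checks that the vector
// has the expected number of dimensions.
func decodeEmbedding(body []byte, wantDims int) ([]float64, error) {
	var resp embedResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode embedding: %w", err)
	}
	if len(resp.Embedding) != wantDims {
		return nil, fmt.Errorf("got %d dimensions, want %d", len(resp.Embedding), wantDims)
	}
	return resp.Embedding, nil
}

// embed asks an Ollama server at baseURL for the embedding of text.
func embed(baseURL, model, text string, wantDims int) ([]float64, error) {
	payload, err := json.Marshal(embedRequest{Model: model, Prompt: text})
	if err != nil {
		return nil, err
	}
	res, err := http.Post(baseURL+"/api/embeddings", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s", res.Status)
	}
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return nil, err
	}
	return decodeEmbedding(body, wantDims)
}

func main() {
	// Assumes the Ollama container is reachable on its default port.
	vec, err := embed("http://localhost:11434", "nomic-embed-text", "cultural bias in algorithms", 768)
	if err != nil {
		fmt.Println("embedding failed:", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```

Checking the dimension count early is a cheap way to notice that a different embedding model was pulled than the one the vector index expects.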

Output Formats

Markdown

Produces blockquotes with in-text citations, relevance scores, and a bibliography section.
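
As a rough picture of what such an export involves, the Go sketch below renders quotes as blockquotes with a citation and score, then appends a bibliography. The exact layout aqe writes is not specified here, so the heading text, the separator between citation and score, and the names `Quote` and `renderMarkdown` are all assumptions made for illustration. The reference string in `main` is likewise an illustrative rendering of the metadata from the quick example.

```go
package main

import (
	"fmt"
	"strings"
)

// Quote holds one exported passage. (Hypothetical type for this sketch.)
type Quote struct {
	Text      string
	Citation  string // e.g. "(Walton, 2024)"
	Relevance int    // 0-100
}

// renderMarkdown writes each quote as a blockquote followed by its citation
// and score, then appends a bibliography section.
func renderMarkdown(topic string, quotes []Quote, references []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# Quotes: %s\n\n", topic)
	for _, q := range quotes {
		// Prefix every line so multi-line quotes stay inside the blockquote.
		for _, line := range strings.Split(q.Text, "\n") {
			fmt.Fprintf(&b, "> %s\n", line)
		}
		// The blank line ends the blockquote before the citation paragraph.
		fmt.Fprintf(&b, "\n%s | Relevance: %d/100\n\n", q.Citation, q.Relevance)
	}
	b.WriteString("## Bibliography\n\n")
	for _, r := range references {
		fmt.Fprintf(&b, "- %s\n", r)
	}
	return b.String()
}

func main() {
	quotes := []Quote{{
		Text:      "Despite their seemingly objective nature, algorithms can, and often do,\nreflect the biases of their creators...",
		Citation:  "(Walton, 2024)",
		Relevance: 92,
	}}
	refs := []string{"Walton, D.J. (2024) Culturally Responsive Computing."}
	fmt.Print(renderMarkdown("cultural bias in technology and algorithms", quotes, refs))
}
```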

JSON

Structured output with quotes, references, relevance scores, and document metadata.

BibTeX

Standard BibTeX bibliography entries for all cited sources.
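
For readers unfamiliar with BibTeX, the Go sketch below shows how an entry could be generated from the metadata given at ingest time (author, title, year). It is an illustration under assumptions: the `@misc` entry type, the `surnameYEAR` citation key scheme, and the names `Document`, `citeKey`, and `bibtexEntry` are hypothetical and may not match what aqe emits.

```go
package main

import (
	"fmt"
	"strings"
)

// Document carries the metadata supplied at ingest time.
// (Hypothetical type for this sketch.)
type Document struct {
	Author string // "Surname, Given names"
	Title  string
	Year   int
}

// citeKey builds a key like "walton2024" from the surname and year.
func citeKey(d Document) string {
	surname := d.Author
	if i := strings.Index(surname, ","); i >= 0 {
		surname = surname[:i]
	}
	var b strings.Builder
	for _, r := range strings.ToLower(surname) {
		if r >= 'a' && r <= 'z' {
			b.WriteRune(r) // keep ASCII letters only so the key stays simple
		}
	}
	return fmt.Sprintf("%s%d", b.String(), d.Year)
}

// bibtexEntry renders one entry. @misc is used as a neutral type because
// the sketch does not know whether the source is a book or an article.
func bibtexEntry(d Document) string {
	return fmt.Sprintf("@misc{%s,\n  author = {%s},\n  title  = {%s},\n  year   = {%d}\n}\n",
		citeKey(d), d.Author, d.Title, d.Year)
}

func main() {
	doc := Document{Author: "Walton, Devan J.", Title: "Culturally Responsive Computing", Year: 2024}
	fmt.Print(bibtexEntry(doc))
}
```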

Requirements

Account

Claude Code -- You need an active Claude Code subscription. AQE calls the claude CLI during the extraction phase to score quote relevance. Without it, ingestion and search still work, but extraction will fail.

Software

| Requirement | Version | What it does | Install |
| --- | --- | --- | --- |
| Go | 1.25+ | Builds and runs the CLI. CGO must be enabled (`CGO_ENABLED=1`) because SQLite uses a C driver. | go.dev/dl |
| Docker | 20.10+ | Runs Docling, Weaviate, and Ollama as containers. | docs.docker.com |
| Docker Compose | 2.0+ | Orchestrates the three services from the included `docker-compose.yml`. | Included with Docker Desktop, or install the plugin separately. |
| Python 3 | 3.9+ | Runs the chunking script (`scripts/chunk_helper.py`) that splits documents into hierarchical chunks. | python.org |
| Claude CLI | Latest | Scores candidate chunks for relevance during extraction. Must be authenticated and available in your PATH. | `npm install -g @anthropic-ai/claude-code` |

Python packages

The chunker requires two packages from the Docling project:

pip3 install "docling>=2.70.0" "docling-core>=2.0.0"

Verify your setup

go version                  # go1.25 or later
docker --version            # 20.10 or later
docker compose version      # 2.0 or later
python3 --version           # 3.9 or later
claude --version            # any recent version

All five commands should succeed before you proceed to User Quickstart.
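
If you would rather script this check, the Go sketch below reports which of the required executables are missing from PATH. It is a convenience example, not part of aqe: `requiredTools` and `missingTools` are hypothetical names, and the check only confirms that each binary can be found, not that its version meets the minimum. Docker Compose is a `docker` subcommand in version 2, so it is not looked up separately.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requiredTools lists the executables named in the requirements table.
var requiredTools = []string{"go", "docker", "python3", "claude"}

// missingTools returns the names for which found reports false.
// The lookup is passed in so the logic can be exercised without touching PATH.
func missingTools(names []string, found func(string) bool) []string {
	var missing []string
	for _, name := range names {
		if !found(name) {
			missing = append(missing, name)
		}
	}
	return missing
}

// inPath reports whether an executable with this name is on PATH.
func inPath(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

func main() {
	if missing := missingTools(requiredTools, inPath); len(missing) > 0 {
		fmt.Println("not found in PATH:", missing)
		return
	}
	fmt.Println("all required tools found in PATH")
}
```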

Project Structure

cmd/aqe/          CLI entry point
internal/
  cli/            Cobra commands (ingest, extract, export, list, meta, status)
  docling/        HTTP client for Docling-serve
  chunker/        Python wrapper for HierarchicalChunker
  claude/         Claude CLI wrapper and prompt templates
  search/         Weaviate client (insert, hybrid search, delete)
  store/          SQLite operations and migrations
  harvard/        Harvard reference formatting (pure Go)
  models/         Domain types (Document, Chunk, Extraction, Quote)
scripts/          Python chunking script
tests/
  unit/           Unit tests (no Docker required)
  contract/       API contract tests (require Docker)
  integration/    End-to-end tests (require Docker)

License

This project is licensed under the MIT License. See LICENSE for details.

CopyAI (cAI) 2026 NiXLiM @ Foundry of Zero.AI

