Research projects carried out by AI tools

Each directory in this repo is a separate research project carried out by an LLM tool - usually Claude Code. Every single line of text and code was written by an LLM.

I try to include prompts and links to transcripts in the PRs that added each report, or in the commits.

Research projects

sqlite-query-linter (2025-11-04)

The SQLite Query Linter is a lightweight Python library that wraps the standard sqlite3 module to provide configurable linting and rule-based analysis of SQL queries before execution. Acting as a drop-in replacement, it helps catch common syntax errors and platform incompatibilities—such as invalid types in CAST, use of unsupported functions, SELECT *, missing WHERE clauses, and string quoting mistakes—helping developers avoid runtime errors and improve code quality. Users can choose built-in rules, set severity levels, and easily define custom rules via an extensible API. Designed for flexibility, it can block execution on critical issues or run in permissive/audit-only modes, with zero dependencies other than Python's standard library. Explore code and integration options at GitHub or view usage in the included demo.py script.

Key Features & Findings:

Detects SQL mistakes commonly encountered when migrating between databases or writing raw SQLite queries
Flexible configuration: Enable/disable rules, set strictness, and use audit-only monitoring
Easy to extend for custom organizational or project rules
Applicable to development, automated testing, database migrations, and production monitoring
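
The wrap-and-lint idea described above can be sketched roughly as follows. This is a minimal illustration built only on the standard sqlite3 module, not the library's actual API: the names LintingConnection, LintError, rule_select_star and rule_missing_where are hypothetical, and the regex checks are deliberately naive.

```python
import re
import sqlite3


class LintError(Exception):
    """Raised when a blocking rule flags a query (hypothetical name)."""


def rule_select_star(sql):
    # Naive check: flags any "SELECT *" regardless of context.
    if re.search(r"\bselect\s+\*", sql, re.IGNORECASE):
        return "avoid SELECT *"
    return None


def rule_missing_where(sql):
    # Naive check: DELETE or UPDATE with no WHERE keyword anywhere.
    is_write = re.match(r"\s*(delete|update)\b", sql, re.IGNORECASE)
    if is_write and not re.search(r"\bwhere\b", sql, re.IGNORECASE):
        return "DELETE/UPDATE without WHERE"
    return None


class LintingConnection:
    """Wraps a sqlite3 connection and lints each query before running it."""

    def __init__(self, conn, rules, block=True):
        self.conn = conn
        self.rules = rules
        self.block = block  # False gives an audit-only mode
        self.issues = []

    def execute(self, sql, params=()):
        found = [msg for rule in self.rules if (msg := rule(sql))]
        self.issues.extend(found)
        if found and self.block:
            raise LintError("; ".join(found))
        return self.conn.execute(sql, params)
```

With block=True a flagged query raises before it reaches SQLite; with block=False the query still runs and the messages accumulate in issues, which mirrors the audit-only mode mentioned above.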

h3-library-benchmark (2025-11-04)

A systematic performance benchmark was conducted on two prominent Python libraries implementing Uber's H3 geospatial indexing system: h3-py (official, C-based) and h3o-python (Rust-based). Results show h3o-python consistently outperforms h3-py on core operations, achieving over 2x speedup for coordinate conversions and up to 13x faster neighbor queries, while area calculations remain comparable. The performance advantage holds steady across varied dataset sizes and H3 resolutions, suggesting h3o-python's Rust backend is highly optimized for geospatial workloads. Differences in API coverage and cell representation (string vs. integer) should inform choice based on project requirements.

Key Findings:

h3o-python is 2.2x faster for coordinate-to-cell and 1.8–2x for cell-to-coordinate conversions.
Neighbor queries with grid_disk are 10–13x faster in h3o-python.
Both libraries perform similarly for cell area calculations.
h3-py offers more features and broader API support; h3o-python excels in raw speed for core operations.

h3o-python (2025-11-03)

h3o-python delivers efficient Python bindings for the h3o Rust library, enabling fast and convenient access to H3 geospatial indexing from Python. Utilizing PyO3 and packaged with maturin, it allows encoding geographic coordinates into 64-bit H3 cell indexes, decoding indexes, performing neighborhood queries, calculating great-circle distances, and retrieving surface area metrics—all without requiring a separate H3 installation. The module bundles its Rust extension in the distributable wheel for seamless deployment, and the API mirrors the upstream Rust crate for high performance and compatibility.

Key capabilities:

Simple conversion between latitude/longitude and H3 cell indexes
Neighborhood and adjacency checks, and disk queries
Accurate area and distance calculations using H3 algorithms
Lossless string/integer conversions of H3 indexes
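
The string/integer conversion is lossless because the string form of an H3 index is the hexadecimal rendering of the same 64-bit integer. The sketch below shows that round trip in plain Python; the helper names are hypothetical and are not h3o-python functions.

```python
def h3_str_to_int(cell: str) -> int:
    # The string form is the hexadecimal spelling of the 64-bit index.
    return int(cell, 16)


def h3_int_to_str(cell: int) -> str:
    # Lowercase hex without a "0x" prefix.
    return format(cell, "x")


example = "8928308280fffff"
as_int = h3_str_to_int(example)
round_trip = h3_int_to_str(as_int)
```

Neither helper validates that the value is a well-formed H3 cell; they only convert between the two representations.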

wazero-python-claude (2025-11-02)

Wazero Python Bindings enable seamless integration of the wazero WebAssembly runtime—written in Go—with Python applications, delivering a zero-dependency solution for running WASM modules natively from Python. The project exposes a clean, Pythonic API for instantiating modules, calling exported WASM functions, and managing resources efficiently with context managers. Performance benchmarks demonstrate rapid execution and minimal overhead between Python and WASM. While the library excels at speed and ease of use, current limitations include support only for integer argument and return types, restricted WASI features, and lack of direct memory access.

Key findings:

Near-native performance for compute-intensive WebAssembly code via wazero.
Simple Python interface with automatic resource management and no external dependencies.
Presently limited to i32/i64 arguments/results and basic WASM module features; WASI filesystem and direct memory access are not available yet.

datasette-plugin-skill (2025-10-24)

Covering every aspect of Datasette plugin development, this project creates a comprehensive skill set for authors—from bootstrapping with cookiecutter to deploying on GitHub and PyPI. It provides precise guides and working code samples for essential plugin hooks like custom SQL functions, authentication, custom views, and output formats. The resource includes an extensive API reference, best practices for configuration, static assets, and templates, plus testing and publishing workflows to ensure reliable plugins. Developers can use this to rapidly build a variety of plugins—custom SQL, visualizations, authentication handlers, data exporters, and more.

Key findings:

Covers both sync and async hook design for performance.
Explains complete request/response and database APIs.
Provides tested patterns for authentication, authorization, routing, and output customization.

blog-tags-scikit-learn (2025-10-24)

Automatically assigning meaningful tags to historic, untagged blog posts, this project leverages the Simon Willison blog database and scikit-learn to train and compare multi-label text classification models. Four approaches—TF-IDF + Logistic Regression, Multinomial Naive Bayes, Random Forest, and LinearSVC—were tested on posts’ title and body text using the 158 most frequently used tags. LinearSVC, with probability calibration, yielded the best overall performance, striking a balance between precision (85%) and recall (56%) with an F1 score of 68%, proving especially effective for assigning multiple tags to each entry. This open-source toolkit not only automates metadata enrichment but facilitates rapid quality assessment and scalable tag prediction for content libraries.

Key findings:

LinearSVC outperformed other models, delivering the highest F1 score (0.6791) and recall.
Logistic Regression and Random Forest prioritized precision but were more conservative—missing more actual tags.
Naive Bayes offered a fast, simple solution with a solid balance of metrics.
TF-IDF features and OneVsRest multi-label strategies proved robust for text classification in high-dimensional spaces.

cmarkgfm-in-pyodide (2025-10-22)

By rewriting cmarkgfm's bindings from CFFI to the Python C API, the project successfully ported GitHub's cmark-gfm Markdown parser to Pyodide. The resulting wheel is fully functional, requires no further building, and supports all GitHub Flavored Markdown features with high performance, thanks to direct C code execution via WebAssembly. Users can integrate the package into Pyodide (see Pyodide documentation) and render robust Markdown—including tables, strikethrough, and task lists—directly in the browser. This port demonstrates a practical technique for bringing other CFFI-based packages to WebAssembly/Pyodide environments.

Key Findings:

All GFM features (tables, strikethrough, smart typography, etc.) work accurately.
Integration and pytest test suites pass 100%.
The port uses only Python C API bindings, improving compatibility and speed.
Project source & wheel available.

python-markdown-comparison (2025-10-22)

Comparing seven prominent Python markdown libraries, cmarkgfm—bindings to GitHub’s C-based CommonMark/GFM parser—proved dramatically faster (10-50x) than pure Python options such as mistune, Python-Markdown, and marko. The benchmark, spanning small to large markdown documents, consistently found cmarkgfm excels in both speed and stability, making it ideal for high-volume or performance-critical applications. However, cmarkgfm trades extensibility and custom output formats for speed, so libraries like mistune (for fast pure Python and custom rendering) or Python-Markdown (for extension-rich configurability) may be preferable for projects prioritizing flexibility or ease of customization. See cmarkgfm's repository and mistune for details.

Key findings:

cmarkgfm is 10-50x faster than pure Python markdown libraries, especially for large documents.
Pure Python options offer greater extensibility, custom output formats, and API access, but at the cost of speed.
Best library choice depends on project needs: cmarkgfm for raw speed/GFM compatibility, mistune for pure Python speed/customization, Python-Markdown for plugins/extensions.

datasette-plugin-alpha-versions (2025-10-20)

Datasette Plugins Analysis presents a systematic evaluation of 44 key plugins from the Datasette ecosystem, focusing on dependencies, permissions hooks, and release patterns as of October 2025. The study finds that 89% of these plugins rely on ALPHA versions of Datasette, with only 8 plugins having stable releases and just 5 supporting stable Datasette while using advanced hooks like register_permissions(). The open datasets, such as datasette_plugins_analysis.json and analysis scripts, support deeper inspection and maintenance planning as Datasette nears its 1.0 milestone. This enables maintainers to prioritize updates for plugins with alpha dependencies and track release maturity across the ecosystem.

Key Findings:

39 plugins depend on Datasette ALPHA versions; 34 of these have no stable releases.
Only 5 plugins use register_permissions() without requiring ALPHA Datasette.
8 of the analyzed plugins currently offer at least one stable release.
Main analysis and scripts are available here for further plugin and dependency tracking.

deepseek-ocr-nvidia-spark (2025-10-20)

Successfully deployed DeepSeek-OCR on an NVIDIA GB10 (ARM64, sm_121) by upgrading to PyTorch 2.9.0+cu130 so CUDA 13.0 wheels could be used instead of building from source. The repo includes automated scripts (setup.sh, run_ocr.py) that load the 6.3GB safetensors model (~34s) and run GPU inference (~58s for a 3503×1668 image), producing annotated images, markdown/text outputs and bounding boxes with validated multi-column accuracy. Flash-attn failed to compile on ARM64 and the pipeline falls back to eager attention, but overall accuracy and production readiness were confirmed. Reproducible instructions, logs and scripts are provided in the DeepSeek-OCR repo and the PyTorch cu130 wheel index linked below.

Key findings:

PyTorch 2.9.0+cu130 provides forward compatibility for sm_121 (no source build needed).
Performance: model load ≈34s, inference ≈58s; detected 2257 text tokens / 921 vision tokens.
Artifacts & links: DeepSeek-OCR code/model (https://github.com/deepseek-ai/DeepSeek-OCR) and PyTorch cu130 wheel index (https://download.pytorch.org/whl/cu130).

sqlite-permissions-poc (2025-10-20)

This proof-of-concept implements a hierarchical permission system entirely in SQLite (https://sqlite.org). It computes the allowed database/table pairs by cascading rules across child (table), parent (database), and global levels with DENY-over-ALLOW semantics, using only plain SQL (CTEs plus SQLite JSON functions). Actor and token inputs are parsed as JSON inside the query, so a single CTE-based SQL statement resolves each resource's decision (child → parent → global) and then intersects the result with an optional token scope, ensuring that tokens can only restrict access, never grant it. Behavior is validated with a pytest test suite (https://pytest.org). The demo includes a minimal schema, multiple simulated “hook” rule sources, example data, and 11 test scenarios that show child-level ALLOW overriding parent DENY, child-level DENY blocking parent ALLOW, default-deny behavior, and token intersection semantics.

Key findings:

Pure-SQL implementation (no UDFs/extensions) using CTEs and sqlite JSON helpers.
Cascading precedence: child > parent > global; at the same level DENY beats ALLOW.
Token scoping applied via INTERSECT; tokens cannot elevate permissions.
Single-query engine returns final db/table pairs; schema and tests are compact and extensible.
11 pytest scenarios confirm intended conflict-resolution rules and edge cases.
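
The precedence rules listed above can be sketched as a single CTE-based query. This is an illustration under assumptions, not the project's schema or SQL: the resources and rules tables, the NULL-as-wildcard convention, and the allowed_pairs helper are all hypothetical. It relies on SQLite's JSON functions being available in the Python build.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE resources (db TEXT, tbl TEXT);
-- Hypothetical rule table: NULL acts as a wildcard, so
-- (NULL, NULL) is global, (db, NULL) is parent, (db, tbl) is child.
CREATE TABLE rules (db TEXT, tbl TEXT, allow INTEGER);
INSERT INTO resources VALUES
  ('main', 'users'), ('main', 'logs'), ('main', 'tmp'),
  ('archive', 'old'), ('other', 'misc');
INSERT INTO rules VALUES
  ('main', NULL, 1),     -- parent ALLOW
  ('main', 'logs', 0),   -- child DENY blocks parent ALLOW
  ('main', 'tmp', 1),    -- same-level conflict:
  ('main', 'tmp', 0),    --   DENY beats ALLOW
  ('archive', NULL, 0),  -- parent DENY
  ('archive', 'old', 1); -- child ALLOW overrides parent DENY
"""

RESOLVE = """
WITH matched AS (
  SELECT r.db, r.tbl, ru.allow,
         (ru.db IS NOT NULL) + (ru.tbl IS NOT NULL) AS depth
  FROM resources r
  JOIN rules ru
    ON (ru.db IS NULL OR ru.db = r.db)
   AND (ru.tbl IS NULL OR ru.tbl = r.tbl)
),
per_level AS (
  -- MIN(allow) is 0 if any rule at this level is a DENY
  SELECT db, tbl, depth, MIN(allow) AS allow
  FROM matched GROUP BY db, tbl, depth
),
decided AS (
  -- keep only the most specific level that has any rule
  SELECT p.db, p.tbl, p.allow FROM per_level p
  WHERE p.depth = (SELECT MAX(q.depth) FROM per_level q
                   WHERE q.db = p.db AND q.tbl = p.tbl)
)
SELECT db, tbl FROM decided WHERE allow = 1
"""

TOKEN_SCOPE = """
INTERSECT
SELECT json_extract(value, '$[0]'), json_extract(value, '$[1]')
FROM json_each(:scope)
"""


def allowed_pairs(conn, token_scope=None):
    """Return allowed (db, tbl) pairs, optionally narrowed by a token scope."""
    sql, params = RESOLVE, {}
    if token_scope is not None:
        sql += TOKEN_SCOPE
        params = {"scope": json.dumps(token_scope)}
    return conn.execute(sql + " ORDER BY db, tbl", params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Resources with no matching rule never appear in the result, which gives default-deny. Because the token scope is applied with INTERSECT, a token that lists main/logs still cannot reach it: the pair was never in the allowed set.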

minijinja-vs-jinja2 (2025-10-19)

Benchmarking the Python bindings for minijinja (https://github.com/mitsuhiko/minijinja) against Jinja2 (https://palletsprojects.com/p/jinja/) on Python 3.14 and 3.14t measured template render performance using a realistic e-commerce template with inheritance, loops, and ~65KB HTML output. The suite runs 200 iterations per scenario, captures mean/median/std/min/max, and provides reproducible scripts (run_benchmark.sh, benchmark.py) plus matplotlib charts to visualize results. Jinja2 is faster on stock Python 3.14, while minijinja gains more from the free-threaded 3.14t build, indicating minijinja may be better positioned for free-threaded Python even though it’s currently slower in absolute terms. Everything needed to reproduce the 15–20 minute benchmark and view detailed analysis is included in the repository.

Jinja2 (3.14): 0.990 ms mean vs Minijinja: 1.528 ms mean — Jinja2 ≈ 1.54× faster on 3.14
Jinja2 slows ~14% on 3.14t (1.127 ms); Minijinja speeds up ~13% on 3.14t (1.336 ms)
Artifacts: JSON results, comparison/distribution/speedup/timeline charts, and BENCHMARK_RESULTS.md with full analysis
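
The measurement approach (200 iterations per scenario, summarized as mean/median/std/min/max) can be sketched with the standard library alone. This is not the repository's benchmark.py: the benchmark function and the render_stand_in workload are hypothetical placeholders for a real template render. The last lines recompute the ratios quoted above from the reported means.

```python
import statistics
import time


def benchmark(fn, iterations=200):
    """Time fn() repeatedly and summarize the samples in milliseconds."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "std": statistics.stdev(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
    }


def render_stand_in():
    # Placeholder workload; a real run would render the template here.
    return "".join(f"<li>{i}</li>" for i in range(1000))


stats = benchmark(render_stand_in)

# Reported means (ms) from the results above.
jinja2_314, minijinja_314 = 0.990, 1.528
jinja2_314t, minijinja_314t = 1.127, 1.336

speed_ratio = minijinja_314 / jinja2_314                           # about 1.54
jinja2_change = (jinja2_314t - jinja2_314) / jinja2_314            # about +0.14
minijinja_change = (minijinja_314t - minijinja_314) / minijinja_314  # about -0.13
```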

node-pyodide (2025-10-19)

A compact demo shows how to run Python scripts inside a WebAssembly sandbox from Node.js using Pyodide: after npm install, launching node server-simple.js executes example-simple.py and writes generated files to the output/ directory. The project demonstrates a minimal server-side integration pattern for Pyodide (https://pyodide.org/) under Node.js (https://nodejs.org/) and is aimed at quick experimentation with sandboxed Python execution. It requires Node.js v16 or later and provides a simple starting point for extending Python-in-WASM workflows in Node applications.

Executes Python in WebAssembly via Pyodide and writes outputs to output/
Minimal commands: npm install; node server-simple.js
Recommended Node.js v16+ for best compatibility

Updating this README

This README uses cogapp to automatically generate project descriptions.

Automatic updates

A GitHub Action automatically runs cog -r -P README.md on every push to main and commits any changes to the README or new _summary.md files.

Manual updates

To update locally:

# Run cogapp to regenerate the project list
cog -r -P README.md

The script automatically:

Discovers all subdirectories in this folder
Gets the first commit date for each folder and sorts by most recent first
For each folder, checks if a _summary.md file exists
If the summary exists, it uses the cached version
If not, it generates a new summary using `llm -m github/gpt-4.1` with a prompt that creates engaging descriptions with bullets and links
Creates markdown links to each project folder on GitHub
New summaries are saved to _summary.md to avoid regenerating them on every run

To regenerate a specific project's description, delete its _summary.md file and run cog -r -P README.md again.
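
The caching behavior described above can be sketched as follows. This is an illustration, not the cog script embedded in the README: collect_summaries is a hypothetical name, the generate callable stands in for the llm call, skipping hidden folders is an assumption, and the sort by first commit date is omitted.

```python
from pathlib import Path


def collect_summaries(root, generate):
    """Return {folder name: summary text}, reusing _summary.md when present."""
    results = {}
    folders = sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and not p.name.startswith(".")  # assumption: skip hidden dirs
    )
    for folder in folders:
        summary = folder / "_summary.md"
        if summary.exists():
            # Cached summary: no regeneration needed.
            text = summary.read_text(encoding="utf-8")
        else:
            # generate() is a stand-in for the LLM call.
            text = generate(folder)
            summary.write_text(text, encoding="utf-8")
        results[folder.name] = text
    return results
```

Deleting a folder's _summary.md makes the next call fall through to generate for that folder only, which matches the regeneration instructions above.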
