Ink2MD is a Python project that watches a configured folder in a cloud storage service, identifies new PDF files, and converts them into clean Markdown using a Large Language Model (LLM) vision endpoint. The first target integration focuses on Google Drive, with the goal of supporting additional providers in the future. The repository now ships with a fully functional development pipeline that can monitor either Google Drive (when credentials are available) or a local folder for PDFs. The default example configuration uses the agentic router, writing Markdown to one Google Drive folder and mindmaps to another (with optional local copies for both).
- Folder Monitoring – Poll a designated Google Drive folder and maintain a record of processed vs. unprocessed documents.
- PDF Extraction – Discover new PDF files and forward their content to a configurable multimodal LLM endpoint (for example, `gemini-2.5-flash`).
- Prompt-Driven Conversion – Submit a reusable conversion prompt that asks the LLM to produce publication-quality Markdown from each PDF.
- Result Management – Store the generated Markdown documents in a local destination, upload them to Google Drive, or commit the results to a Git repository that can be synchronized with tools such as Obsidian. Mindmaps can be uploaded to a separate Google Drive folder.
- Mindmap Export (optional) – Convert hand-drawn mindmaps into FreeMind-compatible `.mm` files and upload them to a designated Google Drive folder, with an optional local copy for debugging.
- Agentic Routing (optional) – An orchestration agent inspects each PDF (and optional hashtags like `#mm`/`#mindmap`) and routes it to either the Markdown or Mindmap agent automatically.
The core modules that make up the project include:
- Configuration – Dataclasses and helpers that hydrate the runtime from a JSON configuration file.
- Cloud Connectors – A pluggable abstraction with concrete implementations for Google Drive and the local filesystem.
- Processing State Tracker – A JSON-backed tracker that stores processed document IDs and timestamps to avoid duplicate conversions.
- LLM Client – A pluggable interface with an initial implementation that uses `pypdf` to extract text locally and emit Markdown. This can be swapped with a real LLM integration.
- Markdown Output Handlers – Write conversion results either to the local filesystem or directly into a Git repository (committing changes and optionally pushing to a remote). Markdown filenames are emitted as `<sanitized-title>-<YYYYMMDDHHMMSS>.md`, which keeps chronological ordering predictable in Obsidian vaults and similar tools.
- Mindmap Output Handler – Render a mindmap tree into FreeMind XML and upload it to a target Google Drive folder.
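The JSON-backed processing state tracker can be pictured with a short sketch. The class and method names below are illustrative, not the project's actual API:

```python
import json
import time
from pathlib import Path


class StateTracker:
    """Minimal JSON-backed record of processed document IDs (illustrative sketch)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously recorded IDs so reruns skip already-converted documents.
        self._seen = json.loads(self.path.read_text()) if self.path.exists() else {}

    def is_processed(self, doc_id):
        return doc_id in self._seen

    def mark_processed(self, doc_id):
        # Store a timestamp alongside the ID, then persist the whole map.
        self._seen[doc_id] = time.strftime("%Y-%m-%dT%H:%M:%S")
        self.path.write_text(json.dumps(self._seen, indent=2))
```

Because the tracker rereads the file on construction, state survives restarts, which is what makes duplicate-conversion avoidance work across polling runs.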
The project requires Python 3.10+ and the typical tooling for virtual environments and dependency management. A high-level bootstrap process looks like the following:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt   # installs runtime + Gemini dependencies
pip install -e .[dev]             # optional: editable install with pytest
```

You will also need to supply:
- Google Drive OAuth credentials for the end user whose My Drive should be monitored. Provide the downloaded client secrets file via `google_drive.oauth_client_secrets_file` and choose a writable path for `google_drive.oauth_token_file` so the connector can cache the refreshable access token. Optional overrides are available for scopes if additional Drive permissions are required.
- If you want to write back to Google Drive (for Markdown or mindmap outputs), include a write scope such as `https://www.googleapis.com/auth/drive.file` in `google_drive.scopes` and delete the cached token before re-authorizing. If you need to read PDFs the app did not create, also include a read scope such as `https://www.googleapis.com/auth/drive.readonly` (or use the full `drive` scope).
- Configuration values describing folder IDs, polling intervals, and local output paths. A starter configuration can be found in `example.config.json`.
- An optional Git repository destination. Configure `markdown.provider` as `"git"`, set `markdown.directory` to the folder within the repository where Markdown should be written, and define the `markdown.git` block with repository path, branch, and commit settings.
- An optional prompt file that provides guidance to the downstream Markdown generator. A default prompt lives in `prompts/markdown.txt` and is referenced via `markdown.prompt_path`. Tip: when you sync results to an Obsidian vault you can point `markdown.prompt_path` at a dedicated note in that vault (for example `default-vault/ink2md prompt.md`) so the prompt stays version controlled and can be edited directly from Obsidian instead of logging into the server. This introduces a possible prompt-injection attack surface, so weigh the convenience against the risk and prefer a vetted local prompt file when security is the priority. The trade-off is that the file must keep the same name and location unless you update the configuration.
- LLM credentials when using a managed provider such as Gemini. Configure the `llm` block as described below and supply the API key via environment variables or a secrets manager—avoid committing secrets to git.
- When running the mindmap pipeline, set `pipeline` to `"mindmap"` and populate the `mindmap` block with an output folder ID and an optional custom prompt.
- When running the agentic router, set `pipeline` to `"agentic"` and populate the `mindmap` and `agentic` blocks; the router uses `agentic.prompt_path` (default `prompts/orchestration.txt`, falling back to `llm.prompt_path`) to decide which agent to invoke. Set `markdown.provider` and `markdown.google_drive.folder_id` for Markdown, and `mindmap.google_drive.folder_id` for mindmaps.
- Optional output settings: `markdown.asset_directory` copies the original PDFs alongside the generated Markdown using the same timestamp suffix (for example, `Report-20240918103000.pdf`).
- When targeting an Obsidian vault, adjust `markdown.obsidian.media_mode` to control how page assets are written: keep the default `"pdf"` to link back to the source document, or choose `"png"`/`"jpg"` to render 800px-wide, 8-bit grayscale images (PNG output additionally runs through lossless optimizers when available). Combine with the optional `markdown.obsidian.media_invert` toggle to invert PNG or JPG pages before they are committed to the vault. Generated Markdown and attachments use the same `<name>-<timestamp>` naming pattern as filesystem output to simplify cross-target automation.
- The Obsidian handler pulls from the configured remote before every write. The repository must be clean (no uncommitted edits) or the processor will abort the run so you can resolve the local changes. Ensure the remote branch can be fast-forwarded—if collaborators push conflicting files, the handler stops and surfaces the Git error instead of rewriting history.
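Combining the scope guidance above, a `google_drive` block that both reads existing PDFs and writes outputs could look like this sketch (the folder ID and both file paths are placeholders; the token-file location is an assumption):

```json
"google_drive": {
  "folder_id": "YOUR_INPUT_FOLDER_ID",
  "oauth_client_secrets_file": "./credentials/client_secret.json",
  "oauth_token_file": "./credentials/google_drive_token.json",
  "scopes": [
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.file"
  ]
}
```

Remember to delete the cached token file after changing `scopes` so the next run re-authorizes with the new permissions.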
To authorize access to an individual's My Drive, create a Google Cloud project, enable the Drive API, and generate OAuth client credentials of type "Desktop App". Download the resulting JSON secrets file and point `google_drive.oauth_client_secrets_file` at its location. On the first run the processor will open a local webserver and browser window to complete the OAuth consent flow. In a headless session, copy the printed authorization URL into a browser, approve the requested scopes (the default is the read-only Drive scope), and paste either the verification code or the full redirected URL back into the running process. If you prefer to always perform the console-based exchange (for example when SSH tunneling from a workstation), pass `--headless-token` on the command line to force the console prompt and discard any cached OAuth token before starting the flow. The connector extracts the authorization code, saves the refreshable token to `google_drive.oauth_token_file`, and subsequent runs reuse and transparently refresh that token so you do not need to reauthorize.
Add an `llm` block to your configuration to choose between built-in text extraction and the Gemini integration:

```json
"llm": {
  "provider": "gemini",
  "model": "models/gemini-2.5-flash",
  "api_key": "${GEMINI_API_KEY}",
  "prompt_path": "./prompts/markdown.txt",
  "temperature": 0.0
}
```

- `"provider": "simple"` is the default and uses `pypdf` for basic text extraction; it is also helpful for testing during installation.
- `"provider": "gemini"` uploads the original PDF to Gemini 2.5 Flash and returns a consolidated Markdown response that preserves handwriting and images. Set `GEMINI_API_KEY` in your environment before starting the processor.
- `prompt_path` is optional; when present, the file contents are appended to the system instructions sent to the LLM.
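The `${GEMINI_API_KEY}` value above implies that environment variables are substituted into the configuration at load time. A minimal sketch of that kind of expansion (the helper name is hypothetical, not the project's actual loader):

```python
import os
import re


def expand_env(value):
    """Replace ${VAR} placeholders in a string with values from os.environ (illustrative)."""
    # Unset variables expand to an empty string in this sketch; a real loader
    # might instead raise an error to fail fast on missing secrets.
    return re.sub(r"\$\{([A-Z0-9_]+)\}", lambda m: os.environ.get(m.group(1), ""), value)
```

Keeping the key in the environment rather than the JSON file is what lets the configuration be committed without leaking secrets.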
Switch to the mindmap pipeline by setting `"pipeline": "mindmap"` in your configuration. Use the `mindmap` block to control the prompt, destination folder, and local copies:
"pipeline": "mindmap",
"mindmap": {
"prompt_path": "./prompts/mindmap.txt",
"keep_local_copy": true,
"google_drive": {
"folder_id": "YOUR_MINDMAP_OUTPUT_FOLDER_ID"
}
},
"google_drive": {
"folder_id": "YOUR_INPUT_FOLDER_ID",
"oauth_client_secrets_file": "./credentials/client_secret.json",
"scopes": ["https://www.googleapis.com/auth/drive.readonly"]
}When keep_local_copy is true, generated .mm files are also written to
markdown.directory; uploads always target the mindmap.google_drive
folder. The prompt in prompts/mindmap.txt asks the LLM to emit deterministic
JSON (text, children, optional link, color, priority) before the
tree is rendered to FreeMind XML.
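The JSON-to-FreeMind step can be sketched with the standard library. The exact attributes and handling of `priority` in Ink2MD may differ, so treat this as an illustration of the `.mm` format rather than the project's renderer:

```python
import xml.etree.ElementTree as ET


def render_freemind(node):
    """Render a {text, children, link?, color?} tree into FreeMind XML (illustrative)."""

    def build(parent, data):
        attrs = {"TEXT": data["text"]}
        if "link" in data:
            attrs["LINK"] = data["link"]
        if "color" in data:
            attrs["COLOR"] = data["color"]
        el = ET.SubElement(parent, "node", attrs)
        for child in data.get("children", []):
            build(el, child)

    # FreeMind expects a single <map> root wrapping the node hierarchy.
    root = ET.Element("map", {"version": "1.0.1"})
    build(root, node)
    return ET.tostring(root, encoding="unicode")
```

Requiring the LLM to emit this small, deterministic JSON schema first makes the rendering step purely mechanical and easy to validate.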
Set "pipeline": "agentic" to let the orchestration agent decide per document.
It will route to the mindmap agent when the PDF looks like a mindmap or
contains hashtags like #mm / #mindmap, otherwise it uses the Markdown
agent. Configure both outputs:
"pipeline": "agentic",
"agentic": {
"prompt_path": "./prompts/orchestration.txt",
"hashtags": ["mm", "mindmap"]
},
"mindmap": {
"prompt_path": "./prompts/mindmap.txt",
"keep_local_copy": true,
"google_drive": { "folder_id": "YOUR_MINDMAP_OUTPUT_FOLDER_ID" }
},
"markdown": {
"provider": "google_drive",
"directory": "./output", // used when keep_local_copy is true
"google_drive": {
"folder_id": "YOUR_MARKDOWN_OUTPUT_FOLDER_ID",
"keep_local_copy": true
}
},
"google_drive": { "folder_id": "YOUR_INPUT_FOLDER_ID", ... }Use hashtags in the PDF filename to force mindmap routing when needed.
Copy `example.config.json` to a working file (for example, `config.local.json`) and update the placeholders for the Drive folder ID, client secret locations, and the `llm` block. Remember to keep credentials and token files outside version control.
The project exposes a console script and module entrypoint. Assuming a configuration file similar to `example.config.json`, run:

```bash
ink2md --config example.config.json --once
```

or with Python directly:

```bash
python -m ink2md --config example.config.json
```

Omit `--once` to continuously poll the configured provider using the `poll_interval` defined in the configuration. On each iteration the processor will:
- Discover PDFs from the provider.
- Skip files that already appear in the processing state file.
- Convert new PDFs into Markdown using the configured LLM client.
- Write Markdown files to the output directory.
- Record the processed document in the state tracker.
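These steps can be sketched as a single function; `connector`, `converter`, `writer`, and `tracker` are hypothetical stand-ins for the project's real components:

```python
def process_once(connector, tracker, converter, writer):
    """Run one poll iteration: discover, skip, convert, write, record (illustrative)."""
    for doc in connector.list_pdfs():
        if tracker.is_processed(doc["id"]):
            continue  # already converted on a previous run
        markdown = converter.convert(connector.download(doc["id"]))
        writer.write(doc["name"], markdown)
        # Recording last guarantees a failed conversion is retried next time.
        tracker.mark_processed(doc["id"])
```

Marking a document processed only after the write succeeds is the property that keeps a crash mid-run from silently dropping a PDF.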
This repository is in its initialization phase. Contributions that help define the project structure, configuration management, and integrations are welcome.
This project is licensed under the terms of the MIT License. See the
LICENSE file for details.
Run the bundled installer from the repository root to provision the service, virtual environment, configuration skeleton, and supporting timers:

```bash
sudo ./scripts/install_service.sh
```

Before running the installer, ensure the host has the standard Python tooling available—for most Debian/Ubuntu systems the following covers everything the script expects: `sudo apt install python3 python3-venv python3-pip rsync`. The installer copies the repository to `/opt/ink2md`, creates the `ink2md` service account, bootstraps a virtual environment, renders the systemd units, and enables the health check and retention timers. Re-run it after pulling new changes to deploy upgrades. Override paths or toggle timers with flags such as `--prefix`, `--config-dir`, `--skip-healthcheck`, and `--skip-purge`.
When the script completes it prints any manual follow-up items (for example, editing `/etc/ink2md/config.json` and `/etc/ink2md/env`). It also creates `/etc/ink2md/credentials/client_secrets.json` as a placeholder—replace it with your real Google Drive OAuth client JSON before continuing. The installer generates an SSH deploy key at `/etc/ink2md/ssh/id_ed25519` and seeds the known_hosts file based on the configured repository URL; copy the printed public key into the Git host that backs your Obsidian vault before starting the service. By default the configuration writes Markdown to `/opt/ink2md/default-vault/inbox` and attachments to `/opt/ink2md/default-vault/media`, with the repository root at `/opt/ink2md/default-vault`. Clone or initialize your Obsidian repository in that location and configure a Git identity for the `ink2md` user, for example:

```bash
sudo -u ink2md git clone git@github.com:your-org/obsidian-vault.git \
    /opt/ink2md/default-vault
sudo -u ink2md git -C /opt/ink2md/default-vault config \
    user.name "Ink2MD Service"
sudo -u ink2md git -C /opt/ink2md/default-vault config \
    user.email "ops@example.com"
```

The service is already enabled and running; after you finish editing those files, apply the changes with:

```bash
sudo systemctl daemon-reload
sudo systemctl restart ink2md.service
```

Use `systemctl status ink2md` or `journalctl -u ink2md.service` to confirm the deployment is healthy.
Note: The Obsidian output handler performs a fast-forward pull before each write and aborts if the repository has uncommitted changes. Keep the vault clone clean (commit or discard manual edits) and ensure collaborators push to a remote that the service can fast-forward. If you host the vault on a non-bare repository, configure it to accept fast-forward pushes (for example, `git config receive.denyCurrentBranch updateInstead`) so the service can update the checked-out branch safely.
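The clean-repository precondition amounts to `git status --porcelain` reporting nothing. A sketch of that check, with an injectable runner so it can be exercised without a real repository (the function name is hypothetical, not the handler's actual code):

```python
import subprocess


def is_clean(repo_path, runner=subprocess.run):
    """True when `git status --porcelain` reports no pending changes (illustrative)."""
    result = runner(
        ["git", "-C", repo_path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    # Porcelain output is empty for a clean tree; any line means staged,
    # modified, or untracked files are present and the handler should abort.
    return result.stdout.strip() == ""
```

Untracked files count as "not clean" here, which matches the handler's conservative abort-on-anything-pending behaviour described above.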
Authorize Google Drive access once before leaving the service unattended. Run:

```bash
sudo -u ink2md /opt/ink2md/.venv/bin/ink2md \
    --config /etc/ink2md/config.json --once
```

Follow the printed OAuth link in a browser, approve the consent screen, and wait for the run to finish. When running on a headless host the command will print the URL and, after you authorize in a separate browser, prompt for the verification code—paste it back into the SSH session to complete the flow. Add `--headless-token` if you want to force this console prompt and remove the existing token cache before reauthorizing, even when the host can launch a browser. The resulting token is saved to `/var/lib/ink2md/google_drive_token.json`; subsequent service runs reuse it automatically.
The installer also updates `llm.prompt_path` to point at `/opt/ink2md/prompts/markdown.txt`, and rewrites the Obsidian Git settings to use the generated deploy key and known-hosts file under `/etc/ink2md/ssh`. If you provide a custom prompt or different Git credentials, store them somewhere readable by `ink2md` and adjust the config to match. Tip: you can target a note inside the Obsidian vault itself—create a page such as `default-vault/ink2md prompt.md`, set `llm.prompt_path` to that file, and edit the prompt from Obsidian while keeping it version controlled. This convenience opens the door to prompt injection if the note is tampered with, so adopt it only when the risk is acceptable and fall back to a vetted local prompt file for the safest posture. Keep the filename and path stable, or update the configuration whenever you move it.
To run the processor autonomously on a Linux host without the installer, provision the provided systemd unit and supporting environment file. The unit templates include `${...}` placeholders that match the installer defaults—edit them to reflect your target paths before copying them into place.

- Create a dedicated service account, for example `sudo useradd --system --home /var/lib/ink2md --shell /usr/sbin/nologin ink2md`.
- Check out the repository to `/opt/ink2md` (or another root owned by the service account) and install dependencies into `/opt/ink2md/.venv`.
- Create writable directories for runtime state, logs, and temporary files, such as `/var/lib/ink2md` and `/var/tmp/ink2md`. Grant ownership to the service user.
- Copy `deploy/systemd/ink2md.service` to `/etc/systemd/system/` and adjust the service user, working directory, and virtual environment paths to match your host.
- Copy `deploy/systemd/ink2md.env` to `/etc/ink2md/env`, populate the credential paths and API keys, and set permissions so only the service account can read the file (for example `chmod 640` and `chown ink2md:ink2md`).
- Place your runtime configuration (for example `config.json`) under `/etc/ink2md/` or another directory that the service account can access.
- Reload systemd with `sudo systemctl daemon-reload`, enable the unit with `sudo systemctl enable --now ink2md`, and inspect service status with `systemctl status ink2md`.
The script `scripts/check_processor_health.py` summarizes the latest processed document and optionally tails recent journal errors. Integrate it with your monitoring stack or a systemd timer to ensure the pipeline keeps up with new documents:

```bash
./scripts/check_processor_health.py --state-file /var/lib/ink2md/state/processed.json \
    --max-age 180 --journal-unit ink2md
```

For automated checks, install the provided timer template:

- Copy `deploy/systemd/ink2md-healthcheck.service` and `.timer` to `/etc/systemd/system/`.
- Adjust the script path, state file, and thresholds in the service unit.
- Enable the timer with `sudo systemctl enable --now ink2md-healthcheck.timer`.
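The freshness side of such a check boils down to comparing the newest processed timestamp against an age threshold. A sketch of that logic (the state-file schema here is an assumption, not the project's exact format):

```python
import json
import time
from pathlib import Path


def state_is_fresh(state_file, max_age_minutes):
    """True if the newest processed timestamp is younger than the threshold (illustrative)."""
    entries = json.loads(Path(state_file).read_text())
    if not entries:
        return False  # nothing processed yet counts as unhealthy
    # Assumed schema: {doc_id: "YYYY-MM-DDTHH:MM:SS", ...} in local time.
    newest = max(
        time.mktime(time.strptime(ts, "%Y-%m-%dT%H:%M:%S")) for ts in entries.values()
    )
    return (time.time() - newest) <= max_age_minutes * 60
```

A monitoring wrapper would exit nonzero when this returns False, which is what lets a systemd timer or external monitor raise an alert.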
Use `scripts/purge_output.py` to prune generated Markdown and attachments while retaining the most recent 30 days. Schedule it via cron or a systemd timer alongside the service:

```bash
./scripts/purge_output.py /var/lib/ink2md/output --days 30 --recursive --remove-empty-dirs
```

Timer templates in `deploy/systemd/ink2md-purge.service` and `.timer` show how to run the purge job daily with a dry-run warning before permanent deletion. Copy them into place and enable the timer to keep the output volume bounded.
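The retention logic amounts to deleting files whose modification time precedes the cutoff. A minimal sketch (the flag names mirror the script's options, but this implementation is an assumption, not the script's source):

```python
import time
from pathlib import Path


def purge_old_files(root, days, recursive=False):
    """Delete files under root older than `days` days; return the deleted paths (sketch)."""
    cutoff = time.time() - days * 86400
    pattern = "**/*" if recursive else "*"
    deleted = []
    for path in Path(root).glob(pattern):
        # Only plain files are removed; directory cleanup (--remove-empty-dirs)
        # would be a separate pass after the files are gone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Returning the deleted paths makes it easy to log a dry-run preview before committing to permanent deletion, as the shipped timer templates do.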
Refer to `deploy/README.md` for annotated installation commands and file descriptions.