dnl-fm/markgo

markgo

Convert documents to Markdown. Fast, single binary, minimal dependencies.

A Go rewrite of Microsoft's MarkItDown.

Supported Formats

| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Layout preserved |
| Excel | .xlsx, .xlsm | All sheets as tables |
| Word | .docx | Headings, lists, tables, formatting |
| HTML | .html, .htm | Links, images, code blocks, tables |
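
The table above amounts to a lookup from file extension to format. A minimal Go sketch of such a lookup follows. `formatForExtension` is a hypothetical helper written for illustration, not part of markgo's API, and the case-insensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// formats mirrors the Supported Formats table.
var formats = map[string]string{
	".pdf":  "PDF",
	".xlsx": "Excel",
	".xlsm": "Excel",
	".docx": "Word",
	".html": "HTML",
	".htm":  "HTML",
}

// formatForExtension is a hypothetical helper (not markgo's API).
// It returns the format name for an extension, ignoring case,
// and false when the extension is not in the table.
func formatForExtension(ext string) (string, bool) {
	name, ok := formats[strings.ToLower(ext)]
	return name, ok
}

func main() {
	for _, ext := range []string{".pdf", ".XLSX", ".pptx"} {
		if name, ok := formatForExtension(ext); ok {
			fmt.Printf("%s -> %s\n", ext, name)
		} else {
			fmt.Printf("%s -> unsupported\n", ext)
		}
	}
}
```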

Installation

Prerequisites

PDF support requires pdftotext from poppler-utils:

# Debian/Ubuntu
sudo apt install poppler-utils

# macOS
brew install poppler

# Fedora
sudo dnf install poppler-utils

# Arch
sudo pacman -S poppler
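
Since PDF conversion depends on an external binary, a program that wraps markgo may want to check for `pdftotext` before starting. The sketch below uses the standard library's `exec.LookPath`. `findTool` is a hypothetical helper name, and this is not necessarily how markgo itself performs the check.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findTool is a hypothetical helper: it resolves an executable on PATH
// and wraps the error with the tool name when it is missing.
func findTool(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findTool("pdftotext")
	if err != nil {
		fmt.Println("PDF conversion unavailable:", err)
		return
	}
	fmt.Println("pdftotext found at", path)
}
```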

Build from Source

git clone https://github.com/dnl-fm/markgo
cd markgo
go build -o markgo ./cmd/markgo

# Optional: install to PATH
sudo mv markgo /usr/local/bin/

Usage

Basic

# Convert a file (output to stdout)
markgo document.pdf

# Save to file
markgo document.pdf -o document.md

# Or use redirection
markgo document.pdf > document.md

From stdin

# Pipe content (use -x to specify format)
cat document.xlsx | markgo -x .xlsx

# Download and convert
curl -s https://example.com | markgo -x .html
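
The same stdin workflow can be driven from a Go program by running the `markgo` binary as a subprocess. The sketch below assumes `markgo` is installed on PATH and relies only on the documented `-x` flag. `markgoArgs` and `convertStream` are hypothetical names, and capturing stderr for the error message is an assumption about where the tool reports failures.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// markgoArgs builds the arguments for a stdin conversion,
// using the documented -x extension hint.
func markgoArgs(ext string) []string {
	return []string{"-x", ext}
}

// convertStream is a hypothetical wrapper: it pipes r into the markgo
// binary (assumed to be on PATH) and returns whatever it writes to stdout.
func convertStream(r io.Reader, ext string) (string, error) {
	var out, errBuf bytes.Buffer
	cmd := exec.Command("markgo", markgoArgs(ext)...)
	cmd.Stdin = r
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("markgo failed: %w: %s", err, strings.TrimSpace(errBuf.String()))
	}
	return out.String(), nil
}

func main() {
	md, err := convertStream(strings.NewReader("<h1>Hello</h1>"), ".html")
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Print(md)
}
```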

Options

-o, --output     Output file (default: stdout)
-x, --extension  File extension hint (.pdf, .xlsx, .docx, .html)
-m, --mime-type  MIME type hint
-v, --version    Show version
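
For readers curious how short and long flag pairs like these can be declared, here is a rough sketch using Go's standard `flag` package, binding both names to the same variable. It is an illustration only: `options` and `parseOptions` are hypothetical, and markgo's actual parser may differ. In particular, the standard `flag` package stops at the first positional argument, so a form such as `markgo document.pdf -o document.md` would need different handling.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the values of the flags listed above (hypothetical type).
type options struct {
	output    string
	extension string
	mimeType  string
	version   bool
}

// parseOptions registers each short and long name against the same
// variable and returns the parsed options plus remaining positional args.
func parseOptions(args []string) (options, []string, error) {
	var o options
	fs := flag.NewFlagSet("markgo", flag.ContinueOnError)
	fs.StringVar(&o.output, "o", "", "output file (default: stdout)")
	fs.StringVar(&o.output, "output", "", "output file (default: stdout)")
	fs.StringVar(&o.extension, "x", "", "file extension hint")
	fs.StringVar(&o.extension, "extension", "", "file extension hint")
	fs.StringVar(&o.mimeType, "m", "", "MIME type hint")
	fs.StringVar(&o.mimeType, "mime-type", "", "MIME type hint")
	fs.BoolVar(&o.version, "v", false, "show version")
	fs.BoolVar(&o.version, "version", false, "show version")
	if err := fs.Parse(args); err != nil {
		return o, nil, err
	}
	return o, fs.Args(), nil
}

func main() {
	// Flags come before the positional argument in this sketch.
	o, rest, err := parseOptions([]string{"-x", ".html", "--output", "out.md", "page.html"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("extension:", o.extension)
	fmt.Println("output:", o.output)
	fmt.Println("inputs:", rest)
}
```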

Examples

PDF Invoice

$ markgo invoice.pdf

Acme Corp                                              Invoice #12345
123 Main Street                                        Date: 2025-01-15

Description                    Qty        Price           Total
─────────────────────────────────────────────────────────────────
Consulting Services            10         $150.00         $1,500.00
Development                    20         $200.00         $4,000.00

                                          Subtotal:       $5,500.00
                                          Tax (10%):        $550.00
                                          Total:          $6,050.00

Excel Spreadsheet

$ markgo data.xlsx

## Sheet1

| Name | Department | Salary |
| --- | --- | --- |
| Alice | Engineering | $95,000 |
| Bob | Marketing | $75,000 |
| Carol | Engineering | $105,000 |
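
The output above is a standard pipe table: a header row, a `---` separator row, then one line per record. As a rough illustration of that shape (not markgo's actual implementation), a hypothetical `renderTable` helper could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// renderTable is a hypothetical helper that formats rows as a Markdown
// pipe table, treating the first row as the header. It does not escape
// "|" characters inside cells.
func renderTable(rows [][]string) string {
	if len(rows) == 0 {
		return ""
	}
	var b strings.Builder
	writeRow := func(cells []string) {
		b.WriteString("| " + strings.Join(cells, " | ") + " |\n")
	}
	writeRow(rows[0])
	// One "---" cell per header column.
	sep := make([]string, len(rows[0]))
	for i := range sep {
		sep[i] = "---"
	}
	writeRow(sep)
	for _, r := range rows[1:] {
		writeRow(r)
	}
	return b.String()
}

func main() {
	fmt.Print(renderTable([][]string{
		{"Name", "Department", "Salary"},
		{"Alice", "Engineering", "$95,000"},
	}))
}
```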

Word Document

$ markgo report.docx

# Quarterly Report

## Executive Summary

This quarter showed **strong growth** in all departments...

## Key Metrics

- Revenue: $2.5M
- Customers: 1,250
- Satisfaction: 94%

HTML Webpage

$ curl -s https://example.com | markgo -x .html

# Example Domain

This domain is for use in illustrative examples in documents.

[More information...](https://www.iana.org/domains/example)

Go API

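package main

import (
    "fmt"
    "log"
    "os"

    "github.com/markgo/pkg/converter"
    // Assumed import path for the streaminfo package used below;
    // adjust it to match the module's actual layout.
    "github.com/markgo/pkg/streaminfo"
)

func main() {
    md := converter.New()

    // From file
    result, err := md.ConvertFile("document.pdf")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result.Markdown)

    // From reader: check the open error instead of discarding it
    file, err := os.Open("data.xlsx")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // A reader has no file name, so pass the extension as a hint
    result, err = md.ConvertReader(file, streaminfo.StreamInfo{
        Extension: ".xlsx",
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result.Markdown)
}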

Comparison with MarkItDown (Python)

| | markgo | MarkItDown |
| --- | --- | --- |
| Language | Go | Python |
| Binary | Single 9 MB binary | Requires Python + deps |
| PDF | pdftotext (fast) | pdfminer (slower) |
| Startup | Instant | ~500ms |
| Dependencies | 1 external (pdftotext) | Many Python packages |

Roadmap

  • CSV / JSON / XML
  • Plain text (passthrough)
  • PowerPoint (.pptx)
  • Images (EXIF metadata)
  • ZIP archives (extract and convert)
  • URL fetching (built-in)

License

MIT

Credits

Inspired by MarkItDown by Microsoft.
