This guide explains how the code fits together, provides a recommended reading order for newcomers, and documents protocols and architectural invariants. It is intended for both human contributors and AI coding agents.
For using the tool as an AI agent (tool descriptions, example invocations,
MCP tool schemas), see AGENTS.md.
(Side-note: I intend to maintain this file via AI, and an AI wrote it.)
- Project Summary
- Reading Order for Comprehension
- Package Map
- Data Flow
- Command Registration Pattern
- Key Types and Relationships
- Generated Code and go generate
- Architectural Invariants and Pitfalls
- Protocols
- External Dependencies
- Testing
- Build and Run
character is a Go CLI tool for Unicode codepoint lookup, transformations,
and encoding information. It is built on Cobra and has no runtime
server, except when run as character agent mcp, which starts an MCP
stdio server exposing eight Unicode lookup tools.
License: MIT. Copyright Phil Pennock.
For understanding the codebase end-to-end, read in this order:
| # | File(s) | Why |
|---|---|---|
| 1 | main.go | Entry point; shows all command imports and go:generate directives |
| 2 | commands/root/root.go | Root Cobra command; AddCommand, Start, Cobra exports |
| 3 | sources/sources.go | Sources struct: the data aggregator everything depends on |
| 4 | unicode/unicode.go | CharInfo, Unicode, Load, LoadSearch |
| 5 | unicode/blocks.go | BlockInfo, Blocks, LookupInfo, FindByName |
| 6 | unicode/category.go | GeneralCategory(r rune) string |
| 7 | unicode/emoji.go | PresentationVariants, Emojiable |
| 8 | internal/uformat/uformat.go | Pure byte-formatting helpers shared by CLI and MCP output |
| 9 | resultset/resultset.go | ResultSet, JItem, JSON rendering; the CLI output backbone |
| 10 | resultset/cmdrender.go | ResultCmdFlags, flag registration, RenderPerCmdline |
| 11 | commands/name/name.go | A simple command; shows the typical command pattern |
| 12 | internal/mcpstdio/mcpstdio.go | MCP stdio server (~200 lines, hand-rolled) |
| 13 | commands/agent/mcpserver/charprops.go | CharProps, CharPropsFromRune |
| 14 | commands/agent/mcpserver/tools.go | Eight MCP tool handlers |
| 15 | commands/agent/mcpserver/server.go | MCP server wiring |
| 16 | commands/agent/agentmcp.go | agent mcp command; ties it all together |
After reading these 16 files you will understand every major subsystem.
github.com/philpennock/character/
│
├── main.go                       Entry point, go:generate directives
├── repo_version.go               Build-time version info
│
├── commands/                     Cobra command implementations
│   ├── root/                     Root command, AddCommand(), Start()
│   ├── name/                     `name <char>…`: info about literal characters
│   ├── named/                    `named <NAME>`: lookup by Unicode name
│   ├── code/                     `code <U+XXXX>`: lookup by codepoint
│   ├── browse/                   `browse -b <block>`: list block contents
│   ├── known/                    `known -b`: list block/charset names
│   ├── aliases/                  `aliases`: alias characters
│   ├── puny/                     `x-puny`: punycode encode/decode
│   ├── region/                   `region <CC>`: flag emoji
│   ├── transform/                `transform <type>`: fraktur, math, scream, turn
│   ├── version/                  `version`: version info
│   ├── deprecated/               Deprecated command stubs
│   └── agent/                    Agent sub-command tree
│       ├── agent.go              Parent `agent` command
│       ├── agenthelp.go          `agent help`: JSON schema of all commands
│       ├── agentexamples.go      `agent examples`: example invocations
│       ├── agentmcp.go           `agent mcp`: start MCP stdio server
│       └── mcpserver/            MCP tool implementations
│           ├── server.go         Server wrapper + NewServer
│           ├── charprops.go      CharProps struct, CharPropsFromRune
│           ├── schemas.go        JSON Schema constants for each tool
│           └── tools.go          registerTools, 8 handler closures
│
├── unicode/                      Unicode data and lookups
│   ├── unicode.go                CharInfo, Unicode struct, Load/LoadSearch
│   ├── blocks.go                 BlockInfo, Blocks, LookupInfo, FindByName
│   ├── category.go               GeneralCategory()
│   ├── emoji.go                  PresentationVariants(), Emojiable()
│   ├── regional.go               Regional indicator helpers
│   ├── sort.go                   Sort interface for CharInfo slices
│   ├── generated_data.go         ← go generate (character name maps)
│   ├── generated_blocks.go       ← go generate (block ranges)
│   └── generated_emoji.go        ← go generate (emojiable/textable sets)
│
├── sources/                      Data source aggregation
│   ├── sources.go                Sources struct, NewFast(), NewAll()
│   ├── vim.go                    VimDigraph, VimData, digraph loaders
│   ├── x11.go                    X11Data, compose sequence loader
│   ├── generated_static_vim.go   ← go generate
│   └── generated_x11_compose.go  ← go generate
│
├── entities/                     HTML/XML entity lookup
│   ├── generated_html.go         ← go generate (HTMLEntities, reverse map)
│   └── generated_xml.go          ← go generate (XMLEntities, reverse map)
│
├── resultset/                    CLI result rendering
│   ├── resultset.go              ResultSet, JItem, Add*, PrintJSON, PrintPlain
│   └── cmdrender.go              ResultCmdFlags, RegisterCmdFlags, RenderPerCmdline
│
├── internal/
│   ├── mcpstdio/                 Hand-rolled MCP stdio server
│   │   └── mcpstdio.go           Server, Handler, ToolDef, readFrame, writeFrame
│   ├── uformat/                  Pure rune → string formatting helpers
│   │   └── uformat.go            UTF8Bytes, UTF8Escaped, UnicodeEscaped, etc.
│   ├── runemanip/                Rune manipulation utilities
│   │   ├── runes.go              RuneFromHexField
│   │   ├── hexDecode.go          HexDecodeArgs
│   │   ├── widths.go             DisplayCellWidth
│   │   ├── regional.go           Regional indicator helpers
│   │   └── variations.go         Variation selector helpers
│   ├── table/                    Table rendering abstraction
│   │   └── tabular.go            NewTable, Supported
│   ├── clipboard/                Clipboard I/O (conditional build)
│   └── encodings/                Charset decoders
│
├── extra/                        Extra data files, web assets
│
└── util/                         Build-time code generators + tools
    ├── update_unicode.go         Generate unicode/generated_*.go
    ├── update_entities.go        Generate entities/generated_*.go
    ├── update_x11_compose.go     Generate sources/generated_x11_compose.go
    ├── update_static_vim         Bash: generate sources/generated_static_vim.go
    └── mcp_test_driver           Python: interactive MCP REPL for testing
main.go → root.Start()
  → Cobra dispatch → commands/name.Run
    → sources.NewFast()               load all static data (~1ms)
    → resultset.NewResultSet(srcs)
    → rs.AddCharacterByRune(r)        populate from Sources
    → rs.RenderPerCmdline()           dispatch on ResultCmdFlags
      → rs.PrintJSON()                marshal JItem structs

main.go → root.Start()
  → Cobra dispatch → commands/agent/agentmcp.Run
    → sources.NewFast()
    → srcs.LoadUnicodeSearch()        build Ferret index (~100-300ms)
    → mcpserver.NewServer(srcs)
      → mcpstdio.NewServer("character", version)
      → registerTools(srv, srcs)      register 8 tool handlers
    → srv.ServeStdio(ctx)             read stdin, dispatch, write stdout
      → readFrame (newline-delimited JSON)
      → dispatch on method: initialize | tools/list | tools/call | …
      → handler(ctx, args) → CharPropsFromRune(r, srcs)
      → writeFrame (JSON + \n)

sources.NewFast()
  = NewEmpty()
    .LoadUnicode()          unicode.Load() → ByRune, ByName maps
    .LoadUnicodeBlocks()    unicode.LoadBlocks() → sorted []BlockInfo
    .LoadStaticVim()        compiled-in vim digraphs
    .LoadStaticX11()        compiled-in X11 compose sequences

sources.NewAll() additionally calls:
    .LoadUnicodeSearch()    Ferret inverted-suffix index (~100-300ms)
    .LoadLiveVim()          runs `vim` subprocess for live digraphs
Every command package self-registers in init():

// commands/name/name.go
func init() {
	root.AddCommand(nameCmd)
}

main.go imports each command package with a blank import:

import (
	_ "github.com/philpennock/character/commands/name"
	_ "github.com/philpennock/character/commands/named"
	// …
)

This means main.go is the single list of enabled commands. Sub-commands
(e.g. agent help, agent mcp) are wired within their parent package's
init().

Most commands register shared output flags via:

resultset.RegisterCmdFlags(cmd, supportsOneline)

This adds -v, -N, -J, -1, -c, emoji/text bias flags, etc., and
makes them mutually exclusive where needed.
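
The self-registration idea can be tried out without Cobra. The following is a minimal stdlib-only sketch, not the project's code: `command`, `registry`, `addCommand`, and `registeredNames` are hypothetical stand-ins for `*cobra.Command` and `root.AddCommand`, and both `init()` functions sit in one file only so the example is self-contained. In the real layout each would live in its own package and run because main.go blank-imports that package.

```go
package main

import (
	"fmt"
	"sort"
)

// command is a hypothetical stand-in for *cobra.Command.
type command struct {
	Use string
	Run func(args []string) string
}

// registry plays the role of the root command's list of children.
var registry = map[string]*command{}

// addCommand mirrors the role that root.AddCommand plays in the text above.
func addCommand(c *command) { registry[c.Use] = c }

// Each init() below represents one command package registering itself.
func init() {
	addCommand(&command{Use: "name", Run: func(args []string) string {
		return fmt.Sprintf("name: %d argument(s)", len(args))
	}})
}

func init() {
	addCommand(&command{Use: "named", Run: func(args []string) string {
		return fmt.Sprintf("named: %d argument(s)", len(args))
	}})
}

// registeredNames returns the registered command names in sorted order.
func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(registeredNames())
	fmt.Println(registry["name"].Run([]string{"x"}))
}
```

Because registration happens in `init()`, removing a blank import is enough to drop a command from the binary; nothing else needs to know the command existed.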
Sources (sources/sources.go)
├─ Unicode   unicode.Unicode   map[rune]CharInfo, map[string]CharInfo, Search
├─ UBlocks   unicode.Blocks    sorted []BlockInfo
├─ Vim       sources.VimData   map[rune][]VimDigraph
└─ X11       sources.X11Data   map[rune]string

CharInfo (unicode/unicode.go)
  {Number rune, Name string, NameWidth int}

BlockInfo (unicode/blocks.go)
  {Min rune, Max rune, ID BlockID, Name string}

ResultSet (resultset/resultset.go)
  sources *Sources
  items []charItem → JItem (JSON rendering)

JItem (resultset/resultset.go)
  CLI JSON output: display-oriented fields, string decimal,
  "block" as string + "block_info" as object

CharProps (commands/agent/mcpserver/charprops.go)
  MCP JSON output: structured fields, int decimal,
  "block" as object, no display-oriented fields
  Computed by CharPropsFromRune(r, srcs)

JItem and CharProps are deliberately separate types with different JSON
contracts. Both use internal/uformat for shared byte-formatting.
Run go generate ./... from the repo root to regenerate all static data.
| Generator | Input | Output |
|---|---|---|
| util/update_unicode.go | unicode/UnicodeData.txt, unicode/Blocks.txt, unicode/emoji-variation-sequences.txt | unicode/generated_data.go, unicode/generated_blocks.go, unicode/generated_emoji.go |
| util/update_entities.go | HTML/XML entity specs | entities/generated_html.go, entities/generated_xml.go |
| util/update_x11_compose.go | sources/Compose.en_US.UTF-8.txt | sources/generated_x11_compose.go |
| util/update_static_vim (bash) | Runs vim | sources/generated_static_vim.go |
Generated files are committed to the repository and should not be
hand-edited. After modifying a generator, re-run go generate and commit
the regenerated output alongside the generator change.
resultset.ResultCmdFlags is a package-level struct populated by Cobra flag
parsing. It drives CLI rendering decisions.
Invariant: CharPropsFromRune (MCP path) must never read
ResultCmdFlags. The MCP server does not import resultset at all; this is
enforced by Go's package dependency graph.
New fields in JItem use omitempty. The block object uses JSON key
"block_info" to coexist with the legacy "block" string field. In
CharProps (MCP), there is no legacy, so "block" is the structured object.
Both resultset.JSONEntry and mcpserver.CharPropsFromRune delegate
byte-formatting to internal/uformat. Do not duplicate these computations.
srcs.LoadUnicodeSearch() builds a Ferret inverted-suffix index,
taking ~100-300 ms. agent mcp calls it eagerly at startup because MCP
servers are long-lived and first-request latency matters more than startup
latency.
CJK Unified Ideographs (U+4E00 to U+9FFF) are absent from srcs.Unicode.ByRune
because the Unicode standard does not assign individual names to that range.
unicode_browse_block("CJK Unified Ideographs") returns zero results. This
matches the CLI behaviour and is not a bug.
JSON-RPC 2.0 allows id to be a string, number, or null. The MCP server
preserves the raw id as json.RawMessage and echoes it unchanged. Do not
unmarshal it into interface{} and re-marshal: Go's default JSON decoder
turns every JSON number into a float64, so an integer id can come back altered.
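
The id pitfall is easy to reproduce with the standard library alone. The sketch below is illustrative rather than the server's actual code: `rawRequest`, `looseRequest`, `echoRaw`, and `echoLoose` are hypothetical names. It compares echoing an id held as json.RawMessage with echoing one that was decoded into interface{}, using an integer just above 2^53 that float64 cannot represent exactly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawRequest keeps the id bytes exactly as they arrived.
type rawRequest struct {
	ID     json.RawMessage `json:"id"`
	Method string          `json:"method"`
}

// looseRequest decodes the id into interface{}; JSON numbers become float64.
type looseRequest struct {
	ID     interface{} `json:"id"`
	Method string      `json:"method"`
}

// echoRaw returns the id as it would be written back when preserved raw.
func echoRaw(frame []byte) (string, error) {
	var req rawRequest
	if err := json.Unmarshal(frame, &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req.ID)
	return string(out), err
}

// echoLoose returns the id after a decode/re-encode round trip.
func echoLoose(frame []byte) (string, error) {
	var req looseRequest
	if err := json.Unmarshal(frame, &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req.ID)
	return string(out), err
}

func main() {
	frame := []byte(`{"jsonrpc":"2.0","id":9007199254740993,"method":"tools/list"}`)
	raw, _ := echoRaw(frame)
	loose, _ := echoLoose(frame)
	fmt.Println("raw echo:  ", raw)
	fmt.Println("loose echo:", loose)
}
```

String ids survive either path; the raw form matters for numeric ids, where the client may compare the echoed id byte for byte.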
This section documents wire protocols used or referenced by the codebase, with summaries and pointers to authoritative specifications.
Used by: character agent mcp β internal/mcpstdio
MCP is a protocol for exposing tools to AI agents. The character tool
implements a tool-only MCP server over the stdio transport.
MCP stdio uses newline-delimited JSON (NDJSON). Each message is a single
JSON-RPC 2.0 object on one line terminated by \n. Messages MUST NOT
contain embedded newlines.
→ {"jsonrpc":"2.0","id":1,"method":"initialize","params":{...}}\n
← {"jsonrpc":"2.0","id":1,"result":{...}}\n
This is NOT the same as LSP framing (see below).
| Method | Direction | Response |
|---|---|---|
| initialize | client → server | InitializeResult (capabilities, serverInfo, protocolVersion) |
| notifications/initialized | client → server | None (notification, no id) |
| tools/list | client → server | {"tools": [...]} |
| tools/call | client → server | {"content": [...], "isError": bool} |
- Specification: https://spec.modelcontextprotocol.io/
- Transports: https://spec.modelcontextprotocol.io/specification/basic/transports/
- Stdio transport: newline-delimited, UTF-8, no embedded newlines
- Protocol version used: "2024-11-05"
- Website: https://modelcontextprotocol.io/
- TypeScript SDK (reference impl): https://github.com/modelcontextprotocol/typescript-sdk
- Go SDK (not used here): https://github.com/modelcontextprotocol/go-sdk
Not used by this project, but referenced here because the MCP stdio transport is frequently confused with LSP's wire format.
LSP uses Content-Length framing over stdio:
Content-Length: 52\r\n
\r\n
{"jsonrpc":"2.0","id":1,"method":"initialize",...}
Each message is preceded by HTTP-style headers (Content-Length: N\r\n\r\n),
then exactly N bytes of body. The body may contain newlines.
Key difference from MCP: LSP uses Content-Length + \r\n\r\n; MCP uses
bare \n-delimited lines. Do not mix them up.
- Specification: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/
- Base protocol (framing): https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#baseProtocol
Both MCP and LSP build on JSON-RPC 2.0.
- Requests have jsonrpc, method, id, and optional params.
- Notifications have jsonrpc and method but no id; no response is sent.
- Responses have jsonrpc, id, and either result or error.
- Error objects have code (integer) and message (string).
- Standard error codes: -32700 (parse error), -32600 (invalid request), -32601 (method not found), -32602 (invalid params), -32603 (internal error).
- Specification: https://www.jsonrpc.org/specification
| Module | Purpose | Why |
|---|---|---|
| github.com/spf13/cobra | CLI framework | Command tree, flag parsing, help generation |
| github.com/spf13/pflag | POSIX flag parsing | Cobra dependency; also used directly for flag introspection |
| github.com/argusdusty/Ferret | Inverted-suffix index | Substring search over ~35k Unicode character names |
| github.com/atotto/clipboard | Clipboard I/O | -c flag: copy result to clipboard |
| github.com/mattn/go-runewidth | Terminal cell width | Correct column alignment in table output |
| github.com/mattn/go-shellwords | Shell word splitting | --argv flag: re-parse arguments |
| go.pennock.tech/tabular | Table rendering | -v verbose table output |
| golang.org/x/net | IDN / punycode | x-puny command |
| golang.org/x/text | Unicode normalisation | NFC/NFD handling |
| github.com/liquidgecka/testlib | Test assertion helpers | Test-only |
No MCP SDK is used; internal/mcpstdio is ~200 lines of hand-rolled code.
go test ./... # all tests
go test ./internal/mcpstdio/ # MCP protocol tests
go test ./commands/agent/mcpserver/ # MCP tool handler tests
go test ./internal/uformat/ # formatting helper tests
go test ./unicode/ # unicode data tests
go test ./resultset/ # CLI rendering tests

The util/mcp_test_driver script is a Python REPL for interactively testing
the MCP server end-to-end. It builds the binary, performs the MCP handshake,
and lets you call tools from a prompt:
./util/mcp_test_driver
mcp> list
mcp> unicode_lookup_char char=✓
mcp> unicode_search query=snowman

# Build
go build -o character .
# Run
./character name ✓
./character named -Jj CHECK MARK
./character agent mcp # start MCP server on stdio
# Regenerate data (after updating Unicode source files or generators)
go generate ./...
# Format
gofmt -w .