Merged
6 changes: 6 additions & 0 deletions .gitignore
@@ -30,3 +30,9 @@ debug/
trace.out
*.out
out/

# Temporary files
tmp/

# Large test files (kept locally, not in repo)
testdata/**/*_large.*
96 changes: 68 additions & 28 deletions CONTRIBUTING.md
@@ -24,22 +24,47 @@ make check

```
imx/
├── *.go # Public API (api.go, config.go, extractor.go, types.go, tags.go)
├── cmd/imx/ # CLI tool
├── examples/ # Usage examples (basic & advanced)
├── *.go # Public API (api.go, config.go, extractor.go, types.go, tags.go)
├── cmd/imx/ # CLI tool
│ ├── filter/ # Tag filtering logic
│ ├── output/ # Output formatters (JSON, CSV, Table, Text, Summary)
│ ├── processor/ # File processing
│ ├── ui/ # CLI interface
│ └── util/ # Utilities
├── examples/ # Usage examples
├── internal/
│ ├── format/ # Container format parsers (JPEG, etc.)
│ └── meta/ # Metadata parsers (EXIF, IPTC, XMP, ICC)
├── testdata/goldens/ # Test images with expected metadata
└── Makefile # Build automation
│ ├── binary/ # Binary reading helpers
│ ├── bufpool/ # Buffer pool for performance
│ ├── parser/ # Unified parser architecture
│ │ ├── cr2/ # Canon RAW parser
│ │ ├── flac/ # FLAC audio parser
│ │ ├── gif/ # GIF parser
│ │ ├── heic/ # HEIC/HEIF parser
│ │ ├── icc/ # ICC profile parser
│ │ ├── id3/ # ID3/MP3 parser
│ │ ├── iptc/ # IPTC metadata parser
│ │ ├── jpeg/ # JPEG parser
│ │ ├── mp4/ # MP4/M4A parser
│ │ ├── png/ # PNG parser
│ │ ├── tiff/ # TIFF parser
│ │ ├── webp/ # WebP parser
│ │ └── xmp/ # XMP parser
│ └── testing/ # Shared test utilities
├── testdata/ # Test files for all formats
│ ├── jpeg/, png/, gif/ # Image formats
│ ├── flac/, mp3/, mp4/ # Audio/video formats
│ └── goldens/ # Expected metadata outputs
└── Makefile # Build automation
```

### Architecture

Three-layer pipeline:
1. **Format Layer** - Extracts raw metadata blocks from container formats
2. **Meta Layer** - Parses raw blocks into structured tags
3. **API Layer** - Provides user-facing types and functions
**Unified Parser Model**:
- All parsers implement `parser.Parser` interface
- Each parser is stateless and thread-safe
- Uses `io.ReaderAt` for efficient random access
- Returns `[]parser.Directory` with structured tags
- 100% test coverage for all parsers

## Development Guidelines

@@ -91,17 +116,26 @@ Closes #45

### Adding a New Parser

**Metadata Parser:**
1. Create package in `internal/meta/<spec>/`
2. Implement `meta.Parser` interface
3. Register in `extractor.go`
4. Add tests with 100% coverage

**Format Parser:**
1. Create package in `internal/format/<format>/`
2. Implement `format.Parser` interface
3. Register in `extractor.go`
4. Add tests with 100% coverage
1. Create package in `internal/parser/<format>/`
2. Implement the `parser.Parser` interface:
```go
type Parser interface {
	Name() string
	Detect(r io.ReaderAt) bool
	Parse(r io.ReaderAt) ([]Directory, *ParseError)
}
```
3. Make the parser stateless and thread-safe (no struct fields that hold mutable state)
4. Use `io.ReaderAt` for efficient random access
5. Add comprehensive tests:
- Unit tests for all functions
- Fuzz tests (`FuzzParser`)
- Benchmark tests
- Concurrent access tests
- Target: 100% test coverage
6. Add a constants file if the parser uses 10 or more magic numbers
7. Document the format structure in package comments
8. Register the parser in the main extractor
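
The steps above can be sketched with a toy parser. This is only an illustration under stated assumptions: `Directory`, `ParseError`, `DemoParser`, and the "DEMO" container format are hypothetical stand-ins, since the real `parser.Directory` and `parser.ParseError` fields are not shown in this guide.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Directory and ParseError are simplified stand-ins for the project's
// parser.Directory and parser.ParseError; their shapes here are assumptions.
type Directory struct {
	Name string
	Tags map[string]string
}

type ParseError struct{ Msg string }

func (e *ParseError) Error() string { return e.Msg }

// Parser mirrors the interface quoted in step 2.
type Parser interface {
	Name() string
	Detect(r io.ReaderAt) bool
	Parse(r io.ReaderAt) ([]Directory, *ParseError)
}

// demoMagic is the signature of a made-up "DEMO" container format.
var demoMagic = []byte("DEMO")

// DemoParser has no fields, so a single value can be shared across goroutines.
type DemoParser struct{}

func (DemoParser) Name() string { return "demo" }

// Detect reads only the first four bytes and compares them to the magic.
func (DemoParser) Detect(r io.ReaderAt) bool {
	buf := make([]byte, len(demoMagic))
	if n, _ := r.ReadAt(buf, 0); n < len(buf) {
		return false
	}
	return bytes.Equal(buf, demoMagic)
}

// Parse reads a one-byte version that follows the magic and reports it as a tag.
func (p DemoParser) Parse(r io.ReaderAt) ([]Directory, *ParseError) {
	if !p.Detect(r) {
		return nil, &ParseError{Msg: "demo: bad magic"}
	}
	var ver [1]byte
	if n, _ := r.ReadAt(ver[:], int64(len(demoMagic))); n < 1 {
		return nil, &ParseError{Msg: "demo: truncated header"}
	}
	return []Directory{{
		Name: "Header",
		Tags: map[string]string{"Version": fmt.Sprint(ver[0])},
	}}, nil
}

// parseBytes is a convenience wrapper used only for this demonstration.
func parseBytes(data []byte) ([]Directory, *ParseError) {
	var p Parser = DemoParser{}
	return p.Parse(bytes.NewReader(data))
}

func main() {
	dirs, err := parseBytes([]byte("DEMO\x02"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dirs[0].Name, dirs[0].Tags["Version"])
}
```

Because `DemoParser` carries no fields, `Detect` and `Parse` depend only on their arguments, which is one straightforward way to satisfy the stateless requirement in step 3.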

## Core Principles

@@ -124,18 +158,24 @@ func parse(data []byte) error {
}
```

### Streaming Only
### Efficient I/O

Use `bufio.Reader` for parsing. Never load entire files into memory:
Use `io.ReaderAt` for parsing. Never load entire files into memory:

```go
// Good
func Parse(r *bufio.Reader) ([]Block, error)
// Good - Random access without loading entire file
func Parse(r io.ReaderAt) ([]Directory, *ParseError)

// Bad
func Parse(data []byte) ([]Block, error)
// Bad - Loads entire file into memory
func Parse(data []byte) ([]Directory, *ParseError)
```

**Benefits of `io.ReaderAt`**:
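- Random access to any file position
- No need to buffer the whole file; each read copies only the requested bytes
- Safe for parallel `ReadAt` calls on the same source
- Works with files (`*os.File`), in-memory data (`bytes.Reader`), and bounded views (`io.SectionReader`); sequential sources such as network streams must be buffered first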
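
As a rough illustration of the points above, the sketch below reads several fixed-width fields from one `io.ReaderAt` in parallel. The helper names (`readUint32At`, `readAllConcurrently`, `demoValues`) and the big-endian layout are assumptions made for the example, not part of the project's API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"sync"
)

// readUint32At reads a big-endian uint32 at off. ReadAt takes an explicit
// offset, so there is no shared cursor to protect.
func readUint32At(r io.ReaderAt, off int64) (uint32, error) {
	var buf [4]byte
	if n, err := r.ReadAt(buf[:], off); n < len(buf) {
		if err == nil {
			err = io.ErrUnexpectedEOF
		}
		return 0, err
	}
	return binary.BigEndian.Uint32(buf[:]), nil
}

// readAllConcurrently reads count consecutive uint32 values, one goroutine each.
// Every goroutine writes to its own slice index, so no mutex is needed.
func readAllConcurrently(r io.ReaderAt, count int) ([]uint32, error) {
	out := make([]uint32, count)
	errs := make([]error, count)
	var wg sync.WaitGroup
	for i := 0; i < count; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i], errs[i] = readUint32At(r, int64(i)*4)
		}(i)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

// demoValues runs the concurrent read over a 12-byte in-memory source.
func demoValues(count int) ([]uint32, error) {
	data := []byte{0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3}
	return readAllConcurrently(bytes.NewReader(data), count)
}

func main() {
	vals, err := demoValues(3)
	fmt.Println(vals, err)
}
```

The same functions accept an `*os.File` unchanged, since it also implements `io.ReaderAt`.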

### Validate Sizes

Always validate sizes read from untrusted input before allocating, to prevent memory-exhaustion attacks:
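
One possible shape for such a check is sketched here; it is not the project's own code. The 4-byte big-endian length prefix, the `maxBlockSize` cap, and the names `readBlock` and `demoRead` are all hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxBlockSize is a hypothetical upper bound on a single metadata block.
const maxBlockSize = 16 << 20 // 16 MiB

var errBlockTooLarge = errors.New("block size exceeds limit")

// readBlock reads a block whose 4-byte big-endian length prefix sits at off.
// total is the size of the underlying source in bytes.
func readBlock(r io.ReaderAt, off, total int64) ([]byte, error) {
	var hdr [4]byte
	if n, err := r.ReadAt(hdr[:], off); n < len(hdr) {
		if err == nil {
			err = io.ErrUnexpectedEOF
		}
		return nil, err
	}
	size := int64(binary.BigEndian.Uint32(hdr[:]))

	// Reject absurd sizes before allocating anything.
	if size > maxBlockSize {
		return nil, errBlockTooLarge
	}
	// Reject sizes that claim more data than the source holds.
	if size > total-(off+4) {
		return nil, io.ErrUnexpectedEOF
	}

	buf := make([]byte, size)
	if n, err := r.ReadAt(buf, off+4); n < len(buf) {
		if err == nil {
			err = io.ErrUnexpectedEOF
		}
		return nil, err
	}
	return buf, nil
}

// demoRead parses a single length-prefixed block from an in-memory string.
func demoRead(input string) ([]byte, error) {
	return readBlock(strings.NewReader(input), 0, int64(len(input)))
}

func main() {
	data, err := demoRead("\x00\x00\x00\x03abc")
	fmt.Printf("%q %v\n", data, err)
}
```

Checking the declared size against both a fixed cap and the bytes actually available means a four-byte header can never trigger a multi-gigabyte allocation.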
36 changes: 17 additions & 19 deletions Makefile
@@ -32,7 +32,6 @@ build:
	$(GOBUILD) $(ALL_PKGS)
	cd cmd/imx && $(GOBUILD) -o ../../$(BIN_DIR)/imx .
	$(GOBUILD) -o $(BIN_DIR)/basic ./examples/basic
	$(GOBUILD) -o $(BIN_DIR)/advanced ./examples/advanced
	@echo "✓ Build complete"

# Run all tests with race detector
@@ -73,12 +72,9 @@ install:
# Generate coverage report for all packages (library + CLI)
coverage:
	@echo "Running tests with coverage..."
	@rm -f go.work go.work.sum
	@go work init . ./cmd/imx
	@$(GOTEST) -coverprofile=$(COVERAGE_FILE) -covermode=atomic ./... ./cmd/imx/...
	$(GOTEST) -coverprofile=$(COVERAGE_FILE) -covermode=atomic ./...
	@echo ""
	@$(GOCMD) tool cover -func=$(COVERAGE_FILE) | tail -1
	@rm -f go.work go.work.sum

# Generate HTML coverage report
coverage-html: coverage
@@ -91,28 +87,30 @@ coverage-html: coverage
# Run basic example
example: build
	@echo "Running example..."
	./$(BIN_DIR)/imx testdata/goldens/jpeg/google_iptc.jpg
	./$(BIN_DIR)/imx testdata/jpeg/google_iptc.jpg

# Run benchmarks
bench:
	@echo "Running benchmarks..."
	$(GOTEST) -bench=. -benchmem -benchtime=2s $(ALL_PKGS)
	cd cmd/imx && $(GOTEST) -bench=. -benchmem -benchtime=2s ./...
	$(GOTEST) -run=^$$ -bench=. -benchmem -benchtime=2s $(ALL_PKGS)
	cd cmd/imx && $(GOTEST) -run=^$$ -bench=. -benchmem -benchtime=2s ./...

# Run fuzz tests
fuzz:
	@echo "Running fuzz tests..."
	@$(GOTEST) -fuzz='^FuzzJPEGParse$$' -fuzztime=10s ./internal/format/jpeg
	@$(GOTEST) -fuzz='^FuzzJPEGDetect$$' -fuzztime=10s ./internal/format/jpeg
	@$(GOTEST) -fuzz='^FuzzEXIFParse$$' -fuzztime=10s ./internal/meta/exif
	@$(GOTEST) -fuzz='^FuzzEXIFParseIFD$$' -fuzztime=10s ./internal/meta/exif
	@$(GOTEST) -fuzz='^FuzzIPTCParse$$' -fuzztime=10s ./internal/meta/iptc
	@$(GOTEST) -fuzz='^FuzzIPTCParseIPTCIIM$$' -fuzztime=10s ./internal/meta/iptc
	@$(GOTEST) -fuzz='^FuzzXMPParse$$' -fuzztime=10s ./internal/meta/xmp
	@$(GOTEST) -fuzz='^FuzzXMPParsePacket$$' -fuzztime=10s ./internal/meta/xmp
	@$(GOTEST) -fuzz='^FuzzICCParse$$' -fuzztime=10s ./internal/meta/icc
	@$(GOTEST) -fuzz='^FuzzICCParseHeader$$' -fuzztime=10s ./internal/meta/icc
	@$(GOTEST) -fuzz='^FuzzICCParseTagTable$$' -fuzztime=10s ./internal/meta/icc
	@$(GOTEST) -fuzz='^FuzzCR2Parse$$' -fuzztime=5s ./internal/parser/cr2
	@$(GOTEST) -fuzz='^FuzzFLACParse$$' -fuzztime=5s ./internal/parser/flac
	@$(GOTEST) -fuzz='^FuzzGIFParse$$' -fuzztime=5s ./internal/parser/gif
	@$(GOTEST) -fuzz='^FuzzHEICParse$$' -fuzztime=5s ./internal/parser/heic
	@$(GOTEST) -fuzz='^FuzzICCParse$$' -fuzztime=5s ./internal/parser/icc
	@$(GOTEST) -fuzz='^FuzzID3Parse$$' -fuzztime=5s ./internal/parser/id3
	@$(GOTEST) -fuzz='^FuzzIPTCParse$$' -fuzztime=5s ./internal/parser/iptc
	@$(GOTEST) -fuzz='^FuzzJPEGParse$$' -fuzztime=5s ./internal/parser/jpeg
	@$(GOTEST) -fuzz='^FuzzMP4Parse$$' -fuzztime=5s ./internal/parser/mp4
	@$(GOTEST) -fuzz='^FuzzPNGParse$$' -fuzztime=5s ./internal/parser/png
	@$(GOTEST) -fuzz='^FuzzTIFFParse$$' -fuzztime=5s ./internal/parser/tiff
	@$(GOTEST) -fuzz='^FuzzWebPParse$$' -fuzztime=5s ./internal/parser/webp
	@$(GOTEST) -fuzz='^FuzzXMPParse$$' -fuzztime=5s ./internal/parser/xmp
	@echo "✓ All fuzz tests complete"

# Show help