@rdhyee rdhyee commented Jan 14, 2026

Summary

  • Add small, hand-crafted example datasets to help understand the iSamples PQG format
  • The same data is represented in JSON, CSV, and all three parquet formats (export, narrow, wide)

Dataset Overview

Domain: Geological rock samples from a Mount Rainier volcanic monitoring project

Entities:

  • 3 MaterialSampleRecords (samples)
  • 3 SamplingEvents (collection/preparation events)
  • 2 GeospatialCoordLocations (coordinates)
  • 1 SamplingSite (Mount Rainier Summit Area)
  • 1 Agent (Jane Smith, collector)

Relationships demonstrated:

  • Sample → produced_by → SamplingEvent (how samples are created)
  • Sample → derivedFrom → Sample (parent/child relationship)
  • SamplingEvent → sample_location → GeospatialCoordLocation
  • SamplingEvent → sampling_site → SamplingSite
  • SamplingSite → site_location → GeospatialCoordLocation
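
To make the relationship list concrete, here is a minimal sketch that models the relationships as (subject, predicate, object) triples and walks them. The identifiers (`sample:1`, `event:1`, and so on) and the helper names `build_index` and `follow` are hypothetical and chosen for illustration; the actual identifiers and edges are the ones in the example files, and this is not the PQG library's API.

```python
from collections import defaultdict

# Hypothetical identifiers for illustration only; the real ones live in
# the example files. Predicates are the ones listed above.
EDGES = [
    ("sample:1", "produced_by", "event:1"),
    ("sample:2", "produced_by", "event:2"),
    ("sample:2", "derivedFrom", "sample:1"),
    ("event:1", "sample_location", "loc:1"),
    ("event:1", "sampling_site", "site:1"),
    ("site:1", "site_location", "loc:2"),
]


def build_index(edges):
    """Map (subject, predicate) to the list of objects it points to."""
    index = defaultdict(list)
    for subject, predicate, obj in edges:
        index[(subject, predicate)].append(obj)
    return index


def follow(index, start, *predicates):
    """Follow a chain of predicates from a start node, returning end nodes."""
    frontier = [start]
    for predicate in predicates:
        frontier = [
            obj
            for node in frontier
            for obj in index.get((node, predicate), [])
        ]
    return frontier


INDEX = build_index(EDGES)

# Where was sample:1 collected? Sample -> produced_by -> sample_location.
print(follow(INDEX, "sample:1", "produced_by", "sample_location"))
```

The same two-hop walk is what a join between sample rows, edge rows, and location rows would express in the narrow format.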

Files Added

pqg/examples/minimal/
├── README.md                      # Explains files, queries, format comparison
├── json/
│   ├── 1_sample.json              # Single MaterialSampleRecord
│   └── 3_samples.json             # Three related samples
├── csv/
│   ├── samples.csv                # MaterialSampleRecords
│   ├── events.csv                 # SamplingEvents
│   ├── locations.csv              # GeospatialCoordLocations
│   ├── sites.csv                  # SamplingSites
│   ├── agents.csv                 # Agents
│   └── edges.csv                  # Relationships (narrow format)
└── parquet/
    ├── minimal_export.parquet     # Export format (3 rows, nested)
    ├── minimal_narrow.parquet     # Narrow format (21 rows, with edges)
    └── minimal_wide.parquet       # Wide format (10 rows, p__* columns)
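
The row counts in the tree above can be cross-checked against the entity counts in the Dataset Overview. The sketch below does that arithmetic under three assumptions that this description does not state explicitly: the export file holds one row per sample, the wide file holds one row per entity, and the narrow file holds one row per entity plus one row per edge. Under those assumptions the narrow file would contain 11 edge rows.

```python
# Entity counts as listed in the Dataset Overview.
ENTITY_COUNTS = {
    "MaterialSampleRecord": 3,
    "SamplingEvent": 3,
    "GeospatialCoordLocation": 2,
    "SamplingSite": 1,
    "Agent": 1,
}

# Row counts as annotated in the file tree.
ROW_COUNTS = {"export": 3, "narrow": 21, "wide": 10}

total_entities = sum(ENTITY_COUNTS.values())

# Assumption: export is one (nested) row per sample.
export_matches_samples = ROW_COUNTS["export"] == ENTITY_COUNTS["MaterialSampleRecord"]

# Assumption: wide is one row per entity, with edges folded into p__* columns.
wide_matches_entities = ROW_COUNTS["wide"] == total_entities

# Assumption: narrow is one row per entity plus one row per edge.
implied_edge_rows = ROW_COUNTS["narrow"] - total_entities

print(total_entities, export_matches_samples, wide_matches_entities, implied_edge_rows)
```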

Test plan

  • JSON files validate against iSamplesSchemaCore1.0.json
  • Parquet files readable with DuckDB
  • Example queries in README execute correctly
  • Entity counts match across formats
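
The last item of the test plan could be automated for the CSV files along the following lines. This is a sketch rather than the check actually used for this PR: the helper names `count_rows` and `check_counts` are hypothetical, and it assumes each entity CSV has a header row followed by one row per entity.

```python
import csv
from pathlib import Path

# Expected entity counts from the Dataset Overview, keyed by CSV file name.
EXPECTED = {
    "samples.csv": 3,
    "events.csv": 3,
    "locations.csv": 2,
    "sites.csv": 1,
    "agents.csv": 1,
}


def count_rows(csv_path):
    """Count data rows in a CSV file (assumes the first row is a header)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def check_counts(csv_dir, expected=EXPECTED):
    """Return {file name: (expected, actual)} for every file whose count differs."""
    mismatches = {}
    for name, want in expected.items():
        got = count_rows(Path(csv_dir) / name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches
```

Calling `check_counts("pqg/examples/minimal/csv")` from the repository root would return an empty dict when every file matches the counts listed above.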

🤖 Generated with Claude Code

Hand-crafted small examples to help understand the iSamples PQG format:

- JSON: 1-sample and 3-sample examples (validated against schema)
- CSV: Flattened entity files (samples, events, locations, sites, agents, edges)
- Parquet: Same data in all 3 formats:
  - Export (3 rows, nested structs)
  - Narrow (21 rows, explicit edge rows)
  - Wide (10 rows, p__* columns)

Includes README with:
- Entity relationship diagram
- Example queries for each format
- Format comparison table

Idea from meeting with Stephen Richard - small examples make format
differences much easier to understand.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>