Commit 7693d48

readme update
1 parent 217bac3 commit 7693d48

1 file changed

Lines changed: 25 additions & 3 deletions

README.md

@@ -85,7 +85,19 @@ Module conversion produces three standardized parquet files:
 - **State**: `protective`, `risk`, or `neutral`
 - **Genotype**: List of 2 alleles, alphabetically sorted
 
-Converted datasets are uploaded to the [`just-dna-seq`](https://huggingface.co/just-dna-seq) organization on HuggingFace Hub. See the [Hugging Face Module Consumption Guide](docs/HF_MODULES_CONSUMPTION.md) for details on how to use these modules in your own pipelines.
+Converted datasets are uploaded to the [`just-dna-seq`](https://huggingface.co/just-dna-seq) organization on HuggingFace Hub.
+
+## Documentation
+
+Detailed documentation is available in the [`docs/`](docs/) folder:
+
+| Document | Description |
+|----------|-------------|
+| [Modules Schema](docs/modules_schema.md) | Unified annotation schema specification (annotations, studies, weights) |
+| [Dagster Modules Pipeline](docs/DAGSTER_MODULES_PIPELINE.md) | Overview of all module conversion pipelines |
+| [Dagster LongevityMap Pipeline](docs/DAGSTER_LONGEVITYMAP_PIPELINE.md) | Detailed LongevityMap pipeline with genotype expansion |
+| [Dagster Ensembl Pipeline](docs/DAGSTER_ENSEMBL_PIPELINE.md) | Ensembl VCF download and conversion pipeline |
+| [HuggingFace Consumption Guide](docs/HF_MODULES_CONSUMPTION.md) | How to use converted modules from HuggingFace Hub |
 
 ## Package Structure
 
@@ -98,14 +110,18 @@ src/prepare_annotations/
 ├── cli.py # Typer CLI entrypoint
 
 ├── core/ # Core utilities
+│   ├── config.py # Pydantic config classes
+│   ├── dagster_configs.py # Dagster-specific configurations
+│   ├── dagster_io_managers.py # Custom IO managers
 │   ├── io.py # VCF/Parquet I/O
 │   ├── models.py # Pydantic models
 │   ├── paths.py # Path helpers
-│   └── runtime.py # Profiling, environment
+│   └── runtime.py # Resource tracking and profiling
 
 ├── assets/ # Dagster assets
 │   ├── ensembl.py # Ensembl VCF pipeline
-│   └── modules.py # OakVar module conversion
+│   ├── modules.py # OakVar module conversion
+│   └── checks.py # Asset validation checks
 
 ├── downloaders/ # Download utilities
 │   ├── vcf.py # VCF download
@@ -116,6 +132,12 @@ src/prepare_annotations/
 │   └── dataset_cards.py # Dataset card templates
 
 └── converters/ # OakVar module converters
+    ├── longevitymap.py # LongevityMap converter
+    ├── lipidmetabolism.py # Lipid metabolism converter
+    ├── vo2max.py # VO2max converter
+    ├── superhuman.py # Superhuman converter
+    ├── coronary.py # Coronary disease converter
+    └── drugs.py # Pharmacogenomics converter
 ```
 
 ## Testing
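
For reference, the context lines of the first hunk describe two field conventions: **State** is one of `protective`, `risk`, or `neutral`, and **Genotype** is a list of 2 alleles, alphabetically sorted. A minimal Python sketch of those conventions follows. The names `VALID_STATES`, `normalize_genotype`, and `validate_state` are hypothetical and do not come from the repository, which may implement this differently.

```python
from typing import List

# Allowed values for the State field, as listed in the README.
# The constant name is an assumption made for this sketch.
VALID_STATES = {"protective", "risk", "neutral"}


def normalize_genotype(allele_a: str, allele_b: str) -> List[str]:
    """Return the two alleles as an alphabetically sorted list."""
    return sorted([allele_a, allele_b])


def validate_state(state: str) -> str:
    """Return the state unchanged, or raise if it is not an allowed value."""
    if state not in VALID_STATES:
        raise ValueError(f"unexpected state: {state!r}")
    return state
```

Sorting the pair means that a heterozygous call is stored the same way regardless of the order in which the alleles were reported, so `normalize_genotype("T", "C")` and `normalize_genotype("C", "T")` yield the same list.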
