diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index c340f25..8007bd3 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -38,7 +38,6 @@ jobs:
           ubuntu-24.04,
           macos-13,
           windows-2022,
-          windows-2019,
         ]
       python-version: ["3.11"]
     runs-on: ${{ matrix.os }}
diff --git a/.gitignore b/.gitignore
index 3eeb70f..cf9aeb9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,8 +1,14 @@
-#mac files
+# claude
+CLAUDE.md
+.claude/
+claude_output/
+claude_logs/
+
+# mac files
 **/.DS_Store
 
 # Dataset directory
-data/
+./data/
 
 # logs
 **/logs/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 0961d95..e45b42f 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,4 +1,4 @@
-exclude: ^docs/|devcontainer.json|.*/snapshots/
+exclude: ^docs/|devcontainer.json|.*/snapshots/|mkdocs.yml
 default_stages: [commit]
 
 default_language_version:
@@ -62,6 +62,7 @@ repos:
   rev: v1.15.0
   hooks:
   - id: mypy
+    additional_dependencies: [types-requests]
 
 - repo: https://github.com/markdownlint/markdownlint
   rev: v0.12.0
diff --git a/docs/brentlab_yeastresources_collection.md b/docs/brentlab_yeastresources_collection.md
new file mode 100644
index 0000000..4f49a02
--- /dev/null
+++ b/docs/brentlab_yeastresources_collection.md
@@ -0,0 +1,378 @@
# BrentLab Yeast Resources Collection

This document describes the BrentLab yeast resources collection on HuggingFace as an example implementation of the [datacard specifications](huggingface_datacard.md). This collection demonstrates best practices for organizing transcription factor binding and perturbation datasets for *Saccharomyces cerevisiae*.

## Collection Overview

The BrentLab yeast resources collection contains 11 datasets related to yeast transcription factor binding and gene expression regulation:

1. **barkai_compendium** - ChEC-seq binding data across multiple GEO series
2. **callingcards** - Calling Cards transposon-based binding data
3. **hackett_2020** - TF overexpression with nutrient limitation
4. **harbison_2004** - ChIP-chip binding across 14 environmental conditions
5. **hu_2007_reimand_2010** - TF knockout expression data
6. **hughes_2006** - TF perturbation screen (overexpression and knockout)
7. **kemmeren_2014** - TF deletion expression profiling
8. **mahendrawada_2025** - ChEC-seq and nascent RNA-seq data
9. **rossi_2021** - ChIP-exo binding data
10. **yeast_comparative_analysis** - Cross-dataset comparative analyses
11. **yeast_genome_resources** - Reference genomic features

## Standardized Media Names

The collection uses standardized media names to facilitate cross-dataset queries.
When specifying media in datacards, use these canonical names:

### Rich Media

- **YPD** (Yeast extract Peptone Dextrose)
  - Carbon source: 2% D-glucose
  - Nitrogen sources: 1% yeast extract, 2% peptone
  - Standard rich medium for yeast growth

- **yeast_extract_peptone**
  - Base medium without specified carbon source
  - Used with galactose (YPGal) or raffinose (YPRaff)

### Minimal/Defined Media

- **minimal** or **minimal_glucose**
  - Minimal defined medium with glucose as carbon source
  - Nitrogen source varies by experiment

- **synthetic_complete** or **synthetic_complete_dextrose**
  - Defined medium with complete amino acid supplementation
  - Carbon source: typically 2% D-glucose
  - Nitrogen source: yeast nitrogen base + amino acid dropout mix

- **synthetic_complete_minus_X**
  - Synthetic complete medium lacking specific nutrient(s)
  - Examples: `synthetic_complete_minus_thiamine`, `synthetic_complete_minus_phosphate`
  - Used for nutrient deprivation experiments

- **selective_medium**
  - Defined medium for plasmid selection
  - Specific composition varies by selection markers

## Standardized Strain Backgrounds

The collection primarily uses these strain backgrounds:

- **BY4741** - MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0
  - Used in: hu_2007_reimand_2010, kemmeren_2014

- **W303** - Common alternative strain background
  - Used in: harbison_2004 (derivative Z1256)

- **S288C** - Reference genome strain
  - Used in: various datasets

Strain background can be specified as a string or as a detailed object:

```yaml
# Simple string
experimental_conditions:
  strain_background: BY4741

# Detailed specification
experimental_conditions:
  strain_background:
    genotype: BY4741
    mating_type: MATa
    markers:
      - his3Δ1
      - leu2Δ0
      - met15Δ0
      - ura3Δ0
    source: Open_Biosystems
    description: Knockout strains for nonessential transcription factors
```

## Standard Experimental Conditions

### Growth Temperature

The standard growth temperature across the collection is **30°C** unless otherwise noted.

Exceptions:
- **rossi_2021**: 25°C baseline, with 37°C heat shock for some samples
- **hu_2007_reimand_2010**: Heat shock at 39°C for heat shock response TFs
- **callingcards**: Experiments are performed at room temperature (~22-25°C)

### Growth Phase

Common growth phase specifications are listed below. These labels are taken from the
original publications; in some cases, the OD600 at harvest is also noted.

- **early_log_phase**
- **mid_log_phase**
- **late_log_phase**
- **stationary_phase** - e.g., barkai_compendium, where cultures are grown overnight and
  harvested at very high density (OD600 4.0).

Example:
```yaml
experimental_conditions:
  growth_phase_at_harvest:
    stage: mid_log_phase
    od600: 0.6
    od600_tolerance: 0.1
```

### Cultivation Methods

Standard cultivation methods used:

- **liquid_culture** - Standard batch culture in flasks
- **batch** - Batch culture
- **plate** - Growth on agar plates
- **chemostat** - Continuous culture (hackett_2020)

## Concentration Specifications

**Always use `concentration_percent`** for all concentration specifications.
Convert other units to percentage:

- **mg/ml to percent**: divide by 10 (e.g., 5 mg/ml = 0.5%)
- **g/L to percent**: divide by 10 (e.g., 6.71 g/L = 0.671%)
- **Molar to percent**: convert using the molecular weight
  - Example: 100 nM rapamycin (MW 914.2 g/mol) = 9.142e-6%

### Examples from the Collection

```yaml
# Yeast nitrogen base: 6.71 g/L = 0.671%
- compound: yeast_nitrogen_base
  concentration_percent: 0.671

# Alpha factor: 5 mg/ml = 0.5%
- compound: alpha_factor_pheromone
  concentration_percent: 0.5

# Rapamycin: 100 nM = 9.142e-6%
chemical_treatment:
  compound: rapamycin
  concentration_percent: 9.142e-6
```

## Field Naming Conventions

The collection follows these field naming conventions:

### Gene/Feature Identifiers

- **regulator_locus_tag**: Systematic ID of the regulatory factor (e.g., "YJR060W")
- **regulator_symbol**: Common name of the regulatory factor (e.g., "CBF1")
- **target_locus_tag**: Systematic ID of the target gene
- **target_symbol**: Common name of the target gene

All locus tags and symbols join to the **yeast_genome_resources** dataset.

### Quantitative Measurement Examples

Common measurement field names:

- **effect**, **log2fc**, **log2_ratio** - Log fold change measurements
- **pvalue**, **pval**, **p_value** - Statistical significance
- **padj**, **adj_p_value** - FDR-adjusted p-values
- **binding_score**, **peak_score** - Binding strength metrics
- **enrichment** - Enrichment ratios

### Experimental Metadata Examples

- **sample_id** - Unique sample identifier (integer)
- **db_id** - Legacy database identifier (deprecated, do not use)
- **batch** - Experimental batch identifier
- **replicate** - Biological replicate number
- **time** - Timepoint in timecourse experiments

## Dataset Type Usage Examples

### genomic_features

**yeast_genome_resources** provides reference annotations:
- Gene coordinates and strand information
- Systematic IDs (locus_tag) and common names (symbol)
- Feature types (gene, ncRNA_gene, tRNA_gene, etc.)

Used for joining regulator/target identifiers across all other datasets.

### annotated_features

The most common dataset type in the collection. Examples:

- **hackett_2020**: TF overexpression with timecourse measurements
- **harbison_2004**: ChIP-chip binding with condition field definitions
- **kemmeren_2014**: TF deletion expression data
- **mahendrawada_2025**: ChEC-seq binding scores

Typical structure: regulator × target × measurements, with optional condition fields.
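
As a rough illustration of this structure, the sketch below queries a local parquet
export of an annotated_features config with DuckDB. The file name is hypothetical, and
the columns follow the naming conventions above (`regulator_symbol`, `target_symbol`,
`log2_ratio`, `padj`).

```python
import duckdb

con = duckdb.connect()

# Hypothetical local export of an annotated_features config:
# one row per regulator x target per sample, with quantitative measures.
hits = con.execute(
    """
    SELECT regulator_symbol, target_symbol, log2_ratio, padj
    FROM read_parquet('kemmeren_2014.parquet')  -- assumed local file
    WHERE padj < 0.05                           -- keep significant targets
    ORDER BY abs(log2_ratio) DESC
    """
).df()
print(hits.head())
```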
+ +### genome_map + +Position-level data, typically partitioned by sample or accession: + +- **barkai_compendium**: ChEC-seq pileup data partitioned by Series/Accession +- **rossi_2021**: ChIP-exo 5' tag coverage partitioned by sample +- **callingcards**: Transposon insertion density partitioned by batch + +### metadata + +Separate metadata configs or embedded metadata via `metadata_fields`: + +**Separate config example** (barkai_compendium): +```yaml +- config_name: GSE178430_metadata + dataset_type: metadata + applies_to: ["genomic_coverage"] +``` + +**Embedded metadata example** (harbison_2004): +```yaml +- config_name: harbison_2004 + dataset_type: annotated_features + metadata_fields: ["regulator_locus_tag", "regulator_symbol", "condition"] +``` + +### comparative + +**yeast_comparative_analysis** provides cross-dataset analysis results: + +- **dto config**: Direct Target Overlap analysis comparing binding and perturbation experiments +- Uses `source_sample` role for composite identifiers +- Format: `"repo_id;config_name;sample_id"` (semicolon-separated) +- Contains 8 quantitative measures: rank thresholds, set sizes, FDR, p-values +- Partitioned by binding_repo_dataset and perturbation_repo_dataset + +**Composite Sample Identifiers**: +Comparative datasets use composite identifiers to reference samples from other datasets: +- `binding_id`: Points to a binding experiment (e.g., `BrentLab/callingcards;annotated_features;1`) +- `perturbation_id`: Points to a perturbation experiment (e.g., `BrentLab/hackett_2020;hackett_2020;200`) + +**Typical structure**: source_sample_1 x source_sample_2 x ... x measurements + +**Use case**: Answer questions like "Which binding experiments show significant overlap with perturbation effects?" + +## Categorical Condition Definitions + +Many datasets define categorical experimental conditions using the `definitions` field. + +### harbison_2004 Environmental Conditions + +14 conditions with detailed specifications: +- **YPD** (rich media baseline) +- **SM** (amino acid starvation) +- **RAPA** (rapamycin treatment) +- **H2O2Hi**, **H2O2Lo** (oxidative stress) +- **HEAT** (heat shock) +- **GAL**, **RAFF** (alternative carbon sources) +- And 6 more... + +Each condition definition includes media composition, temperature, growth phase, and treatments. + +### hackett_2020 Nutrient Limitations + +```yaml +restriction: + definitions: + P: # Phosphate limitation + media: + phosphate_source: + - compound: potassium_phosphate_monobasic + concentration_percent: 0.002 + N: # Nitrogen limitation + media: + nitrogen_source: + - compound: ammonium_sulfate + concentration_percent: 0.004 + M: # Undefined limitation + description: "Not defined in the paper" +``` + +### hu_2007_reimand_2010 Treatment Conditions + +```yaml +heat_shock: + definitions: + true: + temperature_celsius: 39 + duration_minutes: 15 + false: + description: Standard growth conditions at 30°C +``` + +## Partitioning Strategies + +Large genome_map datasets use partitioning: + +**barkai_compendium** - Two-level partitioning: +```yaml +partitioning: + partition_by: ["Series", "Accession"] + path_template: "genome_map/*/*/part-0.parquet" +``` + +**callingcards** - Batch partitioning: +```yaml +partitioning: + enabled: true + partition_by: ["batch"] + path_template: "genome_map/batch={batch}/*.parquet" +``` + +## Collection-Wide Best Practices + +### 1. Omit unspecified fields with a comment + +`tfbpapi` will handle adding "unspecified" to fields which are not common across +datasets. 

```yaml
# CORRECT
experimental_conditions:
  temperature_celsius: 30
  # cultivation_method is not noted in the paper and is omitted

# INCORRECT
experimental_conditions:
  temperature_celsius: unspecified
```

### 2. Document Source Publications

If the original paper used other units, such as g/L, convert the value to
`concentration_percent` and add a comment with the original value and units.

```yaml
carbon_source:
  - compound: D-glucose
    # Saldanha et al 2004: 10 g/L
    concentration_percent: 1
```

### 3. Use Standard Field Roles

Apply semantic roles consistently:
- `regulator_identifier` - for regulator fields
- `target_identifier` - for target fields
- `quantitative_measure` - for measurements
- `experimental_condition` - for condition fields
- `genomic_coordinate` - for positional data

### 4. Provide sample_id

All annotated_features datasets should include `sample_id` to uniquely identify experimental samples. This enables cross-dataset joining and metadata management.

### 5. Specify metadata_fields or applies_to

For datasets with metadata, either:
- Use `metadata_fields` to extract metadata from the data itself, OR
- Create a separate metadata config with an `applies_to` field

### 6. Use Consistent Gene Identifiers

All regulator/target identifiers must be joinable to **yeast_genome_resources**:
- Use current systematic IDs (ORF names)
- Include both locus_tag and symbol fields
- Mark with appropriate roles
diff --git a/docs/datacard.md b/docs/datacard.md
new file mode 100644
index 0000000..cfab1f1
--- /dev/null
+++ b/docs/datacard.md
@@ -0,0 +1,6 @@
+# DataCard
+
+::: tfbpapi.datacard.DataCard
+    options:
+      show_root_heading: true
+      show_source: true
diff --git a/docs/errors.md b/docs/errors.md
new file mode 100644
index 0000000..6ba92ff
--- /dev/null
+++ b/docs/errors.md
@@ -0,0 +1,28 @@
+# Custom Exceptions
+
+## HfDataFetchError
+
+::: tfbpapi.errors.HfDataFetchError
+    options:
+      show_root_heading: true
+      show_source: true
+
+Raised when HuggingFace API requests fail during data fetching operations.
+
+## DataCardError
+
+::: tfbpapi.errors.DataCardError
+    options:
+      show_root_heading: true
+      show_source: true
+
+Base exception for DataCard operations.
+
+## DataCardValidationError
+
+::: tfbpapi.errors.DataCardValidationError
+    options:
+      show_root_heading: true
+      show_source: true
+
+Raised when dataset card validation fails during parsing or loading.
\ No newline at end of file
diff --git a/docs/fetchers.md b/docs/fetchers.md
new file mode 100644
index 0000000..2901a79
--- /dev/null
+++ b/docs/fetchers.md
@@ -0,0 +1,16 @@
+# Data Fetchers
+
+::: tfbpapi.fetchers.HfDataCardFetcher
+    options:
+      show_root_heading: true
+      show_source: true
+
+::: tfbpapi.fetchers.HfRepoStructureFetcher
+    options:
+      show_root_heading: true
+      show_source: true
+
+::: tfbpapi.fetchers.HfSizeInfoFetcher
+    options:
+      show_root_heading: true
+      show_source: true
diff --git a/docs/hf_cache_manager.md b/docs/hf_cache_manager.md
new file mode 100644
index 0000000..752b712
--- /dev/null
+++ b/docs/hf_cache_manager.md
@@ -0,0 +1,6 @@
+# HfCacheManager
+
+::: tfbpapi.hf_cache_manager.HfCacheManager
+    options:
+      show_root_heading: true
+      show_source: true
diff --git a/docs/huggingface_datacard.md b/docs/huggingface_datacard.md
new file mode 100644
index 0000000..d56c771
--- /dev/null
+++ b/docs/huggingface_datacard.md
@@ -0,0 +1,496 @@
# HuggingFace Dataset Card Format

This document describes the expected YAML metadata format for HuggingFace dataset
repositories used with the tfbpapi package. The metadata is defined in a YAML block at
the top of the repository's README.md file and provides structured information about
the dataset configuration and contents.

This documentation is intended for developers preparing or augmenting a HuggingFace
dataset repository to be compatible with tfbpapi. Before reading, please review the
[BrentLab/hackett_2020](https://huggingface.co/datasets/BrentLab/hackett_2020/blob/main/README.md)
datacard as an example of a complete implementation of a simple repository. After
reviewing Hackett 2020 and this documentation, it might be helpful to review a more
complex example such as:

- [BrentLab/barkai_compendium](https://huggingface.co/datasets/BrentLab/barkai_compendium):
  This contains a `genome_map` partitioned dataset with separate metadata applied via
  the `applies_to` field.
- [BrentLab/rossi_2021](https://huggingface.co/datasets/BrentLab/rossi_2021):
  This contains multiple `annotated_features` datasets with embedded metadata.
- [BrentLab/yeast_genomic_features](https://huggingface.co/datasets/BrentLab/yeast_genomic_features):
  This contains a simple `genomic_features` dataset used as a reference for other
  datasets in the collection.

## Dataset Types

The `dataset_type` field is a property of each config (hierarchically under
`config_name`). `tfbpapi` recognizes the following dataset types:

### 1. `genomic_features`
Static information about genomic features (genes, promoters, etc.)
- **Use case**: Gene annotations, regulatory classifications, static feature data
- **Structure**: One row per genomic feature
- **Required fields**: Usually includes gene identifiers, coordinates, classifications

### 2. `annotated_features`
Quantitative data associated with genomic features. A `sample_id` field should exist
to identify individual experiments performed under a single set of conditions (a shape
check is sketched below this list).
- **Use case**: Expression data, binding scores, differential expression results
- **Structure**: Each sample will have one row per genomic feature measured. The
  role `quantitative_measure` should be used to identify measurement columns.
- **Common fields**: `regulator_*`, `target_*` fields with the roles
  `regulator_identifier` and `target_identifier` respectively. Fields with the role
  `quantitative_measure` for measurements.
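
As a quick sanity check on this expected shape, the sketch below verifies that a local
annotated_features export carries a `sample_id` field and one row per feature within
each sample. The file path is hypothetical, and `target_locus_tag` is assumed to be the
feature identifier.

```python
import pandas as pd

# Hypothetical local export of an annotated_features config.
df = pd.read_parquet("annotated_features.parquet")

# sample_id is expected so that individual experiments can be identified.
assert "sample_id" in df.columns, "annotated_features requires a sample_id field"

# Within each sample there should be one row per measured feature.
dups = df.duplicated(subset=["sample_id", "target_locus_tag"]).sum()
print(f"{df['sample_id'].nunique()} samples, {dups} duplicated sample/target rows")
```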

### 3. `genome_map`
Position-level data across genomic coordinates
- **Use case**: Signal tracks, coverage data, genome-wide binding profiles
- **Structure**: Position-value pairs, often large datasets
- **Required fields**: `chr` (chromosome), `pos` (position), signal values

### 4. `metadata`
Experimental metadata and sample descriptions
- **Use case**: Sample information, experimental conditions, protocol details. Note
  that this can also include per-sample QC metrics. For cross-sample QC or analysis,
  see [comparative](#5-comparative) below.
- **Structure**: One row per sample
- **Common fields**: Sample identifiers, experimental conditions, publication info
- **Special field**: `applies_to` - Optional list of config names this metadata applies to

### 5. `comparative`

Quality control metrics, validation results, and cross-dataset analysis outputs.

**Use cases**:
- Cross-dataset quality assessments and validation metrics
- Analysis results relating samples across datasets or repositories
- Comparative analyses (e.g., binding vs expression correlation)

**Structure**: One row represents an observation on two or more samples. Note that the
name of the column containing the sample references is not specified. However, the
role and format of the sample references are strictly defined. See
[Defining Sample References](#defining-sample-references) below.

#### Defining Sample References

The name of the field that contains the sample reference is user-defined. However,
the contents of that field, and its role, must be as follows:

- **`source_sample`**: Fields containing composite sample identifiers. These must be in
  the format `"repo_id;config_name;sample_id"` (semicolon-separated).

Examples:
- `"BrentLab/harbison_2004;harbison_2004;CBF1_YPD"`
- `"BrentLab/kemmeren_2014;kemmeren_2014;sample_42"`

## Experimental Conditions

Experimental conditions can be specified in three ways (a sketch of how the levels
combine follows this list):

1. **Top-level** `experimental_conditions`: Apply to all configs in the repository.
   Use when experimental parameters are common across all datasets. This occurs
   at the same level as `configs`.
2. **Config-level** `experimental_conditions`: Apply to a specific config
   ([dataset](#dataset)). Use when certain datasets have experimental parameters that
   are not shared by all other datasets in the [repository](#huggingface-repo), but
   are common to all [samples](#sample) within that dataset.
3. **Field-level** with `role: experimental_condition` ([feature-roles](#feature-roles)): For
   per-sample or per-measurement variation in experimental conditions stored as
   data columns. This is specified in the
   `dataset_info.features` ([feature-definitions](#feature-definitions))
   section of a config. Categorical `experimental_condition` fields are defined in
   [categorical fields with value definitions](#categorical-fields-with-value-definitions).
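
Conceptually, the three levels behave like dictionaries merged from broadest to most
specific. The sketch below is illustrative only; the condition values are hypothetical,
not taken from a real datacard.

```python
# Broadest to most specific; later merges win on key collisions.
top_level = {"temperature_celsius": 30}         # repository-wide conditions
config_level = {"strain_background": "BY4741"}  # one config's conditions
field_level = {"temperature_celsius": 37}       # e.g., from a heat_shock definition

effective = {**top_level, **config_level, **field_level}
print(effective)  # {'temperature_celsius': 37, 'strain_background': 'BY4741'}
```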

The priority of experimental conditions is:

field-level > config-level > top-level

**Example of all three methods:**
```yaml
# Top-level experimental conditions (apply to all [datasets](#dataset) in the repo)
experimental_conditions:
  temperature_celsius: 30
configs:
- config_name: overexpression_data
  description: TF overexpression perturbation data
  dataset_type: annotated_features
  # The overexpression_data [dataset](#dataset) has an additional experimental
  # condition that is specific to this dataset
  experimental_conditions:
    strain_background: "BY4741"
  data_files:
  - split: train
    path: overexpression.parquet
  dataset_info:
    features:
    - name: time
      dtype: float
      description: Time point in minutes
      role: experimental_condition
    - name: mechanism
      dtype: string
      description: Induction mechanism (GEV or ZEV)
      role: experimental_condition
      definitions:
        GEV:
          perturbation_method:
            type: inducible_overexpression
            system: GEV
            inducer: beta-estradiol
            description: "Galactose-inducible estrogen receptor-VP16 fusion system"
        ZEV:
          perturbation_method:
            type: inducible_overexpression
            system: ZEV
            inducer: beta-estradiol
            description: >-
              Z3 (synthetic zinc finger)-estrogen receptor-VP16 fusion system
    - name: log2_ratio
      dtype: float
      description: Log2 fold change
      role: quantitative_measure
```

## Feature Definitions

Each config must include detailed feature definitions in `dataset_info.features`:
```yaml
dataset_info:
  features:
  - name: field_name # Column name in the data
    dtype: string # Data type (string, int64, float64, etc.)
    description: "Detailed description of what this field contains"
    role: "target_identifier" # Optional: semantic role of the feature
```

### Categorical Fields with Value Definitions

For fields with `role: experimental_condition` that contain categorical values, you can
provide structured definitions for each value using the `definitions` field. This allows
machine-parsable specification of what each condition value means experimentally:
```yaml
- name: condition
  dtype:
    class_label:
      names: ["standard", "heat_shock"]
  role: experimental_condition
  description: Growth condition of the sample
  definitions:
    standard:
      media:
        name: synthetic_complete
        carbon_source:
          - compound: D-glucose
            concentration_percent: 2
        nitrogen_source:
          - compound: yeast_nitrogen_base
            # lastname et al 2025 used 6.71 g/L
            concentration_percent: 0.671
            specifications:
              - without_amino_acids
              - without_ammonium_sulfate
          - compound: ammonium_sulfate
            # lastname et al 2025 used 5 g/L
            concentration_percent: 0.5
          - compound: amino_acid_dropout_mix
            # lastname et al 2025 used 2 g/L
            concentration_percent: 0.2
    heat_shock:
      temperature_celsius: 37
      duration_minutes: 10
```

Each key in `definitions` must correspond to a possible value in the field.
The structure under each value provides experimental parameters specific to that
condition, using the same nested format as `experimental_conditions` at the config or
top level.

### Naming Conventions

**Gene/Feature Identifiers:**
- `(regulator/target)_locus_tag`: Systematic gene identifiers (e.g., "YJR060W"). Must
  be able to join to a genomic_features dataset. If none is specified,
  then BrentLab/yeast_genomic_features is used.
- `(regulator/target)_symbol`: Standard gene symbols (e.g., "CBF1"). Must be able to
  join to a genomic_features dataset. If none is specified, then
  BrentLab/yeast_genomic_features is used (a joinability check is sketched below).
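
One way to check this joinability requirement, sketched with DuckDB and hypothetical
local files standing in for a data config and the genomic_features reference:

```python
import duckdb

con = duckdb.connect()

# Left-join regulator identifiers against the reference annotations;
# rows with no match flag locus tags that fail to join.
unmatched = con.execute(
    """
    SELECT d.regulator_locus_tag
    FROM read_parquet('binding_data.parquet') AS d           -- assumed data export
    LEFT JOIN read_parquet('genomic_features.parquet') AS g  -- assumed reference
        ON d.regulator_locus_tag = g.locus_tag
    WHERE g.locus_tag IS NULL
    """
).df()
print(f"{len(unmatched)} regulator locus tags fail to join")
```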

**Genomic Coordinates:**
Unless otherwise noted, assume that coordinates are 0-based, half-open intervals.

- `chr`: Chromosome identifier
- `start`, `end`: Genomic coordinates
- `pos`: Single position
- `strand`: Strand information (+ or -)

## Feature Roles

The optional `role` field provides semantic meaning to features, and is especially
useful for annotated_features datasets. tfbpapi recognizes the following roles:
`regulator_identifier`, `target_identifier`, `quantitative_measure`,
`experimental_condition`, `genomic_coordinate`, and `source_sample`.
**NOTE**: `experimental_condition` is a reserved role with additional behavior
as described above.

## Partitioned Datasets

For large datasets (e.g., most genome_map datasets), use partitioning:

```yaml
dataset_info:
  partitioning:
    enabled: true
    partition_by: ["accession"] # Partition column(s)
    path_template: "data/accession={accession}/*.parquet"
```

This allows efficient querying of subsets without loading the entire dataset.

## Metadata

### Metadata Relationships with `applies_to`

For metadata configs, you can explicitly specify which other configs the metadata
applies to using the `applies_to` field. This provides more control than automatic
type-based matching.

```yaml
configs:
# Data configs
- config_name: genome_map_data
  dataset_type: genome_map
  # ... rest of config

- config_name: binding_scores
  dataset_type: annotated_features
  # ... rest of config

- config_name: expression_data
  dataset_type: annotated_features
  # ... rest of config

# Metadata config that applies to multiple data configs
- config_name: repo_metadata
  dataset_type: metadata
  applies_to: ["genome_map_data", "binding_scores", "expression_data"]
  # ... rest of config
```

### Embedded Metadata with `metadata_fields`

When no explicit metadata config exists, you can extract metadata directly from the
dataset's own files using the `metadata_fields` field. This specifies which fields
should be treated as metadata (a rough sketch of the extraction follows).
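
The sketch below approximates the kind of extraction described in the next two
subsections in plain DuckDB terms; the file and field names are taken from the
single-file example below, and nothing here is the package's actual implementation.

```python
import duckdb

con = duckdb.connect()

# Expose the data file as a view, then collect the distinct values of the
# declared metadata_fields into a small, queryable result.
con.execute(
    "CREATE VIEW binding_data AS "
    "SELECT * FROM read_parquet('binding_measurements.parquet')"
)
meta = con.execute(
    "SELECT DISTINCT regulator_symbol, experimental_condition FROM binding_data"
).df()
print(meta)
```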
+ +### Single File Embedded Metadata + +For single parquet files, the system extracts distinct values using `SELECT DISTINCT`: + +```yaml +- config_name: binding_data + dataset_type: annotated_features + metadata_fields: ["regulator_symbol", "experimental_condition"] + data_files: + - split: train + path: binding_measurements.parquet + dataset_info: + features: + - name: regulator_symbol + dtype: string + description: Transcription factor name + - name: experimental_condition + dtype: string + description: Experimental treatment + - name: binding_score + dtype: float64 + description: Quantitative measurement +``` + +### Partitioned Dataset Embedded Metadata + +For partitioned datasets, partition values are extracted from directory structure: + +```yaml +- config_name: genome_map_data + dataset_type: genome_map + metadata_fields: ["run_accession", "regulator_symbol"] + data_files: + - split: train + path: genome_map/accession=*/regulator=*/*.parquet + dataset_info: + features: + - name: chr + dtype: string + description: Chromosome + - name: pos + dtype: int32 + description: Position + - name: signal + dtype: float32 + description: Signal intensity + partitioning: + enabled: true + partition_by: ["run_accession", "regulator_symbol"] +``` + +## Data File Organization + +### Single Files +```yaml +data_files: +- split: train + path: single_file.parquet +``` + +### Multiple Files/Partitioned Data +```yaml +data_files: +- split: train + path: data_directory/*/*.parquet # Glob patterns supported +``` + +## Complete Example Structure + +```yaml +license: mit +language: [en] +tags: [biology, genomics, transcription-factors] +pretty_name: "Example Genomics Dataset" +size_categories: [100K 5:\n", + " print(f\" ... and {len(repo_sizes) - 5} more repositories\")\n", + "\n", + "# Calculate total revisions\n", + "total_revisions = sum(len(repo.revisions) for repo in cache_info.repos)\n", + "print(f\"\\nTotal revisions across all repos: {total_revisions}\")\n", + "\n", + "# Show age distribution\n", + "from datetime import datetime\n", + "now = datetime.now().timestamp()\n", + "old_revisions = 0\n", + "for repo in cache_info.repos:\n", + " for rev in repo.revisions:\n", + " age_days = (now - rev.last_modified) / (24 * 3600)\n", + " if age_days > 30:\n", + " old_revisions += 1\n", + "\n", + "print(f\"Revisions older than 30 days: {old_revisions}\")\n", + "print(f\"Recent revisions (≤30 days): {total_revisions - old_revisions}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Querying Loaded Metadata\n", + "\n", + "Once metadata is loaded into DuckDB, we can query it using SQL." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Internal Cache Management Methods\n", + "\n", + "HfCacheManager provides several internal methods that work behind the scenes. Let's explore what these methods do and how they integrate with the caching strategy." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Working with Specific Metadata Configurations\n", + "\n", + "You can also retrieve metadata for specific configurations rather than all at once." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "HfCacheManager Internal Methods:\n", + "===================================\n", + "\n", + "1. 
_get_metadata_for_config(config)\n", + " → Implements the 3-case strategy for a specific configuration\n", + " → Returns detailed result with strategy used and success status\n", + "\n", + "2. _check_metadata_exists_in_duckdb(table_name)\n", + " → Case 1: Checks if metadata table already exists in DuckDB\n", + " → Fast check using information_schema.tables\n", + "\n", + "3. _load_metadata_from_cache(config, table_name)\n", + " → Case 2: Attempts to load from local HuggingFace cache\n", + " → Uses try_to_load_from_cache() to find cached files\n", + "\n", + "4. _download_and_load_metadata(config, table_name)\n", + " → Case 3: Downloads from HuggingFace Hub if not cached\n", + " → Uses snapshot_download() for efficient file retrieval\n", + "\n", + "5. _create_duckdb_table_from_files(file_paths, table_name)\n", + " → Creates DuckDB views from parquet files\n", + " → Handles both single files and multiple files efficiently\n", + "\n", + "6. _extract_embedded_metadata_field(data_table, field, metadata_table)\n", + " → Extracts metadata fields from data tables\n", + " → Creates separate queryable metadata views\n", + "\n", + "These methods work together to provide:\n", + "• Transparent caching that 'just works'\n", + "• Minimal network usage through intelligent fallbacks\n", + "• Fast metadata access via DuckDB views\n", + "• Automatic handling of different file structures\n" + ] + } + ], + "source": [ + "# Demonstrate understanding of internal cache methods\n", + "print(\"HfCacheManager Internal Methods:\")\n", + "print(\"=\" * 35)\n", + "\n", + "print(\"\\n1. _get_metadata_for_config(config)\")\n", + "print(\" → Implements the 3-case strategy for a specific configuration\")\n", + "print(\" → Returns detailed result with strategy used and success status\")\n", + "\n", + "print(\"\\n2. _check_metadata_exists_in_duckdb(table_name)\")\n", + "print(\" → Case 1: Checks if metadata table already exists in DuckDB\")\n", + "print(\" → Fast check using information_schema.tables\")\n", + "\n", + "print(\"\\n3. _load_metadata_from_cache(config, table_name)\")\n", + "print(\" → Case 2: Attempts to load from local HuggingFace cache\")\n", + "print(\" → Uses try_to_load_from_cache() to find cached files\")\n", + "\n", + "print(\"\\n4. _download_and_load_metadata(config, table_name)\")\n", + "print(\" → Case 3: Downloads from HuggingFace Hub if not cached\")\n", + "print(\" → Uses snapshot_download() for efficient file retrieval\")\n", + "\n", + "print(\"\\n5. _create_duckdb_table_from_files(file_paths, table_name)\")\n", + "print(\" → Creates DuckDB views from parquet files\")\n", + "print(\" → Handles both single files and multiple files efficiently\")\n", + "\n", + "print(\"\\n6. _extract_embedded_metadata_field(data_table, field, metadata_table)\")\n", + "print(\" → Extracts metadata fields from data tables\")\n", + "print(\" → Creates separate queryable metadata views\")\n", + "\n", + "print(\"\\nThese methods work together to provide:\")\n", + "print(\"• Transparent caching that 'just works'\")\n", + "print(\"• Minimal network usage through intelligent fallbacks\")\n", + "print(\"• Fast metadata access via DuckDB views\")\n", + "print(\"• Automatic handling of different file structures\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Extracting Embedded Metadata\n", + "\n", + "Some datasets have metadata embedded within their data files. The HfCacheManager can extract this embedded metadata into separate, queryable tables." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Embedded Metadata Extraction\n", + "\n", + "One unique feature of HfCacheManager is the ability to extract embedded metadata fields from data tables into separate, queryable metadata tables." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Demonstrate embedded metadata extraction concept\n", + "print(\"Embedded Metadata Extraction:\")\n", + "print(\"=\" * 35)\n", + "\n", + "print(\"\\nScenario: You have a data table with embedded metadata fields\")\n", + "print(\"Example: genomics data with 'experimental_condition' field\")\n", + "\n", + "# Create sample data to demonstrate the concept\n", + "conn.execute(\"\"\"\n", + " CREATE TABLE sample_genomics_data AS \n", + " SELECT \n", + " 'gene_' || (row_number() OVER()) as gene_id,\n", + " random() * 1000 as expression_value,\n", + " CASE \n", + " WHEN (row_number() OVER()) % 4 = 0 THEN 'control'\n", + " WHEN (row_number() OVER()) % 4 = 1 THEN 'treatment_A'\n", + " WHEN (row_number() OVER()) % 4 = 2 THEN 'treatment_B'\n", + " ELSE 'stress_condition'\n", + " END as experimental_condition,\n", + " CASE \n", + " WHEN (row_number() OVER()) % 3 = 0 THEN 'timepoint_0h'\n", + " WHEN (row_number() OVER()) % 3 = 1 THEN 'timepoint_6h'\n", + " ELSE 'timepoint_24h'\n", + " END as timepoint\n", + " FROM range(100)\n", + "\"\"\")\n", + "\n", + "print(\"✓ Created sample genomics data with embedded metadata fields\")\n", + "\n", + "# Show the data structure\n", + "sample_data = conn.execute(\n", + " \"SELECT * FROM sample_genomics_data LIMIT 5\"\n", + ").fetchall()\n", + "\n", + "print(f\"\\nSample data structure:\")\n", + "print(\"gene_id | expression_value | experimental_condition | timepoint\")\n", + "print(\"-\" * 65)\n", + "for row in sample_data:\n", + " print(f\"{row[0]:8} | {row[1]:15.1f} | {row[2]:20} | {row[3]}\")\n", + "\n", + "print(f\"\\nEmbedded metadata fields identified:\")\n", + "print(\"• experimental_condition: Contains treatment/control information\")\n", + "print(\"• timepoint: Contains temporal sampling information\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Use HfCacheManager to extract embedded metadata\n", + "print(\"Using HfCacheManager for Metadata Extraction:\")\n", + "print(\"=\" * 50)\n", + "\n", + "# Extract experimental_condition metadata\n", + "success1 = cache_manager._extract_embedded_metadata_field(\n", + " 'sample_genomics_data', \n", + " 'experimental_condition', \n", + " 'metadata_experimental_conditions'\n", + ")\n", + "\n", + "# Extract timepoint metadata \n", + "success2 = cache_manager._extract_embedded_metadata_field(\n", + " 'sample_genomics_data',\n", + " 'timepoint', \n", + " 'metadata_timepoints'\n", + ")\n", + "\n", + "print(f\"Experimental condition extraction: {'✓ Success' if success1 else '✗ Failed'}\")\n", + "print(f\"Timepoint extraction: {'✓ Success' if success2 else '✗ Failed'}\")\n", + "\n", + "# Show extracted metadata tables\n", + "if success1:\n", + " print(f\"\\nExtracted experimental conditions:\")\n", + " conditions = conn.execute(\n", + " \"SELECT value, count FROM metadata_experimental_conditions ORDER BY count DESC\"\n", + " ).fetchall()\n", + " \n", + " for condition, count in conditions:\n", + " print(f\" • {condition}: {count} samples\")\n", + "\n", + "if success2:\n", + " print(f\"\\nExtracted timepoints:\")\n", + " timepoints = conn.execute(\n", + " \"SELECT value, count FROM metadata_timepoints ORDER BY count DESC\"\n", + " ).fetchall()\n", + " 
\n", + " for timepoint, count in timepoints:\n", + " print(f\" • {timepoint}: {count} samples\")\n", + "\n", + "print(f\"\\nBenefits of extraction:\")\n", + "print(\"• Separate queryable metadata tables\")\n", + "print(\"• Fast metadata-based filtering and analysis\") \n", + "print(\"• Clear separation of data and metadata concerns\")\n", + "print(\"• Reusable metadata across different analyses\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current HuggingFace Cache Status:\n", + "===================================\n", + "Total size: 5.5G\n", + "Number of repositories: 11\n", + "\n", + "Repository breakdown:\n", + " • BrentLab/yeast_comparative_analysis: 166.1K (1 revisions)\n", + " • BrentLab/yeast_genome_resources: 114.5K (7 revisions)\n", + " • BrentLab/barkai_compendium: 3.6G (1 revisions)\n", + " • BrentLab/kemmeren_2014: 646.2M (3 revisions)\n", + " • BrentLab/hu_2007_reimand_2010: 42.7M (1 revisions)\n", + " ... and 6 more repositories\n", + "\n", + "Target repository (BrentLab/mahendrawada_2025) cache info:\n", + " Size: 94.3M\n", + " Revisions: 4\n", + " Latest revision: af5ac9dc\n", + " Last modified: 1763578870.280984\n" + ] + } + ], + "source": [ + "from huggingface_hub import scan_cache_dir\n", + "\n", + "# Get current cache information \n", + "cache_info = scan_cache_dir()\n", + "\n", + "print(\"Current HuggingFace Cache Status:\")\n", + "print(\"=\" * 35)\n", + "print(f\"Total size: {cache_info.size_on_disk_str}\")\n", + "print(f\"Number of repositories: {len(cache_info.repos)}\")\n", + "\n", + "print(\"\\nRepository breakdown:\")\n", + "for repo in list(cache_info.repos)[:5]: # Show first 5 repos\n", + " print(f\" • {repo.repo_id}: {repo.size_on_disk_str} ({len(repo.revisions)} revisions)\")\n", + "\n", + "if len(cache_info.repos) > 5:\n", + " print(f\" ... and {len(cache_info.repos) - 5} more repositories\")\n", + "\n", + "# Show target repository if it exists in cache\n", + "target_repo = None\n", + "for repo in cache_info.repos:\n", + " if repo.repo_id == cache_manager.repo_id:\n", + " target_repo = repo\n", + " break\n", + "\n", + "if target_repo:\n", + " print(f\"\\nTarget repository ({cache_manager.repo_id}) cache info:\")\n", + " print(f\" Size: {target_repo.size_on_disk_str}\")\n", + " print(f\" Revisions: {len(target_repo.revisions)}\")\n", + " if target_repo.revisions:\n", + " latest_rev = max(target_repo.revisions, key=lambda r: r.last_modified)\n", + " print(f\" Latest revision: {latest_rev.commit_hash[:8]}\")\n", + " print(f\" Last modified: {latest_rev.last_modified}\")\n", + "else:\n", + " print(f\"\\nTarget repository ({cache_manager.repo_id}) not found in cache.\")\n", + " print(\"It may need to be downloaded first.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Cache Cleanup by Age" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cleaning cache by age (30+ days old):\n", + "========================================\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Found 41 old revisions. Will free 4.7G\n", + "INFO:__main__:Dry run completed. 
Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Cleanup strategy created:\n", + "Expected space freed: 4.7G\n", + "Items to delete: 46\n", + "\n", + "Breakdown of items to delete:\n", + " • Blob files: 27\n", + " • Reference files: 0\n", + " • Repository directories: 7\n", + " • Snapshot directories: 12\n", + "\n", + "Sample blob files to delete:\n", + " • /home/chase/.cache/huggingface/hub/datasets--BrentLab--harbison_2004/blobs/b5fbd9e98fd8ddadeeb5631e3b6f5055e917c98d\n", + " • /home/chase/.cache/huggingface/hub/datasets--BrentLab--hackett_2020/blobs/a85bd6b418d9644d9adaa1269c27f97469a4aaee51af63cf1aa041f62cd8ba2c\n", + " • /home/chase/.cache/huggingface/hub/datasets--BrentLab--hackett_2020/blobs/c3e72ccb1b8deba4bbfd18abe6081de7ec3914d9\n", + " ... and 24 more blob files\n" + ] + } + ], + "source": [ + "# Clean cache entries older than 30 days (dry run)\n", + "print(\"Cleaning cache by age (30+ days old):\")\n", + "print(\"=\" * 40)\n", + "\n", + "age_cleanup = cache_manager.clean_cache_by_age(\n", + " max_age_days=30,\n", + " dry_run=True # Set to False to actually execute\n", + ")\n", + "\n", + "print(f\"\\nCleanup strategy created:\")\n", + "print(f\"Expected space freed: {age_cleanup.expected_freed_size_str}\")\n", + "\n", + "# Count total items to delete across all categories\n", + "total_items = len(age_cleanup.blobs) + len(age_cleanup.refs) + len(age_cleanup.repos) + len(age_cleanup.snapshots)\n", + "print(f\"Items to delete: {total_items}\")\n", + "\n", + "# Show breakdown of what would be deleted\n", + "if total_items > 0:\n", + " print(f\"\\nBreakdown of items to delete:\")\n", + " print(f\" • Blob files: {len(age_cleanup.blobs)}\")\n", + " print(f\" • Reference files: {len(age_cleanup.refs)}\")\n", + " print(f\" • Repository directories: {len(age_cleanup.repos)}\")\n", + " print(f\" • Snapshot directories: {len(age_cleanup.snapshots)}\")\n", + " \n", + " # Show some example items\n", + " if age_cleanup.blobs:\n", + " print(f\"\\nSample blob files to delete:\")\n", + " for item in list(age_cleanup.blobs)[:3]:\n", + " print(f\" • {item}\")\n", + " if len(age_cleanup.blobs) > 3:\n", + " print(f\" ... and {len(age_cleanup.blobs) - 3} more blob files\")\n", + "else:\n", + " print(\"No old files found for cleanup.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Cache Cleanup by Size" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cleaning cache to target size: 1GB\n", + "========================================\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Selected 17 revisions for deletion. Will free 3.8G\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Size-based cleanup strategy:\n", + "Expected space freed: 3.8G\n", + "Items to delete: 85\n", + "\n", + "Comparing cleanup strategies for 1GB:\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Selected 17 revisions for deletion. Will free 3.8G\n", + "INFO:__main__:Dry run completed. 
Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " • oldest_first : 3.8G (85 items)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Selected 4 revisions for deletion. Will free 4.0G\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " • largest_first : 4.0G (8 items)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Selected 17 revisions for deletion. Will free 3.8G\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " • least_used : 3.8G (85 items)\n" + ] + } + ], + "source": [ + "# Clean cache to target size (dry run)\n", + "target_size = \"1GB\"\n", + "print(f\"Cleaning cache to target size: {target_size}\")\n", + "print(\"=\" * 40)\n", + "\n", + "size_cleanup = cache_manager.clean_cache_by_size(\n", + " target_size=target_size,\n", + " strategy=\"oldest_first\", # Can be: oldest_first, largest_first, least_used\n", + " dry_run=True\n", + ")\n", + "\n", + "print(f\"\\nSize-based cleanup strategy:\")\n", + "print(f\"Expected space freed: {size_cleanup.expected_freed_size_str}\")\n", + "\n", + "# Count total items to delete across all categories\n", + "total_items = len(size_cleanup.blobs) + len(size_cleanup.refs) + len(size_cleanup.repos) + len(size_cleanup.snapshots)\n", + "print(f\"Items to delete: {total_items}\")\n", + "\n", + "# Compare different strategies\n", + "strategies = [\"oldest_first\", \"largest_first\", \"least_used\"]\n", + "print(f\"\\nComparing cleanup strategies for {target_size}:\")\n", + "\n", + "for strategy in strategies:\n", + " try:\n", + " strategy_result = cache_manager.clean_cache_by_size(\n", + " target_size=target_size,\n", + " strategy=strategy,\n", + " dry_run=True\n", + " )\n", + " strategy_total = (len(strategy_result.blobs) + len(strategy_result.refs) + \n", + " len(strategy_result.repos) + len(strategy_result.snapshots))\n", + " print(f\" • {strategy:15}: {strategy_result.expected_freed_size_str:>8} \"\n", + " f\"({strategy_total} items)\")\n", + " except Exception as e:\n", + " print(f\" • {strategy:15}: Error - {e}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Cleaning Unused Revisions" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cleaning unused revisions (keep latest 2 per repo):\n", + "==================================================\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Found 31 unused revisions. Will free 642.9M\n", + "INFO:__main__:Dry run completed. 
Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Revision cleanup strategy:\n", + "Expected space freed: 642.9M\n", + "Items to delete: 118\n", + "\n", + "Breakdown of cleanup:\n", + " • Blob files: 87\n", + " • Reference files: 0\n", + " • Repository directories: 0\n", + " • Snapshot directories: 31\n", + "\n", + "Per-repository revision analysis:\n", + "\n", + " • BrentLab/yeast_comparative_analysis:\n", + " Total revisions: 1\n", + " Would keep: 1\n", + " Would delete: 0\n", + " Keep: ac03d065 (modified: 1767824941.5531375)\n", + "\n", + " • BrentLab/yeast_genome_resources:\n", + " Total revisions: 7\n", + " Would keep: 2\n", + " Would delete: 5\n", + " Keep: 42beb284 (modified: 1758155946.5549896)\n", + " Keep: 15fdb72f (modified: 1755819093.2306638)\n", + " Delete: 7441b9a8 (modified: 1755816785.6988702)\n", + "\n", + " • BrentLab/barkai_compendium:\n", + " Total revisions: 1\n", + " Would keep: 1\n", + " Would delete: 0\n", + " Keep: a987ef37 (modified: 1756926783.3167186)\n" + ] + } + ], + "source": [ + "# Clean unused revisions, keeping only the latest 2 per repository\n", + "print(\"Cleaning unused revisions (keep latest 2 per repo):\")\n", + "print(\"=\" * 50)\n", + "\n", + "revision_cleanup = cache_manager.clean_unused_revisions(\n", + " keep_latest=2,\n", + " dry_run=True\n", + ")\n", + "\n", + "print(f\"\\nRevision cleanup strategy:\")\n", + "print(f\"Expected space freed: {revision_cleanup.expected_freed_size_str}\")\n", + "\n", + "# Count total items to delete across all categories\n", + "total_items = len(revision_cleanup.blobs) + len(revision_cleanup.refs) + len(revision_cleanup.repos) + len(revision_cleanup.snapshots)\n", + "print(f\"Items to delete: {total_items}\")\n", + "\n", + "# Show breakdown\n", + "if total_items > 0:\n", + " print(f\"\\nBreakdown of cleanup:\")\n", + " print(f\" • Blob files: {len(revision_cleanup.blobs)}\")\n", + " print(f\" • Reference files: {len(revision_cleanup.refs)}\") \n", + " print(f\" • Repository directories: {len(revision_cleanup.repos)}\")\n", + " print(f\" • Snapshot directories: {len(revision_cleanup.snapshots)}\")\n", + "\n", + "# Show repository-specific breakdown\n", + "cache_info = scan_cache_dir()\n", + "if cache_info.repos:\n", + " print(\"\\nPer-repository revision analysis:\")\n", + " for repo in list(cache_info.repos)[:3]:\n", + " print(f\"\\n • {repo.repo_id}:\")\n", + " print(f\" Total revisions: {len(repo.revisions)}\")\n", + " print(f\" Would keep: {min(2, len(repo.revisions))}\")\n", + " print(f\" Would delete: {max(0, len(repo.revisions) - 2)}\")\n", + " \n", + " # Show revision details\n", + " sorted_revisions = sorted(repo.revisions, key=lambda r: r.last_modified, reverse=True)\n", + " for i, rev in enumerate(sorted_revisions[:2]):\n", + " print(f\" Keep: {rev.commit_hash[:8]} (modified: {rev.last_modified})\")\n", + " \n", + " for rev in sorted_revisions[2:3]: # Show one that would be deleted\n", + " print(f\" Delete: {rev.commit_hash[:8]} (modified: {rev.last_modified})\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Automated Cache Management" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Starting automated cache cleanup...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Automated cache cleanup (comprehensive):\n", + 
"========================================\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Found 41 old revisions. Will free 4.7G\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n", + "INFO:__main__:Found 31 unused revisions. Will free 642.9M\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n", + "INFO:__main__:Selected 9 revisions for deletion. Will free 2.8M\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n", + "INFO:__main__:Automated cleanup complete. Total freed: 5.0GB\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Automated cleanup executed 3 strategies:\n", + " 1. Strategy freed: 4.7G\n", + " 2. Strategy freed: 642.9M\n", + " 3. Strategy freed: 2.8M\n", + "\n", + "Total space that would be freed: 5.0GB\n", + "Cache size after cleanup: 129.8MB\n" + ] + } + ], + "source": [ + "# Automated cache cleanup with multiple strategies\n", + "print(\"Automated cache cleanup (comprehensive):\")\n", + "print(\"=\" * 40)\n", + "\n", + "auto_cleanup = cache_manager.auto_clean_cache(\n", + " max_age_days=30, # Remove anything older than 30 days\n", + " max_total_size=\"5GB\", # Target maximum cache size\n", + " keep_latest_per_repo=2, # Keep 2 latest revisions per repo\n", + " dry_run=True # Dry run for safety\n", + ")\n", + "\n", + "print(f\"\\nAutomated cleanup executed {len(auto_cleanup)} strategies:\")\n", + "\n", + "total_freed = 0\n", + "for i, strategy in enumerate(auto_cleanup, 1):\n", + " print(f\" {i}. Strategy freed: {strategy.expected_freed_size_str}\")\n", + " total_freed += strategy.expected_freed_size\n", + "\n", + "print(f\"\\nTotal space that would be freed: {cache_manager._format_bytes(total_freed)}\")\n", + "\n", + "# Calculate final cache size\n", + "current_cache = scan_cache_dir()\n", + "final_size = current_cache.size_on_disk - total_freed\n", + "print(f\"Cache size after cleanup: {cache_manager._format_bytes(max(0, final_size))}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Best Practices and Performance Tips\n", + "\n", + "Here are some best practices for using HfCacheManager effectively:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Performance Best Practices" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Performance Demonstration: Cache Management Benefits\n", + "=======================================================\n", + "\n", + "Demonstrating cache cleanup performance...\n", + "\n", + "1. Cache scanning performance:\n", + " Time to scan cache: 0.096 seconds\n", + " Repositories found: 11\n", + " Total cache size: 5.5G\n", + "\n", + "2. Cleanup strategy creation performance:\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Found 41 old revisions. Will free 4.7G\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Age cleanup strategy: 0.094 seconds\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:Selected 17 revisions for deletion. Will free 3.8G\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n", + "INFO:__main__:Found 31 unused revisions. Will free 642.9M\n", + "INFO:__main__:Dry run completed. 
Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Size cleanup strategy: 0.093 seconds\n", + " Revision cleanup strategy: 0.100 seconds\n", + "\n", + "Performance insights:\n", + "• Cache scanning is fast: 0.096s for 11 repos\n", + "• Cleanup strategy creation is efficient\n", + "• Dry runs allow safe preview of cleanup operations\n", + "• Multiple strategies can be compared quickly\n" + ] + } + ], + "source": [ + "import time\n", + "\n", + "print(\"Performance Demonstration: Cache Management Benefits\")\n", + "print(\"=\" * 55)\n", + "\n", + "print(\"\\nDemonstrating cache cleanup performance...\")\n", + "\n", + "# Show performance of cache scanning and cleanup strategy creation\n", + "print(\"\\n1. Cache scanning performance:\")\n", + "start_time = time.time()\n", + "cache_info = scan_cache_dir()\n", + "scan_time = time.time() - start_time\n", + "print(f\" Time to scan cache: {scan_time:.3f} seconds\")\n", + "print(f\" Repositories found: {len(cache_info.repos)}\")\n", + "print(f\" Total cache size: {cache_info.size_on_disk_str}\")\n", + "\n", + "# Show performance of cleanup strategy creation\n", + "print(\"\\n2. Cleanup strategy creation performance:\")\n", + "\n", + "start_time = time.time()\n", + "age_strategy = cache_manager.clean_cache_by_age(max_age_days=30, dry_run=True)\n", + "age_time = time.time() - start_time\n", + "print(f\" Age cleanup strategy: {age_time:.3f} seconds\")\n", + "\n", + "start_time = time.time()\n", + "size_strategy = cache_manager.clean_cache_by_size(target_size=\"1GB\", dry_run=True)\n", + "size_time = time.time() - start_time\n", + "print(f\" Size cleanup strategy: {size_time:.3f} seconds\")\n", + "\n", + "start_time = time.time()\n", + "revision_strategy = cache_manager.clean_unused_revisions(keep_latest=2, dry_run=True)\n", + "revision_time = time.time() - start_time\n", + "print(f\" Revision cleanup strategy: {revision_time:.3f} seconds\")\n", + "\n", + "print(f\"\\nPerformance insights:\")\n", + "print(f\"• Cache scanning is fast: {scan_time:.3f}s for {len(cache_info.repos)} repos\")\n", + "print(f\"• Cleanup strategy creation is efficient\")\n", + "print(f\"• Dry runs allow safe preview of cleanup operations\")\n", + "print(f\"• Multiple strategies can be compared quickly\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Memory and Storage Optimization" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Memory and Storage Optimization Tips:\n", + "========================================\n", + "\n", + "1. DuckDB Views vs Tables:\n", + " • HfCacheManager creates VIEWS by default (not tables)\n", + " • Views reference original parquet files without duplication\n", + " • This saves storage space while enabling fast SQL queries\n", + "\n", + "2. Metadata-First Workflow:\n", + " • Load metadata first to understand data structure\n", + " • Use metadata to filter and select specific data subsets\n", + " • Avoid loading entire datasets when only portions are needed\n", + "\n", + "3. Cache Management Strategy:\n", + " • Run automated cleanup regularly\n", + " • Keep cache size reasonable for your system\n", + " • Prioritize keeping recent and frequently-used datasets\n" + ] + } + ], + "source": [ + "print(\"Memory and Storage Optimization Tips:\")\n", + "print(\"=\" * 40)\n", + "\n", + "print(\"\\n1. 
DuckDB Views vs Tables:\")\n", + "print(\" • HfCacheManager creates VIEWS by default (not tables)\")\n", + "print(\" • Views reference original parquet files without duplication\")\n", + "print(\" • This saves storage space while enabling fast SQL queries\")\n", + "\n", + "print(\"\\n2. Metadata-First Workflow:\")\n", + "print(\" • Load metadata first to understand data structure\")\n", + "print(\" • Use metadata to filter and select specific data subsets\")\n", + "print(\" • Avoid loading entire datasets when only portions are needed\")\n", + "\n", + "print(\"\\n3. Cache Management Strategy:\")\n", + "print(\" • Run automated cleanup regularly\")\n", + "print(\" • Keep cache size reasonable for your system\")\n", + "print(\" • Prioritize keeping recent and frequently-used datasets\")\n", + "\n", + "# Demonstrate DuckDB view benefits\n", + "tables_info = conn.execute(\n", + " \"SELECT table_name, table_type FROM information_schema.tables WHERE table_name LIKE 'metadata_%'\"\n", + ").fetchall()\n", + "\n", + "if tables_info:\n", + " print(f\"\\nCurrent DuckDB objects ({len(tables_info)} total):\")\n", + " for table_name, table_type in tables_info:\n", + " print(f\" • {table_name}: {table_type}\")\n", + " \n", + " view_count = sum(1 for _, table_type in tables_info if table_type == 'VIEW')\n", + " print(f\"\\n {view_count} views created (space-efficient!)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. Integration with Other Components\n", + "\n", + "The HfCacheManager works seamlessly with other components in the tfbpapi ecosystem." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "HfCacheManager Integration Workflow:\n", + "========================================\n", + "\n", + "1. Cache Management Setup:\n", + " from tfbpapi.HfCacheManager import HfCacheManager\n", + " cache_mgr = HfCacheManager(repo_id, duckdb_conn)\n", + " # Inherits all DataCard functionality + cache management\n", + "\n", + "2. Proactive Cache Cleanup:\n", + " # Clean before large operations\n", + " cache_mgr.auto_clean_cache(max_total_size='5GB', dry_run=False)\n", + " # Or use specific strategies\n", + " cache_mgr.clean_cache_by_age(max_age_days=30)\n", + "\n", + "3. Data Loading with Cache Awareness:\n", + " # The 3-case strategy works automatically with HfQueryAPI\n", + " from tfbpapi import HfQueryAPI\n", + " query_api = HfQueryAPI(repo_id, duckdb_conn)\n", + " # Metadata loading uses cache manager's strategy\n", + " data_df = query_api.get_pandas('config_name')\n", + "\n", + "4. Embedded Metadata Extraction:\n", + " # Extract metadata fields after data loading\n", + " cache_mgr._extract_embedded_metadata_field(\n", + " 'data_table_name', 'metadata_field', 'metadata_table_name')\n", + "\n", + "5. Regular Cache Maintenance:\n", + " # Schedule regular cleanup\n", + " cache_mgr.clean_unused_revisions(keep_latest=2)\n", + " cache_mgr.clean_cache_by_size('10GB', strategy='oldest_first')\n", + "\n", + "Current Session State:\n", + "Repository: BrentLab/mahendrawada_2025\n", + "DuckDB tables: 0\n", + "HF cache size: 5.5G\n", + "Cache repositories: 11\n" + ] + } + ], + "source": [ + "print(\"HfCacheManager Integration Workflow:\")\n", + "print(\"=\" * 40)\n", + "\n", + "print(\"\\n1. 
Cache Management Setup:\")\n", + "print(\" from tfbpapi.HfCacheManager import HfCacheManager\")\n", + "print(\" cache_mgr = HfCacheManager(repo_id, duckdb_conn)\")\n", + "print(\" # Inherits all DataCard functionality + cache management\")\n", + "\n", + "print(\"\\n2. Proactive Cache Cleanup:\")\n", + "print(\" # Clean before large operations\")\n", + "print(\" cache_mgr.auto_clean_cache(max_total_size='5GB', dry_run=False)\")\n", + "print(\" # Or use specific strategies\")\n", + "print(\" cache_mgr.clean_cache_by_age(max_age_days=30)\")\n", + "\n", + "print(\"\\n3. Data Loading with Cache Awareness:\")\n", + "print(\" # The 3-case strategy works automatically with HfQueryAPI\")\n", + "print(\" from tfbpapi import HfQueryAPI\")\n", + "print(\" query_api = HfQueryAPI(repo_id, duckdb_conn)\")\n", + "print(\" # Metadata loading uses cache manager's strategy\")\n", + "print(\" data_df = query_api.get_pandas('config_name')\")\n", + "\n", + "print(\"\\n4. Embedded Metadata Extraction:\")\n", + "print(\" # Extract metadata fields after data loading\")\n", + "print(\" cache_mgr._extract_embedded_metadata_field(\")\n", + "print(\" 'data_table_name', 'metadata_field', 'metadata_table_name')\")\n", + "\n", + "print(\"\\n5. Regular Cache Maintenance:\")\n", + "print(\" # Schedule regular cleanup\")\n", + "print(\" cache_mgr.clean_unused_revisions(keep_latest=2)\")\n", + "print(\" cache_mgr.clean_cache_by_size('10GB', strategy='oldest_first')\")\n", + "\n", + "# Show current state\n", + "print(f\"\\nCurrent Session State:\")\n", + "print(f\"Repository: {cache_manager.repo_id}\")\n", + "print(f\"DuckDB tables: {len(conn.execute('SELECT table_name FROM information_schema.tables').fetchall())}\")\n", + "\n", + "cache_info = scan_cache_dir()\n", + "print(f\"HF cache size: {cache_info.size_on_disk_str}\")\n", + "print(f\"Cache repositories: {len(cache_info.repos)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. Troubleshooting and Error Handling\n", + "\n", + "The HfCacheManager includes comprehensive error handling and diagnostic capabilities." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cache Management Troubleshooting:\n", + "===================================\n", + "\n", + "1. Import and Setup Issues:\n", + " • Ensure correct import: from tfbpapi.HfCacheManager import HfCacheManager\n", + " • Verify DuckDB connection: conn = duckdb.connect(':memory:')\n", + " • Check repository access permissions\n", + "\n", + "2. Cache Space and Performance Issues:\n", + " Current cache size: 5.5G\n", + " • Use auto_clean_cache() for automated management\n", + " • Monitor cache growth with scan_cache_dir()\n", + " • Set appropriate size limits for your system\n", + "\n", + "3. Cache Cleanup Issues:\n", + " • Use dry_run=True first to preview changes\n", + " • Check disk permissions for cache directory\n", + " • Verify no active processes are using cached files\n", + "\n", + "4. DuckDB Integration Issues:\n", + " • Ensure DuckDB connection is active\n", + " • Check memory limits for in-memory databases\n", + " • Verify table names don't conflict\n", + "\n", + "Cache Health Check:\n", + "✓ DuckDB connection: DuckDB OK\n", + "✓ Cache access: 11 repositories found\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:__main__:No old revisions found to delete\n", + "INFO:__main__:Found 0 old revisions. 
Will free 0.0\n", + "INFO:__main__:Dry run completed. Use dry_run=False to execute deletion\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✓ Cache cleanup methods: Working\n", + "\n", + "Current Status:\n", + "Repository: BrentLab/mahendrawada_2025\n", + "Logger configured: True\n", + "Cache management ready: ✓\n" + ] + } + ], + "source": [ + "print(\"Cache Management Troubleshooting:\")\n", + "print(\"=\" * 35)\n", + "\n", + "print(\"\\n1. Import and Setup Issues:\")\n", + "print(\" • Ensure correct import: from tfbpapi.HfCacheManager import HfCacheManager\")\n", + "print(\" • Verify DuckDB connection: conn = duckdb.connect(':memory:')\")\n", + "print(\" • Check repository access permissions\")\n", + "\n", + "print(\"\\n2. Cache Space and Performance Issues:\")\n", + "try:\n", + " cache_info = scan_cache_dir()\n", + " print(f\" Current cache size: {cache_info.size_on_disk_str}\")\n", + " print(\" • Use auto_clean_cache() for automated management\")\n", + " print(\" • Monitor cache growth with scan_cache_dir()\")\n", + " print(\" • Set appropriate size limits for your system\")\n", + " \n", + " # Show if cache is getting large\n", + " total_gb = cache_info.size_on_disk / (1024**3)\n", + " if total_gb > 10:\n", + " print(f\" ⚠️ Large cache detected ({total_gb:.1f}GB) - consider cleanup\")\n", + " \n", + "except Exception as e:\n", + " print(f\" Cache scan error: {e}\")\n", + "\n", + "print(\"\\n3. Cache Cleanup Issues:\")\n", + "print(\" • Use dry_run=True first to preview changes\")\n", + "print(\" • Check disk permissions for cache directory\")\n", + "print(\" • Verify no active processes are using cached files\")\n", + "\n", + "print(\"\\n4. DuckDB Integration Issues:\")\n", + "print(\" • Ensure DuckDB connection is active\")\n", + "print(\" • Check memory limits for in-memory databases\")\n", + "print(\" • Verify table names don't conflict\")\n", + "\n", + "# Perform health checks\n", + "print(f\"\\nCache Health Check:\")\n", + "\n", + "# Test DuckDB\n", + "try:\n", + " test_result = conn.execute(\"SELECT 'DuckDB OK' as status\").fetchone()\n", + " print(f\"✓ DuckDB connection: {test_result[0]}\")\n", + "except Exception as e:\n", + " print(f\"✗ DuckDB connection: {e}\")\n", + "\n", + "# Test cache access\n", + "try:\n", + " cache_info = scan_cache_dir()\n", + " print(f\"✓ Cache access: {len(cache_info.repos)} repositories found\")\n", + "except Exception as e:\n", + " print(f\"✗ Cache access: {e}\")\n", + "\n", + "# Test cache manager methods\n", + "try:\n", + " test_cleanup = cache_manager.clean_cache_by_age(max_age_days=999, dry_run=True)\n", + " print(f\"✓ Cache cleanup methods: Working\")\n", + "except Exception as e:\n", + " print(f\"✗ Cache cleanup methods: {e}\")\n", + "\n", + "print(f\"\\nCurrent Status:\")\n", + "print(f\"Repository: {cache_manager.repo_id}\")\n", + "print(f\"Logger configured: {cache_manager.logger is not None}\")\n", + "print(f\"Cache management ready: ✓\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "tfbpapi-py3.11", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/tutorials/datacard_tutorial.ipynb b/docs/tutorials/datacard_tutorial.ipynb new file mode 100644 index 0000000..1556a1c --- 
/dev/null +++ b/docs/tutorials/datacard_tutorial.ipynb @@ -0,0 +1,606 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# DataCard Tutorial: Exploring HuggingFace Dataset Metadata\n", + "\n", + "The `DataCard` class provides an interface for exploring HuggingFace dataset metadata without loading the actual genomic data. This is particularly useful for:\n", + "\n", + "- Understanding dataset structure and available configurations\n", + "- Exploring experimental conditions at all hierarchy levels\n", + "- Discovering metadata relationships\n", + "- Planning data analysis workflows and metadata table creation\n", + "\n", + "In this tutorial, we'll explore the **BrentLab/harbison_2004** dataset, which contains ChIP-chip data for transcription factor binding across 14 environmental conditions in yeast." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Instantiating a DataCard Object" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Repository: BrentLab/harbison_2004\n" + ] + } + ], + "source": [ + "from tfbpapi.datacard import DataCard\n", + "\n", + "card = DataCard('BrentLab/harbison_2004')\n", + "\n", + "print(f\"Repository: {card.repo_id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Repository Overview\n", + "\n", + "Let's start by getting a high-level overview of the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Repository Information:\n", + "========================================\n", + "repo_id : BrentLab/harbison_2004\n", + "pretty_name : Harbison, 2004 ChIP-chip\n", + "license : mit\n", + "tags : ['genomics', 'yeast', 'transcription', 'binding']\n", + "language : ['en']\n", + "size_categories : ['1M\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "" + ], + "text/plain": [ + " sample_id regulator_locus_tag regulator_symbol condition carbon_source \\\n", + "0 1 YSC0017 MATA1 YPD glucose \n", + "1 2 YAL051W OAF1 YPD glucose \n", + "2 3 YBL005W PDR3 YPD glucose \n", + "3 4 YBL008W HIR1 YPD glucose \n", + "4 5 YBL021C HAP3 YPD glucose \n", + "\n", + " temperature_celsius dataset_id \n", + "0 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "1 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "2 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "3 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "4 30.0 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Query all datasets for samples grown on glucose\n", + "glucose_samples = vdb.query(filters={\"carbon_source\": \"glucose\"})\n", + "\n", + "print(f\"Found {len(glucose_samples)} samples with glucose\")\n", + "print(f\"\\nColumns: {list(glucose_samples.columns)}\")\n", + "print(f\"\\nFirst few rows:\")\n", + "glucose_samples.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Query Specific Datasets\n", + "\n", + "Limit your query to specific datasets using the `datasets` parameter." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 310 samples from harbison_2004\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "regulator_locus_tag", + "rawType": "object", + "type": "string" + }, + { + "name": "regulator_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "condition", + "rawType": "object", + "type": "string" + }, + { + "name": "carbon_source", + "rawType": "object", + "type": "string" + }, + { + "name": "temperature_celsius", + "rawType": "float64", + "type": "float" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "0c9ebf95-0bf1-46d7-83ee-57b87c5def44", + "rows": [ + [ + "0", + "1", + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "1", + "2", + "YAL051W", + "OAF1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "2", + "3", + "YBL005W", + "PDR3", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "3", + "4", + "YBL008W", + "HIR1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "4", + "5", + "YBL021C", + "HAP3", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 7, + "rows": 5 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_idregulator_locus_tagregulator_symbolconditioncarbon_sourcetemperature_celsiusdataset_id
01YSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
12YAL051WOAF1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
23YBL005WPDR3YPDglucose30.0BrentLab/harbison_2004/harbison_2004
34YBL008WHIR1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
45YBL021CHAP3YPDglucose30.0BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id regulator_locus_tag regulator_symbol condition carbon_source \\\n", + "0 1 YSC0017 MATA1 YPD glucose \n", + "1 2 YAL051W OAF1 YPD glucose \n", + "2 3 YBL005W PDR3 YPD glucose \n", + "3 4 YBL008W HIR1 YPD glucose \n", + "4 5 YBL021C HAP3 YPD glucose \n", + "\n", + " temperature_celsius dataset_id \n", + "0 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "1 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "2 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "3 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "4 30.0 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Query only harbison_2004\n", + "harbison_glucose = vdb.query(\n", + " filters={\"carbon_source\": \"glucose\"},\n", + " datasets=[(\"BrentLab/harbison_2004\", \"harbison_2004\")]\n", + ")\n", + "\n", + "print(f\"Found {len(harbison_glucose)} samples from harbison_2004\")\n", + "harbison_glucose.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Select Specific Fields\n", + "\n", + "Return only the fields you need with the `fields` parameter." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Columns: ['sample_id', 'carbon_source', 'temperature_celsius', 'dataset_id']\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "carbon_source", + "rawType": "object", + "type": "string" + }, + { + "name": "temperature_celsius", + "rawType": "float64", + "type": "float" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "38c6fa44-08cf-4751-9476-7bca3cb1c41c", + "rows": [ + [ + "0", + "1", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "1", + "2", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "2", + "3", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "3", + "4", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "4", + "5", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 4, + "rows": 5 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_idcarbon_sourcetemperature_celsiusdataset_id
01glucose30.0BrentLab/harbison_2004/harbison_2004
12glucose30.0BrentLab/harbison_2004/harbison_2004
23glucose30.0BrentLab/harbison_2004/harbison_2004
34glucose30.0BrentLab/harbison_2004/harbison_2004
45glucose30.0BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id carbon_source temperature_celsius \\\n", + "0 1 glucose 30.0 \n", + "1 2 glucose 30.0 \n", + "2 3 glucose 30.0 \n", + "3 4 glucose 30.0 \n", + "4 5 glucose 30.0 \n", + "\n", + " dataset_id \n", + "0 BrentLab/harbison_2004/harbison_2004 \n", + "1 BrentLab/harbison_2004/harbison_2004 \n", + "2 BrentLab/harbison_2004/harbison_2004 \n", + "3 BrentLab/harbison_2004/harbison_2004 \n", + "4 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get just sample_id, carbon_source, and temperature\n", + "minimal_data = vdb.query(\n", + " filters={\"carbon_source\": \"glucose\"},\n", + " fields=[\"sample_id\", \"carbon_source\", \"temperature_celsius\"]\n", + ")\n", + "\n", + "print(f\"Columns: {list(minimal_data.columns)}\")\n", + "minimal_data.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Advanced Queries\n", + "\n", + "VirtualDB supports more sophisticated query patterns." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Multiple Filter Conditions" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 1791 samples with glucose at 30C\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "regulator_locus_tag", + "rawType": "object", + "type": "string" + }, + { + "name": "regulator_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "condition", + "rawType": "object", + "type": "string" + }, + { + "name": "carbon_source", + "rawType": "object", + "type": "string" + }, + { + "name": "temperature_celsius", + "rawType": "float64", + "type": "float" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "357f69ed-ad79-4401-8458-5a1cc48f14c5", + "rows": [ + [ + "0", + "1", + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "1", + "2", + "YAL051W", + "OAF1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "2", + "3", + "YBL005W", + "PDR3", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "3", + "4", + "YBL008W", + "HIR1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "4", + "5", + "YBL021C", + "HAP3", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 7, + "rows": 5 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_idregulator_locus_tagregulator_symbolconditioncarbon_sourcetemperature_celsiusdataset_id
01YSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
12YAL051WOAF1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
23YBL005WPDR3YPDglucose30.0BrentLab/harbison_2004/harbison_2004
34YBL008WHIR1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
45YBL021CHAP3YPDglucose30.0BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id regulator_locus_tag regulator_symbol condition carbon_source \\\n", + "0 1 YSC0017 MATA1 YPD glucose \n", + "1 2 YAL051W OAF1 YPD glucose \n", + "2 3 YBL005W PDR3 YPD glucose \n", + "3 4 YBL008W HIR1 YPD glucose \n", + "4 5 YBL021C HAP3 YPD glucose \n", + "\n", + " temperature_celsius dataset_id \n", + "0 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "1 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "2 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "3 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "4 30.0 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Samples with glucose at 30C\n", + "glucose_30c = vdb.query(\n", + " filters={\n", + " \"carbon_source\": \"glucose\",\n", + " \"temperature_celsius\": 30\n", + " }\n", + ")\n", + "\n", + "print(f\"Found {len(glucose_30c)} samples with glucose at 30C\")\n", + "glucose_30c.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Numeric Range Queries" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 1833 samples at >= 30C\n", + "Found 1833 samples between 28-32C\n" + ] + } + ], + "source": [ + "# Samples at temperature >= 30C\n", + "warm_samples = vdb.query(\n", + " filters={\"temperature_celsius\": (\">=\", 30)}\n", + ")\n", + "\n", + "print(f\"Found {len(warm_samples)} samples at >= 30C\")\n", + "\n", + "# Samples between 28C and 32C\n", + "moderate_temp = vdb.query(\n", + " filters={\"temperature_celsius\": (\"between\", 28, 32)}\n", + ")\n", + "\n", + "print(f\"Found {len(moderate_temp)} samples between 28-32C\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Factor Alias Expansion\n", + "\n", + "When you query for a normalized value, VirtualDB automatically expands to all original aliases." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " sample_id regulator_locus_tag regulator_symbol condition carbon_source \\\n", + "0 68 YDR277C MTH1 GAL galactose \n", + "1 112 YGL035C MIG1 GAL galactose \n", + "2 197 YKL038W RGT1 GAL galactose \n", + "3 335 YPL248C GAL4 GAL galactose \n", + "\n", + " temperature_celsius dataset_id \n", + "0 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "1 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "2 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "3 30.0 BrentLab/harbison_2004/harbison_2004 \n" + ] + } + ], + "source": [ + "# Query for \"galactose\" matches \"D-galactose\", \"gal\", and \"galactose\"\n", + "galactose_samples = vdb.query(filters={\"carbon_source\": \"galactose\"})\n", + "\n", + "print(galactose_samples)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Complete Data Retrieval\n", + "\n", + "By default, `query()` returns sample-level metadata (one row per sample). \n", + "Set `complete=True` to get all measurements (many rows per sample)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Complete data: 1930060 rows\n", + "Columns: ['sample_id', 'db_id', 'target_locus_tag', 'target_symbol', 'effect', 'pvalue', 'regulator_locus_tag', 'regulator_symbol', 'condition', 'carbon_source', 'temperature_celsius', 'dataset_id']\n", + "\n", + "First few measurements:\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "db_id", + "rawType": "float64", + "type": "float" + }, + { + "name": "target_locus_tag", + "rawType": "object", + "type": "string" + }, + { + "name": "target_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "effect", + "rawType": "float64", + "type": "float" + }, + { + "name": "pvalue", + "rawType": "float64", + "type": "float" + }, + { + "name": "regulator_locus_tag", + "rawType": "object", + "type": "string" + }, + { + "name": "regulator_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "condition", + "rawType": "object", + "type": "string" + }, + { + "name": "carbon_source", + "rawType": "object", + "type": "string" + }, + { + "name": "temperature_celsius", + "rawType": "float64", + "type": "float" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "0b10b74e-6f1a-42af-8654-7811d039bfac", + "rows": [ + [ + "0", + "1", + "0.0", + "YAL001C", + "TFC3", + "1.697754", + "0.068704735", + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "1", + "1", + "0.0", + "YAL002W", + "VPS8", + null, + null, + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "2", + "1", + "0.0", + "YAL003W", + "EFB1", + null, + null, + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "3", + "1", + "0.0", + "YAL004W", + "YAL004W", + "0.74534215", + "0.83592938", + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "4", + "1", + "0.0", + "YAL005C", + "SSA1", + null, + null, + "YSC0017", + "MATA1", + "YPD", + "glucose", + "30.0", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 12, + "rows": 5 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_iddb_idtarget_locus_tagtarget_symboleffectpvalueregulator_locus_tagregulator_symbolconditioncarbon_sourcetemperature_celsiusdataset_id
010.0YAL001CTFC31.6977540.068705YSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
110.0YAL002WVPS8NaNNaNYSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
210.0YAL003WEFB1NaNNaNYSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
310.0YAL004WYAL004W0.7453420.835929YSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
410.0YAL005CSSA1NaNNaNYSC0017MATA1YPDglucose30.0BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id db_id target_locus_tag target_symbol effect pvalue \\\n", + "0 1 0.0 YAL001C TFC3 1.697754 0.068705 \n", + "1 1 0.0 YAL002W VPS8 NaN NaN \n", + "2 1 0.0 YAL003W EFB1 NaN NaN \n", + "3 1 0.0 YAL004W YAL004W 0.745342 0.835929 \n", + "4 1 0.0 YAL005C SSA1 NaN NaN \n", + "\n", + " regulator_locus_tag regulator_symbol condition carbon_source \\\n", + "0 YSC0017 MATA1 YPD glucose \n", + "1 YSC0017 MATA1 YPD glucose \n", + "2 YSC0017 MATA1 YPD glucose \n", + "3 YSC0017 MATA1 YPD glucose \n", + "4 YSC0017 MATA1 YPD glucose \n", + "\n", + " temperature_celsius dataset_id \n", + "0 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "1 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "2 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "3 30.0 BrentLab/harbison_2004/harbison_2004 \n", + "4 30.0 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get complete data with measurements\n", + "complete_data = vdb.query(\n", + " filters={\"carbon_source\": \"glucose\"},\n", + " datasets=[(\"BrentLab/harbison_2004\", \"harbison_2004\")],\n", + " complete=True\n", + ")\n", + "\n", + "print(f\"Complete data: {len(complete_data)} rows\")\n", + "print(f\"Columns: {list(complete_data.columns)}\")\n", + "print(\"\\nFirst few measurements:\")\n", + "complete_data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Binding data: 1930060 measurements\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "regulator_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "target_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "effect", + "rawType": "float64", + "type": "float" + }, + { + "name": "pvalue", + "rawType": "float64", + "type": "float" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "0b7cb890-7e9c-44d2-9ef5-59374bcf3a8a", + "rows": [ + [ + "0", + "2", + "OAF1", + "TFC3", + "1.5895642", + "0.088986168", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "1", + "2", + "OAF1", + "VPS8", + "1.1413208", + "0.32480496", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "2", + "2", + "OAF1", + "EFB1", + "0.72911994", + "0.87882413", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "3", + "2", + "OAF1", + "YAL004W", + "1.1679044", + "0.28225283", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "4", + "2", + "OAF1", + "SSA1", + "0.72911994", + "0.87882413", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "5", + "2", + "OAF1", + "ERP2", + "1.0508274", + "0.43070675", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "6", + "2", + "OAF1", + "FUN14", + "1.3478761", + "0.15551056", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "7", + "2", + "OAF1", + "SPO7", + "0.93967306", + "0.57823415", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "8", + "2", + "OAF1", + "MDM10", + "0.93967306", + "0.57823415", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "9", + "2", + "OAF1", + "SWC3", + "0.86566703", + "0.6711192", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 6, + "rows": 10 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_idregulator_symboltarget_symboleffectpvaluedataset_id
02OAF1TFC31.5895640.088986BrentLab/harbison_2004/harbison_2004
12OAF1VPS81.1413210.324805BrentLab/harbison_2004/harbison_2004
22OAF1EFB10.7291200.878824BrentLab/harbison_2004/harbison_2004
32OAF1YAL004W1.1679040.282253BrentLab/harbison_2004/harbison_2004
42OAF1SSA10.7291200.878824BrentLab/harbison_2004/harbison_2004
52OAF1ERP21.0508270.430707BrentLab/harbison_2004/harbison_2004
62OAF1FUN141.3478760.155511BrentLab/harbison_2004/harbison_2004
72OAF1SPO70.9396730.578234BrentLab/harbison_2004/harbison_2004
82OAF1MDM100.9396730.578234BrentLab/harbison_2004/harbison_2004
92OAF1SWC30.8656670.671119BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id regulator_symbol target_symbol effect pvalue \\\n", + "0 2 OAF1 TFC3 1.589564 0.088986 \n", + "1 2 OAF1 VPS8 1.141321 0.324805 \n", + "2 2 OAF1 EFB1 0.729120 0.878824 \n", + "3 2 OAF1 YAL004W 1.167904 0.282253 \n", + "4 2 OAF1 SSA1 0.729120 0.878824 \n", + "5 2 OAF1 ERP2 1.050827 0.430707 \n", + "6 2 OAF1 FUN14 1.347876 0.155511 \n", + "7 2 OAF1 SPO7 0.939673 0.578234 \n", + "8 2 OAF1 MDM10 0.939673 0.578234 \n", + "9 2 OAF1 SWC3 0.865667 0.671119 \n", + "\n", + " dataset_id \n", + "0 BrentLab/harbison_2004/harbison_2004 \n", + "1 BrentLab/harbison_2004/harbison_2004 \n", + "2 BrentLab/harbison_2004/harbison_2004 \n", + "3 BrentLab/harbison_2004/harbison_2004 \n", + "4 BrentLab/harbison_2004/harbison_2004 \n", + "5 BrentLab/harbison_2004/harbison_2004 \n", + "6 BrentLab/harbison_2004/harbison_2004 \n", + "7 BrentLab/harbison_2004/harbison_2004 \n", + "8 BrentLab/harbison_2004/harbison_2004 \n", + "9 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# You can combine complete=True with field selection\n", + "# Get just the binding data columns\n", + "binding_data = vdb.query(\n", + " filters={\"carbon_source\": \"glucose\"},\n", + " datasets=[(\"BrentLab/harbison_2004\", \"harbison_2004\")],\n", + " fields=[\"sample_id\", \"regulator_symbol\", \"target_symbol\", \"effect\", \"pvalue\"],\n", + " complete=True\n", + ")\n", + "\n", + "print(f\"Binding data: {len(binding_data)} measurements\")\n", + "binding_data.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example analysis\n", + "\n", + "The following is an example of using VirtualDB to extract and summarize data across\n", + "datasets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sample counts by dataset and carbon source:\n", + " dataset_id carbon_source num_samples\n", + "BrentLab/harbison_2004/harbison_2004 galactose 4\n", + "BrentLab/harbison_2004/harbison_2004 glucose 310\n", + "BrentLab/harbison_2004/harbison_2004 raffinose 1\n", + "BrentLab/harbison_2004/harbison_2004 unspecified 37\n", + "BrentLab/kemmeren_2014/kemmeren_2014 glucose 1487\n" + ] + } + ], + "source": [ + "# Compare number of samples by carbon source across datasets\n", + "\n", + "# Get all samples\n", + "all_samples = vdb.query()\n", + "\n", + "# Count by dataset and carbon source\n", + "summary = all_samples.groupby(['dataset_id', 'carbon_source']).size()\n", + "summary = summary.reset_index(name='num_samples')\n", + "\n", + "print(\"Sample counts by dataset and carbon source:\")\n", + "print(summary.to_string(index=False))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Glucose samples by temperature:\n", + " 30.0C: 1791 samples\n" + ] + } + ], + "source": [ + "# Compare glucose experiments at different temperatures\n", + "\n", + "glucose_by_temp = vdb.query(\n", + " filters={\"carbon_source\": \"glucose\"},\n", + " fields=[\"sample_id\", \"temperature_celsius\", \"environmental_condition\"]\n", + ")\n", + "\n", + "# Count samples by temperature\n", + "temp_counts = glucose_by_temp['temperature_celsius'].value_counts().sort_index()\n", + "\n", + "print(\"Glucose samples by temperature:\")\n", + "for temp, count in temp_counts.items():\n", + " print(f\" {temp}C: {count} samples\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 18678 FHL1 binding measurements in glucose\n", + "Significant targets: 379\n", + "\n", + "Top 10 targets by effect size:\n", + "target_symbol effect pvalue\n", + " RPS5 24.145013 9.739702e-09\n", + " RPL11A 20.585725 1.232356e-08\n", + " PRE2 20.585725 1.232356e-08\n", + " SRF1 20.342898 1.226799e-08\n", + " SLX8 20.057080 1.513076e-08\n", + " RPL23B 20.057080 1.513076e-08\n", + " RPL40A 19.262139 1.761808e-08\n", + " MLP2 19.262139 1.761808e-08\n", + " RPS6A 18.704379 1.544172e-08\n", + " RPL22A 17.926705 1.560357e-08\n" + ] + } + ], + "source": [ + "# Get binding data for a specific regulator across datasets\n", + "\n", + "# Query for FHL1 binding in glucose conditions\n", + "fhl1_binding = vdb.query(\n", + " filters={\n", + " \"carbon_source\": \"glucose\",\n", + " \"regulator_symbol\": \"FHL1\"\n", + " },\n", + " fields=[\"sample_id\", \"regulator_symbol\", \"target_symbol\", \"effect\", \"pvalue\"],\n", + " complete=True\n", + ")\n", + "\n", + "print(f\"Found {len(fhl1_binding)} FHL1 binding measurements in glucose\")\n", + "\n", + "# Find significant targets (p < 0.001)\n", + "significant = fhl1_binding[fhl1_binding['pvalue'] < 0.001]\n", + "print(f\"Significant targets: {len(significant)}\")\n", + "\n", + "# Top 10 by effect size\n", + "top_targets = significant.nlargest(10, 'effect')[['target_symbol', 'effect', 'pvalue']]\n", + "print(\"\\nTop 10 targets by effect size:\")\n", + "print(top_targets.to_string(index=False))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Querying Comparative Datasets\n", + "\n", + "Comparative datasets like DTO 
(Direct Target Overlap) contain analysis results that relate samples across multiple datasets. These datasets can be queried directly to find significant cross-dataset relationships." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 65536.00it/s]\n", + "Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 57325.34it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 32 FHL1 binding measurements\n", + "\n", + "Columns: ['sample_id', 'regulator_symbol', 'condition', 'dto_fdr', 'perturbation_id', 'dataset_id']\n", + "\n", + "Rows with DTO data: 4\n", + "\n", + "First few results:\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "regulator_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "condition", + "rawType": "object", + "type": "string" + }, + { + "name": "dto_fdr", + "rawType": "float64", + "type": "float" + }, + { + "name": "perturbation_id", + "rawType": "object", + "type": "string" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "a0eb6112-b457-4642-add7-4bcd5068e495", + "rows": [ + [ + "0", + "345", + "FHL1", + "H2O2Hi", + "0.4549087454017032", + "BrentLab/Hackett_2020;hackett_2020;1666", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "1", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1665", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "2", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1667", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "3", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1669", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "4", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1663", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "5", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1664", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "6", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1670", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "7", + "345", + "FHL1", + "H2O2Hi", + null, + "BrentLab/Hackett_2020;hackett_2020;1668", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "8", + "346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1667", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "9", + "346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1663", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "10", + "346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1670", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "11", + "346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1668", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "12", + "346", + "FHL1", + "RAPA", + "0.0", + "BrentLab/Hackett_2020;hackett_2020;1666", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "13", + "346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1669", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "14", + 
"346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1664", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "15", + "346", + "FHL1", + "RAPA", + null, + "BrentLab/Hackett_2020;hackett_2020;1665", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "16", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1667", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "17", + "347", + "FHL1", + "SM", + "0.0221957781456953", + "BrentLab/Hackett_2020;hackett_2020;1666", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "18", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1669", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "19", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1664", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "20", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1663", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "21", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1670", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "22", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1668", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "23", + "347", + "FHL1", + "SM", + null, + "BrentLab/Hackett_2020;hackett_2020;1665", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "24", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1664", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "25", + "348", + "FHL1", + "YPD", + "0.089578429724277", + "BrentLab/Hackett_2020;hackett_2020;1666", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "26", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1663", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "27", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1667", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "28", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1669", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "29", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1665", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "30", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1670", + "BrentLab/harbison_2004/harbison_2004" + ], + [ + "31", + "348", + "FHL1", + "YPD", + null, + "BrentLab/Hackett_2020;hackett_2020;1668", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 6, + "rows": 32 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_idregulator_symbolconditiondto_fdrperturbation_iddataset_id
0345FHL1H2O2Hi0.454909BrentLab/Hackett_2020;hackett_2020;1666BrentLab/harbison_2004/harbison_2004
1345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1665BrentLab/harbison_2004/harbison_2004
2345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1667BrentLab/harbison_2004/harbison_2004
3345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1669BrentLab/harbison_2004/harbison_2004
4345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1663BrentLab/harbison_2004/harbison_2004
5345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1664BrentLab/harbison_2004/harbison_2004
6345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1670BrentLab/harbison_2004/harbison_2004
7345FHL1H2O2HiNaNBrentLab/Hackett_2020;hackett_2020;1668BrentLab/harbison_2004/harbison_2004
8346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1667BrentLab/harbison_2004/harbison_2004
9346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1663BrentLab/harbison_2004/harbison_2004
10346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1670BrentLab/harbison_2004/harbison_2004
11346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1668BrentLab/harbison_2004/harbison_2004
12346FHL1RAPA0.000000BrentLab/Hackett_2020;hackett_2020;1666BrentLab/harbison_2004/harbison_2004
13346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1669BrentLab/harbison_2004/harbison_2004
14346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1664BrentLab/harbison_2004/harbison_2004
15346FHL1RAPANaNBrentLab/Hackett_2020;hackett_2020;1665BrentLab/harbison_2004/harbison_2004
16347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1667BrentLab/harbison_2004/harbison_2004
17347FHL1SM0.022196BrentLab/Hackett_2020;hackett_2020;1666BrentLab/harbison_2004/harbison_2004
18347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1669BrentLab/harbison_2004/harbison_2004
19347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1664BrentLab/harbison_2004/harbison_2004
20347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1663BrentLab/harbison_2004/harbison_2004
21347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1670BrentLab/harbison_2004/harbison_2004
22347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1668BrentLab/harbison_2004/harbison_2004
23347FHL1SMNaNBrentLab/Hackett_2020;hackett_2020;1665BrentLab/harbison_2004/harbison_2004
24348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1664BrentLab/harbison_2004/harbison_2004
25348FHL1YPD0.089578BrentLab/Hackett_2020;hackett_2020;1666BrentLab/harbison_2004/harbison_2004
26348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1663BrentLab/harbison_2004/harbison_2004
27348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1667BrentLab/harbison_2004/harbison_2004
28348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1669BrentLab/harbison_2004/harbison_2004
29348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1665BrentLab/harbison_2004/harbison_2004
30348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1670BrentLab/harbison_2004/harbison_2004
31348FHL1YPDNaNBrentLab/Hackett_2020;hackett_2020;1668BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id regulator_symbol condition dto_fdr \\\n", + "0 345 FHL1 H2O2Hi 0.454909 \n", + "1 345 FHL1 H2O2Hi NaN \n", + "2 345 FHL1 H2O2Hi NaN \n", + "3 345 FHL1 H2O2Hi NaN \n", + "4 345 FHL1 H2O2Hi NaN \n", + "5 345 FHL1 H2O2Hi NaN \n", + "6 345 FHL1 H2O2Hi NaN \n", + "7 345 FHL1 H2O2Hi NaN \n", + "8 346 FHL1 RAPA NaN \n", + "9 346 FHL1 RAPA NaN \n", + "10 346 FHL1 RAPA NaN \n", + "11 346 FHL1 RAPA NaN \n", + "12 346 FHL1 RAPA 0.000000 \n", + "13 346 FHL1 RAPA NaN \n", + "14 346 FHL1 RAPA NaN \n", + "15 346 FHL1 RAPA NaN \n", + "16 347 FHL1 SM NaN \n", + "17 347 FHL1 SM 0.022196 \n", + "18 347 FHL1 SM NaN \n", + "19 347 FHL1 SM NaN \n", + "20 347 FHL1 SM NaN \n", + "21 347 FHL1 SM NaN \n", + "22 347 FHL1 SM NaN \n", + "23 347 FHL1 SM NaN \n", + "24 348 FHL1 YPD NaN \n", + "25 348 FHL1 YPD 0.089578 \n", + "26 348 FHL1 YPD NaN \n", + "27 348 FHL1 YPD NaN \n", + "28 348 FHL1 YPD NaN \n", + "29 348 FHL1 YPD NaN \n", + "30 348 FHL1 YPD NaN \n", + "31 348 FHL1 YPD NaN \n", + "\n", + " perturbation_id \\\n", + "0 BrentLab/Hackett_2020;hackett_2020;1666 \n", + "1 BrentLab/Hackett_2020;hackett_2020;1665 \n", + "2 BrentLab/Hackett_2020;hackett_2020;1667 \n", + "3 BrentLab/Hackett_2020;hackett_2020;1669 \n", + "4 BrentLab/Hackett_2020;hackett_2020;1663 \n", + "5 BrentLab/Hackett_2020;hackett_2020;1664 \n", + "6 BrentLab/Hackett_2020;hackett_2020;1670 \n", + "7 BrentLab/Hackett_2020;hackett_2020;1668 \n", + "8 BrentLab/Hackett_2020;hackett_2020;1667 \n", + "9 BrentLab/Hackett_2020;hackett_2020;1663 \n", + "10 BrentLab/Hackett_2020;hackett_2020;1670 \n", + "11 BrentLab/Hackett_2020;hackett_2020;1668 \n", + "12 BrentLab/Hackett_2020;hackett_2020;1666 \n", + "13 BrentLab/Hackett_2020;hackett_2020;1669 \n", + "14 BrentLab/Hackett_2020;hackett_2020;1664 \n", + "15 BrentLab/Hackett_2020;hackett_2020;1665 \n", + "16 BrentLab/Hackett_2020;hackett_2020;1667 \n", + "17 BrentLab/Hackett_2020;hackett_2020;1666 \n", + "18 BrentLab/Hackett_2020;hackett_2020;1669 \n", + "19 BrentLab/Hackett_2020;hackett_2020;1664 \n", + "20 BrentLab/Hackett_2020;hackett_2020;1663 \n", + "21 BrentLab/Hackett_2020;hackett_2020;1670 \n", + "22 BrentLab/Hackett_2020;hackett_2020;1668 \n", + "23 BrentLab/Hackett_2020;hackett_2020;1665 \n", + "24 BrentLab/Hackett_2020;hackett_2020;1664 \n", + "25 BrentLab/Hackett_2020;hackett_2020;1666 \n", + "26 BrentLab/Hackett_2020;hackett_2020;1663 \n", + "27 BrentLab/Hackett_2020;hackett_2020;1667 \n", + "28 BrentLab/Hackett_2020;hackett_2020;1669 \n", + "29 BrentLab/Hackett_2020;hackett_2020;1665 \n", + "30 BrentLab/Hackett_2020;hackett_2020;1670 \n", + "31 BrentLab/Hackett_2020;hackett_2020;1668 \n", + "\n", + " dataset_id \n", + "0 BrentLab/harbison_2004/harbison_2004 \n", + "1 BrentLab/harbison_2004/harbison_2004 \n", + "2 BrentLab/harbison_2004/harbison_2004 \n", + "3 BrentLab/harbison_2004/harbison_2004 \n", + "4 BrentLab/harbison_2004/harbison_2004 \n", + "5 BrentLab/harbison_2004/harbison_2004 \n", + "6 BrentLab/harbison_2004/harbison_2004 \n", + "7 BrentLab/harbison_2004/harbison_2004 \n", + "8 BrentLab/harbison_2004/harbison_2004 \n", + "9 BrentLab/harbison_2004/harbison_2004 \n", + "10 BrentLab/harbison_2004/harbison_2004 \n", + "11 BrentLab/harbison_2004/harbison_2004 \n", + "12 BrentLab/harbison_2004/harbison_2004 \n", + "13 BrentLab/harbison_2004/harbison_2004 \n", + "14 BrentLab/harbison_2004/harbison_2004 \n", + "15 BrentLab/harbison_2004/harbison_2004 \n", + "16 BrentLab/harbison_2004/harbison_2004 \n", + "17 BrentLab/harbison_2004/harbison_2004 \n", + 
"18 BrentLab/harbison_2004/harbison_2004 \n", + "19 BrentLab/harbison_2004/harbison_2004 \n", + "20 BrentLab/harbison_2004/harbison_2004 \n", + "21 BrentLab/harbison_2004/harbison_2004 \n", + "22 BrentLab/harbison_2004/harbison_2004 \n", + "23 BrentLab/harbison_2004/harbison_2004 \n", + "24 BrentLab/harbison_2004/harbison_2004 \n", + "25 BrentLab/harbison_2004/harbison_2004 \n", + "26 BrentLab/harbison_2004/harbison_2004 \n", + "27 BrentLab/harbison_2004/harbison_2004 \n", + "28 BrentLab/harbison_2004/harbison_2004 \n", + "29 BrentLab/harbison_2004/harbison_2004 \n", + "30 BrentLab/harbison_2004/harbison_2004 \n", + "31 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Query harbison_2004 binding data enriched with DTO metrics\n", + "# This demonstrates field-based joins: requesting dto_fdr field\n", + "# while querying the primary binding dataset\n", + "\n", + "binding_with_dto = vdb.query(\n", + " datasets=[(\"BrentLab/harbison_2004\", \"harbison_2004\")],\n", + " filters={\"regulator_symbol\": \"FHL1\"},\n", + " fields=[\"sample_id\", \"regulator_symbol\", \"condition\", \"dto_fdr\", \"binding_id\", \"perturbation_id\"],\n", + ")\n", + "\n", + "print(f\"Found {len(binding_with_dto)} FHL1 binding measurements\")\n", + "print(f\"\\nColumns: {list(binding_with_dto.columns)}\")\n", + "print(f\"\\nRows with DTO data: {binding_with_dto['dto_fdr'].notna().sum()}\")\n", + "print(f\"\\nFirst few results:\")\n", + "binding_with_dto" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 122760.12it/s]\n", + "Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 35951.18it/s]\n" + ] + }, + { + "data": { + "application/vnd.microsoft.datawrangler.viewer.v0+json": { + "columns": [ + { + "name": "index", + "rawType": "int64", + "type": "integer" + }, + { + "name": "sample_id", + "rawType": "int32", + "type": "integer" + }, + { + "name": "regulator_symbol", + "rawType": "object", + "type": "string" + }, + { + "name": "perturbation_id", + "rawType": "object", + "type": "string" + }, + { + "name": "dto_empirical_pvalue", + "rawType": "float64", + "type": "float" + }, + { + "name": "dataset_id", + "rawType": "object", + "type": "string" + } + ], + "ref": "f666fc22-ce67-46fc-80bb-c44baafdf799", + "rows": [ + [ + "0", + "347", + "FHL1", + "BrentLab/Hackett_2020;hackett_2020;1666", + "0.297", + "BrentLab/harbison_2004/harbison_2004" + ] + ], + "shape": { + "columns": 5, + "rows": 1 + } + }, + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sample_idregulator_symbolperturbation_iddto_empirical_pvaluedataset_id
0347FHL1BrentLab/Hackett_2020;hackett_2020;16660.297BrentLab/harbison_2004/harbison_2004
\n", + "
" + ], + "text/plain": [ + " sample_id regulator_symbol perturbation_id \\\n", + "0 347 FHL1 BrentLab/Hackett_2020;hackett_2020;1666 \n", + "\n", + " dto_empirical_pvalue dataset_id \n", + "0 0.297 BrentLab/harbison_2004/harbison_2004 " + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# You can also filter on comparative dataset fields\n", + "# This returns only binding measurements with significant DTO results\n", + "\n", + "significant_dtos = vdb.query(\n", + " datasets=[(\"BrentLab/harbison_2004\", \"harbison_2004\")],\n", + " filters={\n", + " \"regulator_symbol\": \"FHL1\",\n", + " # the threshold is high here b/c FHL1 didn't have significant results in harbison\n", + " \"dto_empirical_pvalue\": (\"<\", 0.5)\n", + " },\n", + " fields=[\"sample_id\", \"regulator_symbol\", \"target_symbol\", \"perturbation_id\", \"dto_empirical_pvalue\"],\n", + ")\n", + "\n", + "significant_dtos" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "tfbpapi-py3.11", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/virtual_database_concepts.md b/docs/virtual_database_concepts.md new file mode 100644 index 0000000..55fe6c9 --- /dev/null +++ b/docs/virtual_database_concepts.md @@ -0,0 +1,525 @@ +# Virtual Database + +VirtualDB provides a unified query interface across heterogeneous datasets with +different experimental condition structures and terminologies. Each dataset +defines experimental conditions in its own way, with properties stored at +different hierarchy levels (repository, dataset, or field) and using different +naming conventions. VirtualDB uses an external YAML configuration to map these +varying structures to a common schema, normalize factor level names (e.g., +"D-glucose", "dextrose", "glu" all become "glucose"), and enable cross-dataset +queries with standardized field names and values. + +## Configuration Structure + +This is a basic example of a VirtualDB configuration YAML file: + +```yaml +repositories: + # Each repository defines a "table" in the virtual database + BrentLab/harbison_2004: + # REQUIRED: Specify which field is the sample identifier. At this level, it means + # that all datasets have a field `sample_id` that uniquely identifies samples. 
+    sample_id:
+      field: sample_id
+    # Repository-wide properties (apply to all datasets in this repository)
+    nitrogen_source:
+      path: media.nitrogen_source.name
+
+    dataset:
+      # Each dataset gets its own view with standardized fields
+      harbison_2004:
+        # Dataset-specific properties (constant for all samples)
+        phosphate_source:
+          path: media.phosphate_source.compound
+
+        # Field-level properties (vary per sample)
+        carbon_source:
+          field: condition
+          path: media.carbon_source.compound
+          dtype: string  # Optional: specify data type
+
+        # Field without path (column alias with normalization)
+        environmental_condition:
+          field: condition
+
+        # If there is a `comparative_analysis` dataset that you want to link to
+        # a given dataset, you can declare it at the dataset level. For more
+        # information, see the section 'Comparative Datasets in VirtualDB'
+        comparative_analyses:
+          # specify the comparative analysis repo
+          - repo: BrentLab/yeast_comparative_analysis
+            # and dataset
+            dataset: dto
+            # and the field in the comparative analysis that links back to this
+            # dataset. Note that this field should have role `source_sample` and
+            # should therefore be formatted as `repo_id;config_name;sample_id`,
+            # where the sample_id is derived from the field specified for this
+            # dataset in the `sample_id` mapping above.
+            via_field: perturbation_id
+
+  BrentLab/kemmeren_2014:
+    dataset:
+      kemmeren_2014:
+        # REQUIRED: If `sample_id` isn't defined at the repo level, then it must be
+        # defined at the dataset level for each dataset in the repo
+        sample_id:
+          field: sample_id
+        # Same logical fields, different physical paths
+        carbon_source:
+          path: media.carbon_source.compound
+          dtype: string
+        temperature_celsius:
+          path: temperature_celsius
+          dtype: numeric  # Enables numeric filtering with comparison operators
+
+# ===== Normalization Rules =====
+# Map varying terminologies to standardized values
+factor_aliases:
+  carbon_source:
+    glucose: [D-glucose, glu, dextrose]
+    galactose: [D-galactose, gal]
+
+# Handle missing values with defaults
+missing_value_labels:
+  carbon_source: "unspecified"
+
+# ===== Documentation =====
+description:
+  carbon_source: The carbon source provided to the cells during growth
+```
+
+### Property Hierarchy
+
+Properties are extracted at three hierarchy levels:
+
+1. **Repository-wide**: Common to all datasets in a repository
+   - Paths relative to repository-level `experimental_conditions`
+   - Example: `path: media.nitrogen_source.name`
+
+2. **Dataset-specific**: Specific to one dataset configuration
+   - Paths relative to config-level `experimental_conditions`
+   - Example: `path: media.phosphate_source.compound`
+
+3. **Field-level**: Vary per sample, defined in field definitions
+   - `field` specifies which field to extract from
+   - `path` is relative to the field definitions (not `experimental_conditions`)
+   - Example: `field: condition, path: media.carbon_source.compound`
+
+**Special case**: A field without a path creates a column alias.
+- `field: condition` (no path) renames the `condition` column and enables normalization
+
+### Path Resolution
+
+Paths use dot notation to navigate nested structures:
+
+**Repository/Dataset-level** (automatically prepends `experimental_conditions.`):
+- `path: temperature_celsius` → `experimental_conditions.temperature_celsius`
+- `path: media.carbon_source.compound` →
+  `experimental_conditions.media.carbon_source.compound`
+
+**Field-level** (paths relative to field definitions):
+- `field: condition, path: media.carbon_source.compound` → looks in field
+  `condition`'s definitions and navigates to `media.carbon_source.compound`
+
+### Data Type Specifications
+
+Field mappings support an optional `dtype` parameter to ensure proper type handling
+during metadata extraction and query filtering.
+
+**Supported dtypes**:
+
+- `string` - Text data (default if not specified)
+- `numeric` - Numeric values (integers or floating-point numbers)
+- `bool` - Boolean values (true/false)
+
+**When to use dtype**:
+
+1. **Numeric filtering**: Required for fields used with comparison operators
+   (`<`, `>`, `<=`, `>=`, `between`)
+2. **Type consistency**: When source data might otherwise be extracted with the
+   wrong type
+3. **Performance**: Helps with query optimization and prevents type mismatches
+
+**Type conversion process**:
+
+Type conversion happens during metadata extraction:
+
+1. Extract the value from the source using the path
+2. Convert it to the specified dtype, if provided
+3. Store it in the metadata DataFrame with the correct type
+
+**Example - The problem**:
+
+```python
+# Without dtype: temperature extracted as string "30"
+# Comparison fails or produces incorrect results
+df = vdb.query(filters={"temperature_celsius": (">", 25)})
+# String comparison: "30" > 25 evaluates incorrectly
+```
+
+**Example - The solution**:
+
+```yaml
+temperature_celsius:
+  path: temperature_celsius
+  dtype: numeric  # Ensures numeric type for proper comparison
+```
+
+```python
+# With dtype: temperature extracted as numeric 30.0
+# Comparison works correctly
+df = vdb.query(filters={"temperature_celsius": (">", 25)})
+# Numeric comparison: 30.0 > 25 is True (correct!)
+```
+
+**Usage examples**:
+
+```yaml
+repositories:
+  BrentLab/example:
+    dataset:
+      example_dataset:
+        # String field for categorical data
+        strain_background:
+          path: strain_background
+          dtype: string
+
+        # Numeric field for quantitative filtering
+        temperature_celsius:
+          path: temperature_celsius
+          dtype: numeric
+
+        # Numeric field for concentration measurements
+        drug_concentration_um:
+          path: drug_treatment.concentration_um
+          dtype: numeric
+
+        # Boolean field
+        is_heat_shock:
+          path: is_heat_shock
+          dtype: bool
+```
+
+## VirtualDB Structure
+
+VirtualDB maintains a collection of dataset-specific metadata tables, one per
+configured dataset. Each table has the same structure (standardized schema) but
+contains data specific to that dataset.
+
+Unless directed otherwise, these tables are not stored on disk; they are generated
+by querying the source parquet files. Think of them as typical database views.
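+
+Each view's metadata columns are populated by resolving the configured dot-notation
+paths against nested `experimental_conditions` structures. As a rough illustration,
+the sketch below mirrors the documented `get_nested_value` helper; this is a minimal
+sketch assuming a plain nested-dict input, and the actual signature in
+`tfbpapi.virtual_db` may differ:
+
+```python
+from typing import Any
+
+
+def get_nested_value(data: dict[str, Any], path: str, default: Any = None) -> Any:
+    """Walk a nested dict with a dot-notation path, e.g. 'media.carbon_source.compound'."""
+    current: Any = data
+    for key in path.split("."):
+        # Stop and return the default as soon as the path cannot be followed
+        if not isinstance(current, dict) or key not in current:
+            return default
+        current = current[key]
+    return current
+
+
+conditions = {"media": {"carbon_source": {"compound": "D-glucose"}}}
+print(get_nested_value(conditions, "media.carbon_source.compound"))  # D-glucose
+print(get_nested_value(conditions, "media.nitrogen_source.name", "unspecified"))  # unspecified
+```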
+ +### Internal Structure + +```python +{ + # Primary datasets with sample_id + ("BrentLab/harbison_2004", "harbison_2004"): DataFrame( + # Columns: sample_id, carbon_source, temperature_celsius, nitrogen_source, ... + # Values: Normalized according to factor_aliases + # Example rows: + # sample_id carbon_source temperature_celsius nitrogen_source + # harbison_001 glucose 30 yeast nitrogen base + # harbison_002 galactose 30 yeast nitrogen base + ), + + ("BrentLab/kemmeren_2014", "kemmeren_2014"): DataFrame( + # Columns: sample_id, carbon_source, temperature_celsius, ... + # Note: Different physical source paths, same logical schema + # Example rows: + # sample_id carbon_source temperature_celsius + # kemmeren_001 glucose 30 + # kemmeren_002 raffinose 30 + ), + + # Comparative datasets with parsed composite identifiers + ("BrentLab/yeast_comparative_analysis", "dto"): DataFrame( + # Original composite ID columns preserved + # Columns: binding_id, perturbation_id, dto_fdr, dto_empirical_pvalue, ... + # Example rows: + # binding_id perturbation_id dto_fdr + # BrentLab/harbison_2004;harbison_2004;harbison_001 BrentLab/kemmeren_2014;kemmeren_2014;sample_42 0.001 + # BrentLab/harbison_2004;harbison_2004;harbison_002 BrentLab/kemmeren_2014;kemmeren_2014;sample_43 0.045 + # + # When materialized with foreign keys, additional parsed columns are created: + # Columns: binding_id, binding_repo_id, binding_config_name, binding_sample_id, + # perturbation_id, perturbation_repo_id, perturbation_config_name, perturbation_sample_id, + # dto_fdr, dto_empirical_pvalue, ... + # Example rows: + # binding_repo_id binding_config_name binding_sample_id dto_fdr + # BrentLab/harbison_2004 harbison_2004 harbison_001 0.001 + # BrentLab/harbison_2004 harbison_2004 harbison_002 0.045 + ) +} +``` + +### View Materialization + +Tables can be cached for faster subsequent queries via materialization: + +```python +# Cache all views for faster subsequent queries +vdb.materialize_views() + +# Cache specific datasets +vdb.materialize([("BrentLab/harbison_2004", "harbison_2004")]) + +# Invalidate cache (e.g., after data updates) +vdb.invalidate_cache() +vdb.invalidate_cache([("BrentLab/harbison_2004", "harbison_2004")]) +``` + +Materialized views are stored locally and reused for queries. + +## VirtualDB Interface + +### Schema Discovery + +**List all queryable fields**: +```python +from tfbpapi.virtual_db import VirtualDB + +vdb = VirtualDB("config.yaml") + +# All fields defined in any dataset +fields = vdb.get_fields() +# ["carbon_source", "temperature_celsius", "nitrogen_source", "phosphate_source", ...] + +# Fields present in ALL datasets (common fields) +common = vdb.get_common_fields() +# ["carbon_source", "temperature_celsius"] + +# Fields for specific dataset +dataset_fields = vdb.get_fields("BrentLab/harbison_2004", "harbison_2004") +# ["carbon_source", "temperature_celsius", "nitrogen_source", "phosphate_source"] +``` + +**Discover valid values for fields**: +```python +# Unique values across all datasets (normalized) +values = vdb.get_unique_values("carbon_source") +# ["glucose", "galactose", "raffinose", "unspecified"] + +# Values broken down by dataset +values_by_dataset = vdb.get_unique_values("carbon_source", by_dataset=True) +# { +# "BrentLab/harbison_2004": ["glucose", "galactose"], +# "BrentLab/kemmeren_2014": ["glucose", "raffinose"] +# } +``` + +### Querying Data + +The `query()` method is the primary interface for retrieving data from VirtualDB. 
+
+**Basic usage** (sample-level, all fields):
+
+```python
+# Query across all configured datasets
+# Returns one row per sample with all configured fields
+df = vdb.query(filters={"carbon_source": "glucose"})
+# DataFrame: sample_id, carbon_source, temperature_celsius, nitrogen_source, ...
+```
+
+**Query specific datasets**:
+
+```python
+# Limit the query to specific datasets
+df = vdb.query(
+    filters={"carbon_source": "glucose", "temperature_celsius": 30},
+    datasets=[("BrentLab/harbison_2004", "harbison_2004")]
+)
+```
+
+**Select specific fields**:
+
+```python
+# Return only the specified fields
+df = vdb.query(
+    filters={"carbon_source": "glucose"},
+    fields=["sample_id", "carbon_source", "temperature_celsius"]
+)
+# DataFrame: sample_id, carbon_source, temperature_celsius
+```
+
+**Complete data** (measurement-level):
+
+```python
+# Set complete=True to get all measurements, not just sample-level metadata
+# Returns many rows per sample (one per target/feature/coordinate)
+df = vdb.query(
+    filters={"carbon_source": "glucose"},
+    complete=True
+)
+# DataFrame: sample_id, target, value, carbon_source, temperature_celsius, ...
+# For annotated_features: target-level data for all matching samples
+# For genome_map: coordinate-level data for all matching samples
+
+# Can be combined with field selection
+df = vdb.query(
+    filters={"carbon_source": "glucose"},
+    fields=["sample_id", "target", "effect"],
+    complete=True
+)
+# DataFrame: sample_id, target, effect
+```
+
+### Factor Alias Expansion
+
+When querying with aliased values, VirtualDB automatically expands the query to all
+original values specified in the configuration:
+
+```python
+# User queries for the normalized value
+df = vdb.query(filters={"carbon_source": "galactose"})
+
+# Internally expands to all aliases
+# WHERE carbon_source IN ('D-galactose', 'gal', 'galactose')
+```
+
+### Numeric Field Filtering
+
+Numeric fields support exact matching and range queries:
+
+```python
+# Exact match
+df = vdb.query(filters={"temperature_celsius": 30})
+
+# Range query
+df = vdb.query(filters={"temperature_celsius": (">=", 28)})
+
+# Range query with "between"; inclusive of the boundaries, i.e. [28, 32]
+df = vdb.query(filters={"temperature_celsius": ("between", 28, 32)})
+
+# Missing value labels. This is analogous to how factor_aliases work: the query
+# returns rows where temperature_celsius is missing (None/Null/NaN) and/or where
+# the value matches the specified label, in this case "room". If the missing value
+# label is a character value and the field is numeric, only missing values are
+# matched.
+df = vdb.query(filters={"temperature_celsius": "room"})
+# Matches samples where temperature is None/missing
+```
+
+## Comparative Datasets in VirtualDB
+
+Comparative datasets differ from other dataset types in that they represent
+relationships between samples across datasets rather than individual samples.
+Each row relates two or more samples from other datasets.
+
+### Structure
+
+Comparative datasets use `source_sample` fields instead of a single `sample_id`:
+
+- Multiple fields with `role: source_sample`
+- Each contains a composite identifier: `"repo_id;config_name;sample_id"`
+- Example: `binding_id = "BrentLab/harbison_2004;harbison_2004;42"`
+
+### Querying Comparative Data
+
+Comparative datasets can be queried in two ways: **direct queries** for analysis
+results, and **field-based queries** to enrich primary dataset queries with
+comparative metrics.
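+
+Both query modes rely on the composite identifiers described under Structure above.
+As a minimal sketch of how such an identifier might be split into the parsed columns
+shown in the Internal Structure section (a hypothetical helper for illustration; the
+actual parsing inside VirtualDB may differ):
+
+```python
+from typing import NamedTuple
+
+
+class SourceSample(NamedTuple):
+    repo_id: str
+    config_name: str
+    sample_id: str
+
+
+def parse_composite_id(value: str) -> SourceSample:
+    """Split 'repo_id;config_name;sample_id' into its three components."""
+    repo_id, config_name, sample_id = value.split(";", maxsplit=2)
+    return SourceSample(repo_id, config_name, sample_id)
+
+
+print(parse_composite_id("BrentLab/harbison_2004;harbison_2004;42"))
+# SourceSample(repo_id='BrentLab/harbison_2004', config_name='harbison_2004', sample_id='42')
+```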
+ +#### Direct Queries + +Query the comparative dataset directly to find analysis results: + +```python +# Find significant DTO results across all experiments +dto_results = vdb.query( + datasets=[("BrentLab/yeast_comparative_analysis", "dto")], + filters={"dto_fdr": ("<", 0.05)}, + complete=True +) +# Returns: binding_id, perturbation_id, dto_fdr, dto_empirical_pvalue, +# binding_rank_threshold, perturbation_rank_threshold, ... + +# Filter by source dataset +dto_for_harbison = vdb.query( + datasets=[("BrentLab/yeast_comparative_analysis", "dto")], + filters={"binding_id": ("contains", "harbison_2004")}, + complete=True +) + +# Combine filters on both metrics and source samples +high_quality_dto = vdb.query( + datasets=[("BrentLab/yeast_comparative_analysis", "dto")], + filters={ + "dto_fdr": ("<", 0.01), + "binding_id": ("contains", "callingcards") + }, + complete=True +) +``` + +#### Field-based Queries + +```python +# Query binding data, automatically include DTO metrics +binding_with_dto = vdb.query( + datasets=[("BrentLab/callingcards", "annotated_features")], + filters={"regulator_locus_tag": "YJR060W"}, + fields=["sample_id", "target_locus_tag", "binding_score", "dto_fdr"], + complete=True +) +# Returns binding data WITH dto_fdr joined automatically via composite ID + +# Query perturbation data, include derived significance field +perturbation_with_significance = vdb.query( + datasets=[("BrentLab/hackett_2020", "hackett_2020")], + filters={"regulator_locus_tag": "YJR060W"}, + fields=["sample_id", "target_locus_tag", "log2fc", "is_significant"], + complete=True +) +# Returns perturbation data WITH is_significant (computed from dto_fdr < 0.05) +``` + +### Configuration + +Comparative datasets work differently - +**primary datasets declare which comparative datasets reference them**: + +```yaml +repositories: + # Primary dataset (e.g., binding data) + BrentLab/callingcards: + dataset: + annotated_features: + # REQUIRED: Specify which field is the sample identifier + sample_id: + field: sample_id + + # OPTIONAL: Declare comparative analyses that include this dataset + comparative_analyses: + - repo: BrentLab/yeast_comparative_analysis + dataset: dto + via_field: binding_id + # VirtualDB knows composite format: "BrentLab/callingcards;annotated_features;" + + # Regular fields + regulator_locus_tag: + field: regulator_locus_tag + # ... other fields + + # Another primary dataset (e.g., perturbation data) + BrentLab/hu_2007_reimand_2010: + dataset: + data: + sample_id: + field: sample_id + + comparative_analyses: + - repo: BrentLab/yeast_comparative_analysis + dataset: dto + via_field: perturbation_id + + # Regular fields + # ... 
other fields + + # Comparative dataset - OPTIONAL field mappings for renaming/aliasing + BrentLab/yeast_comparative_analysis: + dataset: + dto: + # Optional: Rename fields for clarity or add derived columns + fdr: + field: dto_fdr # Rename dto_fdr to fdr + + empirical_pvalue: + field: dto_empirical_pvalue # Rename for consistency + + is_significant: + # Derived field: computed from dto_fdr + expression: "dto_fdr < 0.05" +``` + +## See Also +- [DataCard Documentation](huggingface_datacard.md) diff --git a/docs/virtual_db.md b/docs/virtual_db.md new file mode 100644 index 0000000..8fe590e --- /dev/null +++ b/docs/virtual_db.md @@ -0,0 +1,21 @@ +# VirtualDB + +::: tfbpapi.virtual_db.VirtualDB + options: + show_root_heading: true + show_source: true + +## Helper Functions + +::: tfbpapi.virtual_db.get_nested_value + options: + show_root_heading: true + +::: tfbpapi.virtual_db.normalize_value + options: + show_root_heading: true + +## Usage + +For comprehensive usage documentation including comparative datasets, see +[Virtual Database Concepts](virtual_database_concepts.md). diff --git a/mkdocs.yml b/mkdocs.yml index a28f581..0635060 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,45 +1,141 @@ site_name: tfbpapi -site_description: "A collection of objects and functions to work with calling cards sequencing tools" +site_description: "Python API for querying and analyzing genomic datasets from HuggingFace Hub" site_author: "ben mueller , chase mateusiak , michael brent " -site_url: "https://brentlab.github.io/tfbpapi/" +site_url: "https://brentlab.github.io/tfbpapi" repo_url: "https://github.com/brentlab/tfbpapi" -repo_name: "tfbpapi" -edit_uri: "edit/master/docs/" +repo_name: "brentlab/tfbpapi" +edit_uri: "edit/main/docs/" watch: ['tfbpapi', 'docs'] theme: name: material + palette: + # Palette toggle for light mode + - media: "(prefers-color-scheme: light)" + scheme: default + primary: indigo + accent: indigo + toggle: + icon: material/brightness-7 + name: Switch to dark mode + # Palette toggle for dark mode + - media: "(prefers-color-scheme: dark)" + scheme: slate + primary: indigo + accent: indigo + toggle: + icon: material/brightness-4 + name: Switch to light mode + features: + - navigation.tabs + - navigation.sections + - navigation.expand + - navigation.path + - navigation.top + - search.highlight + - search.share + - search.suggest + - content.code.copy + - content.code.select + - content.code.annotate + - content.action.edit + - content.action.view + icon: + repo: fontawesome/brands/github + edit: material/pencil + view: material/eye plugins: -- search -- autorefs -- section-index -- mkdocs-jupyter: + - search: + separator: '[\s\-,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])' + - autorefs + - section-index + - mkdocs-jupyter: remove_tag_config: - remove_input_tags: - - hide - remove_output_tags: - - hide -- mkdocstrings: - handlers: - python: - paths: [tfbpapi] # search packages in the src folder - merge_init_into_class: True - options: - docstring_style: 'sphinx' + remove_input_tags: + - hide + remove_output_tags: + - hide + execute: false + allow_errors: false + - mkdocstrings: + handlers: + python: + paths: [.] 
+ inventories: + - https://docs.python.org/3/objects.inv + - https://numpy.org/doc/stable/objects.inv + - https://pandas.pydata.org/docs/objects.inv + options: + docstring_style: sphinx + show_source: true + show_root_heading: true + show_root_toc_entry: true + show_symbol_type_heading: true + show_symbol_type_toc: true + signature_crossrefs: true markdown_extensions: + - abbr + - admonition + - attr_list + - def_list + - footnotes + - md_in_html - smarty + - tables - toc: - permalink: True + permalink: true + title: On this page - sane_lists - pymdownx.arithmatex: generic: true + - pymdownx.betterem: + smart_enable: all + - pymdownx.caret + - pymdownx.details + - pymdownx.emoji: + emoji_generator: !!python/name:material.extensions.emoji.to_svg + emoji_index: !!python/name:material.extensions.emoji.twemoji + - pymdownx.highlight: + anchor_linenums: true + line_spans: __span + pygments_lang_class: true + - pymdownx.inlinehilite + - pymdownx.keys + - pymdownx.magiclink: + normalize_issue_symbols: true + repo_url_shorthand: true + user: brentlab + repo: tfbpapi + - pymdownx.mark + - pymdownx.smartsymbols + - pymdownx.snippets: + auto_append: + - includes/mkdocs.md - pymdownx.superfences: custom_fences: - name: mermaid class: mermaid - format: "!!python/name:pymdownx.superfences.fence_code_format" + format: !!python/name:pymdownx.superfences.fence_code_format + - pymdownx.tabbed: + alternate_style: true + combine_header_slug: true + slugify: !!python/object/apply:pymdownx.slugs.slugify + kwds: + case: lower + - pymdownx.tasklist: + custom_checkbox: true + - pymdownx.tilde + +extra: + social: + - icon: fontawesome/brands/github + link: https://github.com/brentlab/tfbpapi + name: GitHub Repository + version: + provider: mike + default: latest extra_javascript: - javascripts/mathjax.js @@ -47,36 +143,29 @@ extra_javascript: - https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js - js/init-mermaid.js +extra_css: + - stylesheets/extra.css + nav: -- Home: index.md -- Tutorials: - - Database Interface: tutorials/database_interface.ipynb - - LassoCV: tutorials/lassoCV.ipynb - - Interactor Modeling Workflow: tutorials/interactor_modeling_workflow.ipynb -- API: - - Models: - - Overview: ml_models/index.md - - SigmoidModel: ml_models/SigmoidModel.md - - Lasso Modeling: ml_models/lasso_modeling.md - - Database Interface: - - Records Only Classes: - - interface/BindingManualQCAPI.md - - interface/DataSourceAPI.md - - interface/DtoAPI.md - - interface/ExpressionManualQCAPI.md - - interface/FileFormatAPI.md - - interface/GenomicFeatureAPI.md - - interface/RegulatorAPI.md - - Records and Files Classes: - - BindingAPI: interface/BindingAPI.md - - BindingConcatenatedAPI: interface/BindingConcatenatedAPI.md - - CallingCardsBackgroundAPI: interface/CallingCardsBackgroundAPI.md - - ExpressionAPI: interface/ExpressionAPI.md - - PromoterSetAPI: interface/PromoterSetAPI.md - - PromoterSetSigAPI: interface/PromoterSetSigAPI.md - - Developer Classes: - - interface/AbstractAPI.md - - interface/AbstractRecordsAndFilesAPI.md - - interface/AbstractRecordsOnlyAPI.md - - interface/Cache.md - - interface/ParamsDict.md + - Home: index.md + - Tutorials: + - "Getting Started": + - "DataCard: Exploring Datasets": tutorials/datacard_tutorial.ipynb + - "Cache Management": tutorials/cache_manager_tutorial.ipynb + - "Querying Data": + - "VirtualDB: Unified Cross-Dataset Queries": tutorials/virtual_db_tutorial.ipynb + - Concepts: + - "Virtual Database Design": virtual_database_concepts.md + - API Reference: + - Core: + - VirtualDB: 
virtual_db.md + - DataCard: datacard.md + - HfCacheManager: hf_cache_manager.md + - Models and Configuration: + - Pydantic Models: models.md + - Fetchers: fetchers.md + - Error Handling: + - Custom Exceptions: errors.md + - HuggingFace Configuration: + - HuggingFace Dataset Card Format: huggingface_datacard.md + - BrentLab Collection: brentlab_yeastresources_collection.md diff --git a/pyproject.toml b/pyproject.toml index 27e077a..e3710f2 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -9,21 +9,25 @@ readme = "README.md" [tool.poetry.dependencies] python = "^3.11" requests = "^2.32.3" -aiohttp = "^3.11.18" -cachetools = "^5.5.2" -scikit-learn = "^1.6.1" -requests-toolbelt = "^1.0.0" -responses = "^0.25.7" -aioresponses = "^0.7.8" -numpy = "^2.2.5" -dotenv = "^0.9.9" pandas = "^2.3.1" +huggingface-hub = "^0.34.4" +duckdb = "^1.3.2" +pydantic = "^2.11.9" [tool.poetry.group.dev.dependencies] pytest = "^8.3.5" -pytest-snapshot = "^0.9.0" pytest-asyncio = "^0.26.0" +types-requests = "^2.32.4.20250809" +mkdocs = "^1.6.1" +mkdocs-material = "^9.6.19" +mkdocs-autorefs = "^1.4.3" +mkdocs-section-index = "^0.3.10" +mkdocs-jupyter = "^0.25.1" +mkdocstrings = {extras = ["python"], version = "^0.30.0"} +matplotlib = "^3.10.6" +seaborn = "^0.13.2" +types-pyyaml = "^6.0.12.20250915" [tool.pytest.ini_options] diff --git a/tfbpapi/AbstractAPI.py b/tfbpapi/AbstractAPI.py deleted file mode 100644 index 19c4eb6..0000000 --- a/tfbpapi/AbstractAPI.py +++ /dev/null @@ -1,230 +0,0 @@ -import logging -import os -from abc import ABC, abstractmethod -from collections.abc import Coroutine -from typing import Any - -import pandas as pd -import requests # type: ignore - -from tfbpapi.Cache import Cache -from tfbpapi.ParamsDict import ParamsDict - - -class AbstractAPI(ABC): - """ - Abstract base class for creating API clients that require token authentication. - - This class provides a template for connecting to a cache for caching API responses, - validating parameters against a list of valid keys, and provides an interface for - CRUD operations. - - """ - - def __init__( - self, - url: str = "", - token: str = "", - **kwargs, - ): - """ - Initialize the API client. - - :param url: The API endpoint URL. Defaults to the `BASE_URL` - environment variable. - :param token: The authentication token. Defaults to the `TOKEN` - environment variable. - :param valid_param_keys: A list of valid parameter keys for the API. - :param params: A ParamsDict object containing parameters for the API request. - :param cache: a Cache object for caching API responses. - :param kwargs: Additional keyword arguments that may be passed on to the - ParamsDict and Cache constructors. 
- - """ - self.logger = logging.getLogger(self.__class__.__name__) - self._token = token or os.getenv("TOKEN", "") - self.url = url or os.getenv("BASE_URL", "") - self.params = ParamsDict( - params=kwargs.pop("params", {}), - valid_keys=kwargs.pop("valid_keys", []), - ) - self.cache = Cache( - maxsize=kwargs.pop("maxsize", 100), ttl=kwargs.pop("ttl", 300) - ) - - @property - def header(self) -> dict[str, str]: - """The HTTP authorization header.""" - return { - "Authorization": f"token {self.token}", - "Content-Type": "application/json", - } - - @property - def url(self) -> str: - """The URL for the API.""" - return self._url # type: ignore - - @url.setter - def url(self, value: str) -> None: - if not value: - self._url = None - elif hasattr(self, "token") and self.token: - # validate the URL with the new token - self._is_valid_url(value) - self._url = value - else: - self.logger.warning("No token provided: URL un-validated") - self._url = value - - @property - def token(self) -> str: - """The authentication token for the API.""" - return self._token - - @token.setter - def token(self, value: str) -> None: - self._token = value - # validate the URL with the new token - if hasattr(self, "url") and self.url: - self.logger.info("Validating URL with new token") - self._is_valid_url(self.url) - - @property - def cache(self) -> Cache: - """The cache object for caching API responses.""" - return self._cache - - @cache.setter - def cache(self, value: Cache) -> None: - self._cache = value - - @property - def params(self) -> ParamsDict: - """The ParamsDict object containing parameters for the API request.""" - return self._params - - @params.setter - def params(self, value: ParamsDict) -> None: - self._params = value - - def push_params(self, params: dict[str, Any]) -> None: - """Adds or updates parameters in the ParamsDict.""" - try: - self.params.update(params) - except KeyError as e: - self.logger.error(f"Error updating parameters: {e}") - - def pop_params(self, keys: list[str] | None = None) -> None: - """Removes parameters from the ParamsDict.""" - if keys is None: - self.params.clear() - return - if keys is not None and not isinstance(keys, list): - keys = [keys] - for key in keys: - del self.params[key] - - @abstractmethod - def create(self, data: dict[str, Any], **kwargs) -> Any: - """Placeholder for the create method.""" - raise NotImplementedError( - f"`create()` is not implemented for {self.__class__.__name__}" - ) - - @abstractmethod - def read(self, **kwargs) -> Any: - """Placeholder for the read method.""" - raise NotImplementedError( - f"`read()` is not implemented for {self.__class__.__name__}" - ) - - @abstractmethod - def update(self, df: pd.DataFrame, **kwargs) -> Any: - """Placeholder for the update method.""" - raise NotImplementedError( - f"`update()` is not implemented for {self.__class__.__name__}" - ) - - @abstractmethod - def delete(self, id: str, **kwargs) -> Any: - """Placeholder for the delete method.""" - raise NotImplementedError( - f"`delete()` is not implemented for {self.__class__.__name__}" - ) - - @abstractmethod - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - """Placeholder for the submit method.""" - raise NotImplementedError( - f"`submit()` is not implemented for {self.__class__.__name__}" - ) - - @abstractmethod - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Coroutine[Any, Any, Any]: - """Placeholder for the retrieve method.""" - raise NotImplementedError( - f"`retrieve()` is not implemented 
for {self.__class__.__name__}" - ) - - def _is_valid_url(self, url: str) -> None: - """ - Confirms that the URL is valid and the header authorization is appropriate. - - :param url: The URL to validate. - :type url: str - :raises ValueError: If the URL is invalid or the token is not set. - - """ - try: - # note that with allow_redirect=True the result can be a 300 status code - # which is not an error, and then another request to the redirected URL - response = requests.head(url, headers=self.header, allow_redirects=True) - if response.status_code != 200: - raise ValueError("Invalid URL or token provided. Check both.") - except requests.RequestException as e: - raise AttributeError(f"Error validating URL: {e}") from e - except AttributeError as e: - self.logger.error(f"Error validating URL: {e}") - - def _cache_get(self, key: str, default: Any = None) -> Any: - """ - Get a value from the cache if configured. - - :param key: The key to retrieve from the cache. - :type key: str - :param default: The default value to return if the key is not found. - :type default: any, optional - :return: The value from the cache or the default value. - :rtype: any - - """ - return self.cache.get(key, default=default) - - def _cache_set(self, key: str, value: Any) -> None: - """ - Set a value in the cache if configured. - - :param key: The key to set in the cache. - :type key: str - :param value: The value to set in the cache. - :type value: any - - """ - self.cache.set(key, value) - - def _cache_list(self) -> list[str]: - """List keys in the cache if configured.""" - return self.cache.list() - - def _cache_delete(self, key: str) -> None: - """ - Delete a key from the cache if configured. - - :param key: The key to delete from the cache. - :type key: str - - """ - self.cache.delete(key) diff --git a/tfbpapi/AbstractRecordsAndFilesAPI.py b/tfbpapi/AbstractRecordsAndFilesAPI.py deleted file mode 100644 index 87f99ad..0000000 --- a/tfbpapi/AbstractRecordsAndFilesAPI.py +++ /dev/null @@ -1,314 +0,0 @@ -import csv -import gzip -import os -import tarfile -import tempfile -from collections.abc import Callable -from io import BytesIO -from typing import Any - -import aiohttp -import pandas as pd - -from tfbpapi.AbstractAPI import AbstractAPI - - -class AbstractRecordsAndFilesAPI(AbstractAPI): - """ - Abstract class to interact with both the records and the data stored in the `file` - field. - - The return for this class must be records, against the `/export` - endpoint when `retrieve_files` is False. When `retrieve_files` is True, the cache - should be checked first. If the file doesn't exist there, it should be retrieved - from the database against the `/record_table_and_files` endpoint. The file should - be a tarball with the metadata.csv and the file associated with the record, - where the file is named according to the `id` field in metadata.csv. Data files - should be `.csv.gz`. - - """ - - def __init__(self, **kwargs): - """ - Initialize the AbstractRecordsAndFilesAPI object. This will serve as an - interface to an endpoint that can serve both records and files, and cache the - file/retrieve from the cache if it exists. - - :param kwargs: parameters to pass to AbstractAPI. 
- - """ - self.export_url_suffix = kwargs.pop("export_url_suffix", "export") - self.export_files_url_suffix = kwargs.pop( - "export_files_url_suffix", "record_table_and_files" - ) - super().__init__(**kwargs) - - @property - def export_url_suffix(self) -> str: - """The URL suffix for exporting records.""" - return self._export_url_suffix - - @export_url_suffix.setter - def export_url_suffix(self, value: str) -> None: - self._export_url_suffix = value - - @property - def export_files_url_suffix(self) -> str: - """The URL suffix for exporting files.""" - return self._export_files_url_suffix - - @export_files_url_suffix.setter - def export_files_url_suffix(self, value: str) -> None: - self._export_files_url_suffix = value - - def _detect_delimiter(self, file_path: str, sample_size: int = 1024) -> str: - """ - Detect the delimiter of a CSV file. - - :param file_path: The path to the CSV file. - :type file_path: str - :param sample_size: The number of bytes to read from the file to detect the - delimiter. Defaults to 1024. - :type sample_size: int - :return: The delimiter of the CSV file. - :rtype: str - :raises FileNotFoundError: If the file does not exist. - :raises gzip.BadGzipFile: If the file is not a valid gzip file. - :raises _csv.Error: If the CSV sniffer cannot determine the delimiter. - - """ - try: - # by default, open() uses newline=False, which opens the file - # in universal newline mode and translates all new line characters - # to '\n' - file = ( - gzip.open(file_path, "rt") - if file_path.endswith(".gz") - else open(file_path) - ) - except FileNotFoundError as exc: - raise FileNotFoundError(f"File {file_path} not found.") from exc - - sample = file.read(sample_size) - - # In order to avoid errors in the csv sniffer, attempt to find the - # last newline character in the string - last_newline_index = sample.rfind("\n") - # if a newline character is found, trim the sample to the last newline - if last_newline_index != -1: - # Trim to the last complete line - sample = sample[:last_newline_index] - - sniffer = csv.Sniffer() - dialect = sniffer.sniff(sample) - delimiter = dialect.delimiter - - file.close() - - return delimiter - - async def read( - self, - callback: Callable[ - [pd.DataFrame, dict[str, Any] | None, Any], Any - ] = lambda metadata, data, cache, **kwargs: ( - {"metadata": metadata, "data": data} - ), - retrieve_files: bool = False, - **kwargs, - ) -> Any: - """ - Retrieve data from the endpoint according to the `retrieve_files` parameter. If - `retrieve_files` is False, the records will be returned as a dataframe. If - `retrieve_files` is True, the files associated with the records will be - retrieved either from the local cache or from the database. Note that a user can - select which effect_colname and pvalue_colname is used for a genomicfile (see - database documentation for more details). If one or both of those are present in - the params, and retrieve_file is true, then that column name is added to the - cache_key. Eg if record 1 is being retrieved from mcisaac data with - effect_colname "log2_raio", then the cache_key for that data will be - "1_log2_ratio". The default effect colname, which is set by the database, will - be stored with only the record id as the cache_key. - - :param callback: The function to call with the metadata. Signature must - include `metadata`, `data`, and `cache`. - :type callback: Callable[[pd.DataFrame, dict[str, Any] | None, Any], Any] - :param retrieve_files: Boolean. Whether to retrieve the files associated with - the records. 
Defaults to False. - :type retrieve_files: bool - :param kwargs: The following kwargs are used by the read() function. Any - others are passed onto the callback function - - timeout: The timeout for the GET request. Defaults to 120. - - :return: The result of the callback function. - :rtype: Any - - :raises ValueError: If the callback function does not have the correct - signature. - :raises aiohttp.ClientError: If there is an error in the GET request. - :raises pd.errors.ParserError: If there is an error reading the request - - """ - if not callable(callback) or {"metadata", "data", "cache"} - set( - callback.__code__.co_varnames - ): - raise ValueError( - "The callback must be a callable function with `metadata`, `data`, ", - "and `cache` as parameters.", - ) - - export_url = f"{self.url.rstrip('/')}/{self.export_url_suffix}" - self.logger.debug("read() export_url: %s", export_url) - - timeout = aiohttp.ClientTimeout(kwargs.pop("timeout", 120)) - async with aiohttp.ClientSession(timeout=timeout) as session: - try: - async with session.get( - export_url, headers=self.header, params=self.params - ) as response: - response.raise_for_status() - content = await response.content.read() - with gzip.GzipFile(fileobj=BytesIO(content)) as f: - records_df = pd.read_csv(f) - - if not retrieve_files: - return callback(records_df, None, self.cache, **kwargs) - else: - data_list = await self._retrieve_files(session, records_df) - return callback( - records_df, - data_list, - self.cache, - **kwargs, - ) - - except aiohttp.ClientError as e: - self.logger.error(f"Error in GET request: {e}") - raise - except pd.errors.ParserError as e: - self.logger.error(f"Error reading request content: {e}") - raise - - async def _retrieve_files( - self, session: aiohttp.ClientSession, records_df: pd.DataFrame - ) -> dict[str, pd.DataFrame]: - """ - Retrieve files associated with the records either from the local cache or from - the database. - - :param session: The aiohttp ClientSession. - :type session: aiohttp.ClientSession - :param records_df: The DataFrame containing the records. - :type records_df: pd.DataFrame - :return: A dictionary where the keys are record IDs and the values are - DataFrames of the associated files. - :rtype: dict[str, pd.DataFrame] - - """ - data_list = {} - for record_id in records_df["id"]: - data_list[str(record_id)] = await self._retrieve_file(session, record_id) - return data_list - - async def _retrieve_file( - self, session: aiohttp.ClientSession, record_id: int - ) -> pd.DataFrame: - """ - Retrieve a file associated with a record either from the local cache or from the - database. - - :param session: The aiohttp ClientSession. - :type session: aiohttp.ClientSession - :param record_id: The ID of the record. - :type record_id: int - :return: A DataFrame containing the file's data. - :rtype: pd.DataFrame - :raises FileNotFoundError: If the file is not found in the tar archive. - :raises ValueError: If the delimiter is not supported. 
- - """ - export_files_url = f"{self.url.rstrip('/')}/{self.export_files_url_suffix}" - self.logger.debug("_retrieve_file() export_url: %s", export_files_url) - - # set key for local cache - cache_key = str(record_id) - if "effect_colname" in self.params: - cache_key += f"_{self.params['effect_colname']}" - if "pvalue_colname" in self.params: - cache_key += f"_{self.params['pvalue_colname']}" - cached_data = self._cache_get(cache_key) - if cached_data is not None: - self.logger.info(f"cache_key {cache_key} retrieved from cache.") - return pd.read_json(BytesIO(cached_data.encode())) - else: - self.logger.debug(f"cache_key {cache_key} not found in cache.") - - try: - header = self.header.copy() - header["Content-Type"] = "application/gzip" - retrieve_files_params = self.params.copy() - retrieve_files_params.update({"id": record_id}) - async with session.get( - export_files_url, - headers=header, - params=retrieve_files_params, - timeout=120, - ) as response: - response.raise_for_status() - tar_data = await response.read() - - # Create a temporary file for the tarball - tar_file = tempfile.NamedTemporaryFile(delete=False, suffix=".tar.gz") - try: - tar_file.write(tar_data) - tar_file.flush() - tar_file.seek(0) - - # Create a temporary directory for extraction - with tempfile.TemporaryDirectory() as extract_dir: - # Open the tar file and log its contents - with tarfile.open(fileobj=tar_file, mode="r:gz") as tar: - tar_members = tar.getmembers() - self.logger.debug( - f"Tar file contains: " - f"{[member.name for member in tar_members]}", - ) - - # Find the specific file to extract - csv_filename = f"{record_id}.csv.gz" - member = next( - (m for m in tar_members if m.name == csv_filename), None - ) - if member is None: - raise FileNotFoundError( - f"{csv_filename} not found in tar archive" - ) - - # Extract only the specific member - tar.extract(member, path=extract_dir) - - # Read the extracted CSV file - csv_path = os.path.join(extract_dir, csv_filename) - self.logger.debug(f"Extracted file: {csv_path}") - - delimiter = self._detect_delimiter(csv_path) - - # raise an error if the delimiter is not a "," or a "\t" - if delimiter not in [",", "\t"]: - raise ValueError( - f"Delimiter {delimiter} is not supported. " - "Supported delimiters are ',' and '\\t'." - ) - - df = pd.read_csv(csv_path, delimiter=delimiter) - - # Store the data in the cache - self.logger.debug(f"Storing {cache_key} in cache.") - self._cache_set(cache_key, df.to_json()) - finally: - os.unlink(tar_file.name) - - return df - except Exception as e: - self.logger.error(f"Error retrieving file for cache_key {cache_key}: {e}") - raise diff --git a/tfbpapi/AbstractRecordsOnlyAPI.py b/tfbpapi/AbstractRecordsOnlyAPI.py deleted file mode 100644 index 1751ec7..0000000 --- a/tfbpapi/AbstractRecordsOnlyAPI.py +++ /dev/null @@ -1,82 +0,0 @@ -import gzip -import logging -from collections.abc import Callable -from io import BytesIO -from typing import Any - -import aiohttp -import pandas as pd - -from tfbpapi.AbstractAPI import AbstractAPI - - -class AbstractRecordsOnlyAPI(AbstractAPI): - """Abstract class for CRUD operations on records-only (no file storage) - endpoints.""" - - def __init__(self, **kwargs): - """ - Initialize the RecordsOnlyAPI object. - - :param kwargs: Additional parameters to pass to AbstractAPI. 
- - """ - self.logger = logging.getLogger(__name__) - super().__init__(**kwargs) - - async def read( - self, - callback: Callable[ - [pd.DataFrame, dict[str, Any] | None, Any], Any - ] = lambda metadata, data, cache, **kwargs: { - "metadata": metadata, - "data": data, - }, - export_url_suffix="export", - **kwargs, - ) -> Any: - """ - Retrieve data from the endpoint. The data will be returned as a dataframe. The - callback function must take metadata, data, and cache as parameters. - - :param callback: The function to call with the data. Signature must - include `metadata`, `data`, and `cache` as parameters. - :param export_url_suffix: The URL suffix for the export endpoint. This will - return a response object with a csv file. - :param kwargs: This can be used to pass "params" to the request to use in place - of `self.params`. If those are passed, they will be popped off and then - the remaining kwargs will be passed to the callback function - - """ - if not callable(callback) or {"metadata", "data", "cache"} - set( - callback.__code__.co_varnames - ): - raise ValueError( - "The callback must be a callable function with `metadata`,", - "`data`, and `cache` as parameters.", - ) - - export_url = f"{self.url.rstrip('/')}/{export_url_suffix}" - self.logger.debug("read() export_url: %s", export_url) - - async with aiohttp.ClientSession() as session: - try: - # note that the url and the export suffix are joined such that - # the url is stripped of any trailing slashes and the export suffix is - # added without a leading slash - async with session.get( - export_url, - headers=self.header, - params=kwargs.pop("params", self.params), - ) as response: - response.raise_for_status() - content = await response.content.read() - with gzip.GzipFile(fileobj=BytesIO(content)) as f: - records_df = pd.read_csv(f) - return callback(records_df, None, self.cache, **kwargs) - except aiohttp.ClientError as e: - self.logger.error(f"Error in GET request: {e}") - raise - except pd.errors.ParserError as e: - self.logger.error(f"Error reading request content: {e}") - raise diff --git a/tfbpapi/BindingAPI.py b/tfbpapi/BindingAPI.py deleted file mode 100644 index b766b37..0000000 --- a/tfbpapi/BindingAPI.py +++ /dev/null @@ -1,62 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class BindingAPI(AbstractRecordsAndFilesAPI): - """Class to interact with the BindingAPI endpoint.""" - - def __init__(self, **kwargs) -> None: - """ - Initialize the BindingAPI object. - - :param kwargs: parameters to pass through AbstractRecordsAndFilesAPI to - AbstractAPI. 
- - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "regulator", - "regulator_locus_tag", - "regulator_symbol", - "batch", - "replicate", - "source", - "source_name", - "source_orig_id", - "strain", - "condition", - "lab", - "assay", - "workflow", - "data_usable", - ], - ) - - url = kwargs.pop("url", os.getenv("BINDING_URL", None)) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The BindingAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The BindingAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The BindingAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The BindingAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The BindingAPI does not support retrieve.") diff --git a/tfbpapi/BindingConcatenatedAPI.py b/tfbpapi/BindingConcatenatedAPI.py deleted file mode 100644 index 1ad6aff..0000000 --- a/tfbpapi/BindingConcatenatedAPI.py +++ /dev/null @@ -1,62 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class BindingConcatenatedAPI(AbstractRecordsAndFilesAPI): - """Class to interact with the BindingConcatenatedAPI endpoint.""" - - def __init__(self, **kwargs) -> None: - """ - Initialize the BindingConcatenatedAPI object. - - :param kwargs: parameters to pass through AbstractRecordsAndFilesAPI to - AbstractAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "regulator", - "regulator_locus_tag", - "regulator_symbol", - "batch", - "replicate", - "source", - "strain", - "condition", - "lab", - "assay", - "workflow", - "data_usable", - ], - ) - - url = kwargs.pop("url", os.getenv("BINDINGCONCATENATED_URL", None)) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The BindingConcatenatedAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The BindingConcatenatedAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The BindingConcatenatedAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The BindingConcatenatedAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError( - "The BindingConcatenatedAPI does not support retrieve." - ) diff --git a/tfbpapi/BindingManualQCAPI.py b/tfbpapi/BindingManualQCAPI.py deleted file mode 100644 index df4169b..0000000 --- a/tfbpapi/BindingManualQCAPI.py +++ /dev/null @@ -1,106 +0,0 @@ -import os -from typing import Any - -import pandas as pd -import requests # type: ignore - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class BindingManualQCAPI(AbstractRecordsOnlyAPI): - """A class to interact with the BindingManualQCAPI endpoint.""" - - def __init__(self, **kwargs): - """ - Initialize the BindingManualQCAPI object. 
- - :param kwargs: parameters to pass to AbstractAPI via AbstractRecordsOnlyAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "binding", - "best_datatype", - "data_usable", - "passing_replicate", - "rank_recall", - "regulator", - "regulator_locus_tag", - "regulator_symbol", - "batch", - "source", - ], - ) - - url = kwargs.pop("url", os.getenv("BINDINGMANUALQC_URL", None)) - if not url: - raise AttributeError( - "url must be provided or the environmental variable ", - "`BINDINGMANUALQC_URL` must be set", - ) - - self.bulk_update_url_suffix = kwargs.pop( - "bulk_update_url_suffix", "bulk-update" - ) - - super().__init__(url=url, valid_param_keys=valid_param_keys, **kwargs) - - @property - def bulk_update_url_suffix(self) -> str: - """The URL suffix for updating multiple records in the same request.""" - return self._bulk_update_url_suffix - - @bulk_update_url_suffix.setter - def bulk_update_url_suffix(self, value: str) -> None: - self._bulk_update_url_suffix = value - - def update(self, df: pd.DataFrame, **kwargs: Any) -> requests.Response: - """ - Update the records in the database. - - :param df: The DataFrame containing the records to update. - :type df: pd.DataFrame - :param kwargs: Additional fields to include in the payload. - :type kwargs: Any - :return: The response from the POST request. - :rtype: requests.Response - :raises requests.RequestException: If the request fails. - - """ - bulk_update_url = ( - f"{self.url.rstrip('/')}/{self.bulk_update_url_suffix.rstrip('/')}/" - ) - - self.logger.debug("bulk_update_url: %s", bulk_update_url) - - # Include additional fields in the payload if provided - payload = {"data": df.to_dict(orient="records")} - payload.update(kwargs) - - try: - response = requests.post( - bulk_update_url, - headers=self.header, - json=payload, - ) - response.raise_for_status() - return response - except requests.RequestException as e: - self.logger.error(f"Error in POST request: {e}") - raise - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The BindingManualQCAPI does not support create.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The BindingManualQCAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The BindingManualQCAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The BindingManualQCAPI does not support retrieve.") diff --git a/tfbpapi/Cache.py b/tfbpapi/Cache.py deleted file mode 100644 index 366604d..0000000 --- a/tfbpapi/Cache.py +++ /dev/null @@ -1,29 +0,0 @@ -import logging -from typing import Any - -from cachetools import TTLCache # type: ignore - - -class Cache: - """A caching class that uses cachetools for TTL caching with an LRU eviction - policy.""" - - def __init__(self, maxsize: int = 100, ttl: int = 300): - self.ttl_cache = TTLCache(maxsize=maxsize, ttl=ttl) - self.logger = logging.getLogger(__name__) - - def get(self, key: str, default: Any = None) -> Any: - """Get a value from the cache.""" - return self.ttl_cache.get(key, default) - - def set(self, key: str, value: Any) -> None: - """Set a value in the cache.""" - self.ttl_cache[key] = value - - def list(self) -> list[str]: - """List all keys in the cache.""" - return list(self.ttl_cache.keys()) - - def delete(self, key: str) -> None: - """Delete a key from the cache.""" - self.ttl_cache.pop(key, None) 
diff --git a/tfbpapi/CallingCardsBackgroundAPI.py b/tfbpapi/CallingCardsBackgroundAPI.py deleted file mode 100644 index f5b7668..0000000 --- a/tfbpapi/CallingCardsBackgroundAPI.py +++ /dev/null @@ -1,56 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class CallingCardsBackgroundAPI(AbstractRecordsAndFilesAPI): - """Class to interact with the CallingCardsBackgroundAPI endpoint.""" - - def __init__(self, **kwargs) -> None: - """ - Initialize the CallingCardsBackgroundAPI object. - - :param kwargs: parameters to pass through AbstractRecordsAndFilesAPI to - AbstractAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - ["id", "name"], - ) - - url = kwargs.pop("url", os.getenv("CALLINGCARDSBACKGROUND_URL", None)) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError( - "The CallingCardsBackgroundAPI does not support create." - ) - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError( - "The CallingCardsBackgroundAPI does not support update." - ) - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError( - "The CallingCardsBackgroundAPI does not support delete." - ) - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError( - "The CallingCardsBackgroundAPI does not support submit." - ) - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError( - "The CallingCardsBackgroundAPI does not support retrieve." - ) diff --git a/tfbpapi/DataSourceAPI.py b/tfbpapi/DataSourceAPI.py deleted file mode 100644 index 0d00785..0000000 --- a/tfbpapi/DataSourceAPI.py +++ /dev/null @@ -1,48 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class DataSourceAPI(AbstractRecordsOnlyAPI): - """A class to interact with the DataSourceAPI endpoint.""" - - def __init__(self, **kwargs): - """ - Initialize the DataSourceAPI object. - - :param kwargs: parameters to pass to AbstractAPI via AbstractRecordsOnlyAPI. 
- - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - ["id", "fileformat_id", "fileformat", "lab", "assay", "workflow"], - ) - - url = kwargs.pop("url", os.getenv("DATASOURCE_URL", None)) - if not url: - raise AttributeError( - "url must be provided or the environmental variable ", - "`DATASOURCE_URL` must be set", - ) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The DataSourceAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The DataSourceAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The DataSourceAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The DataSourceAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The DataSourceAPI does not support retrieve.") diff --git a/tfbpapi/DtoAPI.py b/tfbpapi/DtoAPI.py deleted file mode 100644 index bc8d404..0000000 --- a/tfbpapi/DtoAPI.py +++ /dev/null @@ -1,295 +0,0 @@ -import asyncio -import json -import os -import time -from typing import Any - -import aiohttp -import pandas as pd -import requests # type: ignore - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class DtoAPI(AbstractRecordsOnlyAPI): - """ - A class to interact with the DTO API. - - Retrieves dto data from the database. - - """ - - def __init__(self, **kwargs) -> None: - """ - Initialize the DTO object. This will serve as an interface to the DTO endpoint - of both the database and the application cache. - - :param url: The URL of the DTO API - :param kwargs: Additional parameters to pass to AbstractAPI. - - """ - - self.bulk_update_url_suffix = kwargs.pop( - "bulk_update_url_suffix", "bulk-update" - ) - - super().__init__( - url=kwargs.pop("url", os.getenv("DTO_URL", "")), - **kwargs, - ) - - async def read(self, *args, **kwargs) -> Any: - """ - Override the read() method to use a custom callback that parses metadata. - - :param callback: The function to call with the metadata. Defaults to parsing - metadata. - :type callback: Callable[[pd.DataFrame, dict[str, Any] | None, Any], Any] - :return: The result of the callback function. - :rtype: Any - - """ - - # Define the default callback - def dto_callback(metadata, data, cache, **kwargs): - return {"metadata": self.parse_metadata(metadata), "data": data} - - # Explicitly set the callback argument to dto_callback - kwargs["callback"] = dto_callback - - # Call the superclass method with updated kwargs - return await super().read(*args, **kwargs) - - async def submit( - self, - post_dict: dict[str, Any], - **kwargs, - ) -> Any: - """ - Submit a DTO task to the DTO API. - - :param post_dict: The dictionary to submit to the DTO API. The typing needs to - be adjusted -- it can take a list of dictionaries to submit a batch. - :return: The group_task_id of the submitted task. 
- - """ - # make a post request with the post_dict to dto_url - dto_url = f"{self.url.rstrip('/')}/submit/" - self.logger.debug("dto_url: %s", dto_url) - - async with aiohttp.ClientSession() as session: - async with session.post( - dto_url, headers=self.header, json=post_dict - ) as response: - try: - response.raise_for_status() - except aiohttp.ClientResponseError as e: - self.logger.error( - "Failed to submit DTO task: Status %s, Reason %s", - e.status, - e.message, - ) - raise - result = await response.json() - try: - return result["group_task_id"] - except KeyError: - self.logger.error( - "Expected 'group_task_id' in response: %s", json.dumps(result) - ) - raise - - async def retrieve( - self, - group_task_id: str, - timeout: int = 300, - polling_interval: int = 2, - **kwargs, - ) -> dict[str, pd.DataFrame]: - """ - Periodically check the task status and retrieve the result when the task - completes. - - :param group_task_id: The task ID to retrieve results for. - :param timeout: The maximum time to wait for the task to complete (in seconds). - :param polling_interval: The time to wait between status checks (in seconds). - :return: Records from the DTO API of the successfully completed task. - - """ - # Start time for timeout check - start_time = time.time() - - # Task status URL - status_url = f"{self.url.rstrip('/')}/status/" - - while True: - async with aiohttp.ClientSession() as session: - # Send a GET request to check the task status - async with session.get( - status_url, - headers=self.header, - params={"group_task_id": group_task_id}, - ) as response: - response.raise_for_status() # Raise an error for bad status codes - status_response = await response.json() - - # Check if the task is complete - if status_response.get("status") == "SUCCESS": - - if error_tasks := status_response.get("error_tasks"): - self.logger.error( - f"Tasks {group_task_id} failed: {error_tasks}" - ) - if success_tasks := status_response.get("success_pks"): - params = {"id": ",".join(str(pk) for pk in success_tasks)} - return await self.read(params=params) - elif status_response.get("status") == "FAILURE": - raise Exception( - f"Task {group_task_id} failed: {status_response}" - ) - - # Check if we have reached the timeout - elapsed_time = time.time() - start_time - if elapsed_time > timeout: - raise TimeoutError( - f"Task {group_task_id} did not " - "complete within {timeout} seconds." - ) - - # Wait for the specified polling interval before checking again - await asyncio.sleep(polling_interval) - - def create(self, data: dict[str, Any], **kwargs) -> requests.Response: - raise NotImplementedError("The DTO does not support create.") - - def update(self, df: pd.DataFrame, **kwargs: Any) -> requests.Response: - """ - Update the records in the database. - - :param df: The DataFrame containing the records to update. - :type df: pd.DataFrame - :param kwargs: Additional fields to include in the payload. - :type kwargs: Any - :return: The response from the POST request. - :rtype: requests.Response - :raises requests.RequestException: If the request fails. 
- - """ - bulk_update_url = ( - f"{self.url.rstrip('/')}/{self.bulk_update_url_suffix.rstrip('/')}/" - ) - - self.logger.debug("bulk_update_url: %s", bulk_update_url) - - # Include additional fields in the payload if provided - payload = {"data": df.to_dict(orient="records")} - payload.update(kwargs) - - try: - response = requests.post( - bulk_update_url, - headers=self.header, - json=payload, - ) - response.raise_for_status() - return response - except requests.RequestException as e: - self.logger.error(f"Error in POST request: {e}") - raise - - def delete(self, id: str, **kwargs) -> Any: - """ - Delete a DTO record from the database. - - :param id: The ID of the DTO record to delete. - :return: A dictionary with a status message indicating success or failure. - - """ - # Include the Authorization header with the token - headers = kwargs.get("headers", {}) - headers["Authorization"] = f"Token {self.token}" - - # Make the DELETE request with the updated headers - response = requests.delete(f"{self.url}/{id}/", headers=headers, **kwargs) - - if response.status_code == 204: - return {"status": "success", "message": "DTO deleted successfully."} - - # Raise an error if the response indicates failure - response.raise_for_status() - - def parse_metadata(self, metadata: pd.DataFrame) -> pd.DataFrame: - """ - Parse the metadata from the DTO API. - - :param metadata: The metadata DataFrame to parse. - :return: The parsed metadata DataFrame. - :raises KeyError: If the metadata DataFrame is missing required columns. - - """ - if metadata.empty: - self.logger.warning("Metadata is empty") - return metadata - - output_columns = [ - "id", - "promotersetsig", - "expression", - "regulator_symbol", - "binding_source", - "expression_source", - "passing_fdr", - "passing_pvalue", - ] - - # required columns are "result" and output_columns - missing_req_columns = [ - col for col in ["result"] + output_columns if col not in metadata.columns - ] - if missing_req_columns: - raise KeyError( - "Metadata is missing required columns: " - "{', '.join(missing_req_columns)}" - ) - - dto_results_list = [] - - # Check and rename keys, logging a warning if a key is missing - keys_to_rename = { - "rank1": "binding_rank_threshold", - "rank2": "perturbation_rank_threshold", - "set1_len": "binding_set_size", - "set2_len": "perturbation_set_size", - } - - for _, row in metadata.iterrows(): - dto_results = json.loads(row.result.replace("'", '"')) - - for old_key, new_key in keys_to_rename.items(): - if old_key in dto_results: - dto_results[new_key] = dto_results.pop(old_key) - else: - self.logger.warning( - f"Key '{old_key}' missing in row with id '{row.id}'." 
- ) - - dto_results["id"] = row.id - dto_results["promotersetsig"] = row.promotersetsig - dto_results["expression"] = row.expression - dto_results["regulator_symbol"] = row.regulator_symbol - dto_results["binding_source"] = row.binding_source - dto_results["expression_source"] = row.expression_source - dto_results["passing_fdr"] = row.passing_fdr - dto_results["passing_pvalue"] = row.passing_pvalue - - dto_results_list.append(dto_results) - - # Create DataFrame - result_df = pd.DataFrame(dto_results_list) - - # Reorder columns: output_columns first, followed by others - reordered_columns = output_columns + [ - col for col in result_df.columns if col not in output_columns - ] - - return result_df.loc[:, reordered_columns] diff --git a/tfbpapi/ExpressionAPI.py b/tfbpapi/ExpressionAPI.py deleted file mode 100644 index c61e1f7..0000000 --- a/tfbpapi/ExpressionAPI.py +++ /dev/null @@ -1,66 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class ExpressionAPI(AbstractRecordsAndFilesAPI): - """Class to interact with the ExpressionAPI endpoint.""" - - def __init__(self, **kwargs) -> None: - """ - Initialize the ExpressionAPI object. - - :param kwargs: parameters to pass through AbstractRecordsAndFilesAPI to - AbstractAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "regulator", - "regulator_locus_tag", - "regulator_symbol", - "batch", - "control", - "mechanism", - "restriction", - "time", - "strain", - "source", - "source_name", - "source_time", - "lab", - "assay", - "workflow", - "effect_colname", - "pvalue_colname", - "preferred_replicate", - ], - ) - - url = kwargs.pop("url", os.getenv("EXPRESSION_URL", None)) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The ExpressionAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The ExpressionAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The ExpressionAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The ExpressionAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The ExpressionAPI does not support retrieve.") diff --git a/tfbpapi/ExpressionManualQCAPI.py b/tfbpapi/ExpressionManualQCAPI.py deleted file mode 100644 index 80023e6..0000000 --- a/tfbpapi/ExpressionManualQCAPI.py +++ /dev/null @@ -1,103 +0,0 @@ -import os -from typing import Any - -import pandas as pd -import requests # type: ignore - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class ExpressionManualQCAPI(AbstractRecordsOnlyAPI): - """A class to interact with the ExpressionManualQCAPI endpoint.""" - - def __init__(self, **kwargs): - """ - Initialize the ExpressionManualQCAPI object. - - :param kwargs: parameters to pass to AbstractAPI via AbstractRecordsOnlyAPI. 
- - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "expression", - "strain_verified", - "regulator_locus_tag", - "regulator_symbol", - "batch", - "replicate", - "control", - "mechanism", - "restriction", - "time", - "source", - "lab", - "assay", - "workflow", - ], - ) - - url = kwargs.pop("url", os.getenv("EXPRESSIONMANUALQC_URL", None)) - if not url: - raise AttributeError( - "url must be provided or the environmental variable ", - "`EXPRESSIONMANUALQC_URL` must be set", - ) - - self.bulk_update_url_suffix = kwargs.pop( - "bulk_update_url_suffix", "bulk-update" - ) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The ExpressionManualQCAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs: Any) -> requests.Response: - """ - Update the records in the database. - - :param df: The DataFrame containing the records to update. - :type df: pd.DataFrame - :param kwargs: Additional fields to include in the payload. - :type kwargs: Any - :return: The response from the POST request. - :rtype: requests.Response - :raises requests.RequestException: If the request fails. - - """ - bulk_update_url = ( - f"{self.url.rstrip('/')}/{self.bulk_update_url_suffix.rstrip('/')}/" - ) - - self.logger.debug("bulk_update_url: %s", bulk_update_url) - - # Include additional fields in the payload if provided - payload = {"data": df.to_dict(orient="records")} - payload.update(kwargs) - - try: - response = requests.post( - bulk_update_url, - headers=self.header, - json=payload, - ) - response.raise_for_status() - return response - except requests.RequestException as e: - self.logger.error(f"Error in POST request: {e}") - raise - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The ExpressionManualQCAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The ExpressionManualQCAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError( - "The ExpressionManualQCAPI does not support retrieve." - ) diff --git a/tfbpapi/FileFormatAPI.py b/tfbpapi/FileFormatAPI.py deleted file mode 100644 index bccdcc1..0000000 --- a/tfbpapi/FileFormatAPI.py +++ /dev/null @@ -1,57 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class FileFormatAPI(AbstractRecordsOnlyAPI): - """A class to interact with the FileFormatAPI endpoint.""" - - def __init__(self, **kwargs): - """ - Initialize the FileFormatAPI object. - - :param kwargs: parameters to pass to AbstractAPI via AbstractRecordsOnlyAPI. 
- - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "fileformat", - "fields", - "separator", - "feature_identifier_col", - "effect_col", - "default_effect_threshold", - "pval_col", - "default_pvalue_threshold", - ], - ) - - url = kwargs.pop("url", os.getenv("FILEFORMAT_URL", None)) - if not url: - raise AttributeError( - "url must be provided or the environmental variable ", - "`FILEFORMAT_URL` must be set", - ) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The FileFormatAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The FileFormatAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The FileFormatAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The FileFormatAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The FileFormatAPI does not support retrieve.") diff --git a/tfbpapi/GenomicFeatureAPI.py b/tfbpapi/GenomicFeatureAPI.py deleted file mode 100644 index 499cb6c..0000000 --- a/tfbpapi/GenomicFeatureAPI.py +++ /dev/null @@ -1,60 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class GenomicFeatureAPI(AbstractRecordsOnlyAPI): - """A class to interact with the GenomicFeatureAPI endpoint.""" - - def __init__(self, **kwargs): - """ - Initialize the GenomicFeatureAPI object. - - :param kwargs: parameters to pass to AbstractAPI via AbstractRecordsOnlyAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "chr", - "start", - "end", - "strand", - "type", - "locus_tag", - "symbol", - "source", - "alias", - "note", - ], - ) - - url = kwargs.pop("url", os.getenv("GENOMICFEATURE_URL", None)) - if not url: - raise AttributeError( - "url must be provided or the environmental variable ", - "`GENOMICFEATURE_URL` must be set", - ) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The GenomicFeatureAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The GenomicFeatureAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The GenomicFeatureAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The GenomicFeatureAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The GenomicFeatureAPI does not support retrieve.") diff --git a/tfbpapi/ParamsDict.py b/tfbpapi/ParamsDict.py deleted file mode 100644 index 19f7470..0000000 --- a/tfbpapi/ParamsDict.py +++ /dev/null @@ -1,156 +0,0 @@ -from typing import Any, Union - - -class ParamsDict(dict): - """ - A dictionary subclass that ensures all keys are strings and supports multiple key- - value assignments at once, with validation against a list of valid keys. 
- - This class is designed to be used for passing parameters to HTTP requests and - extends the base dictionary class, ensuring that insertion order is preserved. - - """ - - def __init__(self, params: dict[str, Any] = {}, valid_keys: list[str] = []) -> None: - """ - Initialize the ParamsDict with optional initial parameters and valid keys. - - :param params: A dictionary of initial parameters. All keys must be strings. - :type params: dict, optional - :param valid_keys: A list of valid keys for validation. - :type valid_keys: list of str, optional - :raises ValueError: If `params` is not a dictionary or if any of the keys - are not strings. - - """ - params = params or {} - valid_keys = valid_keys or [] - if not isinstance(params, dict): - raise ValueError("params must be a dictionary") - if len(params) > 0 and not all(isinstance(k, str) for k in params.keys()): - raise ValueError("params must be a dictionary with string keys") - super().__init__(params) - self._valid_keys = valid_keys - - def __setitem__(self, key: str | list[str], value: Any | list[Any]) -> None: - """ - Set a parameter value or multiple parameter values. - - :param key: The parameter key or a list of parameter keys. - :type key: str or list of str - :param value: The parameter value or a list of parameter values. - :type value: any or list of any - :raises ValueError: If the length of `key` and `value` lists do not match. - :raises KeyError: If `key` is not a string or a list of strings. - - """ - if isinstance(key, str): - self._validate_key(key) - super().__setitem__(key, value) - elif isinstance(key, list) and isinstance(value, list): - if len(key) != len(value): - raise ValueError("Length of keys and values must match") - for k, v in zip(key, value): - if not isinstance(k, str): - raise KeyError("All keys must be strings") - self._validate_key(k) - super().__setitem__(k, v) - else: - raise KeyError("Key must be a string or list of strings") - - def __getitem__(self, key: str | list[str]) -> Union[Any, "ParamsDict"]: - """ - Get a parameter value or a new ParamsDict with specified keys. - - :param key: The parameter key or a list of parameter keys. - :type key: str or list of str - :return: The parameter value or a new ParamsDict with the specified keys. - :rtype: any or ParamsDict - :raises KeyError: If `key` is not a string or a list of strings. - - """ - if isinstance(key, str): - return super().__getitem__(key) - elif isinstance(key, list): - return ParamsDict({k: dict.__getitem__(self, k) for k in key if k in self}) - else: - raise KeyError("Key must be a string or list of strings") - - def __delitem__(self, key: str) -> None: - """ - Delete a parameter by key. - - :param key: The parameter key. - :type key: str - :raises KeyError: If `key` is not a string. - - """ - if isinstance(key, str): - super().__delitem__(key) - else: - raise KeyError("Key must be a string") - - def __repr__(self) -> str: - """ - Return a string representation of the ParamsDict. - - :return: A string representation of the ParamsDict. - :rtype: str - - """ - return f"ParamsDict({super().__repr__()})" - - def __str__(self) -> str: - """ - Return a human-readable string representation of the ParamsDict. - - :return: A human-readable string representation of the ParamsDict. 
- :rtype: str - - """ - return ", ".join(f"{k}: {v}" for k, v in self.items()) - - def update(self, *args, **kwargs) -> None: - """Update the ParamsDict with the key/value pairs from other, overwriting - existing keys.""" - if args: - other = args[0] - if isinstance(other, dict): - [self._validate_key(k) for k in other.keys()] - for key, value in other.items(): - self.__setitem__(key, value) - else: - [self._validate_key(k) for k, _ in other] - for key, value in other: - self.__setitem__(key, value) - [self._validate_key(k) for k in kwargs.keys()] - for key, value in kwargs.items(): - self.__setitem__(key, value) - - def as_dict(self) -> dict: - """ - Convert the ParamsDict to a standard dictionary. - - :return: A standard dictionary with the same items as the ParamsDict. - :rtype: dict - - """ - return dict(self) - - def _validate_key(self, key: str) -> bool: - """Validate that the key is in the list of valid keys.""" - if self._valid_keys and key not in self._valid_keys: - raise KeyError(f"Invalid parameter key provided: {key}") - return True - - @property - def valid_keys(self) -> list[str]: - """Get the list of valid keys.""" - return self._valid_keys - - @valid_keys.setter - def valid_keys(self, keys: list[str]) -> None: - """Set the list of valid keys.""" - if not all(isinstance(k, str) for k in keys): - raise ValueError("valid_keys must be a list of strings") - self._valid_keys = keys diff --git a/tfbpapi/PromoterSetAPI.py b/tfbpapi/PromoterSetAPI.py deleted file mode 100644 index f747497..0000000 --- a/tfbpapi/PromoterSetAPI.py +++ /dev/null @@ -1,46 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class PromoterSetAPI(AbstractRecordsAndFilesAPI): - """Class to interact with the PromoterSetAPI endpoint.""" - - def __init__(self, **kwargs) -> None: - """ - Initialize the PromoterSetAPI object. - - :param kwargs: parameters to pass through AbstractRecordsAndFilesAPI to - AbstractAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - ["id", "name"], - ) - - url = kwargs.pop("url", os.getenv("PROMOTERSET_URL", None)) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The PromoterSetAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The PromoterSetAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The PromoterSetAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The PromoterSetAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The PromoterSetAPI does not support retrieve.") diff --git a/tfbpapi/PromoterSetSigAPI.py b/tfbpapi/PromoterSetSigAPI.py deleted file mode 100644 index e75609e..0000000 --- a/tfbpapi/PromoterSetSigAPI.py +++ /dev/null @@ -1,68 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class PromoterSetSigAPI(AbstractRecordsAndFilesAPI): - """Class to interact with the PromoterSetSigAPI endpoint.""" - - def __init__(self, **kwargs) -> None: - """ - Initialize the PromoterSetSigAPI object. 
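# A short sketch of the ParamsDict semantics defined above: keys are checked
# against valid_keys, paired lists assign several keys at once, and indexing
# with a list of keys returns a new ParamsDict subset.
from tfbpapi.ParamsDict import ParamsDict

params = ParamsDict({"id": 1}, valid_keys=["id", "name", "lab"])
params[["name", "lab"]] = ["cbf1", "brent"]  # paired multi-key assignment
subset = params[["id", "name"]]  # ParamsDict({'id': 1, 'name': 'cbf1'})
# params["bogus"] = 1            # would raise KeyError: invalid parameter key
print(subset.as_dict())          # {'id': 1, 'name': 'cbf1'}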
- - :param kwargs: parameters to pass through AbstractRecordsAndFilesAPI to - AbstractAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "single_binding", - "composite_binding", - "promoter", - "promoter_name", - "background", - "background_name", - "regulator_locus_tag", - "regulator_symbol", - "batch", - "replicate", - "source", - "source_name", - "lab", - "assay", - "workflow", - "data_usable", - "aggregated", - "condition", - "deduplicate", - "preferred_replicate", - ], - ) - - url = kwargs.pop("url", os.getenv("PROMOTERSETSIG_URL", None)) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The PromoterSetSigAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The PromoterSetSigAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The PromoterSetSigAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The PromoterSetSigAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The PromoterSetSigAPI does not support retrieve.") diff --git a/tfbpapi/RankResponseAPI.py b/tfbpapi/RankResponseAPI.py deleted file mode 100644 index 6ed3330..0000000 --- a/tfbpapi/RankResponseAPI.py +++ /dev/null @@ -1,286 +0,0 @@ -import asyncio -import json -import os -import tarfile -import tempfile -import time -from typing import Any - -import aiohttp -import pandas as pd -from requests import Response, delete, post # type: ignore -from requests_toolbelt import MultipartEncoder - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - - -class RankResponseAPI(AbstractRecordsAndFilesAPI): - """ - A class to interact with the Rank Response API. - - Retrieves rank response data from the database. - - """ - - def __init__(self, **kwargs) -> None: - """ - Initialize the RankResponseAPI object. This will serve as an interface to the - RankResponse endpoint of both the database and the application cache. - - :param url: The URL of the Rank Response API - :param kwargs: Additional parameters to pass to AbstractAPI. - - """ - super().__init__( - url=kwargs.pop("url", os.getenv("RANKRESPONSE_URL", "")), - **kwargs, - ) - - async def submit( - self, - post_dict: dict[str, Any], - **kwargs, - ) -> Any: - # make a post request with the post_dict to rankresponse_url - rankresponse_url = f"{self.url.rstrip('/')}/submit/" - self.logger.debug("rankresponse_url: %s", rankresponse_url) - - async with aiohttp.ClientSession() as session: - async with session.post( - rankresponse_url, headers=self.header, json=post_dict - ) as response: - response.raise_for_status() - result = await response.json() - try: - return result["group_task_id"] - except KeyError: - self.logger.error( - "Expected 'group_task_id' in response: %s", json.dumps(result) - ) - raise - - async def retrieve( - self, - group_task_id: str, - timeout: int = 300, - polling_interval: int = 2, - **kwargs, - ) -> dict[str, pd.DataFrame]: - """ - Periodically check the task status and retrieve the result when the task - completes. - - :param group_task_id: The task ID to retrieve results for. - :param timeout: The maximum time to wait for the task to complete (in seconds). 
- :param polling_interval: The time to wait between status checks (in seconds). - :return: Extracted files from the result tarball. - - """ - # Start time for timeout check - start_time = time.time() - - # Task status URL - status_url = f"{self.url.rstrip('/')}/status/" - - while True: - async with aiohttp.ClientSession() as session: - # Send a GET request to check the task status - async with session.get( - status_url, - headers=self.header, - params={"group_task_id": group_task_id}, - ) as response: - response.raise_for_status() # Raise an error for bad status codes - status_response = await response.json() - - # Check if the task is complete - if status_response.get("status") == "SUCCESS": - # Fetch and return the tarball - return await self._download_result(group_task_id) - elif status_response.get("status") == "FAILURE": - raise Exception( - f"Task {group_task_id} failed: {status_response}" - ) - - # Check if we have reached the timeout - elapsed_time = time.time() - start_time - if elapsed_time > timeout: - raise TimeoutError( - f"Task {group_task_id} did not " - "complete within {timeout} seconds." - ) - - # Wait for the specified polling interval before checking again - await asyncio.sleep(polling_interval) - - async def _download_result(self, group_task_id: str) -> Any: - """ - Download the result tarball after the task is successful. - - :param group_task_id: The group_task_id to download the results for. - :return: Extracted metadata and data from the tarball. - - """ - download_url = f"{self.url.rstrip('/')}/retrieve_task/" - - async with aiohttp.ClientSession() as session: - async with session.get( - download_url, - headers=self.header, - params={"group_task_id": group_task_id}, - ) as response: - response.raise_for_status() # Ensure request was successful - tar_data = await response.read() - - # Save tarball to a temporary file or return raw tar content - with tempfile.NamedTemporaryFile( - delete=False, suffix=".tar.gz" - ) as temp_file: - temp_file.write(tar_data) - temp_file.flush() - temp_file.seek(0) - - # Extract and return the content of the tarball - return self._extract_files(temp_file.name) - - def _extract_files(self, tar_path: str) -> dict[str, pd.DataFrame]: - """ - Extract metadata and associated files from a tarball. - - :param tar_path: The path to the tarball file. - :return: A tuple of metadata DataFrame and a dictionary of DataFrames for each - file. 
- - """ - with tarfile.open(tar_path, mode="r:gz") as tar: - tar_members = tar.getmembers() - - # Extract metadata.json - metadata_member = next( - (m for m in tar_members if m.name == "metadata.json"), None - ) - if metadata_member is None: - raise FileNotFoundError("metadata.json not found in tar archive") - - extracted_file = tar.extractfile(metadata_member) - if extracted_file is None: - raise FileNotFoundError("Failed to extract metadata.json") - - with extracted_file as f: - metadata_dict = json.load(f) - - metadata_df = pd.DataFrame(metadata_dict.values()) - metadata_df["id"] = metadata_dict.keys() - - # Extract CSV files - data = {} - for rr_id in metadata_df["id"]: - csv_filename = f"{rr_id}.csv.gz" - member = next((m for m in tar_members if m.name == csv_filename), None) - if member is None: - raise FileNotFoundError(f"{csv_filename} not found in tar archive") - - extracted_file = tar.extractfile(member) - if extracted_file is None: - raise FileNotFoundError(f"Failed to extract {csv_filename}") - - with extracted_file as f: - data[rr_id] = pd.read_csv(f, compression="gzip") - return {"metadata": metadata_df, "data": data} - - def create(self, data: dict[str, Any], **kwargs) -> Response: - """ - Create a new RankResponse record by uploading a gzipped CSV file. - - :param data: This should be the fields in the RankREsponse model, eg - "promotersetsig_id", "expression_id" and "parameters". - :param kwargs: Additional parameters to pass to the post. This must include a - DataFrame to upload as a CSV file with the keyword `df`, eg `df=my_df`. - - :return: The result of the post request. - - :raises ValueError: If a DataFrame is not provided in the keyword arguments. - :raises TypeError: If the DataFrame provided is not a pandas DataFrame. - - """ - # ensure that the url ends in a slash - rankresponse_url = f"{self.url.rstrip('/')}/" - df = kwargs.pop("df", None) - - if df is None: - raise ValueError( - "A DataFrame must be provided to create " - "a RankResponse via keyword `df`" - ) - if not isinstance(df, pd.DataFrame): - raise TypeError( - f"Expected a DataFrame for keyword `df`, got {type(df).__name__}" - ) - - # Create a temporary gzipped CSV file from the DataFrame - with tempfile.NamedTemporaryFile(suffix=".csv.gz") as temp_file: - df.to_csv(temp_file.name, compression="gzip", index=False) - - # Prepare the file and metadata for upload - with open(temp_file.name, "rb") as file: - multipart_data = MultipartEncoder( - fields={**data, "file": (temp_file.name, file, "application/gzip")} - ) - headers = {**self.header, "Content-Type": multipart_data.content_type} - - # Send the POST request with custom encoded multipart data - response = post(rankresponse_url, headers=headers, data=multipart_data) - - response.raise_for_status() - return response - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The RankResponseAPI does not support update.") - - def delete(self, id: str = "", **kwargs) -> Any: - """ - Delete one or more records from the database. - - :param id: The ID of the record to delete. However, you can also pass in - `ids` as a list of IDs to delete multiple records. This is why `id` is optional. - If neither `id` nor `ids` is provided, a ValueError is raised. - - :return: A dictionary with a status message indicating success or failure. - - :raises ValueError: If neither `id` nor `ids` is provided. 
- - """ - # Include the Authorization header with the token - headers = kwargs.get("headers", {}) - headers["Authorization"] = f"Token {self.token}" - - ids = kwargs.pop("ids", str(id)) - - # Determine if it's a single ID or multiple - if isinstance(ids, str) and str != "": - # Single ID deletion for backward compatibility - response = delete(f"{self.url}/{ids}/", headers=headers, **kwargs) - elif isinstance(ids, list) and ids: - # Bulk delete with a list of IDs - response = delete( - f"{self.url}/delete/", - headers=headers, - json={"ids": ids}, # Send the list of IDs in the request body - **kwargs, - ) - else: - raise ValueError( - "No ID(s) provided for deletion. Either pass a single ID with " - "`id` or a list of IDs with `ids = [1,2, ...]" - ) - - if response.status_code in [200, 204]: - return { - "status": "success", - "message": "RankResponse(s) deleted successfully.", - } - - # Raise an error if the response indicates failure - response.raise_for_status() diff --git a/tfbpapi/RegulatorAPI.py b/tfbpapi/RegulatorAPI.py deleted file mode 100644 index 675c002..0000000 --- a/tfbpapi/RegulatorAPI.py +++ /dev/null @@ -1,53 +0,0 @@ -import os -from typing import Any - -import pandas as pd - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class RegulatorAPI(AbstractRecordsOnlyAPI): - """A class to interact with the RegulatorAPI endpoint.""" - - def __init__(self, **kwargs): - """ - Initialize the RegulatorAPI object. - - :param kwargs: parameters to pass to AbstractAPI via AbstractRecordsOnlyAPI. - - """ - valid_param_keys = kwargs.pop( - "valid_param_keys", - [ - "id", - "regulator_locus_tag", - "regulator_symbol", - "under_development", - ], - ) - - url = kwargs.pop("url", os.getenv("REGULATOR_URL", None)) - if not url: - raise AttributeError( - "url must be provided or the environmental variable ", - "`REGULATOR_URL` must be set", - ) - - super().__init__(url=url, valid_keys=valid_param_keys, **kwargs) - - def create(self, data: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The RegulatorAPI does not support create.") - - def update(self, df: pd.DataFrame, **kwargs) -> Any: - raise NotImplementedError("The RegulatorAPI does not support update.") - - def delete(self, id: str, **kwargs) -> Any: - raise NotImplementedError("The RegulatorAPI does not support delete.") - - def submit(self, post_dict: dict[str, Any], **kwargs) -> Any: - raise NotImplementedError("The RegulatorAPI does not support submit.") - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - raise NotImplementedError("The RegulatorAPI does not support retrieve.") diff --git a/tfbpapi/UnivariateModelsAPI.py b/tfbpapi/UnivariateModelsAPI.py deleted file mode 100644 index d3bc632..0000000 --- a/tfbpapi/UnivariateModelsAPI.py +++ /dev/null @@ -1,202 +0,0 @@ -import asyncio -import json -import os -import time -from typing import Any - -import aiohttp -import pandas as pd -import requests # type: ignore - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class UnivariateModelsAPI(AbstractRecordsOnlyAPI): - """ - A class to interact with the UnivariateModels API. - - Retrieves univariatemodels data from the database. - - """ - - def __init__(self, **kwargs) -> None: - """ - Initialize the UnivariateModels object. This will serve as an interface to the - UnivariateModels endpoint of both the database and the application cache. 
- - :param url: The URL of the UnivariateModels API - :param kwargs: Additional parameters to pass to AbstractAPI. - - """ - - self.bulk_update_url_suffix = kwargs.pop( - "bulk_update_url_suffix", "bulk-update" - ) - - super().__init__( - url=kwargs.pop("url", os.getenv("UNIVARIATEMODELS_URL", "")), - **kwargs, - ) - - async def submit( - self, - post_dict: dict[str, Any], - **kwargs, - ) -> Any: - """ - Submit a UnivariateModels task to the UnivariateModels API. - - :param post_dict: The dictionary to submit to the UnivariateModels API. The - typing needs to be adjusted -- it can take a list of dictionaries to submit - a batch. - :return: The group_task_id of the submitted task. - - """ - # make a post request with the post_dict to univariatemodels_url - univariatemodels_url = f"{self.url.rstrip('/')}/submit/" - self.logger.debug("univariatemodels_url: %s", univariatemodels_url) - - async with aiohttp.ClientSession() as session: - async with session.post( - univariatemodels_url, headers=self.header, json=post_dict - ) as response: - try: - response.raise_for_status() - except aiohttp.ClientResponseError as e: - self.logger.error( - "Failed to submit UnivariateModels task: Status %s, Reason %s", - e.status, - e.message, - ) - raise - result = await response.json() - try: - return result["group_task_id"] - except KeyError: - self.logger.error( - "Expected 'group_task_id' in response: %s", json.dumps(result) - ) - raise - - async def retrieve( - self, - group_task_id: str, - timeout: int = 300, - polling_interval: int = 2, - **kwargs, - ) -> dict[str, pd.DataFrame]: - """ - Periodically check the task status and retrieve the result when the task - completes. - - :param group_task_id: The task ID to retrieve results for. - :param timeout: The maximum time to wait for the task to complete (in seconds). - :param polling_interval: The time to wait between status checks (in seconds). - :return: Records from the UnivariateModels API of the successfully completed - task. - - """ - # Start time for timeout check - start_time = time.time() - - # Task status URL - status_url = f"{self.url.rstrip('/')}/status/" - - while True: - async with aiohttp.ClientSession() as session: - # Send a GET request to check the task status - async with session.get( - status_url, - headers=self.header, - params={"group_task_id": group_task_id}, - ) as response: - response.raise_for_status() # Raise an error for bad status codes - status_response = await response.json() - - # Check if the task is complete - if status_response.get("status") == "SUCCESS": - - if error_tasks := status_response.get("error_tasks"): - self.logger.error( - f"Tasks {group_task_id} failed: {error_tasks}" - ) - if success_tasks := status_response.get("success_pks"): - params = {"id": ",".join(str(pk) for pk in success_tasks)} - return await self.read(params=params) - elif status_response.get("status") == "FAILURE": - raise Exception( - f"Task {group_task_id} failed: {status_response}" - ) - - # Check if we have reached the timeout - elapsed_time = time.time() - start_time - if elapsed_time > timeout: - raise TimeoutError( - f"Task {group_task_id} did not " - "complete within {timeout} seconds." 
- ) - - # Wait for the specified polling interval before checking again - await asyncio.sleep(polling_interval) - - def create(self, data: dict[str, Any], **kwargs) -> requests.Response: - raise NotImplementedError("The UnivariateModels does not support create.") - - def update(self, df: pd.DataFrame, **kwargs: Any) -> requests.Response: - """ - Update the records in the database. - - :param df: The DataFrame containing the records to update. - :type df: pd.DataFrame - :param kwargs: Additional fields to include in the payload. - :type kwargs: Any - :return: The response from the POST request. - :rtype: requests.Response - :raises requests.RequestException: If the request fails. - - """ - bulk_update_url = ( - f"{self.url.rstrip('/')}/{self.bulk_update_url_suffix.rstrip('/')}/" - ) - - self.logger.debug("bulk_update_url: %s", bulk_update_url) - - # Include additional fields in the payload if provided - payload = {"data": df.to_dict(orient="records")} - payload.update(kwargs) - - try: - response = requests.post( - bulk_update_url, - headers=self.header, - json=payload, - ) - response.raise_for_status() - return response - except requests.RequestException as e: - self.logger.error(f"Error in POST request: {e}") - raise - - def delete(self, id: str, **kwargs) -> Any: - """ - Delete a UnivariateModels record from the database. - - :param id: The ID of the UnivariateModels record to delete. - :return: A dictionary with a status message indicating success or failure. - - """ - # Include the Authorization header with the token - headers = kwargs.get("headers", {}) - headers["Authorization"] = f"Token {self.token}" - - # Make the DELETE request with the updated headers - response = requests.delete(f"{self.url}/{id}/", headers=headers, **kwargs) - - if response.status_code == 204: - return { - "status": "success", - "message": "UnivariateModels deleted successfully.", - } - - # Raise an error if the response indicates failure - response.raise_for_status() diff --git a/tfbpapi/__init__.py b/tfbpapi/__init__.py index 8c0f3be..f9db664 100644 --- a/tfbpapi/__init__.py +++ b/tfbpapi/__init__.py @@ -1,39 +1,33 @@ -from .BindingAPI import BindingAPI -from .BindingConcatenatedAPI import BindingConcatenatedAPI -from .BindingManualQCAPI import BindingManualQCAPI -from .CallingCardsBackgroundAPI import CallingCardsBackgroundAPI -from .DataSourceAPI import DataSourceAPI -from .DtoAPI import DtoAPI -from .ExpressionAPI import ExpressionAPI -from .ExpressionManualQCAPI import ExpressionManualQCAPI -from .FileFormatAPI import FileFormatAPI -from .GenomicFeatureAPI import GenomicFeatureAPI -from .metric_arrays import metric_arrays -from .PromoterSetAPI import PromoterSetAPI -from .PromoterSetSigAPI import PromoterSetSigAPI -from .rank_transforms import shifted_negative_log_ranks, stable_rank, transform -from .RankResponseAPI import RankResponseAPI -from .RegulatorAPI import RegulatorAPI -from .UnivariateModelsAPI import UnivariateModelsAPI +from .datacard import DataCard +from .fetchers import HfDataCardFetcher, HfRepoStructureFetcher, HfSizeInfoFetcher +from .hf_cache_manager import HfCacheManager +from .models import ( + DatasetCard, + DatasetConfig, + DatasetType, + ExtractedMetadata, + FeatureInfo, + MetadataConfig, + MetadataRelationship, + PropertyMapping, + RepositoryConfig, +) +from .virtual_db import VirtualDB __all__ = [ - "BindingAPI", - "BindingConcatenatedAPI", - "BindingManualQCAPI", - "CallingCardsBackgroundAPI", - "DataSourceAPI", - "DtoAPI", - "ExpressionAPI", - "ExpressionManualQCAPI", - 
"FileFormatAPI", - "GenomicFeatureAPI", - "metric_arrays", - "transform", - "PromoterSetAPI", - "PromoterSetSigAPI", - "RankResponseAPI", - "RegulatorAPI", - "stable_rank", - "shifted_negative_log_ranks", - "UnivariateModelsAPI", + "DataCard", + "HfCacheManager", + "HfDataCardFetcher", + "HfRepoStructureFetcher", + "HfSizeInfoFetcher", + "MetadataConfig", + "PropertyMapping", + "RepositoryConfig", + "VirtualDB", + "DatasetCard", + "DatasetConfig", + "DatasetType", + "ExtractedMetadata", + "FeatureInfo", + "MetadataRelationship", ] diff --git a/tfbpapi/constants.py b/tfbpapi/constants.py new file mode 100644 index 0000000..749678f --- /dev/null +++ b/tfbpapi/constants.py @@ -0,0 +1,11 @@ +import os +from pathlib import Path + +from huggingface_hub.constants import HF_HUB_CACHE + +CACHE_DIR = Path(os.getenv("HF_CACHE_DIR", HF_HUB_CACHE)) + + +def get_hf_token() -> str | None: + """Get HuggingFace token from environment variable.""" + return os.getenv("HF_TOKEN") diff --git a/tfbpapi/datacard.py b/tfbpapi/datacard.py new file mode 100644 index 0000000..b8798fc --- /dev/null +++ b/tfbpapi/datacard.py @@ -0,0 +1,492 @@ +""" +DataCard class for parsing and exploring HuggingFace dataset metadata. + +This module provides the DataCard class for parsing HuggingFace dataset cards +into structured Python objects that can be easily explored. The focus is on +enabling users to drill down into the YAML structure to understand: + +- Dataset configurations and their types +- Feature definitions and roles +- Experimental conditions at all hierarchy levels (top/config/field) +- Field-level condition definitions +- Metadata relationships + +Users can then use this information to plan metadata table structures and +data loading strategies. + +""" + +import logging +from typing import Any + +from pydantic import ValidationError + +from tfbpapi.errors import DataCardError, DataCardValidationError, HfDataFetchError +from tfbpapi.fetchers import ( + HfDataCardFetcher, + HfRepoStructureFetcher, + HfSizeInfoFetcher, +) +from tfbpapi.models import ( + DatasetCard, + DatasetConfig, + ExtractedMetadata, + FeatureInfo, + MetadataRelationship, +) + + +class DataCard: + """ + Parser and explorer for HuggingFace dataset metadata. + + The parsed structure uses Pydantic models with `extra="allow"` to accept + arbitrary fields (like experimental_conditions) without requiring code + changes. + + Key capabilities: + - Parse dataset card YAML into structured objects + - Navigate experimental conditions at 3 levels (top/config/field) + - Explore field definitions and roles + - Extract metadata schema for table design + - Discover metadata relationships + + Example: + >>> card = DataCard("BrentLab/harbison_2004") + >>> # Use context manager for config exploration + >>> with card.config("harbison_2004") as cfg: + ... # Get all experimental conditions + ... conds = cfg.experimental_conditions() + ... # Get condition fields with definitions + ... fields = cfg.condition_fields() + ... # Drill down into specific field + ... for name, info in fields.items(): + ... for value, definition in info['definitions'].items(): + ... print(f"{name}={value}: {definition}") + + Example (legacy API still supported): + >>> card = DataCard("BrentLab/harbison_2004") + >>> conditions = card.get_experimental_conditions("harbison_2004") + >>> defs = card.get_field_definitions("harbison_2004", "condition") + + """ + + def __init__(self, repo_id: str, token: str | None = None): + """ + Initialize DataCard for a repository. 
+ + :param repo_id: HuggingFace repository identifier (e.g., "user/dataset") + :param token: Optional HuggingFace token for authentication + + """ + self.repo_id = repo_id + self.token = token + self.logger = logging.getLogger(self.__class__.__name__) + + # Initialize fetchers + self._card_fetcher = HfDataCardFetcher(token=token) + self._structure_fetcher = HfRepoStructureFetcher(token=token) + self._size_fetcher = HfSizeInfoFetcher(token=token) + + # Cache for parsed card + self._dataset_card: DatasetCard | None = None + self._metadata_cache: dict[str, list[ExtractedMetadata]] = {} + + @property + def dataset_card(self) -> DatasetCard: + """Get the validated dataset card.""" + if self._dataset_card is None: + self._load_and_validate_card() + # this is here for type checking purposes. _load_and_validate_card() + # will either set the _dataset_card or raise an error + assert self._dataset_card is not None + return self._dataset_card + + def _load_and_validate_card(self) -> None: + """Load and validate the dataset card from HuggingFace.""" + try: + self.logger.debug(f"Loading dataset card for {self.repo_id}") + card_data = self._card_fetcher.fetch(self.repo_id) + + if not card_data: + raise DataCardValidationError( + f"No dataset card found for {self.repo_id}" + ) + + # Validate using Pydantic model + self._dataset_card = DatasetCard(**card_data) + self.logger.debug(f"Successfully validated dataset card for {self.repo_id}") + + except ValidationError as e: + # Create a more user-friendly error message + error_details = [] + for error in e.errors(): + field_path = " -> ".join(str(x) for x in error["loc"]) + error_type = error["type"] + error_msg = error["msg"] + input_value = error.get("input", "N/A") + + if "dtype" in field_path and error_type == "string_type": + error_details.append( + f"Field '{field_path}': Expected a simple data type " + "string (like 'string', 'int64', 'float64') " + "but got a complex structure. This might be a categorical " + "field with class labels. " + f"Actual value: {input_value}" + ) + else: + error_details.append( + f"Field '{field_path}': {error_msg} (got: {input_value})" + ) + + detailed_msg = ( + f"Dataset card validation failed for {self.repo_id}:\n" + + "\n".join(f" - {detail}" for detail in error_details) + ) + self.logger.error(detailed_msg) + raise DataCardValidationError(detailed_msg) from e + except HfDataFetchError as e: + raise DataCardError(f"Failed to fetch dataset card: {e}") from e + + @property + def configs(self) -> list[DatasetConfig]: + """Get all dataset configurations.""" + return self.dataset_card.configs + + def get_config(self, config_name: str) -> DatasetConfig | None: + """Get a specific configuration by name.""" + return self.dataset_card.get_config_by_name(config_name) + + def get_features(self, config_name: str) -> list[FeatureInfo]: + """ + Get all feature definitions for a configuration. 
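# A sketch of consuming the validation behaviour implemented in
# _load_and_validate_card() above: DataCardValidationError (a subclass of
# DataCardError) carries the per-field messages assembled there, so callers
# can distinguish a malformed card from a fetch failure.
from tfbpapi import DataCard
from tfbpapi.errors import DataCardError, DataCardValidationError

try:
    card = DataCard("BrentLab/harbison_2004")
    configs = card.configs  # first access triggers the lazy load + validation
except DataCardValidationError as err:
    print(f"Card YAML failed validation:\n{err}")
except DataCardError as err:
    print(f"Could not fetch the dataset card: {err}")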
+ + :param config_name: Configuration name + :return: List of FeatureInfo objects + :raises DataCardError: If config not found + + """ + config = self.get_config(config_name) + if not config: + raise DataCardError(f"Configuration '{config_name}' not found") + + return config.dataset_info.features + + def _extract_partition_values( + self, config: DatasetConfig, field_name: str + ) -> set[str]: + """Extract values from partition structure.""" + if ( + not config.dataset_info.partitioning + or not config.dataset_info.partitioning.enabled + ): + return set() + + partition_columns = config.dataset_info.partitioning.partition_by or [] + if field_name not in partition_columns: + return set() + + try: + # Get partition values from repository structure + partition_values = self._structure_fetcher.get_partition_values( + self.repo_id, field_name + ) + return set(partition_values) + except HfDataFetchError: + self.logger.warning(f"Failed to extract partition values for {field_name}") + return set() + + def get_metadata_relationships( + self, refresh_cache: bool = False + ) -> list[MetadataRelationship]: + """ + Get relationships between data configs and their metadata. + + :param refresh_cache: If True, force refresh dataset card from remote + + """ + # Clear cached dataset card if refresh requested + if refresh_cache: + self._dataset_card = None + + relationships = [] + data_configs = self.dataset_card.get_data_configs() + metadata_configs = self.dataset_card.get_metadata_configs() + + for data_config in data_configs: + # Check for explicit applies_to relationships + for meta_config in metadata_configs: + if ( + meta_config.applies_to + and data_config.config_name in meta_config.applies_to + ): + relationships.append( + MetadataRelationship( + data_config=data_config.config_name, + metadata_config=meta_config.config_name, + relationship_type="explicit", + ) + ) + + # Check for embedded metadata (always runs regardless of + # explicit relationships) + if data_config.metadata_fields: + relationships.append( + MetadataRelationship( + data_config=data_config.config_name, + metadata_config=f"{data_config.config_name}_embedded", + relationship_type="embedded", + ) + ) + + return relationships + + def get_repository_info(self) -> dict[str, Any]: + """Get general repository information.""" + card = self.dataset_card + + try: + structure = self._structure_fetcher.fetch(self.repo_id) + total_files = structure.get("total_files", 0) + last_modified = structure.get("last_modified") + except HfDataFetchError: + total_files = None + last_modified = None + + return { + "repo_id": self.repo_id, + "pretty_name": card.pretty_name, + "license": card.license, + "tags": card.tags, + "language": card.language, + "size_categories": card.size_categories, + "num_configs": len(card.configs), + "dataset_types": [config.dataset_type.value for config in card.configs], + "total_files": total_files, + "last_modified": last_modified, + "has_default_config": self.dataset_card.get_default_config() is not None, + } + + def extract_metadata_schema(self, config_name: str) -> dict[str, Any]: + """ + Extract complete metadata schema for planning metadata table structure. + + This is the primary method for understanding what metadata is available and + how to structure it into a metadata table. It consolidates information from + all sources: + + - **Field roles**: Which fields are regulators, targets, conditions, etc. 
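# A sketch of inspecting the relationships computed by
# get_metadata_relationships() above: "explicit" links come from a metadata
# config's applies_to list, while "embedded" links are synthesized for data
# configs that declare their own metadata_fields.
from tfbpapi import DataCard

card = DataCard("BrentLab/harbison_2004")
for rel in card.get_metadata_relationships():
    print(f"{rel.data_config} -> {rel.metadata_config} ({rel.relationship_type})")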
+ - **Top-level conditions**: Repo-wide conditions (constant for all samples) + - **Config-level conditions**: Config-specific conditions + (constant for this config) + - **Field-level definitions**: Per-sample condition definitions + + The returned schema provides all the information needed to: + 1. Identify sample identifier fields (regulator_identifier, etc.) + 2. Determine which conditions are constant vs. variable + 3. Access condition definitions for creating flattened columns + 4. Plan metadata table structure + + :param config_name: Configuration name to extract schema for + :return: Dict with comprehensive schema including: + - regulator_fields: List of regulator identifier field names + - target_fields: List of target identifier field names + - condition_fields: List of experimental_condition field names + - condition_definitions: Dict mapping field -> value -> definition + - top_level_conditions: Dict of repo-wide conditions + - config_level_conditions: Dict of config-specific conditions + :raises DataCardError: If configuration not found + + Example: + >>> schema = card.extract_metadata_schema('harbison_2004') + >>> # Identify identifier fields + >>> print(f"Regulator fields: {schema['regulator_fields']}") + >>> # Check for constant conditions + >>> if schema['top_level_conditions']: + ... print("Has repo-wide constant conditions") + >>> # Get field-level definitions for metadata table + >>> for field in schema['condition_fields']: + ... defs = schema['condition_definitions'][field] + ... print(f"{field} has {len(defs)} levels") + + """ + config = self.get_config(config_name) + if not config: + raise DataCardError(f"Configuration '{config_name}' not found") + + schema: dict[str, Any] = { + "regulator_fields": [], # Fields with role=regulator_identifier + "target_fields": [], # Fields with role=target_identifier + "condition_fields": [], # Fields with role=experimental_condition + "condition_definitions": {}, # Field-level condition details + "top_level_conditions": None, # Repo-level conditions + "config_level_conditions": None, # Config-level conditions + } + + for feature in config.dataset_info.features: + if feature.role == "regulator_identifier": + schema["regulator_fields"].append(feature.name) + elif feature.role == "target_identifier": + schema["target_fields"].append(feature.name) + elif feature.role == "experimental_condition": + schema["condition_fields"].append(feature.name) + if feature.definitions: + schema["condition_definitions"][feature.name] = feature.definitions + + # Add top-level conditions (applies to all configs/samples) + # Stored in model_extra as dict + if self.dataset_card.model_extra: + top_level = self.dataset_card.model_extra.get("experimental_conditions") + if top_level: + schema["top_level_conditions"] = top_level + + # Add config-level conditions (applies to this config's samples) + # Stored in model_extra as dict + if config.model_extra: + config_level = config.model_extra.get("experimental_conditions") + if config_level: + schema["config_level_conditions"] = config_level + + return schema + + def get_experimental_conditions( + self, config_name: str | None = None + ) -> dict[str, Any]: + """ + Get experimental conditions with proper hierarchy handling. 
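# A sketch (not part of the module) of the metadata-table planning step the
# extract_metadata_schema() docstring describes: flatten the per-field
# condition definitions into one row per (field, value) pair.
import pandas as pd

from tfbpapi import DataCard

card = DataCard("BrentLab/harbison_2004")
schema = card.extract_metadata_schema("harbison_2004")
rows = [
    {"field": field, "value": value, "definition": definition}
    for field, defs in schema["condition_definitions"].items()
    for value, definition in defs.items()
]
metadata_table = pd.DataFrame(rows, columns=["field", "value", "definition"])
print(metadata_table.head())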
+ + This method enables drilling down into the experimental conditions hierarchy: + - Top-level (repo-wide): Common to all configs/samples + - Config-level: Specific to a config, common to its samples + - Field-level: Per-sample variation (use get_field_definitions instead) + + Returns experimental conditions at the appropriate level: + - If config_name is None: returns top-level (repo-wide) conditions only + - If config_name is provided: returns merged (top + config) conditions + + All conditions are returned as flexible dicts that preserve the original + YAML structure. Navigate nested dicts to access specific values. + + :param config_name: Optional config name. If provided, merges top + and config levels + :return: Dict of experimental conditions (empty dict if none defined) + + Example: + >>> # Get top-level conditions + >>> top = card.get_experimental_conditions() + >>> temp = top.get('temperature_celsius', 30) + >>> + >>> # Get merged conditions for a config + >>> merged = card.get_experimental_conditions('config_name') + >>> media = merged.get('media', {}) + >>> media_name = media.get('name', 'unspecified') + + """ + # Get top-level conditions (stored in model_extra) + top_level = ( + self.dataset_card.model_extra.get("experimental_conditions", {}) + if self.dataset_card.model_extra + else {} + ) + + # If no config specified, return top-level only + if config_name is None: + return top_level.copy() if isinstance(top_level, dict) else {} + + # Get config-level conditions + config = self.get_config(config_name) + if not config: + raise DataCardError(f"Configuration '{config_name}' not found") + + config_level = ( + config.model_extra.get("experimental_conditions", {}) + if config.model_extra + else {} + ) + + # Merge: config-level overrides top-level + merged = {} + if isinstance(top_level, dict): + merged.update(top_level) + if isinstance(config_level, dict): + merged.update(config_level) + + return merged + + def get_field_definitions( + self, config_name: str, field_name: str + ) -> dict[str, Any]: + """ + Get definitions for a specific field (field-level conditions). + + This is the third level of the experimental conditions hierarchy - conditions + that vary per sample. Returns a dict mapping each possible field value to its + detailed specification. + + For fields with role=experimental_condition, the definitions typically include + nested structures like media composition, temperature, treatments, etc. that + define what each categorical value means experimentally. 
+
+        :param config_name: Configuration name
+        :param field_name: Field name (typically has role=experimental_condition)
+        :return: Dict mapping field values to their definition dicts
+            (empty if no definitions)
+        :raises DataCardError: If config or field not found
+
+        Example:
+            >>> # Get condition definitions
+            >>> defs = card.get_field_definitions('harbison_2004', 'condition')
+            >>> # defs = {'YPD': {...}, 'HEAT': {...}, ...}
+            >>>
+            >>> # Drill down into a specific condition
+            >>> ypd = defs['YPD']
+            >>> env_conds = ypd.get('environmental_conditions', {})
+            >>> media = env_conds.get('media', {})
+            >>> media_name = media.get('name')
+
+        """
+        config = self.get_config(config_name)
+        if not config:
+            raise DataCardError(f"Configuration '{config_name}' not found")
+
+        # Find the feature
+        feature = None
+        for f in config.dataset_info.features:
+            if f.name == field_name:
+                feature = f
+                break
+
+        if not feature:
+            raise DataCardError(
+                f"Field '{field_name}' not found in config '{config_name}'"
+            )
+
+        # Return definitions if present, otherwise empty dict
+        return feature.definitions if feature.definitions else {}
+
+    def summary(self) -> str:
+        """Get a human-readable summary of the dataset."""
+        card = self.dataset_card
+        info = self.get_repository_info()
+
+        lines = [
+            f"Dataset: {card.pretty_name or self.repo_id}",
+            f"Repository: {self.repo_id}",
+            f"License: {card.license or 'Not specified'}",
+            f"Configurations: {len(card.configs)}",
+            f"Dataset Types: {', '.join(info['dataset_types'])}",
+        ]
+
+        if card.tags:
+            lines.append(f"Tags: {', '.join(card.tags)}")
+
+        # Add config summaries
+        lines.append("\nConfigurations:")
+        for config in card.configs:
+            default_mark = " (default)" if config.default else ""
+            lines.append(
+                f"  - {config.config_name}: {config.dataset_type.value}{default_mark}"
+            )
+            lines.append(f"    {config.description}")
+
+        return "\n".join(lines)
diff --git a/tfbpapi/errors.py b/tfbpapi/errors.py
new file mode 100644
index 0000000..cbacc92
--- /dev/null
+++ b/tfbpapi/errors.py
@@ -0,0 +1,39 @@
+"""Custom exception classes for dataset management."""
+
+from typing import Any
+
+
+class HfDataFetchError(Exception):
+    """Raised when HuggingFace API requests fail."""
+
+    def __init__(
+        self,
+        message: str,
+        repo_id: str | None = None,
+        status_code: int | None = None,
+        endpoint: str | None = None,
+    ):
+        super().__init__(message)
+        self.repo_id = repo_id
+        self.status_code = status_code
+        self.endpoint = endpoint
+
+
+class DataCardError(Exception):
+    """Base exception for DataCard operations."""
+
+    pass
+
+
+class DataCardValidationError(DataCardError):
+    """Exception raised when dataset card validation fails."""
+
+    def __init__(
+        self,
+        message: str,
+        repo_id: str | None = None,
+        validation_errors: list[Any] | None = None,
+    ):
+        super().__init__(message)
+        self.repo_id = repo_id
+        self.validation_errors = validation_errors or []
diff --git a/tfbpapi/fetchers.py b/tfbpapi/fetchers.py
new file mode 100644
index 0000000..c8d978f
--- /dev/null
+++ b/tfbpapi/fetchers.py
@@ -0,0 +1,244 @@
+"""Data fetchers for HuggingFace Hub integration."""
+
+import logging
+import re
+from typing import Any
+
+import requests
+from huggingface_hub import DatasetCard, repo_info
+from requests import HTTPError
+
+from tfbpapi.constants import get_hf_token
+from tfbpapi.errors import HfDataFetchError
+
+
+class HfDataCardFetcher:
+    """Handles fetching dataset cards from HuggingFace Hub."""
+
+    def __init__(self, token: str | None = None):
+        """
+        Initialize the fetcher.
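+
+        A typical round trip looks like this (illustrative; whether the card
+        data contains a 'configs' key depends on the card itself):
+
+            >>> fetcher = HfDataCardFetcher()  # token resolved via get_hf_token()
+            >>> card_data = fetcher.fetch("BrentLab/harbison_2004")
+            >>> card_data.get("configs", [])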
+ + :param token: HuggingFace token for authentication + + """ + self.logger = logging.getLogger(self.__class__.__name__) + self.token = token or get_hf_token() + + def fetch(self, repo_id: str, repo_type: str = "dataset") -> dict[str, Any]: + """ + Fetch and return dataset card data. + + :param repo_id: Repository identifier (e.g., "user/dataset") + :param repo_type: Type of repository ("dataset", "model", "space") + :return: Dataset card data as dictionary + :raises HfDataFetchError: If fetching fails + + """ + try: + self.logger.debug(f"Fetching dataset card for {repo_id}") + card = DatasetCard.load(repo_id, repo_type=repo_type, token=self.token) + + if not card.data: + self.logger.warning(f"Dataset card for {repo_id} has no data section") + return {} + + return card.data.to_dict() + + except Exception as e: + error_msg = f"Failed to fetch dataset card for {repo_id}: {e}" + self.logger.error(error_msg) + raise HfDataFetchError(error_msg) from e + + +class HfSizeInfoFetcher: + """Handles fetching size information from HuggingFace Dataset Server API.""" + + def __init__(self, token: str | None = None): + """ + Initialize the fetcher. + + :param token: HuggingFace token for authentication + + """ + self.logger = logging.getLogger(self.__class__.__name__) + self.token = token or get_hf_token() + self.base_url = "https://datasets-server.huggingface.co" + + def _build_headers(self) -> dict[str, str]: + """Build request headers with authentication if available.""" + headers = {"User-Agent": "TFBP-API/1.0"} + if self.token: + headers["Authorization"] = f"Bearer {self.token}" + return headers + + def fetch(self, repo_id: str) -> dict[str, Any]: + """ + Fetch dataset size information. + + :param repo_id: Repository identifier (e.g., "user/dataset") + :return: Size information as dictionary + :raises HfDataFetchError: If fetching fails + + """ + url = f"{self.base_url}/size" + params = {"dataset": repo_id} + headers = self._build_headers() + + try: + self.logger.debug(f"Fetching size info for {repo_id}") + response = requests.get(url, params=params, headers=headers, timeout=30) + response.raise_for_status() + + data = response.json() + self.logger.debug(f"Size info fetched successfully for {repo_id}") + return data + + except HTTPError as e: + if e.response.status_code == 404: + error_msg = f"Dataset {repo_id} not found" + elif e.response.status_code == 403: + error_msg = ( + f"Access denied to dataset {repo_id} (check token permissions)" + ) + else: + error_msg = f"HTTP error fetching size for {repo_id}: {e}" + + self.logger.error(error_msg) + raise HfDataFetchError(error_msg) from e + + except requests.RequestException as e: + error_msg = f"Request failed fetching size for {repo_id}: {e}" + self.logger.error(error_msg) + raise HfDataFetchError(error_msg) from e + + except ValueError as e: + error_msg = f"Invalid JSON response fetching size for {repo_id}: {e}" + self.logger.error(error_msg) + raise HfDataFetchError(error_msg) from e + + +class HfRepoStructureFetcher: + """Handles fetching repository structure from HuggingFace Hub.""" + + def __init__(self, token: str | None = None): + """ + Initialize the fetcher. + + :param token: HuggingFace token for authentication + + """ + self.logger = logging.getLogger(self.__class__.__name__) + self.token = token or get_hf_token() + self._cached_structure: dict[str, dict[str, Any]] = {} + + def fetch(self, repo_id: str, force_refresh: bool = False) -> dict[str, Any]: + """ + Fetch repository structure information. 
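+
+        The result is a plain dict; its keys are sketched below (repo name
+        illustrative):
+
+            >>> structure = fetcher.fetch("BrentLab/harbison_2004")
+            >>> sorted(structure.keys())
+            ['files', 'last_modified', 'partitions', 'repo_id', 'total_files']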
+ + :param repo_id: Repository identifier (e.g., "user/dataset") + :param force_refresh: If True, bypass cache and fetch fresh data + :return: Repository structure information + :raises HfDataFetchError: If fetching fails + + """ + # Check cache first unless force refresh is requested + if not force_refresh and repo_id in self._cached_structure: + self.logger.debug(f"Using cached repo structure for {repo_id}") + return self._cached_structure[repo_id] + + try: + self.logger.debug(f"Fetching repo structure for {repo_id}") + info = repo_info(repo_id=repo_id, repo_type="dataset", token=self.token) + + # Extract file structure + files = [] + partitions: dict[str, set] = {} + + for sibling in info.siblings or []: + file_info = { + "path": sibling.rfilename, + "size": sibling.size, + "is_lfs": sibling.lfs is not None, + } + files.append(file_info) + + # Extract partition information from file paths + self._extract_partition_info(sibling.rfilename, partitions) + + result = { + "repo_id": repo_id, + "files": files, + "partitions": partitions, + "total_files": len(files), + "last_modified": ( + info.last_modified.isoformat() if info.last_modified else None + ), + } + + # Cache the result + self._cached_structure[repo_id] = result + return result + + except Exception as e: + error_msg = f"Failed to fetch repo structure for {repo_id}: {e}" + self.logger.error(error_msg) + raise HfDataFetchError(error_msg) from e + + def _extract_partition_info( + self, file_path: str, partitions: dict[str, set[str]] + ) -> None: + """ + Extract partition information from file paths. + + :param file_path: Path to analyze for partitions + :param partitions: Dictionary to update with partition info + + """ + # Look for partition patterns like "column=value" in path + partition_pattern = r"([^/=]+)=([^/]+)" + matches = re.findall(partition_pattern, file_path) + + for column, value in matches: + if column not in partitions: + partitions[column] = set() + partitions[column].add(value) + + def get_partition_values( + self, repo_id: str, partition_column: str, force_refresh: bool = False + ) -> list[str]: + """ + Get all values for a specific partition column. + + :param repo_id: Repository identifier + :param partition_column: Name of the partition column + :param force_refresh: If True, bypass cache and fetch fresh data + :return: List of unique partition values + :raises HfDataFetchError: If fetching fails + + """ + structure = self.fetch(repo_id, force_refresh=force_refresh) + partition_values = structure.get("partitions", {}).get(partition_column, set()) + return sorted(list(partition_values)) + + def get_dataset_files( + self, repo_id: str, path_pattern: str | None = None, force_refresh: bool = False + ) -> list[dict[str, Any]]: + """ + Get dataset files, optionally filtered by path pattern. 
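+
+        For example, to keep only parquet files (the pattern is an ordinary
+        regex, applied with re.search):
+
+            >>> files = fetcher.get_dataset_files(
+            ...     "BrentLab/harbison_2004", path_pattern=r"\.parquet$"
+            ... )
+            >>> [f["path"] for f in files]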
+ + :param repo_id: Repository identifier + :param path_pattern: Optional regex pattern to filter files + :param force_refresh: If True, bypass cache and fetch fresh data + :return: List of matching files + :raises HfDataFetchError: If fetching fails + + """ + structure = self.fetch(repo_id, force_refresh=force_refresh) + files = structure["files"] + + if path_pattern: + pattern = re.compile(path_pattern) + files = [f for f in files if pattern.search(f["path"])] + + return files diff --git a/tfbpapi/hf_cache_manager.py b/tfbpapi/hf_cache_manager.py new file mode 100644 index 0000000..26ca708 --- /dev/null +++ b/tfbpapi/hf_cache_manager.py @@ -0,0 +1,631 @@ +import logging +from datetime import datetime, timedelta +from pathlib import Path +from typing import Any, Literal + +import duckdb +from huggingface_hub import scan_cache_dir, try_to_load_from_cache +from huggingface_hub.utils import DeleteCacheStrategy + +from tfbpapi.datacard import DataCard + + +class HfCacheManager(DataCard): + """Enhanced cache management for Hugging Face Hub with metadata-focused + retrieval.""" + + def __init__( + self, + repo_id: str, + duckdb_conn: duckdb.DuckDBPyConnection, + token: str | None = None, + logger: logging.Logger | None = None, + ): + super().__init__(repo_id, token) + self.duckdb_conn = duckdb_conn + self.logger = logger or logging.getLogger(__name__) + + def _get_metadata_for_config( + self, config, force_refresh: bool = False + ) -> dict[str, Any]: + """ + Get metadata for a specific configuration using 3-case strategy. + + :param config: Configuration object to process + :param force_refresh: If True, skip cache checks and download fresh from remote + + """ + config_result = { + "config_name": config.config_name, + "strategy": None, + "table_name": None, + "success": False, + "message": "", + } + + table_name = f"metadata_{config.config_name}" + + try: + # Skip cache checks if force_refresh is True + if not force_refresh: + # Case 1: Check if metadata already exists in DuckDB + if self._check_metadata_exists_in_duckdb(table_name): + config_result.update( + { + "strategy": "duckdb_exists", + "table_name": table_name, + "success": True, + "message": f"Metadata table {table_name} " + "already exists in DuckDB", + } + ) + return config_result + + # Case 2: Check if HF data is in cache, create DuckDB representation + if self._load_metadata_from_cache(config, table_name): + config_result.update( + { + "strategy": "cache_loaded", + "table_name": table_name, + "success": True, + "message": "Loaded metadata from cache " + f"into table {table_name}", + } + ) + return config_result + + # Case 3: Download from HF (explicit vs embedded) + if self._download_and_load_metadata(config, table_name): + config_result.update( + { + "strategy": "downloaded", + "table_name": table_name, + "success": True, + "message": "Downloaded and loaded metadata " + f"into table {table_name}", + } + ) + return config_result + + config_result["message"] = ( + f"Failed to retrieve metadata for {config.config_name}" + ) + + except Exception as e: + config_result["message"] = f"Error processing {config.config_name}: {e}" + self.logger.error(f"Error in metadata config {config.config_name}: {e}") + + return config_result + + def _check_metadata_exists_in_duckdb(self, table_name: str) -> bool: + """Case 1: Check if metadata table already exists in DuckDB database.""" + try: + # Query information schema to check if table exists + result = self.duckdb_conn.execute( + "SELECT table_name FROM information_schema.tables WHERE table_name = 
?", + [table_name], + ).fetchone() + + exists = result is not None + if exists: + self.logger.debug(f"Table {table_name} already exists in DuckDB") + return exists + + except Exception as e: + self.logger.debug(f"Error checking DuckDB table existence: {e}") + return False + + def _load_metadata_from_cache(self, config, table_name: str) -> bool: + """Case 2: HF data in cache, create DuckDB representation.""" + try: + # Check if metadata files are cached locally + cached_files = [] + for data_file in config.data_files: + cached_path = try_to_load_from_cache( + repo_id=self.repo_id, + filename=data_file.path, + repo_type="dataset", + ) + + if isinstance(cached_path, str) and Path(cached_path).exists(): + cached_files.append(cached_path) + + if not cached_files: + self.logger.debug(f"No cached files found for {config.config_name}") + return False + + # Load cached parquet files into DuckDB + self._create_duckdb_table_from_files( + cached_files, table_name, config.config_name + ) + self.logger.info( + f"Loaded {len(cached_files)} cached files into {table_name}" + ) + return True + + except Exception as e: + self.logger.debug(f"Error loading from cache for {config.config_name}: {e}") + return False + + def _download_and_load_metadata(self, config, table_name: str) -> bool: + """Case 3: Download from HF (explicit vs embedded).""" + try: + from huggingface_hub import snapshot_download + + # Download specific files for this metadata config + file_patterns = [data_file.path for data_file in config.data_files] + + downloaded_path = snapshot_download( + repo_id=self.repo_id, + repo_type="dataset", + allow_patterns=file_patterns, + token=self.token, + ) + + # Find downloaded parquet files + downloaded_files = [] + for pattern in file_patterns: + file_path = Path(downloaded_path) / pattern + if file_path.exists() and file_path.suffix == ".parquet": + downloaded_files.append(str(file_path)) + else: + # Handle wildcard patterns, including nested wildcards + if "*" in pattern: + # Use glob on the full pattern relative to downloaded_path + base_path = Path(downloaded_path) + matching_files = list(base_path.glob(pattern)) + downloaded_files.extend( + [str(f) for f in matching_files if f.suffix == ".parquet"] + ) + else: + # Handle non-wildcard patterns that might be directories + parent_dir = Path(downloaded_path) / Path(pattern).parent + if parent_dir.exists(): + downloaded_files.extend( + [str(f) for f in parent_dir.glob("*.parquet")] + ) + + if not downloaded_files: + self.logger.warning( + f"No parquet files found after download for {config.config_name}" + ) + return False + + # Load downloaded files into DuckDB + self._create_duckdb_table_from_files( + downloaded_files, table_name, config.config_name + ) + self.logger.info( + f"Downloaded and loaded {len(downloaded_files)} files into {table_name}" + ) + return True + + except Exception as e: + self.logger.error( + f"Error downloading metadata for {config.config_name}: {e}" + ) + return False + + def _create_duckdb_table_from_files( + self, file_paths: list[str], table_name: str, config_name: str + ) -> None: + """Create DuckDB table/view from parquet files.""" + if len(file_paths) == 1: + # Single file + create_sql = f""" + CREATE OR REPLACE VIEW {table_name} AS + SELECT * FROM read_parquet('{file_paths[0]}') + """ + else: + # Multiple files + files_str = "', '".join(file_paths) + create_sql = f""" + CREATE OR REPLACE VIEW {table_name} AS + SELECT * FROM read_parquet(['{files_str}']) + """ + + self.duckdb_conn.execute(create_sql) + self.logger.debug( + 
f"Created DuckDB view {table_name} from {len(file_paths)} files" + ) + + # Validate source_sample fields if they exist + self._validate_source_sample_fields(table_name, config_name) + + def _validate_source_sample_fields(self, table_name: str, config_name: str) -> None: + """ + Validate source_sample fields have correct format. + + Composite sample identifiers must be in the format: + "repo_id;config_name;sample_id" (exactly 3 semicolon-separated parts) + + """ + config = self.get_config(config_name) + + # Find all source_sample fields + source_sample_fields = [ + f.name + for f in config.dataset_info.features # type: ignore + if f.role == "source_sample" + ] + + if not source_sample_fields: + return # No validation needed + + # For each field, validate format + for field_name in source_sample_fields: + query = f""" + SELECT {field_name}, + LENGTH({field_name}) - LENGTH(REPLACE({field_name}, ';', '')) + AS semicolon_count + FROM {table_name} + WHERE semicolon_count != 2 + LIMIT 1 + """ + result = self.duckdb_conn.execute(query).fetchone() + + if result: + raise ValueError( + f"Invalid format in field '{field_name}' " + f"with role='source_sample'. " + f"Expected 'repo_id;config_name;sample_id' " + f"(3 semicolon-separated parts), " + f"but found: '{result[0]}'" + ) + + def _extract_embedded_metadata_field( + self, data_table_name: str, field_name: str, metadata_table_name: str + ) -> bool: + """Extract a specific metadata field from a data table.""" + try: + # Create a metadata view with unique values from the specified field + extract_sql = f""" + CREATE OR REPLACE VIEW {metadata_table_name} AS + SELECT DISTINCT {field_name} as value, COUNT(*) as count + FROM {data_table_name} + WHERE {field_name} IS NOT NULL + GROUP BY {field_name} + ORDER BY count DESC + """ + + self.duckdb_conn.execute(extract_sql) + + # Verify the table was created and has data + count_result = self.duckdb_conn.execute( + f"SELECT COUNT(*) FROM {metadata_table_name}" + ).fetchone() + + if count_result and count_result[0] > 0: + self.logger.info( + f"Extracted {count_result[0]} unique values for {field_name} " + f"into {metadata_table_name}" + ) + return True + else: + self.logger.warning(f"No data found for field {field_name}") + return False + + except Exception as e: + self.logger.error(f"Error extracting field {field_name}: {e}") + return False + + def clean_cache_by_age( + self, + max_age_days: int = 30, + dry_run: bool = True, + ) -> DeleteCacheStrategy: + """ + Clean cache entries older than specified age. + + :param max_age_days: Remove revisions older than this many days + :param dry_run: If True, show what would be deleted without executing + size_threshold: Only delete if total cache size exceeds this (e.g., "10GB") + + :return: DeleteCacheStrategy object that can be executed + + """ + cache_info = scan_cache_dir() + cutoff_date = datetime.now() - timedelta(days=max_age_days) + + old_revisions = [] + for repo in cache_info.repos: + for revision in repo.revisions: + # Check if revision is older than cutoff + revision_date = datetime.fromtimestamp(revision.last_modified) + if revision_date < cutoff_date: + old_revisions.append(revision.commit_hash) + self.logger.debug( + f"Marking for deletion: {revision.commit_hash} " + f"(last modified: {revision.last_modified})" + ) + + if not old_revisions: + self.logger.info("No old revisions found to delete") + # return None + + delete_strategy = cache_info.delete_revisions(*old_revisions) + + self.logger.info( + f"Found {len(old_revisions)} old revisions. 
" + f"Will free {delete_strategy.expected_freed_size_str}" + ) + + if not dry_run: + delete_strategy.execute() + self.logger.info( + f"Cache cleanup completed. Freed " + f"{delete_strategy.expected_freed_size_str}" + ) + else: + self.logger.info("Dry run completed. Use dry_run=False to execute deletion") + + return delete_strategy + + def clean_cache_by_size( + self, + target_size: str, + strategy: Literal[ + "oldest_first", "largest_first", "least_used" + ] = "oldest_first", + dry_run: bool = True, + ) -> DeleteCacheStrategy: + """ + Clean cache to reach target size by removing revisions. + + :param target_size: Target cache size (e.g., "5GB", "500MB") + :param strategy: Deletion strategy - "oldest_first", "largest_first", + "least_used" + :param dry_run: If True, show what would be deleted without executing + + :return: DeleteCacheStrategy object that can be executed + + """ + cache_info = scan_cache_dir() + current_size = cache_info.size_on_disk + target_bytes = self._parse_size_string(target_size) + + if current_size <= target_bytes: + self.logger.info( + f"Cache size ({cache_info.size_on_disk_str}) already below " + f"target ({target_size})" + ) + + bytes_to_free = current_size - target_bytes + + # Get all revisions sorted by strategy + all_revisions = [] + for repo in cache_info.repos: + for revision in repo.revisions: + all_revisions.append(revision) + + # Sort revisions based on strategy + if strategy == "oldest_first": + all_revisions.sort(key=lambda r: r.last_modified) + elif strategy == "largest_first": + all_revisions.sort(key=lambda r: r.size_on_disk, reverse=True) + elif strategy == "least_used": + # Use last_modified as proxy for usage + all_revisions.sort(key=lambda r: r.last_modified) + else: + raise ValueError(f"Unknown strategy: {strategy}") + + # Select revisions to delete + revisions_to_delete = [] + freed_bytes = 0 + + for revision in all_revisions: + if freed_bytes >= bytes_to_free: + break + revisions_to_delete.append(revision.commit_hash) + freed_bytes += revision.size_on_disk + + if not revisions_to_delete: + self.logger.warning("No revisions selected for deletion") + + delete_strategy = cache_info.delete_revisions(*revisions_to_delete) + + self.logger.info( + f"Selected {len(revisions_to_delete)} revisions for deletion. " + f"Will free {delete_strategy.expected_freed_size_str}" + ) + + if not dry_run: + delete_strategy.execute() + self.logger.info( + f"Cache cleanup completed. Freed " + f"{delete_strategy.expected_freed_size_str}" + ) + else: + self.logger.info("Dry run completed. Use dry_run=False to execute deletion") + + return delete_strategy + + def clean_unused_revisions( + self, keep_latest: int = 2, dry_run: bool = True + ) -> DeleteCacheStrategy: + """ + Clean unused revisions, keeping only the latest N revisions per repo. 
+ + :param keep_latest: Number of latest revisions to keep per repo + :param dry_run: If True, show what would be deleted without executing + :return: DeleteCacheStrategy object that can be executed + + """ + cache_info = scan_cache_dir() + revisions_to_delete = [] + + for repo in cache_info.repos: + # Sort revisions by last modified (newest first) + sorted_revisions = sorted( + repo.revisions, key=lambda r: r.last_modified, reverse=True + ) + + # Keep the latest N, mark the rest for deletion + if len(sorted_revisions) > keep_latest: + old_revisions = sorted_revisions[keep_latest:] + for revision in old_revisions: + revisions_to_delete.append(revision.commit_hash) + self.logger.debug( + f"Marking old revision for deletion: {repo.repo_id} - " + f"{revision.commit_hash}" + ) + + delete_strategy = cache_info.delete_revisions(*revisions_to_delete) + + self.logger.info( + f"Found {len(revisions_to_delete)} unused revisions. " + f"Will free {delete_strategy.expected_freed_size_str}" + ) + + if not dry_run: + delete_strategy.execute() + self.logger.info( + f"Cache cleanup completed. Freed " + f"{delete_strategy.expected_freed_size_str}" + ) + else: + self.logger.info("Dry run completed. Use dry_run=False to execute deletion") + + return delete_strategy + + def auto_clean_cache( + self, + max_age_days: int = 30, + max_total_size: str = "10GB", + keep_latest_per_repo: int = 2, + dry_run: bool = True, + ) -> list[DeleteCacheStrategy]: + """ + Automated cache cleaning with multiple strategies. + + :param max_age_days: Remove revisions older than this + :param max_total_size: Target maximum cache size + :param keep_latest_per_repo: Keep this many latest revisions per repo + :param dry_run: If True, show what would be deleted without executing + :return: List of DeleteCacheStrategy objects that were executed + + """ + strategies_executed = [] + + self.logger.info("Starting automated cache cleanup...") + + # Step 1: Remove very old revisions + strategy = self.clean_cache_by_age(max_age_days=max_age_days, dry_run=dry_run) + if strategy: + strategies_executed.append(strategy) + + # Step 2: Remove unused revisions (keep only latest per repo) + strategy = self.clean_unused_revisions( + keep_latest=keep_latest_per_repo, dry_run=dry_run + ) + if strategy: + strategies_executed.append(strategy) + + # Step 3: If still over size limit, remove more aggressively + cache_info = scan_cache_dir() + if cache_info.size_on_disk > self._parse_size_string(max_total_size): + strategy = self.clean_cache_by_size( + target_size=max_total_size, strategy="oldest_first", dry_run=dry_run + ) + if strategy: + strategies_executed.append(strategy) + + total_freed = sum(s.expected_freed_size for s in strategies_executed) + self.logger.info( + f"Automated cleanup complete. 
Total freed: " + f"{self._format_bytes(total_freed)}" + ) + + return strategies_executed + + def _parse_size_string(self, size_str: str) -> int: + """Parse size string like '10GB' to bytes.""" + size_str = size_str.upper().strip() + + # Check longer units first to avoid partial matches + multipliers = {"TB": 1024**4, "GB": 1024**3, "MB": 1024**2, "KB": 1024, "B": 1} + + for unit, multiplier in multipliers.items(): + if size_str.endswith(unit): + number = float(size_str[: -len(unit)]) + return int(number * multiplier) + + # If no unit specified, assume bytes + return int(size_str) + + def _format_bytes(self, bytes_size: int) -> str: + """Format bytes into human readable string.""" + if bytes_size == 0: + return "0B" + + # iterate over common units, dividing by 1024 each time, to find an + # appropriate unit. Default to TB if the size is very large + size = float(bytes_size) + for unit in ["B", "KB", "MB", "GB", "TB"]: + if size < 1024.0: + return f"{size:.1f}{unit}" + size /= 1024.0 + return f"{size:.1f}TB" + + def query(self, sql: str, config_name: str, refresh_cache: bool = False) -> Any: + """ + Execute SQL query against a specific dataset configuration. + + Loads the specified configuration and executes the SQL query. + Automatically replaces the config name in the SQL with the actual + table name for user convenience. + + :param sql: SQL query to execute + :param config_name: Configuration name to query (table will be loaded + if needed) + :param refresh_cache: If True, force refresh from remote instead of + using cache + :return: DataFrame with query results + :raises ValueError: If config_name not found or query fails + + Example: + mgr = HfCacheManager("BrentLab/harbison_2004", duckdb.connect()) + df = mgr.query( + "SELECT DISTINCT sample_id FROM harbison_2004", + "harbison_2004" + ) + + """ + # Validate config exists + if config_name not in [c.config_name for c in self.configs]: + available_configs = [c.config_name for c in self.configs] + raise ValueError( + f"Config '{config_name}' not found. 
" + f"Available configs: {available_configs}" + ) + + # Load the configuration data + config = self.get_config(config_name) + if not config: + raise ValueError(f"Could not retrieve config '{config_name}'") + + config_result = self._get_metadata_for_config( + config, force_refresh=refresh_cache + ) + if not config_result.get("success", False): + raise ValueError( + f"Failed to load data for config '{config_name}': " + f"{config_result.get('message', 'Unknown error')}" + ) + + table_name = config_result.get("table_name") + if not table_name: + raise ValueError(f"No table available for config '{config_name}'") + + # Replace config name with actual table name in SQL for user convenience + modified_sql = sql.replace(config_name, table_name) + + # Execute query + try: + result = self.duckdb_conn.execute(modified_sql).fetchdf() + self.logger.debug(f"Query executed successfully on {config_name}") + return result + except Exception as e: + self.logger.error(f"Query execution failed: {e}") + self.logger.error(f"SQL: {modified_sql}") + raise ValueError(f"Query execution failed: {e}") from e diff --git a/tfbpapi/metric_arrays.py b/tfbpapi/metric_arrays.py deleted file mode 100644 index 2bfaf14..0000000 --- a/tfbpapi/metric_arrays.py +++ /dev/null @@ -1,162 +0,0 @@ -import logging -from collections.abc import Callable - -import pandas as pd - -logger = logging.getLogger(__name__) - - -def metric_arrays( - res_dict: dict[str, pd.DataFrame | dict[str, pd.DataFrame]], - metrics_dict: dict[str, Callable], - rownames: str = "target_symbol", - colnames: str = "regulator_symbol", - row_dedup_func: Callable | None = None, - drop_incomplete_rows: bool = True, -) -> dict[str, pd.DataFrame]: - """ - Extract specified metrics from an AbstractRecordsAndFilesAPI instance's - read(retrieve_files=True) results object. - - :param res_dict: The output of an AbstractRecordsAndFiles instance. - :param metrics_dict: A dictionary where the keys are metrics and the values are - functions to apply to rows in the event that there are multiple rows with - the same rownames. Set to None to raise error if duplicate rownames are found. - :param rownames: Column name to use for row labels. - :param colnames: Column name to use for column labels. - :param drop_incomplete_rows: When True, drops rows and columns with all NaN values. - - :return: A dictionary where the metric is the key and the value is a DataFrame. - The column values are metric values, and the column names correspond - to `colnames` in the metadata DataFrame. 
- - :raises AttributeError: If the values in `colnames` or `rownames` are not unique - :raises KeyError: If the res_dict does not have keys 'metadata' and 'data' - :raises KeyError: If the data dictionary does not have the same keys as the 'id' - column - :raises ValueError: If the metadata does not have an 'id' column - :raises ValueError: If either the metadata or the data dictionary values are not - DataFrames - :raises ValueError: If the `colnames` is not in the res_dict metadata - :raises ValueError: If the `rownames` is not in the res_dict data - :raises ValueError: If the metrics are not in the data dictionary - - """ - - # Check required keys - if not all(k in res_dict for k in ["metadata", "data"]): - raise KeyError("res_dict must have keys 'metadata' and 'data'") - - metadata: pd.DataFrame = res_dict["metadata"] - - # Verify 'id' in metadata - if "id" not in metadata.columns: - raise ValueError("metadata must have an 'id' column") - - # Check for missing keys in 'data' - missing_keys = [k for k in metadata["id"] if str(k) not in res_dict["data"]] - if missing_keys: - raise KeyError( - f"Data dictionary must have the same keys as the 'id' " - f"column. Missing keys: {missing_keys}" - ) - - # Ensure all data dictionary values are DataFrames - if not all(isinstance(v, pd.DataFrame) for v in res_dict["data"].values()): - raise ValueError("All values in the data dictionary must be DataFrames") - - # Verify rownames in data and colnames in metadata - if colnames not in metadata.columns: - raise ValueError(f"colnames '{colnames}' not in metadata") - data_with_missing_rownames = [ - id for id, df in res_dict["data"].items() if rownames not in df.columns - ] - if data_with_missing_rownames: - raise ValueError( - f"rownames '{rownames}' not in data for ids: {data_with_missing_rownames}" - ) - - # Factorize unique row and column labels - row_labels = pd.Index( - {item for df in res_dict["data"].values() for item in df[rownames].unique()} - ) - - # Initialize output dictionary with NaN DataFrames for each metric - output_dict = { - m: pd.DataFrame(index=pd.Index(row_labels, name=rownames)) - for m in metrics_dict.keys() - } - - # Populate DataFrames with metric values - info_msgs = set() - for _, row in metadata.iterrows(): - try: - data = res_dict["data"][row["id"]] - except KeyError: - info_msgs.add("casting `id` to str to extract data from res_dict['data']") - data = res_dict["data"][str(row["id"])] - - for metric, row_dedup_func in metrics_dict.items(): - # Filter data to include only the rownames and metric columns - if metric not in data.columns: - raise ValueError( - f"Metric '{metric}' not found in data for id '{row['id']}'" - ) - - metric_data = data[[rownames, metric]] - - # Handle deduplication if row_dedup_func is provided - if row_dedup_func is not None: - metric_data = ( - metric_data.groupby(rownames)[metric] - .apply(row_dedup_func) - .reset_index() - ) - else: - # Ensure no duplicates exist if no deduplication function is provided - if metric_data[rownames].duplicated().any(): - raise ValueError( - f"Duplicate entries found for metric '{metric}' " - f"in id '{row['id']}' without dedup_func" - ) - - # test if row[colnames] is already in output_dict[metric]. 
If it is, add a - # replicate suffix and try again, Continue doing this until the column name - # is unique - colname = row[colnames] - suffix = 2 - while colname in output_dict[metric].columns: - colname = f"{row[colnames]}_rep{suffix}" - suffix += 1 - if suffix > 2: - info_msgs.add( - f"Column name '{row[colnames]}' already exists in " - f"output DataFrame for metric '{metric}'. " - f"Renaming to '{colname}'" - ) - # Join metric data with output DataFrame for the metric - output_dict[metric] = output_dict[metric].join( - metric_data.set_index(rownames).rename(columns={metric: colname}), - how="left", - ) - logger.info("; ".join(info_msgs)) - - # Drop incomplete rows and columns if drop_incomplete_rows is True - if drop_incomplete_rows: - for metric, df in output_dict.items(): - # Drop rows and columns where all values are NaN - initial_shape = df.shape - output_dict[metric] = df.dropna(axis=0) - final_shape = output_dict[metric].shape - - dropped_rows = initial_shape[0] - final_shape[0] - dropped_columns = initial_shape[1] - final_shape[1] - - if dropped_rows > 0 or dropped_columns > 0: - logger.warning( - f"{dropped_rows} rows and {dropped_columns} " - f"columns with incomplete " - f"records were dropped for metric '{metric}'." - ) - - return output_dict diff --git a/tfbpapi/models.py b/tfbpapi/models.py new file mode 100644 index 0000000..bb86f2e --- /dev/null +++ b/tfbpapi/models.py @@ -0,0 +1,734 @@ +""" +Pydantic models for dataset card validation and metadata configuration. + +These models provide minimal structure for parsing HuggingFace dataset cards while +remaining flexible enough to accommodate diverse experimental systems. Most fields use +extra="allow" to accept domain-specific additions without requiring code changes. + +Also includes models for VirtualDB metadata normalization configuration. + +""" + +from enum import Enum +from pathlib import Path +from typing import Any + +import yaml # type: ignore[import-untyped] +from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator + + +class DatasetType(str, Enum): + """Supported dataset types.""" + + GENOMIC_FEATURES = "genomic_features" + ANNOTATED_FEATURES = "annotated_features" + GENOME_MAP = "genome_map" + METADATA = "metadata" + COMPARATIVE = "comparative" + + +class FeatureInfo(BaseModel): + """ + Information about a dataset feature/column. + + Minimal required fields with flexible dtype handling. + + """ + + name: str = Field(..., description="Column name in the data") + dtype: str | dict[str, Any] = Field( + ..., + description="Data type (string, int64, float64, etc.) or class_label dict", + ) + description: str = Field(..., description="Description of the field") + role: str | None = Field( + default=None, + description="Optional semantic role. 
'experimental_condition' " + "has special behavior.", + ) + definitions: dict[str, Any] | None = Field( + default=None, + description="For experimental_condition fields: definitions per value", + ) + + +class PartitioningInfo(BaseModel): + """Partitioning configuration for datasets.""" + + enabled: bool = Field(default=False, description="Whether partitioning is enabled") + partition_by: list[str] | None = Field( + default=None, description="Partition column names" + ) + path_template: str | None = Field( + default=None, description="Path template for partitioned files" + ) + + +class DatasetInfo(BaseModel): + """Dataset structure information.""" + + features: list[FeatureInfo] = Field(..., description="Feature definitions") + partitioning: PartitioningInfo | None = Field( + default=None, description="Partitioning configuration" + ) + + +class DataFileInfo(BaseModel): + """Information about data files.""" + + split: str = Field(default="train", description="Dataset split name") + path: str = Field(..., description="Path to data file(s)") + + +class DatasetConfig(BaseModel): + """ + Configuration for a dataset within a repository. + + Uses extra="allow" to accept arbitrary experimental_conditions and other fields. + + """ + + config_name: str = Field(..., description="Unique configuration identifier") + description: str = Field(..., description="Human-readable description") + dataset_type: DatasetType = Field(..., description="Type of dataset") + default: bool = Field( + default=False, description="Whether this is the default config" + ) + applies_to: list[str] | None = Field( + default=None, description="Configs this metadata applies to" + ) + metadata_fields: list[str] | None = Field( + default=None, description="Fields for embedded metadata extraction" + ) + data_files: list[DataFileInfo] = Field(..., description="Data file information") + dataset_info: DatasetInfo = Field(..., description="Dataset structure information") + + model_config = ConfigDict(extra="allow") + + @field_validator("applies_to") + @classmethod + def applies_to_only_for_metadata(cls, v, info): + """Validate that applies_to is only used for metadata or comparative configs.""" + if v is not None: + dataset_type = info.data.get("dataset_type") + if dataset_type not in (DatasetType.METADATA, DatasetType.COMPARATIVE): + raise ValueError( + "applies_to field is only valid " + "for metadata and comparative dataset types" + ) + return v + + @field_validator("metadata_fields") + @classmethod + def metadata_fields_validation(cls, v): + """Validate metadata_fields usage.""" + if v is not None and len(v) == 0: + raise ValueError("metadata_fields cannot be empty list, use None instead") + return v + + +class DatasetCard(BaseModel): + """ + Complete dataset card model. + + Uses extra="allow" to accept arbitrary top-level metadata and + experimental_conditions. 
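+
+    A minimal card body might look like this (a sketch; the config shown is
+    hypothetical and the feature list abbreviated):
+
+    ```yaml
+    configs:
+      - config_name: harbison_2004
+        description: ChIP-chip binding data
+        dataset_type: annotated_features
+        default: true
+        data_files:
+          - split: train
+            path: data/*.parquet
+        dataset_info:
+          features:
+            - name: regulator_symbol
+              dtype: string
+              description: Regulator gene symbol
+    ```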
+ + """ + + configs: list[DatasetConfig] = Field(..., description="Dataset configurations") + + model_config = ConfigDict(extra="allow") + + @field_validator("configs") + @classmethod + def configs_not_empty(cls, v): + """Ensure at least one config is present.""" + if not v: + raise ValueError("At least one dataset configuration is required") + return v + + @field_validator("configs") + @classmethod + def unique_config_names(cls, v): + """Ensure config names are unique.""" + names = [config.config_name for config in v] + if len(names) != len(set(names)): + raise ValueError("Configuration names must be unique") + return v + + @field_validator("configs") + @classmethod + def at_most_one_default(cls, v): + """Ensure at most one config is marked as default.""" + defaults = [config for config in v if config.default] + if len(defaults) > 1: + raise ValueError("At most one configuration can be marked as default") + return v + + def get_config_by_name(self, name: str) -> DatasetConfig | None: + """Get a configuration by name.""" + for config in self.configs: + if config.config_name == name: + return config + return None + + def get_configs_by_type(self, dataset_type: DatasetType) -> list[DatasetConfig]: + """Get all configurations of a specific type.""" + return [ + config for config in self.configs if config.dataset_type == dataset_type + ] + + def get_default_config(self) -> DatasetConfig | None: + """Get the default configuration if one exists.""" + defaults = [config for config in self.configs if config.default] + return defaults[0] if defaults else None + + def get_data_configs(self) -> list[DatasetConfig]: + """Get all non-metadata configurations.""" + return [ + config + for config in self.configs + if config.dataset_type != DatasetType.METADATA + ] + + def get_metadata_configs(self) -> list[DatasetConfig]: + """Get all metadata configurations.""" + return [ + config + for config in self.configs + if config.dataset_type == DatasetType.METADATA + ] + + +class ExtractedMetadata(BaseModel): + """Metadata extracted from datasets.""" + + config_name: str = Field(..., description="Source configuration name") + field_name: str = Field( + ..., description="Field name the metadata was extracted from" + ) + values: set[str] = Field(..., description="Unique values found") + extraction_method: str = Field(..., description="How the metadata was extracted") + + model_config = ConfigDict( + # Allow sets in JSON serialization + json_encoders={set: list} + ) + + +class MetadataRelationship(BaseModel): + """Relationship between a data config and its metadata.""" + + data_config: str = Field(..., description="Data configuration name") + metadata_config: str = Field(..., description="Metadata configuration name") + relationship_type: str = Field( + ..., description="Type of relationship (explicit, embedded)" + ) + + +# ============================================================================ +# VirtualDB Metadata Configuration Models +# ============================================================================ + + +class ComparativeAnalysis(BaseModel): + """ + Reference to a comparative dataset that includes this dataset. + + Comparative datasets relate samples across multiple source datasets. + This model specifies which comparative dataset references the current + dataset and through which field (via_field). 
+ + Attributes: + repo: HuggingFace repository ID of the comparative dataset + dataset: Config name of the comparative dataset + via_field: Field in the comparative dataset containing composite + identifiers that reference this dataset's samples. + Format: "repo_id;config_name;sample_id" + + Example: + ```python + # In BrentLab/callingcards config + ComparativeAnalysis( + repo="BrentLab/yeast_comparative_analysis", + dataset="dto", + via_field="binding_id" + ) + # Means: dto dataset has a binding_id field with values like: + # "BrentLab/callingcards;annotated_features;123" + ``` + + """ + + repo: str = Field(..., description="Comparative dataset repository ID") + dataset: str = Field(..., description="Comparative dataset config name") + via_field: str = Field( + ..., description="Field containing composite sample identifiers" + ) + + +class PropertyMapping(BaseModel): + """ + Mapping specification for a single property. + + Attributes: + path: Optional dot-notation path to the property value. + For repo/config-level: relative to experimental_conditions + For field-level: relative to field definitions + When omitted with field specified, creates a column alias. + field: Optional field name for field-level properties. + When specified, looks in this field's definitions. + When omitted, looks in repo/config-level experimental_conditions. + expression: Optional SQL expression for derived/computed fields. + When specified, creates a computed column. + Cannot be used with field or path. + dtype: Optional data type specification for type conversion. + Supported values: 'string', 'numeric', 'bool'. + When specified, extracted values are converted to this type. + + Examples: + Field-level property with path: + PropertyMapping(field="condition", path="media.carbon_source") + + Repo/config-level property: + PropertyMapping(path="temperature_celsius") + + Field-level column alias (no path): + PropertyMapping(field="condition") + + Derived field with expression: + PropertyMapping(expression="dto_fdr < 0.05") + + """ + + field: str | None = Field(None, description="Field name for field-level properties") + path: str | None = Field(None, description="Dot-notation path to property") + expression: str | None = Field( + None, description="SQL expression for derived fields" + ) + dtype: str | None = Field( + None, description="Data type for conversion: 'string', 'numeric', or 'bool'" + ) + + @field_validator("path") + @classmethod + def validate_path(cls, v: str | None) -> str | None: + """Ensure path is not just whitespace if provided.""" + if v is not None and not v.strip(): + raise ValueError("path cannot be empty or whitespace") + return v.strip() if v else None + + @field_validator("field") + @classmethod + def validate_field(cls, v: str | None) -> str | None: + """Ensure field is not empty string if provided.""" + if v is not None and not v.strip(): + raise ValueError("field cannot be empty or whitespace") + return v.strip() if v else None + + @field_validator("expression") + @classmethod + def validate_expression(cls, v: str | None) -> str | None: + """Ensure expression is not empty string if provided.""" + if v is not None and not v.strip(): + raise ValueError("expression cannot be empty or whitespace") + return v.strip() if v else None + + @model_validator(mode="after") + def validate_at_least_one_specified(self) -> "PropertyMapping": + """Ensure at least one field type is specified and mutually exclusive.""" + if self.expression is not None: + if self.field is not None or self.path is not None: + 
raise ValueError( + "expression cannot be used with field or path - " + "derived fields are computed, not extracted" + ) + elif self.field is None and self.path is None: + raise ValueError( + "At least one of 'field', 'path', or 'expression' must be specified" + ) + return self + + +class DatasetVirtualDBConfig(BaseModel): + """ + VirtualDB configuration for a specific dataset within a repository. + + Attributes: + sample_id: Mapping for the sample identifier field (required for + primary datasets) + comparative_analyses: Optional list of comparative datasets that + reference this dataset + properties: Property mappings for this specific dataset (field names to + PropertyMapping) + + Example: + ```yaml + # In BrentLab/callingcards config + annotated_features: + sample_id: + field: sample_id + comparative_analyses: + - repo: BrentLab/yeast_comparative_analysis + dataset: dto + via_field: binding_id + regulator_locus_tag: + field: regulator_locus_tag + dto_fdr: # Field from comparative dataset, optional renaming + field: dto_fdr + ``` + + """ + + sample_id: PropertyMapping | None = Field( + None, description="Mapping for sample identifier field" + ) + comparative_analyses: list[ComparativeAnalysis] = Field( + default_factory=list, + description="Comparative datasets referencing this dataset", + ) + # Allow additional property mappings via extra fields + model_config = ConfigDict(extra="allow") + + @model_validator(mode="before") + @classmethod + def parse_property_mappings(cls, data: Any) -> Any: + """Parse extra fields as PropertyMapping objects.""" + if not isinstance(data, dict): + return data + + # Process all fields except sample_id and comparative_analyses + result = {} + for key, value in data.items(): + if key in ("sample_id", "comparative_analyses"): + # These are typed fields, let Pydantic handle them + result[key] = value + elif isinstance(value, dict): + # Assume it's a PropertyMapping + try: + result[key] = PropertyMapping.model_validate(value) + except Exception as e: + raise ValueError( + f"Invalid PropertyMapping for field '{key}': {e}" + ) from e + else: + # Already parsed or wrong type + result[key] = value + + return result + + +class RepositoryConfig(BaseModel): + """ + Configuration for a single repository. Eg BrentLab/harbison_2004. 
+ + Attributes: + properties: Repo-wide property mappings that apply to all datasets + dataset: Dataset-specific configurations including sample_id, + comparative_analyses, and property mappings + + Example: + ```python + config = RepositoryConfig( + properties={ + "temperature_celsius": PropertyMapping(path="temperature_celsius") + }, + dataset={ + "dataset_name": DatasetVirtualDBConfig( + sample_id=PropertyMapping(field="sample_id"), + comparative_analyses=[ + ComparativeAnalysis( + repo="BrentLab/yeast_comparative_analysis", + dataset="dto", + via_field="binding_id" + ) + ], + # Additional property mappings via extra fields + **{"carbon_source": PropertyMapping( + field="condition", + path="media.carbon_source" + )} + ) + } + ) + ``` + + """ + + properties: dict[str, PropertyMapping] = Field( + default_factory=dict, description="Repo-wide property mappings" + ) + dataset: dict[str, DatasetVirtualDBConfig] | None = Field( + None, description="Dataset-specific configurations" + ) + + @model_validator(mode="before") + @classmethod + def parse_structure(cls, data: Any) -> Any: + """Parse raw dict structure into typed objects.""" + if not isinstance(data, dict): + return data + + # Extract and parse dataset section + dataset_section = data.get("dataset") + parsed_datasets: dict[str, DatasetVirtualDBConfig] | None = None + + if dataset_section: + if not isinstance(dataset_section, dict): + raise ValueError("'dataset' key must contain a dict") + + parsed_datasets = {} + for dataset_name, config_dict in dataset_section.items(): + if not isinstance(config_dict, dict): + raise ValueError(f"Dataset '{dataset_name}' must contain a dict") + + # Parse DatasetVirtualDBConfig + # The config_dict may contain: + # - sample_id (PropertyMapping) + # - comparative_analyses (list[ComparativeAnalysis]) + # - Other fields as PropertyMappings (via extra="allow") + try: + parsed_datasets[dataset_name] = ( + DatasetVirtualDBConfig.model_validate(config_dict) + ) + except Exception as e: + raise ValueError( + f"Invalid configuration for dataset '{dataset_name}': {e}" + ) from e + + # Parse repo-wide properties (all keys except 'dataset') + parsed_properties = {} + for key, value in data.items(): + if key == "dataset": + continue + + try: + parsed_properties[key] = PropertyMapping.model_validate(value) + except Exception as e: + raise ValueError(f"Invalid repo-wide property '{key}': {e}") from e + + return {"properties": parsed_properties, "dataset": parsed_datasets} + + +class MetadataConfig(BaseModel): + """ + Configuration for building standardized metadata tables. + + Specifies optional alias mappings for normalizing factor levels across + heterogeneous datasets, plus property path mappings for each repository. + + Attributes: + factor_aliases: Optional mappings of standardized names to actual values. 
+ Example: {"carbon_source": + {"glucose": ["D-glucose", "dextrose"]}} + missing_value_labels: Labels for missing values by property name + description: Human-readable descriptions for each property + repositories: Dict mapping repository IDs to their configurations + + Example: + ```yaml + repositories: + BrentLab/harbison_2004: + dataset: + harbison_2004: + carbon_source: + field: condition + path: media.carbon_source + + BrentLab/kemmeren_2014: + temperature: + path: temperature_celsius + dataset: + kemmeren_2014: + carbon_source: + path: media.carbon_source + + factor_aliases: + carbon_source: + glucose: ["D-glucose", "dextrose"] + galactose: ["D-galactose", "Galactose"] + + missing_value_labels: + carbon_source: "unspecified" + + description: + carbon_source: "Carbon source in growth media" + ``` + + """ + + factor_aliases: dict[str, dict[str, list[Any]]] = Field( + default_factory=dict, + description="Optional alias mappings for normalizing factor levels", + ) + missing_value_labels: dict[str, str] = Field( + default_factory=dict, + description="Labels for missing values by property name", + ) + description: dict[str, str] = Field( + default_factory=dict, + description="Human-readable descriptions for each property", + ) + repositories: dict[str, RepositoryConfig] = Field( + ..., description="Repository configurations keyed by repo ID" + ) + + @field_validator("missing_value_labels", mode="before") + @classmethod + def validate_missing_value_labels(cls, v: Any) -> dict[str, str]: + """Validate missing value labels structure, filtering out None values.""" + if not v: + return {} + if not isinstance(v, dict): + raise ValueError("missing_value_labels must be a dict") + # Filter out None values that may come from empty YAML values + return {k: val for k, val in v.items() if val is not None} + + @field_validator("description", mode="before") + @classmethod + def validate_description(cls, v: Any) -> dict[str, str]: + """Validate description structure, filtering out None values.""" + if not v: + return {} + if not isinstance(v, dict): + raise ValueError("description must be a dict") + # Filter out None values that may come from empty YAML values + return {k: val for k, val in v.items() if val is not None} + + @field_validator("factor_aliases") + @classmethod + def validate_factor_aliases( + cls, v: dict[str, dict[str, list[Any]]] + ) -> dict[str, dict[str, list[Any]]]: + """Validate factor alias structure.""" + # Empty is OK - aliases are optional + if not v: + return v + + for prop_name, aliases in v.items(): + if not isinstance(aliases, dict): + raise ValueError( + f"Property '{prop_name}' aliases must be a dict, " + f"got {type(aliases).__name__}" + ) + + # Validate each alias mapping + for alias_name, actual_values in aliases.items(): + if not isinstance(actual_values, list): + raise ValueError( + f"Alias '{alias_name}' for '{prop_name}' must map " + f"to a list of values" + ) + if not actual_values: + raise ValueError( + f"Alias '{alias_name}' for '{prop_name}' cannot " + f"have empty value list" + ) + for val in actual_values: + if not isinstance(val, (str, int, float, bool)): + raise ValueError( + f"Alias '{alias_name}' for '{prop_name}' contains " + f"invalid value type: {type(val).__name__}" + ) + + return v + + @model_validator(mode="before") + @classmethod + def parse_repositories(cls, data: Any) -> Any: + """Parse repository configurations from 'repositories' key.""" + if not isinstance(data, dict): + return data + + # Extract repositories from 'repositories' key + 
repositories_data = data.get("repositories", {}) + + if not repositories_data: + raise ValueError( + "Configuration must have a 'repositories' key " + "with at least one repository" + ) + + if not isinstance(repositories_data, dict): + raise ValueError("'repositories' key must contain a dict") + + repositories = {} + for repo_id, repo_config in repositories_data.items(): + try: + repositories[repo_id] = RepositoryConfig.model_validate(repo_config) + except Exception as e: + raise ValueError( + f"Invalid configuration for repository '{repo_id}': {e}" + ) from e + + return { + "factor_aliases": data.get("factor_aliases", {}), + "missing_value_labels": data.get("missing_value_labels", {}), + "description": data.get("description", {}), + "repositories": repositories, + } + + @classmethod + def from_yaml(cls, path: Path | str) -> "MetadataConfig": + """ + Load and validate configuration from YAML file. + + :param path: Path to YAML configuration file + :return: Validated MetadataConfig instance + :raises FileNotFoundError: If file doesn't exist + :raises ValueError: If configuration is invalid + + """ + path = Path(path) + + if not path.exists(): + raise FileNotFoundError(f"Configuration file not found: {path}") + + with open(path) as f: + data = yaml.safe_load(f) + + if not isinstance(data, dict): + raise ValueError("Configuration must be a YAML dict") + + return cls.model_validate(data) + + def get_repository_config(self, repo_id: str) -> RepositoryConfig | None: + """ + Get configuration for a specific repository. + + :param repo_id: Repository ID (e.g., "BrentLab/harbison_2004") + :return: RepositoryConfig instance or None if not found + + """ + return self.repositories.get(repo_id) + + def get_property_mappings( + self, repo_id: str, config_name: str + ) -> dict[str, PropertyMapping]: + """ + Get merged property mappings for a repo/dataset combination. + + Merges repo-wide and dataset-specific mappings, with dataset-specific taking + precedence. + + :param repo_id: Repository ID + :param config_name: Dataset/config name + :return: Dict mapping property names to PropertyMapping objects + + """ + repo_config = self.get_repository_config(repo_id) + if not repo_config: + return {} + + # Start with repo-wide properties + mappings: dict[str, PropertyMapping] = dict(repo_config.properties) + + # Override with dataset-specific properties + if repo_config.dataset and config_name in repo_config.dataset: + dataset_config = repo_config.dataset[config_name] + # DatasetVirtualDBConfig stores property mappings in model_extra + if hasattr(dataset_config, "model_extra") and dataset_config.model_extra: + mappings.update(dataset_config.model_extra) + + return mappings diff --git a/tfbpapi/rank_transforms.py b/tfbpapi/rank_transforms.py deleted file mode 100644 index 9e4c672..0000000 --- a/tfbpapi/rank_transforms.py +++ /dev/null @@ -1,154 +0,0 @@ -import numpy as np -from scipy.stats import rankdata - - -def shifted_negative_log_ranks(ranks: np.ndarray) -> np.ndarray: - """ - Transforms ranks to negative log10 values and shifts such that the lowest value is - 0. - - :param ranks: A vector of ranks - :return np.ndarray: A vector of negative log10 transformed ranks shifted such that - the lowest value is 0 - :raises ValueError: If the ranks are not numeric. 
- - """ - if not np.issubdtype(ranks.dtype, np.number): - raise ValueError("`ranks` must be a numeric") - max_rank = np.max(ranks) - log_max_rank = np.log10(max_rank) - return -1 * np.log10(ranks) + log_max_rank - - -def stable_rank( - pvalue_vector: np.ndarray, enrichment_vector: np.ndarray, method="average" -) -> np.ndarray: - """ - Ranks data by primary_column, breaking ties based on secondary_column. The expected - primary and secondary columns are 'pvalue' and 'enrichment', respectively. Then the - ranks are transformed to negative log10 values and shifted such that the lowest - value is 0 and the highest value is log10(min_rank). - - :param pvalue_vector: A vector of pvalues - :param enrichment_vector: A vector of enrichment values corresponding to the pvalues - :param method: The method to use for final ranking. Default is "average". - See `rankdata` - - :return np.ndarray: A vector of negative log10 transformed ranks shifted such that - the lowest value is 0 and the highest value is log10(min_rank) - :raises ValueError: If the primary or secondary column is not numeric. - - """ - - # Check if primary and secondary columns are numeric - if not np.issubdtype(pvalue_vector.dtype, np.number): - raise ValueError("`primary_vector` must be a numeric") - if not np.issubdtype(enrichment_vector.dtype, np.number): - raise ValueError("`secondary_vector` must be a numeric") - - # Step 1: Rank by primary_column - # note that this will now always be an integer, unlike average which could return - # decimal values making adding the secondary rank more difficult - primary_rank = rankdata(pvalue_vector, method="min") - - # Step 2: Identify ties in primary_rank - unique_ranks = np.unique(primary_rank) - - # Step 3: Adjust ranks within ties using secondary ranking - adjusted_primary_rank = primary_rank.astype( - float - ) # Convert to float for adjustments - - for unique_rank in unique_ranks: - # Get indices where primary_rank == unique_rank - tie_indices = np.where(primary_rank == unique_rank)[0] - - if len(tie_indices) > 1: # Only adjust if there are ties - # Rank within the tie group by secondary_column - # (descending if higher is better) - tie_secondary_values = enrichment_vector[tie_indices] - secondary_rank_within_ties = rankdata( - -tie_secondary_values, method="average" - ) - - # Calculate dynamic scale factor to ensure adjustments are < 1. Since the - # primary_rank is an integer, adding a number less than 1 will not affect - # rank relative to the other groups. - max_secondary_rank = np.max(secondary_rank_within_ties) - scale_factor = ( - 0.9 / max_secondary_rank - ) # Keep scale factor slightly below 1/max rank - - # multiple the secondary_rank_within_ties values by 0.1 and add this value - # to the adjusted_primary_rank_values. This will rank the tied primary - # values by the secondary values, but not affect the overall primary rank - # outside of the tie group - # think about this scale factor - adjusted_primary_rank[tie_indices] += ( - secondary_rank_within_ties * scale_factor - ) - - # Step 4: Final rank based on the adjusted primary ranks - final_ranks = rankdata(adjusted_primary_rank, method=method) - - return final_ranks - - -def rank_by_pvalue(pvalue_vector: np.ndarray, method="average") -> np.ndarray: - """ - This expects a vector of pvalues, returns a vector of ranks where the lowest pvalue - has the lowest rank. 
-
-    :param pvalue_vector: A vector of pvalues
-    :param enrichment_vector: A vector of enrichment values corresponding to the pvalues
-    :param method: The method to use for ranking. Default is "average". See `rankdata`
-    :return np.ndarray: A vector of negative log10 transformed ranks shifted such that
-        the lowest value is 0 and the highest value is log10(min_rank)
-    :raises ValueError: If the primary or secondary column is not numeric.
-
-    """
-
-    # Check if primary and secondary columns are numeric
-    if not np.issubdtype(pvalue_vector.dtype, np.number):
-        raise ValueError("`primary_vector` must be a numeric")
-
-    # Step 1: Rank by primary_column
-    # note that this will now always be an integer, unlike average which could return
-    # decimal values making adding the secondary rank more difficult
-    return rankdata(pvalue_vector, method=method)
-
-
-def transform(
-    pvalue_vector: np.ndarray,
-    enrichment_vector: np.ndarray,
-    use_enrichment: bool = True,
-    negative_log_shift: bool = True,
-    **kwargs,
-) -> np.ndarray:
-    """
-    This calls the rank() function and then transforms the ranks to negative log10
-    values and shifts to the right such that the lowest value (largest rank, least
-    important) is 0.
-
-    :param pvalue_vector: A vector of pvalues
-    :param enrichment_vector: A vector of enrichment values corresponding to the pvalues
-    :param use_enrichment: Set to True to use the enrichment vector to break ties.
-        Default is True. If False, pvalues will be ranked directly with method="average'
-    :param negative_log_shift: Set to True to shift the ranks to the right such that the
-        lowest value (largest rank, least important) is 0. Default is True.
-    :param kwargs: Additional keyword arguments to pass to the rank() function (e.g.
-        method="min")
-    :return np.ndarray: A vector of negative log10 transformed ranks shifted such that
-        the lowest value is 0 and the highest value is log10(min_rank)
-    :raises ValueError: If the primary or secondary column is not numeric.
-
-    """
-    if use_enrichment:
-        ranks = stable_rank(pvalue_vector, enrichment_vector, **kwargs)
-    else:
-        ranks = rank_by_pvalue(pvalue_vector, **kwargs)
-
-    if negative_log_shift:
-        return shifted_negative_log_ranks(ranks)
-    else:
-        return ranks
diff --git a/tfbpapi/tests/conftest.py b/tfbpapi/tests/conftest.py
new file mode 100644
index 0000000..55c1082
--- /dev/null
+++ b/tfbpapi/tests/conftest.py
@@ -0,0 +1,1465 @@
+import pickle
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+
+
+@pytest.fixture
+def mock_cache_info():
+    """Load real cache data from pickle file."""
+    cache_file = Path(__file__).parent / "data" / "cache_info.pkl"
+
+    if not cache_file.exists():
+        pytest.skip(
+            "cache_info.pkl not found. Run the cache data generation script first."
+ ) + + with open(cache_file, "rb") as f: + return pickle.load(f) + + +@pytest.fixture +def mock_scan_cache_dir(mock_cache_info): + """Mock scan_cache_dir to return our pickled cache data.""" + with patch("huggingface_hub.scan_cache_dir", return_value=mock_cache_info): + yield mock_cache_info + + +# ============================================================================ +# Datainfo Fixtures (merged from tests/datainfo/conftest.py) +# ============================================================================ + + +@pytest.fixture +def sample_dataset_card_data(): + """Sample dataset card data for testing.""" + return { + "license": "mit", + "language": ["en"], + "tags": ["biology", "genomics", "yeast"], + "pretty_name": "Test Genomics Dataset", + "size_categories": ["100K log2(1.7) & " + "pval < 0.05). Note that " + "there is a slight " + "difference when " + "calculating from the data " + "provided here, I believe " + "due to a difference in " + "the way the targets are " + "parsed and filtered (some " + "ORFs that have since been " + "removed from the " + "annotations are removed). " + "I didn't investigate this " + "closely, though.", + "role": "experimental_condition", + }, + { + "name": "profile_first_published", + "dtype": "string", + "description": "citation or reference " + "indicating where this " + "expression profile was " + "first published", + "role": "experimental_condition", + }, + { + "name": "chase_notes", + "dtype": "string", + "description": "notes added during data " + "curation and parsing", + }, + ] + }, + } + ], + } diff --git a/tfbpapi/tests/conftests.py b/tfbpapi/tests/conftests.py deleted file mode 100644 index e69de29..0000000 diff --git a/tfbpapi/tests/example_datacards.py b/tfbpapi/tests/example_datacards.py new file mode 100644 index 0000000..36b023f --- /dev/null +++ b/tfbpapi/tests/example_datacards.py @@ -0,0 +1,510 @@ +# flake8: noqa +""" +Three diverse datacard examples for testing datacard parsing and database construction. + +These examples capture different patterns of experimental condition specification: +1. Top-level conditions with field-level variations (minimal media) +2. Complex field-level definitions with multiple environmental conditions +3. 
Partitioned dataset with separate metadata configs using applies_to + +""" + +EXAMPLE_1_SIMPLE_TOPLEVEL = """--- +license: mit +language: + - en +tags: + - genomics + - yeast + - transcription +pretty_name: "Example Dataset 1 - TF Perturbation" +size_categories: + - 100K- + Systematic gene identifier of the ChIP-targeted transcription factor + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the ChIP-targeted transcription factor + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: Systematic gene identifier of the target gene + role: target_identifier + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene + role: target_identifier + - name: condition + dtype: + class_label: + names: ["YPD", "galactose", "heat_shock", "oxidative_stress", + "amino_acid_starvation"] + description: Environmental or stress condition of the experiment + role: experimental_condition + definitions: + YPD: + description: Rich media baseline condition + environmental_conditions: + temperature_celsius: 30 + cultivation_method: liquid_culture + growth_phase_at_harvest: + od600: 0.6 + stage: mid_log_phase + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + galactose: + description: Alternative carbon source condition + environmental_conditions: + temperature_celsius: 30 + cultivation_method: liquid_culture + growth_phase_at_harvest: + od600: 0.6 + stage: mid_log_phase + media: + name: YPD + carbon_source: + - compound: D-galactose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + heat_shock: + description: Temperature stress condition + environmental_conditions: + temperature_celsius: 37 + cultivation_method: liquid_culture + growth_phase_at_harvest: + od600: 0.6 + stage: mid_log_phase + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + heat_treatment: + duration_minutes: 15 + oxidative_stress: + description: Hydrogen peroxide stress condition + environmental_conditions: + temperature_celsius: 30 + cultivation_method: liquid_culture + growth_phase_at_harvest: + od600: 0.6 + stage: mid_log_phase + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + chemical_treatment: + compound: hydrogen_peroxide + concentration_percent: 0.004 + duration_minutes: 20 + amino_acid_starvation: + description: Amino acid starvation via chemical inhibition + environmental_conditions: + temperature_celsius: 30 + cultivation_method: liquid_culture + growth_phase_at_harvest: + od600: 0.5 + stage: mid_log_phase + media: + name: synthetic_complete + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_nitrogen_base + # 6.71 g/L + concentration_percent: 0.671 + specifications: + - without_amino_acids + - without_ammonium_sulfate + - compound: ammonium_sulfate + # 5 g/L + concentration_percent: 0.5 + - compound: amino_acid_dropout_mix + # 2 g/L + concentration_percent: 0.2 + 
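            # g/L amounts in the comments above are expressed as percent w/v
            # by dividing by 10 (e.g. 6.71 g/L yeast nitrogen base -> 0.671)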
chemical_treatment: + compound: 3-amino-1,2,4-triazole + concentration_percent: 0.01 + duration_hours: 1 + - name: binding_score + dtype: float64 + description: ChIP-seq binding enrichment score + role: quantitative_measure + - name: peak_pvalue + dtype: float64 + description: Statistical significance of binding peak + role: quantitative_measure + - name: peak_qvalue + dtype: float64 + description: FDR-adjusted p-value for binding peak + role: quantitative_measure +--- +""" + + +EXAMPLE_3_PARTITIONED_WITH_METADATA = """--- +license: mit +language: + - en +tags: + - genomics + - yeast + - binding + - genome-wide + - chec-seq +pretty_name: "Example Dataset 3 - Genome Coverage Compendium" +size_categories: + - 10M- + unique identifier for a specific sample. The sample ID identifies a unique + (regulator_locus_tag, time, mechanism, restriction, date, strain) tuple. + - name: db_id + dtype: integer + description: >- + an old unique identifer, for use internally only. Deprecated and will be removed eventually. + Do not use in analysis. db_id = 0, for GEV and Z3EV, means that those samples are not + included in the original DB. + - name: regulator_locus_tag + dtype: string + description: >- + induced transcriptional regulator systematic ID. + See hf/BrentLab/yeast_genome_resources + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: >- + induced transcriptional regulator common name. If no common name exists, + then the `regulator_locus_tag` is used. + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: >- + The systematic ID of the feature to which the effect/pvalue is assigned. + See hf/BrentLab/yeast_genome_resources + role: target_identifier + - name: target_symbol + dtype: string + description: >- + The common name of the feature to which the effect/pvalue is assigned. + If there is no common name, the `target_locus_tag` is used. 
+ role: target_identifier + - name: time + dtype: float + description: time point (minutes) + role: experimental_condition + - name: mechanism + dtype: + class_label: + names: ["GEV", "ZEV"] + description: Synthetic TF induction system (GEV or ZEV) + role: experimental_condition + definitions: + GEV: + perturbation_method: + type: inducible_overexpression + system: GEV + inducer: beta-estradiol + description: "Galactose-inducible estrogen receptor-VP16 fusion system" + ZEV: + perturbation_method: + type: inducible_overexpression + system: ZEV + inducer: beta-estradiol + description: "Z3 (synthetic zinc finger)-estrogen receptor-VP16 fusion system" + - name: restriction + dtype: + class_label: + names: ["M", "N", "P"] + description: >- + nutrient limitation, one of P (phosphate limitation (20 mg/l).), + N (Nitrogen‐limited cultures were maintained at 40 mg/l ammonium sulfate) or + M (Not defined in the paper or on the Calico website) + role: experimental_condition + definitions: + P: + media: + nitrogen_source: + - compound: ammonium_sulfate + # Saldanha et al 2004: 5 g/l + concentration_percent: 0.5 + phosphate_source: + - compound: potassium_phosphate_monobasic + # Hackett et al 2020: 20 mg/l + concentration_percent: 0.002 + N: + media: + nitrogen_source: + - compound: ammonium_sulfate + # Hackett et al 2020: 40 mg/l + concentration_percent: 0.004 + M: + description: "Not defined in the paper or on the Calico website" + - name: date + dtype: string + description: date performed + role: experimental_condition + - name: strain + dtype: string + description: strain name + role: experimental_condition + - name: green_median + dtype: float + description: median of green (reference) channel fluorescence + role: quantitative_measure + - name: red_median + dtype: float + description: median of red (experimental) channel fluorescence + role: quantitative_measure + - name: log2_ratio + dtype: float + description: log2(red / green) subtracting value at time zero + role: quantitative_measure + - name: log2_cleaned_ratio + dtype: float + description: Non-specific stress response and prominent outliers removed + role: quantitative_measure + - name: log2_noise_model + dtype: float + description: estimated noise standard deviation + role: quantitative_measure + - name: log2_cleaned_ratio_zth2d + dtype: float + description: >- + cleaned timecourses hard-thresholded based on + multiple observations (or last observation) passing the noise model + role: quantitative_measure + - name: log2_selected_timecourses + dtype: float + description: >- + cleaned timecourses hard-thresholded based on single observations + passing noise model and impulse evaluation of biological feasibility + role: quantitative_measure + - name: log2_shrunken_timecourses + dtype: float + description: >- + selected timecourses with observation-level shrinkage based on + local FDR (false discovery rate). Most users of the data will want + to use this column. + role: quantitative_measure +--- + +# harbison_2004 +--- +license: mit +language: + - en +tags: + - genomics + - yeast + - transcription + - binding +pretty_name: "Harbison, 2004 ChIP-chip" +size_categories: + - 1M- + Environmental condition of the experiment. Nearly all of the 204 regulators + have a YPD condition, and some have others in addition. 
+ role: experimental_condition + definitions: + YPD: + description: Rich media baseline condition + # Harbison et al 2004: grown at 30°C (from HEAT condition context) + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: 1% yeast extract / 2% peptone / 2% glucose + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + SM: + description: Amino acid starvation stress condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.6 + od600: 0.6 + media: + # Harbison et al 2004: synthetic complete medium + name: synthetic_complete + carbon_source: unspecified + nitrogen_source: unspecified + chemical_treatment: + compound: sulfometuron_methyl + # Harbison et al 2004: 0.2 mg/ml + concentration_percent: 0.02 + duration_hours: 2 + RAPA: + description: Nutrient deprivation via TOR inhibition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + chemical_treatment: + compound: rapamycin + # Harbison et al 2004: 100 nM + concentration_percent: 9.142e-6 + duration_minutes: 20 + H2O2Hi: + description: High oxidative stress condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.5 + od600: 0.5 + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + chemical_treatment: + compound: hydrogen_peroxide + # Harbison et al 2004: 4 mM + concentration_percent: 0.0136 + duration_minutes: 30 + H2O2Lo: + description: Moderate oxidative stress condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.5 + od600: 0.5 + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + chemical_treatment: + compound: hydrogen_peroxide + # Harbison et al 2004: 0.4 mM + concentration_percent: 0.00136 + duration_minutes: 20 + Acid: + description: Acidic pH stress condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.5 + od600: 0.5 + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + chemical_treatment: + compound: succinic_acid + # Harbison et al 2004: 0.05 M to reach pH 4.0 + concentration_percent: 0.59 + target_pH: 4.0 + duration_minutes: 30 + Alpha: + description: Mating pheromone induction condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + 
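              # nitrogen sources below repeat the standard YPD recipe
              # (1% yeast extract, 2% peptone)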
nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + chemical_treatment: + compound: alpha_factor_pheromone + # Harbison et al 2004: 5 mg/ml + concentration_percent: 0.5 + duration_minutes: 30 + BUT14: + description: Long-term filamentation induction with butanol + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: YPD containing 1% butanol + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + additives: + - compound: butanol + concentration_percent: 1 + incubation_duration_hours: 14 + BUT90: + description: Short-term filamentation induction with butanol + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: YPD containing 1% butanol + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + additives: + - compound: butanol + concentration_percent: 1 + incubation_duration_minutes: 90 + "Thi-": + description: Vitamin B1 deprivation stress condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: synthetic complete medium lacking thiamin + name: synthetic_complete_minus_thiamine + carbon_source: unspecified + nitrogen_source: unspecified + GAL: + description: Galactose-based growth medium condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: YEP medium supplemented with galactose (2%) + name: yeast_extract_peptone + carbon_source: + - compound: D-galactose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: unspecified + - compound: peptone + concentration_percent: unspecified + HEAT: + description: Heat shock stress condition + # Harbison et al 2004: grown at 30°C, shifted to 37°C for 45 min + initial_temperature_celsius: 30 + temperature_shift_celsius: 37 + temperature_shift_duration_minutes: 45 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.5 + od600: 0.5 + media: + # Harbison et al 2004: YPD + name: YPD + carbon_source: + - compound: D-glucose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: 1 + - compound: peptone + concentration_percent: 2 + "Pi-": + description: Phosphate deprivation stress condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: synthetic complete medium lacking phosphate + name: synthetic_complete_minus_phosphate + carbon_source: unspecified + nitrogen_source: unspecified + RAFF: + description: Raffinose-based growth medium condition + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Harbison et al 2004: OD600 ~0.8 + od600: 0.8 + media: + # Harbison et al 2004: YEP medium supplemented with raffinose (2%) + name: yeast_extract_peptone 
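              # YEP base medium; yeast extract and peptone amounts are not
              # given in the source, so their concentrations are left
              # unspecified below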
+ carbon_source: + - compound: D-raffinose + concentration_percent: 2 + nitrogen_source: + - compound: yeast_extract + concentration_percent: unspecified + - compound: peptone + concentration_percent: unspecified + - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the ChIPd transcription factor + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the ChIPd transcription factor + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the target gene measured + role: target_identifier + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene measured + role: target_identifier + - name: effect + dtype: float64 + description: The chip channel ratio (effect size) + role: quantitative_measure + - name: pvalue + dtype: float64 + description: pvalue of the chip channel ratio (effect) + role: quantitative_measure +--- + +# hu_2007_reimand_2010 +--- +license: mit +language: + - en +tags: + - genomics + - yeast + - transcription + - perturbation + - response + - knockout + - TFKO +pretty_name: Hu 2007/Reimand 2010 TFKO +size_categories: + - 1M- + an old unique identifer, for use internally only. Deprecated and will be removed eventually. + Do not use in analysis. + - name: regulator_locus_tag + dtype: string + description: induced transcriptional regulator systematic ID. See hf/BrentLab/yeast_genome_resources + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: induced transcriptional regulator common name. If no common name exists, then the `regulator_locus_tag` is used. + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: The systematic ID of the feature to which the effect/pvalue is assigned. See hf/BrentLab/yeast_genome_resources + role: target_identifier + - name: target_symbol + dtype: string + description: The common name of the feature to which the effect/pvalue is assigned. If there is no common name, the `target_locus_tag` is used. + role: target_identifier + - name: effect + dtype: float + description: >- + log fold change of mutant vs wt. From the remaind methods: Differential expression + was calculated using a moderated eBayes t-test as implemented in the Limma + Bioconductor package + role: quantitative_measure + - name: pval + dtype: float + description: P-values were FDR-adjusted across the whole microarray dataset to correct for multiple testing + role: quantitative_measure + - name: average_od_of_replicates + dtype: float + description: average OD of the replicates at harvest + - name: heat_shock + dtype: bool + description: >- + `True` if the regulator strain was subjected to heat shock treatment. + Applied to 22 transcription factors implicated in heat shock response. 
+ `False` otherwise + role: experimental_condition + definitions: + true: + # Hu et al 2007: "15-min heat shock at 39°C" + temperature_celsius: 39 + duration_minutes: 15 + strain_background: + genotype: BY4741 + mating_type: MATa + markers: + - his3Δ1 + - leu2Δ0 + - met15Δ0 + - ura3Δ0 + source: Open_Biosystems + description: Knockout strains for nonessential transcription factors + false: + description: Standard growth conditions at 30°C + strain_background: + genotype: BY4741 + mating_type: MATa + markers: + - his3Δ1 + - leu2Δ0 + - met15Δ0 + - ura3Δ0 + source: Open_Biosystems + description: Knockout strains for nonessential transcription factors + - name: tetracycline_treatment + dtype: bool + description: >- + `True` if the regulator strain was treated with doxycycline to repress + TetO7-promoter regulated essential transcription factors. Applied to 6 + essential transcription factors. `False` for untreated control condition. + role: experimental_condition + definitions: + true: + drug_treatment: + compound: doxycycline + # Hu et al 2007: 10 mg/ml + concentration_percent: 1 + duration_hours_min: 14 + duration_hours_max: 16 + strain_background: + genotype: BY4741_derivative + mating_type: MATa + markers: + - URA3::CMV-tTA + - his3Δ1 + - leu2Δ0 + - met15Δ0 + source: Open_Biosystems + description: Essential transcription factors with TetO7-promoter regulation + false: + description: No doxycycline treatment; TetO7 promoter active + strain_background: + genotype: BY4741_derivative + mating_type: MATa + markers: + - URA3::CMV-tTA + - his3Δ1 + - leu2Δ0 + - met15Δ0 + source: Open_Biosystems + description: Essential transcription factors with TetO7-promoter regulation +--- + +# hughes_2006 +--- +license: mit +language: +- en +tags: +- biology +- genomics +- yeast +- transcription-factors +- gene-expression +- perturbation-screen +- overexpression +- knockout +- microarray +- functional-genomics +pretty_name: "Hughes 2006 Yeast Transcription Factor Perturbation Dataset" +size_categories: +- 100K- + unique identifier for a specific sample. The sample ID identifies + a unique regulator_locus_tag and can be used to join to the + other datasets in this repo, including the metadata + - name: regulator_locus_tag + dtype: string + role: identifier + description: >- + Systematic gene name (ORF identifier) of the + transcription factor + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the transcription factor + - name: found_domain + dtype: string + description: >- + Identified DNA-binding domain(s) or protein family classification + - name: sgd_description + dtype: string + description: >- + Functional description from Saccharomyces Genome Database (SGD) + - name: essential + dtype: bool + description: >- + Boolean indicating whether the gene is essential for viability + - name: oe_passed_qc + dtype: bool + description: >- + Boolean indicating whether overexpression experiments passed + quality control + - name: del_passed_qc + dtype: bool + description: >- + Boolean indicating whether deletion experiments passed + quality control + +- config_name: overexpression + description: Overexpression perturbation normalized log2 fold changes + dataset_type: annotated_features + data_files: + - split: train + path: overexpression.parquet + # temperature and growth phase are unspecified. 
nitrogen_source is + # also unspecified + media: + # Hughes et al 2006: "selective medium supplemented with 2% raffinose" + name: selective_medium + carbon_source: + - compound: D-raffinose + # Hughes et al 2006: 2% raffinose + concentration_percent: 2 + induction: + # Hughes et al 2006: "induction with 2% galactose for 3 h" + inducer: + compound: D-galactose + concentration_percent: 2 + duration_hours: 3 + dataset_info: + features: + - name: sample_id + dtype: integer + description: >- + unique identifier for a specific sample. The sample ID identifies + a unique regulator_locus_tag and can be used to join to the + other datasets in this repo, including the metadata + - name: regulator_locus_tag + dtype: string + description: >- + Systematic gene name (ORF identifier) of the + perturbed transcription factor + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the perturbed transcription factor + - name: target_locus_tag + dtype: string + description: >- + Systematic gene name (ORF identifier) of the + target gene measured + role: target_identifier + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene measured + role: target_identifier + - name: dye_plus + dtype: float64 + role: quantitative_measure + description: >- + Normalized log2 fold change for positive (+) dye orientation. + Positive values indicate upregulation in response to overexpression. + - name: dye_minus + dtype: float64 + role: quantitative_measure + description: >- + Normalized log2 fold change for negative (-) dye orientation. + Positive values indicate upregulation in response to overexpression. + - name: mean_norm_log2fc + dtype: float64 + role: quantitative_measure + description: >- + Average log2 fold change across dye orientations, + providing a dye-independent estimate of gene expression + change upon transcription factor overexpression. + +- config_name: knockout + description: Deletion/knockout perturbation normalized log2 fold changes + dataset_type: annotated_features + data_files: + - split: train + path: knockout.parquet + experimental_conditions: + temperature_celsius: unspecified + cultivation_method: unspecified + media: + # Hughes et al 2006: "synthetic medium supplemented with 2% dextrose" + name: synthetic_medium + carbon_source: + - compound: D-glucose + # Hughes et al 2006: 2% dextrose + concentration_percent: 2 + nitrogen_source: unspecified + dataset_info: + features: + - name: sample_id + dtype: integer + description: >- + unique identifier for a specific sample. The sample ID identifies + a unique regulator_locus_tag and can be used to join to the + other datasets in this repo, including the metadata + - name: regulator_locus_tag + dtype: string + description: >- + Systematic gene name (ORF identifier) of the perturbed + transcription factor + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the perturbed transcription factor + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: >- + Systematic gene name (ORF identifier) of the + target gene measured + role: target_identifier + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene measured + role: target_identifier + - name: dye_plus + dtype: float64 + description: >- + Normalized log2 fold change for positive (+) dye orientation. + Positive values indicate upregulation in response to deletion. 
+ role: quantitative_measure + - name: dye_minus + dtype: float64 + description: >- + Normalized log2 fold change for negative (-) dye orientation. + Positive values indicate upregulation in response to deletion. + role: quantitative_measure + - name: mean_norm_log2fc + dtype: float64 + description: >- + Average log2 fold change across dye orientations, providing a + dye-independent estimate of gene expression change upon + transcription factor deletion. + role: quantitative_measure +--- + +# kemmeren_2014 +--- +license: mit +language: +- en +tags: +- genomics +- yeast +- transcription +- perturbation +- response +- knockout +- TFKO +pretty_name: "Kemmeren, 2014 Overexpression" +size_categories: +- 1M- + Transcriptional regulator overexpression perturbation data with + differential expression measurements + dataset_type: annotated_features + default: true + metadata_fields: ["regulator_locus_tag", "regulator_symbol"] + data_files: + - split: train + path: kemmeren_2014.parquet + dataset_info: + features: + - name: sample_id + dtype: integer + description: >- + unique identifier for a specific sample. + The sample ID identifies a unique regulator. + - name: db_id + dtype: integer + description: >- + an old unique identifer, for use internally only. Deprecated and will be removed eventually. + Do not use in analysis. db_id = 0 for loci that were originally parsed incorrectly. + - name: regulator_locus_tag + dtype: string + description: >- + induced transcriptional regulator systematic ID. + See hf/BrentLab/yeast_genome_resources + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: >- + induced transcriptional regulator common name. + If no common name exists, then the `regulator_locus_tag` is used. + role: regulator_identifier + - name: reporterId + dtype: string + description: probe ID as reported from the original data + - name: target_locus_tag + dtype: string + description: >- + The systematic ID of the feature to which the effect/pvalue is assigned. + See hf/BrentLab/yeast_genome_resources + role: target_identifier + - name: target_symbol + dtype: string + description: >- + The common name of the feature to which the effect/pvalue is assigned. + If there is no common name, the `target_locus_tag` is used. + role: target_identifier + - name: M + dtype: float64 + description: log₂ fold change (mutant vs wildtype) + role: quantitative_measure + - name: Madj + dtype: float64 + description: >- + M value with the cell cycle signal removed + (see paper cited in the introduction above) + role: quantitative_measure + - name: A + dtype: float64 + description: >- + average log2 intensity of the two channels, a proxy for expression level + (This is a guess based on microarray convention -- not specified on holstege site) + role: quantitative_measure + - name: pval + dtype: float64 + description: significance of the modeled effect (M), from limma + role: quantitative_measure + - name: variable_in_wt + dtype: string + description: >- + True if the given locus is variable in the WT condition. + Recommended to remove these from analysis. False otherwise. + See Holstege website for more information + role: experimental_condition + - name: multiple_probes + dtype: string + description: >- + True if there is more than one probe associated with + the same genomic locus. False otherwise + role: experimental_condition + - name: kemmeren_regulator + dtype: string + description: >- + True if the regulator is one of the regulators studied in the + original Kemmeren et al. 
(2014) global regulator study. False otherwise + role: experimental_condition + - name: regulator_desc + dtype: string + description: >- + functional description of the induced regulator + from the original paper supplement + role: experimental_condition + - name: functional_category + dtype: string + description: functional classification of the regulator from the original paper supplement + role: experimental_condition + - name: slides + dtype: string + description: identifier(s) for the microarray slide(s) used in this experiment + role: experimental_condition + - name: mating_type + dtype: string + description: mating type of the strain background used in the experiment + role: experimental_condition + - name: source_of_deletion_mutants + dtype: string + description: origin of the strain + role: experimental_condition + - name: primary_hybsets + dtype: string + description: identifier for the primary hybridization set to which this sample belongs + role: experimental_condition + - name: responsive_non_responsive + dtype: string + description: >- + classification of the regulator as responsive or not to the + deletion from the original paper supplement + role: experimental_condition + - name: nr_sign_changes + dtype: integer + description: >- + number of significant changes in expression detected for the regulator locus tag (abs(M) > log2(1.7) & pval < 0.05). + Note that there is a slight difference when calculating from the data provided here, I believe due to a difference in + the way the targets are parsed and filtered (some ORFs that have since been removed from the annotations are removed). + I didn't investigate this closely, though. + role: experimental_condition + - name: profile_first_published + dtype: string + description: citation or reference indicating where this expression profile was first published + role: experimental_condition + - name: chase_notes + dtype: string + description: notes added during data curation and parsing +--- + +# mahendrawada_2025 +--- +license: mit +language: +- en +tags: +- biology +- genomics +- yeast +- transcription-factors +- gene-expression +- binding +- chec +- perturbation +- rnaseq +- nascent rnaseq +pretty_name: "Mahendrawada 2025 ChEC-seq and Nascent RNA-seq data" +size_categories: +- 100K- + unique identifier for a specific sample, which uniquely identifies one of the 178 TFs. + Across datasets in this repo, the a given sample_id identifies the same regulator. 
+ - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the transcription factor + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the transcription factor + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the target gene + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene + - name: peak_score + dtype: float64 + description: ChEC signal around peak center (sum of ChEC signal from -150 to +150 bp from peak summit) normalized to Drosophila spike-in control + - name: processing_method + dtype: string + description: Method used for peak calling and quantification (original authors) + +- config_name: reprocessed_chec_seq + description: ChEC-seq transcription factor binding data reprocessed with updated peak calling methodology + dataset_type: annotated_features + data_files: + - split: train + path: chec_reprocessed_mahendrawada_2025.parquet + dataset_info: + features: + - name: sample_id + dtype: integer + description: >- + unique identifier for a specific sample, which uniquely identifies one of the 178 TFs. + Across datasets in this repo, the a given sample_id identifies the same regulator. + - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the transcription factor + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the transcription factor + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the target gene + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene + - name: enrichment + dtype: float64 + description: ratio of experimental insertions to background insertions + - name: poisson_pval + dtype: float64 + description: enrichment poisson pvalue + +- config_name: reprocessed_diffcontrol_5prime + description: Comparing two different sets of control replicates, m2025 from the Mahendrawada 2025 paper, and h2021 from a previous paper from the Hahn lab + dataset_type: annotated_features + metadata_fields: + - control_source + - condition + - regulator_locus_tag + experimental_conditions: + # Mahendrawada et al 2025: "30 °C culture" + temperature_celsius: 30 + cultivation_method: unspecified + growth_phase_at_harvest: + # Mahendrawada et al 2025: "A600 of ~1.0" + od600: 1.0 + media: + # Mahendrawada et al 2025: "synthetic complete (SC) media" + name: synthetic_complete + carbon_source: unspecified + nitrogen_source: + - compound: yeast_nitrogen_base + # Mahendrawada et al 2025: 1.7 g/L (without ammonium sulfate or amino acids (BD Difco)) + concentration_percent: 0.17 + specifications: + - without_ammonium_sulfate + - without_amino_acids + - compound: ammonium_sulfate + # Mahendrawada et al 2025: 5 g/L + concentration_percent: 0.5 + - compound: amino_acid_dropout_mix + # Mahendrawada et al 2025: 0.6 g/L + concentration_percent: 0.06 + - compound: adenine_sulfate + # Mahendrawada et al 2025: 40 μg/ml = 0.04 g/L + concentration_percent: 0.004 + - compound: uracil + # Mahendrawada et al 2025: 2 μg/ml = 0.002 g/L + concentration_percent: 0.0002 + data_files: + - split: train + path: reprocess_diffcontrol_5prime.parquet + dataset_info: + features: + - name: control_source + dtype: string + description: Source identifier for the control dataset (m2025 or h2021) + - name: condition + dtype: string + description: Experimental condition. 'standard' is YPD. 
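      # the remaining features give the promoter interval and the DESeq2
      # statistics for each control_source/regulator comparison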
+ - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the transcription factor + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the target gene + - name: chr + dtype: string + description: Chromosome name of the promoter/target region + - name: start + dtype: int64 + description: Start coordinate of the promoter region + - name: end + dtype: int64 + description: End coordinate of the promoter region + - name: strand + dtype: string + description: Strand orientation (+ or -) of the promoter/target + - name: input_vs_target_log2_fold_change + dtype: float64 + description: Log2 fold change of TF-tagged sample vs control (from DESeq2) + - name: input_vs_target_p_value + dtype: float64 + description: P-value for differential enrichment (from DESeq2) + - name: input_vs_target_adj_p_value + dtype: float64 + description: Adjusted p-value (FDR-corrected) for differential enrichment (from DESeq2) + +- config_name: rna_seq + description: Nascent RNA-seq differential expression data following transcription factor depletion using 4TU metabolic labeling + dataset_type: annotated_features + metadata_fields: + - regulator_locus_tag + - regulator_symbol + data_files: + - split: train + path: rnaseq_mahendrawada_2025.parquet + dataset_info: + features: + - name: sample_id + dtype: integer + description: >- + unique identifier for a specific sample, which uniquely identifies one of the 178 TFs. + Across datasets in this repo, the a given sample_id identifies the same regulator. + - name: db_id + dtype: integer + description: >- + an old unique identifer, for use internally only. Deprecated and will be removed eventually. + Do not use in analysis. + - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the depleted transcription factor + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the depleted transcription factor + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the differentially expressed target gene + - name: target_symbol + dtype: string + description: Standard gene symbol of the differentially expressed target gene + - name: log2fc + dtype: float64 + description: Log2 fold change (IAA/DMSO) for significantly affected genes (DESeq2, padj <0.1, FC >= 1.3) +--- + +# rossi_2021 +--- +license: mit +tags: +- transcription-factor +- binding +- chipexo +- genomics +- biology +language: +- en +pretty_name: Rossi ChIP-exo 2021 +experimental_conditions: + temperature_celsius: 25 + cultivation_method: unspecified + growth_phase_at_harvest: + phase: mid_log + od600: 0.8 + media: + name: yeast_peptone_dextrose + carbon_source: + - compound: D-glucose + concentration_percent: unspecified + nitrogen_source: + - compound: yeast_extract + concentration_percent: unspecified + - compound: peptone + concentration_percent: unspecified + + # Heat shock applied only to SAGA strains + # note that im not sure which strains this + # applies to -- it is a TODO to better + # document this + heat_shock: + induced: true + temperature_celsius: 37 + duration_minutes: 6 + pre_induction_temperature_celsius: 25 + method: equal_volume_medium_transfer +configs: +- config_name: metadata + description: Metadata describing the tagged regulator in each experiment + dataset_type: metadata + data_files: + - split: train + path: rossi_2021_metadata.parquet + dataset_info: + features: + - name: regulator_locus_tag + dtype: 
string + description: Systematic gene name (ORF identifier) of the transcription factor + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the transcription factor + - name: run_accession + dtype: string + description: GEO run accession identifier for the sample + - name: yeastepigenome_id + dtype: string + description: Sample identifier used by yeastepigenome.org +- config_name: genome_map + description: "ChIP-exo 5' tag coverage data partitioned by sample accession" + dataset_type: genome_map + data_files: + - split: train + path: genome_map/*/*.parquet + dataset_info: + features: + - name: chr + dtype: string + description: Chromosome name (e.g., chrI, chrII, etc.) + - name: pos + dtype: int32 + description: "Genomic position of the 5' tag" + - name: pileup + dtype: int32 + description: "Depth of coverage (number of 5' tags) at this genomic position" +- config_name: rossi_annotated_features + description: ChIP-exo regulator-target binding features with peak statistics + dataset_type: annotated_features + default: true + metadata_fields: + - regulator_locus_tag + - regulator_symbol + - target_locus_tag + data_files: + - split: train + path: yeastepigenome_annotatedfeatures.parquet + dataset_info: + features: + - name: sample_id + dtype: int32 + description: >- + Unique identifier for each ChIP-exo experimental sample. + - name: pss_id + dtype: float64 + description: >- + Current brentlab promotersetsig table id. This will eventually be removed. + - name: binding_id + dtype: float64 + description: >- + Current brentlab binding table id. This will eventually be removed. + - name: yeastepigenome_id + dtype: float64 + description: >- + Unique identifier in the yeastepigenome database. + - name: regulator_locus_tag + dtype: string + description: >- + Systematic ORF name of the regulator. + role: regulator_identifier + - name: regulator_symbol + dtype: string + description: >- + Common gene name of the regulator. + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: >- + The systematic ID of the feature to which the effect/pvalue is + assigned. See hf/BrentLab/yeast_genome_resources + role: target_identifier + - name: target_symbol + dtype: string + description: >- + The common name of the feature to which the effect/pvalue is + assigned. If there is no common name, the `target_locus_tag` is + used. + role: target_identifier + - name: n_sig_peaks + dtype: float64 + description: >- + Number of peaks in the promoter region of the the target gene + role: quantitative_measure + - name: max_fc + dtype: float64 + description: >- + If there are multiple peaks in the promoter region, then the maximum is + reported. Otherwise, it is the fold change of the single peak in the + promoter. + role: quantitative_measure + - name: min_pval + dtype: float64 + description: >- + The most significant p-value among peaks for this interaction. 
+ role: quantitative_measure +- config_name: reprocess_annotatedfeatures + description: >- + Annotated features reprocessed with updated peak + calling methodology + dataset_type: annotated_features + data_files: + - split: train + path: reprocess_annotatedfeatures.parquet + dataset_info: + features: + - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the transcription factor + - name: regulator_symbol + dtype: string + description: Standard gene symbol of the transcription factor + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the target gene + - name: target_symbol + dtype: string + description: Standard gene symbol of the target gene + - name: baseMean + dtype: float64 + description: Average of normalized count values, dividing by size factors, taken over all samples + - name: log2FoldChange + dtype: float64 + description: Log2 fold change between comparison and control groups + - name: lfcSE + dtype: float64 + description: Standard error estimate for the log2 fold change estimate + - name: stat + dtype: float64 + description: Value of the test statistic for the gene + - name: pvalue + dtype: float64 + description: P-value of the test for the gene + - name: padj + dtype: float64 + description: Adjusted p-value for multiple testing for the gene +- config_name: reprocess_annotatedfeatures_tagcounts + description: Another version of the reprocessed data, quantified similarly to Calling Cards + dataset_type: annotated_features + data_files: + - split: train + path: reprocess_annotatedfeatures_tagcounts.parquet + dataset_info: + features: + - name: regulator_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the transcription factor + role: regulator_identifier + - name: target_locus_tag + dtype: string + description: Systematic gene name (ORF identifier) of the target gene + role: target_identifier + - name: rank + dtype: int64 + description: Rank (ties method min rank) of the peak based on pvalue with ties broken by enrichment. Largest rank is most significant. + - name: control_count + dtype: int64 + description: Number of tags in the control condition + - name: experimental_count + dtype: int64 + description: Number of tags in the experimental condition + - name: mu + dtype: float64 + description: Expected count under the null hypothesis (control_count + 1) * (experimental_total_tags / control_total_tags) + - name: enrichment + dtype: float64 + description: Enrichment ratio of experimental over control. 
(experimental_counts / experimental_total) / (control_counts + pseudocount) / control_total + role: quantitative_measure + - name: log2_enrichment + dtype: float64 + description: Log2-transformed enrichment ratio + role: quantitative_measure + - name: neg_log10_pvalue + dtype: float64 + description: Negative log10 of the p-value for binding significance + role: quantitative_measure + - name: neg_log10_qvalue + dtype: float64 + description: Negative log10 of the FDR-adjusted q-value + role: quantitative_measure +--- + +# yeast_genome_resources +--- +license: mit +pretty_name: BrentLab Yeast Genome Resources +language: + - en +dataset_info: + features: + - name: start + dtype: int32 + description: Start coordinate (1-based, **inclusive**) + - name: end + dtype: int32 + description: End coordinate (1-based, **inclusive**) + - name: strand + dtype: string + levels: + - + + - "-" + description: Strand of feature + - name: type + dtype: string + levels: + - gene + - ncRNA_gene + - tRNA_gene + - snoRNA_gene + - transposable_element_gene + - pseudogene + - telomerase_RNA_gene + - snRNA_gene + - rRNA_gene + - blocked_reading_frame + description: classification of feature + - name: locus_tag + dtype: string + description: Systematic ID of feature + - name: symbol + dtype: string + description: Common name of feature + - name: alias + dtype: string + description: Alternative names of feature, typically alternative symbols + - name: source + dtype: string + description: Annotation file version/origin of the feature + - name: note + dtype: string + description: Additional feature information, typically the description from the + SGD gff/gtf + partitioning: + keys: + - name: chr + dtype: string + levels: + - chrI + - chrII + - chrVII + - chrV + - chrIII + - chrIV + - chrVIII + - chrVI + - chrX + - chrIX + - chrXI + - chrXIV + - chrXII + - chrXIII + - chrXV + - chrXVI + - chrM +configs: + - config_name: features + default: true + data_files: + - split: train + path: + - features/*/part-0.parquet +--- diff --git a/tfbpapi/tests/snapshots/__init__.py b/tfbpapi/tests/snapshots/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/tfbpapi/tests/snapshots/promotersetsig_records_and_files.tar.gz b/tfbpapi/tests/snapshots/promotersetsig_records_and_files.tar.gz deleted file mode 100644 index bde8021..0000000 Binary files a/tfbpapi/tests/snapshots/promotersetsig_records_and_files.tar.gz and /dev/null differ diff --git a/tfbpapi/tests/snapshots/snap_test_AbstractAPI.py b/tfbpapi/tests/snapshots/snap_test_AbstractAPI.py deleted file mode 100644 index 8444992..0000000 --- a/tfbpapi/tests/snapshots/snap_test_AbstractAPI.py +++ /dev/null @@ -1,20 +0,0 @@ -# -*- coding: utf-8 -*- -# snapshottest: v1 - https://goo.gl/zC4yUc -from __future__ import unicode_literals - -from snapshottest import Snapshot - - -snapshots = Snapshot() - -snapshots['test_cache_operations cache_get_after_delete'] = 'None' - -snapshots['test_cache_operations cache_get_after_set'] = 'test_value' - -snapshots['test_cache_operations cache_list'] = "['test_key']" - -snapshots['test_pop_params pop_params_after_all_removed'] = '{}' - -snapshots['test_pop_params pop_params_after_one_removed'] = '{"param2": "value2"}' - -snapshots['test_push_params push_params'] = '{"param1": "value1", "param2": "value2"}' diff --git a/tfbpapi/tests/snapshots/snap_test_AbstractRecordsAndFilesAPI.py b/tfbpapi/tests/snapshots/snap_test_AbstractRecordsAndFilesAPI.py deleted file mode 100644 index 807cb7d..0000000 --- 
a/tfbpapi/tests/snapshots/snap_test_AbstractRecordsAndFilesAPI.py +++ /dev/null @@ -1,15 +0,0 @@ -# snapshottest: v1 - https://goo.gl/zC4yUc - -from snapshottest import Snapshot - -snapshots = Snapshot() - -snapshots[ - "test_save_response_records_and_files 1" -] = """id,uploader_id,upload_date,modifier_id,modified_date,binding_id,promoter_id,background_id,fileformat_id,file -10690,1,2024-03-26,1,2024-03-26 14:28:43.825628+00:00,4079,4,6,5,promotersetsig/10690.csv.gz -10694,1,2024-03-26,1,2024-03-26 14:28:44.739775+00:00,4083,4,6,5,promotersetsig/10694.csv.gz -10754,1,2024-03-26,1,2024-03-26 14:29:01.837335+00:00,4143,4,6,5,promotersetsig/10754.csv.gz -10929,1,2024-03-26,1,2024-03-26 14:29:45.379790+00:00,4318,4,6,5,promotersetsig/10929.csv.gz -10939,1,2024-03-26,1,2024-03-26 14:29:47.853980+00:00,4327,4,6,5,promotersetsig/10939.csv.gz -""" diff --git a/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_get_after_delete b/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_get_after_delete deleted file mode 100644 index 4af1832..0000000 --- a/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_get_after_delete +++ /dev/null @@ -1 +0,0 @@ -None \ No newline at end of file diff --git a/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_get_after_set b/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_get_after_set deleted file mode 100644 index fff1c65..0000000 --- a/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_get_after_set +++ /dev/null @@ -1 +0,0 @@ -test_value \ No newline at end of file diff --git a/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_list b/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_list deleted file mode 100644 index 1950491..0000000 --- a/tfbpapi/tests/snapshots/test_AbstractAPI/test_cache_operations/cache_list +++ /dev/null @@ -1 +0,0 @@ -['test_key'] \ No newline at end of file diff --git a/tfbpapi/tests/snapshots/test_AbstractAPI/test_pop_params/pop_params_after_all_removed b/tfbpapi/tests/snapshots/test_AbstractAPI/test_pop_params/pop_params_after_all_removed deleted file mode 100644 index 9e26dfe..0000000 --- a/tfbpapi/tests/snapshots/test_AbstractAPI/test_pop_params/pop_params_after_all_removed +++ /dev/null @@ -1 +0,0 @@ -{} \ No newline at end of file diff --git a/tfbpapi/tests/snapshots/test_AbstractAPI/test_pop_params/pop_params_after_one_removed b/tfbpapi/tests/snapshots/test_AbstractAPI/test_pop_params/pop_params_after_one_removed deleted file mode 100644 index cab5c0c..0000000 --- a/tfbpapi/tests/snapshots/test_AbstractAPI/test_pop_params/pop_params_after_one_removed +++ /dev/null @@ -1 +0,0 @@ -{"param2": "value2"} \ No newline at end of file diff --git a/tfbpapi/tests/snapshots/test_AbstractAPI/test_push_params/push_params b/tfbpapi/tests/snapshots/test_AbstractAPI/test_push_params/push_params deleted file mode 100644 index 21d59b6..0000000 --- a/tfbpapi/tests/snapshots/test_AbstractAPI/test_push_params/push_params +++ /dev/null @@ -1 +0,0 @@ -{"param1": "value1", "param2": "value2"} \ No newline at end of file diff --git a/tfbpapi/tests/test_AbstractAPI.py b/tfbpapi/tests/test_AbstractAPI.py deleted file mode 100644 index 84a643d..0000000 --- a/tfbpapi/tests/test_AbstractAPI.py +++ /dev/null @@ -1,94 +0,0 @@ -import json -from typing import Any - -import pytest -import responses - -from tfbpapi.AbstractAPI import AbstractAPI -from tfbpapi.ParamsDict import ParamsDict - - -class 
ConcreteAPI(AbstractAPI): - """Concrete implementation of AbstractAPI for testing purposes.""" - - def create(self, data: dict[str, Any], **kwargs) -> Any: - pass # Implement for testing if necessary - - def read(self, **kwargs) -> dict[str, Any]: - return {"id": id} # Mock implementation for testing - - def update(self, df: Any, **kwargs) -> Any: - pass # Implement for testing if necessary - - def delete(self, id: str, **kwargs) -> Any: - pass # Implement for testing if necessary - - def submit(self, post_dict: dict, **kwargs) -> Any: - pass # Implement for testing if necessary - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - pass # Implement for testing if necessary - - -@pytest.fixture -@responses.activate -def api_client(): - valid_url = "https://valid.url" - responses.add(responses.HEAD, valid_url, status=200) - return ConcreteAPI(url=valid_url, token="token") - - -def test_initialize(snapshot, api_client): - assert api_client.url == "https://valid.url" - assert api_client.token == "token" - assert isinstance(api_client.params, ParamsDict) - - -def test_push_params(snapshot, api_client): - params = {"param1": "value1", "param2": "value2"} - api_client.push_params(params) - # Serialize the dictionary to a JSON string for comparison - params_as_json = json.dumps(api_client.params.as_dict(), sort_keys=True) - snapshot.assert_match(params_as_json, "push_params") - - -def test_pop_params(snapshot, api_client): - params = {"param1": "value1", "param2": "value2"} - api_client.push_params(params) - api_client.pop_params(["param1"]) - params_as_json1 = json.dumps(api_client.params.as_dict(), sort_keys=True) - snapshot.assert_match(params_as_json1, "pop_params_after_one_removed") - api_client.pop_params() - params_as_json2 = json.dumps(api_client.params.as_dict(), sort_keys=True) - snapshot.assert_match(params_as_json2, "pop_params_after_all_removed") - - -@responses.activate -def test_is_valid_url(api_client): - invalid_url = "https://invalid.url" - - responses.add(responses.HEAD, invalid_url, status=404) - - with pytest.raises(ValueError): - api_client.url = invalid_url - - -def test_cache_operations(snapshot, api_client): - key = "test_key" - value = "test_value" - - api_client._cache_set(key, value) - snapshot.assert_match(str(api_client._cache_get(key)), "cache_get_after_set") - - keys = api_client._cache_list() - snapshot.assert_match(str(keys), "cache_list") - - api_client._cache_delete(key) - snapshot.assert_match(str(api_client._cache_get(key)), "cache_get_after_delete") - snapshot.assert_match(str(api_client._cache_get(key)), "cache_get_after_delete") - - -if __name__ == "__main__": - pytest.main() diff --git a/tfbpapi/tests/test_AbstractRecordsAndFilesAPI.py b/tfbpapi/tests/test_AbstractRecordsAndFilesAPI.py deleted file mode 100644 index 1c64a39..0000000 --- a/tfbpapi/tests/test_AbstractRecordsAndFilesAPI.py +++ /dev/null @@ -1,284 +0,0 @@ -import gzip -from io import BytesIO -from tempfile import NamedTemporaryFile -from typing import Any - -import pandas as pd -import pytest -import responses -from aioresponses import aioresponses - -from tfbpapi.AbstractRecordsAndFilesAPI import ( - AbstractRecordsAndFilesAPI, -) - -# The following test is commented out because it requires a running server -- this is -# how I retrieved the data for the tests below. 
The data is saved in the snapshot -# directory -# -# @pytest.mark.asyncio -# async def test_save_response_records_and_files(snapshot): -# async with aiohttp.ClientSession() as session: -# url = "http://127.0.0.1:8001/api/promotersetsig/export" -# async with session.get( -# url, -# headers={ -# "Authorization": f"token {os.getenv('TOKEN')}", -# "Content-Type": "application/json", -# }, -# params={ -# "regulator_symbol": "HAP5", -# "workflow": "nf_core_callingcards_dev", -# "data_usable": "pass", -# }, -# ) as response: -# response.raise_for_status() -# response_text = await response.text() -# snapshot.assert_match(response_text) -# assert response.status == 200 - - -# @pytest.mark.asyncio -# async def test_save_response_records_and_files(): -# async with aiohttp.ClientSession() as session: -# url = "http://127.0.0.1:8001/api/promotersetsig/record_table_and_files" -# async with session.get( -# url, -# headers={ -# "Authorization": f"token {os.getenv('TOKEN')}", -# "Content-Type": "application/gzip", -# }, -# params={ -# "regulator_symbol": "HAP5", -# "workflow": "nf_core_callingcards_dev", -# "data_usable": "pass", -# }, -# ) as response: -# response.raise_for_status() -# response_content = await response.read() -# with open("saved_response.tar.gz", "wb") as f: -# f.write(response_content) -# assert response.status == 200 - - -def promotersetsig_csv_gzip() -> bytes: - # Define the data as a dictionary - data = { - "id": [10690, 10694, 10754, 10929, 10939], - "uploader_id": [1, 1, 1, 1, 1], - "upload_date": ["2024-03-26"] * 5, - "modifier_id": [1, 1, 1, 1, 1], - "modified_date": [ - "2024-03-26 14:28:43.825628+00:00", - "2024-03-26 14:28:44.739775+00:00", - "2024-03-26 14:29:01.837335+00:00", - "2024-03-26 14:29:45.379790+00:00", - "2024-03-26 14:29:47.853980+00:00", - ], - "binding_id": [4079, 4083, 4143, 4318, 4327], - "promoter_id": [4, 4, 4, 4, 4], - "background_id": [6, 6, 6, 6, 6], - "fileformat_id": [5, 5, 5, 5, 5], - "file": [ - "promotersetsig/10690.csv.gz", - "promotersetsig/10694.csv.gz", - "promotersetsig/10754.csv.gz", - "promotersetsig/10929.csv.gz", - "promotersetsig/10939.csv.gz", - ], - } - - # Create a DataFrame - df = pd.DataFrame(data) - - # Convert the DataFrame to CSV and compress it using gzip - csv_buffer = BytesIO() - with gzip.GzipFile(fileobj=csv_buffer, mode="w") as gz: - df.to_csv(gz, index=False) - - # Get the gzipped data as bytes - return csv_buffer.getvalue() - - -class ConcreteRecordsAndFilesAPI(AbstractRecordsAndFilesAPI): - """Concrete implementation of AbstractRecordsAndFilesAPI for testing purposes.""" - - def create(self, data: dict[str, Any], **kwargs) -> Any: - pass - - def update(self, df: Any, **kwargs) -> Any: - pass - - def delete(self, id: str, **kwargs) -> Any: - pass - - def submit(self, post_dict: dict, **kwargs) -> Any: - pass # Implement for testing if necessary - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - pass # Implement for testing if necessary - - -@pytest.fixture -@responses.activate -def api_client(): - valid_url = "http://127.0.0.1:8001/api/promotersetsig" - responses.add(responses.HEAD, valid_url, status=200) - return ConcreteRecordsAndFilesAPI(url=valid_url, token="my_token") - - -@pytest.mark.asyncio -async def test_read_without_files(snapshot, api_client): - with aioresponses() as m: - # Mock the HTTP response with the saved snapshot response - m.get( - "http://127.0.0.1:8001/api/promotersetsig/export", - status=200, - body=promotersetsig_csv_gzip(), - 
headers={"Content-Type": "application/gzip"}, - ) - - result = await api_client.read() - assert isinstance(result.get("metadata"), pd.DataFrame) - assert result.get("metadata").shape == ( - 5, - 10, - ) - - -# chatGPT and I went through many iterations of trying to mock two endpoints at once. -# no success. the retrieve_files is untested outside of the tutorial notebook as a -# result -# -# @pytest.mark.asyncio -# async def test_read_with_responses(snapshot, api_client): -# with responses.RequestsMock() as rsps: -# # Mock the /export endpoint -# rsps.add( -# responses.GET, -# "http://127.0.0.1:8001/api/promotersetsig/export", -# body=promotersetsig_csv_gzip(), -# status=200, -# content_type="text/csv", -# ) - -# # Path to the tar.gz file -# tar_gz_file_path = os.path.join( -# os.path.dirname(__file__), -# "snapshots", -# "promotersetsig_records_and_files.tar.gz", -# ) - -# # Read the content of the tar.gz file -# with open(tar_gz_file_path, "rb") as tar_gz_file: -# tar_gz_content = tar_gz_file.read() - -# # Mock the /record_table_and_files endpoint -# rsps.add( -# responses.GET, -# "http://127.0.0.1:8001/api/promotersetsig/record_table_and_files", -# body=tar_gz_content, -# status=200, -# content_type="application/gzip", -# ) - -# # Helper function to create a mock ClientResponse -# async def create_mock_response(url, method, body, content_type, status): -# return MockClientResponse( -# method, URL(url), status, {"Content-Type": content_type}, body -# ) - -# # Patch aiohttp.ClientSession.get to use our mocked responses -# async def mock_get(self, url, **kwargs): -# if "export" in url: -# return await create_mock_response( -# url, -# "GET", -# promotersetsig_csv_gzip().encode(), -# "text/csv", -# 200, -# ) -# elif "record_table_and_files" in url: -# return await create_mock_response( -# url, -# "GET", -# tar_gz_content, -# "application/gzip", -# 200, -# ) -# else: -# raise ValueError("Unexpected URL") - -# with patch("aiohttp.ClientSession.get", new=mock_get): -# # Test the read method without retrieving files -# result = await api_client.read() -# assert isinstance(result.get("metadata"), pd.DataFrame) -# assert result.get("metadata").shape == (5, 10) - -# # Test the read method with retrieving files -# result = await api_client.read(retrieve_files=True) -# assert isinstance(result.get("metadata"), pd.DataFrame) -# assert result.get("metadata").shape == (5, 10) -# assert isinstance(result.get("data"), dict) -# assert len(result.get("data")) == 5 -# assert all(isinstance(v, pd.DataFrame) \ -# for v in result.get("data").values()) - -# test the _detect_delimiter method #### - - -def test_detect_delimiter_errors(api_client): - # test that a FileNotFound error is raised if the file does not exist - with pytest.raises(FileNotFoundError): - api_client._detect_delimiter("non_existent_file.csv") - - with NamedTemporaryFile(mode="w", suffix=".csv.gz") as tmpfile: - tmpfile.write("col1,col2,col3\nval1,val2,val3") - tmpfile.flush() - tmpfile_path = tmpfile.name - - with pytest.raises(gzip.BadGzipFile): - api_client._detect_delimiter(tmpfile_path) - - -def test_comma_delimiter(api_client): - with NamedTemporaryFile(mode="w", suffix=".csv") as tmpfile: - tmpfile.write("col1,col2,col3\nval1,val2,val3") - tmpfile.flush() - tmpfile_path = tmpfile.name - - delimiter = api_client._detect_delimiter(tmpfile_path) - assert delimiter == "," - - -def test_tab_delimiter(api_client): - with NamedTemporaryFile(mode="w", suffix=".csv") as tmpfile: - tmpfile.write("col1\tcol2\tcol3\nval1\tval2\tval3") - 
tmpfile.flush() - tmpfile_path = tmpfile.name - - delimiter = api_client._detect_delimiter(tmpfile_path) - assert delimiter == "\t" - - -def test_space_delimiter(api_client): - with NamedTemporaryFile(mode="w", suffix=".csv") as tmpfile: - tmpfile.write("col1 col2 col3\nval1 val2 val3") - tmpfile.flush() - tmpfile_path = tmpfile.name - - delimiter = api_client._detect_delimiter(tmpfile_path) - assert delimiter == " " - - -def test_gzipped_file(api_client): - with NamedTemporaryFile(suffix=".csv.gz") as tmpfile: - with gzip.open(tmpfile.name, "wt") as gzfile: - gzfile.write("col1,col2,col3\nval1,val2,val3") - gzfile.flush() - tmpfile_path = tmpfile.name - - delimiter = api_client._detect_delimiter(tmpfile_path) - assert delimiter == "," diff --git a/tfbpapi/tests/test_AbstractRecordsOnlyAPI.py b/tfbpapi/tests/test_AbstractRecordsOnlyAPI.py deleted file mode 100644 index 1def39a..0000000 --- a/tfbpapi/tests/test_AbstractRecordsOnlyAPI.py +++ /dev/null @@ -1,71 +0,0 @@ -import gzip -from typing import Any - -import pandas as pd -import pytest -import responses -from aioresponses import aioresponses - -from tfbpapi.AbstractRecordsOnlyAPI import AbstractRecordsOnlyAPI - - -class ConcreteAPI(AbstractRecordsOnlyAPI): - """Concrete implementation of AbstractRecordsOnlyAPI for testing purposes.""" - - def create(self, data: dict[str, Any], **kwargs) -> Any: - pass # Implement for testing if necessary - - def update(self, df: Any, **kwargs) -> Any: - pass # Implement for testing if necessary - - def delete(self, id: str, **kwargs) -> Any: - pass # Implement for testing if necessary - - def submit(self, post_dict: dict, **kwargs) -> Any: - pass # Implement for testing if necessary - - def retrieve( - self, group_task_id: str, timeout: int, polling_interval: int, **kwargs - ) -> Any: - pass # Implement for testing if necessary - - -@pytest.fixture -@responses.activate -def api_client(): - valid_url = "https://example.com/api/endpoint" - responses.add(responses.HEAD, valid_url, status=200) - return ConcreteAPI(url=valid_url, token="my_token") - - -@pytest.mark.asyncio -async def test_read(snapshot, api_client): - with aioresponses() as m: - # Mocking the response - mocked_csv = ( - "id,uploader_id,upload_date,modifier_id,modified_date,binding_id,promoter_id,background_id,fileformat_id,file\n" # noqa: E501 - "10690,1,2024-03-26,1,2024-03-26 14:28:43.825628+00:00,4079,4,6,5,promotersetsig/10690.csv.gz\n" # noqa: E501 - "10694,1,2024-03-26,1,2024-03-26 14:28:44.739775+00:00,4083,4,6,5,promotersetsig/10694.csv.gz\n" # noqa: E501 - "10754,1,2024-03-26,1,2024-03-26 14:29:01.837335+00:00,4143,4,6,5,promotersetsig/10754.csv.gz\n" # noqa: E501 - "10929,1,2024-03-26,1,2024-03-26 14:29:45.379790+00:00,4318,4,6,5,promotersetsig/10929.csv.gz\n" # noqa: E501 - "10939,1,2024-03-26,1,2024-03-26 14:29:47.853980+00:00,4327,4,6,5,promotersetsig/10939.csv.gz" # noqa: E501 - ) - - # Convert to bytes and gzip the content - gzipped_csv = gzip.compress(mocked_csv.encode("utf-8")) - - m.get( - "https://example.com/api/endpoint/export", - status=200, - body=gzipped_csv, - headers={"Content-Type": "application/gzip"}, - ) - - result = await api_client.read() - assert isinstance(result, dict) - assert isinstance(result.get("metadata"), pd.DataFrame) - assert result.get("metadata").shape == (5, 10) # type: ignore - - -if __name__ == "__main__": - pytest.main() diff --git a/tfbpapi/tests/test_Cache.py b/tfbpapi/tests/test_Cache.py deleted file mode 100644 index a84eb37..0000000 --- a/tfbpapi/tests/test_Cache.py +++ /dev/null @@ 
-1,66 +0,0 @@ -import time - -import pytest - -from tfbpapi.Cache import Cache - - -def test_cache_set_and_get(): - cache = Cache() - cache.set("key1", "value1") - assert cache.get("key1") == "value1" - assert cache.get("key2", "default_value") == "default_value" - - -def test_cache_list(): - cache = Cache() - cache.set("key1", "value1") - cache.set("key2", "value2") - keys = cache.list() - assert "key1" in keys - assert "key2" in keys - - -def test_cache_delete(): - cache = Cache() - cache.set("key1", "value1") - cache.set("key2", "value2") - cache.delete("key1") - assert cache.get("key1") is None - assert cache.get("key2") == "value2" - - -def test_cache_ttl(): - cache = Cache(ttl=1) # TTL set to 1 second - cache.set("key1", "value1") - time.sleep(1.5) # Wait for TTL to expire - assert cache.get("key1") is None # Should be None after TTL expiry - - -def test_cache_lru(): - cache = Cache(maxsize=2) - cache.set("key1", "value1") - cache.set("key2", "value2") - cache.set("key3", "value3") # This should evict "key1" if LRU works - assert cache.get("key1") is None - assert cache.get("key2") == "value2" - assert cache.get("key3") == "value3" - - -def test_separate_cache_instances(): - cache1 = Cache() - cache2 = Cache() - - cache1.set("key1", "value1") - cache2.set("key2", "value2") - - # Ensure they don't share state - assert cache1.get("key1") == "value1" - assert cache1.get("key2") is None - - assert cache2.get("key2") == "value2" - assert cache2.get("key1") is None - - -if __name__ == "__main__": - pytest.main() diff --git a/tfbpapi/tests/test_ParamsDict.py b/tfbpapi/tests/test_ParamsDict.py deleted file mode 100644 index ee5a246..0000000 --- a/tfbpapi/tests/test_ParamsDict.py +++ /dev/null @@ -1,96 +0,0 @@ -import pytest -import requests # type: ignore -import responses - -from tfbpapi.ParamsDict import ParamsDict - - -def test_initialization(): - params = ParamsDict({"b": 2, "a": 1}, valid_keys=["a", "b"]) - assert params == {"a": 1, "b": 2} - - -def test_getitem(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - assert params["a"] == 1 - assert params[["a", "b"]] == ParamsDict({"a": 1, "b": 2}) - with pytest.raises(KeyError): - _ = params["123"] # Changed from 123 to '123' - - -def test_setitem(): - params = ParamsDict({"a": 1}, valid_keys=["a", "b", "c", "d"]) - params.update({"b": 2}) - assert params == {"a": 1, "b": 2} - - params[["c", "d"]] = [3, 4] - assert params == {"a": 1, "b": 2, "c": 3, "d": 4} - - with pytest.raises(ValueError): - params[["e", "f"]] = [5] - - with pytest.raises(KeyError): - params[123] = 5 # type: ignore - - with pytest.raises(KeyError): - params.update({"d": 4, "e": 5}) - - -def test_delitem(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - del params["a"] - assert params == {"b": 2} - with pytest.raises(KeyError): - del params["123"] # Changed from 123 to '123' - - -def test_repr(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - assert repr(params) == "ParamsDict({'a': 1, 'b': 2})" - - -def test_str(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - assert str(params) == "a: 1, b: 2" - - -def test_len(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b", "c"]) - assert len(params) == 2 - params["c"] = 3 - assert len(params) == 3 - - -def test_keys_values_items(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - assert set(params.keys()) == {"a", "b"} - assert set(params.values()) == {1, 2} - assert set(params.items()) == {("a", 1), ("b", 2)} - - -def 
test_clear(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - params.clear() - assert len(params) == 0 - - -def test_as_dict(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - assert params.as_dict() == {"a": 1, "b": 2} - - -@responses.activate -def test_requests_integration(): - params = ParamsDict({"a": 1, "b": 2}, valid_keys=["a", "b"]) - - url = "https://httpbin.org/get" - responses.add(responses.GET, url, json={"args": {"a": "1", "b": "2"}}, status=200) - - response = requests.get(url, params=params) - assert response.status_code == 200 - response_json = response.json() - assert response_json["args"] == {"a": "1", "b": "2"} - - -if __name__ == "__main__": - pytest.main() diff --git a/tfbpapi/tests/test_datacard.py b/tfbpapi/tests/test_datacard.py new file mode 100644 index 0000000..01b2c0b --- /dev/null +++ b/tfbpapi/tests/test_datacard.py @@ -0,0 +1,449 @@ +"""Tests for the DataCard class.""" + +from unittest.mock import Mock, patch + +import pytest + +from tfbpapi import DataCard +from tfbpapi.errors import DataCardError, DataCardValidationError, HfDataFetchError +from tfbpapi.models import DatasetType + + +class TestDataCard: + """Test suite for DataCard class.""" + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_init( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + test_token, + ): + """Test DataCard initialization.""" + datacard = DataCard(test_repo_id, token=test_token) + + assert datacard.repo_id == test_repo_id + assert datacard.token == test_token + assert datacard._dataset_card is None + assert datacard._metadata_cache == {} + + # Check that fetchers were initialized + mock_card_fetcher.assert_called_once_with(token=test_token) + mock_structure_fetcher.assert_called_once_with(token=test_token) + mock_size_fetcher.assert_called_once_with(token=test_token) + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_init_without_token( + self, mock_size_fetcher, mock_structure_fetcher, mock_card_fetcher, test_repo_id + ): + """Test DataCard initialization without token.""" + datacard = DataCard(test_repo_id) + + assert datacard.repo_id == test_repo_id + assert datacard.token is None + + # Check that fetchers were initialized without token + mock_card_fetcher.assert_called_once_with(token=None) + mock_structure_fetcher.assert_called_once_with(token=None) + mock_size_fetcher.assert_called_once_with(token=None) + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_load_and_validate_card_success( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test successful card loading and validation.""" + # Setup mock + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.return_value = sample_dataset_card_data + + datacard = DataCard(test_repo_id) + + # Access dataset_card property to trigger loading + card = datacard.dataset_card + + assert card is not None + assert len(card.configs) == 4 + assert card.pretty_name == "Test Genomics Dataset" + mock_fetcher_instance.fetch.assert_called_once_with(test_repo_id) + + @patch("tfbpapi.datacard.HfDataCardFetcher") + 
@patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_load_card_no_data( + self, mock_size_fetcher, mock_structure_fetcher, mock_card_fetcher, test_repo_id + ): + """Test handling when no dataset card is found.""" + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.return_value = {} + + datacard = DataCard(test_repo_id) + + with pytest.raises(DataCardValidationError, match="No dataset card found"): + _ = datacard.dataset_card + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_load_card_validation_error( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + invalid_dataset_card_data, + ): + """Test handling of validation errors.""" + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.return_value = invalid_dataset_card_data + + datacard = DataCard(test_repo_id) + + with pytest.raises( + DataCardValidationError, match="Dataset card validation failed" + ): + _ = datacard.dataset_card + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_load_card_fetch_error( + self, mock_size_fetcher, mock_structure_fetcher, mock_card_fetcher, test_repo_id + ): + """Test handling of fetch errors.""" + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.side_effect = HfDataFetchError("Fetch failed") + + datacard = DataCard(test_repo_id) + + with pytest.raises(DataCardError, match="Failed to fetch dataset card"): + _ = datacard.dataset_card + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_configs_property( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test getting all configurations via property.""" + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.return_value = sample_dataset_card_data + + datacard = DataCard(test_repo_id) + configs = datacard.configs + + assert len(configs) == 4 + config_names = [config.config_name for config in configs] + assert "genomic_features" in config_names + assert "binding_data" in config_names + assert "genome_map_data" in config_names + assert "experiment_metadata" in config_names + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_get_config_by_name( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test getting a specific configuration by name.""" + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.return_value = sample_dataset_card_data + + datacard = DataCard(test_repo_id) + + config = datacard.get_config("binding_data") + assert config is not None + assert config.config_name == "binding_data" + assert config.dataset_type == DatasetType.ANNOTATED_FEATURES + + # Test non-existent config + assert datacard.get_config("nonexistent") is None + + 
@patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_get_metadata_relationships( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test getting metadata relationships.""" + mock_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_fetcher_instance + mock_fetcher_instance.fetch.return_value = sample_dataset_card_data + + datacard = DataCard(test_repo_id) + + relationships = datacard.get_metadata_relationships() + + # Should have explicit relationship between binding_data and experiment_metadata + explicit_rels = [r for r in relationships if r.relationship_type == "explicit"] + assert len(explicit_rels) == 1 + assert explicit_rels[0].data_config == "binding_data" + assert explicit_rels[0].metadata_config == "experiment_metadata" + + # Should have embedded relationship for binding_data (has metadata_fields) + embedded_rels = [r for r in relationships if r.relationship_type == "embedded"] + assert len(embedded_rels) == 1 + assert embedded_rels[0].data_config == "binding_data" + assert embedded_rels[0].metadata_config == "binding_data_embedded" + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_get_repository_info_success( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + sample_repo_structure, + ): + """Test getting repository information.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + mock_structure_fetcher_instance.fetch.return_value = sample_repo_structure + + datacard = DataCard(test_repo_id) + + info = datacard.get_repository_info() + + assert info["repo_id"] == test_repo_id + assert info["pretty_name"] == "Test Genomics Dataset" + assert info["license"] == "mit" + assert info["num_configs"] == 4 + assert "genomic_features" in info["dataset_types"] + assert "annotated_features" in info["dataset_types"] + assert "genome_map" in info["dataset_types"] + assert "metadata" in info["dataset_types"] + assert info["total_files"] == 5 + assert info["last_modified"] == "2023-12-01T10:30:00Z" + assert info["has_default_config"] is True + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_get_repository_info_fetch_error( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test getting repository info when structure fetch fails.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + mock_structure_fetcher_instance.fetch.side_effect = HfDataFetchError( + "Structure fetch failed" + ) + + datacard = DataCard(test_repo_id) + + info = datacard.get_repository_info() + + assert info["repo_id"] == test_repo_id + assert info["total_files"] is None + assert info["last_modified"] 
is None + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_summary( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + sample_repo_structure, + ): + """Test getting a summary of the dataset.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + mock_structure_fetcher_instance.fetch.return_value = sample_repo_structure + + datacard = DataCard(test_repo_id) + + summary = datacard.summary() + + assert "Dataset: Test Genomics Dataset" in summary + assert f"Repository: {test_repo_id}" in summary + assert "License: mit" in summary + assert "Configurations: 4" in summary + assert "genomic_features" in summary + assert "binding_data" in summary + assert "genome_map_data" in summary + assert "experiment_metadata" in summary + assert "(default)" in summary # genomic_features is marked as default + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_extract_partition_values( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test extracting partition values.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + mock_structure_fetcher_instance.get_partition_values.return_value = [ + "TF1", + "TF2", + "TF3", + ] + + datacard = DataCard(test_repo_id) + + # Get the genome_map_data config which has partitioning enabled + config = datacard.get_config("genome_map_data") + assert config is not None + assert config.dataset_info.partitioning.enabled is True + + values = datacard._extract_partition_values(config, "regulator") + assert values == {"TF1", "TF2", "TF3"} + mock_structure_fetcher_instance.get_partition_values.assert_called_once_with( + test_repo_id, "regulator" + ) + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_extract_partition_values_no_partitioning( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test extracting partition values when partitioning is disabled.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + + datacard = DataCard(test_repo_id) + + # Get a config without partitioning + config = datacard.get_config("genomic_features") + assert config is not None + assert config.dataset_info.partitioning is None + + values = datacard._extract_partition_values(config, "some_field") + assert values == set() + mock_structure_fetcher_instance.get_partition_values.assert_not_called() + + @patch("tfbpapi.datacard.HfDataCardFetcher") + 
@patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_extract_partition_values_field_not_in_partitions( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test extracting partition values when field is not a partition column.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + + datacard = DataCard(test_repo_id) + + # Get the genome_map_data config which has partitioning enabled + config = datacard.get_config("genome_map_data") + assert config is not None + + # Try to extract values for a field that's not in partition_by + values = datacard._extract_partition_values(config, "not_a_partition_field") + assert values == set() + mock_structure_fetcher_instance.get_partition_values.assert_not_called() + + @patch("tfbpapi.datacard.HfDataCardFetcher") + @patch("tfbpapi.datacard.HfRepoStructureFetcher") + @patch("tfbpapi.datacard.HfSizeInfoFetcher") + def test_extract_partition_values_fetch_error( + self, + mock_size_fetcher, + mock_structure_fetcher, + mock_card_fetcher, + test_repo_id, + sample_dataset_card_data, + ): + """Test extracting partition values when fetch fails.""" + mock_card_fetcher_instance = Mock() + mock_structure_fetcher_instance = Mock() + mock_card_fetcher.return_value = mock_card_fetcher_instance + mock_structure_fetcher.return_value = mock_structure_fetcher_instance + + mock_card_fetcher_instance.fetch.return_value = sample_dataset_card_data + mock_structure_fetcher_instance.get_partition_values.side_effect = ( + HfDataFetchError("Fetch failed") + ) + + datacard = DataCard(test_repo_id) + + config = datacard.get_config("genome_map_data") + values = datacard._extract_partition_values(config, "regulator") + + # Should return empty set on error + assert values == set() diff --git a/tfbpapi/tests/test_datacard_parsing.py b/tfbpapi/tests/test_datacard_parsing.py new file mode 100644 index 0000000..5d2210e --- /dev/null +++ b/tfbpapi/tests/test_datacard_parsing.py @@ -0,0 +1,169 @@ +"""Test script to verify datacard parsing with new environmental_conditions.""" + +import yaml + +from tfbpapi.models import DatasetCard +from tfbpapi.tests.example_datacards import ( + EXAMPLE_1_SIMPLE_TOPLEVEL, + EXAMPLE_2_COMPLEX_FIELD_DEFINITIONS, + EXAMPLE_3_PARTITIONED_WITH_METADATA, +) + + +def test_example_1(): + """Test parsing example 1: simple top-level conditions.""" + print("=" * 80) + print("Testing Example 1: Simple Top-Level Conditions") + print("=" * 80) + + # Extract YAML from markdown + yaml_content = EXAMPLE_1_SIMPLE_TOPLEVEL.split("---")[1] + data = yaml.safe_load(yaml_content) + + try: + card = DatasetCard(**data) + print("✓ Successfully parsed Example 1") + print(f" - Configs: {len(card.configs)}") + print( + " - Top-level experimental_conditions: " + f"{card.experimental_conditions is not None}" + ) + + if card.experimental_conditions: + env_cond = card.experimental_conditions.environmental_conditions + if env_cond: + print(f" - Temperature: {env_cond.temperature_celsius}°C") + print(f" - Cultivation: {env_cond.cultivation_method}") + if env_cond.media: + print(f" - Media: {env_cond.media.name}") + print(f" - Carbon sources: {len(env_cond.media.carbon_source)}") + print( + f" - Nitrogen sources: 
{len(env_cond.media.nitrogen_source)}" + ) + + # Check field-level definitions + config = card.configs[0] + for feature in config.dataset_info.features: + if feature.definitions: + print( + f" - Feature '{feature.name}' has " + f"{len(feature.definitions)} definitions" + ) + for def_name in feature.definitions.keys(): + print(f" - {def_name}") + + print() + return True + except Exception as e: + print(f"✗ Failed to parse Example 1: {e}") + import traceback + + traceback.print_exc() + print() + return False + + +def test_example_2(): + """Test parsing example 2: complex field-level definitions.""" + print("=" * 80) + print("Testing Example 2: Complex Field-Level Definitions") + print("=" * 80) + + yaml_content = EXAMPLE_2_COMPLEX_FIELD_DEFINITIONS.split("---")[1] + data = yaml.safe_load(yaml_content) + + try: + card = DatasetCard(**data) + print("✓ Successfully parsed Example 2") + print(f" - Configs: {len(card.configs)}") + print(f" - Strain information: {card.strain_information is not None}") + + # Check field-level definitions + config = card.configs[0] + for feature in config.dataset_info.features: + if feature.definitions: + print( + f" - Feature '{feature.name}' has " + f"{len(feature.definitions)} definitions:" + ) + for def_name, def_value in feature.definitions.items(): + print(f" - {def_name}") + if "environmental_conditions" in def_value: + env = def_value["environmental_conditions"] + if "temperature_celsius" in env: + print(f" Temperature: {env['temperature_celsius']}°C") + if "media" in env: + print(f" Media: {env['media']['name']}") + + print() + return True + except Exception as e: + print(f"✗ Failed to parse Example 2: {e}") + import traceback + + traceback.print_exc() + print() + return False + + +def test_example_3(): + """Test parsing example 3: partitioned with metadata.""" + print("=" * 80) + print("Testing Example 3: Partitioned with Metadata") + print("=" * 80) + + yaml_content = EXAMPLE_3_PARTITIONED_WITH_METADATA.split("---")[1] + data = yaml.safe_load(yaml_content) + + try: + card = DatasetCard(**data) + print("✓ Successfully parsed Example 3") + print(f" - Configs: {len(card.configs)}") + print( + " - Top-level experimental_conditions: " + f"{card.experimental_conditions is not None}" + ) + + if card.experimental_conditions: + env_cond = card.experimental_conditions.environmental_conditions + if env_cond and env_cond.media: + print(f" - Top-level media: {env_cond.media.name}") + + # Check config-level experimental_conditions + for config in card.configs: + if config.experimental_conditions: + print(f" - Config '{config.config_name}' has experimental_conditions") + env_cond = config.experimental_conditions.environmental_conditions + if env_cond and env_cond.media: + print(f" - Media: {env_cond.media.name}") + print(f" - Temperature: {env_cond.temperature_celsius}°C") + + print() + return True + except Exception as e: + print(f"✗ Failed to parse Example 3: {e}") + import traceback + + traceback.print_exc() + print() + return False + + +if __name__ == "__main__": + results = [] + + results.append(test_example_1()) + results.append(test_example_2()) + results.append(test_example_3()) + + print("=" * 80) + print("Summary") + print("=" * 80) + print(f"Passed: {sum(results)}/{len(results)}") + + if all(results): + print("\n✓ All tests passed!") + exit(0) + else: + print("\n✗ Some tests failed") + exit(1) diff --git a/tfbpapi/tests/test_fetchers.py b/tfbpapi/tests/test_fetchers.py new file mode 100644 index 0000000..ac350f5 --- /dev/null +++ 
b/tfbpapi/tests/test_fetchers.py @@ -0,0 +1,435 @@ +"""Tests for datainfo fetcher classes.""" + +from unittest.mock import Mock, patch + +import pytest +import requests +from requests import HTTPError + +from tfbpapi.fetchers import ( + HfDataCardFetcher, + HfRepoStructureFetcher, + HfSizeInfoFetcher, +) +from tfbpapi.errors import HfDataFetchError + + +class TestHfDataCardFetcher: + """Test HfDataCardFetcher class.""" + + def test_init_with_token(self, test_token): + """Test initialization with token.""" + fetcher = HfDataCardFetcher(token=test_token) + assert fetcher.token == test_token + + def test_init_without_token(self): + """Test initialization without token.""" + with patch.dict("os.environ", {}, clear=True): + fetcher = HfDataCardFetcher() + assert fetcher.token is None + + def test_init_with_env_token(self, test_token): + """Test initialization with environment token.""" + with patch.dict("os.environ", {"HF_TOKEN": test_token}): + fetcher = HfDataCardFetcher() + assert fetcher.token == test_token + + @patch("tfbpapi.fetchers.DatasetCard") + def test_fetch_success( + self, mock_dataset_card, test_repo_id, sample_dataset_card_data + ): + """Test successful dataset card fetch.""" + # Setup mock + mock_card = Mock() + mock_card.data.to_dict.return_value = sample_dataset_card_data + mock_dataset_card.load.return_value = mock_card + + fetcher = HfDataCardFetcher(token="test_token") + result = fetcher.fetch(test_repo_id) + + assert result == sample_dataset_card_data + mock_dataset_card.load.assert_called_once_with( + test_repo_id, repo_type="dataset", token="test_token" + ) + + @patch("tfbpapi.fetchers.DatasetCard") + def test_fetch_no_data_section(self, mock_dataset_card, test_repo_id): + """Test fetch when dataset card has no data section.""" + # Setup mock with no data + mock_card = Mock() + mock_card.data = None + mock_dataset_card.load.return_value = mock_card + + fetcher = HfDataCardFetcher() + result = fetcher.fetch(test_repo_id) + + assert result == {} + + @patch("tfbpapi.fetchers.DatasetCard") + def test_fetch_exception(self, mock_dataset_card, test_repo_id): + """Test fetch when DatasetCard.load raises exception.""" + mock_dataset_card.load.side_effect = Exception("API Error") + + fetcher = HfDataCardFetcher() + + with pytest.raises(HfDataFetchError, match="Failed to fetch dataset card"): + fetcher.fetch(test_repo_id) + + def test_fetch_different_repo_types(self, sample_dataset_card_data): + """Test fetch with different repository types.""" + with patch("tfbpapi.fetchers.DatasetCard") as mock_dataset_card: + mock_card = Mock() + mock_card.data.to_dict.return_value = sample_dataset_card_data + mock_dataset_card.load.return_value = mock_card + + fetcher = HfDataCardFetcher() + + # Test with model repo + fetcher.fetch("test/repo", repo_type="model") + mock_dataset_card.load.assert_called_with( + "test/repo", repo_type="model", token=None + ) + + # Test with space repo + fetcher.fetch("test/repo", repo_type="space") + mock_dataset_card.load.assert_called_with( + "test/repo", repo_type="space", token=None + ) + + +class TestHfSizeInfoFetcher: + """Test HfSizeInfoFetcher class.""" + + def test_init(self, test_token): + """Test initialization.""" + fetcher = HfSizeInfoFetcher(token=test_token) + assert fetcher.token == test_token + assert fetcher.base_url == "https://datasets-server.huggingface.co" + + def test_build_headers_with_token(self, test_token): + """Test building headers with token.""" + fetcher = HfSizeInfoFetcher(token=test_token) + headers = fetcher._build_headers() + + 
assert headers["User-Agent"] == "TFBP-API/1.0" + assert headers["Authorization"] == f"Bearer {test_token}" + + def test_build_headers_without_token(self): + """Test building headers without token.""" + fetcher = HfSizeInfoFetcher() + headers = fetcher._build_headers() + + assert headers["User-Agent"] == "TFBP-API/1.0" + assert "Authorization" not in headers + + @patch("tfbpapi.fetchers.requests.get") + def test_fetch_success(self, mock_get, test_repo_id, sample_size_info): + """Test successful size info fetch.""" + # Setup mock response + mock_response = Mock() + mock_response.json.return_value = sample_size_info + mock_get.return_value = mock_response + + fetcher = HfSizeInfoFetcher(token="test_token") + result = fetcher.fetch(test_repo_id) + + assert result == sample_size_info + mock_get.assert_called_once() + + # Check call arguments + call_args = mock_get.call_args + assert call_args[1]["params"]["dataset"] == test_repo_id + assert call_args[1]["headers"]["Authorization"] == "Bearer test_token" + assert call_args[1]["timeout"] == 30 + + @patch("tfbpapi.fetchers.requests.get") + def test_fetch_404_error(self, mock_get, test_repo_id): + """Test fetch with 404 error.""" + # Setup mock 404 response + mock_response = Mock() + mock_response.status_code = 404 + error = HTTPError(response=mock_response) + mock_get.side_effect = error + + fetcher = HfSizeInfoFetcher() + + with pytest.raises(HfDataFetchError, match="Dataset .* not found"): + fetcher.fetch(test_repo_id) + + @patch("tfbpapi.fetchers.requests.get") + def test_fetch_403_error(self, mock_get, test_repo_id): + """Test fetch with 403 error.""" + # Setup mock 403 response + mock_response = Mock() + mock_response.status_code = 403 + error = HTTPError(response=mock_response) + mock_get.side_effect = error + + fetcher = HfSizeInfoFetcher() + + with pytest.raises( + HfDataFetchError, match="Access denied.*check token permissions" + ): + fetcher.fetch(test_repo_id) + + @patch("tfbpapi.fetchers.requests.get") + def test_fetch_other_http_error(self, mock_get, test_repo_id): + """Test fetch with other HTTP error.""" + # Setup mock 500 response + mock_response = Mock() + mock_response.status_code = 500 + error = HTTPError(response=mock_response) + mock_get.side_effect = error + + fetcher = HfSizeInfoFetcher() + + with pytest.raises(HfDataFetchError, match="HTTP error fetching size"): + fetcher.fetch(test_repo_id) + + @patch("tfbpapi.fetchers.requests.get") + def test_fetch_request_exception(self, mock_get, test_repo_id): + """Test fetch with request exception.""" + mock_get.side_effect = requests.RequestException("Network error") + + fetcher = HfSizeInfoFetcher() + + with pytest.raises(HfDataFetchError, match="Request failed fetching size"): + fetcher.fetch(test_repo_id) + + @patch("tfbpapi.fetchers.requests.get") + def test_fetch_json_decode_error(self, mock_get, test_repo_id): + """Test fetch with JSON decode error.""" + # Setup mock response with invalid JSON + mock_response = Mock() + mock_response.json.side_effect = ValueError("Invalid JSON") + mock_get.return_value = mock_response + + fetcher = HfSizeInfoFetcher() + + with pytest.raises(HfDataFetchError, match="Invalid JSON response"): + fetcher.fetch(test_repo_id) + + +class TestHfRepoStructureFetcher: + """Test HfRepoStructureFetcher class.""" + + def test_init(self, test_token): + """Test initialization.""" + fetcher = HfRepoStructureFetcher(token=test_token) + assert fetcher.token == test_token + assert fetcher._cached_structure == {} + + @patch("tfbpapi.fetchers.repo_info") + def 
test_fetch_success(self, mock_repo_info, test_repo_id, sample_repo_structure): + """Test successful repository structure fetch.""" + # Setup mock repo info + mock_info = Mock() + mock_info.siblings = [ + Mock(rfilename="features.parquet", size=2048000, lfs=Mock()), + Mock(rfilename="binding/part1.parquet", size=1024000, lfs=Mock()), + Mock( + rfilename="tracks/regulator=TF1/experiment=exp1/data.parquet", + size=5120000, + lfs=Mock(), + ), + ] + mock_info.last_modified.isoformat.return_value = "2023-12-01T10:30:00Z" + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher(token="test_token") + result = fetcher.fetch(test_repo_id) + + assert result["repo_id"] == test_repo_id + assert result["total_files"] == 3 + assert len(result["files"]) == 3 + assert result["last_modified"] == "2023-12-01T10:30:00Z" + + # Check that repo_info was called correctly + mock_repo_info.assert_called_once_with( + repo_id=test_repo_id, repo_type="dataset", token="test_token" + ) + + @patch("tfbpapi.fetchers.repo_info") + def test_fetch_with_caching(self, mock_repo_info, test_repo_id): + """Test fetch with caching behavior.""" + # Setup mock + mock_info = Mock() + mock_info.siblings = [] + mock_info.last_modified = None + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher() + + # First fetch + result1 = fetcher.fetch(test_repo_id) + assert mock_repo_info.call_count == 1 + + # Second fetch should use cache + result2 = fetcher.fetch(test_repo_id) + assert mock_repo_info.call_count == 1 # Not called again + assert result1 == result2 + + # Force refresh should call API again + result3 = fetcher.fetch(test_repo_id, force_refresh=True) + assert mock_repo_info.call_count == 2 + + @patch("tfbpapi.fetchers.repo_info") + def test_fetch_siblings_none(self, mock_repo_info, test_repo_id): + """Test fetch when siblings is None.""" + # Setup mock with None siblings + mock_info = Mock() + mock_info.siblings = None + mock_info.last_modified = None + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher() + result = fetcher.fetch(test_repo_id) + + assert result["total_files"] == 0 + assert result["files"] == [] + assert result["partitions"] == {} + + @patch("tfbpapi.fetchers.repo_info") + def test_fetch_exception(self, mock_repo_info, test_repo_id): + """Test fetch when repo_info raises exception.""" + mock_repo_info.side_effect = Exception("API Error") + + fetcher = HfRepoStructureFetcher() + + with pytest.raises(HfDataFetchError, match="Failed to fetch repo structure"): + fetcher.fetch(test_repo_id) + + def test_extract_partition_info(self): + """Test extracting partition information from file paths.""" + fetcher = HfRepoStructureFetcher() + partitions = {} + + # Test normal partition pattern + fetcher._extract_partition_info( + "data/regulator=TF1/condition=control/file.parquet", partitions + ) + assert "regulator" in partitions + assert "TF1" in partitions["regulator"] + assert "condition" in partitions + assert "control" in partitions["condition"] + + # Test multiple values for same partition + fetcher._extract_partition_info( + "data/regulator=TF2/condition=treatment/file.parquet", partitions + ) + assert len(partitions["regulator"]) == 2 + assert "TF2" in partitions["regulator"] + assert "treatment" in partitions["condition"] + + # Test file without partitions + fetcher._extract_partition_info("simple_file.parquet", partitions) + # partitions dict should remain unchanged + assert len(partitions) == 2 + + @patch("tfbpapi.fetchers.repo_info") + def 
test_get_partition_values_success(self, mock_repo_info, test_repo_id): + """Test getting partition values for a specific column.""" + # Setup mock with partitioned files + mock_info = Mock() + mock_info.siblings = [ + Mock(rfilename="data/regulator=TF1/file1.parquet", size=1000, lfs=None), + Mock(rfilename="data/regulator=TF2/file2.parquet", size=1000, lfs=None), + Mock(rfilename="data/regulator=TF3/file3.parquet", size=1000, lfs=None), + ] + mock_info.last_modified = None + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher() + values = fetcher.get_partition_values(test_repo_id, "regulator") + + assert values == ["TF1", "TF2", "TF3"] # Should be sorted + + @patch("tfbpapi.fetchers.repo_info") + def test_get_partition_values_no_partitions(self, mock_repo_info, test_repo_id): + """Test getting partition values when no partitions exist.""" + # Setup mock with no partitioned files + mock_info = Mock() + mock_info.siblings = [ + Mock(rfilename="simple_file.parquet", size=1000, lfs=None), + ] + mock_info.last_modified = None + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher() + values = fetcher.get_partition_values(test_repo_id, "regulator") + + assert values == [] + + @patch("tfbpapi.fetchers.repo_info") + def test_get_dataset_files_all(self, mock_repo_info, test_repo_id): + """Test getting all dataset files.""" + # Setup mock + mock_info = Mock() + mock_info.siblings = [ + Mock(rfilename="file1.parquet", size=1000, lfs=None), + Mock(rfilename="file2.parquet", size=2000, lfs=Mock()), + ] + mock_info.last_modified = None + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher() + files = fetcher.get_dataset_files(test_repo_id) + + assert len(files) == 2 + assert files[0]["path"] == "file1.parquet" + assert files[0]["size"] == 1000 + assert files[0]["is_lfs"] is False + + assert files[1]["path"] == "file2.parquet" + assert files[1]["size"] == 2000 + assert files[1]["is_lfs"] is True + + @patch("tfbpapi.fetchers.repo_info") + def test_get_dataset_files_with_pattern(self, mock_repo_info, test_repo_id): + """Test getting dataset files with path pattern filter.""" + # Setup mock + mock_info = Mock() + mock_info.siblings = [ + Mock(rfilename="data/file1.parquet", size=1000, lfs=None), + Mock(rfilename="metadata/info.json", size=500, lfs=None), + Mock(rfilename="data/file2.parquet", size=2000, lfs=None), + ] + mock_info.last_modified = None + mock_repo_info.return_value = mock_info + + fetcher = HfRepoStructureFetcher() + files = fetcher.get_dataset_files(test_repo_id, path_pattern=r".*\.parquet$") + + assert len(files) == 2 + assert all(f["path"].endswith(".parquet") for f in files) + + def test_get_dataset_files_uses_cache(self): + """Test that get_dataset_files uses fetch caching.""" + fetcher = HfRepoStructureFetcher() + + with patch.object(fetcher, "fetch") as mock_fetch: + mock_fetch.return_value = {"files": []} + + # First call + fetcher.get_dataset_files("test/repo") + mock_fetch.assert_called_with("test/repo", force_refresh=False) + + # Second call with force_refresh + fetcher.get_dataset_files("test/repo", force_refresh=True) + mock_fetch.assert_called_with("test/repo", force_refresh=True) + + def test_get_partition_values_uses_cache(self): + """Test that get_partition_values uses fetch caching.""" + fetcher = HfRepoStructureFetcher() + + with patch.object(fetcher, "fetch") as mock_fetch: + mock_fetch.return_value = {"partitions": {"regulator": {"TF1", "TF2"}}} + + # First call + result = 
fetcher.get_partition_values("test/repo", "regulator") + mock_fetch.assert_called_with("test/repo", force_refresh=False) + assert result == ["TF1", "TF2"] + + # Second call with force_refresh + fetcher.get_partition_values("test/repo", "regulator", force_refresh=True) + mock_fetch.assert_called_with("test/repo", force_refresh=True) diff --git a/tfbpapi/tests/test_hf_cache_manager.py b/tfbpapi/tests/test_hf_cache_manager.py new file mode 100644 index 0000000..aa395df --- /dev/null +++ b/tfbpapi/tests/test_hf_cache_manager.py @@ -0,0 +1,783 @@ +"""Comprehensive tests for HfCacheManager class.""" + +import logging +from datetime import datetime, timedelta +from unittest.mock import Mock, patch + +import duckdb +import pytest + +from tfbpapi.hf_cache_manager import HfCacheManager +from tfbpapi.models import DatasetType + + +class TestHfCacheManagerInit: + """Test HfCacheManager initialization.""" + + def test_init_basic(self): + """Test basic initialization.""" + conn = duckdb.connect(":memory:") + repo_id = "test/repo" + + with patch( + "tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None + ) as mock_datacard_init: + cache_manager = HfCacheManager(repo_id, conn) + # Manually set the properties that would normally + # be set by DataCard.__init__ + cache_manager.repo_id = repo_id + cache_manager.token = None + + assert cache_manager.repo_id == repo_id + assert cache_manager.duckdb_conn == conn + assert cache_manager.token is None + assert cache_manager.logger is not None + # DataCard should be initialized as parent + mock_datacard_init.assert_called_once_with(repo_id, None) + + def test_init_with_token_and_logger(self): + """Test initialization with token and custom logger.""" + conn = duckdb.connect(":memory:") + repo_id = "test/repo" + token = "test_token" + logger = logging.getLogger("test_logger") + + with patch( + "tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None + ) as mock_datacard_init: + cache_manager = HfCacheManager(repo_id, conn, token=token, logger=logger) + # Manually set the properties that would + # normally be set by DataCard.__init__ + cache_manager.repo_id = repo_id + cache_manager.token = token + + assert cache_manager.repo_id == repo_id + assert cache_manager.duckdb_conn == conn + assert cache_manager.token == token + assert cache_manager.logger == logger + # DataCard should be initialized as parent with token + mock_datacard_init.assert_called_once_with(repo_id, token) + + +class TestHfCacheManagerDatacard: + """Test DataCard integration since HfCacheManager now inherits from DataCard.""" + + def test_datacard_inheritance(self): + """Test that HfCacheManager properly inherits from DataCard.""" + conn = duckdb.connect(":memory:") + repo_id = "test/repo" + token = "test_token" + + with patch( + "tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None + ) as mock_datacard_init: + cache_manager = HfCacheManager(repo_id, conn, token=token) + + # DataCard should be initialized during construction + mock_datacard_init.assert_called_once_with(repo_id, token) + + # Should have DataCard methods available (they exist on the class) + assert hasattr(cache_manager, "get_config") + + +class TestHfCacheManagerDuckDBOperations: + """Test DuckDB operations that are still part of HfCacheManager.""" + + @patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None) + def test_create_duckdb_table_from_files_single_file( + self, mock_datacard_init, tmpdir + ): + """Test creating DuckDB table from single parquet file.""" + # Create a mock parquet file + 
parquet_file = tmpdir.join("test.parquet") + parquet_file.write("dummy_content") + + # Use a separate cache manager with mock connection for this test + mock_conn = Mock() + test_cache_manager = HfCacheManager("test/repo", mock_conn) + + # Mock the validation method since we're testing table creation + test_cache_manager._validate_source_sample_fields = Mock() # type: ignore + + test_cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) + + mock_conn.execute.assert_called_once() + sql_call = mock_conn.execute.call_args[0][0] + assert "CREATE OR REPLACE VIEW test_table" in sql_call + assert str(parquet_file) in sql_call + + @patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None) + def test_create_duckdb_table_from_files_multiple_files( + self, mock_datacard_init, tmpdir + ): + """Test creating DuckDB table from multiple parquet files.""" + # Create mock parquet files + file1 = tmpdir.join("test1.parquet") + file1.write("dummy_content1") + file2 = tmpdir.join("test2.parquet") + file2.write("dummy_content2") + + files = [str(file1), str(file2)] + + # Use a separate cache manager with mock connection for this test + mock_conn = Mock() + test_cache_manager = HfCacheManager("test/repo", mock_conn) + + # Mock the validation method since we're testing table creation + test_cache_manager._validate_source_sample_fields = Mock() # type: ignore + + test_cache_manager._create_duckdb_table_from_files( + files, "test_table", "test_config" + ) + + mock_conn.execute.assert_called_once() + sql_call = mock_conn.execute.call_args[0][0] + assert "CREATE OR REPLACE VIEW test_table" in sql_call + assert str(file1) in sql_call + assert str(file2) in sql_call + + +class TestHfCacheManagerCacheManagement: + """Test cache management functionality.""" + + def setup_method(self): + """Set up test fixtures.""" + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + self.conn = duckdb.connect(":memory:") + self.repo_id = "test/repo" + self.cache_manager = HfCacheManager(self.repo_id, self.conn) + + def test_parse_size_string(self): + """Test size string parsing.""" + assert self.cache_manager._parse_size_string("10KB") == 10 * 1024 + assert self.cache_manager._parse_size_string("5MB") == 5 * 1024**2 + assert self.cache_manager._parse_size_string("2GB") == 2 * 1024**3 + assert self.cache_manager._parse_size_string("1TB") == 1 * 1024**4 + assert self.cache_manager._parse_size_string("500") == 500 + assert self.cache_manager._parse_size_string("10.5GB") == int(10.5 * 1024**3) + + def test_format_bytes(self): + """Test byte formatting.""" + assert self.cache_manager._format_bytes(0) == "0B" + assert self.cache_manager._format_bytes(1023) == "1023.0B" + assert self.cache_manager._format_bytes(1024) == "1.0KB" + assert self.cache_manager._format_bytes(1024**2) == "1.0MB" + assert self.cache_manager._format_bytes(1024**3) == "1.0GB" + assert self.cache_manager._format_bytes(1024**4) == "1.0TB" + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_clean_cache_by_age(self, mock_scan_cache_dir): + """Test age-based cache cleaning.""" + # Setup mock cache info + mock_cache_info = Mock() + mock_revision = Mock() + mock_revision.commit_hash = "abc123" + mock_revision.last_modified = (datetime.now() - timedelta(days=35)).timestamp() + + mock_repo = Mock() + mock_repo.revisions = [mock_revision] + + mock_cache_info.repos = [mock_repo] + mock_delete_strategy = Mock() + mock_delete_strategy.expected_freed_size_str = "100MB" + 
mock_cache_info.delete_revisions.return_value = mock_delete_strategy + + mock_scan_cache_dir.return_value = mock_cache_info + + result = self.cache_manager.clean_cache_by_age(max_age_days=30, dry_run=True) + + assert result == mock_delete_strategy + mock_cache_info.delete_revisions.assert_called_once_with("abc123") + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_clean_cache_by_age_no_old_revisions(self, mock_scan_cache_dir): + """Test age-based cleaning when no old revisions exist.""" + mock_cache_info = Mock() + mock_revision = Mock() + mock_revision.commit_hash = "abc123" + mock_revision.last_modified = datetime.now().timestamp() # Recent + + mock_repo = Mock() + mock_repo.revisions = [mock_revision] + + mock_cache_info.repos = [mock_repo] + mock_delete_strategy = Mock() + mock_delete_strategy.expected_freed_size_str = "0B" + mock_cache_info.delete_revisions.return_value = mock_delete_strategy + + mock_scan_cache_dir.return_value = mock_cache_info + + result = self.cache_manager.clean_cache_by_age(max_age_days=30, dry_run=True) + + # Should still return a strategy, but with empty revisions + assert result == mock_delete_strategy + mock_cache_info.delete_revisions.assert_called_once_with() + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_clean_cache_by_size(self, mock_scan_cache_dir): + """Test size-based cache cleaning.""" + # Setup mock cache info + mock_cache_info = Mock() + mock_cache_info.size_on_disk = 5 * 1024**3 # 5GB + mock_cache_info.size_on_disk_str = "5.0GB" + + mock_revision = Mock() + mock_revision.commit_hash = "abc123" + mock_revision.last_modified = datetime.now().timestamp() + mock_revision.size_on_disk = 2 * 1024**3 # 2GB + + mock_repo = Mock() + mock_repo.revisions = [mock_revision] + + mock_cache_info.repos = [mock_repo] + mock_delete_strategy = Mock() + mock_delete_strategy.expected_freed_size_str = "2GB" + mock_cache_info.delete_revisions.return_value = mock_delete_strategy + + mock_scan_cache_dir.return_value = mock_cache_info + + result = self.cache_manager.clean_cache_by_size( + target_size="3GB", strategy="oldest_first", dry_run=True + ) + + assert result == mock_delete_strategy + mock_cache_info.delete_revisions.assert_called_once() + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_clean_cache_by_size_already_under_target(self, mock_scan_cache_dir): + """Test size-based cleaning when already under target.""" + mock_cache_info = Mock() + mock_cache_info.size_on_disk = 1 * 1024**3 # 1GB + mock_cache_info.size_on_disk_str = "1.0GB" + mock_cache_info.repos = [] + + mock_delete_strategy = Mock() + mock_delete_strategy.expected_freed_size_str = "0B" + mock_cache_info.delete_revisions.return_value = mock_delete_strategy + + mock_scan_cache_dir.return_value = mock_cache_info + + result = self.cache_manager.clean_cache_by_size( + target_size="2GB", strategy="oldest_first", dry_run=True + ) + + assert result == mock_delete_strategy + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_clean_unused_revisions(self, mock_scan_cache_dir): + """Test cleaning unused revisions.""" + # Setup mock with multiple revisions + mock_cache_info = Mock() + + mock_revision1 = Mock() + mock_revision1.commit_hash = "abc123" + mock_revision1.last_modified = (datetime.now() - timedelta(days=1)).timestamp() + + mock_revision2 = Mock() + mock_revision2.commit_hash = "def456" + mock_revision2.last_modified = (datetime.now() - timedelta(days=10)).timestamp() + + mock_revision3 = Mock() + mock_revision3.commit_hash = "ghi789" + 
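+        # Revisions are 1, 10, and 20 days old; with keep_latest=2 only the
+        # oldest ("ghi789", dated just below) should be scheduled for deletion.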
mock_revision3.last_modified = (datetime.now() - timedelta(days=20)).timestamp() + + mock_repo = Mock() + mock_repo.revisions = [mock_revision1, mock_revision2, mock_revision3] + + mock_cache_info.repos = [mock_repo] + mock_delete_strategy = Mock() + mock_delete_strategy.expected_freed_size_str = "1GB" + mock_cache_info.delete_revisions.return_value = mock_delete_strategy + + mock_scan_cache_dir.return_value = mock_cache_info + + result = self.cache_manager.clean_unused_revisions(keep_latest=2, dry_run=True) + + assert result == mock_delete_strategy + # Should delete oldest revision (ghi789) + mock_cache_info.delete_revisions.assert_called_once_with("ghi789") + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_auto_clean_cache(self, mock_scan_cache_dir): + """Test automated cache cleaning.""" + mock_cache_info = Mock() + mock_cache_info.size_on_disk = 10 * 1024**3 # 10GB + mock_cache_info.repos = [] + + mock_delete_strategy = Mock() + mock_delete_strategy.expected_freed_size = 1 * 1024**3 # 1GB + mock_delete_strategy.expected_freed_size_str = "1GB" + + mock_scan_cache_dir.return_value = mock_cache_info + + with patch.object( + self.cache_manager, "clean_cache_by_age", return_value=mock_delete_strategy + ): + with patch.object( + self.cache_manager, + "clean_unused_revisions", + return_value=mock_delete_strategy, + ): + with patch.object( + self.cache_manager, + "clean_cache_by_size", + return_value=mock_delete_strategy, + ): + result = self.cache_manager.auto_clean_cache( + max_age_days=30, + max_total_size="5GB", + keep_latest_per_repo=2, + dry_run=True, + ) + + assert ( + len(result) == 3 + ) # All three cleanup strategies should be executed + assert all(strategy == mock_delete_strategy for strategy in result) + + +class TestHfCacheManagerErrorHandling: + """Test error handling and edge cases.""" + + def setup_method(self): + """Set up test fixtures.""" + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + self.conn = duckdb.connect(":memory:") + self.repo_id = "test/repo" + self.cache_manager = HfCacheManager(self.repo_id, self.conn) + + def test_parse_size_string_invalid_input(self): + """Test error handling for invalid size strings.""" + with pytest.raises(ValueError): + self.cache_manager._parse_size_string("invalid") + + @patch("tfbpapi.hf_cache_manager.scan_cache_dir") + def test_clean_cache_invalid_strategy(self, mock_scan_cache_dir): + """Test error handling for invalid cleanup strategy.""" + mock_cache_info = Mock() + mock_cache_info.size_on_disk = 5 * 1024**3 + mock_cache_info.repos = [] + mock_scan_cache_dir.return_value = mock_cache_info + + with pytest.raises(ValueError, match="Unknown strategy"): + self.cache_manager.clean_cache_by_size( + target_size="1GB", + strategy="invalid_strategy", # type: ignore[arg-type] + dry_run=True, + ) + + +class TestHfCacheManagerIntegration: + """Integration tests with real DuckDB operations.""" + + def setup_method(self): + """Set up test fixtures.""" + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + self.conn = duckdb.connect(":memory:") + self.repo_id = "test/repo" + self.cache_manager = HfCacheManager(self.repo_id, self.conn) + + def test_metadata_workflow_integration(self, tmpdir): + """Test complete metadata workflow with real files.""" + # Create temporary parquet file content + metadata_file = tmpdir.join("metadata.parquet") + metadata_file.write("dummy_parquet_content") + + # Test the core table creation functionality + mock_conn = Mock() + test_cache_manager = 
HfCacheManager("test/repo", mock_conn) + + # Mock the validation method since we're testing table creation + test_cache_manager._validate_source_sample_fields = Mock() # type: ignore + + # Test _create_duckdb_table_from_files directly + test_cache_manager._create_duckdb_table_from_files( + [str(metadata_file)], "metadata_test_metadata", "test_metadata" + ) + + # Verify the SQL was generated correctly + mock_conn.execute.assert_called_once() + sql_call = mock_conn.execute.call_args[0][0] + assert "CREATE OR REPLACE VIEW metadata_test_metadata" in sql_call + assert str(metadata_file) in sql_call + + def test_embedded_metadata_workflow_integration(self): + """Test complete embedded metadata workflow with real DuckDB operations.""" + # Create real test data in DuckDB + self.conn.execute( + """ + CREATE TABLE test_data AS + SELECT + 'gene_' || (row_number() OVER()) as gene_id, + CASE + WHEN (row_number() OVER()) % 3 = 0 THEN 'treatment_A' + WHEN (row_number() OVER()) % 3 = 1 THEN 'treatment_B' + ELSE 'control' + END as experimental_condition, + random() * 1000 as expression_value + FROM range(30) + """ + ) + + # Extract embedded metadata + result = self.cache_manager._extract_embedded_metadata_field( + "test_data", "experimental_condition", "metadata_test_condition" + ) + + assert result is True + + # Verify the metadata table was created correctly + metadata_results = self.conn.execute( + "SELECT value, count FROM metadata_test_condition ORDER BY count DESC" + ).fetchall() + + assert len(metadata_results) == 3 # Three unique conditions + + # Check that the counts make sense (should be 10 each for 30 total rows) + total_count = sum(row[1] for row in metadata_results) + assert total_count == 30 + + # Check that conditions are as expected + conditions = {row[0] for row in metadata_results} + assert conditions == {"treatment_A", "treatment_B", "control"} + + def test_table_existence_checking_integration(self): + """Test table existence checking with real DuckDB operations.""" + # Test non-existent table + assert ( + self.cache_manager._check_metadata_exists_in_duckdb("nonexistent_table") + is False + ) + + # Create a real table + self.conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)") + + # Test existing table + assert self.cache_manager._check_metadata_exists_in_duckdb("test_table") is True + + # Test with view + self.conn.execute("CREATE VIEW test_view AS SELECT * FROM test_table") + assert self.cache_manager._check_metadata_exists_in_duckdb("test_view") is True + + +# Fixtures for common test data +@pytest.fixture +def sample_metadata_config(): + """Sample metadata configuration for testing.""" + return Mock( + config_name="test_metadata", + description="Test metadata configuration", + data_files=[Mock(path="metadata.parquet")], + applies_to=["data_config"], + ) + + +@pytest.fixture +def sample_data_config(): + """Sample data configuration for testing.""" + return Mock( + config_name="test_data", + metadata_fields=["condition", "replicate"], + dataset_type=DatasetType.ANNOTATED_FEATURES, + ) + + +@pytest.fixture +def mock_cache_revision(): + """Mock cache revision for testing.""" + revision = Mock() + revision.commit_hash = "abc123def456" + revision.last_modified = datetime.now().timestamp() + revision.size_on_disk = 1024 * 1024 * 100 # 100MB + return revision + + +@pytest.fixture +def mock_cache_repo(mock_cache_revision): + """Mock cache repository for testing.""" + repo = Mock() + repo.repo_id = "test/repository" + repo.revisions = [mock_cache_revision] + repo.size_on_disk = 1024 
* 1024 * 100 # 100MB + repo.size_on_disk_str = "100.0MB" + return repo + + +@pytest.fixture +def mock_cache_info(mock_cache_repo): + """Mock cache info for testing.""" + cache_info = Mock() + cache_info.cache_dir = "/tmp/cache" + cache_info.repos = [mock_cache_repo] + cache_info.size_on_disk = 1024 * 1024 * 100 # 100MB + cache_info.size_on_disk_str = "100.0MB" + + # Mock delete_revisions method + def mock_delete_revisions(*revision_hashes): + strategy = Mock() + strategy.expected_freed_size = ( + len(revision_hashes) * 1024 * 1024 * 50 + ) # 50MB per revision + strategy.expected_freed_size_str = f"{len(revision_hashes) * 50}.0MB" + strategy.delete_content = list(revision_hashes) + strategy.execute = Mock() + return strategy + + cache_info.delete_revisions = mock_delete_revisions + return cache_info + + +class TestSourceSampleValidation: + """Test validation of source_sample field format.""" + + def setup_method(self): + """Set up test fixtures.""" + self.conn = duckdb.connect(":memory:") + self.repo_id = "test/repo" + + def test_valid_source_sample_format(self, tmpdir): + """Test that valid source_sample format passes validation.""" + # Create parquet file with valid composite identifiers + parquet_file = tmpdir.join("valid_data.parquet") + self.conn.execute( + f""" + COPY ( + SELECT + 'BrentLab/harbison_2004;harbison_2004;CBF1_YPD' + as binding_sample_ref, + 'gene_' || (row_number() OVER()) as target_locus_tag, + random() * 100 as binding_score + FROM range(5) + ) TO '{parquet_file}' (FORMAT PARQUET) + """ + ) + + # Create mock datacard with source_sample field + mock_feature = Mock() + mock_feature.name = "binding_sample_ref" + mock_feature.role = "source_sample" + + mock_dataset_info = Mock() + mock_dataset_info.features = [mock_feature] + + mock_config = Mock() + mock_config.config_name = "test_config" + mock_config.dataset_info = mock_dataset_info + + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + cache_manager = HfCacheManager(self.repo_id, self.conn) + cache_manager.get_config = Mock(return_value=mock_config) # type: ignore + + # Should not raise any error + cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) + + def test_invalid_source_sample_two_parts(self, tmpdir): + """Test that source_sample with only 2 parts raises ValueError.""" + # Create parquet file with invalid format (only 2 parts) + parquet_file = tmpdir.join("invalid_data.parquet") + self.conn.execute( + f""" + COPY ( + SELECT + 'BrentLab/harbison_2004;CBF1_YPD' as binding_sample_ref, + 'gene_' || (row_number() OVER()) as target_locus_tag, + random() * 100 as binding_score + FROM range(5) + ) TO '{parquet_file}' (FORMAT PARQUET) + """ + ) + + # Create mock datacard with source_sample field + mock_feature = Mock() + mock_feature.name = "binding_sample_ref" + mock_feature.role = "source_sample" + + mock_dataset_info = Mock() + mock_dataset_info.features = [mock_feature] + + mock_config = Mock() + mock_config.config_name = "test_config" + mock_config.dataset_info = mock_dataset_info + + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + cache_manager = HfCacheManager(self.repo_id, self.conn) + cache_manager.get_config = Mock(return_value=mock_config) # type: ignore + + # Should raise ValueError with clear message + with pytest.raises(ValueError) as exc_info: + cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) + + error_msg = str(exc_info.value) + assert "Invalid 
format in field 'binding_sample_ref'" in error_msg + assert "role='source_sample'" in error_msg + assert "3 semicolon-separated parts" in error_msg + assert "BrentLab/harbison_2004;CBF1_YPD" in error_msg + + def test_invalid_source_sample_one_part(self, tmpdir): + """Test that source_sample with only 1 part raises ValueError.""" + # Create parquet file with invalid format (only 1 part) + parquet_file = tmpdir.join("invalid_data.parquet") + self.conn.execute( + f""" + COPY ( + SELECT + 'CBF1_YPD' as binding_sample_ref, + 'gene_' || (row_number() OVER()) as target_locus_tag, + random() * 100 as binding_score + FROM range(5) + ) TO '{parquet_file}' (FORMAT PARQUET) + """ + ) + + # Create mock datacard with source_sample field + mock_feature = Mock() + mock_feature.name = "binding_sample_ref" + mock_feature.role = "source_sample" + + mock_dataset_info = Mock() + mock_dataset_info.features = [mock_feature] + + mock_config = Mock() + mock_config.config_name = "test_config" + mock_config.dataset_info = mock_dataset_info + + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + cache_manager = HfCacheManager(self.repo_id, self.conn) + cache_manager.get_config = Mock(return_value=mock_config) # type: ignore + + # Should raise ValueError + with pytest.raises(ValueError) as exc_info: + cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) + + error_msg = str(exc_info.value) + assert "Invalid format in field 'binding_sample_ref'" in error_msg + assert "CBF1_YPD" in error_msg + + def test_invalid_source_sample_four_parts(self, tmpdir): + """Test that source_sample with 4 parts raises ValueError.""" + # Create parquet file with invalid format (4 parts) + parquet_file = tmpdir.join("invalid_data.parquet") + self.conn.execute( + f""" + COPY ( + SELECT + 'a;b;c;d' as binding_sample_ref, + 'gene_' || (row_number() OVER()) as target_locus_tag, + random() * 100 as binding_score + FROM range(5) + ) TO '{parquet_file}' (FORMAT PARQUET) + """ + ) + + # Create mock datacard with source_sample field + mock_feature = Mock() + mock_feature.name = "binding_sample_ref" + mock_feature.role = "source_sample" + + mock_dataset_info = Mock() + mock_dataset_info.features = [mock_feature] + + mock_config = Mock() + mock_config.config_name = "test_config" + mock_config.dataset_info = mock_dataset_info + + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + cache_manager = HfCacheManager(self.repo_id, self.conn) + cache_manager.get_config = Mock(return_value=mock_config) # type: ignore + + # Should raise ValueError + with pytest.raises(ValueError) as exc_info: + cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) + + error_msg = str(exc_info.value) + assert "Invalid format in field 'binding_sample_ref'" in error_msg + assert "a;b;c;d" in error_msg + + def test_no_source_sample_fields(self, tmpdir): + """Test that validation is skipped when no source_sample fields exist.""" + # Create parquet file with normal data + parquet_file = tmpdir.join("normal_data.parquet") + self.conn.execute( + f""" + COPY ( + SELECT + 'gene_' || (row_number() OVER()) as target_locus_tag, + random() * 100 as expression_value + FROM range(5) + ) TO '{parquet_file}' (FORMAT PARQUET) + """ + ) + + # Create mock datacard without source_sample fields + mock_feature = Mock() + mock_feature.name = "target_locus_tag" + mock_feature.role = "target_identifier" + + mock_dataset_info = Mock() + 
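+        # Only one semicolon-separated part; the valid examples in this suite
+        # use three (repo id, dataset name, sample id), so "CBF1_YPD" alone
+        # should be rejected.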
mock_dataset_info.features = [mock_feature] + + mock_config = Mock() + mock_config.config_name = "test_config" + mock_config.dataset_info = mock_dataset_info + + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + cache_manager = HfCacheManager(self.repo_id, self.conn) + cache_manager.get_config = Mock(return_value=mock_config) # type: ignore + + # Should not raise any error + cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) + + def test_multiple_source_sample_fields(self, tmpdir): + """Test validation with multiple source_sample fields.""" + # Create parquet file with multiple composite identifier fields + parquet_file = tmpdir.join("multi_ref_data.parquet") + self.conn.execute( + f""" + COPY ( + SELECT + 'BrentLab/harbison_2004;harbison_2004;CBF1_YPD' + as binding_sample_ref, + 'BrentLab/kemmeren_2014;kemmeren_2014;sample_42' + as expression_sample_ref, + 'gene_' || (row_number() OVER()) as target_locus_tag + FROM range(5) + ) TO '{parquet_file}' (FORMAT PARQUET) + """ + ) + + # Create mock datacard with multiple source_sample fields + mock_feature1 = Mock() + mock_feature1.name = "binding_sample_ref" + mock_feature1.role = "source_sample" + + mock_feature2 = Mock() + mock_feature2.name = "expression_sample_ref" + mock_feature2.role = "source_sample" + + mock_dataset_info = Mock() + mock_dataset_info.features = [mock_feature1, mock_feature2] + + mock_config = Mock() + mock_config.config_name = "test_config" + mock_config.dataset_info = mock_dataset_info + + with patch("tfbpapi.hf_cache_manager.DataCard.__init__", return_value=None): + cache_manager = HfCacheManager(self.repo_id, self.conn) + cache_manager.get_config = Mock(return_value=mock_config) # type: ignore + + # Both fields are valid - should not raise + cache_manager._create_duckdb_table_from_files( + [str(parquet_file)], "test_table", "test_config" + ) diff --git a/tfbpapi/tests/test_metadata_config_models.py b/tfbpapi/tests/test_metadata_config_models.py new file mode 100644 index 0000000..1697930 --- /dev/null +++ b/tfbpapi/tests/test_metadata_config_models.py @@ -0,0 +1,514 @@ +""" +Tests for metadata configuration Pydantic models. + +Tests validation, error messages, and config loading for MetadataBuilder. 
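+
+An illustrative (non-normative) sketch of the YAML shape these tests
+exercise; each name below appears in a test case in this module:
+
+    factor_aliases:
+      carbon_source:
+        glucose: ["D-glucose", "dextrose"]
+    repositories:
+      BrentLab/test:
+        temperature: {path: temperature_celsius}  # repo-wide mapping
+        dataset:
+          test_dataset:
+            carbon_source: {field: condition, path: media.carbon_source}
+            dto_fdr: {expression: "dto_fdr < 0.05"}  # derived field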
+ +""" + +import pytest +import yaml # type: ignore +from pydantic import ValidationError + +from tfbpapi.models import ( + MetadataConfig, + PropertyMapping, + RepositoryConfig, +) + + +class TestPropertyMapping: + """Tests for PropertyMapping model.""" + + def test_valid_field_level_mapping(self): + """Test valid field-level property mapping.""" + mapping = PropertyMapping(field="condition", path="media.carbon_source") + assert mapping.field == "condition" + assert mapping.path == "media.carbon_source" + + def test_valid_repo_level_mapping(self): + """Test valid repo-level property mapping (no field).""" + mapping = PropertyMapping(path="temperature_celsius") + assert mapping.field is None + assert mapping.path == "temperature_celsius" + + def test_invalid_empty_path(self): + """Test that empty path is rejected.""" + with pytest.raises(ValidationError) as exc_info: + PropertyMapping(path="") + assert "path cannot be empty" in str(exc_info.value) + + def test_invalid_whitespace_path(self): + """Test that whitespace-only path is rejected.""" + with pytest.raises(ValidationError) as exc_info: + PropertyMapping(path=" ") + assert "path cannot be empty" in str(exc_info.value) + + def test_invalid_empty_field(self): + """Test that empty field string is rejected.""" + with pytest.raises(ValidationError) as exc_info: + PropertyMapping(field="", path="media.carbon_source") + assert "field cannot be empty" in str(exc_info.value) + + def test_path_whitespace_stripped(self): + """Test that path whitespace is stripped.""" + mapping = PropertyMapping(path=" media.carbon_source ") + assert mapping.path == "media.carbon_source" + + def test_valid_field_only_mapping(self): + """Test valid field-only mapping (column alias).""" + mapping = PropertyMapping(field="condition") + assert mapping.field == "condition" + assert mapping.path is None + + def test_invalid_neither_field_nor_path(self): + """Test that at least one of field, path, or expression is required.""" + with pytest.raises(ValidationError) as exc_info: + PropertyMapping() + assert ( + "At least one of 'field', 'path', or 'expression' must be specified" + in str(exc_info.value) + ) + + def test_valid_expression_only(self): + """Test valid expression-only mapping (derived field).""" + mapping = PropertyMapping(expression="dto_fdr < 0.05") + assert mapping.expression == "dto_fdr < 0.05" + assert mapping.field is None + assert mapping.path is None + + def test_invalid_expression_with_field(self): + """Test that expression cannot be combined with field.""" + with pytest.raises(ValidationError) as exc_info: + PropertyMapping(expression="dto_fdr < 0.05", field="sample_id") + assert "expression cannot be used with field or path" in str(exc_info.value) + + def test_invalid_expression_with_path(self): + """Test that expression cannot be combined with path.""" + with pytest.raises(ValidationError) as exc_info: + PropertyMapping(expression="dto_fdr < 0.05", path="media.carbon_source") + assert "expression cannot be used with field or path" in str(exc_info.value) + + +class TestComparativeAnalysis: + """Tests for ComparativeAnalysis model.""" + + def test_valid_comparative_analysis(self): + """Test valid comparative analysis configuration.""" + from tfbpapi.models import ComparativeAnalysis + + ca = ComparativeAnalysis( + repo="BrentLab/yeast_comparative_analysis", + dataset="dto", + via_field="binding_id", + ) + assert ca.repo == "BrentLab/yeast_comparative_analysis" + assert ca.dataset == "dto" + assert ca.via_field == "binding_id" + + +class 
TestDatasetVirtualDBConfig: + """Tests for DatasetVirtualDBConfig model.""" + + def test_valid_config_with_sample_id(self): + """Test valid dataset config with sample_id.""" + from tfbpapi.models import DatasetVirtualDBConfig, PropertyMapping + + config = DatasetVirtualDBConfig(sample_id=PropertyMapping(field="sample_id")) + assert config.sample_id is not None + assert config.sample_id.field == "sample_id" + + def test_valid_config_with_comparative_analyses(self): + """Test valid dataset config with comparative analyses.""" + from tfbpapi.models import DatasetVirtualDBConfig + + config_dict = { + "sample_id": {"field": "sample_id"}, + "comparative_analyses": [ + { + "repo": "BrentLab/yeast_comparative_analysis", + "dataset": "dto", + "via_field": "binding_id", + } + ], + } + config = DatasetVirtualDBConfig.model_validate(config_dict) + assert config.sample_id is not None + assert len(config.comparative_analyses) == 1 + assert ( + config.comparative_analyses[0].repo == "BrentLab/yeast_comparative_analysis" + ) + + def test_config_with_extra_property_mappings(self): + """Test that extra fields are parsed as PropertyMappings.""" + from tfbpapi.models import DatasetVirtualDBConfig + + config_dict = { + "sample_id": {"field": "sample_id"}, + "regulator_locus_tag": {"field": "regulator_locus_tag"}, + "dto_fdr": {"expression": "dto_fdr < 0.05"}, + } + config = DatasetVirtualDBConfig.model_validate(config_dict) + + # Access extra fields via model_extra + assert "regulator_locus_tag" in config.model_extra + assert "dto_fdr" in config.model_extra + + +class TestRepositoryConfig: + """Tests for RepositoryConfig model.""" + + def test_valid_repo_config_with_datasets(self): + """Test valid repository config with dataset section.""" + config_data = { + "temperature_celsius": {"path": "temperature_celsius"}, + "dataset": { + "dataset1": { + "carbon_source": { + "field": "condition", + "path": "media.carbon_source", + } + } + }, + } + config = RepositoryConfig.model_validate(config_data) + assert config.dataset is not None + assert "dataset1" in config.dataset + + def test_valid_repo_config_no_datasets(self): + """Test valid repository config without dataset section.""" + config_data = {"temperature_celsius": {"path": "temperature_celsius"}} + config = RepositoryConfig.model_validate(config_data) + assert config.dataset is None + + def test_invalid_dataset_not_dict(self): + """Test that dataset section must be a dict.""" + config_data = {"dataset": "not a dict"} + with pytest.raises(ValidationError) as exc_info: + RepositoryConfig.model_validate(config_data) + assert "'dataset' key must contain a dict" in str(exc_info.value) + + def test_valid_field_only_property(self): + """Test that field-only properties are valid (column aliases).""" + config_data = { + "dataset": {"dataset1": {"carbon_source": {"field": "condition"}}} + } + config = RepositoryConfig.model_validate(config_data) + assert config.dataset is not None + assert "dataset1" in config.dataset + # Access extra field via model_extra + dataset_config = config.dataset["dataset1"] + assert "carbon_source" in dataset_config.model_extra + assert dataset_config.model_extra["carbon_source"].field == "condition" + assert dataset_config.model_extra["carbon_source"].path is None + + def test_valid_repo_wide_field_only_property(self): + """Test that repo-wide field-only properties are valid.""" + config_data = {"environmental_condition": {"field": "condition"}} + config = RepositoryConfig.model_validate(config_data) + assert "environmental_condition" in 
config.properties + assert config.properties["environmental_condition"].field == "condition" + assert config.properties["environmental_condition"].path is None + + +class TestMetadataConfig: + """Tests for MetadataConfig model.""" + + def test_valid_config_with_aliases(self, tmp_path): + """Test valid config with factor aliases.""" + config_data = { + "factor_aliases": { + "carbon_source": { + "glucose": ["D-glucose", "dextrose"], + "galactose": ["D-galactose", "Galactose"], + } + }, + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": {"path": "media.carbon_source"}} + } + } + }, + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + assert "carbon_source" in config.factor_aliases + assert "glucose" in config.factor_aliases["carbon_source"] + assert config.factor_aliases["carbon_source"]["glucose"] == [ + "D-glucose", + "dextrose", + ] + + def test_valid_config_without_aliases(self, tmp_path): + """Test that factor_aliases is optional.""" + config_data = { + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": {"path": "media.carbon_source"}} + } + } + } + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + assert config.factor_aliases == {} + + def test_valid_config_empty_aliases(self, tmp_path): + """Test that empty factor_aliases dict is allowed.""" + config_data = { + "factor_aliases": {}, + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": {"path": "media.carbon_source"}} + } + } + }, + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + assert config.factor_aliases == {} + + def test_invalid_alias_not_dict(self): + """Test that property aliases must be a dict.""" + config_data = { + "factor_aliases": { + "carbon_source": ["D-glucose"] # Should be dict, not list + }, + "repositories": { + "BrentLab/test": {"dataset": {"test": {"prop": {"path": "path"}}}} + }, + } + + with pytest.raises(ValidationError) as exc_info: + MetadataConfig.model_validate(config_data) + # Pydantic catches this with type validation before our custom validator + assert "valid dictionary" in str(exc_info.value) or "must be a dict" in str( + exc_info.value + ) + + def test_invalid_alias_value_not_list(self): + """Test that alias values must be lists.""" + config_data = { + "factor_aliases": { + "carbon_source": {"glucose": "D-glucose"} # Should be list, not string + }, + "repositories": { + "BrentLab/test": {"dataset": {"test": {"prop": {"path": "path"}}}} + }, + } + + with pytest.raises(ValidationError) as exc_info: + MetadataConfig.model_validate(config_data) + # Pydantic catches this with type validation before our custom validator + assert "valid list" in str(exc_info.value) or "must map to a list" in str( + exc_info.value + ) + + def test_invalid_alias_empty_list(self): + """Test that alias value lists cannot be empty.""" + config_data = { + "factor_aliases": {"carbon_source": {"glucose": []}}, + "repositories": { + "BrentLab/test": {"dataset": {"test": {"prop": {"path": "path"}}}} + }, + } + + with pytest.raises(ValidationError) as exc_info: + MetadataConfig.model_validate(config_data) + assert "cannot have empty value list" in str(exc_info.value) + + def 
test_aliases_allow_numeric_values(self): + """Test that aliases can map to numeric values.""" + config_data = { + "factor_aliases": { + "temperature_celsius": { + "thirty": [30, "30"], # Integer and string + "thirty_seven": [37, 37.0], # Integer and float + } + }, + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"temperature": {"path": "temperature_celsius"}} + } + } + }, + } + + config = MetadataConfig.model_validate(config_data) + assert config.factor_aliases["temperature_celsius"]["thirty"] == [30, "30"] + assert config.factor_aliases["temperature_celsius"]["thirty_seven"] == [ + 37, + 37.0, + ] + + def test_invalid_no_repositories(self): + """Test that at least one repository is required.""" + config_data = {"factor_aliases": {"carbon_source": {"glucose": ["D-glucose"]}}} + with pytest.raises(ValidationError) as exc_info: + MetadataConfig.model_validate(config_data) + assert "at least one repository" in str(exc_info.value) + + def test_get_repository_config(self, tmp_path): + """Test get_repository_config method.""" + config_data = { + "factor_aliases": {"carbon_source": {"glucose": ["D-glucose"]}}, + "repositories": { + "BrentLab/harbison_2004": { + "dataset": { + "harbison_2004": { + "carbon_source": { + "field": "condition", + "path": "media.carbon_source", + } + } + } + } + }, + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + repo_config = config.get_repository_config("BrentLab/harbison_2004") + assert repo_config is not None + assert isinstance(repo_config, RepositoryConfig) + assert repo_config.dataset is not None + assert "harbison_2004" in repo_config.dataset + + # Non-existent repo + assert config.get_repository_config("BrentLab/nonexistent") is None + + def test_get_property_mappings(self, tmp_path): + """Test get_property_mappings method.""" + config_data = { + "factor_aliases": { + "carbon_source": {"glucose": ["D-glucose"]}, + "temperature": {"thirty": [30]}, + }, + "repositories": { + "BrentLab/kemmeren_2014": { + "temperature": {"path": "temperature_celsius"}, # Repo-wide + "dataset": { + "kemmeren_2014": { + "carbon_source": {"path": "media.carbon_source"} + } + }, + } + }, + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + mappings = config.get_property_mappings( + "BrentLab/kemmeren_2014", "kemmeren_2014" + ) + + # Should have both repo-wide and dataset-specific + assert "temperature" in mappings + assert "carbon_source" in mappings + # Mappings are PropertyMapping objects, not dicts + assert isinstance(mappings["temperature"], PropertyMapping) + assert mappings["temperature"].path == "temperature_celsius" + assert mappings["carbon_source"].path == "media.carbon_source" + + def test_dataset_specific_overrides_repo_wide(self, tmp_path): + """Test that dataset-specific mappings override repo-wide.""" + config_data = { + "repositories": { + "BrentLab/test": { + "carbon_source": {"path": "repo.level.path"}, # Repo-wide + "dataset": { + "test_dataset": { + "carbon_source": {"path": "dataset.level.path"} # Override + } + }, + } + }, + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + mappings = config.get_property_mappings("BrentLab/test", "test_dataset") + + # Dataset-specific should win + assert 
mappings["carbon_source"].path == "dataset.level.path" + + def test_file_not_found(self): + """Test that FileNotFoundError is raised for missing file.""" + with pytest.raises(FileNotFoundError): + MetadataConfig.from_yaml("/nonexistent/path/config.yaml") + + def test_invalid_yaml_structure(self, tmp_path): + """Test that non-dict YAML is rejected.""" + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + f.write("- not\\n- a\\n- dict\\n") + + with pytest.raises(ValueError) as exc_info: + MetadataConfig.from_yaml(config_path) + assert "Configuration must be a YAML dict" in str(exc_info.value) + + def test_nested_alias_property_names(self, tmp_path): + """Test that alias property names can use dot notation.""" + config_data = { + "factor_aliases": { + "carbon_source": {"glucose": ["D-glucose"]}, + "carbon_source.concentration_percent": {"two_percent": [2]}, + "carbon_source.specifications": {"no_aa": ["without_amino_acids"]}, + }, + "repositories": { + "BrentLab/test": { + "dataset": { + "test": { + "carbon_source": { + "field": "condition", + "path": "media.carbon_source", + } + } + } + } + }, + } + + config_path = tmp_path / "config.yaml" + with open(config_path, "w") as f: + yaml.dump(config_data, f) + + config = MetadataConfig.from_yaml(config_path) + + # All alias properties should be preserved + assert "carbon_source" in config.factor_aliases + assert "carbon_source.concentration_percent" in config.factor_aliases + assert "carbon_source.specifications" in config.factor_aliases + + # Values should be correct + assert config.factor_aliases["carbon_source"]["glucose"] == ["D-glucose"] + assert config.factor_aliases["carbon_source.concentration_percent"][ + "two_percent" + ] == [2] + assert config.factor_aliases["carbon_source.specifications"]["no_aa"] == [ + "without_amino_acids" + ] diff --git a/tfbpapi/tests/test_metric_arrays.py b/tfbpapi/tests/test_metric_arrays.py deleted file mode 100644 index 45a8203..0000000 --- a/tfbpapi/tests/test_metric_arrays.py +++ /dev/null @@ -1,194 +0,0 @@ -import logging - -import numpy as np -import pandas as pd -import pytest - -from tfbpapi.metric_arrays import metric_arrays - - -def test_metric_arrays_expected_result(caplog): - res_dict = { - "metadata": pd.DataFrame( - { - "id": ["A", "B"], - "regulator_symbol": ["tf1", "tf2"], - } - ), - "data": { - "A": pd.DataFrame( - { - "target_symbol": ["gene1", "gene2"], - "metric1": [1.0, 2.0], - } - ), - "B": pd.DataFrame( - { - "target_symbol": ["gene2", "gene1"], - "metric1": [3.0, 4.0], - } - ), - }, - } - metrics_dict = {"metric1": np.mean} - - # Run function - with caplog.at_level(logging.WARNING): - output_dict = metric_arrays(res_dict, metrics_dict) - - # Check expected result for metric1 - # order based on the index of output_dict['metrics1'] since the ordering of - # the rows is random due to the set operation - expected_df = pd.DataFrame( - {"tf1": [1.0, 2.0], "tf2": [4.0, 3.0]}, - index=pd.Index(["gene1", "gene2"], name="target_symbol"), - ).reindex(output_dict["metric1"].index) - - pd.testing.assert_frame_equal(output_dict["metric1"], expected_df) - - # Check no warning since there are no incomplete rows or columns - assert "incomplete" not in caplog.text - - -def test_metric_arrays_missing_data(caplog): - res_dict = { - "metadata": pd.DataFrame( - { - "id": ["A", "B"], - "regulator_symbol": ["tf1", "tf2"], - } - ), - "data": { - "A": pd.DataFrame( - { - "target_symbol": ["gene1", "gene2"], - "metric1": [1.0, 2.0], - } - ), - "B": pd.DataFrame( - { - "target_symbol": 
["gene1", "gene3"], - "metric1": [5.0, 3.0], - } - ), - }, - } - metrics_dict = {"metric1": np.mean} - - # Run function with incomplete row dropping - with caplog.at_level(logging.WARNING): - output_dict1 = metric_arrays(res_dict, metrics_dict, drop_incomplete_rows=False) - - # Check result for metric1 with "gene2" dropped due to missing data in B - # sort based on output_dict['metric1'] index since - # the ordering of the rows is random - expected_df1 = pd.DataFrame( - {"tf1": [1.0, 2.0, np.nan], "tf2": [5.0, np.nan, 3.0]}, - index=pd.Index(["gene1", "gene2", "gene3"], name="target_symbol"), - ).reindex(output_dict1["metric1"].index) - - pd.testing.assert_frame_equal(output_dict1["metric1"], expected_df1) - - # Run function with incomplete row dropping - with caplog.at_level(logging.WARNING): - output_dict2 = metric_arrays(res_dict, metrics_dict, drop_incomplete_rows=True) - - # Check result for metric1 with "gene2" dropped due to missing data in B - expected_df2 = pd.DataFrame( - {"tf1": [1.0], "tf2": [5.0]}, - index=pd.Index(["gene1"], name="target_symbol"), - ).reindex(output_dict2["metric1"].index) - - pd.testing.assert_frame_equal(output_dict2["metric1"], expected_df2) - - # Check warning for incomplete rows - assert "2 rows and 0 columns with incomplete records were dropped" in caplog.text - - -def test_metric_arrays_missing_keys(): - res_dict = { - "metadata": pd.DataFrame( - {"id": ["A"], "target_symbol": ["gene1"], "regulator_symbol": ["tf1"]} - ), - # Missing data for id "A" - "data": {}, - } - metrics_dict = {"metric1": np.mean} - - # Expect a KeyError for missing data keys - with pytest.raises(KeyError, match="Data dictionary must have the same keys"): - metric_arrays(res_dict, metrics_dict) - - -def test_metric_arrays_non_dataframe_value(): - res_dict = { - "metadata": pd.DataFrame( - {"id": ["A"], "target_symbol": ["gene1"], "regulator_symbol": ["tf1"]} - ), - "data": {"A": [1, 2, 3]}, # Invalid non-DataFrame entry - } - metrics_dict = {"metric1": np.mean} - - # Expect ValueError when data dictionary values are not DataFrames - with pytest.raises( - ValueError, match="All values in the data dictionary must be DataFrames" - ): - metric_arrays(res_dict, metrics_dict) - - -def test_metric_arrays_duplicate_rows_without_dedup_func(): - res_dict = { - "metadata": pd.DataFrame( - { - "id": ["A"], - "target_symbol": ["gene1"], - "regulator_symbol": ["tf1"], - } - ), - "data": { - "A": pd.DataFrame( - { - "target_symbol": ["gene1", "gene1"], - "metric1": [1.0, 2.0], - } - ), - }, - } - metrics_dict = {"metric1": None} # No deduplication function provided - - # Expect a ValueError due to duplicate rows without deduplication function - # - with pytest.raises( - ValueError, match="Duplicate entries found for metric 'metric1'" - ): - metric_arrays(res_dict, metrics_dict) # type: ignore - - -def test_metric_arrays_deduplication_function(): - res_dict = { - "metadata": pd.DataFrame( - { - "id": ["A"], - "target_symbol": ["gene1"], - "regulator_symbol": ["tf1"], - } - ), - "data": { - "A": pd.DataFrame( - { - "target_symbol": ["gene1", "gene1"], - "metric1": [1.0, 2.0], - } - ), - }, - } - metrics_dict = {"metric1": np.mean} # Deduplication function to average duplicates - - # Run function with deduplication - output_dict = metric_arrays(res_dict, metrics_dict) - - # Check that duplicates were averaged correctly - expected_df = pd.DataFrame( - {"tf1": [1.5]}, pd.Index(["gene1"], name="target_symbol") - ) - pd.testing.assert_frame_equal(output_dict["metric1"], expected_df) diff --git 
a/tfbpapi/tests/test_models.py b/tfbpapi/tests/test_models.py new file mode 100644 index 0000000..1771c4d --- /dev/null +++ b/tfbpapi/tests/test_models.py @@ -0,0 +1,577 @@ +""" +Tests for datainfo Pydantic models. + +These tests validate the minimal, flexible models that parse HuggingFace dataset cards. + +""" + +import pytest +from pydantic import ValidationError + +from tfbpapi.models import ( + DataFileInfo, + DatasetCard, + DatasetConfig, + DatasetInfo, + DatasetType, + ExtractedMetadata, + FeatureInfo, + MetadataRelationship, + PartitioningInfo, +) + + +class TestDatasetType: + """Tests for DatasetType enum.""" + + def test_dataset_type_values(self): + """Test that all expected dataset types are defined.""" + assert DatasetType.GENOMIC_FEATURES == "genomic_features" + assert DatasetType.ANNOTATED_FEATURES == "annotated_features" + assert DatasetType.GENOME_MAP == "genome_map" + assert DatasetType.METADATA == "metadata" + assert DatasetType.COMPARATIVE == "comparative" + + def test_dataset_type_from_string(self): + """Test creating DatasetType from string.""" + dt = DatasetType("genomic_features") + assert dt == DatasetType.GENOMIC_FEATURES + + def test_invalid_dataset_type(self): + """Test that invalid dataset type raises error.""" + with pytest.raises(ValueError): + DatasetType("invalid_type") + + +class TestFeatureInfo: + """Tests for FeatureInfo model.""" + + def test_minimal_feature_info(self): + """Test creating FeatureInfo with minimal fields.""" + feature = FeatureInfo( + name="gene_id", dtype="string", description="Gene identifier" + ) + assert feature.name == "gene_id" + assert feature.dtype == "string" + assert feature.description == "Gene identifier" + assert feature.role is None + assert feature.definitions is None + + def test_feature_info_with_role(self): + """Test FeatureInfo with role field.""" + feature = FeatureInfo( + name="condition", + dtype="string", + description="Experimental condition", + role="experimental_condition", + ) + assert feature.role == "experimental_condition" + + def test_feature_info_with_definitions(self): + """Test FeatureInfo with definitions for experimental_condition.""" + feature = FeatureInfo( + name="condition", + dtype={"class_label": {"names": ["control", "treated"]}}, + description="Treatment condition", + role="experimental_condition", + definitions={ + "control": {"temperature_celsius": 30}, + "treated": {"temperature_celsius": 37}, + }, + ) + assert feature.definitions is not None + assert "control" in feature.definitions + assert feature.definitions["control"]["temperature_celsius"] == 30 + + def test_feature_info_with_dict_dtype(self): + """Test FeatureInfo with class_label dtype.""" + feature = FeatureInfo( + name="category", + dtype={"class_label": {"names": ["A", "B", "C"]}}, + description="Categorical field", + ) + assert isinstance(feature.dtype, dict) + assert "class_label" in feature.dtype + + +class TestPartitioningInfo: + """Tests for PartitioningInfo model.""" + + def test_default_partitioning_info(self): + """Test PartitioningInfo with defaults.""" + partitioning = PartitioningInfo() + assert partitioning.enabled is False + assert partitioning.partition_by is None + assert partitioning.path_template is None + + def test_enabled_partitioning_info(self): + """Test PartitioningInfo with partitioning enabled.""" + partitioning = PartitioningInfo( + enabled=True, + partition_by=["accession"], + path_template="data/accession={accession}/*.parquet", + ) + assert partitioning.enabled is True + assert partitioning.partition_by 
== ["accession"] + assert partitioning.path_template == "data/accession={accession}/*.parquet" + + +class TestDataFileInfo: + """Tests for DataFileInfo model.""" + + def test_default_data_file_info(self): + """Test DataFileInfo with default split.""" + data_file = DataFileInfo(path="data.parquet") + assert data_file.split == "train" + assert data_file.path == "data.parquet" + + def test_custom_data_file_info(self): + """Test DataFileInfo with custom split.""" + data_file = DataFileInfo(split="test", path="test_data.parquet") + assert data_file.split == "test" + assert data_file.path == "test_data.parquet" + + +class TestDatasetInfo: + """Tests for DatasetInfo model.""" + + def test_minimal_dataset_info(self): + """Test DatasetInfo with minimal features.""" + dataset_info = DatasetInfo( + features=[ + FeatureInfo( + name="gene_id", dtype="string", description="Gene identifier" + ) + ] + ) + assert len(dataset_info.features) == 1 + assert dataset_info.partitioning is None + + def test_dataset_info_with_partitioning(self): + """Test DatasetInfo with partitioning.""" + dataset_info = DatasetInfo( + features=[ + FeatureInfo(name="chr", dtype="string", description="Chromosome"), + FeatureInfo(name="pos", dtype="int32", description="Position"), + ], + partitioning=PartitioningInfo(enabled=True, partition_by=["chr"]), + ) + assert len(dataset_info.features) == 2 + assert dataset_info.partitioning.enabled is True # type: ignore + + +class TestDatasetConfig: + """Tests for DatasetConfig model.""" + + def test_minimal_dataset_config(self): + """Test DatasetConfig with minimal required fields.""" + config = DatasetConfig( + config_name="test_data", + description="Test dataset", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[FeatureInfo(name="id", dtype="string", description="ID")] + ), + ) + assert config.config_name == "test_data" + assert config.dataset_type == DatasetType.ANNOTATED_FEATURES + assert config.default is False + assert config.applies_to is None + assert config.metadata_fields is None + + def test_dataset_config_with_applies_to(self): + """Test DatasetConfig with applies_to for metadata.""" + config = DatasetConfig( + config_name="metadata", + description="Metadata", + dataset_type=DatasetType.METADATA, + applies_to=["data_config_1", "data_config_2"], + data_files=[DataFileInfo(path="metadata.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo( + name="sample_id", dtype="string", description="Sample ID" + ) + ] + ), + ) + assert config.applies_to == ["data_config_1", "data_config_2"] + + def test_dataset_config_applies_to_validation_error(self): + """Test that applies_to raises error for non-metadata configs.""" + with pytest.raises(ValidationError): + DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + applies_to=["other_config"], + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[FeatureInfo(name="id", dtype="string", description="ID")] + ), + ) + + def test_dataset_config_with_metadata_fields(self): + """Test DatasetConfig with metadata_fields.""" + config = DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + metadata_fields=["regulator_symbol", "condition"], + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo( + name="regulator_symbol", dtype="string", description="TF symbol" + 
), + FeatureInfo( + name="condition", dtype="string", description="Condition" + ), + ] + ), + ) + assert config.metadata_fields == ["regulator_symbol", "condition"] + + def test_dataset_config_empty_metadata_fields_error(self): + """Test that empty metadata_fields raises error.""" + with pytest.raises(ValidationError): + DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + metadata_fields=[], + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[FeatureInfo(name="id", dtype="string", description="ID")] + ), + ) + + def test_dataset_config_accepts_extra_fields(self): + """Test that DatasetConfig accepts extra fields like experimental_conditions.""" + config_data = { + "config_name": "data", + "description": "Data", + "dataset_type": "annotated_features", + "experimental_conditions": { + "temperature_celsius": 30, + "media": {"name": "YPD"}, + }, + "data_files": [{"path": "data.parquet"}], + "dataset_info": { + "features": [{"name": "id", "dtype": "string", "description": "ID"}] + }, + } + config = DatasetConfig(**config_data) + assert hasattr(config, "model_extra") + assert "experimental_conditions" in config.model_extra + + +class TestDatasetCard: + """Tests for DatasetCard model.""" + + def test_minimal_dataset_card(self): + """Test DatasetCard with minimal structure.""" + card = DatasetCard( + configs=[ + DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ) + ] + ) + assert len(card.configs) == 1 + + def test_dataset_card_accepts_extra_fields(self): + """Test that DatasetCard accepts extra top-level fields.""" + card_data = { + "license": "mit", + "pretty_name": "Test Dataset", + "tags": ["biology", "genomics"], + "experimental_conditions": {"strain_background": "BY4741"}, + "configs": [ + { + "config_name": "data", + "description": "Data", + "dataset_type": "annotated_features", + "data_files": [{"path": "data.parquet"}], + "dataset_info": { + "features": [ + {"name": "id", "dtype": "string", "description": "ID"} + ] + }, + } + ], + } + card = DatasetCard(**card_data) + assert hasattr(card, "model_extra") + assert "license" in card.model_extra + assert "experimental_conditions" in card.model_extra + + def test_empty_configs_error(self): + """Test that empty configs raises error.""" + with pytest.raises(ValidationError): + DatasetCard(configs=[]) + + def test_duplicate_config_names_error(self): + """Test that duplicate config names raises error.""" + with pytest.raises(ValidationError): + DatasetCard( + configs=[ + DatasetConfig( + config_name="data", + description="Data 1", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data1.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="data", + description="Data 2", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data2.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + + def test_multiple_default_configs_error(self): + """Test that multiple default configs raises error.""" + with pytest.raises(ValidationError): + DatasetCard( + configs=[ + DatasetConfig( + config_name="data1", + 
description="Data 1", + dataset_type=DatasetType.ANNOTATED_FEATURES, + default=True, + data_files=[DataFileInfo(path="data1.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="data2", + description="Data 2", + dataset_type=DatasetType.ANNOTATED_FEATURES, + default=True, + data_files=[DataFileInfo(path="data2.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + + def test_get_config_by_name(self): + """Test get_config_by_name method.""" + card = DatasetCard( + configs=[ + DatasetConfig( + config_name="data1", + description="Data 1", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data1.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="data2", + description="Data 2", + dataset_type=DatasetType.METADATA, + data_files=[DataFileInfo(path="data2.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + config = card.get_config_by_name("data1") + assert config is not None + assert config.config_name == "data1" + assert card.get_config_by_name("nonexistent") is None + + def test_get_configs_by_type(self): + """Test get_configs_by_type method.""" + card = DatasetCard( + configs=[ + DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="metadata", + description="Metadata", + dataset_type=DatasetType.METADATA, + data_files=[DataFileInfo(path="metadata.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + data_configs = card.get_configs_by_type(DatasetType.ANNOTATED_FEATURES) + assert len(data_configs) == 1 + assert data_configs[0].config_name == "data" + + def test_get_default_config(self): + """Test get_default_config method.""" + card = DatasetCard( + configs=[ + DatasetConfig( + config_name="data1", + description="Data 1", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data1.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="data2", + description="Data 2", + dataset_type=DatasetType.ANNOTATED_FEATURES, + default=True, + data_files=[DataFileInfo(path="data2.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + default = card.get_default_config() + assert default is not None + assert default.config_name == "data2" + + def test_get_data_configs(self): + """Test get_data_configs method.""" + card = DatasetCard( + configs=[ + DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="metadata", + description="Metadata", + dataset_type=DatasetType.METADATA, + data_files=[DataFileInfo(path="metadata.parquet")], + 
dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + data_configs = card.get_data_configs() + assert len(data_configs) == 1 + assert data_configs[0].dataset_type != DatasetType.METADATA + + def test_get_metadata_configs(self): + """Test get_metadata_configs method.""" + card = DatasetCard( + configs=[ + DatasetConfig( + config_name="data", + description="Data", + dataset_type=DatasetType.ANNOTATED_FEATURES, + data_files=[DataFileInfo(path="data.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + DatasetConfig( + config_name="metadata", + description="Metadata", + dataset_type=DatasetType.METADATA, + data_files=[DataFileInfo(path="metadata.parquet")], + dataset_info=DatasetInfo( + features=[ + FeatureInfo(name="id", dtype="string", description="ID") + ] + ), + ), + ] + ) + metadata_configs = card.get_metadata_configs() + assert len(metadata_configs) == 1 + assert metadata_configs[0].dataset_type == DatasetType.METADATA + + +class TestExtractedMetadata: + """Tests for ExtractedMetadata model.""" + + def test_extracted_metadata_creation(self): + """Test creating ExtractedMetadata.""" + metadata = ExtractedMetadata( + config_name="test_config", + field_name="regulator_symbol", + values={"CBF1", "GAL4", "GCN4"}, + extraction_method="distinct", + ) + assert metadata.config_name == "test_config" + assert metadata.field_name == "regulator_symbol" + assert len(metadata.values) == 3 + assert "CBF1" in metadata.values + + +class TestMetadataRelationship: + """Tests for MetadataRelationship model.""" + + def test_metadata_relationship_creation(self): + """Test creating MetadataRelationship.""" + relationship = MetadataRelationship( + data_config="binding_data", + metadata_config="experiment_metadata", + relationship_type="explicit", + ) + assert relationship.data_config == "binding_data" + assert relationship.metadata_config == "experiment_metadata" + assert relationship.relationship_type == "explicit" diff --git a/tfbpapi/tests/test_rank_transforms.py b/tfbpapi/tests/test_rank_transforms.py deleted file mode 100644 index 31dbeaa..0000000 --- a/tfbpapi/tests/test_rank_transforms.py +++ /dev/null @@ -1,80 +0,0 @@ -import numpy as np -from scipy.stats import rankdata - -from tfbpapi.rank_transforms import ( - shifted_negative_log_ranks, - transform, -) - - -def test_shifted_negative_log_ranks_basic(): - ranks = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) - expected_log_ranks = -1 * np.log10(ranks) + np.log10(np.max(ranks)) - - actual_log_ranks = shifted_negative_log_ranks(ranks) - np.testing.assert_array_almost_equal(actual_log_ranks, expected_log_ranks) - - -def test_shifted_negative_log_ranks_with_ties(): - ranks = np.array([1.0, 2.5, 2.5, 3.0, 4.0]) - expected_log_ranks = -1 * np.log10(ranks) + np.log10(np.max(ranks)) - - actual_log_ranks = shifted_negative_log_ranks(ranks) - np.testing.assert_array_almost_equal(actual_log_ranks, expected_log_ranks) - - -def test_negative_log_transform_basic(): - pvalues = np.array([0.01, 0.05, 0.01, 0.02, 0.05]) - enrichment = np.array([5.0, 3.0, 6.0, 4.0, 4.5]) - - # Expected ranks based on pvalue (primary) with enrichment (secondary) tie-breaking - expected_ranks = np.array([2.0, 5.0, 1.0, 3.0, 4.0]) - expected_log_ranks = -1 * np.log10(expected_ranks) + np.log10( - np.max(expected_ranks) - ) - - actual_log_ranks = transform(pvalues, enrichment) - np.testing.assert_array_almost_equal(actual_log_ranks, expected_log_ranks) - - -def 
test_all_ties_in_primary_column(): - pvalues = np.array([0.01, 0.01, 0.01, 0.01]) - enrichment = np.array([10.0, 20.0, 15.0, 5.0]) - - # With all pvalues tied, the ranking should depend solely - # on enrichment (higher is better) - expected_secondary_ranks = rankdata(-enrichment, method="average") - expected_log_ranks = -1 * np.log10(expected_secondary_ranks) + np.log10( - np.max(expected_secondary_ranks) - ) - - actual_log_ranks = transform(pvalues, enrichment) - np.testing.assert_array_almost_equal(actual_log_ranks, expected_log_ranks) - - -def test_no_ties_in_primary_column(): - pvalues = np.array([0.01, 0.02, 0.03, 0.04]) - enrichment = np.array([5.0, 10.0, 15.0, 20.0]) - - # With no ties in pvalue, the secondary column should have no effect - expected_ranks = rankdata(pvalues, method="average") - expected_log_ranks = -1 * np.log10(expected_ranks) + np.log10( - np.max(expected_ranks) - ) - - actual_log_ranks = transform(pvalues, enrichment) - np.testing.assert_array_almost_equal(actual_log_ranks, expected_log_ranks) - - -def test_tied_in_both_pvalue_and_enrichment(): - pvalues = np.array([0.01, 0.05, 0.01, 0.02, 0.05]) - enrichment = np.array([5.0, 3.0, 5.0, 4.0, 3.0]) - - # With ties in both primary and secondary columns - expected_ranks = np.array([1.5, 4.5, 1.5, 3.0, 4.5]) - expected_log_ranks = -1 * np.log10(expected_ranks) + np.log10( - np.max(expected_ranks) - ) - - actual_log_ranks = transform(pvalues, enrichment) - np.testing.assert_array_almost_equal(actual_log_ranks, expected_log_ranks) diff --git a/tfbpapi/tests/test_real_datacards.py b/tfbpapi/tests/test_real_datacards.py new file mode 100644 index 0000000..cd07626 --- /dev/null +++ b/tfbpapi/tests/test_real_datacards.py @@ -0,0 +1,706 @@ +""" +Test real datacards from the HuggingFace collection. + +This test suite validates that all real datacards from the BrentLab collection parse +correctly with the updated models.py and specification. 
+ +""" + +import warnings + +import pytest +import yaml + +from tfbpapi.models import DatasetCard + +# Real datacard YAML strings from the collection +BARKAI_COMPENDIUM = """ +license: mit +language: +- en +tags: +- transcription-factor +- binding +- chec-seq +- genomics +- biology +pretty_name: Barkai ChEC-seq Compendium +size_categories: + - 100M 0 + + # Verify config has required fields + config = card.configs[0] + assert config.config_name is not None + assert config.dataset_type is not None + assert config.dataset_info is not None + assert config.dataset_info.features is not None + assert len(config.dataset_info.features) > 0 + + +def test_harbison_2004_condition_definitions(): + """Test that harbison_2004 field-level definitions parse correctly.""" + data = yaml.safe_load(HARBISON_2004) + card = DatasetCard(**data) + + # Find the config + config = card.configs[0] + assert config.config_name == "harbison_2004" + + # Find condition feature + condition_feature = next( + f for f in config.dataset_info.features if f.name == "condition" + ) + + # Should have definitions + assert condition_feature.definitions is not None + assert "YPD" in condition_feature.definitions + assert "Acid" in condition_feature.definitions + assert "BUT14" in condition_feature.definitions + + # YPD definition should have environmental conditions + ypd_def = condition_feature.definitions["YPD"] + assert "environmental_conditions" in ypd_def + + # Acid definition should have target_pH in chemical_treatment + acid_def = condition_feature.definitions["Acid"] + assert "environmental_conditions" in acid_def + assert "chemical_treatment" in acid_def["environmental_conditions"] + assert "target_pH" in acid_def["environmental_conditions"]["chemical_treatment"] + + # BUT14 should have media additives + but14_def = condition_feature.definitions["BUT14"] + assert "environmental_conditions" in but14_def + assert "media" in but14_def["environmental_conditions"] + assert "additives" in but14_def["environmental_conditions"]["media"] + + +def test_hughes_2006_induction(): + """Test that hughes_2006 induction field parses correctly.""" + data = yaml.safe_load(HUGHES_2006) + card = DatasetCard(**data) + + # Check experimental conditions (stored as dict in model_extra) + assert card.configs[0].model_extra is not None + assert "experimental_conditions" in card.configs[0].model_extra + exp_conds = card.configs[0].model_extra["experimental_conditions"] + + # Check induction field + assert "induction" in exp_conds + induction = exp_conds["induction"] + assert "inducer" in induction + assert induction["inducer"]["compound"] == "D-galactose" + assert induction["duration_hours"] == 3 + + +def test_kemmeren_2014_growth_phase(): + """Test that kemmeren_2014 growth phase with od600_tolerance parses correctly.""" + data = yaml.safe_load(KEMMEREN_2014) + card = DatasetCard(**data) + + # Check growth phase (stored as dict in model_extra) + assert card.model_extra is not None + assert "experimental_conditions" in card.model_extra + exp_conds = card.model_extra["experimental_conditions"] + + assert "growth_phase_at_harvest" in exp_conds + growth_phase = exp_conds["growth_phase_at_harvest"] + assert growth_phase["phase"] == "early_mid_log" + assert growth_phase["od600"] == 0.6 + assert growth_phase["od600_tolerance"] == 0.1 + + +def test_hu_2007_strain_background_in_definitions(): + """Test that strain_background in field definitions parses correctly.""" + data = yaml.safe_load(HU_2007) + card = DatasetCard(**data) + + # Find heat_shock feature + 
config = card.configs[0] + heat_shock_feature = next( + f for f in config.dataset_info.features if f.name == "heat_shock" + ) + + # Check definitions + assert heat_shock_feature.definitions is not None + assert "true" in heat_shock_feature.definitions + + # Check strain_background in definition + true_def = heat_shock_feature.definitions["true"] + assert "strain_background" in true_def + + +def test_field_role_validation(): + """Test that role field accepts any string value.""" + # This should parse successfully with any role string + data = yaml.safe_load(CALLINGCARDS) + card = DatasetCard(**data) + + # Find a feature with a role + config = card.configs[0] + regulator_feature = next( + f for f in config.dataset_info.features if f.name == "regulator_locus_tag" + ) + + # Verify role is a string (not an enum) + assert regulator_feature.role == "regulator_identifier" + assert isinstance(regulator_feature.role, str) + + +def test_concentration_fields(): + """Test that various concentration fields parse correctly.""" + data = yaml.safe_load(KEMMEREN_2014) + card = DatasetCard(**data) + + # Check media compounds (stored as dict in model_extra) + assert card.model_extra is not None + assert "experimental_conditions" in card.model_extra + exp_conds = card.model_extra["experimental_conditions"] + assert "media" in exp_conds + media = exp_conds["media"] + + # Check carbon source + assert "carbon_source" in media + carbon_sources = media["carbon_source"] + assert len(carbon_sources) > 0 + carbon = carbon_sources[0] + assert carbon["concentration_percent"] is not None + + # Check nitrogen source with specifications + assert "nitrogen_source" in media + nitrogen_sources = media["nitrogen_source"] + assert len(nitrogen_sources) > 0 + nitrogen = nitrogen_sources[0] + assert nitrogen["specifications"] is not None + assert "without_amino_acids" in nitrogen["specifications"] + + +def test_extra_fields_do_not_raise_errors(): + """Test that extra fields are accepted (with warnings) but don't raise errors.""" + # All real datacards should parse without ValidationError + # even if they have extra fields + datacards = [ + BARKAI_COMPENDIUM, + CALLINGCARDS, + HARBISON_2004, + HU_2007, + HUGHES_2006, + KEMMEREN_2014, + MAHENDRAWADA_2025, + ROSSI_2021, + ] + + for datacard_yaml in datacards: + data = yaml.safe_load(datacard_yaml) + # This should not raise ValidationError + card = DatasetCard(**data) + assert card is not None + + +def test_empty_nitrogen_source_list(): + """Test that empty nitrogen_source lists are accepted.""" + data = yaml.safe_load(BARKAI_COMPENDIUM) + card = DatasetCard(**data) + + # Check that nitrogen_source is an empty list (stored as dict in model_extra) + assert card.model_extra is not None + assert "experimental_conditions" in card.model_extra + exp_conds = card.model_extra["experimental_conditions"] + assert "media" in exp_conds + media = exp_conds["media"] + assert media["nitrogen_source"] == [] + + +def test_media_additives(): + """Test that media additives parse correctly.""" + data = yaml.safe_load(HARBISON_2004) + card = DatasetCard(**data) + + # Find BUT14 condition definition + config = card.configs[0] + condition_feature = next( + f for f in config.dataset_info.features if f.name == "condition" + ) + but14_def = condition_feature.definitions["BUT14"] + + # Check additives + env_conds_dict = but14_def["environmental_conditions"] + media = env_conds_dict["media"] + assert "additives" in media + additives = media["additives"] + assert len(additives) > 0 + assert 
additives[0]["compound"] == "butanol" + assert additives[0]["concentration_percent"] == 1 + + +def test_strain_background_formats(): + """Test that strain_background accepts both string and dict formats.""" + # String format + data1 = yaml.safe_load(BARKAI_COMPENDIUM) + card1 = DatasetCard(**data1) + assert card1.model_extra is not None + assert "experimental_conditions" in card1.model_extra + exp_conds1 = card1.model_extra["experimental_conditions"] + assert exp_conds1["strain_background"] == "BY4741" + + # String format in rossi + data2 = yaml.safe_load(ROSSI_2021) + card2 = DatasetCard(**data2) + assert card2.model_extra is not None + assert "experimental_conditions" in card2.model_extra + exp_conds2 = card2.model_extra["experimental_conditions"] + assert exp_conds2["strain_background"] == "W303" + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/tfbpapi/tests/test_virtual_db.py b/tfbpapi/tests/test_virtual_db.py new file mode 100644 index 0000000..1293bf9 --- /dev/null +++ b/tfbpapi/tests/test_virtual_db.py @@ -0,0 +1,695 @@ +""" +Tests for VirtualDB unified query interface. + +Tests configuration loading, schema discovery, querying, filtering, and caching. + +""" + +import tempfile +from pathlib import Path + +import pandas as pd +import pytest +import yaml # type: ignore + +from tfbpapi.virtual_db import VirtualDB, get_nested_value, normalize_value + + +class TestHelperFunctions: + """Tests for helper functions.""" + + def test_get_nested_value_simple(self): + """Test simple nested dict navigation.""" + data = {"media": {"name": "YPD"}} + result = get_nested_value(data, "media.name") + assert result == "YPD" + + def test_get_nested_value_missing_key(self): + """Test that missing keys return None.""" + data = {"media": {"name": "YPD"}} + result = get_nested_value(data, "media.carbon_source") + assert result is None + + def test_get_nested_value_list_extraction(self): + """Test extracting property from list of dicts.""" + data = { + "media": { + "carbon_source": [{"compound": "glucose"}, {"compound": "galactose"}] + } + } + result = get_nested_value(data, "media.carbon_source.compound") + assert result == ["glucose", "galactose"] + + def test_get_nested_value_non_dict(self): + """Test that non-dict input returns None.""" + result = get_nested_value("not a dict", "path") # type: ignore + assert result is None + + def test_normalize_value_exact_match(self): + """Test exact alias match.""" + aliases = {"glucose": ["D-glucose", "dextrose"]} + result = normalize_value("D-glucose", aliases) + assert result == "glucose" + + def test_normalize_value_case_insensitive(self): + """Test case-insensitive matching.""" + aliases = {"glucose": ["D-glucose", "dextrose"]} + result = normalize_value("DEXTROSE", aliases) + assert result == "glucose" + + def test_normalize_value_no_match(self): + """Test pass-through when no alias matches.""" + aliases = {"glucose": ["D-glucose"]} + result = normalize_value("maltose", aliases) + assert result == "maltose" + + def test_normalize_value_no_aliases(self): + """Test pass-through when no aliases provided.""" + result = normalize_value("D-glucose", None) + assert result == "D-glucose" + + def test_normalize_value_missing_value_label(self): + """Test missing value handling.""" + result = normalize_value(None, None, "unspecified") + assert result == "unspecified" + + def test_normalize_value_missing_value_no_label(self): + """Test missing value without label.""" + result = normalize_value(None, None) + assert result == "None" + + +class 
TestVirtualDBConfig: + """Tests for VirtualDB configuration loading.""" + + def create_test_config(self, **overrides): + """Helper to create test configuration file.""" + config = { + "factor_aliases": { + "carbon_source": { + "glucose": ["D-glucose", "dextrose"], + "galactose": ["D-galactose", "Galactose"], + } + }, + "missing_value_labels": {"carbon_source": "unspecified"}, + "description": {"carbon_source": "Carbon source in growth media"}, + "repositories": { + "BrentLab/test_repo": { + "temperature_celsius": {"path": "temperature_celsius"}, + "dataset": { + "test_dataset": { + "carbon_source": { + "field": "condition", + "path": "media.carbon_source.compound", + } + } + }, + } + }, + } + config.update(overrides) + return config + + def test_init_with_valid_config(self): + """Test VirtualDB initialization with valid config.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_test_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + assert vdb.config is not None + assert vdb.token is None + assert len(vdb.cache) == 0 + finally: + Path(config_path).unlink() + + def test_init_with_token(self): + """Test VirtualDB initialization with HF token.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_test_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path, token="test_token") + assert vdb.token == "test_token" + finally: + Path(config_path).unlink() + + def test_init_missing_config_file(self): + """Test error when config file doesn't exist.""" + with pytest.raises(FileNotFoundError): + VirtualDB("/nonexistent/path.yaml") + + def test_repr(self): + """Test string representation.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_test_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + repr_str = repr(vdb) + assert "VirtualDB" in repr_str + assert "1 repositories" in repr_str + assert "1 datasets configured" in repr_str + assert "0 views cached" in repr_str + finally: + Path(config_path).unlink() + + +class TestSchemaDiscovery: + """Tests for schema discovery methods.""" + + def create_multi_dataset_config(self): + """Create config with multiple datasets.""" + return { + "factor_aliases": {}, + "repositories": { + "BrentLab/repo1": { + "temperature_celsius": {"path": "temperature_celsius"}, + "dataset": { + "dataset1": { + "carbon_source": { + "field": "condition", + "path": "media.carbon_source", + } + } + }, + }, + "BrentLab/repo2": { + "nitrogen_source": {"path": "media.nitrogen_source"}, + "dataset": { + "dataset2": { + "carbon_source": {"path": "media.carbon_source"}, + "temperature_celsius": {"path": "temperature_celsius"}, + } + }, + }, + }, + } + + def test_get_fields_all_datasets(self): + """Test getting all fields across all datasets.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_multi_dataset_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + fields = vdb.get_fields() + assert "carbon_source" in fields + assert "temperature_celsius" in fields + assert "nitrogen_source" in fields + assert fields == sorted(fields) # Should be sorted + finally: + Path(config_path).unlink() + + def test_get_fields_specific_dataset(self): + """Test getting fields for specific dataset.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + 
yaml.dump(self.create_multi_dataset_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + fields = vdb.get_fields("BrentLab/repo1", "dataset1") + assert "carbon_source" in fields + assert "temperature_celsius" in fields + # nitrogen_source is in repo2, not repo1 + assert "nitrogen_source" not in fields + finally: + Path(config_path).unlink() + + def test_get_fields_invalid_partial_args(self): + """Test error when only one of repo_id/config_name provided.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_multi_dataset_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + with pytest.raises(ValueError, match="Both repo_id and config_name"): + vdb.get_fields(repo_id="BrentLab/repo1") + finally: + Path(config_path).unlink() + + def test_get_common_fields(self): + """Test getting fields common to all datasets.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_multi_dataset_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + common = vdb.get_common_fields() + # Both datasets have carbon_source and temperature_celsius + assert "carbon_source" in common + assert "temperature_celsius" in common + # nitrogen_source is only in repo2 + assert "nitrogen_source" not in common + finally: + Path(config_path).unlink() + + +class TestCaching: + """Tests for view materialization and caching.""" + + def create_simple_config(self): + """Create simple config for testing.""" + return { + "factor_aliases": {}, + "repositories": { + "BrentLab/test_repo": { + "dataset": { + "test_dataset": { + "carbon_source": {"path": "media.carbon_source"} + } + } + } + }, + } + + def test_invalidate_cache_all(self): + """Test invalidating all cache.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_simple_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + # Manually add to cache + vdb.cache[("BrentLab/test_repo", "test_dataset")] = pd.DataFrame() + assert len(vdb.cache) == 1 + + vdb.invalidate_cache() + assert len(vdb.cache) == 0 + finally: + Path(config_path).unlink() + + def test_invalidate_cache_specific(self): + """Test invalidating specific dataset cache.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + yaml.dump(self.create_simple_config(), f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + # Add multiple entries to cache + vdb.cache[("BrentLab/test_repo", "test_dataset")] = pd.DataFrame() + vdb.cache[("BrentLab/other_repo", "other_dataset")] = pd.DataFrame() + assert len(vdb.cache) == 2 + + vdb.invalidate_cache([("BrentLab/test_repo", "test_dataset")]) + assert len(vdb.cache) == 1 + assert ("BrentLab/other_repo", "other_dataset") in vdb.cache + finally: + Path(config_path).unlink() + + +class TestFiltering: + """Tests for filter application logic.""" + + def test_apply_filters_exact_match(self): + """Test exact value matching in filters.""" + df = pd.DataFrame( + { + "sample_id": ["s1", "s2", "s3"], + "carbon_source": ["glucose", "galactose", "glucose"], + } + ) + + # Create minimal VirtualDB instance + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": {"path": "media.carbon_source"}} + } + } + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = 
VirtualDB(config_path) + filtered = vdb._apply_filters( + df, {"carbon_source": "glucose"}, "BrentLab/test", "test" + ) + assert len(filtered) == 2 + assert all(filtered["carbon_source"] == "glucose") + finally: + Path(config_path).unlink() + + def test_apply_filters_numeric_range(self): + """Test numeric range filtering.""" + df = pd.DataFrame( + {"sample_id": ["s1", "s2", "s3"], "temperature_celsius": [25, 30, 37]} + ) + + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/test": { + "dataset": { + "test": { + "temperature_celsius": {"path": "temperature_celsius"} + } + } + } + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + + # Test >= operator + filtered = vdb._apply_filters( + df, {"temperature_celsius": (">=", 30)}, "BrentLab/test", "test" + ) + assert len(filtered) == 2 + assert all(filtered["temperature_celsius"] >= 30) + + # Test between operator + filtered = vdb._apply_filters( + df, + {"temperature_celsius": ("between", 28, 32)}, + "BrentLab/test", + "test", + ) + assert len(filtered) == 1 + assert filtered.iloc[0]["temperature_celsius"] == 30 + finally: + Path(config_path).unlink() + + def test_apply_filters_with_alias_expansion(self): + """Test filter with alias expansion.""" + df = pd.DataFrame( + { + "sample_id": ["s1", "s2", "s3"], + "carbon_source": ["glucose", "D-glucose", "galactose"], + } + ) + + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "factor_aliases": { + "carbon_source": {"glucose": ["D-glucose", "dextrose", "glucose"]} + }, + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": {"path": "media.carbon_source"}} + } + } + }, + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + filtered = vdb._apply_filters( + df, {"carbon_source": "glucose"}, "BrentLab/test", "test" + ) + # Should match both "glucose" and "D-glucose" due to alias expansion + assert len(filtered) == 2 + finally: + Path(config_path).unlink() + + +class TestExtraction: + """Tests for metadata extraction methods.""" + + def test_add_field_metadata(self): + """Test adding field-level metadata to DataFrame.""" + df = pd.DataFrame({"sample_id": ["s1", "s2"], "condition": ["YPD", "YPG"]}) + + field_metadata = { + "YPD": {"carbon_source": ["glucose"], "growth_media": ["YPD"]}, + "YPG": {"carbon_source": ["glycerol"], "growth_media": ["YPG"]}, + } + + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": {"path": "media.carbon_source"}} + } + } + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + result = vdb._add_field_metadata(df, field_metadata) + + assert "carbon_source" in result.columns + assert "growth_media" in result.columns + assert ( + result.loc[result["condition"] == "YPD", "carbon_source"].iloc[0] + == "glucose" + ) + assert ( + result.loc[result["condition"] == "YPG", "carbon_source"].iloc[0] + == "glycerol" + ) + finally: + Path(config_path).unlink() + + +class TestQuery: + """Tests for query method - requires mocking HfQueryAPI.""" + + def test_query_empty_result(self): + """Test query with no matching datasets.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/test": { + "dataset": { + "test": {"carbon_source": 
{"path": "media.carbon_source"}} + } + } + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + # Query with non-configured dataset should return empty + result = vdb.query(datasets=[("BrentLab/other", "other")]) + assert isinstance(result, pd.DataFrame) + assert result.empty + finally: + Path(config_path).unlink() + + +class TestComparativeDatasets: + """Tests for comparative dataset field-based joins.""" + + def test_parse_composite_identifier(self): + """Test parsing composite identifiers.""" + composite_id = "BrentLab/harbison_2004;harbison_2004;sample_42" + repo, config, sample = VirtualDB._parse_composite_identifier(composite_id) + assert repo == "BrentLab/harbison_2004" + assert config == "harbison_2004" + assert sample == "sample_42" + + def test_parse_composite_identifier_invalid(self): + """Test that invalid composite IDs raise errors.""" + with pytest.raises(ValueError, match="Invalid composite ID format"): + VirtualDB._parse_composite_identifier("invalid:format") + + def test_get_comparative_fields_for_dataset(self): + """Test getting comparative fields mapping.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/primary": { + "dataset": { + "primary_data": { + "sample_id": {"field": "sample_id"}, + "comparative_analyses": [ + { + "repo": "BrentLab/comparative", + "dataset": "comp_data", + "via_field": "binding_id", + } + ], + } + } + }, + "BrentLab/comparative": { + "dataset": { + "comp_data": { + "dto_fdr": {"field": "dto_fdr"}, + "dto_pvalue": {"field": "dto_empirical_pvalue"}, + } + } + }, + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + field_mapping = vdb._get_comparative_fields_for_dataset( + "BrentLab/primary", "primary_data" + ) + + # Should have dto_fdr and dto_pvalue, but NOT binding_id (via_field) + assert "dto_fdr" in field_mapping + assert "dto_pvalue" in field_mapping + assert "binding_id" not in field_mapping + + # Check mapping structure + assert field_mapping["dto_fdr"]["comp_repo"] == "BrentLab/comparative" + assert field_mapping["dto_fdr"]["comp_dataset"] == "comp_data" + assert field_mapping["dto_fdr"]["via_field"] == "binding_id" + finally: + Path(config_path).unlink() + + def test_get_comparative_fields_no_links(self): + """Test that datasets without comparative links return empty mapping.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/primary": { + "dataset": { + "primary_data": {"sample_id": {"field": "sample_id"}} + } + } + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = VirtualDB(config_path) + field_mapping = vdb._get_comparative_fields_for_dataset( + "BrentLab/primary", "primary_data" + ) + assert field_mapping == {} + finally: + Path(config_path).unlink() + + def test_get_comparative_analyses(self): + """Test getting comparative analysis relationships.""" + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + config = { + "repositories": { + "BrentLab/primary": { + "dataset": { + "primary_data": { + "sample_id": {"field": "sample_id"}, + "comparative_analyses": [ + { + "repo": "BrentLab/comparative", + "dataset": "comp_data", + "via_field": "binding_id", + } + ], + } + } + }, + "BrentLab/comparative": { + "dataset": {"comp_data": {"dto_fdr": {"field": "dto_fdr"}}} + }, + } + } + yaml.dump(config, f) + config_path = f.name + + try: + vdb = 
VirtualDB(config_path)
+            info = vdb.get_comparative_analyses()
+
+            # Check primary to comparative mapping
+            assert "BrentLab/primary/primary_data" in info["primary_to_comparative"]
+            links = info["primary_to_comparative"]["BrentLab/primary/primary_data"]
+            assert len(links) == 1
+            assert links[0]["comparative_repo"] == "BrentLab/comparative"
+            assert links[0]["comparative_dataset"] == "comp_data"
+            assert links[0]["via_field"] == "binding_id"
+
+            # Check comparative fields
+            assert "BrentLab/comparative/comp_data" in info["comparative_fields"]
+            assert (
+                "dto_fdr"
+                in info["comparative_fields"]["BrentLab/comparative/comp_data"]
+            )
+        finally:
+            Path(config_path).unlink()
+
+    def test_get_comparative_analyses_filtered(self):
+        """Test filtering comparative analyses by repo and config."""
+        with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
+            config = {
+                "repositories": {
+                    "BrentLab/primary1": {
+                        "dataset": {
+                            "data1": {
+                                "sample_id": {"field": "sample_id"},
+                                "comparative_analyses": [
+                                    {
+                                        "repo": "BrentLab/comp",
+                                        "dataset": "comp_data",
+                                        "via_field": "id1",
+                                    }
+                                ],
+                            }
+                        }
+                    },
+                    "BrentLab/primary2": {
+                        "dataset": {
+                            "data2": {
+                                "sample_id": {"field": "sample_id"},
+                                "comparative_analyses": [
+                                    {
+                                        "repo": "BrentLab/comp",
+                                        "dataset": "comp_data",
+                                        "via_field": "id2",
+                                    }
+                                ],
+                            }
+                        }
+                    },
+                }
+            }
+            yaml.dump(config, f)
+            config_path = f.name
+
+        try:
+            vdb = VirtualDB(config_path)
+
+            # Get all
+            all_info = vdb.get_comparative_analyses()
+            assert len(all_info["primary_to_comparative"]) == 2
+
+            # Filter by repo and config
+            filtered = vdb.get_comparative_analyses("BrentLab/primary1", "data1")
+            assert len(filtered["primary_to_comparative"]) == 1
+            assert "BrentLab/primary1/data1" in filtered["primary_to_comparative"]
+
+            # Filter by repo only
+            repo_filtered = vdb.get_comparative_analyses("BrentLab/primary2")
+            assert len(repo_filtered["primary_to_comparative"]) == 1
+            assert "BrentLab/primary2/data2" in repo_filtered["primary_to_comparative"]
+        finally:
+            Path(config_path).unlink()
+
+
+# Note: Full integration tests with real HuggingFace datasets would go here
+# but are excluded as they require network access and specific test datasets.
+# These tests cover the core logic and would be supplemented with integration
+# tests using the actual sample config and real datasets like harbison_2004.
diff --git a/tfbpapi/virtual_db.py b/tfbpapi/virtual_db.py
new file mode 100644
index 0000000..f6dd12e
--- /dev/null
+++ b/tfbpapi/virtual_db.py
@@ -0,0 +1,1345 @@
+"""
+VirtualDB provides a unified query interface across heterogeneous datasets.
+
+This module enables cross-dataset queries with standardized field names and values,
+mapping varying experimental condition structures to a common schema through external
+YAML configuration.
+
+Key Components:
+- VirtualDB: Main interface for unified cross-dataset queries
+- Helper functions: get_nested_value(), normalize_value() for metadata extraction
+- Configuration-driven schema via models.MetadataConfig
+
+Example Usage:
+    >>> from tfbpapi.virtual_db import VirtualDB
+    >>> vdb = VirtualDB("config.yaml")
+    >>>
+    >>> # Discover available fields
+    >>> fields = vdb.get_fields()
+    >>> print(fields)  # ["carbon_source", "temperature_celsius", ...]
+    >>>
+    >>> # Query across datasets
+    >>> df = vdb.query(
+    ...     filters={"carbon_source": "glucose", "temperature_celsius": 30},
+    ...     fields=["sample_id", "carbon_source", "temperature_celsius"]
+    ...
) + >>> + >>> # Get complete data with measurements + >>> df = vdb.query( + ... filters={"carbon_source": "glucose"}, + ... complete=True + ... ) + +""" + +from __future__ import annotations + +from pathlib import Path +from typing import Any + +import duckdb +import pandas as pd + +from tfbpapi.datacard import DataCard +from tfbpapi.errors import DataCardError +from tfbpapi.hf_cache_manager import HfCacheManager +from tfbpapi.models import MetadataConfig, PropertyMapping + + +def get_nested_value(data: dict, path: str) -> Any: + """ + Navigate nested dict/list using dot notation. + + Handles missing intermediate keys gracefully by returning None. + Supports extracting properties from lists of dicts. + + :param data: Dictionary to navigate + :param path: Dot-separated path (e.g., "media.carbon_source.compound") + :return: Value at path or None if not found + + Examples: + Simple nested dict: + get_nested_value({"media": {"name": "YPD"}}, "media.name") + Returns: "YPD" + + List of dicts - extract property from each item: + get_nested_value( + {"media": {"carbon_source": [{"compound": "glucose"}, + {"compound": "galactose"}]}}, + "media.carbon_source.compound" + ) + Returns: ["glucose", "galactose"] + + """ + if not isinstance(data, dict): + return None + + keys = path.split(".") + current = data + + for i, key in enumerate(keys): + if isinstance(current, dict): + if key not in current: + return None + current = current[key] + elif isinstance(current, list): + # If current is a list and we have more keys, + # extract property from each item + if i < len(keys): + # Extract the remaining path from each list item + remaining_path = ".".join(keys[i:]) + results = [] + for item in current: + if isinstance(item, dict): + val = get_nested_value(item, remaining_path) + if val is not None: + results.append(val) + return results if results else None + else: + return None + + return current + + +def normalize_value( + actual_value: Any, + aliases: dict[str, list[Any]] | None, + missing_value_label: str | None = None, +) -> str: + """ + Normalize a value using optional alias mappings (case-insensitive). + + Returns the alias name if a match is found, otherwise returns the + original value as a string. Handles missing values by returning + the configured missing_value_label. + + :param actual_value: The value from the data to normalize + :param aliases: Optional dict mapping alias names to lists of actual values. 
+ Example: {"glucose": ["D-glucose", "dextrose"]} + :param missing_value_label: Label to use for None/missing values + :return: Alias name if match found, missing_value_label if None, + otherwise str(actual_value) + + Examples: + With aliases - exact match: + normalize_value("D-glucose", {"glucose": ["D-glucose", "dextrose"]}) + Returns: "glucose" + + With aliases - case-insensitive match: + normalize_value("DEXTROSE", {"glucose": ["D-glucose", "dextrose"]}) + Returns: "glucose" + + Missing value: + normalize_value(None, None, "unspecified") + Returns: "unspecified" + + No alias match - pass through: + normalize_value("maltose", {"glucose": ["D-glucose"]}) + Returns: "maltose" + + """ + # Handle None/missing values + if actual_value is None: + return missing_value_label if missing_value_label else "None" + + if aliases is None: + return str(actual_value) + + # Convert to string for comparison (case-insensitive) + actual_str = str(actual_value).lower() + + # Check each alias mapping + for alias_name, actual_values in aliases.items(): + for val in actual_values: + if str(val).lower() == actual_str: + return alias_name + + # No match found - pass through original value + return str(actual_value) + + +class VirtualDB: + """ + Unified query interface across heterogeneous datasets. + + VirtualDB provides a virtual database layer over multiple HuggingFace datasets, + allowing cross-dataset queries with standardized field names and normalized values. + Each configured dataset becomes a view with a common schema defined by external + YAML configuration. + + The YAML configuration specifies: + 1. Property mappings: How to extract each field from dataset structures + 2. Factor aliases: Normalize varying terminologies to standard values + 3. Missing value labels: Handle missing data consistently + 4. Descriptions: Document each field's semantics + + Attributes: + config: MetadataConfig instance with all configuration + token: Optional HuggingFace token for private datasets + cache: Dict mapping (repo_id, config_name) to cached DataFrame views + + """ + + def __init__(self, config_path: Path | str, token: str | None = None): + """ + Initialize VirtualDB with configuration and optional auth token. + + :param config_path: Path to YAML configuration file + :param token: Optional HuggingFace token for private datasets + :raises FileNotFoundError: If config file doesn't exist + :raises ValueError: If configuration is invalid + + """ + self.config = MetadataConfig.from_yaml(config_path) + self.token = token + self.cache: dict[tuple[str, str], pd.DataFrame] = {} + # Build mapping of comparative dataset references + self._comparative_links = self._build_comparative_links() + + def get_fields( + self, repo_id: str | None = None, config_name: str | None = None + ) -> list[str]: + """ + Get list of queryable fields. 
+ + :param repo_id: Optional repository ID to filter to specific dataset + :param config_name: Optional config name (required if repo_id provided) + :return: List of field names + + Examples: + All fields across all datasets: + fields = vdb.get_fields() + + Fields for specific dataset: + fields = vdb.get_fields("BrentLab/harbison_2004", "harbison_2004") + + """ + if repo_id is not None and config_name is not None: + # Get fields for specific dataset + mappings = self.config.get_property_mappings(repo_id, config_name) + return sorted(mappings.keys()) + + if repo_id is not None or config_name is not None: + raise ValueError( + "Both repo_id and config_name must be provided, or neither" + ) + + # Get all fields across all datasets + all_fields: set[str] = set() + for repo_id, repo_config in self.config.repositories.items(): + # Add repo-wide fields + all_fields.update(repo_config.properties.keys()) + # Add dataset-specific fields + if repo_config.dataset: + for dataset_config in repo_config.dataset.values(): + # DatasetVirtualDBConfig stores property mappings in model_extra + if ( + hasattr(dataset_config, "model_extra") + and dataset_config.model_extra + ): + all_fields.update(dataset_config.model_extra.keys()) + # Also include special fields if they exist + if dataset_config.sample_id: + all_fields.add("sample_id") + + return sorted(all_fields) + + def get_common_fields(self) -> list[str]: + """ + Get fields present in ALL configured datasets. + + :return: List of field names common to all datasets + + Example: + common = vdb.get_common_fields() + # ["carbon_source", "temperature_celsius"] + + """ + if not self.config.repositories: + return [] + + # Get field sets for each dataset + dataset_fields: list[set[str]] = [] + for repo_id, repo_config in self.config.repositories.items(): + if repo_config.dataset: + for config_name in repo_config.dataset.keys(): + mappings = self.config.get_property_mappings(repo_id, config_name) + dataset_fields.append(set(mappings.keys())) + + if not dataset_fields: + return [] + + # Return intersection + common = set.intersection(*dataset_fields) + return sorted(common) + + def get_unique_values( + self, field: str, by_dataset: bool = False + ) -> list[str] | dict[str, list[str]]: + """ + Get unique values for a field across datasets (with normalization). 
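+
+        Returned values are normalized through the configured factor_aliases
+        before deduplication, so dataset-specific spellings collapse to their
+        canonical alias names. A sketch, assuming the alias mapping
+        {"glucose": ["D-glucose", "dextrose"]} used in the tests above:
+
+            vdb.get_unique_values("carbon_source")
+            # ["galactose", "glucose"]  (never "D-glucose" or "dextrose")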
+ + :param field: Field name to get values for + :param by_dataset: If True, return dict keyed by dataset identifier + :return: List of unique normalized values, or dict if by_dataset=True + + Examples: + All unique values: + values = vdb.get_unique_values("carbon_source") + # ["glucose", "galactose", "raffinose"] + + Values by dataset: + values = vdb.get_unique_values("carbon_source", by_dataset=True) + # {"BrentLab/harbison_2004": ["glucose", "galactose"], + # "BrentLab/kemmeren_2014": ["glucose", "raffinose"]} + + """ + if by_dataset: + result: dict[str, list[str]] = {} + else: + all_values: set[str] = set() + + # Query each dataset that has this field + for repo_id, repo_config in self.config.repositories.items(): + if repo_config.dataset: + for config_name in repo_config.dataset.keys(): + mappings = self.config.get_property_mappings(repo_id, config_name) + if field not in mappings: + continue + + # Build metadata table for this dataset + metadata_df = self._build_metadata_table(repo_id, config_name) + if metadata_df.empty or field not in metadata_df.columns: + continue + + # Get unique values (already normalized) + unique_vals = metadata_df[field].dropna().unique().tolist() + + if by_dataset: + dataset_key = f"{repo_id}/{config_name}" + result[dataset_key] = sorted(unique_vals) + else: + all_values.update(unique_vals) + + if by_dataset: + return result + else: + return sorted(all_values) + + def get_comparative_analyses( + self, repo_id: str | None = None, config_name: str | None = None + ) -> dict[str, Any]: + """ + Get information about comparative analysis relationships. + + Returns information about which comparative datasets are available + and how they link to primary datasets. Useful for discovering + what cross-dataset analyses can be performed. 
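+
+        A sketch of walking the returned structure (dataset names are
+        hypothetical, mirroring the test fixtures above):
+
+            info = vdb.get_comparative_analyses()
+            for primary, links in info["primary_to_comparative"].items():
+                for link in links:
+                    print(primary, "->", link["comparative_repo"],
+                          link["comparative_dataset"], "via", link["via_field"])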
+ + :param repo_id: Optional repository ID to filter to specific repo + :param config_name: Optional config name (requires repo_id) + :return: Dictionary with two keys: + - "primary_to_comparative": Maps primary datasets to their + comparative analyses + - "comparative_fields": Maps comparative datasets to fields + available for joining + :raises ValueError: If config_name provided without repo_id + + Examples: + Get all comparative analysis relationships: + info = vdb.get_comparative_analyses() + + Get relationships for specific primary dataset: + info = vdb.get_comparative_analyses( + "BrentLab/callingcards", "annotated_features" + ) + + """ + if config_name and not repo_id: + raise ValueError("repo_id required when config_name is specified") + + primary_to_comparative: dict[str, list[dict[str, str]]] = {} + comparative_fields: dict[str, list[str]] = {} + + # Filter links based on parameters + if repo_id and config_name: + # Specific dataset requested + links_to_process = { + (repo_id, config_name): self._comparative_links.get( + (repo_id, config_name), {} + ) + } + elif repo_id: + # All configs in specific repo + links_to_process = { + k: v for k, v in self._comparative_links.items() if k[0] == repo_id + } + else: + # All links + links_to_process = self._comparative_links + + # Build primary to comparative mapping + for (prim_repo, prim_config), link_info in links_to_process.items(): + if "comparative_analyses" not in link_info: + continue + + dataset_key = f"{prim_repo}/{prim_config}" + primary_to_comparative[dataset_key] = [] + + for ca in link_info["comparative_analyses"]: + primary_to_comparative[dataset_key].append( + { + "comparative_repo": ca["repo"], + "comparative_dataset": ca["dataset"], + "via_field": ca["via_field"], + } + ) + + # Track which fields are available from comparative datasets + comp_key = f"{ca['repo']}/{ca['dataset']}" + if comp_key not in comparative_fields: + # Get fields from the comparative dataset + # First try config mappings + comp_fields = self.get_fields(ca["repo"], ca["dataset"]) + + # If no mappings, get actual fields from DataCard + if not comp_fields: + try: + card = DataCard(ca["repo"], token=self.token) + config = card.get_config(ca["dataset"]) + if config and config.dataset_info: + comp_fields = [ + f.name for f in config.dataset_info.features + ] + except Exception: + comp_fields = [] + + comparative_fields[comp_key] = comp_fields + + return { + "primary_to_comparative": primary_to_comparative, + "comparative_fields": comparative_fields, + } + + def query( + self, + filters: dict[str, Any] | None = None, + datasets: list[tuple[str, str]] | None = None, + fields: list[str] | None = None, + complete: bool = False, + ) -> pd.DataFrame: + """ + Query VirtualDB with optional filters and field selection. 
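+
+        Filter values may be exact values (expanded through factor_aliases) or
+        operator tuples for numeric fields, as exercised in the filtering
+        tests above; a sketch:
+
+            df = vdb.query(filters={"temperature_celsius": (">=", 30)})
+            df = vdb.query(filters={"temperature_celsius": ("between", 28, 32)})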
+
+        :param filters: Dict of field:value pairs to filter on
+        :param datasets: List of (repo_id, config_name) tuples to query (None = all)
+        :param fields: List of field names to return (None = all)
+        :param complete: If True, return measurement-level data; if False, sample-level
+        :return: DataFrame with query results
+
+        Examples:
+            Basic query across all datasets:
+                df = vdb.query(filters={"carbon_source": "glucose"})
+
+            Query specific datasets with field selection:
+                df = vdb.query(
+                    filters={"carbon_source": "glucose", "temperature_celsius": 30},
+                    datasets=[("BrentLab/harbison_2004", "harbison_2004")],
+                    fields=["sample_id", "carbon_source", "temperature_celsius"]
+                )
+
+            Complete data with measurements:
+                df = vdb.query(
+                    filters={"carbon_source": "glucose"},
+                    complete=True
+                )
+
+        """
+        # Determine which datasets to query
+        if datasets is None:
+            # Query all configured datasets
+            datasets = []
+            for repo_id, repo_config in self.config.repositories.items():
+                if repo_config.dataset:
+                    for config_name in repo_config.dataset.keys():
+                        datasets.append((repo_id, config_name))
+
+        if not datasets:
+            return pd.DataFrame()
+
+        # Query each dataset
+        results: list[pd.DataFrame] = []
+        for repo_id, config_name in datasets:
+            # Build metadata table
+            metadata_df = self._build_metadata_table(repo_id, config_name)
+            if metadata_df.empty:
+                continue
+
+            # Separate filters into primary and comparative
+            primary_filters = {}
+            comparative_filters = {}
+            if filters:
+                # Get comparative field mapping
+                comp_field_mapping = self._get_comparative_fields_for_dataset(
+                    repo_id, config_name
+                )
+                for field, value in filters.items():
+                    if field in comp_field_mapping:
+                        comparative_filters[field] = value
+                    else:
+                        primary_filters[field] = value
+
+            # Apply primary filters first
+            if primary_filters:
+                metadata_df = self._apply_filters(
+                    metadata_df, primary_filters, repo_id, config_name
+                )
+
+            # Enrich with comparative data if needed
+            # IMPORTANT: Do this BEFORE getting complete data so comparative fields
+            # are joined at the sample level, not measurement level
+            # This happens when: fields are requested from comparative datasets
+            # OR when filtering on comparative fields
+            if fields or comparative_filters:
+                comp_field_mapping = self._get_comparative_fields_for_dataset(
+                    repo_id, config_name
+                )
+                requested_comp_fields = [
+                    f for f in (fields or []) if f in comp_field_mapping
+                ]
+                # Also need fields that are filtered on
+                filtered_comp_fields = [
+                    f for f in comparative_filters.keys() if f in comp_field_mapping
+                ]
+                all_comp_fields = list(
+                    set(requested_comp_fields + filtered_comp_fields)
+                )
+                if all_comp_fields:
+                    metadata_df = self._enrich_with_comparative_data(
+                        metadata_df, repo_id, config_name, all_comp_fields
+                    )
+
+            # Apply comparative filters after enrichment
+            if comparative_filters:
+                metadata_df = self._apply_filters(
+                    metadata_df, comparative_filters, repo_id, config_name
+                )
+
+            # If complete=True, join with full data
+            # Do this AFTER comparative enrichment so DTO fields are already added
+            if complete:
+                sample_ids = metadata_df["sample_id"].tolist()
+                if sample_ids:
+                    full_df = self._get_complete_data(
+                        repo_id, config_name, sample_ids, metadata_df
+                    )
+                    if not full_df.empty:
+                        metadata_df = full_df
+
+            # Select requested fields
+            if fields:
+                # Keep sample_id and any dataset identifier columns
+                keep_cols = ["sample_id"]
+                if "dataset_id" in metadata_df.columns:
+                    keep_cols.append("dataset_id")
+                # Add requested fields that exist
+                for field in fields:
+                    if field
in metadata_df.columns and field not in keep_cols: + keep_cols.append(field) + metadata_df = metadata_df[keep_cols].copy() + + # Add dataset identifier + if "dataset_id" not in metadata_df.columns: + metadata_df = metadata_df.copy() + metadata_df["dataset_id"] = f"{repo_id}/{config_name}" + + results.append(metadata_df) + + if not results: + return pd.DataFrame() + + # Concatenate results, filling NaN for missing columns + return pd.concat(results, ignore_index=True, sort=False) + + def materialize_views(self, datasets: list[tuple[str, str]] | None = None) -> None: + """ + Build and cache metadata DataFrames for faster subsequent queries. + + :param datasets: List of (repo_id, config_name) tuples to materialize + (None = materialize all) + + Example: + vdb.materialize_views() # Cache all datasets + vdb.materialize_views([("BrentLab/harbison_2004", "harbison_2004")]) + + """ + if datasets is None: + # Materialize all configured datasets + datasets = [] + for repo_id, repo_config in self.config.repositories.items(): + if repo_config.dataset: + for config_name in repo_config.dataset.keys(): + datasets.append((repo_id, config_name)) + + for repo_id, config_name in datasets: + # Build and cache + self._build_metadata_table(repo_id, config_name, use_cache=False) + + def invalidate_cache(self, datasets: list[tuple[str, str]] | None = None) -> None: + """ + Clear cached metadata DataFrames. + + :param datasets: List of (repo_id, config_name) tuples to invalidate + (None = invalidate all) + + Example: + vdb.invalidate_cache() # Clear all cache + vdb.invalidate_cache([("BrentLab/harbison_2004", "harbison_2004")]) + + """ + if datasets is None: + self.cache.clear() + else: + for dataset_key in datasets: + if dataset_key in self.cache: + del self.cache[dataset_key] + + def _build_comparative_links(self) -> dict[tuple[str, str], dict[str, Any]]: + """ + Build mapping of primary datasets to their comparative dataset references. + + Returns dict keyed by (repo_id, config_name) with value being dict: { + "comparative_analyses": [ { "repo": comparative_repo_id, + "dataset": comparative_config_name, "via_field": + field_name_with_composite_ids } ] } + + """ + links: dict[tuple[str, str], dict[str, Any]] = {} + + for repo_id, repo_config in self.config.repositories.items(): + if not repo_config.dataset: + continue + + for config_name, dataset_config in repo_config.dataset.items(): + if dataset_config.comparative_analyses: + links[(repo_id, config_name)] = { + "comparative_analyses": [ + { + "repo": ca.repo, + "dataset": ca.dataset, + "via_field": ca.via_field, + } + for ca in dataset_config.comparative_analyses + ] + } + + return links + + def _get_comparative_fields_for_dataset( + self, repo_id: str, config_name: str + ) -> dict[str, dict[str, str]]: + """ + Get mapping of comparative fields available for a primary dataset. 
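+
+        query() uses this mapping to split user filters into primary and
+        comparative groups before enrichment; a sketch of that split
+        (field names are hypothetical):
+
+            mapping = vdb._get_comparative_fields_for_dataset(repo, config)
+            comparative = {f: v for f, v in filters.items() if f in mapping}
+            primary = {f: v for f, v in filters.items() if f not in mapping}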
+ + :param repo_id: Primary dataset repository ID + :param config_name: Primary dataset config name + :return: Dict mapping field_name to comparative dataset info + {field_name: { + "comp_repo": comparative_repo_id, + "comp_dataset": comparative_dataset_name, + "via_field": field_with_composite_ids + }} + + Example: + For callingcards dataset linked to DTO via binding_id: + { + "dto_fdr": { + "comp_repo": "BrentLab/yeast_comparative_analysis", + "comp_dataset": "dto", + "via_field": "binding_id" + }, + "dto_empirical_pvalue": {...} + } + + """ + field_mapping: dict[str, dict[str, str]] = {} + + # Get comparative analyses for this dataset + links = self._comparative_links.get((repo_id, config_name), {}) + if "comparative_analyses" not in links: + return field_mapping + + # For each comparative dataset, get its fields + for ca in links["comparative_analyses"]: + comp_repo = ca["repo"] + comp_dataset = ca["dataset"] + via_field = ca["via_field"] + + # Get fields from comparative dataset + comp_fields = self.get_fields(comp_repo, comp_dataset) + + # If no fields from config, try DataCard + if not comp_fields: + try: + from tfbpapi.datacard import DataCard + + card = DataCard(comp_repo, token=self.token) + config = card.get_config(comp_dataset) + if config and config.dataset_info: + comp_fields = [f.name for f in config.dataset_info.features] + except Exception: + comp_fields = [] + + # Map each field to this comparative dataset + for field_name in comp_fields: + # Skip the via_field itself (it's the join key) + if field_name == via_field: + continue + + field_mapping[field_name] = { + "comp_repo": comp_repo, + "comp_dataset": comp_dataset, + "via_field": via_field, + } + + return field_mapping + + def _enrich_with_comparative_data( + self, + primary_df: pd.DataFrame, + repo_id: str, + config_name: str, + requested_fields: list[str], + ) -> pd.DataFrame: + """ + Enrich primary dataset with fields from comparative datasets. 
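+
+        Each comparative row carries a composite identifier in its via_field
+        ("repo_id;config_name;sample_id"); rows whose first two components
+        match this dataset are left-joined onto primary_df by sample_id. A
+        condensed sketch of the effective join performed below:
+
+            comp_df["_join_sample_id"] = comp_df[via_field].apply(extract_sample_id)
+            result = primary_df.merge(
+                comp_df.drop(columns=[via_field]),
+                left_on="sample_id", right_on="_join_sample_id", how="left",
+            )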
+ + :param primary_df: Primary dataset DataFrame with sample_id column + :param repo_id: Primary dataset repository ID + :param config_name: Primary dataset config name + :param requested_fields: List of field names requested by user + :return: DataFrame enriched with comparative fields + + """ + # Get mapping of which fields come from which comparative datasets + comp_field_mapping = self._get_comparative_fields_for_dataset( + repo_id, config_name + ) + + if not comp_field_mapping: + return primary_df + + # Find which requested fields are from comparative datasets + comp_fields_to_fetch = [f for f in requested_fields if f in comp_field_mapping] + + if not comp_fields_to_fetch: + return primary_df + + # Group fields by comparative dataset to minimize queries + by_comp_dataset: dict[tuple[str, str, str], list[str]] = {} + for field in comp_fields_to_fetch: + info = comp_field_mapping[field] + key = (info["comp_repo"], info["comp_dataset"], info["via_field"]) + if key not in by_comp_dataset: + by_comp_dataset[key] = [] + by_comp_dataset[key].append(field) + + # For each comparative dataset, load and join + result_df = primary_df.copy() + + for (comp_repo, comp_dataset, via_field), fields in by_comp_dataset.items(): + try: + # Load comparative dataset using HfCacheManager + # but query the raw data table instead of metadata view + from tfbpapi.hf_cache_manager import HfCacheManager + + comp_cache_mgr = HfCacheManager( + comp_repo, duckdb_conn=duckdb.connect(":memory:"), token=self.token + ) + + # Get the config to load data + comp_config = comp_cache_mgr.get_config(comp_dataset) + if not comp_config: + continue + + # Load the data (this will download and register parquet files) + result = comp_cache_mgr._get_metadata_for_config(comp_config) + if not result.get("success", False): + continue + + # Now query the raw data table directly (not the metadata view) + # The raw table name is config_name without "metadata_" prefix + select_fields = [via_field] + fields + columns = ", ".join(select_fields) + + # Query the actual parquet data by creating a view from the files + try: + # Get file paths that were loaded + import glob + + from huggingface_hub import snapshot_download + + cache_dir = snapshot_download( + repo_id=comp_repo, + repo_type="dataset", + allow_patterns=f"{comp_dataset}/**/*.parquet", + token=self.token, + ) + + parquet_files = glob.glob( + f"{cache_dir}/{comp_dataset}/**/*.parquet", recursive=True + ) + + if not parquet_files: + continue + + # Create a temporary view from parquet files + temp_view = f"temp_{comp_dataset}_raw" + files_sql = ", ".join([f"'{f}'" for f in parquet_files]) + comp_cache_mgr.duckdb_conn.execute( + f"CREATE OR REPLACE VIEW {temp_view} AS " + f"SELECT * FROM read_parquet([{files_sql}])" + ) + + # Query the view + sql = f"SELECT {columns} FROM {temp_view}" + comp_df = comp_cache_mgr.duckdb_conn.execute(sql).fetchdf() + + except Exception: + # If direct parquet loading fails, skip this comparative dataset + continue + + if comp_df.empty: + continue + + # Parse composite identifiers to extract sample_id + # via_field contains values like + # "BrentLab/harbison_2004;harbison_2004;123" + # We need to extract the third component and match on + # current repo/config + def extract_sample_id(composite_id: str) -> str | None: + """Extract sample_id if composite matches current dataset.""" + if pd.isna(composite_id): + return None + try: + parts = composite_id.split(";") + if len(parts) != 3: + return None + # Check if this composite ID references our dataset + if 
parts[0] == repo_id and parts[1] == config_name: + return parts[2] + return None + except Exception: + return None + + comp_df["_join_sample_id"] = comp_df[via_field].apply(extract_sample_id) + + # Convert _join_sample_id to match primary_df sample_id dtype + # This handles cases where sample_id is int but composite has string + if "_join_sample_id" in comp_df.columns: + primary_dtype = primary_df["sample_id"].dtype + if pd.api.types.is_integer_dtype(primary_dtype): + # Convert to numeric, coercing errors to NaN + comp_df["_join_sample_id"] = pd.to_numeric( + comp_df["_join_sample_id"], errors="coerce" + ) + elif pd.api.types.is_string_dtype(primary_dtype): + comp_df["_join_sample_id"] = comp_df["_join_sample_id"].astype( + str + ) + + # Filter to only rows that match our dataset + comp_df = comp_df[comp_df["_join_sample_id"].notna()].copy() + + if comp_df.empty: + continue + + # Drop the via_field column (we don't need it in results) + comp_df = comp_df.drop(columns=[via_field]) + + # Merge with primary data + result_df = result_df.merge( + comp_df, left_on="sample_id", right_on="_join_sample_id", how="left" + ) + + # Drop the temporary join column + result_df = result_df.drop(columns=["_join_sample_id"]) + + except Exception: + # If enrichment fails for this comparative dataset, continue + continue + + return result_df + + @staticmethod + def _parse_composite_identifier(composite_id: str) -> tuple[str, str, str]: + """ + Parse composite sample identifier into components. + + :param composite_id: Composite ID in format "repo_id;config_name;sample_id" + :return: Tuple of (repo_id, config_name, sample_id) + + Example: + _parse_composite_identifier( + "BrentLab/harbison_2004;harbison_2004;sample_42" + ) + Returns: ("BrentLab/harbison_2004", "harbison_2004", "sample_42") + + """ + parts = composite_id.split(";") + if len(parts) != 3: + raise ValueError( + f"Invalid composite ID format: {composite_id}. " + "Expected 'repo_id;config_name;sample_id'" + ) + return parts[0], parts[1], parts[2] + + def _build_metadata_table( + self, repo_id: str, config_name: str, use_cache: bool = True + ) -> pd.DataFrame: + """ + Build metadata table for a single dataset. + + Extracts sample-level metadata from experimental conditions hierarchy and field + definitions, with normalization and missing value handling. 
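+
+        A condensed sketch of the construction pipeline implemented below
+        (repo-level metadata is broadcast as constant columns at the end):
+
+            repo_meta = self._extract_repo_level(card, config_name, mappings)
+            field_meta = self._extract_field_level(card, config_name, mappings)
+            df = cache_mgr.query(sql, config_name)              # SELECT DISTINCT ...
+            df = df.groupby("sample_id").first().reset_index()  # one row per sample
+            df = self._add_field_metadata(df, field_meta)
+            df = self._apply_column_dtypes(df, mappings)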
+ + :param repo_id: Repository ID + :param config_name: Configuration name + :param use_cache: Whether to use/update cache + :return: DataFrame with one row per sample_id + + """ + cache_key = (repo_id, config_name) + + # Check cache + if use_cache and cache_key in self.cache: + return self.cache[cache_key] + + try: + # Load DataCard and CacheManager + card = DataCard(repo_id, token=self.token) + cache_mgr = HfCacheManager( + repo_id, duckdb_conn=duckdb.connect(":memory:"), token=self.token + ) + + # Get property mappings + property_mappings = self.config.get_property_mappings(repo_id, config_name) + if not property_mappings: + return pd.DataFrame() + + # Extract repo/config-level metadata + repo_metadata = self._extract_repo_level( + card, config_name, property_mappings + ) + + # Extract field-level metadata + field_metadata = self._extract_field_level( + card, config_name, property_mappings + ) + + # Get sample-level data from HuggingFace + config = card.get_config(config_name) + + # Check if this is a comparative dataset + from tfbpapi.models import DatasetType + + is_comparative = ( + config + and hasattr(config, "dataset_type") + and config.dataset_type == DatasetType.COMPARATIVE + ) + + if config and hasattr(config, "metadata_fields") and config.metadata_fields: + # Select only metadata fields + columns = ", ".join(config.metadata_fields) + if not is_comparative and "sample_id" not in config.metadata_fields: + columns = f"sample_id, {columns}" + sql = f"SELECT DISTINCT {columns} FROM {config_name}" + else: + # No metadata_fields specified, select all + sql = f"SELECT DISTINCT * FROM {config_name}" + + df = cache_mgr.query(sql, config_name) + + # For non-comparative datasets: one row per sample_id + # For comparative datasets: keep all rows (each row is a relationship) + if not is_comparative and "sample_id" in df.columns: + df = df.groupby("sample_id").first().reset_index() + + # Add repo-level metadata as columns + for prop_name, values in repo_metadata.items(): + # Use first value (repo-level properties are constant) + df[prop_name] = values[0] if values else None + + # Add field-level metadata + if field_metadata: + df = self._add_field_metadata(df, field_metadata) + + # Apply dtype conversions to DataFrame columns + df = self._apply_column_dtypes(df, property_mappings) + + # Cache result + if use_cache: + self.cache[cache_key] = df + + return df + + except Exception as e: + # Log error for debugging with full traceback + import traceback + + print(f"Error downloading metadata for {config_name}: {e}") + traceback.print_exc() + # Return empty DataFrame on error + return pd.DataFrame() + + def _apply_column_dtypes( + self, df: pd.DataFrame, property_mappings: dict[str, PropertyMapping] + ) -> pd.DataFrame: + """ + Apply dtype conversions to DataFrame columns based on property mappings. 
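+
+        Conversions are best-effort: "numeric" uses pd.to_numeric with
+        errors="coerce", so unparseable entries become NaN rather than
+        raising. A sketch:
+
+            pd.to_numeric(pd.Series(["30", "oops"]), errors="coerce")
+            # 0    30.0
+            # 1     NaN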
+
+        :param df: DataFrame to apply conversions to
+        :param property_mappings: Property mappings with dtype specifications
+        :return: DataFrame with converted column dtypes
+
+        """
+        for prop_name, mapping in property_mappings.items():
+            # Skip if no dtype specified or column doesn't exist
+            if not mapping.dtype or prop_name not in df.columns:
+                continue
+
+            # Convert column dtype
+            try:
+                if mapping.dtype == "numeric":
+                    df[prop_name] = pd.to_numeric(df[prop_name], errors="coerce")
+                elif mapping.dtype == "bool":
+                    df[prop_name] = df[prop_name].astype(bool)
+                elif mapping.dtype == "string":
+                    df[prop_name] = df[prop_name].astype(str)
+            except (ValueError, TypeError):
+                # Conversion failed, leave as is
+                pass
+
+        return df
+
+    def _convert_dtype(self, value: Any, dtype: str) -> Any:
+        """
+        Convert value to specified data type.
+
+        :param value: The value to convert to a given `dtype`
+        :param dtype: Target data type ("numeric", "bool", "string")
+
+        :return: Converted value or None if conversion fails
+
+        """
+        if value is None:
+            return None
+
+        try:
+            if dtype == "numeric":
+                # Try float first (handles both int and float)
+                return float(value)
+            elif dtype == "bool":
+                return bool(value)
+            elif dtype == "string":
+                return str(value)
+            else:
+                # Unknown dtype, pass through unchanged
+                return value
+        except (ValueError, TypeError):
+            # Conversion failed, return None
+            return None
+
+    def _extract_repo_level(
+        self,
+        card: DataCard,
+        config_name: str,
+        property_mappings: dict[str, PropertyMapping],
+    ) -> dict[str, list[str]]:
+        """
+        Extract and normalize repo/config-level metadata.
+
+        :param card: DataCard instance
+        :param config_name: Configuration name
+        :param property_mappings: Property mappings for this dataset
+        :return: Dict mapping property names to normalized values
+
+        """
+        metadata: dict[str, list[str]] = {}
+
+        # Get experimental conditions
+        try:
+            conditions = card.get_experimental_conditions(config_name)
+        except DataCardError:
+            conditions = {}
+
+        if not conditions:
+            return metadata
+
+        # Extract each mapped property
+        for prop_name, mapping in property_mappings.items():
+            # Skip field-level mappings
+            if mapping.field is not None:
+                continue
+
+            # Build full path
+            # Note: `conditions` is already the experimental_conditions dict,
+            # so we don't add the prefix
+            full_path = mapping.path
+
+            # Get value at path
+            value = get_nested_value(conditions, full_path)  # type: ignore
+
+            # Handle missing values
+            missing_label = self.config.missing_value_labels.get(prop_name)
+            if value is None:
+                if missing_label:
+                    metadata[prop_name] = [missing_label]
+                continue
+
+            # Ensure value is a list
+            actual_values = [value] if not isinstance(value, list) else value
+
+            # Apply dtype conversion if specified
+            if mapping.dtype:
+                actual_values = [
+                    self._convert_dtype(v, mapping.dtype) for v in actual_values
+                ]
+
+            # Normalize using aliases
+            aliases = self.config.factor_aliases.get(prop_name)
+            normalized_values = [
+                normalize_value(v, aliases, missing_label) for v in actual_values
+            ]
+
+            metadata[prop_name] = normalized_values
+
+        return metadata
+
+    def _extract_field_level(
+        self,
+        card: DataCard,
+        config_name: str,
+        property_mappings: dict[str, PropertyMapping],
+    ) -> dict[str, dict[str, Any]]:
+        """
+        Extract and normalize field-level metadata.
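+
+        Example return value (illustrative; the field values and property
+        names are hypothetical):
+
+            {
+                "YPD": {"carbon_source": ["glucose"]},
+                "minimal_glucose": {"carbon_source": ["glucose"]},
+            }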
+
+        :param card: DataCard instance
+        :param config_name: Configuration name
+        :param property_mappings: Property mappings for this dataset
+        :return: Dict mapping field values to their normalized metadata
+
+        """
+        field_metadata: dict[str, dict[str, Any]] = {}
+
+        # Group property mappings by field
+        field_mappings: dict[str, dict[str, PropertyMapping]] = {}
+        for prop_name, mapping in property_mappings.items():
+            # Only process if field is specified AND path exists
+            # (no path means it's just a column alias, not metadata extraction)
+            if mapping.field is not None and mapping.path is not None:
+                field_name = mapping.field
+                if field_name not in field_mappings:
+                    field_mappings[field_name] = {}
+                field_mappings[field_name][prop_name] = mapping
+
+        # Process each field that has mappings
+        for field_name, prop_mappings_dict in field_mappings.items():
+            # Get field definitions
+            definitions = card.get_field_definitions(config_name, field_name)
+            if not definitions:
+                continue
+
+            # Extract metadata for each field value
+            for field_value, definition in definitions.items():
+                if field_value not in field_metadata:
+                    field_metadata[field_value] = {}
+
+                for prop_name, mapping in prop_mappings_dict.items():
+                    # Get value at path
+                    value = get_nested_value(definition, mapping.path)  # type: ignore
+
+                    # Handle missing values
+                    missing_label = self.config.missing_value_labels.get(prop_name)
+                    if value is None:
+                        if missing_label:
+                            field_metadata[field_value][prop_name] = [missing_label]
+                        continue
+
+                    # Ensure value is a list
+                    actual_values = [value] if not isinstance(value, list) else value
+
+                    # Apply dtype conversion if specified
+                    if mapping.dtype:
+                        actual_values = [
+                            self._convert_dtype(v, mapping.dtype) for v in actual_values
+                        ]
+
+                    # Normalize using aliases
+                    aliases = self.config.factor_aliases.get(prop_name)
+                    normalized_values = [
+                        normalize_value(v, aliases, missing_label)
+                        for v in actual_values
+                    ]
+
+                    field_metadata[field_value][prop_name] = normalized_values
+
+        return field_metadata
+
+    def _add_field_metadata(
+        self, df: pd.DataFrame, field_metadata: dict[str, dict[str, Any]]
+    ) -> pd.DataFrame:
+        """
+        Add columns from field-level metadata to DataFrame.
+
+        :param df: DataFrame with base sample metadata
+        :param field_metadata: Dict mapping field values to their properties
+        :return: DataFrame with additional property columns
+
+        """
+        # For each field value, add its properties as columns
+        for field_value, properties in field_metadata.items():
+            for prop_name, prop_values in properties.items():
+                # Initialize column if needed
+                if prop_name not in df.columns:
+                    df[prop_name] = None
+
+                # Find rows where any column matches field_value
+                for col in df.columns:
+                    if col in [prop_name, "sample_id", "dataset_id"]:
+                        continue
+                    mask = df[col] == field_value
+                    if mask.any():
+                        # Set property value (take first from list)
+                        value = prop_values[0] if prop_values else None
+                        df.loc[mask, prop_name] = value
+
+        return df
+
+    def _apply_filters(
+        self,
+        df: pd.DataFrame,
+        filters: dict[str, Any],
+        repo_id: str,
+        config_name: str,
+    ) -> pd.DataFrame:
+        """
+        Apply filters to DataFrame with alias expansion and numeric handling.
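+
+        Example filters (illustrative; the field names are hypothetical):
+
+            {
+                "media": "YPD",  # exact match, expanded via factor_aliases
+                "temperature": ("between", 25, 37),  # inclusive numeric range
+                "od600": (">=", 0.6),  # comparison operator
+            }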
+
+        :param df: DataFrame to filter
+        :param filters: Dict of field:value pairs
+        :param repo_id: Repository ID (for alias lookup)
+        :param config_name: Config name (for alias lookup)
+        :return: Filtered DataFrame
+
+        """
+        for field, filter_value in filters.items():
+            if field not in df.columns:
+                continue
+
+            # Handle numeric range filters
+            if isinstance(filter_value, tuple):
+                operator = filter_value[0]
+                if operator == "between" and len(filter_value) == 3:
+                    df = df[
+                        (df[field] >= filter_value[1]) & (df[field] <= filter_value[2])
+                    ]
+                elif operator in (">=", ">", "<=", "<", "==", "!="):
+                    if operator == ">=":
+                        df = df[df[field] >= filter_value[1]]
+                    elif operator == ">":
+                        df = df[df[field] > filter_value[1]]
+                    elif operator == "<=":
+                        df = df[df[field] <= filter_value[1]]
+                    elif operator == "<":
+                        df = df[df[field] < filter_value[1]]
+                    elif operator == "==":
+                        df = df[df[field] == filter_value[1]]
+                    elif operator == "!=":
+                        df = df[df[field] != filter_value[1]]
+            else:
+                # Exact match with alias expansion
+                aliases = self.config.factor_aliases.get(field)
+                if aliases:
+                    # Expand filter value to all aliases
+                    expanded_values = [filter_value]
+                    for alias_name, actual_values in aliases.items():
+                        if alias_name == filter_value:
+                            # Add all actual values for this alias
+                            expanded_values.extend([str(v) for v in actual_values])
+                    df = df[df[field].isin(expanded_values)]
+                else:
+                    # No aliases, exact match
+                    df = df[df[field] == filter_value]
+
+        return df
+
+    def _get_complete_data(
+        self,
+        repo_id: str,
+        config_name: str,
+        sample_ids: list[str],
+        metadata_df: pd.DataFrame,
+    ) -> pd.DataFrame:
+        """
+        Get complete data (with measurements) for sample_ids.
+
+        Uses a WHERE sample_id IN (...) clause for efficient retrieval.
+
+        :param repo_id: Repository ID
+        :param config_name: Configuration name
+        :param sample_ids: List of sample IDs to retrieve
+        :param metadata_df: Metadata DataFrame to merge with
+        :return: DataFrame with measurements and metadata
+
+        """
+        try:
+            cache_mgr = HfCacheManager(
+                repo_id, duckdb_conn=duckdb.connect(":memory:"), token=self.token
+            )
+
+            # Build IN clause
+            sample_id_list = ", ".join([f"'{sid}'" for sid in sample_ids])
+            sql = f"""
+                SELECT *
+                FROM {config_name}
+                WHERE sample_id IN ({sample_id_list})
+            """
+
+            full_df = cache_mgr.query(sql, config_name)
+
+            # Merge with metadata (metadata_df has normalized fields)
+            # Drop metadata columns from full_df to avoid duplicates
+            metadata_cols = [
+                col
+                for col in metadata_df.columns
+                if col not in ["sample_id", "dataset_id"]
+            ]
+            full_df = full_df.drop(
+                columns=[c for c in metadata_cols if c in full_df.columns],
+                errors="ignore",
+            )
+
+            # Merge on sample_id
+            result = full_df.merge(metadata_df, on="sample_id", how="left")
+
+            return result
+
+        except Exception:
+            return pd.DataFrame()
+
+    def __repr__(self) -> str:
+        """String representation."""
+        n_repos = len(self.config.repositories)
+        n_datasets = sum(
+            len(rc.dataset) if rc.dataset else 0
+            for rc in self.config.repositories.values()
+        )
+        n_cached = len(self.cache)
+        return (
+            f"VirtualDB({n_repos} repositories, {n_datasets} datasets configured, "
+            f"{n_cached} views cached)"
+        )
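+
+
+# Usage sketch (illustrative, not a tested example; assumes a VirtualDB was
+# constructed elsewhere from a collection configuration):
+#
+#     vdb = VirtualDB(config)
+#     print(vdb)
+#     # e.g. "VirtualDB(11 repositories, 11 datasets configured, 0 views cached)"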