2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
![Contributors](https://img.shields.io/github/contributors/zmahnoor14/BioXend)

# BioXend
BioXend is a new computational framework for submitting Microbial Biotransformation of Xenobiotics data.
BioXend is a new computational framework for submitting Microbial Biotransformation of Xenobiotics data. You can check out the BioXend project webpage here: https://zmahnoor14.github.io/BioXend/
- WP1: develop minimum reporting standards based on community consensus
- WP2: automate the metadata collection
- WP3: develop a submission workflow to [ChEMBL](https://www.ebi.ac.uk/chembl/)
144 changes: 140 additions & 4 deletions Standards/MIXMB_Biotransformation.md
Expand Up @@ -59,7 +59,6 @@ This document identifies Minimum Information (MI) required to report the biotran
- [11. Data Quality Tiers](#11-data-quality-tiers)
- [12. Example Complete Record](#12-example-complete-record)
- [13. Integration with Other Standards](#13-integration-with-other-standards)
- [15. Version History](#15-version-history)
- [16. References](#16-references)
- [17. Contact and Contributions](#17-contact-and-contributions)

Expand Down Expand Up @@ -1571,15 +1570,152 @@ cross_references:

---

## 15. Version History
## 14. Data Access and Repository Guidance

### 14.1 Repository and Standard Documents

The MIX-MB standard documents, submission template, and BioXend pipeline are openly available:

- **Standard documents:** [GitHub — zmahnoor14/BioXend](https://github.com/zmahnoor14/BioXend), under `Standards/`
- **Submission template:** `Standards/Templates/Template.xlsx` — the **Activity** sheet covers MIX-MB(B)
- **BioXend pipeline:** Nextflow workflow (`main.nf`) in the repository root

### 14.2 Files Generated by This Standard

MIX-MB(B) experimental data feed the activity and assay parameter files required for ChEMBL deposition. The BioXend pipeline produces:

| Output File | Generator Script | Content |
|-------------|-----------------|---------|
| `ASSAY.tsv` | `generate_assay.R` | Assay descriptions per organism × condition |
| `ASSAY_PARAM.tsv` | `generate_assay.R` | Quantitative experimental parameters |
| `ACTIVITY.tsv` | `write_activity_tsv.R` | Compound–assay activity links with measurements |

These files are joined to compound records via CIDX (MIX-MB(X)) and organism records via AIDX (MIX-MB(M)).

### 14.3 Submitting to ChEMBL

MIX-MB(B) activity files are deposited to [ChEMBL](https://www.ebi.ac.uk/chembl/) as part of the complete six-file MIX-MB submission package. For deposition enquiries, contact **chembl-help@ebi.ac.uk**.

**Pre-submission checklist for activity data:**
- Every `ACTIVITY.tsv` row has a valid CIDX, AIDX, and RIDX
- `ACTION_TYPE` uses only controlled vocabulary terms (Section 8.3)
- `ACTIVITY_COMMENT` is non-empty for every row
- All controls (Section 9) are documented in `ASSAY_PARAM.tsv` or `ASSAY_DESCRIPTION`
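
The first three checklist items can be checked mechanically before deposition. The following is a minimal sketch, not part of the BioXend pipeline: the function names `check_activity_rows` and `read_tsv` are hypothetical, the controlled vocabulary from Section 8.3 is passed in by the caller rather than hard-coded, and "valid" is reduced here to "non-empty" for the three index keys. The controls item (Section 9) still needs manual review.

```python
import csv
import io

REQUIRED_KEYS = ("CIDX", "AIDX", "RIDX")


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_activity_rows(rows, allowed_action_types):
    """Return (row_number, message) pairs for rows that fail the checklist.

    Assumption: a key counts as present when it is non-empty; resolving it
    against the other submission files is a separate step.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        for key in REQUIRED_KEYS:
            if not (row.get(key) or "").strip():
                problems.append((number, f"missing {key}"))
        if (row.get("ACTION_TYPE") or "").strip() not in allowed_action_types:
            problems.append((number, "ACTION_TYPE not in controlled vocabulary"))
        if not (row.get("ACTIVITY_COMMENT") or "").strip():
            problems.append((number, "empty ACTIVITY_COMMENT"))
    return problems


# Illustrative data only; EXAMPLE_TERM stands in for a real Section 8.3 term.
SAMPLE = (
    "CIDX\tAIDX\tRIDX\tACTION_TYPE\tACTIVITY_COMMENT\n"
    "C1\tA1\tR1\tEXAMPLE_TERM\tillustrative comment\n"
    "C2\t\tR1\tOTHER\t\n"
)
problems = check_activity_rows(read_tsv(SAMPLE), {"EXAMPLE_TERM"})
for number, message in problems:
    print(f"row {number}: {message}")
```

In practice the rows would come from the generated `ACTIVITY.tsv` and the vocabulary set from Section 8.3.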

### 14.4 Raw Data Repositories

Raw analytical data accompanying a MIX-MB(B) submission should be deposited alongside the ChEMBL files:

| Repository | URL | Data type |
|------------|-----|-----------|
| MetaboLights | https://www.ebi.ac.uk/metabolights/ | LC-MS raw data (.mzML), processed peak tables |
| GNPS | https://gnps.ucsd.edu/ | MS/MS spectral data, molecular networks |
| MassIVE | https://massive.ucsd.edu/ | Raw MS data (alternative to MetaboLights) |

Record the repository accession in `ACTIVITY_COMMENT` and in the Bioschemas Dataset `@id` field (Section 2.3).

---

## 15. Licence and Reuse

### 15.1 Standard Documents

The MIX-MB standard documents are released under the **Creative Commons Attribution 4.0 International (CC BY 4.0)** licence ([https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)).

You are free to share, adapt, and build upon this standard for any purpose, provided you give appropriate credit:

> Zulfiqar M. *et al.* MIX-MB: Minimum Information about Xenobiotics-Microbiome Biotransformation (v0.1.0). GitHub: https://github.com/zmahnoor14/BioXend

### 15.2 Pipeline and Code

The BioXend Nextflow pipeline and associated scripts are released under the **MIT Licence**. See `LICENSE` in the repository root for full terms.

### 15.3 Submitted Data

Data deposited using MIX-MB formats in public repositories are subject to that repository's terms:

| Repository | Licence |
|------------|---------|
| ChEMBL | CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/) |
| MetaboLights | CC0 1.0 or CC BY 4.0 (depositor's choice at submission) |
| GNPS / MassIVE | CC0 (public domain) |

When reusing MIX-MB-formatted datasets, cite both the original study (via its RIDX/DOI) and the MIX-MB standard.

---

## 16. Provenance

Provenance records the origin, history, and chain of custody of biotransformation experimental data reported under MIX-MB(B). Complete provenance supports the **Reusable** principle of FAIR and enables independent reproducibility assessment.

### 16.1 Study-Level Provenance

Every activity record is anchored to a **Reference record** (RIDX) in `REFERENCE.tsv`:

| Field | Description |
|-------|-------------|
| `RIDX` | Study-local identifier linking all six submission files |
| `DOI` | Digital Object Identifier of the source publication |
| `TITLE`, `AUTHORS`, `YEAR` | Full bibliographic metadata |

The RIDX appears in every row of `ACTIVITY.tsv`, `ASSAY.tsv`, and `COMPOUND_RECORD.tsv`, making the publication the root provenance node for the entire data package.

### 16.2 Experimental Provenance

MIX-MB(B) traceability runs from raw measurement back to the study:

```
REFERENCE.tsv (RIDX)
├── COMPOUND_RECORD.tsv (CIDX) ← chemical identity and source (MIX-MB(X))
├── ASSAY.tsv (AIDX) ← organism, strain, and conditions (MIX-MB(M))
└── ACTIVITY.tsv (CIDX × AIDX × RIDX) ← measured outcome
```

Each activity row therefore carries the full experimental context via its three index keys.
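
Because the context is carried only through the keys, an activity row whose key does not resolve loses its provenance. A minimal sketch of that referential check follows; `find_orphan_activities` is a hypothetical helper, and it assumes each file has already been parsed into a list of dicts keyed by column name.

```python
def find_orphan_activities(activities, compounds, assays, references):
    """Return (row_number, missing_keys) for activity rows whose keys do not resolve."""
    known = {
        "CIDX": {row["CIDX"] for row in compounds},
        "AIDX": {row["AIDX"] for row in assays},
        "RIDX": {row["RIDX"] for row in references},
    }
    orphans = []
    for number, row in enumerate(activities, start=1):
        # Collect every key that has no matching record in its parent file.
        missing = [key for key, values in known.items() if row.get(key) not in values]
        if missing:
            orphans.append((number, missing))
    return orphans


# Illustrative in-memory rows; real rows would be parsed from the TSV files.
references = [{"RIDX": "R1"}]
compounds = [{"CIDX": "C1", "RIDX": "R1"}]
assays = [{"AIDX": "A1", "RIDX": "R1"}]
activities = [
    {"CIDX": "C1", "AIDX": "A1", "RIDX": "R1"},
    {"CIDX": "C9", "AIDX": "A1", "RIDX": "R1"},
]
orphans = find_orphan_activities(activities, compounds, assays, references)
print(orphans)
```

Here the second activity row is reported because `C9` has no entry in the compound records.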

### 16.3 Protocol and Method Provenance

Document the analytical and experimental methods in:

| Location | What to record |
|----------|----------------|
| Bioschemas LabProtocol record (Section 2.2) | Step-by-step protocol with reagents and instruments |
| `ASSAY_PARAM.tsv` | Quantitative parameters (temperature, duration, cell density, pH) |
| `ASSAY_DESCRIPTION` in `ASSAY.tsv` | Free-text assay description including key conditions |
| `ACTIVITY_COMMENT` | Analytical method details, processing software, mass accuracy |

**Minimum method provenance for Gold-tier submissions (Tier 1):**
- Instrument model and manufacturer
- Software versions used for data acquisition and processing (e.g., MZmine 2.53, Xcalibur 4.1)
- Spectral library used for metabolite annotation (e.g., MassBank, HMDB)

### 16.4 Raw Data Provenance

For each study, record:

- **MetaboLights / GNPS accession** — in `ACTIVITY_COMMENT` and in the Dataset Bioschemas `@id` (Section 2.3)
- **Raw data format** — mzML (preferred) or vendor format with conversion notes
- **Processing pipeline** — version and parameters used (e.g., MZmine feature detection settings)

### 16.5 Pipeline Provenance

When generating submission files with BioXend, record in `REFERENCE.tsv` or `ASSAY_DESCRIPTION`:

- **BioXend pipeline version** — from `versions/pipeline.txt`
- **Date of file generation** — ISO 8601 format (YYYY-MM-DD)
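
As an illustration, both items can be combined into a single note. This is a sketch under stated assumptions: the helper name `pipeline_provenance_note` and the wording of the note are hypothetical, and the version string is supplied by the caller (for example, read from `versions/pipeline.txt`).

```python
from datetime import date


def pipeline_provenance_note(pipeline_version, generated_on=None):
    """Build a provenance note with an ISO 8601 (YYYY-MM-DD) generation date."""
    generated_on = generated_on or date.today()
    return (
        f"BioXend pipeline version: {pipeline_version}; "
        f"files generated: {generated_on.isoformat()}"
    )


# Placeholder version string and a fixed date, for illustration only.
note = pipeline_provenance_note("0.0.0-example", date(2026, 3, 16))
print(note)
```

`date.isoformat()` always yields the `YYYY-MM-DD` form, so the date cannot drift into a locale-specific format.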

---

## 17. Version History

| Version | Date | Changes |
|---------|------|---------|
| 0.1.0 | 2026-02-05 | Initial draft: Core biotransformation standards |

---

## 16. References
## 18. References

1. ChEMBL Bioactivity Database: https://www.ebi.ac.uk/chembl/
2. BioAssay Ontology (BAO): http://www.bioassayontology.org/
Expand All @@ -1594,7 +1730,7 @@ cross_references:

---

## 17. Contact and Contributions
## 19. Contact and Contributions

For questions, suggestions, or contributions to this standard, please contact:
- **Maintainer:** Mahnoor Zulfiqar
137 changes: 131 additions & 6 deletions Standards/MIXMB_Microbes.md
Expand Up @@ -40,10 +40,11 @@ This document identifies Minimum Information (MI) required to report microbial o
- [7.4 Growth Phases](#74-growth-phases)
- [4. Data Validation Rules](#4-data-validation-rules)
- [5. Data Quality Tiers](#5-data-quality-tiers)
- [6. How to use the Template](#6-how-to-use-the-template)
- [7. Version History](#7-version-history)
- [8. References](#8-references)
- [9. Contact and Contributions](#9-contact-and-contributions)
- [8. Data Access and Repository Guidance](#8-data-access-and-repository-guidance)
- [9. Licence and Reuse](#9-licence-and-reuse)
- [10. Provenance](#10-provenance)
- [11. References](#11-references)
- [12. Contact and Contributions](#12-contact-and-contributions)

---

Expand Down Expand Up @@ -679,7 +680,131 @@ The MIX-MB submission template is provided as [Templates/Template.xlsx](Template

---

## 8. References
## 8. Data Access and Repository Guidance

### 8.1 Repository and Standard Documents

The MIX-MB standard documents, submission template, and BioXend pipeline are openly available:

- **Standard documents:** [GitHub — zmahnoor14/BioXend](https://github.com/zmahnoor14/BioXend), under `Standards/`
- **Submission template:** `Standards/Templates/Template.xlsx` (see Section 6)
- **BioXend pipeline:** Nextflow workflow (`main.nf`) in the repository root

### 8.2 Files Generated by This Standard

MIX-MB(M) organism data feed the assay files required for ChEMBL deposition. The BioXend pipeline (`generate_assay.R`) produces:

| Output File | Content |
|-------------|---------|
| `ASSAY.tsv` | Assay descriptions per organism × condition, indexed by AIDX |
| `ASSAY_PARAM.tsv` | Quantitative experimental parameters (temperature, pH, incubation time, cell density) |

These files are linked to compound and activity records via AIDX (see MIX-MB(B)).

### 8.3 Submitting to ChEMBL

MIX-MB(M) assay files are deposited to [ChEMBL](https://www.ebi.ac.uk/chembl/) as part of a full MIX-MB submission package. For deposition enquiries, contact **chembl-help@ebi.ac.uk** or use the ChEMBL deposition portal.

**Pre-submission checklist for organism data:**
- All assay rows pass Section 4 validation rules
- Every `ASSAY_TAX_ID` is a valid, numeric NCBI TaxID
- Scientific name matches the registered NCBI name for that TaxID
- At least one `sameAs` URL to NCBI Taxonomy (Gold tier)
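
The TaxID and name items lend themselves to an offline pre-check. The sketch below is an assumption-laden illustration rather than pipeline code: `is_numeric_taxid` and `check_organism_rows` are hypothetical names, the name column is assumed to be `ASSAY_ORGANISM`, and `registered_names` is a caller-supplied local lookup (for example, exported from NCBI Taxonomy). It checks format and consistency only; it does not query NCBI.

```python
def is_numeric_taxid(value):
    """True when the value is a positive integer written in ASCII digits."""
    text = str(value).strip()
    return text.isascii() and text.isdigit() and int(text) > 0


def check_organism_rows(rows, registered_names, name_column="ASSAY_ORGANISM"):
    """Return (row_number, message) pairs for rows that fail the organism checks.

    registered_names maps TaxID strings to scientific names (assumed local lookup).
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        taxid = str(row.get("ASSAY_TAX_ID") or "").strip()
        if not is_numeric_taxid(taxid):
            problems.append((number, "ASSAY_TAX_ID is not a positive integer"))
            continue
        expected = registered_names.get(taxid)
        if expected is None:
            problems.append((number, "TaxID not found in local lookup"))
        elif (row.get(name_column) or "").strip() != expected:
            problems.append((number, "scientific name does not match TaxID"))
    return problems


# Illustrative rows; 562 is the NCBI TaxID for Escherichia coli.
registered_names = {"562": "Escherichia coli"}
rows = [
    {"ASSAY_TAX_ID": "562", "ASSAY_ORGANISM": "Escherichia coli"},
    {"ASSAY_TAX_ID": "E. coli", "ASSAY_ORGANISM": "Escherichia coli"},
    {"ASSAY_TAX_ID": "562", "ASSAY_ORGANISM": "E. coli"},
]
problems = check_organism_rows(rows, registered_names)
print(problems)
```

A row that passes this check can still carry an obsolete or merged TaxID, so a final lookup against NCBI Taxonomy remains necessary.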

### 8.4 Supplementary Sequence Data Repositories

Sequencing data linked to strains used in a MIX-MB study should be deposited in:

| Repository | URL | Data type |
|------------|-----|-----------|
| NCBI SRA | https://www.ncbi.nlm.nih.gov/sra | Short-read and long-read sequencing |
| ENA | https://www.ebi.ac.uk/ena | Sequencing reads and assemblies (European alternative to the NCBI repositories) |
| NCBI Assembly | https://www.ncbi.nlm.nih.gov/assembly | Genome assemblies (GCA/GCF accessions) |

Record the assembly accession in `additionalProperty` → `genome_assembly` (Section 2.3) and include it in the `sameAs` field of the Taxon record.

---

## 9. Licence and Reuse

### 9.1 Standard Documents

The MIX-MB standard documents are released under the **Creative Commons Attribution 4.0 International (CC BY 4.0)** licence ([https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)).

You are free to share, adapt, and build upon this standard for any purpose, provided you give appropriate credit:

> Zulfiqar M. *et al.* MIX-MB: Minimum Information about Xenobiotics-Microbiome Biotransformation (v0.1.0). GitHub: https://github.com/zmahnoor14/BioXend

### 9.2 Pipeline and Code

The BioXend Nextflow pipeline and associated scripts are released under the **MIT Licence**. See `LICENSE` in the repository root for full terms.

### 9.3 Submitted Data

Data deposited using MIX-MB formats in public repositories are subject to that repository's terms:

| Repository | Licence |
|------------|---------|
| ChEMBL | CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/) |
| NCBI SRA / Assembly | NCBI data use policies (https://www.ncbi.nlm.nih.gov/home/about/policies/) |
| ENA | EMBL-EBI data access policies |

When reusing MIX-MB-formatted datasets, cite both the original study (via its RIDX/DOI) and the MIX-MB standard.

---

## 10. Provenance

Provenance records the origin, history, and chain of custody of microbial organism data reported under MIX-MB(M). Complete provenance supports the **Reusable** principle of FAIR and enables reproducibility of biotransformation experiments.

### 10.1 Study-Level Provenance

Every assay record is anchored to a **Reference record** (RIDX) in `REFERENCE.tsv`:

| Field | Description |
|-------|-------------|
| `RIDX` | Study-local identifier linking all submission files |
| `DOI` | Digital Object Identifier of the source publication |
| `TITLE`, `AUTHORS`, `YEAR` | Full bibliographic metadata |

The RIDX appears in every row of `ASSAY.tsv`, making the publication the root provenance node for all organism data.

### 10.2 Strain Identity Provenance

The organism identifiers documented in Section 1.4 serve as provenance anchors for strain identity:

| Identifier | Provenance Source |
|------------|------------------|
| **NCBI TaxID** | Verified against NCBI Taxonomy; links to authoritative taxonomy record |
| **LPSN name** | Valid scientific name from the List of Prokaryotic names with Standing in Nomenclature (https://lpsn.dsmz.de/) |
| **Culture collection ID** | Physical strain held at a registered biobank (ATCC, DSMZ, CGSC, etc.) |
| **Genome assembly accession** | Sequenced genome lodged in NCBI Assembly or ENA |
| `sameAs` URLs | Direct links to NCBI Taxonomy, LPSN, BacDive, or other authoritative records |

For novel isolates without a registered TaxID, the closest assigned TaxID plus a note in `ACTIVITY_COMMENT` constitutes the strain's provenance record until formal registration.

### 10.3 Environmental and Isolation Provenance

For strains isolated from environmental or clinical samples, document the following MIxS-compliant fields (Section 3.3) as part of strain provenance:

| Field | Provenance Aspect |
|-------|------------------|
| `isolation_source` | Biological or environmental matrix from which the strain was obtained |
| `collection_date` | ISO 8601 date of sample collection |
| `geo_loc_name` | Geographic origin of the sample |
| `env_broad_scale` / `env_local_scale` | ENVO-annotated habitat classification |

### 10.4 Pipeline Provenance

When generating submission files with BioXend, record in `REFERENCE.tsv` or `ASSAY_PARAM.tsv`:

- **BioXend pipeline version** — from `versions/pipeline.txt`
- **`generate_assay.R` version** — from the script header
- **Date of generation** — ISO 8601 format (YYYY-MM-DD)

---

## 11. References

1. NCBI Taxonomy Database: https://www.ncbi.nlm.nih.gov/taxonomy
2. Genomic Standards Consortium (GSC): https://www.gensc.org/
Expand All @@ -693,7 +818,7 @@ The MIX-MB submission template is provided as [Templates/Template.xlsx](Template

---

## 9. Contact and Contributions
## 12. Contact and Contributions

For questions, suggestions, or contributions to this standard, please contact:
- **Maintainer:** Mahnoor Zulfiqar
12 changes: 9 additions & 3 deletions Standards/MIXMB_Standards_main.md
Expand Up @@ -4,11 +4,17 @@
**Version:** 0.1.1
**Release Date:** March 16, 2026 (Draft)
**Status:** Draft Standard
**DOI:** XXXXXXX (to be assigned upon stable release)


---

## Document structure overview

<p align="center">
<img src="standards-main.png" />
</p>


## Table of Contents

- [Abstract](#abstract)
Expand All @@ -25,7 +31,7 @@
- [How are assay/microbe data files integrated into the Template.xlsx](#how-are-assaymicrobe-data-files-are-integrated-into-the-templatexlsx)
- [4. Biotransformation Metadata File(s)](#4-biotransformation-metadata-files)
- [How are biotransformation data files integrated into the Template.xlsx](#how-are-biotransformation-data-files-are-integrated-into-the-templatexlsx)
- [Identifiers and Cross-Referencing](#identifiers-and-cross-referencing)
- [Identifiers and Cross-Referencing](#naming-convention-for-identifiers-and-cross-referencing-in-chembl)
- [Minting Scheme for Unknowns](#minting-scheme-for-unknowns)
- [ChEMBL Links and FAQs](#chembl-links-and-faqs)

Expand Down Expand Up @@ -70,7 +76,7 @@ MIX-MB does not currently cover:

## Component Standards

This standard comprises three interconnected sub-standards:
This standard comprises three interconnected sub-standards (the individual components are still under construction):

| Component | Description | Version | Last Updated (YYYY-MM-DD) | Document |
|-----------|-------------|---------|--------------------------|----------|