
Subject: Clarification on “cluster_” identifiers and guidance on building a custom PICRUSt2 reference database #412

@Abelcanc3rhack3r


Dear PICRUSt2 team,

I am currently examining the 16S rRNA reference FASTA files used in PICRUSt2 and have noticed that some sequence records are labeled with identifiers such as “cluster_XXX”, while others are not.

Could you clarify what these “cluster_” identifiers represent in the context of the reference database? Specifically:

  • Are these derived from OTU clustering (e.g., CD-HIT/VSEARCH)?
  • Do they correspond to representative sequences from dereplicated groups?
  • Or do they reflect some internal processing step specific to PICRUSt2?
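As a starting point for the first question, here is a minimal Python sketch (standard library only) that counts how many records in a reference FASTA carry a `cluster_` ID. It assumes the ID is the first whitespace-delimited token after `>`. The function names are illustrative, and the file name in the usage note is a placeholder, not a specific PICRUSt2 path.

```python
from collections import Counter
from typing import Iterable, Iterator


def fasta_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the ID of each FASTA record: the first whitespace-delimited token after '>'."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            yield tokens[0] if tokens else ""


def tally_prefix(lines: Iterable[str], prefix: str = "cluster_") -> Counter:
    """Count record IDs that do ('cluster') and do not ('other') start with `prefix`."""
    counts: Counter = Counter()
    for seq_id in fasta_ids(lines):
        counts["cluster" if seq_id.startswith(prefix) else "other"] += 1
    return counts
```

Usage, with a placeholder path: `with open("ref.fna") as fh: print(tally_prefix(fh))`.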

In addition, I am attempting to construct a custom PICRUSt2 reference database by combining:

  1. Older IMG-derived genome sequences (legacy version), and
  2. The mouse gut iMGMC PICRUSt2-compatible reference dataset

My goal is to integrate both into a unified database for prediction.
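One way to keep IDs from the two sources distinct when combining them is to prefix each ID with a source tag. The Python sketch below (standard library only) does that and refuses to merge if any prefixed IDs collide. The tags `img` and `imgmc` are hypothetical, and this is not an official PICRUSt2 procedure. Any renaming would also have to be applied consistently to the reference tree tip labels and trait-table row IDs, and the sketch does not rebuild the reference alignment or tree used for placement.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (sequence ID, sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text into (ID, sequence) pairs; the ID is the first token after '>'."""
    records: List[Record] = []
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an ID")
            seq_id, chunks = tokens[0], []
        elif seq_id is None:
            raise ValueError("sequence data before the first header")
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def merge_with_prefixes(sources: Dict[str, List[Record]]) -> List[Record]:
    """Prefix each ID with its source tag (hypothetical tags, e.g. 'img') and reject collisions."""
    merged: List[Record] = []
    seen = set()
    for tag, records in sources.items():
        for seq_id, seq in records:
            new_id = f"{tag}_{seq_id}"
            if new_id in seen:
                raise ValueError(f"duplicate ID after prefixing: {new_id}")
            seen.add(new_id)
            merged.append((new_id, seq))
    return merged


def to_fasta(records: List[Record], width: int = 80) -> str:
    """Serialize records as FASTA with sequence lines wrapped at `width` characters."""
    out: List[str] = []
    for seq_id, seq in records:
        out.append(f">{seq_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""
```

For example, `to_fasta(merge_with_prefixes({"img": read_fasta(fh_img), "imgmc": read_fasta(fh_imgmc)}))` returns the combined FASTA as a string, where `fh_img` and `fh_imgmc` are open file handles for the two sources.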

Could you advise on the recommended workflow for this? In particular:

  • How should sequence IDs be standardized (e.g., handling cluster vs non-cluster labels)?

  • Are there specific requirements for:
      ◦ 16S sequence FASTA formatting
      ◦ Phylogenetic placement (EPA-ng / SEPP steps)
      ◦ Hidden-state prediction inputs (trait tables, marker gene copy numbers)

  • Is there an existing pipeline or script to rebuild a PICRUSt2-compatible database from combined sources?
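For the hidden-state prediction inputs, a quick consistency check is to confirm that the trait tables and the marker-gene copy-number table refer to the same IDs as the reference FASTA. The Python sketch below reports IDs that appear on only one side. It assumes tab-separated tables with a header row and the ID in the first column; whether PICRUSt2 requires an exact one-to-one match is part of what this issue asks, so mismatches are items to investigate rather than confirmed errors.

```python
import csv
from typing import Dict, Iterable, List, Set


def fasta_id_set(lines: Iterable[str]) -> Set[str]:
    """Collect the IDs (first token after '>') of all FASTA records."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def table_id_set(lines: Iterable[str]) -> Set[str]:
    """Collect row IDs from a tab-separated table (assumed: header row, ID in column 1)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    return {row[0] for row in reader if row}


def id_mismatches(fasta_ids: Set[str], tables: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """For each named table, list FASTA IDs missing from it and its IDs missing from the FASTA."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for name, ids in tables.items():
        report[name] = {
            "missing_from_table": sorted(fasta_ids - ids),
            "missing_from_fasta": sorted(ids - fasta_ids),
        }
    return report
```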

Finally, could you advise on the best point of contact for more detailed guidance on custom database construction? Should such questions be raised via GitHub issues, or is there a more appropriate channel?

Thank you for your time and for maintaining PICRUSt2.

Best regards,
Abel Tan
