- Comprehensive datasheet consolidating information from CM4AI publications and data releases
- Includes metadata from cm4ai.org, Virginia Dataverse releases, and 38+ publications
- Covers 3 main data types: CRISPR perturbation atlas, SEC-MS protein interactions, and IF imaging
- Documents 53,788 images, 1,792 proteins, 11,739 genes targeted, 1,374 protein interactions (22.7 TB total)
- Structured with detailed resources section for each major data component
- Validates against D4D schema
Sources:
- CM4AI website (https://cm4ai.org)
- Virginia Dataverse data releases (DOIs: 10.18130/V3/B35XWX, 10.18130/V3/F3TD5R, 10.18130/V3/DXWOS5)
- CM4AI publications list (https://cm4ai.org/publications/)
- Existing datasheets in data/extracted_by_column/CM4AI/
Related to: #71
Co-Authored-By: Claude <noreply@anthropic.com>
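The three Virginia Dataverse DOIs listed above can be opened through the standard doi.org resolver. A minimal sketch, assuming the DOIs are used verbatim (these particular suffixes contain no characters that would need URL-encoding); the `doi_url` helper and the `DATAVERSE_DOIS` constant are hypothetical names, not part of this PR:

```python
# Hypothetical constant: the Dataverse release DOIs cited in the commit message.
DATAVERSE_DOIS = [
    "10.18130/V3/B35XWX",
    "10.18130/V3/F3TD5R",
    "10.18130/V3/DXWOS5",
]


def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI (hypothetical helper)."""
    doi = doi.strip()
    # Every DOI starts with the "10." directory indicator and has a "/" between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Assumes the suffix needs no percent-encoding, which holds for the three DOIs above.
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    for doi in DATAVERSE_DOIS:
        print(doi_url(doi))
```

Simple string concatenation is enough here; DOIs with spaces or reserved characters would need `urllib.parse.quote` on the suffix.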
Summary
Created a new consolidated D4D datasheet for Cell Maps for Artificial Intelligence (CM4AI) based on documentation from the CM4AI website, Virginia Dataverse releases, and 38+ publications.
Files Added
- data/sheets_d4dassistant/cm4ai_d4d.yaml - Consolidated CM4AI D4D datasheet
Validation
Key Metadata Extracted
Dataset ID: cm4ai
Dataset Name: Cell Maps for Artificial Intelligence (CM4AI)
Purpose: CM4AI was created to generate comprehensive, AI-ready maps of human cell architecture from disease-relevant cell lines to support interpretable genotype-phenotype learning and advance functional genomics research using the FAIRSCAPE framework and RO-Crate format.
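Composition: 53,788 images, 1,792 proteins, 11,739 genes targeted, and 1,374 protein interactions (22.7 TB total)
Data Types: CRISPR perturbation atlas, SEC-MS protein interactions, and IF imaging
Distribution: Virginia Dataverse (DOIs: 10.18130/V3/B35XWX, 10.18130/V3/F3TD5R, 10.18130/V3/DXWOS5)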
Consortium: UC San Diego (lead), UCSF, Stanford, UVA, Yale, UT Austin, UAB, Simon Fraser, Hastings Center
Funding: NIH Bridge2AI grant 1OT2OD032742-01
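For reviewers comparing the YAML against the key metadata above, the same fields could be captured as a nested record and dumped with a stable key order. This is only a sketch: the keys `id`, `name`, `consortium`, `lead`, `members`, `funding`, `program`, and `grant_number` are hypothetical and may not match the actual D4D schema slot names used in cm4ai_d4d.yaml.

```python
import json

# Hypothetical field names chosen for illustration; real D4D schema slots may differ.
cm4ai_record = {
    "id": "cm4ai",
    "name": "Cell Maps for Artificial Intelligence (CM4AI)",
    "consortium": {
        "lead": "UC San Diego",
        "members": [
            "UC San Diego", "UCSF", "Stanford", "UVA", "Yale",
            "UT Austin", "UAB", "Simon Fraser", "Hastings Center",
        ],
    },
    "funding": {"program": "NIH Bridge2AI", "grant_number": "1OT2OD032742-01"},
}


def to_json(record: dict) -> str:
    """Serialize a record with sorted keys so diffs between revisions stay readable."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(cm4ai_record))
```

Sorting keys keeps the output deterministic, which makes it easier to diff against a JSON export of the YAML file.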
Sources
This consolidated datasheet synthesizes information from:
- data/extracted_by_column/CM4AI/dataverse_10.18130_V3_B35XWX_d4d.yaml and related files
How to Review
Review data/sheets_d4dassistant/cm4ai_d4d.yaml for completeness and accuracy.
Notes
Fields that are null or omitted indicate information not found in the source documentation.
Related to: #71
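The null-versus-omitted convention in the notes can be checked mechanically once the YAML is loaded into a dictionary (a YAML `null` loads as Python `None` in common parsers such as PyYAML). A minimal sketch, assuming a flat list of expected top-level field names; `classify_gaps` and the field names in the example are hypothetical:

```python
from typing import Iterable


def classify_gaps(record: dict, expected: Iterable[str]) -> tuple[list[str], list[str]]:
    """Split expected top-level fields into (null_fields, omitted_fields).

    Hypothetical reviewer helper; `expected` is an assumed list of field names,
    not one taken from the D4D schema.
    """
    null_fields: list[str] = []
    omitted_fields: list[str] = []
    for field in expected:
        if field not in record:
            # Key absent entirely: the field was omitted.
            omitted_fields.append(field)
        elif record[field] is None:
            # Key present but explicitly null.
            null_fields.append(field)
    return null_fields, omitted_fields
```

For example, `classify_gaps({"id": "cm4ai", "purposes": None}, ["id", "purposes", "errata"])` returns `(["purposes"], ["errata"])`. Only top-level keys are inspected; nested sections would need a recursive walk.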
🤖 Generated with D4D Assistant