Skip to content

changlabtw/scGHSOM

Repository files navigation

scGHSOM

scGHSOM: A Hierarchical Framework for Single-Cell Data Clustering and Visualization

Prerequisites


Currently running using the WSL terminal in VS Code.

  • Requires JRE (Java Runtime Environment)
  • Python version 3.6 or higher

Data


Sets

  • ./raw-data/Levine_13
  • ./raw-data/Levine_32
  • ./raw-data/CyTOF-Samusik

File Description

  • raw-data (folder): Stores data to be clustered.
  • raw-data/label (folder): Stores labels for clustering data.
    • File names should have the same prefix as the data file, with _label appended.
  • Input data must be in CSV format.
  • Columns: Represent training attributes (all columns).
  • Rows: Represent data to be clustered.
  • Before starting clustering, name the index column (the index name must be passed in the command).

Usage


Run the following commands in the terminal:

# for Levine_13
python3 execute.py --index=Event --data=Levine_13dim_cleaned --tau1=0.06 --tau2=0.1

# for Levine_32
python3 execute.py --index=Event --data=Levine_32dim_cleaned --tau1=0.1 --tau2=0.2

# for CyTOF-Samusik
python3 execute.py --index=Event --data=Samusik_01_cleaned --tau1=0.08 --tau2=0.2

Notes:

  • data and index are mandatory parameters (ensure the index column is named and not empty).
  • If tau1 and tau2 are not provided:
    • tau1 defaults to 0.1
    • tau2 defaults to 0.01

Scripts:

  • execute.py: Runs all the process steps.
  • format_ghsom_input_vector.py: Generates data in a format compatible with GHSOM.
  • get_ghsom_dim.py: Retrieves the dimensions of the clustering results.
  • save_cluster_with_clustered_label.py: Produces a data frame with clustering results (Leaf and each Layer) and saves it to the data folder.

Evaluation:

  • evaluation/clustering_scores: Calculates external and internal evaluation scores.

Visualization


Run the following commands in the terminal:

# Cluster Feature Map
python3 programs/Visualize/cluster_feature_map.py --data=Samusik_01_cleaned --tau1=0.08 --tau2=0.2

# Cluster Distribution Map
python3 programs/Visualize/cluster_distribution_map.py --data=Samusik_01_cleaned --tau1=0.08 --tau2=0.2

Notes:

  • data, tau1, and tau2 should be set based on your dataset and analysis needs.

References


About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •