carlynlee/pythonSOM

# Python Self-Organizing Map (SOM) Implementation

A Python 3 implementation of Kohonen Self-Organizing Maps for data clustering and visualization.

## Overview

This implementation provides the following SOM components:
- Data structures for organizing SOM data and maps
- Topology support (hexagonal/rectangular lattices, sheet/toroid shapes)
- Training parameter management
- Linear initialization using PCA
- Unit coordinate generation

## Files

- `som_data_struct.py` - Data container class
- `som_topol_struct.py` - Topology structure (lattice, map shape, size)
- `som_train_struct.py` - Training parameters and configuration
- `som_unit_coords.py` - Unit coordinate generation
- `som_map_struct.py` - Main SOM map class with codebook
- `Extr_PCA_Features.py` - PCA feature extraction utility
- `test.py` - Example script using liver data

## Quick Start

### Basic Usage

```python
import numpy as np
from som_data_struct import som_data_struct
from som_map_struct import som_map_struct

# Create data structure
data = np.random.rand(100, 5)  # 100 samples, 5 features
sdata = som_data_struct(data, name="my_data")

# Create SOM map (10x10 grid, 5 dimensions)
smap = som_map_struct(dim=5, msize=np.array([[10, 10]]))

# Initialize using linear initialization (PCA-based)
smap.som_lininit(sdata)

# Print map information
smap.print_all()
```

### Running the Example

The `test.py` script demonstrates PCA feature extraction and data preparation:

```bash
python3 test.py
```

This script:
1. Loads data from `imputed-liver.txt`
2. Extracts PCA features
3. Creates train/validation/test splits
4. Normalizes data to [0, 1] range
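
Steps 2 to 4 can be sketched with plain numpy as follows. This is an illustrative sketch only, not the code in `test.py` or `Extr_PCA_Features.py`: the helper names (`pca_features`, `split_rows`, `minmax_fit`, `minmax_apply`), the 60/20/20 split, and the choice to fit the scaling on the training split are all assumptions, and random data stands in for `imputed-liver.txt`.

```python
import numpy as np

def pca_features(data, n_components):
    """Project rows of `data` onto the top principal components (hypothetical helper)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def split_rows(data, fractions=(0.6, 0.2), seed=0):
    """Shuffle rows and split into train/validation/test (assumed 60/20/20)."""
    idx = np.random.default_rng(seed).permutation(len(data))
    n_train = int(fractions[0] * len(data))
    n_val = int(fractions[1] * len(data))
    return (data[idx[:n_train]],
            data[idx[n_train:n_train + n_val]],
            data[idx[n_train + n_val:]])

def minmax_fit(train):
    """Column-wise offset and span taken from the training split."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return lo, span

def minmax_apply(data, lo, span):
    """Scale columns so the training split lands exactly in [0, 1]."""
    return (data - lo) / span

# Random stand-in data: 50 samples, 4 raw features.
raw = np.random.default_rng(1).normal(size=(50, 4))
features = pca_features(raw, n_components=2)
train, val, test = split_rows(features)
lo, span = minmax_fit(train)
train_n = minmax_apply(train, lo, span)
val_n = minmax_apply(val, lo, span)
test_n = minmax_apply(test, lo, span)
```

Because the scaling is fitted on the training split only, validation and test values can fall slightly outside [0, 1]; fitting on the full data set instead would keep every split inside the range.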

### Requirements

- Python 3.x
- numpy
- scipy

## Data Structures

The SOM implementation uses several key classes:

- **som_data_struct**: Holds input data with labels and metadata
- **som_map_struct**: The main SOM with codebook vectors
- **som_topol_struct**: Defines map topology (size, lattice type, shape)
- **som_train_struct**: Manages training parameters (learning rate, radius, etc.)
- **som_unit_coords**: Generates coordinates for map units
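
To illustrate what unit coordinates look like for the two lattice types, here is a minimal standalone sketch. It does not call `som_unit_coords.py`; the function name `unit_coords`, the row-major unit ordering, and the `"rect"`/`"hexa"` labels are assumptions made for the example, and only the sheet shape is covered.

```python
import numpy as np

def unit_coords(rows, cols, lattice="rect"):
    """Return a (rows*cols, 2) array of (x, y) positions for a sheet-shaped map.

    Hypothetical helper: units are numbered row by row.
    """
    r, c = np.divmod(np.arange(rows * cols), cols)
    x = c.astype(float)
    y = r.astype(float)
    if lattice == "hexa":
        # Shift every other row by half a unit and compress rows so that
        # all six nearest neighbors sit at distance 1.
        x[r % 2 == 1] += 0.5
        y *= np.sqrt(0.75)
    elif lattice != "rect":
        raise ValueError(f"unknown lattice: {lattice!r}")
    return np.column_stack([x, y])

rect = unit_coords(3, 4, "rect")
hexa = unit_coords(3, 4, "hexa")
# Distance between unit 0 (row 0) and unit 4 (row 1, same column).
d_rect = np.linalg.norm(rect[4] - rect[0])
d_hexa = np.linalg.norm(hexa[4] - hexa[0])
```

In the rectangular lattice the diagonal neighbors are farther away than the horizontal and vertical ones, whereas the hexagonal layout places all adjacent units at the same distance.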

## Notes

- This code has been modernized from Python 2 to Python 3
- The implementation supports both hexagonal and rectangular lattices
- Linear initialization uses PCA to initialize codebook vectors based on data principal components
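
The idea behind PCA-based linear initialization can be sketched as below. This is a hedged illustration of the general technique, not the body of `som_lininit`: the function name `linear_init`, the use of exactly two principal components, and the [-1, 1] grid scaling are assumptions for the example.

```python
import numpy as np

def linear_init(data, rows, cols):
    """Spread a rows x cols codebook over the plane of the two leading
    principal components of `data` (hypothetical helper, needs dim >= 2)."""
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:2]         # indices of the two largest
    # Scale each principal direction by its standard deviation.
    axes = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    # Grid positions normalized to [-1, 1] along each map axis.
    uu, vv = np.meshgrid(np.linspace(-1, 1, rows),
                         np.linspace(-1, 1, cols), indexing="ij")
    grid = np.column_stack([uu.ravel(), vv.ravel()])
    # Each codebook vector = data mean + a linear combination of the two axes.
    return mean + grid @ axes.T

data = np.random.default_rng(0).random((100, 5))
codebook = linear_init(data, rows=4, cols=3)    # shape (12, 5)
```

Since the grid is symmetric around zero, the codebook vectors are centered on the data mean and ordered along the directions of greatest variance, which tends to give training a better starting point than random initialization.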
