Skinterest-2B: Multi-Modal Skin Condition Classification (Break Through Tech AI × Skinterest Tech)
Disclaimer: This project is a research prototype for educational purposes only and is not medical advice. Do not use it to diagnose or treat any condition.
| Name | GitHub Handle | Role / Contribution |
|---|---|---|
| Aisha Salimgereyeva | @aishasalim | ResNet-152V2 pipeline; training/eval scripts; Streamlit demo; docs |
| Wanying Xu | @OliviaCoding | MobileNetV3 baselines; EDA & visuals; documentation |
| Ayleen Jimenez | @ayleenjim | EfficientNet-B7 experiments; error analysis |
| Hoang Do | @hoangggdo | MaxViT experiments; augmentation/regularization ablations |
| Alexis Amadi | @aalexis123 | ResNet50 baseline; optimization & speed profiling |
| Susan Qu | @susan-q | ResNet50 experiments; lighting and skin tone analysis |
| Nandini | @albatrosspreacher | Reviewer (write access); PM support; meeting notes |
- Developed a multitask CNN pipeline using several deep-learning backbones (ResNet-152V2, MobileNetV3, etc.) to process and classify a wide range of skin conditions, such as eczema/atopic dermatitis, lupus, and pigmentation disorders.
- Achieved test accuracy of over 80%, demonstrating that the model is suitable for AI image analysis and directly contributing to Skinterest's goal of fostering inclusivity within the dermatology field.
- Implemented (1) lighting-harshness and (2) skin-undertone analysis of the data so the model can classify images across different lighting conditions and color tones.
- Created a Streamlit demo for qualitative testing and stakeholder feedback.
- Python: 3.9–3.11
- (Recommended) GPU runtime for training (Colab T4/A100 or local NVIDIA)
- Kaggle API credentials (for Kaggle dataset download)
```
.
├── app.py
├── requirements.txt
├── configs/
│   └── resnet152v2_baseline.yaml
├── experiments/                # auto-created: one folder per run (small text+png only)
│   └── <run_name>/
│       ├── report.json
│       ├── metrics.csv         # optional training history
│       ├── weights.txt         # how to fetch large .keras/.h5 from cloud storage
│       └── figures/            # optional PNGs (cm, grad-cam, slice metrics, etc.)
├── notebooks/
│   └── aisha/
│       ├── 01_eda_scins_kaggle.ipynb
│       ├── 02_training_multitask_resnet152v2.ipynb
│       └── 03_error_analysis_fairness.ipynb
├── scripts/
│   ├── prepare_kaggle_meta.py  # builds meta CSV with labels + lighting + ITA
│   └── train_abc.py            # trains Phase A/B/C from a config
└── src/
    └── ... (data / layers / models / training / eval / utils)
```
- Training + evaluation artifacts should go to `experiments/<run_name>/`.
- Large model binaries (`.keras`, `.h5`) should NOT be committed; store them externally and put the download command/link in `experiments/<run_name>/weights.txt`.
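For instance, a `weights.txt` might contain nothing more than a fetch command (storage location hypothetical):

```
# model weights stored externally; fetch into the run folder (bucket is hypothetical)
gsutil cp gs://<bucket>/<run_name>/model.keras experiments/<run_name>/model.keras
```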
- Open `notebooks/aisha/02_training_multitask_resnet152v2.ipynb` in Colab.
- Install dependencies (top cell):

  ```
  !pip -q install kaggle==1.6.17 tensorflow==2.20.0 tensorflow-addons==0.23.0 opencv-python==4.10.0.84
  ```

- Kaggle credentials:
  - Upload `kaggle.json` to `/root/.kaggle/kaggle.json`
  - Set permissions: `!chmod 600 /root/.kaggle/kaggle.json`
- Run the notebook cells to:
  - Download dataset(s)
  - Generate metadata (lighting + ITA)
  - Train Phase A/B/C
  - Write artifacts into `experiments/<run_name>/`
Expected outputs (example):
- `experiments/<run_name>/report.json`
- `experiments/<run_name>/metrics.csv` (optional)
- `experiments/<run_name>/weights.txt` (download instructions for `.keras`/`.h5`)
- `experiments/<run_name>/figures/*.png` (optional)
Note: If you currently save figures to `docs/figures/`, consider redirecting them into `experiments/<run_name>/figures/` so every run is self-contained.
Tested on Python 3.9–3.11.

Create venv + install:

```
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip wheel
pip install -r requirements.txt
```

If you use TensorFlow on Apple Silicon (recommended pins):

```
pip install tensorflow-macos==2.16.1 tensorflow-metal==1.1.0 keras==3.3.3
```

Train from config (recommended reproducible entrypoint):

```
python scripts/train_abc.py --config configs/resnet152v2_baseline.yaml --run_name resnet152v2_baseline_seed42 --seed 42
```

Run the demo:

```
streamlit run app.py
```

If you see model deserialization issues, make sure custom layer class names (e.g., `ColorCalibration`, `ResNetV2Preprocess`) are imported and match exactly what was used during training.
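For example, a minimal loading sketch (`ColorCalibration` lives in `src/model_multitask.py`; the `ResNetV2Preprocess` import path is an assumption):

```python
# Minimal sketch: register custom layers before deserializing a .keras file.
from tensorflow import keras
from src.model_multitask import ColorCalibration, ResNetV2Preprocess  # ResNetV2Preprocess path assumed

model = keras.models.load_model(
    "experiments/<run_name>/model.keras",  # fetch per weights.txt first
    custom_objects={
        "ColorCalibration": ColorCalibration,
        "ResNetV2Preprocess": ResNetV2Preprocess,
    },
)
```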
- Follow the official SCIN access instructions (see References).
- Recommended: place SCIN under `data/scin/` (not committed).

Download via Kaggle CLI (example):

```
kaggle datasets download -d <kaggle-dataset-slug> -p data/kaggle --unzip
```

Create metadata CSV used by `tf.data` pipelines:

```
python scripts/prepare_kaggle_meta.py --input_dir data/kaggle --out_csv data/meta/kaggle_meta.csv
```

Ensure your training config / notebook points to the generated `meta.csv` path.
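A hedged sketch of how the generated CSV might feed a `tf.data` pipeline (column names `filepath`, `diagnosis`, `lighting` are assumptions; the real pipelines live in `src/data_prep.py`):

```python
# Illustrative tf.data pipeline from the meta CSV: decode, center-crop, resize.
import pandas as pd
import tensorflow as tf

meta = pd.read_csv("data/meta/kaggle_meta.csv")  # column names assumed

def load_example(path, diagnosis, lighting):
    img = tf.io.decode_image(tf.io.read_file(path), channels=3, expand_animations=False)
    img = tf.image.central_crop(tf.cast(img, tf.float32) / 255.0, 0.8)  # default crop fraction
    img = tf.image.resize(img, (224, 224))
    return img, {"diagnosis": diagnosis, "lighting": lighting}

ds = (tf.data.Dataset.from_tensor_slices(
        (meta["filepath"].values, meta["diagnosis"].values, meta["lighting"].values))
      .map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
      .shuffle(2048).batch(32).prefetch(tf.data.AUTOTUNE))
```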
About the Program:
- The Break Through Tech AI program is an experiential learning opportunity that allows students to gain hands-on technical experience in the competitive AI/ML industry. The program connected us to our project's challenge advisors from Skinterest Tech, and through it we learned how to work as a team, preprocess and clean data, train AI models, and fine-tune parameters for better results. These skills set us apart from other applicants when applying to jobs.

Our Goals, Objectives, and the Company:
- Skinterest Tech is a skincare startup whose goal is to diversify skincare and help patients find the right product based on their skin quality, texture, tone, and more. The objective of our AI Studio project with Skinterest Tech is to develop a reliable and usable machine learning model that detects poor lighting and classifies images of common dermatologic conditions across diverse skin tones, to be used for clinical review.

Business Relevance:
- The problem our machine learning model addresses is significant because the training data used in today's dermatology industry is often heavily skewed toward lighter skin tones, neglecting representation for people with darker complexions. This can affect both skin condition diagnosis and product awareness: conditions on deeper skin tones may be misclassified, and patients may be recommended unnecessary, or even harmful, products. Our model specifically addresses this by accounting for a range of skin tones and image lighting conditions.
Datasets
- SCIN: large dermatology corpus emphasizing representation across skin tones. Used for primary training and evaluation splits.
- Kaggle: Skin Diseases Image Dataset (ismailpromus): used for stress testing and additional qualitative validation.
Preprocessing & assumptions
- Lighting features (HSV V-channel, contrast, specular highlights) generate a binary label (well-lit vs. poor lighting) via conservative thresholds.
- Skin-tone bucket is computed from ITA in LAB space: `light` / `medium` / `dark` (median over a simple skin mask); a sketch follows this list.
- Center-crop (default `0.8`) + resize to 224×224 to reduce background bias and normalize scale.
- Label encoding for the 10-class diagnosis head; a consistent class order is stored in `demo/class_index.json`.
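A hedged sketch of the ITA computation and bucketing (OpenCV LAB scaling assumed; thresholds are illustrative; the real implementation lives in `src/data_prep.py`):

```python
# Individual Typology Angle (ITA) bucketing sketch. Assumes uint8 BGR input
# and OpenCV LAB scaling (L in [0,255] -> L* = L*100/255, b* offset by 128).
import cv2
import numpy as np

def ita_bucket(image_bgr: np.ndarray, skin_mask: np.ndarray) -> str:
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    L = lab[..., 0] * (100.0 / 255.0)  # rescale to L* in [0, 100]
    b = lab[..., 2] - 128.0            # recenter b* around 0
    ita = np.degrees(np.arctan2(L - 50.0, b))      # ITA = atan((L*-50)/b*) in degrees
    ita_med = float(np.median(ita[skin_mask > 0])) # median over the skin mask
    if ita_med > 41:                   # illustrative cut-offs, not the project's exact values
        return "light"
    elif ita_med > 10:
        return "medium"
    return "dark"
```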
EDA Insights
- Class imbalance is significant (e.g., nevi >> eczema/psoriasis).
- Lighting quality and tone distribution are skewed, necessitating class weights and fairness slices.
- Basic augmentations (flip/rotate/zoom + color jitter, sketched below) help reduce overfitting without harming calibration.
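For instance, an illustrative Keras augmentation stack matching that list (the factors are assumptions, not the project's tuned values):

```python
# Flip / rotate / zoom plus mild color jitter, applied to float [0, 1] images.
from tensorflow import keras
from tensorflow.keras import layers

augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),   # up to ~5% of a full turn
    layers.RandomZoom(0.1),
    layers.RandomBrightness(0.1, value_range=(0.0, 1.0)),
    layers.RandomContrast(0.1),
])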
Architecture (multitask)
- Input: 224×224×3 float in [0,1] → ColorCalibration (CCM) → ResNetV2Preprocess → Backbone (e.g., ResNet-152V2, ResNet50, EfficientNet-B7, MobileNetV2/V3, MaxViT) → two heads:
- Head 1 (lighting): Dense(128, ReLU) → Dropout → Dense(1, Sigmoid)
- Head 2 (diagnosis): Dense(128, ReLU) → Dropout → Dense(10, Softmax)
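A minimal Keras sketch of this architecture (the 1×1 conv is only a stand-in for the learnable CCM; `build_multitask` in `src/model_multitask.py` is the project's real entrypoint):

```python
# Two-head multitask model sketch: shared backbone, lighting + diagnosis heads.
from tensorflow import keras
from tensorflow.keras import layers

def build_multitask_sketch(num_classes: int = 10, drop_rate: float = 0.3) -> keras.Model:
    inputs = keras.Input(shape=(224, 224, 3))               # float [0, 1]
    # ColorCalibration (learnable 3x3 CCM + bias) would sit here; a plain
    # 1x1 conv is a rough stand-in for illustration.
    x = layers.Conv2D(3, 1, use_bias=True, name="ccm_stand_in")(inputs)
    x = layers.Rescaling(2.0, offset=-1.0)(x)               # ResNetV2-style [-1, 1] preprocess
    backbone = keras.applications.ResNet152V2(include_top=False, weights="imagenet", pooling="avg")
    feats = backbone(x)
    lighting = layers.Dense(128, activation="relu")(feats)
    lighting = layers.Dropout(drop_rate)(lighting)
    lighting = layers.Dense(1, activation="sigmoid", name="lighting")(lighting)
    diagnosis = layers.Dense(128, activation="relu")(feats)
    diagnosis = layers.Dropout(drop_rate)(diagnosis)
    diagnosis = layers.Dense(num_classes, activation="softmax", name="diagnosis")(diagnosis)
    return keras.Model(inputs, {"lighting": lighting, "diagnosis": diagnosis})
```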
Training schedule
- Phase A (heads only): backbone + CCM frozen; LR=1e-3 (AdamW).
- Phase B (CCM only): unfreeze CCM; LR=5e-4.
- Phase C (partial backbone): unfreeze top 40%; LR=5e-5.
- Phase D (optional): full unfreeze at tiny LR (1e-5 → 5e-6) with strong regularization + early stopping.
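An illustrative sketch of how the A/B/C freezing could be wired (layer names and the 40%-cutoff mechanics are assumptions; `scripts/train_abc.py` drives the real schedule from the YAML config):

```python
# Phase scheduling sketch: freeze/unfreeze per the A/B/C recipe and return
# the learning rate to recompile with. "ccm_stand_in" refers to the layer
# name used in the architecture sketch above.
from tensorflow import keras

PHASE_LR = {"A": 1e-3, "B": 5e-4, "C": 5e-5}

def configure_phase(model: keras.Model, backbone: keras.Model, phase: str) -> float:
    for layer in model.layers:
        layer.trainable = True                 # heads and CCM trainable by default
    backbone.trainable = False                 # backbone frozen in Phases A and B
    if phase == "A":                           # heads only: CCM stays frozen too
        model.get_layer("ccm_stand_in").trainable = False
    elif phase == "C":                         # unfreeze top 40% of the backbone
        backbone.trainable = True
        cutoff = int(len(backbone.layers) * 0.6)
        for layer in backbone.layers[:cutoff]:
            layer.trainable = False            # keep bottom 60% frozen
    return PHASE_LR[phase]
```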
Imbalance handling
- Default: class-weights (preferred).
- Ablation: capped oversampling by `(diagnosis × tone_bucket)` to check fairness trade-offs.
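A hedged sketch of the default class-weight computation (CSV path and the integer-encoded `diagnosis` column are assumptions):

```python
# Inverse-frequency class weights for the diagnosis head.
import pandas as pd

meta = pd.read_csv("data/meta/kaggle_meta.csv")
counts = meta["diagnosis"].value_counts().sort_index()
n, k = len(meta), len(counts)
class_weight = {int(c): n / (k * int(v)) for c, v in counts.items()}

# Keras class_weight is awkward with multi-output models; an equivalent
# per-sample weight vector for the diagnosis head is:
sample_weight = meta["diagnosis"].map(class_weight).to_numpy()
```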
Loss / Metrics
- Lighting: Binary Cross-Entropy (+ label smoothing 0.05), Accuracy, AUC.
- Diagnosis: Sparse Categorical Cross-Entropy, Top-1 Accuracy, Macro-Avg Accuracy, Per-Class Accuracy.
- Fairness: accuracy by `tone_bucket`.
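A compile sketch consistent with the losses and metrics above (continuing from the architecture sketch; Phase A learning rate shown):

```python
# Hedged compile sketch matching the listed losses/metrics.
from tensorflow import keras

model = build_multitask_sketch()  # from the architecture sketch above

model.compile(
    optimizer=keras.optimizers.AdamW(learning_rate=1e-3),
    loss={
        "lighting": keras.losses.BinaryCrossentropy(label_smoothing=0.05),
        "diagnosis": keras.losses.SparseCategoricalCrossentropy(),
    },
    metrics={
        "lighting": ["accuracy", keras.metrics.AUC(name="auc")],
        "diagnosis": ["accuracy"],
    },
)
```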
- `src/model_multitask.py`
  - `ColorCalibration`: learnable 3×3 color transform + bias with an L2 prior toward identity.
  - `build_multitask(backbone=..., drop_rate=...)`: returns a Keras model with two heads.
- `src/data_prep.py`
  - ITA computation + simple skin mask; metadata CSV; stratified splits; `tf.data` pipelines with center-crop and augmentations.
- `src/train.py`
  - Implements Phases A/B/C; class weights; callbacks (ModelCheckpoint, EarlyStopping, ReduceLROnPlateau).
- `src/eval.py`
  - Confusion matrix, per-class tables, fairness slices, and Grad-CAM utilities (a minimal Grad-CAM sketch follows).
- `app.py`
  - Streamlit demo; loads `.keras` with custom layers; top-k predictions; optional Grad-CAM.
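For reference, a minimal Grad-CAM sketch in the spirit of `src/eval.py` (it assumes the target conv layer is a direct layer of the model, i.e., the backbone graph is flat rather than nested as in the earlier sketch):

```python
# Grad-CAM for the diagnosis head: weight the last conv feature maps by the
# spatially averaged gradients of the class score.
import numpy as np
import tensorflow as tf
from tensorflow import keras

def grad_cam(model: keras.Model, image: np.ndarray,
             conv_layer_name: str, class_idx: int) -> np.ndarray:
    grad_model = keras.Model(
        model.input,
        [model.get_layer(conv_layer_name).output,
         model.get_layer("diagnosis").output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])   # add batch dim
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))         # GAP over H, W
    cam = tf.einsum("bhwc,bc->bhw", conv_out, weights)[0]
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)  # normalize to [0, 1]
    return cam.numpy()
```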
Numbers below are from a representative ResNet-152V2 + CCM run (Phases A/B/C), single seed 42.
Test set (diagnosis head)
- Overall accuracy: ~0.80
- Macro-avg accuracy: ~0.75
- Notable strong classes: BCC (~0.94), Nevi (~0.92)
- Weaker classes: Eczema / Psoriasis (0.55–0.65); confusions often symmetric.
Lighting head
- Accuracy: ~0.86; AUC: ~0.88–0.89.
Fairness slice (diagnosis by tone_bucket)
- light: ~0.82
- medium: ~0.71
- dark: ~0.86 (very small n; wide CI)
Figures (saved under `docs/figures/`)
- `confusion_matrix_diagnosis.png`
- `pr_curves_lighting.png`
- `gradcam_examples/…`
- `fairness_bars_tone_bucket.png`
Takeaways
- CCM + center-crop reduce color/illumination drift.
- Class-weights outperform heavy oversampling for generalization.
- Full unfreeze (Phase D) risks overfitting unless combined with stronger regularization and early stopping.
Summary: Our model is susceptible to overfitting during training because of domain shift and high variance in minority classes, driven by class imbalance and subtle visual traits. It also sometimes struggles to differentiate eczema and psoriasis, likely due to visual overlap and labeling noise. Images from outside the dataset may also reduce accuracy; Grad-CAM helps audit these failure modes.
What worked
- Multitask formulation stabilized training and improved robustness to lighting.
- Lightweight CCM provided consistent gains with negligible compute cost.
- Clear phase schedule (A/B/C) improved convergence and prevented catastrophic forgetting.
What didn't
- Phase D full unfreeze frequently overfit (validation metrics degraded while training metrics kept improving).
- Eczema/psoriasis remain challenging; visual overlap and labeling noise are likely factors.
- External images (distribution shift) can degrade accuracy; Grad-CAM helps audit failure modes.
Why
- Class imbalance + subtle visual traits → higher variance in minority classes.
- Domain shift (camera, distance, compression) → emphasize data standardization at inference.
Procedural: With more time and resources, we would consider other project approaches, such as having the team focus on one project step and one model at a time rather than all at once; this could create even more teamwork and learning opportunities. Additional data we would want to explore includes more skin images, from the company or online, to increase dataset size and fix class imbalances. We might also add images of normal skin under different lighting conditions.
Technical:
- DetectorβClassifier: Use YOLO lesion crops instead of global center-crop.
- Calibration: Temperature scaling / Dirichlet calibration for better confidence estimates (see the sketch after this list).
- Data curation: Add cleaner eczema/psoriasis samples; augment under-represented tones.
- Fairness: Track per-tone ECE and per-class macro-F1; evaluate with bootstrapped CIs.
- Light-quality feedback: Turn the lighting head into user tips ("move closer", "avoid flash glare").
- Distillation: Compress best model to MobileNetV3-Small for on-device triage.
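For the calibration item above, a hedged temperature-scaling sketch (fits a single scalar T on validation logits; assumes access to pre-softmax logits for the diagnosis head, e.g., a model variant with the final softmax removed):

```python
# Post-hoc temperature scaling: minimize validation NLL over T, then divide
# logits by T at inference before the softmax.
import numpy as np
import tensorflow as tf

def fit_temperature(val_logits: np.ndarray, val_labels: np.ndarray,
                    steps: int = 200, lr: float = 0.01) -> float:
    log_t = tf.Variable(0.0)                   # parameterize T = exp(log_t) > 0
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    logits = tf.constant(val_logits, tf.float32)
    labels = tf.constant(val_labels, tf.int32)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            scaled = logits / tf.exp(log_t)
            nll = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=scaled))
        grads = tape.gradient(nll, [log_t])
        opt.apply_gradients(zip(grads, [log_t]))
    return float(tf.exp(log_t).numpy())
```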
This project is licensed under the MIT License.
- SCIN: A New Resource for Representative Dermatology Images (Dataset, Blog, and GitHub provided by Google Research).
- Kaggle: Skin Diseases Image Dataset by ismailpromus.
- Deep Residual Learning for Image Recognition (ResNet) - He et al. (2016).
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks - Tan & Le (2019).
- MobileNetV2/V3: Searching for MobileNetV3 - Howard et al. (2019/2020).
- MaxViT: Multi-Axis Attention for Vision Transformers - Tu et al. (2022).
- Skin Tone Representation in Dermatologist Social Media Accounts - Paradkar & Kaffenberger (2022).
Many thanks to our Skinterest Tech challenge advisors Ashley Abid and Thandiwe-Kesi Robins and Break Through Tech Coach Nandini Proothi for guiding us and answering our questions!