---
title: "Step 2: Matching slices across stains"
---

**Aim: To identify and match structurally similar slices across stains for each patient.**

![](images/Step2.png){fig-align="center"}


# Methodology

This pipeline automates the preprocessing, matching, alignment, and patch extraction of stained tissue images to enable efficient downstream analysis. The process begins with the preprocessing phase, where images are standardized and organized. Files are renamed to a consistent format, and patients missing one or more stain types (H&E, Melan-A, SOX-10) are excluded. For patients with multiple images of the same stain, only the highest-resolution image is retained. Once cleaned, images are split into folders by patient ID to prepare for further analysis.
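The filtering rules in the preprocessing phase can be sketched as follows. This is a minimal illustration, not the code in `pipeline (1).py`: the function name `select_images`, the record layout, and the use of pixel count (width × height) as the measure of resolution are all assumptions made for the example.

```python
from collections import defaultdict

# Stain names as written in the text; the labels used in real filenames may differ.
REQUIRED_STAINS = {"H&E", "Melan-A", "SOX-10"}

def select_images(records):
    """Keep one image per stain for each patient that has all three stains.

    records: iterable of (patient_id, stain, width, height, filename) tuples
    (a hypothetical layout chosen for this sketch).
    """
    best = defaultdict(dict)  # patient_id -> stain -> (pixel_count, filename)
    for patient_id, stain, width, height, filename in records:
        pixels = width * height  # assumed proxy for "resolution"
        current = best[patient_id].get(stain)
        if current is None or pixels > current[0]:
            best[patient_id][stain] = (pixels, filename)

    # Exclude patients missing one or more of the required stains.
    return {
        patient_id: {stain: name for stain, (_, name) in stains.items()}
        for patient_id, stains in best.items()
        if REQUIRED_STAINS.issubset(stains)
    }
```

The returned dictionary is keyed by patient ID, which mirrors the per-patient folder split described above.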

The matching phase groups corresponding images across the three stains for each patient by calculating distances between extracted contours to find optimal matches. The matched images are saved as "matches" and further processed during the alignment phase. There, a contour-based algorithm first crops each image to its region of interest; the images are then rotated and resized to maximize overlap while maintaining consistent dimensions.
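One way to picture the matching step is a search over every combination of one slice per stain, scored by how far apart their contour descriptors are. The sketch below assumes each slice's contour has already been reduced to a numeric descriptor (a tuple of floats); the descriptor, the Euclidean distance, and the function name `best_triplet` are illustrative assumptions, and the pipeline's actual distance measure may differ.

```python
from itertools import combinations, product
from math import dist

def best_triplet(he, melan_a, sox10):
    """Return the (H&E, Melan-A, SOX-10) slice names with the smallest total distance.

    Each argument maps a slice name to a contour descriptor (tuple of floats).
    """
    def cost(names):
        descriptors = (he[names[0]], melan_a[names[1]], sox10[names[2]])
        # Sum the Euclidean distances over the three stain pairs.
        return sum(dist(a, b) for a, b in combinations(descriptors, 2))

    # Brute force over all combinations; fine for a handful of slices per stain.
    return min(product(he, melan_a, sox10), key=cost)
```

With only a few slices per stain the exhaustive search is cheap; a larger dataset would call for a smarter assignment method.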

Finally, the patch extraction phase identifies and extracts tissue patches containing both epithelium and stroma using skeletonization and gradient-based methods. Each patch is validated to ensure it contains components from all three stain types and meets quality criteria, such as having a balanced proportion of tissue and background pixels. Patches are saved in an organized structure, enabling seamless comparison across stains. The pipeline integrates error handling and modular design to ensure robustness, scalability, and adaptability to varied datasets.
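The quality check on tissue versus background can be illustrated with a simple pixel count. This is a sketch under stated assumptions: patches are grayscale 2D lists, pixels at or above a brightness threshold count as background, and the threshold (220) and accepted tissue range (0.3 to 0.9) are placeholder values, not the pipeline's settings.

```python
REQUIRED_STAINS = {"H&E", "Melan-A", "SOX-10"}

def tissue_fraction(patch, background_threshold=220):
    """Fraction of pixels darker than the (assumed) background threshold."""
    pixels = [value for row in patch for value in row]
    tissue = sum(1 for value in pixels if value < background_threshold)
    return tissue / len(pixels)

def is_valid_patch(patches_by_stain, min_tissue=0.3, max_tissue=0.9):
    """Accept a patch only if all three stains are present and balanced."""
    if not REQUIRED_STAINS.issubset(patches_by_stain):
        return False
    return all(
        min_tissue <= tissue_fraction(patch) <= max_tissue
        for patch in patches_by_stain.values()
    )
```

A patch location is kept only when the same check passes for every stain, so that the saved patches remain comparable across stains.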


Follow this step-by-step manual to match slices. Our API, which automates the process, is included.

:::{.callout-note collapse=true}

## Step-by-Step Manual (API included)

Code required:

-   `pipeline (1).py`

To generate matching slices for all patients or a specific subset, follow the steps below so that the script runs correctly:

1. Run Cara's automation script to generate the 'processed_images' directory

2. Ensure that every file follows the same naming convention, with consistent upper- and lower-case letters: patient ID + stain type + ROI number (separated by underscores)

3. Run the script `pipeline (1).py` and select the 'processed_images' directory
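Before running the script, the naming convention from step 2 can be checked with a few lines of Python. This is a hedged sketch: it only verifies that a filename has exactly three non-empty fields separated by underscores, and the helper name `parse_name` and the example filename in the comment are hypothetical.

```python
from pathlib import Path

def parse_name(filename):
    """Return (patient_id, stain, roi), or None if the name does not have three fields."""
    # e.g. a hypothetical "h2114153_sox10_1.tif" -> ("h2114153", "sox10", "1")
    parts = Path(filename).stem.split("_")
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)
```

Any file for which `parse_name` returns `None` should be renamed before selecting the 'processed_images' directory.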


# API

Here is the API we created to automate this step. Using this tool will speed up the process.

![](images/API.png){fig-align="center"}

:::

# Results

**Example:**

All tissue slices from patient h2114153:

![](images/step%202%20images/slice1.png){width="125"} ![](images/step%202%20images//slice4.png){width="90"} ![](images/step%202%20images//slice5.png){width="106"} ![](images/step%202%20images//slice6.png){width="105"} ![](images/step%202%20images//slice2.png){width="121"} ![](images/step%202%20images//slice7.png){width="100"} ![](images/step%202%20images//slice3.png){width="113"}

**Successfully matched results**

![](images/step%202%20images//slice1.png){width="125"} ![](images/step%202%20images//slice2.png){width="121"} ![](images/step%202%20images//slice3.png){width="113"}

![](images/Step2.png){fig-align="center"}