---
title: "Step 1: Slice extraction"
editor:
  markdown:
    wrap: 72
---

**Aim: To extract the tissue slices from the whole slide images (WSIs)**

![](images/Step1.png)

# Methodology

This method uses Groovy scripts in QuPath to automate the identification, processing, and export of tissue regions of interest (ROIs) from whole slide images (WSIs). The key files are the pixel classifier `tissues_2.json` and the scripts `automate_export_newest.groovy` and `existing_annotations.groovy`, which together handle diverse stain types. After setting up the project and placing the files in their designated directories, run `automate_export_newest.groovy` to detect ROIs and export them as tissue slices, which are saved in the `processed_data` folder.

The process begins by ensuring the image is not a mask file before applying stain deconvolution to separate Hematoxylin and Eosin (H&E) stains using predefined parameters for color deconvolution. If no annotations exist, a pixel classifier (`tissues_2`) is applied to detect tissue regions, generating initial annotations. These annotations are refined by merging those within a set distance threshold (5000 micrometers) to consolidate closely related tissue regions, leveraging centroid-based Euclidean distance calculations. Duplicate annotations with centroids closer than 50 micrometers are identified and removed to ensure clean and accurate outputs. Finally, the refined annotations are exported as high-resolution image tiles into a specified output directory, using a customizable downsample parameter to accommodate variations in image quality. This pipeline ensures efficient and accurate tissue ROI extraction, supporting downstream analysis and quality control.
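
The merging and de-duplication steps described above can be sketched in plain Python. This is only an illustration, not the project's Groovy script: the function names, the use of `(x, y)` centroid tuples in micrometers, and the choice to merge transitively (a chain of close annotations ends up in one group) are assumptions. Only the 5000 µm and 50 µm thresholds come from the text.

```python
from math import dist

MERGE_DISTANCE_UM = 5000.0      # merge threshold quoted in the text
DUPLICATE_DISTANCE_UM = 50.0    # duplicate threshold quoted in the text


def group_by_centroid_distance(centroids, threshold=MERGE_DISTANCE_UM):
    """Group indices of centroids linked by a chain of distances <= threshold."""
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) <= threshold:
                parent[find(j)] = find(i)  # union the two groups

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def drop_near_duplicates(centroids, min_separation=DUPLICATE_DISTANCE_UM):
    """Keep the first of any centroids closer together than min_separation."""
    kept = []
    for point in centroids:
        if all(dist(point, other) >= min_separation for other in kept):
            kept.append(point)
    return kept
```

For example, centroids at x = 0, 3000, 6000, and 20000 µm (all with y = 0) fall into two groups under this sketch: the first three are chained together by 3000 µm gaps, and the last one stays on its own.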


Follow this step-by-step manual to extract the slices.

:::{.callout-note collapse=true}

## Step-by-Step Manual


Files required:

- `tissues_1.json`, `tissues_2.json`, and `automate_export_newest.groovy`
- New, to speed up processing of other stain types: `existing_annotations.groovy`

### Setting up in QuPath

1.  Open a project that contains all your images.
2.  Put `tissues_2.json` into
    `base_project_dir/classifiers/pixel_classifiers` (create the
    directory if it does not exist). Use `tissues_2.json` for the most
    recent results. `tissues_1.json` still works if you want to try it,
    but `tissues_2` has broader parameters, which makes the model more
    sensitive and lets it work on more stains and images.
3.  Put `automate_export_newest.groovy` into `base_project_dir/scripts`
    (create the directory if it does not exist).
4.  Make sure you have an image open in the QuPath interface.
5.  In QuPath, top bar --\> Automate --\> Project scripts --\>
    `automate_export_newest.groovy`.
6.  In the Script Editor, click the three vertical dots at the bottom
    --\> Run for project.
7.  Data is saved in the `processed_data` directory inside your base
    project directory.
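
Before running the script, it can help to confirm that the files from the setup steps are where QuPath expects them. The helper below is a hypothetical convenience check written for this manual, not part of the project's scripts; the two relative paths are taken from the steps above.

```python
from pathlib import Path

# Relative locations taken from the setup steps above.
REQUIRED_FILES = (
    Path("classifiers") / "pixel_classifiers" / "tissues_2.json",
    Path("scripts") / "automate_export_newest.groovy",
)


def missing_project_files(base_project_dir):
    """Return the required files that are not yet present in the project."""
    base = Path(base_project_dir)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
```

An empty list means both files are in place; otherwise the list names what still needs to be copied.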

#### To deal with more difficult stain types if you decide to manually annotate:

`existing_annotations.groovy` runs like `automate_export_newest.groovy`
but only exports annotations that already exist.

1.  Set the annotation class to "Positive" in QuPath (Annotations --\>
    Positive --\> Set selected). To make future annotations "Positive"
    automatically, press "Auto set".
2.  Put `existing_annotations.groovy` into `base_project_dir/scripts`.
3.  In QuPath, top bar --\> Automate --\> Project scripts --\>
    `existing_annotations.groovy`.
4.  In the Script Editor, click the three vertical dots at the bottom
    --\> Run for project.
5.  Data is saved in the `processed_data` directory inside your base
    project directory.

#### To create a new pixel classifier or modify (optional):

1.  QuPath interface top bar --\> Classify --\> Pixel Classification
    --\> Create thresholder.
2.  See `tissues_1.json` and `tissues_2.json` for my parameters, and
    work from there.
3.  Save the new classifier, then replace `tissues_2` in the `.groovy`
    script with its name.

### Step 1: First pass of algorithm

Following the instructions above, open your image in QuPath and run the
`automate_export_newest.groovy` script.

![](images/step 1 images/step_1_image_1.png){fig-align="center"}

Select Run, then Run For Project

![](images/step 1 images/step_1_image_2.png){fig-align="center"}

***Note*****:** If the automation fails while running because an image
is particularly large, or fails systematically on a stain type (e.g.
Sheffield Sox10, where most fail because the reference image annotation
is too large to export), you have two options:

- Manually annotate and export the images (more on this later).
- Downsample an annotated area by changing the downsample parameter
  directly. This is a last resort, but you can downsample by up to a
  factor of 2 and still match the stakeholder’s desired resolution.
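
To get a feel for what the downsample parameter in the note above does to an export, the arithmetic can be sketched as follows. The function name and the rounding (ceiling) are assumptions made for illustration; QuPath's own export may round differently by a pixel.

```python
import math


def exported_size(width_px, height_px, downsample=1.0):
    """Approximate pixel dimensions of a region exported at a given downsample."""
    if downsample <= 0:
        raise ValueError("downsample must be positive")
    return math.ceil(width_px / downsample), math.ceil(height_px / downsample)


def pixel_count_ratio(downsample):
    """Fraction of the original pixel count that remains after downsampling."""
    return 1.0 / (downsample * downsample)
```

Under these assumptions, a 40000 x 30000 pixel annotation exported with a downsample of 2 comes out at roughly 20000 x 15000 pixels, which is a quarter of the original pixel count.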

Select your images to process. Not counting the mask images, I tended to
process up to 20 at a time to reduce the memory load.
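
If you keep a list of image names, splitting it into groups of up to 20 while skipping the mask images can be sketched like this. The function is hypothetical, and the rule that a mask image has "mask" somewhere in its file name is an assumption; adjust `mask_marker` to match your own naming.

```python
def batches_without_masks(image_names, batch_size=20, mask_marker="mask"):
    """Split non-mask image names into consecutive batches of at most batch_size."""
    # Assumption: mask images can be recognized by a marker in the file name.
    selected = [name for name in image_names if mask_marker not in name.lower()]
    return [selected[i:i + batch_size] for i in range(0, len(selected), batch_size)]
```

Each returned batch can then be selected in the Run For Project dialog in turn.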

![](images/step 1 images/step_1_image_3.png){fig-align="center"}

### Step 2: Analyze results and troubleshoot

Once you have run the automation on your images, check in QuPath
directly, image by image, that all data was properly exported. You
should also check in the `processed_data` dir created in your QuPath
project dir that no image is corrupted or too blurry. The possible cases
for your images are listed below in order of the manual work needed.
They correspond to how we dealt with and logged the processing of these
images in the
[Tracker Data of Status of Each
Slice](https://nuwildcat-my.sharepoint.com/:x:/r/personal/akl0407_ads_northwestern_edu/_layouts/15/Doc.aspx?sourcedoc=%7B7050DB24-1558-4042-BF99-C6B17BBAF84D%7D&file=data_uploaded_tracker.xlsx&action=default&mobileredirect=true)

:::

Here are the six result cases we encountered and what to do with each one. Some require rerunning a script.

:::{.callout-note collapse=true}

## Result Cases

**Case 1**: perfect ROI identification. Self-explanatory: all ROIs were
successfully found and exported.

Example: Liverpool h1831023

![](images/step 1 images/step_1_image_4.png){fig-align="center"}

**Case 2**: merging. Some of the region was not selected by the
algorithm but belongs in the tissue sample. This has to be determined
across stains, because some tissues might be separated in one type of
stain but appear merged in another. However, we don’t want to
over-merge, as the extra whitespace makes matching difficult.

Example of merging: h1846151. Small hanging pieces are okay to merge.

![](images/step 1 images/step_1_image_5.png){fig-align="center"}

Example of when not to merge: h1810898B, because Sox10 looks similar to
unmerged H&E.

![](images/step 1 images/step_1_image_6.png){fig-align="center"}

![](images/step 1 images/step_1_image_7.png){fig-align="center"}

Then, rerun the `existing_annotations.groovy` script to export faster,
and delete the remaining ROIs in your file directory.

**Case 3**: deletion. For any of the following types of areas, delete
the annotations in QuPath.

Blank images. Example: Sheffield 77

![](images/step 1 images/step_1_image_8.png){fig-align="center"}

Splotches (shadows on the glass? blurs?)

![](images/step 1 images/step_1_image_9.png){fig-align="center"}

Then, rerun the `existing_annotations.groovy` script to export faster,
and delete the remaining ROIs in your file directory to ensure
consistent ROI numbering.

**Case 4**: manual selection from poor selection. Sometimes the
annotation region is specified correctly but includes too much
whitespace or unnecessary area. Delete the original annotation, select
a new region, and set the class to Positive. Then, rerun the
`existing_annotations.groovy` script to export faster, and delete the
remaining ROIs in your file directory to ensure consistent ROI
numbering.

Example: selecting around the hair in h2114185 h&e

![](images/step 1 images/step_1_image_10.png){fig-align="center"}

Example: h1845484 sox10: selection reduces the splotches’ area and
prevents them from being exported extraneously

![](images/step 1 images/step_1_image_11.png){fig-align="center"}

**Case 5**: manual selection from image too large. If QuPath runs out
of memory when trying to run images or gets stuck on a particular one
(e.g. most of Sheffield Sox10, due to large reference tissues), use the
less memory-intensive `existing_annotations.groovy` script that I
created. Select each annotation region manually in QuPath, then set the
class to Positive. Then, rerun the `existing_annotations.groovy` script
to export faster, and delete the remaining ROIs in your file directory
to ensure consistent ROI numbering.

Example: reference tissues in most of Sheffield Sox10. Select the actual
tissue manually instead of running the algorithm, because large files
like this prevent efficient exports.

![](images/step 1 images/step_1_image_12.png){fig-align="center"}

**Case 6**: not even manual selection works to export a large image.
Try to export one annotated area at a time: select it, set the class to
Positive, and run the `existing_annotations.groovy` script. In the worst
case, downsample by a factor of at most 2.0. Then, rerun the
`existing_annotations.groovy` script to export faster, and delete the
remaining ROIs in your file directory to ensure consistent ROI
numbering.

Example: Sheffield 85 (lots of
samples, junk images, and large files)

:::


These are some ideas on how to create an API from Step 1 to Step 2 if the team decides to do so.

:::{.callout-note collapse=true}

## Integrating Step 1 (QuPath) into Project API


### 1. Using the paquo Library

**paquo** is a Python library designed specifically to interact with QuPath projects. It leverages the jpype library to seamlessly bridge Python and Java, making it possible to manipulate QuPath projects directly from Python. paquo provides native support for creating, editing, and running scripts within QuPath projects, aligning well with the goal of creating a Python-based API.

**Advantages**

- Simplifies Java-Python interaction for people who have little experience with Java or Groovy

- Native support for QuPath scripts and projects

- Can be integrated into current project API (steps 2-3)

**Challenges**

- JVM configuration can be prone to errors
- Requires the correct QuPath version

**References**

- https://paquo.readthedocs.io/en/latest/
- https://forum.image.sc/t/paquo-read-write-qupath-projects-from-python/41892

### 2. Using Python with QuPath CLI

QuPath provides a command-line interface (CLI) that can be accessed through Python's subprocess module. This allows Python scripts to execute Groovy-based workflows in QuPath indirectly.
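
A minimal sketch of this approach is shown below. It assumes the QuPath executable accepts a `script` subcommand with a `--project` option, as listed by `QuPath script --help` in recent versions; confirm this against your installed version. The executable path is installation-specific, and the function names here are hypothetical.

```python
import subprocess


def build_qupath_command(qupath_executable, project_file, script_file):
    """Build the argument list for running a Groovy script over a project."""
    return [
        str(qupath_executable),
        "script",
        "--project", str(project_file),
        str(script_file),
    ]


def run_qupath_script(qupath_executable, project_file, script_file):
    """Run the script and return QuPath's exit code and captured output."""
    result = subprocess.run(
        build_qupath_command(qupath_executable, project_file, script_file),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        check=False,          # let the caller decide how to handle failures
    )
    return result.returncode, result.stdout, result.stderr
```

Keeping the command construction separate from the call makes it possible to log or inspect the exact command before launching QuPath. The exit code and captured output are the only feedback Python receives in this setup.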

**Advantages**

- Don’t need a JVM setup in Python
- Simple and lightweight

**Challenges**

- Limited feedback from QuPath to Python
- Requires separate Groovy scripts

**References**

- https://www.imagescientist.com/command-line-and-python
- https://forum.image.sc/t/automating-qupath-pipeline-completely-using-python/72341


### 3. Standalone Java Application

A Java application can directly utilize the QuPath API to interact with projects, import images, and execute scripts. This approach bypasses Python entirely and offers complete control over QuPath's capabilities. A Java-based solution can serve as a standalone API or backend that exposes QuPath functionalities via user-friendly interfaces (e.g., GUIs or REST endpoints).

**Advantages**

- Direct access to all QuPath functionalities
- Full performance optimization in Java

**Challenges**

- Requires Java programming expertise

**References**

- https://forum.image.sc/t/load-project-from-a-project-file-using-qupath-java-api/63613

### 4. Python and Java with Jython

Jython enables Python scripts to directly execute Java code. It acts as a bridge between Python and Java but is limited to Python 2.x. Jython can provide a direct way to call QuPath’s Java API from Python-like syntax, enabling API functionalities like project management and script execution.

**Advantages**

- Direct access to Java classes

**Challenges**

- Limited to Python 2.x.
- No support for modern Python features
- Requires Java programming expertise

**Reference**

- https://github.com/qupath/qupath/wiki/Working-with-Python

:::


# Results
[**Folder of Extracted
Slices**](https://nuwildcat-my.sharepoint.com/personal/akl0407_ads_northwestern_edu/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fakl0407%5Fads%5Fnorthwestern%5Fedu%2FDocuments%2FSTAT390%2FProcessed%20data%20%28slices%20extracted%29&ga=1)

[**Tracker Data of Status of Each
Slice**](https://nuwildcat-my.sharepoint.com/:x:/r/personal/akl0407_ads_northwestern_edu/_layouts/15/Doc.aspx?sourcedoc=%7B7050DB24-1558-4042-BF99-C6B17BBAF84D%7D&file=data_uploaded_tracker.xlsx&action=default&mobileredirect=true)