Sample size calculation #519

selenabr · 2024-06-26T18:26:33Z

Description

Implements sample_size_calculation to calculate the required sample size for a selected protein. For this purpose, a variance method was added and an output field was implemented to display the result. Both the variance method and the sample size calculation method are tested with test data in test_power_analysis.py.

Changes

sample_size_calculation + variance method:

protzilla/data_analysis/power_analysis.py
ui/runs/forms/data_analysis.py
protzilla/methods/data_analysis.py

Display output field:

ui/runs/templates/runs/details.html
ui/runs/views.py
protzilla/steps.py

Test:

tests/protzilla/data_analysis/test_power_analysis.py

Mergeability

main-branch has been merged into local branch to resolve conflicts
The tests and linter have passed AFTER local merge
The code has been formatted with black

Code review

I have self-reviewed my code.
At least one other developer reviewed and approved the changes


hendraet

Looks good so far. I have written pretty much the same code for the calculation :)

As usual, there are still minor things that could be improved, but that's mainly code style.

hendraet · 2024-06-28T07:31:47Z

protzilla/data_analysis/power_analysis.py

+    if intensity_name is None:
+        intensity_name = "Normalised iBAQ"


This assumes that the data has been normalized before it is fed into the step; otherwise the column doesn't exist. I would say that is an unnecessary limitation, and it is not transparent to the user.

hendraet · 2024-06-28T07:32:31Z

protzilla/data_analysis/power_analysis.py

+    if intensity_name is None:
+        intensity_name = "Normalised iBAQ"


I feel that this could just be the default argument if it is set anyway. Or is there a reason why the default has to be None?
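One way to address both points is to put the default into the signature and to fail with a clear message when the column is missing. This is a minimal sketch, assuming the column names of the intensity table are available as a list; `resolve_intensity_name` and `DEFAULT_INTENSITY_NAME` are hypothetical names for illustration and not part of PROTzilla:

```python
from typing import Iterable

DEFAULT_INTENSITY_NAME = "Normalised iBAQ"  # default value taken from the diff above


def resolve_intensity_name(
    columns: Iterable[str], intensity_name: str = DEFAULT_INTENSITY_NAME
) -> str:
    """Return intensity_name if it is a known column, else raise a clear error.

    Hypothetical helper, for illustration only.
    """
    available = list(columns)
    if intensity_name not in available:
        # Tell the user which column was expected and what is actually there.
        raise ValueError(
            f"Intensity column {intensity_name!r} not found; "
            f"available columns: {available}"
        )
    return intensity_name
```

With the default in the signature, callers that pass nothing still get "Normalised iBAQ", and data that was never normalized produces an explicit error instead of a missing-column failure further down.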


hendraet · 2024-06-28T08:17:33Z

ui/runs/forms/data_analysis.py

+    def fill_form(self, run: Run) -> None:
+        self.fields["t_test_results"].choices = get_t_test_results(run)
+
+class PowerAnalysisSampleSizeCalculationForm(MethodForm):


In general, you should run a code formatter. Often there are too few blank lines (between methods, or here above the class, where you should leave two empty lines) and unnecessary whitespace around equal signs (as below in the ...Field()).


hendraet · 2024-07-22T08:44:20Z

protzilla/data_analysis/power_analysis.py

+        intensity_name=intensity_name,
+    )
+    sample_size = differentially_expressed_proteins_df.groupby('Group')['Sample'].count()
+    z_beta = fc_threshold * np.sqrt(sample_size/(2*variance_protein_group**2))-z_alpha


Suggested change

-    z_beta = fc_threshold * np.sqrt(sample_size/(2*variance_protein_group**2))-z_alpha
+    z_beta = fc_threshold * np.sqrt(sample_size / (2 * variance_protein_group)) - z_alpha

I think the square is one step too many, since we are already dealing with variances and not standard deviations. (There are also some minor formatting issues.)
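The effect of the suggested change can be checked numerically. The sketch below assumes a two-sided significance level and the normal approximation z_beta = fc_threshold * sqrt(n / (2 * variance)) - z_alpha, where n is the sample size per group. The function names are illustrative, and it uses the standard library's NormalDist rather than the code in this PR:

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def power_from_sample_size(fc_threshold, variance, sample_size, alpha=0.05):
    """Approximate power for `sample_size` samples per group."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided test (assumption)
    # The variance enters unsquared: sqrt(2 * variance / n) is the standard
    # error of the difference between two group means.
    z_beta = fc_threshold * math.sqrt(sample_size / (2 * variance)) - z_alpha
    return STANDARD_NORMAL.cdf(z_beta)


def sample_size_from_power(fc_threshold, variance, power, alpha=0.05):
    """Invert the relation above for the per-group sample size (not rounded)."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STANDARD_NORMAL.inv_cdf(power)
    return 2 * variance * ((z_alpha + z_beta) / fc_threshold) ** 2


n = sample_size_from_power(fc_threshold=1.0, variance=0.5, power=0.8)
recovered_power = power_from_sample_size(fc_threshold=1.0, variance=0.5, sample_size=n)
```

Solving for n and plugging it back in returns the requested power. That round trip only works when the variance is used as is; squaring it again would treat it as a standard deviation.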


hendraet · 2024-09-04T06:55:10Z

protzilla/data_analysis/power_analysis.py

+    fig = go.Figure()
+
+    fig.add_trace(
+        go.Violin(
+            x=["Protein Groups"] * len(required_sample_sizes),
+            y=required_sample_sizes,
+            line_color=colors[1],
+            **violin_plot_args
+        )
+    )
+    fig.update_layout(


If you don't add traces to a figure dynamically (e.g. in a for loop), you can also pass the trace directly to go.Figure()

hendraet · 2024-09-04T06:56:47Z

protzilla/data_analysis/power_analysis.py

+    violin_plot_args = dict(
+        meanline_visible=True,
+        box_visible=True,
+        scalemode='width',
+        spanmode='hard',
+        span=[0, required_sample_size_for_all_proteins],
+        fillcolor='rgba(0,0,0,0)'
+    )


I would pass these arguments directly to go.Violin(). Since you are not reusing them anywhere else, keeping them in a separate dict just makes the code harder to read, because the arguments live in a different place.

hendraet · 2024-09-04T08:00:37Z

protzilla/data_analysis/power_analysis.py

+
+    fig.add_trace(
+        go.Violin(
+            x=["Protein Groups"] * len(required_sample_sizes),


Would omit the x parameter and just use name="Protein Groups". It's easier to read


Jonas0000

I have reviewed your PR so that it can be merged as soon as possible. Very cool that you built in so many new features!
I left a few minor questions on the code.
I will also commit some changes shortly that adapt your steps to the new syntax. I have not looked at the content of your new steps.

I saw that you reformatted a lot of code, presumably automatically with a formatter? In principle I think that is very good, and it makes the code much more readable.
Would it be easy for you, though, to revert the formatting changes?
I think that would make it much easier for us to merge the code into the new PROTzilla. If not, I'm sure we will manage anyway. I will also ask in the BP what the opinion is there.

Jonas0000 · 2025-03-06T20:20:28Z

ui/runs/views.py

            description=description,
            method_form=method_form,
            is_form_dynamic=method_form.is_dynamic,
+            plot_form=plot_form,


We merged the plot form with the calculate form so that every step owns only one form containing all input fields. So this line shouldn't be necessary anymore.

Jonas0000 · 2025-03-06T20:23:48Z

user_data/workflows/overhaul.yaml:Zone.Identifier

What is this file for? It doesn't look like a normal workflow, and I can't figure out its purpose.

Unfortunately, colons (:) aren't allowed in paths on Windows machines, so I can't check out your branch because of this file. I very much hope that you don't need this file :D


Jonas0000 · 2025-03-06T22:51:52Z

tests/protzilla/data_analysis/test_power_analysis.py

Why are these tests commented out?


Jonas0000 · 2025-03-06T23:02:31Z

user_data/workflows/standard.yaml

Should the new steps really be part of the standard workflow?


selenabr

Thank you very much for reviewing! :)
Regarding the formatting: we all used the Black formatter. I don't know whether it is possible to revert all formatting changes. Perhaps you could simply run the formatter you use yourselves over the branch, in case it is not Black?

selenabr · 2025-03-07T12:22:28Z

user_data/workflows/standard.yaml

I'm not sure, maybe we should talk to Chris about this. But I think it's totally fine if the new steps are just available in PROTzilla :)

selenabr · 2025-03-07T12:24:31Z

user_data/workflows/overhaul.yaml:Zone.Identifier

Oh, I also don't know what this file is for. I've asked the others from the old project, but nobody seems to know. Also, the git history is empty, so should I just delete it?
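Files named `...:Zone.Identifier` are typically left behind when a file downloaded on Windows is copied through WSL, because Windows stores its download marker in an NTFS alternate data stream of that name. A small sketch for spotting such paths before they are committed; the function name is hypothetical, and the character set is the one Windows reserves in file names:

```python
WINDOWS_RESERVED_CHARS = set('<>:"|?*')


def find_windows_unsafe_paths(paths):
    """Return the paths containing characters Windows forbids in file names."""
    unsafe = []
    for path in paths:
        # Check each path component; "/" is only a separator here.
        components = path.split("/")
        if any(ch in WINDOWS_RESERVED_CHARS for part in components for ch in part):
            unsafe.append(path)
    return unsafe
```

The input could be the lines printed by `git ls-files`, so the check runs against everything tracked in the branch.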

selenabr · 2025-03-07T12:28:55Z

tests/protzilla/data_analysis/test_power_analysis.py

Actually, the first tests up to line 202 shouldn't be commented out. They tested the new methods on the old branch, and they worked. I think I commented them out because the methods didn't work on the dev branch due to the new changes...

selenabr added 13 commits June 5, 2024 17:12

added sample size calculation in methods\data_analysis.py and forms\data_analysis.py

73e13a2

enabled possibility to choose one protein for calculation dependent on significance from t-test

f133c87

fixed errors with missing inputs

49c7f0e

added variance calculation and testing function and edited sample size calculation function

6d8c9a8

fixed some errors

0b95cf0

output field for result

22c293d

Merge branch 'dev' into bachelor-thesis-selena

fd756df

# Conflicts: # protzilla/methods/data_analysis.py # ui/runs/form_mapping.py # ui/runs/forms/data_analysis.py

further implementation of output field for result

b22b6e7

display display_output in output field

c6a2f3b

display_output field displayed in the same size and position as the other fields

032286c

test function for sample_size_calculation

e90fab3

Merge branch 'dev' into bachelor-thesis-selena

01eba42

# Conflicts: # protzilla/methods/data_analysis.py # ui/runs/forms/data_analysis.py

edited description of function

d3cf9d8

hendraet reviewed Jun 28, 2024


selenabr added 2 commits July 8, 2024 06:38

check if implemented function of Paper (Cairns et al., 2009) and library-function of Sample Size Calculation have the same result

3ce4ae1

power calculation and test of library-function and implemented paper-function

f78b0b9

hendraet reviewed Jul 22, 2024


selenabr added 10 commits August 21, 2024 00:56

added test for power_calculation method

e3dd1c3

fixed constructor error

2e3de5a

sample size calculation for different group sizes (Cohen 1988) and moved validation methods to separate file

a46a074

code formatting, resolved comments (output not a float, significant_proteins_only, intensity_name)

3446be3

feature: user can choose whether metadata contains a column for individuals. If so, the mean values per individual are used to calculate the power and sample size.

cb25777

adapted test for power_calculation and sample_size_calculation and checked values from paper of Cairns

52ef105

added function that calculates sample size for all proteins and shows the distribution in a violin plot

ac9e783

formatting

e54c767

commented the dataframe-output-stuff out, otherwise violin plot couldn't be displayed anymore (WIP...)

2faa972

changed color of violinplot and added axis-description

25cf2b2

hendraet reviewed Sep 4, 2024


selenabr added 2 commits September 5, 2024 13:10

changed color of violinplot and removed axis-description

ae4e8cb

resolved comments

5c63008

selenabr and others added 8 commits September 5, 2024 16:20

Added function to get dataframes with sample size column as output

0adc15c

Added power_calculation_for_all_proteins to calculate minimum power for all proteins

dcba877

Fixed hover display of violin plots

eb32984

fixed typo and removed unnecessary comment

1adda1b

calculations for thesis (should be removed before merging into dev)

d0ec174

calculations for thesis (should be removed before merging into dev)

776dc55

put calculation for thesis into comment and changed description of methods "...for All Proteins"

7131d3b

Add files via upload

6e2daa3

meta file that includes an additional column that identifies the individual sample IDs.

henninggaertner marked this pull request as draft November 21, 2024 14:41

henninggaertner changed the title ~~WIP: Sample size calculation~~ Sample size calculation Nov 21, 2024

sarahvgls added the to be merged label Feb 26, 2025

selenabr and others added 4 commits March 4, 2025 19:13

Merge branch 'dev' into bachelor-thesis-selena

6778796

merge bachelor-thesis-selena into dev

01e9d5f

fixed error in power_analysis.py (constants.color) and commented file test_power_analysis.py

412dfd1

changed steps to new format

7b6c159

Jonas0000 reviewed Mar 6, 2025


selenabr commented Mar 7, 2025
