Sample size calculation #519
base: dev
Conversation
…n significance from t-test
…e calculation function
# Conflicts: # protzilla/methods/data_analysis.py # ui/runs/form_mapping.py # ui/runs/forms/data_analysis.py
# Conflicts: # protzilla/methods/data_analysis.py # ui/runs/forms/data_analysis.py
hendraet
left a comment
Looks good so far. I have written pretty much the same code for the calculation :)
As usual, there are still minor things that could be improved, but that's mainly code style.
| if intensity_name is None: | ||
| intensity_name = "Normalised iBAQ" |
This assumes that the data has been normalized before it is fed into the step; otherwise the column doesn't exist. I would say that is an unnecessary limitation that is not transparent to the user.
| if intensity_name is None: | ||
| intensity_name = "Normalised iBAQ" |
I feel that this could just be the default argument if it is set anyway. Or is there a reason why the default has to be None?
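A minimal sketch of what this suggestion could look like. The function name `select_intensity_column` and its `columns` parameter are hypothetical and only illustrate the signature; the default value is the one from the diff above. Raising an error when the column is missing is an added assumption that would also make the normalization requirement visible to the user.

```python
def select_intensity_column(columns, intensity_name="Normalised iBAQ"):
    """Return the intensity column to use (hypothetical helper, not PROTzilla code)."""
    # The default lives in the signature, so no `if intensity_name is None` branch is needed.
    if intensity_name not in columns:
        # Fail loudly instead of silently assuming that the data was normalised.
        raise ValueError(
            f"Column {intensity_name!r} not found; available columns: {list(columns)}"
        )
    return intensity_name
```

Callers that work with non-normalized data would then pass the column explicitly, for example `select_intensity_column(df.columns, intensity_name="iBAQ")`.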
| def fill_form(self, run: Run) -> None: | ||
| self.fields["t_test_results"].choices = get_t_test_results(run) | ||
|
|
||
| class PowerAnalysisSampleSizeCalculationForm(MethodForm): |
In general, you should run a code formatter. Often there are too few blank lines (between methods, or here above the class, where you should leave two empty lines) and unnecessary whitespace around equals signs (as below in the ...Field()).
…ary-function of Sample Size Calculation have the same result
| intensity_name=intensity_name, | ||
| ) | ||
| sample_size = differentially_expressed_proteins_df.groupby('Group')['Sample'].count() | ||
| z_beta = fc_threshold * np.sqrt(sample_size/(2*variance_protein_group**2))-z_alpha |
| z_beta = fc_threshold * np.sqrt(sample_size/(2*variance_protein_group**2))-z_alpha | |
| z_beta = fc_threshold * np.sqrt(sample_size / (2 * variance_protein_group)) - z_alpha |
I think the square is too much since we are already dealing with variances and not standard deviations. (Also some minor formatting issues)
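The difference can be illustrated with a small standalone sketch. It uses only the standard library instead of NumPy, the function names are hypothetical, and it assumes a two-sided test (so `z_alpha` is the `1 - alpha / 2` quantile); the PR itself may compute `z_alpha` differently.

```python
import math
from statistics import NormalDist


def z_beta(fc_threshold, sample_size, variance, alpha=0.05):
    """z-score of the achieved power for a per-group sample size and a variance."""
    # Assumption: two-sided test, so z_alpha is the (1 - alpha / 2) quantile.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # `variance` is already sigma^2, so it must not be squared again.
    return fc_threshold * math.sqrt(sample_size / (2 * variance)) - z_alpha


def power(fc_threshold, sample_size, variance, alpha=0.05):
    """Power is the standard normal CDF evaluated at z_beta."""
    return NormalDist().cdf(z_beta(fc_threshold, sample_size, variance, alpha))
```

If the input were a standard deviation `sd` instead, the equivalent call would pass `variance=sd ** 2`; squaring a value that is already a variance changes the result.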
…ved validation methods to separate file
…roteins_only, intensity_name)
…iduals. If so, the mean values per individual are used to calculate the power and sample size.
…ecked values from paper of Cairns
… the distribution in a violin plot
…n't be displayed anymore (WIP...)
| fig = go.Figure() | ||
|
|
||
| fig.add_trace( | ||
| go.Violin( | ||
| x=["Protein Groups"] * len(required_sample_sizes), | ||
| y=required_sample_sizes, | ||
| line_color=colors[1], | ||
| **violin_plot_args | ||
| ) | ||
| ) | ||
| fig.update_layout( |
If you don't add traces to a figure dynamically (e.g. in a for loop), you can also pass the trace directly to go.Figure()
| violin_plot_args = dict( | ||
| meanline_visible=True, | ||
| box_visible=True, | ||
| scalemode='width', | ||
| spanmode='hard', | ||
| span=[0, required_sample_size_for_all_proteins], | ||
| fillcolor='rgba(0,0,0,0)' | ||
| ) |
I would pass these arguments directly to go.Violin(). Since you are not reusing them anywhere else, the separate dict just makes the code harder to read because the arguments are in a different place.
|
|
||
| fig.add_trace( | ||
| go.Violin( | ||
| x=["Protein Groups"] * len(required_sample_sizes), |
I would omit the x parameter and just use name="Protein Groups". It's easier to read.
…thods "...for All Proteins"
meta file that includes an additional column that identifies the individual sample IDs.
Jonas0000
left a comment
I have reviewed your PR so that it can be merged as soon as possible. Very cool that you built in so many new features!
I left a few minor questions on the code.
I will also commit some changes shortly that adapt your steps to the new syntax. I have not looked at the content of your new steps.
I saw that you formatted a lot of code, presumably automatically with a formatter? In principle I think that is very good, and it makes the code much more readable.
Would it be easy for you to revert the formatting changes, though?
I think that would make it much easier for us to merge the code into the new Protzilla. If not, I'm sure we will manage anyway. I will also ask in the BP what the opinion is there.
| description=description, | ||
| method_form=method_form, | ||
| is_form_dynamic=method_form.is_dynamic, | ||
| plot_form=plot_form, |
We merged the plot form with the calculate form so that every step owns only one form containing all input fields. So this line shouldn't be necessary anymore.
What is this file for? It doesn't look like a normal workflow, and I can't figure out its purpose.
Unfortunately, colons (:) aren't allowed in paths on Windows machines, so I can't check out your branch because of this file. I really hope that you don't need this file :D
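A rough way to catch such files before they reach a branch is sketched below. The helper name and the example paths are hypothetical, and the check covers only the reserved characters in Windows file names, not reserved names such as `CON` or trailing dots and spaces.

```python
import re

# Characters that Windows does not allow in file or directory names
# (the backslash is left out because this sketch only splits on "/").
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"|?*\x00-\x1f]')


def windows_incompatible_paths(paths):
    """Return the repository-relative paths that contain forbidden characters."""
    bad = []
    for path in paths:
        # Check each path component separately so "/" itself is never flagged.
        if any(_WINDOWS_FORBIDDEN.search(part) for part in path.split("/")):
            bad.append(path)
    return bad
```

The input could be, for instance, the lines printed by `git ls-files`.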
Oh, I also don't know what this file is for. I've asked the others from the old project, but nobody seems to know. Also, the git history is empty, so should I just delete it?
Why are these tests commented out?
Actually, the first tests up to line 202 shouldn't be commented out. They tested the new methods on the old branch, and they worked. I think I commented them out because the methods didn't work on the dev branch due to the new changes...
Should the new steps really be part of the standard workflow?
I'm not sure, maybe we should talk to Chris about this. But I think it's totally fine if the new steps are just available in PROTzilla :)
selenabr
left a comment
Thank you very much for reviewing! :)
Regarding the formatting: we all used the Black formatter. I don't know whether it is possible to revert all the formatting changes. Maybe you could simply run whatever formatter you use yourselves over the branch, if it isn't Black?
Description
Implementation of sample_size_calculation to calculate the required sample size for a selected protein. For this purpose, the variance method was added and an output field was implemented to display the result. The variance method and the sample size calculation method are tested with test data in test_power_analysis.py.
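For orientation, the relation between z_beta and the sample size discussed in the review can be solved for the sample size. The sketch below is not the PR's implementation: the function name, parameter names and defaults are assumptions, a two-sided test is assumed, and only the standard library is used.

```python
import math
from statistics import NormalDist


def required_sample_size(fc_threshold, variance, alpha=0.05, power=0.8):
    """Per-group sample size needed to detect `fc_threshold` with the given power."""
    # Assumption: two-sided test, so z_alpha is the (1 - alpha / 2) quantile.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # z_beta = fc * sqrt(n / (2 * variance)) - z_alpha, solved for n.
    n = 2 * variance * (z_alpha + z_beta) ** 2 / fc_threshold ** 2
    # Round up because a sample size has to be a whole number.
    return math.ceil(n)
```

For example, `required_sample_size(fc_threshold=1.0, variance=0.25)` asks how many samples per group are needed to detect a fold change of 1.0 at a variance of 0.25 with 80% power.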
Changes
sample_size_calculation + variance method:
Display output field:
Test:
Mergeability
black
Code review