Added project #24

avapapetti · 2020-03-20T22:24:31Z

Ava Papetti

avapapetti · 2020-05-05T15:24:50Z

I am having trouble using pytest-- I've uploaded my first stab at tests anyway but am not positive I am effectively testing each aspect of my code.

I have a few other questions as well:

Would it be better to rename the variable names within the for loop of my recent Project instead of filtering with the same name each time?
In the beginning where I am removing CNVs, I feel it is not the most sophisticated way to eliminate rows I don't need, but I felt that would be more efficient than using a for loop again-- is this OK?

emilyyaklich · 2020-05-05T18:36:28Z

Hi Ava,

What exactly are you having trouble with regarding pytest? After looking at your tests, I think each unit test should test a function that you define in your main scripts to ensure it is performing as desired. For example, a test for a function that drops the 'chrM' rows from your data is below.
The function:

import pandas as pd

file1 = pd.read_csv('SL88824_20180802.cnv.csv', sep = "\t")

def cnv_drop_column(file1):
    file1_new = file1.drop(file1[file1.Chrom == 'chrM'].index)
    chroms = file1_new.Chrom.unique()
    return chroms

and a test for this would be (located in your testing directory):

import pandas as pd
from example_main_script import cnv_drop_column

def test_cnv_drop_column():
    file1 = pd.read_csv('SL88824_20180802.cnv.csv', sep = "\t")
    chroms = cnv_drop_column(file1)
    assert 'chrM' not in chroms

By renaming variables do you mean the "file1" and "file2" names? If your code is working and iterating through your data and giving the correct output I see no need to change the names through each iteration, but I also am a bit confused about what this question is asking.
I think that the way that you are removing the rows is fine with

file1 = file1.drop(file1[file1.Chrom == 'chrM'].index)
file1 = file1[(np.abs(file1.Start - file1.Stop) >= min_cnv_length) & (file1.P_Value >= p_value_threshold)]

but I would define this as a function and apply the function to both files because you end up writing the same code twice for file1 and file2.

avapapetti · 2020-05-05T21:12:14Z

Hi Emily,

Thank you for your reply- that cleared up a lot! So for this line in your example:

from example_main_script import cnv_drop_column

that goes at the beginning of each test, with 'cnv_drop_column' being the name of the function I'm testing? And is it OK to keep the tests in one file or is it better to have one file for each test?

In regards to pytest, I am able to import it into my notebook without any issues, but when I go to use it, I get "UsageError: Cell magic %%pytest not found."

emilyyaklich · 2020-05-05T21:41:12Z

Yes, exactly. And the import of the functions can go at the top of your script and you can import all of the functions you will test from that file at one time. For example:

from example_main_script import cnv_drop_column, other_function, other_function2

or you could also use:

from example_main_script import *

which will import all of the functions from that script.

That being said, I think it might be good to keep your tests located in one script, similar to the example that we worked through in class.

For the pytest, I believe you need to use:

!pytest

in order to get pytest to run.

Let me know if you have any other questions!

avapapetti · 2020-05-05T21:52:15Z

Ok, got it!

One other thing. I had previously tried to create a function to remove the CNVs as you had suggested I do too, and I realized when I used '.apply()' that it was converting my dataframe into a series:

And when I do '.apply(cnv_remover, axis = 1)', I get this error:

Not sure how to bypass either of these issues.

leesup · 2020-05-06T03:37:08Z

Could you potentially try without axis=1? It might be due to the fact that pandas series is a one-dimensional array!

avapapetti · 2020-05-06T13:12:32Z

I think my main problem is I'm not sure how to access the rows where 'Chrom' is 'chrM'. For the code I have above, with file.drop(), I get this error message:

AttributeError: ("'Series' object has no attribute 'Chrom'", 'occurred at index Chrom')

Is there a way for me to specify for it to look at the Series with the Chrom values and then drop each index where chrM is found?
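The error message above is consistent with how `DataFrame.apply` works: it hands the function one column (or, with `axis=1`, one row) at a time as a Series, so `file.Chrom` inside the function no longer refers to a column of the whole table. A minimal sketch of selecting the `chrM` rows with a boolean mask instead, assuming a made-up three-row table (only the column names come from this thread):

```python
import pandas as pd

# Made-up stand-in for the CNV table; only the column names come from the thread.
cnvs = pd.DataFrame(
    {
        "Chrom": ["chr1", "chrM", "chr2"],
        "Start": [100, 200, 300],
        "Stop": [5000, 900, 9000],
    }
)

# Boolean mask: one True/False per row, computed on the whole column at once.
is_chrm = cnvs["Chrom"] == "chrM"

# Keep every row where the mask is False (same effect as dropping the chrM index labels).
without_chrm = cnvs[~is_chrm]

# For comparison: apply() passes each column to the function as a Series,
# which is why attribute access like `.Chrom` fails inside it.
seen_types = cnvs.apply(lambda col: type(col).__name__)
```

No `.apply()` call is needed for this kind of filtering, since the comparison already operates on every row of the column.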

avapapetti · 2020-05-06T13:34:49Z

Sorry for all of the questions about this, but do I have to use .apply()? Could I instead do

def cnv_remover(file):
    min_cnv_length = 1000
    p_value_threshold = .90
    
    file = file.drop(file[file.Chrom == 'chrM'].index)
    file = file[(np.abs(file.Start - file.Stop) >= min_cnv_length) & (file.P_Value >= p_value_threshold)]
    
    return file

Then save the new file this way: file1 = cnv_remover(file1)
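A sketch of how the function above could be reused for both inputs. The thresholds are turned into keyword arguments here as an illustrative assumption, and the two small tables are made up so the example runs without the real CNV files:

```python
import numpy as np
import pandas as pd


def cnv_remover(cnvs, min_cnv_length=1000, p_value_threshold=0.90):
    """Drop chrM rows, then keep rows that pass the length and P_Value cutoffs."""
    cnvs = cnvs[cnvs["Chrom"] != "chrM"]
    long_enough = np.abs(cnvs["Start"] - cnvs["Stop"]) >= min_cnv_length
    passes_p = cnvs["P_Value"] >= p_value_threshold
    return cnvs[long_enough & passes_p]


# Two made-up tables standing in for file1 and file2.
file1 = pd.DataFrame(
    {
        "Chrom": ["chr1", "chrM", "chr2"],
        "Start": [0, 0, 0],
        "Stop": [2000, 5000, 500],
        "P_Value": [0.95, 0.99, 0.99],
    }
)
file2 = pd.DataFrame(
    {
        "Chrom": ["chr3", "chr4"],
        "Start": [100, 100],
        "Stop": [4100, 4100],
        "P_Value": [0.50, 0.92],
    }
)

# The same function handles both inputs, so the filtering code is written once.
file1, file2 = (cnv_remover(f) for f in (file1, file2))
```

Keeping the cutoffs as arguments also makes the function easier to unit test, because a test can pass small values without editing the function body.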

emilyyaklich · 2020-05-06T20:27:02Z

Hi Ava,

Yes, that looks good to me!

avapapetti · 2020-05-07T00:48:23Z

Great!

I'm having another issue with testing now. I have added an init.py file to the same directory as the module, but I keep getting a ModuleNotFoundError.

This is what I ran: from Package.Common_CNV_Finder import *

I've also updated my tests. Is it OK to test the output of my Common_CNV_Finder function is correct by comparing it to a nested for loop output that is reliable (based on test files with which I know the expected outcome)?

My thought was to add this at the end of my test: assert_frame_equal(test_common_cnvs, for_loop_common_cnvs)
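A minimal sketch of that idea: checking a fast implementation against a slow nested-loop reference with `pandas.testing.assert_frame_equal`. Both functions and the definition of "common" (an exact match on Chrom, Start, and Stop) are hypothetical stand-ins, not the project's actual Common_CNV_Finder logic:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

KEYS = ["Chrom", "Start", "Stop"]


def common_by_merge(a, b):
    """Hypothetical fast path: inner merge on the key columns."""
    return a.merge(b, on=KEYS, how="inner")


def common_by_loops(a, b):
    """Slow but easy-to-check reference: compare every pair of rows."""
    rows = []
    for _, row_a in a.iterrows():
        for _, row_b in b.iterrows():
            if all(row_a[k] == row_b[k] for k in KEYS):
                rows.append({k: row_a[k] for k in KEYS})
    return pd.DataFrame(rows, columns=KEYS)


def normalized(df):
    """Sort rows and reset the index so only the contents are compared."""
    return df.sort_values(KEYS).reset_index(drop=True)


# Made-up inputs with a known overlap (chr2 and chr3).
a = pd.DataFrame({"Chrom": ["chr1", "chr2", "chr3"], "Start": [100, 300, 500], "Stop": [200, 400, 600]})
b = pd.DataFrame({"Chrom": ["chr2", "chr3", "chr4"], "Start": [300, 500, 1], "Stop": [400, 600, 2]})

test_common_cnvs = normalized(common_by_merge(a, b))
for_loop_common_cnvs = normalized(common_by_loops(a, b))

# Raises AssertionError with a readable diff if the two frames differ.
assert_frame_equal(test_common_cnvs, for_loop_common_cnvs)
```

Sorting and resetting the index first matters because `assert_frame_equal` also compares row order and index labels, which can differ between two correct implementations.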

leej3 · 2020-05-07T03:09:26Z

@avapapetti, I've submitted a pull request to your branch with some suggested changes. Just to clarify:

a module is a file ending in .py. Yours were notebooks (ending in ipynb).
a function is an isolated bit of code that can take input and return output. It starts with def function_name(arg1,arg2): and is followed by the appropriate code. I have moved your code into functions.
common_cnv_finder seems like a nice package name. I have renamed a directory to make that consistent with this.
__init__.py is required (not init.py, it's hard to write that in markdown unfortunately!)

Overall you have a nice solution to your problem. You were just lacking significantly in the packaging/modularization/testing side of things. Have a look through the changes to move forward. Merging them into your branch and working from there might be the easiest.

Moving forward, the most important thing to do is to complete the test template I have written called test_common_csv_finder. That will suffice to get most of the marks for testing. If you have time after that try to make sure you are passing all the tests (and add more if you can).

When completing test_common_csv_finder, think of it as a way to guarantee that if someone pip installs your package and passes two file paths to common_csv_finder, it will produce a result and that result is correct. Add two small data files to tests/data and use them in this call of common_csv_finder.

some pointers

avapapetti · 2020-05-08T00:38:49Z

Seems to be getting late for someone to pop into Zoom, so here are my questions:

I've added tests for all of my functions and they all passed! Just wanted to make sure they are legitimate.
I also wanted to double check that I've done all of the steps for packaging.
Is it OK if tests are passing in pytest but not in CircleCI?
If/when I'm ready to submit my final project, do I just push my final changes to my repository?

Thank you.

leej3 · 2020-05-08T01:40:23Z

Sorry, thought you were happily working away.

I've added tests for all of my functions and they all passed! Just wanted to make sure they are legitimate.

Yes, well done, especially for using the pandas built-in assertions. Very nice. In the future, look up pytest "fixtures" for supplying data to tests in a slightly cleaner way. What you have done is fine though.
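For reference, a small sketch of what a pytest fixture can look like. The table contents, the helper `make_cnv_frame`, and the test names are made up for illustration and are not taken from the project's test suite:

```python
import pandas as pd
import pytest


def make_cnv_frame():
    """Build a tiny, made-up CNV table for tests."""
    return pd.DataFrame(
        {"Chrom": ["chr1", "chrM"], "Start": [0, 0], "Stop": [2000, 3000]}
    )


@pytest.fixture
def cnv_frame():
    # pytest calls this for every test that names `cnv_frame` as an argument,
    # so each test gets a fresh copy of the data.
    return make_cnv_frame()


def test_chrm_is_present_in_raw_data(cnv_frame):
    assert "chrM" in set(cnv_frame["Chrom"])


def test_chrm_can_be_filtered_out(cnv_frame):
    filtered = cnv_frame[cnv_frame["Chrom"] != "chrM"]
    assert list(filtered["Chrom"]) == ["chr1"]
```

The benefit over building the DataFrame inside each test is that the setup lives in one place, and a test that modifies its copy cannot affect the others.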

I also wanted to double check that I've done all of the steps for packaging.

Correct. Your repository is a pip installable python package!

Is it OK if tests are passing in pytest but not in CircleCI?

Not really. Local environments change. People's setups are different. Having online tests allows several people to have an identical system to confirm their tests are working.

I'm not certain but I think your tests are passing locally because I'm guessing you have a Mac laptop (which has a case insensitive file system). So when tests_dir is defined as "data" it finds it, even though it is named "Data". The online tests are on a linux system that doesn't make that odd logical jump.

Change it in either the directory name or the test module and you will pass most of the tests on CircleCI. You have another test that is failing apart from that. You should see it if you type pytest tests locally.
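One way to catch this kind of mismatch locally, sketched under the assumption that the goal is simply to confirm a directory name is spelled with the exact same case as on disk. The helper `has_exact_name` is hypothetical, and the demonstration uses a throwaway temporary directory rather than the repository:

```python
import tempfile
from pathlib import Path


def has_exact_name(parent, name):
    """Return True only if `parent` contains an entry spelled exactly `name`."""
    return name in {entry.name for entry in Path(parent).iterdir()}


# Demonstration in a throwaway directory that contains "Data" (capital D).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Data").mkdir()
    found_exact = has_exact_name(tmp, "Data")
    found_lowercase = has_exact_name(tmp, "data")
```

Because the check compares against the names the file system actually lists, it gives the same answer on case-insensitive and case-sensitive systems, unlike `Path.exists()`.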

If/when I'm ready to submit my final project, do I just push my final changes to my repository?

Correct. The version in your repo at noon tomorrow is what will be graded.

avapapetti · 2020-05-08T02:13:16Z

That one is on me. I realize my comment in the Google Doc was misleading, sorry about that!

There seem to be discrepancies between what's in the Data directory locally and what's in the Data directory in my repository. When I try to commit changes, I keep getting things like this:

Also, I'm unsure where I should be running pytest tests.

avapapetti · 2020-05-08T02:15:47Z

Update: I figured out pytest tests. It was the example test you had added-- do I need to keep that file or should I delete it?

leej3 · 2020-05-08T02:17:14Z

That's not a problem. .ipynb_checkpoints was added to .gitignore. That tells git to ignore it. But that directory had already been added. So now you are in a weird state where you struggle to add changes that occur in that directory. It shouldn't be in the repository anyway.

Update: I figured out pytest tests. It was the example test you had added-- do I need to keep that file or should I delete it?

You should absolutely delete it. It's in your way to getting that glorious green tick beside your pull request. It's also a demonstration that "passing all of the tests" is not necessarily a good thing.

avapapetti · 2020-05-08T02:25:19Z

Oh my goodness there's the green check mark! Yay! The sample files I have inside the Data directory are up to date anyway, so there's nothing more I need to do there right? I really appreciate your prompt replies by the way.

leej3 · 2020-05-08T03:00:46Z

Oh my goodness there's the green check mark! Yay!

Glorious, isn't it? Well done. For the record, adding .DS_Store, .ipynb_checkpoints, *.egg-info, and __pycache__ to .gitignore is a good idea, along with any other superfluous files. Not a big deal for this project but worth keeping in mind.

But yes, all done.

avapapetti · 2020-05-08T03:04:35Z

Thank you! That is good to know, I'll try to do that now. And thanks for all your help and insight!

avapapetti · 2020-05-08T03:59:36Z

So sorry it's late but one last thing-- I just tried to pull my repository and I got this:

From https://github.com/avapapetti/project_spring_2020
 * branch            HEAD       -> FETCH_HEAD
fatal: refusing to merge unrelated histories

That won't affect anything when you compile my code tomorrow will it?

leej3 · 2020-05-08T13:25:13Z

You are all good. Working tests on CircleCI are hard to argue with.

I think it may be just an issue with the command you ran. Or you ran it from a different git repository perhaps.

emilyyaklich · 2020-05-09T13:56:07Z

Hi Ava,

You did a really great job on this project. Pandas is a great tool when working with datasets and you employed it nicely. Also, the unit tests for your dataframes are great as well. In the future, I think it is good practice to add the .ipynb_checkpoints files to your .gitignore file. Overall, very well done!

Cheers,

Emily

avapapetti added 7 commits February 27, 2020 20:15

Added project

a306954

first draft commit

d458e3b

second draft commit

3ba9f42

reupload of first draft

a61cabc

updated project draft

5e12d17

Project update

13ba497

Test draft1

ddb3a00

avapapetti added 5 commits May 6, 2020 17:13

add setup

2566b37

updated README

3087965

Delete Project_withmethods.ipynb

9ed3c89

Delete sample_file.py

0cb550a

Delete Project2_Update.ipynb

14138b6

leej3 added 6 commits May 6, 2020 22:10

add setup.py and rename directory containing packages code

7cd3fff

move code to module

5bd0f83

move test code to module

2bc3d57

add init file so that installation works

b0fc239

make into functions

a925220

add an example pattern for a test

0001b0d

Merge pull request #1 from leej3/turn_into_package_with_functions

9c18ed0

some pointers

avapapetti added 2 commits May 7, 2020 20:26

add test for read_cnv function

be38560

add year and author to license

7684cd8

avapapetti added 5 commits May 7, 2020 21:50

updated data directory

0268076

add overlap argument

3ec3f5b

update common_cnv_finder

bca14cb

updat .egg

018a784

update data directory

f3d78d8

removed sample_test

1849dc7

avapapetti added 4 commits May 7, 2020 22:32

reorganize tests

ca9b782

reorganize fucntion

12cbe06

update stup description

d2cf98e

Delete Project_withforloop.ipynb

b89551b

avapapetti added 2 commits May 7, 2020 23:01

update README

d5a3412

Update README.md

5dbb8e6

commit test_output

ff81037

Delete test_output.csv

d7fed1c

avapapetti added 3 commits May 25, 2020 11:25

remove file

8af2058

remove .egg

c6630b5

update gitignore

f9cbdda
