Skip to content

Modified namedb.py and annotate-seqs.py to work with files supplied via ...#4

Open
RamRS wants to merge 3 commits intoctb:masterfrom
RamRS:master
Open

Modified namedb.py and annotate-seqs.py to work with files supplied via ...#4
RamRS wants to merge 3 commits intoctb:masterfrom
RamRS:master

Conversation

@RamRS
Copy link

@RamRS RamRS commented Aug 23, 2013

...command line to annotate-seqs.py

mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py.
Now, it picks up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to
annotate-seqs.py

Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py
that use a different command line argument pattern

…ia command line to annotate-seqs.py

mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py.
Now, it picks up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to
annotate-seqs.py

Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py
that use a different command line argument pattern
@ctb
Copy link
Owner

ctb commented Aug 24, 2013

Ram, thanks for doing, I'll take a look. Can you submit a similar pull request updating the khmer-protocols tutorial, please?

@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

Sure! First thing tomorrow!

Ram

On Sat, Aug 24, 2013 at 5:06 PM, C. Titus Brown notifications@github.comwrote:

Ram, thanks for doing, I'll take a look. Can you submit a similar pull
request updating the khmer-protocols tutorial, please?


Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23216643
.

@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

@ctb : The tutorial is not wired up the the github repo. I'm not sure how to work on the linking, any help would be appreciated!

@ctb
Copy link
Owner

ctb commented Aug 25, 2013

See Khmer-protocols repository.


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 8:36, Ram RS notifications@github.com wrote:

@ctb : The tutorial is not wired up the the github repo. I'm not sure how to work on the linking, any help would be appreciated!


Reply to this email directly or view it on GitHub.

@ctb
Copy link
Owner

ctb commented Aug 25, 2013

One immediate comment: 'namedb' is an imported script and should not reference sys.argv. If we want to have the database name be an argument, the proper way to handle this would be to have a function that you use to load the pickles, I think.

I also don't like making commands even more complicated :).

So, I'm leaning towards doing this differently -- you should never need multiple namedbs in a given directory, so why not have 'names.db' which is a pickle that also contains a reference to the sequence file being used? Then you set up the names.db once, with the specific sequence file,

make-namedb.py mouse.protein.faa

and all the info goes into a standard location, 'names.db', which contains a reference to 'mouse.protein.faa'.

Whaddya think?

@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

I was thinking the exact same thing, just did not know how to express it. Like you say, scripts that are not directly directly invoked should not use sys.argv.

I agree, we could introduce an intermediate step to create the names file.

Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it refers to a file with the output of a serialization operation?

@ctb
Copy link
Owner

ctb commented Aug 25, 2013

On Sun, Aug 25, 2013 at 08:05:12AM -0700, Ram RS wrote:

I was thinking the exact same thing, just did not know how to express it. Like you say, scripts that are not directly directly invoked should not use sys.argv.

I agree, we could introduce an intermediate step to create the names file.

We already have one, right? make-namedb.py. So let's just modify it...

Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it refers to a file with the output of a serialization operation?

http://docs.python.org/2/library/pickle.html

--titus

C. Titus Brown, ctb@msu.edu

@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

I am reading that and a couple of other docs too, just not fluent enough to
use it and describe stuff. Thanks though!

Ram

On Sun, Aug 25, 2013 at 11:07 AM, C. Titus Brown
notifications@github.comwrote:

On Sun, Aug 25, 2013 at 08:05:12AM -0700, Ram RS wrote:

I was thinking the exact same thing, just did not know how to express
it. Like you say, scripts that are not directly directly invoked should not
use sys.argv.

I agree, we could introduce an intermediate step to create the names
file.

We already have one, right? make-namedb.py. So let's just modify it...

Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it
refers to a file with the output of a serialization operation?

http://docs.python.org/2/library/pickle.html

--titus

C. Titus Brown, ctb@msu.edu


Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23229260
.

@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

Just clarifying, we
a. use a names.db instead of the mouse.namedb (or do we retain the mouse.namedb but create a new names.db to point to both mouse.namedb and mouse.protein.faa?)
b. if a is true, we push the seq file name as the first/last line of the names.db

@ctb
Copy link
Owner

ctb commented Aug 25, 2013

A seems simplest. And I would just pickle the name of the sequence database as the first dump into the names.db.


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 11:54, Ram RS notifications@github.com wrote:

Just clarifying, we
a. use a names.db instead of the mouse.namedb (or do we retain the mouse.namedb but create a new names.db to point to both mouse.namedb and mouse.protein.faa?)
b. if a is true, we push the seq file name as the first/last line of the names.db


Reply to this email directly or view it on GitHub.

@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

Alright, working on that now. Just a quick question - why not dump names
and fullnames in the same file? Use a
dump(dict(seq=seqFile,names=namesArray,fullnames=fullNameArray),fp)

and load into a variable (say, "data") and access using data["names"] and
data["fullnames"]?

Ram

On Sun, Aug 25, 2013 at 12:57 PM, C. Titus Brown
notifications@github.comwrote:

A seems simplest. And I would just pickle the name of the sequence
database as the first dump into the names.db.


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 11:54, Ram RS notifications@github.com wrote:

Just clarifying, we
a. use a names.db instead of the mouse.namedb (or do we retain the
mouse.namedb but create a new names.db to point to both mouse.namedb and
mouse.protein.faa?)
b. if a is true, we push the seq file name as the first/last line of the
names.db


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23231184
.

file and constant namedb files

make-namedb changed to pickle into constant names.db and fullnames.db
files, and pickle sequence filename into names.db. namedb.py modified to
unpickle correspondingly. Lines expecting extra CMD line args in
annotate-seqs removed.
@RamRS
Copy link
Author

RamRS commented Aug 25, 2013

These changes are being overridden by the next commit, so I'm closing this pull request. I'll open a new pr on the latest commit shortly.

@RamRS RamRS closed this Aug 25, 2013
@ctb
Copy link
Owner

ctb commented Aug 26, 2013

No, please continue on this pr. If you find yourself wanting to close a pr, please don't! They record valuable history and conversation!


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:

These changes are being overridden by the next commit, so I'm closing this pull request. I'll open a new pr on the latest commit shortly.


Reply to this email directly or view it on GitHub.

@RamRS
Copy link
Author

RamRS commented Aug 26, 2013

But closed prs exist too, don't they?

Ram

On Sun, Aug 25, 2013 at 8:02 PM, C. Titus Brown notifications@github.comwrote:

No, please continue on this pr. If you find yourself wanting to close a
pr, please don't! They record valuable history and conversation!


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:

These changes are being overridden by the next commit, so I'm closing
this pull request. I'll open a new pr on the latest commit shortly.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23238284
.

@ctb
Copy link
Owner

ctb commented Aug 26, 2013

A closed pr is an idea we've totally abandoned, or one we've merged.


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 20:26, Ram RS notifications@github.com wrote:

But closed prs exist too, don't they?

Ram

On Sun, Aug 25, 2013 at 8:02 PM, C. Titus Brown notifications@github.comwrote:

No, please continue on this pr. If you find yourself wanting to close a
pr, please don't! They record valuable history and conversation!


C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:

These changes are being overridden by the next commit, so I'm closing
this pull request. I'll open a new pr on the latest commit shortly.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23238284
.


Reply to this email directly or view it on GitHub.

@RamRS
Copy link
Author

RamRS commented Aug 26, 2013

Alright, I'll reopen the request, but we're no longer working on the version referenced by the pr - just a note!

@RamRS RamRS reopened this Aug 26, 2013
@ctb
Copy link
Owner

ctb commented Aug 26, 2013

You should be working on the version referenced by the PR; that's kinda the point. Fix the version referenced by the PR, don't create a new version or branch.

@RamRS
Copy link
Author

RamRS commented Aug 26, 2013

Got it, will do from the next time I work on this - I'm reading up on Git
usage, please excuse the lapses - they will positively go down rapidly over
a very short time!

Ram

On Sun, Aug 25, 2013 at 9:58 PM, C. Titus Brown notifications@github.comwrote:

You should be working on the version referenced by the PR; that's kinda
the point. Fix the version referenced by the PR, don't create a new version
or branch.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23240390
.

@ctb
Copy link
Owner

ctb commented Aug 26, 2013

np.

On Sun, Aug 25, 2013 at 07:15:33PM -0700, Ram RS wrote:

Got it, will do from the next time I work on this - I'm reading up on Git
usage, please excuse the lapses - they will positively go down rapidly over
a very short time!

Ram

On Sun, Aug 25, 2013 at 9:58 PM, C. Titus Brown notifications@github.comwrote:

You should be working on the version referenced by the PR; that's kinda
the point. Fix the version referenced by the PR, don't create a new version
or branch.

???
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23240390
.


Reply to this email directly or view it on GitHub:
#4 (comment)

C. Titus Brown, ctb@msu.edu

dict() used in previous commit has been removed, sequential calls to
dump() and load() are used instead. A variable 'seqFile' id used to
store sequence file name from CMD line args.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants