Modified namedb.py and annotate-seqs.py to work with files supplied via ...#4
Modified namedb.py and annotate-seqs.py to work with files supplied via ...#4RamRS wants to merge 3 commits intoctb:masterfrom
Conversation
…ia command line to annotate-seqs.py mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py. Now, it picks up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to annotate-seqs.py Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py that use a different command line argument pattern
|
Ram, thanks for doing, I'll take a look. Can you submit a similar pull request updating the khmer-protocols tutorial, please? |
|
Sure! First thing tomorrow! Ram On Sat, Aug 24, 2013 at 5:06 PM, C. Titus Brown notifications@github.comwrote:
|
|
@ctb : The tutorial is not wired up the the github repo. I'm not sure how to work on the linking, any help would be appreciated! |
|
See Khmer-protocols repository. C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 8:36, Ram RS notifications@github.com wrote:
|
|
One immediate comment: 'namedb' is an imported script and should not reference sys.argv. If we want to have the database name be an argument, the proper way to handle this would be to have a function that you use to load the pickles, I think. I also don't like making commands even more complicated :). So, I'm leaning towards doing this differently -- you should never need multiple namedbs in a given directory, so why not have 'names.db' which is a pickle that also contains a reference to the sequence file being used? Then you set up the names.db once, with the specific sequence file, make-namedb.py mouse.protein.faa and all the info goes into a standard location, 'names.db', which contains a reference to 'mouse.protein.faa'. Whaddya think? |
|
I was thinking the exact same thing, just did not know how to express it. Like you say, scripts that are not directly directly invoked should not use sys.argv. I agree, we could introduce an intermediate step to create the names file. Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it refers to a file with the output of a serialization operation? |
|
On Sun, Aug 25, 2013 at 08:05:12AM -0700, Ram RS wrote:
We already have one, right? make-namedb.py. So let's just modify it...
http://docs.python.org/2/library/pickle.html --titusC. Titus Brown, ctb@msu.edu |
|
I am reading that and a couple of other docs too, just not fluent enough to Ram On Sun, Aug 25, 2013 at 11:07 AM, C. Titus Brown
|
|
Just clarifying, we |
|
A seems simplest. And I would just pickle the name of the sequence database as the first dump into the names.db. C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 11:54, Ram RS notifications@github.com wrote:
|
|
Alright, working on that now. Just a quick question - why not dump names and load into a variable (say, "data") and access using data["names"] and Ram On Sun, Aug 25, 2013 at 12:57 PM, C. Titus Brown
|
file and constant namedb files make-namedb changed to pickle into constant names.db and fullnames.db files, and pickle sequence filename into names.db. namedb.py modified to unpickle correspondingly. Lines expecting extra CMD line args in annotate-seqs removed.
|
These changes are being overridden by the next commit, so I'm closing this pull request. I'll open a new pr on the latest commit shortly. |
|
No, please continue on this pr. If you find yourself wanting to close a pr, please don't! They record valuable history and conversation! C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:
|
|
But closed prs exist too, don't they? Ram On Sun, Aug 25, 2013 at 8:02 PM, C. Titus Brown notifications@github.comwrote:
|
|
A closed pr is an idea we've totally abandoned, or one we've merged. C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 20:26, Ram RS notifications@github.com wrote:
|
|
Alright, I'll reopen the request, but we're no longer working on the version referenced by the pr - just a note! |
|
You should be working on the version referenced by the PR; that's kinda the point. Fix the version referenced by the PR, don't create a new version or branch. |
|
Got it, will do from the next time I work on this - I'm reading up on Git Ram On Sun, Aug 25, 2013 at 9:58 PM, C. Titus Brown notifications@github.comwrote:
|
|
np. On Sun, Aug 25, 2013 at 07:15:33PM -0700, Ram RS wrote:
C. Titus Brown, ctb@msu.edu |
dict() used in previous commit has been removed, sequential calls to dump() and load() are used instead. A variable 'seqFile' id used to store sequence file name from CMD line args.
...command line to annotate-seqs.py
mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py.
Now, it picks up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to
annotate-seqs.py
Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py
that use a different command line argument pattern