Modified namedb.py and annotate-seqs.py to work with files supplied via ... by RamRS · Pull Request #4 · ctb/eel-pond

RamRS · 2013-08-23T18:44:50Z

...command line to annotate-seqs.py

mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py.
Now, it picks up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to
annotate-seqs.py

Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py
that use a different command line argument pattern

…ia command line to annotate-seqs.py mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py. Now, it picks up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to annotate-seqs.py Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py that use a different command line argument pattern

ctb · 2013-08-24T21:06:11Z

Ram, thanks for doing, I'll take a look. Can you submit a similar pull request updating the khmer-protocols tutorial, please?

RamRS · 2013-08-25T05:22:35Z

Sure! First thing tomorrow!

Ram

On Sat, Aug 24, 2013 at 5:06 PM, C. Titus Brown notifications@github.comwrote:

Ram, thanks for doing, I'll take a look. Can you submit a similar pull
request updating the khmer-protocols tutorial, please?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23216643
.

RamRS · 2013-08-25T12:36:39Z

@ctb : The tutorial is not wired up the the github repo. I'm not sure how to work on the linking, any help would be appreciated!

ctb · 2013-08-25T12:48:22Z

See Khmer-protocols repository.

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 8:36, Ram RS notifications@github.com wrote:

@ctb : The tutorial is not wired up the the github repo. I'm not sure how to work on the linking, any help would be appreciated!

—
Reply to this email directly or view it on GitHub.

ctb · 2013-08-25T14:58:02Z

One immediate comment: 'namedb' is an imported script and should not reference sys.argv. If we want to have the database name be an argument, the proper way to handle this would be to have a function that you use to load the pickles, I think.

I also don't like making commands even more complicated :).

So, I'm leaning towards doing this differently -- you should never need multiple namedbs in a given directory, so why not have 'names.db' which is a pickle that also contains a reference to the sequence file being used? Then you set up the names.db once, with the specific sequence file,

make-namedb.py mouse.protein.faa

and all the info goes into a standard location, 'names.db', which contains a reference to 'mouse.protein.faa'.

Whaddya think?

RamRS · 2013-08-25T15:05:12Z

I was thinking the exact same thing, just did not know how to express it. Like you say, scripts that are not directly directly invoked should not use sys.argv.

I agree, we could introduce an intermediate step to create the names file.

Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it refers to a file with the output of a serialization operation?

ctb · 2013-08-25T15:07:57Z

On Sun, Aug 25, 2013 at 08:05:12AM -0700, Ram RS wrote:

I was thinking the exact same thing, just did not know how to express it. Like you say, scripts that are not directly directly invoked should not use sys.argv.

I agree, we could introduce an intermediate step to create the names file.

We already have one, right? make-namedb.py. So let's just modify it...

Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it refers to a file with the output of a serialization operation?

http://docs.python.org/2/library/pickle.html

--titus

C. Titus Brown, ctb@msu.edu

RamRS · 2013-08-25T15:11:47Z

I am reading that and a couple of other docs too, just not fluent enough to
use it and describe stuff. Thanks though!

Ram

On Sun, Aug 25, 2013 at 11:07 AM, C. Titus Brown
notifications@github.comwrote:

On Sun, Aug 25, 2013 at 08:05:12AM -0700, Ram RS wrote:

I was thinking the exact same thing, just did not know how to express
it. Like you say, scripts that are not directly directly invoked should not
use sys.argv.

I agree, we could introduce an intermediate step to create the names
file.

We already have one, right? make-namedb.py. So let's just modify it...

Also, I'm unfamiliar with the term "pickle (n)". Am I right in saying it
refers to a file with the output of a serialization operation?

http://docs.python.org/2/library/pickle.html

--titus

C. Titus Brown, ctb@msu.edu

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23229260
.

RamRS · 2013-08-25T15:54:20Z

Just clarifying, we
a. use a names.db instead of the mouse.namedb (or do we retain the mouse.namedb but create a new names.db to point to both mouse.namedb and mouse.protein.faa?)
b. if a is true, we push the seq file name as the first/last line of the names.db

ctb · 2013-08-25T16:57:43Z

A seems simplest. And I would just pickle the name of the sequence database as the first dump into the names.db.

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 11:54, Ram RS notifications@github.com wrote:

Just clarifying, we
a. use a names.db instead of the mouse.namedb (or do we retain the mouse.namedb but create a new names.db to point to both mouse.namedb and mouse.protein.faa?)
b. if a is true, we push the seq file name as the first/last line of the names.db

—
Reply to this email directly or view it on GitHub.

RamRS · 2013-08-25T21:00:52Z

Alright, working on that now. Just a quick question - why not dump names
and fullnames in the same file? Use a
dump(dict(seq=seqFile,names=namesArray,fullnames=fullNameArray),fp)

and load into a variable (say, "data") and access using data["names"] and
data["fullnames"]?

Ram

On Sun, Aug 25, 2013 at 12:57 PM, C. Titus Brown
notifications@github.comwrote:

A seems simplest. And I would just pickle the name of the sequence
database as the first dump into the names.db.

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 11:54, Ram RS notifications@github.com wrote:

Just clarifying, we
a. use a names.db instead of the mouse.namedb (or do we retain the
mouse.namedb but create a new names.db to point to both mouse.namedb and
mouse.protein.faa?)
b. if a is true, we push the seq file name as the first/last line of the
names.db

—
Reply to this email directly or view it on GitHub.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23231184
.

file and constant namedb files make-namedb changed to pickle into constant names.db and fullnames.db files, and pickle sequence filename into names.db. namedb.py modified to unpickle correspondingly. Lines expecting extra CMD line args in annotate-seqs removed.

RamRS · 2013-08-25T22:16:14Z

These changes are being overridden by the next commit, so I'm closing this pull request. I'll open a new pr on the latest commit shortly.

ctb · 2013-08-26T00:02:08Z

No, please continue on this pr. If you find yourself wanting to close a pr, please don't! They record valuable history and conversation!

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:

These changes are being overridden by the next commit, so I'm closing this pull request. I'll open a new pr on the latest commit shortly.

—
Reply to this email directly or view it on GitHub.

RamRS · 2013-08-26T00:26:04Z

But closed prs exist too, don't they?

Ram

On Sun, Aug 25, 2013 at 8:02 PM, C. Titus Brown notifications@github.comwrote:

No, please continue on this pr. If you find yourself wanting to close a
pr, please don't! They record valuable history and conversation!

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:

These changes are being overridden by the next commit, so I'm closing
this pull request. I'll open a new pr on the latest commit shortly.

—
Reply to this email directly or view it on GitHub.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23238284
.

ctb · 2013-08-26T01:00:35Z

A closed pr is an idea we've totally abandoned, or one we've merged.

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 20:26, Ram RS notifications@github.com wrote:

But closed prs exist too, don't they?

Ram

On Sun, Aug 25, 2013 at 8:02 PM, C. Titus Brown notifications@github.comwrote:

No, please continue on this pr. If you find yourself wanting to close a
pr, please don't! They record valuable history and conversation!

C. Titus Brown, ctb@msu.edu

On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:

These changes are being overridden by the next commit, so I'm closing
this pull request. I'll open a new pr on the latest commit shortly.

—
Reply to this email directly or view it on GitHub.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23238284
.

—
Reply to this email directly or view it on GitHub.

RamRS · 2013-08-26T01:10:45Z

Alright, I'll reopen the request, but we're no longer working on the version referenced by the pr - just a note!

ctb · 2013-08-26T01:58:16Z

You should be working on the version referenced by the PR; that's kinda the point. Fix the version referenced by the PR, don't create a new version or branch.

RamRS · 2013-08-26T02:15:33Z

Got it, will do from the next time I work on this - I'm reading up on Git
usage, please excuse the lapses - they will positively go down rapidly over
a very short time!

Ram

On Sun, Aug 25, 2013 at 9:58 PM, C. Titus Brown notifications@github.comwrote:

You should be working on the version referenced by the PR; that's kinda
the point. Fix the version referenced by the PR, don't create a new version
or branch.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23240390
.

ctb · 2013-08-26T02:22:30Z

np.

On Sun, Aug 25, 2013 at 07:15:33PM -0700, Ram RS wrote:

Got it, will do from the next time I work on this - I'm reading up on Git
usage, please excuse the lapses - they will positively go down rapidly over
a very short time!

Ram

On Sun, Aug 25, 2013 at 9:58 PM, C. Titus Brown notifications@github.comwrote:

You should be working on the version referenced by the PR; that's kinda
the point. Fix the version referenced by the PR, don't create a new version
or branch.

???
Reply to this email directly or view it on GitHubhttps://github.com//pull/4#issuecomment-23240390
.

Reply to this email directly or view it on GitHub:
#4 (comment)

C. Titus Brown, ctb@msu.edu

dict() used in previous commit has been removed, sequential calls to dump() and load() are used instead. A variable 'seqFile' id used to store sequence file name from CMD line args.

RamRS mentioned this pull request Aug 23, 2013

old version of namedb referenced+used in annotate-partitions-with-mouse.py #5

Open

RamRS mentioned this pull request Aug 25, 2013

Modified annotate-seqs.py step in Building Transcript Families to confor... dib-lab/khmer-protocols#6

Closed

RamRS closed this Aug 25, 2013

RamRS reopened this Aug 26, 2013

Pickling no longer uses dict. var seqFile stores seq file name

eb177a4

dict() used in previous commit has been removed, sequential calls to dump() and load() are used instead. A variable 'seqFile' id used to store sequence file name from CMD line args.

Conversation

RamRS commented Aug 23, 2013

Uh oh!

ctb commented Aug 24, 2013

Uh oh!

RamRS commented Aug 25, 2013

Uh oh!

RamRS commented Aug 25, 2013

Uh oh!

ctb commented Aug 25, 2013

Uh oh!

ctb commented Aug 25, 2013

Uh oh!

RamRS commented Aug 25, 2013

Uh oh!

ctb commented Aug 25, 2013

--titus

Uh oh!

RamRS commented Aug 25, 2013

--titus

Uh oh!

RamRS commented Aug 25, 2013

Uh oh!

ctb commented Aug 25, 2013

Uh oh!

RamRS commented Aug 25, 2013

Uh oh!

RamRS commented Aug 25, 2013

Uh oh!

ctb commented Aug 26, 2013

Uh oh!

RamRS commented Aug 26, 2013

Uh oh!

ctb commented Aug 26, 2013

Uh oh!

RamRS commented Aug 26, 2013

Uh oh!

ctb commented Aug 26, 2013

Uh oh!

RamRS commented Aug 26, 2013

Uh oh!

ctb commented Aug 26, 2013

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants