WIP ADAM queries. by fnothaft · Pull Request #6 · Steven-N-Hart/VariantDB_Challenge

fnothaft · 2017-01-05T00:51:14Z

Don't have these quite done, but they're 99% there. I've been running these in the Spark shell. Query 2 (gVCF merge) is done and tested. Query 1 (integrate across/within populations) is done, but I'm debugging a serialization error that the Spark shell gives me. The scripts are pretty self contained, but I will be adding a README with some minor implementation details. I'm also going to be going through and improving the time measurement code. @Steven-N-Hart let me know what information you would need from me to make maximum use of this.

BTW, this depends on a few features that we are triaging into the next release of ADAM, which should be out very soon. They are:

fnothaft · 2017-01-05T00:51:46Z

CC @heuermh @jpdna

drachimera · 2017-01-05T02:53:31Z

Awesome sauce!

…

Sent from my iPhone

On Jan 4, 2017, at 6:51 PM, Frank Austin Nothaft ***@***.***> wrote: CC @heuermh @jpdna — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

heuermh

LGTM, added a few comments for discussion.

heuermh · 2017-01-05T15:19:00Z

submissions/bdgenomics/scripts/2_continuous.scala

+
+// reload datasets and cache
+val na12878Gts = sc.loadGenotypes("%s/NA12878.gt.adam".format(dataDir)).transform(_.cache())
+val otherGts = sc.loadGenotypes("%s/NA1289*.gt.adam/*".format(dataDir)).transform(_.cache())


Also asked this elsewhere, why the /* in the glob pattern?

heuermh · 2017-01-05T15:22:27Z

submissions/bdgenomics/scripts/2_continuous.scala

+	// and dumping a bad GQ
+	//
+	// since these lines are obviously strong HomRef calls, let's recover them by parsing PL's
+        if (gt.getVariant.getAlternateAllele == null) {


In converting from VCF, we go from <NON_REF> symbolic allele to null variant alternateAllele? For going back to VCF, is there any other reason why variant alternateAllele might be null? Do we write "<NON_REF>" when variant alternateAllele is null or do we rely on htsjdk to do that for us?

heuermh · 2017-01-05T15:30:14Z

submissions/bdgenomics/scripts/1_integration.scala

+      bcastSampleToPopulationMap.value
+       .get(gt.getSampleId)
+       .map(population => (population, gt))
+    }).filter(kv => {


I wonder if there would be utility in filterBy method(s) on VariantRDD/GenotypeRDD to make this easier to read? The filter would need to happen before the flatMap to (population, gt) then. Is there much performance difference between a compound filter like this with lots of expressions and a series of filter calls with smaller expressions?

heuermh · 2017-01-05T15:32:29Z

submissions/bdgenomics/scripts/2_continuous.scala

+/**
+ *
+ */
+val dataDir = "/data/variant_db/2"


Could you add some metrics to this script?

Steven-N-Hart · 2017-01-18T16:37:50Z

Thanks @fnothaft and @heuermh ! Should I accept now or wait until you've completed? I will at least need a README (since I'm not a Scala person).

fnothaft · 2017-01-18T16:42:25Z

Hi @Steven-N-Hart! I think things are complete from my side, but I was considering going through and doing a bit of code cleanup, possible repackaging. Let me square with @heuermh this morning and we'll figure out next steps. Beyond the README that we owe you, what would be helpful for you?

Steven-N-Hart · 2017-01-19T18:10:06Z

@fnothaft ,
The README should be sufficient. If I can't complete the analysis or I find a missing dependancy, etc., I'll ping you.

WIP ADAM queries.

b763832

heuermh reviewed Jan 5, 2017

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP ADAM queries.#6

WIP ADAM queries.#6
fnothaft wants to merge 1 commit intoSteven-N-Hart:masterfrom
fnothaft:adam

fnothaft commented Jan 5, 2017

Uh oh!

fnothaft commented Jan 5, 2017

Uh oh!

drachimera commented Jan 5, 2017 via email

Uh oh!

heuermh left a comment

Uh oh!

heuermh Jan 5, 2017

Uh oh!

heuermh Jan 5, 2017

Uh oh!

heuermh Jan 5, 2017

Uh oh!

heuermh Jan 5, 2017

Uh oh!

Steven-N-Hart commented Jan 18, 2017

Uh oh!

fnothaft commented Jan 18, 2017

Uh oh!

Steven-N-Hart commented Jan 19, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

fnothaft commented Jan 5, 2017

Uh oh!

fnothaft commented Jan 5, 2017

Uh oh!

drachimera commented Jan 5, 2017 via email

Uh oh!

heuermh left a comment

Choose a reason for hiding this comment

Uh oh!

heuermh Jan 5, 2017

Choose a reason for hiding this comment

Uh oh!

heuermh Jan 5, 2017

Choose a reason for hiding this comment

Uh oh!

heuermh Jan 5, 2017

Choose a reason for hiding this comment

Uh oh!

heuermh Jan 5, 2017

Choose a reason for hiding this comment

Uh oh!

Steven-N-Hart commented Jan 18, 2017

Uh oh!

fnothaft commented Jan 18, 2017

Uh oh!

Steven-N-Hart commented Jan 19, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants