Add "headless" mode to metis_client #724

arkal · 2022-01-19T23:14:08Z

The current schema for printing output in metis_client is based
on a standard interactive terminal. This is not relevant when one
is attempting to automate multiple commands in one session.
For example,

get/put both implement carriage-return-based progress updates
that render terrible when piping output to a file

> project mvir1
mvir1:> 
mvir1:> cd data/
mvir1:data> put /krummellab/data1/staging/COMET_Viral/batch_11/process_batch_11/ ^Mbulk_RNASeq/ .
^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 3.78kB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 7.52MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 10.06MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 10.94MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 12.0MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 12.86MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 12.99MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.36MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.37MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.6MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.81MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.13MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.23MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.41MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.44MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.62MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.64MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.72MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.8MB/s^M

ls uses the COLUMNS environment variable to make the output prettier, but if i'm piping to a file i'd like to see full filenames/paths and fields.
Nuke uses the same progress bar as get/put to show progress on files md5checked.

This PR implements a --headless flag to metis_client that modifies the behavior of metis_client slightly, to make for
better output when the result is piped into a logfile.

arkal · 2022-01-20T01:02:51Z

Sweet, the tests all passed, so this isn't breaking the original behavior. @corps I'm not sure how to go about testing this stuff without duplicating almost all of client_spec.rb . Let me know what you think?

coleshaw · 2022-01-20T14:07:35Z

metis/bin/metis_client

    attr_reader :options

    def initialize(shell)
+      @headless = ENV['METIS_headless'] == 'true' ? true : false


First, thanks so much for putting in a PR -- this is awesome!

super-nit: is the convention for environment variables to be in all caps? i.e. METIS_HEADLESS

haha yeah that's a good point.... it should be ALL_CAPS. Will fix in the next version!

graft · 2022-01-20T17:27:50Z

This is a great idea, thanks for writing this. Two points:

This relies on a global flag, which is a concept that heretofore has not existed in metis_client (previously arguments are parsed by individual commands); I'm dubious about this introduction since it creates some ambiguity about where options are being applied, in what order they should be specified, etc. It is also somewhat irritating to have to remember to to specify --headless every time you want to pipe to a file. Instead I might rely on STDOUT.isatty to determine headless? and avoid specifying an option at all.
The use of ENV["COLUMNS"] is somewhat problematic here. When you are running in a shell you inherit the local ENV, and so you get a value for ENV["COLUMNS"]. However, if you simply run the script non-interactively, because "COLUMNS" is not exported by default, it does not appear in the ENV of the script so ENV["COLUMNS"].to_i will be 0. This should result in metis_client behaving like ls does when piping (e.g., ls | less returns a single column of files), but this bears some investigation. No need to fear long file names; they will not be truncated.

arkal · 2022-01-20T22:41:09Z

Hmm. Looks like i was mistaken about point 2 and lines are NOT truncated. Maybe i recalled wrong from a previous use of the command.

[arrao@c4-dev2 process_batch_11]$ cat foo.out 
> project mvir1
mvir1:> 
mvir1:> ls --long /data/bulk_RNASeq/processed/MVIR1-HS59-D0PBMC1-RSQ2
mvir1:> arjunarkal.rao@ucsf.edu      24434 Nov 24  2021 archived MVIR1-HS59-D0PBMC1-RSQ2_from_filtered.merged.biallelic.maxMissin1.rmIndel.rsOnly.maf0.05.min-meanDP2.recode.vcf.gz.tbi
arjunarkal.rao@ucsf.edu     171441 Nov 24  2021 archived     MVIR1-HS59-D0PBMC1-RSQ2_from_filtered.merged.biallelic.maxMissin1.rmIndel.rsOnly.maf0.05.min-meanDP2.recode.vcf.gz
arjunarkal.rao@ucsf.edu    1376815 Nov 24  2021 archived                                                                           MVIR1-HS59-D0PBMC1-RSQ2_ReadsPerGene.out.tab
arjunarkal.rao@ucsf.edu      21792 Nov 24  2021 archived                                                                         MVIR1-HS59-D0PBMC1-RSQ2_marked_dup_metrics.txt
arjunarkal.rao@ucsf.edu       2035 Nov 24  2021 archived                                                                                  MVIR1-HS59-D0PBMC1-RSQ2_Log.final.out
arjunarkal.rao@ucsf.edu 2485875680 Nov 24  2021 archived                                  MVIR1-HS59-D0PBMC1-RSQ2_Aligned.sortedByCoord.duplicateRemoved.withReadGroups.out.bam

regarding using STDOUT.isatty, that's pretty cool I like that much better! I'll check it out and resubmit this.

arkal · 2022-01-20T22:42:37Z

Also, just as a quick question, do you guys want me to squash + rebase + force push, or can i have multiple commits on the branch? The lead on my old collaborative repo would make us do that so the commit history would be super clean

graft · 2022-01-20T23:00:14Z

Also, just as a quick question, do you guys want me to squash + rebase + force push, or can i have multiple commits on the branch?

We are not usually picky. Multiple commits is fine, squashing and rebasing is fine, merging in master is fine.

1. Add `--headless` as a client argument that makes some changes to output including increasing line size in ls (via increasing term size to an arbitrarily large number, 1000), and updating the get/put progress to only write 1 line on completion of the action. 2. rename `progress` to `print_progress` and refactor to move print statements into the function. 3. Add a "verbose" argument to print_progress to implement (1) above 4. Refactored all calls to progress to appropriately implement print_progress 5. Given `Nuke` is a destructive action, the print behavior is modified to print one line per file processed with `puts` instead of `print` so the output buffer in headless mode isn't corrupted by the carriage returns.

Modify `MetisShell.new` calls to reflect new `client_args` addition to `initialize`

Reverse changes to tests.

Readline uses [0, 0] for terminal size in a script, and this will truncate the echo-ed input command, making the logfile annoying to read/parse. Explicitly setting the col width to a large number in a scripted mode will fix this issue

arkal · 2022-01-21T22:13:21Z

Looks like it works. I've modified the code to use STDOUT.isatty, and i've just pushed a small fix that alters Readline's perception of terminal size to prevent truncation of the echo-ed input command, making for prettier outputs.

This brings me back to my original question: How do you want me to test this? Given that the code is modified ever so slightly, I don't think i need unit tests for any of this but that may affect the code coverage and i don't know how important you guys think that is.

arkal · 2022-01-21T22:20:23Z

Already loving the results, if i do say so myself.

BEFORE

AFTER

graft · 2022-01-22T15:17:13Z

metis/bin/metis_client

+          if STDOUT.isatty
+            print "Computing md5s - #{progress_bar(i.to_f / files.length)} - #{i} of #{files.length}\r"
+          else
+            puts "Computing md5s - #{progress_bar(i.to_f / files.length)} - #{i} of #{files.length}"


Won't this be spammy?

Absolutely, but like I said above, Nuke is a destructive operation so verbosity is good, no? This is just an md5 check though so it may be OK to only print completion

coleshaw · 2022-01-24T13:22:11Z

We're not super strict about test coverage percentages as a metric ... I think the general philosophy I take is to make sure that tests cover the important logic of the features I add, to prevent regression bugs. You could perhaps set STDOUT.isatty during test setup (not sure??), and verify the output format changes.

arkal · 2022-01-24T21:00:03Z

Right that was an option, but i'd have to duplicate a large chunk of client_spec so i test bot with and without STDOUT.isatty ?

coleshaw · 2022-01-24T21:18:10Z

Oh, I wouldn't duplicate all the tests in client_spec with / without STDOUT.isattty. I would probably just add one or two small tests that capture the core formatting changes. Kind of like these for set -e. I don't think it's possible to test upload see comment, but you could test download output to see if it matches the new, expected format...so like some version of this test, perhaps combined with the set -e spec, but with STDOUT.isatty set to true.

If that is still a bit unclear, let me know and we can pair or I can sketch out in more detail what some specs might look like...

arkal · 2022-01-24T21:22:06Z

That makes sense. I can work on this later this week will ping when i have something working

graft · 2022-01-24T23:45:12Z

I'd be content with a test each covering get and ls.

arkal requested a review from corps January 19, 2022 23:14

coleshaw reviewed Jan 20, 2022

View reviewed changes

arkal added 3 commits January 21, 2022 12:57

Update client tests to work with the new code

04f83c0

Modify `MetisShell.new` calls to reflect new `client_args` addition to `initialize`

Update code to use STDOUT.isatty instead of adding global arguments

70b362b

Reverse changes to tests.

arkal force-pushed the aar/headless-metis-client branch from b8bffed to 70b362b Compare January 21, 2022 21:08

graft reviewed Jan 22, 2022

View reviewed changes

Add "headless" mode to metis_client #724

Are you sure you want to change the base?

Add "headless" mode to metis_client #724

Uh oh!

Conversation

arkal commented Jan 19, 2022

Uh oh!

arkal commented Jan 20, 2022

Uh oh!

coleshaw Jan 20, 2022

Choose a reason for hiding this comment

Uh oh!

arkal Jan 20, 2022

Choose a reason for hiding this comment

Uh oh!

graft commented Jan 20, 2022

Uh oh!

arkal commented Jan 20, 2022

Uh oh!

arkal commented Jan 20, 2022

Uh oh!

graft commented Jan 20, 2022

Uh oh!

arkal commented Jan 21, 2022

Uh oh!

arkal commented Jan 21, 2022

Uh oh!

graft Jan 22, 2022

Choose a reason for hiding this comment

Uh oh!

arkal Jan 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coleshaw commented Jan 24, 2022

Uh oh!

arkal commented Jan 24, 2022

Uh oh!

coleshaw commented Jan 24, 2022

Uh oh!

arkal commented Jan 24, 2022

Uh oh!

graft commented Jan 24, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

arkal Jan 22, 2022 •

edited

Loading