Skip to content

Conversation

@arkal
Copy link

@arkal arkal commented Jan 19, 2022

The current schema for printing output in metis_client is based
on a standard interactive terminal. This is not relevant when one
is attempting to automate multiple commands in one session.
For example,

  1. get/put both implement carriage-return-based progress updates
    that render terrible when piping output to a file
> project mvir1
mvir1:> 
mvir1:> cd data/
mvir1:data> put /krummellab/data1/staging/COMET_Viral/batch_11/process_batch_11/ ^Mbulk_RNASeq/ .
^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 3.78kB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 7.52MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 10.06MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 10.94MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 12.0MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 12.86MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 12.99MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.36MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.37MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.6MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 13.81MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.13MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.23MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.41MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.44MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.62MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.64MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.72MB/s^M                    bulk_RNASeq/raw/MVIR1-HS145-D1ETA1-RSQ1/MVIR1-HS145-D1ETA1-RSQ1_S196_R1_001.fastq.gz - 14.8MB/s^M     
  1. ls uses the COLUMNS environment variable to make the output prettier, but if i'm piping to a file i'd like to see full filenames/paths and fields.
  2. Nuke uses the same progress bar as get/put to show progress on files md5checked.

This PR implements a --headless flag to metis_client that modifies the behavior of metis_client slightly, to make for
better output when the result is piped into a logfile.

@arkal arkal requested a review from corps January 19, 2022 23:14
@arkal
Copy link
Author

arkal commented Jan 20, 2022

Sweet, the tests all passed, so this isn't breaking the original behavior. @corps I'm not sure how to go about testing this stuff without duplicating almost all of client_spec.rb . Let me know what you think?

attr_reader :options

def initialize(shell)
@headless = ENV['METIS_headless'] == 'true' ? true : false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, thanks so much for putting in a PR -- this is awesome!

super-nit: is the convention for environment variables to be in all caps? i.e. METIS_HEADLESS

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha yeah that's a good point.... it should be ALL_CAPS. Will fix in the next version!

@graft
Copy link
Contributor

graft commented Jan 20, 2022

This is a great idea, thanks for writing this. Two points:

  • This relies on a global flag, which is a concept that heretofore has not existed in metis_client (previously arguments are parsed by individual commands); I'm dubious about this introduction since it creates some ambiguity about where options are being applied, in what order they should be specified, etc. It is also somewhat irritating to have to remember to to specify --headless every time you want to pipe to a file. Instead I might rely on STDOUT.isatty to determine headless? and avoid specifying an option at all.
  • The use of ENV["COLUMNS"] is somewhat problematic here. When you are running in a shell you inherit the local ENV, and so you get a value for ENV["COLUMNS"]. However, if you simply run the script non-interactively, because "COLUMNS" is not exported by default, it does not appear in the ENV of the script so ENV["COLUMNS"].to_i will be 0. This should result in metis_client behaving like ls does when piping (e.g., ls | less returns a single column of files), but this bears some investigation. No need to fear long file names; they will not be truncated.

@arkal
Copy link
Author

arkal commented Jan 20, 2022

Hmm. Looks like i was mistaken about point 2 and lines are NOT truncated. Maybe i recalled wrong from a previous use of the command.

[arrao@c4-dev2 process_batch_11]$ cat foo.out 
> project mvir1
mvir1:> 
mvir1:> ls --long /data/bulk_RNASeq/processed/MVIR1-HS59-D0PBMC1-RSQ2
mvir1:> arjunarkal.rao@ucsf.edu      24434 Nov 24  2021 archived MVIR1-HS59-D0PBMC1-RSQ2_from_filtered.merged.biallelic.maxMissin1.rmIndel.rsOnly.maf0.05.min-meanDP2.recode.vcf.gz.tbi
arjunarkal.rao@ucsf.edu     171441 Nov 24  2021 archived     MVIR1-HS59-D0PBMC1-RSQ2_from_filtered.merged.biallelic.maxMissin1.rmIndel.rsOnly.maf0.05.min-meanDP2.recode.vcf.gz
arjunarkal.rao@ucsf.edu    1376815 Nov 24  2021 archived                                                                           MVIR1-HS59-D0PBMC1-RSQ2_ReadsPerGene.out.tab
arjunarkal.rao@ucsf.edu      21792 Nov 24  2021 archived                                                                         MVIR1-HS59-D0PBMC1-RSQ2_marked_dup_metrics.txt
arjunarkal.rao@ucsf.edu       2035 Nov 24  2021 archived                                                                                  MVIR1-HS59-D0PBMC1-RSQ2_Log.final.out
arjunarkal.rao@ucsf.edu 2485875680 Nov 24  2021 archived                                  MVIR1-HS59-D0PBMC1-RSQ2_Aligned.sortedByCoord.duplicateRemoved.withReadGroups.out.bam

regarding using STDOUT.isatty, that's pretty cool I like that much better! I'll check it out and resubmit this.

@arkal
Copy link
Author

arkal commented Jan 20, 2022

Also, just as a quick question, do you guys want me to squash + rebase + force push, or can i have multiple commits on the branch? The lead on my old collaborative repo would make us do that so the commit history would be super clean

@graft
Copy link
Contributor

graft commented Jan 20, 2022

Also, just as a quick question, do you guys want me to squash + rebase + force push, or can i have multiple commits on the branch?

We are not usually picky. Multiple commits is fine, squashing and rebasing is fine, merging in master is fine.

arkal added 3 commits January 21, 2022 12:57
1. Add `--headless` as a client argument that makes some changes to
output including increasing line size in ls (via increasing term size to
an arbitrarily large number, 1000), and updating the get/put progress to
only write 1 line on completion of the action.
2. rename `progress` to `print_progress` and refactor to move print
statements into the function.
3. Add a "verbose" argument to print_progress to implement (1) above
4. Refactored all calls to progress to appropriately implement
print_progress
5. Given `Nuke` is a  destructive action, the print behavior is modified
to print one line per file processed with `puts` instead of `print` so
the output buffer in headless mode isn't corrupted by the carriage
returns.
Modify `MetisShell.new` calls to reflect new `client_args` addition to
`initialize`
@arkal arkal force-pushed the aar/headless-metis-client branch from b8bffed to 70b362b Compare January 21, 2022 21:08
Readline uses [0, 0] for terminal size in a script, and this will
truncate the echo-ed input command, making the logfile annoying to
read/parse. Explicitly setting the col width to a large number in a
scripted mode will fix this issue
@arkal
Copy link
Author

arkal commented Jan 21, 2022

Looks like it works. I've modified the code to use STDOUT.isatty, and i've just pushed a small fix that alters Readline's perception of terminal size to prevent truncation of the echo-ed input command, making for prettier outputs.

This brings me back to my original question: How do you want me to test this? Given that the code is modified ever so slightly, I don't think i need unit tests for any of this but that may affect the code coverage and i don't know how important you guys think that is.

@arkal
Copy link
Author

arkal commented Jan 21, 2022

Already loving the results, if i do say so myself.

BEFORE
Screen Shot 2022-01-21 at 2 16 55 PM

AFTER
Screen Shot 2022-01-21 at 2 17 07 PM

if STDOUT.isatty
print "Computing md5s - #{progress_bar(i.to_f / files.length)} - #{i} of #{files.length}\r"
else
puts "Computing md5s - #{progress_bar(i.to_f / files.length)} - #{i} of #{files.length}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this be spammy?

Copy link
Author

@arkal arkal Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, but like I said above, Nuke is a destructive operation so verbosity is good, no? This is just an md5 check though so it may be OK to only print completion

@coleshaw
Copy link
Collaborator

We're not super strict about test coverage percentages as a metric ... I think the general philosophy I take is to make sure that tests cover the important logic of the features I add, to prevent regression bugs. You could perhaps set STDOUT.isatty during test setup (not sure??), and verify the output format changes.

@arkal
Copy link
Author

arkal commented Jan 24, 2022

Right that was an option, but i'd have to duplicate a large chunk of client_spec so i test bot with and without STDOUT.isatty ?

@coleshaw
Copy link
Collaborator

Oh, I wouldn't duplicate all the tests in client_spec with / without STDOUT.isattty. I would probably just add one or two small tests that capture the core formatting changes. Kind of like these for set -e. I don't think it's possible to test upload see comment, but you could test download output to see if it matches the new, expected format...so like some version of this test, perhaps combined with the set -e spec, but with STDOUT.isatty set to true.

If that is still a bit unclear, let me know and we can pair or I can sketch out in more detail what some specs might look like...

@arkal
Copy link
Author

arkal commented Jan 24, 2022

That makes sense. I can work on this later this week will ping when i have something working

@graft
Copy link
Contributor

graft commented Jan 24, 2022

I'd be content with a test each covering get and ls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants