Added output/input comparison file by lyla-kumari · Pull Request #6 · drtconway/svelt

lyla-kumari · 2025-07-18T04:12:43Z

No description provided.

lyla-kumari · 2025-07-18T04:14:39Z

Tested missing variants in input files and output files separately (without running svelt again)

Haven't done coverage testing yet

drtconway · 2025-07-18T04:38:28Z

compare_svelt_output_to_inputs.py

+    for rec in vcf:
+        original_ids = rec.info.get("ORIGINAL_IDS")
+        if not original_ids:
+            continue


If a row in the output of svelt does not contain ORIGINAL_IDS, that is an error in the output, so the script should complain, even if it keeps going. I'd consider writing out rec to stdout or you could even make a file containing the problem records.
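A minimal sketch of that suggestion, assuming each record exposes an `info` mapping the way pysam's `VariantRecord` does. The function name `split_missing_original_ids` and the stand-in records are hypothetical and not part of the PR.

```python
import sys
from types import SimpleNamespace


def split_missing_original_ids(records, report=sys.stderr):
    """Separate records that carry ORIGINAL_IDS from those that do not.

    Records lacking the field are reported as errors instead of being
    skipped silently, and are returned so the caller can write them out.
    """
    good, problems = [], []
    for rec in records:
        if rec.info.get("ORIGINAL_IDS"):
            good.append(rec)
        else:
            print(f"error: record without ORIGINAL_IDS: {rec}", file=report)
            problems.append(rec)
    return good, problems


# Stand-in records for illustration; real ones would come from pysam.VariantFile.
records = [
    SimpleNamespace(id="a", info={"ORIGINAL_IDS": ("x", ".", "y")}),
    SimpleNamespace(id="b", info={}),
]
good, problems = split_missing_original_ids(records)
```

The script keeps going after a bad record, and `problems` can then be written to a separate file if that turns out to be more useful than the log lines.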

drtconway · 2025-07-18T04:46:18Z

compare_svelt_output_to_inputs.py

+        ids = set()
+        for rec in vcf:
+            if rec.id and rec.id != ".":
+                clean_id = rec.id.split('_', 1)[-1] if '_' in rec.id else rec.id


I'm pretty worried about this line. If the ID isn't empty (does pysam return "." or None for absent IDs?), we shouldn't be modifying it; otherwise it might collide with other IDs in this file!

Can you explain why this line is there? There may be an excellent reason, but I can't see it.
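To make the collision concern concrete, here is a small sketch that applies the same expression to some made-up IDs. The ID values are hypothetical, chosen only to show two distinct IDs collapsing into one.

```python
def strip_prefix(raw_id):
    # Same expression as the line under review.
    return raw_id.split('_', 1)[-1] if '_' in raw_id else raw_id


# Hypothetical IDs: two different records that differ only in their prefix.
raw_ids = ["callerA_INS.1", "callerB_INS.1", "DEL.7"]
cleaned = [strip_prefix(raw_id) for raw_id in raw_ids]

# Number of IDs lost because distinct raw IDs became identical.
collisions = len(raw_ids) - len(set(cleaned))
```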

drtconway · 2025-07-18T04:47:15Z

compare_svelt_output_to_inputs.py

+        for rec in vcf:
+            if rec.id and rec.id != ".":
+                clean_id = rec.id.split('_', 1)[-1] if '_' in rec.id else rec.id
+                ids.add(clean_id)


It would be a very good idea to check to see if the ID is in the set already, so you can produce a warning about duplicate IDs in the input.
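A sketch of the duplicate check, written against plain ID values so it does not depend on pysam. The function name `collect_ids` and the `source` label are hypothetical.

```python
import sys


def collect_ids(ids, source="<input>", report=sys.stderr):
    """Return (all IDs seen, IDs seen more than once), warning on each repeat."""
    seen = set()
    duplicates = set()
    for variant_id in ids:
        if variant_id is None or variant_id == ".":
            continue  # absent ID, nothing to record
        if variant_id in seen:
            print(f"warning: duplicate ID {variant_id!r} in {source}", file=report)
            duplicates.add(variant_id)
        seen.add(variant_id)
    return seen, duplicates


seen, duplicates = collect_ids(["sv1", "sv2", "sv1", ".", None], source="sample1.vcf")
```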

drtconway · 2025-07-18T04:48:34Z

compare_svelt_output_to_inputs.py

+
+def extract_input_ids(input_files):
+    sample_to_ids = {}
+    all_ids = set()


IDs are not guaranteed to be unique between different VCFs. So the same ID value could refer to two different variants in two VCF files. Accordingly, I don't understand why you make a set of all the IDs in the inputs.

drtconway · 2025-07-18T04:51:25Z

compare_svelt_output_to_inputs.py

+    all_ids = set()
+    for path in input_files:
+        sample = os.path.basename(path).split('.')[0]
+        vcf = pysam.VariantFile(path)


I would consider using a `with` statement here (`with pysam.VariantFile(path) as vcf:`) to make sure the file gets closed. With the 3 samples we're using it's not going to be a problem, but it's a good defensive coding practice, to make sure that you don't get strange errors from running out of file descriptors.

If you're not familiar with with statements, I can explain them, or you can read up on them.
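For illustration, the sketch below uses a hypothetical stand-in class, `TrackedFile`, rather than pysam, so it runs anywhere. It shows the property that matters here: the cleanup in `__exit__` runs even when the loop body raises. As far as I know, `pysam.VariantFile` supports the same `with` protocol.

```python
class TrackedFile:
    """Minimal stand-in for a file-like object that supports `with`."""

    def __init__(self, records):
        self.records = records
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # a real file object would release its descriptor here
        return False  # do not swallow exceptions

    def __iter__(self):
        return iter(self.records)


handle = TrackedFile(["sv1", "sv2"])
try:
    with handle as vcf:
        for rec in vcf:
            if rec == "sv2":
                raise ValueError("simulated failure while reading")
except ValueError:
    pass

# The file was closed even though reading failed part-way through.
closed_after_error = handle.closed
```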

drtconway · 2025-07-18T04:53:14Z

compare_svelt_output_to_inputs.py

+import os
+from collections import defaultdict
+
+def extract_input_ids(input_files):


I would usually organise the code slightly differently: make a function for reading the IDs from one file, and then either do the loop in the caller or add an extra function. A function that does something with a single file makes logical sense, so it helps guide the reader through the code.
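One possible shape for that organisation is sketched below. The function names are hypothetical; `pysam` is imported inside `read_ids` only so that the rest of the sketch can run without it installed. Keeping one set per file, instead of a single combined set, also avoids mixing up IDs that happen to repeat across files.

```python
from types import SimpleNamespace


def ids_from_records(records):
    """Collect the IDs of an iterable of records, skipping absent ones."""
    return {rec.id for rec in records if rec.id and rec.id != "."}


def read_ids(path):
    """Read the variant IDs from a single VCF file."""
    import pysam  # local import: only needed when a real file is read

    with pysam.VariantFile(path) as vcf:
        return ids_from_records(vcf)


def read_all_ids(paths):
    """One ID set per input file, in the order the files were given."""
    return [read_ids(path) for path in paths]


# Illustration with stand-in records instead of a real VCF.
example = ids_from_records(
    [SimpleNamespace(id="sv1"), SimpleNamespace(id=None), SimpleNamespace(id="sv2")]
)
```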

drtconway · 2025-07-18T04:54:58Z

Cargo.toml

 [package]
 name = "svelt"
-version = "0.1.14"
+version = "0.1.15"


This PR isn't changing svelt itself, so we don't really need a version bump.

drtconway · 2025-07-18T05:00:05Z

compare_svelt_output_to_inputs.py

+        original_ids = rec.info.get("ORIGINAL_IDS")
+        if not original_ids:
+            continue
+        if isinstance(original_ids, (list, tuple)):


This code is wrong, because it loses track of which ID came from which original file.

If you've got a bare string, split it on `,`. Then, whether it was originally a list or tuple or you split it yourself, iterate over the entries and put each ID into the set for the corresponding input file.

So you should return a list of sets corresponding to the original files. Does that make sense?
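A sketch of what returning a list of sets could look like. It assumes ORIGINAL_IDS carries exactly one entry per input file, in input order, with `.` (or None) marking "no variant from this file"; that layout is inferred from this thread, not confirmed. The function name `original_ids_by_position` is hypothetical.

```python
def original_ids_by_position(values, n_inputs):
    """Build one ID set per input file from ORIGINAL_IDS values.

    `values` holds one ORIGINAL_IDS value per output record, either a
    comma-separated string or a list/tuple. Position i refers to input i.
    """
    per_file = [set() for _ in range(n_inputs)]
    for value in values:
        parts = value.split(",") if isinstance(value, str) else list(value)
        if len(parts) != n_inputs:
            # Assumption: a mismatched length means the record is malformed.
            raise ValueError(f"expected {n_inputs} entries, got {parts!r}")
        for position, oid in enumerate(parts):
            if oid is not None and oid != ".":
                per_file[position].add(oid)
    return per_file


per_file = original_ids_by_position(["sv1,.,sv9", (".", "sv1", None)], n_inputs=3)
```

Note that `sv1` ends up in both the first and the second set here, which is fine: they are tracked as IDs from two different files.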

drtconway · 2025-07-18T05:02:21Z

compare_svelt_output_to_inputs.py

+            svelt_ids.update(oid for oid in original_ids.split(',') if oid != '.')
+    return svelt_ids
+
+def compare_ids(sample_to_ids, all_input_ids, svelt_ids):


This function doesn't do at all what we need. Because the IDs are not unique between different files, you can't process them this way.

You need to compare the IDs from the first input file with the IDs in the first position of ORIGINAL_IDS, from the second input file with the ones from the second position of ORIGINAL_IDS, and so on.
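The positional comparison could be sketched as follows, assuming both arguments are lists of sets in the same file order. The function name `compare_by_position` and the example IDs are hypothetical.

```python
def compare_by_position(input_ids, output_ids):
    """Compare each input file's IDs with the matching ORIGINAL_IDS position.

    Returns one (missing_from_output, unexpected_in_output) pair per file.
    """
    if len(input_ids) != len(output_ids):
        raise ValueError("need one output ID set per input file")
    return [
        (expected - found, found - expected)
        for expected, found in zip(input_ids, output_ids)
    ]


# File 1 and file 2 both use the ID "sv1", which is why they are kept apart.
input_ids = [{"sv1", "sv2"}, {"sv1"}]
output_ids = [{"sv1"}, {"sv1", "sv3"}]
report = compare_by_position(input_ids, output_ids)
```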

drtconway

That looks pretty good. Does it work? :)

lyla-kumari changed the base branch from main to development July 18, 2025 04:15

drtconway reviewed Jul 18, 2025

lyla-kumari force-pushed the testing branch from b485ac0 to d4a409d Compare July 22, 2025 03:02

lyla-kumari added 3 commits July 22, 2025 13:21

with new svelt format

5551d0c

jasmine first draft

215e8cf

svelt comparison update

85d8969

drtconway reviewed Jul 22, 2025

lyla-kumari and others added 3 commits July 25, 2025 13:20

combined program + slightly adjusted logic

2cdbb59

combined svelt/jasmine function? not sure if it works fully yet

e838554

Merge branch 'drtconway:main' into testing

9ec9851

lyla-kumari wants to merge 6 commits into drtconway:development from lyla-kumari:testing
