fix: handle comments properly instead of hard-coding skips #67

mtran-code · 2025-10-21T14:11:36Z

From test_that(".build_cellosaurus_request is acting as expected"):

Noticed we skip comment lines in TSVs by manually skipped 14 lines. Instead, just use the comment = "#" param to skip any lines starting with #. This way, we don't need to hard code how many lines to skip.

Summary by CodeRabbit

Improvements
- TSV parsing now treats lines beginning with "#" as comments, improving robustness when reading cell mapping files.
- Response column handling changed to preserve parsed column order rather than performing conditional dropping or reordering, affecting final column arrangement.
Tests
- Updated unit tests to reflect the new parsing behavior and resulting row/column expectations.

coderabbitai · 2025-10-21T14:11:51Z

Walkthrough

Removed conditional post-processing that dropped or reordered query-related columns in mapCell2Accession, added comment-line handling to TSV parsing, and updated tests to stop skipping header lines and to expect exact row counts.

Changes

Cohort / File(s)	Summary
Post-processing removal `R/cellosaurus.R`	Removed logic that conditionally dropped query-related columns when `include_query = FALSE` and removed the custom column reordering that separated core/extra/query columns; now returns parsed `responses_dt` without that include_query-based cleanup or reordering.
TSV comment parsing `R/utils-httr2.R`	Added `comment = "#"` to the `read_tsv` call in `.parse_resp_tsv` (treat lines starting with `#` as comments) and reformatted the call to a multi-line style.
Tests updated for parsing `tests/testthat/test_cellosaurus_helpers.R`	Removed `skip = 14` from `parse_resp_tsv` calls; adjusted expectations to exact row counts (`expect_equal(nrow(...), 1)` and `expect_equal(nrow(...), 2)`) reflecting the new parsing behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as mapCell2Accession
    participant Parser as .parse_resp_tsv
    participant DT as responses_dt

    Note over Caller,Parser: Request and parsing phase
    Caller->>Parser: fetch & parse TSV (comment="#" handled)
    Parser-->>Caller: return parsed rows

    alt Old behavior (before)
        Caller->>DT: drop query cols if include_query==FALSE
        Caller->>DT: reorder cols -> (core, extra, query)
        DT-->>Caller: final reordered table
    else New behavior (now)
        Caller->>DT: no query-specific dropping
        Caller->>DT: preserve parsed column order
        DT-->>Caller: final table unchanged in order
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

feat: include_query and name mapping in cellosaurus #66: Modifies mapCell2Accession include_query handling and column ordering; directly related to the removed include_query-specific cleanup and reordering here.

Poem

🐰 I nibbled lines that start with hash,

kept columns whole — no extra slash.
Tests now count each row with care,
parsers skip the noisy air,
a tidy table, carrot flair. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "fix: handle comments properly instead of hard-coding skips" directly and accurately describes the main change in the changeset. The PR adds a `comment = "#"` parameter to the TSV reader to automatically handle comment lines instead of using hard-coded `skip = 14` values, which is exactly what the title conveys. The title is concise, specific, and clearly summarizes the primary improvement across the modified files. A teammate reviewing the git history would immediately understand that this PR addresses a code quality issue by replacing a brittle hard-coded approach with a more robust parameter-based solution.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch mtran/cellosaurus-comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10fe414 and 2a4261a.

📒 Files selected for processing (1)

R/cellosaurus.R (0 hunks)

💤 Files with no reviewable changes (1)

R/cellosaurus.R

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: pkgdown
GitHub Check: Code-Coverage
GitHub Check: ubuntu-latest (release)
GitHub Check: ubuntu-latest (oldrel-1)
GitHub Check: Test-Docker-Build
GitHub Check: macos-latest (release)
GitHub Check: ubuntu-latest (devel)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

R/cellosaurus.R (1)

204-217: Column reordering logic is correct but could be simplified.

The logic correctly places core columns first, followed by extra columns, with the query column at the end when include_query = TRUE. However, the nested conditionals make it harder to follow.

Consider this simpler approach:

 core_cols <- c("cellLineName", "accession")
 existing_core <- core_cols[core_cols %in% names(responses_dt)]
 if (length(existing_core) > 0L) {
-  extra_cols <- setdiff(names(responses_dt), existing_core)
-  if (include_query && "query" %in% extra_cols) {
-    extra_cols <- setdiff(extra_cols, "query")
-    data.table::setcolorder(
-      responses_dt,
-      c(existing_core, extra_cols, "query")
-    )
-  } else {
-    data.table::setcolorder(responses_dt, c(existing_core, extra_cols))
-  }
+  other_cols <- setdiff(names(responses_dt), existing_core)
+  # When include_query is TRUE and query exists, move it to the end
+  if (include_query && "query" %in% other_cols) {
+    other_cols <- c(setdiff(other_cols, "query"), "query")
+  }
+  data.table::setcolorder(responses_dt, c(existing_core, other_cols))
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a77bef and d50e653.

📒 Files selected for processing (8)

DESCRIPTION (1 hunks)
R/cellosaurus.R (12 hunks)
R/cellosaurus_annotations.R (3 hunks)
R/cellosaurus_helpers.R (2 hunks)
R/utils-httr2.R (1 hunks)
man/mapCell2Accession.Rd (2 hunks)
tests/testthat/test_cellosaurus.R (2 hunks)
tests/testthat/test_cellosaurus_helpers.R (3 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: ubuntu-latest (release)
GitHub Check: ubuntu-latest (devel)
GitHub Check: ubuntu-latest (oldrel-1)
GitHub Check: Code-Coverage
GitHub Check: macos-latest (release)
GitHub Check: Test-Docker-Build
GitHub Check: pkgdown

🔇 Additional comments (16)

DESCRIPTION (1)

44-44: LGTM! Routine documentation tool version update.

The RoxygenNote version bump from 7.3.2 to 7.3.3 is a standard maintenance update with no functional impact.

R/utils-httr2.R (1)

64-69: Excellent fix! Dynamic comment handling replaces hard-coded skips.

Adding comment = "#" to read_tsv enables automatic filtering of comment lines, eliminating the need for hard-coded skip values. This makes the parsing more robust and maintainable.

R/cellosaurus_annotations.R (2)

19-34: LGTM! Improved readability through multi-line formatting.

The default parameter vector is now formatted across multiple lines, improving code readability without changing behavior.

51-58: LGTM! Consistent multi-line formatting.

The parallel request invocation and raw response handling are reformatted for clarity with no behavioral changes.

tests/testthat/test_cellosaurus.R (2)

116-120: LGTM! Good test coverage for user-friendly column names.

This test verifies that parsed = FALSE returns metadata columns with readable names (e.g., synonyms, diseases), ensuring a better user experience.

122-128: LGTM! Comprehensive test coverage for include_query parameter.

These tests verify that include_query = FALSE correctly removes query-related columns in both parsed and unparsed modes, confirming the feature works as designed.

man/mapCell2Accession.Rd (2)

40-41: LGTM! Clear documentation of the new parameter.

The include_query parameter is well-documented with its purpose and default value clearly stated.

47-49: LGTM! Helpful clarification of return value behavior.

The updated documentation clearly explains that parsed = FALSE returns user-friendly column names, improving the API's usability documentation.

R/cellosaurus_helpers.R (2)

66-85: LGTM! Improved parameter readability.

Multi-line formatting of function parameters enhances code readability without affecting functionality.

154-262: LGTM! Clear vertical layout for external resources.

The one-per-line formatting makes the list of external resources easier to scan and maintain.

tests/testthat/test_cellosaurus_helpers.R (2)

35-38: LGTM! Test expectation correctly updated for comment filtering.

The nrow expectation change from 2 to 1 reflects that comment lines are now properly filtered by the comment = "#" parameter, confirming the fix is working as intended.

70-72: LGTM! Test expectation correctly updated for comment filtering.

The nrow expectation change from 3 to 2 reflects that comment lines are now properly skipped, validating the comment handling improvement.

R/cellosaurus.R (4)

87-94: LGTM! Well-documented new parameter.

The include_query parameter is clearly documented with examples of affected columns and default behavior.

100-112: LGTM! Clean API extension.

The new include_query parameter is added with a sensible default (TRUE) that preserves backward compatibility.

193-202: LGTM! Correct implementation of query column removal.

The regex pattern ^query correctly matches all query-related columns (e.g., query, query:ac), and the data.table column deletion syntax is appropriate.

182-189: LGTM! Named parameters improve code clarity.

Using named parameters for .parse_cellosaurus_text makes the function call more readable and maintainable.

codecov · 2025-10-21T14:17:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.83%. Comparing base (e1086fe) to head (6b1e9af).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #67      +/-   ##
==========================================
+ Coverage   77.76%   77.83%   +0.06%     
==========================================
  Files          20       20              
  Lines        1309     1313       +4     
==========================================
+ Hits         1018     1022       +4     
  Misses        291      291

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

R/cellosaurus.R (3)
18-56: Simplify boolean comparisons.

The explicit comparisons common == TRUE and upper == TRUE are redundant in R. The idiomatic approach is to use if (common) and if (upper) directly.

Apply this diff:
-  if (common == TRUE) {
+  if (common) {
     fields <- c(
       "id",
       "ac",
@@ ... @@
-  if (upper == TRUE) {
+  if (upper) {
     fields <- toupper(fields)
459-459: TODO: Refactor to eliminate duplicate matchNested calls.

The TODO comment correctly identifies code duplication. Currently, matchNested is called twice in each conditional block—once to check if results exist and again to retrieve them. This doubles the computation.

Would you like me to generate a refactored implementation that stores matchNested results to avoid duplicate calls?

463-501: Redundant matchNested calls impact performance.

In multiple conditional blocks (lines 464-468 & 471-475, lines 478-479 & 481-485, lines 491-495 & 497-500), matchNested is called twice with identical arguments. For large datasets, this redundancy could noticeably degrade performance.

Apply this pattern to eliminate duplicate calls:
-  } else if (
-    length(matchNested(
-      query,
-      responses_dt,
-      keep_duplicates = keep_duplicates
-    )) >
-      0
-  ) {
-    matches <- matchNested(
-      query,
-      responses_dt,
-      keep_duplicates = keep_duplicates
-    )
+  } else {
+    matches <- matchNested(query, responses_dt, keep_duplicates = keep_duplicates)
+    if (length(matches) > 0) {
-    result <- responses_dt[matches]
+      result <- responses_dt[matches]
+      return(result)
+    }
Apply similar changes to the other conditional blocks at lines 478-486 and 491-501.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d50e653 and df872e1.

📒 Files selected for processing (1)

R/cellosaurus.R (12 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: pkgdown
GitHub Check: ubuntu-latest (devel)
GitHub Check: Code-Coverage
GitHub Check: ubuntu-latest (oldrel-1)
GitHub Check: macos-latest (release)
GitHub Check: ubuntu-latest (release)

🔇 Additional comments (9)

R/cellosaurus.R (9)

87-88: LGTM: New include_query parameter.

The new parameter is well-documented and defaults to TRUE, maintaining backward compatibility.

Also applies to: 110-110

155-169: LGTM: Improved readability.

The multi-line formatting of early returns improves code readability without changing functionality.

182-189: LGTM: Named arguments improve clarity.

Using named arguments for .parse_cellosaurus_text makes the function call more explicit and maintainable.

193-212: LGTM: Correctly implements include_query feature.

The logic correctly removes query columns when include_query = FALSE and reorders columns to prioritize core fields (cellLineName, accession) while placing query columns at the end when included.

228-242: LGTM: Good practice using integer literals.

The change from j - 1 to j - 1L is good R practice, ensuring integer arithmetic.

248-279: LGTM: Consistent formatting improvements.

The multi-line formatting of function parameters and control structures improves readability without altering logic.

371-416: LGTM: Well-structured column renaming.

The explicit old_names and new_names vectors with skip_absent = TRUE provide a clear, maintainable mapping from Cellosaurus field codes to user-friendly names.

520-541: LGTM: Consistent formatting.

The multi-line formatting in .formatSynonyms and .formatComments aligns with the rest of the refactored code.

1-541: PR description doesn't match changes in this file.

The PR title and description focus on "handle comments properly instead of hard-coding skips" and using comment = "#" for TSV parsing. However, none of the changes in R/cellosaurus.R relate to TSV comment handling. The actual changes are:

Adding the include_query parameter

Wide formatting/style normalization

Column renaming improvements

The TSV comment handling mentioned in the PR description is likely in R/cellosaurus_helpers.R or another file that wasn't provided for review.

coderabbitai bot reviewed Oct 21, 2025

View reviewed changes

mtran-code added 2 commits October 21, 2025 15:25

fix: properly skip tsv comments instead of hardcoding skips

e1daecd

refactor: streamline cellosaurus column ordering

10fe414

mtran-code force-pushed the mtran/cellosaurus-comment branch from df872e1 to 10fe414 Compare October 21, 2025 19:31

chore: restore cellosaurus column ordering

6b1e9af

mtran-code force-pushed the mtran/cellosaurus-comment branch from 2a4261a to 6b1e9af Compare October 21, 2025 21:25

mtran-code merged commit f57851f into main Oct 21, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: handle comments properly instead of hard-coding skips #67

fix: handle comments properly instead of hard-coding skips #67

Uh oh!

mtran-code commented Oct 21, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 21, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

codecov bot commented Oct 21, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: handle comments properly instead of hard-coding skips #67

fix: handle comments properly instead of hard-coding skips #67

Uh oh!

Conversation

mtran-code commented Oct 21, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mtran-code commented Oct 21, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 21, 2025 •

edited

Loading

codecov bot commented Oct 21, 2025 •

edited

Loading