-
Notifications
You must be signed in to change notification settings - Fork 1
fix: handle comments properly instead of hard-coding skips #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRemoved conditional post-processing that dropped or reordered query-related columns in Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as mapCell2Accession
participant Parser as .parse_resp_tsv
participant DT as responses_dt
Note over Caller,Parser: Request and parsing phase
Caller->>Parser: fetch & parse TSV (comment="#" handled)
Parser-->>Caller: return parsed rows
alt Old behavior (before)
Caller->>DT: drop query cols if include_query==FALSE
Caller->>DT: reorder cols -> (core, extra, query)
DT-->>Caller: final reordered table
else New behavior (now)
Caller->>DT: no query-specific dropping
Caller->>DT: preserve parsed column order
DT-->>Caller: final table unchanged in order
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
💤 Files with no reviewable changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
R/cellosaurus.R (1)
204-217: Column reordering logic is correct but could be simplified.The logic correctly places core columns first, followed by extra columns, with the
querycolumn at the end wheninclude_query = TRUE. However, the nested conditionals make it harder to follow.Consider this simpler approach:
core_cols <- c("cellLineName", "accession") existing_core <- core_cols[core_cols %in% names(responses_dt)] if (length(existing_core) > 0L) { - extra_cols <- setdiff(names(responses_dt), existing_core) - if (include_query && "query" %in% extra_cols) { - extra_cols <- setdiff(extra_cols, "query") - data.table::setcolorder( - responses_dt, - c(existing_core, extra_cols, "query") - ) - } else { - data.table::setcolorder(responses_dt, c(existing_core, extra_cols)) - } + other_cols <- setdiff(names(responses_dt), existing_core) + # When include_query is TRUE and query exists, move it to the end + if (include_query && "query" %in% other_cols) { + other_cols <- c(setdiff(other_cols, "query"), "query") + } + data.table::setcolorder(responses_dt, c(existing_core, other_cols)) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
DESCRIPTION(1 hunks)R/cellosaurus.R(12 hunks)R/cellosaurus_annotations.R(3 hunks)R/cellosaurus_helpers.R(2 hunks)R/utils-httr2.R(1 hunks)man/mapCell2Accession.Rd(2 hunks)tests/testthat/test_cellosaurus.R(2 hunks)tests/testthat/test_cellosaurus_helpers.R(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: ubuntu-latest (release)
- GitHub Check: ubuntu-latest (devel)
- GitHub Check: ubuntu-latest (oldrel-1)
- GitHub Check: Code-Coverage
- GitHub Check: macos-latest (release)
- GitHub Check: Test-Docker-Build
- GitHub Check: pkgdown
🔇 Additional comments (16)
DESCRIPTION (1)
44-44: LGTM! Routine documentation tool version update.The RoxygenNote version bump from 7.3.2 to 7.3.3 is a standard maintenance update with no functional impact.
R/utils-httr2.R (1)
64-69: Excellent fix! Dynamic comment handling replaces hard-coded skips.Adding
comment = "#"toread_tsvenables automatic filtering of comment lines, eliminating the need for hard-codedskipvalues. This makes the parsing more robust and maintainable.R/cellosaurus_annotations.R (2)
19-34: LGTM! Improved readability through multi-line formatting.The default parameter vector is now formatted across multiple lines, improving code readability without changing behavior.
51-58: LGTM! Consistent multi-line formatting.The parallel request invocation and raw response handling are reformatted for clarity with no behavioral changes.
tests/testthat/test_cellosaurus.R (2)
116-120: LGTM! Good test coverage for user-friendly column names.This test verifies that
parsed = FALSEreturns metadata columns with readable names (e.g.,synonyms,diseases), ensuring a better user experience.
122-128: LGTM! Comprehensive test coverage for include_query parameter.These tests verify that
include_query = FALSEcorrectly removes query-related columns in both parsed and unparsed modes, confirming the feature works as designed.man/mapCell2Accession.Rd (2)
40-41: LGTM! Clear documentation of the new parameter.The
include_queryparameter is well-documented with its purpose and default value clearly stated.
47-49: LGTM! Helpful clarification of return value behavior.The updated documentation clearly explains that
parsed = FALSEreturns user-friendly column names, improving the API's usability documentation.R/cellosaurus_helpers.R (2)
66-85: LGTM! Improved parameter readability.Multi-line formatting of function parameters enhances code readability without affecting functionality.
154-262: LGTM! Clear vertical layout for external resources.The one-per-line formatting makes the list of external resources easier to scan and maintain.
tests/testthat/test_cellosaurus_helpers.R (2)
35-38: LGTM! Test expectation correctly updated for comment filtering.The nrow expectation change from 2 to 1 reflects that comment lines are now properly filtered by the
comment = "#"parameter, confirming the fix is working as intended.
70-72: LGTM! Test expectation correctly updated for comment filtering.The nrow expectation change from 3 to 2 reflects that comment lines are now properly skipped, validating the comment handling improvement.
R/cellosaurus.R (4)
87-94: LGTM! Well-documented new parameter.The
include_queryparameter is clearly documented with examples of affected columns and default behavior.
100-112: LGTM! Clean API extension.The new
include_queryparameter is added with a sensible default (TRUE) that preserves backward compatibility.
193-202: LGTM! Correct implementation of query column removal.The regex pattern
^querycorrectly matches all query-related columns (e.g.,query,query:ac), and the data.table column deletion syntax is appropriate.
182-189: LGTM! Named parameters improve code clarity.Using named parameters for
.parse_cellosaurus_textmakes the function call more readable and maintainable.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #67 +/- ##
==========================================
+ Coverage 77.76% 77.83% +0.06%
==========================================
Files 20 20
Lines 1309 1313 +4
==========================================
+ Hits 1018 1022 +4
Misses 291 291 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
R/cellosaurus.R (3)
18-56: Simplify boolean comparisons.The explicit comparisons
common == TRUEandupper == TRUEare redundant in R. The idiomatic approach is to useif (common)andif (upper)directly.Apply this diff:
- if (common == TRUE) { + if (common) { fields <- c( "id", "ac", @@ ... @@ - if (upper == TRUE) { + if (upper) { fields <- toupper(fields)
459-459: TODO: Refactor to eliminate duplicatematchNestedcalls.The TODO comment correctly identifies code duplication. Currently,
matchNestedis called twice in each conditional block—once to check if results exist and again to retrieve them. This doubles the computation.Would you like me to generate a refactored implementation that stores
matchNestedresults to avoid duplicate calls?
463-501: RedundantmatchNestedcalls impact performance.In multiple conditional blocks (lines 464-468 & 471-475, lines 478-479 & 481-485, lines 491-495 & 497-500),
matchNestedis called twice with identical arguments. For large datasets, this redundancy could noticeably degrade performance.Apply this pattern to eliminate duplicate calls:
- } else if ( - length(matchNested( - query, - responses_dt, - keep_duplicates = keep_duplicates - )) > - 0 - ) { - matches <- matchNested( - query, - responses_dt, - keep_duplicates = keep_duplicates - ) + } else { + matches <- matchNested(query, responses_dt, keep_duplicates = keep_duplicates) + if (length(matches) > 0) { - result <- responses_dt[matches] + result <- responses_dt[matches] + return(result) + }Apply similar changes to the other conditional blocks at lines 478-486 and 491-501.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/cellosaurus.R(12 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: pkgdown
- GitHub Check: ubuntu-latest (devel)
- GitHub Check: Code-Coverage
- GitHub Check: ubuntu-latest (oldrel-1)
- GitHub Check: macos-latest (release)
- GitHub Check: ubuntu-latest (release)
🔇 Additional comments (9)
R/cellosaurus.R (9)
87-88: LGTM: Newinclude_queryparameter.The new parameter is well-documented and defaults to TRUE, maintaining backward compatibility.
Also applies to: 110-110
155-169: LGTM: Improved readability.The multi-line formatting of early returns improves code readability without changing functionality.
182-189: LGTM: Named arguments improve clarity.Using named arguments for
.parse_cellosaurus_textmakes the function call more explicit and maintainable.
193-212: LGTM: Correctly implementsinclude_queryfeature.The logic correctly removes query columns when
include_query = FALSEand reorders columns to prioritize core fields (cellLineName,accession) while placing query columns at the end when included.
228-242: LGTM: Good practice using integer literals.The change from
j - 1toj - 1Lis good R practice, ensuring integer arithmetic.
248-279: LGTM: Consistent formatting improvements.The multi-line formatting of function parameters and control structures improves readability without altering logic.
371-416: LGTM: Well-structured column renaming.The explicit old_names and new_names vectors with
skip_absent = TRUEprovide a clear, maintainable mapping from Cellosaurus field codes to user-friendly names.
520-541: LGTM: Consistent formatting.The multi-line formatting in
.formatSynonymsand.formatCommentsaligns with the rest of the refactored code.
1-541: PR description doesn't match changes in this file.The PR title and description focus on "handle comments properly instead of hard-coding skips" and using
comment = "#"for TSV parsing. However, none of the changes inR/cellosaurus.Rrelate to TSV comment handling. The actual changes are:
- Adding the
include_queryparameter- Wide formatting/style normalization
- Column renaming improvements
The TSV comment handling mentioned in the PR description is likely in
R/cellosaurus_helpers.Ror another file that wasn't provided for review.
df872e1 to
10fe414
Compare
2a4261a to
6b1e9af
Compare
From
test_that(".build_cellosaurus_request is acting as expected"):Noticed we skip comment lines in TSVs by manually skipped 14 lines. Instead, just use the
comment = "#"param to skip any lines starting with#. This way, we don't need to hard code how many lines to skip.Summary by CodeRabbit
Improvements
Tests