-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
rna-transcription approach: a few improvements #4052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -5,7 +5,7 @@ LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"} | |||||
|
|
||||||
|
|
||||||
| def to_rna(dna_strand): | ||||||
| return ''.join(LOOKUP[chr] for chr in dna_strand) | ||||||
| return ''.join(LOOKUP[char] for char in dna_strand) | ||||||
|
|
||||||
| ``` | ||||||
|
|
||||||
|
|
@@ -16,15 +16,32 @@ but the `LOOKUP` dictionary is defined with all uppercase letters, which is the | |||||
| It indicates that the value is not intended to be changed. | ||||||
|
|
||||||
| In the `to_rna()` function, the [`join()`][join] method is called on an empty string, | ||||||
| and is passed the list created from a [list comprehension][list-comprehension]. | ||||||
| and is passed the list created from a [generator expression][generator-expression]. | ||||||
|
|
||||||
| The list comprehension iterates each character in the input, | ||||||
| The generator expression iterates each character in the input, | ||||||
| looks up the DNA character in the look-up dictionary, and outputs its matching RNA character as an element in the list. | ||||||
|
|
||||||
| The `join()` method collects the list of RNA characters back into a string. | ||||||
| The `join()` method collects the RNA characters back into a string. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
We're joining the RNA characters into a new string, not the original string so "back" is a bit confusing. |
||||||
| Since an empty string is the separator for the `join()`, there are no spaces between the RNA characters in the string. | ||||||
|
|
||||||
| A generator expression is similar to a [list comprehension][list-comprehension], but instead of creating a list, it returns a generator, and iterating that generator yields the elements on the fly. | ||||||
|
|
||||||
| A variant that uses a list comprehension is almost identical, but note the additional square brackets inside the `join()`: | ||||||
|
|
||||||
| ```python | ||||||
| LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"} | ||||||
|
|
||||||
|
|
||||||
| def to_rna(dna_strand): | ||||||
| return ''.join([LOOKUP[char] for char in dna_strand]) | ||||||
| ``` | ||||||
|
|
||||||
| For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
The size of the data makes me think about the size of each element, but the way the sentence is written, we're referring to the number of elements. |
||||||
| See also [this discussion][list-comprehension-choose-generator-expression] regarding when to choose one over the other. | ||||||
|
|
||||||
| [dictionaries]: https://docs.python.org/3/tutorial/datastructures.html?#dictionaries | ||||||
| [const]: https://realpython.com/python-constants/ | ||||||
| [join]: https://docs.python.org/3/library/stdtypes.html?#str.join | ||||||
| [list-comprehension]: https://realpython.com/list-comprehension-python/#using-list-comprehensions | ||||||
| [list-comprehension-choose-generator-expression]: https://realpython.com/list-comprehension-python/#choose-generators-for-large-datasets | ||||||
| [generator-expression]: https://realpython.com/introduction-to-python-generators/#building-generators-with-generator-expressions | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,15 +30,15 @@ LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"} | |
|
|
||
|
|
||
| def to_rna(dna_strand): | ||
| return ''.join(LOOKUP[chr] for chr in dna_strand) | ||
| return ''.join(LOOKUP[char] for char in dna_strand) | ||
|
|
||
| ``` | ||
|
|
||
| For more information, check the [dictionary look-up with `join()` approach][approach-dictionary-join]. | ||
|
|
||
| ## Which approach to use? | ||
|
|
||
| The `translate()` with `maketrans()` approach benchmarked over four times faster than the dictionary look-up with `join()` approach. | ||
| The `translate()` with `maketrans()` approach benchmarked many times faster than the dictionary look-up with `join()` approach. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Over four times" gives an idea. "Many times" makes that statement very fuzzy. If someone took the time to benchmark, having an idea of the impact is good, even if "over four times" is not exactly precise.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually like the fuzziness here. I would also link to (or mention/link) the benchmarking doc (if it has acceptable data). That way, students can read about it and also re-run the benchmarks if so desired.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know, for me, 4 times faster would give me the push to check it out on my system and version to see if it tracks. And also to investigate the "why" a bit more. The fuzzy is more "handwavy" to me, like "do not worry about it". But good benchmark data is good to. :) Benchmarking well can be difficult.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As written, the text implies benchmark performance is the only reason to pick I'd also probably include a sentence about what was actually benchmarked if we have that information.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @BNAndras - agree that there needs to be more reasons for choosing one strategy over another. And there likely needs to be more than just the two strategies (why not use a regexp?) ... but I am loathe to sign up @petrem for anything he doesn't want to do. 😉 But consider this an open invitation to find a few more strategies to write up, should any of you want to. 😁 For benchmarking, we have https://github.com/exercism/python/blob/main/exercises/practice/rna-transcription/.articles/performance/content.md and related stuff for this exercise (hence my suggestion to link to it). If that is not sufficient (and no, I don't really think it is), we should either remove it and remove mention of benchmarks, or revise it and be clearer. But again, I don't want to sign anyone up for extra work unless they want to do it. 😄
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we could lead into the benchmarking by saying if performance is important, then consider using translate. That’d be a fairly small tweak, and we can build off that later if other approaches get added. |
||
|
|
||
| [ASCII]: https://www.asciitable.com/ | ||
| [approach-translate-maketrans]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/translate-maketrans | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using "character" as a full word is better than using "char" as an abbreviation, especially since it is also a full word on its own. And there are very few abbreviations in this document, other than the defined in context dna/DNA and rna/RNA of the subject matter. That is a good thing.