rna-transcription approach: a few improvements #4052
Conversation
- Renamed `chr` to `char` in the snippets because `chr` is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.
- The dictionary-join approach mentions list comprehensions, but it actually uses a generator expression. Replaced this in the explanation and expanded it to give the list-comprehension-based implementation along with a brief comparison.
- The overview mentions one approach is four times faster. In a brief comparison, it varies from 2.5x for a very short string up to 60x faster for a 10^6-long one. Probably not worth going into the details, but 4x is just inaccurate.
exercises/practice/rna-transcription/.approaches/dictionary-join/content.md
Hi there @petrem! Thanks so much for this. I haven't taken a close look yet...but wanted to reply to your message/questions first.
Good call. Shadowing is just something to avoid, generally. Unless of course you program in a language where it's SOP for some reason. I would probably go even further and just name it
I might go even further here and talk a little about how generator expressions avoid excess memory by not creating values in a list in memory and instead yielding values at runtime. The downside being that once consumed, a generator cannot be "re-run". They can also (paradoxically seeming) be slower, depending on the data and the expression. However, list comprehensions (or any concrete comprehension) do create their values in memory, which might quickly become a problem when processing large amounts of data. For the size of data we're talking about in this exercise, it's not terribly important - but worth discussing.
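The one-shot nature of generator expressions mentioned above can be shown with a short sketch (the names here are illustrative, not taken from the PR):

```python
# A generator expression yields values lazily and can be consumed only
# once; a list comprehension stores all of its values up front and can
# be iterated repeatedly.
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}

gen = (LOOKUP[char] for char in "GCTA")
assert "".join(gen) == "CGAU"
assert "".join(gen) == ""  # exhausted: joining again yields nothing

lst = [LOOKUP[char] for char in "GCTA"]
assert "".join(lst) == "CGAU"
assert "".join(lst) == "CGAU"  # the list can be re-used freely
```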
I don't know if you took a look at the performance article/benchmarks, but the info likely comes from there. And as you are probably well aware, accurate benchmarking is complicated and also highly dependent on hardware. The best you can do is be really honest about how you crafted the benchmark and what hardware was used. I'd love if someone did a quick once-over of the performance stuff... and if it is just horrid, we should probably remove the mention of 4x or do the range you pointed out. But yeah, I was not really a fan of the perf stuff from this author, but was sort of in a position where I had to approve the PR.
> Do you think this is worth mentioning?

Yes. If you have the energy and interest, it is well worth mentioning. Python claims to be UTF-8 first and fully able to process Unicode. So students should be aware of how that happens.
…in/content.md Co-authored-by: András B Nagy <20251272+BNAndras@users.noreply.github.com>
Just took a deeper look, and I don't have any nits at the moment beyond the one found by @BNAndras. 😄 But will wait to merge in case you have additions after my comments above. 🎉
````diff
 ```

 For more information, check the [dictionary look-up with `join()` approach][approach-dictionary-join].

 ## Which approach to use?

-The `translate()` with `maketrans()` approach benchmarked over four times faster than the dictionary look-up with `join()` approach.
+The `translate()` with `maketrans()` approach benchmarked many times faster than the dictionary look-up with `join()` approach.
````
"Over four times" gives an idea. "Many times" makes that statement very fuzzy. If someone took the time to benchmark, having an idea of the impact is good, even if "over four times" is not exactly precise.
I actually like the fuzziness here. I would also link to (or mention/link) the benchmarking doc (if it has acceptable data). That way, students can read about it and also re-run the benchmarks if so desired.
I know, for me, 4 times faster would give me the push to check it out on my system and version to see if it tracks. And also to investigate the "why" a bit more. The fuzzy is more "handwavy" to me, like "do not worry about it".
But good benchmark data is good too. :) Benchmarking well can be difficult.
As written, the text implies benchmark performance is the only reason to pick `translate()` over `join()`. What about readability? Maintainability? Preferably, this text should indicate that if performance is important, then the benchmarks suggest `translate()` is a good choice.
I'd also probably include a sentence about what was actually benchmarked if we have that information.
@BNAndras - agree that there needs to be more reasons for choosing one strategy over another. And there likely needs to be more than just the two strategies (why not use a regexp?) ... but I am loath to sign up @petrem for anything he doesn't want to do. 😉 But consider this an open invitation to find a few more strategies to write up, should any of you want to. 😁
For benchmarking, we have https://github.com/exercism/python/blob/main/exercises/practice/rna-transcription/.articles/performance/content.md and related stuff for this exercise (hence my suggestion to link to it). If that is not sufficient (and no, I don't really think it is), we should either remove it and remove mention of benchmarks, or revise it and be clearer. But again, I don't want to sign anyone up for extra work unless they want to do it. 😄
I think we could lead into the benchmarking by saying if performance is important, then consider using translate. That’d be a fairly small tweak, and we can build off that later if other approaches get added.
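For anyone who wants to re-run a quick comparison locally, a minimal `timeit` sketch along these lines could work (the exact ratios depend on hardware and Python version, so treat the printed numbers as illustrative only):

```python
# Rough benchmark sketch comparing the two approaches from the PR.
# Ratios vary with strand length, hardware, and interpreter version.
import timeit

LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
TABLE = str.maketrans("GCTA", "CGAU")

def rna_join(dna_strand):
    return "".join(LOOKUP[char] for char in dna_strand)

def rna_translate(dna_strand):
    return dna_strand.translate(TABLE)

for size in (4, 10_000):
    strand = "GCTA" * (size // 4)
    t_join = timeit.timeit(lambda: rna_join(strand), number=1_000)
    t_translate = timeit.timeit(lambda: rna_translate(strand), number=1_000)
    print(f"n={size}: join/translate time ratio = {t_join / t_translate:.1f}x")
```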
```diff
@@ -5,7 +5,7 @@ LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}


 def to_rna(dna_strand):
-    return ''.join(LOOKUP[chr] for chr in dna_strand)
+    return ''.join(LOOKUP[char] for char in dna_strand)
```
Using "character" as a full word is better than using "char" as an abbreviation, especially since it is also a full word on its own. And there are very few abbreviations in this document, other than the defined-in-context dna/DNA and rna/RNA of the subject matter. That is a good thing.
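A tiny sketch of the shadowing hazard discussed in this thread: once a loop variable reuses the name `chr`, the built-in is no longer reachable by that name.

```python
# The loop variable shadows the chr built-in and, at module scope,
# keeps its last value ("A") after the loop ends.
for chr in "GCTA":
    pass

try:
    chr(65)  # now refers to the string "A", not the built-in
except TypeError as exc:
    print(exc)  # 'str' object is not callable
```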
Thank you. I'll make some updates tomorrow (ASCII/unicode and perhaps take another look at the list comprehension vs generator expression).
I was actually tempted to name it
I like that. 😄
Exactly, using language aimed at the problem being solved (domain language) is definitely good for showing intent in the reading of the code.
```diff
 [approach-dictionary-join]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/dictionary-join
```
The reason for a blank line at the end of markdown/asciidoc files is that when processing, you end up with a blank line between the sources, which is generally intended. That is to say, the final line of a document of this type is typically terminated as `\n\n`.
```diff
 looks up the DNA character in the look-up dictionary, and outputs its matching RNA character as an element in the list.

-The `join()` method collects the list of RNA characters back into a string.
+The `join()` method collects the RNA characters back into a string.
```
```suggestion
The `join()` method collects the RNA characters into a string.
```

We're joining the RNA characters into a new string, not the original string, so "back" is a bit confusing.
````diff
 return ''.join([LOOKUP[char] for char in dna_strand])
 ```

 For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases.
````
```suggestion
For a relatively small number of elements, using lists is fine and may be faster, but as the number of elements increases, the memory consumption increases and performance decreases.
```

The size of the data makes me think about the size of each element, but the way the sentence is written, we're referring to the number of elements.
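The memory point can be eyeballed with `sys.getsizeof` (a rough sketch; `getsizeof` reports only the container's own footprint, and exact byte counts vary by interpreter):

```python
import sys

LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
strand = "GCTA" * 250_000  # a 10^6-character strand, as in the PR description

gen_size = sys.getsizeof(LOOKUP[char] for char in strand)
list_size = sys.getsizeof([LOOKUP[char] for char in strand])

# The generator object stays small and constant; the list grows with
# the number of elements (roughly one pointer per element).
print(gen_size, list_size)
assert list_size > 1000 * gen_size
```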
````diff
 return ''.join([LOOKUP[char] for char in dna_strand])
 ```

 For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases. See also [this discussion][list-comprehension-choose-generator-expression] regarding when to choose one over the other.
````
"this discussion" is fairly generic and doesn't describe the topic very well. That could be an issue for screen readers.
https://github.com/github/markdownlint-github/blob/main/docs/rules/GH002-no-generic-link-text.md
> Renamed `chr` to `char` in the snippets because `chr` is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.

I noticed that the `maketrans`-based approach mentions the translation is done between ASCII ordinal values. This happens to be true because the characters involved are ASCII letters, but the functions deal with Unicode ordinals. Do you think this is worth mentioning?
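If it helps, the Unicode point could be demonstrated with a short sketch: `str.maketrans` keys are Unicode code points (`ord` values), and it only happens that G, C, T and A fall in the ASCII range.

```python
# The translation table maps code point to code point.
TABLE = str.maketrans("GCTA", "CGAU")
assert TABLE == {ord("G"): ord("C"), ord("C"): ord("G"),
                 ord("T"): ord("A"), ord("A"): ord("U")}

# Non-ASCII characters work exactly the same way: the key is the
# character's Unicode code point, well outside the ASCII range.
greek = str.maketrans({"α": "a"})
assert greek == {ord("α"): "a"}  # ord("α") == 945
assert "αβ".translate(greek) == "aβ"
```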