rna-transcription approach: a few improvements #4052
Conversation
- Renamed `chr` to `char` in the snippets because `chr` is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.
- The dictionary-join approach mentions list comprehensions, but it actually uses a generator expression. Replaced this in the explanation and expanded it to give the list-comprehension-based implementation along with a brief comparison.
- The overview mentions one approach is four times faster. In a brief comparison, it varies from 2.5x for a very short string up to 60x faster for a 10^6-long one. Probably not worth going into the details, but 4x is just inaccurate.
exercises/practice/rna-transcription/.approaches/dictionary-join/content.md
Hi there @petrem! Thanks so much for this. I haven't taken a close look yet...but wanted to reply to your message/questions first.
Good call. Shadowing is just something to avoid, generally. Unless of course you program in a language where it's SOP for some reason. I would probably go even further and just name it
I might go even further here and talk a little about how generator expressions avoid excess memory by not creating values in a list in memory and instead yielding values at runtime. The downside being that once consumed, a generator cannot be "re-run". They can also (paradoxically seeming) be slower, depending on the data and the expression. However, list comprehensions (or any concrete comprehension) do create their values in memory, which might quickly become a problem when processing large amounts of data. For the size of data we're talking about in this exercise, it's not terribly important - but worth discussing.
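The one-shot nature of generator expressions mentioned above can be shown with a short sketch (the names here are illustrative, not taken from the PR):

```python
# A generator expression yields values lazily and can be consumed only
# once; a list comprehension stores all of its values up front and can
# be iterated repeatedly.
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}

gen = (LOOKUP[char] for char in "GCTA")
assert "".join(gen) == "CGAU"
assert "".join(gen) == ""  # exhausted: joining again yields nothing

lst = [LOOKUP[char] for char in "GCTA"]
assert "".join(lst) == "CGAU"
assert "".join(lst) == "CGAU"  # the list can be re-used freely
```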
I don't know if you took a look at the performance article/benchmarks, but the info likely comes from there. And as you are probably well aware, accurate benchmarking is complicated and also highly dependent on hardware. The best you can do is be really honest about how you crafted the benchmark and what hardware was used. I'd love if someone did a quick once-over of the performance stuff... and if it is just horrid, we should probably remove the mention of 4x or do the range you pointed out. But yeah, I was not really a fan of the perf stuff from this author, but was sort of in a position where I had to approve the PR.
> Do you think this is worth mentioning?

Yes. If you have the energy and interest, it is well worth mentioning. Python claims to be UTF-8 first and fully able to process Unicode. So students should be aware of how that happens.
…in/content.md Co-authored-by: András B Nagy <20251272+BNAndras@users.noreply.github.com>
Just took a deeper look, and I don't have any nits at the moment beyond the one found by @BNAndras. 😄 But will wait to merge in case you have additions after my comments above. 🎉
````diff
 ```

 For more information, check the [dictionary look-up with `join()` approach][approach-dictionary-join].

 ## Which approach to use?

-The `translate()` with `maketrans()` approach benchmarked over four times faster than the dictionary look-up with `join()` approach.
+The `translate()` with `maketrans()` approach benchmarked many times faster than the dictionary look-up with `join()` approach.
````
"Over four times" gives an idea. "Many times" makes that statement very fuzzy. If someone took the time to benchmark, having an idea of the impact is good, even if "over four times" is not exactly precise.
I actually like the fuzziness here. I would also link to (or mention/link) the benchmarking doc (if it has acceptable data). That way, students can read about it and also re-run the benchmarks if so desired.
I know, for me, 4 times faster would give me the push to check it out on my system and version to see if it tracks. And also to investigate the "why" a bit more. The fuzzy is more "handwavy" to me, like "do not worry about it".
But good benchmark data is good too. :) Benchmarking well can be difficult.
As written, the text implies benchmark performance is the only reason to pick `translate()` over `join()`. What about readability? Maintainability? Preferably, this text should indicate that if performance is important, then the benchmarks suggest `translate()` is a good choice.
I'd also probably include a sentence about what was actually benchmarked if we have that information.
@BNAndras - agree that there needs to be more reasons for choosing one strategy over another. And there likely needs to be more than just the two strategies (why not use a regexp?) ... but I am loath to sign up @petrem for anything he doesn't want to do. 😉 But consider this an open invitation to find a few more strategies to write up, should any of you want to. 😁
For benchmarking, we have https://github.com/exercism/python/blob/main/exercises/practice/rna-transcription/.articles/performance/content.md and related stuff for this exercise (hence my suggestion to link to it). If that is not sufficient (and no, I don't really think it is), we should either remove it and remove mention of benchmarks, or revise it and be clearer. But again, I don't want to sign anyone up for extra work unless they want to do it. 😄
I think we could lead into the benchmarking by saying if performance is important, then consider using translate. That’d be a fairly small tweak, and we can build off that later if other approaches get added.
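For anyone who wants to re-run a quick comparison locally, a minimal `timeit` sketch along these lines could work (the exact ratios depend on hardware and Python version, so treat the printed numbers as illustrative only):

```python
# Rough benchmark sketch comparing the two approaches from the PR.
# Ratios vary with strand length, hardware, and interpreter version.
import timeit

LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
TABLE = str.maketrans("GCTA", "CGAU")

def rna_join(dna_strand):
    return "".join(LOOKUP[char] for char in dna_strand)

def rna_translate(dna_strand):
    return dna_strand.translate(TABLE)

for size in (4, 10_000):
    strand = "GCTA" * (size // 4)
    t_join = timeit.timeit(lambda: rna_join(strand), number=1_000)
    t_translate = timeit.timeit(lambda: rna_translate(strand), number=1_000)
    print(f"n={size}: join/translate time ratio = {t_join / t_translate:.1f}x")
```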
```diff
@@ -5,7 +5,7 @@ LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}


 def to_rna(dna_strand):
-    return ''.join(LOOKUP[chr] for chr in dna_strand)
+    return ''.join(LOOKUP[char] for char in dna_strand)
```
Using "character" as a full word is better than using "char" as an abbreviation, especially since it is also a full word on its own. And there are very few abbreviations in this document, other than the defined-in-context dna/DNA and rna/RNA of the subject matter. That is a good thing.
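A tiny sketch of the shadowing hazard discussed in this thread: once a loop variable reuses the name `chr`, the built-in is no longer reachable by that name.

```python
# The loop variable shadows the chr built-in and, at module scope,
# keeps its last value ("A") after the loop ends.
for chr in "GCTA":
    pass

try:
    chr(65)  # now refers to the string "A", not the built-in
except TypeError as exc:
    print(exc)  # 'str' object is not callable
```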
Thank you. I'll make some updates tomorrow (ASCII/unicode and perhaps take another look at the list comprehension vs generator expression).
I was actually tempted to name it
I like that. 😄
Exactly, using language aimed at the problem being solved (domain language) is definitely good for showing intent in the reading of the code.
```diff
 [approach-dictionary-join]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/dictionary-join
```
The reason for a blank line at the end of markdown/asciidoc files is that when processing, you end up with a blank line between the sources, which is generally intended. That is to say, the final line of a document of this type is typically terminated as `\n\n`.
```diff
 looks up the DNA character in the look-up dictionary, and outputs its matching RNA character as an element in the list.

-The `join()` method collects the list of RNA characters back into a string.
+The `join()` method collects the RNA characters back into a string.
```
```suggestion
The `join()` method collects the RNA characters into a string.
```

We're joining the RNA characters into a new string, not the original string, so "back" is a bit confusing.
````diff
 return ''.join([LOOKUP[char] for char in dna_strand])
 ```

 For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases.
````
```suggestion
For a relatively small number of elements, using lists is fine and may be faster, but as the number of elements increases, the memory consumption increases and performance decreases.
```

The size of the data makes me think about the size of each element, but the way the sentence is written, we're referring to the number of elements.
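The memory point can be eyeballed with `sys.getsizeof` (a rough sketch; `getsizeof` reports only the container's own footprint, and exact byte counts vary by interpreter):

```python
import sys

LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
strand = "GCTA" * 250_000  # a 10^6-character strand, as in the PR description

gen_size = sys.getsizeof(LOOKUP[char] for char in strand)
list_size = sys.getsizeof([LOOKUP[char] for char in strand])

# The generator object stays small and constant; the list grows with
# the number of elements (roughly one pointer per element).
print(gen_size, list_size)
assert list_size > 1000 * gen_size
```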
````diff
 return ''.join([LOOKUP[char] for char in dna_strand])
 ```

 For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases. See also [this discussion][list-comprehension-choose-generator-expression] regarding when to choose one over the other.
````
"this discussion" is fairly generic and doesn't describe the topic very well. That could be an issue for screen readers.
https://github.com/github/markdownlint-github/blob/main/docs/rules/GH002-no-generic-link-text.md
> Renamed `chr` to `char` in the snippets because `chr` is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.

I noticed that the `maketrans`-based approach mentions the translation is done between ASCII ordinal values. This happens to be true because the characters involved are ASCII letters, but the functions deal with Unicode ordinals. Do you think this is worth mentioning?
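If it helps, the Unicode point could be demonstrated with a short sketch: `str.maketrans` keys are Unicode code points (`ord` values), and it only happens that G, C, T and A fall in the ASCII range.

```python
# The translation table maps code point to code point.
TABLE = str.maketrans("GCTA", "CGAU")
assert TABLE == {ord("G"): ord("C"), ord("C"): ord("G"),
                 ord("T"): ord("A"), ord("A"): ord("U")}

# Non-ASCII characters work exactly the same way: the key is the
# character's Unicode code point, well outside the ASCII range.
greek = str.maketrans({"α": "a"})
assert greek == {ord("α"): "a"}  # ord("α") == 945
assert "αβ".translate(greek) == "aβ"
```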