Conversation

@petrem
Contributor

@petrem petrem commented Dec 21, 2025

- Renamed `chr` to `char` in the snippets because `chr` is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.
- The dictionary-join approach mentions list comprehensions, but it actually uses a generator expression. Replaced this in the explanation and expanded it to give the list-comprehension-based implementation along with a brief comparison.
- The overview mentions one approach is four times faster. In a brief comparison, it varied from 2.5x for a very short string up to 60x faster for a 10^6-long one. Probably not worth going into the details, but 4x is just inaccurate.

I noticed that the maketrans based approach mentions the translation is done between ASCII ordinal values. This happens to be true because the characters involved are ASCII letters, but the functions deal with unicode ordinals.

Do you think this is worth mentioning?

@BethanyG
Member

BethanyG commented Dec 21, 2025

Hi there @petrem! Thanks so much for this. I haven't taken a close look yet...but wanted to reply to your message/questions first.

Renamed chr to char in the snippets because chr is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.

Good call. Shadowing is just something to avoid, generally. Unless of course you program in a language where it's SOP for some reason. I would probably go even further and just name it `character` - but I'm fine with `char` too.
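As a tiny illustration of the hazard being discussed (a hypothetical scope, not code from the exercise): once a local name shadows the built-in `chr`, the built-in is unreachable for the rest of that scope.

```python
def shadowed():
    chr = "G"  # shadows the built-in chr() inside this function
    try:
        return chr(71)  # fails: `chr` is now bound to a string, not a function
    except TypeError:
        return None


assert shadowed() is None  # the call failed inside the shadowing scope
assert chr(71) == "G"      # the built-in is untouched outside that scope
```

Here the bug is caught immediately, but in longer functions it can surface far from the assignment that caused it.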

The dictionary-join approach mentions list comprehensions, but instead it uses a generator expression. Replaced this in the explanation and expanded to give the list comprehension based implementation along with a brief comparison.

I might go even further here and talk a little about how generator expressions avoid excess memory use by yielding values one at a time instead of building them all up in a list. The downside is that once consumed, a generator cannot be "re-run". They can also (paradoxical as it may seem) be slower, depending on the data and the expression.

However, list comprehensions (or any concrete comprehension) do create their values in memory, which might quickly become a problem when processing large amounts of data.

For the size of data we're talking about in this exercise, it's not terribly important - but worth discussing.
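A minimal sketch of the trade-off described above (the `nucleotide` naming is illustrative): the list comprehension materializes every value before `join()` runs, while the generator expression yields values lazily and is exhausted after one pass.

```python
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
dna = "GATTACA"

# List comprehension: builds the whole list in memory first.
rna_list = ''.join([LOOKUP[nucleotide] for nucleotide in dna])

# Generator expression: yields one value at a time; join() consumes it.
gen = (LOOKUP[nucleotide] for nucleotide in dna)
rna_gen = ''.join(gen)

assert rna_list == rna_gen == "CUAAUGU"

# A generator is single-use: once consumed, it produces nothing further.
assert ''.join(gen) == ''
```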

The overview mentions one approach is four times faster. In a brief comparison, it varies from 2.5x for a very short string and up to 60x faster for a 10^6 long one. Probably not worth going into the details, but 4x is just inaccurate.

I don't know if you took a look at the performance article/benchmarks, but the info likely comes from there. And as you are probably well aware, accurate benchmarking is complicated and also highly dependent on hardware. The best you can do is be really honest about how you crafted the benchmark and what hardware was used. I'd love it if someone did a quick once-over of the performance stuff... and if it is just horrid, we should probably remove the mention of 4x or give the range you pointed out. But yeah, I was not really a fan of the perf stuff from this author, but was sort of in a position where I had to approve the PR.
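For anyone wanting to re-run the numbers, a rough `timeit` sketch along these lines could be used (the function names and sizes here are illustrative, and the ratios will vary with hardware and Python version, as noted above):

```python
import timeit

LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
TABLE = str.maketrans("GCTA", "CGAU")


def with_join(dna):
    return ''.join(LOOKUP[nucleotide] for nucleotide in dna)


def with_translate(dna):
    return dna.translate(TABLE)


# Sanity check: both approaches must agree before timing them.
assert with_join("GATTACA") == with_translate("GATTACA") == "CUAAUGU"

for size in (10, 10_000):
    strand = "GATTACA" * (size // 7 + 1)
    t_join = timeit.timeit(lambda: with_join(strand), number=100)
    t_translate = timeit.timeit(lambda: with_translate(strand), number=100)
    print(f"len={len(strand)}: translate is {t_join / t_translate:.1f}x faster")
```

Publishing the strand lengths and repeat counts alongside the ratios would make the claim in the overview reproducible.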

I noticed that the maketrans based approach mentions the translation is done between ASCII ordinal values. This happens to be true because the characters involved are ASCII letters, but the functions deal with unicode ordinals.

Do you think this is worth mentioning?

Yes. If you have the energy and interest, it is well worth mentioning. Python claims to be UTF-8 first and fully able to process Unicode. So students should be aware of how that happens.
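A small demonstration of the point: `str.maketrans()` maps Unicode code points (the values `ord()` returns), which only coincide with ASCII codes for these particular letters.

```python
# The translation table maps Unicode ordinals to Unicode ordinals.
table = str.maketrans("GCTA", "CGAU")
assert table == {ord("G"): ord("C"), ord("C"): ord("G"),
                 ord("T"): ord("A"), ord("A"): ord("U")}

# It works identically for characters far outside the ASCII range:
greek = str.maketrans("αβ", "βα")
assert "αββα".translate(greek) == "βααβ"
assert ord("α") == 945  # well beyond the 0-127 ASCII range
```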

…in/content.md

Co-authored-by: András B Nagy <20251272+BNAndras@users.noreply.github.com>
@BethanyG
Member

Just took a deeper look, and I don't have any nits at the moment beyond the one found by @BNAndras. 😄 But will wait to merge in case you have additions after my comments above. 🎉


````diff
 ```
 
 For more information, check the [dictionary look-up with `join()` approach][approach-dictionary-join].
 
 ## Which approach to use?
 
-The `translate()` with `maketrans()` approach benchmarked over four times faster than the dictionary look-up with `join()` approach.
+The `translate()` with `maketrans()` approach benchmarked many times faster than the dictionary look-up with `join()` approach.
````
Member

"Over four times" gives an idea. "Many times" makes that statement very fuzzy. If someone took the time to benchmark, having an idea of the impact is good, even if "over four times" is not exactly precise.

Member

@BethanyG BethanyG Dec 21, 2025


I actually like the fuzziness here. I would also link to (or mention/link) the benchmarking doc (if it has acceptable data). That way, students can read about it and also re-run the benchmarks if so desired.

Member

I know, for me, 4 times faster would give me the push to check it out on my system and version to see if it tracks. And also to investigate the "why" a bit more. The fuzzy is more "handwavy" to me, like "do not worry about it".

But good benchmark data is good too. :) Benchmarking well can be difficult.

Member

@BNAndras BNAndras Dec 21, 2025

As written, the text implies benchmark performance is the only reason to pick `translate()` over `join()`. What about readability? Maintainability? Preferably, this text should indicate that if performance is important, then the benchmarks suggest `translate()` is a good choice.

I'd also probably include a sentence about what was actually benchmarked if we have that information.

Member

@BethanyG BethanyG Dec 22, 2025

@BNAndras - agree that there need to be more reasons for choosing one strategy over another. And there likely need to be more than just the two strategies (why not use a regexp?) ... but I am loath to sign up @petrem for anything he doesn't want to do. 😉 But consider this an open invitation to find a few more strategies to write up, should any of you want to. 😁

For benchmarking, we have https://github.com/exercism/python/blob/main/exercises/practice/rna-transcription/.articles/performance/content.md and related stuff for this exercise (hence my suggestion to link to it). If that is not sufficient (and no, I don't really think it is), we should either remove it and remove mention of benchmarks, or revise it and be clearer. But again, I don't want to sign anyone up for extra work unless they want to do it. 😄

Member

I think we could lead into the benchmarking by saying if performance is important, then consider using translate. That’d be a fairly small tweak, and we can build off that later if other approaches get added.

```diff
@@ -5,7 +5,7 @@ LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
 
 
 def to_rna(dna_strand):
-    return ''.join(LOOKUP[chr] for chr in dna_strand)
+    return ''.join(LOOKUP[char] for char in dna_strand)
```
Member

Using "character" as a full word is better than using "char" as an abbreviation, especially since it is also a full word on its own. And there are very few abbreviations in this document, other than the in-context defined dna/DNA and rna/RNA of the subject matter. That is a good thing.

@petrem
Contributor Author

petrem commented Dec 21, 2025

Thank you. I'll make some updates tomorrow (ASCII/Unicode and perhaps take another look at the list comprehension vs generator expression).

would probably go even further and just name it character - but I'm fine with char too.

I was actually tempted to name it nucleotide, I wonder if you (plural) think that would be a good idea, too?

@BethanyG
Member

I was actually tempted to name it nucleotide, I wonder if you (plural) think that would be a good idea, too?

I like that. 😄

@kotp
Member

kotp commented Dec 21, 2025

would probably go even further and just name it character - but I'm fine with char too.

I was actually tempted to name it nucleotide, I wonder if you (plural) think that would be a good idea, too?

Exactly, using language aimed at the problem being solved (domain language) is definitely good for showing intent when reading the code.
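For instance, the snippet could read as follows (a sketch of the suggested domain naming, not the merged text):

```python
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}


def to_rna(dna_strand):
    # `nucleotide` states what each character *is* in the problem domain.
    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)


assert to_rna("GATTACA") == "CUAAUGU"
```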

Member

Suggested change
[approach-dictionary-join]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/dictionary-join

Member

The reason for a blank line at the end of markdown/asciidoc files is that when processing, you end up with a blank line between the sources, which is generally intended. That is to say, the final line of a document of this type is typically terminated as \n\n.

```diff
 looks up the DNA character in the look-up dictionary, and outputs its matching RNA character as an element in the list.
 
-The `join()` method collects the list of RNA characters back into a string.
+The `join()` method collects the RNA characters back into a string.
```
Member

Suggested change

```diff
-The `join()` method collects the RNA characters back into a string.
+The `join()` method collects the RNA characters into a string.
```

We're joining the RNA characters into a new string, not the original string, so "back" is a bit confusing.

````
    return ''.join([LOOKUP[char] for char in dna_strand])
```

For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases.
````
Member

Suggested change

```diff
-For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases.
+For a relatively small number of elements, using lists is fine and may be faster, but as the number of elements increases, the memory consumption increases and performance decreases.
```

"The size of the data" makes me think about the size of each element, but the way the sentence is written, we're referring to the number of elements.

````
    return ''.join([LOOKUP[char] for char in dna_strand])
```

For a relatively small number of elements, using lists is fine and is usually even faster, but as the size of the data increases, the memory consumption increases and performance decreases. See also [this discussion][list-comprehension-choose-generator-expression] regarding when to choose one over the other.
````
Member

"this discussion" is fairly generic and doesn't describe the topic very well. That could be an issue for screen readers.

https://github.com/github/markdownlint-github/blob/main/docs/rules/GH002-no-generic-link-text.md
