Implement Read Aloud #169
base: master
That works. Selected segments will probably be pretty short, so starting over each time isn't a big deal. (It also would start over at the beginning of the paragraph anyway!)
Sure. We can link to Spoken Content (GitHub won't link to this, but it's x-apple.systempreferences:com.apple.preference.universalaccess?SpokenContent), which is probably close enough.
In snapshots, definitely, since the page doesn't necessarily accurately set its
Did you pull the latest version after the force-push? (d57f0f2, not 08e3bae.) I broke something in the initial commit but I haven't seen any issue with switching voices in the latest.
Do you have an example of a snapshot (I'm assuming) where this happens? Did it happen in focus mode? We can change the styling of the highlight, but that sounds like an issue with the node mapping, not the display.
All sound good, on it. |
e7bbe28 to 9cddb4c
|
I've been thinking about how the reading "scope" (where Read Aloud should start and stop reading) should work. Here's one possibility:
@yexingsha, thoughts? |
|
This all sounds reasonable to me. Just to clarify, we can only start at the beginning of a block, right? So if the first visible block only has one line visible, we probably don't want to scroll back to where it begins and start from there. So does "the first visible block of text on the page" mean the first block with a visible starting point? (Though if the visible block on the page starts before and ends after the current page, then I guess we have no choice but to scroll back and start from its beginning.)

Also, I'm still having trouble switching voices on the latest version. Read Aloud also seems to start with a different voice, then after a pause or restart, it switches over to the first voice in the list and gets stuck there. Switching language works. I get this error when I try to switch voices:
|
Oh, you’re testing in the client! I’ll open a PR that adds the necessary code there. I’ve been testing in dev mode (in the browser) so far. |
|
zotero/zotero#5355 will prevent the error in the client. |
|
@dstillman, @yexingsha: This is ready to test. The remote voice is obviously a mock - we need to settle on a provider first. The big caveat is that it's frankly pretty buggy in Chrome, but the bugs are all on the browser's end. Speech keeps playing after the tab is reloaded, sometimes speech doesn't work at all on any site until the browser is restarted, and the voice list takes a while to populate. That's in addition to the Google server-side voice issue I mentioned before. My focus is on making Read Aloud work well in the client, though. If Chrome is just too buggy, we don't have to enable it in the web library for now.
|
(The mock remote voice won't work in the client because we explicitly block all remote resources there. But I think it would make sense to proxy requests through a |
|
This is great! Issues I've encountered so far:
|
|
Should all be fixed.
Yeah. I removed the pause behavior for the time being because it was just too unexpected IMO, and also caused a performance hit (because every selection change recalculated Read Aloud segments - though that could probably be worked around). You should be able to test it by checking out c470551. I think it makes sense in concept, but in practice, it seems like it'll be fairly common to pause the audio, select a passage and annotate or copy it somewhere, then start playing again. I definitely would not want reading to restart from my selection position (and, worse, stop at the end of it!) if I did that. What about a button in the popup that moves the reading position to the current selection instead? |
|
Looks like there is something wrong with b86bd42. With it the read aloud won't start, and clicking on settings will bug out the entire reader interface and throw this error:
Without this commit, the "starting incorrectly at beginning of document" and "no way to continue on after reading selected text" don't seem fixed either, but maybe there's something wrong with my build?
I tried the commit and see what you mean. Though I feel like we can avoid that by calculating the starting position when the play button is clicked, instead of when the text selection changes. If I select something while Read Aloud is paused, it shouldn't change the restart position immediately. If I then unselect the text and restart Read Aloud, it should just continue from where it was paused. Only if I restart Read Aloud while there is selected text should it switch position. Does that make sense?

But perhaps we do need some way to start Read Aloud at a specific position, not just for selected text, but also in the case where you want to skip ahead a lot of pages, or start on a paragraph that's very low on a page. I've been testing other text-to-speech programs, and Speechify (with their Chrome plugin) handles this by adding a little play button to the left of each paragraph on hover. Do you think this is something that's worth exploring?
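To make the "decide at play time" idea concrete, here is a minimal sketch. The `PlayContext` shape, the segment indices, and the `firstVisible` fallback are all assumptions made for illustration; none of them come from the reader's actual code.

```typescript
// Hypothetical: segments are identified by their index in reading order.
type SegmentIndex = number;

// Hypothetical snapshot of state, read only at the moment Play is clicked.
interface PlayContext {
  pausedAt: SegmentIndex | null;       // where reading was last paused, if anywhere
  selectionStart: SegmentIndex | null; // segment holding the start of the current selection, if any
  firstVisible: SegmentIndex;          // assumed fallback when there is nothing else to go on
}

// Called from the play handler, never from a selection-change handler,
// so selecting text while paused has no effect until Play is pressed.
function startSegmentOnPlay(ctx: PlayContext): SegmentIndex {
  if (ctx.selectionStart !== null) {
    return ctx.selectionStart; // a selection exists right now: it becomes the new target
  }
  if (ctx.pausedAt !== null) {
    return ctx.pausedAt;       // no selection: continue from where reading was paused
  }
  return ctx.firstVisible;     // nothing paused and nothing selected
}
```

Because the selection is only consulted inside the play handler, nothing has to be recalculated on every selection change.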
|
Did you update zotero/zotero as well? To zotero/zotero@ |
|
I did, and it works fine if I drop b86bd42. |
Oh, yeah, I can see that. Makes sense to me. What would we do about the active segment highlight, though? Should we hide it while reading is paused and there's a selection, to indicate that the selection will become the new reading target?
Possibly! I think it would get annoying if we showed it all the time, though, so maybe just when Read Aloud is open? |
|
Oh, go into Zotero Settings -> Advanced -> Config Editor, search for |
|
It works now! Speech and speed are persisting as expected, and Read Aloud can now continue on smoothly after reading selected text. But:
I think the opposite: when someone pauses read aloud to annotate, the segment highlight should remain visible to indicate where the read aloud will continue from. Only if they restart the read aloud when there is a selection should the segment highlight change to that.
Yeah, I agree. |
Ah, thanks, that's what I was missing when I was trying to reproduce this. What do we want to do in that case? We'd like to avoid changing the page when Read Aloud starts, but the best option here might just be to navigate back to the start of the half-cut-off paragraph if there's no fully visible paragraph to read from. |
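A rough sketch of that start-block rule, for discussion only. The `TextBlock` and `VisibleRange` shapes are hypothetical, and blocks are assumed to be sorted in reading order with positions in the same coordinate space as the visible range.

```typescript
// Hypothetical: vertical extent of one block of text.
interface TextBlock {
  id: string;
  top: number;
  bottom: number;
}

// Hypothetical: the part of the document currently on screen.
interface VisibleRange {
  top: number;
  bottom: number;
}

// Returns the block to start reading from, or null if no block is visible.
function pickStartBlock(blocks: TextBlock[], view: VisibleRange): TextBlock | null {
  // Preferred: the first block whose starting point is on screen,
  // so starting Read Aloud does not move the page.
  const startsInView = blocks.find(b => b.top >= view.top && b.top < view.bottom);
  if (startsInView) {
    return startsInView;
  }
  // Fallback: a block that overlaps the visible range but begins above it
  // (the half-cut-off paragraph). The caller would navigate back to its start.
  const overlapping = blocks.find(b => b.top < view.bottom && b.bottom > view.top);
  return overlapping ?? null;
}
```

The fallback only applies when no block starts on screen, which matches the case where a single long paragraph spans the whole page.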
|
An alternative I can think of is to "pseudo select" the visible text on the page, so that the read aloud can start at the beginning of the page, and then continue on normally from there. Do you think that's possible, and would that be a better experience than navigating back to the start of the paragraph? |
|
I'll see how feasible it would be to find the first sentence boundary (if there is one) on the current page, and start reading from that. If it doesn't find one, it could navigate back one page as a last resort. I think that's relatively OK behavior. |
|
It was not too feasible (lots and lots of complicated and slow code for a relatively uncommon edge case), but I added a bunch of fallbacks to at least prevent it from navigating to the start of the document. I think it more or less works OK now? I'll revisit if we end up needing to calculate the first visible bit of text in snapshots for some other reason. |
|
I tested it and it works pretty well. Is it feasible to scroll to the beginning of the paragraph when Read Aloud starts? Reading a selection in EPUB is also fixed, but I found another issue: when the selected text is at the end of a paragraph, Read Aloud will automatically continue on instead of stopping at the end of the selection. This happens in both snapshot and EPUB.
Yeah, definitely.
I was just working on that (and a related issue, where it can't figure out how to split on a segment at the end of a block of text). |
|
Works very well now! The annotation style with rounded corners looks pretty good, but yes we should expand it horizontally by 2px, and set its height to line height so that multiple lines appear as an entire block, to make it more distinguishable from selection and annotation. |
f250694 to d749180
|
I think I understand the problem with read aloud shortcuts. It would be nice to use simple shortcuts to pause/play/skip read aloud but all simple shortcuts are already taken. It looks like space is already responsible for playing/stopping read aloud whenever the panel is open. Right/left arrows are natural candidates for skip ahead/back as well. One issue now is that if I focus any button in the reader and press Space, that button will not be clicked but read-aloud will pause/continue. Enter on buttons still works as expected but this conditional Space behavior doesn't feel quite right. The same would apply to left/right arrows if they were to be added. A few ideas:
As a separate issue, I don't think it's currently possible to tab to the voices config dropdowns and read-aloud speed slider from the skip back/play/skip ahead section of the popup. |
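For the Space conflict described above, one possible guard is to let Read Aloud claim Space only when the focused element has no native use for it. This is just an illustration: `FocusedElementInfo`, the tag list, and the function name are assumptions, not existing reader code.

```typescript
// Hypothetical stand-in for the few properties of the focused element we need.
interface FocusedElementInfo {
  tagName: string;
  isContentEditable?: boolean;
}

// Elements that natively react to Space (activation or text entry).
const NATIVE_SPACE_TAGS = new Set(["BUTTON", "INPUT", "SELECT", "TEXTAREA"]);

// Returns true if Read Aloud should treat Space as play/pause.
function readAloudShouldHandleSpace(
  panelOpen: boolean,
  focused: FocusedElementInfo | null
): boolean {
  if (!panelOpen) {
    return false; // shortcut is only active while the Read Aloud panel is open
  }
  if (focused === null) {
    return true;  // nothing focused: Space is free to use
  }
  if (focused.isContentEditable) {
    return false; // the user is typing
  }
  // Leave Space to buttons and form controls so they still activate as usual.
  return !NATIVE_SPACE_TAGS.has(focused.tagName.toUpperCase());
}
```

The same check could gate left/right arrows if they were added, with the tag list adjusted for controls that use arrow keys, such as sliders.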



With a new type of persistent, non-modal, draggable popup.
This is a mostly-complete prototype; there are a few kinks we still need to work out.
pause()/resume() because Firefox helpfully unpauses when your computer wakes from sleep, and Chrome doesn't honor pause() before any utterances are queued, along with other bugs. Instead, we have to cancel speaking on pause, which means restarting from the beginning of the line on unpause. I'm not happy about that!

This only implements Read Aloud for the EPUB and snapshot views, but it should be straightforward to implement for PDFs once we stabilize the API.
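The cancel-on-pause workaround could look roughly like the sketch below. It is not the code in this PR: `SpeechEngine` and `SegmentReader` are hypothetical names, and the engine interface is an assumed wrapper that in a browser might sit on top of `speechSynthesis.speak()` and `speechSynthesis.cancel()`.

```typescript
// Hypothetical minimal engine: speaks one piece of text and reports when it ends.
interface SpeechEngine {
  speak(text: string, onEnd: () => void): void;
  cancel(): void;
}

class SegmentReader {
  private engine: SpeechEngine;
  private segments: string[];
  private index = 0;       // segment currently being read (or next to read)
  private playing = false;
  private generation = 0;  // bumped to invalidate callbacks from cancelled utterances

  constructor(engine: SpeechEngine, segments: string[]) {
    this.engine = engine;
    this.segments = segments;
  }

  get position(): number {
    return this.index;
  }

  play(): void {
    if (this.playing || this.index >= this.segments.length) return;
    this.playing = true;
    this.speakCurrent();
  }

  // "Pause" is implemented as cancel: the current segment is dropped and
  // will be spoken again from its beginning on the next play().
  pause(): void {
    if (!this.playing) return;
    this.playing = false;
    this.generation++;
    this.engine.cancel();
  }

  private speakCurrent(): void {
    const text = this.segments[this.index];
    if (text === undefined) return;
    const gen = ++this.generation;
    this.engine.speak(text, () => {
      if (gen !== this.generation) return; // end event from a cancelled utterance
      this.index++;
      if (this.index < this.segments.length) {
        this.speakCurrent();
      } else {
        this.playing = false;
      }
    });
  }
}
```

The generation counter is there because some engines may still fire an end event for a cancelled utterance; without it, a late callback could advance the position after a pause.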
Closes zotero/zotero#5327, see discussion in zotero/zotero#5326