Multi threading #5
base: master
* public API to manage the DSP thread pool
* private API to create DSP task queues and DSP tasks
* PD_DSPTHREADS define to enable/disable the thread pool at compile time

* add global DSP task queue
* reset and join DSP global task queue in dsp_tick()
* add "-threads" command line option to set number of DSP threads

* add thread local variable for the current DSP thread index
* if clocks are set/unset in the main thread (= 0), do as usual
* if clocks are set/unset in a DSP helper thread (> 0), cache the desired time point and put the clock on a lock-free stack
* at the end of dsp_tick() we take the list of deferred clocks and set/unset them for real.
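The deferred clock mechanism described above can be sketched roughly as follows. This is a minimal illustration using C11 atomics, not the code from this PR: the names `sketch_clock`, `sketch_clock_set()` and `sketch_clock_flush()` and all field names are hypothetical, and the sketch assumes each clock is deferred at most once per tick.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a Pd clock; all names here are illustrative. */
typedef struct sketch_clock {
    double c_settime;            /* time currently in effect (< 0 means unset) */
    double c_deferred;           /* time requested from a DSP helper thread */
    struct sketch_clock *c_next; /* link in the stack of deferred clocks */
} sketch_clock;

/* Head of the lock-free stack shared by all DSP threads. */
static _Atomic(sketch_clock *) deferred_head;

/* Index of the current DSP thread; 0 is the main thread. */
static _Thread_local int dsp_thread_index;

static void sketch_clock_set(sketch_clock *c, double t)
{
    assert(c != NULL);
    if (dsp_thread_index == 0) { /* main thread: do as usual */
        c->c_settime = t;
        return;
    }
    /* helper thread: cache the time point and push the clock (CAS loop).
       Assumption: a clock is deferred at most once per tick. */
    c->c_deferred = t;
    sketch_clock *old = atomic_load_explicit(&deferred_head,
                                             memory_order_relaxed);
    do {
        c->c_next = old;
    } while (!atomic_compare_exchange_weak_explicit(&deferred_head, &old, c,
                 memory_order_release, memory_order_relaxed));
}

/* End of the tick, main thread only: detach the whole list with a single
   exchange and apply the cached time points. Returns the number applied. */
static int sketch_clock_flush(void)
{
    sketch_clock *c = atomic_exchange_explicit(&deferred_head, NULL,
                                               memory_order_acquire);
    int n = 0;
    while (c != NULL) {
        sketch_clock *next = c->c_next;
        c->c_settime = c->c_deferred;
        c = next;
        n++;
    }
    return n;
}
```

Because the flush detaches the whole list with one exchange and no single node is ever popped concurrently, the sketch avoids the ABA problem of a general lock-free stack. Clocks are applied in reverse push order.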
* swapping/resizing an array doesn't require a DSP graph update
* in control objects, garrayref can be faster because the garray is cached
* in perform routines, garrayref allows thread-safe read or write access to garrays
protect st_soundout with spinlocks
expands to CLASS_THREADSAFE if PD_PARALLEL is 1
* add a_numthreads to t_audiosettings
* add "threads" to preferences
* add "Audio threads" option in audio settings dialog
by default, the DSP graph uses the global signal context. However, if a graph is processed in parallel, it needs its own signal context. For this purpose, we can also temporarily push/pop a new signal context with signalcontext_push() and signalcontext_pop().
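A rough sketch of how pushing and popping a temporary signal context could work. The PR names `signalcontext_push()` and `signalcontext_pop()`, but their signatures are not shown here, so the `sketch_` functions, the struct layout and the `sc_nsignals` counter below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signal context; in this sketch it only counts signals. */
typedef struct sketch_signalcontext {
    int sc_nsignals;                      /* placeholder for per-context state */
    struct sketch_signalcontext *sc_prev; /* context to restore on pop */
} sketch_signalcontext;

static sketch_signalcontext global_context; /* the default context */
static sketch_signalcontext *current_context = &global_context;

/* Make 'sc' the current context, remembering the previous one. */
static void sketch_signalcontext_push(sketch_signalcontext *sc)
{
    assert(sc != NULL);
    sc->sc_nsignals = 0;
    sc->sc_prev = current_context;
    current_context = sc;
}

/* Restore the context that was current before the matching push. */
static void sketch_signalcontext_pop(void)
{
    assert(current_context != &global_context);
    current_context = current_context->sc_prev;
}

/* Stand-in for creating a signal in whichever context is current. */
static void sketch_signal_new(void)
{
    current_context->sc_nsignals++;
}
```

Signals created between the push and the pop land in the temporary context, so a graph that is processed in parallel would not share signal state with the global one.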
* add global DSP task queue
* parallel canvases use their own signal context
* vinlet makes a copy of the parent input signal (in the new context)
* voutlet uses double buffering: voutlet_dspprolog() writes last buffer to parent output signal, voutlet_dsp() writes input signal to buffer.

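The voutlet buffering scheme might look roughly like this. It is a simplified sketch with hypothetical names (`sketch_voutlet`, `sketch_voutlet_prolog()`, `sketch_voutlet_perform()`) and a tiny fixed block size; it is not taken from the PR.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_BLOCKSIZE 4 /* tiny block size, for illustration only */

/* Hypothetical outlet state: one intermediate buffer. */
typedef struct sketch_voutlet {
    float buf[SKETCH_BLOCKSIZE]; /* block stored by the previous perform call */
} sketch_voutlet;

/* Prolog step: write the last buffered block to the parent output signal. */
static void sketch_voutlet_prolog(const sketch_voutlet *x, float *parent_out)
{
    assert(x != NULL && parent_out != NULL);
    memcpy(parent_out, x->buf, sizeof x->buf);
}

/* Perform step: store the current input signal in the buffer. */
static void sketch_voutlet_perform(sketch_voutlet *x, const float *in)
{
    assert(x != NULL && in != NULL);
    memcpy(x->buf, in, sizeof x->buf);
}
```

In this sketch the parent never reads the buffer while the parallel canvas writes to it, at the cost of delaying the output by one block.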
* if "join" is true, block~ has its own DSP task queue
* in ugen_done_graph(), we push the DSP task queue, reset it and finally join it. This will synchronize all parallel DSP tasks in subpatches or child abstractions.

* each DSP task object registers itself with every outer switch~ object. If a switch~ object changes its state, it notifies all its DSP task children, so they can in turn notify their DSP task queue.
* DSP tasks in switched-off parallel canvases are simply not scheduled.
* each DSP task queue maintains a count of switched-off DSP tasks, so that it won't lock up in case there is a switch~ *between* the queue and the task.
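To illustrate why the queue needs a count of switched-off tasks, here is a minimal sketch. The struct and function names are hypothetical, and the notification path from switch~ to the task is reduced to a single function call.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical task queue that tracks switched-off tasks. */
typedef struct sketch_taskqueue {
    int q_ntasks;         /* tasks registered with this queue */
    int q_nswitchedoff;   /* tasks currently disabled by an outer switch~ */
    atomic_int q_pending; /* tasks still to finish in the current tick */
} sketch_taskqueue;

/* A task tells its queue that an outer switch~ turned it on or off. */
static void sketch_taskqueue_notify(sketch_taskqueue *q, int switched_on)
{
    q->q_nswitchedoff += switched_on ? -1 : 1;
    assert(q->q_nswitchedoff >= 0 && q->q_nswitchedoff <= q->q_ntasks);
}

/* Start of a tick: expect only the tasks that will actually be scheduled. */
static void sketch_taskqueue_reset(sketch_taskqueue *q)
{
    atomic_store(&q->q_pending, q->q_ntasks - q->q_nswitchedoff);
}

/* Called whenever a scheduled task has finished. */
static void sketch_taskqueue_taskdone(sketch_taskqueue *q)
{
    atomic_fetch_sub(&q->q_pending, 1);
}

/* A join would wait until this returns nonzero. */
static int sketch_taskqueue_isdone(sketch_taskqueue *q)
{
    return atomic_load(&q->q_pending) == 0;
}
```

If the reset used `q_ntasks` alone, a join would wait forever for a task that is never scheduled; subtracting the switched-off count avoids that.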
* if "parallel" is true, clone has its own signal context and DSP task queue.
* child abstractions are scheduled as DSP tasks and finally joined.
* the cloned abstractions don't need to know that they are being processed in parallel, we can carry out all necessary steps in clone_dsp().

dsptaskqueue_update() checks if the canvas sub tree contains non-thread-safe objects. Note that any non-thread-safe objects *outside* the tree are ignored. This function is called in ugen_done_graph() for canvases with block~ + "join" and also in ugen_start() for the toplevel queue.

dsptaskqueue_check() checks if the task queue is thread-safe, and if false, posts the first N non-thread-safe objects (only once per queue). This is called in ugen_done_graph() for canvases with block~ + "parallel". Each parallel canvas checks if the enclosing joining canvas is thread-safe and prints an error message if false.

As an optimization, we traverse the complete object tree once in ugen_start() and mark every sub-tree (depth first) by setting gl_threadsafe. This speeds up subsequent calls to canvas_isthreadsafe() tremendously.

clone is handled specially since it contains both the DSP task queue and the DSP tasks, so the check can be performed all inside clone_dsp().

Thread safety checks can be disabled with the "-nothreadsafe" command line option.
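The depth-first marking pass could be sketched as follows. `gl_threadsafe` is the flag named above; the `sketch_canvas` struct, its other fields and both functions are hypothetical simplifications, not the PR's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical canvas node; only gl_threadsafe is a name from the PR. */
typedef struct sketch_canvas {
    int own_threadsafe;  /* 1 if every object directly in this canvas is safe */
    int gl_threadsafe;   /* cached result for the whole sub-tree */
    struct sketch_canvas **children; /* subpatches and abstractions */
    size_t nchildren;
} sketch_canvas;

/* Depth-first pass: compute and cache the flag for every sub-tree. */
static int sketch_canvas_mark(sketch_canvas *c)
{
    assert(c != NULL);
    int safe = c->own_threadsafe;
    for (size_t i = 0; i < c->nchildren; i++) {
        /* no short-circuit: every child must be visited to set its cache */
        safe &= sketch_canvas_mark(c->children[i]);
    }
    c->gl_threadsafe = safe;
    return safe;
}

/* After marking, a check is a constant-time lookup instead of a traversal. */
static int sketch_canvas_isthreadsafe(const sketch_canvas *c)
{
    return c->gl_threadsafe;
}
```

A single traversal at DSP start fills in the flag for every sub-tree, so later checks on any canvas do not have to walk its children again.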
... and without -nothreadsafe
used in thread_physical_concurrency() and later for thread pinning
Instead of going to sleep every time the task queue is (temporarily) empty, the DSP helper threads only wake up once at the beginning of each DSP tick and then spin-wait until all tasks have finished.
Pro: minimizes wake-up latency = more stable performance
Con: burns CPU cycles
Can be enabled/disabled with the "-spinwait" and "-nospinwait" command line options, respectively.
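A spin-waiting helper loop for one tick could look roughly like this. The sketch uses C11 atomics; all names are hypothetical, and the real wake-up at the start of the tick (for example via a semaphore) is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical task: a function plus its argument. */
typedef struct sketch_task {
    void (*fn)(void *);
    void *arg;
} sketch_task;

/* Hypothetical per-tick state shared by all DSP threads. */
typedef struct sketch_tick {
    sketch_task *tasks;
    int ntasks;
    atomic_int next;    /* index of the next unclaimed task */
    atomic_int pending; /* tasks that have not finished yet */
} sketch_tick;

/* Demo task: increments the int that 'arg' points to. */
static void sketch_count(void *arg)
{
    ++*(int *)arg;
}

/* Run by each helper once it has been woken up for a tick: claim tasks
   while any are left, then spin (no sleeping) until all have finished. */
static void sketch_helper_run(sketch_tick *t)
{
    assert(t != NULL);
    while (atomic_load_explicit(&t->pending, memory_order_acquire) > 0) {
        /* check first, so 'next' cannot grow without bound while spinning */
        if (atomic_load_explicit(&t->next, memory_order_relaxed) < t->ntasks) {
            int i = atomic_fetch_add_explicit(&t->next, 1,
                                              memory_order_relaxed);
            if (i < t->ntasks) {
                t->tasks[i].fn(t->tasks[i].arg);
                atomic_fetch_sub_explicit(&t->pending, 1,
                                          memory_order_release);
            }
        }
        /* otherwise: nothing left to claim, keep spinning */
    }
}
```

With a single caller the loop simply runs every task and returns. With several threads, those that find no unclaimed task keep polling `pending`, which is where the extra CPU usage comes from.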
Allow pinning DSP threads to dedicated cores; useful for spin-waiting! Can be enabled/disabled with the "-affinity" and "-noaffinity" command line options, respectively.
@umlaeute What does the
You have to add all the headers to the src/Makefile.am
umlaeute
left a comment
actually the headers are not installed (hence the noinst_ prefix), but that is just as it should be (it's just that the commit message is a bit misleading) :-P
It was just for testing :-)
Actually,
Can you add screenshots of the block and clone help patches?
@danomatika done :-)
#endif

/* override for parallel processing support */
#ifndef PD_PARALLEL
Personally, I would call this PDPARALLEL without the underscore to better match the existing PDINSTANCE and PDTHREADS defines.
Fair enough. On the other hand, there are quite a few constants/defines with a PD_ prefix:
PD_INTERNAL, PD_LONGINTTYPE, PD_FLOATSIZE, PD_BIGORSMALL, PD_<loglevel>, etc. Personally, I think PD_<name> is better style, but I don't care too much :-)
Generally, any suggestions for better naming are very much appreciated! (Naming things is hard...)
dummy PR for development
configure with --enable-parallel!
check block~-help.pd and clone-help.pd for help and usage examples.