Skip to content
This repository was archived by the owner on Jun 19, 2025. It is now read-only.

Conversation

@waylonflinn
Copy link
Contributor

Simplify the cython code with bcolz.iterblocks, as in @FrancescElies PR

@waylonflinn
Copy link
Contributor Author

Adapted from #24

@CarstVaartjes
Copy link
Member

Really nice work! the only thing what I need to check now if we can do this inside a nogil prange (for multi threading)

@waylonflinn
Copy link
Contributor Author

I've also got one more thing I want to add before this gets merged.

@waylonflinn
Copy link
Contributor Author

This is done, if you'd like to merge it.

@CarstVaartjes
Copy link
Member

I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil

@FrancescElies there is no cython implementation possible/planned right? as we cannot multi-thread python functions...

@waylonflinn
Copy link
Contributor Author

I've been thinking a lot about this and I'm planning a more general solution to problems like this. It will involve a map reduce style processing layer.

I don't think processing multiple columns simultaneously is the way forward. I'm going to focus on processing rows in a single chunk in parallel.


Sent from Mailbox

On Sun, Oct 11, 2015 at 2:18 PM, Carst Vaartjes notifications@github.com
wrote:

I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil

@FrancescElies there is not possible cython implementation right? as we cannot multi-thread python functions...

Reply to this email directly or view it on GitHub:
#53 (comment)

@CarstVaartjes
Copy link
Member

Let's see, really open to suggestions but the problem is that we have a few algorithms that depend on the processing order; so parallelism in the blocks that would disturb the order can give issues :/

P.s. why do you think column-parallelism is a bad idea?

@FrancescElies
Copy link
Contributor

...no cython implementation possible/planned? Not sure if I understand the question

@CarstVaartjes
Copy link
Member

Sorry! to explain myself: the iterblocks is currently defined here as a normal python function: https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py

unless it's a nogil defined cython function, we cannot call it multi-threaded, so that's why we cannot use it (even though it makes the code much nicer as you can see in this pull request)

@FrancescElies
Copy link
Contributor

Not that I'm aware, but you could ask in the mailing list

On Sun, Oct 11, 2015 at 10:01 PM, Carst Vaartjes notifications@github.com
wrote:

Sorry! to explain myself: the iterblocks is currently defined here as a
normal python function:
https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py

unless it's a nogil defined cython function, we cannot call it
multi-threaded, so that's why we cannot use it (even though it makes the
code much nicer as you can see in this pull request)


Reply to this email directly or view it on GitHub
#53 (comment).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants