This repository was archived by the owner on Jun 19, 2025. It is now read-only.

Multi-core support for bquery #17

@ARF1

Description


After missing a groupby for bcolz for some time, I was excited to find this interesting project. To get started, I looked at the unique method.

I found some interesting timing results with a 12-character string column in my database:

import blaze
import bquery
import bcolz
from multiprocessing import Pool
p = Pool()

# bquery's built-in unique()
db = bquery.open(...)

%%timeit
db.unique('my_col')
--> 5.5 sec (uses only one core)

# blaze on top of the same bcolz data
db = bcolz.open(...)
d = blaze.Data(db)

%%timeit
blaze.compute(d['my_col'].distinct(), map=p.map)
--> 3.32 sec (using 2 cores on my dual-core machine)

%%timeit
blaze.compute(d['my_col'].distinct())
--> 7.69 sec (using only one core)

It appears that parallel processing with blaze provides a fairly significant speedup on my data, even given blaze's inherent overhead. Would it be conceivable to parallelise the bquery code?
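For illustration, here is a minimal sketch of the kind of chunk-wise parallelism I have in mind, built directly on bcolz and multiprocessing rather than on bquery's internals. ROOTDIR, COLUMN, N_WORKERS and the helper functions are placeholders, not part of the bquery API: each worker re-opens the on-disk column read-only, computes the distinct values of its slice, and the parent unions the partial results.

import numpy as np
import bcolz
from multiprocessing import Pool

ROOTDIR = '/path/to/my_table'   # hypothetical on-disk ctable location
COLUMN = 'my_col'
N_WORKERS = 2

def unique_slice(bounds):
    # Each worker re-opens the on-disk column read-only and computes
    # the distinct values of its slice of rows.
    start, stop = bounds
    col = bcolz.open(ROOTDIR, mode='r')[COLUMN]
    return np.unique(col[start:stop])

def parallel_unique():
    col = bcolz.open(ROOTDIR, mode='r')[COLUMN]
    n = len(col)
    step = (n + N_WORKERS - 1) // N_WORKERS
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with Pool(N_WORKERS) as pool:
        partials = pool.map(unique_slice, bounds)
    # The union of the per-slice results is the global set of unique values.
    return np.unique(np.concatenate(partials))

if __name__ == '__main__':
    print(parallel_unique())

Slicing a bcolz carray returns a plain numpy array, so the per-slice work is just np.unique; whether this scales in practice would of course depend on chunk sizes and the cost of decompressing in each worker.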
