
Multi-threaded scan #47

@freddie-freeloader

Description

First findings

  • It seems that DuckDB might be able to read multiple Parquet files concurrently -- but not a single file concurrently

Thoughts

  • In theory, we could do this by running the COPY FROM with exactly the same number of threads and letting each thread use the location info of the corresponding SheetReader thread.
  • Would it be possible to partition the Excel sheet into chunks of 2048 / (number of threads) rows and make the buffers that size? Probably tricky, because we would have to know the number of columns beforehand (buffer size / number of columns is the number of rows that fit into one buffer); see the sketch after this list.
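
To make the buffer-sizing concern concrete, here is a small sketch of the arithmetic, assuming DuckDB's STANDARD_VECTOR_SIZE of 2048 rows per vector; the `PlanPartitions` helper and its parameters are hypothetical and not part of sheetreader-core or the extension.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper illustrating the chicken-and-egg problem above:
// a per-thread buffer can only be sized in cells once the column count
// of the sheet is known, because rows per buffer = buffer size / columns.
struct PartitionPlan {
	std::size_t rows_per_thread;  // rows each thread should produce per vector
	std::size_t cells_per_buffer; // required buffer capacity in cells
};

PartitionPlan PlanPartitions(std::size_t num_threads, std::size_t num_columns) {
	constexpr std::size_t kVectorSize = 2048; // DuckDB's STANDARD_VECTOR_SIZE
	PartitionPlan plan;
	plan.rows_per_thread = kVectorSize / num_threads;
	plan.cells_per_buffer = plan.rows_per_thread * num_columns;
	return plan;
}

int main() {
	// With 4 threads and 10 columns: 512 rows per thread, 5120 cells per buffer.
	PartitionPlan plan = PlanPartitions(4, 10);
	std::printf("rows per thread: %zu, cells per buffer: %zu\n",
	            plan.rows_per_thread, plan.cells_per_buffer);
	return 0;
}
```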

TODO

A multi-threaded scan would be interesting, since our copy/scan function takes some time.

Have a look at:

https://github.com/duckdb/duckdb_delta/blob/main/src/functions/delta_scan.cpp

According to the README, it supports a multi-threaded scan. I suspect that this doesn't need any new implementation on their side, since they are reading Parquet files.

  • Find out whether this is due to the parquet files
  • Find out whether DuckDB also supports a multi-threaded scan of the Apache Arrow format
  • Have a look at how the multi-threaded scan is implemented (see the sketch after this list)
  • Find out whether we could copy concurrently -- this might not be possible, because sheetreader-core stores the data in a special way (per thread, and some rows are split across multiple threads -- with only an implicit ordering)
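
For the third item, the relevant mechanism is DuckDB's table-function API: a scan becomes parallel when its global state reports MaxThreads() > 1 and worker threads pull work units from it (the Parquet reader hands out row groups this way). Below is a minimal, hypothetical sketch of that shape for a SheetReader scan; the names (SheetScanGlobalState, sheetreader_scan_mt, the fixed block count of 8, the single VARCHAR column) are invented for illustration, and exact DuckDB signatures may differ between versions.

```cpp
#include "duckdb.hpp"
#include <atomic>

using namespace duckdb;

// Each work unit is a contiguous block of sheet rows. Whether sheetreader-core
// can hand out such per-thread blocks at all is exactly the open question above.
struct SheetScanGlobalState : public GlobalTableFunctionState {
	explicit SheetScanGlobalState(idx_t total_blocks_p) : total_blocks(total_blocks_p) {
	}
	// DuckDB asks the global state how many threads may participate in this scan.
	idx_t MaxThreads() const override {
		return total_blocks;
	}

	idx_t total_blocks;
	std::atomic<idx_t> next_block {0};
};

struct SheetScanLocalState : public LocalTableFunctionState {
	idx_t current_block = 0;
};

static unique_ptr<FunctionData> SheetScanBind(ClientContext &context, TableFunctionBindInput &input,
                                              vector<LogicalType> &return_types, vector<string> &names) {
	// Single VARCHAR column for the sketch; the real extension derives the schema from the sheet.
	return_types.push_back(LogicalType::VARCHAR);
	names.push_back("value");
	return make_uniq<TableFunctionData>();
}

static unique_ptr<GlobalTableFunctionState> SheetScanInitGlobal(ClientContext &context,
                                                                TableFunctionInitInput &input) {
	// Assumed: the bind phase discovered that the sheet splits into 8 row blocks.
	return make_uniq<SheetScanGlobalState>(8);
}

static unique_ptr<LocalTableFunctionState> SheetScanInitLocal(ExecutionContext &context, TableFunctionInitInput &input,
                                                              GlobalTableFunctionState *global_state) {
	return make_uniq<SheetScanLocalState>();
}

static void SheetScanFunction(ClientContext &context, TableFunctionInput &data, DataChunk &output) {
	auto &gstate = data.global_state->Cast<SheetScanGlobalState>();
	auto &lstate = data.local_state->Cast<SheetScanLocalState>();

	// Every worker thread atomically claims the next unprocessed block -- the same
	// pattern the Parquet reader uses with row groups. A real implementation would
	// keep emitting vectors from a claimed block across calls before taking the next one.
	idx_t block = gstate.next_block.fetch_add(1);
	if (block >= gstate.total_blocks) {
		output.SetCardinality(0); // no work left: this thread is done
		return;
	}
	lstate.current_block = block;
	// The rows of `block` would be written into `output` here; left empty in this sketch.
	output.SetCardinality(0);
}

// Registering init_global/init_local alongside the scan function is what makes
// the table function eligible for DuckDB's parallel execution.
static TableFunction GetSheetScanFunction() {
	return TableFunction("sheetreader_scan_mt", {LogicalType::VARCHAR}, SheetScanFunction, SheetScanBind,
	                     SheetScanInitGlobal, SheetScanInitLocal);
}
```

The sticking point from the last item remains: if sheetreader-core stores rows per thread with some rows split across threads and only an implicit ordering, there is no clean, independent work unit to hand out from the global state.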
