Optimize CsvInsight w/ striped reading/splitting #19

@mpenkov

Description

Currently, the preprocessor splits the input file into multiple parts (using split). This step runs on a single core because, in its current form, the splitting cannot be parallelized.

Modify the splitter to run on multiple cores:

  • Open N files, where N is the number of cores
  • Start N subprocesses to read from the input file
  • Each subprocess reads the input file entirely
  • The nth subprocess (0 ≤ n < N) writes only the lines where line_number % N == n
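
The steps above could be sketched roughly as follows in Python. This is only an illustration under a few assumptions, not the project's actual code: the names `write_stripe` and `striped_split` and the `part-NNNN` output naming are hypothetical, line numbers are zero-based, and the input is treated as plain newline-delimited records (a quoted CSV field containing a newline would be split across parts).

```python
import multiprocessing
import os


def write_stripe(input_path, output_path, n, num_workers):
    """Copy the lines whose zero-based index i satisfies i % num_workers == n.

    Every worker scans the whole input file but writes only its own stripe.
    """
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        for line_number, line in enumerate(fin):
            if line_number % num_workers == n:
                fout.write(line)


def striped_split(input_path, output_dir, num_workers=None):
    """Split input_path into num_workers part files, one subprocess per part."""
    num_workers = num_workers or multiprocessing.cpu_count()
    # Hypothetical naming scheme for the part files.
    output_paths = [
        os.path.join(output_dir, "part-%04d" % n) for n in range(num_workers)
    ]
    workers = [
        multiprocessing.Process(
            target=write_stripe, args=(input_path, path, n, num_workers)
        )
        for n, path in enumerate(output_paths)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
        if worker.exitcode != 0:
            raise RuntimeError("stripe worker exited with %r" % worker.exitcode)
    return output_paths
```

On platforms where `multiprocessing` starts fresh interpreters rather than forking, `striped_split` should be called from under an `if __name__ == "__main__":` guard. Note that each worker still reads the full input, so the gain comes from spreading the writes across cores rather than from reading less data.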