
[mi-dbdata-import] Parallelize import jobs #15

Open
SerhatG wants to merge 1 commit into aerius:main from SerhatG:parallel

SerhatG commented Aug 7, 2025

This is done using GNU Parallel, which has a lot of useful functionality.
It also handles some edge cases and does error handling properly.

It does mean we have to move the `add_json_to_collection` function to an include file for it to work.
That might need some cleaning up.
In my tests an import job uses about 1.25-1.6 CPUs, so I went for 60% of the threads as the default value for the number of jobs.
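
For reviewers who have not used GNU Parallel with shell functions, here is a minimal sketch of the general pattern, not the code in this PR. The body of `add_json_to_collection` is a hypothetical stand-in, `run_imports` is a made-up wrapper name, and the sequential fallback is only there so the sketch also runs on machines without GNU Parallel.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the real add_json_to_collection:
# it only reports which file it would import.
add_json_to_collection() {
  local json_file="$1"
  echo "imported ${json_file}"
}
# Shells spawned by GNU Parallel only see functions that are exported.
export -f add_json_to_collection

# run_imports JOBS FILE... (hypothetical wrapper, not part of the PR)
run_imports() {
  local jobs="$1"
  shift
  local version
  version="$(parallel --version 2>/dev/null || true)"
  if [[ "${version}" == *"GNU parallel"* ]]; then
    # --jobs accepts a percentage of the CPU threads.
    # --halt soon,fail=1 stops scheduling new jobs after the first failure.
    # --keep-order prints output in input order.
    parallel --jobs "${jobs}" --halt soon,fail=1 --keep-order \
      add_json_to_collection ::: "$@"
  else
    # Assumed fallback: run the same function sequentially.
    local json_file
    for json_file in "$@"; do
      add_json_to_collection "${json_file}"
    done
  fi
}

run_imports 60% a.json b.json c.json
```

Because the function has to be exported before GNU Parallel can call it, keeping it in an include file that both the main script and the worker shells can source is one way to keep the definition in a single place.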

SerhatG commented Aug 7, 2025

Some local numbers:

### Small data job

| Configuration | Time (hh:mm:ss) | CPU |
| --- | --- | --- |
| Default | 00:51:23 | 125-165% |
| 4 jobs | 00:19:57 | 125%-660% |
| 60% of the threads (14 jobs on my machine) | 00:11:38 | 125%-2100% |

### Bigger data job

| Configuration | Time (hh:mm:ss) |
| --- | --- |
| Default | 04:03:23 |
| 4 jobs | 03:55:06 |
| 4 jobs, split files | 02:12:23 |
| 60% of the threads (14 jobs on my machine), split files | 01:13:15 |
| 40% of the threads (9 jobs on my machine), split files | 01:24:57 |
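
To put the timings above in perspective, the following sketch turns them into speedup factors relative to the default run. The helper names `to_seconds` and `speedup` are made up for this illustration. Only the timings come from the measurements.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert an hh:mm:ss duration to seconds.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print baseline/run as a speedup factor rounded to two decimals,
# using integer arithmetic only.
speedup() {
  local base run scaled
  base="$(to_seconds "$1")"
  run="$(to_seconds "$2")"
  scaled=$(( (base * 100 + run / 2) / run ))
  printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}

echo "small job,  4 jobs:                      $(speedup 00:51:23 00:19:57)x"
echo "small job,  60% of threads:              $(speedup 00:51:23 00:11:38)x"
echo "bigger job, 4 jobs:                      $(speedup 04:03:23 03:55:06)x"
echo "bigger job, 4 jobs, split files:         $(speedup 04:03:23 02:12:23)x"
echo "bigger job, 60% of threads, split files: $(speedup 04:03:23 01:13:15)x"
echo "bigger job, 40% of threads, split files: $(speedup 04:03:23 01:24:57)x"
```

On these numbers the 60% setting comes out at roughly 4.4x for the small job and roughly 3.3x for the bigger job with split files, while the bigger job barely improves with 4 jobs until the files are split.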

SerhatG commented Aug 7, 2025

Ah, the export part is missing for `parse_arguments()`. I'll refrain from fixing it for now, as it is one of the things that might need to be tidied up.
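
For context, and assuming "the export part" refers to bash's `export -f`, this small sketch shows why it matters: a child bash process does not see a function until it is exported. The body of `parse_arguments` here is a hypothetical stand-in, not the function from the PR.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real parse_arguments.
parse_arguments() {
  echo "parsed: $*"
}

# Before exporting, a child bash does not know the function.
if bash -c 'declare -F parse_arguments' >/dev/null 2>&1; then
  before="visible"
else
  before="missing"
fi

# After export -f, child bash processes inherit the function.
export -f parse_arguments
after="$(bash -c 'parse_arguments --jobs 4')"

echo "before export: ${before}"
echo "after export:  ${after}"
```

The same applies to any helper that a function run by GNU Parallel calls: each one needs to be exported or sourced in the worker shell.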
