Various MR (map-reduce) jobs use the built-in json module or the simplejson package to deserialize JSON payloads. Switching to ujson gives a significant speed-up.
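For most jobs the switch can be a one-line import change. Here is a minimal sketch, assuming the job only uses loads()/dumps(), which ujson implements with compatible signatures (it does not provide the full json module API):

    # Prefer ujson when installed; fall back to the standard library.
    # Only safe if the code sticks to loads()/dumps() -- ujson lacks
    # things like the JSONDecoder class and object_pairs_hook.
    try:
        import ujson as json
    except ImportError:
        import json

    payload = json.loads('{"a": 1}')  # behaves the same under either module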
I have a single day of the Firefox update hotfix payloads cached locally. There are 627,404 records that lz4-decompress to 11,655,091,683 bytes. I have a dead-simple MR script that deserializes each JSON payload, extracts a single value, and combines the results.
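For context, the script looks roughly like the sketch below. This is not the actual Telemetry harness: the field name and record iteration are made up, and it assumes the records have already been decompressed to one JSON document per line.

    import json  # swapped for simplejson or ujson per benchmark run
    from multiprocessing import Pool

    def extract(line):
        # Deserialize one payload and pull out a single value.
        payload = json.loads(line)
        return payload.get("clientID")  # hypothetical field name

    def run(lines):
        # 8 worker processes, matching the benchmark setup below.
        with Pool(processes=8) as pool:
            values = pool.map(extract, lines)
        # "Combine" step: here, simply count distinct values.
        return len(set(values))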
Here is the performance of that job with 8 concurrent processes on 4 physical + 4 hyperthreaded cores, using various JSON implementations.
built-in json
    run 1: real 0m54.250s   user 6m26.780s   sys 0m9.834s
    run 2: real 0m53.691s   user 6m22.161s   sys 0m9.710s
    run 3: real 0m52.698s   user 6m14.038s   sys 0m9.596s

simplejson
    run 1: real 0m34.825s   user 4m7.692s    sys 0m7.125s
    run 2: real 0m34.218s   user 4m3.766s    sys 0m7.055s
    run 3: real 0m34.830s   user 4m4.105s    sys 0m7.043s

ujson
    run 1: real 0m26.212s   user 3m6.775s    sys 0m5.789s
    run 2: real 0m27.636s   user 3m16.358s   sys 0m6.077s
    run 3: real 0m28.094s   user 3m18.188s   sys 0m6.227s
Averages
The average CPU times (user + sys, over the three runs) are:
json: 391s
simplejson: 252s
ujson: 200s
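To spell out the arithmetic for one case, ujson:

    # user + sys per run, in seconds, from the ujson timings above
    runs = [3 * 60 + 6.775 + 5.789,    # 192.564
            3 * 60 + 16.358 + 6.077,   # 202.435
            3 * 60 + 18.188 + 6.227]   # 204.415
    print(sum(runs) / len(runs))       # ~199.8, rounded to 200s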
For comparison, lzma --decompress --stdout on this data set takes about 83s of CPU time.
As the data demonstrates, ujson is significantly faster than simplejson and will thus make Telemetry jobs faster and more efficient.
My data shouldn't need independent validation: any Google search for "Python json benchmark" will tell you others have reached the same conclusion that ujson is the bomb.
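That said, the result is easy to reproduce. Here is a minimal standalone harness; the sample payload below is made up, so point it at real Telemetry records for representative numbers:

    import timeit

    SAMPLE = '{"clientID": "abc", "version": "27.0", "values": [1, 2, 3]}'

    for name in ("json", "simplejson", "ujson"):
        try:
            loads = __import__(name).loads
        except ImportError:
            print("%s: not installed" % name)
            continue
        elapsed = timeit.timeit(lambda: loads(SAMPLE), number=100000)
        print("%-10s %.3fs for 100k loads" % (name, elapsed))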