Split multipolygons in source dataset into separate polygon files with separate bounding boxes to improve perf #12
NikoRoberts wants to merge 5 commits into zverok:master from
|
@zverok seems like a good pr to me. @NikoRoberts are you running this in prod? |
|
I am interested in seeing this update as well 👍 |
|
Did a slightly different benchmark using predictable coordinates instead of random. Nice performance improvement!
require 'benchmark'
require 'wheretz'
iterations = 1000
Benchmark.bmbm do |x|
  x.report("lookup!") do
    iterations.times do |i|
      # i starts at 0, so the first pass divides a float by zero and yields
      # infinite coordinates; the rescue modifier swallows any StandardError.
      lat = (75 - (150.0/i))
      lng = (150 - (300.0/i))
      WhereTZ.lookup(lat, lng) rescue ArgumentError
      lat = (-75 + (150.0/i))
      lng = (150 - (300.0/i))
      WhereTZ.lookup(lat, lng) rescue ArgumentError
      lat = (75 - (150.0/i))
      lng = (-150 + (300.0/i))
      WhereTZ.lookup(lat, lng) rescue ArgumentError
    end
  end
end |
|
Unfortunately, I should acknowledge that I currently have no resources to support this gem. (Proper support would require at least an update at every timezone-boundary-builder data release, which hasn't been done for quite some time already.) I am not using it myself currently, so it is very hard for me to properly estimate the consequences of this PR (whether "skipping the oceans in most cases" is sensible enough, whether having 1400 files is acceptable, etc.). In addition, I currently believe that the "read from many JSON files" approach is childishly naive; there should probably be better ways. So... Sorry to disappoint you, and thanks for all the hard work, but that's all I can say as of now 🤷 |
|
Timezonefinder also removes oceans and Antarctica timezones. There are "shortcuts" in the binary data, which I guess are manually maintained bounding boxes around areas in the same time zone or something along those lines. If we added simplified bounding boxes that were searched first, we could probably get similar performance for the vast majority of coordinates. |
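To make the "search simplified boxes first" idea concrete, here is a minimal sketch in Ruby. It is not how Timezonefinder or WhereTZ work internally. `InnerBox`, `lookup_with_shortcuts` and the sample coordinates are hypothetical names and values chosen for illustration. The sketch assumes each shortcut box lies entirely inside one zone, so a hit needs no polygon maths and a miss falls back to the full lookup.

```ruby
# Hypothetical shortcut record: a zone name plus a box assumed to lie
# entirely inside that zone, so a hit needs no point-in-polygon test.
InnerBox = Struct.new(:zone, :min_lng, :min_lat, :max_lng, :max_lat) do
  # True when the point is inside the box or on its edge.
  def cover?(lat, lng)
    lat.between?(min_lat, max_lat) && lng.between?(min_lng, max_lng)
  end
end

# Checks the shortcut boxes first; when none covers the point, defers to the
# block, which stands in for the existing (slower) exact lookup.
def lookup_with_shortcuts(lat, lng, boxes, &full_lookup)
  hit = boxes.find { |box| box.cover?(lat, lng) }
  hit ? hit.zone : full_lookup.call(lat, lng)
end
```

A caller could pass the existing lookup as the fallback, for example `lookup_with_shortcuts(lat, lng, boxes) { |la, ln| WhereTZ.lookup(la, ln) }`. How much this helps depends on how many real-world coordinates the boxes cover, which would need measuring.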
|
I threw together a gem that wraps a Rust implementation that seems to be quite a bit faster: https://github.com/HarlemSquirrel/tzf-rb https://gist.github.com/HarlemSquirrel/701f83783486794001a75cb8cf5576ba |
Hi @zverok, really appreciate the work you've done with this gem and wanted to say thank you. I hope you are safe in these crazy times 🙏 💙 💛
I've done some work to improve lookup performance for the multipolygon data that some timezones contain. By splitting the GeoJSON features into single polygons, the search algorithm you created can be much faster (2x or so). I believe the 2020d data had more multipolygon data, which slowed things down. There is a slight extra cost from parsing 3.5x the number of files to find the matching bounding boxes, but this is offset by no longer running the contains_point? calculations on many more polygons than actually needed.
Benchmarks were run on Ruby 3 with Benchmark.bmbm to minimise any impact of garbage collection:
WhereTZ 0.0.5
WhereTZ 0.0.6
WhereTZ NikoRoberts fork with oceans
WhereTZ NikoRoberts fork without oceans
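For readers who want to picture the splitting step described in the PR text, the following is a rough sketch and not the code from this PR. `bbox_for`, `split_feature` and the sample feature (including its `tzid` value and coordinates) are hypothetical. It assumes standard GeoJSON, where positions are [longitude, latitude] and a `bbox` is [west, south, east, north].

```ruby
require 'json'

# Computes [min_lng, min_lat, max_lng, max_lat] from a polygon's outer ring.
# GeoJSON positions are [longitude, latitude]; any altitude is ignored.
def bbox_for(polygon_coords)
  lngs, lats = polygon_coords.first.map { |pos| pos.first(2) }.transpose
  [lngs.min, lats.min, lngs.max, lats.max]
end

# Turns one GeoJSON feature into an array of single-Polygon features,
# each keeping the original properties and carrying its own "bbox".
def split_feature(feature)
  geometry = feature.fetch('geometry')
  polygons =
    case geometry.fetch('type')
    when 'MultiPolygon' then geometry.fetch('coordinates')
    when 'Polygon' then [geometry.fetch('coordinates')]
    else raise ArgumentError, "unsupported geometry: #{geometry['type']}"
    end

  polygons.map do |coords|
    {
      'type' => 'Feature',
      'properties' => feature['properties'],
      'bbox' => bbox_for(coords),
      'geometry' => { 'type' => 'Polygon', 'coordinates' => coords }
    }
  end
end

# Made-up sample: one zone consisting of two separate triangles.
feature = JSON.parse(<<~GEOJSON)
  {
    "type": "Feature",
    "properties": { "tzid": "Example/Zone" },
    "geometry": {
      "type": "MultiPolygon",
      "coordinates": [
        [[[0, 0], [2, 0], [2, 1], [0, 0]]],
        [[[10, 10], [12, 10], [12, 13], [10, 10]]]
      ]
    }
  }
GEOJSON

parts = split_feature(feature)
```

Each element of `parts` could then be written to its own file. A lookup would compare the point against the small per-polygon boxes before running any point-in-polygon test, which is where the PR text expects the saving to come from.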