forked from ontoportal/ncbo_annotator
Description
Currently, the dictionary is regenerated every time an ontology (submission) is processed. This takes over an hour, largely because a huge data structure is retrieved from Redis in a single call:
https://github.com/ncbo/ncbo_annotator/blob/master/lib/ncbo_annotator.rb#L122
There is room for optimization here. Possible avenues to pursue:
- Incremental dictionary file population:
We may not need to rebuild the dictionary file for the entire system every time an ontology is parsed. Updating it incrementally, rather than regenerating it from scratch, could drastically improve performance.
- Retrieve data from Redis iteratively:
Instead of fetching everything at once with all = redis.hgetall(dict_holder), it is possible to iterate over the hash in batches using HSCAN (the hash variant of SCAN):
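cursor = "0"
loop do
  # HSCAN returns the next cursor (a String) and a batch of [field, value] pairs
  cursor, key_values = redis.hscan(dict_holder, cursor, count: 1000)
  key_values.each do |field, value|
    # process one dictionary entry here instead of holding the whole hash in memory
  end
  # a returned cursor of "0" means the full iteration is complete
  break if cursor == "0"
end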
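The batched loop above can be wrapped in a small helper that yields one entry at a time, so the dictionary-building code never needs the whole hash in a single reply. This is a minimal sketch under stated assumptions: `each_hash_entry` and `FakeHashStore` are hypothetical names, and `FakeHashStore` only mimics the return shape of redis-rb's `Redis#hscan` (`[next_cursor, [[field, value], ...]]`) so the helper can be exercised without a Redis server.

```ruby
# Hypothetical stand-in for a redis-rb client, mimicking Redis#hscan's return shape:
# [next_cursor_string, [[field, value], ...]], with "0" signalling the end.
class FakeHashStore
  def initialize(hash)
    @hash = hash
  end

  # Returns up to `count` pairs starting at the cursor position.
  # Real HSCAN treats COUNT only as a hint; this fake uses it as an exact batch size.
  def hscan(_key, cursor, count: 10)
    entries = @hash.to_a
    start = cursor.to_i
    batch = entries[start, count] || []
    next_pos = start + batch.size
    next_cursor = next_pos >= entries.size ? "0" : next_pos.to_s
    [next_cursor, batch]
  end
end

# Hypothetical helper: yields every [field, value] pair of the hash at `key`,
# fetching one HSCAN batch at a time instead of one large HGETALL reply.
def each_hash_entry(redis, key, batch_size: 1000)
  cursor = "0"
  loop do
    cursor, pairs = redis.hscan(key, cursor, count: batch_size)
    pairs.each { |field, value| yield field, value }
    break if cursor == "0" # full iteration complete
  end
end

# Usage with the fake store; with a real client this would be
# each_hash_entry(redis, dict_holder) { |field, value| ... }
store = FakeHashStore.new({ "a" => "1", "b" => "2", "c" => "3" })
seen = {}
each_hash_entry(store, "dict_holder", batch_size: 2) { |field, value| seen[field] = value }
puts seen.size # → 3
```

Collecting into a Hash keyed by field is a deliberate choice: Redis documents that SCAN-family commands may return the same element more than once during a full iteration, so the consumer should tolerate duplicates. redis-rb also ships `hscan_each`, which wraps this same cursor loop and may be simpler to use directly.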