
Optimize Mgrep dictionary generation process #15

@mdorf

Description

Currently, the dictionary is regenerated every time an ontology submission is processed. The process takes over an hour because it retrieves a huge data structure from Redis in a single call:

https://github.com/ncbo/ncbo_annotator/blob/master/lib/ncbo_annotator.rb#L122

There is room for optimization here. Possible avenues to pursue:

  1. Incremental dictionary file population

We may not need to rebuild the dictionary file for the entire system on every ontology parse; updating it incrementally could drastically improve performance.
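As a rough illustration of the incremental idea, the snippet below appends only a single submission's entries to an existing dictionary file instead of rewriting the whole file. The helper name and the TSV "id, tab, label" layout are assumptions for the sketch, not existing annotator code:

```ruby
require "tempfile"

# Hypothetical sketch (not existing annotator code): append only the newly
# parsed submission's entries to the existing mgrep dictionary file rather
# than regenerating the whole file on every ontology parse.
# Assumes the dictionary is a TSV of "concept_id<TAB>label" lines.
def append_to_dictionary(dict_path, entries)
  File.open(dict_path, "a") do |f|
    entries.each { |id, label| f.puts("#{id}\t#{label}") }
  end
end

# Illustrative usage against a throwaway file
dict = Tempfile.new("dict")
append_to_dictionary(dict.path, "ID1" => "melanoma")
```

Deleting or re-parsing a submission would still need a compaction or full-rebuild path, so this would complement, not replace, the existing generator.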

  2. Retrieve the data from Redis iteratively

Instead of fetching everything at once with all = redis.hgetall(dict_holder), it is possible to iterate over the data structure in batches using HSCAN:

          cursor = "0"
          loop do
            cursor, key_values = redis.hscan(dict_holder, cursor, count: 1_000)
            key_values.each do |key, value|
              # process each entry as it arrives instead of holding
              # the entire hash in memory at once
            end
            break if cursor == "0"
          end
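Since a live Redis instance may not be handy when experimenting with this, the cursor loop can be exercised against a minimal stub that mimics the redis-rb hscan return shape (next cursor plus a batch of pairs). The stub's sequential-offset cursor is a simplification; real SCAN cursors are opaque values, with "0" signalling completion either way:

```ruby
# Minimal stand-in for a Redis client, for illustration only.
# hscan returns [next_cursor, batch], mirroring redis-rb; a real
# Redis cursor is opaque rather than a sequential offset.
class FakeRedis
  def initialize(hash)
    @pairs = hash.to_a
  end

  def hscan(_key, cursor, count: 10)
    start = cursor.to_i
    batch = @pairs[start, count] || []
    next_cursor = start + count >= @pairs.size ? "0" : (start + count).to_s
    [next_cursor, batch]
  end
end

redis = FakeRedis.new((1..25).map { |i| ["key#{i}", "val#{i}"] }.to_h)

all = {}
cursor = "0"
loop do
  cursor, key_values = redis.hscan("dict_holder", cursor, count: 10)
  key_values.each { |key, value| all[key] = value }
  break if cursor == "0"
end
# all now holds every field, accumulated batch by batch
```

The same loop body works unchanged against a real client, so the dictionary file could even be written batch by batch, avoiding the single huge HGETALL allocation entirely.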
