Releases: openaleph/openaleph-search

v5.1.2

19 Dec 08:33
3868afc

v5.1.0

04 Nov 16:34
858b124

5.1.0 Release

The package now has comprehensive technical documentation at https://openaleph.org/docs/lib/openaleph-search/

New feature

  • Entity tagging in #2

Breaking changes

After running OpenAleph 5.0.x for some time on our large instance, we identified possible performance improvements and found that a slight restructuring of the index would also benefit future development.

What’s changed:

  • Reduced the complexity of the Elasticsearch ingest/analyze pipeline
  • Introduced a dedicated Page index that stores only the child pages of documents. Separating it from the existing Pages index (introduced in 5.0) decreases storage costs, because the Page index does not need to store full text for highlighting, unlike for the Pages entity (which is the parent of one or more Page entities).

Because of the changed analyzers and the new Page index, this release requires reindexing.

About reindexing in general

We know that reindexing a big Aleph instance is inconvenient, which is why previous versions avoided introducing breaking changes to the index. But we realized that, to move forward with new features, reindexing can't be a blocker. That's why we made the indexing module multi-threaded, which speeds up reindexing. Reindexing will also happen more often in the future, and we are actively working on making it as fast and efficient as possible.
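To illustrate the general idea of multi-threaded indexing, here is a minimal sketch that splits a stream of entities into batches and hands them to a thread pool. It is not the openaleph-search implementation: `batched`, `index_batch`, and `reindex` are hypothetical names, and `index_batch` only counts documents where a real worker would send a bulk request to Elasticsearch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_batch(batch):
    """Hypothetical stand-in for one bulk indexing request.

    A real worker would send `batch` to the search backend; this one
    just reports how many documents it was given.
    """
    return len(batch)


def reindex(entities, batch_size=1000, workers=4):
    """Index all entities in batches using a pool of worker threads.

    Returns the total number of documents handled.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every batch to the pool up front and yields
        # the results in submission order.
        return sum(pool.map(index_batch, batched(entities, batch_size)))


total = reindex(range(2500), batch_size=1000, workers=3)
```

Threads fit this kind of workload because each worker spends most of its time waiting on network I/O rather than on the CPU.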

We are indexing more information now, most notably about person and company names. We acknowledge that storage is always an issue, and there has always been a tendency to keep the Aleph index size to a minimum to save storage costs at the expense of features (e.g. highlighting for documents, improved name-matching). But we decided that if you want to find things, you need to index things.

Full Changelog: v5.0.5...v5.1.0

v5.0.0

01 Sep 14:42
56af46d

This is the first production release of the OpenAleph 5 series.

openaleph-search is the new standalone package for indexing and searching FollowTheMoney entities in OpenAleph and similar applications. It takes this application logic out of the main OpenAleph app.

Summary

  • Upgrade codebase to use FollowTheMoney 4.x.x
  • Handle names using rigour, remove fingerprints (Read more about names)
  • Use a routing key (the collection_id) to ensure that all entities in a certain collection are assigned to the same shard, to speed up searching within a collection
  • Use Elasticsearch 9, change index structure, improve search and highlighting (see future blog post)
  • Expose CLI commands
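
The routing key item above can be illustrated with a small sketch. Elasticsearch picks a shard by hashing the routing value and taking it modulo the number of shards, so documents that share a routing value always land on the same shard. The code below is a simplified model of that behaviour, not the openaleph-search code: `shard_for` and `route_entities` are hypothetical helpers, and MD5 stands in for the Murmur3 hash that Elasticsearch uses internally (the real formula also includes a routing factor).

```python
import hashlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number.

    MD5 is an assumption made to keep the sketch in the standard
    library; Elasticsearch itself uses Murmur3.
    """
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards


def route_entities(entities, num_primary_shards):
    """Group entity ids by the shard their collection_id routes to."""
    placement = {}
    for entity in entities:
        shard = shard_for(str(entity["collection_id"]), num_primary_shards)
        placement.setdefault(shard, []).append(entity["id"])
    return placement


# Illustrative data: two entities in collection 7, one in collection 12.
entities = [
    {"id": "a1", "collection_id": 7},
    {"id": "a2", "collection_id": 7},
    {"id": "b1", "collection_id": 12},
]
placement = route_entities(entities, num_primary_shards=5)
```

Because every entity in collection 7 hashes to the same shard, a search restricted to that collection and sent with the same routing value only needs to query one shard instead of all of them.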

Breaking changes from OpenAleph 3 / Aleph 4

  • New index structure with rewritten mappings. This requires rebuilding the indices. While you are at it, upgrade to Elasticsearch 9. Contact us or open an issue for any upgrade-related topics.

Full Changelog: v0.0.8...v5.0.0