Hello, I would like to deploy mosparo on Kubernetes/OpenShift with at least two pods. The multi-node setup documentation comes in handy and indicates various considerations:
Am I missing anything? Is the replicated shared cache a necessity for a multi-node setup? Thanks
Hi @gnieser

Thank you very much for your question. Your setup sounds perfect! You didn't mention the database, but other than that, you didn't miss anything. I would always recommend synchronizing the cache between the nodes.

**Shared cache**

That's a bit of a complicated topic. No, a shared cache is not necessary as long as you synchronize the cache between all nodes (the best is to synchronize the whole cache directory).

The problem in a multi-node setup without a synchronized cache is that each node has a different database cleanup timestamp and will execute that cleanup anyway. The reason is that the frontend API triggers the cleanup automatically, even if you use the cron job to run it at a specific time. If you synchronize the file cache between the nodes, you should have no problem, since the timestamp is synchronized.

The next issue can be that the frontend API controllers execute the cleanup at the same time, since the locking of the cleanup is also done in the cache and may not have been synchronized in time. For this problem, we have a dedicated environment variable.

Since file synchronization is complex (if you do it yourself, rather than letting Kubernetes handle it), especially fast synchronization (how quickly a file is propagated after it is created or updated), a shared cache can be an excellent alternative. Memcached is pretty easy since it is distributed without a master/replica setup. On the other hand, the cache data is not synchronized between the Memcached nodes. For this, you could use mcrouter, or accept that one of the Memcached servers can forget the data; in that case, the data will be recreated. You could even use only one Memcached server since it is not critical for running mosparo (see the note below).

**Future**

In the future, we will have other use cases where we can/should use the shared cache. One use case is storing the uploaded file for the import functionality. If you have multiple nodes and no sticky sessions in the load balancer, the file may not be synchronized in time to the other node that processes the import, so the import will fail. If the file is stored in the shared cache, all servers can access it.

**Summary**

So, long story short: you do not need a shared cache, but you should ensure you synchronize the cache between all nodes. If you don't do that, the most significant negative effect is that the cleanup logic will be executed on both nodes (so it's not a dramatic problem, honestly). But in this scenario, you should execute the cleanup cron job on both nodes independently.

**Note**

While writing this response, I found a problem in the cleanup logic: it passes through the exception if the Memcached/Redis server is unavailable, leading to an error 500 in the frontend API. I will add a fix for that in v1.3.4. Additionally, I saw that Memcached offers an option to create automatic replicas of your data; I will investigate that, too. I also just discovered that we clear the cache after updating mosparo, so we also remove the timestamp at which the next cleanup should be executed. We may have to think about storing that information in the database instead.

I hope my answer helps you. Let me know if you have more questions.

Kind regards,
zepich
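To make the Memcached option a bit more concrete, here is a minimal sketch of what a single Memcached instance could look like on Kubernetes. All names, image tags, and resource values are illustrative, and how mosparo is pointed at this service depends on mosparo's own cache configuration, which is not shown here:

```yaml
# Hypothetical single-instance Memcached acting as a shared cache backend.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mosparo-memcached
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mosparo-memcached
  template:
    metadata:
      labels:
        app: mosparo-memcached
    spec:
      containers:
        - name: memcached
          image: memcached:1.6
          args: ["-m", "64"]          # cap the cache at 64 MB; losing cached data is acceptable here
          ports:
            - containerPort: 11211
---
# Cluster-internal service the mosparo pods would connect to on port 11211.
apiVersion: v1
kind: Service
metadata:
  name: mosparo-memcached
spec:
  selector:
    app: mosparo-memcached
  ports:
    - port: 11211
      targetPort: 11211
```

A single instance is enough for this purpose because, as noted above, losing the cached data is not critical; mosparo will simply recreate it.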
Hi @gnieser
The new version, v1.3.4, is released.
Now you can specify a path in the `FILESYSTEM_CACHE_PATH` environment variable, which mosparo will use for the shared file system cache. Synchronize that directory between the nodes, and it should work as expected.
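As an illustration only, a sketch of how this could look on Kubernetes/OpenShift: a ReadWriteMany volume mounted into every mosparo pod, with `FILESYSTEM_CACHE_PATH` pointing at the mount path. The claim name, mount path, image reference, and storage size are assumptions, and the cluster needs a storage class that supports RWX access:

```yaml
# Shared cache volume (hypothetical names/sizes); requires an RWX-capable
# storage class such as NFS or CephFS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mosparo-shared-cache
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
---
# Fragment of the mosparo Deployment's pod template: every replica mounts the
# same claim and points FILESYSTEM_CACHE_PATH at the mounted directory.
spec:
  containers:
    - name: mosparo
      image: mosparo/mosparo            # illustrative image reference
      env:
        - name: FILESYSTEM_CACHE_PATH
          value: /shared-cache          # assumed mount path
      volumeMounts:
        - name: shared-cache
          mountPath: /shared-cache
  volumes:
    - name: shared-cache
      persistentVolumeClaim:
        claimName: mosparo-shared-cache
```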
Once again, sorry for the misinformation. Looking forward to your feedback.
Kind regards,
zepich