Lock cache file with python #168

kairstenfay · 2020-09-17T20:58:06Z

Lock the cache file in python rather than relying on the user knowing to lock at runtime with shell's flock, etc.

Update backoffice cron jobs accordingly

Open questions:

1. How do we want to handle cache collision? Right now if I run two REDCap DET ETL processes operating on the same cache, I sometimes get

   _pickle.UnpicklingError: pickle data was truncated

   Perhaps this means my implementation isn't working as expected?
2. Which command(s) do we want to use when creating a lock?
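One plausible reading of the error in the first question is that a reader opened the cache while another process was still writing it, so it saw only part of the pickle. That cause is an assumption and not something confirmed in this PR. The sketch below only shows that a partial pickle fails to load; the cache contents and the load_or_none() helper are made up for illustration.

```python
import pickle

# Hypothetical cache contents, standing in for the real geocoding cache.
cache = {"address": "x" * 1000}
complete = pickle.dumps(cache)

# Simulate a reader that sees only the first half of the file because a
# writer in another process has not finished writing it yet.
partial = complete[: len(complete) // 2]


def load_or_none(data):
    """Return the unpickled object, or None if the data is incomplete."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, EOFError):
        # Truncated input surfaces as one of these two exceptions.
        return None


print(load_or_none(complete) == cache)  # the full pickle round-trips
print(load_or_none(partial) is None)    # the partial pickle fails to load
```

If this is the cause, the lock has to be held for the whole read and the whole write for the error to go away.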

kairstenfay · 2020-09-17T20:59:45Z

lib/id3c/cli/command/geocode.py

 from smartystreets_python_sdk.us_street import Lookup
 from smartystreets_python_sdk.us_extract import Lookup as ExtractLookup
 from id3c.cli import cli
+from id3c.cli.command import pickled_cache


This edit can be dropped from this commit.

kairstenfay · 2020-11-17T23:03:55Z

Lock cache file in Python instead of externally with flock

tsibley · 2020-12-21T19:28:44Z

lib/id3c/cli/command/__init__.py

        LOG.info(f"Loading cache from «{filename}»")
        try:
            with open(filename, "rb") as file:
+                lock_file(file)


I can't comment on the implementation of lock_file() yet, but I think the usage here isn't doing what you expect because it's not being called as a context manager. It should be called as:

with lock_file(file): …
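For illustration, here is a minimal sketch of a context-manager version, assuming a POSIX system and fcntl.flock. The lock_file() in this PR isn't shown in this hunk, so the body below and the load_cache() helper are hypothetical and not the actual implementation.

```python
import fcntl
import pickle
from contextlib import contextmanager


@contextmanager
def lock_file(file, operation=fcntl.LOCK_EX):
    """Hold an advisory flock on an open file for the duration of a with-block.

    Hypothetical sketch: blocks until the lock is acquired and always
    releases it on exit, even if the body raises.
    """
    fcntl.flock(file, operation)
    try:
        yield file
    finally:
        fcntl.flock(file, fcntl.LOCK_UN)


def load_cache(filename):
    """Load a pickled cache while holding a shared (read) lock."""
    with open(filename, "rb") as file:
        # Calling lock_file(file) on its own line only creates the context
        # manager object; the lock is taken when it is entered with "with".
        with lock_file(file, fcntl.LOCK_SH):
            return pickle.load(file)
```

A writer would take the default exclusive lock (fcntl.LOCK_EX) around its dump. Note that flock is advisory, so it only protects against other processes that also take the lock.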

tsibley · 2020-12-21T19:41:01Z

I think ultimately we may want to switch the geocoding caching from our in-house solution to diskcache, which I used for the Husky Musher caching. When we first added the caching, I didn't know diskcache existed and, unlike cachetools, didn't uncover it during due diligence.

@joverlee521 asked in Slack:

What's the difference between the FanoutCache used for Musher and the TTLCache used for geocoding?

and I thought it'd be useful to copy my answer here as well:

There are a couple of differences:

TTLCache comes from cachetools, which implements lower-level cache primitives than FanoutCache from diskcache.

The TTLCache we use is entirely in-memory during execution. On top of that, we layer our own serialization/deserialization to load/save the cache. In contrast, the FanoutCache (diskcache.Cache subclass) is transactionally-safe and is loaded/saved per access. FanoutCache is a further-specialized subclass of diskcache.Cache to avoid waiting on blocking writes.

TTLCache requires a TTL (expiry). Expirations are optional with FanoutCache, and we don't use them.

I found diskcache last night when looking not to reinvent the wheel. It seems like a very nice package that's robust and well thought out. We could replace our TTLCache usage with it to make the geocoding caching a lot simpler.

kairstenfay requested a review from tsibley September 17, 2020 20:58

kairstenfay force-pushed the redcap-cache branch from 06b84f8 to ea9d469 Compare September 17, 2020 21:06

wip: Write custom flock context manager

77eb693

kairstenfay force-pushed the redcap-cache branch from ea9d469 to 77eb693 Compare November 17, 2020 23:03

kairstenfay changed the title ~~Redcap cache~~ Lock cache file with python Nov 17, 2020

tsibley reviewed Dec 21, 2020
