-
-
Notifications
You must be signed in to change notification settings - Fork 4.7k
perf(github): Cache accessible repos for accessibleOnly search #112548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1d3c1d8
07de4ec
b41e1c5
c06ce7a
0261ee2
b8dd4e2
434bdcb
1589a68
d642bfb
a594905
c0987be
5b01fd6
c074271
b24d6f5
da09de1
68df5e6
155c015
3bf81a4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -226,6 +226,7 @@ def get_repositories( | |
| query: str | None = None, | ||
| page_number_limit: int | None = None, | ||
| accessible_only: bool = False, | ||
| use_cache: bool = False, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use_cache parameter is accepted but ignored in GitHub Enterprise integration The VerificationRead github_enterprise/integration.py lines 224-254 to confirm use_cache parameter is never referenced in method body. Read github/integration.py lines 355-376 to see correct implementation using use_cache. Read organization_integration_repos.py line 76 to confirm use_cache=accessible_only is passed to all integrations. Identified by Warden
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fix attempt detected (commit 3bf81a4) The use_cache parameter was added to the GitHub Enterprise get_repositories signature for interface compliance, but the method implementation still ignores it and unconditionally calls get_client().get_repos() without any conditional caching logic, identical to the before state in the critical execution paths. The original issue appears unresolved. Please review and try again. Evaluated by Warden
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this change is siloed to github only, not GHE |
||
| ) -> list[RepositoryInfo]: | ||
| if not query: | ||
| all_repos = self.get_client().get_repos(page_number_limit=page_number_limit) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that rather than making the cache use conditional on
accessible_only, we should haveuse_cache=False. We should have an assert likeassert not use_cache or not querysince the cache doesn't work with the queryUh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little confused by this comment because the cache is set up to work with the
accessible_onlyvariant of querying.accessible_onlyis a means to query accessible repos, default query behavior was exposing repos that sentry does not necessarily have access to (public repos not selected during installation configuration).the cache in its current form its only enabled for the
accessible_onlypath because I didn't want to alter behavior for existing consumers.that said cache could instead be opt in and applied to either the
not querypath or theaccessible_onlypath, is that along the lines of your thinking?Further changes here are introduced with pagination support #112591
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main point is that it doesn't really make sense to make the cache specific to accessible only. We're caching all the repos, and then we're filtering to accessible only in python. So if we add
use_cache=False, then in your usages you passuse_cache=True, is_accessible=Trueit's more general and allows us to use the cache in other places later, if we want to. In general, I feel like Claude tends to do things in an overly specific way so I just wanted to guide us away from that here.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will alter things so that cache can be applied to either the
not queryoraccessible_onlypaths.Filtering of accessible is not handled via python filtering (its a different github api path), the "archived" filter is kind of a red herring there.
accessible vs non accessible:
accessible repo fetch
client.get_repos(accessible agnostic repo search
client.search_repositories(There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I misread here... although I'm confused about the split that exists in the current code because it looks like it fetches all repos from github and then filters out archived repos when it's accessible.
It seems like if there's no query, we should always be using
self.get_client().get_repos? I'm not totally sure why we'd use the search api when we're fetching everythingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, discussed on slack and I understand what's going on a bit better here now.
Yeah, I think it still makes sense to have
use_cachebe explicit. Then wheneveris_accessibleis true, we can optionally use the cache or not based on it, and just haveassert not use_cache or not is_accessibleto make sure that we don't confuse folks who useis_accessible.Probably it'd be nice to get rid of
is_accessiblecompletely but idk if we're relying on the behaviour implicitly somewhere.