Skip to content

Conversation

@thalman
Copy link
Contributor

@thalman thalman commented Sep 19, 2025

Implement support for range retrieval of attribute values.
When a multi-valued attribute has a large number of values Active
Directory response only with first batch of values. A common case
is the memberOf attribute when a user belongs to many groups.
This is controlled by AD MaxValRange values (default 1500) policy.

Range retrieval is indicated in the response and provides information
which part of the batch is provided (e. g., memberOf;range=0-1499).

SSSD must then perform additional requests to fetch the remaining
values (e. g., memberOf;range=1500-*)..

The implementation is written to allow extension to other search types
(e. g., root DSE search), although that is not currently required.

Resolves: #8102

@thalman thalman changed the title Draft: Range retrieval Range retrieval Sep 22, 2025
@thalman thalman marked this pull request as ready for review September 22, 2025 12:00
@alexey-tikhonov alexey-tikhonov self-requested a review September 25, 2025 12:35
@alexey-tikhonov alexey-tikhonov self-assigned this Sep 25, 2025
@thalman
Copy link
Contributor Author

thalman commented Sep 25, 2025

I realized that I need also update a man page. Coming soon as separate commit.

@alexey-tikhonov
Copy link
Member

Looks like man: Update range retrieval information commit includes stray changes.

@thalman thalman force-pushed the range_retrieval branch 2 times, most recently from 8273797 to 3b10082 Compare October 21, 2025 20:08
@thalman
Copy link
Contributor Author

thalman commented Oct 21, 2025

Hi guys, I incorporated Alexey's comments. We also have a test, thanks @jakub-vavra-cz.

Copy link
Member

@pbrezina pbrezina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure in test_access_filter__ldap_attributes_approximately_greater_and_less_than_permits_user_login does not seem to be related.

@pbrezina
Copy link
Member

pbrezina commented Nov 4, 2025

@thalman The failure does not seem to be related, but the result is consistent after a retry and CI on master branch is green. Please, take a closer look at it.

@pbrezina pbrezina force-pushed the master branch 3 times, most recently from f5d64b3 to b854636 Compare November 4, 2025 14:27
client.sssctl.cache_expire(everything=True)
client.host.conn.run(f"id -u big-user@{client.sssd.default_domain}")
grps = client.host.conn.run(
f"ldbsearch -H /var/lib/sss/db/cache_{client.sssd.default_domain}.ldb"
Copy link
Member

@alexey-tikhonov alexey-tikhonov Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakub-vavra-cz, why do we need to read cache directly?
What's wrong with using id $user output?

Copy link
Contributor Author

@thalman thalman Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id $user is too complex and runs more ldap searches. That can add missing parts of the object. The test was not working when we tried it that way, maybe even unstable (Am I right @jakub-vavra-cz?).

This way it provides stable and correct result.

Copy link
Member

@alexey-tikhonov alexey-tikhonov Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got it. This is because you look for originalMemberOf that is obtained as a multi-value attr.
getgrouplist() will use LDAP search by member/memberUid and will return a multiple objects instead, not what this test want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if originalMemberOf is actually used at all.
getgrouplist() for AD is usually obtained via token-groups search.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So perhaps this test tests something that is not important.
Relies on an internal implementation details and thus prone to be broken.

"""
)
client.sssctl.cache_expire(everything=True)
client.host.conn.run(f"id -u big-user@{client.sssd.default_domain}")
Copy link
Member

@alexey-tikhonov alexey-tikhonov Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakub-vavra-cz , why not client.tools.id()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why -u?

Copy link
Contributor Author

@thalman thalman Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the -u guaranties that we perform just one ldapsearch asking for user object. Not resolving all the group where user is member of.

This is also why we can't use client.tools.id()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this will only call getpwnam() under the hood.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just do client.tools.getent(), it will have the same effect and will avoid custom 'conn.run()'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue that I have with "high level tools" at the moment is that they work inconsistently with old version of SSSD. So if we break range retrieval, such test may or may not fail. This is why I decided to check cache content. At the moment I do not have reproducer for this issue that is stable.

I'm exploring some ideas to have the large group test case as well but I doubt that we can do more in this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue that I have with "high level tools" at the moment is that they work inconsistently with old version of SSSD. So if we break range retrieval, such test may or may not fail. This is why I decided to check cache content.

You check for the cache content because you check 'originalMemberOf' that is not exposed otherwise.

Still you can use client.tools.getent() because it is essentially the same as id -u.

If it is not the same then I miss something and would like to understand what is it.

@pytest.mark.importance("high")
def test_range_retrieval__group_membership(client: Client, ad: AD):
"""
:title: Users with with more groups than MaxValRange can be retrieved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakub-vavra-cz, could you please also add a test for a case "a group with more than MaxValRange users"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-tikhonov I had that exact test initially as it is how I tried to reproduce the issue
but it was not working properly. The test was mostly stable and the failures happening randomly were not deterministic and could be attributed to large data flying over network on from under-speced machines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not know at the moment how to test it reliably (the test sometimes succeeded with old SSSD version without range retrieval) but I will think of it. I suggest not to wait and do that in separate PR once we figure out how to actually test that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then for a start, I need to understand something.

Does AD really store group membership in user objects directly?

IIUC, that's not the case with OpenLDAP/DS (at least by default).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right:

In Active Directory, memberOf is a "virtual" or "back-link" attribute that the server calculates and maintains automatically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be attributed to large data flying over network

What is the difference '152 values of memberOf attr' vs '152 values of member/memberUid attr'?

Did you try using 'getent group'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-tikhonov See: bc577fe
This is how I was trying to reproduce with group containing 1502 members, but the failures were random and inconsistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakub-vavra-cz , can you try the same but with size = 152?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo, this is more important use-case to cover than 'originalMemberOf' as the latter, likely, is not really used.

@alexey-tikhonov
Copy link
Member

alexey-tikhonov commented Nov 4, 2025

It would be great to have also @sumit-bose review for the most important patch - "sdap: implement range retrieval for generic search"

Copy link
Member

@pbrezina pbrezina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, ack.

thalman and others added 5 commits November 6, 2025 15:16
Implement support for range retrieval of attribute values.
When a multi-valued attribute has a large number of values Active
Directory response only with first batch of values. A common case
is the `memberOf` attribute when a user belongs to many groups.
This is controlled by AD `MaxValRange` values (default 1500) policy.

Range retrieval is indicated in the response and provides information
which part of the batch is provided (e. g., memberOf;range=0-1499).

SSSD must then perform additional requests to fetch the remaining
values (e. g., `memberOf;range=1500-*`)..

The implementation is written to allow extension to other search types
(e. g., root DSE search), although that is not currently required.

Resolves: SSSD#8102
When performing range retrieval Active Directory guaranties that the
follwing response does not contain overlaps (in the same connection).

It is expensive to compare all the attribute values against each other.

We can relay on AD and append the subsequent response without checking.
@thalman
Copy link
Contributor Author

thalman commented Nov 10, 2025

@pbrezina, @alexey-tikhonov - I incorporated comments and also fixed the crash that I introduced and was discovered by one test. Feel free to re-check

@alexey-tikhonov
Copy link
Member

All the comments I have are around the test, but I'd like to have those addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Range retrieval of AD multi-value attributes

5 participants