Skip to content

Conversation

@bmerkle
Copy link
Contributor

@bmerkle bmerkle commented Dec 9, 2025

  • Add knowpro/dataclasses.py, which contains a wrapper for pydantic.dataclasses.dataclass that preserves the original class's type -- and change all the imports to use this instead of importing pydantic.dataclasses.dataclass directly.
  • Split interfaces into five thematic parts.

@gvanrossum
Copy link
Collaborator

Interesting. Can you explain why you made these changes? Everything passes without them as well, so why are they necessary? Let me first summarize what you did:

  1. Added knowpro/datackasses.py, containing a wrapper for pydantic.dataclasses.dataclass that has a better type signature; and import it in some places instead of the latter. (Did you replace all imports of the latter?)
  2. Added knowpro/types.py, containing a variant of ConversationDataWithIndexes that uses Any for some of the container fields, and a version of SearchTermGroupTypes that's just Any. And then you import these instead of the originals from interfaces. (Again, do you use these fake ones everywhere?)

How did you determine that precisely these changes were needed, and is there something I can do that shows some check or test failing without these changes? I'm a bit uncomfortable with using the fake types from types.py with all those Anys, and I worry that the fake ConversationDataWithIndexes might be instantiated and then fail an isinstance check against the real class.

I have an open mind, but I can't read your mind regarding the motivation and scope of these changes. Thanks in advance for your explanation!

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 21, 2025

  1. Added knowpro/datackasses.py, containing a wrapper for pydantic.dataclasses.dataclass that has a better type signature; and import it in some places instead of the latter. (Did you replace all imports of the latter?)

it is a small functionally equivalent wrapper around pydantic.dataclasses.dataclass. The idea was to reduce the circular dependencies and have a small wrapper around the pydantic lib, so we could e.g. exchange it also with another lib/framework in future and/or have a central place where we can react to pydantic api changes. I refactored all imports where it was used.
I checked again and found 4 places we still import pydantic directly (test_utils.py, email_memory.py, podcast.py and transcript.py), but AFAIK the imports are not use, so can be removed safely.

2. Added knowpro/types.py, containing a variant of ConversationDataWithIndexes that uses Any for some of the container fields, and a version of SearchTermGroupTypes that's just Any. And then you import these instead of the originals from interfaces. (Again, do you use these fake ones everywhere?)

ok. I see your point and it is valid. I cut the dependencies and cirucular imports, but we loose the typesafety which is not good.
Loosing the typesafety and contracts is not a good idea. So types.py as it is implemented now it a bad idea.

OTOH interfaces.py is quite large and contains different aspects AFAIS:
(1) conversation + message data structures and TypedDicts (serialization),
(2) indexing protocols and helper dataclasses,
(3) storage-provider interfaces,
(4) search/query models, and
(5) knowledge extraction helpers

So would it make sense to incrementally split up intefaces.py into these different aspects and create own files for them ?

@gvanrossum
Copy link
Collaborator

Yeah, splitting interfaces.py makes sense to me. It was originally one file because the original author of the code (in TypeScript) wanted all public interfaces together, but I think that hasn't worked out as well as we had hoped. I'm okay exporting everything through interfaces but defining the different categories in different files, which interfaces then imports * from.

Regarding data classes, I don't really like the idea of complicating code for the sake of something that may but probably won't happen in the future (look up YAGNI).

I was hoping that the extra type-safety would show somewhere. E.g. areas that are untyped without the change becoming typed with it? Did you see any of that?


from typing import Any, Generic, NotRequired, TypedDict, TypeVar

TMessageData = TypeVar("TMessageData")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's please use the modern notation for generics as used elsewhere in the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added also instructions in AGENTS.md to assert the behavior

Copy link
Collaborator

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more nit and then I'm ready to approve.

suppress pyright warning reportUnsupportedDunderAll
@gvanrossum gvanrossum merged commit 71312d1 into microsoft:main Dec 28, 2025
3 of 15 checks passed
@gvanrossum
Copy link
Collaborator

@bmerkle Thanks for your patience!

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 28, 2025

Thanks for your help and feedback. Appreciate a lot :-)

@bmerkle bmerkle deleted the reduce-circular-dependencies branch December 28, 2025 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants