Skip to content

Conversation

@aidandj
Copy link
Collaborator

@aidandj aidandj commented Dec 18, 2025

Pulling in the whole googleapis tree, generating stubs, and running mypy on them.

Found a few things.

Fields named property and collections caused conflicts with @property and collections.abc.Iterable. In theory this means that fields called builtins, typing, or other built in names could cause issues.

To work around this I did 2 things. property -> builtins.property, and all the import names aliased.

import collections.abc-> import collections.abc as _collections_abc

...

things: _collections_abc.Iterable[_builtins.str]

This is similar to how the first party .pyi generator does it (from collections.abc import Mapping as _Mapping), but matching that exactly would have required more architectural changes.

During this switch, I found out that flake8 does ast parsing on string names, so _typing does not get parsed as typing, this caused a few issues, but I added some noqa's to handle those. I figure most people aren't running flake8 on their generated stubs.

Overall the change seems large, but conceptually it's a very minor change. And we're now testing on over 6500 proto files!!!

image

It also turns out that even though mypy-protobuf reserves field numbers 1151-1154, google itself appears to not respect it. 😬

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR integrates the googleapis repository as comprehensive test data for mypy-protobuf, significantly expanding test coverage from a few dozen to over 13,000 proto files. The changes address naming conflicts discovered during this integration where proto field names like property and collections clash with Python built-ins and standard library modules. The solution aliases all imported modules with underscore prefixes (e.g., import collections.abc as _collections_abc) and uses builtins.property for the @property decorator to avoid conflicts.

Key Changes:

  • Modified import aliasing strategy to prefix all standard library imports with underscores
  • Updated field name handling to use builtins.property to avoid conflicts with @property decorator
  • Added googleapis as a third-party test target
  • Extended test exclusions and cleanup procedures for the new test data
  • Added test proto fields (property, collections) to validate the fix

Reviewed changes

Copilot reviewed 77 out of 8800 changed files in this pull request and generated no comments.

File Description
run_test.sh Adds googleapis proto generation, mypy validation, and cleanup steps to test pipeline
pyproject.toml Excludes unittest proto stubs from type checking
proto/testproto/test.proto Adds property and collections fields to test naming conflict resolution
proto/google/protobuf/*.proto Adds Google protobuf test proto files for comprehensive testing

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
git-subtree-dir: third_party/googleapis
git-subtree-split: 347b0e45a6ec42e183e44ce11e0cb0eaf7f24caa
@aidandj
Copy link
Collaborator Author

aidandj commented Dec 21, 2025

This is similar to how the first party .pyi generator does it (from collections.abc import Mapping as _Mapping), but matching that exactly would have required more architectural changes.

During this switch, I found out that flake8 does ast parsing on string names, so _typing does not get parsed as typing, this caused a few issues, but I added some noqa's to handle those. I figure most people aren't running flake8 on their generated stubs.

I think I'm going to try changing up the imports to be from typing import TypeVar as _TypeVar and see if that fixes the flake8 issues. I think it may simplify things a bit as well, even though its a larger change. And I may as well change message import as well. This will match the built in generator, but would be another breaking change.

@aidandj
Copy link
Collaborator Author

aidandj commented Dec 21, 2025

And I think there's a latent bug in this implementation where import names would conflict.

test.proto.test -> import test.proto.test as _test_proto_test
test.proto_test -> import test.proto_test as _test_proto_test

@aidandj aidandj force-pushed the aidan/third-party branch 2 times, most recently from 81b33c6 to c0824e3 Compare December 22, 2025 06:58
Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
@aidandj aidandj force-pushed the aidan/third-party branch 2 times, most recently from b0fbcdc to 7706bb1 Compare December 22, 2025 07:02
@aidandj
Copy link
Collaborator Author

aidandj commented Dec 22, 2025

And I think there's a latent bug in this implementation where import names would conflict.

test.proto.test -> import test.proto.test as _test_proto_test test.proto_test -> import test.proto_test as _test_proto_test

This is done. It fixed a ton of possible corner cases, and passes all tests. Lots of field names would have caused issues that won't anymore.

@aidandj
Copy link
Collaborator Author

aidandj commented Dec 22, 2025

protoc generator:

from mypy_protobuf import extensions_pb2 as _extensions_pb2
from testproto.inner import inner_pb2 as _inner_pb2
from testproto.inner import test3_pb2 as _test3_pb2
from testproto.nested import nested_pb2 as _nested_pb2
from testproto import nopackage_pb2 as _nopackage_pb2
from testproto import test3_pb2 as _test3_pb2_1
from google.protobuf.internal import containers as _containers
from google.protobuf.internal import enum_type_wrapper as _enum_type_wrapper
from google.protobuf.internal import python_message as _python_message
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from collections.abc import Iterable as _Iterable, Mapping as _Mapping
from typing import ClassVar as _ClassVar, Optional as _Optional, Union as _Union

this pr:

import sys
from collections import abc as _abc
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf.internal import containers as _containers
from google.protobuf.internal import enum_type_wrapper as _enum_type_wrapper
from google.protobuf.internal import extension_dict as _extension_dict
from test import test_generated_mypy as _test_generated_mypy
from testproto import nopackage_pb2 as _nopackage_pb2
from testproto import test3_pb2 as _test3_pb2
from testproto.inner import inner_pb2 as _inner_pb2
from testproto.inner import test3_pb2 as _test3_pb2_1
from testproto.nested import nested_pb2 as _nested_pb2
import builtins as _builtins
import typing as _typing

  - Move to model similar to protoc pyi generator. This prevents name collisions in field names
    - `from test.a.b import c as _c`
    - If multiple names collied, append `_{count}`
      - `from test.a.x import c as _c_1`
  - Fix bug in duplicate package name imports

Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
* Re-use import aliasing logic for readable stubs and re-exporting.
  * Fixes issues with readable stubs and grpc sync/async

Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
@aidandj
Copy link
Collaborator Author

aidandj commented Dec 22, 2025

@nipunn1313 I think I'm pretty happy with this and ready for review. It aligns more closely with the built in pyi generator, and fixes a ton of possible corner cases with field naming. It adds 6500+ files for testing, and a few more corner case tests I found. It's another breaking change, but it fixes readable stubs with grpc, allows for more deprecation, and hopefully I can take a break after this one :D

Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
Signed-off-by: Aidan Jensen <aidandj.github@gmail.com>
@aidandj
Copy link
Collaborator Author

aidandj commented Dec 23, 2025

I just added envoy protos as well, which depended on otel, googleapis, and prometheus as well. Envoy also depends on googleapis protos, so its a bit redundant, but the test time is pretty low, so it didn't feel necessary to remove googleapis.

@nipunn1313
Copy link
Owner

I can't really review this - there's too many changes. I feel like pulling in 6500+ files copied in for testing may make the repo get out of hand?

I feel like doing this exercise is useful to help find bugs, but I'm not really convinced it's helpful for regression testing. Instead as an idea, we could do this exercise to uncover bugs, and then write tests that are specific to the bugs being fixed.

I think it'll make the repo more maintainable in this way.

Maybe we could have instructions on how to do this manual process of testing?

@aidandj
Copy link
Collaborator Author

aidandj commented Dec 23, 2025

I can't really review this - there's too many changes. I feel like pulling in 6500+ files copied in for testing may make the repo get out of hand?

I feel like doing this exercise is useful to help find bugs, but I'm not really convinced it's helpful for regression testing. Instead as an idea, we could do this exercise to uncover bugs, and then write tests that are specific to the bugs being fixed.

I think it'll make the repo more maintainable in this way.

Maybe we could have instructions on how to do this manual process of testing?

Yeah I realized this and pulled the subtree commits. Now it just pulls it in during a test run and cleans up after.

I can squash this history so that the changes aren't in the commits, and that will be easier to review.

I felt that just pulling in the Google protobuf tree was small enough, but I can make that a 3rd party test as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants