🚧 LINQ + classes support #584

Draft
atifaziz wants to merge 20 commits into tonybaloney:main from atifaziz:linq

@atifaziz commented Jul 20, 2025

This PR is a super early draft of an API and a vision to eventually help address the custom classes being discussed in #397. One idea builds on top of the other, and they can be forked and brought in separately. Meanwhile, I think it helps to see them in action together and to see where one feature can help drive the shape of the other.

The two features are:

  • Computational expressions for ad-hoc PyObject conversions via LINQ
  • Conversion of Python classes to C# (Custom types/classes #397) in a way that's flexible, type-safe and doesn't use any reflection whatsoever

The first abstraction being introduced is a PyObject reader:

public interface IPyObjectReader<out T>
{
    T Read(PyObject obj);
}

Then, a whole set of implementation methods are housed in PyObjectReader that enable computational expressions over PyObject. The computational expressions are integrated into C# via LINQ such that one can express something as simple as the following:

var reader =
    from x in PyObjectReader.Int64
    select x * 2;

using var obj = PyObject.From(42);

var result = reader.Read(obj);

Note that reader is defined before and independent of any PyObject instance, which enables some interesting things. A second abstraction is for types that know how to read or convert themselves from a PyObject:

public interface IPyObjectReadable<out TSelf>
    where TSelf : IPyObjectReadable<TSelf>
{
    static abstract IPyObjectReader<TSelf> Reader { get; }
}

Now suppose you have the following in Python (the dataclass below could be a Pydantic model too):

from dataclasses import dataclass

@dataclass
class FooBarBaz:
    foo: int
    bar: str | None
    baz: list[int]
    qux: tuple[int, str]
    quux: dict[str, int]

In C#, you can define the desired shape and provide a reader expressed in LINQ that knows exactly how to convert one into the other:

using static CSnakes.Linq.PyObjectReader;

public sealed class FooBarBaz : IPyObjectReadable<FooBarBaz>
{
    public long Foo { get; private init; }
    public string? Bar { get; private init; }
    public required ImmutableArray<long> Baz { get; init; }
    public (long, string) Qux { get; private init; }
    public required ImmutableDictionary<string, long> Quux { get; init; }

    public static IPyObjectReader<FooBarBaz> Reader { get; } =
        from foo in GetAttr("foo", Int64)
        from bar in GetAttr("bar", String)
        from baz in GetAttr("baz", List(Int64, ImmutableArray.CreateRange))
        from qux in GetAttr("qux", Tuple(Int64, String))
        from quux in GetAttr("quux", Dict(String, Int64, ImmutableDictionary.CreateRange))
        select new FooBarBaz
        {
            Foo = foo,
            Bar = bar,
            Baz = baz,
            Qux = qux,
            Quux = quux,
        };
}

What's nice is that the C# type doesn't have to match the shape of the Python type. The names and types can differ as demonstrated above.
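
To make the composition idea concrete outside of C#, here is a rough Python analogue of what the LINQ query above builds. It is only a sketch under assumptions: `Reader`, `map`, `bind`, `get_attr`, `Source` and `Target` are hypothetical names invented for this illustration and are not part of CSnakes. The point is that each chained `from` clause corresponds to a bind step, the final `select` corresponds to a map, and the resulting reader exists before any object is read.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Reader(Generic[T]):
    """A deferred conversion: wraps a function from a source object to T."""

    def __init__(self, read: Callable[[Any], T]) -> None:
        self.read = read

    def map(self, f: Callable[[T], U]) -> "Reader[U]":
        # Counterpart of a LINQ `select`: transform the value once it is read.
        return Reader(lambda obj: f(self.read(obj)))

    def bind(self, f: Callable[[T], "Reader[U]"]) -> "Reader[U]":
        # Counterpart of chained `from` clauses: the next reader may use this value.
        return Reader(lambda obj: f(self.read(obj)).read(obj))


def get_attr(name: str, reader: Reader[T]) -> Reader[T]:
    # Read the named attribute of the source object with the given reader.
    return Reader(lambda obj: reader.read(getattr(obj, name)))


int64 = Reader(int)   # hypothetical stand-ins for the primitive readers
string = Reader(str)


@dataclass
class Source:  # stands in for the Python-side object
    foo: int
    bar: str


@dataclass
class Target:  # deliberately uses different names and a different shape
    number: int
    label: str


# Built before any Source instance exists, like the static Reader property above.
target_reader = get_attr("foo", int64).bind(
    lambda foo: get_attr("bar", string).map(
        lambda bar: Target(number=foo * 2, label=bar.upper())))

result = target_reader.read(Source(foo=21, bar="hello"))
```

Because `target_reader` is just a value, it can be reused for any number of objects and nested inside other readers, which is the property the design relies on for composability.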

I also went ahead and updated the source generator so that you can express in Python what the resulting C# class will be via an Annotated type:

from dataclasses import dataclass
from typing import Annotated, Union

@dataclass
class FooBarBaz:
    foo: int
    bar: Union[str, None]
    baz: list[int]
    qux: tuple[int, str]
    quux: dict[str, int]

def foo_bar_baz() -> Annotated[FooBarBaz, "c#:Integration.Tests.RichReturnTests.FooBarBaz"]:
    return FooBarBaz(1, "hello", [1, 2, 3], (42, "world"), { "foo": 1, "bar": 2, "baz": 3 })

def foo_bar_baz_list() -> list[Annotated[FooBarBaz, "c#:Integration.Tests.RichReturnTests.FooBarBaz"]]:
    return [foo_bar_baz()]

def foo_bar_baz_dict(key: str) -> dict[str, Annotated[FooBarBaz, "c#:Integration.Tests.RichReturnTests.FooBarBaz"]]:
    return { key: foo_bar_baz() }

The annotation is honoured in the return type position only, including being deeply nested in container types like list and dict (shown above). If the source generator sees a return type with metadata that contains a string literal starting with c#: then the rest of the type name is taken verbatim to identify a C# class that implements IPyObjectReadable<T>. The generated code will automatically use the reader of the type to return the strong-typed class! If you look at the integration test, the C# side only sees FooBarBaz and natural .NET types:

[Fact]
public void Test()
{
    var module = Env.TestRichReturn();
    var result = module.FooBarBaz();
    Assert.Equal(1, result.Foo);
    Assert.Equal("hello", result.Bar);
    Assert.Equal([1L, 2L, 3L], result.Baz);
    Assert.Equal((42L, "world"), result.Qux);
    Assert.Equal(3, result.Quux.Count);
    Assert.Equal(1, result.Quux["foo"]);
    Assert.Equal(2, result.Quux["bar"]);
    Assert.Equal(3, result.Quux["baz"]);
}
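
For readers less familiar with `typing.Annotated`, the following Python sketch shows where the `c#:` metadata lives and how it can be found even when nested inside `list` or `dict`. This is not how the source generator works (it parses Python source from C#); it is only a runtime illustration using the standard `typing` helpers. The function name `find_csharp_names` and the cut-down `FooBarBaz` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints

CSHARP_PREFIX = "c#:"  # the prefix described in the text above


@dataclass
class FooBarBaz:  # cut-down stand-in for the dataclass shown earlier
    foo: int


def find_csharp_names(annotation) -> list[str]:
    """Collect every "c#:"-prefixed metadata string found anywhere in an annotation."""
    found: list[str] = []
    if get_origin(annotation) is Annotated:
        # For Annotated[T, m1, m2, ...], get_args returns (T, m1, m2, ...).
        inner, *metadata = get_args(annotation)
        found += [m[len(CSHARP_PREFIX):] for m in metadata
                  if isinstance(m, str) and m.startswith(CSHARP_PREFIX)]
        found += find_csharp_names(inner)
    else:
        # Recurse into the type arguments of containers such as list[...] or dict[..., ...].
        for arg in get_args(annotation):
            found += find_csharp_names(arg)
    return found


def foo_bar_baz_dict(key: str) -> dict[str, Annotated[FooBarBaz, "c#:Integration.Tests.RichReturnTests.FooBarBaz"]]:
    return {key: FooBarBaz(1)}


# include_extras=True keeps the Annotated wrapper instead of stripping it.
hints = get_type_hints(foo_bar_baz_dict, include_extras=True)
names = find_csharp_names(hints["return"])
```

Here `names` holds the C# type name with the `c#:` prefix removed, which is the part the text above says is taken verbatim.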

It is still possible to build on top of all of this so that a class can simply be decorated with attributes:

[PyObjectReadable]
public partial class Person
{
    public required string FirstName { get; set; }
    [PythonName("last_name")]
    public required string Surname { get; set; }
}

and the source generator will complete it with another partial definition holding the IPyObjectReadable<Person> implementation.

Pros of this design are:

  • Strong-typed
  • Reflection-free (unlike Implement basic class reflection #404)
  • NativeAOT support
  • LINQ is familiar to C# developers
  • Readers are extensible and composable for maximum reusability
  • C# developer is in full control of the shape of the type brought back (can be class/struct, record or not) and the source generator can tightly integrate it
  • If the C# class name changes, the generated code will fail to compile unless the name quoted on the Python side is also changed, avoiding run-time errors

Cons of this design are:

  • The computational expressions add virtual dispatch costs, though these are still lower than the cost of reflection
  • If type member names or types change on the Python side, you can still get a run-time failure (could be somewhat mitigated in the future)

Historical design notes from before merge of PR #728

I also went ahead and updated the source generator so that you can express in Python what the resulting C# class will be via a magic forward reference:

from dataclasses import dataclass

@dataclass
class FooBarBaz:
    foo: int
    bar: str | None
    baz: list[int]
    qux: tuple[int, str]
    quux: dict[str, int]

def foo_bar_baz() -> "__extern__.Integration.Tests.RichReturnTests.FooBarBaz":
    return FooBarBaz(1, "hello", [1, 2, 3], (42, "world"), { "foo": 1, "bar": 2, "baz": 3 })

def foo_bar_baz_list() -> list["__extern__.Integration.Tests.RichReturnTests.FooBarBaz"]:
    return [foo_bar_baz()]

def foo_bar_baz_dict(key: str) -> dict[str, "__extern__.Integration.Tests.RichReturnTests.FooBarBaz"]:
    return { key: foo_bar_baz() }

The parser is updated to allow forward references in the return type position (only), including being deeply nested in container types like list and dict (shown above). If the source generator sees a forward reference (a quoted string) and it begins with the magic module name __extern__ then the rest of the type name is taken verbatim to identify a C# class that implements IPyObjectReadable<T>. The generated code will automatically use the reader of the type to return the strong-typed class! If you look at the integration test, the C# side only sees FooBarBaz and natural .NET types:

[Fact]
public void Test()
{
    var module = Env.TestRichReturn();
    var result = module.FooBarBaz();
    Assert.Equal(1, result.Foo);
    Assert.Equal("hello", result.Bar);
    Assert.Equal([1L, 2L, 3L], result.Baz);
    Assert.Equal((42L, "world"), result.Qux);
    Assert.Equal(3, result.Quux.Count);
    Assert.Equal(1, result.Quux["foo"]);
    Assert.Equal(2, result.Quux["bar"]);
    Assert.Equal(3, result.Quux["baz"]);
}

The forward reference is really a hack. If we go ahead with this overall design then the parser should be enhanced to support type annotation metadata via typing.Annotated, so that the external reference is more naturally expressed as metadata (keeping Python type-checkers happy too):

from typing import Annotated, Any

def foo_bar_baz() -> Annotated[Any, "__extern__.Integration.Tests.RichReturnTests.FooBarBaz"]:
    ...
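
One way to see why the quoted forward reference is a hack is to ask Python itself to resolve it. The sketch below is a runtime illustration only (static type checkers are separate tools and are not run here), and the function names `via_forward_reference` and `via_annotated` are hypothetical. It shows that the quoted form cannot be resolved as a type, while the `Annotated` form carries the same string as plain metadata.

```python
from typing import Annotated, Any, get_type_hints

EXTERN = "__extern__.Integration.Tests.RichReturnTests.FooBarBaz"


def via_forward_reference() -> "__extern__.Integration.Tests.RichReturnTests.FooBarBaz":
    ...


def via_annotated() -> Annotated[Any, "__extern__.Integration.Tests.RichReturnTests.FooBarBaz"]:
    ...


# A quoted return type is treated as a forward reference and evaluated,
# which fails because no `__extern__` name exists in this module.
try:
    get_type_hints(via_forward_reference)
    forward_reference_resolves = True
except NameError:
    forward_reference_resolves = False

# Annotated metadata is carried along untouched; only the first argument is a type.
hints = get_type_hints(via_annotated, include_extras=True)
metadata = hints["return"].__metadata__
```

After running this, `forward_reference_resolves` is `False` and `metadata` is a one-element tuple containing the `__extern__` string.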

- src/CSnakes.Runtime/Python/PyObjectImporters.cs
- src/CSnakes.SourceGeneration/Parser/PythonParser.TypeDef.cs
- src/CSnakes.SourceGeneration/Reflection/TypeReflection.cs
- src/CSnakes.SourceGeneration/ResultConversionCodeGenerator.cs
- src/CSnakes.Tests/GeneratedSignatureTests.cs