
Conversation

@SemyonSinchenko
Collaborator

Close #21

It looks like it works (screenshot attached in the original PR).

Changes:

  • I refactored the Python part to simplify the code and reduce duplication;
  • After analyzing the join logic, I realized that the simplest approach is to pass pre-generated keys to the Rust part;
  • Join generation is now slightly slower, because the keys are manipulated with NumPy on the Python side.
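
For readers unfamiliar with the idea, pre-generating join keys in NumPy can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from this PR: the helper name `pregenerate_join_keys`, the `match_fraction` parameter, and the way keys are split into shared and side-specific sets are all hypothetical.

```python
import numpy as np


def pregenerate_join_keys(n_keys: int, match_fraction: float, seed: int):
    """Hypothetical helper: build key arrays for the two sides of a join.

    A fraction `match_fraction` of the keys appears on both sides; the
    remaining keys are unique to each side. This is NOT the PR's actual logic.
    """
    if not 0.0 <= match_fraction <= 1.0:
        raise ValueError("match_fraction must be between 0 and 1")

    rng = np.random.default_rng(seed)
    n_shared = round(n_keys * match_fraction)
    n_only = n_keys - n_shared

    # One shuffled pool of distinct keys, large enough for the shared keys
    # plus the keys that exist only on the left or only on the right side.
    pool = rng.permutation(
        np.arange(1, n_shared + 2 * n_only + 1, dtype=np.int64)
    )
    shared = pool[:n_shared]
    lhs_only = pool[n_shared:n_shared + n_only]
    rhs_only = pool[n_shared + n_only:]

    # Shuffle each side so shared keys are not clustered at the front.
    lhs = rng.permutation(np.concatenate([shared, lhs_only]))
    rhs = rng.permutation(np.concatenate([shared, rhs_only]))
    return lhs, rhs
```

Calling `pregenerate_join_keys(1000, 0.9, seed=42)` returns two `int64` arrays of 1000 distinct keys each. Arrays like these could then be handed to the native generator; that interface is not shown in this thread, so it is left out here.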

I only tried the SMALL and MEDIUM generations; I don't think my laptop can handle generating BIG.

I'm trying to figure out how to make the key manipulation faster on the Python side.
@MrPowers Could you please try it? I'm not very familiar with these benchmark queries; I tried only one. Anyway, you are the only person who can approve it ;)

@zhuqi-lucas cc

@SemyonSinchenko SemyonSinchenko added the enhancement New feature or request label Feb 12, 2025
@SemyonSinchenko SemyonSinchenko self-assigned this Feb 12, 2025
@zhuqi-lucas

Thank you @SemyonSinchenko !

@SemyonSinchenko
Collaborator Author

I cannot reach @MrPowers. Theoretically, I can bypass branch protection and merge. @zhuqi-lucas how urgent is this for you?

@zhuqi-lucas

Thank you @SemyonSinchenko for your work. I also think we can merge it. I can try it in Apache DataFusion as a follow-up, and if we hit new errors, we can push another fix. What do you think?

@SemyonSinchenko SemyonSinchenko merged commit 9961be0 into main Feb 24, 2025
39 checks passed
@SemyonSinchenko
Collaborator Author

@zhuqi-lucas Merged and released to PyPI (0.0.3). Feel free to ping me if you run into any issues!


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

Join datasets have values that seem off

3 participants