Conversation

@AI-ELka AI-ELka commented Sep 28, 2025

No description provided.

@AI-ELka AI-ELka requested a review from orichardson September 28, 2025 20:19
@josephdviviano

can you resolve merge conflicts?


AI-ELka commented Oct 1, 2025

> can you resolve merge conflicts?

I noticed that Mehran's attention code was merged into main (that's where the conflict came from), so I think I can keep this attention implementation just in case we need it. I originally wrote it because I wasn't able to solve a problem in the other attention implementation.

@orichardson

@AI-ELka Can you help us determine whether or not that problem persists in the implementation that's currently on the main branch? I've forgotten the details of the issue.


AI-ELka commented Oct 2, 2025

> @AI-ELka Can you help us determine whether or not that problem persists in the implementation that's currently on the main branch? I've forgotten the details of the issue.

The main problem we had was that the loss remained constant in a case where it should decrease, but after re-testing with the current code, that problem seems to be gone.
One thing that does pop up now when testing with "uniform" initialization (but not with "from_cpd" or "random") is an assertion error coming from:

print(f"Any unfrozen edge changed? {any_changed}")
assert any_changed, "No learnable edges changed; attention/control masks may be misapplied."

So the original problem seems to have been solved, but we now have this remaining issue: the assertion fails when using uniform initialization.
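For context, the assertion above implements a common "did training actually update anything?" diagnostic: snapshot the trainable parameters before the optimizer steps, then check that at least one of them moved. A minimal, self-contained sketch of the same idea on a toy PyTorch model (the helper names `snapshot_params` and `changed_params` are illustrative, not from the repository):

```python
import torch

torch.manual_seed(0)

def snapshot_params(module):
    # Copy the current values of all trainable (unfrozen) parameters.
    return {n: p.detach().clone()
            for n, p in module.named_parameters() if p.requires_grad}

def changed_params(module, before):
    # Names of trainable parameters whose values moved since the snapshot.
    return [n for n, p in module.named_parameters()
            if n in before and not torch.equal(p.detach(), before[n])]

# Toy check: a single linear layer trained for a few SGD steps.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
before = snapshot_params(model)

x, y = torch.randn(8, 4), torch.randn(8, 1)
for _ in range(3):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

changed = changed_params(model, before)
print(f"Any unfrozen edge changed? {bool(changed)}")
assert changed, "No learnable edges changed; attention/control masks may be misapplied."
```

If this assertion fails only under uniform initialization, one plausible cause worth checking is that uniform init puts the parameters at a symmetric point where gradients cancel (or where a mask zeroes them out), so the optimizer steps leave them numerically unchanged.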
