Hello, @awf
Sorry if my question is naive, but I'm not sure where to find this in your implementation:
https://github.com/vpj/jax_transformer/blob/521d67e9160a6362a18e68e6b3aeafc270d40ad0/transformer.py#L588
Also, why don't you use a batched matmul instead of looping over the heads?
Thanks!
-
It's here: Or am I misunderstanding your question? And I don't use a batched matmul because the code is intended for educational use: to be clear and easy to read. In an ideal world, the JAX/XLA compiler would transform the graph to use batched operations, but I haven't checked whether it does.
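To make the tradeoff concrete, here is a minimal sketch contrasting the two formulations: a Python loop over heads versus the same single-head function mapped over the head axis with `jax.vmap`, which lets XLA emit batched matmuls. This is not the repository's actual code; the function names, shapes, and sizes are illustrative assumptions.

```python
# Sketch only: not code from vpj/jax_transformer; names and shapes are illustrative.
import jax
import jax.numpy as jnp


def single_head_attention(q, k, v):
    # q, k, v: [seq_len, d_head]
    scores = q @ k.T / jnp.sqrt(q.shape[-1])        # [seq_len, seq_len]
    return jax.nn.softmax(scores, axis=-1) @ v      # [seq_len, d_head]


def looped_heads(q, k, v):
    # q, k, v: [n_heads, seq_len, d_head]; explicit Python loop over heads,
    # in the spirit of the "clear and easy to read" style discussed above.
    outs = [single_head_attention(q[h], k[h], v[h]) for h in range(q.shape[0])]
    return jnp.concatenate(outs, axis=-1)           # [seq_len, n_heads * d_head]


def batched_heads(q, k, v):
    # Same math, but vmap maps over the head axis, so XLA sees batched matmuls.
    out = jax.vmap(single_head_attention)(q, k, v)  # [n_heads, seq_len, d_head]
    return jnp.swapaxes(out, 0, 1).reshape(out.shape[1], -1)


if __name__ == "__main__":
    n_heads, seq_len, d_head = 4, 8, 16
    q, k, v = jax.random.normal(jax.random.PRNGKey(0), (3, n_heads, seq_len, d_head))
    # Both formulations produce the same result.
    print(jnp.allclose(looped_heads(q, k, v), batched_heads(q, k, v), atol=1e-5))
```

As for whether the compiler batches the loop on its own, one way to check is to inspect the traced program with `jax.make_jaxpr(looped_heads)(q, k, v)` or, in recent JAX versions, the optimized HLO via `jax.jit(looped_heads).lower(q, k, v).compile().as_text()`.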
-
Thank you for your answer. For me, the line you sent me is here: https://github.com/vpj/jax_transformer/blob/521d67e9160a6362a18e68e6b3aeafc270d40ad0/transformer.py#L741