
Conversation

@karthikviswanathn (Collaborator)

Changes to make patched_layer_forward() faster. Instead of re-implementing the decoder layer's computation step by step, the patched forward now delegates to the parent layer with

outputs = super().forward(
    hidden_states=hidden_states,
    attention_mask=attention_mask,
    position_ids=position_ids,
    past_key_value=past_key_value,
    output_attentions=output_attentions,
    use_cache=use_cache,
    cache_position=cache_position,
    position_embeddings=position_embeddings,
    **kwargs,
)

and then restores the activations of frozen positions in a single vectorized merge:

torch.where(
    unfrozen_elements.unsqueeze(-1),  # Expand mask to match the hidden dimension
    new_hidden_states,                # Use new values where unfrozen
    original_hidden_states,           # Keep original values where frozen
)

