@wheremyfoodat

This PR gains a couple of FPS in Emerald in my VM, depending on the scene. Tell me if you get any benefit from it.

Whether an ARM condition passes depends on 2 factors:

  • The condition code (4 bits)
  • The upper 4 CPSR bits (the N, Z, C and V flags)

This means that you can use a 256-entry truth table indexed by those 8 bits combined (4 condition bits plus 4 flag bits), instead of using a switch-case, which would probably compile to an array lookup + indirect jump.
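For illustration, here is a minimal sketch of that 256-entry table in Python (not the emulator's language, and not the code from this PR). The names `condition_holds`, `TRUTH_TABLE` and `check_condition_table` are hypothetical, and the sketch assumes that condition code 0b1111 never passes, as on ARMv4T.

```python
def condition_holds(cond: int, flags: int) -> bool:
    """Reference check of a condition code (0..15) against the NZCV nibble."""
    n = bool(flags & 0b1000)
    z = bool(flags & 0b0100)
    c = bool(flags & 0b0010)
    v = bool(flags & 0b0001)
    checks = (
        z, not z,                # EQ, NE
        c, not c,                # CS, CC
        n, not n,                # MI, PL
        v, not v,                # VS, VC
        c and not z,             # HI
        (not c) or z,            # LS
        n == v, n != v,          # GE, LT
        (not z) and n == v,      # GT
        z or n != v,             # LE
        True,                    # AL
        False,                   # 0b1111: assumed to never pass (ARMv4T)
    )
    return checks[cond]


# One byte per (condition code, flags) pair: index = cond * 16 + flags.
TRUTH_TABLE = bytes(condition_holds(i >> 4, i & 0xF) for i in range(256))


def check_condition_table(instruction: int, cpsr: int) -> bool:
    """Look up the result using the top 4 bits of the opcode and of CPSR."""
    cond = (instruction >> 28) & 0xF
    flags = (cpsr >> 28) & 0xF
    return bool(TRUTH_TABLE[(cond << 4) | flags])
```

Each lookup replaces the whole switch-case with one indexed read, at the cost of 256 bytes of table.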

LUTs aren't generally the best thing for the cache, so they shouldn't be abused too much. So, here's a neat bit-packing trick that originates from MelonDS's ARM interpreter. Instead of a switch-case, it uses a packed 32-byte LUT (16 masks * 2 bytes each), indexed by the condition code, to check whether a condition is true. The 16 masks in the LUT are magic numbers that get ANDed with (1 << CPSR_FLAGS), where CPSR_FLAGS is the upper 4 CPSR bits taken as a number from 0 to 15. The masks are built so that masks[conditionCode] & (1 << CPSR_FLAGS) always returns a non-zero value if the condition is met, and 0 if not. This way, you can:

  • Minimize the dcache overhead of a 256-byte truth table by tightly packing it (32 bytes are fewer than 256 :p)
  • Not use a switch-case
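To make the trick concrete, here is a rough Python sketch (again not the PR's actual code). The mask values below are derived by hand from the ARM condition definitions with the flags nibble ordered N, Z, C, V from bit 3 down to bit 0; they are not copied from MelonDS or from this PR. The names `COND_MASKS` and `check_condition_packed` are hypothetical, and 0b1111 is assumed to never pass.

```python
# Bit f of each mask is set when the condition passes for flags nibble f,
# where f = (N << 3) | (Z << 2) | (C << 1) | V.
COND_MASKS = (
    0xF0F0,  # EQ: Z set
    0x0F0F,  # NE: Z clear
    0xCCCC,  # CS: C set
    0x3333,  # CC: C clear
    0xFF00,  # MI: N set
    0x00FF,  # PL: N clear
    0xAAAA,  # VS: V set
    0x5555,  # VC: V clear
    0x0C0C,  # HI: C set and Z clear
    0xF3F3,  # LS: C clear or Z set
    0xAA55,  # GE: N == V
    0x55AA,  # LT: N != V
    0x0A05,  # GT: Z clear and N == V
    0xF5FA,  # LE: Z set or N != V
    0xFFFF,  # AL: always
    0x0000,  # 0b1111: assumed to never pass
)


def check_condition_packed(cond: int, cpsr: int) -> bool:
    """Test one bit of the 16-bit mask selected by the condition code."""
    flags = (cpsr >> 28) & 0xF
    return (COND_MASKS[cond] & (1 << flags)) != 0
```

In a language with fixed-width integers the table would be 16 unsigned 16-bit values, which is where the 32 bytes come from; a Python tuple does not have that layout, so this only shows the logic.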

I used Pokemon Emerald to make sure it works, as well as arm.gba, which still passes.
I tried ARMWrestler too but I couldn't find the start button. It boots though.
Tell me what you think when you can

@mattrbeck
Owner

Hey thanks so much for submitting this! This is a tricky little change that I don't think I would have thought of haha. However, I just pulled down the changes locally and compared to the current FPS I'm seeing, and I wasn't actually able to see any improvement in Emerald. In fact, I'm seeing ~2 FPS lower on Golden Sun on average across a few runs. I don't really understand why it would be slower for me, since logically it seems like it should just be an improvement. You were able to see an FPS gain though?

@wheremyfoodat
Author

Yeah though nothing too groundbreaking.
Oh well :(

@ITotalJustice

Somewhat related to this issue: I think an easy / free optimisation to implement is to check whether the cond is AL (0xE). If so, continue; otherwise, use the LUT (or switch). In the vast majority of cases the cond is going to be AL, so the switch / LUT won't be hit.

If Crystal supports marking branches as likely/unlikely, you can label that if (cond == AL) as likely.
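As a rough sketch of that fast path in Python (not Crystal, and not the code from the commit mentioned in this thread): `should_execute` and its `slow_check` parameter are hypothetical names, with `slow_check` standing in for whichever LUT or switch the interpreter already has. Python has no branch-likelihood hint, so only the early exit is shown.

```python
AL = 0xE  # "always" condition code


def should_execute(instruction: int, cpsr: int, slow_check) -> bool:
    """Return True if the instruction's condition passes.

    slow_check(cond, cpsr) is only called for conditions other than AL.
    """
    cond = (instruction >> 28) & 0xF
    if cond == AL:
        # Fast path: no table lookup or switch for unconditional instructions.
        return True
    return slow_check(cond, cpsr)
```

Whether this helps in practice depends on how cheap the existing check already is, so it is worth measuring rather than assuming.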

@mattrbeck
Owner

@ITotalJustice Thanks for the idea! Tested in 8d9c789, although I didn't see any noticeable improvement in the few games I tested
