NDS Speed Boost #24

Open
RetroGamer02 wants to merge 8 commits into Hydr8gon:nds from RetroGamer02:nds

@RetroGamer02

I made a few small changes to boost the speed a bit.
I used NDS BIOS math and hardware-accelerated math where I found it possible, and switched the optimization level from -O2 to -O3.

  gd_set_identity_mat4(mtx);
  if (hMag != 0.0f) {
- invertedHMag = 1.0f / hMag;
+ invertedHMag = swiDivide(1.0f, hMag);
Owner

swiDivide doesn't support floats. Also, changes to the game code should be ifdefed to preserve the N64 build.
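
To illustrate the first point: libnds declares `swiDivide` with `int` arguments and an `int` result, so float arguments are converted to integers before the call. The sketch below uses a host-side stand-in (`int_divide_stub`, a hypothetical name with the same signature shape) to show what `swiDivide(1.0f, hMag)` would effectively compute. It is not the BIOS routine itself.

```c
/* Hypothetical host-side stand-in with the same signature shape as
   libnds' swiDivide(int numerator, int divisor). Not the BIOS call. */
static int int_divide_stub(int numerator, int divisor) {
    return numerator / divisor;
}

/* What the patched line effectively does: both floats are truncated
   to int on the way in, and the int quotient is converted back. */
static float inverse_via_int_divide(float hMag) {
    return (float)int_divide_stub((int)1.0f, (int)hMag);
}

/* The original line: plain floating-point division. */
static float inverse_via_float_divide(float hMag) {
    return 1.0f / hMag;
}
```

For `hMag = 4.0f` the integer path yields 0 where the float path yields 0.25, and any magnitude below 1.0 would truncate to a zero divisor.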


  // Store vertices in the vertex buffer
- memcpy(&vertex_buffer[index - count], vertices, count * sizeof(Vtx));
+ swiFastCopy(vertices, &vertex_buffer[index - count], sizeof(Vtx) * 4);
Owner

Why does this replace count with 4? That seems like a bug.
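
One way to read the change under review: GBATEK documents the length field of the BIOS fast-copy call (CpuFastSet, which libnds exposes as `swiFastCopy`) as a count of 32-bit words rather than bytes. Under that reading, `sizeof(Vtx) * 4` is a fixed word count that ignores `count`. The sketch below works through the arithmetic with `DemoVtx`, a hypothetical 16-byte stand-in for the port's `Vtx` (the size is an assumption based on the N64 vertex layout); the helper names are hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the port's Vtx; 16 bytes is an assumption. */
typedef struct {
    int16_t  ob[3];   /* position */
    uint16_t flag;
    int16_t  tc[2];   /* texture coordinates */
    uint8_t  cn[4];   /* color or normal */
} DemoVtx;

/* Word count that covers `count` vertices (4 bytes per word). */
static size_t words_for_vertices(size_t count) {
    return count * sizeof(DemoVtx) / 4;
}

/* Word count the patched call passes, independent of `count`. */
static size_t words_in_patch(void) {
    return sizeof(DemoVtx) * 4;
}
```

With these assumptions the patched call would always request 64 words (256 bytes, or 16 vertices), while a count-aware call for 3 vertices would request 12 words.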

@Epicpkmn11 Nov 7, 2022

FYI, swiFastCopy is bugged and actually significantly slower than memcpy. If you want something faster, try DMA, tonccpy, or some other DS-optimized memcpy.

http://problemkaputt.de/gbatek-bios-memory-copy.htm

BUG: The NDS/DSi uses the fast 32-byte-block processing only for the first N bytes (not for the first N words), so only the first quarter of the memory block is FAST, the remaining three quarters are SLOWLY copied word-by-word.

Author

> fyi swiFastCopy is bugged and actually significantly slower than memcpy, if you want something faster try DMA or tonccpy or some other DS optimized memcpy
>
> http://problemkaputt.de/gbatek-bios-memory-copy.htm
>
> BUG: The NDS/DSi uses the fast 32-byte-block processing only for the first N bytes (not for the first N words), so only the first quarter of the memory block is FAST, the remaining three quarters are SLOWLY copied word-by-word.

Thank you for the info. I tried DMA copy and found it caused graphical corruption. This is the first I have heard of tonccpy. Would you say that normal swiCopy would be good for this?

@Epicpkmn11 Nov 7, 2022

swiCopy is even worse than swiFastCopy. I did some testing: tonccpy seems to be more or less equal to memcpy (but VRAM-safe), swiFastCopy is about 10% slower than memcpy, and swiCopy is about half the speed of memcpy. For whatever reason dmaCopy isn't cooperating with my testing, so I'm not sure exactly where it lands, but I know it should be faster than memcpy.

You might need to flush the cache for the DMA copy to work: since DMA is separate from the CPU, it can't access the CPU's cache. I'm not sure how big of a speed penalty cache flushing has, so CPU caching might end up faster in some cases because of that.

Edit: DMA wasn't cooperating because I was using no$gba it turns out, though my results still seem a bit weird on hardware... I'm getting DMA as like half the speed of memcpy which doesn't seem right... I'm just putting cpuStartTiming(0) before and cpuEndTiming() after doing a large copy, not sure if there's a better way to do that.
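
The measurement described above (start a timer, run one large copy, stop the timer) can be sketched portably. This version assumes a hosted C environment and uses `clock()` from `<time.h>` in place of libnds' `cpuStartTiming(0)` and `cpuEndTiming()`; `copy_fn`, `time_copy`, and `byte_copy` are hypothetical names. Repeating the copy and timing the whole loop is one common way to reduce noise from a single run.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the copy routines being compared (memcpy-like). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Runs `copy` over `n` bytes `reps` times and returns the elapsed clock
   ticks. On the DS, the clock() calls would be replaced by the hardware
   timer helpers mentioned in the thread. */
static clock_t time_copy(copy_fn copy, void *dst, const void *src,
                         size_t n, int reps) {
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        copy(dst, src, n);
    }
    return clock() - start;
}

/* Naive byte-by-byte copy, used as a second candidate next to memcpy. */
static void *byte_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

A comparison would call `time_copy(memcpy, ...)` and `time_copy(byte_copy, ...)` with the same buffers and repetition count, then compare the two tick counts.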

@RetroGamer02 (Author) Nov 7, 2022

I would love to use dmaCopy but I can't seem to get it to work without corrupted graphics. I am flushing the cache.
I used
DC_FlushRange(&vertex_buffer[index - count], count * sizeof(Vtx));
dmaCopy(vertices, &vertex_buffer[index - count], count * sizeof(Vtx));
I have no idea what is wrong. I tried using
DC_FlushRange(vertices, count * sizeof(Vtx));
But that is even worse.

@Epicpkmn11 Nov 7, 2022

You need to flush the source, not the destination

DC_FlushRange(vertices, count * sizeof(Vtx));

edit: didn't see your whole message whoops, not sure why it's not working tbh 😅, i usually just use tonccpy since it's good enough and always works
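
For reference, a sketch of the cache maintenance often paired with a DMA copy on the ARM9: write the source back to main RAM, and also write back and drop any cached lines of the destination, before starting the transfer. The two attempts quoted above each flush only one of the two ranges. Whether flushing both resolves the corruption is not established in this thread. The code assumes libnds' `DC_FlushRange` and `dmaCopy`, and that the NDS build defines `ARM9`; on a host it falls back to stand-ins built on `memcpy` so the control flow can be compiled. `dma_copy_coherent` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

#ifdef ARM9
#include <nds.h>
#else
/* Host stand-ins (assumptions): no cache to maintain, DMA modeled as memcpy. */
static void DC_FlushRange(const void *base, unsigned long size) {
    (void)base;
    (void)size;
}
static void dmaCopy(const void *src, void *dst, unsigned long size) {
    memcpy(dst, src, size);
}
#endif

/* Copies `size` bytes with DMA after making both ranges consistent
   with main RAM. Buffers aligned to the 32-byte cache line are assumed. */
static void dma_copy_coherent(const void *src, void *dst, size_t size) {
    DC_FlushRange(src, size);  /* push pending CPU writes of the source to RAM */
    DC_FlushRange(dst, size);  /* write back and drop cached destination lines */
    dmaCopy(src, dst, size);   /* transfer goes RAM to RAM, bypassing the cache */
}
```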

@RetroGamer02
Author

I did not think about preserving the N64 build, my apologies. As for replacing count with 4, it seems that swiFastCopy calculates size differently than memcpy. I will start adding the ifdefs in a bit.
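
A minimal sketch of the kind of guard discussed above: the platform-specific call is compiled only when an NDS macro is defined, and the original `memcpy` path is kept for every other target. `TARGET_NDS` is an assumed macro name, `store_vertices` and `DemoVtx` are hypothetical stand-ins for the port's code, and `tonccpy` (suggested earlier in the thread) would have to be added to the project separately.

```c
#include <stddef.h>
#include <string.h>

#ifdef TARGET_NDS
#include "tonccpy.h"  /* assumed header location for the tonccpy routine */
#endif

/* Hypothetical stand-in for the port's Vtx type. */
typedef struct {
    short          ob[3];
    unsigned short flag;
    short          tc[2];
    unsigned char  cn[4];
} DemoVtx;

/* Copies `count` vertices, choosing the copy routine at compile time. */
static void store_vertices(DemoVtx *dst, const DemoVtx *src, size_t count) {
#ifdef TARGET_NDS
    tonccpy(dst, src, count * sizeof(DemoVtx));  /* NDS-only path */
#else
    memcpy(dst, src, count * sizeof(DemoVtx));   /* unchanged path for other builds */
#endif
}
```

Keeping the size expression identical in both branches means the guard changes only which routine runs, not how much is copied.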

1upus commented Feb 17, 2023

Maybe you could fix the in-game dialog fonts too? That would be great!

@riolubruh

BTW, the optimization flag -Ofast is better than -O3 in some cases. I have tested it, and as far as I can tell there is no downside to using it.
