I've found that the software implementation of `sf2d_texture_tile32()` is extremely slow. It's much faster to hand the tiling to the GPU. Do a display transfer from the texture data into a temporary buffer in linear memory, then copy that buffer back over the texture data. Something roughly along these lines:
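```c
#include <3ds.h>
#include <string.h>

// Temporary buffer in linear memory, which the GPU can access.
u32* buffer = linearAlloc(surface.tex->data_size);
if (buffer) {
    // Make sure the GPU sees the latest CPU writes to the texture.
    GSPGPU_FlushDataCache(surface.tex->data, surface.tex->data_size);

    // Linear RGBA8 in, tiled RGBA8 out, flipped vertically, no scaling.
    u32 flags = GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
        GX_TRANSFER_IN_FORMAT(GX_TRANSFER_FMT_RGBA8) | GX_TRANSFER_OUT_FORMAT(GX_TRANSFER_FMT_RGBA8) |
        GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO);

    GX_DisplayTransfer(
        surface.tex->data, GX_BUFFER_DIM(surface.tex->pow2_w, surface.tex->pow2_h),
        buffer, GX_BUFFER_DIM(surface.tex->pow2_w, surface.tex->pow2_h),
        flags);
    gspWaitForPPF(); // block until the transfer has finished

    // Drop any stale cached lines so the CPU reads what the GPU wrote.
    GSPGPU_InvalidateDataCache(buffer, surface.tex->data_size);

    // Copy the tiled result back over the original texture data.
    memcpy(surface.tex->data, buffer, surface.tex->data_size);
    linearFree(buffer);
}
```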
This speed boost is vital if you want to update the texture every frame (e.g. because you're doing software rendering).
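For context, here is a rough sketch of the per-pixel work that the display transfer takes off the CPU. It assumes the PICA200 texture layout: 8x8 tiles, pixels in Morton (Z-order) within each tile, and rows stored bottom-up. The bottom-up order is why the transfer above uses `GX_TRANSFER_FLIP_VERT(1)`. `tile32_reference()` and `morton8()` are hypothetical names, and this is not sf2d's actual code.

```c
#include <assert.h>
#include <stdint.h>

// Offset of pixel (x, y) inside an 8x8 tile: interleave the bits of x and y
// (Z-order), with x in the even bit positions and y in the odd ones.
static unsigned morton8(unsigned x, unsigned y) {
    return (x & 1) | ((y & 1) << 1) | ((x & 2) << 1) |
           ((y & 2) << 2) | ((x & 4) << 2) | ((y & 4) << 3);
}

// Hypothetical CPU tiler: linear RGBA8 `src` (top row first) -> tiled `dst`.
// Assumes w and h are multiples of 8 (true for power-of-two textures >= 8).
void tile32_reference(const uint32_t *src, uint32_t *dst, unsigned w, unsigned h) {
    assert(w % 8 == 0 && h % 8 == 0);
    for (unsigned y = 0; y < h; y++) {
        unsigned fy = h - 1 - y;                  // flip vertically (bottom-up rows)
        for (unsigned x = 0; x < w; x++) {
            unsigned tile = (fy / 8) * (w / 8) + (x / 8);  // tiles in row-major order
            dst[tile * 64 + morton8(x % 8, fy % 8)] = src[y * w + x];
        }
    }
}
```

Every pixel goes through an index computation and a scattered store. On a 512x256 texture that is over 130,000 of them per frame, and this is the work the display transfer engine does for you.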
Thanks to @WinterMute for getting me on the right track.