What operations are particularly slow on the ARM processor?
Division and modulus. The GBA's ARM7TDMI has no hardware divide instruction, so both are done by software routines and are painfully slow. Whenever possible, replace division by a power of two with >> (right shift), replace modulus by a power of two with & (bitwise AND), or multiply by the reciprocal instead. If you absolutely need a true division, use the division routines in the GBA BIOS. One report has the BIOS division code running roughly 2.5x faster than GCC's division.
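As a rough sketch of the shift, mask and reciprocal tricks mentioned above, something like the following could be used. The function names (div_pow2, mod_pow2, div10) are made up for illustration, and the code assumes unsigned 32-bit values; signed values need extra care because a right shift of a negative number does not round the same way as C division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / (2^shift) for unsigned x, using a right shift. */
static uint32_t div_pow2(uint32_t x, unsigned shift)
{
    assert(shift < 32);              /* shifting by 32 or more is undefined in C */
    return x >> shift;
}

/* Hypothetical helper: x % (2^shift) for unsigned x, using a bit mask. */
static uint32_t mod_pow2(uint32_t x, unsigned shift)
{
    assert(shift < 32);
    return x & ((1u << shift) - 1u); /* keep only the low 'shift' bits */
}

/* Hypothetical helper: x / 10 by multiplying with a fixed-point reciprocal.
 * 0xCCCCCCCD is ceil(2^35 / 10); the 64-bit product shifted right by 35
 * gives the exact quotient for every 32-bit unsigned x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

For example, div_pow2(100, 3) stands in for 100 / 8 and mod_pow2(100, 3) for 100 % 8. The reciprocal constant is specific to the divisor 10; other divisors need their own constant and shift, so it is worth checking each one against ordinary division before relying on it.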