The reason why the PhoenixMouse thread (priority=0x22) is not scheduled as often as on the PSP is that the threads "thread1" and "thread2" (priority=0x21) are doing CPU-intensive work (spent in ndHAL_WindowsRender.c). On a real PSP, there are still some free CPU cycles between two VBLANKs for PhoenixMouse. On Jpcsp, the processing is a little bit slower and no free cycles are left while thread1/thread2 are running. I tried changing the priority of the PhoenixMouse thread to 0x20 (on the PSP, a lower value means a higher priority) and the mouse responds much better. Would it be a problem to raise the priority of this thread on a real PSP?
BTW, you would get much better performance (even on a PSP) by using the PSP graphical engine instead of software rendering, e.g. for the blend operations in ndHAL_WindowsRender.c (see sceGuBlendFunc()).
Profiling on Jpcsp shows that around 50% of the CPU cycles are spent in the rendering routines of ndHAL_WindowsRender.c. If switching to the PSP graphical engine is not an option for you, you might have a look at the code generated for ndHAL_WindowsRender.c and see whether there are potential optimizations.
Code:
089123C0:[952D004A]: lhu $t5, 74($t1)
089123C4:[95240054]: lhu $a0, 84($t1)
089123C8:[95230048]: lhu $v1, 72($t1)
089123CC:[014D5823]: subu $t3, $t2, $t5
089123D0:[3162FFFF]: andi $v0, $t3, -1
089123D4:[00820018]: mult $a0, $v0
089123D8:[02083821]: addu $a3, $s0, $t0
089123DC:[00072840]: sll $a1, $a3, 0x0001
089123E0:[0103C823]: subu $t9, $t0, $v1
089123E4:[00B13021]: addu $a2, $a1, $s1
089123E8:[94CB0000]: lhu $t3, 0($a2)
089123EC:[3323FFFF]: andi $v1, $t9, -1
089123F0:[00081040]: sll $v0, $t0, 0x0001
089123F4:[7D6508C0]: ext $a1, $t3, 3, 2
089123F8:[7D660A00]: ext $a2, $t3, 8, 2
089123FC:[7D670B40]: ext $a3, $t3, 13, 2
08912400:[004E5821]: addu $t3, $v0, $t6
08912404:[250D0001]: addiu $t5, $t0, 1
08912408:[31A8FFFF]: andi $t0, $t5, -1
0891240C:[0308682B]: sltu $t5, $t8, $t0
08912410:[00002012]: mflo $a0
08912414:[0083C821]: addu $t9, $a0, $v1
08912418:[00192040]: sll $a0, $t9, 0x0001
0891241C:[008F1021]: addu $v0, $a0, $t7
08912420:[94590000]: lhu $t9, 0($v0)
08912424:[7F232280]: ext $v1, $t9, 10, 5
08912428:[3324001F]: andi $a0, $t9, 31
0891242C:[7F222140]: ext $v0, $t9, 5, 5
08912430:[00673821]: addu $a3, $v1, $a3
08912434:[00852021]: addu $a0, $a0, $a1
08912438:[00461021]: addu $v0, $v0, $a2
0891243C:[2CF90020]: sltiu $t9, $a3, 32
08912440:[2C850020]: sltiu $a1, $a0, 32
08912444:[17200002]: bne $t9, $zr, 0x08912450
08912448:[2C460020]: sltiu $a2, $v0, 32
0891244C:[2407001F]: addiu $a3, $zr, 31 <=> li $a3, 31
08912450:[14A00002]: bne $a1, $zr, 0x0891245C
08912454:[00071A80]: sll $v1, $a3, 0x000A
08912458:[2404001F]: addiu $a0, $zr, 31 <=> li $a0, 31
0891245C:[14C00002]: bne $a2, $zr, 0x08912468
08912460:[00641821]: addu $v1, $v1, $a0
08912464:[2402001F]: addiu $v0, $zr, 31 <=> li $v0, 31
08912468:[00023940]: sll $a3, $v0, 0x0005
0891246C:[24E58000]: addiu $a1, $a3, -32768
08912470:[00653021]: addu $a2, $v1, $a1
08912474:[11A0FFD2]: beq $t5, $zr, 0x089123C0
08912478:[A5660000]: sh $a2, 0($t3)
Some pointer calculations could be moved outside the loop (just incrementing the pointers inside the loop), and there are also some unnecessary unsigned short to int conversions (e.g. "andi $v0, $t3, -1", which only masks the value to 16 bits).
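To illustrate both points, here is a rough C sketch of what the loop seems to do, reconstructed only from the disassembly above and not from the actual source. As far as I can tell, each iteration takes the top 2 bits of every 5-bit channel of one pixel, adds them with saturation to the channels of a second pixel and stores the result with bit 15 set. The names blend_row, sat5, base and overlay are my own and hypothetical. The row pointers are meant to be computed once per row by the caller, and all intermediate values are plain 32-bit integers, so the compiler has no reason to emit the 16-bit masking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating add of two 5-bit channel values (result clamped to 31). */
static inline uint32_t sat5(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    return s < 32 ? s : 31;
}

/*
 * Hypothetical row blend, inferred from the disassembly (an assumption,
 * not the original code): the top 2 bits of each overlay channel are added,
 * with saturation, to the matching base channel, and bit 15 is always set.
 * The caller computes the three row start pointers once per row.
 */
void blend_row(uint16_t *dst, const uint16_t *base,
               const uint16_t *overlay, size_t count)
{
    assert(dst != NULL && base != NULL && overlay != NULL);

    const uint16_t *end = dst + count;

    while (dst != end) {
        /* Load into 32-bit locals: no unsigned short arithmetic below. */
        uint32_t b = *base++;
        uint32_t o = *overlay++;

        uint32_t c0 = sat5(b & 31, (o >> 3) & 3);          /* bits 0..4   */
        uint32_t c1 = sat5((b >> 5) & 31, (o >> 8) & 3);   /* bits 5..9   */
        uint32_t c2 = sat5((b >> 10) & 31, (o >> 13) & 3); /* bits 10..14 */

        *dst++ = (uint16_t)(0x8000u | (c2 << 10) | (c1 << 5) | c0);
    }
}
```

With this shape, the multiply and the two index computations from the disassembly (the "mult"/"mflo" pair and the "sll"/"addu" sequences) would be executed once per row instead of once per pixel. I would guess that the extra "andi" instructions come from x/y counters declared as unsigned short, so using int for them in the real code should have a similar effect.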