EmuNewz Network - Bad management of PSP thread priority

We have tried the Nanodesktop GRAPHDEMO application under the JPCSP
emulator.

It seems that the emulation of the priorities of the PSP threads
is incorrect.

You can download a copy of GRAPHDEMO for the real PSP and for
JPCSP here:

http://www.megaupload.com/?d=8FY7GZM8

Note: copy the GRAPHDEMO folder to ms0:/ before starting the
application.

Please compare the behaviour of the program on a real PSP
and under the JPCSP emulator. Under JPCSP, the mouse pointer is very,
very slow because of the slow emulation of the PhoenixMouse thread.

Double-clicking the icons doesn't work for the same reason.

Thanks in advance for your support and... Merry Christmas :)

(12-26-2010, 06:21 PM)pegasus2000 Wrote: We have tried the Nanodesktop GRAPHDEMO application under the JPCSP emulator. [...]

Hi!

The PhoenixMouse thread (priority=0x22) is not scheduled as often as on the PSP because the threads "thread1" and "thread2" (priority=0x21) are doing CPU-intensive work (the time is spent in ndHAL_WindowsRender.c). On the PSP, a lower priority value means a higher priority, so thread1/thread2 always run before PhoenixMouse. On a real PSP, there are still some free CPU cycles between two VBLANKs for PhoenixMouse. On Jpcsp, the processing is a little slower, so no free cycles are left while thread1/thread2 are running. I tried changing the priority of the PhoenixMouse thread to 0x20, and the mouse responds much better. Would it be a problem to raise the priority of this thread on a real PSP?
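
To make the starvation effect concrete, here is a small toy model in C. It is only a sketch, not the Jpcsp or PSP scheduler: the ToyThread type, the run_frame() function and the cycle numbers are all made up for illustration. It hands out a fixed per-frame cycle budget in strict priority order (lower value first), which is enough to show why a 0x22 thread can end up with nothing when the 0x21 threads want more than the frame can give.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of one thread; not a PSP or Jpcsp structure. */
typedef struct {
    const char *name;
    int priority;   /* lower value = served first, as on the PSP */
    int demand;     /* cycles wanted per frame (made-up units, >= 0) */
    int granted;    /* cycles actually received in the last frame */
} ToyThread;

/* Hand out one frame's cycle budget in strict priority order.
 * Threads with equal priority are served in array order. */
void run_frame(ToyThread *t, size_t n, int budget)
{
    assert(t != NULL && budget >= 0);

    for (size_t i = 0; i < n; i++)
        t[i].granted = -1;              /* -1 marks "not served yet" */

    for (size_t served = 0; served < n; served++) {
        /* Pick the unserved thread with the lowest priority value. */
        size_t best = n;
        for (size_t i = 0; i < n; i++) {
            if (t[i].granted < 0 &&
                (best == n || t[i].priority < t[best].priority))
                best = i;
        }
        /* Give it what it asks for, or whatever is left of the budget. */
        int give = t[best].demand < budget ? t[best].demand : budget;
        t[best].granted = give;
        budget -= give;
    }
}
```

With a budget of 100 and demands of 60 and 50 for the two 0x21 threads, a 0x22 thread asking for 10 cycles receives 0. Moving that thread to 0x20 lets it take its 10 cycles first, and the render threads share the remaining 90.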

BTW, you would get much better performance (even on a PSP) by using the PSP graphics engine instead of software rendering, e.g. for the blend operations in ndHAL_WindowsRender.c (see sceGuBlendFunc()).
Profiling on Jpcsp shows that around 50% of the CPU cycles are spent in the rendering routines of ndHAL_WindowsRender.c. If switching to the PSP graphics engine is not an option for you, you might look at the code generated for ndHAL_WindowsRender.c and check for potential optimizations.
Here is an example of the generated code doing MathBlend along the X-axis:

Code:
089123C0:[952D004A]: lhu        $t5, 74($t1)
089123C4:[95240054]: lhu        $a0, 84($t1)
089123C8:[95230048]: lhu        $v1, 72($t1)
089123CC:[014D5823]: subu       $t3, $t2, $t5
089123D0:[3162FFFF]: andi       $v0, $t3, -1
089123D4:[00820018]: mult       $a0, $v0
089123D8:[02083821]: addu       $a3, $s0, $t0
089123DC:[00072840]: sll        $a1, $a3, 0x0001
089123E0:[0103C823]: subu       $t9, $t0, $v1
089123E4:[00B13021]: addu       $a2, $a1, $s1
089123E8:[94CB0000]: lhu        $t3, 0($a2)
089123EC:[3323FFFF]: andi       $v1, $t9, -1
089123F0:[00081040]: sll        $v0, $t0, 0x0001
089123F4:[7D6508C0]: ext        $a1, $t3, 3, 2
089123F8:[7D660A00]: ext        $a2, $t3, 8, 2
089123FC:[7D670B40]: ext        $a3, $t3, 13, 2
08912400:[004E5821]: addu       $t3, $v0, $t6
08912404:[250D0001]: addiu      $t5, $t0, 1
08912408:[31A8FFFF]: andi       $t0, $t5, -1
0891240C:[0308682B]: sltu       $t5, $t8, $t0
08912410:[00002012]: mflo       $a0
08912414:[0083C821]: addu       $t9, $a0, $v1
08912418:[00192040]: sll        $a0, $t9, 0x0001
0891241C:[008F1021]: addu       $v0, $a0, $t7
08912420:[94590000]: lhu        $t9, 0($v0)
08912424:[7F232280]: ext        $v1, $t9, 10, 5
08912428:[3324001F]: andi       $a0, $t9, 31
0891242C:[7F222140]: ext        $v0, $t9, 5, 5
08912430:[00673821]: addu       $a3, $v1, $a3
08912434:[00852021]: addu       $a0, $a0, $a1
08912438:[00461021]: addu       $v0, $v0, $a2
0891243C:[2CF90020]: sltiu      $t9, $a3, 32
08912440:[2C850020]: sltiu      $a1, $a0, 32
08912444:[17200002]: bne        $t9, $zr, 0x08912450
08912448:[2C460020]: sltiu      $a2, $v0, 32
0891244C:[2407001F]: addiu      $a3, $zr, 31 <=> li $a3, 31
08912450:[14A00002]: bne        $a1, $zr, 0x0891245C
08912454:[00071A80]: sll        $v1, $a3, 0x000A
08912458:[2404001F]: addiu      $a0, $zr, 31 <=> li $a0, 31
0891245C:[14C00002]: bne        $a2, $zr, 0x08912468
08912460:[00641821]: addu       $v1, $v1, $a0
08912464:[2402001F]: addiu      $v0, $zr, 31 <=> li $v0, 31
08912468:[00023940]: sll        $a3, $v0, 0x0005
0891246C:[24E58000]: addiu      $a1, $a3, -32768
08912470:[00653021]: addu       $a2, $v1, $a1
08912474:[11A0FFD2]: beq        $t5, $zr, 0x089123C0
08912478:[A5660000]: sh         $a2, 0($t3)

Some pointer calculations could be moved outside the loop (leaving just a pointer increment inside the loop), and there are also some unnecessary unsigned short to int conversions (e.g. "andi $v0, $t3, -1", where the immediate is really 0xFFFF, so the value is masked to 16 bits on every iteration).
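
As a rough idea of what the restructured loop could look like in C, here is a sketch based only on my reading of the listing above. It is not the actual Nanodesktop source: blend_row(), sat5() and the parameter names are hypothetical, and which pixel is the "base" and which the "overlay" is an assumption. The listing appears to add the top two bits of each 5-bit channel of one pixel to the matching channel of the other, clamp each sum to 31, and set bit 15 of the result. The row addresses are expected to be computed once by the caller, so the loop itself only increments pointers and uses no 16-bit counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add `amount` to a 5-bit channel value and clamp the sum to 31. */
static inline unsigned sat5(unsigned channel, unsigned amount)
{
    unsigned sum = channel + amount;
    return sum < 32u ? sum : 31u;
}

/* Hypothetical row blend for 16-bit pixels with three 5-bit channels.
 * The caller resolves the three row start addresses once, outside the
 * loop; inside the loop only the pointers are incremented. */
void blend_row(uint16_t *dst, const uint16_t *base,
               const uint16_t *overlay, size_t count)
{
    assert(dst != NULL && base != NULL && overlay != NULL);

    const uint16_t *end = dst + count;
    while (dst != end) {
        unsigned b = *base++;
        unsigned o = *overlay++;

        /* Channel at bits 0-4, 5-9 and 10-14 of the base pixel, each
         * increased by the top two bits of the overlay's channel. */
        unsigned lo  = sat5(b & 31u,         (o >> 3)  & 3u);
        unsigned mid = sat5((b >> 5) & 31u,  (o >> 8)  & 3u);
        unsigned hi  = sat5((b >> 10) & 31u, (o >> 13) & 3u);

        /* Recombine the channels and set bit 15, as the listing does. */
        *dst++ = (uint16_t)(0x8000u | (hi << 10) | (mid << 5) | lo);
    }
}
```

Using plain unsigned locals for the channel arithmetic and a pointer comparison as the loop condition gives the compiler no reason to mask a counter back to 16 bits on each pass.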

Merry Christmas!

(12-27-2010, 05:05 PM)gid15 Wrote: The reason why the PhoenixMouse thread (priority=0x22) is not scheduled as often as on the PSP [...]

Thanks for your support.

There is no problem with changing the priority of the PhoenixMouse thread:
I can simply change the priority in the JPCSP HAL and keep it
unchanged in the other HALs.

Thanks for the advice on MathBlend: I'll check whether an optimization can
be done.

I have a proposal for you: we are planning to set up an SVN repository for
the upcoming Nanodesktop 0.5 source code. I could add you to the
list of authorized members if you agree.

That way, you could submit patches for our source code and we could make
the integration between JPCSP and Nanodesktop even closer.

What do you think of our idea?