		
	
	
	
		
	Posts: 107 
	Threads: 0 
	Joined: Feb 2014
	
 Reputation: 
5 
	
	
		Yeah, multiple things are slow. It's just that when I was profiling, the memory reads pretty much obscured any other slow instructions: it was spending 80-95% of its time in the loop from "isMyMemory" (or whatever it's called). I don't remember whether that profiling info came from a release or a debug build, though. If I remember right, these reads were also triggered on every CPU instruction read.
	 
	
	
	
		
	Posts: 218 
	Threads: 2 
	Joined: Mar 2014
	
 Reputation: 
8 
	
	
		 (05-13-2014, 08:11 AM)Bigpet Wrote:  Yeah, multiple things are slow, it was just when I was profiling the memory reads were pretty much obscuring any other slow instructions, it was spending 80-95% of its time in the loop from "isMyMemory" or whatever it's called but I don't even remember whether that was release or debug where I pulled that profiling info from (but if I remember right these reads were also triggered per CPU instruction read). 
I got the same when profiling in release mode -- and yeah, every memory block read/write uses IsMyMemory/IsInRange etc. They are triggered per instruction decode.
	 
	
	
	
		
	Posts: 5 
	Threads: 2
 Joined: Apr 2014
 
	
		
		
		05-14-2014, 05:15 PM 
(This post was last modified: 05-14-2014, 05:24 PM by Ontakeio.)
		
	 
		OK, I just thought of a new idea for speeding up RPCS3. You continue down the same path of dynamic recompilation for speed improvements, of course, but here is a new thought:
 You pre-decode the entire binary step by step, either incrementally while the game is running or by decoding the whole game into an intermediate language before running it. That way you can cache the decoded "trigger instructions" that directly modify RPCS3's state. You are essentially caching and/or storing the instructions, but in a form the emulator can read and act on immediately, without having to decode them again.
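 To make the idea concrete, here is a minimal sketch of a pre-decoded instruction cache. Everything in it is an assumption for illustration: `CpuState`, `Decoded`, `predecode` and `run_all` are hypothetical names, not RPCS3 code, only the PowerPC `addi` instruction is handled, and the input is assumed to be straight-line code whose words have already been byteswapped to host order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical guest register file; not RPCS3's actual structure.
struct CpuState {
    std::uint64_t gpr[32] = {};
};

struct Decoded;
using Handler = void (*)(CpuState&, const Decoded&);

// One pre-decoded record: the handler and its operands are resolved once.
struct Decoded {
    Handler run;
    std::uint8_t rd;
    std::uint8_t ra;
    std::int16_t simm;
};

// addi rD, rA, SIMM; rA == 0 means the literal value 0.
void run_addi(CpuState& cpu, const Decoded& d) {
    const std::uint64_t base = d.ra ? cpu.gpr[d.ra] : 0;
    cpu.gpr[d.rd] = base + static_cast<std::uint64_t>(static_cast<std::int64_t>(d.simm));
}

// Placeholder for every instruction this sketch does not decode.
void run_nop(CpuState&, const Decoded&) {}

// Decode a single host-order instruction word into a record.
Decoded decode_one(std::uint32_t insn) {
    Decoded d{run_nop, 0, 0, 0};
    if ((insn >> 26) == 14) {  // primary opcode 14 is addi
        d.run = run_addi;
        d.rd = static_cast<std::uint8_t>((insn >> 21) & 31);
        d.ra = static_cast<std::uint8_t>((insn >> 16) & 31);
        d.simm = static_cast<std::int16_t>(insn & 0xFFFF);
    }
    return d;
}

// Decode the whole program up front (the "decode it all and wait" option).
std::vector<Decoded> predecode(const std::vector<std::uint32_t>& words) {
    std::vector<Decoded> program;
    program.reserve(words.size());
    for (std::uint32_t w : words) program.push_back(decode_one(w));
    return program;
}

// Execute the cached records; no field extraction or table lookup happens here.
void run_all(CpuState& cpu, const std::vector<Decoded>& program) {
    for (const Decoded& d : program) d.run(cpu, d);
}
```

 Calling `predecode` on the two words `0x38600005` (`addi r3, 0, 5`) and `0x3863FFFF` (`addi r3, r3, -1`) and then `run_all` leaves 4 in `gpr[3]`. Note that each record here is 16 bytes on a typical 64-bit host, against 4 bytes for the original instruction, and that the loop still makes one indirect call per instruction.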
 
 You can of course run the game while doing this, but it will be slower, so the emulator could decode the binary in pieces and load each segment once it is complete.
 
 Or you can just decode it all up front and wait a while (it depends on the game; some are over 10 GB, I think, and could take a while), but it would be worth it because you would notice the speed-up. Of course RPCS3 would need to be optimized for this, but I would love to hear what others think, as I really feel it could make a bigger difference than dynamic recompilation alone.
 
 On another note, I've tried this before. It was on something much, much simpler than the PS3's architecture, but I noticed a performance increase because fewer clock cycles were needed, although much more RAM could be required (RPCS3 might need to cache gigabytes of data this way to see big speed-ups).
 
 I know this is probably too much to read, and my English isn't quite perfect yet, but if you could see this working the way I picture it, I think you'd at least see some benefit in it. I feel this is one of the only methods that would let high-end games run closer to full speed on the high-end computers on the market today and over the next 5 years.
 
	
	
	
		
	Posts: 2,485 
	Threads: 77 
	Joined: Dec 2013
	
 Reputation: 
32 
	
	
		 (05-14-2014, 05:15 PM)Ontakeio Wrote:  Ok, I just thought of new idea for RPCS3 for speed-ups. [...]
 
So you are basically saying that RPCS3 should create some kind of bytecode that can be executed faster? Really, this is sort of what a recompiler would do, only slower.
 
I think the fastest way is an optimized recompiler that translates the game's assembly into optimized x86 assembly. The optimizing is the hard part, but in principle it would give you native speed for code execution. (There is of course a fair bit of other stuff going on that makes things slower.)
	 
Asus N55SF, i7-2670QM (~2,8 ghz under typical load), GeForce GT 555M (only OpenGL)
 
	
	
	
		
	Posts: 7 
	Threads: 0 
	Joined: May 2014
	
 Reputation: 
1 
	
	
		Well, the decode step is more basic than what you're thinking.  Specifically, it involves:
 1. Memory fetch (what instruction is it?)
 2. Byteswap (PPC instructions are in big endian.)
 3. Table lookup(s) - determine encoding of instruction and packed parameters.
 4. Dispatch (call the function which can handle the instruction.)
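 
 The four steps above can be sketched as one interpreter step. This is a minimal sketch under stated assumptions, not RPCS3's implementation: `CpuState`, `DispatchTable`, `load_be32` and `step` are hypothetical names, guest memory is a flat byte vector, the table is indexed by the 6-bit primary opcode only, and `addi` is the only instruction with a real handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical guest CPU state; not RPCS3's actual structure.
struct CpuState {
    std::uint32_t pc = 0;
    std::uint64_t gpr[32] = {};
};

using Handler = void (*)(CpuState&, std::uint32_t);

// addi rD, rA, SIMM; rA == 0 means the literal value 0.
void op_addi(CpuState& cpu, std::uint32_t insn) {
    const unsigned rd = (insn >> 21) & 31;
    const unsigned ra = (insn >> 16) & 31;
    const std::int64_t simm = static_cast<std::int16_t>(insn & 0xFFFF);
    const std::uint64_t base = ra ? cpu.gpr[ra] : 0;
    cpu.gpr[rd] = base + static_cast<std::uint64_t>(simm);
}

// Placeholder for every opcode this sketch does not implement.
void op_unknown(CpuState&, std::uint32_t) {}

// Steps 1 and 2: read four bytes and assemble them as a big-endian word.
// Written byte by byte so it gives the same result on any host endianness.
std::uint32_t load_be32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Step 3 data: one handler per primary opcode (top 6 bits of the word).
struct DispatchTable {
    Handler by_primary[64];
    DispatchTable() {
        for (Handler& h : by_primary) h = op_unknown;
        by_primary[14] = op_addi;
    }
};

void step(CpuState& cpu, const std::vector<std::uint8_t>& mem, const DispatchTable& table) {
    assert(cpu.pc % 4 == 0 && static_cast<std::size_t>(cpu.pc) + 4 <= mem.size());
    const std::uint32_t insn = load_be32(&mem[cpu.pc]);  // 1. fetch, 2. byteswap
    const Handler handler = table.by_primary[insn >> 26]; // 3. table lookup
    handler(cpu, insn);                                   // 4. dispatch
    cpu.pc += 4;
}
```

 With `mem` holding the bytes `38 60 00 05 38 63 FF FF`, two calls to `step` run `addi r3, 0, 5` and `addi r3, r3, -1`, leaving 4 in `gpr[3]` and `pc` at 8. Every call repeats all four steps, which is the per-instruction overhead under discussion.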
 
 The only step above that can practically be removed on a complicated architecture is step 2.  A shadow copy of the binary could be kept pre-byteswapped, and all instructions read from there.  This could be a win, depending on how much time the CPU actually spends on byteswapping.  I think BSWAP is quite fast.
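 
 As a rough sketch of that shadow copy, assuming the code segment is available as a byte vector and is never modified after loading (`make_shadow_copy` and `fetch_swapped` are hypothetical names, not RPCS3 functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert the big-endian code bytes to host-order instruction words once, up front.
std::vector<std::uint32_t> make_shadow_copy(const std::vector<std::uint8_t>& code) {
    assert(code.size() % 4 == 0);
    std::vector<std::uint32_t> shadow(code.size() / 4);
    for (std::size_t i = 0; i < shadow.size(); ++i) {
        const std::uint8_t* p = &code[i * 4];
        shadow[i] = (static_cast<std::uint32_t>(p[0]) << 24) |
                    (static_cast<std::uint32_t>(p[1]) << 16) |
                    (static_cast<std::uint32_t>(p[2]) << 8) |
                     static_cast<std::uint32_t>(p[3]);
    }
    return shadow;
}

// Fetch from the shadow copy: a plain indexed load, no byteswap per instruction.
// pc is an offset from the start of the code segment.
inline std::uint32_t fetch_swapped(const std::vector<std::uint32_t>& shadow, std::uint32_t pc) {
    assert(pc % 4 == 0 && pc / 4 < shadow.size());
    return shadow[pc / 4];
}
```

 The copy doubles the memory used for code and would have to be rebuilt whenever the underlying bytes change, so whether it pays off depends on how expensive the per-instruction byteswap really is.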
 
 Anyway, the table lookup is hard to avoid.  You either need to pack things in different lanes and generally deal with decoding and tables, or you need to make the instructions bigger.  If they're bigger, then you spend more time on step 1.  It could be tested, but I would not expect it to be a win.
 
 You can't avoid dispatch or memory fetch without actually recompiling the code.  Any kind of recompiler, JIT included, avoids all four of those steps and inlines the actual operation.
 
 Using AOT instead of JIT may be relevant for the PS3.  Someone mentioned to me that games were forbidden from modifying executable code, which is great news.  There are some cases where AOT can be tricky, though, specifically switch jump tables in some cases... but I'm not very familiar with PowerPC or how code is generated for it by compilers usually, so that may not be an issue or it may be a larger issue than I expect.
 
 Certainly no game binary is 10GB.  The size of the disc is irrelevant for AOT, only the binary matters.  It will never be larger than 256MB, since the PS3 only has 256MB of RAM.  However, there's likely some means of dynamically loading binaries (like dlls), and "finding" these for AOT compilation may either be very easy or tricky.
 
 For example, the PSP does allow self modifying code, which some PSP games use, and games can even load binary code from a datafile into memory and call it.  They can also call official functions to load dynamic modules, either from data files or even from memory (which may ultimately be from a compressed or packed data file.)
 
 So, AOT makes sense if the PS3 does not have those problems, or has them in well-defined, easily detectable ways (and if homebrew doesn't need them either.)
 
 Anyway, rpcs3 currently spends lots of time (and memory accesses) on each of the 4 steps above.  Except step 2, each one involves at least one, and probably multiple, virtual method lookups (which will be cached at least, probably, but are individual memory accesses.)
 
 -[Unknown]
 
	
	
	
		
	Posts: 35 
	Threads: 4
 Joined: Mar 2014
 
	
		
		
		05-15-2014, 07:49 PM 
(This post was last modified: 05-15-2014, 07:51 PM by mushroom.)
		
	 
		Unknown, I think you are confusing game size with RAM size. Almost every PS3 game is certainly larger than the PS3's maximum RAM, since game size is not 1:1 with RAM; even many PS1 games took up more than 256 MB on their CD-ROMs.
 For example, Final Fantasy XIII-2's image is roughly 14 GB on a single-layer BD; however, upon execution, it is limited to 256 MB of main RAM.
 
 Much of that is likely game data rather than pure code, and the game binary's code is not all loaded into RAM at once anyway.
 
 And if I understand Ontakeio correctly, they mean that the whole game could be decoded into function calls that change RPCS3's state and update it without going through the decoding phase at all, or while skipping some of it.
 
	
	
	
		
	Posts: 7 
	Threads: 0 
	Joined: May 2014
	
 Reputation: 
1 
	
	
		I'm sorry, but it's pretty clear you did not understand my post.  I discussed binaries, dynamic loading, and also the decoding stage.
 -[Unknown]
 
	
	
	
		
	Posts: 35 
	Threads: 4
 Joined: Mar 2014
 
	
	
		 (05-15-2014, 09:24 PM)[Unknown] Wrote:  I'm sorry, but it's pretty clear you did not understand my post.  I discussed binaries, dynamic loading, and also the decoding stage.
 -[Unknown]
 
Well, going by what you wrote, it's obvious that what you're saying is wrong:
 Quote: Certainly no game binary is 10GB.
 http://www.examiner.com/article/file-siz...d-xbox-360 
Perhaps we have a different idea of "game binary" here. A game "binary" is everything that makes up the game on the medium, not the maximum capacity of the medium itself. You also seem to assume that the size of a binary must be 1:1 with the size of RAM, which doesn't make sense.
 
Final Fantasy 7 for PSX takes up over 700 MB for disc 1's total binary image, including all code and data of the entire game; the PSX has 2 MB of RAM in total. Based on your assumption that a binary can't be larger than the size of RAM, how could the PlayStation run the game?
	 
	
	
	
		
	Posts: 2,485 
	Threads: 77 
	Joined: Dec 2013
	
 Reputation: 
32 
	
		
		
		05-15-2014, 09:59 PM 
(This post was last modified: 05-15-2014, 10:00 PM by ssshadow.)
		
	 
		 (05-15-2014, 09:43 PM)mushroom Wrote:  [...] Perhaps we have a different idea of "game binary" here. Game "binary" is everything that makes up the game on the medium, not the max capacity of the medium itself. [...]
 
"Game binary" usually refers to the executable code, not to game data such as sound and graphics files.
 
When you load a game, whether it's a Windows .exe, an .elf, or something else, you always load that program into memory so the CPU can fetch and execute its instructions. On a really basic level, that is what an .exe is: a long list of instructions for the CPU to follow. But this is only the ".exe" part of a game. Everything else, such as graphics, gets loaded into RAM/VRAM as it is needed and removed when it isn't needed any more. That data can be cached in RAM to improve loading times (and a smart OS probably will do so if the data is loaded and unloaded frequently), but hardly any more than that.
 
Also, [Unknown] is one of the big guys from PPSSPP, so assume he knows what he is talking about.
 
	
	
	
		
	Posts: 231 
	Threads: 1 
	Joined: Mar 2014
	
 Reputation: 
3 
	
	
		 (05-15-2014, 09:43 PM)mushroom Wrote:  Perhaps we have a different idea of "game binary" here. Game "binary" is everything that makes up the game on the medium, not the max capacity of the medium itself. You also have a skewed description by thinking that RAM size must be 1:1 with the size of a binary itself, which couldn't make any sense.
No, it isn't. A binary is code that the console can execute, like an .exe (or .dll) on PC. Read again.