Some (possibly) New Ideas for RPCS3
#10
To be sure, there are i7s with 6 cores / 12 threads:
http://ark.intel.com/products/63696/Inte...o-3_90-GHz

Something I read indicated that the Cell PPU basically supported hyperthreading. And I will say, hyperthreading does work. For example, the uber slow softgpu in ppsspp runs much faster on my i7 with 8 threads than with 4, but worse with more than 8. Two hyperthreads are not as good as two real cores, but they're way better than just one core.

IMHO, writing assembly for certain routines is sometimes a very good idea. Some reasons:
* Hand coded assembly does not need to conform to the parameter passing ABI. This can have a large impact on performance in tight code (in cases where inlining might even hurt performance.)
* Optimizers / compilers are sometimes stupid. This is more relevant when targeting ARM / etc.
* If generated at runtime, it can allow you to more conveniently use/not use features based on the host CPU.
* It will not bloat the executable nearly as much as using a ton of templates would.

For example, the vertexdecoder jit in ppsspp gave great performance gains on both x86 and ARM, but it's not a recompiler - it just generates assembly instead of calling an array of functions, and ignores the C++ ABI.
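To make the "array of functions" idea concrete, here is a rough sketch of that style of decoder. Everything in it (the vertex layout, `StepFn`, the step functions, `decode`) is made up for illustration and is not ppsspp's actual code. The point is only that each attribute costs one indirect call per vertex, which is the overhead a runtime code generator avoids by emitting the selected steps back to back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical output layout; not taken from any real emulator.
struct DecodedVertex {
    float pos[3];
    float uv[2];
};

// A step reads one attribute from the source stream, writes it into dst,
// and returns the advanced source pointer.
using StepFn = const std::uint8_t* (*)(const std::uint8_t* src, DecodedVertex& dst);

static const std::uint8_t* step_pos_float(const std::uint8_t* src, DecodedVertex& dst) {
    std::memcpy(dst.pos, src, sizeof(dst.pos));
    return src + sizeof(dst.pos);
}

static const std::uint8_t* step_uv_u8(const std::uint8_t* src, DecodedVertex& dst) {
    // Assumed scaling for this sketch: 128 maps to 1.0.
    dst.uv[0] = src[0] / 128.0f;
    dst.uv[1] = src[1] / 128.0f;
    return src + 2;
}

// Function-table decoder: one indirect call per attribute per vertex.
std::vector<DecodedVertex> decode(const std::uint8_t* src, std::size_t count,
                                  const std::vector<StepFn>& steps) {
    assert(src != nullptr || count == 0);
    std::vector<DecodedVertex> out(count);  // value-initialized to zeros
    for (std::size_t v = 0; v < count; ++v) {
        for (StepFn step : steps) {
            src = step(src, out[v]);
        }
    }
    return out;
}
```

The `steps` vector would be built once per vertex format, e.g. `{step_uv_u8, step_pos_float}` for a stream that stores two UV bytes followed by three position floats.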

That said, it's often not a good idea unless you've tried everything else first (especially for portability reasons.) A much smarter thing is to look at the assembly that is being produced, and first try to understand and resolve poor codegen by the compiler. For example, MSVC iirc will not optimize accesses to a member variable. You will often get better performance by doing this (only for hot loops):

int x = m_x;
// tight loop that reads and writes x instead of m_x
m_x = x;

than by using m_x directly inside the loop. This doesn't require writing assembly to figure out, and if you think about multiple threads you might even realize why the compiler can't safely optimize it.
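A fuller sketch of the same trick, with a hypothetical `Counter` struct (none of these names come from RPCS3). One reason a compiler may keep touching the member in memory is aliasing: in `run_direct` the store to `out[i]` is an `int` store, so as far as the compiler can prove it might overwrite `m_x`. Whether a given compiler actually reloads it depends on the compiler and flags, so check the generated assembly before assuming either way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example type, for illustration only.
struct Counter {
    int m_x = 0;

    // Uses the member directly. The store to out[i] could alias m_x,
    // so the compiler may have to load and store m_x on every iteration.
    void run_direct(int* out, std::size_t n) {
        assert(out != nullptr || n == 0);
        for (std::size_t i = 0; i < n; ++i) {
            m_x += static_cast<int>(i);
            out[i] = m_x;
        }
    }

    // Copies the member into a local, works on the local, and writes it
    // back once. Only equivalent to run_direct if out does not point at m_x
    // and nothing else reads m_x while the loop runs.
    void run_hoisted(int* out, std::size_t n) {
        assert(out != nullptr || n == 0);
        int x = m_x;
        for (std::size_t i = 0; i < n; ++i) {
            x += static_cast<int>(i);
            out[i] = x;
        }
        m_x = x;
    }
};
```

Both functions produce the same results under the stated conditions; the hoisted one just makes it explicit that the value can stay in a register for the duration of the loop.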

Anyway, moving things to a lower level would help in some areas for sure. Some things are abstracted far more than necessary. The more I look at it, the less I know where to start to improve things.

Bigpet: to be sure, there are multiple areas. Even in my lazy approach to mapping memory, which did help a little, performance did not change much because it became dominated by the PPU interpreter (and its X vtable lookups, breakpoint checks, and thread status checks per single CPU instruction.) There are definitely multiple areas which are slow right now.
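To illustrate the kind of per-instruction overhead meant here, a toy interpreter loop follows. It is a made-up sketch, not RPCS3's PPU interpreter: `Cpu`, `Instr`, `AddImm` and `run` are all hypothetical names. Every single instruction pays for a status check, a breakpoint lookup, and a virtual call before any real work happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical toy CPU state; not RPCS3's data structures.
struct Cpu {
    std::uint32_t pc = 0;
    std::int64_t acc = 0;
    bool stop_requested = false;
};

struct Instr {
    virtual ~Instr() = default;
    virtual void exec(Cpu& cpu) const = 0;  // dispatched through the vtable
};

struct AddImm : Instr {
    std::int64_t imm;
    explicit AddImm(std::int64_t v) : imm(v) {}
    void exec(Cpu& cpu) const override {
        cpu.acc += imm;
        ++cpu.pc;
    }
};

// Runs until the program ends, a stop is requested, or a breakpoint is hit.
// Returns the number of instructions executed.
std::size_t run(Cpu& cpu, const std::vector<const Instr*>& program,
                const std::unordered_set<std::uint32_t>& breakpoints) {
    std::size_t executed = 0;
    while (cpu.pc < program.size()) {
        if (cpu.stop_requested) break;               // thread status check
        if (breakpoints.count(cpu.pc) != 0) break;   // breakpoint check
        assert(program[cpu.pc] != nullptr);
        program[cpu.pc]->exec(cpu);                  // virtual call
        ++executed;
    }
    return executed;
}
```

With an instruction as cheap as `AddImm`, the checks and the indirect call plausibly cost more than the work itself, which is why hoisting them out of the hot path (or recompiling blocks) tends to matter.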

-[Unknown]


Messages In This Thread
Some (possibly) New Ideas for RPCS3 - by Ontakeio - 04-24-2014, 09:46 PM
RE: Some (possibly) New Ideas for RPCS3 - by notq - 04-25-2014, 12:16 AM
RE: Some (possibly) New Ideas for RPCS3 - by [Unknown] - 05-13-2014, 07:54 AM
