Deathlike2 wrote:Personally byuu, I think you could do better than to prefer aesthetics over trying to optimize when it comes to coding.
You're right, of course. But I honestly have tried somewhat. Take a look at the PPU color blending code (many thanks to blargg there of course), look at my tile caching system, the optimizations in the PPU render loop, leaving blargg's S-DSP to run as a state machine rather than as a cothread (which would be ideal to me personally), and much of the CPU timing system.
There are really only two major bottlenecks. One is the CPU NMI/IRQ edge testing: profiling shows sCPU::add_clocks() alone consumes >40% of CPU time. I've spent years trying to optimize it; it's all in the massive "bsnes thread archive." But I've failed. It really, truly is beyond me as a programmer to do better there, sadly. The months I've wasted there trying to optimize could've meant mouse / superscope / justifier / SFX / SA-1 emulation instead.
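To illustrate why edge testing inside a clock-stepping routine gets expensive, here is a minimal sketch. It is not the bsnes code: TimingSketch, its members, and the two-clock step size are all hypothetical, and the scanline and frame lengths are just nominal NTSC values. It assumes the IRQ trigger comparison is polled on every small time step, which is what makes the cost add up.

```cpp
#include <cassert>

// Hypothetical, heavily simplified timing core. Names and layout are
// illustrative only and are not taken from bsnes.
struct TimingSketch {
  static constexpr unsigned ClocksPerLine = 1364;  // nominal NTSC scanline, in master clocks
  static constexpr unsigned LinesPerFrame = 262;   // nominal NTSC frame, in scanlines

  unsigned hclock = 0, vcounter = 0;  // current beam position
  unsigned irq_h = 0, irq_v = 0;      // programmed IRQ trigger position (assumed even hclock)
  bool irq_level = false;             // result of the previous comparison
  bool irq_pending = false;           // latched when a rising edge is seen
  unsigned long tests = 0;            // how many edge tests have been run

  // Advance time in 2-clock steps, testing for an IRQ edge at every step.
  void add_clocks(unsigned clocks) {
    assert(clocks % 2 == 0);
    for (unsigned i = 0; i < clocks; i += 2) {
      hclock += 2;
      if (hclock >= ClocksPerLine) {
        hclock = 0;
        if (++vcounter >= LinesPerFrame) vcounter = 0;
      }
      // The comparison itself is cheap; running it this often is not.
      bool level = (vcounter == irq_v && hclock == irq_h);
      if (level && !irq_level) irq_pending = true;  // rising edge
      irq_level = level;
      ++tests;
    }
  }
};
```

In this sketch a single frame is 1364 * 262 = 357,368 clocks, so the test runs 178,684 times per frame, or roughly 10.7 million times per second at 60 frames per second, even though the IRQ position is hit at most once per frame.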
The other major issue is that PGO doesn't work anymore. That gave a 30% speedup, free of charge.
Combine those two, and bsnes was almost equal to Super Sleuth in terms of speed.
The things I'm stubborn about don't really eat up that much extra time. The cothreads are debatably faster. We got a ~10% speedup compared to the previously convoluted nested switch tables when I first added them. The memory may have an extra indirection that allows me to dynamically remap entire memory ranges in a single line of code, but it's enabled things like SPC7110 and BS-X support to be trivial at a mere ~3% speed hit. It also means I'm not directly manipulating raw memory pointers, so you rarely if ever see bsnes crash from out-of-bounds memory accesses. I may use abstract base classes, but I've mitigated most of the damage by bypassing the polymorphism and referencing the final classes directly. The overhead I still take from this is made up by the fact that I can (and have, twice) replace entire CPU cores in a matter of weeks, rather than months or years.
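A rough sketch of the kind of memory indirection described above, under stated assumptions: Memory, Ram, Unmapped, Bus, the 256-byte page size, and the map() signature are all made up for illustration and are not the actual bsnes classes. The idea is that every access goes through a page table, so remapping a range is one call, and the target object decides how to handle out-of-range offsets instead of a raw pointer being dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical abstract memory interface (not the bsnes one).
struct Memory {
  virtual ~Memory() = default;
  virtual uint8_t read(unsigned addr) = 0;
  virtual void write(unsigned addr, uint8_t data) = 0;
};

// Concrete RAM; offsets wrap instead of running off the end of the buffer.
struct Ram final : Memory {
  std::vector<uint8_t> data;
  explicit Ram(std::size_t size) : data(size) { assert(size > 0); }
  uint8_t read(unsigned addr) override { return data[addr % data.size()]; }
  void write(unsigned addr, uint8_t value) override { data[addr % data.size()] = value; }
};

// Default target for pages nothing has been mapped to.
struct Unmapped final : Memory {
  uint8_t read(unsigned) override { return 0; }
  void write(unsigned, uint8_t) override {}
};

struct Bus {
  static constexpr unsigned PageSize = 256;
  static constexpr unsigned PageCount = 0x1000000 / PageSize;  // 24-bit address space

  struct Page { Memory* target; unsigned offset; };

  Unmapped unmapped;
  std::vector<Page> pages;

  Bus() : pages(PageCount, Page{&unmapped, 0}) {}
  Bus(const Bus&) = delete;             // pages point at our own member
  Bus& operator=(const Bus&) = delete;

  // Point every page in [lo, hi] at `target`, starting at offset `base`.
  void map(unsigned lo, unsigned hi, Memory& target, unsigned base = 0) {
    assert(lo <= hi && hi <= 0xffffff);
    unsigned first = lo / PageSize;
    for (unsigned p = first; p <= hi / PageSize; p++) {
      pages[p] = Page{&target, base + (p - first) * PageSize};
    }
  }

  // The extra indirection: one table lookup before the real access.
  uint8_t read(unsigned addr) {
    addr &= 0xffffff;
    const Page& page = pages[addr / PageSize];
    return page.target->read(page.offset + addr % PageSize);
  }

  void write(unsigned addr, uint8_t value) {
    addr &= 0xffffff;
    const Page& page = pages[addr / PageSize];
    page.target->write(page.offset + addr % PageSize, value);
  }
};
```

With something like this, swapping what a whole bank points to is a single line such as bus.map(0x7e0000, 0x7effff, other_ram), which is the sort of thing bank-switching hardware needs. The cost is the table lookup and virtual call on every access.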
I could enslave the S-SMP to the S-DSP (as all other emulators do) for a ~5-10% speedup, but that would be truly detrimental to readability to me. That my CPU and SMP work completely independently of one another is one of the things I am most proud of, in fact. But yeah, enslavement isn't something you can add a quick comment on and say "by the way, real hardware doesn't do any of this."
I may not use things like "NZ" processor flag optimization, but that really wouldn't make a huge difference, either.
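For anyone unfamiliar with the "NZ" trick, here is a minimal sketch of the general idea, assuming 8-bit results only. EagerNZ and LazyNZ are hypothetical names, not code from bsnes or any other emulator. Instead of computing the negative and zero flags after every instruction, the last result is stored and the flags are derived only when something actually reads them.

```cpp
#include <cassert>
#include <cstdint>

// Straightforward version: compute both flags on every result.
struct EagerNZ {
  bool n = false, z = false;
  void set_nz(uint8_t result) {
    n = (result & 0x80) != 0;
    z = (result == 0);
  }
};

// Lazy version: remember the result, derive the flags on demand.
// The 16-bit store lets N and Z be forced independently (for example
// when the whole status register is loaded at once):
//   low byte == 0         -> Z set
//   bit 7 or bit 15 set   -> N set
struct LazyNZ {
  uint16_t last = 1;  // starts as N clear, Z clear

  void set_nz(uint8_t result) { last = result; }  // one store per instruction

  void set_explicit(bool n, bool z) {
    last = static_cast<uint16_t>((n ? 0x8000 : 0) | (z ? 0 : 1));
  }

  bool n() const { return (last & 0x8080) != 0; }
  bool z() const { return (last & 0x00ff) == 0; }
};
```

The saving is one comparison and one store per flag-setting instruction, which fits the point above that it would not make a huge difference on its own.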
If you have any other glaring areas that you feel should be improved, I'll welcome constructive criticism, and especially some help improving it. Seriously, if a comment will allow the same meaning to come across while not substantially hurting readability, I'll add it.
I rewrote the video filtering code after hearing that you didn't like it (and rightly so, it was a mess), so I honestly do listen to constructive criticism.