FAQ and compatibility/feature status (moved to new forum)

Archived bsnes development news, feature requests and bug reports. Forum is now located at http://board.byuu.org/

Post by byuu »

Why are savestates impossible in bsnes?

Truthfully, they may be possible. But at the very least, they are severely implausible.


The SNES is a complex system, which contains five unique processors, as well as various memory chips and such. The actual hardware works by running all of these processors at the same time.

Unfortunately, such a thing is not possible with an emulator, where only one emulated processor can run at a time. Now you may be thinking about multi-core processors, and indeed there is potential for this in the future, when we have 8+ cores and vastly improved fine-grained threading. But this is a separate discussion. Such an approach would also make savestates implausible.

Emulators handle this parallelism by breaking down each processor so that it only runs for a small amount of time. This is essentially the primary measurement of accuracy for emulators: the more each processor is broken down, the more precise the communication between each processor is.

Traditionally, nearly all emulators execute a single processor instruction, generate a single audio sample, and so on, and then return control. To keep track of where it left off, each chip's emulator keeps a state machine. This records what the next operation should be, so the emulator can jump to that section of code the next time that chip is called.
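A minimal sketch of that traditional design, written in Python only for brevity. The class names, the toy "nop" instruction and the two-state layout are hypothetical and not taken from any real emulator:

```python
class StateMachineCpu:
    """Toy processor: step() does one unit of work, then returns control."""

    def __init__(self):
        self.state = "fetch"   # which section of code to run on the next call
        self.opcode = None
        self.log = []

    def step(self):
        # Dispatch on the saved state, do the work, record the next state.
        if self.state == "fetch":
            self.opcode = "nop"
            self.log.append("fetch")
            self.state = "execute"
        elif self.state == "execute":
            self.log.append("execute " + self.opcode)
            self.state = "fetch"


class SampleGenerator:
    """Toy audio chip: produces one sample per call."""

    def __init__(self):
        self.samples = 0

    def step(self):
        self.samples += 1


def run(cpu, dsp, calls):
    # The main loop simply alternates between the chips.
    for _ in range(calls):
        cpu.step()
        dsp.step()


cpu, dsp = StateMachineCpu(), SampleGenerator()
run(cpu, dsp, 4)
```

Everything needed to resume the toy CPU lives in ordinary variables (`state`, `opcode`), which is what makes this style easy to write into a save state.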

But this level of precision has proven insufficient, at least for the SNES. Many games, such as Earthworm Jim 2 and d4s' German translation of Breath of Fire 2, have sound communication problems with such an approach, so more precision is needed. Note that this is typically less important for more advanced systems running at higher clock rates, such as the PS2, and you often see multi-threading there. But it is needed for the SNES.

The problem is that, with the exception of the S-DSP, the SNES processors are far too complex to encapsulate into a single function, and state machines can only work inside one function. The second you call another subfunction that consumes more than one cycle of emulated time, it needs its own state machine. What ends up happening is the more precision you want, the more state machines you need. The S-CPU alone would require roughly 300 or more, for instance.
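To illustrate how the state machines multiply, here is a hedged sketch in which a helper that spans two cycles needs a state variable of its own on top of the instruction-level one. The `Cpu` class, the "load word" instruction and the memory layout are made up for the example:

```python
class Cpu:
    def __init__(self, memory):
        self.memory = memory
        self.pc = 0
        self.outer = "opcode"   # state of the instruction-level machine
        self.inner = 0          # state of the read_word helper's own machine
        self.low = 0
        self.value = None

    def read_word_step(self):
        """One cycle of a two-cycle 16-bit read. Returns True when finished."""
        if self.inner == 0:
            self.low = self.memory[self.pc]
            self.pc += 1
            self.inner = 1
            return False
        self.value = self.low | (self.memory[self.pc] << 8)
        self.pc += 1
        self.inner = 0
        return True

    def step(self):
        """Run exactly one cycle, then return control to the caller."""
        if self.outer == "opcode":
            self.pc += 1                 # pretend every opcode is "load word"
            self.outer = "operand"
        elif self.outer == "operand":
            if self.read_word_step():    # helper spans cycles, so it needs state too
                self.outer = "opcode"


cpu = Cpu([0xA9, 0x34, 0x12])
for _ in range(3):
    cpu.step()
```

Every additional helper that can be interrupted mid-way adds another variable like `inner`, plus the temporaries (`low`) that must survive between calls.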

To get perfect parallelism between multiple processors would require multiple levels of nested state machines, which would end up becoming 90% or more of your code and, in turn, eat up 90% or more of CPU time, as more of your time would be spent maintaining the state machines and less of it emulating small chunks of each processor.

This approach is unwieldy: even setting aside the speed hit and the code complexity, such a complex nested state machine is prone to programming errors, even from the best of us. These bugs are extremely difficult to debug and correct. It's simply not a maintainable strategy when you get down to the bus / clock level. I know this from experience: I got down to the processor cycle level, and I was having severe trouble with the design. To get to the bus level would have required 2-3x the state machines, which was not at all feasible.

That means another approach is necessary.

libco - Cooperative Threads:

What if you could omit the state machines entirely, and be able to have 100% precise parallelism, being able to break out of code immediately, anywhere you like, and resume right where you left off?

It sounds simple enough. Operating systems do it all the time; it's called multi-tasking, or multi-threading. Simply save the program counter, the stack pointer, and the registers of the current thread to a temporary memory area, then restore the registers, stack pointer, and finally the program counter of the other processor emulator, and it will resume where it left off. Note that there are now multiple program stacks. This is what allows you to jump right out of the middle of a nested function in one thread and into a nested function in another, without the need for state machines.
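The idea can be imitated with Python generators, as a rough stand-in only. libco itself is a native library that switches real stacks, whereas Python generators are stackless and need `yield from` to suspend inside a nested call. All function names and the toy memory contents below are assumptions for the sake of the sketch:

```python
def read_word(memory, pc):
    """Two-cycle read; each `yield` hands control back to the scheduler."""
    low = memory[pc]
    yield                      # one cycle elapsed; other processors may run here
    high = memory[pc + 1]
    yield
    return low | (high << 8)


def cpu_thread(memory, results):
    while True:
        pc = 1                 # skip the opcode byte (one cycle)
        yield
        # Nested call that suspends mid-way, with no explicit state variable.
        value = yield from read_word(memory, pc)
        results.append(value)


def apu_thread(ticks):
    while True:
        ticks.append("tick")
        yield


results, ticks = [], []
threads = [cpu_thread([0xA9, 0x34, 0x12], results), apu_thread(ticks)]
for _ in range(4):
    for thread in threads:
        next(thread)           # resume each thread exactly where it left off
```

Compared with a hand-written state machine, the position inside `read_word` is remembered by the suspended generator itself rather than by variables the programmer maintains.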

To make a long story short, the type of threading does not matter. Cooperative threading is advantageous for an emulator because it is hundreds of times faster, since it puts the emulator, rather than the host OS, in control of the threading. The emulator being in charge is also quite essential, as it allows us to easily control exactly how much emulated time is spent within each processor. But regardless of the threading type, you hit the same problem with savestates.
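As a hedged illustration of the emulator controlling emulated time, the sketch below always resumes whichever processor is furthest behind. This is one possible policy, not necessarily the one bsnes uses, and the processor names and cycle costs are invented:

```python
def processor(name, cycles_per_step, trace):
    """Hypothetical processor thread: does one step, then reports its cost."""
    while True:
        trace.append(name)
        yield cycles_per_step


def run_until(threads, clocks, limit):
    """Resume the processor with the lowest clock until all reach `limit`."""
    while min(clocks.values()) < limit:
        name = min(clocks, key=clocks.get)   # ties go to the first entry
        clocks[name] += next(threads[name])


trace = []
threads = {
    "cpu": processor("cpu", 6, trace),
    "smp": processor("smp", 4, trace),
}
clocks = {"cpu": 0, "smp": 0}
run_until(threads, clocks, 12)
```

Because the scheduler is ordinary emulator code, the amount of emulated time each processor receives is decided explicitly rather than left to the host OS.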

State: in software, or hardware?

You may have realized that state machines and threading sound very similar. They both keep track of where you are in a program. The former is written and maintained by the programmer, and the latter is handled transparently by the host computer. It's evident that the latter is the superior approach for simplifying code and gaining maximum precision without introducing any complexity that could cause additional bugs.

But it also makes savestates implausible. Before, all you had to do was save the state machine variables into your save state, and restore them upon load. But now, there are no state machine variables. There are only threads. What are threads? Again, the program counter, a stack, and processor registers.

Aside from being non-portable, the stack contains many temporary references that change between program runs. Mostly, these are pointers to allocated memory. So while you can easily save the threads for a save state, you simply cannot restore them: many addresses inside the stack would now be wrong, and the program would crash horribly.
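A loose analogy in Python, offered only as an illustration: CPython does not support pickling a suspended generator, whose resume point lives in an interpreter frame rather than in plain variables, while a dictionary of state machine variables serializes without trouble. The native-stack pointer problem described above is specific to compiled code and is not reproduced here. The names `cpu_thread`, `try_savestate` and `state_machine` are hypothetical:

```python
import pickle


def cpu_thread():
    cycle = 0
    while True:
        cycle += 1
        yield cycle


def try_savestate(obj):
    """Return the serialized bytes, or the error raised while trying."""
    try:
        return pickle.dumps(obj)
    except TypeError as error:
        return error


thread = cpu_thread()
next(thread)
next(thread)                       # the thread is now suspended mid-loop

state_machine = {"state": "execute", "cycle": 2}

thread_result = try_savestate(thread)           # fails under CPython
machine_result = try_savestate(state_machine)   # plain data: works
```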

You may be thinking that you could simply capture a save state when each emulated processor is at a known point in code. Typically, this would be the entry point of the primary function for your thread. But it's not actually possible to align all five processors to their entry points: they run on their own, and forcing one to its entry point can quite easily cause the others to run ahead of theirs. Sure, it may happen that all five align. But how frequently? Perhaps every few seconds, minutes, or years. Hard to say, but definitely far too infrequently to capture a save state exactly when it was requested.
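A deliberately simplified piece of arithmetic shows why alignment is rare. Assume, purely hypothetically, that each of five processors returns to its entry point after a fixed number of master clock cycles. They are then all at their entry points together only at multiples of the least common multiple of those periods. Real processors have variable-length loops, so the real pattern is far less regular than this:

```python
import math


def first_common_entry(periods):
    """Brute force: first cycle > 0 at which every processor is at its entry."""
    cycle = 1
    while any(cycle % period for period in periods):
        cycle += 1
    return cycle


# Hypothetical cycles per main-loop iteration for five processors.
periods = [6, 8, 21, 32, 341]

aligned_at = first_common_entry(periods)
expected = math.lcm(*periods)      # requires Python 3.9 or later
```

With these made-up periods, the longest individual loop is 341 cycles, yet all five coincide only once every 229152 cycles.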

So that's the story. Nothing in life comes without tradeoffs, and this was certainly no exception. I traded savestate support for speed, vastly simplified code (which helps both readability and maintainability) and perfect, just-in-time precision for parallelism.

Note: mozz presented a possible model for savestates inside a threaded emulator, which involves emulating all accesses to other emulated processors as read/write devices, building up the buffers when a save state is saved and depleting them when it is loaded. The problem is that this would add substantial overhead by turning core memory access functions into overloaded functions, would be yet another complex layer on top of the already difficult job of capturing and restoring emulated machine states, and may not even work in practice. It's never been tested, and I simply can't afford to make bsnes the guinea pig for the idea. If it fails, reverting such a massive change by hand may not even be possible, forcing me to go back to a version from before the attempt and thereby sacrificing months' worth of development time.

I'm sorry, but I just don't have the kind of time to test the theory; and the added complexity scares me greatly.

So, this is why bsnes does not and cannot have save states. I'm very sorry for the inconvenience this causes. I hope that other emulators in the future can obtain the level of precision I have whilst still using state machines, so that save states are possible. But it's not something I am capable of, nor willing to maintain.

Post by byuu »

Why aren't the SuperFX or SA-1 chips emulated?

As with what I discussed regarding save state support, these two chips have severe issues with parallelism. Only it's at least an order of magnitude worse in their case.

Proper, bus-level emulation of these chips with modern approaches (state machines or threading) would require tremendous processing power that is not currently available. Octa-core and better processors in the future, with truly fine-grained threading, may make this possible. But there's no indication that the market is heading toward fine-grained threading at this time, and even if it were, that is many years off, with more years still before such processors would be commonplace.

The chips are currently supported by other emulators by means of having more coarse-grained synchronization between them and the main SNES CPU.
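A hedged sketch of what coarser synchronization costs. Two processors run in alternating slices of `sync_interval` cycles; the coprocessor raises a shared flag at a given cycle and the CPU polls it once per cycle. The function and its parameters are hypothetical and do not model any particular emulator:

```python
def cycles_until_cpu_sees_flag(sync_interval, flag_set_at, limit=1000):
    """Return the CPU cycle at which it first observes the coprocessor's flag."""
    flag = False
    start = 0
    while start < limit:
        end = start + sync_interval
        # CPU slice first: it polls the flag on each of its cycles.
        for cycle in range(start, end):
            if flag:
                return cycle
        # Coprocessor slice: it catches up over the same span of emulated time.
        if start <= flag_set_at < end:
            flag = True
        start = end
    return None


fine = cycles_until_cpu_sees_flag(sync_interval=1, flag_set_at=5)
coarse = cycles_until_cpu_sees_flag(sync_interval=100, flag_set_at=5)
```

With cycle-by-cycle switching, the CPU sees the flag one cycle after it is set (cycle 6). With 100-cycle slices it does not see it until cycle 100, which is the kind of timing error that coarse-grained synchronization accepts in exchange for speed.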

This same method could be used in bsnes, at least allowing the games to be playable, albeit with similar emulation bugs.

The reason this has not been done to date is that they are extremely complex chips to support. It would take several months of effort, and I personally believe that time can be better spent working on core emulation, which affects several thousand games, rather than on special chip emulation that affects only a few dozen.

Going with the less accurate approach to achieve playable speed means that my work here would not even be productive compared to what is already available with other emulators.

I may add support for these chips in the future, but it is not a priority, nor on the to-do list at this time.

Please don't request these features be added.

I sincerely apologize that the features you want are not in bsnes, but asking me about them and/or complaining that they are not there isn't going to speed the process up at all.

I know these features are highly desirable. Please be patient, or provide your programming assistance in adding support for these things.

The old bsnes thread:

The infamous "bsnes thread" was removed. We are all sad to see it go, but the size of said thread became too massive, and was hindering the performance of this message board.

As it contains two years' worth of technical discussion, an archive of it has been made available at the links below:

http://rapidshare.com/files/110190007/b ... d.zip.html