Multithreaded emulator state saving

blargg · Post by **blargg** » Mon Jan 21, 2008 12:51 am

Rather than get lost in the bsnes thread, let's discuss state saving in a multithreaded emulator like bsnes here. I wrote some minimal demo code, including a very simple cooperative thread implementation. There are examples of single-threaded (trivial), dual-threaded done wrong, and dual-threaded using mozz's log-replay solution. byuu might be able to improve the example to be closer to how bsnes works.

bsnes_state_save.zip

henke37 wrote:Yes, it does start at CPU::run. But the trick here is that when loading, we feed it the data that causes it to move to the right sync call. We ignore what it does with the data, it is just a trick to get it into the right position. Once the thread is swapped out, the data in the thread is completely ignored and overwritten with data from the save state(remapped as needed).

Makes sense. You want to ultimately get the thread to some deeply nested point, maybe a few function calls deep, but you can't easily just put it there from the start. So you start it at the beginning and feed it doctored inputs for the sole purpose of getting it to that deeply nested point, not caring at all what you have to feed it or what else it does, since you've ensured that whatever it does won't have any ill effects. Then when it's there, it yields and you then restore all the data state to the proper values.

henke37 wrote:Also, you claim that the memory layout may be different at each time. So what? How do you think exceptions with their unrolling of the stack is supposed to work? (Rhetorical question) It is smart enough to understand the stack frames, so why can't the threading library understand them?

Say a new variable is added to a function; what will this be set to when restoring an old save state? For that matter, what if a function is removed entirely but an old save state was saved while in this function? The problem here is that highly volatile aspects of the implementation are captured in the save state, requiring that the implementation be frozen or states be broken constantly.

Exceptions work because the compiler regenerates the information every time you recompile, and exceptions don't save state across runs of a program. It only has to deal with the current program, which it has current tables for.

henke37 wrote:And seriously, if we understand the stack frames, what prevents us from parsing them, saving the data in a portable format and later pre filing the stack with the data and then "returning" to the function?

Maintainability. For every version of the program, you must be able to load states from previous versions, down to every last detail, converting state to the new format as necessary. Function removed in new version? Well, you'll have to figure out how to convert the old state within that function into the equivalent where that function doesn't even exist.

"But, the same issues apply to file format compatibility!" you say. Yes, but the layout of a file can be independently maintained from the organization of the functions and local variables, so changes to a file format can be kept to a minimum. Saving stack frame contents tightly couples the two, making changes frequent and unplanned.

henke37 wrote:Yes, it is portable, the data on the stack is after all just the function's local variables, the return address and the arguments to the function. That data does not change. Sure, the format of the data can be different, but it is still the same values, just in different formats.

Sure, the actual binary format is very platform-, compiler- and build-setting-specific. But the data that is stored is not.

First, let's imagine a spherical program. Seriously, this kind of abstraction might be possible in Java, but forget about something like cothreads in the first place in that straitjacket language.

creaothceann · Post by **creaothceann** » Mon Jan 21, 2008 7:56 am

So savestates will be build-specific? Still worth the trouble imo.

henke37 · Post by **henke37** » Mon Jan 21, 2008 9:20 am

blargg wrote:Seriously, this kind of abstraction might be possible in Java, but forget about something like cothreads in the first place in that straitjacket language.

Doesn't Java have non-cooperative threads? Those can probably be hacked up to be used as cooperative threads for the sake of examples. Not that I recommend it.

Yes, I kinda did mention it, but not well enough, that these kinds of tricks can be limited either to a specific version of the source (OK for public releases) or to a specific build (face it, most people don't need compatibility between different builds).

And before somebody claims that it would be impossible to get the stack format right with every build, let me prove you wrong with an example of an application that does it right: the debugger.
Debuggers do indeed know the stack frame format, and there is nothing stopping you from using the debugging data to implement these kinds of tricks.

A trick that I thought up to ID each call to the sync function is surprisingly simple and has a speed penalty of 0. It is build-specific (unless mapped with debug data, in which case it is source-specific), but it truly is O(0) in complexity. The trick is simple: read the return address from the stack; it's already there! The return address is already used as a unique ID to return to, so there is no speed hit compared to the alternative of adding an argument to the sync call.
It would be a trivial thing to build a mapping between the (mangled) function name + call count to and from the return addresses.
Even if the binary is relocated at every single load, it is fast enough to build such a mapping table at load time.

This mapping is only slightly less stable than a manually assigned ID for each call. In fact, it can be mapped to and from such an ID.

It is possible to filter all the data in each stack frame and end up with completely portable data, the actual state of each chip. This data can further be converted to and from emulator specific formats that may have more or less variables that need to be saved.
But the basic common ground can be saved in a portable format.

byuu · Post by **byuu** » Mon Jan 21, 2008 8:36 pm

blargg, I take it libco won't build at all on your target machine?

The reason I ask is that your setjmp / longjmp trick won't work if you yield from a different function than the one you entered from (stack misalignment). It would probably make the test more realistic if you could do that.

Now then,

Even a single-threaded emulator requires a state machine to save the program execution position. They usually require a state machine (or multiple machines) anyway to step the other processor(s), which is why savestates are (mostly) a freebie for them. A multi-threaded implementation really isn't special in that regard. You just need one state machine per thread.

What I'm getting at is that the whole point of cothreading is to eliminate the need for these state machines entirely. That was the design goal. The reason for wanting to eliminate state machines was because I kept increasing the precision of the emulator, beyond that of what any modern emulator had attempted. More precision means more state machines.

At first I just had something like:

Code: Select all

void CPU() {
  switch(state) {
    case IRQ: ...
    case DMA: CPU_dma(); break; //transfer one DMA byte
    case Opcode: CPU_opcode(); break; //execute one entire opcode
  }
}

Which matched the precision of every other SNES emulator at the time. Easily maintainable.

When I started trying to get cycle-level precision, that code grew to:

Code: Select all

void CPU() {
  switch(state) {
    case Opcode: (this->*optbl[state.opcode])(); break; //execute one cycle
  }
}

void CPU_opcode_lda_addr() {
  switch(state.opcode_cycle++) {
    ...
    case 2: read(); break;
    //start new opcode for next CPU() call.
    case 4: state.opcode_cycle = 0; state.opcode = fetch(); break;
  }
}

And then I wanted to emulate the bus hold delay, which meant adding another state machine around each opcode cycle, eg:

Code: Select all

void CPU_opcode_lda_addr() {
  switch(state.opcode_cycle) {
    ...
    case 2:
      switch(state.opcode_cycle_part) {
        case 0:
          wait_cycles(4);
          state.opcode_cycle_part++;
          break;
        case 1:
          read();
          state.opcode_cycle_part = 0;
          state.opcode_cycle++;
          break;
      }
      break;
    //start new opcode for next CPU() call.
    case 4:
      state.opcode_cycle = 0;
      state.opcode = fetch();
      break;
  }
}

And it was really growing out of hand. Without the state machine, the above code looks like this:

Code: Select all

void CPU_opcode_lda_addr() {
  ...
  read();
}

Quite a bit more readable, isn't it? The red tape of software-based state machines made up 90+% of the code. Not to mention, that extra code is extremely error prone.

And the point of all of this ... now that we're wanting to add back savestates, I don't see how we're going to bypass the need to have these state machines.

If we were to add the state machines back, then there would be no point to using cothreading in the first place. It would just be redundant.

If we were to come up with a way of bypassing the need for state machines, we would most likely subsequently also eliminate the need for state machines in single-threaded code. The fact that I've yet to ever see anyone do this makes me doubtful that there is a solution, but I'm certainly willing to continue trying, rather than saying it's impossible outright.

Feeding each thread intentionally invalid data to force it to get to a certain point without any adverse effects seems rather promising, but quite complicated. It seems more sustainable than mozz's I/O buffer hijack approach, but more hackish, too.

Doesn't Java have non-cooperative threads? Those can probably be hacked up to be used as cooperative threads for the sake of examples. Not that I recommend it.

Exactly the reason I avoid Java. It's a one-size-fits-all approach to programming. Whereas a sane human being would use DSLs as much as possible. And if you can mold a language you want to solve your problem, all the better.

And yes, pre-emptive threading is beyond worthless as a cooperative threading solution. My benchmarks using Win32's CreateThread(), etc. API with this approach yielded roughly 100,000 switches per second. At least 100x slower than cooperative.

blargg · Post by **blargg** » Mon Jan 21, 2008 9:43 pm

byuu wrote:blargg, I take it libco won't build at all on your target machine?

Didn't try (probably wouldn't), but when researching things I make them as simple as possible.

The reason I ask is that your setjmp / longjmp trick won't work if you yield from a different function than the one you entered from (stack misalignment). It would probably make the test more realistic if you could do that.

Sorry, I don't follow what you mean. I tested my super-simple implementation in various circumstances and encountered no problems on Linux-x86 and Mac OS Classic-PowerPC.

Feeding each thread intentionally invalid data to force it to get to a certain point without any adverse effects seems rather promising, but quite complicated. It seems more sustainable than mozz's I/O buffer hijack approach, but more hackish, too.

Yeah, it's impractical and hacky in the sense that any little change in how something is implemented might require changes to this PC-restoring code.

henke37 wrote:Yes, I kinda did mention it, but not well enough, that these kinds of tricks can be limited either to a specific version of the source (OK for public releases) or to a specific build (face it, most people don't need compatibility between different builds).

can be->must be. I suppose it would be acceptable if one used save states as a way to pause a game for an extended period, and for debugging, but people also like to build libraries of save states at various points in games.

Debuggers do indeed know the stack frame format, and there is nothing stopping you from using the debugging data to implement these kinds of tricks.

Yeah, but debuggers don't try to save and restore the program state. This is certainly possible, but it's platform and compiler-specific, and would be a ton of work.

It would be a trivial thing to build a mapping between the (mangled) function name + call count to and from the return addresses.

You keep using that word. I do not think that word means what you think it means.

It is possible to filter all the data in each stack frame and end up with completely portable data, the actual state of each chip. This data can further be converted to and from emulator specific formats that may have more or less variables that need to be saved.

That is a glimmer of hope. To make it worthwhile, someone working on this would have to make the system fairly general, for saving any program's state.

Care to try implementing any of this in the example code I posted?

henke37 · Post by **henke37** » Mon Jan 21, 2008 10:27 pm

blargg wrote: can be->must be.

Noted.

blargg wrote: Yeah, but debuggers don't try to save and restore the program state. This is certainly possible, but it's platform and compiler-specific, and would be a ton of work.

Well, a co threading library is kinda specific too you know.

blargg wrote:
It would be a trivial thing to build a mapping between the (mangled) function name + call count to and from the return addresses.
You keep using that word. I do not think that word means what you think it means.

Implementation taking 4 lines max is what I call it being trivial.

blargg wrote: Care to try implementing any of this in the example code I posted?

I am a good thinker, but my actual implementation skills are lacking in this area, and I have other projects that I do want to code, so I have to refuse.

bobthebuilder · Post by **bobthebuilder** » Mon Jan 21, 2008 11:57 pm

blargg wrote:<Iñigo Montoya voice>You keep using that word. I do not think that word means what you think it means.</Iñigo Montoya voice>

Sorry, I just have to point out your joke.

Post by **grinvader** » Tue Jan 22, 2008 8:00 pm

To be perfectly nerdy, the quote is "I do not think it means what you think it means."

Now back to your regular thread.

Deathlike2 · Post by **Deathlike2** » Tue Jan 22, 2008 8:06 pm

henke37 wrote:
blargg wrote:
blargg wrote:
It would be a trivial thing to build a mapping between the (mangled) function name + call count to and from the return addresses.
You keep using that word. I do not think that word means what you think it means.
Implementation taking 4 lines max is what I call it being trivial.

I get the strong feeling that this is simply not 4 lines max in the real world.

blargg wrote: Care to try implementing any of this in the example code I posted?
I am a good thinker, but my actual implementation skills are lacking in this area, and I have other projects that I do want to code, so I have to refuse.

This is where theory sounds like crap if you can't actually put up an example.

Nightcrawler · Post by **Nightcrawler** » Wed Jan 23, 2008 2:23 pm

What a cop-out. You've carried on through this whole topic, started definitively stating that what you're talking about can be implemented in 4 lines or less, and then when asked to show an example in the example code blargg had or in BSNES itself, you all of a sudden can't be bothered and don't have time to implement 4 lines of code.

How do you expect any developer to take you seriously?

Not worth your time gentlemen. Let's move on.

Verdauga Greeneyes · Post by **Verdauga Greeneyes** » Wed Jan 23, 2008 3:53 pm

I have to agree. I'm not one to judge quickly, but at least cough up some pseudo-code in whatever syntax seems natural to you, or there's nothing to work with.

henke37 · Post by **henke37** » Wed Jan 23, 2008 10:22 pm

Right right, i can do pseudo code.

Code: Select all


At application launch:
  Load the stack frame format into memory using debugging info

At the sync call:

  If we are saving state,
    for each thread,
      For each stack frame
        (using the debug info)
        save each local variable
        save the parameters too
        save the return address, mapped to a code relocation safe id(mangled function name + N in "the Nth call to sync in the function")
      end for
      save any per thread data, like the sync info
      //Note, no need to save eip, since all we need is the topmost return address
      But we should probably save the registers that the threading library saves
    end for

    save global stuff like the full ram contents.
  end if

At state load:

  Empty any buffers, like the sound buffers.

  Nuke( aka destroy) all co threads without saving state

  create each thread using the saved data:
    allocate a new stack
    for each saved stack frame
      create a new stack frame in the new stack:
        (using the debug info)
        write the function parameters and local variables to the new stack frame
        write the mapped return address to the new stack frame
        put more boring stack frame generating code here, like stack canaries and what not
      end stack frame creation
    end for
    load thread specific data, like the sync info
    load thread registers
  end thread creation
  load global data like the ram contents

  uh, maybe do a screen redraw, just for the heck of it

The algorithm is not difficult; assembler is just not on my skill list.

About the thing that is trivial: I am talking about just using std::map to map a not-so-portable thing (the return address) to something that is portable as long as the source is the same (mangled function name + N in "Nth call to sync in the function").
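For reference, the std::map part on its own can be sketched as below. This is only an illustration under stated assumptions: `SyncSiteTable`, `SyncSiteId`, and the addresses used in the usage note are hypothetical names and values, not anything from bsnes or libco. It covers only the lookup table. Filling the table from debug info and reading the return address out of a stack frame are not shown, and those are the parts the rest of the thread disputes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Portable identifier for one sync call site (hypothetical type):
// mangled function name + N, as in "the Nth call to sync in the function".
using SyncSiteId = std::pair<std::string, int>;

// Two-way table between build-specific return addresses and portable IDs.
struct SyncSiteTable {
  std::map<std::uintptr_t, SyncSiteId> by_address;
  std::map<SyncSiteId, std::uintptr_t> by_id;

  // Register one call site. In a real build these values would have to come
  // from debug info at load time; here the caller supplies them by hand.
  void add(std::uintptr_t return_address, const std::string& mangled_name, int nth_call) {
    SyncSiteId id{mangled_name, nth_call};
    by_address[return_address] = id;
    by_id[id] = return_address;
  }

  // Saving: build-specific address -> portable ID (throws if unknown).
  const SyncSiteId& to_id(std::uintptr_t return_address) const {
    return by_address.at(return_address);
  }

  // Loading: portable ID -> address in the currently running build.
  std::uintptr_t to_address(const SyncSiteId& id) const {
    return by_id.at(id);
  }
};
```

With two made-up entries such as `add(0x401234, "_ZN3CPU3runEv", 0)` and `add(0x401260, "_ZN3CPU3runEv", 1)`, `to_id(0x401260)` gives the second call site and `to_address` maps it back. On GCC and Clang the current function's return address can be obtained with `__builtin_return_address(0)`, but that is compiler-specific.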

mozz · Post by **mozz** » Mon Jan 28, 2008 8:22 pm

Wow, I've been away from these boards for a long time...

I tracked down the old thread where I proposed the "log replay" solution mentioned by blargg: http://nesdev.parodius.com/bbs/viewtopic.php?t=2174

Note one advantage of this is that it *doesn't* depend on how you implement the instructions internally (i.e. if the "implicit state machine" is modified, or something). All it depends on, is that your emulated CPU instructions cause the right side effects (i.e. memory accesses) in the right order. In that case, having logged the values used in those reads and writes, when resuming the savestate, we are able to trap the read and write attempts by the CPU and "replay" those values until they run out. At that point we switch back to the regular memory handlers and allow new side effects to happen for real.
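A minimal sketch of the read side of this idea follows, assuming a 64 KB flat memory and a single read handler. `Bus`, `Mode`, and `LogEntry` are hypothetical names chosen for illustration; the real bsnes memory handlers look nothing like this. It shows a handler that logs reads while a savestate is being taken, and on load answers reads from the log until the log runs out, then falls back to real memory.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <vector>

// One logged read: which address was read and what value came back.
struct LogEntry { std::uint16_t addr; std::uint8_t value; };

enum class Mode { Normal, Logging, Replay };

// Hypothetical memory bus with a switchable read handler.
struct Bus {
  std::vector<std::uint8_t> ram = std::vector<std::uint8_t>(0x10000, 0);
  std::deque<LogEntry> log;
  Mode mode = Mode::Normal;

  std::uint8_t read(std::uint16_t addr) {
    if (mode == Mode::Replay) {
      if (!log.empty()) {
        // Replaying: the CPU must issue the same reads in the same order.
        LogEntry e = log.front();
        log.pop_front();
        if (e.addr != addr) throw std::runtime_error("savestate does not match this core");
        return e.value;  // value from the log, not from current memory
      }
      mode = Mode::Normal;  // log exhausted: reads happen for real again
    }
    std::uint8_t value = ram[addr];
    if (mode == Mode::Logging) log.push_back({addr, value});  // record what the CPU saw
    return value;
  }
};
```

The point of the design is that the log records only what crossed the memory interface, so it does not care how an instruction is implemented internally, only which accesses it makes and in what order.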

At least, this is how I remember it... (geez, it's been a long time since I thought about emulation stuff!)

Are there more recent threads about this stuff that I should read through?

DancemasterGlenn · Post by **DancemasterGlenn** » Tue Jan 29, 2008 6:00 am

mozz wrote:Are there more recent threads about this stuff that I should read through?

None more recent, but if you wade through the last few pages of the bsnes thread you'll probably find bits and pieces. This thread began because of the discussion that started in there.

Also, welcome back.

mozz · Post by **mozz** » Wed Jan 30, 2008 2:00 am

Okay, I've read through a lot of interesting stuff in the bsnes thread (yay!), including the handful of posts that spawned this one...

non portable save states

Forgive me, but I think it would be a terrible idea to implement save states which might then become un-loadable due to minor changes to the emulator. Anything storing directly from the host machine state (host registers, host stack...) would be pretty fragile and seems like a bad idea.

I still think "log replay" is the way to go.

I guess the question is, how to implement it so that it's not too complicated, and it doesn't hurt the performance of memory accesses for the "normal" case?

Panzer88 · Post by **Panzer88** » Wed Jan 30, 2008 2:37 am

creaothceann wrote:So savestates will be build-specific? Still worth the trouble imo.

agreed.

byuu · Post by **byuu** » Wed Jan 30, 2008 4:21 am

I still think "log replay" is the way to go.

I agree, it's the best (and only practical) solution presented thus far.

However, I'm extremely hesitant to try it. Why? Let's say I spend three months trying to add this into bsnes. And then let's say I overlook one small detail that makes the solution either impossible or far too slow to be practical. Now I have to wade back through and remove the whole system. That, or completely revert and lose all other progress during those several months.

I simply don't want to take the chance with my codebase at this time. I don't want to be the first to try such a radically new idea, even though I do believe it to be a workable solution. Perhaps when I'm not so actively working on other things (better Linux support, special chip support, better PPU emulation, etc etc), I'll be more willing to take the risk.

funkyass · Post by **funkyass** » Wed Jan 30, 2008 5:13 am

how large could this log get, do we log from rom load or just what went into the last frame?

henke37 · Post by **henke37** » Wed Jan 30, 2008 9:05 am

My idea requires nothing more than a single point of modification to the bsnes code, the gui code to save/load the savestate.

Sure, there is code to be written, but it would be a part of the cothreading library.

blargg · Post by **blargg** » Wed Jan 30, 2008 2:05 pm

It's "put up or shut up" time. Post an implementation of this (even if only for a toy program, like the working code I wrote and posted), or stop claiming how easy (you think) it is.

mozz · Post by **mozz** » Wed Jan 30, 2008 3:51 pm

funkyass wrote:how large could this log get, do we log from rom load or just what went into the last frame?

The log info would be pretty small (hopefully). You don't even need to log an entire frame---probably just a very small region necessary to get you from an instruction boundary, to the "precise save time" that you want ALL processors to have in the savestate.

The idea goes something like this:

(1) Simulate the CPU, APU, PPU until their simulated clocks are all at a "safe stopping point", which is close to the same chosen time. E.g. for the CPU or APU, it would probably be at an instruction boundary.
(2) Replace all the handler functions which the CPU core uses to read and write memory (for example), with logging versions. These versions execute the original version of the handler, and then they check a "has already been saved" boolean flag (e.g. for CPU read/write handlers, it would be the "has the CPU state been saved already" flag). If this flag is true, then it must write a log entry giving the address read or written, and the value which was returned or stored. You might want to log the identity of the handler too, to avoid confusion (or use separate logs for each?).
(3) ** When a handler is called and the "exact save time" has been reached, maybe it should suspend the cothread? I.e. it might be unsafe to run "past" the point where you want to save. This seems like a source of complexity.

(4) Once all of the simulated processors have reached the "exact save time", we save all of the logs, as well as the contents of main memory, registers accessed like memory, etc.

So the log information only needs to be collected about side effects caused by a processor AFTER ITS STATE HAS BEEN SAVED.

When loading, you restore each processor to the saved state (which will probably be, an instruction boundary or whatever), but it is temporarily isolated from other processors, the CPU is isolated from main memory, etc. You have to use the logged data to get them 100% "in sync" before they resume normal operation. So you install "replay" handlers, which expect the CPU to issue the same reads and writes that it issued when the savestate was created. You satisfy reads from the logged data, and maybe you check writes against the logged data (different value == corrupt or incompatible savestate?). In effect, the log tells you exactly how to advance the emulated processor from its saved position (an instruction boundary) to the "chosen save time" of the savestate as a whole. For each processor, you feed it log entries one by one until they run out, then you suspend it and do the same with the next processor. For example, once the CPU has been "caught up" like this, it will match the saved state of the main memory. Once all processors have been fed their logs, you restore their regular handlers for memory access, and resume running normally.
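The catch-up loop on load might look roughly like the sketch below. Everything here is an assumption made for illustration: `Access`, `ReplayLog`, `ToyProcessor`, and `catch_up` are hypothetical, and the toy processor only ever stops between whole steps, whereas a cothreaded core would be suspended from inside a handler when its log runs out. Reads are satisfied from the log, writes are swallowed but checked against it, and any mismatch is treated as a corrupt or incompatible savestate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One logged memory access made after the processor's state was saved.
struct Access { bool is_write; std::uint16_t addr; std::uint8_t value; };

struct ReplayLog {
  std::vector<Access> entries;
  std::size_t next = 0;
  bool mismatch = false;  // set when the core deviates from the log

  bool done() const { return next >= entries.size(); }

  // Replay read handler: return the logged value instead of touching memory.
  std::uint8_t on_read(std::uint16_t addr) {
    if (done()) { mismatch = true; return 0; }
    const Access& a = entries[next++];
    if (a.is_write || a.addr != addr) mismatch = true;
    return a.value;
  }

  // Replay write handler: discard the write, but verify it against the log.
  void on_write(std::uint16_t addr, std::uint8_t value) {
    if (done()) { mismatch = true; return; }
    const Access& a = entries[next++];
    if (!a.is_write || a.addr != addr || a.value != value) mismatch = true;
  }
};

// Toy processor: each step reads one byte and writes it, incremented,
// to the following address.
struct ToyProcessor {
  std::uint16_t pc = 0;
  void step(ReplayLog& log) {
    std::uint8_t v = log.on_read(pc);
    log.on_write(static_cast<std::uint16_t>(pc + 1), static_cast<std::uint8_t>(v + 1));
    pc = static_cast<std::uint16_t>(pc + 2);
  }
};

// Feed the processor its log until the entries run out.
// Returns false if the savestate turned out to be incompatible.
bool catch_up(ToyProcessor& p, ReplayLog& log) {
  while (!log.done() && !log.mismatch) p.step(log);
  return !log.mismatch;
}
```

A loader would call `catch_up` once per processor, each with its own log, and only then reinstall the regular handlers and resume normal execution.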

One thing that might be tricky to get right is any interaction between processors. The "consensual" communication between CPU<-->APU over their I/O ports will probably be handled fine (each processor must do a read in order to be affected by what the other one has done). Things like IRQs are more worrisome.

Honest question: is it worth the complexity to have a savestate where all the processors are at "the same" simulated time?
Why not just step them all to *some* reasonable stopping point (such as instruction boundary) and then save all state, INCLUDING the current simulated time of each processor? The only reason I can think of to try and line them up at the *exact same time* is to make it easier for other emulators or other tools to correctly interpret the savestate, which "log replay" is not very good for either.

I.e. does it even matter if I've simulated the CPU 200 cycles farther ahead than my APU? Can't I just save them both as they currently are (or at the next instruction boundary for each) and then when restoring them, restore them exactly as they are including the two different simulated timers? Wouldn't that be easier and simpler? It's less likely to break because of a change in emulator code between versions, too (an old savestate could still capture minor timing differences between an old version of the emulator and the new version, but that is unavoidable in any case).

mozz · Post by **mozz** » Wed Jan 30, 2008 4:00 pm

mozz wrote:Honest question: is it worth the complexity to have a savestate where all the processors are at "the same" simulated time?
Why not just step them all to *some* reasonable stopping point (such as instruction boundary) and then save all state, INCLUDING the current simulated time of each processor?

Damn, now I remember the answer... it's hard or impossible in general to guarantee that you can stop all processors at a "safe point" such as an instruction boundary.

The usefulness of the "log replay" idea is that it gives us a safe way to advance A to a stopping point, then save it, then advance B to a stopping point *while recording any further stuff that A was forced to do because of advancing B*, etc.

I'm not sure anymore if the way I've described it above will achieve that or not. It's been so long since I had all the details clear in my head.

Jipcy · Post by **Jipcy** » Wed Jan 30, 2008 5:23 pm

mozz wrote:The usefulness of the "log replay" idea is that it gives us a safe way to advance A to a stopping point, then save it, then advance B to a stopping point *while recording any further stuff that A was forced to do because of advancing B*, etc.

Would it be possible, then, that there would be certain conditions that would cause an infinite loop? As in, each time you advanced A or B or C to a safe point, it affected another processor, going round and round in a never-ending circle?

In terms of actual implementation:
Would the difference between when a user requests a save state and the actual saved state of the emulator always be less than a frame? Or are there any conditions (as above, infinite or very long loops) where the difference becomes more than one frame? Is this a problem?

I'm not a programmer, so I'm certainly not trying to talk as if I were. But I am trying to visualize mozz' idea.

byuu · Post by **byuu** » Wed Jan 30, 2008 5:43 pm

My idea requires nothing more than a single point of modification to the bsnes code, the gui code to save/load the savestate.

Nobody here can understand your idea. That leaves two possibilities. Either everyone else here is not smart enough to comprehend it, or you don't understand how difficult it really would be.

I honestly don't know which is the case, but it doesn't really matter. I can't implement an idea of yours that I can't personally understand. "(using the debug info)" written in pseudo-code doesn't help me any.

Would it be possible, then, that there would be certain conditions that would cause an infinite loop? As in, each time you advanced A or B or C to a safe point, it affected another processor, going round and round in a never-ending circle?

That would happen if you tried to sync all processors to a known stopping point and then save. The log replay avoids that case by only waiting on one processor to sync, and then allows all processors to continue. So long as each processor eventually reaches a stopping point, that is. Once all processors have passed a sync point, you'll have enough logged data for your savestate. It doesn't matter if other processors that were in sync get out of sync again, that's what the log is for.

Why not just step them all to *some* reasonable stopping point (such as instruction boundary) and then save all state, INCLUDING the current simulated time of each processor?

The best I could do is to wait until the most complex processor (the S-CPU) reaches a stopping point, and try and catch up the S-SMP, S-DSP and S-PPU as much as possible, and failing that, forcefully ignore the fact that they are out of sync and continue running them to save points. The problem is that, although it would be rare, it would not be a perfect save. It's very possible the game could crash when you try and save a state, as well as when you try and load it. Even if that's a 1:10,000 chance, it isn't worth the risk. One crash right before a major boss battle, and I've lost a potential user forever.

henke37 · Post by **henke37** » Wed Jan 30, 2008 6:53 pm

byuu wrote:"(using the debug info)" written in pseudo-code doesn't help me any.

Ok, my bad. Then I will be more elaborate.

The "(using the debug info)" parts mean that the structure of the stack frame is to be looked up in the debugging info that the compiler generates.

For example, the step "save each local variable" expands to something like this:

Code: Select all

Using the saved return address of the called function, find the call site in the calling function (the call that has not yet returned).
In the stack frame format database (aka debug data), find the entry for the calling function at that call site.
For each local variable at the time of the call
  Lookup in the debug info where the variable is stored (stack offset/register).
  Read the value of the variable
  Save the value and name of the variable (see the example for a bloated way)
Next variable

Things that are looked up include the location of every local variable, the return address and the parameters.

Also, my idea is something that cannot be done fully in C++. It needs to be done in assembler, because it is going to have to read and write the raw bytes that make up a stack frame. True, it will be very simple for the bsnes-only code, since there are just two functions to call, CothreadLoad and CothreadSave.
But the cothreading library will need to have these functions written; those are what I wrote the pseudo-code for earlier.

Here is an example of how a savefile format could look.
The data is not going to be close to what bsnes uses. I honestly don't have time to fake a working save; this is after all only to illustrate what data would be saved. My comments are in parentheses.

Code: Select all

romname=Lol.smc
romhash=1234567890
emulatorbuilduniqueid=9876544321

Thread 1:
%MAINCPU::Run_NofuckingparmsNofuckingreturn%:
Localvars:
TimetoDMAthingy=124
PointerToBsnesCoreObject=#1
ThatStringObject={
PointertoData=#2
Length=5
}
Parameters:
ReturnAddress:
ThatCoThreadMainReturnsFunction.

%MAINCPU::HandleOPCode_IntParmNofuckingreturn%:
Localvars:
AddressToWriteTo:5820
SumOfAddition:52
Parameters:
42
ReturnAddress:
InTheMiddleOfMAINCPU::Run

%CoThread::Sync%:
ReturnAddress:
InTheMiddleOfMAINCPU::HandleOpCode

(Stack data ends)
(Threadwide data follows)
TimeUnitsBeforeThreadNeedsToRun=123

(Next thread)

Thread 2:
%SOUNDCPU::Run_NofuckingparmsNofuckingreturn%:
Localvars:
PointerToSoundbuffer=#3
Parameters:
ReturnAddress:
ThatCoThreadMainReturnsFunction.

%SOUNDCPU::CalcSoundFrame_IntParmReturnsInt%:
Localvars:
DoblerEffectDataDummy:842
Parameters:
621
ReturnAddress:
InTheMiddleOfSOUNDCPU::Run

%CoThread::Sync%:
ReturnAddress:
InTheMiddleOfSOUNDCPU::CalcSoundFrame
(Stack data ends)

(Threadwide data follows)
TimeUnitsBeforeThreadNeedsToRun=47

(No more threads)
(Global data follows)
VideoRam=00000121012398 29342304982509r28 09254820
ThatOtherRam=9294389829382987
CurrentVideoMode=3
(PointerRefference table follows)
#1=object of class Bsnes { rocks=1 }
#2=char array(5) "yeah\0"
#3=object of class EpicSoundTorture { quality="badashell"}