2-stage pipeline

grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Code: Select all

<grinvader> Nach: do we have a model (picture/text, whatever as long as it describes properly) of the 5a22's 2-stage pipeline ?
<grinvader> or will i have to torture byuu for it ?
<grinvader> >:3
Either way, I win.
Everyone, shut up and follow me!!

Code: Select all

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
byuu

Post by byuu »

I haven't written up any documentation on it, sadly.

The basic idea is that there are bus cycles and work cycles. Bus cycles read from and write to memory; and work cycles do things like modify registers, set flags, etc.

Basically, the SNES performs bus cycle N while performing work cycle N-1. The latter is always one behind. Makes sense ... do stuff with data you have read while you wait on future data to be read.

(Pretend there's a cheesy picture of a conveyor belt here with the bus section handing things to the work section :P)

It's only noticeable due to the race conditions exposed through opcodes like cli that modify I on the very last cycle. If the processor were one-stage, the I flag clear would have an effect on the IRQ trigger test. But since it's two-stage, it appears to be "delayed" by one opcode (really one cycle, but as it's the last cycle, it effectively pushes it forward one whole opcode before another IRQ can possibly trigger.)

Here's an example:

Code: Select all

lda #$1234; cli

----------

1-stage approach w/ 2-stage simulation, by testing IRQ trigger one cycle early:

cycle X+0: fetch opcode $a9
cycle X+1: fetch operand lo $34
cycle X+2: perform IRQ trigger test; then fetch operand hi $12; then A = #$1234
cycle Y+0: if IRQ test passed, perform IRQ; otherwise fetch opcode $58
cycle Y+1: perform IRQ trigger test; then I = 0
cycle Z+0: if IRQ test passed, perform IRQ; otherwise fetch next opcode

----------

2-stage pipeline:

Note: work/bus cycle ordering does not matter; they happen at the exact same time

work cycle W+?: complete last cycle of previous opcode
bus  cycle X+0: if(IRQ) fetch irq vector lo; else fetch next opcode $a9

work cycle X+0: idle
bus  cycle X+1: fetch operand lo $34 to MDR

work cycle X+1: A.lo = MDR ($34)
bus  cycle X+2: fetch operand hi $12 to MDR + test for IRQ

work cycle X+2: A.hi = MDR ($12)
bus  cycle Y+0: if(IRQ) fetch irq vector lo; else fetch next opcode $58

work cycle Y+0: idle
bus  cycle Y+1: test for IRQ

work cycle Y+1: I = 0
bus  cycle Z+0: if(IRQ) fetch irq vector lo; else fetch next opcode
The only way I can find to emulate that properly would be to make up some sort of microcode system to buffer work commands; or to use two separate threads and single-step each one. The problem is trying to execute the last work cycle inside the first bus cycle. Not really possible when you implement each opcode as its own function.

Sadly, it's better to just simulate it by testing for IRQs before performing the last cycle of each opcode.
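A minimal sketch of the "buffer the work commands" idea, assuming a single one-slot latch is enough: each bus cycle hands its work to the next tick, so an IRQ test on the last bus cycle of cli cannot see the I flag clear yet. `MicroOp`, `Cpu`, `tick`, `run_cli` and `run_nop` are hypothetical names made up for illustration, not bsnes code.

```cpp
#include <functional>

// One bus cycle of an opcode, plus the work it hands to the next cycle.
struct MicroOp {
    bool test_irq;               // sample the IRQ line during this bus cycle?
    std::function<void()> work;  // register/flag update, runs one cycle later
};

struct Cpu {
    bool flag_i = true;      // I flag set: IRQs masked
    bool irq_line = false;   // level of the IRQ input
    bool irq_taken = false;  // result of the most recent IRQ test
    std::function<void()> pending;  // work latched by the previous bus cycle

    void tick(const MicroOp& op) {
        if (pending) pending();                            // work cycle N-1
        if (op.test_irq) irq_taken = irq_line && !flag_i;  // bus cycle N
        pending = op.work;  // hand this cycle's work to the next tick
    }
};

// cli: opcode fetch, then an I/O cycle that tests IRQ and queues I = 0.
void run_cli(Cpu& cpu) {
    cpu.tick({false, nullptr});
    cpu.tick({true, [&cpu] { cpu.flag_i = false; }});
}

// nop: opcode fetch, then an I/O cycle that only tests IRQ.
void run_nop(Cpu& cpu) {
    cpu.tick({false, nullptr});
    cpu.tick({true, nullptr});
}
```

With the IRQ line held high and I set, running `run_cli` leaves `irq_taken` false; the clear only lands during the first cycle of the following opcode, so the test at the end of `run_nop` is the first one that can pass.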

The S-SMP (SPC700) almost certainly also has a two-stage pipeline, but without interrupts it really doesn't matter.
grinvader

Post by grinvader »

byuu wrote:I haven't written up any documentation on it, sadly.
Your descriptions are good documentation. Thanks muchly.

But now I got the pincers and spoons ready for nothing, geh. You need to resist more before yielding such critical info, else how will I justify my extra hours as spanish inquisitor ?

Now to let this simmer over a little sugar...
First draft is a weirdly coupled state machine-controlled struct for it. No real surprise.
It seems more sugar can give me access to an opcode-based solution. Will look into it.

Yea... Sure, it's not clear, but that's not my goal. I might have a decent overall solution.
byuu

Post by byuu »

It gets even more fun if you want to support the bus hold delays that can be observed from the S-PPU and S-SMP. Need to split every read access into two state table entries each ;)
Your descriptions are good documentation. Thanks muchly.
I thought pagefault had already written the new cycle-level S-CPU core ... are you improving it, or starting on a different one? Just curious.

I understand my core won't work for ZSNES, as it's slow, uses C++ and threading, and doesn't have any notion of save states. But I'd be happy to help you guys with a core if you wanted.

You can ping me on Freenode any time during the weekday if you like.
Yea... Sure, it's not clear, but that's not my goal. I might have a decent overall solution.
Not like all code has to be clear. I already break it down in more of a mark-up language. If people want a reference, they can use that.

I'm very interested in your ideas. I've never come up with a good solution, other than sticking last_cycle() calls all over my CPU core. Please let me know once you have something written up.
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

byuu wrote: I'm very interested in your ideas. I've never come up with a good solution, other than sticking last_cycle() calls all over my CPU core.
That sounds like a good candidate for the good ol'

Code: Select all

#define } last_cycle(); }
Of course probably have to make that work with other logic nests being used...
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
byuu

Post by byuu »

That sounds like a good candidate for the good ol' #define } last_cycle(); }
Seems I wasn't clear. last_cycle() has to go before the last work cycle. Otherwise I'd just stick the call immediately after the opcode invocation. Take the worst case example where the last cycle changes based on the register size setting:

Code: Select all

case 0xa9: {
  if(regs.p.m) last_cycle();
  rd.l = op_readpc();
  if(regs.p.m) { op_lda_b(); break; }  //flag calculation, end opcode
  last_cycle();
  rd.h = op_readpc();
  op_lda_w();  //flag calculation
} break;
Might be faster to make both an 8-bit and 16-bit version with a dynamically swappable opcode pointer table. I may consider that if and when I come up with a new template system.
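A rough sketch of the swappable opcode table idea, under the assumption that only the accumulator-size flag selects the table. `Cpu`, `Handler`, `make_table`, `set_m` and `step` are invented for this example, and only opcode $a9 is filled in.

```cpp
#include <array>
#include <cstdint>

struct Cpu;
using Handler = void (*)(Cpu&);

struct Cpu {
    std::uint16_t a = 0, pc = 0;
    std::array<std::uint8_t, 16> mem{};  // tiny stand-in for the bus
    const Handler* table = nullptr;      // currently active opcode table
    std::uint8_t read_pc() { return mem[pc++ % mem.size()]; }
};

void op_unimplemented(Cpu&) {}

// 8-bit accumulator: one operand byte, high byte of A untouched.
void lda_imm_8(Cpu& c) {
    c.a = static_cast<std::uint16_t>((c.a & 0xff00) | c.read_pc());
}

// 16-bit accumulator: two operand bytes, low byte first.
void lda_imm_16(Cpu& c) {
    std::uint8_t lo = c.read_pc();
    std::uint8_t hi = c.read_pc();
    c.a = static_cast<std::uint16_t>(lo | (hi << 8));
}

std::array<Handler, 256> make_table(Handler lda_imm) {
    std::array<Handler, 256> t;
    t.fill(op_unimplemented);
    t[0xa9] = lda_imm;
    return t;
}

const std::array<Handler, 256> table_m8 = make_table(lda_imm_8);
const std::array<Handler, 256> table_m16 = make_table(lda_imm_16);

// Called whenever the M flag changes (rep/sep/plp/xce in a real core).
void set_m(Cpu& c, bool m) { c.table = m ? table_m8.data() : table_m16.data(); }

// Fetch one opcode and dispatch through whichever table is active.
void step(Cpu& c) { c.table[c.read_pc()](c); }
```

Since each variant knows its own length, the position of the final cycle is fixed per handler and the `if(regs.p.m)` checks drop out of the opcode body.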

And for why it matters at all ...

Code: Select all

case 0x58: {
  last_cycle();  //this looks at the state of regs.p.i ...
  op_io_irq();   //I/O cycle that becomes a read cycle if IRQ triggers*
  regs.p.i = 0;  //... which is cleared *after* the check
} break;
* speaking of which ... damn that was tough to figure out. Had to execute programs out of temporary registers, faking the MDR values through specially crafted palette color writes and such.

You should probably factor this in to whatever you're working on with this two-stage pipeline thing:

Code: Select all

//immediate, 2-cycle opcodes with I/O cycle will become bus read
//when an IRQ is to be triggered immediately after opcode completion
//this affects the following opcodes:
//  clc, cld, cli, clv, sec, sed, sei,
//  tax, tay, txa, txy, tya, tyx,
//  tcd, tcs, tdc, tsc, tsx, tcs,
//  inc, inx, iny, dec, dex, dey,
//  asl, lsr, rol, ror, nop, xce.
alwaysinline void sCPU::op_io_irq() {
  if(event.irq) {
    //IRQ pending, modify I/O cycle to bus read cycle, do not increment PC
    op_read(regs.pc.d);
  } else {
    op_io();
  }
}
grinvader

Post by grinvader »

byuu wrote:It gets even more fun if you want to support the bus hold delays that can be observed from the S-PPU and S-SMP. Need to split every read access into two state table entries each ;)
You wouldn't happen to have measured these in ppu ticks or some other workable time unit, would you ? ^^
I thought pagefault had already written the new cycle-level S-CPU core ... are you improving it, or starting on a different one? Just curious.
Different. We all have our little favourite quirks and goals, after all. Oh, I did omit to mention this is not for ZSNES at all, which could lead to confusion.
I'm very interested in your ideas. I've never come up with a good solution, other than sticking last_cycle() calls all over my CPU core. Please let me know once you have something written up.
Well actually doing the 2-stage pipeline helps tremendously there, hehe.
However, a low-sugar examination of my first draft presented too many issues with an opcode solution. The struct idea is kept for ease of use and flexibility.

Think of it as a sort of twin FIFO list of commands to execute, each one adding command(s) at the end of the appropriate list and proceeding to the next item... a sort of autofeeding scheduler.
I haven't thought about timing or syncing yet, though. Lone solid core goes first.
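One possible reading of the twin FIFO idea, sketched under the assumption that bus commands are the ones feeding both lists. `Core`, `Cmd`, `fetch_opcode` and `tick` are hypothetical names, only lda #imm16 is modelled, and this is not grinvader's actual design.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

struct Core;
using Cmd = std::function<void(Core&)>;

struct Core {
    std::deque<Cmd> bus, work;  // the twin FIFO lists
    std::uint8_t mem[8] = {};   // tiny stand-in for the address space
    std::uint8_t mdr = 0;       // last value seen on the data bus
    std::uint16_t a = 0, pc = 0;

    std::uint8_t read_pc() { mdr = mem[pc++ % 8]; return mdr; }
};

// Bus command: fetch an opcode and feed both lists with its remaining cycles.
void fetch_opcode(Core& c) {
    if (c.read_pc() != 0xa9) return;  // anything but lda #imm16 acts as a 1-cycle nop
    c.bus.push_back([](Core& k) {     // bus: operand lo -> MDR
        k.read_pc();
        k.work.push_back([](Core& w) {  // work, one cycle later: A.lo = MDR
            w.a = static_cast<std::uint16_t>((w.a & 0xff00) | w.mdr);
        });
    });
    c.bus.push_back([](Core& k) {     // bus: operand hi -> MDR
        k.read_pc();
        k.work.push_back([](Core& w) {  // work, one cycle later: A.hi = MDR
            w.a = static_cast<std::uint16_t>((w.a & 0x00ff) | (w.mdr << 8));
        });
    });
}

// One clock: the work list runs what the previous bus cycle queued,
// then the bus list runs its next command (or fetches a new opcode).
void tick(Core& c) {
    if (!c.work.empty()) {
        Cmd cmd = std::move(c.work.front());
        c.work.pop_front();
        cmd(c);
    }
    if (c.bus.empty()) c.bus.push_back(fetch_opcode);
    Cmd cmd = std::move(c.bus.front());
    c.bus.pop_front();
    cmd(c);
}
```

With `a9 34 12` in memory, three ticks leave only the low byte of A loaded; the high byte lands on the fourth tick, alongside the fetch of the next opcode.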

I also intend to add my signature crazy features to it, which might lower the speed a bit... still mucho worth it. Of course they would only be interesting to a select few, hahaha.
Or more, who knows.


and no, i'm not telling what they are.
Nach wrote:That sounds like a good candidate for the good ol'

Code: Select all

#define } last_cycle(); }
Of course probably have to make that work with other logic nests being used...
blargh, i hate that sort of stuff.
Don't use a language's power to maim its own syntax (i'm looking at you, operator overloaders) or hide code, please.
Nach

Post by Nach »

grinvader wrote:
Nach wrote:That sounds like a good candidate for the good ol'

Code: Select all

#define } last_cycle(); }
Of course probably have to make that work with other logic nests being used...
blargh, i hate that sort of stuff.
Don't use a language's power to maim its own syntax (i'm looking at you, operator overloaders) or hide code, please.
No, but the point is, you can create a different little language in between which can simplify things.

For example:

Code: Select all

#define START_OPCODE { opcode_init();
#define END_OPCODE opcode_cleanup(); }
#define IF(x) if (x) { reinit_flags();
#define ELSE } else { save_flags();
#define ENDIF }
Now:

Code: Select all

START_OPCODE
  IF (ready())
    begin_countdown();
  ELSE
    delay();
  ENDIF
  process_current_state();
END_OPCODE
If you have a situation where you need the same operations done at the exact same state in each function, you can clean up the base logic with an intermediary language, and focus on where the functions differ, instead of littering the main code with whatever overhead you need to emulate something else.
grinvader

Post by grinvader »

Nach wrote:If you have a situation where you need the same operations done at the exact same state in each function, you can clean up the base logic with an intermediary language, and focus on where the functions differ, instead of littering the main code with whatever overhead you need to emulate something else.
Or you can spend time doing awesome instead of writing new languages

Code: Select all

void look_ma_one_func_only(int x, void (*f1)(), void (*f2)(), void (*f3)())  // x: the condition from IF(x)
{
  opcode_init();
  if (x)
  {
    reinit_flags();
    f1(); // hahahahahaha
  }
  else
  {
    save_flags();
    f2();
  }
  f3();
  opcode_cleanup();
}
THAT's what I'd do if things got repeated. Not your silly other way.
I already dislike tremendously macros that break the ; ending, those fucking with {} can go to hell for all I care.
Nach

Post by Nach »

You're not supposed to mention function pointers in the forum :evil:

That's a programming tool reserved to us experts of ZSNES.
byuu

Post by byuu »

You wouldn't happen to have measured these in ppu ticks or some other workable time unit, would you ? ^^
I based them upon observations of the delay to latch counters through $2137 reads (2 clocks of 6) and writes to $4201 (6 clocks of 6).

Given the difference between $2137 and $4201, reads are either acknowledged instantly and writes 2 clocks before the end, or reads 2 clocks in and writes at the very end. The latter makes more sense.

Further, TRAC went over some timing doc with me that listed the bus hold delays in µs, and we converted that back to the S-CPU frequency and came up with:

Reads are acknowledged 4 clocks before the cycle ends, writes at the very end, thus:

FastROM: hold 2, wait 4 = 6
SlowROM: hold 4, wait 4 = 8
XSlowROM: hold 8, wait 4 = 12

FastROM write: hold 6
SlowROM write: hold 8
XSlowROM write: hold 12
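The numbers above can be written down as a small lookup. This is only a sketch: `Speed`, `BusTiming`, `cycle_clocks`, `read_timing` and `write_timing` are made-up names, and reading "hold" as the clocks before the access is acknowledged and "wait" as the clocks left afterwards is an assumption about the table.

```cpp
enum class Speed { Fast, Slow, XSlow };

struct BusTiming {
    int hold;  // clocks before the access is acknowledged (assumed meaning)
    int wait;  // clocks left in the cycle after that (assumed meaning)
    constexpr int total() const { return hold + wait; }
};

// Full length of one bus cycle in master clocks: 6, 8 or 12.
constexpr int cycle_clocks(Speed s) {
    return s == Speed::Fast ? 6 : s == Speed::Slow ? 8 : 12;
}

// Reads are acknowledged 4 clocks before the cycle ends.
constexpr BusTiming read_timing(Speed s) {
    return {cycle_clocks(s) - 4, 4};
}

// Writes are acknowledged at the very end of the cycle.
constexpr BusTiming write_timing(Speed s) {
    return {cycle_clocks(s), 0};
}
```

A cycle-level core could step `hold` clocks, perform the access, then step the remaining `wait` clocks, which is where the "two state table entries per read" comes from.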
Oh, I did omit to mention this is not for ZSNES at all, which could lead to confusion.
Share plz. PM is fine. kthxbai.
(i'm looking at you, operator overloaders)
Operator overloads are great, so long as you use them responsibly.

Code: Select all

if(!strcmp(config::driver.video.cstr(), "direct3d"))  //without operator overloading
if(config::driver.video == "direct3d")                //with an overloaded operator==
If you have a situation where you need the same operations done at the exact same state in each function, you can clean up the base logic with an intermediary language, and focus on where the functions differ, instead of littering the main code with whatever overhead you need to emulate something else.
If the C preprocessor wasn't such a horribly useless piece of shit, perhaps.

As it stands, I'd rather write in a DSL and chain a translator at compile-time.
Or you can spend time doing awesome instead of writing new languages
Damn C people :P
Sadly, C++ isn't much better there: you can't pass a member function via template parameter. Well, you can ... but you have to pass the object context at run-time. No way to "imply" that you're using "this", if the template function is inside the class that'll be calling it. Because of that, you can't inline the function calls, costing you a lot of speed.
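For reference, standard C++ does accept a pointer-to-member-function as a non-type template parameter, and a member template can call it through `this`. The sketch below shows only that; it may not cover the exact case described above, and whether the call gets inlined is up to the compiler. `Cpu`, `final_cycle`, `run_inc` and `run_dec` are hypothetical names, and `last_cycle` is a stand-in counter rather than the real IRQ test.

```cpp
struct Cpu {
    int a = 0;
    int irq_tests = 0;

    void last_cycle() { ++irq_tests; }  // stand-in for the IRQ test
    void op_inc() { ++a; }
    void op_dec() { --a; }

    // Op is fixed at compile time; the object is simply `this`.
    template <void (Cpu::*Op)()>
    void final_cycle() {
        last_cycle();   // test before the last work cycle ...
        (this->*Op)();  // ... then perform it
    }

    void run_inc() { final_cycle<&Cpu::op_inc>(); }
    void run_dec() { final_cycle<&Cpu::op_dec>(); }
};
```

Because `Op` is a compile-time constant, no function pointer has to be stored or passed at run-time.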
But now I got the pincers and spoons ready for nothing, geh.
On the bright side, you have them ready for some ailurophagy.
You're not supposed to mention function pointers in the forum :evil:

That's a programming tool reserved to us experts of ZSNES.
Oh? I didn't realize you accepted the Christ Public License V2 to use those ... if you didn't, you'll be hearing from Jesus Stallman soon.
Nach

Post by Nach »

byuu wrote: If the C preprocessor wasn't such a horribly useless piece of shit, perhaps.

As it stands, I'd rather write in a DSL and chain a translator at compile-time.
That sounds like something I would do...
Actually, I've done exactly that.

I was once asking the guys in ##c about some really tricky preprocessor stuff, stretching things a bit farther than they should go. When I asked if they knew a way to push it a bit further and they answered no, I said I guessed I'd just have to write my own preprocessor with a couple more features.
The guys there said I wouldn't. When I asked them to elaborate, they said they knew I wouldn't be able to, since anyone who thinks of using a preprocessor for anything isn't capable of writing one.

And I guess it's thinking like that that gives us mostly worthless preprocessors, or no default preprocessor in some languages.
grinvader

Post by grinvader »

byuu wrote:thus:

FastROM: hold 2, wait 4 = 6
SlowROM: hold 4, wait 4 = 8
XSlowROM: hold 8, wait 4 = 12

FastROM write: hold 6
SlowROM write: hold 8
XSlowROM write: hold 12
Thanks.
Share plz. PM is fine. kthxbai.
Nothing to share. Cannot comply.

Code: Select all

//immediate, 2-cycle opcodes with I/O cycle will become bus read
//when an IRQ is to be triggered immediately after opcode completion
//this affects the following opcodes:
//  clc, cld, cli, clv, sec, sed, sei,
//  tax, tay, txa, txy, tya, tyx,
//  tcd, tcs, tdc, tsc, tsx, tcs,
//  inc, inx, iny, dec, dex, dey,
//  asl, lsr, rol, ror, nop, xce.
Issue: logic dictates that txs (0x9a) belongs in this list. Is it an exception ?

Edit: oh wait, i see 2 tcs. Typo get !

Also, the behaviour of XBA would be interesting, given it's another 'implied' opcode yet 3 cycles.
byuu

Post by byuu »

grinvader wrote:Issue: logic dictates that txs (0x9a) belongs in this list. Is it an exception ?
XBA works as expected, this doesn't apply. Seriously, what is it with ZSNES devs and secrecy? It's very rude :P
Also, the behaviour of XBA would be interesting, given it's another 'implied' opcode yet 3 cycles.
Nothing to share. Cannot comply.

:P
adventure_of_link
Locksmith of Hyrule
Posts: 3634
Joined: Sun Aug 08, 2004 7:49 am
Location: 255.255.255.255
Contact:

Post by adventure_of_link »

Nach wrote:That's a programming tool reserved to us experts of ZSNES.
and neither byuu nor grinvader are ZSNES experts?
<Nach> so why don't the two of you get your own room and leave us alone with this stupidity of yours?
NSRT here.
Nach

Post by Nach »

adventure_of_link wrote:
Nach wrote:That's a programming tool reserved to us experts of ZSNES.
and neither byuu or grinvader are ZSNES experts?
grinvader is indeed a ZSNES expert, that doesn't mean he should give away our trade secrets on the forum though.

byuu sadly is not a "ZSNES Expert", he may be an expert in using ZSNES, but he is not a member of the ZSNES Expert Stronghold Alliance, members who have altered a significant % of ZSNES' codebase.
grinvader

Post by grinvader »

Nach wrote:members who have altered a significant % of their brain tissue.
fixed
byuu

Post by byuu »

Nach wrote:byuu sadly is not a "ZSNES Expert", he may be an expert in using ZSNES, but he is not a member of the ZSNES Expert Stronghold Alliance, members who have altered a significant % of ZSNES' codebase.
That's gratitude for you. I've submitted hundreds of kilobytes worth of code for the massive debugger upgrade, which you can see here (my contribution in red):

Image

Then there was the several months I spent laboring -- researching and rewriting the InitSNES2 function from ASM to C ...

And that's to say nothing of my complete rewrite of the ZSNES font interface to support variable-width letters that was rejected. As seen below:

Image
diminish

Post by diminish »

Srsly, since when is rocket science involved with function pointers? (sarcasm detector overload?) Cannot see "secrets".
grinvader wrote:
Nach wrote:members who have altered a significant % of their brain tissue.
fixed
That sounded like it was reserved for only some people (well, from another point of view, that's actually true).
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany
Contact:

Post by creaothceann »

byuu wrote:That's gratitude for you. I've submitted hundreds of kilobytes worth of code
Tiny drop in the sea?
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
AamirM
Regen Developer
Regen Developer
Posts: 533
Joined: Sun Feb 17, 2008 8:01 am
Contact:

Post by AamirM »

Hi,
It seems more sugar can give me access to an opcode-based solution. Will look into it.
Here is what I came up with in my old opcode-based CPU emulator in TE to simulate the pipeline, and I haven't seen it fail anywhere yet:

Code: Select all

void cli(void)
{
	if( status & s_i )
	{
		status &= ~s_i;
		if(!irq_pending)
			irq_pending = 2;
	}
}

.......

void cpu_emulate()
{

	while(cycles > 0)
	{
		// execute instruction

		jump_table[read_op()]();

		if ( irq_pending )
		{
			if ( irq_pending == 1 )
			{
				if ( !(status & s_i) )
				{
					irq_pending--;
					take_irq();
				}
			} else
			{
				irq_pending--;
			}
		}

	}
}
stay safe,

AamirM
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

byuu wrote:
Nach wrote:byuu sadly is not a "ZSNES Expert", he may be an expert in using ZSNES, but he is not a member of the ZSNES Expert Stronghold Alliance, members who have altered a significant % of ZSNES' codebase.
That's gratitude for you. I've submitted hundreds of kilobytes worth of code for the massive debugger upgrade, which you can see here (my contribution in red):

Image

Then there was the several months I spent laboring -- researching and rewriting the InitSNES2 function from ASM to C ...

And that's to say nothing of my complete rewrite of the ZSNES font interface to support variable-width letters that was rejected. As seen below:

Image
Hate to necrobump, but that picture is one of the most awesome things I've seen on this forum.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

byuu wrote:I haven't written up any documentation on it, sadly.

The basic idea is that there are bus cycles and work cycles. Bus cycles read from and write to memory; and work cycles do things like modify registers, set flags, etc.

Basically, the SNES performs bus cycle N while performing work cycle N-1. The latter is always one behind. Makes sense ... do stuff with data you have read while you wait on future data to be read.

(Pretend there's a cheesy picture of a conveyor belt here with the bus section handing things to the work section :P)

It's only noticeable due to the race conditions exposed through opcodes like cli that modify I on the very last cycle. If the processor were one-stage, the I flag clear would have an effect on the IRQ trigger test. But since it's two-stage, it appears to be "delayed" by one opcode (really one cycle, but as it's the last cycle, it effectively pushes it forward one whole opcode before another IRQ can possibly trigger.)

...
I posted something about this some time ago... (Ah. Here it is.)

Not quite a picture, but surely cheesy enough for this thread:

Code: Select all

  What really happens...

InsnA-Mem1 
InsnA-Mem2   InsnA-Exec1 
InsnA-Mem3   InsnA-Exec2 
InsnB-Mem1   InsnA-Exec3    <-- the overlap cycle 
InsnB-Mem2   InsnB-Exec1 
...          InsnB-Exec2
However, since we are emulating things sequentially anyways, we find it convenient to pretend that the "execution" work of a cycle occurs at the same time as the "memory" work of the PREVIOUS cycle.

Code: Select all

What actually happens on    |    However, we find it convenient 
hardware looks like this:   |    to emulate it like this: 
                            | 
...                         |    ...            
InsnA-Mem1                  |    --------------    
--------------              |    InsnA-Mem1 
InsnA-Exec1     <-- (same   |    InsnA-Exec1    
InsnA-Mem2      <-- time)   |    -------------- 
--------------              |    InsnA-Mem2    
InsnA-Exec2                 |    InsnA-Exec2    
InsnA-Mem3                  |    -------------- 
--------------              |    InsnA-Mem3    
InsnA-Exec3                 |    InsnA-Exec3    
InsnB-Mem1                  |    -------------- 
--------------              |    InsnB-Mem1    
InsnB-Exec1                 |    InsnB-Exec1    
InsnB-Mem2                  |    -------------- 
--------------              |    InsnB-Mem2    
InsnB-Exec2                 |    InsnB-Exec2    
...                         |    --------------
grinvader

Post by grinvader »

Thanks.
byuu

Post by byuu »

Right, and the IRQ test actually occurs during the ----- part after the very last memory cycle under your "what actually happens on hardware" list.
E.g., for your example, right after { InsnA-Exec2 + InsnA-Mem3 }, but before { InsnA-Exec3 + InsnB-Mem1 }. Makes sense when you think about it: the bus thinks it's about to start a new opcode. So this is why InsnA-Exec3's effects on P.I aren't seen yet.

Thanks for the diagram. I was having trouble remembering how I laid mine out before, but yours works great.

It'd still be cool to figure out a really nice system to simulate N-stage pipelines, if only as a reference for processors where N>2.

Then again, I'm not entirely sure anyone's going to care about 100% accuracy past the SNES. Things like 20+-stage x86 processors with three levels of cache ...