SNES internal header detection


Post by byuu »

Okay, something I really dislike, but a necessary evil.

This topic is for discussing how to detect the SNES internal header. This is not the same thing as the copier header, which is trivial to detect (well, so long as the file is padded to a multiple of 1 KB or so).
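For contrast, the copier header check mentioned above can be sketched in a few lines. This is only an illustration: it assumes the copier header is 512 bytes and the headerless image is padded to a multiple of 1 KB, and the names `has_copier_header` and `rom_data_offset` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a copier header is exactly 512 bytes long.
constexpr std::size_t kCopierHeaderSize = 512;

// Returns true when the file size suggests a copier header is present.
// Only reliable if the headerless image is padded to a multiple of 1 KB.
constexpr bool has_copier_header(std::size_t file_size) {
  return file_size % 1024 == kCopierHeaderSize;
}

// Returns the file offset at which the actual ROM image starts.
constexpr std::size_t rom_data_offset(std::size_t file_size) {
  return has_copier_header(file_size) ? kCopierHeaderSize : 0;
}
```

The internal header offsets discussed below (0x7fc0 and so on) would then be relative to `rom_data_offset()`, not to the start of the file.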

This is what I have at the moment:

EDIT: updated slightly per below.

Code:

unsigned Cartridge::score_header(unsigned addr) {
  if(cart.rom_size < addr + 64) return 0;
  int score = 0;

  uint8 *rom = cart.rom;
  if((rom[addr + MAPPER] & ~0x10) == 0x20 && addr <  0x008000) score++;
  if((rom[addr + MAPPER] & ~0x10) == 0x21 && addr >= 0x008000) score++;
  if((rom[addr + MAPPER] & ~0x10) == 0x22 && addr <  0x008000) score++;
  if((rom[addr + MAPPER] & ~0x10) == 0x25 && addr >= 0x408000) score++;
  if(rom[addr + ROM_TYPE] < 0x08) score++;
  if(rom[addr + ROM_SIZE] < 0x10) score++;
  if(rom[addr + RAM_SIZE] < 0x08) score++;
  if(rom[addr + REGION] < 14) score++;
  if(rom[addr + COMPANY] == 0x33) score += 2;

  uint16 cksum, icksum;
  cksum  = rom[addr +  CKSUM] | (rom[addr +  CKSUM + 1] << 8);
  icksum = rom[addr + ICKSUM] | (rom[addr + ICKSUM + 1] << 8);
  if((cksum + icksum) == 0xffff && (cksum != 0) && (icksum != 0)) score += 4;

  uint16 reset;
  reset = rom[addr + RESL] | (rom[addr + RESL + 1] << 8);
  if(reset < 0x8000) return 0;

  uint8 resb = rom[(addr & ~0x7fff) + (reset & 0x7fff)];

  if(resb == 0x18 //clc
  || resb == 0x78 //sei
  || resb == 0x4c //jmp $nnnn
  || resb == 0x5c //jml $nnnnnn
  || resb == 0x20 //jsr $nnnn
  || resb == 0x22 //jsl $nnnnnn
  || resb == 0x9c //stz $nnnn
  ) score += 8;

  if(resb == 0xc2 //rep #$nn
  || resb == 0xe2 //sep #$nn
  || resb == 0xa9 //lda
  || resb == 0xa2 //ldx
  || resb == 0xa0 //ldy
  ) score += 4;

  if(resb == 0x00 //brk #$nn
  || resb == 0xff //sbc $nnnnnn,x
  || resb == 0xcc //cpy $nnnn
  ) score -= 8;

  printf("* resb = %02x\n", resb);  //debug: first opcode byte at the reset vector

  return score < 0 ? 0 : score;
}

void Cartridge::find_header() {
  unsigned score_lo = score_header(0x007fc0);
  unsigned score_hi = score_header(0x00ffc0);
  unsigned score_ex = score_header(0x40ffc0);
  if(score_ex) score_ex += 4;

  printf("* score = %2u, %2u, %2u\n", score_lo, score_hi, score_ex);

  if(score_lo >= score_hi && score_lo >= score_ex) {
    info.header_index = 0x007fc0;
  } else if(score_hi >= score_ex) {
    info.header_index = 0x00ffc0;
  } else {
    info.header_index = 0x40ffc0;
  }
}
Planning to extend the reset vector first-byte check to a table with better, ranged probabilities.
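One possible shape for such a table, as a rough sketch only: a 256-entry array indexed by the first opcode at the reset vector. The weights below simply mirror the three groups already in score_header(); the table layout and the names `make_reset_opcode_weights` and `reset_opcode_weight` are assumptions for illustration, not the planned design.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds a per-opcode score adjustment table. Opcodes not listed stay at 0
// (neutral). Finer-grained weights would go here once they are known.
std::array<int, 256> make_reset_opcode_weights() {
  std::array<int, 256> weights{};
  const std::uint8_t very_likely[] = {0x18, 0x78, 0x4c, 0x5c, 0x20, 0x22, 0x9c};
  const std::uint8_t plausible[]   = {0xc2, 0xe2, 0xa9, 0xa2, 0xa0};
  const std::uint8_t unlikely[]    = {0x00, 0xff, 0xcc};
  for (std::uint8_t op : very_likely) weights[op] = 8;   // clc, sei, jmp, ...
  for (std::uint8_t op : plausible)   weights[op] = 4;   // rep, sep, lda, ...
  for (std::uint8_t op : unlikely)    weights[op] = -8;  // brk, sbc long,x, cpy
  return weights;
}

// Looks up the adjustment for one opcode; the table is built once.
int reset_opcode_weight(std::uint8_t opcode) {
  static const std::array<int, 256> weights = make_reset_opcode_weights();
  return weights[opcode];
}
```

With this in place, the three if-chains would collapse to `score += reset_opcode_weight(resb);`, and tuning becomes a matter of editing table entries.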

Scores:
(Legend = Name: LoROM score, HiROM score, ExHiROM score)

Batman: RotJ: 13, 1, 0
Daikaijuu Monogatari 2: 0, 0, 16
Double Dragon: 17, 0, 0
Far East of Eden Zero: 0, 15, 0
Street Fighter Alpha II: 16, 15, 0
Star Ocean: 8, 0, 0
Tales of Phantasia (J): 0, 4, 9
Tales of Phantasia (Fan): 0, 8, 9
Ys 3 (J): 17, 8, 0

Highest score wins. If scores are equal, the code above favors LoROM, then HiROM, then ExHiROM. All of the above scores obviously result in correct picks, but the wider the differences, the better.
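To put a number on "the wider the differences, the better", here is a small sketch that picks a layout using the same comparison order as find_header() and also reports the margin over the best losing score. The `Layout` enum, the `Pick` struct and `pick_layout` are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>

enum class Layout { LoROM, HiROM, ExHiROM };

struct Pick {
  Layout layout;
  unsigned margin;  // winning score minus the best losing score
};

// Same comparison order as find_header(): LoROM wins ties, then HiROM.
Pick pick_layout(unsigned lo, unsigned hi, unsigned ex) {
  if (lo >= hi && lo >= ex) return {Layout::LoROM, lo - std::max(hi, ex)};
  if (hi >= ex)             return {Layout::HiROM, hi - std::max(lo, ex)};
  return {Layout::ExHiROM, ex - std::max(lo, hi)};
}
```

Fed the scores listed above, Street Fighter Alpha II (16, 15, 0) comes out as LoROM with a margin of only 1, while Far East of Eden Zero (0, 15, 0) is HiROM with a margin of 15.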
Last edited by byuu on Tue Aug 12, 2008 7:49 pm, edited 1 time in total.

Post by creaothceann »

byuu wrote:

Code:

icksum
Indeed. :?

Post by Nach »

Well, you got the right idea for the most part.

Checking for valid start opcodes at the reset vector should get the most points. That should be followed by a valid checksum. After that, check how valid the other values are; you got the right idea there for the most part. Company < 3 though doesn't make that much sense... If you want to base anything off the company, add several points for Company = 0x33.

The main thing you seem to be missing though is a file size check. If the ROM is < 64KB, lots and lots of points for LoROM. If the ROM is >4MB lots and lots of points for Extended HiROM.

For the other values, <2MB is most of the time LoROM, and >2MB is most of the time HiROM, so you can give a point or two for falling on either end of that.
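The file size suggestions above could look roughly like this. The thresholds come straight from the post; the point values (16 for "lots and lots", 1 for "a point or two") are placeholders picked for the example, and `SizeBonus` / `size_bonus` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Extra points to add to each candidate header location.
struct SizeBonus {
  int lo;
  int hi;
  int ex;
};

// Placeholder weights: 16 stands in for "lots and lots of points",
// 1 for "a point or two".
SizeBonus size_bonus(std::size_t rom_size) {
  const std::size_t KB = 1024;
  const std::size_t MB = 1024 * KB;
  SizeBonus bonus = {0, 0, 0};
  if (rom_size < 64 * KB) bonus.lo += 16;  // too small to be anything else
  if (rom_size > 4 * MB)  bonus.ex += 16;  // too large for plain HiROM
  if (rom_size < 2 * MB) {
    bonus.lo += 1;                         // usually LoROM
  } else if (rom_size > 2 * MB) {
    bonus.hi += 1;                         // usually HiROM
  }
  return bonus;
}
```

A ROM of exactly 2MB gets no bonus either way, since the post only covers the two sides of that boundary.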

Post by byuu »

Nach wrote: Well, you got the right idea for the most part.

Good to hear, thanks.

Nach wrote: Company < 3 though doesn't make that much sense... If you want to base anything off the company, add several points for Company = 0x33.

You're right, the value can be anything. Not sure what I was thinking there. I'll set it to only give credit for 0x33 for the new-style headers. 1:256 chance of it being that unintentionally, so give it +2 instead of just +1.

The idea of the other header checks is merely to give some leverage in case it just so happens that there's a valid reset vector + boot byte in the wrong header area.

Nach wrote: The main thing you seem to be missing though is a file size check.

It was kind of there, but not really. If the file size was too small, score would automatically be set to zero. But just to be safe, I re-ordered things to lo >= hi >= ex as before, and gave it an automatic bonus to ex if the score is not zero (eg if it has a valid reset vector location,) as you suggested.

Nach wrote: For the other values, <2MB is most of the time LoROM, and >2MB is most of the time HiROM, so you can give a point or two for falling on either end of that.

That one seems a tad bit dangerous, as I get really close right now. On one or two, the only way I can tell them apart is by the mapper ID. SFA2, for instance, is 4MB, but you want the header at 0x7fc0.

That one is quite evil, too. Valid checksums in both areas, the reset vector points to the same byte in both locations, etc. I can only tell it should be at 0x7fc0 because the mapper is 0x32.
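As a sketch of why mapper 0x32 settles it: score_header() masks off bit 4 (0x10) of the map mode byte, which is the FastROM speed bit, before comparing, so 0x32 is treated like 0x22 and only earns its point at a header address below 0x8000. The helper names below are made up for this example; the comparisons are the four mapper tests from the code earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Clears bit 4 (the speed bit), so e.g. 0x30 -> 0x20 and 0x32 -> 0x22.
constexpr std::uint8_t base_map_mode(std::uint8_t mapper) {
  return static_cast<std::uint8_t>(mapper & 0xef);
}

// Mirrors the mapper tests in score_header(): does the map mode byte agree
// with the address where this candidate header was found?
bool mapper_matches_location(std::uint8_t mapper, unsigned addr) {
  switch (base_map_mode(mapper)) {
    case 0x20:
    case 0x22: return addr <  0x008000;
    case 0x21: return addr >= 0x008000;
    case 0x25: return addr >= 0x408000;
    default:   return false;
  }
}
```

For the 0x32 case, `mapper_matches_location(0x32, 0x007fc0)` holds while `mapper_matches_location(0x32, 0x00ffc0)` does not, which is the one-point difference that breaks the tie.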

Post by Nach »

Well, if you want to stay away from close stuff, add points for special chip info in the right place.

SA-1, S-DD1, SFX are all LoROM. SPC7110 are all HiROM.
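A minimal sketch of that idea, assuming the special chip has already been identified from the candidate header by some other routine (not shown, and not specified in this thread). The `Chip` and `Layout` enums, `chip_bonus`, and the bonus of 4 points are all placeholders for illustration.

```cpp
#include <cassert>

// Hypothetical result of decoding the chip info at a candidate header.
enum class Chip { None, SA1, SDD1, SuperFX, SPC7110 };
enum class Layout { LoROM, HiROM, ExHiROM };

// Returns extra points when the chip found at a candidate header is
// consistent with the layout that candidate implies.
int chip_bonus(Chip chip, Layout candidate) {
  const int kBonus = 4;  // placeholder weight
  switch (chip) {
    case Chip::SA1:
    case Chip::SDD1:
    case Chip::SuperFX:
      return candidate == Layout::LoROM ? kBonus : 0;  // all LoROM
    case Chip::SPC7110:
      return candidate == Layout::HiROM ? kBonus : 0;  // all HiROM
    case Chip::None:
      return 0;
  }
  return 0;
}
```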

Post by FitzRoy »

Code:

*A***: No special chip (LoROM)
*B***: "DSP-1/DSP-2/DSP-3/DSP-4" special chip (LoROM)
*C***: "Super FX" special chip (LoROM) ?
*CA***: "Super FX" special chip (LoROM) ?
*CB***: "Super FX2" special chip (LoROM)
*DC***: "C4" special chip (LoROM)
*DE***: "ST018" special chip (LoROM)
*DH***: "SPC7110" special chip (HiROM)
*DS***: "ST010/ST011" special chip (LoROM)
*E***: "OBC1" special chip (LoROM)
*J***: No special chip (HiROM)
*K***: "DSP-1/DSP-2/DSP-3/DSP-4" special chip (HiROM)
*L***: "SA-1" special chip (LoROM)
*N***: "S-DD1" special chip (LoROM)
*P***: Prototype board (LoROM or HiROM)
*PV***: Prototype board (LoROM or HiROM)
This is what I have so far on this section of my PCB document. I don't really know what the difference is between C and CA.
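If a tool wanted to use the list above, it could be kept as a simple lookup table. The sketch below assumes the letters between the asterisks are used as the key (so `*DH***` becomes "DH"); the names `Mapping`, `BoardInfo` and `lookup_board` are invented for the example, and the two entries flagged with "?" in the list are kept as given.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Mapping { LoROM, HiROM, Either };

struct BoardInfo {
  const char* chip;
  Mapping mapping;
};

// Returns the entry for a PCB letter code, or nullptr if it is not listed.
const BoardInfo* lookup_board(const std::string& code) {
  static const std::map<std::string, BoardInfo> table = {
    {"A",  {"none", Mapping::LoROM}},
    {"B",  {"DSP-1/DSP-2/DSP-3/DSP-4", Mapping::LoROM}},
    {"C",  {"Super FX", Mapping::LoROM}},   // marked "?" in the list
    {"CA", {"Super FX", Mapping::LoROM}},   // marked "?" in the list
    {"CB", {"Super FX2", Mapping::LoROM}},
    {"DC", {"C4", Mapping::LoROM}},
    {"DE", {"ST018", Mapping::LoROM}},
    {"DH", {"SPC7110", Mapping::HiROM}},
    {"DS", {"ST010/ST011", Mapping::LoROM}},
    {"E",  {"OBC1", Mapping::LoROM}},
    {"J",  {"none", Mapping::HiROM}},
    {"K",  {"DSP-1/DSP-2/DSP-3/DSP-4", Mapping::HiROM}},
    {"L",  {"SA-1", Mapping::LoROM}},
    {"N",  {"S-DD1", Mapping::LoROM}},
    {"P",  {"prototype board", Mapping::Either}},
    {"PV", {"prototype board", Mapping::Either}},
  };
  const auto it = table.find(code);
  return it == table.end() ? nullptr : &it->second;
}
```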

Post by henke37 »

Blob vs chip?

Anyway, with all this uhm, wild mass guessing? Maybe not that much, but still. I say that a neural network could be trained to guess fairly well.

Post by Overload »

FitzRoy wrote:

This is what I have so far on this section of my PCB document. I don't really know what the difference is between C and CA.

C = Mario Chip
CA = GSU-1
CB = GSU-2

Post by neviksti »

byuu,
I liked the "compromise" you and Nach reached regarding a new cartridge format, and your suggested specifics for such a format. Where in the development path does this fit in? (Is this something way down the road?) You could have the emulator just farm all this nonsense out to a rom tool.

Post by FitzRoy »

Overload wrote: C = Mario Chip
CA = GSU-1
CB = GSU-2

Thanks. Seems to be a chip packaging revision between Mario Chip 1 and GSU-1 upon further research.

Post by byuu »

neviksti wrote: byuu,
I liked the "compromise" you and Nach reached regarding a new cartridge format, and your suggested specifics for such a format. Where in the development path does this fit in? (Is this something way down the road?) You could have the emulator just farm all this nonsense out to a rom tool.

Good question, not entirely sure. To be honest, we've been talking about this stuff for years, but not much has ever really been done.

If you wouldn't mind me implementing a version subject to change in the future, it really shouldn't be more than a weekend project. Getting it added in finalized form, I'd like it if I had a version of NSRT that could spit out all the PCB files for me, so I had lots to test with right off the bat.

Post by Overload »

FitzRoy wrote: Thanks. Seems to be a chip packaging revision between Mario Chip 1 and GSU-1 upon further research.

Not quite. The Mario Chip only supports 256kb of SRAM and up to 8Mbit ROM. The address decoding is different in all three PCB revisions.

Post by neviksti »

byuu wrote: If you wouldn't mind me implementing a version subject to change in the future, it really shouldn't be more than a weekend project. Getting it added in finalized form, I'd like it if I had a version of NSRT that could spit out all the PCB files for me, so I had lots to test with right off the bat.

Well, if it is going to happen, it needs to start somewhere. As long as you make it clear other people shouldn't support this format until it has stabilized, then sure... go for it! It would be great to see this step finally taken.

We've got some agreement amongst people which is more than we've ever had before on this topic. Now is the time to run with it.

Post by FitzRoy »

Overload wrote: Not quite. The Mario Chip only supports 256kb of SRAM and up to 8Mbit ROM. The address decoding is different in all three PCB revisions.

Cripes, why didn't they just call them Super FX 1A, 1B, and 1C? Would that have been too simple?