bsnes v0.038 released

Archived bsnes development news, feature requests and bug reports. Forum is now located at http://board.byuu.org/
Locked
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

FitzRoy wrote:
Nach wrote: I would be happy to support your naming convention, it just might not be the one I prefer, or the one I have as default in NSRT, but I'd like to record it and allow users to use it nonetheless.

As for working offline, I am a lot more capable with cross platform + web platform apps now than I was even just a few months ago.

If it was really needed, I could see writing a program where you browse files, fill in fields and whatever, and when you're online, you'd hit a sync to web button. I don't think it can be done right off the bat though, unless a lot of people want to pitch in with the software development.
It isn't just about placating me personally, I'm not trying to blackmail you. It's about what every emu author, user or dumping organization wants individually, and you just can't do that with a central database.
Why not?
FitzRoy wrote: I didn't mention JMA to implicate all formats you've worked on as being the equivalent. I mentioned it as an example to cast doubt on your suggestion that we all just go away and trust you to come up with the best solution for the database method in private.
Are you implying that anyone wanted a new archive format with strong compression, and I came up with JMA which is somehow a bad example of an archive format?

I can't recall anyone asking me for a new archive format. The only reason I did it was because I wanted to, and I knew I could. Just because some people don't care to compress with JMA doesn't mean the format in and of itself is bad.

It's not like I went ahead and removed support for everything but JMA loading from SNES emulators. On the other hand, I did provide a way for people to get 7-Zip level compression at a time when there was no clean API to work with to do such.

As for a database format, I never just went ahead and came up with one. If I did, where exactly is it? In fact, I've had talks with several about it, like NGEfreak and others, and like I said, if people want to seriously discuss this, I'll create a forum for it.
FitzRoy wrote: It's clear now that my concerns were warranted, I wouldn't have been happy with these solutions. It's far easier to defeat a bad idea before you've spent many man hours laboring it into existence.
I'm still waiting to see this bad idea that you somehow nipped in the bud.
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
henke37
Lurker
Posts: 152
Joined: Tue Apr 10, 2007 4:30 pm
Location: Sweden

Post by henke37 »

Is it just me or have these "version X released" topics begun to smell just like the 100+ page topic?
Rashidi
Trooper
Posts: 515
Joined: Fri Aug 18, 2006 2:45 pm

Post by Rashidi »

As a JMA end user, I can't say that the JMA format is useless.

Currently I have compiled about three thousand JMA-compressed SNES ROM images into a single DVD-ROM image file, which I then mount on a virtual drive.
Without JMA compression it would certainly exceed the 4 GB mark, which would make it unusable on a FAT32 volume.

Sure, I could pack the ROM files into another compression format such as 7z or RAR, but if I did, I couldn't associate the extension to open the files directly with my favorite emulator...
With the .JMA extension, I can.

True, GB/$ keeps getting cheaper, but I have my personal reasons for wanting it compiled on DVD-ROM, for FAT32, for file associations and so on...
With all those personal reasons, one thing I know for sure is that .JMA fits perfectly.
Deathlike2
ZSNES Developer
Posts: 6747
Joined: Tue Dec 28, 2004 6:47 am

Post by Deathlike2 »

Rashidi wrote:as JMA format end user, i can not say that JMA format is useless.
As I've used JMA as an example.. it suffers from the same issue UPS is suffering from. It needs better tools to be used with it.

It's only considered "useless" because it isn't "in style"... and the method of creating JMA files isn't as obvious as one intended. Right now, it's a small niche.. pretty much in the same boat UPS is in.

One of the non-obvious things about JMA is I believe NSRT is able to convert ROMs using it.. however, it's not very obvious (I think I know how to do it).. it's not a one click (or a right click and select) solution (NF is not actually that intuitive to the average person for this purpose).

Besides, this format does actually save more space (if you need to save all those bytes) compared to 7z/rar, and it already beats the living crap out of zip.
Continuing [url=http://slickproductions.org/forum/index.php?board=13.0]FF4[/url] Research...
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

Nach wrote:Why not?
Um, explain to me how you would support every name, verification, and inclusion ruleset on a central website db. When byuu pointed this out before I could, your rebuttal was just a confirmation of what people mostly argue about, you never really said how you planned to centrally database every single viewpoint.

And if you can explain that, tell me what this is supposed to accomplish over independent database files.
Nach wrote:I can't recall anyone asking me for a new archive format. The only reason I did it was because I wanted to, and I knew I could.
Fine, I'm glad you enjoyed yourself. Will you push every format using that reasoning, or just the ones you create?
Nach wrote:As for a database format, I never just went ahead and came up with one. If I did, where exactly is it?
Okay, what are we arguing about then? Onward with the independent bundle/replace/maintain method, any criticism with it was proven fallacious. When the details of its mythical superior become less sketchy, its replacement can be considered.
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

FitzRoy wrote:
Nach wrote:Why not?
Um, explain to me how you would support every name, verification, and inclusion ruleset on a central website db. When byuu pointed this out before I could, your rebuttal was just a confirmation of what people mostly argue about, you never really said how you planned to centrally database every single viewpoint.
I have better things to do than start making a list of every viewpoint out there and explain exactly how it can be covered. This isn't a discussion on that. If you think storing certain kinds of data is mutually exclusive, please take a course in database design.

If there's something specific you would like addressed, I can explain how it'll work.
FitzRoy wrote: And if you can explain that, tell me what this is supposed to accomplish over independent database files.
It can generate independent database files, that's the point. I don't want to have to create independent database files by hand. Why shouldn't we have a community setup to manage it, instead of people all over duplicating work copying and pasting, instead of just new info being written once?
FitzRoy wrote:
Nach wrote:I can't recall anyone asking me for a new archive format. The only reason I did it was because I wanted to, and I knew I could.
Fine, I'm glad you enjoyed yourself. Will you push every format using that reasoning, or just the ones you create?
I really don't get what your problem is. There's dozens of formats out there, so what if I created another one? Why are you getting so worked up over it?
If you found another format which had pros for storing SNES ROMs in, over what is currently supported, and has a decent lightweight library that we can use, I'll integrate it.
FitzRoy wrote:
Nach wrote:As for a database format, I never just went ahead and came up with one. If I did, where exactly is it?
Okay, what are we arguing about then? Onward with the independent bundle/replace/maintain method, any criticism with it was proven fallacious. When the details of its mythical superior become less sketchy, its replacement can be considered.
We're arguing about a central website for storing data, which can be used to create all the other data people want out there, without having to duplicate effort. I really don't understand what your problems with it are, or what is mythical about it. Is there some terminology I'm using that you're not understanding?
tetsuo55
Regular
Posts: 307
Joined: Sat Mar 04, 2006 3:17 pm

Post by tetsuo55 »

Although we can still disagree on the exact naming convention.
Over at no-intro we have made a lot of changes to the datting system.

The result is that at this point we can store all the different names for each rom in a single entry, and the rom scanning programs allow selection of the desired naming.

At this point we store all names according to the no-intro convention, for every language that actually had an alternative name, where possible worldwide ISO standards are used.

Adopting a similar system, combined with a non-name-based detection method, will ensure that everyone is happy.
funkyass
"God"
Posts: 1128
Joined: Tue Jul 27, 2004 11:24 pm

Post by funkyass »

start with the absolute minimum of fields for the system to work, then go from there.

It's meta-data about a subject that isn't going to change drastically.
Last edited by funkyass on Fri Jan 02, 2009 11:32 pm, edited 1 time in total.
Does [Kevin] Smith masturbate with steel wool too?

- Yes, but don’t change the subject.
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

Nach wrote:It can generate independent database files, that's the point. I don't want to have to create independent database files by hand. Why shouldn't we have a community setup to manage it, instead of people all over duplicating work copying and pasting, instead of just new info being written once?
Because all people collectively can't agree on what information is important, how to name games, how to tag them, what's on the website and how it should be laid out, who should pay for the bandwidth, etc. It's not possible to combine conflicting databases. I'd rather not have good minority ideas snuffed out from ever being chosen by majority rule.

In other words, you can make your site and have the file generation onsite, it's compatible with my idea. What isn't compatible is locking the emulator to it, either overtly or effectively by using proprietary file formats and scanning techniques.
byuu

Post by byuu »

The only thing proprietary being suggested is the verification certificates that an emulator would use to say "yes, this ROM was verified as authentic."

And really, that's something all database devs should be working on a central project to support. ROM names, genre, etc really don't mean shit (no offense) in the grand scheme of things -- we can reinterpret that every other year for all time if we want. If we can match a binary blob to an exact file with an exact PCB, that's all that really matters to an emulator.

Any other group will be free to implement a database of games with their own info in the same format. I won't promise to use their custom data in any way in the emulator, but I'm happy to allow use of any certificates we release -- I just don't want just anyone to be able to make those for obvious reasons.

Really, we should consider the certificate signing and database two separate projects.

The only thing we need to do is pick a generic, variable-key database format and define two mandatory key names and formats, "hash" and "pcb".

Something like IBM dBASE III (old school, very simple) or MySQL (modern, powerful) would be fine.
funkyass
"God"
Posts: 1128
Joined: Tue Jul 27, 2004 11:24 pm

Post by funkyass »

Good old flat-file CSV. Sorting would be good. If it's going to be two fields, then we don't need really crazy DB stuff like joins, stored procedures, or unions...
byuu

Post by byuu »

Plain text may not be ideal for a 6,000 entry database. Or a GoodSNES set with ~60,000 entries for Super Mario World (U) variants alone.

Size may not be a huge issue, but if we're including it with an emulator, it'd be ideal if binary data (such as hashes) were stored in binary format. Compression may or may not compensate for that.

More importantly, being able to index into each element in O(1) time would also be important to quickly find hashes without reading the entire database into memory and pre-parsing it.

CSV also has an annoying ass nested quoting method.
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

byuu wrote:The only thing proprietary being suggested is the verification certificates that an emulator would use to say "yes, this ROM was verified as authentic."
Let me make some points in case you're not aware of them. I'm not a programmer and I don't know MySQL or care to know. If I'm going to be allowed to construct my own database, I want to be able to hand modify it. CSV is simple and advantageous to me, it's supported in OpenOffice, and the formatting doesn't look funny when viewed in notepad. I agree with funkyass, this might be the best choice for both parties.

I have no idea what a certificate is or why it's needed, but verification works like this: Someone tells a db author that they've dumped a game at X checksums and with X board serial. The author documents that to that person's name. Nobody but the dumper knows for 100% certain that this information is true and accurately reported. It could be fraud, it could be erroneous, or it could come from someone else. This is also the case for purported repeat verifications. It is a fallacy to believe that two potentially untrue verifications create 100% certainty worthy of a golden sticker. In fact, if you did have fraudulent intentions, submitting the same fraud under as many different names as possible is exactly what you'd be incentivized to do.

Now, there are ways to reduce uncertainty, but it's always there. Everything you see that was not submitted by yourself is based on a degree of trust. You cannot really verify a dump as authentic any more than I or some other group can, so what phantom problem are your certificates trying to solve? Somebody from replacing the database file with a phony one of their own? I don't get it.
Deathlike2
ZSNES Developer
Posts: 6747
Joined: Tue Dec 28, 2004 6:47 am

Post by Deathlike2 »

FitzRoy wrote:
byuu wrote:The only thing proprietary being suggested is the verification certificates that an emulator would use to say "yes, this ROM was verified as authentic."
Let me make some points in case you're not aware of them. I'm not a programmer and I don't know MySQL or care to know. If I'm going to be allowed to construct my own database, I want to be able to hand modify it. CSV is simple and advantageous to me, it's supported in OpenOffice, and the formatting doesn't look funny when viewed in notepad. I agree with funkyass, this might be the best choice for both parties.
If you don't understand the issues of a database, please stop trying to think too much. The whole point is to create a good one that minimizes duplication and can distinguish every important detail necessary. Minor details such as the genre of a game are not important; they are simply cosmetic. Important details will include revision, hashes/checksums... stuff that lets one distinguish one ROM from another. How it is stored internally should not matter (for the non-programmer).. how it is displayed is all important to the end user, but that's really insignificant in the grand scheme (when you get to displaying it, you can comment all you want on it then).
I have no idea what a certificate is or why it's needed, but verification works like this: Someone tells a db author that they've dumped a game at X checksums and with X board serial. The author documents that to that person's name. Nobody but the dumper knows for 100% certain that this information is true and accurately reported. It could be fraud, it could be erroneous, or it could come from someone else. This is also the case for purported repeat verifications. It is a fallacy to believe that two potentially untrue verifications create 100% certainty worthy of a golden sticker. In fact, if you did have fraudulent intentions, submitting the same fraud under as many different names as possible is exactly what you'd be incentivized to do.
When you download something from, say, MS's website, those executables come with built-in certificates verifying what that app is, plus a checksum or whatever that confirms that this is a good file. In the same sense, the database created will probably be done by Nach or other responsible parties that can add legitimate entries and whatnot. I'm fuzzy on all the details on when this matters in the end.. it's safe to say that no bad ROMs will be certified by people you trust.
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

Deathlike2 wrote: If you don't understand the issues of a database, please stop trying to think too much. The whole point is to create a good one that minimizes duplication and can distinguish every important detail necessary. Minor details such as the genre of a game are not important; they are simply cosmetic. Important details will include revision, hashes/checksums... stuff that lets one distinguish one ROM from another. How it is stored internally should not matter (for the non-programmer).. how it is displayed is all important to the end user, but that's really insignificant in the grand scheme (when you get to displaying it, you can comment all you want on it then).
I agree that genres aren't that useful (it's also highly subjective information, differentiation of gameplay or music is theoretically infinite and subject to combinations). But don't lump the information I include with this. Consider the fact that it could act like a cartridge reference, and that users could check it to become interested in participating in dumping carts that are not documented. In fact, the whole reason I'm more able to find new dumps is by correlating this information in a massive list. (Just look at the European situation, nobody on earth has a complete list of PAL serial codes). Does this not indirectly benefit emu authors in obtaining more of what they care about?
When you download something from, say, MS's website, those executables come with built-in certificates verifying what that app is, plus a checksum or whatever that confirms that this is a good file. In the same sense, the database created will probably be done by Nach or other responsible parties that can add legitimate entries and whatnot. I'm fuzzy on all the details on when this matters in the end.. it's safe to say that no bad ROMs will be certified by people you trust.
Not in the same sense. The people who made the program or drivers are the ones who register the information with MS. Are the game publishers the ones sending data to us to obtain our certification? No, it's all users submitting what they purport to be the game developer's unmolested data.
Deathlike2
ZSNES Developer
Posts: 6747
Joined: Tue Dec 28, 2004 6:47 am

Post by Deathlike2 »

FitzRoy wrote: I agree that genres aren't that useful (it's also highly subjective information, differentiation of gameplay or music is theoretically infinite and subject to combinations). But don't lump the information I include with this. Consider the fact that it could act like a cartridge reference, and that users could check it to become interested in participating in dumping carts that are not documented. In fact, the whole reason I'm more able to find new dumps is by correlating this information in a massive list. (Just look at the European situation, nobody on earth has a complete list of PAL serial codes). Does this not indirectly benefit emu authors in obtaining more of what they care about?
If it's significantly subjective, it's technically irrelevant (and optional).
Not in the same sense. The people who made the program or drivers are the ones who register the information with MS. Are the game publishers the ones sending data to us to obtain our certification? No, it's all users submitting what they purport to be the game developer's unmolested data.
This is done by a community of users (probably dumpers or some people that have some credibility) that will maintain this database. It's not like random people are going to be picked for this.

In my MS app example, it's not registered to MS.. it has security validation from a third party (I think Verisign). So, if you download the DX web updater and you look at the properties, you will see a digital signature on it. This is the same kind of stuff you'd find at secure websites. The idea here is that a community can have the power to add/remove/update these "certificates" verifying that you are using the original data. It's not like a mystery is being invented... this can be expanded for hacks or translation or other non-SNES games... so it's not a system just for the SNES community, but a generalized community effort not done by one person and some arbitrary system.
byuu

Post by byuu »

If I'm going to be allowed to construct my own database, I want to be able to hand modify it.
That seems somewhat dangerous (personal opinion, don't want to debate that) ... but it should be easy enough to perform a one-time pass over any DB and cache the hashes for O(log n) searches.

Okay, so we'll find a format that most spreadsheets can edit. If we use CSV, then we should use the first row for column names. Require hash, pcb and maybe title. Make the rest optional. Format would be UTF-8.
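A minimal sketch of what that could look like, assuming nothing beyond what's said above: first row holds the column names, "hash" and "pcb" are mandatory, everything else is optional. The function names, the sample rows and the PCB strings are made up for illustration. The one-time pass sorts the hashes so later lookups are O(log n) via binary search.

```python
import bisect
import csv
import io

REQUIRED = ("hash", "pcb")  # the two mandatory column names proposed in the thread


def load_db(text):
    """Parse CSV text whose first row is the header; return (sorted_keys, rows)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # One-time pass: sort rows by hash so lookups can use binary search.
    rows = sorted(reader, key=lambda r: r["hash"].lower())
    keys = [r["hash"].lower() for r in rows]
    return keys, rows


def lookup(keys, rows, digest):
    """Return the row whose hash equals digest, or None. O(log n)."""
    wanted = digest.lower()
    i = bisect.bisect_left(keys, wanted)
    if i < len(keys) and keys[i] == wanted:
        return rows[i]
    return None


# Hypothetical sample data, not real hashes or PCB serials.
SAMPLE = (
    "hash,pcb,title\n"
    "ffee01,EXAMPLE-PCB-02,Example Game B\n"
    '00ab12,EXAMPLE-PCB-01,"Example Game A, Special"\n'
)
keys, rows = load_db(SAMPLE)
```

For a file on disk, the same reader would be fed from `open(path, newline="", encoding="utf-8")` instead of a string buffer.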
what phantom problem are your certificates trying to solve? Somebody from replacing the database file with a phony one of their own? I don't get it.
Hashes continue to be broken with time. MD5 is pretty easy to break, and SHA-1 is being picked away at. What good is a database of hash validations if anyone can spoof matches in 10 years?

A signed certificate would be something ridiculous, like a 4096-byte key with both public and private signing keys. It would be used as a super-strength checksum, in other words. Maybe that can be broken in 100 years with quantum computers ... I don't care, though. I'll be dead with no offspring by then :P

Signed with a person's name, we create a chain of trust. The idea being to minimize the number of dumpers. If there's a dumper with only 3 dumps and he leaves, get someone with 200 to verify those 3.

Allow everyone to tell us if they find a false / corrupt / whatever entry, and we call all that person's work into question and resolve it.

Another issue with the database is ... how do you know you have the authentic DB? One can easily add an entry for their ROM hack to your DB, seed it with an emu, and claim it's a commercial title or something stupid.

Yes, this is all paranoia extreme stuff. It's more because we can than anything else. Get a surefire, failproof way to validate ROMs. Require that people get proper validation before accepting bug reports. The last thing I want to see is regular SNES carts end up like BS-X dumps are today.
Thristian
Hazed
Posts: 76
Joined: Tue Feb 07, 2006 11:02 am

Post by Thristian »

byuu wrote:Something like IBM dBASE III (old school, very simple) or MySQL (modern, powerful) would be fine.
I think you'll find dBase was an Ashton-Tate product, not IBM.

Just in case people haven't heard of it, SQLite is a very popular embedded database, under a public-domain licence. Granted, it's not quite as easy to edit as CSV, but there are handy GUI editors like this one. Both Firefox and Python come with SQLite these days.

As for dump certifications, I imagine you would create a little text-file saying something like "I dumped PAL revision 1.1 of game #2345 and found it to have sha-256 1234567890abcdef" (presumably with more... machine-readable formatting), and let people sign it with their GPG keys. Then you can dump the signatures into the database and keep count of who claims which hash represents which game.
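Roughly what the machine-readable version of that statement might look like, as a sketch only. The field names, the JSON layout and the dumper name are assumptions for illustration, not an agreed format. The output is deterministic (sorted keys, fixed separators) so that everyone signing the same claim signs identical bytes.

```python
import hashlib
import json


def dump_statement(rom_bytes, game_id, region, revision, dumper):
    """Build a canonical text statement describing one dump."""
    record = {
        "dumper": dumper,        # hypothetical field names throughout
        "game": game_id,
        "region": region,
        "revision": revision,
        "sha256": hashlib.sha256(rom_bytes).hexdigest(),
    }
    # Sorted keys and fixed separators keep the bytes identical across runs.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


# Placeholder ROM contents; a real dump would be read from a file.
statement = dump_statement(b"\x00" * 16, "2345", "PAL", "1.1", "example-dumper")
```

The resulting text would then be saved and signed outside the script, for example with `gpg --clearsign`, and the signature stored next to the database entry.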

Anyway, aren't we supposed to be arguing about how to represent the PCB layout associated with a PCB serial, rather than how to associate a PCB serial with a particular ROM?
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

Deathlike2 wrote: This is done by a community of users (probably dumpers or some people that have some credibility) that will maintain this database. It's not like random people are going to be picked for this.
I don't care who you pick for your database.
In my MS app example, it's not registered to MS.. it has security validation from a third party (I think Verisign).
Whatever, it doesn't matter if it's first or third party validation. It doesn't change the fact that the registrant for a program's validation is the publisher of that program, while ours is not. My logic is unbroken.
The idea here is that a community can have the power to add/remove/update these "certificates" verifying that you are using the original data.
No benefit here, golden sticker by the community for community submitted data. Pure circle-jerk, it accomplishes no greater guarantee over individually controlled databases.
but a generalized community effort not done by one person and some arbitrary system.
The number of users involved makes no difference, the source for the submission of data is the same. Nach submitting data to Group A is the same as Nach submitting data to Group B. Stranger123 submitting data to Group A is the same as Stranger123 submitting data to Group B. What determines a user's perception of a database submission's integrity is:

1. What trust the user places in the submitter and maintainer
2. Under what conditions that data is accepted by that database's maintainer

If Group B actually required Stranger123 and Nach to submit a scan of the cartridge and pcb while Group A didn't, that database would effectively trump Group A. The added effort and difficulty would dissuade casual fraud, only the most serious verifiers would bother doing it. NSRT, for example, does not do this. It doesn't require cartridge data at all, let alone a scan. Just report the checksums with an NSRT scan of a file you downloaded and he'll write you down as a verifier. A certificate checksum wouldn't change anything, you could do the same exact thing and they'd take it. You could claim duplicate verifications with very little effort under this system.

Furthermore, NSRT had fake entries to dissuade mass preservation and availability of data. It might have accepted everything on Overload's website under Overload's name, even though Overload did not document if the data had come from him or was submitted to him by someone else. Person A could have told Person B the information, then Person B submitted to Overload. Who knows, the verifications list was never made public.

Are we sort of understanding where I'm coming from yet, why I'm so adamant about making this simple and freedom oriented? I'm done with this nonsense, I'm not arguing with you or Nach or anyone else in a roundtable superdatabase. I already had these worthless debates in NSRT and I'm having them here. I've spent lots of time and money on this and I'm being told by people who haven't that all we need is a new hash function. A system wherein we all issue each other golden tickets to suppress the reality of mutual uncertainty. Unless we all can see each other when we're dumping, that's what we have.

If you really wanted to know what the community needs from bored programmers, we really need a better multi-system rom manager to promote data integrity. Clrmamepro and Romcenter are garbage, there is a far simpler way. It's in my mind, I just can't make it.
byuu wrote:A signed certificate would be something ridiculous, like a 4096-byte key with both public and private signing keys. It would be used as a super-strength checksum, in other words. Maybe that can be broken in 100 years with quantum computers ... I don't care, though. I'll be dead with no offspring by then :P


All you really need is two existing checksum methods. Fraud and collision is next to impossible with CRC+MD5 documentation. If by some miracle someone had the motivation and skill to brute force both, the file would be so butchered that it would never propagate in distribution channels. Because it wouldn't function in emulators.
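For reference, documenting both checksums for a dump is only a few lines. This is a sketch with made-up function names; it shows how a CRC+MD5 pair would be recorded and compared, and says nothing either way about how hard the pair is to forge.

```python
import hashlib
import zlib


def checksums(data):
    """Return the CRC32 and MD5 of a dump as lowercase hex strings."""
    return {
        "crc32": f"{zlib.crc32(data) & 0xFFFFFFFF:08x}",
        "md5": hashlib.md5(data).hexdigest(),
    }


def matches(data, expected):
    """True only if BOTH documented checksums agree with the data."""
    return checksums(data) == expected


# Placeholder bytes standing in for a ROM image.
documented = checksums(b"123456789")
```

A file that matches only one of the two values is rejected, which is the whole point of keeping both.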

byuu wrote:That seems somewhat dangerous (personal opinion, don't want to debate that)


I'm not working on the thing from the bsnes directory, I just want the format to be workable in my spreadsheet program. You could bundle a CSV database under a DB extension to dissuade people from opening it if you really wanted. As long as the internals are CSV, I don't really care.
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany

Post by creaothceann »

byuu wrote:
If I'm going to be allowed to construct my own database, I want to be able to hand modify it.
[...] If we use CSV, then we should use the first row for column names. Require hash, pcb and maybe title. Make the rest optional. Format would be UTF-8.
The optional info could be labelled, to make it not completely column-based, and flexible.

<hash>,<pcb>,title=<title>,genre=<genre>,publisher=<publisher>,BirthdayOfLeadDev1=...
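Parsing a line like that could look roughly like this. It's a sketch: `parse_entry` and the sample values are hypothetical, and it assumes that a value containing a comma is protected by quoting the whole `key=value` field, as standard CSV quoting would require.

```python
import csv


def parse_entry(line):
    """Split one entry into the two fixed fields plus labelled optional ones."""
    fields = next(csv.reader([line]))
    if len(fields) < 2:
        raise ValueError("entry needs at least <hash>,<pcb>")
    entry = {"hash": fields[0], "pcb": fields[1]}
    for field in fields[2:]:
        # Everything after the two positional fields must be label=value.
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"optional field is not labelled: {field!r}")
        entry[key.strip()] = value
    return entry


# Hypothetical entry; the quoted field carries a comma inside the title.
example = parse_entry('00ab12,EXAMPLE-PCB-01,"title=Example Game, The",genre=RPG')
```

Unknown labels just end up as extra keys, so a reader that only cares about hash and pcb can ignore the rest.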
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
byuu

Post by byuu »

As for dump certifications, I imagine you would create a little text-file saying something like "I dumped PAL revision 1.1 of game #2345 and found it to have sha-256 1234567890abcdef" (presumably with more... machine-readable formatting), and let people sign it with their GPG keys.
Ah, GPG may be a nice way of signing things. But what happens when someone breaks SHA-256 enough to reduce its complexity enough to find collisions?
Anyway, aren't we supposed to be arguing about how to represent the PCB layout associated with a PCB serial, rather than how to associate a PCB serial with a particular ROM?
I don't think we've ever been able to stay on topic more than two pages here :P
1. What trust the user places in the submitter and maintainer
All the more reason for everyone to send all of their SNES carts directly to me ^_^
(joking -- much as I'd like to, I don't have that kind of time.)
Furthermore, NSRT had fake entries to dissuade mass preservation and availability of data.
I don't think it was for dissuading preservation, more of thwarting the people who freak out because the UI says "missing 1 of 6000".

That's a tough spot to be in ... I'd guess that most people with complete ROM sets are more interested in having them all available for play, rather than preservation. But at the same time, they do technically serve as potential mirrors of the data.
I've spent lots of time and money on this and I'm being told by people who haven't that all we need is a new hash function.
Very true. You certainly trump my expertise in this field. I hope I'm not coming off offensive to you as I did with Nach and his header formulas.

Note that I'm only discussing the matter because it will directly affect me in how I have to implement it. As you said, you lack the means to make the tools. I can do that, but I need to understand the problem first.

Your posts have been very helpful there. I'd still dream of a single database, but it's more apparent how unrealistic that is now. I guess what I'm saying is ... thank you for crushing my hopes :P
All you really need is two existing checksum methods. Fraud and collision is next to impossible with CRC+MD5 documentation. If by some miracle someone had the motivation and skill to brute force both, the file would be so butchered that it would never propagate in distribution channels. Because it wouldn't function in emulators.
Nobody has managed an MD5+CRC collision yet? I'm sure it will happen in the future, though.

At the end of the day, a hash will always be a hash: if the hash is 512 bits, that means it can only 100% uniquely identify 512 bits of data with zero collisions.

You make a good point about functioning in emulators ... the longer the hash, the more that will need to be modified, hindering its ability to go unnoticed. But with a 4MB game, 512 bits isn't a whole lot. I know, I'm grossly over-simplifying the security model of hashes and the lengths they go to not make it so trivial.

Still ... take every possible combination of 65-byte files, and I guarantee you there's a collision with one of them when using SHA-512.
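
A scaled-down sketch of that pigeonhole argument, since enumerating every 65-byte file is out of reach. It assumes a stand-in 8-bit hash (the first byte of SHA-512) fed with every 2-byte input: there are 65536 inputs and only 256 possible outputs, so a collision has to turn up. The names `tiny_hash` and `find_collision` are made up for illustration and are not part of anything proposed in this thread.

```python
import hashlib
from itertools import product

def tiny_hash(data: bytes) -> int:
    """Stand-in 8-bit hash: the first byte of SHA-512 (illustrative assumption)."""
    return hashlib.sha512(data).digest()[0]

def find_collision(input_len: int = 2):
    """Walk every input of input_len bytes until two share a tiny_hash value."""
    seen = {}  # hash value -> first input that produced it
    for combo in product(range(256), repeat=input_len):
        data = bytes(combo)
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data
    return None  # only reachable if there are fewer inputs than hash values

a, b = find_collision()
print(a.hex(), b.hex(), tiny_hash(a) == tiny_hash(b))
```

The same counting applies to full SHA-512 and 65-byte (520-bit) files; what a good hash buys is that nobody can find such a pair in practice, not that the pair doesn't exist.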

We want to get the hash function right the first time, so we need to pick the best method that will stand up to time as possible. What if we went with a custom 4k-byte hashing function?

Then as a final fallback, seed the complete set to trusted peers who can make new hash sets off a "golden master" set of sorts.
Thristian
Hazed
Posts: 76
Joined: Tue Feb 07, 2006 11:02 am

Post by Thristian »

byuu wrote:
As for dump certifications, I imagine you would create a little text-file saying something like "I dumped PAL revision 1.1 of game #2345 and found it to have sha-256 1234567890abcdef" (presumably with more... machine-readable formatting), and let people sign it with their GPG keys.
Ah, GPG may be a nice way of signing things. But what happens when someone breaks SHA-256 enough to reduce its complexity enough to find collisions?
Then people can invent SNES ROMs with the same hash as an official ROM, but with customized content, such as a security exploit against an emulator or a ROM cataloging tool or whatever. Of course, unless your emulator happens to have the ability to refuse to play non-known-good ROMs, such attacks are already possible, so we don't really lose anything.

Note that I said 'sign' there, which in encryption terms means 'hash the data and encrypt the hash with the signer's private key'. There's a hash function being used anyway, and probably not one we can consciously control (without lobbying the IETF to amend the OpenPGP spec, etc.) so what harm will one more hash function do?
byuu wrote:
We want to get the hash function right the first time, so we need to pick the best method that will stand up to time as possible. What if we went with a custom 4k-byte hashing function?
I have a very limited education in secure software implementation, but "What if we went with a custom..." is one of those phrases that sets alarm bells ringing. I would expect that the probability of a collision in meaningful data would be higher for a custom-written hash than for even MD5. Security professionals take this hashing thing really seriously: NIST's SHA-3 competition stopped accepting new entrants in October 2008, and they're going to spend four years evaluating them before they decide on a winner.

Assuming every hash function will be broken eventually (an assumption that's certainly held so far), each game should be associated with a list of (hash algorithm, hash value) tuples, so that as old hash algorithms get broken we can add the hashes from newer, stronger algorithms. And then one day every original SNES cart will be undumpable dust, but hopefully it will be safe to assume that any file that matches all the previous hashes is the original content, and we can generate the new hash value from the existing dump rather than having to re-dump.
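
A minimal sketch of that idea, assuming a record is just a Python list of (algorithm name, hex digest) pairs and using only the standard `hashlib` module. The function names and the choice of MD5, SHA-256 and SHA-512 are illustrative assumptions, not a proposed database format.

```python
import hashlib

def compute_hashes(data: bytes, algorithms=("md5", "sha256")):
    """Build a list of (algorithm, hex digest) tuples for one dump."""
    return [(name, hashlib.new(name, data).hexdigest()) for name in algorithms]

def matches_all(data: bytes, known) -> bool:
    """True only if the data agrees with every recorded hash."""
    return all(hashlib.new(name, data).hexdigest() == value
               for name, value in known)

def extend_record(data: bytes, known, new_algorithm: str):
    """Add a newer hash to the record, but only for a dump that still
    matches every older hash (so no re-dump is needed)."""
    if not matches_all(data, known):
        raise ValueError("dump does not match the existing record")
    digest = hashlib.new(new_algorithm, data).hexdigest()
    return list(known) + [(new_algorithm, digest)]

dump = bytes(1024)                       # placeholder data, not a real ROM
record = compute_hashes(dump)
record = extend_record(dump, record, "sha512")
for name, value in record:
    print(name, value)
```

A forged file would then have to collide under every algorithm in the list at once, which is the property the CRC+MD5 argument earlier in the thread relies on.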
h4tred

Post by h4tred »

All the more reason for everyone to send all of their SNES carts directly to me ^_^
(joking -- much as I'd like to, I don't have that kind of time.)
Which makes me wonder why you guys are thinking of this PCB idea. I haven't been following this much but I would like to know the gist, since it seems like an important thing.

As for this hashing thing:

* MD5 is busted
* SHA1 and SHA-512 are busted
* RIPEMD160, HAVAL3-256 and Tiger are busted
* Stuff based on elliptic curve cryptography is busted
* RSA-1024 is busted


So, you are running out of options.....
byuu

Post by byuu »

New private WIP. Nothing worth downloading it over, really.
- fixed first scanline DRAM refresh event (passes irq.smc and nmi.smc again)
- fixed PPUcounter to initialize before CPU; not that it affected anything as-is, but it's nice for future proofing to do it right
- optimized priority queue thing to move instead of swap; didn't affect overall emu speed sadly (still infinitesimally faster than the last official release), but I still like the model for timing events that will occur no matter what
- made the ALU delays more permanent advanced config options; 32 and 48 were still screwing with taz-mania ... not even a whole opcode on the mul -- that game literally reads the regs immediately. We can't get things any better than we already have until we emulate the formula; so I set them both to 2 clock cycles for now, they're at least there for hobbyist devs, who can set them fairly high to guarantee their code would work on hardware
- removed a bit of cruft
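
For anyone curious what "move instead of swap" can look like, here is a generic binary min-heap sketch. It is an assumption about the general technique only, not the actual bsnes code: the swap version exchanges two elements at every level, while the move version slides parents down into a hole and writes the new item once at the end.

```python
def push_with_swaps(heap, item):
    """Classic sift-up: swap the new item with its parent until ordered."""
    heap.append(item)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent

def push_with_moves(heap, item):
    """Same result, but parents are moved down into a hole and the
    new item is stored exactly once."""
    heap.append(item)            # reserve the slot; overwritten below if needed
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= item:
            break
        heap[i] = heap[parent]   # one write per level instead of a swap
        i = parent
    heap[i] = item

events = []
for clock in (48, 32, 2, 1364, 8):   # hypothetical event times in clock cycles
    push_with_moves(events, clock)
print(events[0])                      # the earliest event sits at the front
```

Both functions build the same array, so the move version only saves writes, which fits with it having next to no effect on overall speed.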
* RSA-1024 is busted
Really? What are its factors, then? Please tell me in private so I can claim the $100,000 bounty when it's offered again :D
(they've only broken a 200-decimal digit one with the equivalent of 75 PC work-years, RSA-1024 has 309 and the problem is exponential, not linear.)
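
The digit counts above are easy to sanity-check with Python's arbitrary-precision integers. This is only arithmetic on number sizes, with helper names invented for the example; it says nothing about how hard the factoring itself is.

```python
def max_decimal_digits(bits: int) -> int:
    """Decimal digits in the largest integer that fits in `bits` bits."""
    return len(str(2**bits - 1))

def max_bits_for_digits(digits: int) -> int:
    """Bits needed for the largest integer with `digits` decimal digits."""
    return (10**digits - 1).bit_length()

print(max_decimal_digits(1024))   # size of a 1024-bit modulus in decimal digits
print(max_bits_for_digits(200))   # upper bound on the bit size of a 200-digit number
```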
DataPath
Lurker
Posts: 128
Joined: Wed Jul 28, 2004 1:35 am
Contact:

Post by DataPath »

byuu wrote:
* RSA-1024 is busted
Really? What are its factors, then? Please tell me in private so I can claim the $100,000 bounty when it's offered again :D
(they've only broken a 200-decimal digit one with the equivalent of 75 PC work-years, RSA-1024 has 309 and the problem is exponential, not linear.)
Apparently in his lexicon, finding some shortcuts, not even order-of-magnitude shortcuts, equals "busted".