Programming: Searching a large array

Place to talk about all that new hardware and decaying software you have.

Moderator: General Mods

Post Reply
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Programming: Searching a large array

Post by Palin »

I have a completely random question regarding the implementation of a data structure. I have a list of 5000 names linked to IDs. Because the IDs are sequential I realized that I can use the IDs as an index for an array.

As an example "Random_Name" has an ID of 3243 so writing myArray[3243] will return "Random_Name." That leads to a nice O(1) access time for converting ID to name, but what if I have a name and I want to find an ID?

The only way to do a string search that I know of requires me to have the listed sorted alphabetically. But if I do that then I lose the ability to search by ID. So what's my best option? Maintain two separate copies of the data that are sorted differently?
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Someone would say hashing is where you want to go.

With an algo that converts a string of any length to an int, you have both the index and the string at your fingertips.

This is assuming you can have a decent amount of strings without encountering a collision...
皆黙って俺について来い!!

Code: Select all

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

No collisions, I'll go look up hashing.

Much appreciated.
Nach
ZSNES Developer
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench
Contact:

Post by Nach »

If both sets are short, and you don't modify them, two separate sets sorted each way is best.

Otherwise, you want something with good time for addition and searching. Either a self balancing tree or a hash table.
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
byuu

Post by byuu »

Red-black tree if you go for an SBT:
http://en.wikipedia.org/wiki/Red-black_tree

If you're using C++, std::map<> is usually implemented this way.
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

Hmm, I might as well explain my full problem.

I'm playing with a database export from the MMO Eve Online. I'm writing a tool in C++/C# that connects to their API service to get information about my property in the game. The API returns all values as itemIDs which I can find in the database export. The specific task I'm trying to do is translate the itemID that corresponds to "Solar System" into a string.

There are five thousand some-odd solar systems in the game with non-repeating names. This information is relatively static, so I won't have to worry about adding or removing values.

As an example, the solar system "Jita" has an ID of 0142, the solar system "XX9-WV" has an ID of 1021.

If the API tells me that I have a control tower low on fuel in the solar system 0142, I want to be able to convert it into the string "Jita"

On the other hand, I might also want to search for all items that I own in the solar system "XX9-WV" which means I would need to convert it to the integer 1021 to query the API.

I'm not really sure, is having several thousand strings loaded into memory "trivial?" Since I don't modify the data having two lists might make more sense.
Nach
ZSNES Developer
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench
Contact:

Post by Nach »

Palin wrote: I'm not really sure, is having several thousand strings loaded into memory "trivial?"
Yes.
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

I guess that answers my question then. Two separate lists.

I should probably learn how to generate hash tables regardless, looks useful.
funkyass
"God"
Posts: 1128
Joined: Tue Jul 27, 2004 11:24 pm

Post by funkyass »

if you are getting this from a database, whats preventing from letting the database to the searching for you?

Or if that isn't feasible, put the DB export into a database...
Does [Kevin] Smith masturbate with steel wool too?

- Yes, but don’t change the subject.
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

I can't access the EVE database directly, but the devs provide an exported copy every time they make a major revision to the game. I've uploaded individual tables to use in web applications before, but I'm not quite sure how I would interface with the database files using a C++/C# program.

One reason why extracting the data into a binary file seems like a better idea is because the mapSolarSystems table contains lots of extra information:
RegionID
constellationID
solarSystemID
solarSystemName
xyz
xMin
xMax
yMin
yMax
zMin
zMax
luminosity
border
fringe
corridor
hub
international
regional
constellation
security
factionID
radius
sunTypeID
securityClass
allianceID
All I really need is solarSystemName and solarSystemID, and maybe regionID.
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany
Contact:

Post by creaothceann »

Palin wrote:I'm not really sure, is having several thousand strings loaded into memory "trivial?"
Estimate. If each string has a maximum size of 100 (single-byte) characters: 100 * 5,000 = 500,000 bytes.
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Not even 500k. I've seen fatter top models.
皆黙って俺について来い!!

Code: Select all

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

Probably closer to ten characters at most. I guess I could have figured that out on my own. All of my previous projects have just been simple take input, create output console applications. Dealing with loading resources into memory kinda intimidates me.
funkyass
"God"
Posts: 1128
Joined: Tue Jul 27, 2004 11:24 pm

Post by funkyass »

if its only three columns, then sorted arrays would be the most efficient method.
Does [Kevin] Smith masturbate with steel wool too?

- Yes, but don’t change the subject.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Save the whales, free the mallocs.
皆黙って俺について来い!!

Code: Select all

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Post Reply