Programming: Searching a large array
Moderator: General Mods
Programming: Searching a large array
I have a completely random question regarding the implementation of a data structure. I have a list of 5000 names linked to IDs. Because the IDs are sequential I realized that I can use the IDs as an index for an array.
As an example "Random_Name" has an ID of 3243 so writing myArray[3243] will return "Random_Name." That leads to a nice O(1) access time for converting ID to name, but what if I have a name and I want to find an ID?
The only way to do a string search that I know of requires me to have the listed sorted alphabetically. But if I do that then I lose the ability to search by ID. So what's my best option? Maintain two separate copies of the data that are sorted differently?
As an example "Random_Name" has an ID of 3243 so writing myArray[3243] will return "Random_Name." That leads to a nice O(1) access time for converting ID to name, but what if I have a name and I want to find an ID?
The only way to do a string search that I know of requires me to have the listed sorted alphabetically. But if I do that then I lose the ability to search by ID. So what's my best option? Maintain two separate copies of the data that are sorted differently?
-
- ZSNES Shake Shake Prinny
- Posts: 5632
- Joined: Wed Jul 28, 2004 4:15 pm
- Location: PAL50, dood !
Someone would say hashing is where you want to go.
With an algo that converts a string of any length to an int, you have both the index and the string at your fingertips.
This is assuming you can have a decent amount of strings without encountering a collision...
With an algo that converts a string of any length to an int, you have both the index and the string at your fingertips.
This is assuming you can have a decent amount of strings without encountering a collision...
皆黙って俺について来い!!
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Code: Select all
<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
-
- ZSNES Developer
- Posts: 3904
- Joined: Tue Jul 27, 2004 10:54 pm
- Location: Solar powered park bench
- Contact:
If both sets are short, and you don't modify them, two separate sets sorted each way is best.
Otherwise, you want something with good time for addition and searching. Either a self balancing tree or a hash table.
Otherwise, you want something with good time for addition and searching. Either a self balancing tree or a hash table.
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
_____________
Insane Coding
Red-black tree if you go for an SBT:
http://en.wikipedia.org/wiki/Red-black_tree
If you're using C++, std::map<> is usually implemented this way.
http://en.wikipedia.org/wiki/Red-black_tree
If you're using C++, std::map<> is usually implemented this way.
Hmm, I might as well explain my full problem.
I'm playing with a database export from the MMO Eve Online. I'm writing a tool in C++/C# that connects to their API service to get information about my property in the game. The API returns all values as itemIDs which I can find in the database export. The specific task I'm trying to do is translate the itemID that corresponds to "Solar System" into a string.
There are five thousand some-odd solar systems in the game with non-repeating names. This information is relatively static, so I won't have to worry about adding or removing values.
As an example, the solar system "Jita" has an ID of 0142, the solar system "XX9-WV" has an ID of 1021.
If the API tells me that I have a control tower low on fuel in the solar system 0142, I want to be able to convert it into the string "Jita"
On the other hand, I might also want to search for all items that I own in the solar system "XX9-WV" which means I would need to convert it to the integer 1021 to query the API.
I'm not really sure, is having several thousand strings loaded into memory "trivial?" Since I don't modify the data having two lists might make more sense.
I'm playing with a database export from the MMO Eve Online. I'm writing a tool in C++/C# that connects to their API service to get information about my property in the game. The API returns all values as itemIDs which I can find in the database export. The specific task I'm trying to do is translate the itemID that corresponds to "Solar System" into a string.
There are five thousand some-odd solar systems in the game with non-repeating names. This information is relatively static, so I won't have to worry about adding or removing values.
As an example, the solar system "Jita" has an ID of 0142, the solar system "XX9-WV" has an ID of 1021.
If the API tells me that I have a control tower low on fuel in the solar system 0142, I want to be able to convert it into the string "Jita"
On the other hand, I might also want to search for all items that I own in the solar system "XX9-WV" which means I would need to convert it to the integer 1021 to query the API.
I'm not really sure, is having several thousand strings loaded into memory "trivial?" Since I don't modify the data having two lists might make more sense.
-
- ZSNES Developer
- Posts: 3904
- Joined: Tue Jul 27, 2004 10:54 pm
- Location: Solar powered park bench
- Contact:
Yes.Palin wrote: I'm not really sure, is having several thousand strings loaded into memory "trivial?"
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
_____________
Insane Coding
I can't access the EVE database directly, but the devs provide an exported copy every time they make a major revision to the game. I've uploaded individual tables to use in web applications before, but I'm not quite sure how I would interface with the database files using a C++/C# program.
One reason why extracting the data into a binary file seems like a better idea is because the mapSolarSystems table contains lots of extra information:
One reason why extracting the data into a binary file seems like a better idea is because the mapSolarSystems table contains lots of extra information:
All I really need is solarSystemName and solarSystemID, and maybe regionID.RegionID
constellationID
solarSystemID
solarSystemName
xyz
xMin
xMax
yMin
yMax
zMin
zMax
luminosity
border
fringe
corridor
hub
international
regional
constellation
security
factionID
radius
sunTypeID
securityClass
allianceID
-
- Seen it all
- Posts: 2302
- Joined: Mon Jan 03, 2005 5:04 pm
- Location: Germany
- Contact:
Estimate. If each string has a maximum size of 100 (single-byte) characters: 100 * 5,000 = 500,000 bytes.Palin wrote:I'm not really sure, is having several thousand strings loaded into memory "trivial?"
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
bsnes launcher with recent files list
-
- ZSNES Shake Shake Prinny
- Posts: 5632
- Joined: Wed Jul 28, 2004 4:15 pm
- Location: PAL50, dood !
Not even 500k. I've seen fatter top models.
皆黙って俺について来い!!
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Code: Select all
<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
-
- ZSNES Shake Shake Prinny
- Posts: 5632
- Joined: Wed Jul 28, 2004 4:15 pm
- Location: PAL50, dood !
Save the whales, free the mallocs.
皆黙って俺について来い!!
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Code: Select all
<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)