In computing, a hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
In general, hash table components: The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found:
The end result may look something like the following:
f(key, array_size){ int hash = hash(key); int index = hash % array_size; return index; } hash(key){ // Code for whatever transformation you want // return that transformation }
Download PDF
Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. For example, if 2,450 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the birthday problem there is approximately a 95% chance of at least two of the keys being hashed to the same slot. Therefore, almost all hash table implementations have some collision resolution strategy to handle such events. Some common strategies are described below. All these methods require that the keys (or pointers to them) be stored in the table, together with the associated values.
In the method known as separate chaining, each bucket is independent, and has some sort of list of entries with the same index. The time for hash table operations is the time to find the bucket (which is constant) plus the time for the list operation. In a good hash table, each bucket has zero or one entries, and sometimes two or three, but rarely more than that. Therefore, structures that are efficient in time and space for these cases are preferred. Structures that are efficient for a fairly large number of entries per bucket are not needed or desirable. If these cases happen often, the hashing is not working well, and this needs to be fixed.
In another strategy, called open addressing, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.[13] The name "open addressing" refers to the fact that the location ("address") of the item is not determined by its hash value. (This method is also called closed hashing; it should not be confused with "open hashing" or "closed addressing" that usually mean separate chaining.)
One interesting variation on double-hashing collision resolution is Robin Hood hashing.[15][16] The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in the table. This is similar to ordered hash tables[17] except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes is reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions.[18] External Robin Hashing is an extension of this algorithm where the table is stored in an external file and each table position corresponds to a fixed-sized page or bucket with B records.
Hash tables are the fastest data storage structure. This makes them a necessity for
situations in which a computer program, rather than a human, is interacting with
the data. Hash tables are typically used in spelling checkers and as symbol tables in
computer language compilers, where a program must check thousands of words or
symbols in a fraction of a second.
Download PDF
/** This programs uses the hash table to change and store phone numbers from names into an array. NOTE: If you try adding more than 1 contact this could cause collisions. For example if we had a contact named Nick and another named Norbert, using the below code, the key for both will be the first character 'N' which will map to the same number and thus inserting the second person into the phonebook/array will overwrite the first. Written in the C programming language By: (randerson112358) */ # include <stdio.h> int main(void) { int phoneNumber= 5554447777;// This is the 7 digit phone number of the contact int index;// This is the index within the phonebook/ hashTable char[31] name = "Nick";// This is the contacts name int table_size = 10; int hashTable[table_size]; // This is the phone book index = f(name[0], table_size); hashTable[index] = phoneNumber; system("pause"); //Comment this out if you are not using windows operating system } //returns the character key as an integer int hash( char key){ return (int)key; } int f(char key, int hashTable_size){ int hash = hash(key); int index = hash %hashTable_size; return index; }