21.1. Hash Tables¶

21.1.1. Hashing Introduction¶

Hashing is a method for storing and retrieving records from a database. It lets you insert, delete, and search for records based on a search key value. When properly implemented, these operations can be performed in constant time \(O(1)\). In fact, a properly tuned hash system typically looks at only one or two records for each search, insert, or delete operation. This is far better than the \(O(\log n)\) average cost required to do a binary search on a sorted array of \(n\) records, or the \(O(\log n)\) average cost required to do an operation on a binary search tree. However, even though hashing is based on a simple idea, it is surprisingly difficult to implement properly. Designers need to pay careful attention to all of the details involved with implementing a hash system.

A hash system stores records in an array called a hash table.

Hashing works by performing a computation on a search key key in a way that is intended to identify the position in the hash table that contains the record with a given key key. The function that does this calculation is called the hash function, which is denoted by hashCode(). Since hashing schemes place records in the table in whatever order satisfies the needs of the address calculation, records are not ordered by value. A position in the hash table is also known as a slot. The number of slots in a hash table will be denoted by the capacity with slots numbered from 0 to capacity-1.

The goal for a hashing system is to arrange things such that, for any key key and some hash function hashCode(), we map the key to a slot i in the table such that 0 <= i < capacity, and we have the key of the record stored at position i in the table equal to key.

Hashing is not good for applications where multiple records with the same key value are permitted. Hashing is not a good method for answering range searches. In other words, we cannot easily find all records (if any) whose key values fall within a certain range. Nor can we easily find the record with the minimum or maximum key value, or visit the records in key order. Hashing is most appropriate for answering the question, ‘What record, if any, has the given key?’ For applications where all search is done by exact-match queries, hashing is the search method of choice because it is extremely efficient when implemented correctly.

As a simple illustration of hashing, we can actually think of the indices of an array as a hash code. If the key is an integer, then the hash function is just the key itself:

int hashCode(int key) {
    return key;
}

To find the record with key value key, we look in array[key].

Settings

Saving...

Server Error
Resubmit

In general, the hashCode() function can produce any integer value from the key input, so we’ll need to additionally ensure that the resulting index is within the bounds of the array. We accomplish this by taking the modulus of the hashCode() with the capacity of the array. Assuming that the capacity of the array is CAPACITY, we have the follwing method getIndex() that computes the index for any key:

    /**
     * Find the index into the array for the given key
     * @param key the key being looked up
     * @return the position in the table for the key
     */
    private int getIndex(K key) {
        // Get the hash code for the key.
        // java has a built-in hashcode method for all objects
        int hashCode = key.hashCode(); 

        // Mod the hashCode by the capacity of the array to get a valid index
        int index = Math.abs(hashCode) % CAPACITY;

        return index;
    }

Note

The use of the modulo % operator ensures that the resulting index is within the bounds of the array, just like how we did for our circular array implementation of queues.

In practice, we may have more sophisticated hash functions that map keys to indices, as the key can be of any type. We ideally want a hash function that distributes keys uniformly across the array, so that we avoid two keys mapping to the same slot (called a collision). Conveniently, Java provides a reasonable built-in hashCode() method for all objects, so we can use that as our hash function. Just like how every object in Java has a toString() method, every object also has a hashCode() method!

21.1.2. Hash Table Implementation¶

Hash tables are a good data structure choice for implementing the Map interface we saw before. To begin our implementation, we first need a way to represent key-value pairs, which we do through the KeyValuePair class:

/**
 * A key-value pair to be stored in a hash table entry.
 * K is the type of the key.  V is the type of the value.
 */
public class KeyValuePair<K, V> {
    // The key
    private K key;

    // The value
    private V value;

    public KeyValuePair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public V getValue() {
        return value;
    }

    // similar to a setter, but we're returning the old value
    public V putValue(V newValue) {
        // Remember the value so it can be returned.
        V oldValue = this.value;

        // Update the value
        this.value = newValue;

        // Return its original value
        return oldValue;
    }

    public K getKey() {
        return key;
    }
}

Note

Notice that the putValue() method returns the previous value associated with the key, or null if the key was not in the map. This matches the behavior of the put() method in the MHCMap interface!

Next, let’s take a look at our MHCHashMap instance variables:

public class MHCHashMap<K, V> implements MHCMap<K, V> {
    // Capacity of the table.
    private static final int CAPACITY = 100;

    // Holds the key-value pairs that have been inserted
    private KeyValuePair<K,V>[] entries;   

The entries array holds KeyValuePair objects, and the capacity of the hash table is given by CAPACITY.

21.1.3. MHCHashMap get()¶

Let’s now take a look at the get() method:

    /** 
     * Get the value associated with a key
     * @param key the key to look up
     * @return the value associated with the key or null if the key is not
     * in the map.
     */
    public V get(K key) {
        // get the index for the key
        int index = getIndex(key);

        // check whether the entry at that index exists
        if(entries[index] == null) { // if an array pos is empty, it will be null
            return null;
        }

        // otherwise, return the value in the entry
        return entries[index].getValue();
    }

We first compute the index for the key using the getIndex() method we saw above, and then we check if the entry at that index is null. If it is, then the key is not in the map, so we return null.

Note

Remember that array slots are initialized to null to indicate an empty slot in the array!

Otherwise, we return the value associated with the key by calling getValue() on the KeyValuePair object at the computed index.

21.1.4. MHCHashMap put()¶

Next, let’s look at the put() method:

    /** 
     * Associates a value with a key.
     * @param key the key whose value is set
     * @param value the value to be associated with the key
     * @return if the key previously had a value, return the previous value.  If the key
     * was not in the table, return null.
     */
    public V put(K key, V value) {        
        // get the index for the key
        int index = getIndex(key);

        // No entry for the key currently
        if(entries[index] == null) { // if an array pos is empty, it will be null
            entries[index] = new KeyValuePair<>(key, value);
            return null;
        }
        // Entry already exists for the key
        else {
            // 1. updates the value at the index
            // 2. returns the old value stored at that index
            return entries[index].putValue(value);
        }
    }

Like with get(), we first compute the index for the key using getIndex(), and then we check if the entry at that index is null. If it is, then the key is not in the map, so we create a new KeyValuePair with the given key and value, and place it in the entries array at the computed index.

Otherwise, we update the value associated with the key by calling putValue() on the KeyValuePair object at the computed index.

21.1.5. MHCHashMap Complete Reference¶

We focus on the get() and put() methods for our hash table implementation in our discussions, but the complete implementation can be found in the code block below:

/**
 * A hash table implementation of a Map.  
 * K is the type of the keys used
 * V is the type of the values associated with the keys.
 */
public class MHCHashMap<K, V> implements MHCMap<K, V> {
    // Capacity of the table.
    private static final int CAPACITY = 100;

    // Holds the key-value pairs that have been inserted
    private KeyValuePair<K,V>[] entries;   
    @SuppressWarnings("unchecked")
    public MHCHashMap() {
        entries = new KeyValuePair[CAPACITY];
    }

    /** 
     * Get the value associated with a key
     * @param key the key to look up
     * @return the value associated with the key or null if the key is not
     * in the map.
     */
    public V get(K key) {
        // get the index for the key
        int index = getIndex(key);

        // check whether the entry at that index exists
        if(entries[index] == null) { // if an array pos is empty, it will be null
            return null;
        }

        // otherwise, return the value in the entry
        return entries[index].getValue();
    }

    /**
     * Find the index into the array for the given key
     * @param key the key being looked up
     * @return the position in the table for the key
     */
    private int getIndex(K key) {
        // Get the hash code for the key.
        // java has a built-in hashcode method for all objects
        int hashCode = key.hashCode(); 

        // Mod the hashCode by the capacity of the array to get a valid index
        int index = Math.abs(hashCode) % CAPACITY;

        return index;
    }

    /** 
     * Associates a value with a key.
     * @param key the key whose value is set
     * @param value the value to be associated with the key
     * @return if the key previously had a value, return the previous value.  If the key
     * was not in the table, return null.
     */
    public V put(K key, V value) {        
        // get the index for the key
        int index = getIndex(key);

        // No entry for the key currently
        if(entries[index] == null) { // if an array pos is empty, it will be null
            entries[index] = new KeyValuePair<>(key, value);
            return null;
        }
        // Entry already exists for the key
        else {
            // 1. updates the value at the index
            // 2. returns the old value stored at that index
            return entries[index].putValue(value);
        }
    }

    /** 
     * Returns true if the map has an entry for the key
     * @param key the key to look up
     * @return true if the key is in the table.  Otherwise return false.
     */
    @Override
    public boolean containsKey(K key) {
        int index = getIndex(key);
        return entries[index] != null;
   }

    /** 
     * Removes the entry for the key.
     * @param key the key for the entry to remove
     * @return If the key was in the table, the value previously associated with 
     *  the key is returned.  If the key was not in the table, null is returned.
     */
    public V remove(K key) {
        int index = getIndex(key);

        // No entry for the key; just return
        if (entries[index] == null) {
            return null;
        }

        // Entry exists.  Save the value currently stored 
        // so it can be returned
        V oldValue = entries[index].getValue();

        // Remove the entry
        entries[index] = null;

        // Return the original value
        return oldValue;
    }
}

MHC S25 COMSC 205: Data Structures

Chapter 21 Hash Tables I

21.1. Hash Tables¶

21.1.1. Hashing Introduction¶

21.1.2. Hash Table Implementation¶

21.1.3. MHCHashMap get()¶

21.1.4. MHCHashMap put()¶

21.1.5. MHCHashMap Complete Reference¶