Kyo Suayan | suayan.com | Implementing Hashes

Implementing Hashes

Sunday, February 19th 2023

function simpleHash(str, max) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash += str.charCodeAt(i);
  }
  return hash % max;
}

This function takes two arguments: str, the string to be hashed, and max, the maximum number of slots in the hash table. The function uses a simple algorithm to compute the hash value of the string by summing up the ASCII values of its characters. It then returns the hash value modulo max, which maps the hash value to a slot in the hash table.

Here's an example of how to use this function to create a hash table:

const hashTable = new Array(10);
for (let i = 0; i < hashTable.length; i++) {
  hashTable[i] = [];
}

const keys = ['apple', 'banana', 'cherry', 'date', 'elderberry', 'fig', 'grape', 'honeydew', 'kiwi', 'lemon'];

for (let key of keys) {
  const index = simpleHash(key, hashTable.length);
  hashTable[index].push({ key, value: key.toUpperCase() });
}

console.log(hashTable);

In this example, we create an array hashTable with 10 slots, and initialize each slot to an empty array. We then create an array of keys to be inserted into the hash table. For each key, we compute its hash value using the simpleHash function, and use the hash value to index into the hashTable array. We then push an object containing the key and its uppercase value into the corresponding slot in the hash table.

This implementation is just a simple example, and in practice, more advanced hash functions and collision resolution techniques are often used to improve the performance and security of hash tables.

A more secure implementation that uses a collision resolution technique

class HashTable {
  constructor(size = 100) {
    this.size = size;
    this.buckets = new Array(size);

    for (let i = 0; i < this.buckets.length; i++) {
      this.buckets[i] = new LinkedList();
    }
  }

  hash(key) {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash += key.charCodeAt(i);
    }
    return hash % this.size;
  }

  set(key, value) {
    const index = this.hash(key);
    const bucket = this.buckets[index];

    for (let node of bucket) {
      if (node.key === key) {
        node.value = value;
        return;
      }
    }

    bucket.append({ key, value });
  }

  get(key) {
    const index = this.hash(key);
    const bucket = this.buckets[index];

    for (let node of bucket) {
      if (node.key === key) {
        return node.value;
      }
    }

    return undefined;
  }

  delete(key) {
    const index = this.hash(key);
    const bucket = this.buckets[index];

    for (let node of bucket) {
      if (node.key === key) {
        bucket.delete(node);
        return true;
      }
    }

    return false;
  }
}

In this implementation, we use a class HashTable to represent a hash table, which takes an optional size argument to set the number of slots in the table. We use a linked list implementation to handle collision resolution, where each slot in the hash table is initialized with an empty linked list.

The hash method computes the hash value of a key using a simple algorithm that sums up the ASCII values of its characters, and returns the hash value modulo the size of the hash table.

The set method takes a key and a value, and stores them in the hash table. If the key already exists in the table, it updates its value. Otherwise, it appends a new node containing the key and value to the linked list at the corresponding slot.

The get method takes a key, looks up its corresponding slot in the hash table, and searches the linked list for the node with the matching key. If it finds a node with the key, it returns its value. Otherwise, it returns undefined.

The delete method takes a key, looks up its corresponding slot in the hash table, and searches the linked list for the node with the matching key. If it finds a node with the key, it deletes it from the linked list and returns true. Otherwise, it returns false.

This implementation handles collision resolution using a linked list, which allows for efficient insertion and deletion of nodes, and ensures that no data is lost due to collisions. However, other collision resolution techniques, such as open addressing and probing, can also be used depending on the specific requirements of the application.

Open Addressing

Open addressing is another technique for resolving collisions in hash tables. Unlike chaining, which uses a linked list to store multiple values that hash to the same index, open addressing stores all the values in the hash table itself, usually in consecutive slots.

The idea behind open addressing is that when a collision occurs, instead of storing the value in the first slot at the hash index, we keep searching for the next available slot until we find an empty slot or a slot that already contains the same key.

There are several ways to find the next available slot in the hash table when a collision occurs, such as linear probing, quadratic probing, and double hashing. Linear probing is the simplest and most common approach, where we search for the next empty slot by incrementing the hash index by a fixed interval until an empty slot is found.

Here's an example of how open addressing using linear probing can be implemented in JavaScript:

class HashTable {
  constructor(size = 100) {
    this.size = size;
    this.keys = new Array(size);
    this.values = new Array(size).fill(null);
  }

  hash(key) {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash += key.charCodeAt(i);
    }
    return hash % this.size;
  }

  set(key, value) {
    let index = this.hash(key);

    while (this.keys[index] !== undefined) {
      if (this.keys[index] === key) {
        this.values[index] = value;
        return;
      }

      index++;
      if (index === this.size) {
        index = 0;
      }
    }

    this.keys[index] = key;
    this.values[index] = value;
  }

  get(key) {
    let index = this.hash(key);

    while (this.keys[index] !== undefined) {
      if (this.keys[index] === key) {
        return this.values[index];
      }

      index++;
      if (index === this.size) {
        index = 0;
      }
    }

    return undefined;
  }

  delete(key) {
    let index = this.hash(key);

    while (this.keys[index] !== undefined) {
      if (this.keys[index] === key) {
        this.keys[index] = undefined;
        this.values[index] = null;
        return true;
      }

      index++;
      if (index === this.size) {
        index = 0;
      }
    }

    return false;
  }
}

In this implementation, we use two arrays, keys and values, to store the keys and values in the hash table, respectively. When a key is hashed to an index that is already occupied, we keep searching for the next empty slot by incrementing the index by 1 until we find an empty slot. If we reach the end of the table, we wrap around to the beginning.

The set, get, and delete methods use the same search algorithm to find the corresponding key in the hash table. When a key is found, the associated value is updated, returned, or deleted, respectively. If the search reaches an undefined slot, the key is not in the hash table, and the methods return undefined or false.

One important consideration when using open addressing is that the hash table needs to have sufficient empty slots to handle the worst-case scenario, where every key hashes to the same index. This can be achieved by setting the load factor, which is the ratio of the number of items in the hash table to the size of the table, to a suitable value, typically less than 0.7.

If the load factor exceeds the threshold, the hash table needs to be resized and rehashed to a larger size to reduce the probability of collisions. Resizing the hash table can be an expensive operation, so it's important to choose a suitable initial size and load factor to balance the performance and memory usage of the hash table. Overall, open addressing is a useful technique for implementing hash tables with a small memory footprint and good cache locality, especially when the number of items in the table is relatively small and the hash function is well-distributed. However, it requires careful tuning and may not be suitable for all use cases.