A Class for Efficient Key Lookup in C++ - Symbol(C++)

A perfect hash

Consider the comparison operator for Hstring:

       int operator==(const Hstring& s, const Hstring& t) {
           if (s.hashval() != t.hashval())
               return 0;
           return s.the_string() == t.the_string();
       }

It first checks the hashes; if they are different, the strings must be different. Otherwise it compares the underlying strings using the String comparison operator. (The latter typically compares the cached lengths of the Strings, and if they are equal, does a memcmp(); assume the function the_string() returns the underlying String.) If the hash function is reasonably good, then most of the time different symbols will have different hashes, and comparison between different symbols will be fast. However, two Hstrings representing the same symbol will always have the same hash. Thus, comparison between two Hstrings representing the same symbol will be slower, requiring a call to the String comparison operator. If we spend a lot of time comparing symbols that turn out to be the same (as compilers typically do), we will spend a lot of time doing symbol comparison. This problem can be solved by using a perfect hash. A perfect hash for a string is a hash that is always different for different strings. If we had a perfect hash, we could implement the comparison operator as a simple comparison between hash values (Phstring is short for ``perfect hashed string''):

       int operator==(const Phstring& s, const Phstring& t) {
           return s.hashval() == t.hashval();
       }

Assume for the moment the existence of a function, perfect_hash(), which returns a perfect hash for a given string. Here's is what Phstring would look like:

       class Phstring {
       private:
           String s;
           unsigned long hash;
       public:
           Phstring(const char* s_) : s(s_) {
               hash = perfect_hash(s);
           }
           unsigned long hashval() const {
               return hash;
           }
           // ...
       };