Sets - Sets(C++)

Symbol(C++) as an application of sets

Now that you not only know something about using unordered collections, but also something about how they work, let's examine how Symbol(C++) achieves the efficiency of its relational operators by using Sets as their underlying representation. We'll look at only those parts of the Symbol(C++) interface whose implementation is interesting from a Set perspective:

       Symbol.h        class Symbol{
               inline Symbol(const String& s);
               inline int operator==(const Symbol& b) const;
               inline int operator<(const Symbol& b) const;

As stated in the Symbol(C++) manpage, Symbols are ``unique identifiers based on character strings'' having ``extremely efficient equality and total order relations whose speed is independent of the string on which the Symbol is based.'' Uniqueness is achieved by (1) inserting the associated String into a Set shared by all Symbols (a static data member) and (2) using the String's address within the Set (which is guaranteed unique by the very nature of Sets) as a surrogate for the String in all remaining Symbol operations. This means we must use the type Set<String>:

       string_set.c        #include <String.h>
           #include <Set.h>
               Set<String>::hash(const String& s){
               return (Set_or_Bag_hashval)s.hashval();

We can now define the internal representation:

       class Symbol{
           static Set<String>* set_ptr;
           const String* sp;

The operations are now fairly straightforward:

       Symbol::Symbol(const String& s){
           const String* result = set_ptr->contains(s);
           sp = result?result:set_ptr->insert(s);
       int Symbol::operator==(const Symbol& b)const{
           return sp == b.sp;
       int Symbol::operator<(const Symbol& b)const{
           return sp < b.sp;

Symbol creation from a String s is accomplished by (1) looking up the String in the Set, and if present, returning its address or (2) if not present, inserting it and returning its address. Note the Set member functions insert() and contains() both return pointers to elements, allowing their results to be used in boolean conditions, where a null pointer tests as ``false.'' The equality and less-than operators simply compare pointers. By the very nature of Sets, two Symbols will be equal precisely when the corresponding Strings would be equal. Note however, that while operator< defines a total ordering relation (allowing Symbols to be used as Map(C++) keys, for example), the relation is not lexicographic.

The actual implementation of Symbols, like those of so many classes that have static data members, is complicated by the necessity to insure that, prior to the construction of any Symbol, the static data member set_ptr is initialized to point to an empty String Set. While there are several technically interesting solutions to this problem, they are not specific to Sets, and will not be discussed here.

Previous topic: Internal representation of Sets

© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005