No More Array Errors (Part II) - Array_alg(C++)

Performance

We wrote a program to compare the performance of Array_set with the current implementation of Set(C++) as described in the Set tutorial, elsewhere in this volume. The latter uses a relatively complicated digital trie data structure. Actually, we used the Set_of_p in the comparison because, of the three unordered collection classes defined in Set(C++), Set_of_p has the best performance characteristics.

Here are the first few lines of our program:

       #include <Stopwatch.h>    see Stopwatch(C++)
       #include <Array_alg.h>
       Stopwatch w;
       typedef double* DOUBLEP;

       #ifdef SETH
           #include <Set.h>
           typedef Set_of_p<double> SET;
       #else
           #include <Array_set.h>
           typedef Array_set<DOUBLEP> SET;
       #endif
       SET p,q;

The program uses a Stopwatch(C++) for timing. It uses the Array Algorithm shuffle(), described in shuffle(C++), to obtain good averages (see below).

The data used for the comparison consisted of N unique pseudo-random pointers:

       main(int,char* argv[]){
           int N = atoi(argv[1]);
           DOUBLEP* data = new DOUBLEP[N];
           for(int i = 0;i<N;i++){
               do{
                   data[i] = (DOUBLEP)rand();   
   see
   rand(S)
               }while(p.contains(data[i]));
               p.insert(data[i]);
           }
           ...
       }

Note that uniqueness is guaranteed by testing each pointer for containment before inserting it into p. For our first test, we repeatedly emptied p into q, then q into p, and so on. We timed the repetitions using the Stopwatch and took an average over the number of repetitions. To get a good average, we used shuffle() to apply a random permutation to the data at the end of each pass:

           int NREPS = 10000/N;
           w.start();
   
           for(i=0;i<NREPS;i++){
               for(int ii=0;ii<N;ii++){
                   assert(p.remove(data[ii]));
                   assert(q.insert(data[ii]));
               }

                for(ii=0;ii<N;ii++){
                   assert(q.remove(data[ii]));
                   assert(p.insert(data[ii]));
               }
               shuffle(data,data+N);
           }
           w.stop();
           cout << "insert/remove time = "
                << w.user()/NREPS << "\ n";

This test showed that, for insertion and removal, Array_set outperforms Set_of_p for values of N up to about 8. In terms of order estimates, the O(1) complexity of the single-element Set_of_p operations quickly overtakes the O(N) complexity of the corresponding Array_set operations.

Next, we timed the contains operation for a variety of elements known to be either members (data) or non-members (data2):

           DOUBLEP* data2 = new DOUBLEP[N];
   
           for(i=0;i<N;i++){
               do{
                   data2[i] = (DOUBLEP)rand();
               }while(p.contains(data2[i]));
           }
           w.reset();
           w.start();

            for(n=0;n<NREPS;n++){
               for (i = 0; i < HOWMANY; i++){
                   assert(p.contains(arr1[i]));
               }
               for(i = 0; i < HOWMANY; i++){
                   assert(!p.contains(arr2[i]));
               }
           }

           w.stop();
           cerr << "contains (yes/no) time = "
                << w.user()/NREPS << "\ n";

This test showed that Array_set outperforms Set_of_p for values of N up to 9.

The story changes dramatically when we compare operations that are O(N) in both implementations. Consider iteration over all elements. Array_set iteration is little more than a for-loop over an array; in terms of order estimates, the constant multiplier for iteration is very close to 1. Set_of_p iteration is a walk over a complicated N-ary tree structure; in terms of order estimates, the constant multiplier is relatively large. We would therefore expect Array_set iteration to outperform Set_of_p iteration even for small values of N, and to become progressively better as N grows larger. We would expect the same for the algebraic (union, difference, and so-on) and relational operations (equality, subset, superset), which are also O(N) in both implementations. Actual measurements bear this out (times are in milliseconds):

N = 100 Array_set Set_of_p

equality 0.5 11.2 union 2.5 14.5 difference 3.3 28.2

N = 1000 Array_set Set_of_p

equality 5.0 71.7 union 11.6 89.0 difference 20.0 161.7

In summary, for applications emphasizing O(1) operations (insertion, removal, membership testing), Set_of_p overtakes Array_set at around 10 elements, and becomes progressively better as the number of elements increases. For applications emphasizing O(N) operations (iteration, algebraic, relational operations), Array_set is faster than Set_of_p for all values of N and becomes progressively better as the size increases.

A similar comparison of the current implementation of Map(C++) against a Block(C++) and Array_alg(C++) implementation of the same specification gave even more dramatic results: insertion breaks even at 155 elements, membership at 80 elements, and Array_set iteration is always faster.

If your application runs on a machine with limited memory, executable size is probably just as important to you as execution speed. To give an idea of relative sizes of programs using Array_set and the current Set implementation, we compiled and linked the following two programs:

        CC -o main1 -l++
           #include <Set.h>
           main(){ }
        CC -o main2 -l++
           #include <Array_set.h>
           main(){ }

On a Sun 3, the stripped executable sizes of main1 and main2 were 82K bytes and 41K bytes, respectively.