Miscellaneous

This chapter documents facilities of the ICL that come with a simple caveat. Before using any of these facilities, try solving the problem in a different way. Then, if you still can not think of a different way, try again. Some of these facilities are machine dependent and some allow faster execution at the loss of readability. Still others provide the only way of doing some things and you need to make sure you actually need to do those things.

Bit Banging

C++ evolved from a language designed for systems programming. As such, it includes facilities that are avoided in other structured programming languages such as Pascal. In this tradition, the ICL has added its own non-structured programming constructs. The primary set of these constructs allows you to access individual bits of the values in a plane. The Bit method allows you to access a particular bit. One way of selecting all the PEs with positive values in a ShortPlane s would be

    {Select active(s.Bit(15)); ... }

A better way would be

    {Select active(s >= 0); ... }

But, sometimes you just need to access a particular bit. Bits are numbered starting at zero from the least significant to the most significant(1).

The InsertBits(2) methods allow you to set particular bits in a plane. For example, the following code implements the absolute value function for floating point planes by setting the sign bits to zero(3).

    FloatPlane absf(FloatPlane arg)
     {
      return arg.InsertBits(0,1,31,0);  // Set sign to zero
     }

There are three forms of arguments that InsertBits will allow.

    InsertBits(plane source, int length, int to, int from)
    InsertBits(scalar source, int length, int to, int from)
    InsertBits(BitPlane source, int to)

The source is the source of the bits that will be inserted into the plane to which the method is applied(4). length specifies the number of bits starting at the from position in the source. The to position specifies where in the destination the bits will be inserted.

The algorithm for the case with the scalar source and a CharPlane is given below.

  CharPlane
  insert_bits (CharPlane &dest, int source, int length, int to, int from)
   {
    int mask = (1 << length) - 1;
    return (dest & ~(mask << to)) | (((source >> from) & mask) << to);
   }

Scalar Processing

A plane is a grid of scalar values. There may be many thousands of these values in a single plane. The CAAPP is designed to process these grids efficiently by doing operations in parallel on all the values at the same time. There are times when you may want to process a single value at a time. These times should be avoided because this type of processing removes any advantage the CAAPP has over a standard sequential computer.

One way to process one value at a time is to simply select a single value using the activity. Then, the operations performed will only affect that single value. To select a single PE, that PE must be unique in some way that may be tested. It must have a unique value stored in it. We test for this value and use the result to set the activity. One thing that is unique about any PE is its row and column index.

A more probable need is to use an individual PE's value in the ACU. For example, the following program will extract the values from a CharPlane and store them in an array.

    void to_array(CharPlane x, char* x_copy)
     {
      Everywhere active;
      int rows, cols;
      BitPlane row(x.Size());

      rows = x.RowSize();
      cols = x.ColSize();

      for (int i = 0; i < rows; i++)
       {
        row = (x.RowIndex() == i);
        for (int j = 0; j < cols; j++)
          *(x_copy + j + i * cols) = x.Access(row & x.ColIndex() == j);
       }
     }

This program works because the Access method, applied to the plane x, returns the selected value. If there is more than one selected PE, then Access returns the bit-wise logical OR of the bits of the different values selected in x. The inverse to the above code is shown below.

    void from_array(CharPlane &x, char* x_copy)
     {
      Everywhere active;
      int rows, cols;

      rows = x.RowSize();
      cols = x.ColSize();

      for (int i = 0; i < rows; i++)
       {
        Select active((x.RowIndex()== i);
        for (int j = 0; j < cols; j++)
         {
          Select active(x.ColIndex() == j);
          x = *(x_copy + j + i * cols);
         }
       }
     }

The definition of Access for CharPlanes is simple.

    char Access(CharPlane source, BitPlane selected)
     {
      char result = 0;
      for (int i = 7; i >= 0; i--)
       {
        result = result << 1;
        result |= (selected & source.Bit(i)).Any();
       }
      return result;
     }

Arrays

The example functions, to_array and from_array, shown above require that the array argument be the same size as the plane argument. The ICL provides two methods which do the same operation, are faster, and are more robust.

    ToArray(array, int rows, int cols)
    FromArray(array, int rows, int cols)

The rows and cols arguments specify the size of the array. If the plane is larger, only the top, left corner of the plane is used. If the plane is smaller, ToArray leaves the unmatched locations of the array alone. The following code is safe if not correct.

    IntPlane x(200,300);
    CharPlane y(64,128);
    short x_copy[100][200];
    x.ToArray((short *)x_copy,100,200);
    y.FromArray((short *)x_copy,100,200);

Type conversions are used as needed. The above example converts from int to short to char. Because of the expected applications for these methods, these data movement methods are not affected by the activity(5). The ToArray and FromArray method implementations use a different, more parallel, mechanism than that used in the example functions above.

Results, One at a Time

A common situation in using SIMD systems is that, of all the thousands of processors that may be in the machine, a much smaller set ends up containing the values that are of interest to the application. An example would be a program that finds all the vertices in an image. The image may have 2^18 pixels and be processed on a virtual CAAPP of the same size. After the processing, there may be only 100 vertices scattered among the PEs. If we wish to print out a list of these vertices, we must access them one at a time and print them. We can use the methods described above as shown below. In this example, the BitPlane vertex has a 1 at the PEs that correspond to vertices in the image.

    BitPlane vertex(256,256);
    char vertic_c[256][256];
    ...
    vertex.ToArray((char *)vertic_c,256,256);
    for (int i = 0; i < 256; i++)
      for (int j = 0; j < 256; j++)
        if (vertic_c[i][j]) printf("Vertex at (%d,%d).\n",i,j);

This program will execute the if statement 65000 times. Let us rewrite this to use the SelectMin method. This new program will loop as many times as there are vertices. SelectMin(6) finds the set of selected PEs with the minimum value for the plane to which it is applied(7).

    BitPlane vertex(ps);
    ....
     {
      Everywhere active;
      BitPlane remaining(ps);
      BitPlane min(ps);

      remaining = vertex;

      while (remaining.Any())
       { // While any not processed
         // Select minimum of those remaining
        min = (vertex.Index()).SelectMin(remaining));
        printf("Vertex at (%d,%d).\n",
               (vertex.RowIndex()).Access(min),
               (vertex.ColIndex()).Access(min));
        remaining &= ~min;
       }
     }

The result is a printed list of the vertices in row major order because SelectMin selects the minimum PE index for the active PEs with each iteration. Whichever one is selected, it can not be selected the next time because its value for remaining is set to zero. Note that the two Select declarations can not be combined as shown below.

    {Select active(remaining & (vertex.Index()).SelectMin()); ... }

because, after the first iteration, the set of active PEs would be empty. We must select the minimum of the remaining PEs.

A program to print out all of the unique values, of some UShortPlane from the largest to the smallest value, is given below to illustrate the SelectMax method. There may be more than one PE active and responding to the Access method, but each of these PEs will have the same value.

    void print_unique(ShortPlane vals)
     {
      Everywhere active;
      BitPlane remaining(vals.Size());
      remaining = 1;
      int count = 0;
      while (remaining.Any())
       {
        Select active(remaining);
         {Select active (vals.SelectMax());
          printf("%5d ",vals.Access());
          if ((++count % 10) == 0) printf("\n");
          remaining = 0;
         }
       }
      printf("There were %d different values.\n",count);
     }

Multi-resolution

An important technique, used in image processing applications, does portions of the processing at different resolutions of the same image. On a standard computer, the fewer pixels that must be looked at, the faster the processing is. In other applications, it may be easier to do the processing when there is less resolution because the lack of resolution hides some of the unimportant details in the image. The first reason has some applicability to the CAAPP even though the actual sizes of images are often the same when the original images were different sizes. The second reason is always pertinent.

As an example, let us use a typical technique of smoothing an image and then do sub-sampling. We will smooth by selecting the minimum value in each two by two window as the value of the PE in the center of the window.

    UCharPlane original_image(ps);
    UCharPlane smoothed_image(ps);
    UCharPlane east(ps), west(ps);
    ....
    smoothed_image = original_image;
    east = original_image.East();
    west = original_image.West();
     {Select active(smoothed_image > east);
      smoothed_image = east;
     }
     {Select active(smoothed_image > west);
      smoothed_image = west;
     }
     {Select active(smoothed_image > east.North());
      smoothed_image = east.North();
     }
     {Select active(smoothed_image > west.North());
      smoothed_image = west.North();
     }
     {Select active(smoothed_image > east.South());
      smoothed_image = east.South();
     }
     {Select active(smoothed_image > west.South());
      smoothed_image = west.South();
     }
     {Select active(smoothed_image > original_image.South());
      smoothed_image = original_image.South();
     }
     {Select active(smoothed_image > original_image.North());
      smoothed_image = original_image.North();
     }

We will now sub-sample by selecting every upper-left PE in that 2 by 2 window.

    {Select active((~(smoothed_image.RowIndex()).Bit(0))
                 & (~(smoothed_image.ColIndex()).Bit(0)));
        ....
    }

There are several problems with this algorithm. First, we have not made the planes any smaller so there is no chance that we will use fewer tiles and run faster. Second, if we wish to do any non-trivial processing on this reduced image, we will have trouble because no PE's neighbor in the reduced image is their physical neighbor. Therefore, mesh and Coterie operations will be difficult. We must actually reduce the image by moving the values. Though there are faster ways, we will use Route to move the values.

    UCharPlane reduced(ps);
    UShortPlane row_address(ps);
    UShortPlane col_address(ps);
    row_address = 0xffff;  // Non-existent PE address
     {Select active((~(smoothed_image.RowIndex()).Bit(0)) 
                  & (~(smoothed_image.ColIndex()).Bit(0)));
      row_address = smoothed_image.RowIndex() >> 1;    
      col_address = smoothed_image.ColIndex() >> 1;
     }
    reduced = smoothed_image.Route(row_address, col_address);

Now, the values we are interested in are in the upper-left corner of the reduced plane. We routed the values from the other PEs to non-existent destinations. But, this plane is no smaller than the smoothed_image plane. One last step will finish the job.

    int sr = reduced.RowSize() / 2;
    int sc = reduced.ColSize() / 2;
    UCharPlane small_reduced(sr, sc);
    small_reduced = reduced.Resize(sr,sc);

The Resize method creates a new plane from the original one and gives it the size specified(8). If the new size is larger, the upper-left corner is set with the values from the source plane and values at other PEs are undefined. If the new size is smaller, only the corresponding upper-left corner of the source plane is used.

Next -- AppendixA
ICL Table of Contents
ICL Index
ICL Home Page

Notes

This is a machine dependent operation. On computers other than the CAAPP, the ICL implementation may place the sign bit for ShortPlanes in some other position than bit 15.
This method may be eliminated from the ICL at some future time. You should avoid using it.
The ICL uses IEEE floating point format which represents values as signed-magnitude values.
The plane is not modified. InsertBits returns the merge of the source and this plane as its value.
This is the reason why the code for the to_array and from_array examples used the Everywhere declaration.
SelectMin is similar to RegionSelectMin but does not use the Coterie because it treats the entire virtual CAAPP as a single region.
If no argument is supplied, SelectMin finds the minimum of all PEs.
The size may be specified with two integers as shown in the example or with a PlaneSize instance.