Arithmetic

We have seen how to define a plane so that we may map an image to the CAAPP with one pixel at each PE. We have also seen how to transfer the values from one plane to another. Let us now consider how to perform arithmetic operations on the elements of planes. A standard image processing operation is to convert three images representing the red, green, and blue components of a color image into the three images representing luminance (Y) and chrominance (I & Q) used by television broadcasters. If we were to write C++ statements to convert just one set of three pixels, it would look like the following code.

    unsigned char red, green, blue;
    unsigned char Y, I, Q;
    ...
    Y = 0.30 * red + 0.59 * green + 0.11 * blue;
    I = 0.60 * red + 0.28 * green + 0.32 * blue;
    Q = 0.21 * red + 0.52 * green + 0.31 * blue;

The standard representation for the red, green, and blue components of an RGB image is three integer values from 0 to 255. The same is true for luminance (Y) and chrominance (I, Q). The constants result from the relationship between these two representations. In the above computation, the integer values are converted to floating point representation before the multiply and the result of the additions is then converted back to integer values by truncation.

There is something missing in the code above. This code works for scalar values. However, images contain many scalar values. In order for the above code to be useful, it must be surrounded by a loop with code that accesses individual pixels in the red, green, and blue images and stores them one at a time in the red, green, and blue variables. In the same loop after the computation, another piece of code must do the same with the Y, I, and Q values to store them in the result images. That is,

    unsigned char red[480][640], green[480][640], blue[480][640];
    unsigned char Y[480][640], I[480][640], Q[480][640];
    ...
    for (int i = 0; i < 480; i++) 
      for (int j = 0; j < 640; j++)
       {
        Y[i][j] = 0.30 * red[i][j]  + 0.59 * green[i][j]  + 0.11 * blue[i][j];
        I[i][j] = 0.60 * red[i][j]  + 0.28 * green[i][j]  + 0.32 * blue[i][j];
        Q[i][j] = 0.21 * red[i][j]  + 0.52 * green[i][j]  + 0.31 * blue[i][j];
       }

Using the ICL, the code to perform the conversion is almost identical to the code above. Instead of looping, the operations are performed in parallel at every pixel.

    PlaneSize ntsc(480, 640);
    UCharPlane red(ntsc), green(ntsc), blue(ntsc);
    UCharPlane Y(ntsc), I(ntsc), Q(ntsc);
    ...
    Y = 0.30 * red + 0.59 * green + 0.11 * blue;
    I = 0.60 * red + 0.28 * green + 0.32 * blue;
    Q = 0.21 * red + 0.52 * green + 0.31 * blue;

The computation performed on every pixel is identical. The pixels of the red image(1) are converted to floating point and multiplied by the floating point constants. The different products are added and the result is converted back to integer to be stored in the Y, I, and Q images.

From this, you might conclude that writing expressions for planes is identical to writing expressions for scalars. Indeed, the ICL was designed to provide this level of correspondence so that people would have an easier time of writing code for a SIMD computer.

There are differences of which you should be aware. Sequential computers, such as a standard workstation, take the same amount of time to add two eight bit values as to add two 16 bit values or two 32 bit values because they always operate on 32 bits at a time.

Consequently, a C++ compiler will always maintain 32 bits of precision in scalar integer operations and will process all floating point scalar values using double precision floating point.

Remember that a PE is a bit serial processor. This means that operations are performed one bit at a time. Adding two eight bit values requires 17 operations. Adding two 16 bit values requires 33 operations. For elements of a plane, the fewer bits used in an operation, the faster that operation will be performed. Consequently, the ICL does not maintain 32 bits of precision in integer operations or use double precision floating point. Consider the computation below.

    UCharPlane x(ps), y(ps), z(ps), average(ps);
    ...
    average = (x + y + z) / 3;

If x is 20, y is 35, and z is 31 at some PE, you would expect the value of average at that PE to be set to 28. Suppose the values were 120, 135, and 131. Would the result be 128? 120 plus 135 plus 131 is 386. But, in this case intermediate results are kept as unsigned chars which can represent a maximum value of 255. Thus, the sum is 130 and the result is 43 not 128 (2)! To write this code correctly for the ICL, it is necessary to expand the precision used as shown below.

    average = (ShortPlane(x) + y + z) / 3;

Because the values in x are converted to shorts, the addition of y is performed using 16 bits. The addition of z is also performed using 16 bits. The division is performed between a short and a 32 bit integer scalar (3) instead of between a unsigned char and a 32 bit integer scalar. It was not necessary to write this statement as

     average = (ShortPlane(x) + ShortPlane(y) + ShortPlane(z)) / 3;

because, when using bit serial algorithms, it is possible to add a 16 bit value and an 8 bit value to produce a 16 bit result. This operation is also faster than adding two 16 bit values.

When designing the ICL, we made a deliberate decision to give up some ease of use for increased speed of execution. Your programs run faster but they are harder to write.

The general rule that the ICL follows is that types of are not promoted unless absolutely necessary. Thus, the result of adding two 8 bit unsigned values (UCharPlane) is an unsigned 8 bit value. The result of subtracting a CharPlane from another CharPlane is a CharPlane. But, the result of subtracting one unsigned 8 bit value (UCharPlane) from another is a signed 8 bit value (CharPlane). Adding two BitPlanes results in a UCharPlane. Multiplying two CharPlanes results in a ShortPlane. A complete compendium of the result types of operations on planes is in Appendix A.

Most of the standard arithmetic operations supported by C++ for scalars are supported for planes by the ICL. The operators supported are:

    Dyadic:         +  -  *  /  ^  &  |  %  <<  >>  ==  !=  >  <  >=  <=
    Monadic:        ~  -  +  ++(postfix)  --(postfix)
    Compound:    +=  -=  *=  /=  ^=  &=  |=  %=

The operators that are not supported are:

    Dyadic:    &&  ||
    Monadic:    !  *  --(prefix)  ++(prefix)
    Triadic:    ?:

The prefix monadic operator & returns the address of the structure used to represent a plane in the ACU.

Since the CAAPP PEs are bit serial, you may have wondered if floating point operations are performed bit serial. Floating point operations are much slower than integer operations because they are performed bit-serial. Let us revisit our original color conversion problem. It was programmed using floating point. How can we perform the same operation using integer arithmetic while maintaining the same accuracy? Remember that the red, green, and blue images were restricted to integer values in the range 0 to 255 and that the result was also restricted to the same range. We can eliminate the floating point constants by using ratios instead. We must be careful to do the multiply before the divide or we will lose all significance in our result. We must write the operation as 30 * red / 100, not (30 / 100) * red. The code now becomes.

    PlaneSize ntsc(480, 640);
    UCharPlane red(ntsc), green(ntsc), blue(ntsc);
    UCharPlane Y(ntsc), I(ntsc), Q(ntsc);
    ...
    Y = 30 * red / 100 + 59 * green / 100 + 11 * blue / 100;
    I = 60 * red / 100 + 28 * green / 100 + 32 * blue / 100;
    Q = 21 * red / 100 + 52 * green / 100 + 31 * blue / 100;

If we used three significant digits in our constants (such as .305 and .596), we would have had to divide by 1000. The above code will work correctly but it can be made much faster and more accurate.

    PlaneSize ntsc(480, 640);
    UCharPlane red(ntsc), green(ntsc), blue(ntsc);
    UCharPlane Y(ntsc), I(ntsc), Q(ntsc);
    ...
    Y = (30 * red + 59 * green + 11 * blue) / 100;
    I = (60 * red + 28 * green + 32 * blue) / 100;
    Q = (21 * red + 52 * green + 31 * blue) / 100;

We are now doing three divides instead of 9 and the result has not lost any significance over the results produced with floating point.

Further improvement is possible. Because the values are restricted to the range 0 to 255, we know that the sum of the three products is less than 30000 and can be represented as a ShortPlane.

Additions of ShortPlanes are twice as fast as additions of IntPlanes. The result of each multiply is an IntPlane because integer scalars are treated as full 32 bit integers unless we force the compiler to use fewer bits. Multiplying a UCharPlane by a char scalar is much faster than multiplying by an integer and produces a ShortPlane instead of an IntPlane. The final code becomes

    PlaneSize ntsc(480, 640);
    UCharPlane red(ntsc), green(ntsc), blue(ntsc);
    UCharPlane Y(ntsc), I(ntsc), Q(ntsc);
    ...
    Y = (((char)30) * red + ((char)59) * green + ((char)11) * blue) / 
          ((char)100);
    I = (((char)60) * red + ((char)28) * green + ((char)32) * blue) /
         ((char)100);
    Q = (((char)21) * red + ((char)52) * green + ((char)31) * blue) /
          ((char)100);

With some implementations of C++ compilers, it is not necessary to cast constant values as is done in this example. These compilers will consider the constants to be of the smallest type that can represent the value. However, the principle is still important because you will often be using other variables instead of constants.

Being aware of the possible range of values, that any variable may assume, is critical to achieving the best performance from the CAAPP!

The list of dyadic operators above included the relational operations of ==, !=, >, >=, <, and <=. These operations may be applied between any two compatible planes. The result is a BitPlane whose value at each PE corresponds to the result of the comparison between the values at the same PE. For example, suppose we want to increment all of the non-zero values in a plane x by 3. The following code would accomplish this operation.

    x += (x != 0) * 3;

The operation x != 0 produces a BitPlane whose value is one everywhere that the value of X is not zero and produces a value of zero everywhere else. If we multiply this BitPlane by the scalar 3, we produce an IntPlane whose value is 3 everywhere that x is not zero and is zero everywhere else. This result is then added to to x. In the chapter on activity, we will see an alternative mechanism for accomplishing this.

Next -- Activity
ICL Table of Contents
ICL Index
ICL Home Page

Notes

Why do we use the term image here instead of plane? A plane may be used to represent other things besides images. The term image is more specific about what is represented and is less specific about how it is represented.
The same problem occurs with scalars in C and C++ but only with much larger values near 2^31 or when results are stored into scalar variables.