PIC32MX: Benchmarking Mathematical Operations

Original Assignment

Do not erase this section!

Your assignment is to empirically test how long it takes to perform add, subtract, multiply, divide, sqrt, sin, and cos operations with the 80 MHz PIC32MX460F512L and our standard code optimization setting. You will do these tests with chars (8-bit integers), shorts (16-bit), integers (32-bit), long long integers (64-bit), floats (32-bit single precision floating point), and double (64-bit double-precision floating point). The integers can be unsigned or signed. Your end result will be a table with the operation on one axis (likely the horizontal axis) and the kind of variable on the other axis, and each cell of the table will have a normalized duration for the operation. The time will be normalized by the fastest operation, so the smallest number in the table will be 1.00. All other numbers will indicate how many times longer that operation takes. All numbers will have two decimal places, e.g., 2.57 or 24.72. You will also give the time that 1.00 corresponds to in nanoseconds.

Since bit-shifting left and right correspond to a version of multiplying and dividing, you should also include the operations >>1 and >>4 and <<1 and <<4. (If the results are identical, you can eliminate shift left from your table.)

To generate this table, you can set an output bit low before the operation, then high immediately after the operation, and measure the time on an oscilloscope. Two things to consider: (1) Time a single operation, over and over, with a short delay between the operation. This should create a pulse train on your oscilloscope. Can you get an accurate estimate of the time this way? You could also try doing five or ten operations between changing the digital output. See if this gives the same estimate. (This estimate might be more accurate as you are essentially averaging over a number of operations.) Avoid using arrays and for loops in your test, as indexing arrays and running the loop each take time. (2) Make sure the compiler doesn't compute the results in advance. You could try testing operations with numbers generated randomly (don't time this operation!) vs. numbers that you just type in manually to make sure that both are giving you the same result.

Overview

We were tasked with determining the real-time cost (measured in nanoseconds) of performing seven basic mathematical operations with each one of the six commonly used ANSI C data types.

The mathematical operations we tested were:

  • subtraction
  • addition
  • multiplication
  • division
  • square root
  • sine
  • cosine

The six data types we tested each operation on were:

  • char
  • short
  • int
  • long long
  • float
  • double

Our testing procedure was simple: throw an output pin high on the NU32 development board, perform a mathematical operation with a given data type, and then pull the same pin low.

Placing the above three steps in an infinite while loop afforded us the opportunity to use an oscilloscope to measure the duration between each high-low pair in the output waveform. After subtracting the time it took for the PIC to raise and lower the voltage on the output pin (something we previously measured), we were able to determine the amount of time required for the PIC32 chip to execute an operation with a high level of accuracy.
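
A minimal sketch of this procedure is shown below. It is not the actual test code: `pin_latch` is a hypothetical stand-in for the memory-mapped register that drives the output pin, and `timed_add`, `run_passes`, and the operand names are illustrative. The operands are declared `volatile` so that the compiler has to load them and perform the addition on every pass.

```c
#include <stdint.h>

/* Hypothetical stand-in for the output-pin latch register; on real
   hardware this would be the memory-mapped bit driving the pin. */
static volatile uint8_t pin_latch = 0;

/* Operands are volatile so each pass reloads them from memory. */
static volatile int32_t operand_a = 347;
static volatile int32_t operand_b = 12;
static volatile int32_t result = 0;

/* One measurement pass: pin high, one operation, pin low.
   The scope measures the width of the resulting high pulse. */
static void timed_add(void)
{
    pin_latch = 1;                     /* throw the pin high */
    result = operand_a + operand_b;    /* operation under test */
    pin_latch = 0;                     /* pull the pin low */
}

/* Repeats the pass to produce a pulse train on the scope.
   On the PIC this would be while (1); a bounded count keeps the
   sketch testable on a desktop compiler. */
static void run_passes(unsigned passes)
{
    while (passes-- > 0) {
        timed_add();
    }
}
```

The pulse width seen on the scope includes the cost of the two pin writes, which is why the pin-toggle time is measured separately and subtracted.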

With seven operations to perform on six different data types, we created the following table to help us assign and keep track of the various tests we planned to run:

Operation vs. Data type
Operation       char (8-bit)  short (16-bit)  int (32-bit)  long long (64-bit)  float (32-bit)  double (64-bit)
subtraction     Test 2        Test 9          Test 16       Test 23             Test 30         Test 37
addition        Test 3        Test 10         Test 17       Test 24             Test 31         Test 38
multiplication  Test 4        Test 11         Test 18       Test 25             Test 32         Test 39
division        Test 5        Test 12         Test 19       Test 26             Test 33         Test 40
square root     Test 6        Test 13         Test 20       Test 27             Test 34         Test 41
sine            Test 7        Test 14         Test 21       Test 28             Test 35         Test 42
cosine          Test 8        Test 15         Test 22       Test 29             Test 36         Test 43

Several tests contained multiple procedures that explored various ways to carry out a given mathematical operation on a given data type. For example, in the multiplication tests we tested not only the traditional multiplication operator (*) but also the bitwise left shift operator (<<), to find out whether one operator was faster than the other. Similarly, we included procedures that performed the operations on hard-coded numbers (such as 347) as well as on randomly chosen numbers stored in variables (such as 'random_int1'). We wanted to ensure that the compiler didn't compute the results of each operation in advance. While computing results at compile time can indeed afford welcome decreases in execution time, situations in which the compiler can't do so (for example, when the data to be operated on is not known in advance) are still common and are worth benchmarking.

Accordingly, several tests contain multiple procedures that not only account for multiple methods of performing a particular operation, but multiple sets of numbers to perform those operations on.
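
The difference between hard-coded and variable operands can be sketched as follows. This is an illustration under assumptions, not the test code itself: the function names are hypothetical, and the value 16 for `random_int2` is chosen only so that the shift variant gives the same product.

```c
#include <stdint.h>

/* Illustrative operands; volatile forces a real load at run time. */
static volatile int32_t random_int1 = 347;
static volatile int32_t random_int2 = 16;

/* Both operands are literals, so an optimizing compiler is free to
   replace the multiply with the constant 5552 at build time. */
static int32_t multiply_constants(void)
{
    return 347 * 16;
}

/* Operands come from volatile variables, so the multiply has to be
   carried out when the function runs. */
static int32_t multiply_variables(void)
{
    return random_int1 * random_int2;
}

/* Multiplying by 16 rewritten as a left shift by 4; the value is
   the same for non-negative inputs that do not overflow. */
static int32_t multiply_by_shift(void)
{
    return random_int1 << 4;
}
```

Timing `multiply_constants` mostly measures a store of a constant, while the other two measure work done at run time.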

Test 1 was used to determine the duration required for the PIC32 to throw a pin high and pull a pin low, while Tests 2 through 43 were used to measure the actual performance of each operation and data-type pair.


Results

Below is a quick summary of the test results for each data type and each operation. All results are normalized to 60 ns (1.00 = 60 ns).

Operation vs. Data type
Data type  Subtraction  Addition  Multiplication  Division  Square Root  Sine    Cosine
Char       1.87         1.65      2.28            N/A       34.78        116.03  98.93
Short      1.03         1.67      1.47            5.00      144.57       230.40  215.40
Int        1.07         1.00      1.43            8.10      145.62       316.43  330.62
Long Long  2.50         3.33      6.63            28.73     271.85       498.30  510.38
Float      15.00        17.07     12.27           27.90     143.93       326.03  338.28
Double     26.00        20.60     23.97           53.07     133.30       343.73  333.52
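
Each cell appears to be the measured time of the random-operand benchmark divided by 60 ns and rounded to two decimal places; for example, the 300 ns short division gives 5.00. A minimal sketch of that arithmetic, with `FASTEST_NS` and `normalize_ns` as illustrative names:

```c
/* Duration assigned the value 1.00 in the summary table. */
#define FASTEST_NS 60.0

/* Normalized duration rounded to two decimal places.
   Assumes a non-negative measurement in nanoseconds. */
static double normalize_ns(double measured_ns)
{
    double scaled = measured_ns / FASTEST_NS * 100.0;
    return (double)(long)(scaled + 0.5) / 100.0;
}
```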

Below are the results of each particular test we performed, coupled with a short explanation for each result.

Basic Timing Constants (Test 1)

Figure: execution waveforms as seen on the output pin for tests (a), (b), and (c).

This test determines the length of time required by the PIC32 chip to push a given output pin high and pull the same pin low.

  • Test (a): Time required to throw an output pin high
    • Instruction: PIN_A2 = 1;
    • Time: 63 ns
  • Test (b): Time required to pull an output pin low
    • Instruction: PIN_A2 = 0;
    • Time: 63 ns
  • Test (c): Time required to execute 1 empty while loop cycle
    • Instruction: while(1){}
    • Time: 23 ns


char Performance


A char data type, in ANSI C, is a value holding one byte, or one character code. The actual number of bits in a char in a particular implementation is documented as CHAR_BIT in that implementation's limits.h file. In practice, it is almost always 8 bits, corresponding to a decimal range of 0 to 255 inclusive for an unsigned char. Unless otherwise noted, all (a) benchmarks are operations on two predefined (and most likely pre-computed) ASCII letters, all (b) benchmarks are operations on two predefined (and most likely pre-computed) numbers in the range of 0 to 255, and all (c) benchmarks are operations on two random (and most likely not pre-computed) numbers. These multiple benchmarks per test exist to illustrate the differences in execution time between operations the compiler computes in advance and operations the PIC must perform in real time.
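
The width of each type can be checked against limits.h on whatever compiler is in use. A short sketch, where `BITS_OF`, `bits_per_char`, and `max_unsigned_char` are illustrative helper names and only `CHAR_BIT` and `UCHAR_MAX` come from the standard header:

```c
#include <limits.h>

/* Storage width of a type in bits on the current implementation. */
#define BITS_OF(type) ((int)(sizeof(type) * CHAR_BIT))

/* Number of bits in a char, as reported by limits.h. */
static int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Largest value an unsigned char can hold: 2^CHAR_BIT - 1. */
static unsigned int max_unsigned_char(void)
{
    return UCHAR_MAX;
}
```

For instance, `BITS_OF(short)` and `BITS_OF(long long)` report the widths of the other integer types tested here.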


Subtraction (Test 2)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (c).

This test determines the length of time required by the PIC32 chip to subtract one 8-bit number (a char) from another 8-bit number (a char).

  • Test (a): Time required to subtract two constant chars (may be pre-computed)
    • Instruction: letter_capital_a = 'z'-'7';
    • Time: 50 ns
  • Test (b): Time required to subtract two ints cast into a char (may be pre-computed)
    • Instruction: letter_b = 100-2;
    • Time: 50 ns
  • Test (c): Time required to subtract two random chars (guaranteed not to be pre-comp)
    • Instruction: random_char = larger-smaller;*
    • Time: 112 ns

*See the Code section for more details on how randomness was guaranteed.

Addition (Test 3)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (c).

This test determines the length of time required by the PIC32 chip to add one 8-bit number (a char) to another 8-bit number (a char).

  • Test (a): Time required to add two constant chars (may be pre-computed)
    • Instruction: letter_a = ')'+'8';
    • Time: 50 ns
  • Test (b): Time required to add two ints cast into a char (may be pre-computed)
    • Instruction: letter_b = 97+1;
    • Time: 50 ns
  • Test (c): Time required to add two random chars (guaranteed not to be pre-comp)
    • Instruction: random_char = random_char1+random_char2;*
    • Time: 99 ns

*See the Code section for more details on how randomness was guaranteed.

Multiplication (Test 4)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (c).

This test determines the length of time required by the PIC32 chip to multiply one 8-bit number (a char) by another 8-bit number (a char).

  • Test (a): Time required to multiply two constant chars (may be pre-computed)
    • Instruction: ascii_225 = 'K'*'♥';
    • Time: 49 ns
  • Test (b): Time required to multiply two ints cast into a char (may be pre-computed)
    • Instruction: ascii_200 = 100*2;
    • Time: 48 ns
  • Test (c): Time required to multiply two random chars (guaranteed not to be pre-comp)
    • Instruction: random_char = larger*smaller;*
    • Time: 137 ns

*See the Code section for more details on how randomness was guaranteed.

Division (Test 5)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (c).

This test determines the length of time required by the PIC32 chip to divide one 8-bit number (a char) by another 8-bit number (a char).

  • Test (a): Time required to divide two constant chars (may be pre-computed)
    • Instruction: ascii_25 = 'K'/'♥'; //thp
    • Time: 48 ns
  • Test (b): Time required to divide two ints cast into a char (may be pre-computed)
    • Instruction: letter_2 = 100/2;
    • Time: 50 ns
  • Test (c): Time required to divide two random chars (guaranteed not to be pre-comp)
    • Instruction: random_char = larger/smaller;*
    • Time: N/A

* We had great difficulty in trying to test this particular operation. After some investigation with an oscilloscope and voltmeter, it appeared that the PIC32 was not capable of dividing chars in this way: every time the PIC32 attempted to divide one char by another, all output pins were immediately grounded. We've tested this code in other C environments, and it works as expected, so we suspected that the error lay either within our specific PICs (very unlikely, since we tested 3) or in the silicon architecture of the PIC32 itself (still unlikely, but more probable given the number of PICs we tested). Because C promotes both char operands to int before dividing, a fault specific to char division is doubtful; another possible cause is a random divisor of 0. If you absolutely need to divide chars, cast them to ints first, perform your division, then cast them back to chars.
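
The cast-based workaround suggested above might look like the sketch below. The function name `divide_chars` is illustrative, and the zero-divisor guard is an added assumption rather than part of the original test, included because integer division by zero is undefined behavior in C.

```c
/* Divides two chars by way of int, as suggested above. Returns 0
   when the divisor is 0; that guard is an added assumption, since
   integer division by zero is undefined behavior in C. */
static char divide_chars(char larger, char smaller)
{
    if (smaller == 0) {
        return 0;
    }
    return (char)((int)larger / (int)smaller);
}
```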

Square Root (Test 6)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (e).

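This test determines the length of time required by the PIC32 chip to take the square root of one 8-bit number (a char). Benchmarks (a) through (c) use the sqrt() function, while benchmarks (d) through (f) use the expression ^(1/2). Note that in C the ^ operator is bitwise XOR rather than exponentiation, and 1/2 evaluates to 0 under integer division, so benchmarks (d) through (f) time an XOR with 0 and not a true square root.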

  • Test (a): Time required to sqrt() a constant char (may be pre-computed)
    • Instruction: ascii_25 = sqrt('u');
    • Time: 48 ns
  • Test (b): Time required to sqrt() an int cast into a char (may be pre-computed)
    • Instruction: number_10 = sqrt(100);
    • Time: 48 ns
  • Test (c): Time required to sqrt() a random char (guaranteed not to be pre-comp)
    • Instruction: random_char = sqrt(random_char1);
    • Time: 2087 ns
  • Test (d): Time required to ^(1/2) a constant char (may be pre-computed)
    • Instruction: ascii_25 = ('u')^(1/2);
    • Time: 48 ns
  • Test (e): Time required to ^(1/2) an int cast into a char (may be pre-computed)
    • Instruction: number_10 = (100)^(1/2);
    • Time: 48 ns
  • Test (f): Time required to ^(1/2) a random char (guaranteed not to be pre-comp)
    • Instruction: random_char = (random_char2)^(1/2);*
    • Time: 75 ns

*See the Code section for more details on how randomness was guaranteed.
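
One caveat about benchmarks (d) through (f): C has no exponentiation operator, so the expression `value ^ (1/2)` does not raise anything to a power. The sketch below, with the illustrative name `caret_one_half`, shows what the expression evaluates to.

```c
/* In C, ^ is bitwise XOR and 1/2 is integer division, which
   yields 0. The expression therefore reduces to value ^ 0,
   which is the original value unchanged. */
static int caret_one_half(int value)
{
    return value ^ (1 / 2);
}
```

A true power form would use `pow(value, 0.5)` from math.h, which operates on doubles and would be expected to cost roughly as much as `sqrt()`.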

Sine (Test 7)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (c).

This test determines the length of time required by the PIC32 chip to take the sine of an 8-bit number (a char).

  • Test (a): Time required to take the sine of a constant char (may be pre-computed)
    • Instruction: ascii_25 = sin('K');
    • Time: 9963 ns
  • Test (b): Time required to take the sine of a constant int cast into a char (may be pre-computed)
    • Instruction: letter_2 = sin(50);
    • Time: 9550 ns
  • Test (c): Time required to take the sine of a random char (guaranteed not to be pre-comp)
    • Instruction: random_char = sin(larger);*
    • Time: 6962 ns

*See the Code section for more details on how randomness was guaranteed.

Cosine (Test 8)

Figure: execution waveforms as seen on the output pin for benchmarks (a) through (c).

This test determines the length of time required by the PIC32 chip to take the cosine of an 8-bit number (a char).

  • Test (a): Time required to take the cosine of a constant char (may be pre-computed)
    • Instruction: ascii_25 = cos('K');
    • Time: 9111 ns
  • Test (b): Time required to take the cosine of a constant int cast into a char (may be pre-computed)
    • Instruction: letter_2 = cos(50);
    • Time: 8724 ns
  • Test (c): Time required to take the cosine of a random char (guaranteed not to be pre-comp)
    • Instruction: random_char = cos(larger);*
    • Time: 5936 ns

*See the Code section for more details on how randomness was guaranteed.


short Performance


A short data type, as implemented on the PIC32, is a value that holds 2 bytes, or 16 bits. For an unsigned short this corresponds to a range of 0 to 65535 (2^16 - 1). If the variable is signed, the range is -32768 to 32767 (-2^15 to 2^15 - 1). In this series of tests, tests (a) use predefined numbers and tests (b) use random numbers, in order to compare operations the compiler can compute in advance with operations the PIC must perform at run time.

Subtraction (Test 9)

This test determines the length of time required by the PIC32 chip to subtract one 16-bit number (a short) from another 16-bit number (a short).

  • Test a: 25ns
  • Test b: 62ns

Addition (Test 10)

This test determines the length of time required by the PIC32 chip to add one 16-bit number (a short) to another 16-bit number (a short).

  • Test a: 50ns
  • Test b: 100ns

Multiplication (Test 11)

This test determines the length of time required by the PIC32 chip to multiply one 16-bit number (a short) by another 16-bit number (a short).

  • Test a: 24ns
  • Test b: 88ns

Division (Test 12)

This test determines the length of time required by the PIC32 chip to divide one 16-bit number (a short) by another 16-bit number (a short).

  • Test a: 28ns
  • Test b: 300ns

Square Root (Test 13)

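This test determines the length of time required by the PIC32 chip to get the square root of one 16-bit number (a short). Tests (a) and (b) use the 'sqrt()' method, while tests (c) and (d) use the expression ^(1/2). In C, ^ is bitwise XOR and 1/2 evaluates to 0, so tests (c) and (d) time an XOR with 0 rather than a square root.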

  • Test a: 50ns
  • Test b: 8674ns
  • Test c: 50ns
  • Test d: 76ns

Sine (Test 14)

This test determines the length of time required by the PIC32 chip to get the sine of one 16-bit number (a short).

  • Test a: 13014ns
  • Test b: 13824ns

Cosine (Test 15)

This test determines the length of time required by the PIC32 chip to get the cosine of one 16-bit number (a short).

  • Test a: 12174ns
  • Test b: 12924ns

int Performance


An int data type, as implemented on the PIC32, is a value that holds 4 bytes, or 32 bits. For an unsigned int this corresponds to a range of 0 to 4294967295 (2^32 - 1). If the variable is signed, the range is -2147483648 to 2147483647 (-2^31 to 2^31 - 1). In this series of tests, tests (a) use predefined numbers and tests (b) use random numbers, in order to compare operations the compiler can compute in advance with operations the PIC must perform at run time.

Subtraction (Test 16)

This test determines the length of time required by the PIC32 chip to subtract one 32-bit number (an int) from another 32-bit number (an int).

  • Test a: 38ns
  • Test b: 64ns

Addition (Test 17)

This test determines the length of time required by the PIC32 chip to add one 32-bit number (an int) to another 32-bit number (an int).

  • Test a: 26ns
  • Test b: 60ns

Multiplication (Test 18)

This test determines the length of time required by the PIC32 chip to multiply one 32-bit number (an int) by another 32-bit number (an int).

  • Test a: 38ns
  • Test b: 86ns

Division (Test 19)

This test determines the length of time required by the PIC32 chip to divide one 32-bit number (an int) by another 32-bit number (an int).

  • Test a: 38ns
  • Test b: 486ns

Square Root (Test 20)

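This test determines the length of time required by the PIC32 chip to get the square root of one 32-bit number (an int). Tests (a) and (b) use the 'sqrt()' method, while tests (c) and (d) use the expression ^(1/2). In C, ^ is bitwise XOR and 1/2 evaluates to 0, so tests (c) and (d) time an XOR with 0 rather than a square root.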

  • Test a: 50ns
  • Test b: 8737ns
  • Test c: 88ns
  • Test d: 74ns

Sine (Test 21)

This test determines the length of time required by the PIC32 chip to get the sine of one 32-bit number (an int).

  • Test a: 19488ns
  • Test b: 18988ns

Cosine (Test 22)

This test determines the length of time required by the PIC32 chip to get the cosine of one 32-bit number (an int).

  • Test a: 20324ns
  • Test b: 19837ns


long long Performance


A long long data type, as implemented on the PIC32, is a value that holds 8 bytes, or 64 bits. For an unsigned long long this corresponds to a range of 0 to approximately 1.84467441 × 10^19 (2^64 - 1). If the variable is signed, the range is approximately -9.22337204 × 10^18 to 9.22337204 × 10^18 (-2^63 to 2^63 - 1). In this series of tests, tests (a) use predefined numbers and tests (b) use random numbers, in order to compare operations the compiler can compute in advance with operations the PIC must perform at run time.

Subtraction (Test 23)

This test determines the length of time required by the PIC32 chip to subtract one 64-bit number (a long long) from another 64-bit number (a long long).

  • Test a: 186ns
  • Test b: 150ns

Addition (Test 24)

This test determines the length of time required by the PIC32 chip to add one 64-bit number (a long long) to another 64-bit number (a long long).

  • Test a: 88ns
  • Test b: 200ns

Multiplication (Test 25)

This test determines the length of time required by the PIC32 chip to multiply one 64-bit number (a long long) by another 64-bit number (a long long).

  • Test a: 74ns
  • Test b: 398ns

Division (Test 26)

This test determines the length of time required by the PIC32 chip to divide one 64-bit number (a long long) by another 64-bit number (a long long).

  • Test a: 74ns
  • Test b: 1724ns

Square Root (Test 27)

This test determines the length of time required by the PIC32 chip to get the square root of one 64-bit number (a long long). Tests (a) and (b) use the 'sqrt()' method, while tests (c) and (d) use the expression ^(1/2). In C, ^ is bitwise XOR and 1/2 evaluates to 0, so tests (c) and (d) time an XOR with 0 rather than a square root.

  • Test a: 87ns
  • Test b: 16311ns
  • Test c: 188ns
  • Test d: 74ns

Sine (Test 28)

This test determines the length of time required by the PIC32 chip to get the sine of one 64-bit number (a long long).

  • Test a: 23837ns
  • Test b: 29898ns

Cosine (Test 29)

This test determines the length of time required by the PIC32 chip to get the cosine of one 64-bit number (a long long).

  • Test a: 24611ns
  • Test b: 30623ns


float Performance


Subtraction (Test 30)

This test determines the length of time required by the PIC32 chip to subtract one 32-bit number (a float) from another 32-bit number (a float).

  • Test a: 100ns
  • Test b: 900ns

Addition (Test 31)

This test determines the length of time required by the PIC32 chip to add one 32-bit number (a float) to another 32-bit number (a float).

  • Test a: 124ns
  • Test b: 1024ns

Multiplication (Test 32)

This test determines the length of time required by the PIC32 chip to multiply one 32-bit number (a float) by another 32-bit number (a float).

  • Test a: 124ns
  • Test b: 736ns

Division (Test 33)

This test determines the length of time required by the PIC32 chip to divide one 32-bit number (a float) by another 32-bit number (a float).

  • Test a: 99ns
  • Test b: 1674ns

Square Root (Test 34)

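This test determines the length of time required by the PIC32 chip to get the square root of one 32-bit number (a float). Tests (a) and (b) use the 'sqrt()' method, while tests (c) and (d) were meant to use the expression ^(1/2). C's ^ operator is bitwise XOR and accepts only integer operands, so it cannot be applied to a float, which is presumably why no result is reported for tests (c) and (d).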

  • Test a: 99ns
  • Test b: 8636ns
  • Test c: N/A
  • Test d: N/A

Sine (Test 35)

This test determines the length of time required by the PIC32 chip to get the sine of one 32-bit number (a float).

  • Test a: 19574ns
  • Test b: 19562ns

Cosine (Test 36)

This test determines the length of time required by the PIC32 chip to get the cosine of one 32-bit number (a float).

  • Test a: 20311ns
  • Test b: 20297ns


double Performance


Subtraction (Test 37)

This test determines the length of time required by the PIC32 chip to subtract one 64-bit number (a double) from another 64-bit number (a double).

  • Test a: 199ns
  • Test b: 1560ns

Addition (Test 38)

This test determines the length of time required by the PIC32 chip to add one 64-bit number (a double) to another 64-bit number (a double).

  • Test a: 199ns
  • Test b: 1236ns

Multiplication (Test 39)

This test determines the length of time required by the PIC32 chip to multiply one 64-bit number (a double) by another 64-bit number (a double).

  • Test a: 188ns
  • Test b: 1438ns

Division (Test 40)

This test determines the length of time required by the PIC32 chip to divide one 64-bit number (a double) by another 64-bit number (a double).

  • Test a: 187ns
  • Test b: 3184ns

Square Root (Test 41)

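This test determines the length of time required by the PIC32 chip to get the square root of one 64-bit number (a double). Tests (a) and (b) use the 'sqrt()' method, while tests (c) and (d) were meant to use the expression ^(1/2). C's ^ operator is bitwise XOR and accepts only integer operands, so it cannot be applied to a double, which is presumably why no result is reported for tests (c) and (d).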

  • Test a: 188ns
  • Test b: 7998ns
  • Test c: N/A
  • Test d: N/A

Sine (Test 42)

This test determines the length of time required by the PIC32 chip to get the sine of one 64-bit number (a double).

  • Test a: 20299ns
  • Test b: 20624ns

Cosine (Test 43)

This test determines the length of time required by the PIC32 chip to get the cosine of one 64-bit number (a double).

  • Test a: 19762ns
  • Test b: 20011ns

Code

Test 1

This is the first test.

Test 2

This is the second test.