Vectorization and array computing

Vectorization, an extremely important concept in high-performance scientific computing, is the process of simultaneous execution of a set of computer instructions. This is contrary to the idea of looping and iteration which performs all program instructions sequentially. Vectorization can lead to significant runtime speed-up of the code.

Vectorization in MATLAB

Experienced programmers who are concerned with producing compact and fast programs, try to avoid for-loops wherever possible in their MATLAB codes. There is a reason for this: for-loops and while-loops have significant overhead in interpreted languages such as MATLAB and Python.

There is, of course, a remedy for this inefficiency. Since MATLAB is a matrix language, many of the matrix-level operations and functions are carried out internally using compiled C, Fortran, or assembly codes and are therefore executed at near-optimum efficiency. This is true of the arithmetic operators *, +,-,\, / and of relational and logical operators.

However, for-loops may be executed relatively slowly, depending on what is inside the loop, and MATLAB may or may not be able to optimize the loop. One of the most important tips for producing efficient M-files is to avoid for-loops in favor of vectorized constructs, that is, to convert for-loops into an equivalent vector or matrix operations. Vectorization has important benefits beyond simply increasing the speed of execution. It can lead to shorter and more readable MATLAB code. Furthermore, it expresses algorithms in terms of high-level constructs that are more appropriate for high-performance computing. For example, consider the process of summation of a random vector in MATLAB,

n = 5e7; x = randn(n,1);
tic, s = 0; for i=1:n, s = s + x(i)^2; end, toc

Elapsed time is 0.581945 seconds.

Now doing the same thing, using array notation would yield,

tic, s = sum(x.^2); toc

Elapsed time is 0.200450 seconds.

Amazing, isn’t it? You get almost 3x speedup in your MATLAB code if you use vectorized computation instead of for-loops. Later on, we will see that MATLAB has inherited these excellent vectorization techniques and syntax for matrix calculations from its high-performance ancestor, Fortran.

Exercise

How do you vectorize the following code?

i = 0;
for t = 0:.01:10
    i = i + 1;
    y(i) = sin(t);
end

Answer

t = 0:.01:10;
y = sin(t);

Vectorization of array operations

Vectorization of arrays can be done through array operators, which perform the same operation for all elements in the data set. These types of operations are useful for repetitive calculations. For example, suppose you collect the volume (V) of various cones by recording their diameter (D) and height (H). If you collect the information for just one cone, you can calculate the volume for that single cone as,

D = 0.2;
H = 0.04;
V = 1/12*pi*(D^2)*H

V =
   4.1888e-04

Now, suppose we collect information on 10,000 cones. The vectors D and H each contain 10,000 elements, and you want to calculate 10,000 volumes. In most programming languages (except Fortran and R which have similar vectorization capabilities to MATLAB), you need to set up a loop similar to this MATLAB code (here instead of 10000, I am using 7),

D = [-0.2 1.0 1.5 3.0 -1.0 4.2 3.14];
H = [0.0400 1.0000 2.2500 9.0000 1.0000 17.6400 9.8596];
for n = 1:7
   V(n) = 1/12*pi*(D(n)^2)*H(n);
end
V

V =
    0.0004    0.2618    1.3254   21.2058    0.2618   81.4640   25.4500

With MATLAB, you can perform the calculation for each element of a vector with similar syntax as the scalar case,

V = 1/12*pi*(D.^2).*H;  % Vectorized Calculation
V

V =
    0.0004    0.2618    1.3254   21.2058    0.2618   81.4640   25.4500

Important: Placing a period (.) before the operators *, /, and ^, transforms them into array operators.

Logical array operations

MATLAB comparison operators also accept vector inputs and return vector outputs. For example, suppose while collecting data from 10,000 cones, you record several negative values for the diameter. You can determine which values in a vector are valid with the >= operator,

D = [-0.2 1.0 1.5 3.0 -1.0 4.2 3.14];
D >= 0

ans =
     0     1     1     1     0     1     1

class(ans)

ans =
logical

You can directly exploit the logical indexing power of MATLAB to select the valid cone volumes, Vgood, for which the corresponding elements of D are nonnegative,

Vgood = V(D >= 0) % removing all data corresponding to negative diameters

Vgood =
    0.2618    1.3254   21.2058   81.4640   25.4500

MATLAB allows you to perform a logical AND or OR on the elements of an entire vector with the functions all and any, respectively. You can throw a warning if all values of D are below zero,

if all(D < 0) % gives no warning because not all values are negative
   warning('All values of diameter are negative.')
end

or,

if (D < 0)
   warning('Some values of diameter are negative.')
end

Warning: Some values of diameter are negative.

MATLAB can also compare two vectors of the same size, allowing you to impose further restrictions. This code finds all the values where V is nonnegative and D is greater than H,

D = [-0.2 1.0 1.5 3.0 -1.0 4.2 3.14];
H = [0.0400 1.0000 2.2500 1.5000 1.0000 0.6400 9.8596];
V((V >= 0) & (D > H))

ans =
   21.2058   81.4640

V =
    0.0004    0.2618    1.3254   21.2058    0.2618   81.4640   25.4500

(V >= 0) & (D > H)

ans =
     0     0     0     1     0     1     0

The resulting vector is the same size as the inputs. To aid comparison, MATLAB contains special values to denote overflow, underflow, and undefined operators, such as inf and nan. Logical operators isinf and isnan exist to help perform logical tests for these special values. For example, it is often useful to exclude NaN values from computations,

x = [2 -1 0 3 NaN 2 NaN 11 4 Inf];
xvalid = x(~isnan(x))

xvalid =
     2    -1     0     3     2    11     4   Inf

Warning: Note that Inf == Inf returns true; however, NaN == NaN always returns false in MATLAB.

Matrix Operations

Matrix operations act according to the rules of linear algebra. These operations are most useful in vectorization if you are working with multidimensional data. Suppose you want to evaluate a function, $F$, of two variables, $x$, and $y$,

$F(x,y) = x*\exp(-x2 - y2) ~.$

To evaluate this function at every combination of points in the $x$ and $y$, you need to define a grid of values,

x = -2:0.2:2;
y = -1.5:0.2:1.5;
[X,Y] = meshgrid(x,y);
F = X.*exp(-X.^2-Y.^2);

Without meshgrid(), you might need to write two for-loops to iterate through vector combinations. The function ndgrid() also creates number grids from vectors, but unlike meshgrid(), it can construct grids beyond three dimensions. meshgrid() can only construct 2-D and 3-D grids.

The following table contains a list of MATLAB functions that are commonly used in vectorized codes,

Table of MATLAB's most-widely used vectorization functions.
Function	Description
`all`	Determine if all array elements are nonzero or true
`any`	Determine if any array elements are nonzero
`cumsum`	Cumulative sum
`diff`	Differences and Approximate Derivatives
`find`	Find indices and values of nonzero elements
`ind2sub`	Subscripts from linear index
`ipermute`	Inverse permute dimensions of N-D array
`logical`	Convert numeric values to logicals
`meshgrid`	Rectangular grid in 2-D and 3-D space
`ndgrid`	Rectangular grid in N-D space
`permute`	Rearrange dimensions of N-D array
`prod`	Product of array elements
`repmat`	Repeat copies of array
`reshape`	Reshape array
`shiftdim`	Shift dimensions
`sort`	Sort array elements
`squeeze`	Remove singleton dimensions
`sub2ind`	Convert subscripts to linear indices
`sum`	Sum of array elements

Why is vectorized code faster than for-loops?

The reason for the speedup in vectorized has to be sought in the way the memory of the computer is built. The figure below represents a schematic diagram of the Central Processing Unit (CPU) of every modern computer in relationship with computer memory.

The hierarchy of memory in most modern computers and its relationship with the CPU.

At the highest level of the memory hierarchy, closest to the CPU, we have the CPU register. A processor register is a quickly accessible location available to a computer’s CPU. Registers usually consist of a small amount of fast storage and maybe read-only or write-only. The CPU has super fast access to data stored in the register. But the problem is that this memory is very small, typically on the orders of bits of information.

At the second level of the hierarchy of memory, we have the CPU cache, typically comprised of three different levels L1, L2, L3, which rank from fastest to slowest respectively, in terms of CPU access. However, the faster the cache memory, the smaller it is. Therefore, L1 is the fastest of the three, but also the smallest of the three levels.

CPU Caching was invented to solve a significant problem. In the early decades of computing, the main memory was extremely slow and incredibly expensive — but CPUs weren’t particularly fast, either. Starting in the 1980s, the gap began to widen quickly. Microprocessor clock speeds took off, but memory access times improved far less dramatically. As this gap grew, it became increasingly clear that a new type of fast memory was needed to bridge the gap. See the figure below.

The growing gap between the speed of DRAM memories and CPUs in time.

After CPU cache, there the Random Access Memory (RAM) which you hear the most about when you go to buy a new computer. Typical computers contain 4-32 Gb of RAM. When you open MATLAB and create some variables, all of your data is stored on this memory. However, this memory is the slowest of all in terms of access to CPU.

When you use for-loops in MATLAB to perform some specific calculations on a vector, you are asking MATLAB to go to this memory at each loop iteration to fetch an element of the loop, bring it to the CPU, perform the set of operations requested, and send it back to memory. However, the CPU is much more capable than doing a single calculation at a time. Therefore, if you could somehow tell MATLAB to fetch a bulk of elements from your vector and bring them to CPU to perform the requested operations, your code would become much faster. The way to tell MATLAB to do so is called vectorization. By vectorizing your code, you tell MATLAB to bring as much information as possible to the highest memory level close to CPU, to perform the operations on all of them simultaneously and return the result for all of them back to the memory altogether. This results in much faster code, since nowadays, as the figure above shows, the bottleneck in code performance is not the CPU speed, but the memory access.

Measuring the performance of your MATLAB functions and scripts

MATLAB has several built-in methods of timing how long it takes to run a MATLAB function or script. The timeit() function as well as tic and toc, are in particular very useful. Use the timeit() function for rigorous measurement of your function’s execution time. Use tic and toc to estimate time for smaller portions of code that are not complete functions.

For additional details about the performance of your code, such as function call information and execution time of individual lines of code, MATLAB has more sophisticated tools such as MATLAB® Profiler.

Timing MATLAB functions

To measure the time required to run a function, whether built-in or your own, you can use the timeit() function. The timeit() function calls the user-specified function multiple times and returns the median of the time measurements. This function takes a handle to the function whose performance is to be measured and returns the typical execution time, in seconds.

For example, suppose that you want to measure the performance of MATLAB’s built-in function, isprime() for a given input value to this function. You can compute the time to execute the function using timeit() like the following,

timeit( @()isprime(10^14) ) % pass the function as a handle to timeit()

ans =
    0.0787

Note that, this function isprime() will have different performance given different input numbers,

timeit( @()isprime(10^4) ) % pass the function as a handle to timeit()

ans =
   2.0402e-05

Time Portions of Code

To estimate how long a portion of your program takes to run or to compare the speed of different implementations of portions of your program, you can use MATLAB stopwatch timer functions: tic and toc. Invoking tic starts the timer, and the next toc reads the elapsed time.

tic
   % The program section to time. 
toc

Sometimes programs run too fast for tic and toc to provide useful data. If your code is faster than 1/10 second, consider timing it while running in a loop, and then average the result to find the time for a single run of the loop.

The `cputime()` function vs. `tic/toc` and `timeit()`

There is another MATLAB function that can do the timing of your scripts or your functions: The cputime() function measures the total CPU time and sums across all threads (cores) in the CPU. This measurement is different from the wall-clock time that timeit() or tic/toc return, and could be misleading. For example, the CPU time for the pause function is typically small, but the wall-clock time accounts for the actual time that MATLAB execution is paused. Therefore, the wall-clock time might be longer.

If your function uses four processing cores equally, the CPU time could be approximately four times higher than the wall-clock time.

Frequently, your best choice to measure the performance of your code is timeit() or tic and toc. These functions return wall-clock time. Note that, unlike tic and toc, the timeit() function calls your code multiple times, and, therefore, considers the cost of first-time calls to your functions, which are typically more time-consuming than subsequent calls.

Some tips for Measuring Performance

Always time a significant enough portion of code. Normally, the code that you are timing should take more than 1/10 second to run, otherwise, the timing may not be very accurate.
Put the code you are trying to time into a function instead of timing it at the command line or inside a script.
Unless you are trying to measure first-time cost of running your code, run your code multiple times. Use the timeit() function for multiple calls timing of your function.
Avoid clear all when measuring the performance of your MATLAB scripts. This will add additional time to wipe MATLAB workspace from all current existing variable definitions, and therefore contaminate the timing measurements of the actual code in your MATLAB scripts.
When performing timing measurements, assign your output to a variable instead of letting it default to ans().