Vectorization tricks for cell arrays in MATLAB

9 minute read

Motivations

As is known, MATLAB is notoriously slow in executing for loops. Looping through a large array is usually a nightmare, even more so if we add if/then within, and/or for sparse matrices. Nevertheless, MATLAB is highly optimized in vectorized array and matrix operations using the LAPACK/BLAS backend, and as an interpreted/scripting language, MATLAB operates in 4k by 4k matrices hundreds even thousands of times faster than direct implementations in compiled languages like C/C++ and Java.

Recently I am learning to code “virtual element method” for Long Chen’s $i$FEM. As it requires a polytopal data structure, the biggest difference with traditional finite element is:

The number of indices stored in each element is of variable lengths.

This poses some great challenges when we want to write a general purpose code to retrieve sparse matrices based on the topological structures. In numpy, the solution is usually adding paddings. Yet MATLAB has a more neat solution to this, which is the cell array:

a = {[2,3], [], [2, 3, 1]}

then accessing the element of a can either be done through {} or () indexing, with the latter being the preferred way of manipulating it as a(i) returns the $i$-th entry as a cell, while a{i} will yield a matrix or vector whenever applicable.

First example

Consider the following scripts to generate something like a variable length dice rolling simulation:

N = 100000;
result = cell(N,1);
nRoll = randi(10, [N,1]); % num of rolls each trial is between 1 to 10
for i = 1:N
  result{i} = randi(6, [1, nRoll(i)]);
end

the first 5 elements of result may look like

result
ans =
    [6,6,5,5,1,4]
    [2,1,3]
    [1,4,4,6,3,4]
    [4,6,1,1,1,6,6,2,6]
    [5,6,2,4,4,4,6,4,4,6]

Now we want to retrieve the number of rolling in each sequence (i.e., recovering nRoll from result), the first way is through a for loop:

avg1 = zeros(N,1);
for i = 1:N
    avg1(i) = length(result{i});
end

On my old Retina Macbook Pro 2013, 10 runs yield an average of 0.03 seconds to execute. However, if we apply a built-in function through cellfun:

avg2 = cellfun('length', result);

the average time is only 0.004 seconds for 10 runs. While the speed-up may not sound so significant, let us move to the next example.

Element-wise matrix vector multiplication

Now suppose we have a bunch of matrices $A$ stored in a 3D array, we want to compute element by element matrix vector multiplication $b = A\,\mathbf{x}$. An example setup can be:

N = 100000;
n = 10;
A = rand(N,n,n);
x = rand(N,n);

We want to obtain another array b such that b(i,:) = A(i,:,:)*x(i,:), of course the simplest is using a for loop to execute this literally:

b = zeros(N,n);
for i = 1:N
  b(i,:) = A(i,:,:)*x(i,:);
end

It takes about 1.5 seconds. Now we vectorize this routine using bsxfun:

b = sum(bsxfun(@times, A, x), 2);

This takes about 0.03 seconds, which is about 50 times speed-up.

Remark on tensor prod

Note that this is NOT a tensor product between these two arrays, as

\([A_1, A_2] \otimes [\mathbf{x}_1, \mathbf{x}_2] = \begin{pmatrix} A_1 \mathbf{x}_1 & A_1 \mathbf{x}_2 \\ A_2 \mathbf{x}_1 & A_2 \mathbf{x}_2 \end{pmatrix}\) computes more off-diagonal entries than the element-wise matrix vector product. The method here can be viewed as a vectorization for retrieving the diagonal blocks of a general tensor product between 3D arrays.

Use accumarray as np.append for cells

MATLAB has a built-in function dedicated to vectorization which is accumarray which adds up VALS in SUBS by default in the expression of:

A = accumarray(subs,val,sz)

For example, if you have say

subs = [1 2 3 8 1 3 5 6 1 5 5]';
ix = accumarray(subs,1);

and ix will be [3 1 2 0 3 1 0 1], by letting VALS be 1, we are just recording the number of times each number appearing in the subs array.

Now say we do not want to add up the times, instead we just want to append 1’s each time an index appears in the array, i.e., we want to get a cell array like the following:

[1,1,1]
1
[1,1]
[]
[1,1,1]
1
[]
1

1 is appended three times as 1 appears three times in subs, and empty cells represent that number has not appeared once in the subs. There is a single line of code can do this:

ix = accumarray(subs,1,[],@(x) {x});

which takes advantage of the fourth variable in accumarray can be a function handle.

mat2cell is cell array’s reshape

reshape function plays somewhat a central role in MATLAB vectorization for matrices and vectors. For example, to convert back and forth between : (linear indexing) and the [row,col] indices, sometimes there are too many pointers involved the MATLAB built-in sub2ind does not meed our needs; or sometimes we would like to do some row-wise reshape instead of the default column-based linear indexing, reshape are handy.

In the context of cell array, reshape cannot be used. There is a function num2cell but can only output uniform result (no variable length arrays within each cell). mat2cell is here to rescue.

Suppose we have the following scenerio, we have a cell array recording how the connection of a node with neighbors in a graph:

C = {
      [2, 3, 4, 6], ...
      [1, 5, 4], ...
      [2, 4], ...
      [2], ...
      [4, 6], ...
      [1, 2, 3, 4, 5]
    }

Say we would like to check which nodes are connected with node 2. If C has consistent sizes, a simple C == 2 yields a logical matrix that gives you in this information; if an indicator is more preferred, sum(C==2, 2) or any(C==2,2) will do the trick. With more pointers involved, you may want to use reshape(C(:) == 2, size(C)) as well. However, none of these simple elegant code works for cell array.

Nevertheless, there does exist a simple elegant code to achieve something like reshape(C(:) == 2, size(C)) for cell array. First we can flatten the cell array by : syntax:

C_flat = [C{:}];

or

C_flat = horzcat(C{:});

Then we obtain the logical array we want for this flattened array:

isConnected = (C_flat == 2);

Now we would like to reshape isConnected to have the shape of C, the first row has 4 indicators, the second row has 3, etc. We can do this by a trick used earlier:

sizeC = cellfun(@length, C);
isConnected = mat2cell(isConnected, sizeC);

Now isConnected has the same shape as C. To get an indicator array for each cell, cellfun can be used again:

cellfun(@any, isConnected)

A less-convoluted example

(Updated July 27, 2021) In the comment, Jan Valdman asked how we can convert the following code to a one-liner: for the C defined above

v=[(1:8)' (2:9)']

v =
     1     2
     2     3
     3     4
     4     5
     5     6
     6     7
     7     8
     8     9

Now say we want to vectorize the following routine:

vC={v(C{1},:), v(C{2},:), v(C{3},:),v(C{4},:), v(C{5},:), v(C{6},:)}

I think this is actually a more illustrative example than the one in my original post. So let us give a try. Using the mat2cell trick above again, the full code should be:

sizeC = cellfun(@length, C); % get the length of each cell
C_flat = [C{:}]; % flatten the cell array
vC = mat2cell(v(C_flat,:), sizeC); % reshape the flattened array to the shape of C

Final remark on cellfun

Notice that if we use some customized function handle on cellfun, for example,

cellfun(@(x) yourOwnFunction(x), yourCellArray,'UniformOutput',false)

we will lose some of the charm in speedup as MATLAB reverts back to essentially executing for loops.

Linear indexing for cell array

Continuing from the previous section, we may encounter some scenarios such as accessing $t$-th cell’s $i$-th element in a cell array, with $t$ and $i$ both changing. In the context of matrix vectorization, using linear indexing with sub2ind and ind2sub are extremely useful, however, there is no such thing for cell array. We can construct a linear indexing for cell array again using some of the tricks above.

Say we have:

T = {
      [1,10,11,2], 
      [2,11,12,7,3],
      [11,28,29,21,18,12],
      [10,27,28,11],
      [27,34,35,28],
      [28,35,36,37,29],
      [35,40,41,36],
      [36,41,42,37],
      [41,45,46,42]
    }

and we want to build an array A such that A(j) = T{t(j)}(i(j)), a for loop can be used as follows:

t = [1 1 3 4 9];
i = [2 3 3 1 4];
for j = 1:length(t)
    A(j) = T{t(j)}(i(j));
end

and

A
ans = 
  [10 11 29 10 42]

BTW here we assume t is sorted in ascended order, and i is well-posed in that it does not exceed the size of the corresponding cell.

The vectorization of the routine above can benefit from the linear indexing of the cell array T: if we can convert T to its linear indexed counterpart T_flat, and retrieving the linear ind for (t,i), then constructing A is as simple as A = T_linear(ind).

The routine can be as follows: first we flatten T as before:

T_flat = [T{:}]';

To construct the linear indexing we need the length array again and using cumsum:

sZ = cellfun('length', T);
ix = [0; cumsum(sZ(1:end-1))]';

The array ix has same dimension with T and acts in place of the mod operator used in ind2sub:

ix
ans = 
    [0 4 9 15 19 23 28 32 36]

Now if we want to retrieve the say the 4th cell’s 2nd entry, ix(4)+2 would yield the correct linear index for T_flat: now a single line of code yield the same result as before.

T_flat(ix(t)+i)'
ans = 
    [10  11  29  10  42]

Applications

Some routines above are used in functions on retrieving topological relation and obtaining topological indicators related to geometries for edges in a polygonal mesh here. The journal article references is (Cao, 2021).

  1. Cao, S. (2021). A simple virtual element-based flux recovery on quadtree. In Electronic Research Archive: Special issue on recent advances in numerical analysis. https://doi.org/doi:10.3934/era.2021054

Comments