# Vectorization tricks on cell array manipulation in MATLAB

## Prologue

As is known, MATLAB is notoriously slow in executing `for`

loops. Looping through a large array is usually a nightmare, even more so if you add `if/then`

within, and/or for sparse matrices. Nevertheless, MATLAB is highly optimized in vectorized array and matrix operations using the LAPACK/BLAS backend, and as an interpreted/scripting language, MATLAB
operates in 4k by 4k matrices hundreds even thousands of times faster than direct implementations
in compiled languages like C/C++ and Java^{1}.

Recently I am learning to code “virtual element method” for Long Chen’s $i$FEM^{2}, as it requires a polytopal data structure, in which the number of indices stored in each element is of variable length. In `numpy`

, the solution is usually adding paddings^{3}. Yet MATLAB has a more neat solution to this, which is the cell array:

```
a = {[2,3], [], [2, 3, 1]}
```

then accessing the element of `a`

can either be done through `{}`

or `()`

indexing, with the latter being the preferred way of manipulating it.

## Motivating benchmark

Consider the following scripts to generate something like a variable length’d dice rolling simulation:

```
N = 100000;
result = cell(N,1);
nRoll = randi(10, [N,1]); % num of rolls each trial is between 1 to 10
for i = 1:N
result{i} = randi(6, [1, nRoll(i)]);
end
```

the first 5 elements of `result`

may look like

```
[6,6,5,5,1,4]
[2,1,3]
[1,4,4,6,3,4]
[4,6,1,1,1,6,6,2,6]
[5,6,2,4,4,4,6,4,4,6]
```

Now we want to retrieve the number of rolling in each sequence (i.e., recovering `nRoll`

from `result`

), the first way is through a `for`

loop:

```
avg1 = zeros(N,1);
for i = 1:N
avg1(i) = length(result{i});
end
```

On my old Retina Macbook Pro 2013, 10 runs yield an average of `0.03`

seconds to execute. However, if we apply a built-in function through `cellfun`

:

```
avg2 = cellfun('length', result);
```

the average time is only `0.004`

seconds for 10 runs. While the speed-up may not sound so significant, let us move to the next example.

## Element-wise matrix vector multiplication

Now suppose we have a bunch of matrices $A$ stored in a 3D array, we want to compute element by element matrix vector multiplication $b = A\,\mathbf{x}$. An example setup can be:

```
N = 100000;
n = 10;
A = rand(N,n,n);
x = rand(N,n);
```

We want to obtain another array `b`

such that `b(i,:) = A(i,:,:)*x(i,:)`

, of course the simplest is using a `for`

loop to execute this literally:

```
b = zeros(N,n);
for i = 1:N
b(i,:) = A(i,:,:)*x(i,:);
end
```

It takes about `1.5`

seconds. Now we vectorize this routine using `bsxfun`

:

```
b = sum(bsxfun(@times, A, x), 2);
```

This takes about `0.03`

seconds, which is about 50 times speed-up.

#### Remark

Note that this is NOT a tensor product between these two arrays, as

\([A_1, A_2] \otimes [\mathbf{x}_1, \mathbf{x}_2] = \begin{pmatrix} A_1 \mathbf{x}_1 & A_1 \mathbf{x}_2 \\ A_2 \mathbf{x}_1 & A_2 \mathbf{x}_2 \end{pmatrix}\) computes more off-diagonal entries than the element-wise matrix vector product. The method here can be viewed as a vectorization for retrieving the diagonal blocks of a general tensor product between 3D arrays.

## Use `accumarray`

as `np.append`

but for cells

MATLAB has a built-in function dedicated to vectorization which is `accumarray`

which adds up `VALS`

in `SUBS`

by default in the expression of:

```
A = accumarray(subs,val,sz)
```

For example, if you have say

```
subs = [1 2 3 8 1 3 5 6 1 5 5]';
ix = accumarray(subs,1);
```

and `ix`

will be `[3 1 2 0 3 1 0 1]`

, by letting `VALS`

be 1, we are just recording the number of times each number appearing in the `subs`

array.

Now say we do not want to add up the times, instead we just want to append 1’s each time an index appears in the array, i.e., we want to get a cell array like the following:

```
[1,1,1]
1
[1,1]
[]
[1,1,1]
1
[]
1
```

`1`

is appended three times as `1`

appears three times in `subs`

, and empty cells represent that number has not appeared once in the `subs`

. There is a single line of code can do this:

```
ix = accumarray(subs,1,[],@(x) {x});
```

which takes advantage of the fourth variable in `accumarray`

can be a function handle.

`mat2cell`

is cell array’s `reshape`

`reshape`

function plays somewhat a central role in MATLAB vectorization for matrices and vectors. For example, to convert back and forth between `:`

(linear indexing) and the `[row,col]`

indices, sometimes there are too many pointers involved the MATLAB built-in `sub2ind`

does not meed our needs; or sometimes we would like to do some row-wise reshape instead of the default column-based linear indexing, `reshape`

are handy.

In the context of cell array, `reshape`

cannot be used. There is a function `num2cell`

but can only output uniform result (no variable length arrays within each cell). `mat2cell`

is here to rescue.

Suppose we have the following scenerio, we have a cell array recording how the connection of a node with neighbors in a graph:

```
C = {
[2, 3, 4, 6],
[1, 5, 4],
[2, 4],
[2],
[4, 6],
[1, 2, 3, 4, 5]
}
```

Say we would like to check which nodes are connected with node 2. If `C`

has consistent sizes, a simple `C == 2`

yield a logical matrix that gives you in this information; if an indicator is more preferred, `sum(C==2, 2)`

or `any(C==2,2)`

will do the trick. With more pointers involved, you may want to use `reshape(C(:) == 2, size(C))`

as well. However, none of these simple elegant code works for cell array.

However, there does exist a simple elegant code to achieve something like `reshape(C(:) == 2, size(C))`

for cell array. First we can **flatten** the cell array by `:`

syntax:

```
C_flat = [C{:}];
```

or

```
C_flat = horzcat(C{:});
```

Then we obtain the logical array we want for this flattened array:

```
isConnected = (C_flat == 2);
```

Now we would like to reshape `isConnected`

to have the shape of `C`

, the first row has 4 indicators, the second row has 3, etc. We can do this by a trick used earlier:

```
sizeC = cellfun(@length, C);
isConnected = mat2cell(isConnected, sizeC);
```

Now `isConnected`

has the same shape as `C`

, to get an indicator array, `cellfun`

can be used again:

```
cellfun(@any, isConnected)
```

#### Final remark on `cellfun`

Notice that if we use some customized function handle on `cellfun`

, for example,

```
cellfun(@(x) yourOwnFunction(x), yourCellArray,'UniformOutput',false)
```

we will lose some of the charm in speedup as MATLAB reverts back to essentially executing `for`

loops.

## Linear indexing for cell array

Continuing from the previous section, we may encounter some scenarios such as accessing $t$-th cell’s $i$-th element in a cell array, with $t$ and $i$ both changing. In the context of matrix vectorization, using linear indexing with `sub2ind`

and `ind2sub`

are extremely useful, however, there is no such thing for cell array. We can construct a linear indexing for cell array again using some of the tricks above.

Say we have:

```
T = {[1,10,11,2],
[2,11,12,7,3],
[11,28,29,21,18,12],
[10,27,28,11],
[27,34,35,28],
[28,35,36,37,29],
[35,40,41,36],
[36,41,42,37],
[41,45,46,42]}
```

and we want to build an array `A`

such that `A(j) = T{t(j)}(i(j))`

, a `for`

loop can be used as follows:

```
t = [1 1 3 4 9];
i = [2 3 3 1 4];
for j = 1:length(t)
A(j) = T{t(j)}(i(j));
end
```

and

```
A
ans =
[10 11 29 10 42]
```

BTW here we assume `t`

is sorted in ascended order, and `i`

is well-posed in that it does not exceed the size of the corresponding cell.

The vectorization of the routine above can benefit from the linear indexing of the cell array `T`

: if we can convert `T`

to its linear indexed counterpart `T_flat`

, and retrieving the linear `ind`

for `(t,i)`

, then constructing `A`

is as simple as `A = T_linear(ind)`

.

The routine can be as follows: first we flatten `T`

as before:

```
T_flat = [T{:}]';
```

To construct the linear indexing we need the length array again and using `cumsum`

:

```
sZ = cellfun('length', T);
ix = [0; cumsum(sZ(1:end-1))]';
```

The array `ix`

has same dimension with `T`

and acts in place of the `mod`

operator used in `ind2sub`

:

```
ix
ans =
[0 4 9 15 19 23 28 32 36]
```

Now if you want to retrieve the say the 4th cell’s 2nd entry, `ix(4)+2`

would yield the correct linear index for `T_flat`

: now a single line of code yield the same result as before.

```
T_flat(ix(t)+i)'
ans =
[10 11 29 10 42]
```

## Applications

Some of the routines above are used in functions on retrieving topological relation and obtaining topological indicators related to geometries for edges in a polygonal mesh here.

## Comments