Vectorization tricks for cell arrays in MATLAB
Motivations
As is known, MATLAB is notoriously slow in executing for
loops. Looping through a large array is usually a nightmare, even more so if we add if/then
within, and/or for sparse matrices. Nevertheless, MATLAB is highly optimized in vectorized array and matrix operations using the LAPACK/BLAS backend, and as an interpreted/scripting language, MATLAB
operates in 4k by 4k matrices hundreds even thousands of times faster than direct implementations in compiled languages like C/C++ and Java.
Recently I am learning to code “virtual element method” for Long Chen’s $i$FEM. As it requires a polytopal data structure, the biggest difference with traditional finite element is:
The number of indices stored in each element is of variable lengths.
This poses some great challenges when we want to write a general purpose code to retrieve sparse matrices based on the topological structures.
In numpy
, the solution is usually adding paddings. Yet MATLAB has a more neat solution to this, which is the cell array:
a = {[2,3], [], [2, 3, 1]}
then accessing the element of a
can either be done through {}
or ()
indexing, with the latter being the preferred way of manipulating it as a(i)
returns the $i$-th entry as a cell, while a{i}
will yield a matrix or vector whenever applicable.
First example
Consider the following scripts to generate something like a variable length dice rolling simulation:
N = 100000;
result = cell(N,1);
nRoll = randi(10, [N,1]); % num of rolls each trial is between 1 to 10
for i = 1:N
result{i} = randi(6, [1, nRoll(i)]);
end
the first 5 elements of result
may look like
result
ans =
[6,6,5,5,1,4]
[2,1,3]
[1,4,4,6,3,4]
[4,6,1,1,1,6,6,2,6]
[5,6,2,4,4,4,6,4,4,6]
Now we want to retrieve the number of rolling in each sequence (i.e., recovering nRoll
from result
), the first way is through a for
loop:
avg1 = zeros(N,1);
for i = 1:N
avg1(i) = length(result{i});
end
On my old Retina Macbook Pro 2013, 10 runs yield an average of 0.03
seconds to execute. However, if we apply a built-in function through cellfun
:
avg2 = cellfun('length', result);
the average time is only 0.004
seconds for 10 runs. While the speed-up may not sound so significant, let us move to the next example.
Element-wise matrix vector multiplication
Now suppose we have a bunch of matrices $A$ stored in a 3D array, we want to compute element by element matrix vector multiplication $b = A\,\mathbf{x}$. An example setup can be:
N = 100000;
n = 10;
A = rand(N,n,n);
x = rand(N,n);
We want to obtain another array b
such that b(i,:) = A(i,:,:)*x(i,:)
, of course the simplest is using a for
loop to execute this literally:
b = zeros(N,n);
for i = 1:N
b(i,:) = A(i,:,:)*x(i,:);
end
It takes about 1.5
seconds. Now we vectorize this routine using bsxfun
:
b = sum(bsxfun(@times, A, x), 2);
This takes about 0.03
seconds, which is about 50 times speed-up.
Remark on tensor prod
Note that this is NOT a tensor product between these two arrays, as
\([A_1, A_2] \otimes [\mathbf{x}_1, \mathbf{x}_2] = \begin{pmatrix} A_1 \mathbf{x}_1 & A_1 \mathbf{x}_2 \\ A_2 \mathbf{x}_1 & A_2 \mathbf{x}_2 \end{pmatrix}\) computes more off-diagonal entries than the element-wise matrix vector product. The method here can be viewed as a vectorization for retrieving the diagonal blocks of a general tensor product between 3D arrays.
Use accumarray as np.append for cells
MATLAB has a built-in function dedicated to vectorization which is accumarray
which adds up VALS
in SUBS
by default in the expression of:
A = accumarray(subs,val,sz)
For example, if you have say
subs = [1 2 3 8 1 3 5 6 1 5 5]';
ix = accumarray(subs,1);
and ix
will be [3 1 2 0 3 1 0 1]
, by letting VALS
be 1, we are just recording the number of times each number appearing in the subs
array.
Now say we do not want to add up the times, instead we just want to append 1’s each time an index appears in the array, i.e., we want to get a cell array like the following:
[1,1,1]
1
[1,1]
[]
[1,1,1]
1
[]
1
1
is appended three times as 1
appears three times in subs
, and empty cells represent that number has not appeared once in the subs
. There is a single line of code can do this:
ix = accumarray(subs,1,[],@(x) {x});
which takes advantage of the fourth variable in accumarray
can be a function handle.
mat2cell is cell array’s reshape
reshape
function plays somewhat a central role in MATLAB vectorization for matrices and vectors. For example, to convert back and forth between :
(linear indexing) and the [row,col]
indices, sometimes there are too many pointers involved the MATLAB built-in sub2ind
does not meed our needs; or sometimes we would like to do some row-wise reshape instead of the default column-based linear indexing, reshape
are handy.
In the context of cell array, reshape
cannot be used. There is a function num2cell
but can only output uniform result (no variable length arrays within each cell). mat2cell
is here to rescue.
Suppose we have the following scenerio, we have a cell array recording how the connection of a node with neighbors in a graph:
C = {
[2, 3, 4, 6], ...
[1, 5, 4], ...
[2, 4], ...
[2], ...
[4, 6], ...
[1, 2, 3, 4, 5]
}
Say we would like to check which nodes are connected with node 2. If C
has consistent sizes, a simple C == 2
yields a logical matrix that gives you in this information; if an indicator is more preferred, sum(C==2, 2)
or any(C==2,2)
will do the trick. With more pointers involved, you may want to use reshape(C(:) == 2, size(C))
as well. However, none of these simple elegant code works for cell array.
Nevertheless, there does exist a simple elegant code to achieve something like reshape(C(:) == 2, size(C))
for cell array. First we can flatten the cell array by :
syntax:
C_flat = [C{:}];
or
C_flat = horzcat(C{:});
Then we obtain the logical array we want for this flattened array:
isConnected = (C_flat == 2);
Now we would like to reshape isConnected
to have the shape of C
, the first row has 4 indicators, the second row has 3, etc. We can do this by a trick used earlier:
sizeC = cellfun(@length, C);
isConnected = mat2cell(isConnected, sizeC);
Now isConnected
has the same shape as C
. To get an indicator array for each cell, cellfun
can be used again:
cellfun(@any, isConnected)
A less-convoluted example
(Updated July 27, 2021)
In the comment, Jan Valdman asked how we can convert the following code to a one-liner: for the C
defined above
v=[(1:8)' (2:9)']
v =
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
Now say we want to vectorize the following routine:
vC={v(C{1},:), v(C{2},:), v(C{3},:),v(C{4},:), v(C{5},:), v(C{6},:)}
I think this is actually a more illustrative example than the one in my original post. So let us give a try. Using the mat2cell
trick above again, the full code should be:
sizeC = cellfun(@length, C); % get the length of each cell
C_flat = [C{:}]; % flatten the cell array
vC = mat2cell(v(C_flat,:), sizeC); % reshape the flattened array to the shape of C
Final remark on cellfun
Notice that if we use some customized function handle on cellfun
, for example,
cellfun(@(x) yourOwnFunction(x), yourCellArray,'UniformOutput',false)
we will lose some of the charm in speedup as MATLAB reverts back to essentially executing for
loops.
Linear indexing for cell array
Continuing from the previous section, we may encounter some scenarios such as accessing $t$-th cell’s $i$-th element in a cell array, with $t$ and $i$ both changing. In the context of matrix vectorization, using linear indexing with sub2ind
and ind2sub
are extremely useful, however, there is no such thing for cell array. We can construct a linear indexing for cell array again using some of the tricks above.
Say we have:
T = {
[1,10,11,2],
[2,11,12,7,3],
[11,28,29,21,18,12],
[10,27,28,11],
[27,34,35,28],
[28,35,36,37,29],
[35,40,41,36],
[36,41,42,37],
[41,45,46,42]
}
and we want to build an array A
such that A(j) = T{t(j)}(i(j))
, a for
loop can be used as follows:
t = [1 1 3 4 9];
i = [2 3 3 1 4];
for j = 1:length(t)
A(j) = T{t(j)}(i(j));
end
and
A
ans =
[10 11 29 10 42]
BTW here we assume t
is sorted in ascended order, and i
is well-posed in that it does not exceed the size of the corresponding cell.
The vectorization of the routine above can benefit from the linear indexing of the cell array T
: if we can convert T
to its linear indexed counterpart T_flat
, and retrieving the linear ind
for (t,i)
, then constructing A
is as simple as A = T_linear(ind)
.
The routine can be as follows: first we flatten T
as before:
T_flat = [T{:}]';
To construct the linear indexing we need the length array again and using cumsum
:
sZ = cellfun('length', T);
ix = [0; cumsum(sZ(1:end-1))]';
The array ix
has same dimension with T
and acts in place of the mod
operator used in ind2sub
:
ix
ans =
[0 4 9 15 19 23 28 32 36]
Now if we want to retrieve the say the 4th cell’s 2nd entry, ix(4)+2
would yield the correct linear index for T_flat
: now a single line of code yield the same result as before.
T_flat(ix(t)+i)'
ans =
[10 11 29 10 42]
Applications
Some routines above are used in functions on retrieving topological relation and obtaining topological indicators related to geometries for edges in a polygonal mesh here. The journal article references is (Cao, 2021).
Comments