I am having a problem with rank deficiency in a covariance matrix.
I have a data set of M variables and N observations, with M > N.
When I compute the singular value decomposition of the data set's covariance matrix (M x M), I find that it has rank N-1 rather than M.
Can someone explain why?
Below is a small-scale example:
data (M = 8 rows, one per variable; N = 5 columns, one per observation):
-0.3430 -1.4018 -0.1397 -0.7793 0.9132
1.6663 -1.3601 -0.9833 -1.7622 0.9764
-0.7667 -1.5217 0.4078 -1.9355 -1.5769
2.6355 1.0547 0.2430 1.5269 0.2041
-0.0168 -0.1278 0.3975 0.6787 -1.8883
0.3042 -1.4103 -0.1757 -2.2772 0.7362
0.6843 0.6029 -0.3175 -1.4286 1.1169
0.0558 -0.4569 -1.1016 -1.1146 0.7434
covariance matrix (M x M = 8 x 8):
0.7326 0.8063 0.1298 -0.3592 -0.6147 0.8389 0.3320 0.4167
0.8063 2.3060 0.1711 0.4710 -0.8912 1.6259 1.1083 0.9849
0.1298 0.1711 0.8714 -0.1736 0.2504 0.5108 0.0355 -0.2082
-0.3592 0.4710 -0.1736 1.0184 0.4131 -0.2144 -0.0841 -0.0075
-0.6147 -0.8912 0.2504 0.4131 1.0045 -0.8427 -0.7919 -0.7248
0.8389 1.6259 0.5108 -0.2144 -0.8427 1.5615 0.9651 0.7206
0.3320 1.1083 0.0355 -0.0841 -0.7919 0.9651 1.0335 0.6954
0.4167 0.9849 -0.2082 -0.0075 -0.7248 0.7206 0.6954 0.6295
Singular values:
5.7404 1.6008 1.3164 0.5000 0.0000 0.0000 0.0000 0.0000
MATLAB code used:
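M = 8;                      % number of variables
N = 5;                      % number of observations
rng(5);                     % fix the seed so the example is reproducible
data = randn(M, N);         % M x N: one variable per row
% cov treats rows as observations, so pass the N x M transpose.
% The result is M x M and symmetric, so no further transpose is needed.
cov_matrix = cov(data');
[U, S, V] = svd(cov_matrix);
fprintf('Rank of data: %d\n', rank(data));
fprintf('Rank of covariance: %d\n', rank(cov_matrix));
fprintf('Rank of S (singular value matrix): %d\n', rank(S));
figure;
plot(diag(S));              % singular values in descending order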
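One further check that may help narrow this down, as a minimal sketch: it rebuilds the covariance by hand from the mean-centered data listed above and compares ranks. The names X, Xc, C_manual and C_builtin are introduced here only for illustration, and the hand-built formula assumes cov uses its default N-1 normalization. bsxfun is used instead of implicit expansion so the snippet also runs on older releases.

```matlab
% Data from the example above: M = 8 variables (rows), N = 5 observations (columns).
data = [ ...
   -0.3430 -1.4018 -0.1397 -0.7793  0.9132; ...
    1.6663 -1.3601 -0.9833 -1.7622  0.9764; ...
   -0.7667 -1.5217  0.4078 -1.9355 -1.5769; ...
    2.6355  1.0547  0.2430  1.5269  0.2041; ...
   -0.0168 -0.1278  0.3975  0.6787 -1.8883; ...
    0.3042 -1.4103 -0.1757 -2.2772  0.7362; ...
    0.6843  0.6029 -0.3175 -1.4286  1.1169; ...
    0.0558 -0.4569 -1.1016 -1.1146  0.7434];
[M, N] = size(data);

X  = data';                             % N x M: one observation per row
Xc = bsxfun(@minus, X, mean(X, 1));     % subtract each variable's mean

C_manual  = (Xc' * Xc) / (N - 1);       % covariance built from centered data
C_builtin = cov(X);                     % covariance as computed by cov

% The two covariance matrices should agree up to rounding error.
fprintf('Max abs difference: %g\n', max(abs(C_manual(:) - C_builtin(:))));
% Every column of the centered data sums to (numerically) zero.
fprintf('Largest column sum of Xc: %g\n', max(abs(sum(Xc, 1))));
fprintf('Rank of centered data: %d\n', rank(Xc));
fprintf('Rank of covariance: %d\n', rank(C_builtin));
```

If both ranks come out as N-1, that would point to the mean subtraction inside cov as the step where one degree of freedom is lost: the N rows of Xc sum to zero, so at most N-1 of them can be linearly independent, and Xc' * Xc cannot have higher rank than Xc itself.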