This question refers to the following paper:

W. M. Campbell, J. P. Campbell, D. A. Reynolds, E. Singer, P. A. Torres-Carrasquillo, "Support Vector Machines for Speaker and Language Recognition", Computer Speech and Language 20 (2006) 210-229.

I am trying to implement the algorithm in Table 1 and Table 2 on page 18. In step 6 of Table 1 they calculate $b_z^i$ as a mean (or sum) of $b(z_i)$, and the number of entries is $N_z$, which they claim to be the number of features.

The question is: what is $N_z$ here? As I understand it, each feature set, which is of dimension $N_z$, has been used to create $b(z_i)$, so what does this summation mean? One can only sum over the time dimension, which has nothing to do with $N_z$. $N_z$ is a kind of spatial dimension, since one time frame of data is converted to features.

D.W.
Creator

1 Answer

$N_z$ is the number of frames in the utterance, so it is exactly the time dimension. Instead of "number of features" they should say "number of feature vectors".
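
To make the roles of the two dimensions concrete, here is a minimal sketch of the averaging step, assuming $b(\cdot)$ is a monomial expansion up to degree 2. The function names `poly_expand` and `average_expansion`, the degree, and the toy data are my own illustrative choices, not taken from the paper. The point is only that the sum runs over the $N_z$ frames, while the length of each expanded vector depends on the per-frame feature dimension.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(z, degree=2):
    """Return all monomials of the entries of z up to `degree`, constant term first.

    Assumption: this stands in for the expansion b(z); the paper's exact
    ordering and degree may differ.
    """
    terms = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(len(z)), d):
            # prod of an empty sequence is 1, which gives the constant term
            terms.append(prod(z[i] for i in idx))
    return terms


def average_expansion(frames, degree=2):
    """Average b(z_i) over the frames z_1 .. z_{N_z} of one utterance."""
    n_z = len(frames)  # N_z: number of feature vectors (frames), i.e. time
    expanded = [poly_expand(z, degree) for z in frames]
    # Sum column-wise across frames, then divide by N_z
    return [sum(col) / n_z for col in zip(*expanded)]


# Toy utterance: N_z = 3 frames, each a 2-dimensional feature vector
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b_bar = average_expansion(frames)
```

Here `len(frames)` is 3 (the time dimension, $N_z$), while `len(b_bar)` is 6, which is fixed by the feature dimension and the expansion degree and does not change with the number of frames.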

Nikolay Shmyrev