I am implementing a paper that needs Dynamic Frequency Warping (DFW) as a component. The authors describe the algorithm only briefly and cite this paper: Voice transformation using PSOLA technique (page 9/13 of the PDF, section 3.3). I have read it but can't understand how it works. I've also searched on Google, but that didn't help much. As I understand it, DFW takes two spectra (A and B) as input and then calculates a warping matrix that converts A to B. Can someone give me a clearer intuition?
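To make my current understanding concrete, here is a minimal Python sketch of what I think DFW does. I am assuming it works like dynamic time warping applied along the frequency axis: build a cost matrix between the bins of A and the bins of B, find the cheapest monotonic path through it, and read that path as the warping function. This is my own guess, not necessarily the method in the cited paper. The function names `dfw_path` and `warp_spectrum`, the absolute-difference local cost, and the three-step path constraint are all assumptions of mine.

```python
import numpy as np

def dfw_path(spec_a, spec_b):
    """Cheapest monotonic alignment between the bins of two spectra (DTW-style)."""
    a = np.asarray(spec_a, dtype=float)
    b = np.asarray(spec_b, dtype=float)
    n, m = len(a), len(b)
    # Assumed local cost: absolute difference between bin i of A and bin j of B.
    cost = np.abs(a[:, None] - b[None, :])
    # Accumulated cost using the three standard DTW steps.
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0:
                best = min(best, acc[i - 1, j])
            if j > 0:
                best = min(best, acc[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, acc[i - 1, j - 1])
            acc[i, j] = cost[i, j] + best
    # Backtrack from the last bin pair to the first one.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path

def warp_spectrum(spec_a, path, m):
    """Resample A so that each target bin j reads from the source bins aligned to it."""
    a = np.asarray(spec_a, dtype=float)
    sums = np.zeros(m)
    counts = np.zeros(m)
    for i, j in path:
        sums[j] += i
        counts[j] += 1
    # Every target bin appears at least once on a DTW path, so counts is never zero.
    source_bins = sums / counts
    return np.interp(source_bins, np.arange(len(a)), a)

# Toy spectra: the same single peak, located at bin 3 in A and bin 5 in B.
A = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)
B = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)
path = dfw_path(A, B)
A_warped = warp_spectrum(A, path, len(B))
```

In this toy case the path pairs bin 3 of A with bin 5 of B, so warping A moves its peak onto B's peak. Is this the right picture, with the "warping matrix" being this path through the cost matrix?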
Thanks in advance.