I'm currently researching fusion methods for a multi-modal (video, audio, identity, user position, and gesture) human-computer interaction environment (think of a smart-home system). What is the current state of the art in this field, and which fusion methods do those systems use?
The most recent publication I found was a PhD thesis that relied on lip reading, but I don't think that methodology is reasonable in the kind of wide environment I'm considering.
Additionally, I found this publication, which fuses the various channels into "acts", but it appears to focus on semantic-level fusion, which strikes me as simplistic.
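
To make clear what I mean by semantic-level (decision-level) fusion, here is a minimal sketch of my own; the modality names, intents, and weighting scheme are invented for illustration and are not taken from the publication above:

```python
# Minimal illustration of decision-level ("semantic") fusion: each modality is
# classified independently, and only the symbolic outputs are combined, in
# contrast to feature-level fusion, where low-level features would be merged
# before a single classifier. Names and weights here are hypothetical.
from dataclasses import dataclass

@dataclass
class ModalityHypothesis:
    """A per-modality interpretation of the user's intent plus a confidence."""
    modality: str      # e.g. "speech", "gesture", "position"
    intent: str        # e.g. "turn_on_lights"
    confidence: float  # in [0, 1]

def late_fusion(hypotheses: list[ModalityHypothesis],
                weights: dict[str, float]) -> str:
    """Combine independent per-modality decisions by weighted voting."""
    scores: dict[str, float] = {}
    for h in hypotheses:
        w = weights.get(h.modality, 1.0)
        scores[h.intent] = scores.get(h.intent, 0.0) + w * h.confidence
    return max(scores, key=scores.get)

if __name__ == "__main__":
    hyps = [
        ModalityHypothesis("speech", "turn_on_lights", 0.8),
        ModalityHypothesis("gesture", "turn_on_lights", 0.6),
        ModalityHypothesis("position", "turn_on_tv", 0.4),
    ]
    # Weighted vote favours the intent supported by the most reliable channels.
    print(late_fusion(hyps, {"speech": 1.0, "gesture": 0.7, "position": 0.5}))
    # -> "turn_on_lights"
```

My concern is that this kind of fusion discards the low-level temporal and cross-modal correlations between channels, which is why I'm looking for approaches that go beyond it.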