I was looking into how a next-word prediction engine like SwiftKey or XT9 can be implemented.
Here's what I did.
- I read about n-grams here - en.wikipedia.org/wiki/N-gram and aicat.inf.ed.ac.uk/entry.php?id=663
- I read about Language Models/Markov Model/n-grams/training/Smoothing/Back-Offs - en.wikipedia.org/wiki/Language_model & www.stanford.edu/class/cs124/lec/languagemodeling.pptx & www.statmt.org/book/slides/07-language-models.pdf.
- I read about the T9 engine design for next-word prediction based on Tries - courses.cs.washington.edu/courses/cse303/09wi/homework/T9files/T9_Tries.pdf
- I came across SRILM, a popular toolkit for building & applying Language Models here - www.speech.sri.com/projects/srilm/ (the toolkit) & www.speech.sri.com/cgi-bin/run-distill?papers/icslp2002-srilm.ps.gz (the documentation)
- I came across the blog post in which Google's Peter Norvig announced that Google would share its huge training corpus of one trillion words with the entire world - googleresearch.blogspot.in/2006/08/all-our-n-gram-are-belong-to-you
- I came across an n-gram viewer based on the Google Books corpus - books.google.com/ngrams/
- I came across Microsoft's N-gram services - web-ngram.research.microsoft.com/
- I came across an algorithm for N-gram Language Models that is as fast as SRILM's model but has a smaller memory footprint (not based on tries, uses encoding) - nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf (I need to do more work here.)
- I had a look at some of the open-source engines available, like AnySoftKeyboard - github.com/AnySoftKeyboard. That is a huge amount of code with no documentation!
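To make the n-gram reading above concrete, here is my rough understanding of the idea as a minimal Python sketch. It counts bigrams and backs off to plain word frequencies when the previous word was never seen. The class name `BigramPredictor` and its methods are my own illustration, not the API of SRILM or of any keyboard engine, and there is no smoothing, so a real engine would likely differ a lot.

```python
from collections import Counter, defaultdict


class BigramPredictor:
    """Toy next-word predictor: bigram counts with back-off to unigram counts.

    Hypothetical illustration only; not how SwiftKey, XT9, or SRILM work internally.
    """

    def __init__(self):
        self.unigrams = Counter()            # word -> how often it occurs
        self.bigrams = defaultdict(Counter)  # previous word -> Counter of next words

    def train(self, text):
        words = text.lower().split()
        self.unigrams.update(words)
        # Count every adjacent (previous, next) pair in the training text.
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def predict(self, prev_word, k=3):
        # .get avoids creating an empty entry for an unseen word.
        followers = self.bigrams.get(prev_word.lower())
        if followers:
            return [w for w, _ in followers.most_common(k)]
        # Back off: the previous word was never seen, so suggest the
        # most frequent words overall.
        return [w for w, _ in self.unigrams.most_common(k)]


model = BigramPredictor()
model.train("the cat sat on the mat the cat ran")
print(model.predict("the", k=2))  # ['cat', 'mat']
```

As far as I understand, a real model would use longer n-grams, proper smoothing, and a far more compact storage format than Python dictionaries, which is where the tries and encodings mentioned above come in.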
Some discussions on Stack Overflow:
- Implementing T9 prediction engine - Implementing T9 text prediction
- A discussion on implementing autocomplete using tries vs. ternary search trees vs. succinct trees - stackoverflow.com/questions/10970416/tries-versus-ternary-search-trees-for-autocomplete
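For reference, this is how I currently picture the trie-based completion discussed in those threads: a minimal Python sketch, assuming each word is stored with a usage count and completions are ranked by that count. The names `TrieNode`, `Trie`, `insert`, and `complete` are hypothetical and chosen only for illustration. It does not cover the T9 digit-to-letter mapping or the more compact alternatives (ternary search trees, succinct trees).

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # > 0 means a word ends at this node


class Trie:
    """Toy prefix tree for word completion (hypothetical illustration)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, count=1):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def complete(self, prefix, k=3):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every word stored below that node.
        found = []
        stack = [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.count:
                found.append((cur.count, word))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        # Most frequent first; ties are broken alphabetically.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [w for _, w in found[:k]]


trie = Trie()
for word, count in [("the", 5), ("then", 2), ("there", 3), ("cat", 1)]:
    trie.insert(word, count)
print(trie.complete("the", k=2))  # ['the', 'there']
```

Collecting the whole subtree on every keystroke is obviously wasteful for a large dictionary, so I assume real engines cache the best completions per node or prune the search.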
The major players in this area:
- SwiftKey - en.wikipedia.org/wiki/SwiftKey & www.swiftkey.net/en/
- XT9 by Nuance - en.wikipedia.org/wiki/XT9 & www.nuance.com/for-business/by-product/xt9/index.htm
Can anybody guide me on how to proceed further?
I am relatively new to this site, so please let me know if my question is inappropriate here.