I have a text documents that contain 1000s of abstracts from medical whitepapers. I want to find themes from that text. Any suggestions other than text clustering since clustering helped me to find keywords arranged in a cluster. I tried to auto summarize using markovify library in python but the summary it created or the sentences did not make much sense. Any suitable suggestions are welcome. Thank you
Asked
Active
Viewed 9,286 times
1 Answers
3
The best method to find themes in a collection of documents is topic modeling. Topic modeling finds the hidden (aka, latent) themes beyond just keyword counts.
There are many approaches to topic modeling. Latent Dirichlet allocation (LDA) is a standard topic modeling approach. LDA is a probabilistic graphical model that assumes that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. The number of topics is a selected hyperparameter.
Brian Spiering
- 23,131
- 2
- 29
- 113