2

I have a text documents that contain 1000s of abstracts from medical whitepapers. I want to find themes from that text. Any suggestions other than text clustering since clustering helped me to find keywords arranged in a cluster. I tried to auto summarize using markovify library in python but the summary it created or the sentences did not make much sense. Any suitable suggestions are welcome. Thank you

sona_1105
  • 43
  • 1
  • 6

1 Answers1

3

The best method to find themes in a collection of documents is topic modeling. Topic modeling finds the hidden (aka, latent) themes beyond just keyword counts.

There are many approaches to topic modeling. Latent Dirichlet allocation (LDA) is a standard topic modeling approach. LDA is a probabilistic graphical model that assumes that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. The number of topics is a selected hyperparameter.

Brian Spiering
  • 23,131
  • 2
  • 29
  • 113