I ran fuzzy clustering for topic modelling and got a set of topics.
There are 50 topics in total (0 to 49), and each topic consists of 30 words, each with a probability multiplicative factor. How do I turn this into input for a classifier? My final goal is document classification.

Demo

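pip install octis
pip install FuzzyTM

from octis.dataset.dataset import Dataset
from FuzzyTM import FLSA_W

# Load the DBLP corpus shipped with OCTIS (a list of tokenized documents)
dataset = Dataset()
dataset.fetch_dataset('DBLP')
data = dataset._Dataset__corpus
print(data[0:5])

# Assumption: flsaW1 is an FLSA-W model with 50 topics of 30 words each
flsaW1 = FLSA_W(input_file=data, num_topics=50, num_words=30)
# pwgt: P(word | topic), ptgd: P(topic | document)
pwgt, ptgd = flsaW1.get_matrices()
topics = flsaW1.show_topics()
topics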
Amartya

1 Answer

  1. Prepare an evaluation dataset of at least 100 documents.
  2. It is important to train with the right data: garbage in means garbage out. Manually verify the results of the topic modelling.
  3. Prepare word vectors from the documents. Gensim embedding models (for example Word2Vec) capture context better than count vectors or TF-IDF.
  4. Try Naive Bayes or a neural network and keep the most promising model. Decision trees tend not to work well on text classification.
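
As a rough illustration of steps 1 and 4, here is a minimal sketch of a Gaussian Naive Bayes classifier over dense per-document vectors, written in plain Python so it runs without extra packages. Everything in it is an assumption made for illustration: the helper names `fit_gaussian_nb` and `predict`, the three-topic vectors, and the labels are all made up. With real data, each row would be one document's vector, for example its topic probabilities taken from `ptgd` (check the matrix orientation first) or averaged word embeddings, and the held-out set would be the manually labelled evaluation documents.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps variances away from zero on tiny training sets
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one document vector."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical document vectors over 3 topics (made-up numbers, not real output)
train_X = [
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.75, 0.10, 0.15],
    [0.10, 0.20, 0.70],
    [0.05, 0.15, 0.80],
    [0.15, 0.10, 0.75],
]
train_y = ["databases"] * 3 + ["networks"] * 3

# Held-out documents play the role of the evaluation set from step 1
test_X = [[0.65, 0.25, 0.10], [0.10, 0.25, 0.65]]
test_y = ["databases", "networks"]

model = fit_gaussian_nb(train_X, train_y)
predictions = [predict(model, row) for row in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

In practice a library implementation such as scikit-learn's `GaussianNB` would probably be a better choice than hand-written code. The point here is only that each document becomes one fixed-length numeric row, and that accuracy is measured on documents the model never saw during training.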
amol goel