Questions tagged [topic-model]

A topic model describes each text in a large corpus as a probability distribution over topics, which are themselves probability distributions over words. Every topic makes a quantified contribution to a given text.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
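The idea of a document as a mixture of topics, each of which is a distribution over words, can be sketched in a few lines of Python. This is only a toy illustration: the two topics, their word probabilities, the 80/20 document mixture, and the helper name `word_distribution` are all made-up assumptions, not output from any real model or library.

```python
# Hypothetical toy topics: each topic is a probability distribution over words.
# The numbers are invented for illustration only.
topics = {
    "dogs": {"dog": 0.5, "bone": 0.3, "cat": 0.1, "meow": 0.1},
    "cats": {"dog": 0.1, "bone": 0.1, "cat": 0.5, "meow": 0.3},
}


def word_distribution(topic_weights, topics):
    """Mix the topic-word distributions using a document's topic weights."""
    mixed = {}
    for topic, weight in topic_weights.items():
        for word, prob in topics[topic].items():
            # p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed


# A hypothetical document that is mostly about dogs: 80% "dogs", 20% "cats".
doc_topic_weights = {"dogs": 0.8, "cats": 0.2}
doc_word_probs = word_distribution(doc_topic_weights, topics)

# Print words from most to least likely under this document's mixture.
for word, prob in sorted(doc_word_probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {prob:.2f}")
```

In this sketch "dog" and "bone" come out as the most likely words because the document leans toward the "dogs" topic. A real topic model works in the opposite direction: it starts from observed word counts and infers both the topics and the per-document weights.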

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
Non-Negative Matrix Factorisation

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans
lda (R)

142 questions

6 answers

Latent Dirichlet Allocation vs Hierarchical Dirichlet Process

Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) are both topic modeling processes. The major difference is LDA requires the specification of the number of topics, and HDP doesn't. Why is that so? And what are the…

nlp topic-model lda

asked May 18 '14 at 06:10

alvas

1 answer

NLP - why is "not" a stop word?

I am trying to remove stop words before performing topic modeling. I noticed that some negation words (not, nor, never, none, etc.) are usually considered to be stop words. For example, NLTK, spaCy and sklearn include "not" on their stop word lists.…

nlp topic-model sentiment-analysis

asked Dec 15 '16 at 22:20

E.K.

3 answers

What is difference between text classification and topic models?

I know the difference between clustering and classification in machine learning, but I don't understand the difference between text classification and topic modeling for documents. Can I use topic modeling over documents to identify a topic? Can I…

classification text-mining topic-model

asked Aug 12 '14 at 03:50

Ali

2 answers

What do the alpha and beta hyperparameters contribute to in Latent Dirichlet allocation?

LDA has two hyperparameters; tuning them changes the induced topics. What do the alpha and beta hyperparameters contribute to LDA? How do the topics change if one or the other hyperparameter increases or decreases? Why are they hyperparameters…

topic-model lda parameter

asked May 23 '14 at 06:25

alvas

3 answers

Why should we not feed LDA with TF-IDF input?

Can someone explain why we cannot feed an LDA topic model with TF-IDF input? What is wrong with this approach conceptually?

machine-learning python topic-model lda

asked Aug 04 '17 at 03:56

sariii

1 answer

What is the difference between topic modeling and clustering?

I know that topic modeling and clustering are related but not identical techniques. Can anyone explain what the main differences are?

clustering unsupervised-learning topic-model

asked Jan 18 '18 at 06:20

sara

1 answer

Gensim LDA model: return keywords based on relevance (λ - lambda) value

I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…

python topic-model lda gensim

asked Aug 21 '19 at 17:40

Tasos Lytos

5 answers

Tutorials on topic models and LDA

I would like to know if you people have some good tutorials (fast and straightforward) about topic models and LDA, teaching intuitively how to set some parameters, what they mean and if possible, with some real examples.

topic-model lda

asked Jan 08 '15 at 15:47

pedrobisp

4 answers

How to give name to topics created using LDA?

I have categorized 800,000 documents into 500 categories using Mahout topic modelling. Instead of representing each topic by its top 5/10 words, I want to infer a generic name for the group using any existing algorithm. For the…

machine-learning data-mining nlp text-mining topic-model

asked Jan 07 '16 at 04:28

adihere

1 answer

Resume Parsing - extracting skills from resume using Machine Learning

I am trying to extract the skill set of an employee from his/her resume. I have resumes stored as plain text in a database. I do not have predefined skills in this case. How should I approach this problem? I can think of two ways: Using unsupervised…

machine-learning python text-mining topic-model

asked Aug 04 '18 at 05:27

Sociopath

4 answers

BERT: is it possible to use it for topic modeling?

I'm struggling to understand the full capabilities of BERT: is it possible to do topic modeling of text with it, like what we can achieve with LDA?

topic-model lda bert

asked Jun 05 '19 at 17:07

xcsob

1 answer

How to split natural language script into segments?

I have a bunch of .txt and .srt files extracted from a MOOC website; they are the scripts of the videos. I would like to segment the scripts into parts such that each part falls into one of the following categories: MainConceptDescription->…

python data-mining topic-model lda processing

asked Apr 16 '18 at 15:32

A.D.

1 answer

Comparing two Corpora using Topic Model

I want to compare two corpora (two different collections of texts) using Topic Modeling. I trained the model separately on the two collections and manually matched similar topics based on their frequent words. I was wondering if there is a…

text-mining topic-model

asked Feb 21 '17 at 20:26

saghi

1 answer

Calculating optimal number of topics for topic modeling (LDA)

I am going to do topic modeling via LDA. I ran my commands to find the optimal number of topics. The output was as follows: It is a bit different from any other plot that I have ever seen. Do you think it is okay? Or is it better to use other…

nlp data-science-model topic-model lda

asked Apr 17 '21 at 10:21

Tahereh Maghsoudi

2 answers

Why do my Latent Dirichlet Allocation Topics mix words that never co-occurred?

I have one corpus of documents on diabetes, another on Leonardo da Vinci, and another on animation and computer graphics. I combined all of these documents into one LDA model and got a topic like the one below. I'm listing the top 30 terms, in descending…

python topic-model lda

asked Aug 03 '15 at 11:45

Matt
