Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative probabilistic model used for topic modeling.

If observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics; in other words, it represents documents as mixtures of topics that emit words with certain probabilities.

Popular software packages to perform LDA include gensim and scikit-learn in Python and MALLET in Java.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
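
To make the generative picture concrete, here is a minimal sketch of fitting LDA in Python with scikit-learn; the toy corpus and parameter values below are illustrative assumptions, not recommendations:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus with two rough themes (pets, finance).
docs = [
    "the cat sat on the mat",
    "dogs and cats make friendly pets",
    "stock markets fell sharply today",
    "investors worry about rising inflation",
]

# LDA models raw term counts, so use a count vectorizer rather than TF-IDF.
counts = CountVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row is the document's topic mixture
print(doc_topics.round(2))
```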

118 questions
62 votes, 6 answers

Latent Dirichlet Allocation vs Hierarchical Dirichlet Process

Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) are both topic modeling processes. The major difference is that LDA requires the number of topics to be specified, while HDP does not. Why is that so? And what are the…
alvas
25 votes, 2 answers

What do the alpha and beta hyperparameters contribute to in Latent Dirichlet Allocation?

LDA has two hyperparameters, and tuning them changes the induced topics. What do the alpha and beta hyperparameters contribute to LDA? How do the topics change if one or the other hyperparameter increases or decreases? Why are they hyperparameters…
alvas
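
For orientation, a minimal gensim sketch of where these priors enter; note that gensim names the topic-word prior eta rather than beta, and the values below are illustrative assumptions:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [["cat", "mat"], ["dog", "cat", "pet"], ["stock", "market", "inflation"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# alpha is the Dirichlet prior on document-topic mixtures; eta (beta in most
# papers) is the prior on topic-word distributions. Smaller values favor
# sparser mixtures: fewer topics per document, fewer dominant words per topic.
lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
               alpha=0.1, eta=0.01, random_state=0)
```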
17 votes, 3 answers

Why should we not feed LDA with TF-IDF input?

Can someone explain why we cannot feed an LDA topic model with TF-IDF input? What is wrong with this approach conceptually?
sariii
10 votes, 3 answers

Clustering of documents using the topics derived from Latent Dirichlet Allocation

I want to use Latent Dirichlet Allocation for a project and I am using Python with the gensim library. After finding the topics I would like to cluster the documents using an algorithm such as k-means (ideally I would like to use a good one for…
Swan87
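
One common recipe, sketched here under the assumption of a fitted gensim model named lda and its corpus, is to turn each document's topic distribution into a dense vector and cluster those vectors with k-means:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumes `lda` and `corpus` come from a fitted gensim LdaModel.
# minimum_probability=0.0 keeps every topic, so all vectors share length num_topics.
X = np.array([
    [prob for _, prob in lda.get_document_topics(bow, minimum_probability=0.0)]
    for bow in corpus
])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```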
9 votes, 1 answer

Gensim LDA model: return keywords based on relevance (λ - lambda) value

I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…
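
The relevance pyLDAvis displays is relevance(w, t | λ) = λ log p(w|t) + (1 − λ) log[p(w|t)/p(w)], which can be recomputed by hand; a sketch, assuming a fitted gensim model lda and a hypothetical array term_freq of corpus-wide term counts aligned with its vocabulary:

```python
import numpy as np

lam = 0.6                          # the lambda value from the pyLDAvis slider
topic_word = lda.get_topics()      # shape (num_topics, vocab_size); rows sum to 1
p_w = term_freq / term_freq.sum()  # marginal word probabilities (hypothetical input)

# lambda = 1 ranks terms by raw probability; lambda = 0 ranks them by lift.
relevance = lam * np.log(topic_word) + (1 - lam) * np.log(topic_word / p_w)
top_term_ids = np.argsort(-relevance[0])[:10]  # top-10 term ids for topic 0
```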
9 votes, 5 answers

Tutorials on topic models and LDA

I would like to know if you people have some good tutorials (fast and straightforward) about topic models and LDA, teaching intuitively how to set some parameters, what they mean, and, if possible, with some real examples.
pedrobisp
7 votes, 4 answers

BERT: is it possible to use it for topic modeling?

I'm struggling to understand the full capabilities of BERT: is it possible to do topic modeling of text, like what we can achieve with LDA?
xcsob
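
A frequent answer is to cluster contextual embeddings rather than fit a generative model; a sketch using the sentence-transformers package, where the model name and cluster count are illustrative assumptions:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat",
    "dogs make friendly pets",
    "stock markets fell sharply",
    "investors fear rising inflation",
]

# Embed each document with a BERT-family encoder, then cluster the vectors;
# each cluster plays the role of a "topic".
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
```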
6 votes, 1 answer

How to split natural language script into segments?

I have a bunch of .txt and .srt files extracted from a MOOC website; they are the scripts of the videos. I would like to segment the scripts into parts such that each part falls into one of the following categories: MainConceptDescription->…
A.D.
6 votes, 1 answer

Can I use Euclidean distance for Latent Dirichlet Allocation document similarity?

I have a Latent Dirichlet Allocation (LDA) model with $K$ topics trained on a corpus with $M$ documents. Due to my hyperparameter configuration, the output topic distribution for each document is concentrated on only 3-6 topics, and all the…
PyRsquared
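
Euclidean distance is geometrically valid on these vectors, but because each one is a probability distribution, a divergence-based measure such as Jensen-Shannon is often preferred; a small comparison on toy distributions:

```python
import numpy as np
from scipy.spatial.distance import euclidean, jensenshannon

# Two toy document-topic distributions (each sums to 1).
p = np.array([0.70, 0.20, 0.10])
q = np.array([0.10, 0.20, 0.70])

print(euclidean(p, q))      # treats the rows as plain points in R^K
print(jensenshannon(p, q))  # distribution-aware, symmetric, and bounded
```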
5 votes, 1 answer

Calculating optimal number of topics for topic modeling (LDA)

I am going to do topic modeling via LDA. I ran my commands to see the optimal number of topics. The output was as follows: It is a bit different from any other plots that I have ever seen. Do you think it is okay, or is it better to use other…
5 votes, 1 answer

How to choose threshold for gensim Phrases when generating bigrams?

I'm generating bigrams with gensim.models.phrases, which I'll use downstream with TF-IDF and/or gensim's LDA: from gensim.models.phrases import Phrases, Phraser # 7k documents, ~500-1k tokens each. Already ran cleanup, stop_words, lemmatization,…
lefnire
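
For reference, a minimal sketch of the knobs in question; the toy corpus and parameter values are illustrative (gensim's defaults are min_count=5, threshold=10.0), and raising the threshold admits fewer, higher-scoring bigrams:

```python
from gensim.models.phrases import Phrases, Phraser

# Tiny tokenized corpus, repeated so bigram counts clear min_count.
sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["machine", "learning", "is", "fun"],
] * 10

# Pairs whose co-occurrence score exceeds `threshold` get joined with '_'.
bigram = Phrases(sentences, min_count=2, threshold=0.1)
phraser = Phraser(bigram)  # frozen, lighter-weight object for applying the model
print(phraser[["i", "love", "new", "york"]])  # high-scoring pairs come back joined
```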
5 votes, 2 answers

Why do my Latent Dirichlet Allocation Topics mix words that never co-occurred?

I have one corpus of documents on diabetes, another on Leonardo da Vinci, and another on animation and computer graphics. I combined all of these documents into a single LDA model and got a topic like the one below. I'm listing the top 30 terms, in descending…
Matt
5 votes, 3 answers

Need help with LDA for selecting features

I am currently selecting features of products by using LDA to group 6,000 product keywords into topics. Here is a sample of my dataset after being organized into a list of keywords for each product id. I consider each id as a "document" and each…
sylvia
5 votes, 2 answers

Topic modeling for short length sentences

I have a graph which was already separated into clusters. Each node in the graph has a label (typically, it's a function's name like org.java.somepackage.validateLogin). What I want to do is to give a representative label for each cluster. For the…
Elimination
5 votes, 3 answers

scikit-learn - Should I fit model with TF or TF-IDF?

I am trying to find out the best way to fit different probabilistic models (like Latent Dirichlet Allocation, Non-negative Matrix Factorization, etc.) in sklearn (Python). Looking at the example in the sklearn documentation, I was wondering why the…
Luca P.
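
The usual reading of that example: LDA is a probabilistic model of term counts, whereas NMF carries no such assumption and is commonly fit on TF-IDF; a minimal sketch of the two pairings with a toy corpus and illustrative parameters:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

docs = [
    "the cat sat on the mat",
    "dogs and cats make friendly pets",
    "stock markets fell sharply today",
    "investors worry about rising inflation",
]

# LDA's generative story is about word counts, so feed it raw term frequencies.
tf = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(tf)

# NMF is a matrix factorization without that assumption; TF-IDF is the usual input.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
nmf = NMF(n_components=2, random_state=0).fit(tfidf)
```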