Highest Voted 'text' Questions - Data Science Stack Exchange

37 votes, 6 answers

Sentence similarity prediction

I'm looking to solve the following problem: I have a dataset of sentences, and I want to be able to type a new sentence and find the sentence in the dataset that it is most similar to. An example would look like: New…

asked Oct 22 '17 at 07:36

lte__

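One common baseline for this kind of lookup (an illustration, not taken from the question or its answers) is to represent every sentence as a TF-IDF bag-of-words vector and return the dataset sentence with the highest cosine similarity to the new one. The sketch below is pure Python so it runs without dependencies. The function names are hypothetical, and the smoothed IDF formula is an assumption modeled on scikit-learn's default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(sentences):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for every term in the dataset."""
    n = len(sentences)
    df = Counter(term for s in sentences for term in set(tokenize(s)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; terms not in the dataset are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, sentences):
    """Return (index, score) of the dataset sentence closest to the query."""
    idf = build_idf(sentences)
    vectors = [tfidf_vector(s, idf) for s in sentences]
    q = tfidf_vector(query, idf)
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(sentences)), key=lambda i: scores[i])
    return best, scores[best]


# Made-up dataset for illustration.
dataset = [
    "The cat sat on the mat",
    "Stock prices fell sharply today",
    "How do I train a neural network",
]
index, score = most_similar("a cat on a mat", dataset)
print(dataset[index], round(score, 3))
```

Word-overlap scoring cannot match paraphrases that share no words; sentence embeddings are the usual next step when that matters.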

25 votes, 3 answers

How do you apply SMOTE on text classification?

Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique used for imbalanced-dataset problems. So far I have an idea of how to apply it to generic, structured data. But is it possible to apply it to a text classification problem?…

class-imbalance text smote

asked Feb 10 '18 at 11:18

catris25

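SMOTE works on numeric feature vectors, so one common workaround (an assumption here, not necessarily what the answers recommend) is to vectorize the documents first, e.g. with TF-IDF, and oversample in that vector space. The sketch implements only the core SMOTE interpolation step in pure Python on hypothetical toy vectors; the synthetic rows are feature vectors, not readable text. For real use, the imbalanced-learn package provides a `SMOTE` class.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=2, seed=0):
    """Core SMOTE step: each synthetic sample lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical TF-IDF rows for the minority class (3 documents, 4 terms).
minority_rows = [
    [0.0, 0.8, 0.6, 0.0],
    [0.0, 0.7, 0.0, 0.7],
    [0.5, 0.5, 0.5, 0.5],
]
new_rows = smote(minority_rows, n_new=5)
print(new_rows)
```

Because every synthetic row is an interpolation between two real minority rows, it stays inside the region those rows span; oversampling should be applied only to the training split, after vectorization.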

10 votes, 1 answer

How to use TF-IDF vectors with multinomial Naive Bayes?

Say we have used the TF-IDF transform to encode documents into continuous-valued features. How would we now use these as input to a Naive Bayes classifier? Bernoulli Naive Bayes is out because our features aren't binary anymore. It seems like we can't…

scikit-learn naive-bayes-classifier text

asked Apr 05 '17 at 17:10

dhrumeel

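Continuous TF-IDF weights can still feed a multinomial model because its likelihood only needs per-class sums of feature weights, and those sums need not be integers. The pure-Python sketch below makes that explicit; documents are hypothetical term-to-weight dicts and the function names are illustrative. In the same spirit, scikit-learn's `MultinomialNB` documentation notes that although the multinomial distribution normally requires integer counts, fractional counts such as TF-IDF may also work in practice.

```python
import math
from collections import defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Multinomial NB on weighted features.

    Each doc is a dict term -> non-negative weight (raw counts or TF-IDF
    values); alpha > 0 is additive smoothing.
    """
    vocab = {t for d in docs for t in d}
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        totals = defaultdict(float)  # summed feature weight per term in class c
        for d in class_docs:
            for t, w in d.items():
                totals[t] += w
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_mnb(doc, log_prior, log_lik):
    """argmax over classes of log P(c) + sum_t w_t * log P(t | c); unknown terms are skipped."""
    def score(c):
        return log_prior[c] + sum(
            w * log_lik[c][t] for t, w in doc.items() if t in log_lik[c]
        )
    return max(log_prior, key=score)


# Hypothetical TF-IDF weights for four short documents.
docs = [
    {"goal": 0.9, "match": 0.4},
    {"match": 0.7, "team": 0.6},
    {"stock": 0.8, "market": 0.5},
    {"market": 0.9, "price": 0.3},
]
labels = ["sport", "sport", "finance", "finance"]
log_prior, log_lik = train_mnb(docs, labels)
print(predict_mnb({"market": 0.6, "stock": 0.2}, log_prior, log_lik))
```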

9 votes, 2 answers

How to implement hierarchical labeling classification?

I am currently working on the task of eCommerce product name classification, so I have categories and subcategories in product data. I noticed that using subcategories as labels delivers worse results (84% acc) than categories (94% acc). But…

neural-network classification keras nlp text

asked Feb 25 '19 at 12:17

chacid

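One way to exploit the category/subcategory structure (a sketch of the top-down, local-classifier-per-parent-node idea, not necessarily what the answers propose) is to train one classifier for the top level and a separate classifier per category that only sees that category's products, then route each prediction down the tree. `WordCountClassifier` is a hypothetical, deliberately trivial stand-in for a real model such as the Keras network in the question; the product names and labels are made up.

```python
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class WordCountClassifier:
    """Hypothetical placeholder model: scores each label by the relative
    frequency of the query's tokens in that label's training texts."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        return self

    def predict(self, text):
        query = tokens(text)

        def score(label):
            c = self.counts[label]
            total = sum(c.values())
            return sum(c[w] for w in query) / total if total else 0.0

        return max(self.counts, key=score)


def fit_top_down(texts, categories, subcategories):
    """Top-level model over categories, plus one model per category (parent
    node) trained only on that category's examples to pick a subcategory."""
    top = WordCountClassifier().fit(texts, categories)
    per_parent = {}
    for cat in set(categories):
        idx = [i for i, c in enumerate(categories) if c == cat]
        per_parent[cat] = WordCountClassifier().fit(
            [texts[i] for i in idx], [subcategories[i] for i in idx]
        )
    return top, per_parent


def predict_top_down(text, top, per_parent):
    """Route the input down the tree: category first, then subcategory."""
    category = top.predict(text)
    return category, per_parent[category].predict(text)


# Made-up product names with two-level labels.
texts = [
    "red cotton t-shirt", "blue denim jeans", "wool winter sweater",
    "stainless steel frying pan", "nonstick saucepan", "chef knife steel",
]
categories = ["clothing", "clothing", "clothing", "kitchen", "kitchen", "kitchen"]
subcategories = ["tops", "bottoms", "tops", "cookware", "cookware", "cutlery"]

top, per_parent = fit_top_down(texts, categories, subcategories)
print(predict_top_down("cotton sweater", top, per_parent))
```

Each subcategory model only has to separate siblings within one category, which is usually an easier problem; the trade-off is that a wrong category decision cannot be corrected at the subcategory level.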

9 votes, 1 answer

Which type of autoencoder gives the best results for text?

I did a couple of autoencoder examples for images, and they worked fine. Now I want to build an autoencoder for text that takes a sentence as input and returns the same sentence. But when I try to use the same autoencoders as the ones I used for…

python keras autoencoder text

asked Mar 25 '18 at 20:43

sspp


7 votes, 2 answers

Data transformations in hierarchical classification

I am building a hierarchical text classifier using the Local Classifier Per Parent Node (LCPN) approach with the 'siblings' policy, as described in "A survey of hierarchical classification across different application domains". E.g. if we have the…

classification multiclass-classification text

asked Dec 05 '19 at 00:03

matentzn


6 votes, 2 answers

What is the minimum number of times a word needs to appear in word2vec training corpus for quality results?

When training a word2vec model with, e.g., gensim, you can specify the minimum number of times a word needs to be seen (with the parameter min_count). The default value for this seems to be 5. Are there any theoretical considerations for selecting a threshold…

deep-learning word2vec text

asked May 07 '18 at 20:01

user1253952

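gensim's documentation describes `min_count` as ignoring all words whose total frequency is lower than the threshold. A quick way to see what a given threshold costs on a particular corpus is to count how many vocabulary entries and how many running tokens survive it. The helper names below are hypothetical and the toy corpus is made up; the gensim call is mentioned only in a comment.

```python
from collections import Counter


def vocab_after_min_count(sentences, min_count=5):
    """Words whose total corpus frequency is at least min_count; rarer words are dropped."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: c for word, c in counts.items() if c >= min_count}


def token_coverage(sentences, vocab):
    """Fraction of running tokens in the corpus that are still in the vocabulary."""
    total = sum(len(s) for s in sentences)
    kept = sum(1 for s in sentences for word in s if word in vocab)
    return kept / total if total else 0.0


# Toy tokenized corpus. With gensim the threshold would be passed as
# Word2Vec(sentences=corpus, min_count=...); that call is not run here.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
for threshold in (1, 2, 3):
    vocab = vocab_after_min_count(corpus, threshold)
    print(threshold, len(vocab), round(token_coverage(corpus, vocab), 2))
```

Running this for several thresholds on the real corpus shows the trade-off: a higher value drops rare words, which receive few training updates anyway, but it also lowers token coverage.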

5 votes, 1 answer

Doc2vec to calculate cosine similarity - absolutely inaccurate

I'm trying to modify the Doc2vec tutorial to calculate cosine similarity and to take pandas DataFrames instead of .txt documents. I want to find the sentence in my data that is most similar to a new sentence I enter. However, after training, even if I give…

python nlp similarity text gensim

asked Nov 06 '17 at 11:03

lte__


5 votes, 2 answers

Text similarity using RNN

The dataset contains short text records, typically one sentence each. The goal is to find duplicate and similar records. So far I have tried the R package 'text2vec', the GloVe word vectors, and the similarity APIs provided by the package. There is…

machine-learning nlp similarity rnn text

asked Jan 24 '17 at 11:13

user28251

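For the duplicate-detection part, a simple non-neural baseline (swapped in here for illustration; it is not an RNN) is to compare sets of character 3-grams ("shingles") with Jaccard similarity and flag pairs above a threshold. The 0.7 threshold, the function names, and the records are assumptions. The all-pairs loop is quadratic; for large datasets, techniques such as MinHash with locality-sensitive hashing are typically used instead.

```python
def shingles(text, k=3):
    """Set of overlapping character k-grams after lowercasing and collapsing whitespace."""
    t = " ".join(text.lower().split())
    if len(t) <= k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(records, threshold=0.7):
    """All pairs (i, j, score) whose shingle sets have Jaccard similarity >= threshold."""
    sets = [shingles(r) for r in records]
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = jaccard(sets[i], sets[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Made-up records: the first two differ only in case, spacing and punctuation.
records = [
    "The quick brown fox",
    "the quick  brown fox!",
    "A completely different sentence",
]
print(near_duplicates(records))
```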

5 votes, 3 answers

How can I group texts with similar content together?

I need to find a way to group a corpus of texts by document similarity. Given that I have no experience in ML (only a few readings), I'd like to know whether calculating TF-IDF for each text is the right approach. I've read something…

nlp similarity text

asked May 03 '16 at 14:33

Max

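TF-IDF is a common starting point for this. A minimal sketch, assuming scikit-learn is installed, pairs `TfidfVectorizer` with k-means clustering. The documents are made up, and `n_clusters=2` is chosen by hand for the toy data; on a real corpus the number of groups has to be chosen or validated separately.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus with two obvious topics.
docs = [
    "Cats and dogs are popular pets",
    "Many people keep dogs and cats as pets",
    "Stock market prices rose today",
    "Stock prices fell as the market closed",
]

# TF-IDF rows are L2-normalised by default, so Euclidean k-means on them
# behaves much like grouping by cosine similarity.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# n_clusters is picked by hand for this toy corpus.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
for label, doc in sorted(zip(labels, docs)):
    print(label, doc)
```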

5 votes, 1 answer

How to evaluate the similarity of two columns containing strings?

I am new to text processing and stuck on a problem: identifying the similarity of two columns. To illustrate, consider two columns with string values:

Column A | Column B
-------------------------------
abcd     | …

text text-processing

asked Nov 12 '21 at 09:17

Rachit Tayal

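If character-level similarity is what is wanted, the standard library's `difflib.SequenceMatcher` gives a ratio in [0, 1], computed as 2*M / T, where M is the number of matched characters and T is the total length of both strings. A sketch applied row by row and averaged per column pair; apart from `abcd`, the column values are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]: 2*M / T (M = matched chars, T = total length)."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical column values (only "abcd" comes from the question).
column_a = ["abcd", "hello world", "data"]
column_b = ["abce", "hello there", "science"]

row_scores = [similarity(a, b) for a, b in zip(column_a, column_b)]
column_score = sum(row_scores) / len(row_scores)  # one summary number per column pair
print(row_scores, column_score)
```

For "abcd" versus "abce", the matched block is "abc", so the ratio is 2*3 / 8 = 0.75.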

4 votes, 1 answer

Encoding of text data in NLP

I'm getting data via web scraping to create a dataset. I have a 'company' column that contains the names of companies. I would like to encode this column, but I don't know how to find the entries that represent the same company. For…

python nlp preprocessing similarity text

asked Jun 17 '21 at 09:28

Lydia

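Variants of the same company name can often be collapsed with rule-based normalization before encoding: lowercase, strip punctuation, drop trailing legal-form words, then assign one integer code per normalized key. The suffix list, the function names, and the example names below are hypothetical; misspellings would still need fuzzy matching on top of this.

```python
import re

# Hypothetical list of trailing legal-form words to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation", "co", "company"}


def normalize(name):
    """Lowercase, keep alphanumeric tokens, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def encode_companies(names):
    """Assign one integer code per normalized name, in order of first appearance."""
    codes = {}
    encoded = []
    for name in names:
        key = normalize(name)
        if key not in codes:
            codes[key] = len(codes)
        encoded.append(codes[key])
    return encoded, codes


names = ["Google LLC", "google", "Google, Inc.", "Microsoft Corporation", "Microsoft Corp."]
encoded, codes = encode_companies(names)
print(encoded, codes)
```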

4 votes, 3 answers

Bidirectional Encoder Representations from Transformers in R

Can anybody suggest where I can find example code in R for a BERT neural network for text-mining tasks? All I can see are Python examples, and I need…

nlp r text programming

asked May 13 '19 at 11:59

Kogan

4 votes, 1 answer

How does the Multinomial Naive Bayes alpha parameter affect the text classification task?

I would like to know how the alpha parameter in Multinomial Bayes affects the text classification task. I know that this parameter is related to the algorithm's ability to classify words unseen during training. How changes text…

classification naive-bayes-classifier text

asked Apr 18 '18 at 10:28

Simone

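For reference, scikit-learn's user guide gives the smoothed estimate that `MultinomialNB` uses for the probability of feature i in class y. Here N_{yi} is how often feature i occurs in class y in the training set, N_y is the total feature count for class y, and n is the number of features:

```latex
\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n},
\qquad N_y = \sum_{i=1}^{n} N_{yi}
```

Setting alpha = 1 is Laplace smoothing, and alpha < 1 is Lidstone smoothing. As a hypothetical worked example, take a class with N_y = 100 word tokens and a vocabulary of n = 1000. A word never seen in that class gets (0 + 1)/(100 + 1000), about 0.00091, with alpha = 1, but 0.01/(100 + 10), about 0.000091, with alpha = 0.01. A word seen 10 times gets 11/1100 = 0.01 with alpha = 1 and 10.01/110, about 0.091, with alpha = 0.01. A larger alpha therefore pulls every estimate toward the uniform value 1/n and shrinks the gap between seen and unseen words (11x versus roughly 1000x here), while a smaller alpha lets the training counts dominate.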

4 votes, 4 answers

Extract 2 pieces of information from a string - what to use?

First of all, I am a complete newbie with regard to data science, and I am not asking for the complete solution but for some guidance on what I should read up on to achieve my task (what algorithms, techniques, etc. are used to tackle similar problems). I…

text-mining information-retrieval training text

asked Sep 26 '16 at 09:52

kyriakos

