Questions tagged [nlp]

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NLP tasks

  • Text pre-processing
  • Coreference resolution
  • Dependency parsing
  • Document summarization
  • Named entity recognition (NER)
  • Information extraction (IE)
  • Language modeling
  • Part-of-speech (POS) tagging
  • Morphological analysis and wordform generation
  • Phrase-structure (constituency) parsing
  • Machine translation (MT)
  • Question answering (QA)
  • Sentiment analysis
  • Semantic parsing
  • Text categorization
  • Textual entailment detection
  • Topic modeling
  • Word Sense Disambiguation (WSD)

Beginner books on Natural Language Processing

2672 questions
109
votes
4 answers

What is the positional encoding in the transformer model?

I'm trying to read and understand the paper "Attention Is All You Need", and in it there is a picture: I don't know what positional encoding is. By listening to some YouTube videos I've found out that it is an embedding having both meaning and…
Peyman
  • 1,235
  • 2
  • 9
  • 8
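The encoding the question asks about is the sinusoidal scheme from the paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy sketch (sizes are illustrative):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need".

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(max_len)[:, None]        # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

pe = positional_encoding(max_len=50, d_model=16)
```

Each position gets a unique pattern of sines and cosines at different frequencies, so the model can distinguish token positions without any learned parameters.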
77
votes
4 answers

What is the purpose of the [CLS] token, and why is its encoding output important?

I am reading this article on how to use BERT by Jay Alammar, and I understand things up until: For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…
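The "slice of the cube" is just indexing position 0 of the per-token output. A shape-level sketch with random arrays standing in for a real BERT forward pass (the actual tensors would come from `transformers`):

```python
import numpy as np

# BERT returns one hidden vector per token; position 0 is the [CLS]
# token, whose vector is commonly used as a summary of the whole
# sequence for classification. Sizes below mimic bert-base.
rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 8, 768
last_hidden_state = rng.normal(size=(batch, seq_len, hidden))

cls_vectors = last_hidden_state[:, 0, :]   # keep only the [CLS] slice

# A toy linear classification head on top of the [CLS] vector.
W = rng.normal(size=(hidden, 2))
logits = cls_vectors @ W
```

The rest of the "cube" (the other token positions) is discarded only for sequence-level tasks; token-level tasks like NER use all positions.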
62
votes
6 answers

Latent Dirichlet Allocation vs Hierarchical Dirichlet Process

Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) are both topic modeling processes. The major difference is LDA requires the specification of the number of topics, and HDP doesn't. Why is that so? And what are the…
alvas
  • 2,510
  • 7
  • 28
  • 40
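The contrast in the excerpt shows up directly in code: LDA takes the topic count as a parameter up front. A toy sketch with scikit-learn's `LatentDirichletAllocation` (scikit-learn has no HDP; gensim's `HdpModel` is one option for the nonparametric side):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny illustrative corpus with two obvious themes.
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "bonds yield interest",
]

counts = CountVectorizer().fit_transform(docs)

# LDA needs the number of topics specified; HDP would infer it.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
topic_word = lda.components_   # shape: (n_topics, vocab_size)
```

HDP places a Dirichlet-process prior over the topic mixture, so the effective number of topics grows with the data instead of being fixed in advance.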
41
votes
8 answers

In a Transformer model, why does one sum positional encoding to the embedding rather than concatenate it?

While reviewing the Transformer architecture, I realized something I didn't expect, which is that the positional encoding is summed with the word embeddings rather than concatenated to…
FremyCompany
  • 523
  • 1
  • 4
  • 7
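A shape-level sketch of the difference the question asks about, with toy NumPy arrays standing in for real embeddings: summing keeps the model width at `d_model`, while concatenating would double it, which is one common argument for summing.

```python
import numpy as np

d_model, seq_len = 8, 5
rng = np.random.default_rng(0)
tok = rng.normal(size=(seq_len, d_model))   # word embeddings
pos = rng.normal(size=(seq_len, d_model))   # positional encodings

summed = tok + pos                             # shape stays (seq_len, d_model)
concat = np.concatenate([tok, pos], axis=-1)   # doubles the model width
```

Every downstream weight matrix would have to grow to match the concatenated width, so summing is the cheaper choice; whether the model can still separate the two signals is exactly what the question's answers debate.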
40
votes
5 answers

What are some standard ways of computing the distance between documents?

When I say "document", I have in mind web pages like Wikipedia articles and news stories. I prefer answers giving either vanilla lexical distance metrics or state-of-the-art semantic distance metrics, with stronger preference for the latter.
Matt
  • 821
  • 1
  • 8
  • 12
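Two of the "vanilla lexical" baselines the question mentions can be written in a few lines: Jaccard overlap on word sets and cosine similarity on raw term-frequency vectors (semantic metrics would instead compare document embeddings).

```python
import math
from collections import Counter

def jaccard(a, b):
    """Lexical set overlap: |A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cosine_bow(a, b):
    """Cosine similarity between raw term-frequency vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

d1 = "the cat sat on the mat"
d2 = "the cat lay on the rug"
```

For real web pages you would tokenize properly and usually weight terms by TF-IDF, but the distance computations stay the same.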
39
votes
7 answers

How to get sentence embedding using BERT?

How to get sentence embedding using BERT? from transformers import BertTokenizer; tokenizer = BertTokenizer.from_pretrained('bert-base-uncased'); sentence = 'I really enjoyed this movie a lot.' # 1. Tokenize the…
star
  • 1,521
  • 7
  • 20
  • 31
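One common recipe for this is mask-aware mean pooling over BERT's per-token vectors. A shape-level sketch with random arrays in place of a real `transformers` forward pass (in practice `last_hidden_state` and `attention_mask` come from the model and tokenizer):

```python
import numpy as np

# Fake per-token outputs; sizes mimic bert-base (hidden size 768).
rng = np.random.default_rng(0)
seq_len, hidden = 10, 768
last_hidden_state = rng.normal(size=(seq_len, hidden))
attention_mask = np.array([1] * 7 + [0] * 3)   # 7 real tokens, 3 padding

# Average only over real tokens, ignoring padding positions.
mask = attention_mask[:, None]                 # (seq_len, 1)
sentence_embedding = (last_hidden_state * mask).sum(0) / mask.sum()
```

Libraries such as sentence-transformers package this pooling (trained with similarity objectives), which usually gives better sentence vectors than pooling a vanilla BERT.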
38
votes
4 answers

When to use cosine similarity over Euclidean similarity

In NLP, people tend to use cosine similarity to measure document/text distances. I want to hear what people think of the following two scenarios and which to pick: cosine similarity or Euclidean? Overview of the task set: The task is to compute…
Logan
  • 503
  • 1
  • 4
  • 8
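The key difference is easy to demonstrate: cosine similarity ignores vector magnitude (a document repeated twice points in the same direction), while Euclidean distance does not. A small NumPy illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product of the normalized vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0, 3.0])
b = 2 * a   # same direction, twice the magnitude

cos_ab = cosine(a, b)            # 1.0: direction is identical
euc_ab = np.linalg.norm(a - b)   # nonzero: magnitudes differ
```

This is why cosine is the default for bag-of-words text vectors, where document length inflates magnitude without changing topical content; Euclidean makes more sense when magnitude itself is meaningful.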
37
votes
6 answers

Sentence similarity prediction

I'm looking to solve the following problem: I have a set of sentences as my dataset, and I want to be able to type a new sentence, and find the sentence that the new one is the most similar to in the dataset. An example would look like: New…
lte__
  • 1,379
  • 5
  • 19
  • 29
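A simple lexical baseline for this nearest-sentence task is TF-IDF vectors plus cosine similarity; a sketch with scikit-learn on toy sentences (the dataset and query are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

dataset = [
    "The weather is nice today",
    "I love programming in Python",
    "The stock market fell sharply",
]
query = "Python programming is fun"

vec = TfidfVectorizer()
X = vec.fit_transform(dataset)       # one TF-IDF row per sentence
q = vec.transform([query])           # project the query into the same space
scores = cosine_similarity(q, X)[0]
best = scores.argmax()               # index of the most similar sentence
```

This only matches shared words; to catch paraphrases with no word overlap, an embedding model (e.g. sentence-transformers) with the same nearest-neighbor search is the usual upgrade.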
37
votes
6 answers

How do I load FastText pretrained model with Gensim?

I tried to load a fastText pretrained model from here: Fasttext model. I am using wiki.simple.en. from gensim.models.keyedvectors import KeyedVectors word_vectors = KeyedVectors.load_word2vec_format('wiki.simple.bin', binary=True) But it shows the…
Sabbiu Shah
  • 763
  • 1
  • 6
  • 9
35
votes
8 answers

Best practical algorithm for sentence similarity

I have two sentences, S1 and S2, both of which have a word count (usually) below 15. What are the most practically useful and successful (machine learning) algorithms, which are possibly easy to implement (a neural network is OK, unless the architecture…
DaveTheAl
  • 533
  • 1
  • 5
  • 12
34
votes
5 answers

How can I get a measure of the semantic similarity of words?

What is the best way to figure out the semantic similarity of words? Word2Vec is okay, but not ideal: # Using the 840B word Common Crawl GloVe vectors with gensim: # 'hot' is closer to 'cold' than 'warm' In [7]: model.similarity('hot',…
Thomas Johnson
  • 695
  • 1
  • 7
  • 11
32
votes
3 answers

General approach to extract key text from sentence (nlp)

Given a sentence like: Complimentary gym access for two for the length of stay ($12 value per person per day) What general approach can I take to identify the word gym or gym access?
William Falcon
  • 431
  • 1
  • 6
  • 7
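One general approach to this kind of key-phrase extraction is RAKE-style candidate generation: split the sentence on stop words and punctuation, and keep the remaining runs of content words as candidate phrases. A minimal sketch (the stop list here is a tiny illustrative subset, not any library's actual list):

```python
import re

STOP = {"for", "two", "of", "the", "per", "a", "an", "to", "and"}

def candidate_phrases(sentence):
    """Split on stop words; contiguous content words form candidates."""
    words = re.findall(r"[\w$]+", sentence.lower())
    phrases, current = [], []
    for w in words:
        if w in STOP:
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(" ".join(current))
    return phrases

sentence = ("Complimentary gym access for two for the length of stay "
            "($12 value per person per day)")
phrases = candidate_phrases(sentence)
```

Full RAKE then scores candidates by word frequency and co-occurrence degree; noun-chunk extraction with a parser (e.g. spaCy) is another common route to phrases like "gym access".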
31
votes
3 answers

What is a better input for Word2Vec?

This is more like a general NLP question. What is the appropriate input to train a word embedding namely Word2Vec? Should all sentences belonging to an article be a separate document in a corpus? Or should each article be a document in said…
wacax
  • 3,500
  • 4
  • 26
  • 48
31
votes
4 answers

What algorithms should I use to perform job classification based on resume data?

Note that I am doing everything in R. The problem goes as follows: Basically, I have a list of resumes (CVs). Some candidates have prior work experience and some don't. The goal here is: based on the text of their CVs, I want to classify…
user1769197
  • 431
  • 1
  • 5
  • 5
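The asker works in R, but the standard recipe is the same in any stack: bag-of-words features plus a linear classifier. A sketch in Python with scikit-learn on toy, hypothetical CV snippets and labels (in R, text2vec or quanteda plus glmnet would play the same roles):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for CV text; real data would be the resume bodies.
cvs = [
    "java spring backend services microservices",
    "python pandas statistics machine learning models",
    "java j2ee enterprise backend applications",
    "python scikit-learn data analysis regression",
]
jobs = ["software engineer", "data scientist",
        "software engineer", "data scientist"]

# TF-IDF features feeding a logistic-regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(cvs, jobs)
pred = clf.predict(["python machine learning pandas"])[0]
```

With real resumes you would also cross-validate and consider class imbalance, but a linear model over TF-IDF is a strong first baseline for job-category classification.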
31
votes
1 answer

NLP - why is "not" a stop word?

I am trying to remove stop words before performing topic modeling. I noticed that some negation words (not, nor, never, none, etc.) are usually considered to be stop words. For example, NLTK, spaCy and sklearn include "not" on their stop word lists.…
E.K.
  • 435
  • 1
  • 4
  • 6
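A common workaround when negation matters (e.g. for sentiment) is to start from a standard stop-word list but subtract the negation words before filtering. A minimal sketch; the base list here is a small illustrative subset, not any library's actual list:

```python
# Dropping "not" can flip meaning ("not good" -> "good"), so keep
# negations even when filtering other stop words.
BASE_STOP = {"the", "a", "an", "is", "was", "i", "this",
             "not", "no", "nor", "never", "none"}
NEGATIONS = {"not", "no", "nor", "never", "none"}
STOP = BASE_STOP - NEGATIONS

def remove_stop_words(text):
    return [w for w in text.lower().split() if w not in STOP]

tokens = remove_stop_words("This movie was not good")
```

For topic modeling specifically, negations carry little topical signal, which is why the default lists include them; whether to keep them depends on the downstream task.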