Highest Voted 'semantic-similarity' Questions

5

votes

3 answers

Where to store embeddings for similarity search?

I've already asked on Stack Overflow, but I figured that the approach of storing embeddings in an ordinary Postgres database might be flawed from the very beginning. I will briefly sketch out the application again: text corpora (few hundred…

asked Nov 26 '19 at 10:30

Angus

4

votes

1 answer

How to build a recommendation model based on resumes and job descriptions?

How can I build a model that gives better resume recommendations for a given job description? I am familiar with the BoW or TF-IDF (n-grams) approach followed by cosine similarity, but I'm looking for a deep learning approach. I don't…

machine-learning deep-learning nlp recommender-system semantic-similarity

asked Jun 24 '20 at 20:16

user_12
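
For reference, the BoW/TF-IDF plus cosine-similarity baseline mentioned in this question could be sketched roughly as follows. This is a minimal pure-Python sketch, not the asker's code: the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_resumes`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would
    # likely add n-grams, stop-word removal, and punctuation handling.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_description, resumes):
    """Return (score, resume_index) pairs, best match first."""
    vectors = tfidf_vectors([job_description] + resumes)
    jd_vec, resume_vecs = vectors[0], vectors[1:]
    scores = [(cosine(jd_vec, v), i) for i, v in enumerate(resume_vecs)]
    return sorted(scores, reverse=True)
```

A baseline like this only rewards exact term overlap, which is presumably why the question asks for a deep learning alternative.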

3

votes

1 answer

Evaluation metric for an information retrieval system

I am currently reading the Semantic Product Search paper published by Amazon. They use two evaluation subtasks: matching and ranking. In matching, they tune the model hyperparameters to maximize Recall@100 and Mean Average Precision…

nlp model-evaluations information-retrieval semantic-similarity

asked Dec 07 '20 at 12:12

Sayali Sonawane
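
As a rough illustration of the two metrics named in this question, Recall@k and Mean Average Precision can be computed as below. This is a sketch using the common textbook definitions; the function names are hypothetical, and the paper's exact variants (for example, how ties or truncated rankings are handled) may differ.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision@rank over the ranks where a relevant item occurs.

    Assumption: relevant items that are never retrieved contribute zero,
    so the sum is divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Recall@k ignores the order within the top k, while average precision rewards placing relevant items early, which is one reason the two are often reported together.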

3

votes

1 answer

Semantic network using word2vec

I have thousands of headlines and I would like to build a semantic network using word2vec, specifically the Google News files. My sentences look like this (column "Titles"): Dogs are humans’ best friends / A dog died because of an accident / You can clean dogs’ paws using…

python neural-network word2vec nlp semantic-similarity

asked Jul 27 '20 at 23:19

Math

3

votes

1 answer

Why do semantically different words produce similar embeddings?

I am comparing words in the Hugging Face web UI using e5-small-v2, one of the best vector embedding models. Assuming that the scores are in the range from 0 to 1, how come all the scores are so high? In fact, I was not able to produce any example with a…

word-embeddings semantic-similarity

asked Aug 05 '23 at 23:48

AlwaysLearning

3

votes

2 answers

Semantic search - combine text and image embedding

I have a question regarding combining text and image embeddings for semantic search. The use case is product search on a (B2B) marketplace; we have image(s) plus the title and description of the products. I want to allow the user to search both the image and…

word-embeddings search semantic-similarity

asked May 08 '23 at 15:52

Steven

3

votes

1 answer

Is there a reference dataset for contextual similarity?

I'm doing some experiments with word embeddings to try to capture context-aware similarity, so that, for example, the word pair apple - hardware is very dissimilar in the context of a fruit store but very similar in an IT context. My question is if…

nlp word-embeddings similarity semantic-similarity

asked Mar 03 '23 at 08:01

Jorgemar

3

votes

1 answer

How do we evaluate the outputs of text generation models?

Evaluation of a wide variety of natural language generation (NLG) tasks is difficult. For instance, for a question answering model, it is hard for a human to quantify how well the model has answered a particular question. Doing this at scale is even…

nlp word-embeddings model-evaluations text-generation semantic-similarity

asked Feb 23 '23 at 11:51

Greggs

3

votes

1 answer

Threshold determination / prediction for cosine similarity scores

Given a query sentence, we search and find similar sentences in our corpus using transformer-based models for semantic textual similarity. For one query sentence, we might get 200 similar sentences with scores ranging from 0.95 to 0.55. For a…

nlp transformer semantic-similarity

asked Sep 13 '22 at 07:41

DarknessPlusPlus

3

votes

1 answer

How to choose similarity measurement between sentences and paragraphs

Problems: 1. How to find an appropriate measurement method. There are several ways to measure sentence similarity, but I have no idea how to find an appropriate method among them for my data (sentences). Related question on Stack Overflow: is there a way…

python nlp similarity semantic-similarity

asked Feb 07 '22 at 14:47

user120022

2

votes

1 answer

Cluster words into groups of similar meaning (synonyms)

How can words be clustered into groups of similar meaning (synonyms)? I started with pre-trained word embeddings (e.g., Google News), which is great, but not perfect - a limitation arises because the word embeddings are based on surrounding words.…

nlp clustering word-embeddings text semantic-similarity

asked May 17 '21 at 15:51

Ben

2

votes

1 answer

What's the best way to generate similar words?

Hi all, I'm fairly up to date with the NLP tasks out there (nlpprogress.com, paperswithcode.com) and great tools (nltk, flair, huggingface, etc.). I want to take a single word and predict a similar word, a little like the old "Google Sets"…

semantic-similarity

asked Mar 08 '21 at 23:10

Julian H

2

votes

1 answer

If I use BERT embeddings and cosine(sent1, sent2) > 0.9, is it fair to assume sent1 and sent2 are similar?

According to BERT author Jacob Devlin: I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested…

nlp bert cosine-distance semantic-similarity

asked Oct 12 '20 at 13:16

user2478236

2

votes

2 answers

Semantic Search

We are trying to do semantic search on our data set, i.e., we have domain-specific data (example: sentences about automobiles). Our data is just a bunch of sentences, and what we want is to give a…

unsupervised-learning word-embeddings similarity similar-documents semantic-similarity

asked Feb 12 '20 at 10:08

Farhaan Bukhsh

2

votes

0 answers

In RAG with a large dataset, which similarity measure works, and why? How to handle the matrix size problem in cosine similarity?

If we want to implement RAG for a large dataset, which similarity measure works, and why? Also, how do we handle the problem of matrix size when computing cosine similarity?

similarity llm semantic-similarity rag vector-database

asked Feb 11 '25 at 05:15

user10296606
