Questions tagged [semantic-similarity]
60 questions
5
votes
3 answers
where to store embeddings for similarity search?
I've already asked on Stack Overflow (here), but I figured that the approach of storing embeddings in an ordinary PostgreSQL database might be flawed from the very beginning. I will briefly sketch out the application again:
text corpora (few hundred…
Angus
- 51
- 1
- 2
4
votes
1 answer
How to build a recommendation model based on resume and job description?
How can I build a model that gives better resume recommendations for a given job description?
I am familiar with the BoW or TF-IDF (n-grams) approach followed by cosine similarity, but I'm looking for a deep learning approach. I don't…
user_12
- 347
- 3
- 10
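For readers unfamiliar with the baseline this question mentions, here is a minimal sketch of TF-IDF vectors ranked by cosine similarity. It is not the asker's code: the whitespace tokenizer, the unigram-only features, the smoothed IDF formula, and the names `tfidf_vectors`, `cosine`, and `rank_resumes` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens (unigrams only, no n-grams).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (n / len(toks)) * idf[t] for t, n in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_description, resumes):
    """Return (score, resume_index) pairs, best match first."""
    vectors = tfidf_vectors([job_description] + resumes)
    query, candidates = vectors[0], vectors[1:]
    scores = [(cosine(query, v), i) for i, v in enumerate(candidates)]
    return sorted(scores, reverse=True)
```

A call such as `rank_resumes("python machine learning engineer", ["java backend developer", "machine learning engineer with python experience"])` returns the second resume first, because it shares terms with the job description while the first shares none.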
3
votes
1 answer
Evaluation metric for Information retrieval system
I am currently reading the Semantic Product Search paper published by Amazon. They use two evaluation subtasks, matching and ranking. In matching, they tune the model hyperparameters to maximize Recall@100 and Mean Average Precision…
Sayali Sonawane
- 2,101
- 3
- 13
- 13
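The two metrics named in this excerpt can be sketched as follows. These are the common textbook definitions of Recall@k and (Mean) Average Precision, not necessarily the exact variants used in the paper; the function names and the assumption that each ranked list contains unique document IDs are illustrative.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@rank taken at each rank where a relevant item occurs.

    Assumes ranked_ids contains no duplicates. Relevant items that are never
    retrieved contribute zero, because we divide by the total relevant count.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: non-empty list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For the ranking `["a", "b", "c", "d"]` with relevant set `{"a", "c", "e"}`, Recall@2 is 1/3 and average precision is (1/1 + 2/3) / 3 = 5/9.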
3
votes
1 answer
Semantic network using word2vec
I have thousands of headlines and I would like to build a semantic network using word2vec, specifically the Google News files.
My sentences look like
Titles
Dogs are humans’ best friends
A dog died because of an accident
You can clean dogs’ paws using…
Math
- 161
- 1
- 13
3
votes
1 answer
Why do semantically different words produce similar embeddings?
I am comparing words in HuggingFace web UI using e5-small-v2, one of the best vector embedding models:
Assuming that the scores are in the range from 0 to 1, how come all the scores are so high? In fact, I was not able to produce any example with a…
AlwaysLearning
- 141
- 2
3
votes
2 answers
Semantic search - combine text and image embedding
I have a question regarding combining text and image embeddings for semantic search. The use case is product search on a (B2B) marketplace, where we have image(s) and the title and description of the products. I want to allow the user to search both the image and…
Steven
- 31
- 1
- 4
3
votes
1 answer
Is there a reference dataset for contextual similarity?
I'm doing some experiments with word embeddings to try to capture context-aware similarity, so that, for example, the word pair apple - hardware is very dissimilar in the context of a fruit store but very similar in an IT context.
My question is if…
Jorgemar
- 241
- 1
- 5
3
votes
1 answer
How do we evaluate the outputs of text generation models?
Evaluation of a wide variety of natural language generation (NLG) tasks is difficult. For instance, for a question answering model, it is hard for a human to quantify how well the model has answered a particular question. Doing this at scale is even…
Greggs
- 131
- 1
- 3
3
votes
1 answer
Threshold determination / prediction for cosine similarity scores
Given a query sentence, we search and find similar sentences in our corpus using transformer-based models for semantic textual similarity.
For one query sentence, we might get 200 similar sentences with scores ranging from 0.95 to 0.55.
For a…
DarknessPlusPlus
- 215
- 2
- 10
3
votes
1 answer
How to choose a similarity measure between sentences and paragraphs
Problems
1. How to find an appropriate measurement method
There are several ways to measure sentence similarity, but I have no idea how to choose an appropriate method among them for my data (sentences).
Related question on Stack Overflow: is there a way…
user120022
2
votes
1 answer
Cluster words into groups of similar meaning (synonyms)
How can words be clustered into groups of similar meaning (synonyms)?
I started with pre-trained word embeddings (e.g., Google News), which is great, but not perfect - a limitation arises because the word embeddings are based on surrounding words.…
Ben
- 141
- 4
2
votes
1 answer
What's the best way to generate similar words?
Hi all, I'm fairly up to date with all the NLP tasks out there (nlpprogress.com, paperswithcode.com) and great tools like nltk, flair, huggingface, etc. I want to take a single word and predict a similar word, a little like the old "Google Sets"…
Julian H
- 123
- 4
2
votes
1 answer
If I use BERT embeddings and cosine(sent1, sent2) > 0.9, is it fair to assume sent1 and sent2 are similar?
According to BERT author Jacob Devlin: I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested…
user2478236
- 123
- 4
2
votes
2 answers
Semantic Search
We are trying to solve a problem where we want to do semantic search on our data set,
i.e., we have domain-specific data (for example, sentences about automobiles).
Our data is just a bunch of sentences and what we want is to give a…
Farhaan Bukhsh
- 31
- 3
2
votes
0 answers
In RAG with a large dataset, which similarity measure works, and why? How to handle the size of the matrix in cosine similarity?
If we want to implement RAG for a large dataset, which similarity measure works, and why?
Also, how do we handle the problem of the size of the matrix in cosine similarity?
user10296606
- 1,906
- 6
- 18
- 33
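One common way to sidestep the matrix-size concern raised in this last question is to never build the full similarity matrix at all: stream over the corpus and keep only the best k scores per query. The sketch below illustrates that idea in pure Python. It is not an answer from the thread, and `top_k_cosine`, its `(doc_id, vector)` corpus format, and the parameter `k` are assumptions for illustration. At real scale one would more likely use vectorized batches or an approximate nearest-neighbor index.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_cosine(query, corpus, k):
    """Return the k (score, doc_id) pairs with the highest cosine similarity.

    corpus is any iterable of (doc_id, vector) pairs, so it can be a generator
    reading from disk. Extra memory is O(k) rather than O(corpus size).
    """
    if k <= 0:
        return []
    q = normalize(query)
    heap = []  # min-heap holding the best k scores seen so far
    for doc_id, vec in corpus:
        v = normalize(vec)
        # For unit vectors, cosine similarity is just the dot product.
        score = sum(a * b for a, b in zip(q, v))
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Replace the current worst of the top k.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)
```

Because only k scores are held at a time, the cost of a query grows linearly with the corpus while memory stays constant, which is the trade-off this sketch is meant to show.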