4

I am looking for the right model or approach for checking whether two sentences have the same meaning

I know I can use embeddings to check similarity, but that is not what I am after. I suspect BERT-style LLMs have useful higher-level vectors, but I'm not sure how to apply them.

For example this sentence:

  • I am very lazy

Has a somewhat similar meaning as:

  • I don't like to work hard

But not the same meaning as:

  • A lazy horse is not very useful

Using 'just' embeddings (for example HF: all-MiniLM-L6-v2) gives results that are not useful.


What would be a good approach?

1 Answer

4

The similarity used to train this model might be different from the similarity you expect.

A better approach would be to create your own large, good-quality training set of similar and dissimilar sentences and fine-tune a pretrained model (the one from your question or some other) using the same sentence transformers library (https://www.sbert.net).
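
As a rough illustration of the fine-tuning idea, here is a minimal sketch. It assumes the `sentence-transformers` and `torch` packages are installed and uses the library's `fit` interface with `CosineSimilarityLoss`; newer releases of the library may prefer a different training interface. The two labeled pairs are hypothetical placeholders built from the sentences in the question, and the 1.0 / 0.0 labels are my assumption about how you would score them. A real training set would need to be far larger.

```python
# Hypothetical labeled pairs: 1.0 = same meaning, 0.0 = different meaning.
# Two pairs are only a placeholder; a useful training set needs many more.
LABELED_PAIRS = [
    ("I am very lazy", "I don't like to work hard", 1.0),
    ("I am very lazy", "A lazy horse is not very useful", 0.0),
]


def fine_tune(pairs, base_model="all-MiniLM-L6-v2", epochs=1, batch_size=16):
    """Fine-tune a pretrained model on (sentence_a, sentence_b, score) triples."""
    # Imported lazily so the pair definitions above can be inspected
    # without loading the heavy dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer(base_model)
    examples = [
        InputExample(texts=[a, b], label=float(score)) for a, b, score in pairs
    ]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    # The loss pushes the cosine similarity of each pair towards its label.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=epochs, warmup_steps=0)
    return model
```

Calling `fine_tune(LABELED_PAIRS)` downloads the base model and returns the updated one, which you can then use with `encode` exactly as before.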

Another alternative that is available today is to experiment with prompts for the large commercial models (ChatGPT, GPT-4, Google Bard, etc.) in the hope that they understand what you want and do the task without any additional effort. For example, in my test ChatGPT said sentence B is more similar to A in 10 out of 10 retries.
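
The retry test can be sketched roughly like this. The prompt wording and the A/B/C labeling are my assumptions, not the exact prompt I used, and `ask_model` is a hypothetical callable that you would implement yourself to send a prompt to whichever chat model you use and return its text reply.

```python
from collections import Counter


def build_prompt(anchor, option_b, option_c):
    """Build a prompt asking which of two sentences is closer to the anchor."""
    return (
        "Which sentence is closer in meaning to sentence A? "
        "Answer with a single letter, B or C.\n"
        f"A: {anchor}\n"
        f"B: {option_b}\n"
        f"C: {option_c}"
    )


def majority_answer(ask_model, prompt, retries=10):
    """Ask the same question several times; return (most common answer, count)."""
    # ask_model is a hypothetical callable: prompt string in, reply string out.
    # Only the first letter of each reply is kept, upper-cased.
    answers = [ask_model(prompt).strip().upper()[:1] for _ in range(retries)]
    return Counter(answers).most_common(1)[0]
```

Repeating the question matters because these models do not always give the same reply; a result such as `("B", 10)` means all ten retries agreed.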

Valentas