Questions tagged [search-engine]

29 questions
9
votes
3 answers

Why do popular search engines not follow the usual AND, OR logic for queries?

I am teaching myself Information Retrieval from Christopher Manning's book (PDF link: http://nlp.stanford.edu/IR-book/pdf/01bool.pdf). I tried Exercise 1.13: "Try using the Boolean search features on a couple of major web search engines.…
user21595
7
votes
3 answers

Best way to vectorise names and addresses for similarity searching?

I have a large dataset of around 9 million people with names and addresses. Given quirks of the process used to get the data, it is highly likely that a person is in the dataset more than once, with subtle differences between each record. I want to…
Sandy Lee
  • 267
  • 2
  • 9
3
votes
2 answers

What ML/DL techniques power YouTube/Netflix search systems?

Video platforms like YouTube, Netflix, and Amazon Prime have excellent search systems: given a search string, they find the most relevant videos. Which machine learning/deep learning techniques are used for this? Any pointers would be a great help.
Anuj Gupta
  • 266
  • 1
  • 10
3
votes
1 answer

Best method for similarity searching on 10,000 data points with 8,000 features each in Python?

As mentioned in the title, I am attempting to search through 10,000 vectors with 8,000 features each, all in Python. Currently I have the vectors saved in their own directories as pickled NumPy arrays. The features were pulled from this deep neural…
2
votes
1 answer

Scalable tools to build a kNN graph over sparse data

I'm looking for scalable tools to build a kNN graph over sparse data points. The dimension and the number of data points can both be in the millions. What I have tried already: sklearn.neighbors.kneighbors_graph, which does brute-force search for sparse…
2
votes
0 answers

Can Google really bring back billions of results in (almost) the blink of an eye?

I was always fascinated by Google's search ability, a great achievement by Google and other search engine providers also, but more so a collective human talent and ability that makes me appreciate our amazing mind and our potential to innovate. I…
Saleh
  • 21
  • 1
2
votes
1 answer

Why don't search engines filter out unethical/illegal searches?

(Not sure if this question is appropriate to this SE) I'm studying the LLMs course on Coursera. One topic they deal with is how to get the LLM to not respond with unethical/illegal things, e.g. if you ask Bing "how do I hack my neighbour's Wifi?",…
Allure
  • 285
  • 2
  • 7
2
votes
1 answer

Learning to Rank with Unlabelled Dataset

I have a folder of about 60k PDF documents that I would like to learn to rank based on queries to surface the most relevant results. The goal is to surface and rank relevant documents, very much like a search engine. I understand that Learning to Rank…
amber
  • 51
  • 1
1
vote
1 answer

How does Google's 'showing results for' work?

If I search 'I love to eate my food' on Google then Google will 'show results for' I love to eat my food.... How does this algorithm work?
1
vote
0 answers

About the Natural Questions (NQ) benchmark in NLP

I recently learned that there is a benchmark called NQ. https://ai.google.com/research/NaturalQuestions/visualization Unlike other QA benchmarks, in which the relevant document is provided with the query, it has to find information from a corpus of millions by…
giniper
  • 11
  • 1
1
vote
1 answer

What is the difference between Okapi BM25 and NMSLIB?

I was trying to build a search system when I learned about Okapi BM25, which is a ranking function like tf-idf. You can build an index of your corpus and later retrieve documents similar to your query. I imported a Python library, rank_bm25, and…
1
vote
1 answer

What is the formula and log base for idf?

To calculate tf-idf, we compute tf * idf, where tf = number of times the word occurs in the document. What is the formula for idf, and what is the log base? log(number of documents / number of documents containing the word), or log((1 + number of documents) / (1 + number of documents containing the…
variable
  • 227
  • 3
  • 10
1
vote
1 answer

Measuring quality of answers from QnA systems

I have a question answering system that uses a Seq2Seq kind of architecture; more precisely, it is a transformer architecture. When a question is asked, it gives the start position and end position of the answer along with their logits. The answer is formed…
Sandeep Bhutani
  • 914
  • 1
  • 7
  • 26
1
vote
1 answer

An exhaustive, representative test database for a phrase search algorithm

For a phrase searching algorithm, imagine the goal is to search for a name phrase and return matched results based on a pre-defined threshold. For example, searching for "Jon Smith" could return "Jon Smith", "Jonathan Smith", "Jonathan David Smith",…
Xiaohan Du
  • 13
  • 3
1
vote
2 answers

How can I improve the recall of a certain class in a multiclass classification result?

I am working on a multiclass classification task that assigns medical-related web search queries to certain hospital departments. My classifier is based on fastText. I found that for most conditions the result is good enough, say recall is 0.8…
leakey
  • 13
  • 6