Highest Voted 'search-engine' Questions - Data Science Stack Exchange

9

votes

3 answers

Why do popular search engines not follow the usual AND, OR logic for queries?

I am teaching myself Information Retrieval from Christopher Manning's book (PDF link: http://nlp.stanford.edu/IR-book/pdf/01bool.pdf). I tried Exercise 1.13: "Try using the Boolean search features on a couple of major web search engines.…

information-retrieval search search-engine

asked Jan 11 '17 at 05:55

user21595

7

votes

3 answers

Best way to vectorise names and addresses for similarity searching?

I have a large dataset of around 9 million people with names and addresses. Given quirks of the process used to get the data it is highly likely that a person is in the dataset more than once, with subtle differences between each record. I want to…

nlp word-embeddings k-nn search-engine elastic-search

asked Mar 02 '22 at 13:39

Sandy Lee

267
2
9

3

votes

2 answers

What ML/DL techniques power Youtube/Netflix search systems?

Video platforms like YouTube, Netflix, and Amazon Prime have excellent search systems: given a search string, they find the most relevant videos. Which machine learning / deep learning techniques are used for this? Any pointers will be of great help.

machine-learning deep-learning search-engine

asked Jan 11 '18 at 04:00

Anuj Gupta

266
1
10

3

votes

1 answer

Best method for similarity searching on 10,000 data points with 8,000 features each in Python?

As mentioned in the title, I am attempting to search through 10,000 vectors with 8,000 features each, all in Python. Currently I have the vectors saved in their own directories as pickled NumPy arrays. The features were pulled from this deep neural…

python search search-engine

asked Jul 31 '17 at 14:46

Michael Vander Meiden

31
1
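For the similarity-search question above: at 10,000 vectors, an exhaustive scan is often a reasonable baseline before reaching for an index. A minimal sketch, assuming cosine similarity is the metric (the excerpt does not say) and using plain Python lists in place of the pickled NumPy arrays; `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return (index, score) pairs for the k vectors most similar to query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    # Highest similarity first; every stored vector is compared (brute force).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.0], stored, k=2))
```

With real data, the same scan is usually done as one matrix-vector product over normalised rows, which is much faster than Python loops.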

2

votes

1 answer

scalable tools to build kNN graph over sparse data

I'm looking for scalable tools to build kNN graph over sparse data points. The dimension and number of data points can be both up to millions. What I have tried already: sklearn.neighbors.kneighbors_graph: which does brute-force search for sparse…

machine-learning data-mining k-nn search-engine

asked Jul 18 '18 at 06:08

xiaohan2012

163
4

2

votes

0 answers

Can Google really bring back billions of results in a blink of an eye (almost)

I was always fascinated by Google's search ability, a great achievement by Google and other search engine providers also, but more so a collective human talent and ability that makes me appreciate our amazing mind and our potential to innovate. I…

bigdata visualization data search-engine

asked Jul 12 '18 at 14:22

Saleh

21
1

2

votes

1 answer

Why don't search engines filter out unethical/illegal searches?

(Not sure if this question is appropriate to this SE) I'm studying the LLMs course on Coursera. One topic they deal with is how to get the LLM to not respond with unethical/illegal things, e.g. if you ask Bing "how do I hack my neighbour's Wifi?",…

reinforcement-learning search-engine

asked Dec 28 '23 at 11:08

Allure

285
2
7

2

votes

1 answer

Learning to Rank with Unlabelled Dataset

I have a folder of about 60k PDF documents that I would like to learn to rank based on queries to surface the most relevant results. The goal is to surface and rank relevant documents, very much like a search engine. I understand that Learning to Rank…

nlp xgboost ranking search-engine learning-to-rank

asked Mar 28 '22 at 13:48

amber

51
1

1

vote

1 answer

How does Google's 'showing results for' work?

If I search 'I love to eate my food' on Google then Google will 'show results for' I love to eat my food.... How does this algorithm work?

machine-learning deep-learning algorithms google search-engine

asked May 27 '21 at 04:40

google

19
1

1

vote

0 answers

About Natural Question (NQ) benchmark in NLP

I recently learned that there is a benchmark called NQ. https://ai.google.com/research/NaturalQuestions/visualization Unlike other QA benchmarks, in which the relevant document is provided with the query, it has to find information from millions of corpus by…

nlp bert search-engine

asked Feb 22 '21 at 02:29

giniper

11
1

1

vote

1 answer

What is the difference between Okapi bm25 and NMSLIB?

I was trying to build a search system and came across Okapi BM25, which is a ranking function like tf-idf. You can build an index of your corpus and later retrieve documents similar to your query. I imported a Python library rank_bm25 and…

nlp python-3.x information-retrieval search-engine

asked Feb 16 '21 at 08:45

coderina

113
5

1

vote

1 answer

What is the formula and log base for idf?

To calculate tf-idf, we do: tf*idf, where tf = number of times the word occurs in the document. What is the formula for idf, and what is the log base: Log(number of documents/number of documents containing the word) Log((1+number of documents)/(1+number of documents containing the…

tfidf search-engine

asked May 14 '20 at 19:18

variable

227
3
10
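For the idf question above: a minimal sketch comparing the plain ratio quoted in the excerpt with a smoothed variant. The excerpt's second formula is cut off, so the smoothed form here (one added to both counts) is an assumption about how it continues, and the function names are hypothetical. Changing the log base only rescales every idf value by the same constant, so it does not change how terms rank against each other.

```python
import math


def idf_plain(n_docs, doc_freq, base=math.e):
    """log(N / df): the unsmoothed ratio from the excerpt."""
    return math.log(n_docs / doc_freq, base)


def idf_smoothed(n_docs, doc_freq, base=math.e):
    """log((1 + N) / (1 + df)): assumed smoothed form; defined even when df is 0."""
    return math.log((1 + n_docs) / (1 + doc_freq), base)


if __name__ == "__main__":
    n_docs, doc_freq = 1000, 10  # illustrative counts, not from the question
    for base in (2, math.e, 10):
        print(base, idf_plain(n_docs, doc_freq, base),
              idf_smoothed(n_docs, doc_freq, base))
```

Some libraries add further constants on top of these forms, so it is worth checking the documentation of whichever implementation is in use.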

1

vote

1 answer

Measuring quality of answers from QnA systems

I have a question-answering system that uses a Seq2Seq kind of architecture; more precisely, it is a transformer architecture. When a question is asked, it gives the startposition and endposition of the answer along with their logits. The answer is formed…

bert transformer search-engine question-answering

asked Dec 21 '19 at 15:21

Sandeep Bhutani

914
1
7
26

1

vote

1 answer

An exhaustive, representative test database in phrase search algorithm

For a phrase searching algorithm, imagine the goal is to search for a name phrase and return matched results based on a pre-defined threshold. For example, searching for "Jon Smith" could return "Jon Smith", "Jonathan Smith", "Jonathan David Smith",…

nlp search-engine

asked Jul 25 '19 at 21:32

Xiaohan Du

13
3

1

vote

2 answers

How can I improve the recall of a certain class in a multiclass-classification result

I am working on a multiclass classification task that assigns medical-related web search queries to particular hospital departments. My classifier is based on fastText. I found that for most conditions the result is good enough, say recall is 0.8…

nlp multiclass-classification search-engine

asked Jan 15 '19 at 04:02

leakey

13
6
