265

When I search for a file on my HD in Windows 7 or Windows XP, it takes some minutes to finish the process. If I fill in a search term in Google, the answer is on my screen in milliseconds.

How is it possible for Google to search the Internet, which is many times larger than my hard drive, faster than my OS can search my computer? Is it only a matter of computing power and the right algorithm?

nc4pk
  • 9,257
  • 14
  • 61
  • 71
Arne
  • 1,787

10 Answers

215

Google is not searching the internet: it is searching an index. Google has huge server farms which are constantly scanning and indexing the internet. This process takes a lot of time, just like the search of your unindexed hard drive. In Windows 7, there is an option to index your hard drives. This process takes some time at first but once it is up and running the results of a search will be instantaneous.
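As a rough illustration of the difference between scanning and looking up an index, here is a minimal Python sketch. The file names and contents are made up, and the tiny in-memory dictionary only stands in for a real index (it is not how Google or Windows Search is actually implemented):

```python
from collections import defaultdict

# Hypothetical "files" on a drive: name -> contents (made up for illustration)
documents = {
    "notes.txt": "google searches an index",
    "todo.txt": "index the hard drive",
    "readme.txt": "windows search without an index is slow",
}

def scan_search(docs, word):
    """Unindexed search: read every document again on every query."""
    return sorted(name for name, text in docs.items() if word in text.split())

def build_index(docs):
    """Slow, one-time pass: map each word to the documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.split():
            index[word].add(name)
    return index

def index_search(index, word):
    """Indexed search: a single dictionary lookup, no documents are read."""
    return sorted(index.get(word, set()))

index = build_index(documents)
print(scan_search(documents, "google"))
print(index_search(index, "google"))
```

Both functions return the same answer; the difference is that `scan_search` does work proportional to the size of all the files on every query, while `index_search` pays that cost once in `build_index`.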

If you want to know more about how the Google search works you can read Google's article "How Search Works" or read the article "How Stuff Works: How Google Works".

TRiG
  • 1,360
Simon
  • 3,973
74

Google is like searching the yellow pages for an address (indexed). Windows search is akin to driving around checking numbers on buildings (non-indexed).

Another analogy would be looking through a well organized library and card catalog, or just sorting through an unorganized pile of books every time.

Fundamentally it's all the organizational work done prior to the search that makes it fast.

FYI: When searching indexed locations, Windows search can be just as responsive.

Ryan
  • 774
36

Google's business is search (and serving up ads), and it's very focused on that. There are a number of things that Google does to ensure data is returned to you very fast:

  • First, it uses MapReduce to build a comprehensive index of the World Wide Web and PageRank to rank the pages in it. It updates this regularly so the results are fresh.
  • That index is distributed and replicated across Google's many servers.
  • Your query is split across multiple servers to build the returned results. This allows the process to be highly parallelized.
  • Common queries and results are cached, reducing the need to perform the search at all.
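The "split the query across servers" step can be sketched roughly as follows. This is only a toy model: the three dictionaries stand in for index shards on separate machines, the page names are invented, and a thread pool stands in for network calls to real servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each "server" holds the index for part of the web
shards = [
    {"fast": {"page1", "page2"}, "search": {"page1"}},
    {"fast": {"page7"}, "index": {"page8"}},
    {"search": {"page12"}, "index": {"page12"}},
]

def query_shard(shard, term):
    """Each shard answers only for the pages it is responsible for."""
    return shard.get(term, set())

def search(term):
    """Send the term to every shard in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(query_shard, shards, [term] * len(shards))
        return sorted(set().union(*partials))

print(search("fast"))
```

No single shard has to search the whole index, so the time per query is bounded by the slowest shard rather than by the total amount of data.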

See this link for more information about How Search Works

Comparatively a hard drive search without an index has to read through every file on the drive and this can take a lot of time.

Additionally, you can think of both a filesystem and an index as a tree. In the filesystem, the root of the tree is the top-level folder, and it can have branches (folders) or leaves (files) in that one folder. Each branch can have sub-branches for more folders and leaves for more files. To search this structure you have to 'walk' all of the branches (and sub-branches) to find the leaf you are looking for. An index flips this hierarchy around. The base becomes the alphabet, and all of the sub-branches are further refinements of it. The leaves are the locations of the items you are looking for. Searching this structure allows you to prune (exclude) large sections of the tree (e.g. the first letter of your search term allows you to trim the 25 other letter branches right away).
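A letter-by-letter tree like the one described above is usually called a trie. Here is a minimal sketch, assuming lowercase search terms and made-up file paths; a real index would be far more elaborate:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next letter -> child node
        self.locations = []  # where items with exactly this name live

def insert(root, word, location):
    """Add one branch per letter, then record the location at the leaf."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.locations.append(location)

def lookup(root, word):
    """Follow one branch per letter; every other branch is never visited."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.locations

# Hypothetical entries for illustration
root = TrieNode()
insert(root, "report", "C:/docs/report.doc")
insert(root, "recipe", "C:/food/recipe.txt")
insert(root, "budget", "C:/money/budget.xls")

print(lookup(root, "recipe"))
```

A lookup takes at most one step per letter of the search term, no matter how many entries the tree holds.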

Brad Patton
  • 10,668
31

About 4 years ago I asked myself the same question. As I googled around doing my research, I eventually read that hiring the best of the best to come up with some of the most sophisticated search algorithms is only part of the answer.

One of the key designs they used is similar to the idea of MapReduce, I think. You have a lot of cheap computers on farms. Let these computers have only about 80 GB of hard disk space, and push hard to have about 16 GB of RAM, or even better 32 GB, on these computers (as much as possible). Remember that they are connected through some sophisticated system they designed. But the key idea here is that when a query is submitted, it is passed into their system, where it will try to search the fresh data in RAM. Keep in mind they have a lot of these cheap computers. And since the data is in RAM, it is found a lot faster than it would be on a hard disk. But don't forget that they also have a sophisticated system (indexing and all those algorithms) that helps greatly.

And this data doesn't have to be fresh, because we all know that Google stores everything. As for what should be in RAM, the same principle as with splay trees can be used: keep whatever people are searching for the most in RAM and flush the least searched stuff to hard disk.
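To make the "keep the popular stuff in RAM" idea concrete, here is a small sketch. Note that it uses a least-recently-used (LRU) cache rather than a splay tree, simply because it is shorter to write; the class name, the capacity, and the `disk` dictionary are all assumptions for illustration.

```python
from collections import OrderedDict

class HotCache:
    """Keeps the most recently used results in 'RAM'; the rest stay on 'disk'."""

    def __init__(self, capacity, slow_store):
        self.capacity = capacity
        self.slow_store = slow_store  # stands in for the hard disk
        self.ram = OrderedDict()      # least recently used entry comes first
        self.disk_reads = 0

    def get(self, query):
        if query in self.ram:
            self.ram.move_to_end(query)   # mark as recently used
            return self.ram[query]
        self.disk_reads += 1              # slow path: go to the disk
        result = self.slow_store[query]
        self.ram[query] = result
        if len(self.ram) > self.capacity:
            self.ram.popitem(last=False)  # evict the least recently used entry
        return result

# Hypothetical data: query -> results
disk = {"cats": ["page1"], "dogs": ["page2"], "fish": ["page3"]}
cache = HotCache(capacity=2, slow_store=disk)
for q in ["cats", "cats", "dogs", "cats", "fish", "cats"]:
    cache.get(q)
print(cache.disk_reads)
```

Out of six queries, only the first request for each distinct term touches the slow store; the repeated "cats" queries are all served from memory.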

This little idea coupled with their indexing and all the other things others have mentioned in their answers, might be one of the reasons why it is faster than a hard-drive search.

  • The power to predict based on other searches.
  • The data is most likely in RAM which we all know is faster.
  • Use multiple systems to divide and conquer.
  • Searching is their main priority.

Of course I could be wrong, but this made sense to me. And I was happy with what I learned.

Touch
  • 419
20

Google uses an extremely sophisticated indexing system, parallel operations, and a number of load balancing techniques not available to a standard standalone computer. There is really very little similarity between a web search and a hard disk file search, and Google optimizes heavily for its specific use cases.

Frank Thomas
  • 37,476
4

In 2004, some Google employees published a paper on MapReduce, and since then they have improved on it hundreds of times.

Also, they use the Google File System (GFS), which is a distributed file system like the Hadoop Distributed File System (HDFS) and is extremely optimized for their purposes. As far as I know, GFS may work as much as a thousand times faster than HDFS.

smttsp
  • 141
2

I thought I would add to this, as I too had this question a while ago and found these great videos, which describe what Google does on the surface. Interesting to watch.

Google on Youtube 1
Google on Youtube 2

He goes a little bit deeper but not deep enough that you get lost in technicalities.

Cheers.

Mogget
  • 1,353
2

Just adding something to the wonderful answers here. Google caches popular search phrases. The results of these searches reside in memory, so if you search for something that is searched a lot, the results will show up almost immediately.

1

To answer the question on a simplistic level: imagine you have a textbook with a keyword index at the back.

Searching a hard disk (naively, at least) is like going through the book, page by page, scanning each line for an occurrence of your keyword.

Using an Internet search engine is like looking up the keyword in the index, and then turning directly to the page number it gives.

In reality of course, it's a lot more complex than this. For example, you would usually search your hard disk for different kinds of information than the Internet. But the basic thing to take away is that the search engine is using an index. It has already gone through the "book", word by word, and it has compiled a list of those words along with where to find them, and it has organised the list in such a way that it can look up things in it very quickly.

For example, think about the organisation of an index in a book. Firstly, it is usually sorted alphabetically, and secondly it may have letter headings. When you look up a word in the index you can see straight away the list of words beginning with the letter you want. And because the list is sorted, it is easy to find the word you want within the list, or to tell quickly if it is missing.
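The benefit of a sorted list can be shown with a binary search, which halves the remaining entries at every step. This is a minimal sketch with an invented back-of-book index; the words and page numbers are made up:

```python
import bisect

# Hypothetical back-of-book index, sorted alphabetically: (word, pages)
book_index = [
    ("algorithm", [12, 40]),
    ("cache", [88]),
    ("index", [3, 19, 57]),
    ("query", [21]),
    ("server", [64, 70]),
]
words = [word for word, _ in book_index]

def find_pages(word):
    """Binary search the sorted word list instead of reading every entry."""
    i = bisect.bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return book_index[i][1]
    return []  # because the list is sorted, we know quickly that it is missing

print(find_pages("index"))
```

With a million entries, this kind of lookup needs about twenty comparisons, whereas reading the "book" page by page would touch every page.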

So to summarize, it's like your hard disk just has a book, while the search engine has the index. Though as some others have pointed out, it's possible to use software to index your hard disk, and then you can use the index instead of the whole thing.

mwfearnley
  • 7,889
-1

I guess one of the reasons Google introduced autocomplete and used AJAX was the speed problem. Now, while you are typing, the words are sent in the background so Google can do part of the job before you have finished. Also, the indices are based on multi-word combinations (which you can find as suggestions at the bottom of the page). Network speeds today can be higher than hard-drive speeds, and much of those indices probably resides in the RAM of the servers in their farm.

Xaqron
  • 218