How does Google search so fast?

When your query gets to the datacenter, it is assigned to one master server that breaks the job apart and assigns lookup tasks to a number of worker servers.

The results get returned to the master server which organizes and sorts them and then sends the results back to your browser.
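The master/worker flow described above can be sketched as a simple scatter-gather. This is only a toy illustration under stated assumptions, not Google's actual implementation: the shard contents, the scores, and the names `SHARDS`, `search_shard` and `master_search` are all hypothetical, and threads stand in for separate worker servers.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical data: each "worker" owns one shard mapping term -> [(score, url)].
SHARDS = [
    {"bicycle": [(0.9, "a.example/bikes"), (0.4, "a.example/shop")]},
    {"bicycle": [(0.7, "b.example/repair")], "pigeon": [(0.5, "b.example/birds")]},
    {"bicycle": [(0.8, "c.example/cycling")]},
]

def search_shard(shard, term):
    """Worker task: look the term up in one shard only."""
    return shard.get(term, [])

def master_search(term, top_k=3):
    """Master: fan the lookup out to every shard, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), SHARDS))
    # Flatten the partial results and keep the top_k highest-scoring hits.
    hits = [hit for partial in partials for hit in partial]
    return [url for _, url in heapq.nlargest(top_k, hits)]

print(master_search("bicycle"))
# ['a.example/bikes', 'c.example/cycling', 'b.example/repair']
```

Each worker only ever touches its own shard, so adding shards spreads the lookup work while the master's job stays limited to merging short result lists.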

When you perform a search, you might be using the capabilities of a dozen or even two dozen servers…and those servers have the capacity to handle multiple lookups like this simultaneously…and there are hundreds of servers in the datacenter…and a few dozen datacenters around the world.

Combine all that, and Google has some serious lookup horsepower at its disposal. The software running on those servers ranges from the filesystem itself (called GFS or Colossus) to the spiders that crawl the web, to the database management systems, to specialized programming languages for creation and control of these new software packages.

Every ounce of this software is designed to increase speed and decrease the amount of time it takes to return your search results.

One other thing to remember is that Google is under no obligation to provide the most accurate, most consistent, or the most up-to-date results. Because we typically find an acceptable set of results within an acceptable time frame when performing searches, in a certain way, we as users place a blind trust in a web search process.

We just accept at face value that the search results we see are the very best search results possible, even if there are more accurate results available.

Jerod consults with internal teams and external clients on all manner of technical projects, manages the flow of information surrounding the company's online objectives, manages relationships with external partners and suppliers, and is a constant bother to everyone in terms of maintaining online security.

However, quite some time ago, conventional text search was exhausted.

A new approach was needed that allowed searching large strings in sublinear time, that is, without looking at every single character. It was discovered that this can be solved by pre-processing the large string and building a special index data structure over it. Many different such data structures have been proposed. Each has its strengths and weaknesses, but there's one that is especially remarkable because it allows a lookup in constant time. Now, at the scale at which Google operates this isn't strictly true anymore, because load balancing across servers, preprocessing and some other sophisticated stuff have to be taken into account.

But in essence, the so-called q-gram index allows a lookup in constant time. The only disadvantage: the data structure gets ridiculously big.
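To make the idea concrete, here is a minimal sketch of a q-gram index over a single string. It assumes a Python dictionary in place of the direct-address table a real q-gram index would use, so the lookup is constant time only in the expected, hash-table sense. The function names `build_qgram_index` and `find` are made up for this illustration.

```python
from collections import defaultdict

def build_qgram_index(text, q):
    """Preprocessing: map every substring of length q to its start positions."""
    index = defaultdict(list)
    for pos in range(len(text) - q + 1):
        index[text[pos:pos + q]].append(pos)
    return index

def find(index, text, pattern, q):
    """Find all occurrences of a pattern that is at least q characters long."""
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    # One lookup yields the candidate positions for the pattern's first q-gram ...
    candidates = index.get(pattern[:q], [])
    # ... and only those candidates are verified against the full pattern.
    return [pos for pos in candidates if text.startswith(pattern, pos)]

text = "abracadabra"
index = build_qgram_index(text, q=3)
print(index["abr"])                    # [0, 7]
print(find(index, text, "abra", q=3))  # [0, 7]
```

The search never scans the whole text: it jumps straight to the positions stored for the first q-gram. The price is the index itself, which stores one position entry per q-gram occurrence in the text.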

Additionally, there has to be one field for each letter position in the string that was indexed or, in the case of Google, for each website. To mitigate the sheer size, Google will probably use multiple indices (in fact, they do, to offer services like spelling correction). The topmost ones won't work on character level but on word level instead. This reduces q but it makes the alphabet S infinitely bigger, so they will have to use hashing and collision tables to cope with the infinite number of different words.
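A word-level index with hashing and collision handling might look roughly like the sketch below. It is an assumption-laden toy, not a description of Google's data structures: the bucket count, the use of `zlib.crc32` as the hash function, and the names `bucket_of`, `build_word_index` and `lookup` are all chosen for this example only.

```python
import zlib

NUM_BUCKETS = 8  # deliberately tiny so that collisions are likely

def bucket_of(word):
    """Deterministically hash a word into a fixed number of buckets."""
    return zlib.crc32(word.encode("utf-8")) % NUM_BUCKETS

def build_word_index(docs):
    """buckets[i] is a collision list of [word, set_of_doc_ids] entries."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for doc_id, text in docs.items():
        for word in text.lower().split():
            chain = buckets[bucket_of(word)]
            for entry in chain:
                if entry[0] == word:   # word already stored in this chain
                    entry[1].add(doc_id)
                    break
            else:                      # not found: start a new entry
                chain.append([word, {doc_id}])
    return buckets

def lookup(buckets, word):
    """Hash to one bucket, then walk its collision list for an exact match."""
    for stored_word, doc_ids in buckets[bucket_of(word)]:
        if stored_word == word:
            return sorted(doc_ids)
    return []

docs = {1: "google search is fast", 2: "fast pigeons", 3: "search the web"}
buckets = build_word_index(docs)
print(lookup(buckets, "fast"), lookup(buckets, "search"))
```

The table size stays fixed no matter how many distinct words show up. Words that hash to the same bucket simply share a collision list, which is searched linearly.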

On the next level, these hashed words will point to other index data structures which, in turn, will hash characters pointing to websites. Long story short, these q-gram index data structures are arguably the most central part of Google's search algorithm. Unfortunately, there are no good non-technical papers explaining how q-gram indices work.

The only publication that I know that contains a description of how such an index works is … alas, my bachelor thesis.

One of the most important delays with webservers is getting your query to the webserver and the response back. This latency is bound by the speed of light, which even Google has to obey.

However, they have datacenters all over the world. As a result, the average distance to any one of them is lower. This keeps the latency down. Sure, the difference is measured in milliseconds, but it matters if the response has to arrive within milliseconds.
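A quick back-of-the-envelope calculation shows why distance matters. The sketch below assumes that signals in optical fiber travel at roughly two thirds of the speed of light in vacuum, and it ignores routing, queuing and processing time, so the numbers are lower bounds only. The two distances are arbitrary examples.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumption: light in fiber travels at about 2/3 c

def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fiber, ignoring routing and processing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000

for distance_km in (500, 10_000):
    print(f"{distance_km:>6} km -> at least {min_round_trip_ms(distance_km):.1f} ms")
```

Under these assumptions a datacenter 500 km away costs about 5 ms per round trip, while one 10,000 km away costs about 100 ms before any actual work is done.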

Everyone knows it's because they use pigeons, of course!

They pretty much have a local copy of the internet cached on thousands of PCs on custom filesystems.

Google hires the best of the best. Some of the smartest people in IT work at Google. They have virtually infinite money to throw at hardware and engineers.

An attempt at a generalized list that does not depend on you having access to Google's internal tools:

You can find some pointers to the research papers written by some of the Google guys on the Google research homepage.

This link is also very informative: Behind the scenes of a Google query.

TraumaPony is right. There were a lot of articles on the net describing Google's services architecture. I'm sure you can find them via Google.

MapReduce does not play a role in the search itself; it is only used for indexing. Check this video interview with the MapReduce inventors.
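To illustrate the indexing role mentioned above, here is a single-process sketch of building an inverted index in the MapReduce style. It only mimics the map, shuffle and reduce phases in plain Python; the real system distributes these phases across many machines, and the function names and sample documents are invented for this example.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for every word in one document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, doc_ids):
    """Reduce: collapse the doc ids for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))

def build_index(docs):
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

docs = {1: "fast web search", 2: "web crawler", 3: "fast crawler"}
index = build_index(docs)
print(index["fast"], index["web"])  # [1, 3] [1, 2]
```

Because each map call looks at one document and each reduce call looks at one word, both phases can run in parallel on separate machines, which is what makes the approach attractive for indexing at scale.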

And algorithms that can harness that hardware power, like MapReduce for instance.

If you are interested in more details about how the Google cluster works, I'd suggest HDFS, an open source counterpart of their filesystem that ships with Hadoop. Hadoop is based on MapReduce by Google.

How exactly all of this is done is summarized by all the links that you have in the question summary.

Google tries to determine the highest quality answers, and factor in other considerations that will provide the best user experience and most appropriate answer, by considering things such as the user's location, language, and device (desktop or phone).

For example, searching for "bicycle repair shops" would show different answers to a user in Paris than it would to a user in Hong Kong. Google doesn't accept payment to rank pages higher, and ranking is done programmatically.
