How Does Google Work?
For my Internet Technologies class, I presented a paper describing Google. In 1998, Sergey Brin and Lawrence Page wrote a research paper entitled The Anatomy of a Large-Scale Hypertextual Web Search Engine. It describes, in some detail, the original design of the Google search engine. Topics include:

• Design goals
  • Scalability
  • Search quality
  • Providing a platform for academic search engine research
• Architecture
• Components
  • Crawlers
  • Indexer
  • URL Resolver
  • Sorter
  • Searcher
• PageRank
• Data Structures
  • Repository
  • Barrels (forward / inverted index)
  • Lexicon
  • Link Database
  • Document Index
• Performance
  • Storage requirements
  • Search quality
  • Search speed

Short Summary: The short, short summary is that Google crawls the web, stores a local cache of the pages it finds, and builds a lexicon of common words. For each word, it creates a list of pages that contain that word. A query for a given word returns that list, sorted by PageRank. PageRank is computed from the pages that link to a given page: a page ranks higher when many pages, especially high-ranking ones, link to it.
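To make the PageRank idea concrete, here is a minimal sketch of the iterative computation over a tiny made-up link graph. The damping factor of 0.85 is the value used in the original paper; the three-page graph and the function name `pagerank` are invented for illustration, not taken from Google's actual implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}       # start with a uniform rank
    for _ in range(iterations):
        # Every page keeps a base share (1 - d) / n ...
        new_rank = {p: (1.0 - d) / n for p in pages}
        # ... and distributes d * rank evenly across its outgoing links.
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = d * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
# C is linked to by both A and B, so it ends up with the highest rank.
print(sorted(ranks, key=ranks.get, reverse=True))  # -> ['C', 'A', 'B']
```

The iteration converges because each pass just redistributes rank along links; after enough passes the values stop changing and the ranks sum to 1.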
A Basic Understanding. Google indexes pages on the Web using programs commonly known as "spiders", "crawlers", or "robots". Google's famous search engine spider, GoogleBot, uses the links on web pages as a sort of freeway: it travels from site to site by following them. When Google finds a new web page, it "crawls" the code on the page and transports it back to its datacenter. Google's "FreshBot" may revisit already-indexed websites every day to keep the index fresh; how often this happens varies widely from site to site and is the subject of much speculation.

Google's database maintains billions of pages. Google uses a proprietary formula (or algorithm) to "score" the relevancy of websites for each search query. The highest-ranking, or "most relevant", websites for a specific query are listed first in the search results. Take for example the search query "Tiger Woods". Imagine Google maintains two pages in its index containing the name "Tiger Woods" (in
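The lookup side of this process can be sketched as a toy inverted index: for each word, keep the set of pages that contain it, then answer a query by intersecting those sets and sorting by a per-page score. The two example pages, their URLs, and the hard-coded scores below are invented stand-ins (the score plays the role PageRank plays in the real system).

```python
from collections import defaultdict

# Hypothetical cached pages (URL -> text) and made-up relevancy scores.
pages = {
    "golf-news.example": "tiger woods wins the masters",
    "zoo.example": "the tiger is a large cat in the woods",
}
scores = {"golf-news.example": 0.9, "zoo.example": 0.4}

# Build the inverted index: word -> set of pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def search(query):
    # Intersect the posting lists of every query word, then rank by score.
    words = query.lower().split()
    results = set.intersection(*(index[w] for w in words))
    return sorted(results, key=scores.get, reverse=True)

# Both pages contain "tiger" and "woods"; the higher-scored one comes first.
print(search("tiger woods"))  # -> ['golf-news.example', 'zoo.example']
```

The real scoring formula is far more involved, but the shape of the operation, look up posting lists and sort the intersection by rank, is the same one the short summary above describes.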