I think it's very interesting to learn how Google creates its index and its database of documents. The following are some of the basic steps of this process:
1. Google creates its own version of the Internet using automated programs called “Googlebots”, which crawl the web in search of new information. Web sites known to be important and frequently modified are scanned every few minutes; sites updated less frequently may be scanned every few weeks.
2. Googlebots feed key information from a web page to Google's central network: the URL, the full text of the page, references to images and other embedded files, and specific information the site owner creates about the page, called metadata.
3. At the central network, the information is indexed; every word that could be used in a search query is listed along with information referencing the web sites where the word can be found.
4. The index is broken into “shards” and sent to data centers (groups of servers wired together) around the world. Because data centers may have slightly different versions of the index, depending on when they received the last update, users in different places may get slightly different results for the same search.
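To make steps 3 and 4 more concrete, here is a minimal sketch of a word-to-page index split into shards. It is only an illustration: the function names `build_index` and `shard_index`, the example URLs, and the choice to shard by hashing each word are my own assumptions, not a description of how Google actually does it.

```python
from collections import defaultdict
import zlib


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def shard_index(index, num_shards):
    """Split the index into num_shards pieces by hashing each word.

    Hashing the word is an assumed strategy for this sketch; a real
    system could just as well split the index by document.
    """
    shards = [dict() for _ in range(num_shards)]
    for word, urls in index.items():
        shard_id = zlib.crc32(word.encode("utf-8")) % num_shards
        shards[shard_id][word] = urls
    return shards


# Hypothetical crawled pages: URL -> full text of the page.
pages = {
    "http://example.com/a": "google crawls the web",
    "http://example.com/b": "the web index",
}

index = build_index(pages)
shards = shard_index(index, 2)

for shard_id, shard in enumerate(shards):
    print(shard_id, sorted(shard))
```

Each word ends up in exactly one shard, so a lookup only has to consult the shard that owns the word.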
Searching and ranking
When people search Google, they are asking the company to find every instance of the term in its index and rank the corresponding documents by relevance.
1. The user types a search query. The typical query is two or three words, which can make finding the most relevant results challenging; roughly one in 10 queries is misspelled.
2. Before Google provides any information, it identifies the searcher's location through his or her Internet Protocol (IP) address. The IP address helps speed up the search by sending the request to the nearest data center, and it allows Google to identify geographically appropriate ads.
3. The query is sent to the central network, then redirected to the nearest data center.
4. At the data center, the search term is run through the index; matches are sent back to the central network, then to the user, each with a summary of the web page called a “snippet”.
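The lookup in step 4 can be sketched in a few lines. This is an assumption-heavy toy: `search` and `make_snippet` are hypothetical names, the pages are made up, a page matches only if it contains every query word, and the snippet is simply the text around the first query word.

```python
def make_snippet(text, word, width=30):
    """Return the text surrounding the first occurrence of word."""
    pos = text.lower().find(word)
    start = max(0, pos - width)
    end = min(len(text), pos + len(word) + width)
    return text[start:end]


def search(index, pages, query):
    """Return (url, snippet) pairs for pages containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the pages for the first word, then intersect with the rest.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return [(url, make_snippet(pages[url], words[0])) for url in sorted(matches)]


# Hypothetical pages and a small word -> URLs index built from them.
pages = {
    "a.example": "Google crawls the web every day",
    "b.example": "The web is large",
}
index = {}
for url, text in pages.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(url)

for url, snippet in search(index, pages, "web crawls"):
    print(url, "|", snippet)
```

Note that this sketch returns matches in alphabetical order of URL; deciding which match should come first is the ranking problem.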
The “SECRET SAUCE”
Google determines which web sites are most relevant to a search term by using its “secret sauce”, a formula that weighs more than 200 measurements, such as the number of times the search term appears on a web page, the number of visitors to the page, and the page's PageRank, which reflects the number of sites linking to the page and the popularity of those sites.
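The real formula is not public, so the following is only a sketch of the idea. `pagerank` is the textbook power-iteration version of PageRank with the commonly used damping factor of 0.85, and `score` combines three of the measurements mentioned above using weights I made up purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def score(term_count, visitors, page_rank, weights=(1.0, 0.001, 100.0)):
    """Weighted sum of three signals; the weights are hypothetical."""
    w_term, w_visitors, w_rank = weights
    return w_term * term_count + w_visitors * visitors + w_rank * page_rank


# Hypothetical link graph: pages "a" and "b" link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)

for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page "c" comes out on top because two pages link to it, and "a" beats "b" because its one incoming link comes from the popular page "c", which is exactly the "popularity of those sites" part of the description.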