This section introduces search engines and then shows how you can develop your own search engine for your website using Java technologies. We will use Hibernate Search to build the search engine.
What is a Search Engine?
A search engine is typically a program that, given some specific keywords, retrieves matching data or documents from databases or from the World Wide Web. A web search engine uses a special automated program called a spider or web crawler, which fetches web pages in a methodical manner and builds a database of those pages. The documents are then added to the search index, and when a user enters a keyword, the search engine looks it up in the index and returns the matching results. Google and AltaVista are examples of web search engines.
The most frequently used kind of search is keyword search. In full-text search, all the words in the input are taken into account except the most common ones (often called stop words), such as “a”, “an”, “the” and “www”.
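As a rough illustration of how common words can be dropped from a query, here is a minimal sketch in plain Java. The class name `StopWordFilter`, the method `keywords` and the four-word stop list are assumptions made for this example; real search engines use much longer lists and more careful tokenizing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordFilter {
    // Illustrative stop-word list built from the examples in the text (an assumption, not a standard list).
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "www");

    /** Splits the query into lower-case words and drops the stop words. */
    public static List<String> keywords(String query) {
        List<String> result = new ArrayList<>();
        // Anything that is not a letter or digit is treated as a separator.
        for (String word : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [history, of, internet, search, engine]
        System.out.println(keywords("The history of an Internet search engine"));
    }
}
```

Note that "of" survives here only because it is not in the tiny example list.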
The web crawler keeps the data up to date: it keeps track of the web pages it has fetched so that they can be processed further, which makes searching faster. An indexer then builds an index of the documents based on the words each document contains, which makes it easier to find a web page. Each search engine applies its own proprietary algorithm when building the index so that only meaningful information is retrieved.
The results of a search query differ from one search engine to another because of differences in the algorithms each engine uses. The index must always be kept up to date for better performance.
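The idea of indexing documents by the words they contain can be sketched with a simple inverted index, a map from each word to the documents that contain it. This is only a minimal sketch under stated assumptions: the class `InvertedIndex`, its methods and the sample page names are hypothetical, and it is not the proprietary algorithm of any real search engine.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndex {
    // word -> identifiers (for example URLs) of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Adds every word of the document text to the index. */
    public void add(String docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the documents containing the keyword, or an empty set if there are none. */
    public Set<String> search(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add("/index.html", "Welcome to our Java tutorials");
        index.add("/hibernate.html", "Hibernate Search tutorials for Java developers");
        System.out.println(index.search("java"));      // prints [/hibernate.html, /index.html]
        System.out.println(index.search("hibernate")); // prints [/hibernate.html]
    }
}
```

Because the lookup is a single map access, a search does not need to scan the documents themselves, which is why an index makes searching faster.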
Basically there are three types of search engines:
A typical crawler-based search engine has three components:
The ranking of the web pages by the search engines (relevancy):
Generally, a search engine retrieves the data that is most relevant to the user's subject of search and automatically ranks the results in order of relevance. How a search engine retrieves and ranks information depends on the algorithms it implements. The details of these algorithms are kept secret as a matter of trade policy, but the general rules can be listed as below:
About Our Search Engine
The search engine we are developing is a powerful engine that you can use on your website. It does not include a crawler, so you have to add the pages of your website to the index manually, or you can develop a program that adds the pages to the full-text index automatically.
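An automatic program of this kind could, for example, walk the folder that holds the site's pages and hand each HTML file to the index. The sketch below assumes Java 11 or later and uses only the standard library. The class `SiteIndexer` and the `PageIndex` interface are hypothetical placeholders for whatever code actually writes to the full-text index; they are not part of Hibernate Search.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class SiteIndexer {
    /** Hypothetical hook: whatever adds one page to the full-text index. */
    public interface PageIndex {
        void addPage(String relativePath, String content);
    }

    /** Walks the site root, passes every .html file to the index and returns the number of pages added. */
    public static int indexSite(Path siteRoot, PageIndex index) throws IOException {
        int count = 0;
        try (Stream<Path> paths = Files.walk(siteRoot)) {
            Stream<Path> pages = paths.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".html"));
            for (Path page : (Iterable<Path>) pages::iterator) {
                // Use a forward-slash path relative to the site root as the page identifier.
                String relativePath = siteRoot.relativize(page).toString().replace('\\', '/');
                index.addPage(relativePath, Files.readString(page));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The site folder is given on the command line; here each page is only printed, not indexed.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        int added = indexSite(root, (path, content) -> System.out.println("Indexing " + path));
        System.out.println(added + " page(s) found");
    }
}
```

Running such a program whenever pages are added or changed is one simple way to keep the index up to date without a crawler.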
In the next section we will see how our search engine works.