- Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Apache Lucene is an open source project available for free download from Apache Jakarta.
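Full-text search libraries such as Lucene are built around an inverted index, which maps each term to the documents that contain it. The Java sketch below shows that idea with a toy index and a boolean AND query. The class and method names are hypothetical, and none of this is Lucene's API. Lucene adds analyzers, relevance scoring and on-disk index segments on top of the basic structure.

```java
import java.util.*;

/** Toy inverted index: lowercase term -> sorted set of document ids (hypothetical, not Lucene's API). */
public class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Splits text on runs of non-letter/non-digit characters and records each term for docId. */
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split() can yield a leading empty token
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents that contain every query term (boolean AND). */
    public SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs =
                    postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index itself is never modified
            } else {
                result.retainAll(docs);       // intersect posting lists
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene is a full-text search library");
        index.add(2, "Nutch is a web search engine built on Lucene");
        index.add(3, "eXist is an XML database");
        System.out.println(index.searchAll("search", "lucene")); // prints [1, 2]
    }
}
```

Intersecting sorted posting lists is the core of a conjunctive query. Real engines also rank the surviving documents, which this sketch does not attempt.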
- What is a BDDBot, you ask? BDDBot is a web robot, search engine, and web server written entirely in Java(TM). Tim Macinta wrote it for his book Web Developer's Guide to Search Engines (co-authored with Wes Sonnenreich, Wiley Publishing). It was written as an example for a chapter on how to write your own search engine, and as such it is very simple.
- Carrot2 is a research framework for experimenting with automated querying of various data sources (such as search engines), processing search results and their visualization.
By "research", we mean that the architecture of the system is oriented mostly toward flexibility, sometimes at the price of performance. Mechanisms such as data exchange via XML, dynamically loaded components accessible via HTTP, and the use of Java as the primary implementation language all make the system very easy to tailor to one's needs.
Carrot2 was primarily built with search results clustering in mind, but it can easily be configured to do other interesting things.
- egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. It can be configured as a standalone engine, a metasearcher, or a peer-to-peer hub, and it can also be used as a library by an application that needs full-text search.
There are two main branches of the product:
- The 1.x branch is public and is developed on egothor.sf.net.
- The 2.x branch is available to beta testers and developers of separate modules and is not public. This branch is a complete rewrite of the core algorithms.
- eXist is an Open Source native XML database featuring efficient, index-based XQuery processing, automatic indexing, extensions for full-text search, XUpdate support and tight integration with existing XML development tools. The database implements the XQuery 1.0 working draft current as of November 2003 (for the core syntax; some details already follow later versions), with the exception of XML Schema-related features.
XQuery support in eXist makes it possible to write entire web applications with just XQuery and XSLT. XQuery files can be passed directly to the database using the XQueryServlet, the XQueryGenerator for Cocoon, or the REST-style API.
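To illustrate the REST-style route, the sketch below builds a GET request URI that passes an XQuery to eXist in the _query request parameter. The host, port and /exist/rest/db path are assumptions about a default local installation. ExistRestQuery and queryUri are hypothetical names, and the code only constructs the URI without contacting a server. It needs Java 10 or later for the URLEncoder overload it uses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds a GET URI that hands an XQuery to eXist's REST-style interface via _query. */
public class ExistRestQuery {
    // Assumption: a default local install serving the REST interface at this address.
    static final String BASE = "http://localhost:8080/exist/rest/db";

    static URI queryUri(String xquery) {
        // Percent-encode the query so characters such as '/' and '(' are safe in a URI.
        String encoded = URLEncoder.encode(xquery, StandardCharsets.UTF_8);
        return URI.create(BASE + "?_query=" + encoded);
    }

    public static void main(String[] args) {
        // Hypothetical query: count all <book> elements below /db.
        URI uri = queryUri("count(//book)");
        System.out.println(uri);
        // prints http://localhost:8080/exist/rest/db?_query=count%28%2F%2Fbook%29
        // Actually sending the request needs a running server, e.g. via java.net.http.HttpClient.
    }
}
```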
- JoSQL (SQL for Java Objects) lets a developer apply an SQL statement to a collection of Java objects. JoSQL can search, order and group any Java objects, and is intended for cases where you want to perform SQL-like queries on a collection of Java objects.
- JXTA Search is a distributed search system, designed for peer-to-peer networks and web sites.
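To show the kind of query JoSQL is meant for, the sketch below filters, groups and orders plain Java objects with the standard java.util.stream API instead. It is not JoSQL's API or query syntax. The Book class and its data are hypothetical, and the comments give only the rough SQL shape each method corresponds to.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ObjectQueryDemo {
    /** Hypothetical element type; any plain Java class would do. */
    static final class Book {
        final String title;
        final String language;
        final int pages;

        Book(String title, String language, int pages) {
            this.title = title;
            this.language = language;
            this.pages = pages;
        }
    }

    /** Roughly: SELECT language, COUNT(*) ... WHERE pages > minPages GROUP BY language ORDER BY language */
    static Map<String, Long> countByLanguage(List<Book> books, int minPages) {
        return books.stream()
                .filter(b -> b.pages > minPages)
                // TreeMap keeps the groups sorted by key, like ORDER BY language
                .collect(Collectors.groupingBy(b -> b.language, TreeMap::new, Collectors.counting()));
    }

    /** Roughly: SELECT title ... ORDER BY pages DESC */
    static List<String> titlesByPagesDesc(List<Book> books) {
        return books.stream()
                .sorted(Comparator.comparingInt((Book b) -> b.pages).reversed())
                .map(b -> b.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("Alpha", "Java", 300),
                new Book("Beta", "Java", 120),
                new Book("Gamma", "XQuery", 250),
                new Book("Delta", "Java", 220));
        System.out.println(countByLanguage(books, 200)); // prints {Java=2, XQuery=1}
        System.out.println(titlesByPagesDesc(books));    // prints [Alpha, Gamma, Delta, Beta]
    }
}
```

The trade-off is that streams are compiled and type-checked, whereas a library like JoSQL accepts the query as a string that can be built or changed at run time.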
JXTA™ technology is a set of open protocols that allow any connected device on the network ranging from cell phones and wireless PDAs to PCs and servers to communicate and collaborate in a P2P manner.
JXTA peers create a virtual network where any peer can interact with other peers and resources directly even when some of the peers and resources are behind firewalls and NATs or are on different network transports.
- MG4J (Managing Gigabytes for Java) is a collaborative effort aimed at providing a free Java implementation of inverted-index compression techniques; as a by-product, it offers several general-purpose optimised classes, including fast and compact mutable strings, bit-level I/O, fast unsynchronised buffered streams, (possibly signed) minimal perfect hashing for very large string collections, etc.
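One classic inverted-index compression technique stores the gaps between sorted document ids instead of the ids themselves and encodes each gap with a variable number of bytes (variable-byte coding). The sketch below is a generic illustration of that technique, not MG4J code. MG4J's own classes, built on its bit-level I/O, are likely to use different codes, and the class name here is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Gap + variable-byte (VByte) coding of a sorted posting list. */
public class VByteGaps {
    /** Encodes strictly increasing, non-negative doc ids as VByte-coded gaps. */
    static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            int id = docIds[i];
            if (id < 0 || (i > 0 && id <= docIds[i - 1])) {
                throw new IllegalArgumentException("ids must be non-negative and strictly increasing");
            }
            int gap = id - prev; // dense lists have small gaps, which need fewer bytes
            prev = id;
            // Emit 7 bits per byte, low-order group first; the high bit marks a gap's last byte.
            while (gap >= 128) {
                out.write(gap & 0x7F);
                gap >>>= 7;
            }
            out.write(gap | 0x80);
        }
        return out.toByteArray();
    }

    /** Reverses encode(): rebuilds each gap, then adds it to the previous id. */
    static int[] decode(byte[] bytes) {
        List<Integer> ids = new ArrayList<>();
        int prev = 0, gap = 0, shift = 0;
        for (byte b : bytes) {
            int v = b & 0xFF;
            gap |= (v & 0x7F) << shift;
            if ((v & 0x80) != 0) { // last byte of this gap
                prev += gap;
                ids.add(prev);
                gap = 0;
                shift = 0;
            } else {
                shift += 7;
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] postings = {3, 7, 300, 301};
        byte[] encoded = encode(postings);
        System.out.println(encoded.length);                   // prints 5
        System.out.println(Arrays.toString(decode(encoded))); // prints [3, 7, 300, 301]
    }
}
```

For the ids 3, 7, 300 and 301 the gaps are 3, 4, 293 and 1. Only 293 needs two bytes, so the list takes 5 bytes instead of 16 for four raw 32-bit ints.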
- Nutch is a nascent effort to implement an open-source web search engine.
Web search is a basic requirement for internet navigation, yet the number of web search engines is decreasing. Today's oligopoly could soon be a monopoly, with a single company controlling nearly all web search for its commercial gain. That would not be good for users of the internet.
Nutch provides a transparent alternative to commercial web search engines. Only open source search results can be fully trusted to be without bias. (Or at least their bias is public.) All existing major search engines have proprietary ranking formulas, and will not explain why a given page ranks as it does. Additionally, some search engines determine which sites to index based on payments, rather than on the merits of the sites themselves. Nutch, on the other hand, has nothing to hide and no motive to bias its results or its crawler in any way other than to try to give each user the best results possible.
- Oxyus is an open source search engine written in 100% Java.
Oxyus aims to provide an easy way to add a search button to your website.
- XQEngine is a full-text search engine for XML documents. Utilizing XQuery as its front-end query language, it lets you interrogate collections of XML documents for boolean combinations of keywords, much as Google and other search engines let you do for HTML. XQuery, however, provides much more powerful search capabilities than equivalent HTML-based engines, since its XPath component lets you specify constraints on attributes and element hierarchies, in addition to the specific word content you're searching on. Refer to the W3C's XML Query website to see what the W3C and other vendors are doing with XQuery and XPath.
XQEngine is a compact (roughly 300K) embeddable component written in Java. It's not a standalone application and requires a reasonable amount of Java programming skill to use. It has a straightforward programming interface that makes that fairly easy to do. It should work well as a personal productivity tool on a single desktop, as part of a CD-based application, or on a server with low to moderate traffic.
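The point about XPath constraints can be illustrated with the XPath 1.0 engine that ships with the JDK. The query below matches only book elements under library that carry a given lang attribute, and then tests each title's text for a keyword. This is not XQEngine's API, and the document shape is hypothetical. XPath 1.0 contains() is also a case-sensitive substring test, not the tokenized keyword matching a full-text engine performs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathKeywordSearch {
    /** Titles of <book> elements with the given lang attribute whose <title> contains keyword. */
    static List<String> titlesMatching(String xml, String lang, String keyword) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $lang and $kw as XPath variables instead of splicing strings into the expression.
            Map<String, String> vars = Map.of("lang", lang, "kw", keyword);
            xpath.setXPathVariableResolver(name -> vars.get(name.getLocalPart()));
            // Element hierarchy + attribute constraint + word-content test in one expression.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/library/book[@lang = $lang][contains(title, $kw)]/title", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library>"
                + "<book lang='en'><title>Search Engines in Java</title></book>"
                + "<book lang='de'><title>Java Suchmaschinen</title></book>"
                + "<book lang='en'><title>XML Databases</title></book>"
                + "<book lang='en'><title>Java and XML</title></book>"
                + "</library>";
        System.out.println(titlesMatching(xml, "en", "Java")); // prints [Search Engines in Java, Java and XML]
    }
}
```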
- Zilverline is a search engine that offers web access to your personal or intranet content.
Zilverline is a Lucene-based desktop search tool, comparable to Google Desktop.
Zilverline supports collections: a collection is a set of files and directories within a directory. Zilverline extracts content from PDF, Word, Excel, PowerPoint, RTF, plain text, Java source and CHM files, as well as from zip, rar and many other archive formats. A collection can be indexed and searched, and search results can be retrieved from local disk or the intranet. Files inside zip, rar, CHM and other archives are extracted during indexing and can be preserved for later searches; otherwise they are extracted on the fly.
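Extracting files from a zip archive during indexing can be sketched with java.util.zip from the standard library. The example below pulls the text of .txt entries out of an in-memory archive so that they could be handed to an indexer. It is not Zilverline's code and the file names are hypothetical. A real indexer would route other entry types (PDF, Word and so on) to format-specific parsers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipTextExtractor {
    /** Returns entry name -> UTF-8 text for every ".txt" entry, in archive order. */
    static Map<String, String> extractTextEntries(byte[] zipBytes) {
        Map<String, String> texts = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".txt")) {
                    continue; // other formats would go to format-specific parsers
                }
                // Reads on a ZipInputStream stop at the end of the current entry.
                texts.put(entry.getName(), new String(zin.readAllBytes(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    /** Builds an in-memory archive for the demo (hypothetical helper and file names). */
    static byte[] zipOf(Map<String, String> files) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey()));
                zos.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // the archive is complete once zos has been closed
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("notes/readme.txt", "hello index");
        files.put("notes/image.png", "not text");
        files.put("todo.txt", "search zip entries");
        System.out.println(extractTextEntries(zipOf(files)));
        // prints {notes/readme.txt=hello index, todo.txt=search zip entries}
    }
}
```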