An efficient search feature is a mandatory function for any advanced DXP.
Search is a basic website function, yet it is one of the most widely used: visitors rely on it to find the content or products they want to read, view, or purchase. Any advanced Digital Experience Platform (DXP) must therefore provide a state-of-the-art search engine to satisfy this basic user need. We interviewed François Pral, Product Owner for Jahia DX, our open source DXP solution, about the coming integration of Elasticsearch, a powerful open source search engine.
Can you explain the advantages of running an Elasticsearch server in close connection with a Jahia DX server?
Elasticsearch is one of the most powerful open source search engines available on the market. Using it with Jahia DX will improve search performance, as requests will be handled by a dedicated Elasticsearch server rather than by the Jahia DX server itself. We also expect more relevant results: where the current search engine included in Jahia DX is content-based, our Elasticsearch search provider is page-based. The behavior and results are closer to customer expectations: more “Google-like”, to be clear.
In which repositories does Elasticsearch perform its searches? Is it able to efficiently search within external data repositories as well as Jahia DX’s?
Jahia DX indexes both its default and live repositories in the Elasticsearch server. Public users will thus be able to search the published version of a site, while content editors and administrators will be able to run a private search in preview mode. Indexing of external data repositories mounted in a Jahia DX system is also included in the project. This will give our customers a federated search with consistent results throughout their daily digital environment. For instance, it will be possible to merge Jahia DX content and product pages from an eCommerce website in the same results.
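To make the split between published and preview search more concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not Jahia DX or Elasticsearch code: the `Doc` fields, the `source` labels, the sample paths and the `search` helper are all assumptions made up for illustration. Only the two workspace names, default and live, come from the answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    path: str
    workspace: str  # "default" (preview) or "live" (published)
    source: str     # hypothetical label: "jahia" or "ecommerce"
    text: str


# Hypothetical index content: one shared index holding CMS pages and product pages.
INDEX = [
    Doc("/sites/shop/home", "live", "jahia", "Welcome to our running shoes guide"),
    Doc("/sites/shop/home", "default", "jahia", "Welcome to our running shoes and boots guide"),
    Doc("/products/trail-shoe", "live", "ecommerce", "Trail running shoes, size 42"),
    Doc("/sites/shop/draft", "default", "jahia", "Unpublished page about shoes"),
]


def search(term, workspace):
    """Return the documents of one workspace whose text contains the term."""
    term = term.lower()
    return [d for d in INDEX if d.workspace == workspace and term in d.text.lower()]


public_hits = search("shoes", "live")      # what a site visitor could find
editor_hits = search("shoes", "default")   # what an editor could find in preview
```

In this sketch the public search returns the published home page together with the product page, while the editor search also reaches the unpublished draft. A real deployment would rely on Elasticsearch relevance scoring rather than a substring test.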
What limitations should be expected on searches of external data repositories?
For an external data repository to be part of the search results, we have to implement the corresponding methods so that its data is properly indexed. The limitations are the same as those we currently have with any External Data Provider: the permissions used will be the ones set in Jahia DX to access the repository. The search will be performed on the part of the external data repository that is accessible through Jahia DX.
Does Elasticsearch target all the different content stored in a repository? Are search restrictions allowed? Is there a settings panel?
The list of indexed node types is defined in a specific administration panel. It is therefore possible to index editorial content and pages, as well as files (documents, images, etc.). Note that Elasticsearch also indexes the content of documents, so one can retrieve a document by searching for its content.
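The idea that search only covers the accessible part of an external repository can be pictured with a short Python sketch. This is not the External Data Provider API: the path-prefix permission model, the `accessible` and `filter_hits` helpers, and the sample mount paths are hypothetical and only illustrate the principle.

```python
def accessible(path, allowed_prefixes):
    """True if the path is one of the allowed roots or lies beneath one of them."""
    return any(
        path == prefix or path.startswith(prefix.rstrip("/") + "/")
        for prefix in allowed_prefixes
    )


def filter_hits(hits, allowed_prefixes):
    """Keep only the hits that the current permissions allow the user to read."""
    return [hit for hit in hits if accessible(hit, allowed_prefixes)]


# Hypothetical hits coming from a mounted external repository.
hits = [
    "/mounts/crm/public/faq",
    "/mounts/crm/private/contracts",
    "/mounts/crm/public-archive/old",
]

# Assumption: the permissions set in Jahia DX expose only this subtree.
visible = filter_hits(hits, ["/mounts/crm/public"])
```

Here only the FAQ entry remains visible. The prefix check appends a slash before comparing so that a sibling folder such as `public-archive` is not exposed by accident.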
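As a rough illustration of what such a setting controls, the sketch below filters nodes by type before building the documents to index. It is an assumption-laden toy: the node dictionaries, the `should_index` and `to_index_document` helpers, the `extracted_text` field and the node type names are illustrative and not taken from the actual administration panel.

```python
# Illustrative node type names; the real list is whatever the panel is configured with.
INDEXED_NODE_TYPES = {"jnt:page", "jnt:file"}


def should_index(node):
    """Only nodes whose type is in the configured list are sent to the index."""
    return node["type"] in INDEXED_NODE_TYPES


def to_index_document(node):
    """Build the document to index; files also carry their extracted text."""
    doc = {"path": node["path"], "type": node["type"], "title": node.get("title", "")}
    if "extracted_text" in node:
        doc["body"] = node["extracted_text"]
    return doc


nodes = [
    {"path": "/sites/acme/home", "type": "jnt:page", "title": "Home"},
    {"path": "/sites/acme/files/report.pdf", "type": "jnt:file",
     "title": "Annual report", "extracted_text": "Revenue grew in every region."},
    {"path": "/sites/acme/home/banner", "type": "jnt:banner", "title": "Banner"},
]

documents = [to_index_document(n) for n in nodes if should_index(n)]
```

With this configuration the page and the PDF are indexed and the banner is skipped. Because the PDF document carries its extracted text, a search for a phrase inside the file could retrieve it.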
How does Elasticsearch deal with duplicate content?
Elasticsearch is a search engine. It does not perform a semantic analysis of the content and is not able to identify duplicate content within a Jahia DX website. If similar content is stored and displayed on several pages of a Jahia DX-based website, a search performed with Elasticsearch on part of that content will return all the relevant pages.
Does Elasticsearch search Jahia DX’s content in real time, or does it rely on an indexing engine?
Elasticsearch is a near real-time search engine. This means that if you update a piece of content in Jahia DX, the change will not be indexed instantly in the Elasticsearch server. The delay is very short, however, which is why it is called “near real-time”.
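The short delay comes from the way Elasticsearch makes new documents searchable: they become visible to searches after a periodic refresh, which by default happens about once per second. The toy Python class below mimics that behavior. The class name `NearRealTimeIndex` and its methods are hypothetical and do not correspond to the Elasticsearch client API.

```python
class NearRealTimeIndex:
    """Toy model: indexed documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed but not yet searchable
        self._searchable = []  # visible to searches

    def index(self, text):
        """Accept a document; it stays invisible until the next refresh."""
        self._buffer.append(text)

    def refresh(self):
        """Make buffered documents searchable (periodic in a real cluster)."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        """Case-insensitive substring search over the searchable documents."""
        term = term.lower()
        return [t for t in self._searchable if term in t.lower()]


idx = NearRealTimeIndex()
idx.index("Spring sale on hiking boots")
before = idx.search("sale")  # nothing yet: the refresh has not happened
idx.refresh()
after = idx.search("sale")   # the document is now searchable
```

In this model the first search returns nothing and the second returns the new document, which is the gap that "near real-time" refers to.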