BlueSpice switches over to Elasticsearch

Diego Delso, Biblioteca Vasconcelos, Ciudad de México, CC BY-SA 4.0, via Wikimedia Commons.

We are currently equipping the new version 3 of BlueSpice with a new search engine. This is a good opportunity to explain what actually happens in a search engine and why we decided to make this change.

The limits of Apache Solr

BlueSpice has used the Apache Solr search engine for over nine years. Solr is an outstanding, very fast and scalable search engine that has played a significant role in the success of the MediaWiki distribution BlueSpice. Users have been able to filter search results by facets and search attached files, and the first results appear while they are still typing. Those who saw this search engine at work in large companies in 2008 know that the availability of Solr was a real revolution for most companies.

However, adding a new data structure is laborious, which has always been a disadvantage for us. A little background: we index all searchable content as full text with Solr. Alongside this, we record various metadata (for example, categories or authors). In Solr, however, one has to define in advance what metadata will exist. And once we have decided on a schema, it is not easy to change.

That has become increasingly problematic for us as the amount of available metadata has grown through the semantic functions of BlueSpice. A characteristic feature of these semantics is the ability to create and change data structures flexibly and to derive that structure from the wiki itself.
This metadata should be searchable and queryable, but we do not know in advance what metadata we will come across in a customer’s wiki. Metadata differs greatly from wiki to wiki. A wiki for quality management contains, for example, operational units that simply do not exist in a wiki describing car parts. So semantics introduces a great deal of variability to which the search engine must respond.

We need, in short, a schema-free search function that fixes far less in advance. A schema-free search function can index the most varied metadata and lets you search by these additional attributes.
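To make the idea concrete, here is a minimal sketch in Python of what "schema-free" means for an index. It is a toy in-memory model, not Elasticsearch's implementation and not BlueSpice code: the class `SchemaFreeIndex`, the page names and the field names (`department`, `part_number`) are all hypothetical. It only illustrates that new metadata fields are registered the first time a document brings them along, rather than being declared in advance.

```python
from collections import defaultdict


class SchemaFreeIndex:
    """Toy index that learns its fields from the documents it receives."""

    def __init__(self):
        self.fields = {}                  # field name -> inferred type name
        self.docs = {}                    # doc id -> original document
        self.postings = defaultdict(set)  # (field, value) -> set of doc ids

    def index(self, doc_id, doc):
        for field, value in doc.items():
            # Unknown fields are registered on first sight, not rejected.
            self.fields.setdefault(field, type(value).__name__)
            values = value if isinstance(value, list) else [value]
            for v in values:
                self.postings[(field, v)].add(doc_id)
        self.docs[doc_id] = doc

    def search(self, field, value):
        """Return the ids of all documents whose field has exactly this value."""
        return sorted(self.postings.get((field, value), set()))


idx = SchemaFreeIndex()
# Two wikis' worth of pages with different, previously unknown metadata.
idx.index("QM:Audit", {"title": "Audit plan", "department": "Quality"})
idx.index("Parts:Brake", {"title": "Brake disc", "part_number": "BD-100"})

print(sorted(idx.fields))
print(idx.search("department", "Quality"))
```

In a schema-bound setup, the second document would require a schema change before `part_number` could be searched; here the field simply becomes available once the first document containing it is indexed.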

Arguments for Elasticsearch

The new search engine, Elasticsearch, offers us this kind of schema-free search. Apache Solr is developing approaches in the schema-free direction, but (and this is the second argument) the Elasticsearch ecosystem is much better developed.
We have also found that Elasticsearch is much easier to install in our environment. The system is self-contained: one no longer needs an additional Tomcat server, and the set-up process is significantly shorter. Put briefly, you install Elasticsearch and press the button. Then the wiki connects to Elasticsearch, says “this is roughly how my data looks”, and it works. The whole process takes place over a web interface. This also makes the system significantly faster.
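As a rough illustration of “this is roughly how my data looks”, the following Python sketch builds the kind of JSON body one could send when creating an index. The shape follows the create-index API of recent Elasticsearch versions (7 and later; older versions nest `properties` under a mapping type name). The field names are assumptions chosen for this example and are not the actual BlueSpice mapping. The point is that only a few known fields are declared, while dynamic mapping is left on so that unknown metadata fields are picked up as they arrive.

```python
import json

# Hypothetical index definition: declare the handful of fields every wiki
# page has, and leave dynamic mapping enabled for everything else.
index_body = {
    "mappings": {
        "dynamic": True,  # accept fields that are not listed below
        "properties": {
            "title": {"type": "text"},          # analysed full text
            "categories": {"type": "keyword"},  # exact values, usable as facets
            "author": {"type": "keyword"},
        },
    }
}

# This JSON document would be the body of the index creation request.
print(json.dumps(index_body, indent=2))
```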

It was not so important for us that Elasticsearch is also better suited to big data tasks. With the data of an enterprise wiki, we are a long way from big data problems. Big data does not play a large role in our use cases, but with Elasticsearch, BlueSpice does become a compatible application in a big data environment. That opens up new possibilities for use. It is conceivable, for example, to automatically evaluate data arising in a production process.

These are essentially the reasons why we decided to change our search engine. We have had the idea for about two years, but until now there was no real necessity. And when we asked our customers, we found that they were not missing anything, thanks to Solr. For technological development, however, this step is now right at the top of our agenda.

The real challenge: the conceptual design of the user interface

Elasticsearch opens up an enormous range of possibilities, which we also need to bring to the user interface; the real innovation will be making them usable there. For example, one could prepare data in a dashboard that shows new search results for a term (or, even better, a topic), extending the portals. One could use the search function to build lists of all pages that meet particular criteria. One could even use the search function for tasks that previously required special tools, for example “give me all pages for which my colleague Meier is responsible, and which are approved”. This would be simpler and faster than was possible with the traditional Semantic MediaWiki mechanism.
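The “Meier” example could be expressed in the Elasticsearch Query DSL as a `bool` query with two `term` filters. The sketch below builds such a query body in Python. The field names `responsible` and `approved` are hypothetical, as are the sample pages, and the small `matches` helper is only a local stand-in that mimics exact-match filtering so the example runs without a server. It is not how BlueSpice implements this.

```python
def build_query(responsible, approved=True):
    """Query body: pages with this responsible person and approval state."""
    return {
        "query": {
            "bool": {
                # Filter clauses must all match; they do not affect scoring.
                "filter": [
                    {"term": {"responsible": responsible}},
                    {"term": {"approved": approved}},
                ]
            }
        }
    }


def matches(doc, query):
    """Local stand-in for the server: every term filter must match exactly."""
    filters = query["query"]["bool"]["filter"]
    return all(
        doc.get(field) == value
        for clause in filters
        for field, value in clause["term"].items()
    )


# Hypothetical pages with semantic metadata attached.
pages = [
    {"title": "Audit plan", "responsible": "Meier", "approved": True},
    {"title": "Draft policy", "responsible": "Meier", "approved": False},
    {"title": "Brake disc", "responsible": "Schmidt", "approved": True},
]

query = build_query("Meier")
hits = [page["title"] for page in pages if matches(page, query)]
print(hits)
```

Using filter clauses rather than scored full-text clauses fits this kind of request, since a page either meets the criteria or it does not.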

This is, however, just a sketch of the possibilities. We will see what actually develops in BlueSpice over the next year or two. But the changeover to Elasticsearch is a big step for the continued development of the BlueSpice software.
