JCR is not dead, and neither is CMIS

January 13, 2012
Serge Huber

 

Recently, an article on CMSWire caused quite a stir, mostly because it asked the controversial question "Is the JCR dead?". In reply, a few opinions from myself and other CMS actors and vendors were quick to appear, but I think some clarification is needed to explain what is really relevant for developers, integrators and end-users.

The quick answer is: neither JCR nor CMIS is really important for end-users. Fortunately, most of them will never have to deal with either; only integrators and developers will have to bother.

Now these two standards were put in opposition in the article, probably in the hope that the controversy would attract readers, but it doesn't actually make sense to do that. A lot of developers are using both, and the only real difficulty in integrating the two is translating queries; apart from that, they map onto each other pretty well.

The JCR is not dead and is still relevant, in the same way that other standards such as JDBC are not dead and are still relevant. A lot of people wanting to see the JCR die have probably had bad experiences with it, or have moved on to other technologies, but this standard still makes a lot of sense in the Java CMS and WCM world. Where else can you get a native language API that offers powerful queries, versioning, flexible content definitions, import/export, etc.? Sure, CMIS offers some of these, but at the same time you probably wouldn't want to use CMIS in the middle of your Java project. CMIS is mostly oriented towards being a service interface.

One may want to ask the question: if my technology is exposed as a service, why even bother with a middleware standard such as JCR? Mostly because JCR addresses some standard features that are hard to implement well, such as advanced queries or versioning, and standardizing the interface provides a reliable layer at which to write tests and implement things properly. For example, query parsing and execution can be complex, and having a standard to define them is good for interoperability, migration and overall quality. This doesn't mean that it has no drawbacks. One example is the number of query languages supported, which really should be reduced in the next version, because it complicates the implementations. In JCR 2.0 it is possible to query for content using the SQL-2 language or the Abstract Query Model, but also with the SQL-1 or XPath languages that were part of the 1.0 specification and are now deprecated. What this implies is that the reference implementation, Jackrabbit, has to support all four of these query systems at this point, which makes the implementers' job all the more difficult and makes optimizations equally hard to do. This problem is a transitional one, and one can hope it will be resolved in the next version, as the old languages are already deprecated.
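To give a feel for what supporting several query languages means in practice, here is a minimal sketch of one search (all `nt:file` nodes below a `/content` path) written in the three textual languages mentioned above. The class name, method name and the `/content` path are made up for illustration, and the statements are meant to be roughly equivalent rather than guaranteed to return identical results on every implementation. The language identifiers are assumed to match the constants defined on `javax.jcr.query.Query`; the Abstract Query Model is left out because it is built programmatically rather than parsed from a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: one search expressed in the three textual JCR query languages.
public class JcrQueryLanguages {

    // Language identifiers, assumed to mirror the constants on javax.jcr.query.Query.
    static final String JCR_SQL2 = "JCR-SQL2"; // introduced in JCR 2.0
    static final String XPATH = "xpath";       // JCR 1.0, deprecated in 2.0
    static final String SQL = "sql";           // JCR 1.0 "SQL-1", deprecated in 2.0

    // Maps each language identifier to a statement that looks for files under /content.
    static Map<String, String> filesUnderContent() {
        Map<String, String> statements = new LinkedHashMap<>();
        statements.put(JCR_SQL2,
                "SELECT * FROM [nt:file] AS f WHERE ISDESCENDANTNODE(f, [/content])");
        statements.put(XPATH,
                "/jcr:root/content//element(*, nt:file)");
        statements.put(SQL,
                "SELECT * FROM nt:file WHERE jcr:path LIKE '/content/%'");
        return statements;
    }

    public static void main(String[] args) {
        // With a live JCR session, each pair would be handed to
        // QueryManager.createQuery(statement, language); here we only print them.
        filesUnderContent().forEach((language, statement) ->
                System.out.println(language + ": " + statement));
    }
}
```

An implementation has to parse, plan and optimize each of these syntaxes, which is where the extra burden on implementers comes from.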

Another important and difficult-to-implement subsystem is versioning. We, at Jahia, have noticed how complex this can become when doing advanced versioning operations on trees. A good example is CVS, which couldn't support moves, because of the complexity that is needed to properly version such operations. It is only much later, and with a different model, that SVN managed to fill the gap. All this to say that standardizing these features makes sure that they are well defined, and that JCR users can rely on them to build value on top. It also means that in the case of open source implementations of the JCR, people may collaborate to develop and maintain such complex code. The alternative is to redevelop all of this from scratch in a non-standard way, and coming from this world I can tell you it is not the best option for developers, integrators or end-users.

The above are just two examples of why this standard actually helps build common infrastructure, and makes sure that backend features are available. There are many more features that are part of the standard, such as ACLs, content definitions, import/export, observation, workspaces and transactions, that are equally part of the infrastructure that most developers want to take for granted, rather than re-implement.

As I have mentioned previously, my biggest gripes with CMIS are that it is too file-oriented, and that it lacks a simple interface. Maybe I should explain what I mean by a simple interface. Today, in the world of Ruby on Rails, Apache Sling, Jahia 6.5 and Day's Communique, people want simple (yet still powerful) access to their content objects. All the aforementioned tools allow for a simple REST HTTP mapping of URIs and attributes to back-end content objects. CMIS 1.0 still requires that you use either ATOM or SOAP to interact with content objects, and even so a lot of wiring is still required. Of course, one of the goals of CMIS is that tools and libraries will be available to help with the integration, but it will be hard to beat the simplicity of a plain HTTP POST request to update content.
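As an illustration of how little is needed for that kind of update, here is a minimal sketch that builds a form-encoded POST whose URL path addresses a content object and whose parameters are the properties to change. Everything specific in it is an assumption made for the example: the host, the `/content/articles/jcr-not-dead` path, the `title` and `author` property names, and the class and method names. The exact conventions differ between the tools mentioned above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a content update expressed as a plain form POST.
public class SimpleContentPost {

    // Encodes properties as application/x-www-form-urlencoded (name=value&name=value).
    static String formEncode(Map<String, String> properties) {
        return properties.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST request targeting the content object's URL.
    static HttpRequest updateRequest(String nodeUrl, Map<String, String> properties) {
        return HttpRequest.newBuilder(URI.create(nodeUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(properties)))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("title", "Hello world");
        properties.put("author", "serge");

        // Hypothetical URL: the path stands for the content object being updated.
        HttpRequest request = updateRequest(
                "http://localhost:8080/content/articles/jcr-not-dead", properties);

        System.out.println(request.method() + " " + request.uri());
        System.out.println(formEncode(properties));
        // Sending it would be a single call to java.net.http.HttpClient#send.
    }
}
```

There is no envelope and no feed document to construct: the URL names the object and the form fields name its attributes.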

I have been following the CMIS open-source implementation at Apache, and the efforts are really great, but it hit a major roadblock when it switched implementations in the middle of last year, moving from the old Chemistry codebase contributed by Nuxeo to the one contributed by OpenText. It took a little while to merge all the functionality, and it has now reached a point where it is really interesting, at version 0.2.0. Of course at Jahia we are integrating with Apache Chemistry, but as we had worked with the old code base, we had to start over when the implementation changed. This is probably true of other people working with CMIS. So in no way do I want to diminish the importance of CMIS, but between the wild claims out there that CMIS will revolutionize the world and the hard truth of the code available in open source or closed source implementations, there is still a lot of work to do.

I hope there will be an emerging service standard that will fulfill the promise of being both flexible and easy to integrate, and if this ends up being CMIS, so much the better. But let's not forget that before CMIS there were iECM and WebDAV, and before that many more that didn't work because of the complexity of the implementations, or because vendors never really committed to interoperability. It is also hard to keep a standard minimal, and at version 1.0 CMIS is already much more complex than I had hoped it would be.

I have also said that one of the main reasons people are interested in CMIS is that Microsoft is on board with the standard, with the inclusion of an implementation in Sharepoint 2010. But let's not forget that Microsoft has a pretty bad history of keeping up with standards. One often forgotten example is WebDAV, called Web Folders in Windows. Initially a lot of people were very happy to see Microsoft implement this at the OS level, but very quickly, as the standard evolved, the WebDAV implementation that Microsoft built was not properly maintained, and it was apparently not seen as very important to them as an interoperability strategy, especially compared to .NET and SOAP. As time passed, Microsoft started replacing WebDAV with SOAP interfaces, and this is reflected in the current state of the integration between MS Office and Sharepoint.

On the Java side, I believe there is currently no standard alternative to the JCR, and the alternative of having vendor lock-in on the lower levels is actually a little scary. We had a custom-built content repository for years before we integrated with the JCR, and maintaining it within our company was not the best use of our resources for something that should be common infrastructure, much in the same way that we would never think of implementing our own database.

When working on a WCM, you really need to be able to handle tree-like structures, with varying properties depending on the position in the tree, and with advanced features such as versioning, permissions, locking and structure definitions, in an API that is native to the language in which you are writing your WCM product. This is what the JCR is for and what it is good at. It is NOT a service API. Conversely, it would be crazy to use CMIS within an implementation of a WCM, because this means that each access to a content object would have to go through layers of transformations to handle the ATOM or SOAP calls. But if you are working on integrating loosely coupled data repositories, with a lower volume of calls, then using CMIS makes a lot more sense. This is why CMIS works well with files, but not as well for generic content objects.

So in conclusion, I think that a lot of noise was generated around all this, but the good news is that standards do exist and they are alive and well, and let's hope that their evolution will make them even better. But the really interesting work is above all this: making content easier to generate, curate, share and retrieve.
