How to ensure content integrity

July 27, 2016
Jordann Roussel

A common problem for developers is that modifying content definitions can invalidate existing content. Nodetype definitions should be modified cautiously, as careless changes can corrupt the JCR repository so that content integrity is no longer ensured. Worse, this lack of integrity can go unnoticed until something exposes it, e.g. when importing a site.

Jahia provides a best practice guide with recommendations to keep in mind when modifying content definitions. The table below comments on the different types of operations:

| Type of modification | Operation | Comment |
|---|---|---|
| Namespace | Creation | Does not cause problems. |
| Namespace | Deletion | Should never be done. Instead of deleting it, stop using the previous namespace. |
| Namespace | Modification | Should never be done. Instead of modifying it, create a new namespace and stop using the previous one. A prefix or URL created before should never be reused. |
| Node type | Creation | Does not cause problems. |
| Node type | Deletion | Should never be done before first deleting all the instantiated nodes using this type from templates/sites. Nodes using the nodetype can be found and deleted using Jahia Tools/JCR Console. It is also possible to script (groovy) this operation. |
| Node type | Modification | Renaming a node type is equivalent to deleting the previous node type and creating a new one. |
| Property of a node type | Creation | Does not cause problems. |
| Property of a node type | Deletion | Should never be done before having set the property to "null" on all the instantiated nodes, otherwise it will lead to publication issues. An alternative is to declare this property "hidden" and cease using it. |
| Property of a node type | Modification | Should never be done if any node has been instantiated with this property. If necessary, create a new property and refer to the "Deletion" row above. |

Modifications must be performed in two steps:

  • First, prepare your content so that it complies with the future definitions
  • Once your content is ready, make your definition modifications

We will explain 4 types of modifications/deletions you may encounter during development and how to manage these changes with groovy scripts:

  • Deletion of a nodetype
  • Deletion of a property
  • Adding mandatory constraint to a property
  • Adding regex constraint to a property

There are many more modifications you may encounter, but the goal here is only to show the common ones and possible approaches to address them.

Before proceeding, if you are not familiar with groovy scripts, there are two ways to execute them:

  • execute them directly from the Tools:
    http://localhost:8080/modules/tools/groovyConsole.jsp
  • place your scripts in the folder /JahiaFolder/digital-factory-data/patches/groovy. If your platform is already running, the script will be executed automatically. Otherwise, it will be executed at startup, when the JCR context is ready

Practical cases

Deletion of a nodetype

Deleting a nodetype that still has instantiated content can cause issues during a site export, and it will block the import process on another environment, because that environment will not know how to handle this nodetype.

This case is easy to fix: before removing a nodetype from your definitions, query all the instantiated content nodes of this nodetype and delete them.

The following groovy script will do the trick:

import org.jahia.api.Constants
import org.jahia.tools.patches.LoggerWrapper
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * Remove a nodetype
 */

final LoggerWrapper logger = log

final String nodeTypeName = "ins:myComponent"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites')"
        final NodeIterator iteratorSites = session.getWorkspace().getQueryManager().createQuery(stmt, Query
                .JCR_SQL2)
                .execute().getNodes()
        while (iteratorSites.hasNext()) {
            JCRNodeWrapper node = iteratorSites.nextNode() as JCRNodeWrapper

            node.remove()
        }

        session.save()

        return null
    }
})
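
Because the script above deletes content, it can help to run a read-only variant first. The following is a minimal sketch, assuming the same example nodetype name as above: it reuses the same query but only logs the paths that would be removed, and never calls session.save().

```groovy
import org.jahia.api.Constants
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate
import org.jahia.tools.patches.LoggerWrapper

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * Dry run: list the nodes of a nodetype without modifying anything
 */

final LoggerWrapper logger = log

// Example nodetype name, replace it with your own
final String nodeTypeName = "ins:myComponent"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites')"
        final NodeIterator nodes = session.getWorkspace().getQueryManager()
                .createQuery(stmt, Query.JCR_SQL2).execute().getNodes()

        int count = 0
        while (nodes.hasNext()) {
            // Only log the path, the node itself is left untouched
            logger.info("Would delete : " + nodes.nextNode().getPath())
            count++
        }
        logger.info(count + " node(s) of type " + nodeTypeName + " found")

        // No session.save() here: nothing is written to the repository
        return null
    }
})
```

Once the logged list matches what you expect, the deletion script can be run with more confidence.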

Deletion of a property

If a property needs to be deleted, first remove it from existing content. Otherwise, the export procedures will raise errors, even though these errors will not block the export or import process. Errors would also be raised when trying to publish this content.

The following groovy script simply queries the nodes which have this property and removes it from each of them:

import org.jahia.api.Constants
import org.jahia.services.content.JCRPropertyWrapper
import org.jahia.tools.patches.LoggerWrapper
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * Remove a property
 */

final LoggerWrapper logger = log

final String nodeTypeName = "ins:myComponent"
final String propertyName = "propertyStringWithI18NToRemove"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites') AND [" +
                propertyName + "] IS NOT NULL"
        final NodeIterator iteratorSites = session.getWorkspace().getQueryManager().createQuery(stmt, Query
                .JCR_SQL2)
                .execute().getNodes()
        while (iteratorSites.hasNext()) {
            JCRNodeWrapper node = iteratorSites.nextNode() as JCRNodeWrapper

            JCRPropertyWrapper property = node.getProperty(propertyName)
            property.remove()
        }

        session.save()

        return null
    }
})

Adding mandatory constraint to a property

If existing content has no value for the property that will become mandatory, you need to choose how to proceed:

  • Delete all nodes missing this property and have the nodes recreated by the content editors
  • Add a default value on nodes where this property is empty, to be properly filled out later
  • List all nodes that require treatment and allow the content authors to manually enter the value for this property before proceeding with the definition modification

The following script will add a default value as well as list the impacted nodes:

import org.jahia.api.Constants
import org.jahia.tools.patches.LoggerWrapper
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * Property which will become mandatory
 */

final LoggerWrapper logger = log

final String nodeTypeName = "ins:myComponent"
final String propertyName = "propertyWhichWillBecomeMandatory"
final String defaultValue = "Default value : please replace me"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites') AND [" +
                propertyName + "] IS NULL"
        final NodeIterator iteratorSites = session.getWorkspace().getQueryManager().createQuery(stmt, Query
                .JCR_SQL2)
                .execute().getNodes()
        while (iteratorSites.hasNext()) {
            JCRNodeWrapper node = iteratorSites.nextNode() as JCRNodeWrapper
            logger.info("Node missing mandatory '" + propertyName +"' property : " + node.getPath())
            node.setProperty(propertyName, defaultValue)
        }

        session.save()

        return null
    }
})

Adding regex constraint to a property

This cannot be solved automatically. The best approach is to list all the content nodes that do not fulfil the future regex constraint, so that they can be edited before the definition modification.

The following script, which is more complex, lists the nodes using the nodetype and checks the value constraint on each of them. This script needs to be executed on a development environment where the constraint has already been added:

import org.apache.jackrabbit.core.value.InternalValue
import org.apache.jackrabbit.spi.commons.nodetype.constraint.ValueConstraint
import org.apache.jackrabbit.spi.commons.value.QValueValue
import org.jahia.api.Constants
import org.jahia.services.content.nodetypes.ExtendedPropertyDefinition
import org.jahia.tools.patches.LoggerWrapper
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate

import javax.jcr.NodeIterator
import javax.jcr.PropertyType
import javax.jcr.RepositoryException
import javax.jcr.Value
import javax.jcr.nodetype.ConstraintViolationException
import javax.jcr.query.Query

/**
 * Property on which we will add a constraint (Regex, range)
 */

final LoggerWrapper logger = log

final String nodeTypeName = "ins:myComponent"
final String propertyName = "propertyWhichWillHaveARegexConstraint"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites')"
        final NodeIterator iteratorSites = session.getWorkspace().getQueryManager().createQuery(stmt, Query
                .JCR_SQL2)
                .execute().getNodes()
        ExtendedPropertyDefinition propertyDefinition = null
        ValueConstraint[] constraints = null

        while (iteratorSites.hasNext()) {
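            JCRNodeWrapper node = iteratorSites.nextNode() as JCRNodeWrapper

            // Nodes without the property have nothing to validate,
            // and getProperty() would throw PathNotFoundException on them
            if (!node.hasProperty(propertyName)) {
                continue
            }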

            if (propertyDefinition == null) {
                propertyDefinition = node.getApplicablePropertyDefinition(propertyName)
                constraints = propertyDefinition.getValueConstraintObjects()
            }

            InternalValue[] internalValues = null

            // Retrieve value or values
            if (!propertyDefinition.isMultiple()) {
                Value value = node.getProperty(propertyName).getValue()
                InternalValue internalValue = null
                if (value.getType() != PropertyType.BINARY && !((value.getType() == PropertyType.PATH || value.getType() == PropertyType.NAME) && !(value instanceof QValueValue))) {
                    internalValue = InternalValue.create(value, null, null)
                }
                if (internalValue != null) {
                    internalValues = new InternalValue[1]
                    internalValues[0] = internalValue
                }
            } else {
                Value[] values = node.getProperty(propertyName).getValues()
                List<InternalValue> list = new ArrayList<InternalValue>()
                for (Value value : values) {
                    if (value != null) {
                        // perform type conversion as necessary and create InternalValue
                        // from (converted) Value
                        InternalValue internalValue = null
                        if (value.getType() != PropertyType.BINARY
                                && !((value.getType() == PropertyType.PATH || value.getType() == PropertyType.NAME) && !(value instanceof QValueValue))) {
                            internalValue = InternalValue.create(value, null, null)
                        }
                        // Skip values that could not be converted (e.g. binaries)
                        if (internalValue != null) {
                            list.add(internalValue)
                        }
                    }
                }
                if (!list.isEmpty()) {
                    internalValues = list.toArray(new InternalValue[list.size()])
                }
            }

            // Check constraints: a value is valid as soon as one constraint accepts it
            if (internalValues != null && constraints != null && constraints.length > 0) {
                for (InternalValue iValue : internalValues) {
                    boolean satisfied = false
                    String lastMessage = null
                    for (ValueConstraint constraint : constraints) {
                        try {
                            constraint.check(iValue)
                            satisfied = true
                            break
                        } catch (ConstraintViolationException e) {
                            // constraints are OR-ed together, so keep trying the remaining ones
                            lastMessage = e.message
                        }
                    }
                    if (!satisfied) {
                        // No constraint accepted this value: report the node once
                        logger.info("ConstraintViolation on node : " + node.getPath() + " | " + lastMessage)
                        break
                    }
                }
            }
        }

        return null;
    }
})
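
If the constraint is not yet deployed anywhere, the same idea can be approximated by testing values directly against the future regular expression. The snippet below is a minimal, self-contained sketch: the pattern, the paths and the values are hypothetical sample data, and findViolations is an illustrative helper, not part of the Jahia API. In a real script, the values would be read from the nodes returned by the query, for example with node.getProperty(propertyName).getString(). It assumes the constraint must match the whole value.

```groovy
import java.util.regex.Pattern

// Hypothetical future constraint and sample data, for illustration only
final Pattern futureConstraint = Pattern.compile('[A-Z]{2}-[0-9]{4}')
final Map<String, String> valuesByPath = [
        '/sites/mySite/home/main/item-1': 'AB-1234',
        '/sites/mySite/home/main/item-2': 'ab-1234',
        '/sites/mySite/home/main/item-3': 'AB-12345'
]

// Returns the paths whose value does not fully match the pattern
List<String> findViolations(Map<String, String> values, Pattern pattern) {
    Map<String, String> invalid = values.findAll { path, value -> !pattern.matcher(value).matches() }
    return invalid.collect { it.key }
}

List<String> violations = findViolations(valuesByPath, futureConstraint)
violations.each { println "Value to fix before adding the constraint : ${it}" }
```

With the sample data above, item-2 and item-3 are reported, since neither value matches the whole pattern.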

Repository desynchronization

If you modify content that is already published, make sure to apply the changes to both workspaces, default and live. Otherwise, you will create a desynchronization that could block any further publication of this content.
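
The scripts in this article only open a session on Constants.EDIT_WORKSPACE. One possible way to cover both workspaces is to run the same callback once per workspace. The following is a sketch under that assumption, using the nodetype deletion as an example. The nodetype name is an example value, and the script should be tried on a copy of your data first.

```groovy
import org.jahia.api.Constants
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate
import org.jahia.tools.patches.LoggerWrapper

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * Remove the nodes of a nodetype in both workspaces
 */

final LoggerWrapper logger = log

// Example nodetype name, replace it with your own
final String nodeTypeName = "ins:myComponent"

// Run the same operation in the default workspace, then in the live workspace
[Constants.EDIT_WORKSPACE, Constants.LIVE_WORKSPACE].each { String workspace ->
    JCRTemplate.getInstance().doExecuteWithSystemSession(null, workspace, new JCRCallback() {
        @Override
        Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

            final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites')"
            final NodeIterator nodes = session.getWorkspace().getQueryManager()
                    .createQuery(stmt, Query.JCR_SQL2).execute().getNodes()

            while (nodes.hasNext()) {
                JCRNodeWrapper node = nodes.nextNode() as JCRNodeWrapper
                logger.info("Removing " + node.getPath() + " from workspace " + workspace)
                node.remove()
            }

            // Each workspace has its own session, so each one is saved separately
            session.save()

            return null
        }
    })
}
```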

If you encounter a synchronization issue between workspaces, it is possible to fix it by exporting only the staging site, reimporting it, and publishing it. This fix can only be used if you have no user generated content (content created only in the live workspace, such as forum posts or comments).

N.B.: If you do so, it may be hard to tell which content was previously unpublished, so you could accidentally publish content that is still in a draft state.

Going further

Manipulation of areas

Another kind of integrity issue you might encounter is related to template areas. The name of an area is used to determine the path of the contentList and of the nodes stored under it. If the name is changed in the JSP, the existing content will not be automatically moved to the new expected path but will continue to exist under the old path. As the template is not expecting a contentList with this name, the content will not be rendered (making it hidden) and editors will not be able to edit it.

User or scripted intervention is required to fix this issue, either by removing the now unreachable and hidden content or by renaming the node to match the new area name as specified in the JSP.

This content will usually not generate issues during export or import operations, but it could lead to issues if you decide to modify the definitions of child nodes of these hidden nodes.

Furthermore, it is important to consider either renaming the node to match the new area name or reverting to the original area name in the JSP used by the template. This will prevent editors from thinking that the content is missing and that they need to recreate it.

Renaming an area

To make your content visible again, rename all the impacted content lists to the new area name:

import org.jahia.api.Constants
import org.jahia.tools.patches.LoggerWrapper
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * An area has been renamed
 */

final LoggerWrapper logger = log

final String nodeTypeName = "jnt:contentList"
final String siteKey = "mySiteKey"
final String oldName = "areawhichwillchangename"
final String newName = "arearenamed"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites/" + siteKey + "') " +
                "AND" +
                " NAME(['"+ nodeTypeName + "'])='"+ oldName +"'"
        final NodeIterator iteratorSites = session.getWorkspace().getQueryManager().createQuery(stmt, Query
                .JCR_SQL2)
                .execute().getNodes()
        while (iteratorSites.hasNext()) {
            JCRNodeWrapper node = iteratorSites.nextNode() as JCRNodeWrapper
            node.rename(newName);
        }

        session.save()

        return null
    }
})

Removing an area

This case is easier to handle, as you only have to delete the content lists of the removed area:

import org.jahia.api.Constants
import org.jahia.tools.patches.LoggerWrapper
import org.jahia.services.content.JCRCallback
import org.jahia.services.content.JCRNodeWrapper
import org.jahia.services.content.JCRSessionWrapper
import org.jahia.services.content.JCRTemplate

import javax.jcr.NodeIterator
import javax.jcr.RepositoryException
import javax.jcr.query.Query

/**
 * An area has been deleted
 */

final LoggerWrapper logger = log

final String nodeTypeName = "jnt:contentList"
final String siteKey = "mySiteKey"
final String nameOfTheArea = "areatodelete"

JCRTemplate.getInstance().doExecuteWithSystemSession(null, Constants.EDIT_WORKSPACE, new JCRCallback() {
    @Override
    Object doInJCR(JCRSessionWrapper session) throws RepositoryException {

        final String stmt = "SELECT * FROM [" + nodeTypeName + "] WHERE ISDESCENDANTNODE('/sites/" + siteKey + "') " +
                "AND" +
                " NAME(['"+ nodeTypeName + "'])='"+ nameOfTheArea +"'"
        final NodeIterator iteratorSites = session.getWorkspace().getQueryManager().createQuery(stmt, Query
                .JCR_SQL2)
                .execute().getNodes()
        while (iteratorSites.hasNext()) {
            JCRNodeWrapper node = iteratorSites.nextNode() as JCRNodeWrapper
            node.remove();
        }

        session.save()

        return null
    }
})

Checking the content integrity of a website

If you want to check the integrity of your nodes against their definitions, for example before performing an export, we have started developing an unofficial module that addresses this need:

  • Link to our public Appstore : https://store.jahia.com/contents/modules-repository/org/jahia/modules/verify-integrity.html
  • Github repository : https://github.com/jordannroussel/verify-integrity

For the moment, this module does not detect every possible issue, but we improve it from time to time.

If you want to contribute improvements or report issues, feel free to do so on the GitHub repository.

N.B.: this module is not officially supported by Jahia
