
Distributed Authoring

Recently I've been contemplating how Sitecore could handle distributed authoring. This was largely prompted by a request during a demonstration I was giving, where the client had multiple offices and wanted each office to be able to author content in Sitecore with the best possible performance. Depending on your network rules, as long as Sitecore is exposed to the internet, anyone with a login can author content from anywhere.

But to give a little more oomph to this example, let me paint you a scenario. Let's say I have a geographically distributed authoring team for my website. My main office is in Melbourne, Australia, but there are also satellite offices in San Francisco, North America and London, England. The offices are all connected by a VPN, but bandwidth is limited. I want to ensure performance for my content authors in each of these locations is as good as possible.

To ensure performance, I'm going to want to deploy Sitecore to each of the locations on the office's internal network. This will give the best performance because I'll have minimal network lag, especially for the offices with limited bandwidth. The main Sitecore install will be at the Melbourne office. This is also where the live website is hosted. Content is published in Melbourne from the main content server to the production Sitecore instance in the DMZ.

Each office has its own website in Sitecore which it is responsible for maintaining. There is also a pool of shared content, maintained by the main office, which the satellite sites can use. It contains things like press releases and common articles.

So how do we get the content my satellite content authors are creating back to the main Sitecore instance?

What about SQL replication? SQL replication can be set up on a schedule or to trigger whenever the database is updated. We wouldn't want to use the trigger approach, as the bandwidth of the networks would be used up and other services might suffer. So the schedule approach looks to be the better option. The issue here, though, is that when data is updated on the target, the target instance's cache needs to be invalidated. The Sitecore APIs contain a lot of caching to improve performance and reduce load on the database server. If I'm doing distributed development, the cache isn't really an issue: when developing, the application is constantly restarting as code is updated, so the cache is constantly being invalidated. For development, SQL replication can work. But in production we don't expect the application to be restarting, and I don't know how to get SQL to call our code to invalidate the cache on the target instance. So for distributed authoring the SQL replication option is out.

Instead, let's look at Sitecore's architecture. A Sitecore instance can contain any number of databases, some of which are publishing targets, meaning the database can be published to. Our offices in the above scenario are all connected by a VPN, so we can make SQL connections from one Sitecore instance to another in another office. Then we can let Sitecore take care of publishing the data to the target DB, and the staging module can be used to clear the target instance's cache. All looking good so far.

The issue we now face has to do with multiple publishing sources. When publishing content, depending on the item and publishing mode, Sitecore will delete any items in the target that don't exist in the source. This is what happens when an item is deleted, but it is also relevant when you have multiple publishing sources for a single target. Mark Cassidy did a post on this a little while ago.

As Mark reveals, the RemoveUnknownChildren processor in the publishItem pipeline is responsible for deleting items in the target database that don't exist in the source. If we removed this processor we would be able to publish from multiple sources to a single target, but we won't be able to delete ANY items. So we know we don't want to remove this processor.

This is where we'll need to implement a little tweak to alter Sitecore's behavior. We don't want to change the publishing model to not delete unknown target items or you'd never be able to delete any items from the content tree. Instead, we'll delegate subtrees of the content tree to be owned by the separate offices and filter the publishing operations to only allow publishing of content they own. This will also reduce the amount of bandwidth required to perform a publish and improve performance.

I mentioned above that each office has its own site. The logical content structure to accommodate these sites is separate folders underneath the content item. We'll also have a shared content folder, owned by the main office, for common assets to be used across all the sites, such as media releases. This structure looks like the following:

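(Image: the content tree, with a folder for each office's site and the shared content folder beneath the content item.)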

To facilitate our filtering publish, all we have to do is alter the PublishItem pipeline, which is run for each item in a publishing operation. Inside this pipeline we can perform our path filtering so only content owned by the current Sitecore instance (in a particular location) will be published.

Looking through the pipeline you'll eventually find the DetermineAction processor. This processor is responsible for determining what action should be taken with the current item being published. One of the available actions is to perform no action, which is ideal for filtering items out of the publish. Since the sites all reside below their individual folders, it makes sense to filter based on an item's path. To allow maximum flexibility, our processor will have an includes property, an excludes property and a publishing target property. The publishing target property lets me filter only when publishing back to the main office and publish normally when publishing locally, so I can view the content properly before pushing to live. I'll also make each of these properties a list to allow finer-grained control.

Let's go ahead and create our custom DetermineAction processor which inherits from Sitecore.Publishing.Pipelines.PublishItem.DetermineAction. In addition to the properties, we'll need to override the Process method to perform the filtering within. We'll only deal with our filtering in our custom processor and leave the rest up to the normal processor.

using System.Collections;
using Sitecore.Data.Items;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace Sitecore.Starterkit.Sandbox
{
  public class FilteringDetermineAction : DetermineAction
  {
    ArrayList m_includes = null;
    ArrayList m_excludes = null;
    ArrayList m_targets = null;

    public ArrayList Includes
    {
      get { return m_includes; }
    }

    public ArrayList Excludes
    {
      get { return m_excludes; }
    }

    public ArrayList Targets
    {
      get { return m_targets; }
    }

    public FilteringDetermineAction()
    {
      m_includes = new ArrayList();
      m_excludes = new ArrayList();
      m_targets = new ArrayList();
    }

    public override void Process(PublishItemContext context)
    {
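      // Only filter when publishing to one of the configured targets;
      // any other publish falls straight through to the base processor
      if (m_targets.Contains
        (context.PublishOptions.TargetDatabase.Name))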
      {
        var publishItem = context.PublishHelper.GetItemToPublish(
          context.ItemId);

        if (publishItem != null)
        {
          // Ensure item is below an includes path
          var include = false;
          for (int i = 0; i < m_includes.Count; i++)
          {
            Item item = 
              context.PublishHelper.Options.SourceDatabase.GetItem(
                m_includes[i].ToString());
            if (item != null && publishItem.Axes.IsDescendantOf(
              item))
            {
              include = true;
              break;
            }
          }

          if (!include)
          {
            context.Action = Sitecore.Publishing.PublishAction.None;
            context.CustomData.Add("publishItemFiltered", true);
            return;
          }

          // Ensure item is not below an excludes path
          for (int i = 0; i < m_excludes.Count; i++)
          {
            Item item = 
              context.PublishHelper.Options.SourceDatabase.GetItem(
                m_excludes[i].ToString());
            if (item != null && publishItem.Axes.IsDescendantOf(
              item))
            {
              context.Action = Sitecore.Publishing.PublishAction.None;
              context.CustomData.Add("publishItemFiltered", true);
              return;
            }
          }
        }
      }

      base.Process(context);
    }
  }
}

We'll also require a custom RemoveUnknownChildren processor. The publishing action to perform on the item being published is determined in our custom DetermineAction processor above, and we set the publishing action to None when we filter the item. The PerformAction processor is what executes the action that DetermineAction sets. The default PerformAction processor will still allow actions on child items even if None is set for the publishing action of an item. So to ensure our filter is also in effect for child items, and to make sure the RemoveUnknownChildren processor doesn't delete items when a publishing target publishes an item back to its source, we need to check whether the filter was activated. This is why we set the custom data in the DetermineAction processor: so we can check for it in our custom RemoveUnknownChildren processor. The class might look like the following:

using System;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace Sitecore.Starterkit.Sandbox
{
  public class FilteringRemoveUnknownChildren : RemoveUnknownChildren
  {
    public override void Process(PublishItemContext context)
    {
      // Skip the cleanup when FilteringDetermineAction flagged this item
      object flag = context.CustomData["publishItemFiltered"];
      if(flag == null || (bool)flag == false)
        base.Process(context);
    }
  }
}
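
To make the handoff between the two processors easier to see, here is a minimal sketch that runs outside Sitecore. Everything in it is a stand-in: SketchContext, FlagHandoffSketch and the Log list are hypothetical names invented for illustration, and a plain Dictionary takes the place of the context's custom data. Only the "publishItemFiltered" key comes from the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for PublishItemContext: only the shared custom data
// and a log of what the second step decided.
public class SketchContext
{
    public readonly Dictionary<string, object> CustomData =
        new Dictionary<string, object>();
    public readonly List<string> Log = new List<string>();
}

public static class FlagHandoffSketch
{
    public const string FilteredKey = "publishItemFiltered";

    // Stand-in for FilteringDetermineAction: flags items this server
    // does not own.
    public static void DetermineAction(SketchContext context, bool ownedByThisServer)
    {
        if (!ownedByThisServer)
            context.CustomData[FilteredKey] = true;
    }

    // Stand-in for FilteringRemoveUnknownChildren: only runs the cleanup
    // when the flag is absent or false.
    public static void RemoveUnknownChildren(SketchContext context)
    {
        object flag;
        bool filtered = context.CustomData.TryGetValue(FilteredKey, out flag)
            && flag is bool && (bool)flag;
        context.Log.Add(filtered ? "skipped" : "removed unknown children");
    }

    // Runs both steps in pipeline order against a fresh context.
    public static SketchContext Run(bool ownedByThisServer)
    {
        var context = new SketchContext();
        DetermineAction(context, ownedByThisServer);
        RemoveUnknownChildren(context);
        return context;
    }

    public static void Main()
    {
        Console.WriteLine("Owned item:    " + Run(true).Log[0]);
        Console.WriteLine("Filtered item: " + Run(false).Log[0]);
    }
}
```

The point of the sketch is only the ordering: the first step records its decision on the shared context, and the later step reads it back instead of recomputing the filter.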

Now we need to update the web.config to use and configure our custom processors. The below configuration would be for the London server. Open the web.config file and replace the existing <processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/> with the following:

<processor
  type="Sitecore.Starterkit.Sandbox.FilteringDetermineAction,
    Sitecore.Starterkit">
  <targets hint="list">
    <target>live</target>
  </targets>
  <includes hint="list">
    <include>/sitecore/content/london</include>
  </includes>
</processor>

We also need to configure a publishing target called live (as defined above) with a connection string to the master database on the main server (located in Melbourne). Now when a publish is performed to the "live" database the publish will be filtered to only include items below the /sitecore/content/london item.
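
As a rough sketch, the connection string entry for the live target on the London server might look something like the following. The user, password, server and database names are all placeholders I've made up, and where the entry lives depends on your Sitecore version (I'm assuming the App_Config/ConnectionStrings.config file here). The matching database definition and the publishing target item that points at it are also required, but aren't shown.

```xml
<!-- Placeholder values throughout: substitute your own credentials,
     the Melbourne SQL server address and the master database name. -->
<add name="live"
     connectionString="user id=SITECORE_USER;password=SITECORE_PASSWORD;Data Source=MELBOURNE_SQL_SERVER;Database=Sitecore_Master" />
```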

The main server in Melbourne is responsible not only for the main website content, but also for the shared content and all the non-content items such as data templates and control registrations (layouts, sublayouts, renderings, etc). So we need the main server to publish the entire content tree out to the satellite offices, except for each satellite office's own content. The web.config on the main server might look like the following:

<processor
  type="Sitecore.Starterkit.Sandbox.FilteringDetermineAction,
    Sitecore.Starterkit">
  <targets hint="list">
    <target>london</target>
    <target>san fransisco</target>
  </targets>
  <includes hint="list">
    <include>/sitecore</include>
  </includes>
  <excludes hint="list">
    <exclude>/sitecore/content/london</exclude>
    <exclude>/sitecore/content/san fransisco</exclude>
  </excludes>
</processor>
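
To check what the two configurations above let through, the include and exclude rules can be sketched as a plain path test that runs without Sitecore. This is only an approximation: PublishFilterSketch and its methods are hypothetical names, the sample item paths are invented, and comparing path strings stands in for the Axes.IsDescendantOf call in the real processor. I'm also assuming an item counts as matching its own include or exclude root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PublishFilterSketch
{
    // True when path is the root itself or sits beneath it.
    // Assumption: the root item itself counts as a match.
    public static bool IsAtOrBelow(string path, string root)
    {
        string p = path.TrimEnd('/');
        string r = root.TrimEnd('/');
        return p.Equals(r, StringComparison.OrdinalIgnoreCase)
            || p.StartsWith(r + "/", StringComparison.OrdinalIgnoreCase);
    }

    // Same order of checks as FilteringDetermineAction: the item must be
    // under an include path, and must not be under any exclude path.
    public static bool ShouldPublish(string path,
        IEnumerable<string> includes, IEnumerable<string> excludes)
    {
        if (!includes.Any(root => IsAtOrBelow(path, root)))
            return false;
        return !excludes.Any(root => IsAtOrBelow(path, root));
    }

    public static void Main()
    {
        // London server publishing to "live"
        var londonIncludes = new[] { "/sitecore/content/london" };
        var londonExcludes = new string[0];

        // Main server publishing to "london" or "san fransisco"
        var mainIncludes = new[] { "/sitecore" };
        var mainExcludes = new[]
        {
            "/sitecore/content/london",
            "/sitecore/content/san fransisco"
        };

        // Hypothetical item paths, for illustration only
        var samples = new[]
        {
            "/sitecore/content/london/home",
            "/sitecore/content/shared/press",
            "/sitecore/templates/article"
        };

        foreach (var path in samples)
        {
            Console.WriteLine("{0}: London to live = {1}, main to satellites = {2}",
                path,
                ShouldPublish(path, londonIncludes, londonExcludes),
                ShouldPublish(path, mainIncludes, mainExcludes));
        }
    }
}
```

With these rules the London item is published only by the London server, while the shared and template items are published only by the main server, which is the ownership split we're after.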

We also need to swap the existing RemoveUnknownChildren processor with our custom filtering processor on all servers.

<processor
  type="Sitecore.Starterkit.Sandbox.FilteringRemoveUnknownChildren,
    Sitecore.Starterkit"/>

And don't forget to also configure the staging module to clear the target server's cache on publish.

Now we can publish satellite site content to the main server and publish templates, main and shared content from the main server to the satellite servers. We've achieved distributed content authoring with only a few minor tweaks.

Comments

Hey man, great post! Pretty interesting.
Cheers chris

Hey Ali, nice approach, but I don't really see this working in a real-world scenario. You have to remember that most of the time people will be editing the same articles, which would cause issues with publishing, versioning etc. You will need something else to handle this, maybe events or specific SQL triggers. Could be a nice way to get a new open source model started, have a think.

Alistair Deneys

Thanks Guys, Christopher, what kind of issues do you see with the publishing and versioning of the same articles? In these approaches there is only a single server that "owns" the content and only authors logged into that server can publish the content once updated. Authors in other locations can of course only link to content from another server. They can't edit content their server doesn't "own".

Very interesting approach :-)
When I myself puzzled over various ways of solving this issue, one of the approaches I considered was to hook into Sitecore's item:deleted events and have those written to the publishing pipeline as well. I've not investigated, but I assume these aren't being written as standard - as there would be little use for the RemoveUnknownChildren pipeline step then.
In any event, until an official "multiple content masters" version of Sitecore is released, approaching requirements on a case-by-case basis is probably the only way to go - and the case you demonstrate here is definitely one for the bookmarks :-)
Thanks.

Great post. I've been looking for this exact information for a while now. Bookmarked!

Fascinating, I passed this on to a crony of mine, and he actually bought me lunch because I found this for him, so let me rephrase: Thanks for lunch.

Alistair Deneys

Glad to be of service :)

Joel

Very interesting. I wonder, though, if proxy items and some clever use of remote Sitecore databases would also be an option.
For example the London office would create a proxy item for both Melbourne and Shared, those items would just read from the database located in Melbourne. This would give excellent performance when editing local items and perhaps acceptable performance when linking to remote items.
