XPath Magic
Recently, I was dealing with an XSLT stylesheet at work where our CMS (Cascade Server) runs mostly off of XML and XSLT for its theme/content development functionality. My project was to create an RSS feed for a list of files (for a gallery template). Most of it was pretty simple, but I got hung up on a somewhat unique problem. Allow me to describe it in detail.
The CMS utilizes a built-in “index block” function that outputs XML data based on several options. This XML is structured according to the CMS’s specification, so I can’t restructure it the way I’d like; I just have to run the XSLT to pull out the data I want. This works fine except when you’re trying to use the <xsl:sort /> tag inside a for-each loop and need to sort by XML that isn’t nested inside the elements you’re looping over. Is that a little unclear? It’s worse than you think it is.
Here’s an example of the XML I was dealing with.
<data>
<system-index-block current-time="1267845784864" name="galleries" type="folder">
<system-page current="true" id="d2015499ac100f91009f8ac15ab8a50c">
<name>gallery.rss</name>
<is-published>true</is-published>
<last-published-on>1267825368024</last-published-on>
<last-published-by>jrfenocc</last-published-by>
<title>Gallery</title>
<author>Jona</author>
<description>Gallery of images.</description>
<display-name>Gallery</display-name>
<path>/Web Site/galleries/gallery.rss</path>
<created-by>jrfenocc</created-by>
<created-on>1266243687406</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267825336186</last-modified>
<system-data-structure definition-path="Gallery Bank-new">
<GalleryBank>
<pagesize>2</pagesize>
<sort>Randomize</sort>
<Gallery>
<title>B - Photo 2</title>
<image>
<content />
<path>/Web Site/galleries/gallery/081107_2 INvesiture Ceramony_689.jpg</path>
<name>081107_2 INvesiture Ceramony_689.jpg</name>
</image>
<description>photo 2</description>
<category>My Category</category>
</Gallery>
<Gallery>
<title>A - Photo One</title>
<image>
<content />
<path>/Web Site/galleries/gallery/081107_2 INvesiture Ceramony_505.jpg</path>
<name>081107_2 INvesiture Ceramony_505.jpg</name>
<title>Investitute Ceremony</title>
</image>
<description>photo 1</description>
<category>Some Category</category>
</Gallery>
<Gallery>
<title>C - Photo 3</title>
<image>
<content />
<path>/Web Site/galleries/gallery/09122_9163_Community_Leaders_Breakfast_060.jpg</path>
<name>09122_9163_Community_Leaders_Breakfast_060.jpg</name>
</image>
<description>photo 3</description>
<category>My Category</category>
</Gallery>
<Gallery>
<title>D - Photo 4</title>
<image>
<content />
<path>/Web Site/galleries/gallery/09122_9163_Community_Leaders_Breakfast_080.jpg</path>
<name>09122_9163_Community_Leaders_Breakfast_080.jpg</name>
</image>
<description>photo 4</description>
<category>Some Category</category>
</Gallery>
</GalleryBank>
</system-data-structure>
</system-page>
<system-folder id="24dd22e3ac100f91009f8ac1daa42427">
<name>gallery</name>
<is-published>true</is-published>
<title>none</title>
<path>/Web Site/galleries/gallery</path>
<created-by>jrfenocc</created-by>
<created-on>1267633823300</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267633823300</last-modified>
<dynamic-metadata>
<name>displayInMenu</name>
<value>Yes</value>
</dynamic-metadata>
<system-file id="24dd2c8bac100f91009f8ac152626d31">
<name>081107_2 INvesiture Ceramony_505.jpg</name>
<is-published>true</is-published>
<title>Investitute Ceremony</title>
<path>/Web Site/galleries/gallery/081107_2 INvesiture Ceramony_505.jpg</path>
<created-by>jrfenocc</created-by>
<created-on>1267633826933</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267646736784</last-modified>
<file-size>217550</file-size>
</system-file>
<system-file id="24dd2d81ac100f91009f8ac11f8a1660">
<name>081107_2 INvesiture Ceramony_689.jpg</name>
<is-published>true</is-published>
<path>/Web Site/galleries/gallery/081107_2 INvesiture Ceramony_689.jpg</path>
<created-by>jrfenocc</created-by>
<created-on>1267633827183</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267633827183</last-modified>
<file-size>116833</file-size>
</system-file>
<system-file id="24dd2e4eac100f91009f8ac19426e04c">
<name>09122_9163_Community_Leaders_Breakfast_060.jpg</name>
<is-published>true</is-published>
<path>/Web Site/galleries/gallery/09122_9163_Community_Leaders_Breakfast_060.jpg</path>
<created-by>jrfenocc</created-by>
<created-on>1267633827386</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267633827386</last-modified>
<file-size>172528</file-size>
</system-file>
<system-file id="24dd2f1eac100f91009f8ac1b515dbaf">
<name>09122_9163_Community_Leaders_Breakfast_080.jpg</name>
<is-published>true</is-published>
<path>/Web Site/galleries/gallery/09122_9163_Community_Leaders_Breakfast_080.jpg</path>
<created-by>jrfenocc</created-by>
<created-on>1267633827591</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267633827591</last-modified>
<file-size>172425</file-size>
</system-file>
<system-folder id="24dd7ae8ac100f91009f8ac1d3c45b3f">
<name>thumbs</name>
<is-published>true</is-published>
<title>thumbs</title>
<path>/Web Site/galleries/gallery/thumbs</path>
<created-by>jrfenocc</created-by>
<created-on>1267633845813</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267633845813</last-modified>
<dynamic-metadata>
<name>displayInMenu</name>
<value>Yes</value>
</dynamic-metadata>
</system-folder>
</system-folder>
<calling-page>
<system-page current="true" id="d2015499ac100f91009f8ac15ab8a50c">
<name>gallery.rss</name>
<is-published>true</is-published>
<last-published-on>1267825368024</last-published-on>
<last-published-by>jrfenocc</last-published-by>
<title> Gallery</title>
<author>Jona</author>
<description>Gallery images for the site.</description>
<display-name> Gallery</display-name>
<path>/Web Site/galleries/gallery.rss</path>
<created-by>jrfenocc</created-by>
<created-on>1266243687406</created-on>
<last-modified-by>jrfenocc</last-modified-by>
<last-modified>1267825336186</last-modified>
<system-data-structure definition-path="Gallery Bank-new">
<GalleryBank>
<pagesize>2</pagesize>
<sort>Randomize</sort>
<Gallery>
<title>B - Photo 2</title>
<image>
<content />
<path>/Web Site/galleries/gallery/081107_2 INvesiture Ceramony_689.jpg</path>
<name>081107_2 INvesiture Ceramony_689.jpg</name>
</image>
<description>photo 2</description>
<category>My Category</category>
</Gallery>
<Gallery>
<title>A - Photo One</title>
<image>
<content />
<path>/Web Site/galleries/gallery/081107_2 INvesiture Ceramony_505.jpg</path>
<name>081107_2 INvesiture Ceramony_505.jpg</name>
<title>Investitute Ceremony</title>
</image>
<description>photo 1</description>
<category>Some Category</category>
</Gallery>
<Gallery>
<title>C - Photo 3</title>
<image>
<content />
<path>/Web Site/galleries/gallery/09122_9163_Community_Leaders_Breakfast_060.jpg</path>
<name>09122_9163_Community_Leaders_Breakfast_060.jpg</name>
</image>
<description>photo 3</description>
<category>My Category</category>
</Gallery>
<Gallery>
<title>D - Photo 4</title>
<image>
<content />
<path>/Web Site/galleries/gallery/09122_9163_Community_Leaders_Breakfast_080.jpg</path>
<name>09122_9163_Community_Leaders_Breakfast_080.jpg</name>
</image>
<description>photo 4</description>
<category>Some Category</category>
</Gallery>
</GalleryBank>
</system-data-structure>
</system-page>
</calling-page>
</system-index-block>
</data>
As you may notice, the way this XML is organized, all of the data we need is stored in a <GalleryBank> tag, and each image in the gallery is wrapped in a <Gallery> tag. Now, to create the RSS feed, these Gallery tags are sifted through using an <xsl:for-each /> loop.
However, notice that when we get to the <image> tag, we are given only three or four child elements that boil down to one piece of data: the file name and path of the image. The created and last-modified dates exist in the CMS, but the index block doesn’t output them for these elements, so there is nothing inside a Gallery to sort by date. This makes retrieving that data impossible by ordinary means.
The solution to this problem is two-fold. First, the data must be obtained. The only way to get the index block to provide this information is to place all of the necessary images into a sub-directory of the RSS feed page object. So, the structure looks something like this:
The RSS file:
/Web Site Target (root)/galleries/gallery.rss
The images folder:
/Web Site Target (root)/galleries/gallery/
With this structure, we can set the index block in the CMS to step into subdirectories of the RSS feed, which provides all of the information that we couldn’t obtain through the XML of the RSS feed itself (because it only provides the file name and path of the images in question, but no further information).
Of course, the problem is that our XSLT looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsl:stylesheet version="1.0" xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:date-converter="http://www.hannonhill.com/dateConverter/1.0/"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:randomizer="http://www.xyz.com/rss/x-randomizer-ns.html"
xmlns:x_gallery="http://www.xyz.com/rss/x-gallery-ns.html"
xmlns:xalan="http://xml.apache.org/xalan"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="xml"/>
<xsl:variable name="baseUri">http://www.xyz.com/</xsl:variable>
<xsl:template match="//system-index-block/system-page[@current='true']">
<xsl:variable name="pgsize"><xsl:value-of select="./system-data-structure/GalleryBank/pagesize"/></xsl:variable>
<xsl:variable name="sortopt"><xsl:value-of select="./system-data-structure/GalleryBank/sort"/></xsl:variable>
<atom:link href="{$baseUri}[system-asset:page]{./path}[/system-asset:page]" rel="self" type="application/rss+xml"/>
<x_gallery:pagination><xsl:value-of select="$pgsize"/></x_gallery:pagination>
<x_gallery:sort><xsl:value-of select="$sortopt"/></x_gallery:sort>
<xsl:if test="$sortopt = 'Title (alphabetical ascending)'">
<xsl:for-each select="./system-data-structure/GalleryBank/Gallery">
<xsl:sort data-type="text" order="ascending" select="./title"/>
<xsl:call-template name="galleryItem"/>
</xsl:for-each>
</xsl:if>
<xsl:if test="$sortopt = 'Title (alphabetical descending)'">
<xsl:for-each select="./system-data-structure/GalleryBank/Gallery">
<xsl:sort data-type="text" order="descending" select="./title"/>
<xsl:call-template name="galleryItem"/>
</xsl:for-each>
</xsl:if>
<xsl:if test="$sortopt = '(no sorting)'">
<xsl:for-each select="./system-data-structure/GalleryBank/Gallery">
<xsl:call-template name="galleryItem"/>
</xsl:for-each>
</xsl:if>
</xsl:template>
<xsl:template name="galleryItem">
<xsl:variable name="itemPath"><xsl:value-of select="./image/path"/></xsl:variable>
<item>
<title><xsl:value-of select="./title"/></title>
<x_gallery:filename><xsl:value-of select="./image/name"/></x_gallery:filename>
<description><xsl:value-of select="./description"/></description>
<content>
<img alt="{./description}" src="{$baseUri}{substring-after($itemPath,'/Web Site/')}"/>
</content>
<category><xsl:value-of select="./category"/></category>
<link>
<xsl:value-of select="$baseUri"/><xsl:value-of select="substring-after($itemPath,'/Web Site/')"/>
</link>
<guid>
<xsl:value-of select="$baseUri"/><xsl:value-of select="substring-after($itemPath,'/Web Site/')"/>
</guid>
<xsl:for-each select="//system-file">
<xsl:if test="./path = $itemPath">
<x_gallery:date-time>
<xsl:value-of select="date-converter:convertDate(number(./last-modified),1)"/>
</x_gallery:date-time>
<pubDate>
<xsl:value-of select="date-converter:convertDate(number(./last-modified))"/>
</pubDate>
</xsl:if>
</xsl:for-each>
</item>
</xsl:template>
<xalan:component functions="convertDate" prefix="date-converter">
<xalan:script lang="javascript">
function convertDate(date,p,cp,ig)
{ ... boring XALAN JS here ... }
</xalan:script>
</xalan:component>
<xalan:component functions="randomize" prefix="randomizer">
<xalan:script lang="javascript">
function randomize (items) {
... randomized sorting in XSL? Another day, perhaps. ;-) ...
}
</xalan:script>
</xalan:component>
<xsl:template match="//system-file|//calling-page/system-page|//system-folder">
</xsl:template>
</xsl:stylesheet>
Notice the lines near the beginning which test for the <sort /> element inside the <GalleryBank /> element. This element is an enumeration that can be one of several values (I’ve only shown 3 of them in the above code). This is fully functional because the data we need to sort by the listed means is available — we have titles for these images (applied by the Data Definition XML) and it’s rather easy to sort by nothing.
In XSL, the <xsl:sort /> element is only allowed as a child of <xsl:for-each /> or <xsl:apply-templates /> (and inside a for-each it has to come before anything else). Furthermore, due to context, you can’t sort an outer for-each loop from inside an inner one. If we are looping through //GalleryBank/Gallery items, a plain relative path can’t reach the “matching” //system-file/last-modified elements because they are not nested within the //GalleryBank/Gallery elements.
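To illustrate the placement rule, here is a minimal sketch of <xsl:sort /> used as a child of <xsl:apply-templates />, since the stylesheet above only shows the for-each form. It assumes the sample XML shown earlier as input; the mode name "listing" and the <titles> wrapper are hypothetical names made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>

  <xsl:template match="/">
    <titles>
      <!-- xsl:sort is a direct child of xsl:apply-templates -->
      <xsl:apply-templates mode="listing"
          select="//system-index-block/system-page/system-data-structure/GalleryBank/Gallery">
        <!-- The sort key is evaluated relative to each Gallery being sorted -->
        <xsl:sort data-type="text" order="ascending" select="title"/>
      </xsl:apply-templates>
    </titles>
  </xsl:template>

  <!-- "listing" is a hypothetical mode name used only for this sketch -->
  <xsl:template match="Gallery" mode="listing">
    <title><xsl:value-of select="title"/></title>
  </xsl:template>
</xsl:stylesheet>
```

Note that the sort key is still a path relative to the node being sorted, which is exactly the limitation at issue here.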
In other words, the following won’t sort the way we want:
<xsl:for-each select="./system-data-structure/GalleryBank/Gallery">
<xsl:for-each select="//system-file">
<xsl:sort data-type="text" order="ascending" select="./last-modified"/>
<xsl:call-template name="galleryItem"/>
</xsl:for-each>
</xsl:for-each>
The inner sort only reorders the //system-file nodes within each pass of the outer loop. Since those nodes aren’t nested inside //GalleryBank/Gallery, sorting them has no effect on the order in which the Gallery elements are visited. (The inner loop also changes the context node to a system-file, so galleryItem would no longer be looking at a Gallery at all.)
So we need another solution, one with only one for-each loop. My first idea was to use the following:
<xsl:for-each select="./system-data-structure/GalleryBank/Gallery">
<xsl:sort data-type="text" order="ascending" select="//system-file/last-modified"/>
<xsl:call-template name="galleryItem"/>
</xsl:for-each>
Of course, this doesn’t work, either. Here the XSL sorts the //GalleryBank/Gallery items by the value of //system-file/last-modified, but that expression selects every last-modified element in every system-file element. In XSLT 1.0 the sort key is converted to a string, and the string value of a node-set is the value of its first node in document order, so every Gallery gets the same key (unless you add a condition that lets XPath pick out a specific element). In other words, the result of the above code is to sort everything by the same exact date!
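The first-node behaviour is easy to check for yourself. The following is a small sketch of a standalone XSLT 1.0 stylesheet (not part of the gallery stylesheet) that prints what the unfiltered expression turns into as a string, followed by how many nodes it actually selects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- string() of a node-set uses only the first node in document order -->
    <xsl:value-of select="string(//system-file/last-modified)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- ...even though the expression selects one node per system-file -->
    <xsl:value-of select="count(//system-file/last-modified)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample XML above, this should print 1267646736784 (the last-modified value of the first system-file) and then 4.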
How, then, can we access and sort by an un-nested element? The trick lies in the syntax of XPath itself. (Yes, the magical solution is coming soon — you’ve waited this long!)
I ended up referencing the last-modified field with the following sort command.
<xsl:sort data-type="number" order="ascending" select="//system-file/last-modified[parent::*/path = current()/image/path]"/>
What does this mean? Well, when we loop through each Gallery item, we need a way to tie the current Gallery item to the system-file element that is un-nested in the XML data. The only data we have that coincides between one Gallery element and one system-file element is the file’s path element, so we need to compare the current Gallery element’s path to all of the system-file elements’ paths and return the one system-file element’s last-modified element if there is a match. Make sense?
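To show the sort command in context, here is a minimal sketch of a complete stylesheet built around it. It is a cut-down illustration rather than the real feed stylesheet: the <sorted>, <item> and <modified> output elements are hypothetical, and it assumes the sample XML shown earlier as input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>

  <xsl:template match="/">
    <sorted>
      <xsl:for-each select="//system-index-block/system-page[@current='true']/system-data-structure/GalleryBank/Gallery">
        <!-- Sort key: last-modified of the system-file whose path equals this Gallery's image path -->
        <xsl:sort data-type="number" order="ascending"
            select="//system-file/last-modified[parent::*/path = current()/image/path]"/>
        <item>
          <title><xsl:value-of select="title"/></title>
          <!-- Same lookup again, only to display the value that was sorted on -->
          <modified>
            <xsl:value-of select="//system-file[path = current()/image/path]/last-modified"/>
          </modified>
        </item>
      </xsl:for-each>
    </sorted>
  </xsl:template>
</xsl:stylesheet>
```

Given the timestamps in the sample data, and provided the path strings in the Gallery and system-file elements match exactly, the items should come out in the order B, C, D, A.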
Let’s break down the XPath:
//system-file/last-modified[parent::*/path = current()/image/path]
In English: select every last-modified element that is a child of a system-file element whose path child equals the path of the image in the element the for-each loop is currently on.
Phew, what a mouthful. The key to this expression is current(). Inside the predicate, a relative path starts from the last-modified node being tested, while current() always points at the node XSLT is processing, here the Gallery being sorted. We can thus compare the current GalleryBank/Gallery/image/path value to all of the //system-file/path elements in the XML in one line and return the last-modified element that is a sibling of the matching path. Magical, right?
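As an aside, the same lookup can probably be written with <xsl:key />, which is standard XSLT 1.0 and lets the processor index the system-file elements instead of scanning all of them for every Gallery. This is a different technique from the predicate above, offered only as a sketch: the key name fileByPath and the <sorted> wrapper are hypothetical, and it assumes the sample XML shown earlier as input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>

  <!-- Index every system-file element by the string value of its path child -->
  <xsl:key match="system-file" name="fileByPath" use="path"/>

  <xsl:template match="/">
    <sorted>
      <xsl:for-each select="//system-index-block/system-page[@current='true']/system-data-structure/GalleryBank/Gallery">
        <!-- image/path is relative to the Gallery being sorted, so current() is not needed -->
        <xsl:sort data-type="number" order="ascending"
            select="key('fileByPath', image/path)/last-modified"/>
        <title><xsl:value-of select="title"/></title>
      </xsl:for-each>
    </sorted>
  </xsl:template>
</xsl:stylesheet>
```

With four images the difference is negligible, so this mostly matters for larger galleries. The predicate version has the advantage of needing no extra top-level declaration.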
It gets better, since the actual XSLT I ended up with was even more complicated (using substring functions and XALAN JavaScript to process that funky last-modified number into an actual date that can be sorted). But perhaps I’ll explain that a little bit more in the future. ;-)
I won’t go into details regarding what version of XSLT you need and XPath functions namespaces and what have you, but for anyone out there trying to reference un-nested data and use it to sort a different set of XML data, this is your answer! I couldn’t find this anywhere else on the Internet, so hopefully it’ll be useful to someone — and if you DO find it useful or have something to add/optimize, post it in the comments and spread the knowledge! Enjoy!