XSLT performance when mapping large documents in BizTalk

Recently I had to map a document with many thousands of rows. Splitting it into smaller messages was not an option, because the document’s nodes had to be sorted before any split could happen.

With files this large you generally test with a small subset to avoid waiting for the map to complete. I built an XSLT that worked great, or so I thought.

A select filter such as "not(keyValue=preceding-sibling::row/keyValue)" gets dramatically slower as the document grows, because every row is compared against all the rows that come before it. My map went from 2 seconds for 50 rows to 10 minutes for a few thousand.
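To make the slow pattern concrete, here is a minimal sketch of a stylesheet that picks the distinct key values with the preceding-sibling filter. The element names (top, row, keyValue) and the source namespace are taken from the sample further down; the s0 prefix and the Keys wrapper element are assumptions made only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s0="http://biztalk/Conversion.schemas"
    exclude-result-prefixes="s0">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <!-- Keys is a hypothetical wrapper element used only in this sketch -->
    <Keys>
      <!-- Slow: for each row, the predicate compares its keyValue with the
           keyValue of every earlier sibling row, so a row is kept only if
           no earlier row carries the same value -->
      <xsl:for-each select="/s0:top/row[not(keyValue = preceding-sibling::row/keyValue)]">
        <keyValue>
          <xsl:value-of select="keyValue"/>
        </keyValue>
      </xsl:for-each>
    </Keys>
  </xsl:template>
</xsl:stylesheet>
```

Unless the processor optimizes the predicate, the number of comparisons grows roughly with the square of the row count, which fits the jump from seconds to minutes described above.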

How do you improve performance when you have large XML files to map that you can’t split? Try using xsl:key instead. It builds an index of key values, so selecting nodes through the key() function is much more efficient.

Here is a sample XSLT that demonstrates how to use xsl:key:


<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:ns0="http://Conversion.schemas">
    <xsl:output method="xml" indent="no" />
    <!-- Index every row by its keyValue child -->
    <xsl:key name="NumberKey" match="/*[local-name()='top' and namespace-uri()='http://biztalk/Conversion.schemas']/*[local-name()='row' and namespace-uri()='']"
        use="keyValue" />
    <xsl:template match="/">
        <ns0:Rows>
            <!-- Visit only the first row of each distinct keyValue: a row qualifies
                 when it is the first node the key returns for its own keyValue -->
            <xsl:for-each select="/*[local-name()='top' and namespace-uri()='http://biztalk/Conversion.schemas']/*[local-name()='row' and namespace-uri()='' and generate-id(.) = generate-id(key('NumberKey', keyValue)[1])]">
                <xsl:variable name="current_Number" select="keyValue" />
                <Data>
                    <keyValue>
                        <xsl:value-of select="$current_Number" />
                    </keyValue>
                    <!-- Fetch the rows of this group from the key index
                         instead of scanning the whole document with //row -->
                    <xsl:for-each select="key('NumberKey', $current_Number)">
                        <Part>
                            <PartID>
                                <xsl:value-of select="nr_data" />
                            </PartID>
                        </Part>
                    </xsl:for-each>
                </Data>
            </xsl:for-each>
        </ns0:Rows>
    </xsl:template>
</xsl:stylesheet>
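To see what the stylesheet above does, it helps to run it against a tiny input. The document below is a made-up example: the element names and the namespace come from the match expressions in the stylesheet, while the s0 prefix and the values A, B, 1, 2 and 3 are assumptions for illustration only.

```xml
<s0:top xmlns:s0="http://biztalk/Conversion.schemas">
  <row><keyValue>A</keyValue><nr_data>1</nr_data></row>
  <row><keyValue>B</keyValue><nr_data>2</nr_data></row>
  <row><keyValue>A</keyValue><nr_data>3</nr_data></row>
</s0:top>
```

The two rows with keyValue A end up under one Data element and the B row gets its own. Because the stylesheet sets indent="no", the real output arrives on a single line; it is reformatted here for readability.

```xml
<ns0:Rows xmlns:ns0="http://Conversion.schemas">
  <Data>
    <keyValue>A</keyValue>
    <Part><PartID>1</PartID></Part>
    <Part><PartID>3</PartID></Part>
  </Data>
  <Data>
    <keyValue>B</keyValue>
    <Part><PartID>2</PartID></Part>
  </Data>
</ns0:Rows>
```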
