A few days ago I had to implement a data transfer for a customer that works like this:
A source system (e.g. SAP) provides invoices in a flat, relational style that looks like this:
So I get two lists of entities: first all invoices (that is, the invoice headers) under the <Invoices> node, and second all invoice positions for all invoices under the <InvoicePositions> node.
But in the target I had to deliver the data like this:
They want the invoice positions nested below the invoice they belong to.
Initial Solution:
To achieve this, I quickly built a BizTalk map with a custom XSLT transformation. A very clean XSLT that worked fine. Good.
But the problem…
with this solution was that I had only run some small data tests initially. When I started testing with large, real data transports, the process got so damn slow that it became a real “showstopper”.
What happened?
With larger sets of data the transformation got slower and slower, because every additional record increased the number of iterations dramatically. If you are a software developer and have ever looped over two arrays to match every element of the first against the second, you know the problem very well. The invoices on the source side appear in an average ratio of 1:3 (Invoice : Positions). So if, for example, you have 100 invoices and 300 invoice positions, then your XSLT processor has to perform 100 x 300 (30,000) rounds to loop over all positions for all invoices. Well, that still sounds “affordable”.
But let's go ahead and increase the number of invoices a bit. Say 1,000 invoices. That means 1,000 x 3,000 (3,000,000) rounds to go.
In my case the largest set of data (the full initial data transfer) was 325,000 x 895,000, which means 290,875,000,000 rounds for mapping the data. The transformation (map) ran for a bit more than 4 days on my BizTalk server with a CPU usage of nearly 100%. => Total overkill
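To make the numbers above tangible, here is a minimal sketch in Python (not the XSLT used in the map) of the nested-loop matching and the number of comparisons it needs. The field names `id` and `invoice_id` and the helper `make_test_data` are made up for illustration; the real message uses its own element names.

```python
def nested_loop_join(invoices, positions):
    """Attach positions to invoices by scanning ALL positions for EACH invoice."""
    comparisons = 0
    result = []
    for invoice in invoices:
        matched = []
        for position in positions:
            comparisons += 1  # one "round" per invoice/position pair
            if position["invoice_id"] == invoice["id"]:
                matched.append(position)
        result.append({"id": invoice["id"], "positions": matched})
    return result, comparisons


def make_test_data(invoice_count, positions_per_invoice=3):
    """Hypothetical test data with the 1:3 ratio mentioned above."""
    invoices = [{"id": i} for i in range(invoice_count)]
    positions = [
        {"invoice_id": i, "line": n}
        for i in range(invoice_count)
        for n in range(positions_per_invoice)
    ]
    return invoices, positions


invoices, positions = make_test_data(100)
joined, comparisons = nested_loop_join(invoices, positions)
print(comparisons)          # 100 x 300 rounds
print(325_000 * 895_000)    # rounds for the full initial transfer
```

The work grows with the product of both list sizes, which is why a test with a few records says nothing about the full load.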
After some days of reflection I came up with the following…
Solution:
First of all, I have to say that this solution is pragmatic and works safely and well, but in my personal opinion it is not a very clean, pure XSLT-style solution.
To solve my performance issues I embedded custom code in the XSLT template.
In this custom code I build in-memory lookup tables (dictionaries), which are of course much faster: each invoice finds its positions with a key lookup instead of a scan over all positions.
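The idea behind the lookup tables can be sketched like this, again in Python rather than in the embedded code of the actual map. This is only an illustration under assumptions: the field names `id` and `invoice_id` and the function `dictionary_join` are hypothetical.

```python
from collections import defaultdict


def dictionary_join(invoices, positions):
    """Group positions by invoice key once, then look them up per invoice."""
    # Pass 1: build the in-memory lookup table (one step per position).
    positions_by_invoice = defaultdict(list)
    for position in positions:
        positions_by_invoice[position["invoice_id"]].append(position)

    # Pass 2: one dictionary lookup per invoice instead of a full scan.
    result = []
    for invoice in invoices:
        matched = positions_by_invoice.get(invoice["id"], [])
        result.append({"id": invoice["id"], "positions": matched})
    return result


sample_invoices = [{"id": 1}, {"id": 2}, {"id": 3}]
sample_positions = [
    {"invoice_id": 1, "line": 1},
    {"invoice_id": 2, "line": 1},
    {"invoice_id": 1, "line": 2},
]
nested = dictionary_join(sample_invoices, sample_positions)
for invoice in nested:
    print(invoice["id"], [p["line"] for p in invoice["positions"]])
```

With this approach the work grows with the sum of both list sizes (invoices plus positions) rather than their product, at the cost of holding the lookup table in memory.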
With the new template the transformation ran in just under 20 seconds!
Here is a sample of this XSLT. Feel free to adapt or copy it.
Any comments about this are appreciated.