Wednesday, November 9, 2022

Want to transform data? Then DataWeave

If you work in corporate Information Technology, you may have been (or will be) asked to transform data from one form to another. There are many good ways to address the challenge, but my current favorite is the DataWeave language from MuleSoft.

The reason has nothing to do with the fact that it's my job to introduce developers to the language and then train them to use it well. It is much more because I've met many programming languages in my time, and this one is among the most compelling and satisfying.

You might not believe it if you have read many of the articles here, but I revere brevity. If you speak and finish exactly on the time marker, I applaud for the technical accomplishment much more so than for the speech. If you write me an email with 5 sentences instead of 5 paragraphs, I name my children after you.

When it comes to code, I have come to realize at this point in my career that the fewer lines you write, the stronger your program. I blush with embarrassment when I think back to those days when I walked around and bragged that our project had grown to 80,000 lines of code. (Yes, I also counted the number of lines in each of the third-party libraries we bought for the project.)

Now that I know the folly of my ways, I can respect the fact that the best DataWeave expressions are the shortest.

You may be aware that MuleSoft makes much of its technology freely available to developers, and sometimes to organizations too small to justify the full corporate license.

They have also now made DataWeave available under the "DataWeave Playground Free Commercial License Agreement." This means that you can use the language, even if it's not for a Mule application. Once you begin to see what that makes possible, this turns out to be very good news.

Let's look at a simple example. We start with this simple data set.




The input data on the left is identified as "payload" in our expression (seen in the middle). The DataWeave expression (in line 4) simply returns the payload in its original form.
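For readers following along without the Playground open, a minimal sketch of such a pass-through script might look like this. It assumes the payload is a JSON Array of Object, for example `[{"name": "Ana", "city": "Lima"}, {"name": "Raj", "city": "Pune"}]`; that sample is invented for illustration and is not the data set from the original example.

```dataweave
%dw 2.0
output application/json
---
payload
```

The header (everything above the `---` separator) declares the language version and the output format, which leaves `payload` as the entire expression on line 4.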

Our goal is to transform this data so that we can write it out as XML before we submit it to another system. There are a couple of difficulties with this data, however. First, there is no "root key," which is a necessary characteristic of XML data. The other impediment is that you cannot represent an Array[1] in XML. Each Object in this collection must be given a key.

This example illustrates the output form that we must achieve before we can ask for this to be XML.


In this expression (beginning at line 4), we declare a static key ("customers") and then submit the payload to the map() function, which simply applies an expression to each element of the Array (which we shall identify as "i" for each iteration). Our expression (seen in line 5) is a lambda, or unnamed function, which declares a static key for each item ("customer") and then presents the item as the value for that key.
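A sketch of an expression with that shape, assuming the payload is an Array of Object, might read as follows. It is a reconstruction from the description above rather than a copy of the original script.

```dataweave
%dw 2.0
output application/json
---
customers: payload map (i) ->
    { customer: i }
```

The result should be an Object with a single root key, `customers`, whose value is still an Array. Each element of that Array is now a one-key Object that wraps the original item under `customer`.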

Because XML does allow a key to be repeated, you might consider that an Array could be encoded as a set of repeating key/value pairs. So how do we write a function that could do this with any Array of Object? Take a look at this:


The function (defined in lines 4-6) accepts an input value (a), establishes a static key (customers), and then submits the input Array to the map() function, in which we apply a static key to each element in the Array. The data on the right shows us what this will produce.
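As a rough sketch, a function of this kind could be declared in the script header like so. The name `toXmlReady` is my own placeholder for illustration; the parameter name `a` follows the description above.

```dataweave
%dw 2.0
output application/json

fun toXmlReady(a) = {
    customers: a map (i) -> { customer: i }
}
---
toXmlReady(payload)
```

Functions are declared with `fun` above the `---` separator and called in the body below it, so the body shrinks to a single call.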

We are still left with one issue. The root key references an Array of Object. If we want this to be XML, we must transform that Array into a series of Object, each one with a common key. So we can add an additional parameter to our function (see line 4) to allow the caller to pass us a key to use for items in the collection.
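A minimal sketch of that change, again using the hypothetical function name `toXmlReady` and assuming the second parameter is called `tag`:

```dataweave
%dw 2.0
output application/json

fun toXmlReady(a, tag) = {
    customers: a map (i) -> { (tag): i }
}
---
toXmlReady(payload, "customer")
```

Wrapping the key in parentheses, as in `(tag)`, tells DataWeave to evaluate the expression and use its result as the key name instead of treating `tag` as a literal key.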


Now each item is tagged with an appropriate key, but the root key is still established with a static value (line 5). We need to reflect the "tag" value passed to our function. 


We can use a function from the DataWeave "Strings" module to give us a plural version of the input tag. Take a look at line 5 where the "pluralize" function transforms our tag, "customer" into "customers." Now our function is fully generalized. If we pass any Array of Object, it will give us the variation of that data seen on the right.
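Here is a sketch of how that might look, assuming the same hypothetical `toXmlReady` function with parameters `a` and `tag`. The `pluralize` function is imported from the `dw::core::Strings` module.

```dataweave
%dw 2.0
output application/json
import pluralize from dw::core::Strings
fun toXmlReady(a, tag) = {
    (pluralize(tag)): a map (i) -> { (tag): i }
}
---
toXmlReady(payload, "customer")
```

The root key is now a dynamic key as well: `(pluralize(tag))` is evaluated first, and its result becomes the key name.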

Unfortunately, there is still an Array here. So our final step will be to convert what is an array of objects (each of which contains just one key/value pair, i.e., a KVP) into a single stream of KVPs. This is the structure we need to transform:


I need you to take my word for the next step. It often takes me a half hour in the classroom to explain why this works. (If you come to my DataWeave class offered at Mulesoft Training, I'll happily unravel this mysterious alchemy for you fully.)

Take a look at this:


The final step here is for us to enclose the map() lambda in the () construct embedded within the {} construct. The former evaluates its content (in this case, our Array of Object) and returns a simple set of KVPs. The latter construct then assembles the set into an Object.
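Put together, a sketch of the finished function might look like this. As before, the name `toXmlReady` and the parameter names are placeholders of mine, not necessarily those used in the original script.

```dataweave
%dw 2.0
output application/json
import pluralize from dw::core::Strings
fun toXmlReady(a, tag) = {
    (pluralize(tag)): {(
        a map (i) -> { (tag): i }
    )}
}
---
toXmlReady(payload, "customer")
```

The inner `( ... )` holds the Array of single-key Objects produced by map(), and the surrounding `{ ... }` collects their key/value pairs into one Object, so the `customer` key simply repeats once per item.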

The outcome is a set of data that can be represented in XML.


Our function can now be passed a different set of data, and a tag that is appropriate for the elements, and we can get an XML ready version of that data.
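To illustrate, here is a sketch that asks for XML output and passes a different tag. It assumes the payload is an Array of order Objects, which is a hypothetical data set, and it reuses the placeholder function name `toXmlReady`.

```dataweave
%dw 2.0
output application/xml
import pluralize from dw::core::Strings
fun toXmlReady(a, tag) = {
    (pluralize(tag)): {(
        a map (i) -> { (tag): i }
    )}
}
---
toXmlReady(payload, "order")
```

Only two things change from the customer version: the output directive and the tag passed by the caller. The root element should come out as `orders`, with one `order` element per item in the payload.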


This exploration leaves a lot to be explained. If you would like to explore further, you can take an example of this project to the DataWeave Playground and explore it for yourself. Look here for a GitHub repo that offers data samples you can try.

Of course, the best way to learn effective DataWeave practices is to attend the MuleSoft course for developers. But don't wait for that; you can get started harnessing the power of DataWeave today!

---

[1] You may have noticed that I capitalize Array and Object. These are names of known data types in DataWeave. Other types include String, Date, Number, and Null.


To learn more about DataWeave, check out the DataWeave Tutorial on the DataWeave Playground (find the button at the upper right-hand side of the screen). The MuleSoft Blog also provides a number of HowTo articles that may be helpful to you. The best way, of course, is to visit the MuleSoft Training website to discover all your options.

Vincent Lowe is a Senior Technical Instructor for MuleSoft. He has trained developers in C, Java, Perl, Python, JavaScript, and DataWeave. The views expressed here are his, and not necessarily those of MuleSoft.
