Monday, June 20, 2022

Spin Up Sample Data With DataWeave

[This article is primarily intended for DataWeave developers. It may be considered a foreword to an article for the MuleSoft Developers blog that demonstrates how this problem can be solved. Readers are invited to post comments to suggest the approach they believe in.]


Hopefully you don't commonly face this problem on your development team, but I see it often enough in the field that I know some folks could use a handy answer. Something that often hamstrings an API development effort is the absence of useful sample data for the project. If you use specification-driven development the way MuleSoft suggests, then there may well be a suitable sample in the spec to get your project into coding.

Occasionally, however, the samples we find in the spec are not enough either, and it falls to a developer to generate a sample of suitable size or variation. Using DataWeave, this is something you can do in a few minutes. I'm not going to show how in this article. Instead, I'm going to show you a framework and have you show ME how. Then I will publish an article in the MuleSoft Developers blog showing my solution, and perhaps yours, if you suggest something magical.

One thing that puzzled me early on as I began to learn DataWeave was the presence of the random() and the randomInt() functions. It didn't seem like something a DataWeave programmer would need.

For our use case, however, randomInt() can be very helpful.

The function is given a limit L and yields a random integer from 0 up to, but not including, L.
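Because the result never reaches L, it can serve as a zero-based index into an array of length L. Here is a minimal sketch of that idea, offered as a hint and not as a solution to the challenge. The colors array is hypothetical and exists only for illustration.

%dw 2.0
output application/json
// Hypothetical array, used only for illustration
var colors = ["red", "green", "blue"]
---
// randomInt(sizeOf(colors)) yields 0, 1, or 2, so the index is always in range
colors[randomInt(sizeOf(colors))]

Each run should return one of the three strings, chosen at random.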

Here is some code that lets us observe the function in action.

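%dw 2.0
output application/json
---
// Call randomInt(18) once for each of the 40 items in the range
(1 to 40) map randomInt(18)
    distinctBy $   // drop duplicate values
    orderBy $      // sort ascending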

We feed a Range to map(), which forces 40 calls to the function. We then remove duplicates with distinctBy() and sort the results with orderBy().

Take a look at how it plays out in the DataWeave Playground.

[Screenshot: code sample using randomInt()]

So to get our desired sample, we could begin with the canonical model for a sample record, and cobble up some possible random values to populate the records.

Here's what the starting point might look like:

%dw 2.0
output application/json

/*
* Create a function that accepts a parameter N.
* It should create N records with elements chosen
* randomly from the arrays below.
*/
var sampleRecord = {
"name": "General Robotics",
"account_id": "1001699305",
"created": "2022-04-10 01:47:53",
"city": "Farmington",
"state": "IL",
"postal": "79068"
}

var companies = ["Giant","Greed","Value","Pros","Family","General","Empire"]
var industry = ["Hardware","Media","Foods","Medical","Automotive","Sports Wear"]
var cities = ["Austin","Boston","Detroit","Chicago",
        "Phoenix","Dublin","Paris","Dimebox"]
var states = ["TX","IL","MI","AZ","TN","WI","CA","RI"]

---
sampleRecord

The invitation implies that a simple function could be used to create a record set of any size requested. Don't silver plate it, just supply the leanest generalized function to create the requested records. How long does it take to write such a function?

If you do feel compelled to keep hammering, then think about this.

A "pure function" would not reference the "seed arrays" that provide the atomic details for the sample. So how could you reorganize the seed data so that it could be passed into the function? Would you also pass in a canonical record with a specification for the sample record's details? What would that look like?

Drop comments if you have a bright idea to share.

To learn more about DataWeave, check out the DataWeave Tutorial on the DataWeave Playground (find the button at the upper right-hand side of the screen). The MuleSoft Blog also provides a number of HowTo articles that may be helpful to you. The best way, of course, is to visit the MuleSoft Training website to discover all your options.
