[This article is primarily intended for DataWeave developers. It deals with code and development strategies.]
A while back, I wrote an article suggesting an approach to generating sample data with a simple DataWeave function. I left the conversation open-ended to give readers time to suggest their own solutions. I provided a sample (the severed head of my data) and a shell that suggested an approach to generating sample data at any scale.
In this article, I offer one possible solution. If you have not read the OP (as it were), give it a read and see if you can hack out a solution that you like. If you've already given it a try, or if you simply want to see a solution dissected, then by all means, turn the page.
The premise is that you sometimes need to synthesize data for a project, and although there are tools that can readily help you do this, sometimes the characteristics of the problem domain call for a customized solution. Here are some reasons you might turn to DataWeave to help you.
Consider the case where your Mule app will ingest a stream of objects that arrive at a variable rate. You might simulate this by feeding objects into a VM queue or a JMS queue, using a Scheduler to regulate the rate of object generation.
Or suppose your API will be presented with a collection of objects and must process that collection efficiently at scale. After assembling the processing logic, you might want to spin up a mock data source that presents a scale replica of your expected data.
So in my previous installment of this conversation, I gave you this to begin:
%dw 2.0
It actually turns out, when you begin to consider the issue seriously, that you will need several functions. The suggested function will need to synthesize "name," "city" and "state" from the values in the "seed arrays." The "account_id" and "postal" field values can be synthesized arithmetically using the randomInt() function and a little thought.
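As a rough map of where this is headed, here is a minimal, self-contained sketch of how such functions might be wired together to emit any number of records. It is not the shell from the earlier article: the seed values, the `pick()` helper and `recordCount` are hypothetical placeholders, and the timestamp pattern uses `HH` (24-hour clock).

```dataweave
%dw 2.0
output application/json

// Hypothetical seed arrays; replace them with the values from your own sample.
var names  = ["Acme", "Globex", "Initech"]
var cities = ["Austin", "Denver", "Tampa"]
var states = ["TX", "CO", "FL"]

// Hypothetical helper: return one element of the array at random.
fun pick(seeds: Array) = seeds[randomInt(sizeOf(seeds))]

// Change this number to scale the output.
var recordCount = 5
---
(1 to recordCount) map (item) -> {
    // Day, 24-hour hour, minute, second, followed by a two-digit random salt.
    account_id: (now() as String {format: "ddHHmmss"}) ++ (randomInt(100) as String {format: "00"}),
    name: pick(names),
    city: pick(cities),
    state: pick(states),
    // 0..69999 shifted up to 30000..99999, rendered as five digits.
    postal: (randomInt(70000) + 30000) as String {format: "00000"}
}
```

Because the number of records is just the upper bound of a range, the same script serves a five-record smoke test or a much larger load test.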
So, let's begin with a function that will generate the company name. A simple and casual function might look like this:
Okay, I ain't gonna lie. This code makes my teeth hurt!
For one thing, it fails at being a pure function. And secondly, it applies the most primitive of DataWeave operations, string concatenation. All too often, I find bad DataWeave transformations that contain what I call the Endless Graveyard of Concatenations. It's not the end of the world (although, I must confess that I've seldom traipsed to the end of the "Graveyard" so there may be an apocalypse along the way that I missed somewhere), but there is a better way to do this.
A simple fix would be to use string interpolation. But that solves only part of the problem. (And the way this code is written, it would still leave us with a pretty lengthy line of code for the function.)
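To make the difference concrete, here is a small comparison of the two styles. The variables `brand` and `industry` are hypothetical stand-ins for values picked from seed arrays; both fields evaluate to the same string.

```dataweave
%dw 2.0
output application/json

// Hypothetical values standing in for randomly chosen seed entries.
var brand = "Acme"
var industry = "Widgets"
---
{
    // Concatenation: every piece, including each space, is glued on with ++
    concatenated: brand ++ " " ++ industry ++ " Inc.",
    // Interpolation: a single string literal with $( ) placeholders
    interpolated: "$(brand) $(industry) Inc."
}
```

Both fields produce `"Acme Widgets Inc."`, but the interpolated form reads like the output it produces.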
The larger problem is the pure-function thing. Now, most of the "awe" that functional programming mavens hold for pure functions is relevant mainly in the badlands of other development languages and platforms that define their own mechanisms to govern variable storage and scope.
In DataWeave, there are no "static" or "transient" or "stack perpetually ephemeral" variables. By default, all variables declared in the header of your transformation are global to that transformation, and if you use the do{} enclosure to create an "inner transformation" then you may consider variables created there as global to that enclosure.
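The scoping rule described above can be seen in a few lines. The names `suffix` and `prefix` are illustrative only.

```dataweave
%dw 2.0
output application/json

// Declared in the header: visible everywhere in this transformation.
var suffix = "Inc."
---
{
    fromHeader: suffix,
    fromEnclosure: do {
        // Declared inside do{}: visible only within this enclosure.
        var prefix = "Acme"
        ---
        "$(prefix) $(suffix)"
    }
    // Referring to prefix out here, outside the do{}, would not compile.
}
```

The enclosure can read the header variable `suffix`, but nothing outside the `do{}` can see `prefix`.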
Our createCompany() function matures a lot when we use this:
The body of this function is now easy to read and to maintain. The "seed array" is localized to this function because it is not needed elsewhere, and this is now a pure function because it does not depend upon external references to data. (Although much of the value of being a pure function does not apply in DataWeave, this self-containment is still a big deal. The emphasis on generalizing functions and reusing working logic has us always asking ourselves how overspecialized our functions might be.)
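A minimal sketch of a self-contained `createCompany()` in this spirit might look like the following. It assumes a name is assembled from two hypothetical seed arrays (`prefixes` and `suffixes`, with made-up values) and uses interpolation in place of `++`.

```dataweave
%dw 2.0
output application/json

fun createCompany(): String = do {
    // Seed arrays live inside the function; nothing outside it is referenced.
    var prefixes = ["Acme", "Globex", "Initech", "Umbrella"]
    var suffixes = ["Inc.", "LLC", "Corp.", "Holdings"]
    // Pick one entry from each array at random.
    var prefix = prefixes[randomInt(sizeOf(prefixes))]
    var suffix = suffixes[randomInt(sizeOf(suffixes))]
    ---
    "$(prefix) $(suffix)"
}
---
createCompany()
```

Pulling the picks into named variables keeps the interpolated string short, which addresses the "lengthy line" concern as well as the external-reference one.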
Here's how some of the other utility functions will look then:
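A sketch of what the city and state functions could look like, following the same pattern. The names `createCity` and `createState` and their seed values are hypothetical.

```dataweave
%dw 2.0
output application/json

// Hypothetical seed values; substitute the seed arrays from your own sample.
fun createCity(): String = do {
    var cities = ["Austin", "Denver", "Tampa", "Portland"]
    ---
    cities[randomInt(sizeOf(cities))]
}

fun createState(): String = do {
    var states = ["TX", "CO", "FL", "OR"]
    ---
    states[randomInt(sizeOf(states))]
}
---
{ city: createCity(), state: createState() }
```

Note that city and state are picked independently here, so a record may pair a city with the "wrong" state. For most load-testing purposes that is harmless.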
I will not do it here, but the similarity in these functions suggests that they could be collapsed into one if we are willing to pass the "seed array" as a parameter. The localization of the seed arrays would not be necessary in such a case.
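For readers who want to try that consolidation, a minimal sketch follows. `pickOne` is a hypothetical name, not a DataWeave core function, and the seed arrays are illustrative.

```dataweave
%dw 2.0
output application/json

// One generic picker replaces the per-field functions; the seeds are a parameter.
fun pickOne(seeds: Array) = seeds[randomInt(sizeOf(seeds))]
---
{
    city: pickOne(["Austin", "Denver", "Tampa"]),
    state: pickOne(["TX", "CO", "FL"])
}
```

The function stays pure in the sense used above, since everything it needs arrives through its parameter. Passing an empty array would yield `null`, so callers should supply at least one seed value.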
To get the "account_id" and the "postal" code, we require a pair of functions that simply construct a value suitable for our output record.
To get a postal code (presuming the simple five-digit US pattern), we get a number between 0 and 69,999 and add 30,000 to it, so that our final range of values is between 30,000 and 99,999.
(randomInt(70000) + 30000)
Then we convert it to a String using the pattern "00000", which renders the number with at least five digits, zero-padded if necessary. That's a perfectly apt format for our simulated postal code.
as String {format: "00000"}
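Put together, the two steps above fit in a one-line function. The name `createPostal` is a hypothetical choice for this sketch.

```dataweave
%dw 2.0
output application/json

// randomInt(70000) yields 0..69999; adding 30000 shifts the range to 30000..99999.
// The "00000" pattern guarantees at least five digits (padding never triggers for
// this range, but the intent stays explicit if the range is ever changed).
fun createPostal(): String = (randomInt(70000) + 30000) as String {format: "00000"}
---
createPostal()
```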
For the account ID, we work a little harder. First we take a timestamp at the time of execution and format it as a string that contains the "day," "hour," "minute," and "second."
now() as String {format: "ddHHmmss"}
We then extend it (using concatenation; yes, I'm aware of the harsh things I've said in the past) with a two-digit salt.
randomInt(100) as String {format: "00"}
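Combined, the account ID logic might look like this sketch. The function name `createAccountId` is hypothetical, and the pattern uses `HH` (24-hour hour) rather than `hh` (12-hour), so that 03:00 and 15:00 on the same day format differently.

```dataweave
%dw 2.0
output application/json

// Eight digits of day-of-month, 24-hour hour, minute and second,
// followed by a two-digit random salt (00..99): ten characters in total.
fun createAccountId(): String =
    (now() as String {format: "ddHHmmss"}) ++ (randomInt(100) as String {format: "00"})
---
createAccountId()
```

One design caveat: records generated within the same second share the eight-digit timestamp, so only the salt tells them apart. With just 100 possible salts, duplicates become likely after about a dozen records in the same second, so a larger batch would need a wider salt or a counter (for example, the `map` index) if uniqueness matters.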