Generating Complex Test Data Sets

How to Acquire Representational and Coverage Complete Test Data

Brendan Lester
6 February 2023

Acquiring representational and coverage complete test data is arguably the most important part of quality assurance. Its importance cannot be overstated, and the effort needed to gather it should not be underestimated, particularly when data from Production cannot be used or when a sample set would need to be very large to avoid the risk of missing coverage. One of my more challenging tasks this year was establishing test data that would represent multiple business transactions, 700+ elements, numerous business requirements and many different data relations. For this activity the test data would be XML based and, as a side note, actual XSDs were not available for reference.

Attempting to construct the data manually while learning and interpreting the large set of requirements, and managing the exponential impact of relational data elements along the way, would have meant an ever-growing series of duplicate work, rework and continuous checking, potentially to an unsustainable point. And because the problem space would only reveal more of itself once the activity was underway, it was not possible to define ‘done’ from the beginning. That is, I was not able to predetermine how many test data files would be needed to ensure coverage.

From reading the business documents, I was fortunately able to identify some patterns of structure and reuse, and several ways in which data values could influence the values of other data. For example, if one element has a value of ‘Biscuit’, then another may only have a possible value of ‘Chocolate’, ‘Shortbread’ or ‘Ginger’. I identified these as Constraints. Not all elements would be constrained, and some could just have a list of pre-defined values.
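One way to picture a Constraint is as a lookup from the controlling element's value to the values the dependent element may take. The Java sketch below is illustrative only: the class and method names are hypothetical and are not the API of the rules engine described later in this post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names here are hypothetical, not the engine's API.
final class ConstraintLookupSketch {

    // Maps a controlling value to the values the dependent element may take.
    private static final Map<String, List<String>> ALLOWED = new LinkedHashMap<>();

    static {
        ALLOWED.put("Biscuit", Arrays.asList("Chocolate", "Shortbread", "Ginger"));
    }

    // Returns the permitted dependent values, or an empty list when no
    // constraint has been registered for the controlling value.
    static List<String> allowedFor(String controllingValue) {
        List<String> values = ALLOWED.get(controllingValue);
        return values == null ? Collections.<String>emptyList() : values;
    }

    public static void main(String[] args) {
        System.out.println(allowedFor("Biscuit"));
        System.out.println(allowedFor("Tea"));
    }
}
```

An unconstrained element would skip this lookup entirely and simply draw from its own list of pre-defined values.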

In addition, some elements and values could cause the presence or absence of other elements in the XML, and the reusable structures also faced variations relating to location context and value dependencies. While I needed to ensure coverage, the number of permutations, and with it the potential number of test cases, was growing large and fast, even on paper.

Reflecting on the patterns I had seen and hoping to avoid the manual approach, I established a Rules Engine to do the heavy lifting for me. I purposely aimed to limit the number of Rule types available, so things didn’t go from configuration to code. Interestingly, I even had to ‘learn’ how to best use what I had created.

List of Rules & Settings:

conditionalShow (on presence or specific value of another element)
conditionalShowWithin (when within a certain path – part of location context reuse)
conditionalHide (on presence or specific value of another element)
conditionalHideWithin (when within a certain path – part of location context reuse)
potentialValue (one value the element could have; there could be multiple)
constrainedPotentialValue (one value the element could have, only if another element matches a given value)
potentialValueWithin (value when within a certain path – part of the location context reuse)
potentialRange (minimum, maximum – number of instances of repeatable sections, e.g. 0..5)
potentialRangeWhenAt (minimum, maximum when at a certain path)
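To make the range settings concrete, here is a minimal Java sketch of how a potentialRange of 0..5 might expand into candidate instance counts, with a path-specific override in the spirit of potentialRangeWhenAt. The class, the method names and the example paths are assumptions for illustration; this post does not show how the engine implements these rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class, method and path names are hypothetical.
final class RangeSketch {
    private final int min;
    private final int max;
    // Path-specific bounds, loosely mirroring potentialRangeWhenAt.
    private final Map<String, int[]> boundsByPath = new HashMap<>();

    RangeSketch(int min, int max) {
        checkBounds(min, max);
        this.min = min;
        this.max = max;
    }

    // Registers different bounds for one location and returns this for chaining.
    RangeSketch whenAt(String path, int pathMin, int pathMax) {
        checkBounds(pathMin, pathMax);
        boundsByPath.put(path, new int[] {pathMin, pathMax});
        return this;
    }

    // Lists every instance count the repeatable section could take at a path.
    List<Integer> instanceCountsAt(String path) {
        int[] bounds = boundsByPath.get(path);
        int low = bounds == null ? min : bounds[0];
        int high = bounds == null ? max : bounds[1];
        List<Integer> counts = new ArrayList<>();
        for (int count = low; count <= high; count++) {
            counts.add(count);
        }
        return counts;
    }

    private static void checkBounds(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("Expected 0 <= min <= max");
        }
    }

    public static void main(String[] args) {
        RangeSketch range = new RangeSketch(0, 5).whenAt("Menu/Lunch/Food", 1, 2);
        System.out.println(range.instanceCountsAt("Menu/Dinner/Food"));
        System.out.println(range.instanceCountsAt("Menu/Lunch/Food"));
    }
}
```

Whether every count in a range deserves its own test file is a separate coverage decision; the sketch only enumerates the candidates.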

Each Data Element or Structure would have a unique Identifier. For example:

// Structure: Food contains the FoodType and FoodDetailed elements.
ID_Food(Element parent) {
    super(parent, "Food");

    addElement("FoodType");
    addElement("FoodDetailed");
}

// FoodType may take any one of three values.
ID_FoodType(Element parent) {
    super(parent, "FoodType");

    potentialValue("Apple");
    potentialValue("Biscuit");
    potentialValue("Drink");
}

// FoodDetailed values are only valid for the matching food value.
ID_FoodDetailed(Element parent) {
    super(parent, "FoodDetailed");

    constrainedPotentialValue("Food", "Apple", "Granny Smith");
    constrainedPotentialValue("Food", "Apple", "Red Delicious");

    constrainedPotentialValue("Food", "Biscuit", "Chocolate");
    constrainedPotentialValue("Food", "Biscuit", "Shortbread");
    constrainedPotentialValue("Food", "Biscuit", "Ginger");
    constrainedPotentialValue("Food", "Biscuit", "Malt");

    // FoodDetailed is left out of the XML when the food is a Drink.
    conditionalHide("Food", "Drink");
}

The rules engine would determine there were 18 permutations for the above potential values (3 FoodType values by 6 FoodDetailed values), but only 6 would be valid. And for one case, Drink, the FoodDetailed element would not be shown. For the most part I was able to continue converting the business requirements into the rules engine. There were some edge cases with non-trivial relations, and these could be handled in code facilitated by the rules engine.
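The arithmetic behind those counts can be reproduced with a short Java sketch that enumerates every pairing from the example rules and keeps only the ones whose constraint matches. It is a simplified illustration, not the engine itself: the class and method names are hypothetical, and the rules are hard-coded as a small table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a hard-coded stand-in for the example rules.
final class PermutationSketch {

    static final List<String> FOOD_TYPES = Arrays.asList("Apple", "Biscuit", "Drink");

    // Each row is {required food value, FoodDetailed value}.
    static final String[][] DETAILED_RULES = {
        {"Apple", "Granny Smith"}, {"Apple", "Red Delicious"},
        {"Biscuit", "Chocolate"}, {"Biscuit", "Shortbread"},
        {"Biscuit", "Ginger"}, {"Biscuit", "Malt"}
    };

    // Every food value paired with every detailed value, valid or not.
    static int rawPermutations() {
        return FOOD_TYPES.size() * DETAILED_RULES.length;
    }

    // Keeps only the pairings where the detailed value's constraint matches.
    static List<String> validPairs() {
        List<String> valid = new ArrayList<>();
        for (String type : FOOD_TYPES) {
            for (String[] rule : DETAILED_RULES) {
                if (rule[0].equals(type)) {
                    valid.add(type + " / " + rule[1]);
                }
            }
        }
        return valid;
    }

    // Mirrors the conditionalHide rule: no FoodDetailed element for a Drink.
    static boolean detailedHidden(String foodType) {
        return "Drink".equals(foodType);
    }

    public static void main(String[] args) {
        System.out.println("Raw permutations: " + rawPermutations());
        System.out.println("Valid pairs: " + validPairs().size());
        System.out.println("FoodDetailed hidden for Drink: " + detailedHidden("Drink"));
    }
}
```

Running it reports 18 raw permutations and 6 valid pairs, with Drink as the one value for which FoodDetailed is hidden.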

Of the 700+ elements, a large number had potential or constrained potential values that would contribute to an extremely large number of permutations. However, it would be undesirable and unnecessary to have Test Cases for all of these. The rules engine was able to report on the elements it considered to be ‘primary influences’ through their relations. I was then able to tune these out so they wouldn’t be individually catered for. I also needed to establish two Seeding elements to ‘kick off’ the influence of others, but these would not appear in the final XML output. All up, 163 test files were discovered by the engine and generated.
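This post does not describe how the engine decided which elements were ‘primary influences’. One plausible approach, sketched below in Java purely as an assumption, is to count how many relations name each element as the controlling one. The Packaging and Label elements, and all class and method names, are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: an assumed way to rank influencing elements.
final class InfluenceSketch {

    // Each relation is {dependent element, controlling element}.
    // Returns how many relations reference each controlling element.
    static Map<String, Integer> influenceCounts(String[][] relations) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] relation : relations) {
            String controlling = relation[1];
            Integer current = counts.get(controlling);
            counts.put(controlling, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] relations = {
            {"FoodDetailed", "FoodType"},
            {"Packaging", "FoodType"},   // hypothetical element
            {"Label", "Packaging"}       // hypothetical element
        };
        System.out.println(influenceCounts(relations));
    }
}
```

A report like this would make it easier to see which elements drive the most permutations before deciding how each should be treated.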

By capturing the business information into a more useful state, I could ensure all items were catered for in accordance with their needs and would be part of the final result. Changes and additions would be a quick regeneration instead of the manual activities mentioned earlier. Being in an electronic form, it was possible to add reporting elements into the workflow for checking and discussing with the business. And finally, with the information effectively codified, it would not have been a stretch to generate those XSDs that were unavailable. But that’s a job for another day.

While various test data preparation tools exist, ranging from too simple to perhaps too complex, I think the approach taken for this exercise hit the sweet spot. And with the codified content in text form rather than GUIs, I was personally very comfortable working with it. Definitely one for the toolkit!
