Python

KASHYAP KRISHNA GURUPRASAD
4,527 Points

How to generate random but controlled data sets in Python?

I am planning to create a test data set of at least 10,000 JSON documents with valid and unique key/value pairs. For each element, such as nameX:valueY, I need to create random but controlled name and value pairs and then bind them into one JSON document.

I am looking for suggestions and recommendations on the best approach to take this forward. Please share your approach so that I can consider other alternatives.

My current REST endpoint takes a JSON payload like the following: { name1:value, name2:value, name3:value, name4:value, name5:value, name6:value, name7:[ name1:value, name2:value, name3:value, name4:value, ], name7:value, name8:value }

Approach #1: Generate 10,000 random strings for each nameX:valueY pair individually (storing them in variables), then stitch them together to create 10K JSON strings.

Approach #2: Generate 10,000 random strings for each nameX:valueY pair, store them in a database, and stitch them together using SQL queries to create 10K JSON strings.

The downside of both approaches is that whenever the number of fields in the API changes, I need to modify the code or the database to match.

Ken Alger
Treehouse Teacher

There are several JSON data generators out there. You might take a look at the faker package to see if it suits your needs. Or, perhaps, using a site like mockaroo.com would work.

3 Answers

Steven Parker
229,783 Points

When you say "random but controlled", what are the criteria for "controlling" the generated values?

And what would cause a "change in the number of fields in the api"? Couldn't this number be a parameter to the generator so it could automatically adjust to accommodate?
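The idea of making the field count a parameter could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not something taken from the thread: `make_record`, `make_dataset`, the `nameN` key pattern, and the `seed` parameter are all hypothetical names chosen for the example.

```python
import json
import random
import string


def make_record(rng, n_fields, index):
    """Build one dict with keys name1..nameN (hypothetical naming scheme)."""
    record = {}
    for i in range(1, n_fields + 1):
        # Eight random lowercase letters, followed by the record index.
        # The index suffix guarantees that a given field's value never
        # repeats across records.
        prefix = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        record[f"name{i}"] = f"{prefix}-{index}"
    return record


def make_dataset(n_records, n_fields, seed=0):
    """Return n_records JSON strings; the same seed reproduces the same data."""
    rng = random.Random(seed)
    return [json.dumps(make_record(rng, n_fields, k)) for k in range(n_records)]


# If the API gains or loses a field, only n_fields changes, not the code.
payloads = make_dataset(n_records=10_000, n_fields=8, seed=42)
```

Because the generator is seeded, repeated test runs see identical inputs, which may cover part of the "controlled" requirement.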

KASHYAP KRISHNA GURUPRASAD
4,527 Points

Thank you Ken. I was aware of faker but not of mockaroo; this is an awesome site. Thank you very much for suggesting it, I shall try it as well.

KASHYAP KRISHNA GURUPRASAD
4,527 Points

Thank you Steven.

I am creating a framework that needs to either generate data internally or use data generated externally, so I am looking for the best approaches.

There are certain scenarios where nameX:value can be a random value, empty, invalid data, or the same data every time. Since the tests will be invoked repeatedly, we should be able to know and understand the sets of data being provided as inputs.

There are 5 endpoints; let's call them EP1, EP2, EP3, EP4, and EP5. What I mean by random but controlled is as follows. The data supplied to EP1, EP2, and EP3 has some dependency, i.e., say:

EP1: name1:value, name2:value ... <<fill in the rest of JSON>>

EP2: EP1{name1:value,} name2:value ... <<fill in the rest of JSON>>

In this way, certain nameX:value pairs are linked across endpoints. Hence, I want random data sets generated while keeping the links to the other endpoints. Essentially, everything mentioned above is something the framework needs to be capable of doing.
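One way to read the requirement above is a seeded generator, a per-field mode (random, empty, invalid, or fixed), and an EP2 payload that embeds values copied from the EP1 payload it depends on. The sketch below assumes exactly that reading. The mode names, `gen_value`, `build_linked_pair`, the `"!!invalid!!"` placeholder, and the payload shapes are all hypothetical, since the thread does not give the real schemas.

```python
import random
import string


def gen_value(rng, mode):
    """Return a value for one field according to its mode (hypothetical modes)."""
    if mode == "random":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    if mode == "empty":
        return ""
    if mode == "invalid":
        # Placeholder for whatever the real API treats as invalid input.
        return "!!invalid!!"
    if mode == "fixed":
        return "same-value"
    raise ValueError(f"unknown mode: {mode}")


def build_linked_pair(rng, ep1_modes, ep2_modes, linked_fields):
    """Generate an EP1 payload and an EP2 payload that reuses EP1 values."""
    ep1 = {name: gen_value(rng, mode) for name, mode in ep1_modes.items()}
    ep2 = {name: gen_value(rng, mode) for name, mode in ep2_modes.items()}
    # Copy the linked fields from EP1 so both payloads refer to the same data.
    ep2["EP1"] = {name: ep1[name] for name in linked_fields}
    return ep1, ep2


rng = random.Random(2024)  # fixed seed: every test run sees the same inputs
ep1, ep2 = build_linked_pair(
    rng,
    ep1_modes={"name1": "random", "name2": "empty"},
    ep2_modes={"name2": "random", "name3": "invalid"},
    linked_fields=["name1"],
)
```

Keeping the modes and links in plain dictionaries means a new scenario is a data change rather than a code change, though a real framework would likely need richer value types than strings.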

Steven Parker
229,783 Points

That seems very specific for a generic framework; it may take some custom programming to get something that will do what you want. But the suggested libraries can probably help in creating some of the random content.