Mike Wagner
23,559 Points
Dealing With Large-scale Permutations
I've been trying to wrap my head around dealing with permutations in Python for a little while, but I think I'm going about it all wrong. I've tried a few different approaches (itertools, custom-built factorials, things I've stumbled on during exhausting searches online) and, while I've had limited success with some of them, I seem to hit a bit of a roadblock once the permutation count climbs. So, I figured I'd ask the great people here for a bit of guidance, some tips, or any resources you guys can think of to help me along my way.
I don't have a ton of experience programming, and I'm still fairly new to Python, but I'm not afraid to get my hands dirty with some solid information that might help this out. I've read about a few things that I don't fully grasp like using a generator approach, but I wasn't able to find any implementations that were explained well enough for me to write my own. I've also read about streams and/or only generating smaller subsets of data, but I'm not sure where to begin and how I would implement it in a way that none of the data gets lost or duplicated. I think a lot of that is my frustration with this one small obstacle and the lack of current and viable information on the subject.
Essentially, what I'd like to be able to do is as follows:
- set a pool size
- set a sample size
- set a tuple count, possibly?
- return a list of tuples containing each permutation
Simplified Example:
- I have a pool of 9 unique items
- I would like samples of 4 items
- Since I don't need all the data:
- Take a sample (let's say the first 4) of the 9
- Use it as a seed to generate 4 permutations
- These can be any random permutation of 4 items out of the 9, but it would be better if they were sequential permutations based on the input (i.e. (0, 1, 2, 3) would generate [(0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3), (0, 2, 3, 1), (0, 3, 1, 2)] and so on, to however many permutations are requested)
- Return the list of 5 permutations (the seed permutation in index 0, like the above list)
- Once I'm done using the data, be able to generate more using the same function, with the last element of the previously returned list as the new seed.
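For anyone reading along, the workflow above can be roughed out with itertools alone. This is only a sketch under assumptions (the name next_batch is made up; the pool is sorted and the seed is a valid sample drawn from it), and the dropwhile rescan makes it inefficient for deep resumes, but it shows the seed-and-resume idea:

```python
from itertools import dropwhile, islice, permutations

def next_batch(pool, seed, count):
    """Return [seed, plus the next `count` permutations] in the
    lexicographic order produced by itertools.permutations.
    NOTE: dropwhile rescans from the start on every call, so this
    is a sketch of the idea, not an efficient resume."""
    perms = permutations(sorted(pool), len(seed))
    # skip ahead until we reach the seed permutation
    at_seed = dropwhile(lambda p: p != tuple(seed), perms)
    # keep the seed itself (index 0) plus `count` more
    return list(islice(at_seed, count + 1))

batch = next_batch(range(9), (0, 1, 2, 3), 4)  # seed in index 0
more = next_batch(range(9), batch[-1], 4)      # resume from the last one
```

The order differs from the example list above (itertools varies the rightmost positions first across the whole pool), but it is sequential, nothing is lost or duplicated, and no full permutation list is ever built.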
Everything else I'm doing with the data I can handle on my own (I think, haha), so I'm not really looking for anyone to write the code for me. It's a learning experience and that wouldn't be any fun. I don't really need all of the data all of the time, so if it's easier or smarter to only return say... 5 random permutations of the current sample set... I'd be okay with that, as long as the possibility for all data still exists in fair probability.
I'm really just not sure what the best approach is, so I'm looking for wisdom.
Thanks in advance.
3 Answers
james south
Front End Web Development Techdegree Graduate 33,271 Points
have you tried this?
https://en.wikipedia.org/wiki/Heap%27s_algorithm
the article has pseudo-code but it's easy to code up in python to give all perms of a list of numbers/other data.
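For reference, a direct Python transcription of the iterative pseudocode from that article might look like the following; it's written as a generator so the permutations can be consumed lazily instead of stored all at once:

```python
def heap_permutations(seq):
    """Generate all permutations of seq using the iterative form of
    Heap's algorithm (as given in the Wikipedia article)."""
    a = list(seq)
    n = len(a)
    c = [0] * n  # per-index swap counters
    yield tuple(a)
    i = 0
    while i < n:
        if c[i] < i:
            # even index: swap with position 0; odd index: swap with c[i]
            if i % 2 == 0:
                a[0], a[i] = a[i], a[0]
            else:
                a[c[i]], a[i] = a[i], a[c[i]]
            yield tuple(a)
            c[i] += 1
            i = 0
        else:
            c[i] = 0
            i += 1
```

Each permutation differs from the previous one by a single swap, which is why the algorithm is fast; note it always permutes the whole list (no sample-size argument).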
james south
Front End Web Development Techdegree Graduate 33,271 Points
post your code and i'll take a look if you want. the algo definitely works, i used it on a problem asking for the one millionth perm of a list of 10 items.
Mike Wagner
23,559 Points
james south - I'm not entirely concerned with my flawed implementation of that article's algorithm, since it does seem to generate the expected count of permutations at lower levels. Even if it (my version) is flawed and generates duplicates, that's something I could hash out later down the line. My main problem with it is that it performs much like Python's itertools, and like several other attempts I've made at dealing with large-scale permutations: once the permutation count reaches something around 350k+, every method I've tried winds up hanging or crashing. I believe this is probably running out of memory or something, as I've read around online about many itertools implementations that have similar behavior, but I've yet to find a suitable solution. That's when I started to read up on generators and data streams but, as I said before, I haven't found a decent supply of understandable information on the topics to actually implement the principles myself yet.
To be perfectly honest, I'm not even sure what the best approach for dealing with large scale permutations would be. I assume that since I don't need all the data at once, it would be in the interest of performance to generate only a small series of permutations as I described in my original post (supplying a starting point and generating a set number of permutations), work with those permutations and then use the last permutation to seed the next series. To me, this sounds like something that could be done with a generator fairly easily, but I'm struggling with my understanding of them. I'm also not sure how I would implement any permutation solution (be it itertools, the one from your link, or any of the other solutions I've found that work in small scale) in a way that would work with a generator, or that would only return a smaller sample. Most of the permutation solutions I've found operate on a 1:1 basis where the permutations you get out are the same length as the iterable that's fed in. I'd like to be able to choose sample sizes.
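For what it's worth on the sample-size point: itertools.permutations does take an optional second argument r for the sample length, and since it returns a lazy iterator, it can be consumed in slices so only a handful of tuples ever exist at once. A minimal sketch:

```python
from itertools import islice, permutations

pool = range(9)
gen = permutations(pool, 4)        # 4-item samples from 9; lazy, nothing built yet

first_five = list(islice(gen, 5))  # pull just 5 permutations
next_five = list(islice(gen, 5))   # the iterator remembers its position
```

The iterator itself is the "seed": each islice call picks up exactly where the previous one stopped, with no duplication and no lost data.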
I think I might just try to find an alternative to permutations, I dunno.
james south
Front End Web Development Techdegree Graduate 33,271 Points
how big are you trying to go? my laptop (not especially powerful) takes about 12 seconds to fill a list with all perms of a list of 10 items, which is 3.6 million perms. 9 items (362K perms) just takes a second.
Mike Wagner
23,559 Points
That's rather interesting to me, to be honest. Any online repl I've used, my server, and my personal laptop all refuse to go above a full 9-item permutation using a single-line itertools.permutations() implementation. If I implement my own algorithms it takes significantly longer but, again, I can't pass anything larger than a 9-item permutation on any of the above devices. I sort of expect that from an online repl, but I must have something set up wrong on my personal devices that's preventing me from going higher. Perhaps there is a safety limit in Python installs that I am unaware of?
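One thing worth checking before suspecting the Python install: itertools.permutations itself is lazy and costs almost nothing in memory; it's wrapping it in list() that materializes every tuple at once, and that step is the usual culprit for hangs and crashes. A quick sanity check (the sizes printed are machine-dependent, so no exact numbers claimed):

```python
import sys
from itertools import permutations

gen = permutations(range(10))  # 3,628,800 perms, but none computed yet
print(sys.getsizeof(gen))      # the iterator object itself is tiny

# iterating one at a time uses constant memory; only list(gen) blows up
count = sum(1 for _ in permutations(range(8)))
print(count)  # 8! = 40320
```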
Mike Wagner
23,559 Points
james south - I checked it out a bit, and it doesn't appear that I can get it to work the way it should. For example, my adapted Pythonic version of it, given an input list of length 7 and passing 7 as the integer argument, returns a list with the expected length (5040) when finished running, but it appears to have duplicates (1148 of them) when running this against the list, giving only 3892 unique permutations:
dups = set(tuple(x) for x in PERMUTATIONS if PERMUTATIONS.count(x)>1)
And if I instead use a check before appending to the list of permutations, I am left with a list that is 1452 unique permutations.
if tuple(A) not in PERMUTATIONS:
I'm pretty sure I "converted" it over properly, but I could be mistaken, since I wasn't able to decipher parts of it and faked my way through it. I probably botched it somehow in a way I can't sort out yet, or the formula on Wikipedia is flawed but, either way, it doesn't tackle the bigger problem, which is dealing with larger sets of permutations. Even with my flawed interpretation of the formula, once the permutation count gets too high, it will crash. I'm guessing it has something to do with running out of memory, but I'm no expert (obviously).
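As an aside, PERMUTATIONS.count(x) inside that comprehension makes the duplicate check itself quadratic, which at 5040 items can look like a hang on its own. A single-pass check with collections.Counter is much cheaper; here itertools output stands in for your list, since I don't have your port in front of me:

```python
from collections import Counter
from itertools import permutations

# hypothetical stand-in for the list the Heap's-algorithm port produced
PERMUTATIONS = [tuple(p) for p in permutations(range(7))]

counts = Counter(PERMUTATIONS)                  # one pass instead of O(n^2)
dups = {p for p, n in counts.items() if n > 1}  # every value seen more than once
print(len(PERMUTATIONS), len(counts), len(dups))
```

Running the same Counter check against your own list should report the duplicates quickly, which would at least tell you whether the swaps in the port are off or the counting was.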