list(group) inside for loop

In the introduction to data visualization, Ken uses this code for the sepal data:

for species, group in groupby(irises, lambda i: i[4]):
    categorized_irises = list(group)
    sepal_lengths = [float(iris[0]) for iris in categorized_irises]

but does this for the petal data:

petal_lengths = []
for species, group in groupby(irises, lambda i: i[4]):
    petal_lengths.append([float(petal[2]) for petal in group])

I tried both ways and saw no difference in the result, so why use list(group) in the first one?

Thanks!

3 Answers

I also see no reason to convert group to a list. list() accepts any iterable, so as far as I know you can already iterate through group directly. The only time the conversion comes in handy is when you have a tuple and want to change it (tuples are immutable).

Boban Talevski
24,793 Points

I was wondering the same thing, since groupby is new to me, but I tried not converting group to a list in the first example and it didn't work. As Ken says, group is a generator (more precisely, an iterator), so you can go over it only once. In that example we need to go over it twice: once for sepal_lengths and once for sepal_widths.

from itertools import groupby
import matplotlib.pyplot as plt

# irises and colors are assumed to be defined earlier, as in the course
for species, group in groupby(irises, lambda i: i[4]):
    categorized_irises = list(group)
    sepal_lengths = [float(iris[0]) for iris in categorized_irises]
    sepal_widths = [float(iris[1]) for iris in categorized_irises]  # without this line, we don't need the list conversion
    plt.scatter(sepal_lengths, sepal_widths, s=10, c=colors[species], label=species)

Or am I getting this wrong? You both seem to agree that the list conversion isn't needed. My understanding is that we do need it in this particular example, because we go over the group iterator twice.
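The two-pass point can be checked with a small experiment. This is only a sketch: the three rows below are made-up stand-ins shaped like the course's iris rows, and the dictionary names are hypothetical.

```python
from itertools import groupby

# Hypothetical stand-in rows shaped like the course data:
# [sepal_length, sepal_width, petal_length, petal_width, species]
irises = [
    ["5.1", "3.5", "1.4", "0.2", "setosa"],
    ["4.9", "3.0", "1.4", "0.2", "setosa"],
    ["7.0", "3.2", "4.7", "1.4", "versicolor"],
]

# Without list(): the second comprehension sees an exhausted iterator.
without_list = {}
for species, group in groupby(irises, lambda i: i[4]):
    lengths = [float(iris[0]) for iris in group]  # first pass consumes group
    widths = [float(iris[1]) for iris in group]   # second pass finds nothing
    without_list[species] = (lengths, widths)

# With list(): the rows are stored once and can be reused freely.
with_list = {}
for species, group in groupby(irises, lambda i: i[4]):
    categorized_irises = list(group)
    lengths = [float(iris[0]) for iris in categorized_irises]
    widths = [float(iris[1]) for iris in categorized_irises]
    with_list[species] = (lengths, widths)

print(without_list["setosa"])  # ([5.1, 4.9], [])
print(with_list["setosa"])     # ([5.1, 4.9], [3.5, 3.0])
```

The petal example from the question makes only one pass over each group, which is why it works without the conversion.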

If you print out group, you get

group: <itertools._grouper object at 0x11a714cc0>

It is essentially an iterator, much like a generator: it hands you one item at a time instead of holding them all, which keeps memory use low. If you expand a whole group with list(group), you get a list of 50 items, and all 50 are held in memory at once.
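Here is a small sketch of that one-item-at-a-time behaviour, using made-up (key, value) pairs rather than the iris data. All variable names are hypothetical.

```python
from collections.abc import Iterator
from itertools import groupby

rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]  # hypothetical sample data

groups = groupby(rows, lambda r: r[0])
key, group = next(groups)  # take just the first (key, group) pair

is_iterator = isinstance(group, Iterator)  # True: it supports next()
has_len = hasattr(group, "__len__")        # False: it has no stored length

first = next(group)      # ("a", 1), produced on request
remaining = list(group)  # [("a", 2), ("a", 3)], the rest pulled into memory
leftover = list(group)   # [], nothing is left after the first full pass

print(key, first, remaining, leftover)
```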

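Suppose you want to iterate over 1 million items. You don't want to build a list of them first, because a list holds all 1 million items in memory at once. For example, you would use range(1000000) rather than the literal list [0, 1, 2, 3, 4, 5, ..., 999998, 999999]. range() is lazy: it computes each value on demand instead of storing them all. Strictly speaking it is a lazy sequence rather than a generator, but the memory benefit is the same.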
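A rough way to see the difference is sys.getsizeof. Treat the numbers as indicative only: exact sizes vary by Python version and platform, and for a list getsizeof counts the list's own storage, not the integer objects it points to.

```python
import sys

n = 1_000_000
lazy = range(n)         # stores only start, stop and step
eager = list(range(n))  # stores a reference to every one of the n items

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)
print(lazy_size)   # a few dozen bytes, regardless of n
print(eager_size)  # several megabytes for n = 1,000,000

# Unlike a groupby group, a range can be traversed more than once.
first_pass = sum(1 for _ in lazy)
second_pass = sum(1 for _ in lazy)
print(first_pass, second_pass)  # 1000000 1000000
```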