Python Introduction to Data Visualization with Matplotlib Chart Toppers Scatter Plot

ValueError: could not convert string to float: 'sepal_length'

Hi. Ken's code executes perfectly, while my code returns this error:

from itertools import groupby

import csv
import matplotlib.pyplot as plt

input_file = "data/iris.csv"

with open(input_file, 'r') as iris_data:
    irises = list(csv.reader(iris_data))

colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}
irises.pop()  # because the list includes an extra unneeded item

for species, group in groupby(irises, lambda i: i[4]):
    import pdb; pdb.set_trace()
    categorized_irises = list(group)
    sepal_lengths = [float(iris[0]) for iris in categorized_irises]
    sepal_widths = [float(iris[1]) for iris in categorized_irises]
    plt.scatter(sepal_lengths, sepal_widths, s=10, c=colors[species], label=species)  # marker size of 10,
-------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-106afecb7f6d> in <module>()
     16 
     17     categorized_irises = list(group)
---> 18     sepal_lengths = [float(iris[0]) for iris in categorized_irises]
     19     sepal_widths = [float(iris[1]) for iris in categorized_irises]
     20     plt.scatter(sepal_lengths,sepal_widths,s=10,c=colors[species],label=species)

ValueError: could not convert string to float: 'sepal_length'

For reference, there's a similar thread, but unfortunately the responses there did not solve my error.

Thanks in advance to anyone who can help!

Cheo R
37,150 Points

Which error are you getting?

Hi Cheo, I've updated the post with the error I'm seeing.

Cameron Stewart
18,041 Points

Try removing the last row of the data set AND the first (header) row:

irises.pop()   # last row
irises.pop(0)  # first row
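A slightly more defensive variant of the same idea, as a minimal sketch: instead of popping rows by position, skip the header with next() and drop any empty rows while reading. The sample text below is made up to stand in for iris.csv (header, two data rows, trailing blank line); it assumes the real file has the same shape.

```python
import csv
import io

# Hypothetical stand-in for data/iris.csv: header row, data rows, trailing blank line.
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "\n"
)

with io.StringIO(sample) as iris_data:
    reader = csv.reader(iris_data)
    header = next(reader)                    # consume the header row
    irises = [row for row in reader if row]  # csv.reader yields [] for blank lines

print(header[0])
print(len(irises))
```

With a real file, the io.StringIO(sample) call would be replaced by open(input_file, 'r'). This version also behaves the same whether or not the file ends with a blank line.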

4 Answers

ewelina krawczak
5,707 Points

Hi again!

5.1,3.5,1.4,0.2,Iris-setosa

4.9,3.0,1.4,0.2,Iris-setosa

4.7,3.2,1.3,0.2,Iris-setosa

4.6,3.1,1.5,0.2,Iris-setosa

Those are the first four lines of my CSV file, with a comma as the column separator. "Iris-setosa" has index 4, so it is in the fifth column of the CSV. Does your file look the same?

I would recommend doing the following:

Just after

with open(input_file, 'r') as iris_data:
    irises = list(csv.reader(iris_data))

I would check what irises contains, row by row:

for i in irises:
    print(i)
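For a file with many rows, a summary may be easier to compare than a full printout. The sketch below counts how many fields each row has and lists the rows whose first field is not a number. The helper name describe_rows and the sample rows are made up for illustration; in practice, rows would be the irises list.

```python
from collections import Counter


def describe_rows(rows):
    """Return (tally of fields per row, indexes of rows whose first field is not a float)."""
    field_counts = Counter(len(row) for row in rows)
    non_numeric = []
    for index, row in enumerate(rows):
        try:
            float(row[0])
        except (ValueError, IndexError):
            # ValueError: text such as a header; IndexError: an empty row
            non_numeric.append(index)
    return field_counts, non_numeric


# Hypothetical rows: a header, two data rows, and a trailing empty row.
rows = [
    ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
    [],
]
field_counts, non_numeric = describe_rows(rows)
print(field_counts)
print(non_numeric)
```

Any index reported in non_numeric is a row that would make float(iris[0]) fail inside the plotting loop.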

Yes, mine matches your results. Thanks anyway for your help; I gladly appreciate it.

Hi Ewelina, my problem is that the header is caught in that loop's first execution:

>>> irises[0]
['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

sepal_lengths = [float(iris[0]) for iris in categorized_irises]

ValueError: could not convert string to float: 'sepal_length'

I'm wondering why in the instructor's code, the groupby function safely removes this header row from the loop's execution, but my code tries to treat that header row as data.

I tried removing the header row, but of course, it's needed the way Ken writes the loop. Wow, if anyone knows another workaround, I'm completely stumped!
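For what it's worth, itertools.groupby does not drop any rows: it starts a new group each time the key changes, so a header row becomes its own one-row group with the key 'species'. If the instructor's copy of the file has no header row, that alone could explain the difference, though that is an assumption about his file. A minimal sketch with made-up rows:

```python
from itertools import groupby

# Hypothetical rows shaped like the thread's data, header included.
rows = [
    ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
]

# Pair each group key with the number of rows in that group.
groups = [(key, len(list(group))) for key, group in groupby(rows, lambda r: r[4])]
print(groups)
# The header shows up as a group keyed "species"; calling float() on its
# first field is what raises the ValueError.

# Skipping the first row leaves only the real species keys.
data_groups = [key for key, _ in groupby(rows[1:], lambda r: r[4])]
print(data_groups)
```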

ewelina krawczak
5,707 Points

Have you checked the CSV file? Its structure? Does it have an unnecessary last field? Maybe the separator is different? Have you tried looping through the irises list to check that it runs without a problem and returns all the lines with the data in the correct order?

Thanks for your questions, ewelina krawczak. Checking the csv is a fine idea... but I need a benchmark against which I can check it. I'm not quite sure if mine has the "last unnecessary field" -- if it doesn't have it, I won't know what it looks like. Same with the separator: if your separator gets the code to work well, then I'd love to see what your separator is. Otherwise I won't know what "different" would look like.

I found the Iris data on GitHub. Someone else asked in a different thread where to get it, too.

Here's some code from scikit-learn's website for obtaining the iris data (I looked at it, but I don't believe I used it):

from sklearn import datasets
iris = datasets.load_iris()

Mustafa Başaran
28,046 Points

Hi Mark,

I have seen your other thread as well. The code below works fine in my local environment (Jupyter Notebook on Anaconda 1.8.7). The input_file variable will differ, of course, depending on where you store the iris.csv file.

import csv
import matplotlib.pyplot as plt
from itertools import groupby 

input_file = "/Users/mustafabasaran/Desktop/iris.csv"

with open(input_file, 'r') as iris_data:
    irises = list(csv.reader(iris_data))

colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}


irises.pop()

for species, group in groupby(irises, lambda i:i[4]):

    categorized_irises = list(group)
    sepal_lengths = [float(iris[0]) for iris in categorized_irises]
    sepal_widths = [float(iris[1]) for iris in categorized_irises]
    plt.scatter(sepal_lengths,sepal_widths,s=10,c=colors[species],label=species)

plt.title("Iris Data Set", fontsize=12)
plt.xlabel("sepal length (cm)",fontsize=10)
plt.ylabel("sepal width (cm)",fontsize=10)
plt.legend(loc="upper right")
plt.show()

I hope this helps.

Thanks Mustafa... your code was functionally identical to mine, so the error persists, unfortunately :|

ewelina krawczak
5,707 Points

"I tried removing the header row, but of course, it's needed the way Ken writes the loop"

Why do you think the header row is needed?
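One way to see that the loop does not depend on the header row: csv.DictReader consumes the header itself and uses it only for field names, so every row handed to groupby is data. This is a minimal sketch of an alternative, not the instructor's approach; it assumes the file has the header shown earlier in the thread, and the sample text stands in for the real file.

```python
import csv
import io
from itertools import groupby

# Hypothetical stand-in for iris.csv, header included.
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

with io.StringIO(sample) as iris_data:
    # The first line becomes the field names; it is never returned as a row.
    irises = list(csv.DictReader(iris_data))

sepal_lengths_by_species = {}
for species, group in groupby(irises, lambda iris: iris["species"]):
    sepal_lengths_by_species[species] = [float(iris["sepal_length"]) for iris in group]

print(sepal_lengths_by_species)
```

Inside the loop, the per-species lists could be passed to plt.scatter exactly as in the code posted above.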