
Phil Spelman
6,664 Points

Just wondering if there's a reason to use a for loop here; I thought soup.find(...) would only get the first result

I wanted to know if there's a specific reason for using a for loop in the example: for button in soup.find(attrs={'class': 'button button--primary'}): print(button)

My understanding was that soup.find(), unlike soup.find_all(), returns only a single result.

Gari Merrifield
9,549 Points

It does seem like overkill, but 'for' will work on a single-item array just as well as on a multiple-item array.

It would also make things easier if you later switched to '.find_all()': you wouldn't have to rewrite any code, just add the "_all"...

My two cents worth...

2 Answers

Jason Anders
Treehouse Moderator 144,797 Points

Hey Phil,

That's a very good question! I don't understand why a loop was used either. Tagging Ken Alger for further clarification.

:)

Alex Koumparos
Python Web Development Treehouse Moderator 33,475 Points

I think this is a bug in Ken's script, not mere overkill: the behaviour is significantly different. Gari is not quite right about the return value of find(). It doesn't return a single-item list; it returns the single item itself. As the Beautiful Soup documentation puts it:

"The only difference is that find_all() returns a list containing the single result, and find() just returns the result."

This means that Ken's for loop is not iterating (once) through a single-item list. It is iterating several times, once for each direct child of that one result (including whitespace-only text between tags), and printing the string representation of each child. With find_all(), it would iterate once through a single-item list, printing the string representation of the single element with the class "button button--primary".

Compare these two outputs:

>>> from bs4 import BeautifulSoup
>>> # create a simple demonstration HTML snippet
>>> html = """<div class="my_class">
...  <h1>A heading</h1>
...  <p>A paragraph</p>
...  </div>"""
>>> soup = BeautifulSoup(html, 'html.parser')

>>> for elem in soup.find(class_="my_class"):
...     print("before elem")
...     print(elem)
...     print("after elem")
before elem


after elem
before elem
<h1>A heading</h1>
after elem
before elem


after elem
before elem
<p>A paragraph</p>
after elem
before elem


after elem

>>> for elem in soup.find_all(class_="my_class", limit=1):
...     print("before elem")
...     print(elem)
...     print("after elem")
before elem
<div class="my_class">
<h1>A heading</h1>
<p>A paragraph</p>
</div>
after elem
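
Applied to the loop from the question, a minimal sketch of the difference might look like the following. The HTML snippet and the /signup link are made up for illustration and are not the page from the course; only find(), find_all(), and the attrs filter come from the thread.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup standing in for the course page (an assumption).
html = """<div>
<a class="button button--primary" href="/signup">Sign <b>up</b> now</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")

button_attrs = {"class": "button button--primary"}

# find() returns a single Tag (or None if nothing matches), not a list.
button = soup.find(attrs=button_attrs)
is_tag = isinstance(button, Tag)

# Iterating over a Tag walks its direct children, which is what the
# loop in the question ends up doing.
children = [str(child) for child in button]
print(children)  # ['Sign ', '<b>up</b>', ' now']

# find_all() returns a list-like collection of the matching tags, so
# this loop runs once per match and prints each whole element.
matches = soup.find_all(attrs=button_attrs)
for match in matches:
    print(match)

# With no match, find() gives None (looping over it would raise a
# TypeError), while find_all() gives an empty collection.
missing = soup.find(class_="no-such-class")
none_found = soup.find_all(class_="no-such-class")
```

If only the first match is wanted, one option is to skip the loop entirely and call print(button) after checking that button is not None.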

Hope that is clear.

Cheers,

Alex