Python

Neil Anuskiewicz
11,007 Points

How should I process this output?

I'm just starting to learn Python, currently taking the Python Basics course. Anyway, I took a break to look around at libraries and scripts to see what's out there. I've decided to automate my own search for a new place to live.

This script finds listings and prints nicely formatted HTML to STDOUT. I'd like to write that output to an HTML file so I can review it in a web browser, unless someone has a better suggestion for reviewing it. I've looked around, and the best way to get this done isn't clear to me. I'll post my code and the output.

I can imagine that after enough time working with Python someone might be inspired to try to automate any task that requires a lot of repetition. :-)

#!/usr/bin/python
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs4

url_base = 'http://eugene.craigslist.org/search/apa'
params = dict(bedrooms=2, housing_type=6)
rsp = requests.get(url_base, params=params)
print(rsp.url)
print(rsp.text[:500])

# parse text
html = bs4(rsp.text, 'html.parser')

print(html.prettify()[:1000])

dwellings = html.find_all('p', attrs={'class': 'row'})
print(len(dwellings))

this_dwelling = dwellings[15]
print(this_dwelling.prettify())

Output:

http://eugene.craigslist.org/search/apa?bedrooms=2&housing_type=6
<!DOCTYPE html>

<html class="no-js"><head>
    <title>eugene apartments / housing rentals  - craigslist</title>

    <meta name="description" content="eugene apartments / housing rentals  - craigslist">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge"/>
    <link rel="canonical" href="https://eugene.craigslist.org/search/apa">
    <link rel="alternate" type="application/rss+xml" href="https://eugene.craigslist.org/search/apa?bedrooms=2&amp;format=rss&amp;housing_type=6" title="RSS feed

<!DOCTYPE html>
<html class="no-js">
 <head>
  <title>
   eugene apartments / housing rentals  - craigslist
  </title>
  <meta content="eugene apartments / housing rentals  - craigslist" name="description">
   <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
   <link href="https://eugene.craigslist.org/search/apa" rel="canonical">
    <link href="https://eugene.craigslist.org/search/apa?bedrooms=2&amp;format=rss&amp;housing_type=6" rel="alternate" title="RSS feed for craigslist | eugene apartments / housing rentals  - craigslist " type="application/rss+xml">
     <link href="https://eugene.craigslist.org/search/apa?s=100&amp;bedrooms=2&amp;housing_type=6" rel="next">
      <meta content="width=device-width,initial-scale=1" name="viewport">
       <link href="//www.craigslist.org/styles/cl.css?v=937c1171281e0c4991306867bcb8b61c" media="all" rel="stylesheet" type="text/css">
        <link href="//www.craigslist.org/styles/search.css?v=ecc3476e253474ac90b2099cdaecdb6c" media="all"
100
<p class="row" data-pid="5551536447">
 <a class="i gallery" data-ids="1:00A0A_glbV7Uf93GN,1:00y0y_cSbrNjYIeMD,1:00U0U_dvgYMuHFuzL,1:00808_4CI6VlTRPUN,1:00z0z_2nIxZnRDbZf,1:00V0V_7OFuPhOwMqr,1:00k0k_cHtxezB2GcC,1:00h0h_eXbBMqSSpW1,1:00c0c_Y8FsLEr6RN,1:00b0b_eCu7Kfd1pFI,1:00707_5YlsKLJdMBu,1:00r0r_kKvHSPXMsNu" href="/apa/5551536447.html">
 </a>
 <span class="txt">
  <span class="pl">
   <span class="icon icon-star" role="button">
    <span class="screen-reader-text">
     <? __("favorite this post") ?>
    </span>
   </span>
   <time datetime="2016-05-13 15:56" title="Fri 13 May 03:56:11 PM">
    May 13
   </time>
   <a class="hdrlnk" data-id="5551536447" href="/apa/5551536447.html">
    <span id="titletextonly">
     3 Bdrm home with single car garage- 1287 Washington
    </span>
   </a>
  </span>
  <span class="l2">
   <span class="price">
    $1495
   </span>
   <span class="housing">
    / 3br -
   </span>
   <span class="pnr">
    <small>
     (Eugene)
    </small>
    <span class="px">
     <span class="p">
      pic
     </span>
    </span>
   </span>
  </span>
  <span class="js-only banish-unbanish">
   <span class="banish" title="hide">
    <span class="icon icon-trash" role="button">
    </span>
    <span class="screen-reader-text">
     hide this posting
    </span>
   </span>
   <span class="unbanish" title="restore">
    <span class="icon icon-trash red" role="button">
    </span>
    <span class="screen-reader-text">
     restore this posting
    </span>
   </span>
  </span>
 </span>
</p>

7 Answers

Seth Kroger
56,413 Points

There are at least two ways to accomplish this. The easiest applies if you're using a Unix-style shell (this includes Workspaces, Linux and macOS, and Unix-style shells for Windows like Git Bash, MinGW and Cmder). After a command you can redirect stdout to a file with " >file_name". This doesn't require any alteration to the Python script.

The other is to open and write to a file in the Python script using open() and print() with a file argument.

# The first argument is the file name, the second the mode; 'w' opens the file for writing.
with open("results.html", 'w') as out:  # the with block closes the file when it ends
    print(html_to_output, file=out)
Neil Anuskiewicz
11,007 Points

Seth, thanks for your solution, but I got this syntax error. Sorry for deleting and re-adding comments; I was trying to clean this stuff up a bit.

./house.py 
  File "./house.py", line 27
    print(html_to_output, file=out)
Seth Kroger
56,413 Points

Are you running version 2 or 3 of Python?
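
The traceback quoted above is cut off before the exception type, but a syntax error on `print(html_to_output, file=out)` is what Python 2 reports: there `print` is a statement and does not accept a `file=` keyword unless the script begins with `from __future__ import print_function`. A minimal sketch that sidesteps the difference, assuming the goal is just to get the text into a file, is to call the file object's `write()` method, which behaves the same on both versions. `html_to_output` is still a placeholder value here, not a name from the original script.

```python
import sys

# Placeholder text standing in for whatever HTML the script produces.
html_to_output = "<p>placeholder HTML</p>"

# A single parenthesized argument prints the same way on Python 2 and 3,
# so this line answers the "version 2 or 3?" question on either one.
print("Running Python %d.%d" % sys.version_info[:2])

# write() does not add a newline the way print does, so append one by hand.
with open("results.html", "w") as out:
    out.write(html_to_output + "\n")
```

On Python 3 the `print(html_to_output, file=out)` form works unchanged, so checking the interpreter version first is the quickest way to tell which situation applies.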

Neil Anuskiewicz
11,007 Points

By the way, I've tried to redirect to a file, but it silently fails (i.e., results.html is empty).

./house.py > results.html

I tried piping the output to tee, which does provide an error.

./house.py | tee results.html

Traceback (most recent call last):
  File "./house.py", line 10, in <module>
    print(rsp.text[:500])
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
http://eugene.craigslist.org/search/apa?bedrooms=2&housing_type=6
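
The `u'\ufeff'` in that message points at Python 2: when stdout is a pipe or a file rather than a terminal, Python 2 has no terminal encoding to use and falls back to ASCII, so printing non-ASCII text fails. One way around it, sketched below, is to skip stdout and write the text to a file with an explicit encoding. `save_text` is a hypothetical helper name, and the sample string only imitates the start of the page; in the script above the call would presumably be `save_text(rsp.text, "results.html")`.

```python
import io


def save_text(text, path, encoding="utf-8"):
    """Write unicode text to path, encoding it explicitly."""
    # io.open accepts an encoding argument on both Python 2 and Python 3.
    with io.open(path, "w", encoding=encoding) as out:
        out.write(text)


# Sample text beginning with the same u'\ufeff' character named in the traceback.
sample = u"\ufeff<!DOCTYPE html>\n<html></html>\n"
save_text(sample, "results.html")
```

UTF-8 is an assumption here; it can represent every character the page might contain, which is why it is a common default for saved HTML.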
Seth Kroger
56,413 Points

It should work. Remember to replace html_to_output with whatever you had in the original print statements.

Neil Anuskiewicz
11,007 Points

Seth, what do you mean? I see that 'html_to_output' is a placeholder, but it's not clear what I'm supposed to put there.

Neil Anuskiewicz
11,007 Points

I didn't have any other print statements; the script just has a series of print statements.

Seth Kroger
56,413 Points

What I'm suggesting with the second way is to add ", file=out" to every print statement that prints the relevant HTML.

Neil Anuskiewicz
11,007 Points

Could you show me what you mean with this little segment here?

print(rsp.url)
print(rsp.text[:500])
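
Applied to that segment, the ", file=out" suggestion might look like the sketch below. It assumes Python 3, where `print` is a function and `open()` takes an `encoding` argument. The `rsp` object here is a hypothetical stand-in with made-up text so the example runs without a network request; in the real script it would come from `requests.get`.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the requests response, so the sketch runs offline.
rsp = SimpleNamespace(
    url="http://eugene.craigslist.org/search/apa?bedrooms=2&housing_type=6",
    text="<!DOCTYPE html>\n<html><head><title>example</title></head></html>",
)

# Open the output file once, then send each print to it instead of stdout.
with open("results.html", "w", encoding="utf-8") as out:
    print(rsp.url, file=out)
    print(rsp.text[:500], file=out)
```

Any print call left without `file=out` still goes to the terminal, which is handy for progress messages that should not end up in the HTML file.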
Neil Anuskiewicz
11,007 Points

Okay, here's what ended up working:

f = open('house.html','w')
f.write(this_dwelling.prettify())
f.close()
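
The snippet above saves a single listing. A possible extension, sketched under the assumption that the goal is to review every match in a browser, is to wrap all the fragments in one minimal page. `build_page` and `save_page` are hypothetical helper names, and the two sample fragments are invented; in the original script the list would presumably be `[d.prettify() for d in dwellings]`. The `<base>` element is included because the listing links in the output are relative (`/apa/5551536447.html`) and would not resolve from a local file otherwise.

```python
import io


def build_page(fragments, base_url, title=u"Listings"):
    """Wrap HTML fragments in a minimal standalone page."""
    # Separate the listings with horizontal rules so they are easy to scan.
    body = u"\n<hr>\n".join(fragments)
    return (
        u"<!DOCTYPE html>\n"
        u"<html>\n<head>\n"
        u'<meta charset="utf-8">\n'
        u'<base href="{base}">\n'
        u"<title>{title}</title>\n"
        u"</head>\n<body>\n{body}\n</body>\n</html>\n"
    ).format(base=base_url, title=title, body=body)


def save_page(fragments, path, base_url):
    """Write the assembled page to path as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(build_page(fragments, base_url))


# Invented sample fragments standing in for the scraped listings.
fragments = [
    u'<p class="row">first listing</p>',
    u'<p class="row">second listing</p>',
]
save_page(fragments, "house.html", u"https://eugene.craigslist.org/")
```

Opening house.html in a browser then shows all the listings on one page, with their links pointing back at the site named in `base_url`.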