Mona Jalal

Problem with BeautifulSoup: "No parser was explicitly specified" warning

How should I get rid of this warning? I am new to bs4, and running my script prints:

/Users/mona/anaconda/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

This is the code:

 # imdb.py
 import pandas as pd
 from bs4 import BeautifulSoup

 # Download the data from https://www.kaggle.com/c/word2vec-nlp-tutorial/data
 # quoting=3 (csv.QUOTE_NONE) tells pandas to ignore double quotes inside the reviews
 train = pd.read_csv("data/imdb_dataset/labeledTrainData.tsv", header=0,
                     delimiter="\t", quoting=3)
 print(train.shape)
 print(train.columns.values)
 print(train["review"][0])

 # Clean the reviews, e.g. remove HTML tags, using bs4
 example1 = BeautifulSoup(train["review"][0])
 print(example1.get_text())  # the review text with the HTML tags stripped
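The warning's own advice can be applied to the cleaning step. This is a minimal sketch, assuming bs4 is installed. It names Python's built-in "html.parser" rather than "lxml". Either choice silences the warning, but "lxml" must also be installed on every machine that runs the script. The sample review string and the clean_review helper are hypothetical stand-ins for train["review"][0].

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical sample standing in for train["review"][0] from labeledTrainData.tsv.
raw_review = "A fun movie. <br /><br />The <i>acting</i> was great."


def clean_review(markup, parser="html.parser"):
    """Strip HTML tags from one review (hypothetical helper).

    Passing the parser name explicitly avoids the "No parser was explicitly
    specified" warning and gives the same parse tree on every system,
    regardless of which optional parsers (lxml, html5lib) are installed.
    """
    soup = BeautifulSoup(markup, parser)
    return soup.get_text()


# Record any warnings raised while parsing so we can confirm the parser warning is gone.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = clean_review(raw_review)

parser_warnings = [w for w in caught
                   if "No parser was explicitly specified" in str(w.message)]

print(text)
print(len(parser_warnings))
```

In imdb.py, the equivalent change would be passing the parser name as the second argument to BeautifulSoup on the cleaning line, for example "html.parser" or "lxml" as the warning suggests.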