Data Analysis Popularity

Anh Tran
669 Points

What is the most popular book of the 1960's?

I'm working on a challenge in the course Analyzing Books with Pandas. The challenge asks for code that finds the most popular book of the 1960s in the data set. Below is my code:

the_1960 = books[books['publication_date'] >= pd.Timestamp(1960,1,1)]

the_1960 = books[books['publication_date'] < pd.Timestamp(1970,1,1)]

the_1960 = the_1960[the_1960['ratings_count'] > 1000]

the_1960.sort_values(by=['average_rating'], ascending=False)

The results include books from the 1960s, but they also include books that shouldn't be there, such as ones published in 1953 and 1951. If I add [:1] to the last line, it returns a book published in 1957, which is obviously wrong. How do I fix this?

Please note that before running this, I followed the tutorial and used the line below to convert the publication_date column to datetime64[ns]:

books['publication_date'] = pd.to_datetime(books['publication_date'])
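For anyone reading along, one way to confirm the conversion worked is to check the column's dtype and try a year-based test on it. This is only a sketch on a made-up two-row frame: the `sample` DataFrame and its date strings are assumptions, not the course data set.

```python
import pandas as pd

# Hypothetical sample; the course data set's real date format may differ.
sample = pd.DataFrame({"publication_date": ["1962-01-10", "1957-06-15"]})
sample["publication_date"] = pd.to_datetime(sample["publication_date"])

# After conversion the column should report a datetime dtype.
print(sample["publication_date"].dtype)

# A datetime column exposes .dt, so the decade can be tested by year.
# between() includes both endpoints by default.
is_1960s = sample["publication_date"].dt.year.between(1960, 1969)
print(is_1960s)
```

If the dtype printed is still `object`, the conversion did not take effect and date comparisons would not behave as expected.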

1 Answer

Mel Rumsey
Treehouse Staff

Hey Anh Tran !

Take a look at the second line. It filters the original books dataframe again, so it overwrites the result of the first line with every book published before 1970, including those published before 1960. I believe what you want is to filter the the_1960 dataframe you created in the first line rather than the full books dataframe: the_1960 = the_1960[the_1960['publication_date'] < pd.Timestamp(1970,1,1)]

the_1960 = books[books['publication_date'] >= pd.Timestamp(1960,1,1)]

the_1960 = books[books['publication_date'] < pd.Timestamp(1970,1,1)]    # <--  books should be the_1960

the_1960 = the_1960[the_1960['ratings_count'] > 1000]

the_1960.sort_values(by=['average_rating'], ascending=False)
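As an alternative sketch, both date conditions can be combined into a single boolean mask, which avoids accidentally filtering the wrong dataframe in a later step. The `books` frame below is a small made-up stand-in (its titles and numbers are assumptions, not the course data), and the mask names `in_1960s` and `well_rated` are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the course's books data set.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "publication_date": pd.to_datetime(
        ["1951-03-01", "1957-06-15", "1962-01-10", "1968-09-30", "1971-02-02"]
    ),
    "ratings_count": [5000, 8000, 1500, 2500, 9000],
    "average_rating": [4.8, 4.9, 4.1, 4.4, 4.7],
})

# Both bounds go into one mask, joined with & (each side in parentheses).
in_1960s = (
    (books["publication_date"] >= pd.Timestamp(1960, 1, 1))
    & (books["publication_date"] < pd.Timestamp(1970, 1, 1))
)
well_rated = books["ratings_count"] > 1000

the_1960 = books[in_1960s & well_rated]

# sort_values returns a new DataFrame, so keep the result in a variable.
top = the_1960.sort_values(by="average_rating", ascending=False)
print(top.head(1))
```

Note that this follows the thread's approach of ranking by average_rating among books with more than 1000 ratings; whether that matches the challenge's definition of "most popular" is an assumption.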

Hope this helps! :)