

Fixing or Excluding Data: quiz question -- best option for removing all missing data from the dataset

Hi. The question is:

Consider this table of high school class information. What is the best option for removing all missing data from the dataset?

The answer is to remove the two columns, Room and Student.

I'm curious: could you help me understand why the columns would be removed, rather than the affected rows?

Thanks


Steven Parker

There are similar questions that show different example datasets, and they have different answers. The one for which this answer is correct shows a dataset where most rows are missing either the Students or the Room value, so removing the affected rows would eliminate more of the dataset than removing those two columns. And since no other columns are missing any data, dropping just those two columns removes every missing value while keeping all of the rows, which makes it the better choice.

In a different question, the sample shows fewer rows with missing data, but more columns missing in each of those rows. Dropping every column with a gap would throw away more data there, so the best choice on that question is to remove the affected rows.
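To make the trade-off concrete, here is a minimal Python sketch. It assumes two made-up class tables (neither is the actual table from the quiz), represents a missing value as `None`, and uses "number of non-missing cells kept" as a hypothetical measure of how much of the dataset survives. The helper names are illustrative only.

```python
# Sketch: compare dropping incomplete rows vs. dropping incomplete columns.
# Assumption: each row is a dict with the same keys; None marks missing data.

def drop_incomplete_rows(rows):
    """Keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_incomplete_columns(rows):
    """Remove every column that is missing a value in at least one row."""
    bad = {col for r in rows for col, v in r.items() if v is None}
    return [{c: v for c, v in r.items() if c not in bad} for r in rows]

def cell_count(rows):
    """Hypothetical size measure: total number of cells left in the table."""
    return sum(len(r) for r in rows)

def best_strategy(rows):
    """Pick whichever approach keeps more cells (ties favor dropping rows)."""
    by_rows = drop_incomplete_rows(rows)
    by_cols = drop_incomplete_columns(rows)
    if cell_count(by_cols) > cell_count(by_rows):
        return "drop columns", by_cols
    return "drop rows", by_rows

# Made-up table like the quiz case: most rows miss Room or Students.
quiz_like = [
    {"Class": "Algebra", "Teacher": "Kim", "Room": None, "Students": 24},
    {"Class": "Biology", "Teacher": "Diaz", "Room": "B12", "Students": None},
    {"Class": "Chemistry", "Teacher": "Park", "Room": None, "Students": 18},
    {"Class": "History", "Teacher": "Lee", "Room": "H3", "Students": 27},
]

# Made-up table like the other question: one row misses several columns.
other_quiz = [
    {"Class": "Algebra", "Teacher": "Kim", "Room": "A1", "Students": 24},
    {"Class": "Biology", "Teacher": None, "Room": None, "Students": None},
    {"Class": "Chemistry", "Teacher": "Park", "Room": "C2", "Students": 18},
    {"Class": "History", "Teacher": "Lee", "Room": "H3", "Students": 27},
]

print(best_strategy(quiz_like)[0])   # drop columns (8 cells kept vs. 4)
print(best_strategy(other_quiz)[0])  # drop rows (12 cells kept vs. 4)
```

In the first table, dropping rows leaves only the History row (4 cells), while dropping Room and Students keeps all four classes with their Class and Teacher values (8 cells). In the second table, the situation reverses: removing the single incomplete row keeps 12 cells, but removing every column with a gap would leave only the Class column.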