Cody Stephenson

Are these valid solutions for the individual practice tasks?

OCQ130 has valid values of (1-7, 77, 99). I wanted to keep the 77 and 99 codes, so I couldn't use a > [highest valid code] clause like in the example. Instead I used:

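# pandas 1.3+ expects a string here; the boolean form (inclusive=False) was removed in pandas 2.0
ind = ocq['OCQ130'].between(7, 77, inclusive='neither')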
ocq.loc[ind, 'OCQ130'] = np.nan
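
To see what that mask actually catches, here is a minimal sketch on a made-up column. The values in `ocq_demo` are hypothetical and not taken from the real OCQ data, and it assumes pandas 1.3 or later, where `inclusive` takes a string such as `'neither'`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: valid codes are 1-7, 77 and 99; 8, 50, 80 and 0 are not.
ocq_demo = pd.DataFrame({'OCQ130': [1.0, 7.0, 8.0, 50.0, 77.0, 80.0, 99.0, 0.0]})

# Strictly between 7 and 77, so the endpoints 7 and 77 are left alone.
ind = ocq_demo['OCQ130'].between(7, 77, inclusive='neither')
ocq_demo.loc[ind, 'OCQ130'] = np.nan

# Values that survive the cleaning step.
remaining = ocq_demo['OCQ130'].dropna().tolist()
print(remaining)
```

On this sample, 8 and 50 are blanked, but 80 and 0 would slip through because they fall outside the gap between 7 and 77. Whether that matters depends on which values actually show up in the real column.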

For OCQ150, the valid values are (1, 2, 3, 4, 7, 9), but the unique values included some 8s, so a simple < or > comparison wouldn't work, and neither would Series.between(). I went with:

ind = ocq['OCQ150'].isin([1, 2, 3, 4, 7, 9])
ocq.loc[~ind, 'OCQ150'] = np.nan
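
The same whitelist idea could presumably cover both columns at once. The following is only a sketch under assumptions: `ocq_demo` and its values are made up for illustration, and `valid_codes` is a hypothetical helper dict built from the code lists mentioned in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real ocq DataFrame is not reproduced here.
ocq_demo = pd.DataFrame({
    'OCQ130': [1, 7, 8, 77, 80, 99],
    'OCQ150': [1, 4, 8, 7, 9, 5],
}, dtype='float64')

# Valid codes per column, as listed in the post.
valid_codes = {
    'OCQ130': [1, 2, 3, 4, 5, 6, 7, 77, 99],
    'OCQ150': [1, 2, 3, 4, 7, 9],
}

for col, codes in valid_codes.items():
    # where() keeps values where the condition is True and puts NaN elsewhere.
    ocq_demo[col] = ocq_demo[col].where(ocq_demo[col].isin(codes))

cleaned_130 = ocq_demo['OCQ130'].dropna().tolist()
cleaned_150 = ocq_demo['OCQ150'].dropna().tolist()
print(cleaned_130, cleaned_150)
```

Because `isin` returns False for anything not in the list, this blanks every unexpected value, not only those in one particular range. Values that are already NaN stay NaN.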

These seemed reasonable, and so did the subsequent outputs and checks on the data. I just wanted to see if there might be any hidden gotchas I missed.