In this video, we will discuss what is meant by cleaning or scrubbing a dataset, and why it’s an important step in data analysis.
New Terms:
- Data Cleaning -- The process of fixing or removing incorrect, incomplete, or irrelevant data from a dataset. Also called data cleansing, data preparation, or data scrubbing.
- Example -- A single observation, case, or member of a dataset, usually a row in a table.
- Feature -- A descriptive or measurable characteristic of an example in a dataset, usually a column in a table. Also called a variable.
- Raw Data -- Data that has been collected but not cleaned. Also called source, primary, or atomic data.
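To make the terms above concrete, here is a minimal sketch in plain Python. The dataset, the feature names, and the `clean` function are all made up for illustration and do not come from the video. Each dictionary is one example, each key is a feature, and the unprocessed list plays the role of raw data. The cleaning rules are assumptions too: a real project would choose its own.

```python
# Hypothetical raw data: each dict is one example (a row),
# and each key is a feature (a column).
raw_data = [
    {"name": "Ada", "age": "36", "favorite_color": "blue"},
    {"name": "Grace", "age": "", "favorite_color": "green"},   # incomplete: age is missing
    {"name": "Linus", "age": "-5", "favorite_color": "red"},   # incorrect: age cannot be negative
    {"name": " Margaret ", "age": "41", "favorite_color": "teal"},
]


def clean(rows):
    """Return a cleaned copy of rows (illustrative rules only).

    - Removes examples whose age is missing or is not a non-negative whole number.
    - Drops the favorite_color feature, assumed irrelevant for this analysis.
    - Fixes stray whitespace in names and converts age from text to int.
    """
    cleaned = []
    for row in rows:
        age_text = row.get("age", "").strip()
        # str.isdigit() is False for "" and for "-5", so both bad rows are skipped.
        if not age_text.isdigit():
            continue
        cleaned.append({"name": row["name"].strip(), "age": int(age_text)})
    return cleaned


cleaned_data = clean(raw_data)
```

Here `cleaned_data` keeps only the examples for Ada and Margaret, each with just the `name` and `age` features, while `raw_data` itself is left unchanged so the original source can always be re-checked.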
Data Sources:
data.world
Kaggle competition datasets
National Health and Nutrition Examination Survey
The Star Wars API (SWAPI)
Further Reading:
Why the ‘Boring’ Part of Data Science is Actually the Most Interesting
Data Science: A Kaggle Walkthrough Pt 1: Introduction
Data Science: A Kaggle Walkthrough Pt 2: Understanding the Data
Data Science: A Kaggle Walkthrough Pt 3: Cleaning Data
Tidy Data
Python Resources: