How much data is missing from each row - what does axis = 1 mean?

In the notebook, there is a section called "How much data is missing from each row". The instructor uses the following code:

missing_data = np.sum(demo.isnull(), axis=1)

Per documentation, it looks like axis=None is the default. I'm not clear what the axis parameter does and why axis=1 was chosen.

1 Answer

Alex Koumparos
Python Development Techdegree Student 36,886 Points

Hi frankgenova

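The axis value selects which dimension of a multidimensional array the operation runs along.

In the case of this dataframe, we have two dimensions: rows and columns. Axis 0 runs down the rows, so summing along it collapses the rows and leaves one result per column. Axis 1 runs across the columns, so summing along it collapses the columns and leaves one result per row. That is why axis=1 was chosen here: the goal is a count of missing values for each row. When np.sum is given a DataFrame and axis is left at None, pandas has historically treated it like axis 0, though this may depend on your pandas version, so passing the axis explicitly is safer.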
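Here is a minimal sketch of how the axis argument behaves on a plain NumPy array. The array `arr` and the variable names are made up for illustration and are not from the notebook.

```python
import numpy as np

# Hypothetical 2x3 array, for illustration only (not from the notebook).
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

col_totals = np.sum(arr, axis=0)   # collapse the rows: one total per column
row_totals = np.sum(arr, axis=1)   # collapse the columns: one total per row
grand_total = np.sum(arr)          # axis=None on a plain array: sum of every element

print(col_totals)   # [5 7 9]
print(row_totals)   # [ 6 15]
print(grand_total)  # 21
```

The result always has one entry for each position along the axis that was not summed over.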

Consider this simplified version of the dataset:

   ID   Age  Gender  Military
0   1   2.0     2.0       NaN
1   2  77.0     1.0       1.0
2   3  95.0     2.0       NaN
3   4   1.0     1.0       NaN
4   5  49.0     1.0       1.0

Thus if we sum on axis 0/None, we see the number of null entries in each column:

ID          0
Age         0
Gender      0
Military    3
dtype: int64

Summing on axis 1 instead gives the number of null entries in each row:

0    1
1    0
2    1
3    1
4    0
dtype: int64
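If you want to try this yourself, here is a rough sketch that rebuilds the simplified table and computes both counts. The name `demo_small` is my own, and I'm using the DataFrame's `.sum` method rather than `np.sum`; with an explicit axis the two should give the same counts.

```python
import numpy as np
import pandas as pd

# Rebuild the simplified table from the answer (demo_small is a made-up name).
demo_small = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [2.0, 77.0, 95.0, 1.0, 49.0],
    "Gender": [2.0, 1.0, 2.0, 1.0, 1.0],
    "Military": [np.nan, 1.0, np.nan, np.nan, 1.0],
})

# Boolean frame: True wherever a value is missing (NaN).
is_missing = demo_small.isnull()

# True counts as 1 when summed, so these are counts of missing values.
missing_per_column = is_missing.sum(axis=0)  # one count per column
missing_per_row = is_missing.sum(axis=1)     # one count per row, as in the notebook

print(missing_per_column)
print(missing_per_row)
```

The per-row result is indexed by the row labels 0 to 4, and the per-column result is indexed by the column names.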

Hope that clears things up for you.