Find the column with the highest percentage of missing information in demographics

Question

Hi, although I've finished the practice question, I was hoping someone could share whether there's a simpler way to solve this problem. My solution is as follows:
```python
valid_entries = demo.count()
total_rows = len(demo.index)
missing_data = total_rows - valid_entries
missing_data.head()
missing_percentage = missing_data / total_rows * 100
missing_percentage.head()

missing_percentage_array = np.array(list(missing_percentage[:,]))
max_missing_perc_index = np.where(missing_percentage_array == missing_percentage.max())
np.array(list(missing_percentage.index))[max_missing_perc_index]
```
I'm quite certain there's an easier way to solve this and would love to know! For instance, I was able to find the maximum missing percentage value directly from missing_percentage, but I couldn't find its corresponding row label. So instead I converted the values to a np.array, found the position of the largest percentage value, and used that position to look up the corresponding row label in the index, which I had separately converted to a np.array.
Thanks and greatly appreciated!

Alex Koumparos · Answer

Hi Jason,
Using just the methods we've already seen, once you've got your missing_percentage Series you can do this:
```python
missing_percentage.sort_values(ascending=False).index[0]
'DMARACE'
```
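To see why this works, here is a small sketch on a made-up Series (the labels `col_a`, `col_b`, `col_c` and their values are hypothetical, not the course data). Sorting in descending order puts the largest value first, and the index labels travel with their values, so the first label is the one we want.

```python
import pandas as pd

# Hypothetical missing-percentage values standing in for the real Series
missing_percentage = pd.Series({"col_a": 12.5, "col_b": 40.0, "col_c": 0.0})

# Sort largest-first; each label stays attached to its value
ranked = missing_percentage.sort_values(ascending=False)

# The first label now belongs to the largest value
top_label = ranked.index[0]
print(top_label)
```

A side benefit of sorting is that `ranked` gives you the full ordering, which is handy if you also want the second or third worst column.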
Exploring Pandas a bit further, there is a built-in method called idxmax() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.idxmax.html#pandas.Series.idxmax) that does exactly what we want:
```python
missing_percentage.idxmax()
'DMARACE'
```
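If you'd like to try it end to end, here is a minimal sketch, assuming a tiny made-up DataFrame in place of the course's `demo` data (the column names and values are hypothetical). It also uses `isna().mean()`, which gives the fraction of missing values per column in one step, as a possible alternative to subtracting `count()` from the row total.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's `demo` DataFrame
demo = pd.DataFrame({
    "col_a": [1.0, np.nan, 3.0, 4.0],
    "col_b": [np.nan, np.nan, np.nan, 4.0],
    "col_c": ["x", "y", "z", "w"],
})

# isna() marks missing cells as True; the column mean of those
# booleans is the fraction missing, so * 100 gives a percentage
missing_percentage = demo.isna().mean() * 100

# idxmax() returns the index label (here, the column name) of the largest value
worst_column = missing_percentage.idxmax()
print(worst_column, missing_percentage[worst_column])
```

Note that if several columns tie for the maximum, `idxmax()` returns the first such label.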
Hope that helps.
Cheers.
Alex

Jason Tran
