
I have been using pandas for quite some time, but I don't understand the difference between isna() and isnull(), or, more importantly, which one to use when identifying missing values in a dataframe.

What is the underlying difference in how a value is detected as either na or null?

nwaldo
Vaibhav Thakur

2 Answers


Pandas isna() vs isnull().

I'm assuming you are referring to pandas.DataFrame.isna() vs pandas.DataFrame.isnull(). These are not to be confused with pandas.isnull(), which, unlike the two above, is a top-level function rather than a method of the DataFrame class.

These two DataFrame methods do exactly the same thing! Even their docs are identical. You can even confirm this in pandas' code.
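If you would rather check this empirically than read the source, a minimal sketch like the following compares the two masks. The frame `df` and its columns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with one missing value per column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", None, "c"]})

# Both methods return a boolean frame of the same shape as df.
mask_isna = df.isna()
mask_isnull = df.isnull()

# DataFrame.equals checks that shape, labels and values all match.
same_mask = mask_isna.equals(mask_isnull)
print(same_mask)
```

Whichever name you pick, the resulting mask should be the same, so the choice is purely about readability.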

But why have two methods with different names do the same thing?

This is because pandas' DataFrames are based on R's DataFrames. In R, NA and NULL are two separate things. Read this post for more information.

However, in Python, pandas is built on top of numpy, which has neither na nor null values. Instead, numpy has NaN values (NaN stands for "Not a Number"). Consequently, pandas also uses NaN values.

In short

  • To detect NaN values numpy uses np.isnan().

  • To detect NaN values pandas uses either .isna() or .isnull().
    The NaN values come from pandas being built on top of numpy, while the two method names originate from R's DataFrames, whose structure and functionality pandas tried to mimic.
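To make the summary concrete, here is a small sketch contrasting np.isnan() with pandas' detection. The sample values are invented, and the object-dtype part goes slightly beyond the text above: it assumes the documented pandas behaviour that None is also treated as missing, whereas np.isnan() only accepts numeric input.

```python
import numpy as np
import pandas as pd

# On a float array, numpy and pandas agree on where the NaN is.
values = np.array([1.0, np.nan, 3.0])
numpy_mask = np.isnan(values)
pandas_mask = pd.isna(values)

# On mixed (object) data, pandas still works and also flags None.
s = pd.Series([1, None, "text", np.nan], dtype=object)
object_mask = s.isna().tolist()

# np.isnan is a numeric ufunc, so an object array is rejected.
try:
    np.isnan(s.to_numpy())
    numpy_handles_objects = True
except TypeError:
    numpy_handles_objects = False

print(numpy_mask, pandas_mask, object_mask, numpy_handles_objects)
```

In practice this means .isna() / .isnull() are the safer choice on a DataFrame whose columns are not all numeric.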

Djib2011

isnull is an alias for isna, so they are the same. You can check the source code to confirm as much.

Similarly, notnull is an alias for notna, which is defined as ~df.isna() (source code), so the following are all equivalent:

df[df['x'].notnull()]

df[df['x'].notna()]

df[~df['x'].isna()]

df[~df['x'].isnull()]
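As a quick sanity check, the four expressions can be run side by side. This is only a sketch; the frame `df` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; None in a float column is stored as NaN.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, None]})

# The four spellings of "keep rows where x is not missing".
filtered = [
    df[df["x"].notnull()],
    df[df["x"].notna()],
    df[~df["x"].isna()],
    df[~df["x"].isnull()],
]

# Every result should match the first one exactly.
all_equal = all(f.equals(filtered[0]) for f in filtered)
kept_index = filtered[0].index.tolist()
print(all_equal, kept_index)
```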

cottontail