I have a pandas DataFrame:
import pandas as pd
country = ['US', 'US', 'US', 'UK', 'UK', 'UK']
year = ['1990', '1991', '2020', '1990', '1991', '2020']
people = [20, 34, 456, 5, 7, 300]
df = pd.DataFrame(zip(country, year, people), columns = ['country', 'year', 'people'])
  country  year  people
0      US  1990      20
1      US  1991      34
2      US  2020     456
3      UK  1990       5
4      UK  1991       7
5      UK  2020     300
I want to select the rows where year is '2020' or '1990'. I understand that this can be achieved with:
df.loc[(df.year == '2020') | (df.year == '1990')]
or
df.query("year == ['2020', '1990']")
to get the output:
  country  year  people
0      US  1990      20
2      US  2020     456
3      UK  1990       5
5      UK  2020     300
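As a quick sanity check, here is a minimal, self-contained sketch that rebuilds the frame and confirms that both expressions select the same four rows. The names `by_mask` and `by_query` are only for illustration and are not part of my real code.

```python
import pandas as pd

# Rebuild the example frame; year is stored as strings, as in the question.
df = pd.DataFrame({
    "country": ["US", "US", "US", "UK", "UK", "UK"],
    "year": ["1990", "1991", "2020", "1990", "1991", "2020"],
    "people": [20, 34, 456, 5, 7, 300],
})

# Boolean-mask selection: combine two element-wise comparisons with |.
by_mask = df.loc[(df.year == "2020") | (df.year == "1990")]

# query() selection: comparing a column to a list with == matches any listed value.
by_query = df.query("year == ['2020', '1990']")

print(by_mask.equals(by_query))   # True
print(by_mask.index.tolist())     # [0, 2, 3, 5]
```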
However, I'd like to perform this selection with Python's in operator.
I tried:
df.loc[df['year'] in ['2020', '1990']]
which raises the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
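My current understanding of where the error comes from, sketched below, is that `in` on a list compares the left-hand object with each list element using `==` and then needs a single True/False from each comparison. With a Series, `==` returns a whole boolean Series, so the truth-value check fails. This is my assumption about the mechanism, not something I have confirmed in the pandas source; `mask` and `error_message` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "US", "UK", "UK", "UK"],
    "year": ["1990", "1991", "2020", "1990", "1991", "2020"],
    "people": [20, 34, 456, 5, 7, 300],
})

# Comparing the column with one value gives a boolean Series, not a single bool.
mask = df["year"] == "2020"
print(mask.tolist())  # [False, False, True, False, False, True]

# `in` asks for the truth value of such a comparison, which pandas refuses to give.
try:
    df["year"] in ["2020", "1990"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```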
I'd prefer to use the in operator when subsetting a pandas.DataFrame, as it requires the least typing.
What is the best way to avoid this error when using the in operator?