I am applying a function to a Pandas DataFrame column and returning a tuple, which I unpack into multiple DataFrame columns using zip(*...).
The returned tuple contains a list, which itself contains one or more tuples.
In cases where at least one of the nested lists contains a different number of tuples from the rest, everything works fine.
In rare cases where every nested list returned by the function contains the same number of tuples, an AssertionError: Shape of new values must be compatible with manager shape is raised.
I suspect Pandas sees the consistent nested list lengths and tries to unpack each list of tuples into separate columns.
How can I force Pandas to always store the returned list as is, regardless of the conditions above?
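The suspicion above can be checked outside of Pandas. The following is a minimal sketch that assumes Pandas hands the assigned tuple to something like np.asarray internally (an assumption, not confirmed from the Pandas source here). The sample values and the names equal_lengths and as_objects are made up for illustration.

```python
import numpy as np

# Hypothetical sample: four nested lists, each holding exactly one 2-tuple,
# mirroring what zip(*...) yields when every 'types' value is 1.
equal_lengths = ([(0, 3)], [(0, 1)], [(0, 4)], [(0, 2)])

# With uniform lengths NumPy infers a regular 3-D array instead of
# a 1-D array with one list per row.
arr = np.asarray(equal_lengths)
print(arr.shape)  # (4, 1, 2)

# Filling a pre-allocated object array element by element keeps
# one nested list per row, which is the shape a column needs.
as_objects = np.empty(len(equal_lengths), dtype=object)
for i, item in enumerate(equal_lengths):
    as_objects[i] = item
print(as_objects.shape)  # (4,)
```

If that assumption holds, a 3-D array with four rows cannot fit a single column, which would be consistent with the shape error.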
(Python 3.7.4, Pandas 1.0.3)
Code that works:
import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list

df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 2},
                   ], columns=['name', 'types'])

df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))
Code that raises AssertionError: Shape of new values must be compatible with manager shape:
import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list

df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 1},
                   ], columns=['name', 'types'])

df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))
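One possible way around the failing case is sketched below. It assumes that wrapping the nested lists in a pd.Series built from a plain list stores each nested list as a single object-dtype element, whatever its length. The intermediate names results, calculated and types_lists are introduced only for illustration, and the sketch has not been checked against Pandas 1.0.3 specifically.

```python
import numpy as np
import pandas as pd

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [(x, calculated_value2) for x in range(type_count)]
    return calculated_value1, types_list

# Same data as the failing case: every row has types == 1.
df = pd.DataFrame({'name': ['Joe', 'Beth', 'John', 'Jill'],
                   'types': [1, 1, 1, 1]})

# Split the per-row (value, list) pairs into two tuples.
results = df['types'].apply(simple_function)
calculated, types_lists = zip(*results)

df['calculated_result'] = list(calculated)
# Build the column as a Series from a plain list so each nested list
# is kept as one element rather than being expanded into extra dimensions.
df['types_list'] = pd.Series(list(types_lists), index=df.index)

print(df)
```

Passing index=df.index keeps the rows aligned if the DataFrame does not use the default integer index.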