I am using tf-idf to build document representations. The dataset is large, and converting the resulting matrix to a DataFrame quickly uses more RAM than I have.
What is the best way to reduce the number of features (columns) while retaining as much information as possible?
The vectorizer lets you set max_features to a number, but that keeps the features with the highest term frequency across the corpus, which kind of defeats the purpose of tf-idf. You can also set stop words, but that doesn't reduce the dimensionality much in my case.
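
For reference, here is a minimal sketch of the setup described above. It assumes scikit-learn's TfidfVectorizer (the library is not named in the question, so that is an assumption), and the toy documents and variable names are made up for illustration:

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the real (large) dataset.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Unrestricted vocabulary: one column per distinct token.
full = TfidfVectorizer()
X_full = full.fit_transform(docs)

# max_features keeps only the terms with the highest term frequency
# across the corpus; stop_words="english" drops common function words first.
limited = TfidfVectorizer(max_features=5, stop_words="english")
X_limited = limited.fit_transform(docs)

print(X_full.shape, X_limited.shape)
print(sorted(limited.vocabulary_))

# Both results are SciPy sparse matrices, not dense arrays.
print(sparse.issparse(X_full), sparse.issparse(X_limited))
```

Both matrices come back in sparse form; the memory problem only appears once the matrix is converted to a dense DataFrame.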