Most Popular

1500 questions
9
votes
3 answers

What is the use of user data collection besides serving ads?

Well, this looks like the most suitable place for this question. Every website collects user data, some just for usability and personalization, but the majority, like social networks, track every move on the web, some free apps on your phone…
9
votes
2 answers

How to model user's buying behavior on Amazon?

For our final course project in Data Science, we proposed the following: given the Amazon Reviews Dataset, we plan to come up with an algorithm (roughly based on Personalized PageRank) that determines a strategic position for placing ads on…
9
votes
2 answers

High accuracy on test-set, what could go wrong?

You are given a pre-trained binary ML classification model with 99% accuracy on the test-set (assume the customer required 95% and that the test-set is balanced). We would like to deploy our model in production. What could go wrong? How would you…
CodeHoarder
9
votes
2 answers

Effect of Stop-Word Removal on Transformers for Text Classification

The domain here is essentially topic classification, so not necessarily a problem where stop-words have an impact on the analysis (as opposed to, say, sentiment analysis where structure can affect meaning). With respect to the positional encoding…
9
votes
3 answers

Sentiment Analysis Tutorial

I am trying to understand sentiment analysis and how to apply it using any language (R, Python, etc.). I would like to know if there is a good place on the internet for a tutorial that I can follow. I googled, but I wasn't very satisfied because they…
9
votes
1 answer

Why is the cosine distance used to measure the similarity between word embeddings?

When computing the similarity between words, cosine similarity or distance is computed on word vectors. Why aren't other distance metrics, such as Euclidean distance, suitable for this task? Let us consider two vectors a and b, where a = [-1,2,-3]…
Ashwin Geet D'Sa
9
votes
2 answers

Does "feature importance" depend on the model type?

I was working on a small classification problem (breast cancer data set from sklearn), and trying to decide which features were most important to predict the labels. I understand that there are several ways to define "important feature" here…
9
votes
4 answers

Understanding how convolutional layers work

After working with a CNN using Keras and the MNIST dataset for the well-known handwritten digit recognition problem, I came up with some questions about how convolutional layers work. I can understand what the convolution process is. My first…
9
votes
1 answer

Original Meaning of "Intelligence" in "Business Intelligence"

What does the term "Intelligence" originally stand for in "Business Intelligence"? Is it meant as in "Artificial Intelligence" or as in "Intelligence Agency"? In other words, does "Business Intelligence" mean: "Acting smart &…
9
votes
2 answers

image_dataset_from_directory VS flow_from_directory

What is the main difference between flow_from_directory and image_dataset_from_directory in Keras? Which one should I use?
9
votes
1 answer

Is it possible to have stratified train-test split of a set based on two columns?

Consider a dataframe that contains two columns, text and label. I can very easily create a stratified train-test split using sklearn.model_selection.train_test_split. The only thing I have to do is to set the column I want to use for the…
Aventinus
9
votes
3 answers

Multivariate Time series analysis: When is a CNN vs. LSTM appropriate?

I have multiple features in a time series and want to predict the values of the same features for the next time step. I have already trained an LSTM which is working okay, but takes a bit long to train. So now my question: is it reasonable to use a…
drops
9
votes
3 answers

How to set up and run Conda on Google Colab

I am interested in using Google Colab for data modeling. How do I install conda, create an environment, and run Python in a notebook? I did some searching and found some helpful hints, but had several issues with this. I can only get a partially…
Donald S
9
votes
1 answer

What types of matrix multiplication are used in Machine Learning? When are they used?

I'm looking at equations for neural networks and backpropagation and I see this symbol in the equations, ⊙. I thought matrix multiplication of neural networks always involved matrices that matched dimensions on both sides, such as... [3, 3]@[3, 2].…
Ant
9
votes
2 answers

Is overfitting okay if test accuracy is high enough?

I am trying to build a binary classifier. I have tried deep neural networks with various structures and parameters, and I was not able to get anything better than: train set accuracy 0.70102, test set accuracy 0.70001. Then I tried…
skrrrt