Questions tagged [mathematics]

Mathematics in a data science or machine learning context refers to the mathematical underpinnings of algorithms, optimization, statistics, linear algebra, etc.

230 questions
44 votes · 2 answers

What is GELU activation?

I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as $$\text{GELU}(x) = xP(X \le x) = x\Phi(x),$$ which in turn is approximated as $$0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,\left(x + 0.044715x^3\right)\right]\right).$$ Could you simplify the equation…
thanatoz
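A quick way to see what the two formulas compute is to evaluate both numerically. The sketch below is an illustration, not BERT's implementation: it uses the standard identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$ for the exact form and the quoted tanh expression for the approximation. The function names are hypothetical.

```python
import math

def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with the standard normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """The tanh approximation quoted in the question."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

if __name__ == "__main__":
    # Compare the exact form and the approximation at a few inputs.
    for x in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

Read this way, GELU scales each input by the probability that a standard normal variable is below it: large positive inputs pass almost unchanged, large negative inputs are pushed toward 0. On the inputs above, the two columns differ by well under 0.01.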
23 votes · 2 answers

Formal proof of vanilla policy gradient convergence

So I stumbled upon this question, where the author asks for a proof of vanilla policy gradient procedures. The answer provided points to some literature, but no formal proof is included. Looking at Sutton and Barto, Reinforcement…
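Code cannot supply a proof, but a concrete instance of the update under analysis may help pin down the notation. Below is a minimal REINFORCE-style sketch (vanilla policy gradient with a softmax policy and no baseline) on a two-armed bandit. The bandit, its rewards, the step size, and the step count are all hypothetical choices for illustration. Nothing here demonstrates convergence in general.

```python
import math
import random

def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Vanilla policy gradient on a bandit with hypothetical fixed per-arm rewards.

    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)              # policy parameters (one preference per arm)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample an action from pi
        r = rewards[a]
        for j in range(len(theta)):
            # d/d theta_j of log pi(a) for a softmax policy is 1[j == a] - pi_j
            grad_log = (1.0 if j == a else 0.0) - probs[j]
            theta[j] += lr * r * grad_log     # stochastic gradient ascent on expected reward
    return softmax(theta)
```

`reinforce_bandit([0.0, 1.0])` should put most of the probability on the second arm. In this deterministic-reward toy, every nonzero update widens the preference gap toward the better arm. The general convergence question involves stochastic returns and function approximation, which this toy does not capture.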
16 votes · 3 answers

What does it mean when we say most of the points in a hypercube are at the boundary?

If I have a 50-dimensional hypercube and I define its boundary by $0…
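The boundary definition in the excerpt is truncated, so the sketch below makes a hypothetical assumption. It uses the unit cube $[0,1]^d$ and calls a point "near the boundary" when at least one coordinate lies within $\varepsilon$ of 0 or 1. The interior core is then a cube of side $1 - 2\varepsilon$, so the boundary fraction is $1 - (1 - 2\varepsilon)^d$. The Monte Carlo function checks that formula by sampling.

```python
import random

def boundary_fraction_exact(d, eps):
    """Fraction of [0,1]^d within eps of the boundary: 1 - (1 - 2*eps)**d."""
    return 1.0 - (1.0 - 2.0 * eps) ** d

def boundary_fraction_mc(d, eps, n=20000, seed=0):
    """Monte Carlo estimate of the same fraction from n uniform points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        point = [rng.random() for _ in range(d)]
        # hypothetical criterion: some coordinate lies in [0, eps) or (1 - eps, 1)
        if any(x < eps or x > 1.0 - eps for x in point):
            hits += 1
    return hits / n
```

With $\varepsilon = 0.05$, `boundary_fraction_exact(2, 0.05)` is 0.19, while `boundary_fraction_exact(50, 0.05)` is about 0.995. The thin shell contains almost all of the volume in 50 dimensions because the core's volume $(0.9)^{50}$ shrinks geometrically with $d$.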
16 votes · 5 answers

Beginner math books for Machine Learning

I'm a computer science engineer with no background in statistics or advanced math. I'm studying the book Python Machine Learning by Raschka and Mirjalili, but when I tried to understand the math behind machine learning, I wasn't able to understand…
14 votes · 4 answers

Do you actually need math for your data science job?

I am a physicist working in a data scientist role. I was told everywhere that my degree is a very good starting point because I know a lot of math and it is crucial for this job. But other than understanding the math behind the models' calculations…
Physicist92
12 votes · 1 answer

Finding linear transformation under which distance matrices are similar

I have $n$ sets of vectors, where each set $S_i$ contains $k$ vectors in $\mathbb{R}^d$. I know there is some unknown linear transformation $W$ under which the distance matrix $D_i$ (a $k\times k$ matrix) is approximately "the same" (i.e. has a low…
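One concrete way to state the setup is to compute each $D_i$ after mapping every vector through a candidate $W$. The sketch below does that in plain Python. `mismatch` is a hypothetical objective (sum of squared entrywise differences over all pairs $D_i, D_j$) and does not come from the question. The truncated "low…" criterion is left unspecified.

```python
import math

def transform(W, v):
    """Matrix-vector product W @ v, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def distance_matrix(vectors, W):
    """k x k Euclidean distance matrix of the transformed vectors W @ s_a."""
    mapped = [transform(W, v) for v in vectors]
    return [[math.dist(p, q) for q in mapped] for p in mapped]

def mismatch(sets, W):
    """Hypothetical objective: squared entrywise differences summed over every pair D_i, D_j."""
    mats = [distance_matrix(s, W) for s in sets]
    total = 0.0
    for i in range(len(mats)):
        for j in range(i + 1, len(mats)):
            total += sum((a - b) ** 2
                         for row_i, row_j in zip(mats[i], mats[j])
                         for a, b in zip(row_i, row_j))
    return total
```

This makes one caveat visible: $W = 0$ sends every $D_i$ to the zero matrix and gives zero mismatch. A practical objective therefore needs a constraint or normalization, such as fixing the scale of $W$ or normalizing each $D_i$. Also, $\lVert QW(x - y)\rVert = \lVert W(x - y)\rVert$ for any orthogonal $Q$, so $W$ can at best be identified up to left-multiplication by an orthogonal matrix.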
8 votes · 2 answers

How does one derive the modified tanh activation proposed by LeCun?

In "Efficient Backprop" (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf), LeCun and others propose a modified tanh activation function of the form: $$ f(x) = 1.7159 \tanh\left(\tfrac{2}{3}x\right) $$ They argue that: It is easier to approximate with…
Lucas Morin
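The excerpt cuts off the authors' rationale, so what follows is only one plausible reading, stated as an assumption. If the slope factor $2/3$ is fixed and the goal is $f(\pm 1) = \pm 1$, then $a \tanh(2/3) = 1$ forces $a = 1/\tanh(2/3)$. The short computation below checks that this reproduces the constant 1.7159.

```python
import math

# Assumption (one reading of the truncated rationale): keep the slope b = 2/3
# and require f(1) = 1; by oddness of tanh this also gives f(-1) = -1.
b = 2.0 / 3.0
a = 1.0 / math.tanh(b)   # solve a * tanh(b * 1) = 1 for a

def lecun_tanh(x: float) -> float:
    """The scaled tanh with the constants quoted in the question."""
    return 1.7159 * math.tanh(2.0 / 3.0 * x)

print(round(a, 4))        # 1.7159
print(lecun_tanh(1.0), lecun_tanh(-1.0))
```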
7 votes · 1 answer

Bias and variance trade-off related question

I am having difficulty understanding the expected squared error formula on this website: $y=f(x)+e$: true regression line; $\hat{y}=\hat{f}(x)$: your estimated regression…
gyambqt
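The linked website's notation is not visible here. The standard decomposition is $E[(y - \hat{f}(x))^2] = \text{Bias}[\hat{f}(x)]^2 + \text{Var}[\hat{f}(x)] + \sigma^2$, where $\sigma^2$ is the variance of $e$, assuming $e$ has mean zero and is independent of the training data. The Monte Carlo sketch below checks it at a single point $x_0$ for a deliberately simple, hypothetical estimator. That estimator is $\hat{f}(x_0) = c\,\bar{y}$, a shrunken mean of $n$ noisy observations, with bias $(c-1)f(x_0)$ and variance $c^2\sigma^2/n$.

```python
import random
import statistics

def simulate(n_trials=50_000, n_obs=5, f_x0=2.0, sigma=1.0, c=0.8, seed=0):
    """Return (Monte Carlo expected squared error, bias^2 + variance + sigma^2) at one point x0.

    All parameters are hypothetical; f_hat = c * mean of n_obs noisy observations of f(x0).
    """
    rng = random.Random(seed)
    sq_errors, estimates = [], []
    for _ in range(n_trials):
        # a fresh training sample y = f(x0) + e at the same x0
        sample = [f_x0 + rng.gauss(0.0, sigma) for _ in range(n_obs)]
        f_hat = c * statistics.fmean(sample)
        # an independent test observation at x0
        y_new = f_x0 + rng.gauss(0.0, sigma)
        sq_errors.append((y_new - f_hat) ** 2)
        estimates.append(f_hat)
    mse = statistics.fmean(sq_errors)
    mean_est = statistics.fmean(estimates)
    bias_sq = (mean_est - f_x0) ** 2
    variance = statistics.fmean([(e - mean_est) ** 2 for e in estimates])
    return mse, bias_sq + variance + sigma ** 2
```

With the defaults, bias² = 0.16, variance = 0.128, and σ² = 1, so both returned numbers should be close to 1.288.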
7 votes · 2 answers

When is the sum of models the model of the sum?

The response variable in a regression problem, $Y$, is modeled using a data matrix $X$. In notation, this means $Y \sim X$. However, $Y$ can be separated into different components that can be modeled independently: $$Y = Y_1 + Y_2 + Y_3$$ Under…
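For ordinary least squares with the same design matrix, the fitted coefficients $\hat{\beta} = (X^\top X)^{-1} X^\top Y$ are linear in $Y$. Fitting $Y_1 + Y_2 + Y_3$ therefore gives exactly the sum of the three separate fits. The sketch below checks this for a one-feature regression with an intercept, using hypothetical data.

```python
def ols_fit(x, y):
    """Least-squares fit of y ≈ b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

if __name__ == "__main__":
    # hypothetical data: one predictor, two response components
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y1 = [2.0, 4.0, 5.0, 4.0, 5.0]
    y2 = [1.0, 3.0, 2.0, 5.0, 4.0]
    fit_of_sum = ols_fit(x, [a + b for a, b in zip(y1, y2)])
    sum_of_fits = tuple(p + q for p, q in zip(ols_fit(x, y1), ols_fit(x, y2)))
    print(fit_of_sum, sum_of_fits)   # equal up to floating-point rounding
```

The same linearity holds for ridge regression with a fixed penalty, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top Y$. It fails for estimators that depend on $Y$ nonlinearly, such as the lasso or tree-based models, where the sum of models and the model of the sum generally differ.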
7 votes · 2 answers

Comparison between addition and multiplication functions in deep neural networks?

I designed a specific convolutional neural network for research in image processing. At one point in the network, two tensors have to be combined into a single tensor before being fed to the next layer. This situation happens at…
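One concrete difference between the two merge operations shows up in backpropagation. The sketch below treats the two tensors as flat lists, which is a simplifying assumption. It writes out the elementwise forward pass and the local gradients for each merge. `merge_add` and `merge_mul` are hypothetical names.

```python
def merge_add(a, b):
    """Elementwise sum; returns the output and a backward function."""
    out = [x + y for x, y in zip(a, b)]

    def backward(grad_out):
        # d out_i / d a_i = 1 and d out_i / d b_i = 1: the upstream gradient passes through unchanged
        return list(grad_out), list(grad_out)

    return out, backward

def merge_mul(a, b):
    """Elementwise product; returns the output and a backward function."""
    out = [x * y for x, y in zip(a, b)]

    def backward(grad_out):
        # d out_i / d a_i = b_i and d out_i / d b_i = a_i: each branch's gradient is scaled by the other branch
        return [g * y for g, y in zip(grad_out, b)], [g * x for g, x in zip(grad_out, a)]

    return out, backward
```

For `a = [1.0, -2.0]` and `b = [0.5, 0.0]` with an upstream gradient of ones, addition passes `[1.0, 1.0]` to both inputs. Multiplication gives `a` the gradient `[0.5, 0.0]` and `b` the gradient `[1.0, -2.0]`. A multiplicative merge therefore behaves like gating: where one branch is near zero, the other branch receives almost no gradient.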
6 votes · 0 answers

Maths of Xavier initialization

The paper I read is Glorot et al. (2010), and the math part is in Section 4.2.1. Formulas (5) and (10) make sense to me, but I cannot derive formulas (6) and (7) myself from (2) and (3). I found that many tutorials on the internet use the formula $$Var[XY]…
Jason
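For independent $X$ and $Y$, the standard identity for the variance of a product is $$\operatorname{Var}[XY] = \operatorname{Var}[X]\operatorname{Var}[Y] + \operatorname{Var}[X]\,E[Y]^2 + \operatorname{Var}[Y]\,E[X]^2,$$ which reduces to $\operatorname{Var}[X]\operatorname{Var}[Y]$ when both means are zero. Summing $n_{in}$ such independent terms multiplies the variance by $n_{in}$. Under the usual zero-mean, independence assumptions, that is how a unit's pre-activation variance becomes $n_{in}\operatorname{Var}[W]\operatorname{Var}[x]$.

The sketch below checks the product identity by simulation, with hypothetical means and standard deviations. It also checks that the commonly used uniform range $\pm\sqrt{6/(n_{in}+n_{out})}$ has variance $2/(n_{in}+n_{out})$, since a uniform distribution on $[-L, L]$ has variance $L^2/3$. It does not reproduce the paper's numbered equations.

```python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    """L such that U(-L, L), whose variance is L**2 / 3, has variance 2 / (fan_in + fan_out)."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def check_product_variance(n=200_000, seed=0):
    """Monte Carlo check of Var[XY] for independent normal X and Y; returns (empirical, theory)."""
    rng = random.Random(seed)
    mx, sx = 1.0, 2.0      # hypothetical mean and standard deviation of X
    my, sy = -0.5, 0.5     # hypothetical mean and standard deviation of Y
    prods = [rng.gauss(mx, sx) * rng.gauss(my, sy) for _ in range(n)]
    mean = sum(prods) / n
    empirical = sum((p - mean) ** 2 for p in prods) / n
    # Var[X]Var[Y] + Var[X]E[Y]^2 + Var[Y]E[X]^2
    theory = sx**2 * sy**2 + sx**2 * my**2 + sy**2 * mx**2
    return empirical, theory
```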
6 votes · 1 answer

Evaluate the rotation of a polygon based on a reference polygon

In the context of machine vision, I need to evaluate the rotation of a polygon recognized in an image relative to a reference polygon. This is better explained with a picture [not included]: The reference polygon is available as an array of coordinates,…
ocroquette
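If the recognized polygon's vertices can be matched one-to-one with the reference vertices, the least-squares rotation has a closed form in 2-D. The excerpt does not say whether such a matching exists, so this is an assumption. First center both vertex sets on their centroids. Then take $\theta = \operatorname{atan2}\left(\sum_i (x_i v_i - y_i u_i),\ \sum_i (x_i u_i + y_i v_i)\right)$, where $(x_i, y_i)$ are the centered reference vertices and $(u_i, v_i)$ the centered observed ones. This is the 2-D case of orthogonal Procrustes alignment, a swapped-in technique that the question does not mention.

```python
import math

def estimate_rotation(reference, observed):
    """Least-squares rotation angle (degrees) mapping reference vertices onto observed vertices.

    Assumes vertex i of `reference` corresponds to vertex i of `observed`.
    """
    n = len(reference)
    rx = sum(p[0] for p in reference) / n
    ry = sum(p[1] for p in reference) / n
    ox = sum(p[0] for p in observed) / n
    oy = sum(p[1] for p in observed) / n
    cross = dot = 0.0
    for (x, y), (u, v) in zip(reference, observed):
        # center both polygons on their vertex centroids to remove any translation
        x, y, u, v = x - rx, y - ry, u - ox, v - oy
        dot += x * u + y * v
        cross += x * v - y * u
    return math.degrees(math.atan2(cross, dot))
```

Centering removes any translation, so only the rotation remains. For symmetric shapes such as a square, the answer depends on which vertex is listed first, so the correspondence step matters as much as the formula.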
5 votes · 1 answer

Estimating the value of $\pi$ with a Monte Carlo dartboard: $<$ or $\leq$?

I'm trying to figure out which is the proper way to estimate $\pi$ with the Monte Carlo method by randomly distributing points in a square that also contains an inscribed circle. Some sources say to use the comparison $\sqrt{x^2+y^2}\le 1$, while…
Oatmeal
4 votes · 4 answers

Decision boundary in a classification task

I have 1000 data points from the bivariate normal distribution $\mathcal{N}$ with mean $(0,0)$ and variance $\sigma_1^2=\sigma_2^2=10$, with the covariances being $0$. There are also 20 more points from another bivariate normal distribution with mean…
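For two Gaussian classes, the Bayes decision rule assigns a point to the class with the larger $\log \pi_k + \log p(x \mid k)$, with the priors $\pi_k$ taken from the class counts (1000 and 20). The sketch below implements that rule for isotropic 2-D Gaussians. Only the first class's parameters come from the question. The second class's mean and variance are cut off in the excerpt, so the values used here are hypothetical placeholders.

```python
import math

def log_gaussian_isotropic(point, mean, var):
    """Log density of a 2-D normal with covariance var * I."""
    sq = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
    return -sq / (2.0 * var) - math.log(2.0 * math.pi * var)

def bayes_classify(point, classes):
    """Index of the class with the largest log prior + log likelihood."""
    scores = [math.log(prior) + log_gaussian_isotropic(point, mean, var)
              for prior, mean, var in classes]
    return max(range(len(scores)), key=scores.__getitem__)

# Class 0 follows the question: 1000 points, mean (0, 0), variance 10 per coordinate, zero covariance.
# Class 1: mean (10, 10) and variance 1 are HYPOTHETICAL placeholders for the truncated values.
CLASSES = [
    (1000 / 1020, (0.0, 0.0), 10.0),
    (20 / 1020, (10.0, 10.0), 1.0),
]
```

The minority class has a prior of only 20/1020, so the boundary is pulled toward it compared with equal priors. When the two variances differ, as here, the boundary is a circle rather than a straight line.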
4 votes · 2 answers

How to propagate the error delta in backpropagation in convolutional neural networks (CNN)?

My CNN has the following structure: output neurons: 10; input matrix (I): 28x28; convolutional layer (C): 3 feature maps with a 5x5 kernel (output dimension 3x24x24); max pooling layer (MP): size 2x2 (output dimension 3x12x12); fully connected…
Julen
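The layer sizes in the question are consistent with a valid convolution at stride 1 ($28 - 5 + 1 = 24$) followed by 2x2 pooling at stride 2 ($24 / 2 = 12$). This sketch covers only the max-pooling step of backpropagation. Each delta from the pooled map is routed back to the input position that produced the maximum, and every other position receives zero. It works on one feature map stored as a list of rows, and the function names are hypothetical.

```python
def maxpool2x2_forward(fmap):
    """2x2 max pooling with stride 2 on one feature map; also returns the argmax position of each window."""
    out, argmax = [], []
    for i in range(0, len(fmap), 2):
        out_row, arg_row = [], []
        for j in range(0, len(fmap[0]), 2):
            window = [(fmap[i + di][j + dj], (i + di, j + dj)) for di in (0, 1) for dj in (0, 1)]
            value, pos = max(window, key=lambda item: item[0])  # first maximum wins on ties
            out_row.append(value)
            arg_row.append(pos)
        out.append(out_row)
        argmax.append(arg_row)
    return out, argmax

def maxpool2x2_backward(delta_out, argmax, height, width):
    """Route each pooled delta back to the input position that won the max; all others get 0."""
    delta_in = [[0.0] * width for _ in range(height)]
    for i, row in enumerate(delta_out):
        for j, d in enumerate(row):
            r, c = argmax[i][j]
            delta_in[r][c] += d
    return delta_in
```

Applied to each of the three feature maps, this turns a 3x12x12 delta into a 3x24x24 one. From there the delta continues through the convolutional layer (after multiplying by the activation derivative, if the convolution is followed by a nonlinearity), which this sketch does not cover.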