I'm a programmer, and I studied some calculus and linear algebra years ago. I've been getting into AI recently, and I struggle to understand some of the mathematical notation and formulas. I apologize ahead of time if my questions are stupid.

In programming, when there exists functionality that I wish to use but do not understand, I can simply google the documentation or look at the implementation. Descriptive naming is used and it really helps with readability.

However, when it comes to mathematics, I struggle to understand how to make use of what I see in research papers. I am lost as to where to even start with a formula like this:

$$ p_{\theta}\left(\mathbf{x}_{0:T}\right) := p\left(\mathbf{x}_T\right) \prod_{t=1}^T p_{\theta}\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right), \qquad p_{\theta}\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right) := \mathcal{N}\left(\mathbf{x}_{t-1}; \boldsymbol{\mu}_{\theta}\left(\mathbf{x}_t, t\right), \boldsymbol{\Sigma}_{\theta}\left(\mathbf{x}_t, t\right)\right) $$

How can I emulate the research process I use for programming?

I would like to know why they chose the naming they did. What does each symbol mean? How am I supposed to compute the individual components of this formula?

Why is it p(θ) and not something like p(a)? Why are we using theta rather than another variable name? How can I better navigate my research? What resources are available to help me better comprehend the math in research papers?

  • The paper can be found here: https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf – danial javady Dec 01 '22 at 18:38
  • Notation in research papers is meant to be read by professionals who work in or around the field. So in many cases, unless the notation is newly defined by the author, you should refer to the paper's bibliography to decode the notation you read. – user1851281 Dec 01 '22 at 18:39

1 Answer

Ideally, a paper should define all the notation it uses, but authors often skip notation that is familiar in their field. Introductory courses can help, as they should not assume you're already familiar with it. If you're not sure exactly which topic is relevant, check the citations near the formula of interest and in the introduction. Hopefully the authors have cited a textbook, or one of the cited papers has a more explanatory presentation. Sometimes, if you enter the topic in Google Scholar, you'll stumble across a perspective paper on how the topic should be understood or is commonly misunderstood; those are useful because they have to be explanatory. Also try searching for a relevant paper in an education journal (I don't know an appropriate one for maths).

As for why $p(\theta)$ over $p(a)$, that's often arbitrary. People try to choose the first letter of the variable's meaning, e.g. $p$ for probability. But often a notation becomes common because one researcher who made great advances in the field started using it, and everyone else followed suit. I suspect that's the rationale behind $\theta$. Sometimes authors deviate from convention simply because they're already using the conventional symbol for another variable. Don't get hung up on the symbol choice. If you don't see a rationale, assume that's simply what the authors chose.
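
To make the "how do I compute this" part more concrete, here is a rough Python sketch of how the formula in the question could be read as code. It is only an illustration under several assumptions, not the paper's implementation: `toy_mu` and `toy_var` are hypothetical stand-ins for $\mu_\theta$ and $\Sigma_\theta$ (which in practice would be learned functions), the covariance is assumed to be diagonal, $p(\mathbf{x}_T)$ is assumed to be a standard normal, and `theta` is just a dictionary of made-up parameters. The subscript $\theta$ is read here as "the parameters of the model", which is the usual convention. The sketch works with the logarithm of the formula, so the product over $t$ becomes a sum.

```python
import math

def gaussian_log_density(x, mean, var):
    """log N(x; mean, diag(var)), with vectors given as lists of floats."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def toy_mu(theta, x_t, t):
    # Hypothetical stand-in for mu_theta(x_t, t): just shrinks x_t.
    return [theta["shrink"] * xi for xi in x_t]

def toy_var(theta, x_t, t):
    # Hypothetical stand-in for Sigma_theta(x_t, t), assumed diagonal and constant.
    return [theta["var"]] * len(x_t)

def log_p_theta(xs, theta):
    """log p_theta(x_{0:T}) = log p(x_T) + sum_{t=1}^{T} log p_theta(x_{t-1} | x_t)."""
    T = len(xs) - 1
    d = len(xs[T])
    # p(x_T): assumed here to be a standard normal N(0, I).
    total = gaussian_log_density(xs[T], [0.0] * d, [1.0] * d)
    # The product over t = 1..T becomes a sum of log-densities.
    for t in range(1, T + 1):
        mean = toy_mu(theta, xs[t], t)
        var = toy_var(theta, xs[t], t)
        total += gaussian_log_density(xs[t - 1], mean, var)
    return total

theta = {"shrink": 0.9, "var": 0.5}            # made-up parameter values
xs = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]    # x_0, x_1, x_2, so T = 2
print(log_p_theta(xs, theta))
```

Each piece of the formula maps to one named thing: $\mathbf{x}_{0:T}$ is the list `xs`, $\prod_{t=1}^T$ is the `for` loop, $\mathcal{N}(\cdot\,;\mu,\Sigma)$ is `gaussian_log_density`, and $\theta$ is whatever gets passed to the mean and variance functions.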

Tony Mathew