Questions tagged [diffusion]
12 questions
2
votes
1 answer
Diffusion Models: Conditioning on Time vs. Noise Level
I am new to Data Science SE, so I hope this is the right place to ask this rather theoretical question.
In diffusion models we usually have a time variable that determines the noise schedule (e.g. $T \in [0, \dots, 4000]$). For training we sample a…
Lockhart
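A minimal sketch of how a timestep and a noise level relate, assuming a linear beta schedule rescaled to the number of steps (the excerpt does not say which schedule is used, so `betas`, `alpha_bar` and `sigma` are illustrative names only). Under that assumption the noise level grows monotonically with the step index, so conditioning on either quantity carries the same information:

```python
import numpy as np

T = 4000  # number of diffusion steps, taken from the example in the excerpt

# Assumption: linear beta schedule, rescaled by 1000 / T so the total noise
# stays comparable to the common 1000-step setting.
scale = 1000.0 / T
betas = np.linspace(1e-4 * scale, 0.02 * scale, T)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal fraction per step
sigma = np.sqrt(1.0 - alpha_bar)         # noise standard deviation per step

rng = np.random.default_rng(0)
t = rng.integers(0, T, size=8)           # timesteps sampled for one training batch
noise_level = sigma[t]                   # the equivalent noise-level conditioning

# sigma increases strictly with t, so the map t -> sigma[t] is one-to-one
is_monotone = bool(np.all(np.diff(sigma) > 0))
print(is_monotone, t, noise_level)
```

With a discrete schedule the two conditioning signals differ mainly in how they are embedded; a continuous noise level additionally allows sampling values between the discrete steps.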
2
votes
2 answers
Edit friendly DDPM noise space
I was reading the paper "An Edit Friendly DDPM Noise Space: Inversion and Manipulations". On page 4, the authors mention that in DDPM the noise maps of consecutive steps are highly correlated, while their edit-friendly noise maps of consecutive…
shivani
1
vote
0 answers
Should I interleave sin and cosine in sinusoidal positional encoding?
I'm trying to implement a sinusoidal positional encoding. I found two solutions that give different encodings. I am wondering if one of them is wrong or both are correct. I showcase visual figures of the resulting encodings for both options. Thank…
Janikas
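A small sketch comparing the two layouts, assuming both variants use the same frequencies and differ only in column order (the function `sinusoidal_encoding` and its parameters are hypothetical, not taken from the question). Under that assumption the interleaved and concatenated encodings are the same features up to a fixed column permutation:

```python
import numpy as np

def sinusoidal_encoding(positions, dim, interleave):
    """Return a (len(positions), dim) sinusoidal encoding; dim must be even."""
    half = dim // 2
    # One frequency per sin/cos pair (assumed base of 10000)
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = np.asarray(positions, dtype=float)[:, None] * freqs[None, :]
    if interleave:
        enc = np.empty((angles.shape[0], dim))
        enc[:, 0::2] = np.sin(angles)    # even columns hold sin
        enc[:, 1::2] = np.cos(angles)    # odd columns hold cos
        return enc
    # Concatenated layout: all sin columns first, then all cos columns
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

pos = np.arange(16)
inter = sinusoidal_encoding(pos, 8, interleave=True)
concat = sinusoidal_encoding(pos, 8, interleave=False)

# Column permutation mapping the concatenated layout onto the interleaved one
perm = np.arange(8).reshape(2, 4).T.reshape(-1)   # [0, 4, 1, 5, 2, 6, 3, 7]
same_up_to_permutation = bool(np.allclose(concat[:, perm], inter))
print(same_up_to_permutation)
```

If the encoding is fed into a learned linear layer, such a fixed permutation can be absorbed by the layer's weights, which suggests that neither layout is wrong in itself.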
1
vote
1 answer
How does ChatGPT-4o work on text + image data?
What known state-of-the-art techniques might ChatGPT-4o, Claude 3, or similar systems be using to understand both text and image data? I noticed that ChatGPT-4o recognizes text in an image well. Might it be using an external OCR tool or has it…
user163246
1
vote
0 answers
Diffusion model: can't overfit on a single batch
I am training the diffusion model from Diffusion Policy, specifically their vision notebook, on a custom dataset. As always, I sanity-check the pipeline by overfitting on a single batch. I would expect the loss to go to 0 or nearly…
Felix Hegg
1
vote
0 answers
How to derive the expectation equation given in the paper "Video Diffusion Models"?
In the paper Video Diffusion Models, Section 3.1 mentions the following equation:
$$ E_q[x^b \mid z_t, x^a] = E_q[x^b \mid z_t] + \left(\frac{\sigma_t^2}{\alpha_t}\right)\nabla_{z_t^b}\log q(x^a \mid z_t)$$, where $x^a, x^b$ are two video samples, $q$ is forward…
p1p13
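One possible route to this identity, sketched under the assumption that the forward process is Gaussian, $q(z_t \mid x) = \mathcal{N}(\alpha_t x, \sigma_t^2 I)$, with noise added independently to both parts (the excerpt is cut off before it defines $q$, so this is an assumption rather than the paper's own derivation). Tweedie's formula, applied once without and once with conditioning on $x^a$, gives

$$ E_q[x^b \mid z_t] = \frac{1}{\alpha_t}\left(z_t^b + \sigma_t^2 \nabla_{z_t^b}\log q(z_t)\right), \qquad E_q[x^b \mid z_t, x^a] = \frac{1}{\alpha_t}\left(z_t^b + \sigma_t^2 \nabla_{z_t^b}\log q(z_t \mid x^a)\right). $$

By Bayes' rule,

$$ \log q(z_t \mid x^a) = \log q(x^a \mid z_t) + \log q(z_t) - \log q(x^a), $$

and the last term does not depend on $z_t^b$. Subtracting the two expectations therefore leaves

$$ E_q[x^b \mid z_t, x^a] - E_q[x^b \mid z_t] = \frac{\sigma_t^2}{\alpha_t}\nabla_{z_t^b}\log q(x^a \mid z_t). $$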
1
vote
0 answers
Common sense fixes to a buggy diffusion model that won’t overfit one sample?
I hope this question is in the right place. I'm working with a toy diffusion model to generate points, e.g. learning a Swiss roll, which to me is a basic use case that I wanted to start with.
My model is generally sensible and I’ve implemented both a…
pxvxrx
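As a reference point for this kind of sanity check, here is a sketch assuming the standard noise-prediction (epsilon) objective and a linear beta schedule; neither is confirmed by the excerpt, and `x0` and `ideal_eps` are hypothetical. When the dataset is a single point, the added noise can be recovered exactly from the noisy input and the timestep, so a loss of essentially zero is attainable in principle:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = np.array([0.3, -1.2])               # the single (hypothetical) training point

def ideal_eps(x_t, t):
    """Noise predictor that is exact when the dataset is the single point x0."""
    a = np.sqrt(alpha_bar[t])[:, None]
    s = np.sqrt(1.0 - alpha_bar[t])[:, None]
    return (x_t - a * x0) / s

# One training batch: random timesteps and fresh Gaussian noise
t = rng.integers(0, T, size=256)
eps = rng.standard_normal((256, 2))
x_t = (np.sqrt(alpha_bar[t])[:, None] * x0
       + np.sqrt(1.0 - alpha_bar[t])[:, None] * eps)

loss = float(np.mean((ideal_eps(x_t, t) - eps) ** 2))
print(loss)   # tiny, limited only by floating-point rounding
```

A trained network that plateaus far above this on one sample may point to an issue with timestep conditioning, model capacity, or the training loop rather than with the objective itself.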
0
votes
0 answers
Rate-distortion plots in denoising diffusion model evaluation
In the Denoising Diffusion Probabilistic Models paper (https://arxiv.org/abs/2006.11239), the rate-distortion plot is computed assuming access to a protocol that can transmit samples $(x_T, \dots, x_0)$. This is then used to construct Algorithm 3 and…
abora
0
votes
0 answers
Connection between noise and score in diffusion
In the article Score-Based Generative Modeling through Stochastic Differential Equations (Song et al.), it is explained that we need to solve the reverse-time SDE to obtain samples from the image distribution $p_{0}$:
$$ \text d \mathbf{x} =…
Pepper08
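A numerical sketch of the usual link between noise and score, assuming a Gaussian perturbation kernel $q(x_t \mid x_0) = \mathcal{N}(\alpha_t x_0, \sigma_t^2 I)$ (the values of `alpha_t` and `sigma_t` below are hypothetical). Under that assumption the score of the kernel equals the negative noise divided by the noise scale, $\nabla_{x_t} \log q(x_t \mid x_0) = -\epsilon / \sigma_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_t, sigma_t = 0.8, 0.6              # hypothetical signal and noise scales
x0 = rng.standard_normal(5)
eps = rng.standard_normal(5)
x_t = alpha_t * x0 + sigma_t * eps       # noisy sample from the kernel

def log_q(x):
    """log N(x; alpha_t * x0, sigma_t^2 I), up to an additive constant."""
    return -0.5 * np.sum((x - alpha_t * x0) ** 2) / sigma_t**2

# Central finite differences approximate the score of the kernel at x_t
h = 1e-5
score_fd = np.array([
    (log_q(x_t + h * e) - log_q(x_t - h * e)) / (2 * h)
    for e in np.eye(5)
])

score_from_noise = -eps / sigma_t
print(np.allclose(score_fd, score_from_noise, atol=1e-6))
```

This is the relation that lets a noise-prediction network stand in for a score network, with the estimated score taken as the negative predicted noise divided by $\sigma_t$.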
0
votes
0 answers
Accurate score function estimation using score-based diffusion models
My question is mainly related to the seminal paper by Song et al.: "Score-Based Generative Modeling through Stochastic Differential Equations". I would like to leverage their framework in order to build a strong prior that can accurately estimate…
cosec
0
votes
0 answers
Diffusion Model consistency term derivation question
The consistency term of the diffusion model is written as:
$$\mathbb{E}_{q_\phi(x_{1:T}|x_0)} \left[\log\prod_{t=2}^T \frac{p(x_{t-1} | x_t)}{q_\phi(x_{t-1}|x_t, x_0)}\right]$$
$$= \sum_{t=2}^T \mathbb{E}_{q_\phi(x_t,x_{t-1}|x_0)}…
0
votes
0 answers
Using DDPM (denoising diffusion probabilistic model) to teach a model to generate low-resolution satellite images
I am currently using low-resolution satellite images to train a DDPM. My problem lies with the dataset. The dataset consists of 10 band-images of the same patch of land, but with variation in tree sizes (±20%) and variation in some physical…