7

I'm a physical geographer and I'm working on a piece of research where I have some broad physical relationships which I want to try and put into an equation - i.e. I want to define an empirical equation which describes the phenomena I've observed. I'm way out of my depth in formulating equations so I'm hoping someone here can help. I have a suspicion this is not a hugely complex problem for anyone who knows maths!

It's probably not relevant to the question, but it's all about how the physical properties of peat bogs alter how their water content changes over time, and then how that water content, coupled with their density, affects wildfire risk.

For a given point in time, and a given location, I'll have a volumetric water content $V_x$. (So I'll have lots of these values for different times and locations.)

For each location I'll have a normally distributed bulk density $pb$, for which I know the mean and standard deviation.

If the volumetric water content is multiplied by bulk density it gives gravimetric water content $U_x$. (So for any given $V_x$ I'd have a bunch of values for $U_x$, which I think would be a probability distribution?)

Gravimetric water content determines wildfire susceptibility (drier is more susceptible). It makes sense to define two thresholds: $U^*$, below which burning is almost certain, and $U_w$, above which burning is unlikely. I'd like to define this as a probability of burning of 1 (or near 1) at $U^*$ and a probability of $\sim0.1$ to $0.2$ at $U_w$. Between these values the relationship would not be linear but would be a curve which drops off with increasing $U$. Ideally (as we're still trying to get a handle on this) there would be an adjustable factor for exactly how quickly this drops off as $U$ approaches $U_w$.

So what I'd like to do is take a probability distribution of $U$ (based on the $V_x$ and the distribution of $pb$ values), and then multiply this by the probability-of-burning relationship so I can say: given a volumetric water content, and given the bulk density distribution, $K$ is the probability of wildfire burning. As I said, I'm just not good enough with maths to put all these things together into an equation! The ultimate idea is to propose this empirical equation which I and other researchers can then test.

Thanks in advance for any help you can give in helping me define this.

Ovi
  • 24,817
  • You may want to consider also posting this question on "Cross Validated Stackexchange" (http://stats.stackexchange.com/), which is the stats site. As far as I know cross posting on two stackexchange sites is allowed if the question is worded carefully to be on-topic on both sites. – Ovi Mar 14 '17 at 16:42
  • I think you want to do some kind of regression. You mention non-linearity, so I'd have to see data to know how to do it. (Once a model is established, it is best to let the data decide the particulars of the equation.) On what basis are the thresholds for flammability determined? If categories are well-established, maybe an ANOVA would be more successful than regression. Without data the question is too vague to answer responsibly. – BruceET Mar 14 '17 at 18:17
  • Logistic regression might be appropriate. The left hand side is coded 1 when fires have been observed and 0 otherwise. – A. Webb Mar 14 '17 at 19:57

1 Answer

1

First, let me just rewrite what I think you've said, which you can correct if needed.

You have a volumetric water content value $V(x,t)$ at locations $x$ and time $t$. You further have a set of random measurements of bulk densities $b_i(x,t)$ at each location and time, where $i=1,\ldots,n(x,t)$ is the measurement number, such that each $b_i$ is normally distributed; i.e. $\forall i,\;b_i(x,t)\sim\mathcal{N}(\mu(x,t),\sigma^2(x,t))$, where distributions vary over space and time as well. Gravimetric water content is then given in sets per location by: $ U_i(x,t)=V(x,t)b_i(x,t) $.

For your first question, assuming $b_i$ is normal and $V$ is a constant per (spacetime) location, then yes, $U_i$ can be treated as a random variable with probability distribution given as $U_i(x,t)\sim \mathcal{N}(V(x,t)\mu(x,t),[V(x,t)]^2\sigma^2(x,t))$; i.e. the probability distribution of the bulk densities is normal, just with scaled mean and variance (see here).
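You can sanity-check this scaling property numerically. A minimal sketch (the values of $V$, $\mu$, $\sigma$ below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for one location/time (illustrative only)
V = 0.6                   # volumetric water content
mu, sigma = 0.12, 0.03    # mean and s.d. of bulk density

b = rng.normal(mu, sigma, size=100_000)  # sampled bulk densities
U = V * b                                # gravimetric water content

# U should behave like N(V*mu, (V*sigma)^2)
print(U.mean(), V * mu)   # sample mean close to V*mu
print(U.std(), V * sigma) # sample s.d. close to V*sigma
```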

Ok, so now things seem to be a little open-ended. I'll give two suggestions: the first is what I think you want and the second is what I think you should do.


It seems you want an analytic model that describes the relation between a $U_i$ and the "probability of fire", say $P_f(U_i)$, such that $P_f$ is 1 when $U_i$ is low (I guess it's always positive though), and falls off at a controllable rate. One idea is to use a logistic function, in particular: $$ P_f(u) = \frac{1 + \exp(-ab)}{1 + \exp[-a(-u+b)]} $$ where $a,b >0$ are positive constants. You can vary them (or better yet fit them to the data) to get the $U^*$ and $U_w$ you want. See here for an example plot. For example, if $a=2$ and $b=5$, then $U^*\approx 3$ and $U_w\approx 6$ (I have no idea what the real numbers would look like).
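A short sketch of that function, checking the claimed behaviour at the example values ($a$, $b$, and the threshold numbers are illustrative, not fitted to anything):

```python
import math

def P_f(u, a, b):
    """Logistic fall-off: equals 1 at u = 0, tends to 0 as u grows.
    a controls how sharply the probability drops; b shifts where it drops."""
    return (1 + math.exp(-a * b)) / (1 + math.exp(-a * (b - u)))

# With a = 2, b = 5, as in the example above:
print(P_f(0, 2, 5))   # exactly 1
print(P_f(3, 2, 5))   # ~0.98, near-certain burning (around U*)
print(P_f(6, 2, 5))   # ~0.12, unlikely burning (around U_w)
```

Varying $a$ steepens or flattens the drop-off, which gives you the adjustable factor the question asks for.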

Notice that when $u=0$, we get $P_f(0)=1$, and when $u\rightarrow\infty$, we get $P_f(u)\rightarrow 0$.

One issue (to me, anyway) is how to combine all your measurements (i.e. you'll get a new set of data $S(x,t)=\{P_f(U_i(x,t))\}_{i=1}^{n(x,t)}$ per location that you'll want to combine into a final prediction). There are many ways to do this, and many ways to define $P_f$ differently as well. In particular, I think it would make sense to define a statistical model parametrized by the mean and variance of your distribution (so as to take the data into account directly).
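One natural way to combine them is to take the overall burn probability as the expectation $K = \mathbb{E}[P_f(U)]$ under the normal distribution of $U$ derived above. A Monte Carlo sketch (all numbers are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_burn_prob(V, mu, sigma, a, b, n=100_000):
    """Monte Carlo estimate of K = E[P_f(U)], with U ~ N(V*mu, (V*sigma)^2)."""
    U = V * rng.normal(mu, sigma, size=n)
    return float(np.mean((1 + np.exp(-a * b)) / (1 + np.exp(-a * (b - U)))))

# Hypothetical inputs: a wetter location (larger V) should give a lower K
print(expected_burn_prob(V=20.0, mu=0.2, sigma=0.05, a=2, b=5))
print(expected_burn_prob(V=40.0, mu=0.2, sigma=0.05, a=2, b=5))
```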


Personally, if you have wildfire data relating $U$ values to wildfires, this sounds like a supervised learning problem. As the comments mention, it is better (albeit perhaps less interpretable in a way) to use a statistical model (for which logistic regression is an example). (One could also take the time series correlations into account (say, areas that have been dry for longer might have more likelihood of fire) e.g. with a stochastic process model, but that's a bit much (for now, anyway)).

You could, for instance, learn on the $U_i$ values directly, or (as I mentioned before) learn from the distribution parameters. In the latter case, the function would be something like $f(x_1,x_2)=f(\mu,\sigma^2)$ fitted by e.g. logistic regression and then "tested" by inputting $(V(x,t)\mu(x,t),[V(x,t)]^2\sigma^2(x,t))$.
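A toy sketch of that latter approach, fitting a logistic model on the distribution parameters by plain gradient descent. Everything here is fabricated for illustration: the synthetic data are generated so that drier locations (lower mean $U$) are more likely to have burned.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training data: features are the scaled mean and variance of
# each location's U distribution; labels are 1 if a fire was observed there.
n = 500
mu_U = rng.uniform(1, 9, size=n)       # V*mu per location
var_U = rng.uniform(0.1, 2.0, size=n)  # (V*sigma)^2 per location
# Synthetic labels: drier (low mu_U) -> higher chance of fire
y = (rng.random(n) < 1 / (1 + np.exp(mu_U - 5))).astype(float)

X = np.column_stack([np.ones(n), mu_U, var_U])  # intercept + features
w = np.zeros(3)

# Gradient descent on the logistic log-loss
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.01 * X.T @ (p - y) / n

print(w)  # the coefficient on mu_U should come out negative (wetter -> fewer fires)
```

With real observations you would of course replace the synthetic features and labels with your measured $(V\mu, V^2\sigma^2)$ pairs and fire records, and probably use an off-the-shelf logistic regression routine rather than hand-rolled gradient descent.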

I don't know ... sorry it's not very precise ... Just some ideas :)

user3658307
  • 10,843