Truncated Gaussian Math

The implementation of the TruncatedGaussian class is a bit opaque due to the complex mathematics involved.

This page provides the formulae and explanations needed to understand how it works.

Except where otherwise noted, we'll assume 2 dimensions here, as 1-d is trivial and higher dimensions are not supported.

We will denote by \(\Omega\) the region in \(\mathbb{R}^2\) such that all dimensions are positive, and use \(\Gamma\) for its boundary, which consists of the coordinate axes and an arbitrary path joining them at infinity (which we will regularly ignore, since all the functions we are concerned with are zero at infinity).

Standard-Parameter Integrals

In the standard parameters form, we are modeling the function

\[ f(\alpha) = \frac{1_{\Omega}(\alpha)}{k\left|2\pi\Sigma\right|^{1/2}} e^{-\frac{1}{2}(\alpha-\mu)^T \Sigma^{-1} (\alpha-\mu)} \]

where \(1_{\Omega}(\alpha)\) is the indicator function

\[ 1_{\Omega}(\alpha) \equiv \begin{cases} 1 & \text{if $\alpha \in \Omega$}\\ 0 & \text{otherwise} \end{cases} \]

and \(k\) is the normalization constant that ensures that \(f(\alpha)\) integrates to unity:

\[ k \equiv \int\!\!d\Omega\; \frac{1}{\left|2\pi\Sigma\right|^{1/2}}\; e^{-\frac{1}{2}(\alpha-\mu)^T \Sigma^{-1} (\alpha-\mu)} \]

We assume that the matrix \(\Sigma\) is full-rank (and throw an exception if it is not), because \(k\) is infinite if \(\Sigma\) is singular.

In this case, there is no analytic integral, and we use a translation/reimplementation of the "bvnu" MATLAB routine by Alan Genz to do the numerical integration (see detail::bvnu).

However, bvnu takes its arguments in a different form, computing the integral

\[ \text{bvnu}(h, k, \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}}\int_h^{\infty}dx\int_k^{\infty}dy \;e^{-(x^2 - 2\rho x y + y^2)/(2(1-\rho^2))} \]

Computing the correlation coefficient \(\rho\) from the covariance matrix is straightforward:

\[ \rho \equiv \frac{\Sigma_{1,2}}{\sqrt{\Sigma_{1,1} \Sigma_{2,2}}} \]

and transforming to the new form is simply a matter of making the substitution

\[ \eta_1 = \frac{\alpha_1 - \mu_1}{\sqrt{\Sigma_{1,1}}} \]

\[ \eta_2 = \frac{\alpha_2 - \mu_2}{\sqrt{\Sigma_{2,2}}} \]

Since both integrals are normalized in the untruncated case, we have

\[ \frac{1}{\left|2\pi\Sigma\right|^{1/2}}\int_0^{\infty}d\alpha_1\,\int_0^{\infty}d\alpha_2\, e^{-\frac{1}{2}(\alpha-\mu)^T\,\Sigma^{-1}\,(\alpha-\mu)} = \text{bvnu}\left(-\frac{\mu_1}{\sqrt{\Sigma_{1,1}}}, -\frac{\mu_2}{\sqrt{\Sigma_{2,2}}}, \frac{\Sigma_{1,2}}{\sqrt{\Sigma_{1,1} \Sigma_{2,2}}}\right) \]
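As a concrete check, here is a minimal Python sketch (illustrative only, not LSST code; the C++ implementation uses detail::bvnu) that verifies this identity using SciPy's bivariate normal CDF, which is likewise based on Genz's algorithms, as a stand-in for bvnu:

```python
# A sketch (not LSST code) checking the bvnu identity above with SciPy.
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

def bvnu(h, k, rho):
    """bvnu(h, k, rho) = P(X > h, Y > k) for a standard bivariate normal
    with correlation rho; by the symmetry (X, Y) -> (-X, -Y) this equals
    the CDF evaluated at (-h, -k)."""
    return multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, rho], [rho, 1.0]]).cdf([-h, -k])

mu = np.array([1.0, -0.5])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
rho = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])

# Right-hand side: bvnu with standardized, negated means.
k_norm = bvnu(-mu[0] / np.sqrt(sigma[0, 0]), -mu[1] / np.sqrt(sigma[1, 1]), rho)

# Left-hand side: direct quadrature of the normalized Gaussian over Omega.
rv = multivariate_normal(mean=mu, cov=sigma)
check, _ = dblquad(lambda y, x: rv.pdf([x, y]), 0, np.inf, 0, np.inf)
assert np.isclose(k_norm, check, rtol=1e-6)
```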

Series-Parameter Integrals

In the series form, we are modeling the function

\[ f(\alpha) = 1_{\Omega}(\alpha) \; e^{-q(0) -g^T \alpha -\frac{1}{2}\alpha^T H \alpha} = 1_{\Omega}(\alpha) \; e^{-q(\alpha)} \]

There are two distinct cases, depending on whether \(H\) is singular. In both cases, we begin by computing an eigensystem decomposition of \(H\):

\[ H = V\,S\,V^T \]

where \(S\) is diagonal and \(V\) is orthogonal. We use this to determine whether \(H\) is singular and, if it is not, to compute its inverse and determinant. In the nonsingular case, we can then use the same bvnu routine used for the standard parameters form, noting that

\[ \mu = -H^{-1} g \]

\[ \Sigma_{11} = \frac{H_{22}}{|H|} \]

\[ \Sigma_{22} = \frac{H_{11}}{|H|} \]

\[ \rho = -\frac{H_{12}}{\sqrt{H_{11}H_{22}}} \]
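For concreteness, here is a minimal Python sketch (with assumed names, not the actual implementation) of this nonsingular reduction:

```python
# A sketch (not the LSST implementation) of the nonsingular series-form
# reduction to the standard parameters consumed by bvnu.
import numpy as np

def series_to_standard(g, H, rtol=1e-8):
    s, v = np.linalg.eigh(H)               # H = V S V^T with S diagonal
    if s.min() <= rtol * s.max():          # singular H: handled separately below
        raise NotImplementedError("singular H")
    mu = -np.linalg.solve(H, g)            # mu = -H^{-1} g
    det = s[0] * s[1]                      # |H| from the eigenvalues
    sigma11 = H[1, 1] / det                # Sigma_{1,1} = H_{2,2} / |H|
    sigma22 = H[0, 0] / det                # Sigma_{2,2} = H_{1,1} / |H|
    rho = -H[0, 1] / np.sqrt(H[0, 0] * H[1, 1])
    return mu, sigma11, sigma22, rho
```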

The singular case in the series form can arise when a linear combination of the parameters is completely unconstrained but the total amplitude is well constrained (and hence the integral of the TruncatedGaussian is bounded due to the truncation). This happens extremely often in practice: for example, when a galaxy model has two components and the radii of both approach zero.

In the singular case (which bvnu does not handle), we assume the eigensystem is structured as

\[ H = V\,S\,V^T = \left[\begin{array}{c c} V_{1,1} & V_{1,2} \\ V_{2,1} & V_{2,2} \end{array}\right] \left[\begin{array}{c c} 0 & 0 \\ 0 & S_2 \end{array}\right] \left[\begin{array}{c c} V_{1,1} & V_{2,1} \\ V_{1,2} & V_{2,2} \end{array}\right] = \left[\begin{array}{c c} V_{:,1} & V_{:,2} \end{array}\right] \left[\begin{array}{c c} 0 & 0 \\ 0 & S_2 \end{array}\right] \left[\begin{array}{c} V_{:,1}^T \\ V_{:,2}^T \end{array}\right] \]

and we change variables to isolate the zero eigenvalue:

\[ \alpha = V\beta = V_{:,1} \beta_1 + V_{:,2} \beta_2 \]

\[ \beta = V^T\alpha = \left[\begin{array}{c} V_{:,1}^T\alpha \\ V_{:,2}^T\alpha \end{array}\right] = \left[\begin{array}{c} \beta_1 \\ \beta_2 \end{array}\right] \]

We will assume that \(V_{1,2}\) and \(V_{2,2}\) are both positive (both negative would be an equally valid assumption, just a different convention, but mixed sign would indicate a divergent integral), and that \(V_{:,1}^T g=0\) (this is equivalent to stating that the singularity arises because the least-squares problem represented by the series expansion has many degenerate solutions).

The region of integration defined by \(\Omega\), in the rotated \(\beta\) frame, lies between the lines that represent the rotated \(\alpha_1\)- and \(\alpha_2\)-axes, which are

\[ \beta_1 = -\frac{V_{1,2}}{V_{1,1}}\beta_2 \]

and

\[ \beta_1 = -\frac{V_{2,2}}{V_{2,1}}\beta_2 \]

with \(\beta_2 \ge 0\). We can thus write the integral in \(\beta\) as

\[ \int_{0}^{\infty} d\beta_2 e^{-q(0) - g^T V_{:,2} \beta_2 -\frac{1}{2}S_2\beta_2^2} \int_{-\frac{V_{1,2}}{V_{1,1}}\beta_2}^{-\frac{V_{2,2}}{V_{2,1}}\beta_2} d\beta_1 =\left(\frac{V_{1,2}}{V_{1,1}}-\frac{V_{2,2}}{V_{2,1}}\right) \int_{0}^{\infty} d\beta_2\;\beta_2\;e^{-q(0) - g^T V_{:,2} \beta_2 -\frac{1}{2}S_2\beta_2^2} \]

This can now be integrated using integration by parts and standard 1-d Gaussian integrals (I used Mathematica) to yield

\[ \left(\frac{V_{1,2}}{V_{1,1}}-\frac{V_{2,2}}{V_{2,1}}\right)\frac{e^{-q(0)}}{S_2}\left[ 1 - z\sqrt{\pi}\;e^{z^2}\;\text{erfc}\left(z\right) \right] \]

where

\[ z \equiv \frac{V_{:,2}^T\,g}{\sqrt{2S_2}} \]
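As a sanity check of this closed form, here is a short sketch (arbitrary test values) that compares it against direct quadrature of the \(\beta_2\) integral; the geometric prefactor is common to both sides and is omitted:

```python
# A sketch verifying the closed-form singular-case integral against direct
# quadrature of the beta_2 integral above.
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

q0, c, s2 = 0.3, 0.8, 1.5   # arbitrary test values; c = g^T V_{:,2}, s2 = S_2 > 0

numeric, _ = quad(lambda b: b * np.exp(-q0 - c * b - 0.5 * s2 * b**2), 0, np.inf)
z = c / np.sqrt(2.0 * s2)
closed = (np.exp(-q0) / s2) * (1.0 - z * np.sqrt(np.pi) * np.exp(z**2) * erfc(z))
assert np.isclose(numeric, closed)
```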

Sampling

TruncatedGaussian supports two strategies for sampling, set by the TruncatedGaussian::SampleStrategy enum. The "direct-with-rejection" strategy is quite simple and needs no further discussion here (see the enum documentation for more information). The "align-and-weight" strategy is a bit more complex.

With this strategy, instead of sampling from the untruncated Gaussian, we use importance sampling: we draw samples from a similar "importance" distribution, then weight them by the ratio of the true distribution to the importance distribution at each point. For the importance distribution, we use an "axis-aligned" Gaussian (i.e. one with a diagonal covariance matrix), which allows us to draw from each dimension independently. Rather than drawing from each 1-d Gaussian and rejecting negative values, however, we draw directly from the truncated 1-d Gaussian using its inverse CDF.

In detail, then, we start with a function \(f(\alpha)\), for which we know the total integral

\[ A_f = \int\!\!d\alpha\;f(\alpha) \]

(note that this includes the truncation), as well as the Hessian matrix \(H\) and mean \(\mu\). This mixing of the series and standard parametrizations is useful because we can always compute \(H\) from \(\Sigma\) (we assume the latter is always nonsingular), but we do not assume that \(H\) is nonsingular. Similarly, we can compute \(\mu=-H^+g\) from the series form. Because one draws from normalized probability distributions, not arbitrarily scaled functions, we are logically drawing from \(f(\alpha)/A_f\), not simply \(f(\alpha)\).

To construct the importance distribution, we'd like to use the diagonal of \(H^{-1}=\Sigma\), but because we cannot guarantee that \(H\) is nonsingular, we will instead use the inverse of the diagonal of \(H\). This represents the width in each dimension at fixed values in the other dimensions (note that the diagonal of \(\Sigma\) would represent the marginal values if the Gaussian were not truncated), so it's not quite wide enough to make a good importance distribution. To address this, we'll simply multiply by 2, so the importance distribution will have the diagonal covariance matrix \(D\) defined by:

\[ D_i = \frac{2}{H_{i,i}} \]

We could do much more clever things by computing the actual confidence limits of the truncated Gaussian, but the current approach works well enough that the extra effort doesn't seem worthwhile.

Our importance function is thus

\[ p(\alpha) = 1_{\Omega}(\alpha)\;\frac{1}{2\pi|D|^{1/2}}\; e^{-\frac{1}{2}\left(\alpha-\mu\right)^T\,D^{-1}\,\left(\alpha-\mu\right)} = \prod_i p(\alpha_i) \]

\[ p(\alpha_i) = \frac{1_{\alpha_i\ge 0}}{\sqrt{2\pi D_i}}\;e^{-\frac{1}{2D_i}(\alpha_i-\mu_i)^2} \]

with normalization factor

\[ A_p = \int\!\! d\alpha\;p(\alpha) = \prod_i A_{p_i} \]

\[ A_{p_i} = \int_0^{\infty}\!\!d\alpha_i\;p(\alpha_i) = \frac{1}{2}\text{erfc}\left(\frac{-\mu_i}{\sqrt{2D_i}}\right) = \Phi\left(\frac{\mu_i}{\sqrt{D_i}}\right) = 1-\Phi\left(\frac{-\mu_i}{\sqrt{D_i}}\right) \]

To draw from \(p(\alpha)\), we draw from each \(p(\alpha_i)\) in turn, which can be accomplished via

\[ \alpha_i = \Phi^{-1}\left(\Phi\left(\frac{-\mu_i}{\sqrt{D_i}}\right) + u\left[1 - \Phi\left(\frac{-\mu_i}{\sqrt{D_i}}\right) \right]\right) \sqrt{D_i} + \mu_i = \Phi^{-1}\left(1 - A_{p_i} + u A_{p_i}\right)\sqrt{D_i} + \mu_i = -\sqrt{2}\,\text{erfc}^{-1}\left(2\left[1-u A_{p_i}\right]\right)\sqrt{D_i} + \mu_i \]

where \(u\) is a uniform random variate on \((0,1)\), and we have used the fact that \(u\) and \(1-u\) are identically distributed on \((0,1)\).

After drawing these from the importance distribution, we compute the weight for each point as

\[ w = \frac{A_p\;f(\alpha)}{A_f\;p(\alpha)} \]
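Putting the pieces together, here is an illustrative Python sketch of the whole strategy (the names draw and weights are hypothetical; the real implementation lives in the C++ class):

```python
# A sketch (not the LSST implementation) of align-and-weight sampling:
# draw each dimension from its truncated 1-d Gaussian via the inverse CDF,
# then weight by the ratio of the normalized target to the normalized
# importance density.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.5])
H = np.array([[2.0, 0.6], [0.6, 1.0]])
D = 2.0 / np.diag(H)                 # importance variances, D_i = 2 / H_{i,i}
A_p_i = norm.sf(-mu / np.sqrt(D))    # A_{p_i} = 1 - Phi(-mu_i / sqrt(D_i))
A_p = np.prod(A_p_i)

def draw(n):
    """Inverse-CDF draws from the truncated axis-aligned importance Gaussian;
    every sample lands in Omega by construction."""
    u = rng.uniform(size=(n, 2))
    return norm.ppf(norm.cdf(-mu / np.sqrt(D)) + u * A_p_i) * np.sqrt(D) + mu

def weights(alpha, A_f):
    """w = A_p f(alpha) / (A_f p(alpha)), with A_f the known integral of f."""
    d = alpha - mu
    f = np.exp(-0.5 * np.einsum("ni,ij,nj->n", d, H, d))        # unnormalized target
    p = np.prod(norm.pdf(d / np.sqrt(D)) / np.sqrt(D), axis=1)  # untruncated importance pdf
    return A_p * f / (A_f * p)
```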

Maximization

For maximization, we will use the same hybrid functional form as in the sampling section:

\[ f(\alpha) = e^{-\frac{1}{2}(\alpha-\mu)^T H (\alpha-\mu)} \]

Our goal is to solve the quadratic programming problem

\[ \max_\alpha f(\alpha) = \min_\alpha (\alpha-\mu)^T H (\alpha-\mu) \]

subject to

\[ \alpha \ge 0 \]

In general quadratic programming, the difficulty lies in knowing which of the inequality constraints will be active at the solution, and "active set" methods work by adding and removing constraints as they explore the problem. In our case, the constraints are sufficiently simple (and the objective function is known to be convex) that we can simply identify the active set at the solution as the dimensions for which the untruncated maximum point (i.e. \(\mu\)) has a negative value.

We begin simply by iterating over the vector \(\mu\), identifying any elements that are less than zero. For each such element \(\mu_i\), we add a unit-vector column to a matrix \(A\), such that \(A^T\mu\) selects the components of \(\mu\) that are less than zero. We can now solve the equality-constrained quadratic problem

\[ \min_\alpha (\alpha-\mu)^T H (\alpha-\mu) \]

subject to

\[ A^T\alpha = 0 \]

We start by computing the QR decomposition of \(A\):

\[ A = Q R = \left[\begin{array}{c c} Q_1 & Q_2 \end{array}\right] \left[\begin{array}{c} R_1 \\ 0 \end{array}\right] \]

We then perform the change of variables

\[ \tau = Q^T \alpha = \left[\begin{array}{c} Q_1^T \alpha \\ Q_2^T \alpha \end{array}\right] = \left[\begin{array}{c} \tau_1 \\ \tau_2 \end{array}\right] \]

with inverse

\[ \alpha = Q\tau = Q_1\tau_1 + Q_2\tau_2 \]

The constraint system reduces to \(\tau_1=0\) (so that \(\alpha = Q_2\tau_2\)), which we can then plug into the objective function:

\[ \min_{\tau_2} (Q_2\tau_2 - \mu)^T H (Q_2\tau_2 - \mu) \]

Differentiating with respect to \(\tau_2\) and setting the result to zero, we have:

\[ 2 Q_2^T H (Q_2\tau_2 - \mu) = 0 \]

which we can solve for \(\tau_2\):

\[ \tau_2 = \left(Q_2^T H Q_2\right)^{-1} Q_2^T H \mu \]

The corresponding value of \(\alpha\) is then

\[ \hat{\alpha} = Q_2 \tau_2 = Q_2 \left(Q_2^T H Q_2\right)^{-1} Q_2^T H \mu \]
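Here is a compact numpy sketch of this active-set solve (a hypothetical helper, not the class's actual method):

```python
# A sketch (with hypothetical names, not the class's actual method) of the
# active-set solve described above.
import numpy as np

def truncated_maximum(mu, H):
    active = np.flatnonzero(mu < 0)        # dimensions clipped by the truncation
    if active.size == 0:
        return mu.copy()                   # untruncated maximum is feasible
    if active.size == mu.size:
        return np.zeros_like(mu)           # every component is clipped to zero
    A = np.eye(mu.size)[:, active]         # unit-vector columns: A^T alpha = 0
    Q, R = np.linalg.qr(A, mode="complete")
    Q2 = Q[:, active.size:]                # spans the null space of A^T
    tau2 = np.linalg.solve(Q2.T @ H @ Q2, Q2.T @ H @ mu)
    return Q2 @ tau2

# e.g. mu = np.array([-0.3, 1.2]) with H = np.array([[2.0, 0.6], [0.6, 1.0]])
# yields alpha_hat = [0, mu_2 + (H_21 / H_22) * mu_1].
```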

Moments

To derive the formulae for higher-order moments, we will use the same hybrid functional form as in the sampling section:

\[ f(\alpha) = e^{-\frac{1}{2}(\alpha-\mu)^T H (\alpha-\mu)} \]

As we are interested in normalized moments, we will not include an additional arbitrary scaling, and we will include the truncation in the region of integration; the moments we are interested in are:

\[ m_0 = \int\!\!d\Omega\;f(\alpha) \]

\[ m_1 = \frac{1}{m_0}\int\!\!d\Omega\;\alpha\;f(\alpha) \]

\[ M_2 = \frac{1}{m_0}\int\!\!d\Omega\;(\alpha - m_1)(\alpha - m_1)^T\;f(\alpha) \]

Since \(m_0\) is trivially related to the integrals discussed above, we will start with \(m_1\), noting that

\[ m_0 H (m_1-\mu) = \int\!\!d\Omega\;H(\alpha-\mu)\;f(\alpha) = -\int\!\!d\Omega\;\nabla f(\alpha) \equiv \hat{m}_1 \]

By the divergence theorem, this is equivalent to the lower-dimensional boundary integral

\[ \hat{m}_1 = -\int\!\!d\Gamma\;\hat{n}\;f(\alpha) \]

where \(\hat{n}\) is a unit vector normal to \(\Gamma\), pointing outwards from the center of \(\Omega\). In 2-d, this line integral can be split into three components:

  • an integral along \(\alpha_1=0\) from \(\alpha_2=0\) to \(\alpha_2=\infty\), with \(\hat{n}_1=-1\) and \(\hat{n}_2=0\).
  • an integral along \(\alpha_2=0\) from \(\alpha_1=0\) to \(\alpha_1=\infty\), with \(\hat{n}_1=0\) and \(\hat{n}_2=-1\).
  • an integral along a connecting path at infinity, which we can ignore because the integrand is zero there.

Thus, for component \(i\) (and \(j \ne i\)), we have

\[ \left[\hat{m}_1\right]_i = \int\!\!d\alpha_j\;f(\alpha)\big|_{\alpha_i=0} \]
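The following sketch (illustrative values only, not LSST code) checks this first-moment identity numerically against direct quadrature:

```python
# A sketch checking the first-moment identity: each component of m_hat_1 is
# a 1-d integral of f along the opposite coordinate axis, and m_1 follows
# from m_0 H (m_1 - mu) = m_hat_1.
import numpy as np
from scipy.integrate import dblquad, quad

mu = np.array([1.0, 0.5])
H = np.array([[2.0, 0.6], [0.6, 1.0]])

def f(a1, a2):
    d = np.array([a1, a2]) - mu
    return np.exp(-0.5 * d @ H @ d)

m0, _ = dblquad(lambda y, x: f(x, y), 0, np.inf, 0, np.inf)
m_hat_1 = np.array([quad(lambda a: f(0.0, a), 0, np.inf)[0],   # along alpha_1 = 0
                    quad(lambda a: f(a, 0.0), 0, np.inf)[0]])  # along alpha_2 = 0
m1 = mu + np.linalg.solve(H, m_hat_1) / m0

# Cross-check against direct quadrature of the first moment.
m1_direct = np.array([dblquad(lambda y, x: x * f(x, y), 0, np.inf, 0, np.inf)[0],
                      dblquad(lambda y, x: y * f(x, y), 0, np.inf, 0, np.inf)[0]]) / m0
assert np.allclose(m1, m1_direct, rtol=1e-5)
```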