The Mathematics of BlurXTerminator

BlurXTerminator takes a different approach to deconvolution. Not only does it use a machine learning algorithm, but it was also developed using an updated mathematical formulation of deconvolution itself. Implicit in the classical formulation of deconvolution is the goal of perfect recovery of an original signal. By recognizing that this represents an unobtainable absolute, and by instead embracing the fundamental limitations of deconvolution, a new formulation results that is workable and provides a practically achievable goal. This approach also results in a formulation that is well-suited to training a machine learning algorithm.

The classical description of deconvolution begins with describing its inverse operation, convolution:


where $f$ is the original signal (e.g., image), $g$ is a point spread function (PSF), and $h$ is the image produced by the convolution of $f$ with $g$. Deconvolution is defined as inverting this process: recovering $f$ given $h$ and $g$.

Real images contain noise, and the above is usually modified to express this explicitly:


Because of the noise, $n$, and also because of imperfect knowledge of the PSF, $g$, amongst other factors such as finite numerical precision, there is no such thing as perfect deconvolution in practice. All deconvolution algorithms produce an estimate of the original image, $f$. In other words, real deconvolution algorithms, run on real image data, cannot completely deconvolve an image to a zero-width PSF. Their results can instead be described as the original image, $f$, convolved with a new (hopefully smaller) PSF, $g’$:


where $h’$ is the output of the deconvolution algorithm. It can be easily recognized that in the limit of taking $g’$ to an ideal Dirac delta function, this formulation becomes identical to the classical description of deconvolution.

Stated another way, and in the context of astronomical images, an ideal algorithm in the classical sense would transform every star in the image into a perfect point source. This is never observed in practice. It is an “upper bound” that can be approached but never reached. Perfect deconvolution, therefore, exists only as an idealized concept. What is observed in practice is that the existing PSF, $g$, is in effect replaced by a smaller PSF, $g’$.

Real deconvolution algorithms also introduce artifacts. The most familiar example is ringing. This causes dark halos around stars and the appearance of false structure in the areas of an image dominated by noise. While ringing by itself could technically be expressed as a PSF that has lobes with negative values, we can also simply lump it and any other errors produced by a given algorithm into a single error term:


The error, $e$, may be related to $f$, $g’$, and $n$ in complex ways. The point of collecting all errors into a single term is the following:

$e = f*g’+n-h’$

Defining $\mathcal{F}[x,\pmb{W}]$ as the operation performed by a machine learning deconvolution algorithm on an input, $x$, using a set a trainable weights, $\pmb{W}$, this becomes:


where $e$ is the training loss function. During training, $f$, $g$, $g’$, and $n$ are perfectly known: start with an ideal input image, convolve it with a chosen PSF, and add noise to produce the input training image, $h=f*g+n$. The corresponding ground truth image, the ideal output of the algorithm, is $f*g’+n$. This is simply the same ideal image convolved with a new (smaller, not aberrated) PSF, $g’$, plus noise.

Formulating deconvolution in the above way also provides control over the amount of deconvolution. By making the output PSF, $g’$, a parameterized quantity, we can specify exactly how much the algorithm should reduce the diameter of the PSF, or even how its other characteristics should be modified. This is in stark contrast to the classical algorithms which not only have the unachievable end goal of a zero-width PSF, but also have no control over whatever final PSF is obtained. They also usually have no clear stopping criteria for their iterations. We run them for as many iterations as we can before artifacts appear, and we get whatever final PSF we get.

Using the above approach, during training the machine learning algorithm produces an output, $\mathcal{F}[f*g+n,\pmb{W}]$, with error, $e$. A suitable optimization algorithm determines the gradient of the error with respect to the weights, $\nabla_{\pmb{W}}e$, and updates the weights iteratively to eventually minimize the error. If the distributions of the features in the ideal images, the chosen PSFs, and the noise are sufficiently broad and representative of real-world scenes and image acquisition systems, an algorithm is obtained that generalizes well to deconvolving real images with high accuracy.

Separating stellar and non-stellar image components

The above formulation of deconvolution can be taken a step further by decomposing the original image, $f$, into stellar and non-stellar components:


This is physical: some features in an astronomical scene originate directly from stars. Other features originate from non-stellar objects – clouds of gas and dust that reflect, emit, and/or absorb light. The actual scene in terms of received light is the arithmetic sum of these components.

Decomposing the original scene in this way, presuming that it is mathematically tractable to do so in practice, and given that the output PSF is a parameterized quantity, an additional degree of freedom in the deconvolution operation results: the stellar and non-stellar features can be deconvolved by different amounts. Stating this formally, while the captured image,


is convolved with a single PSF, $g$, the deconvolved image, $h’$, could, if we so choose, have different PSFs applied to each component:


The motivation for treating stellar and non-stellar features differently in this way is of course purely aesthetic and has no scientific value. Non-stellar features have much lower contrast than stellar features, and can usually be deconvolved more before artifacts appear, an observation that is as true for the classical algorithms as it is for the present algorithm. Equal treatment of all features is still an option if $g’_s$ and $g’_{ns}$ are identical.

Proceeding with the above development, any and all errors in the deconvolution estimate have again been lumped into a single term, $e$, so that the training loss function becomes:


Expanding this as before,


This is in fact precisely how BlurXTerminator’s convolutional network is trained.

Non-stationary point spread functions

The PSF of an image captured with real optics is rarely stationary, which is to say uniform for all positions in the image. Various off-axis aberrations such as coma, astigmatism, and field curvature cause the PSF to vary across the image – features in the corners of an image are rarely as sharp as those in the center. In reality the PSF varies smoothly across an image, and an ideal deconvolution algorithm would use a different PSF estimate for every position in the image.

An approximation to this ideal can be made by turning a limitation of convolutional networks into an advantage.

Efficient implementation of the computations in a convolutional network on modern hardware requires constructing a directed graph that parallelizes these computations as much as possible. The construction of this graph represents a significant amount of setup time, but subsequent computations on input data of the same dimensions can be performed using the same constructed graph. The expensive setup time can thus be amortized over a number of input data sets: an image can be broken up into same-sized chunks, or tiles.

In the case of a deconvolution network, each tile can be deconvolved with an estimated local PSF. Assuming that the PSF does not significantly vary within a tile, and assuming that a reasonable estimate of the local PSF for each tile is available, this can produce a reasonable step-wise approximation to the ideal.

BlurXTerminator is a blind deconvolution algorithm: a PSF estimate is not supplied as an input. The PSF for each tile is instead inferred from the stars present in that tile, stars being excellent approximations of point sources and therefore copies of the PSF with varying brightness and color. While this approach can fail on occasional tiles that contain no stars, in practice this is not frequently observed. A number of measures, such as overlapping the input tiles, are taken to further reduce the probability of not having a reasonable local PSF estimate for every part of an image.

Share this with others...