Nongaussian KL Expansion

Intro

Suppose we have a scalar-valued stochastic process $u(x, \omega)$ for $x \in \mathcal{D}$ and $\omega \in \Omega$, where $\mathcal{D}$ describes our physical domain (spatial + temporal) and $\Omega$ describes the sample space. In other words, suppose $\mathbb{E}[u(x, \cdot)\, u(y, \cdot)] = C(x, y)$ for some symmetric positive definite operator $C$ and $u(\cdot, \omega) \in L^2(\mathcal{D})$ for almost every $\omega$. Making some relatively mild assumptions about $C$, i.e., that it is compact and bounded, we can see the KL expansion as

$$u(x, \omega) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, \phi_k(x)\, \xi_k(\omega),$$

where we can define the eigenpairs $(\lambda_k, \phi_k)$ by

$$\int_{\mathcal{D}} C(x, y)\, \phi_k(y)\, \mathrm{d}y = \lambda_k\, \phi_k(x)$$
^992e28

and the stochastic coefficients by

$$\xi_k(\omega) = \frac{1}{\sqrt{\lambda_k}} \int_{\mathcal{D}} u(x, \omega)\, \phi_k(x)\, \mathrm{d}x.$$
^b94337

Then, by construction,

$$\mathbb{E}[\xi_k] = 0, \qquad \mathbb{E}[\xi_j\, \xi_k] = \delta_{jk},$$

which simply comes from the orthonormality of $\{\phi_k\}$. Suppose further that $u$ is a Gaussian process (GP) (i.e., any collection of points $x_1, \dots, x_n$ gives $u(x_1, \omega), \dots, u(x_n, \omega)$ s.t. they are jointly Gaussian). Then, one can prove that $\xi \sim \mathcal{N}(0, I)$, meaning we can easily sample from $u$ given the eigenvalues and eigenfunctions (see the Nystrom method).

It suffices to show that the $\xi_k$ are jointly Gaussian (identity covariance and zero mean then give the exact shape). This loosely comes from the fact that if you use a convergent quadrature scheme to approximate the integral in ^b94337, each step will be a weighted sum of Gaussian variables, which is itself Gaussian.
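As a concrete numerical sketch (everything below is an illustrative assumption, not a choice from these notes: the squared-exponential kernel, the grid, and the 99.9% energy cutoff), the eigenproblem in ^992e28 can be discretized on a quadrature grid and, in the GP case, new realizations sampled directly:

```python
import numpy as np

# Illustrative setup (assumptions, not from the notes): squared-exponential
# covariance on a uniform grid over [0, 1].
n = 200
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
C = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.1**2)

# Nystrom-style discretization of the Fredholm eigenproblem in ^992e28:
# eigenpairs of the kernel matrix weighted by the quadrature weight dx.
lam, phi = np.linalg.eigh(C * dx)
order = np.argsort(lam)[::-1]            # sort eigenvalues descending
lam, phi = lam[order], phi[:, order]
phi /= np.sqrt(dx)                       # normalize: sum(phi_k**2) * dx = 1

# Truncate once the spectrum has captured (say) 99.9% of the energy.
K = int(np.searchsorted(np.cumsum(lam) / np.sum(lam), 0.999)) + 1

# For a GP, xi ~ N(0, I_K), so new realizations are a matrix-vector product.
xi = np.random.randn(K)
u_sample = phi[:, :K] @ (np.sqrt(lam[:K]) * xi)
```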

Nongaussian SP

The problem arises when $u$ is not a GP; we of course maintain zero mean and identity covariance, but the higher moments will become nontrivial (and thus the joint distribution of $\xi$ nongaussian). We can express the covariance function as

$$C(x, y) = \sum_{k=1}^{\infty} \lambda_k\, \phi_k(x)\, \phi_k(y)$$

and consider the $K$-term truncation of such a series. We can express $\phi_1$ as the leading eigenfunction of $C$, $\phi_2$ as the next, and so on. I believe, though obviously this isn't rigorous, that this implies a certain "triangular" nature of the $\xi_k$, in that they are well-ordered in the Knothe-Rosenblatt sense.

Suppose we, for the time being, can perform the integrals in ^992e28, ^b94337 analytically (or, at the very worst, "close enough"), and thus can find the eigenfunctions and -values. Therefore, for a given set of samples $\{u^{(i)}\}_{i=1}^{N}$ of the stochastic process, we can obtain samples $\xi^{(i)} \in \mathbb{R}^{K}$ with $\xi^{(i)}_k = \lambda_k^{-1/2} \int_{\mathcal{D}} u^{(i)}(x)\, \phi_k(x)\, \mathrm{d}x$, where $K$ is our truncation level of the KL expansion and $\xi \sim \pi$ for some unknown distribution $\pi$. We can calculate an acceptable $K$ before any discussion of how to describe $\pi$ using samples; the acceptable truncation error comes exactly from the decay of the covariance kernel spectrum.
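A minimal sketch of this projection step, assuming eigenpairs discretized on a grid as in the sketch above (the function name and arguments are hypothetical):

```python
import numpy as np

# Hypothetical helper: quadrature version of the integral in ^b94337.
# U is an (N, n) array of solution samples on the grid; phi, lam, dx come
# from a discretized eigenproblem as in the earlier sketch.
def project_to_kl(U, phi, lam, dx, K):
    return (U @ phi[:, :K]) * dx / np.sqrt(lam[:K])
```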

If we could sample $\xi \sim \pi$ exactly, we could create samples of $u$ "easily" and "cheaply". The problem comes down to approximating the distribution of $\xi$. In practice, we generally have a stochastic process $u(x; a(\omega))$ where $a$ is a stochastic input to the process (consider, e.g., random diffusivity or initial condition, where $u$ is the solution to a PDE). More on this in On stochastic inputs.

Suppose, though, we only have access to samples of the solution (and no knowledge of/desire for any stochastic inputs). Then, we create an invertible map $S$ such that $S(\xi) \sim \mathcal{N}(0, I_K)$, i.e., we find a transport map where $S_{\sharp} \pi = \mathcal{N}(0, I_K)$. If we can find such a map from the samples $\xi^{(i)}$, then we can sample from $\pi$ via the relationship $\xi = S^{-1}(z)$ for $z \sim \mathcal{N}(0, I_K)$!

From there, we can sample new approximate solutions

$$\tilde{u}(x, \omega) = \sum_{k=1}^{K} \sqrt{\lambda_k}\, \phi_k(x)\, \big(S^{-1}(z(\omega))\big)_k, \qquad z \sim \mathcal{N}(0, I_K).$$
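As a deliberately crude stand-in for $S^{-1}$ (a marginal Gaussianization via empirical CDFs, which ignores the dependence between components that a true triangular map would capture; all names and the placeholder data are illustrative):

```python
import numpy as np
from scipy.stats import norm

def fit_marginal_gaussianization(Xi):
    # Crude stand-in for S^{-1}: matches each marginal of pi via the
    # empirical CDF, ignoring dependence between components. A true
    # Knothe-Rosenblatt map would let component k depend on xi_1..xi_{k-1}.
    sorted_cols = np.sort(Xi, axis=0)
    N, K = Xi.shape
    levels = (np.arange(1, N + 1) - 0.5) / N   # empirical CDF levels

    def S_inv(z):
        u = norm.cdf(z)                        # push z ~ N(0, 1) to uniforms
        return np.array([np.interp(u[k], levels, sorted_cols[:, k])
                         for k in range(K)])
    return S_inv

Xi = np.random.randn(1000, 8)**3               # placeholder nongaussian samples
S_inv = fit_marginal_gaussianization(Xi)
xi_new = S_inv(np.random.randn(8))
```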

On stochastic inputs

If we truly have $u(x; a(\omega))$, where $u$ is a deterministic function of $x$ and a random variable $a$ following distribution $\nu$, then there must exist a deterministic mapping $a \mapsto \xi$, since $\xi$ must then be deterministic for any given realization of $a$. In a distributional sense, we could create a map $T$ such that the operator satisfies $T_{\sharp} \nu = \pi$. This is what is traditionally done in our previous triangular transport work. However, this is not well-posed. We know for certain that the conditional distribution of $\xi$ given $a$ is a degenerate distribution! There's only one possible mapping from a choice of input $a$ to a given KL sample $\xi$.

What to do? A few options, I suppose.

  1. Find a map $T$ such that $T_{\sharp} \nu = \pi$. Note that this is ill-posed; the measures can be equivalent even when the samples get transported poorly. For example, suppose the input is a coin-flip on $\{0, 1\}$ and the output is a coin-flip on $\{1, 2\}$. There are two ways of transporting these distributions, but one of them will guarantee you're always wrong.
  2. Find the map $T$ as above such that $T(a^{(i)}) = \xi^{(i)}$ for each sample pair (see the sketch after this list). This totally disregards interpolation and just matches points (though I suppose, with enough parameters, you could pass overfitting?)
  3. Use something like entropically-regularized OT? Idk enough to say much. Would I need the input/output dimensions to match?
  4. Suppose that $\xi$ is in fact still stochastic for fixed $a$, i.e., $a$ doesn't capture all of the randomness. Then, the distributional approach described above is perfectly well-posed, I think?
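A minimal sketch of option 2, assuming the inputs $a^{(i)}$ have been discretized to vectors; the quadratic feature basis, the ridge term, and all placeholder data below are illustrative choices, not methods from these notes:

```python
import numpy as np

def fit_pointwise_map(A, Xi, reg=1e-8):
    # Least-squares fit of a deterministic map a -> xi on sample pairs.
    # The quadratic feature basis and ridge term are illustrative choices.
    feats = np.hstack([np.ones((A.shape[0], 1)), A, A**2])
    W = np.linalg.solve(feats.T @ feats + reg * np.eye(feats.shape[1]),
                        feats.T @ Xi)
    return lambda a: np.hstack([1.0, a, a**2]) @ W

A_samples = np.random.randn(200, 5)                 # hypothetical inputs a^(i)
Xi = np.tanh(A_samples @ np.random.randn(5, 3))     # placeholder paired xi^(i)
T = fit_pointwise_map(A_samples, Xi)
xi_hat = T(A_samples[0])
```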

Conversation with michael

Interesting behavior of high-order modes

We observe that the high-order stochastic modes of the KLE are increasingly Gaussian (or at least log-concave looking). Why is that? Is there any reason the bimodal behavior shouldn't get pushed out to parameter number 28? The spectral decay is particularly fast, so it isn't altogether unreasonable to guess that, since the KLE hinges on everything being $L^2$, and the Gaussian is a pretty reasonable choice of measure on $L^2$, the trailing modes tend to look log-concave or something like that?

On practical use-cases of this algorithm

The case studied in the SIAM UQ presentation is a reaction with bimodal output concentration. It seems entirely reasonable to then consider a stochastic process source term, for example, which can be represented via the KLE, and we use the diffusion equation, e.g.

$$-\nabla \cdot (\kappa\, \nabla u) = f.$$

Here, $\kappa$ may or may not be stochastic (undecided). Now suppose we represent $u$ spectrally and $f$ as a KLE, i.e.

$$u(x, \omega) = \sum_{j=1}^{J} c_j(\omega)\, \psi_j(x), \qquad f(x, \omega) = \sum_{k=1}^{K} \sqrt{\lambda_k}\, \phi_k(x)\, \xi_k(\omega).$$

Then, consider $A$ such that $A_{jk} = \langle \psi_j, -\nabla \cdot (\kappa\, \nabla \psi_k) \rangle$ is the given inner product here (obviously, without truncation, this is an infinite-dimensional matrix). This can be computed offline up to a fixed $J$ and $K$. Since we are interested in finding $u$ and we draw $\xi$, then we just solve

$$A c = b(\xi), \qquad b_j(\xi) = \sum_{k=1}^{K} \sqrt{\lambda_k}\, \langle \psi_j, \phi_k \rangle\, \xi_k,$$

where $A$, $\lambda_k$, and $\langle \psi_j, \phi_k \rangle$ are all known a priori, i.e., offline. Each time we would like to sample $u$, however, this would require a new matrix solve. But if $A$ is factorized a priori (via, e.g., SVD), then the "online" cost should be on the order of a few matrix multiplications.
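A sketch of the offline/online split (the LU factorization here stands in for whatever factorization is chosen; $A$, the coupling matrix $M_{jk} = \langle \psi_j, \phi_k \rangle$, and the spectrum below are placeholder data, not quantities from these notes):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical offline/online split; A, M, and the spectrum are placeholder
# data standing in for the precomputed Galerkin quantities.
J, K = 64, 16
A = np.eye(J) + 0.01 * np.random.randn(J, J)   # placeholder Galerkin matrix
M = np.random.randn(J, K)                      # placeholder <psi_j, phi_k>
sqrt_lam = np.linspace(1.0, 0.01, K)           # placeholder sqrt(lambda_k)

lu, piv = lu_factor(A)                         # offline: factorize once

def sample_solution_coeffs(xi):
    # Online: one RHS assembly plus one back-substitution per draw of xi.
    b = M @ (sqrt_lam * xi)
    return lu_solve((lu, piv), b)

c = sample_solution_coeffs(np.random.randn(K))
```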

More spectral

Suppose now that $\mathcal{D} = [0, 2\pi]$ and further that $C(x, y) = C(x - y)$, i.e., the process is stationary. Then, the eigenfunctions of $C$ are the Fourier modes. We get that

$$\phi_k(x) = \frac{1}{\sqrt{2\pi}}\, e^{\mathrm{i} k x}, \qquad \lambda_k = \int_{0}^{2\pi} C(r)\, e^{-\mathrm{i} k r}\, \mathrm{d}r.$$

Now we examine the inner product to note that, choosing the Fourier modes for $\psi_j$ as well,

$$\langle \psi_j, \phi_k \rangle = \delta_{jk}.$$

Here, though, we can precisely recover that this has a strong relationship to the Fourier series!
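A quick numerical check of this connection (illustrative, with an assumed kernel and grid): on a uniform periodic grid, a stationary covariance matrix is circulant, and circulant matrices are diagonalized by the DFT, so the discrete KL eigenvalues are the FFT of the kernel's first row and the eigenfunctions are Fourier modes.

```python
import numpy as np

# Stationary kernel C(x - y) on a uniform periodic grid -> circulant matrix.
n = 128
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
d = np.abs(x[:, None] - x[None, :])
r = np.minimum(d, 2.0 * np.pi - d)            # periodic distance
C = np.exp(-0.5 * r**2 / 0.5**2)              # assumed kernel, for illustration

lam_eig = np.sort(np.linalg.eigvalsh(C))[::-1]
lam_fft = np.sort(np.fft.fft(C[0]).real)[::-1]
print(np.max(np.abs(lam_eig - lam_fft)))      # agreement to machine precision
```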

Matt thoughts

  • Problem doesn't actually reduce here? The fact this is spectral doesn't make a huge difference
  • See traditional spectral methods: learning intrusively would be the goal.
  • I don't personally like this-- training the spectral coefficients and the transport map simultaneously seems... dissatisfying

Further Thoughts

  • What's the problem with spectral UQ? The number of basis functions to solve for gets prohibitively large in high dimension.
    Suppose we just do a spectral expansion of $u$, with a linear differential operator $\mathcal{L}$ with random coefficients.
    Practically, we end up solving the projection of $\mathcal{L} u = f$ onto each basis function, i.e., a coupled system for all of the coefficients.
    If all the stochastic basis functions are uncorrelated or something, then we can solve each equation separately via inner product with the corresponding basis function and find the coefficients accordingly. The problem, however, is
  1. What if we don't know the distribution of the input (i.e., we don't know what basis functions to use)?
  2. Even if we did know the input distribution, if it's high-dimensional, we're out of luck (it needs to be a projection in many dimensions and the number of coefficients explodes exponentially in the dimension).
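To make the blow-up in point 2 concrete (a back-of-the-envelope count, assuming a total-degree polynomial chaos basis of degree $p$ in dimension $d$, which has $\binom{d+p}{p}$ elements):

```python
from math import comb

# Size of a total-degree polynomial chaos basis in d dimensions at degree p.
p = 3
for d in (2, 10, 50, 100):
    print(d, comb(d + p, p))   # 10, 286, 23426, 176851
```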