Parameter estimation

Suppose we have the model
where indexes the functions we care about. We are generally interested in . One idea here is that, for fixed (which will be disregarded below), we can follow Spantini and create
and then create samples . Then, we create a map from these samples to a reference and form the composition . Performed recursively, this algorithm approximately samples the distribution .
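As a concrete (and heavily simplified) illustration of an analysis step in this spirit, consider the linear/Gaussian special case, where the map built from joint (state, observation) samples reduces to a regression-style ensemble update. The toy model, dimensions, and noise level below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all choices here are illustrative assumptions): forecast
# ensemble of a 2-d state, scalar observation y = x[0] + noise.
n, d = 500, 2
X = rng.normal([1.0, -1.0], 1.0, size=(n, d))        # forecast samples
obs_noise = 0.5
Y = X[:, :1] + obs_noise * rng.normal(size=(n, 1))   # simulated observations

# In the linear/Gaussian case, the map fit to the joint (state, observation)
# samples reduces to the regression (EnKF-style) update x_a = x_f - K (y_f - y*).
y_star = np.array([0.3])                  # the actually observed datum
C = np.cov(np.hstack([X, Y]).T)           # joint sample covariance
K = C[:d, d:] @ np.linalg.inv(C[d:, d:])  # map / gain coefficients
X_analysis = X - (Y - y_star) @ K.T       # analysis ensemble

print(X_analysis.mean(axis=0))            # pulled toward the observation
```

With a nonlinear (e.g., triangular Knothe-Rosenblatt) parameterization of the map, the same two-step pattern generalizes beyond the Gaussian case, which is the point of the composition above.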

However, we may not know our . Then, everything becomes more difficult. We may, in principle, want to approximate , but oftentimes is just some nuisance parameter (sometimes physical, like viscosity, and sometimes a nonphysical parameter of our model). In the frequentist sense of the nuisance case, we would now have for some unknown . The parameter, as mentioned above, indexes the functions we care about, so we now have, in principle, an infinite number of distributions we may care about. Contrast this with how indexes , where there is some particular we can actually observe, which clearly points us toward one particular distribution.

For more discussion of a state-augmentation approach, see my thesis.

Another idea would be to form a map given each , our th estimate of at timestep . Then, we can do exactly as above to get an ensemble , which is our analysis ensemble given our prior parameter estimate. Getting the posterior-predictive average , we now have pairs . We should finally be able to calculate a map from these samples and then perform . A few notes:

  • We then have to perform a proper analysis step using the "correct" . These will then properly come from .
  • We need to be associated with the distribution . I qualitatively like the average (it seems nice!), but another very reasonable idea would be to just take a sample .
  • There's a statistic that we, in some sense, disregard here: , the parameters of . It would be extraordinarily curious to instead perform inference on the pair .
    • Then, , where .
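The parameter-update loop above can be sketched in a toy setting. Here the parameter is a scalar, the posterior-predictive average is stood in for by a per-parameter ensemble mean, and the final map is again taken to be linear; every modeling choice below is an assumption for illustration, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative toy (model and numbers are assumptions): the parameter t
# shifts a scalar state, x ~ N(t, 1); we observe a datum from the true t.
m = 400                                   # number of parameter proposals
theta = rng.uniform(0.0, 4.0, size=m)     # prior ensemble of parameter estimates

# For each theta_j, form its predictive ensemble and keep the
# posterior-predictive average as the paired statistic y_bar_j.
n = 200
y_bar = np.array([
    rng.normal(t, 1.0, size=n).mean()     # stand-in for the predictive average
    for t in theta
])

# Linear-map (regression) update of theta given the actually observed average.
y_obs = 2.5
c = np.cov(theta, y_bar)                  # 2x2 sample covariance of the pairs
K = c[0, 1] / c[1, 1]
theta_post = theta + K * (y_obs - y_bar)  # conditioned parameter ensemble

print(theta_post.mean(), theta_post.std())
```

The conditioned ensemble concentrates near the parameter value consistent with the observed average, which is the behavior the pairing above is meant to exploit.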

General idea?

Flipping some notation, suppose we have a joint distribution (i.e., static problem, no observations). Consider the idea that parameterizes the likelihood (again, in a somewhat frequentist sense). Then, we may end up sampling , i.e., have an ensemble of samples of for each choice of (representing a distribution). We may want to take some kind of generative-modeling (or surrogate) approach to learning from these samples. One method would be to simply throw out all but one for each (i.e., just take joint samples ), but this wastes precious information. Another idea, perhaps a little more "out-there", would be to learn maps sending to a Gaussian distribution (where every is parameterized identically). Then, each map would be parameterized by some vector . In some sense, this is ideally some kind of summary statistic of a distribution. Then, we have pairs . If we learn the map (not necessarily with the same parameterization as ) sending to a Gaussian, then we would be able to draw samples for a given parameter, producing realizations of , which in turn would allow us to plug into our and generate realizations of .
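A crude version of this two-stage idea can be sketched with affine maps: fit a Gaussianizing map per parameter value, record its parameter vector as the summary statistic, learn the parameter-to-statistic relationship, and then push reference samples back through the predicted map. The model family, the affine map parameterization, and the linear fit below are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sketch (the model family is an assumption): for each t we have
# samples x ~ N(t, (0.5 + 0.1*t)^2).  An affine Gaussianizing map
# z = (x - mu) / sigma is parameterized by lam = (mu, log sigma), a crude
# stand-in for the per-distribution summary statistic described above.
thetas = np.linspace(0.5, 3.0, 20)
lams = []
for t in thetas:
    x = rng.normal(t, 0.5 + 0.1 * t, size=300)   # ensemble for this t
    lams.append([x.mean(), np.log(x.std())])     # fitted map parameters
lams = np.array(lams)

# Learn t -> lam with a simple linear least-squares fit per component.
A = np.vstack([thetas, np.ones_like(thetas)]).T
coef, *_ = np.linalg.lstsq(A, lams, rcond=None)

# Generative use: pick a new t, predict lam, push reference (Gaussian)
# samples through the inverse affine map to get realizations of x | t.
t_new = 2.0
mu, log_sig = np.array([t_new, 1.0]) @ coef
x_new = mu + np.exp(log_sig) * rng.normal(size=1000)

print(x_new.mean(), x_new.std())   # approximately recovers mean and spread at t_new
```

Replacing the affine maps with richer parameterized families would make the recorded vectors closer to the summary statistic the paragraph above has in mind, at the cost of a harder regression problem.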