tags:
- approximation
- algorithm
- polynomials
Suppose we have data
where the last step comes from integration by parts assuming
It should be observed that, often in the score-matching literature, researchers will parameterize the gradient of
Therefore, we get that
If we then set
Now suppose
for some
implying that
which implies that
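For reference, the classical score-matching objective obtained by this kind of integration-by-parts argument (Hyvärinen, 2005), assuming the density and score decay at infinity, is
$$ J(\theta) = \mathbb{E}_{p}\!\left[\tfrac{1}{2}\,\|s_\theta(x)\|^2 + \operatorname{tr}\!\big(\nabla_x s_\theta(x)\big)\right] + \text{const}, $$
where $s_\theta$ is the model score; presumably the derivation above arrives at a variant of this.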
Suppose we have
where
so in this case, we can generally assume
We can also form
i.e., find the minimum norm solution. This can be done via allowing
However, while this has an optimal solution in closed form, the objective
While I think this is a convex function (the rightmost term is reminiscent of a log-sum-exp), it's not really tractable as an objective, even when approximating the expectation with an average over the empirical distribution of
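As a side note, one standard closed form for a minimum-norm least-squares problem (which may or may not be the exact setup above) is the Moore–Penrose pseudoinverse; a minimal, generic sketch:

```julia
using LinearAlgebra

# Minimum-norm least-squares solution of an underdetermined system A*c ≈ b
# via the pseudoinverse. A and b are stand-ins, not the objective above.
A = randn(20, 50)        # more unknowns than equations
b = randn(20)
c = pinv(A) * b          # the minimum-norm solution among all solutions
@assert norm(A * c - b) < 1e-8
```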
For simplicity, imagine
Generally, the neural networks for diffusion models are minimizing functions of the form
which approximates the loss
where
with
where we abuse notation to say
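For context, a standard denoising score-matching loss used to train diffusion models (the exact parameterization above may differ) is
$$ \mathbb{E}_{t}\,\mathbb{E}_{x_0}\,\mathbb{E}_{x_t \mid x_0}\!\left[ \lambda(t)\,\big\| s_\theta(x_t, t) - \nabla_{x_t} \log q_t(x_t \mid x_0) \big\|^2 \right], $$
where $q_t(\cdot \mid x_0)$ is the forward (noising) kernel and $\lambda(t)$ is a weighting.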
However, consider that we know
Suppose we have a basis for the data dimension
Similarly, we create a basis
In general dimension
tags:
- algorithm
- integration
share: "true"
Suppose we have two probability measures,
this is well-studied and has well-known restrictions. An alternative here would be the Fisher-Rao distance,
Note that this distance is constrained by a reaction PDE instead of a continuity equation. We can see that
where
but this is irrelevant to the topic at hand.
The PDE constraining the Wasserstein-Fisher-Rao metric will intuitively be
i.e., we use velocity field
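For reference, the dynamic (Benamou–Brenier-type) formulation of the Wasserstein–Fisher–Rao metric, up to convention-dependent constants, constrains curves of measures by the continuity-with-reaction equation
$$ \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = \rho_t \alpha_t, $$
and minimizes an action of the form $\int_0^1 \int \big( \|v_t\|^2 + \alpha_t^2 \big)\, d\rho_t\, dt$ over velocity fields $v_t$ and growth rates $\alpha_t$.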
Then, if I have a measure
See, e.g., YWR23
At first, we may be interested in the KL-divergence function often used in gradient flows, which (for a given distribution
but if we keep
I don't think I need to center the weight derivative? Anyway, if
Suppose we now have
where
where
What's nice here is that this is "forgetful" of our initial state
Take our Hilbert space
so our kernel becomes the Christoffel-Darboux kernel of
with
which means the embedding of an arbitrary function
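For reference, writing $\{p_k\}_{k=0}^{n}$ for the orthonormal polynomials of the underlying measure, the Christoffel–Darboux kernel is
$$ K_n(x, y) = \sum_{k=0}^{n} p_k(x)\, p_k(y), $$
and the induced embedding of a function $f$ is its degree-$n$ truncated expansion $\sum_{k=0}^{n} \langle f, p_k \rangle\, p_k$; I'm assuming this is the convention meant here.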
Assume we want to match points and weights
^3be7e2
with
and somewhat less obviously, suppose we abuse notation to consider a diagonal matrix
which indicates the trace is operating on a matrix with only one nonzero column
Here we maintain control of three things:
We now assume data from
One can compare and contrast this with the minimization problem given in ^021c59. While they are very similar, note that the geometry of the static problem by
Suppose we have node
In this squared-exponential kernel case, we know that
We know that, if we set the RHS to zero, we might get a steady-state solution
The expressions on the right are necessitated by the form of the squared-exponential kernel.
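For reference, the squared-exponential kernel and its gradient, which are presumably what force those expressions, are
$$ k(x, y) = \exp\!\left( -\frac{\|x - y\|^2}{2\ell^2} \right), \qquad \nabla_x k(x, y) = -\frac{x - y}{\ell^2}\, k(x, y). $$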
If
Suppose
I'm not sure of either of these things, though.
We return to the original ODE when
I'm particularly interested if this can be extended to work for flows of functions and, in particular, orthogonal polynomials. Recall that
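(Presumably the relation to recall is the three-term recurrence for orthonormal polynomials,
$$ x\, p_n(x) = a_{n+1}\, p_{n+1}(x) + b_n\, p_n(x) + a_n\, p_{n-1}(x), $$
with recurrence coefficients $a_n, b_n$.)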
It's fairly clear that, if you take the inner product of both sides with
Similarly,
Assuming this is correct, we further restrict our polynomials to have unit magnitude, which automatically indicates that
One could show that (assuming unit variance of the polynomials)
I have no idea what
Another addition is that this becomes somewhat intractable in higher than one dimension due to the finicky nature of recurrence relations on multivariate polynomials.
Finally, I'm not entirely sure what one would linearize here, or how this could possibly be made more efficient.
The subsequent question is what else you would transport. One idea I like is transporting roots so that, at the end of the transport, you approximate the exact roots of the orthonormal polynomials (then one could maybe approximate the value of the Christoffel function or something? I'm not sure).
Alternatively, another idea would be somehow transporting the weights so that you keep the points and the weights change according to the transport (in the vein of tempered importance weighting?).
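For reference, the basic identity behind such re-weighting (writing $\pi$ for the target and $\rho$ for the reference, an assumption about the notation) is
$$ \int f \, d\pi = \int f \, \frac{d\pi}{d\rho} \, d\rho \approx \sum_i w_i \, \frac{d\pi}{d\rho}(x_i)\, f(x_i), $$
for a quadrature rule $\{(x_i, w_i)\}$ on $\rho$.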
Generally, we know
Via our abuse of notation, however,
Suppose we know
and
Additionally, on the corner cases, assuming
Now we can formulate an algorithm:
Assume:
Input: (possibly unnormalized?) density
Output:
Note that this approximates the Euler discretization of the ODE of
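As a generic sketch (the specific ODE and state are not reproduced here, so `velocity` is a placeholder), the Euler discretization being referenced would look like:

```julia
# Forward-Euler discretization of dx/dt = velocity(x, t); a generic sketch,
# since the actual ODE above is elided.
function euler_integrate(x0, velocity; t0 = 0.0, t1 = 1.0, nsteps = 100)
    dt = (t1 - t0) / nsteps
    x, t = x0, t0
    for _ in 1:nsteps
        x = x + dt * velocity(x, t)   # one Euler step
        t += dt
    end
    return x
end
```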
Setting the stage
| Notation | Meaning |
| --- | --- |
|  | target |
|  | reference |
|  | Densities of |
|  | Transport |
|  | Approximation space when considering |
|  | Approximation of |
|  | Quadrature rule on |
|  | function we want expectation of on |
|  | The "transported" quadrature rule, |
|  | Orthogonal polynomials with respect to |
We are interested in, for
Quadrature generally approximates inner products (integrals against a measure) via polynomial approximation of some kind. Gauss quadrature gets phenomenal convergence via the fact that
which comes from Parseval's identity (see Sagiv 22). What we want is that, given a function
or something like that. Note here that
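For context, the classical mechanism (which I think is what's being invoked) is that an $n$-point Gauss rule for $\mu$ is exact on polynomials of degree $\le 2n-1$, so for any such polynomial $q$,
$$ \Big| \int f \, d\mu - \sum_i w_i f(x_i) \Big| \le \int |f - q| \, d\mu + \sum_i w_i\, |f(x_i) - q(x_i)| \le 2\, \|f - q\|_\infty, $$
assuming $\mu$ is a probability measure and the weights are nonnegative and sum to one; quadrature error is thus controlled by best polynomial approximation.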
Suppose we map from the reference
In the above example, the problem is ostensibly that the transport
What I take away from these examples is that having "lighter" tails isn't enough; you want the same tails, i.e., the transport
Suppose
where
This "moves the onus onto
Suppose we have that, for
and, for those fixed nodes
The only difference between
For
where
which ensures some kind of "interpolatory behavior". This gives the loss as
Now we set
which comes from the fact that
Suppose we choose
where the third term
Suppose
however by construction, we can also just plug the simplex solution into the definition of
Input: multi-index set mset_full, which is assumed downward-closed
Output: quad_rules: set of tensor-product quadrature rules and their (possibly negative) associated cardinalities, stored as `Vector{Tuple{Midx, Int}}`
keep_looping = true
quad_rules = []
while keep_looping
# Form the mset for this loop
look at the indices whose occurrence count is not 1
form them into mset_loop
get the reduced frontier `frontier` of mset_loop
# Adjust each frontier member and reindex their ancestors' number of occurrences
for idx_midx_loop in frontier:
# Access the adjustment for this frontier member
get index idx_midx_full corresponding to the multi-index idx_midx_loop in mset_full
set j = occurrences[idx_midx_full] - 1 (e.g., if idx_midx_full occurs $-1$ times, then $j = -2$)
occurrences[idx_midx_full] = 1
# Reindex the number of occurrences of the backward ancestors
get backward ancestors of idx_midx_loop
for each ancestor idx_back_loop of idx_midx_loop
get original index idx_back_full of idx_back_loop (i.e. index of idx_back_loop w.r.t. mset_full, not mset_loop)
occurrences[idx_back_full] -= j
end for
push (idx_midx_full, j) onto quad_rules
end for
keep_looping = occurrences is not identically 1
end while
return quad_rules
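As a sanity check, here is a minimal Julia sketch of the standard "combination technique" coefficients, which (I believe) is what the loop above ultimately produces for a downward-closed multi-index set; the function name and the representation of multi-indices as `Vector{Int}` are my own choices, not fixed by the pseudocode.

```julia
# Signed cardinalities (coefficients) of tensor-product rules for a
# downward-closed multi-index set, via the combination-technique formula
#   c_k = sum over e in {0,1}^d with k+e in mset of (-1)^|e|.
function combination_coefficients(mset_full::Vector{Vector{Int}})
    mset_set = Set(mset_full)
    d = length(first(mset_full))
    quad_rules = Tuple{Vector{Int}, Int}[]
    for k in mset_full
        c = 0
        for bits in Iterators.product(ntuple(_ -> 0:1, d)...)
            e = collect(bits)
            if (k .+ e) in mset_set
                c += iseven(sum(e)) ? 1 : -1
            end
        end
        # Keep only rules with nonzero (possibly negative) coefficient
        c != 0 && push!(quad_rules, (k, c))
    end
    return quad_rules
end
```

Comparing its output against the frontier-based loop on a few small downward-closed sets would be a cheap way to check the occurrence bookkeeping.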