class: center, middle
.title[Local polynomial fitting on the sphere]
.subtitle[*A mapping solution for the Earth sciences*]
.author[Jonathan M. Lilly]
.institution[The Planetary Science Institute, Tucson, Arizona]
.institution[Carnegie Mellon University]
.date[December 5, 2025]
.note[Created with [{Liminal}](https://github.com/jonathanlilly/liminal) using [{Remark.js}](http://remarkjs.com/) + [{Markdown}](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) + [{KaTeX}](https://katex.org)]

---
class: left
##Satellite Monitoring of Sea Surface Height
- Sea surface height observed along tracks of an orbiting satellite
- Dating back to 1992, one of the great oceanographic datasets
- What is desired is a map of height on a regular grid

This leads to a problem in mapping scattered data, a central statistical problem that comes up in many different fields.

---
class: left
##Motivation

There does exist a mapped product based on the along-track data, known as CMEMS. This is one of the most important oceanographic data products, but it is unsatisfying for several reasons.

- Not open source
- Complex algorithm
- Full details of the algorithm are not made public
- Processing artifacts have been a problem
- Likely oversmoothed

In this talk I'm going to present an open-source alternative using a very simple algorithm. To keep things simple, I'm only working with a subset of the along-track data, the *Jason-class* data on the previous slide. Along the way we're going to learn a lot about the fundamentals of mapping methods.

---
class: center
##A (Forthcoming) Open-Source Product
Sea level anomaly from CMEMS (top) vs. new product (bottom).

---
class: left
##Comparison of the Two Products

The two animations in the previous slide appear quite comparable. Yet whereas the upper panel incorporates Jason-class, ERS-class, Geosat-class, and other missions such as Saral and Cryosat, the lower panel uses Jason-class data only.

In other words, a refinement in the mapping algorithm has led to a gridded product comparable to that of CMEMS despite using a fraction of the data. This suggests that the current Optimal Interpolation algorithm is suboptimal and is substantially oversmoothing the data. It also suggests that incorporating all available altimeter data into a refined mapping method would lead to greatly improved products. That, however, is a large task that would require a dedicated effort.

(Note: close inspection shows occasional "stripes" in the lower panel due to long-wavelength error; this should be seen as an upstream data processing issue rather than a mapping issue.)

---
class: left
##Mapping Scattered Data

This problem of mapping (or smoothing, or interpolating) scattered data is one of the main problems in statistical analysis, and as such, a number of methods exist, most known by various names.

- **optimal interpolation** (Bretherton et al., 1976), a.k.a. objective analysis, statistical interpolation, Gauss-Markov interpolation, and simple kriging (Matheron, 1963b)
- **smoothing splines** (Wahba and Wendelberger, 1980; Silverman, 1985; Gu, 2002)
- **local polynomial fitting**, a.k.a. local polynomial smoothing (Stone, 1977; Fan and Gijbels, 1996; Ruppert and Wand, 1994), locally weighted least squares regression, local regression, and LOWESS (for LOcally WEighted Scatterplot Smoothing) or loess (Cleveland, 1979; Cleveland and Devlin, 1988)
- **kernel smoothing**, a.k.a. kernel estimation and simple smoothing (Fan and Gijbels, 1996), formally the 0th-order subset of local polynomial fitting

A *massive* problem is that the literature is distributed among different communities, each with different terminologies!

---
class: left
##A Variety of Methods

All of the methods listed are linear in the data. Thus, in a certain sense, they are all equivalent (e.g., one can convert a spline norm into a polynomial fit), yet their implementations and philosophies are very different. There is some very interesting work toward an understanding of this equivalence (Kimeldorf and Wahba, 1970; Watson, 1984; McIntosh, 1990; Furrer and Nychka, 2007), but it remains incomplete. Sorry, no time for this today!

Local polynomial fitting has a number of attractive properties:

- simple
- fast
- easy to implement and adjust
- performs well with irregular data spacing
- doesn't require assumptions about e.g. covariance functions
- lots of theoretical results
- you get derivative estimates for free

Unlike splines, it has not yet been generalized to the sphere.

---
class: left
##Local Polynomial Fitting

The idea of local polynomial fitting is to minimize, within the vicinity of some mapping point $(x\_o,y\_o)$, the weighted squared error $\Delta\_p^2(x\_o,y\_o)$ between the observations and a $p$th-order polynomial.

The $N$ observations are arranged into $N$-vectors, with $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$ being the $x$- and $y$-locations, and $\tilde{\mathbf{z}}$ being the observed values. Then the weighted squared error in the vicinity of point $(x\_o,y\_o)$ is
\[\Delta_p^2(x_o,y_o)\equiv\left\|\mathbf{W}\left(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o\right)\left[\tilde{\mathbf{z}}-\mathbf{X}_p\left(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o\right)\widehat{\bm{\beta}}_p\left(x_o,y_o\right)\right]\right\|^2\]
where $\mathbf{W}$ is a matrix of weights, $\mathbf{X}\_p$ is a matrix of coordinate deviations containing powers up to $p$, and $\widehat{\bm{\beta}}\_p$ is the estimated field and its first $p$ derivatives.

There are three major degrees of adjustability: the order $p$, the shape of the weighting function used in $\mathbf{W}$, and the scaling radius for the weighting function, known as the *bandwidth*.

---
class: left
##Estimate Vector $\widehat{\bm{\beta}}_p$ for Different Orders

The vector $\bm{\beta}_p$ contains the mean field $z$ and its first $p$ derivatives.
\[\bm{\beta}_0(x_o,y_o)\equiv \begin{bmatrix} z(x_o,y_o) \end{bmatrix}\]
\[\bm{\beta}_1(x_o,y_o)\equiv \begin{bmatrix} z(x_o,y_o)\\[3pt] \frac{\partial z}{\partial x} (x_o,y_o) \\[3pt] \frac{\partial z}{\partial y} (x_o,y_o) \end{bmatrix}\]
\[\bm{\beta}_2(x_o,y_o)\equiv \begin{bmatrix} z(x_o,y_o)\\[3pt] \frac{\partial z}{\partial x} (x_o,y_o) \\[3pt] \frac{\partial z}{\partial y} (x_o,y_o)\\[3pt] \frac{\partial^2z}{\partial x^2} (x_o,y_o)\\[3pt] \frac{\partial^2z}{\partial x\partial y} (x_o,y_o)\\[3pt] \frac{\partial^2z}{\partial y^2} (x_o,y_o) \end{bmatrix}\]
which is of length $Q$, with $Q=1$, 3, and 6 for $p=0$, 1, and 2. The estimate vector $\widehat{\bm{\beta}}_p$ is then our error-minimizing estimate of $\bm{\beta}_p$.

---
class: left
##Deviation Matrix $\mathbf{X}_p$ for Different Orders

The matrix $\mathbf{X}_p$ contains the coordinate deviations necessary for a Taylor expansion up to $p$th order.
\[\mathbf{X}_0(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o) \equiv \left.\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}\right\} N \,\,\mathrm{terms} \]
\[\mathbf{X}_{1}(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o) \equiv \begin{bmatrix} 1 & \tilde{x}_{1}- x_o & \tilde{y}_{1} -y_o\\ 1 & \tilde{x}_{2}- x_o & \tilde{y}_{2} -y_o\\ \vdots & \vdots & \vdots \\ 1 & \tilde{x}_N- x_o & \tilde{y}_N -y_o \end{bmatrix} \]
\[\mathbf{X}_{2}(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o) \equiv \begin{bmatrix} 1 & \tilde{x}_{1}' & \tilde{y}_{1}'& \frac{1}{2}\tilde{x}_{1}'{^2} & \tilde{x}_{1}'\tilde{y}_{1}' &\frac{1}{2}\tilde{y}_{1}'{^2} \\[3pt] 1 & \tilde{x}_{2}' & \tilde{y}_{2}'& \frac{1}{2}\tilde{x}_{2}'{^2} & \tilde{x}_{2}' \tilde{y}_{2}' &\frac{1}{2} \tilde{y}_{2}'{^2} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & \tilde{x}_N' & \tilde{y}_N'& \frac{1}{2} \tilde{x}_N'{^2} & \tilde{x}_N' \tilde{y}_N' &\frac{1}{2}\tilde{y}_N'{^2} \end{bmatrix} \]
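As a sketch, the three matrices above can be assembled with a single helper; this is a hypothetical numpy function for illustration, not part of any released code:

```python
import numpy as np

def deviation_matrix(x, y, xo, yo, p):
    """Assemble the N x Q Taylor deviation matrix X_p at the mapping
    point (xo, yo), for fit orders p = 0, 1, or 2."""
    dx = np.asarray(x, dtype=float) - xo   # x-deviations from mapping point
    dy = np.asarray(y, dtype=float) - yo   # y-deviations from mapping point
    cols = [np.ones_like(dx)]              # p = 0: column of ones
    if p >= 1:
        cols += [dx, dy]                   # first-order columns
    if p >= 2:
        cols += [0.5 * dx**2, dx * dy, 0.5 * dy**2]  # second-order columns
    return np.column_stack(cols)           # N x Q, with Q = 1, 3, or 6
```

For $p=2$ this reproduces the six columns shown above, including the factors of one-half on the squared terms.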
In the last expression, we define $\tilde x\_n'\equiv \tilde x\_n - x\_o$ and $\tilde y\_n'\equiv \tilde y\_n - y\_o$.

---
class: left
##The Weighting Matrix $\mathbf{W}$

Finally, the weighting matrix $\mathbf{W}$ is defined as
\[ \mathbf{W}\left(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o\right) \equiv \begin{bmatrix} K_h\left(\tilde{r}_{1}\right) & 0& \cdots & 0\\ 0 & K_h\left(\tilde{r}_{2}\right) & \cdots &0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & K_h\left(\tilde{r}_N\right) \end{bmatrix}\]
where $\tilde r\_n\equiv \sqrt{\left(\tilde{x}\_n-x\_o\right)^2+\left(\tilde{y}\_n-y\_o\right)^2}$ is the distance between the mapping point $(x\_o,y\_o)$ and the $n$th observation point $(\tilde x\_n,\tilde y\_n)$.

Here $K_h(r)\equiv K(r/h)$, where $K(r)$ is a symmetric, decaying function called the kernel. Generally $K(r)$ is set to vanish outside of a prescribed radius, chosen here to be $r=1$. The rescaled kernel $K_h(r)$ then vanishes outside of radius $r=h$, where $h$ is known as the *bandwidth*.

---
class: left
##The Error-Minimizing Solution

The problem of finding the local polynomial that minimizes the weighted error
\[\Delta_p^2(x_o,y_o)\equiv\left\|\mathbf{W}\left(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o\right)\left[\tilde{\mathbf{z}}-\mathbf{X}_p\left(\tilde{\mathbf{x}},\tilde{\mathbf{y}},x_o,y_o\right)\widehat{\bm{\beta}}_p\left(x_o,y_o\right)\right]\right\|^2\]
can be shown to have the solution
\[\widehat{\bm{\beta}}_{p}\left(x_o,y_o\right)=\left(\mathbf{X}_p^T \mathbf{W}\mathbf{X}_p\right)^{-1} \mathbf{X}_p^T \mathbf{W} \tilde{\mathbf{z}}\]
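In code, the estimate at a single mapping point is a small weighted least-squares problem. A minimal numpy sketch using a simple parabolic kernel for illustration; function and variable names are illustrative, not from any released implementation:

```python
import numpy as np

def local_fit(x, y, z, xo, yo, h, p=1):
    """Estimate beta_hat = (X^T W X)^{-1} X^T W z at the mapping point
    (xo, yo), using a parabolic kernel of bandwidth h."""
    dx = np.asarray(x, dtype=float) - xo
    dy = np.asarray(y, dtype=float) - yo
    r = np.hypot(dx, dy) / h                    # nondimensional distance
    w = np.where(r < 1, 1 - r**2, 0.0)          # kernel weights K_h(r_n)
    cols = [np.ones_like(dx)]                   # build X_p column by column
    if p >= 1:
        cols += [dx, dy]
    if p >= 2:
        cols += [0.5 * dx**2, dx * dy, 0.5 * dy**2]
    X = np.column_stack(cols)
    W = np.diag(w)
    # Solve the Q x Q normal equations rather than forming the inverse
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(z, dtype=float))
```

For a $p=1$ fit, this returns the estimated field value and its two first derivatives at $(x_o,y_o)$.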
The heart of this computation is inverting the $Q\times Q$ matrix $\mathbf{X}_p^T \mathbf{W}\mathbf{X}_p$. In the case that $p=0$, a constant fit, it can be shown that this reduces to a simple kernel smoothing.

The main free parameters of local polynomial fitting are thus the fit order $p$, the bandwidth $h$, and the choice of kernel $K(r)$. Anisotropic smoothing, involving a matrix-valued bandwidth parameter, can also be employed, but will not be considered here.

---
class: center
##Design Adaptivity and Fixed Population
**The zeroth-order fit has a severe defect:** when both the observation density and the field have gradients, bias results! Fits with $p=1$ or larger do not have this bias, a property called *design adaptivity* by Fan (1992).

Compare the (a) true field, (b) zeroth-order or $p=0$ fit, and (c) first-order or $p=1$ fit. The problem of gaps arising from low data density with a fixed bandwidth, as in (c), is then solved by letting the bandwidth locally expand to encompass a specified number of observation points, as in panel (d). This is termed the *fixed population* algorithm herein.

---
class: left
##Choice of Kernel Function

A variety of different choices of weighting kernel $K(r)$ have been employed in the literature. We wish to compare them in practice. Here we propose the general form
\[K(r;\alpha,\beta)\equiv \kappa_{\alpha,\beta} (1-r^\alpha)^\beta \]
where $\kappa_{\alpha,\beta}$ is a normalizing constant. **This family subsumes the most commonly used kernels as special cases.**

The use of a general form allows us to readily determine the optimal choice of parameters in an observing system simulation experiment through numerical optimization.

This kernel is usefully reparameterized in terms of the *half-power radius* $R$ and *shape parameter* $\alpha$, with $\beta$ then given by
\[\beta(R,\alpha)\equiv \frac{\ln(1/2)}{\ln\left(1-R^\alpha\right)} \]
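This reparameterized kernel is simple to evaluate; a brief sketch (unnormalized, since the constant $\kappa_{\alpha,\beta}$ cancels in the weighted least-squares fit; the function name is illustrative):

```python
import numpy as np

def kernel(r, R, alpha):
    """Evaluate the generalized kernel K(r; alpha, beta(R, alpha)),
    with half-power radius R and shape parameter alpha; unnormalized."""
    beta = np.log(0.5) / np.log(1 - R**alpha)   # beta(R, alpha) as above
    r = np.asarray(r, dtype=float)
    inner = (1 - np.clip(r, 0.0, 1.0)**alpha)**beta
    return np.where(r < 1, inner, 0.0)          # vanishes outside r = 1
```

By construction, the value at $r=R$ is one half of the value at $r=0$.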
Note that $K(R;\alpha,\beta)/K(0;\alpha,\beta)=1/2$ by construction.

---
class: center
##The Generalized Kernel for $\alpha=2$
The advantage of this kernel is that it lets us have different weighting behaviors in an inner and an outer region. The curves show different $\beta$ for $\alpha=2$. The kernel varies smoothly between a Gaussian-like function and a step function.

---
class: center
##The Generalized Kernel for $\alpha=3$
---
class: center
##The Generalized Kernel for $\alpha=4$
The transition at the inner radius becomes sharper as $\alpha$ increases.

---
class: center
##Bias Estimates
The existence of extensive theory surrounding local polynomial fitting allows errors to be estimated with reasonable accuracy. Here, the true mapping bias (a) is compared with two different estimates that do not rely on the true (generally unknown) field.

---
class: center
##Local Polynomial Fitting on the Sphere
Local polynomial fitting on the sphere can be accomplished by projecting data into a plane tangent to the mapping point $(x_o,y_o)$. Then the previously presented equations all hold, with $\tilde{\mathbf{x}}(x_o,y_o)$ and $\tilde{\mathbf{y}}(x_o,y_o)$ now giving the locations within the tangent plane.

---
class: left
##Local Polynomial Fitting on the Sphere

Let $z(\theta,\phi)$ be a function of longitude $\theta$ and latitude $\phi$, sampled at $N$ data points with longitudes $\tilde\theta_n$, latitudes $\tilde\phi_n$, and observed values $\tilde z_n$. We aim to perform a local polynomial fit at $\left(\theta_o,\phi_o\right)$. To do so we project into the tangent plane, yielding
\[\tilde x_n\left(\tilde{\theta}_n,\tilde{\phi}_n,\theta_o,\phi_o\right) \equiv R \cos \tilde{\phi}_n\,\sin\left(\tilde{\theta}_n-\theta_o\right)\]
\[\tilde y_n\left(\tilde{\theta}_n,\tilde{\phi}_n,\theta_o,\phi_o\right)\equiv R\cos\phi_o\sin\tilde{\phi}_n -R \sin\phi_o\cos\tilde{\phi}_n\,\cos\left(\tilde{\theta}_n-\theta_o\right)\]
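The two projection expressions translate directly into code; a brief sketch (angles in radians, with the Earth radius in kilometers assumed for $R$; the function name is illustrative):

```python
import numpy as np

def tangent_plane(lon, lat, lon_o, lat_o, R=6371.0):
    """Project points (lon, lat) onto the plane tangent to the sphere
    at (lon_o, lat_o); angles in radians, R in kilometers."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    dlon = lon - lon_o
    x = R * np.cos(lat) * np.sin(dlon)               # eastward coordinate
    y = R * (np.cos(lat_o) * np.sin(lat)             # northward coordinate
             - np.sin(lat_o) * np.cos(lat) * np.cos(dlon))
    return x, y
```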
Then we write
\[Z_o(x,y)\equiv z\left(\theta(x,y,\theta_o,\phi_o),\phi(x,y,\theta_o,\phi_o)\right)\]
and apply local polynomial fitting to $x$ and $y$. We do this at every point $(x_o,y_o)$ at which we want to create a map. That's it.

---
class: left
##Determining Optimal Mapping Parameters
A so-called *observing system simulation experiment* is performed using a global model sampled by a synthetic altimeter. Since we know the correct answer, we can compare it with sea surface height mapped from the altimeter observations. (Thanks to Harper Simmons for the model fields.)

---
class: left
##Determining Optimal Mapping Parameters

We sweep through parameter space in order to find the best set of parameters. These are allowed to vary as a function of latitude. Error is quantified by the averaged squared deviation between the resulting map and the colocated true model value. If any gaps emerge in the mapped product between +/- 66 degrees latitude for a certain parameter choice, the error is set to infinity, guaranteeing no gaps.

After this sweep, a simple parametric form is chosen if the error-minimizing choice is observed to vary with latitude, and its parameters are estimated through numerical optimization. Otherwise, the best constant value across latitudes is determined. This is done within the context of fixed-population fits, as these naturally adapt to data gaps.

The best linear ($p=1$) and best quadratic ($p=2$) fits are comparable and are *much* better than $p=0$. We'll look at $p=2$.

---
class: center
##Determining Parameter Sensitivity
Inner radius $R$ (a), population $N$ (b), and shape parameter $\alpha$ (c). **Fit quality does not strongly depend on total population / outer radius, but rather on inner radius.** This does not match asymptotic theory, which says the $\beta=1$, $\alpha=2$ or *Epanechnikov* kernel $(1-r^2)$ is optimal.

---
class: center
##Optimal Latitude-Dependent Parameters
The heavy black curves represent the best-fit parameters from a sweep of ~10K parameter combinations. Choosing a fixed population of $N=500$ leads to a bandwidth $H$ that varies spatially.

---
class: center
##Optimal Inner Radius
The optimal inner radius is by construction independent of longitude and varies from large values near the equator to small values at the poles. Note that there are a few places where the optimal value is not achievable due to topographic constraints.

---
class: center
##Optimal Bandwidth
Here, the smoothing radius expands and contracts in order to encompass $N=500$ sampling points.

Conventional wisdom says you want to smooth using spatially fixed parameters. This is **not true** for fits of order $p>0$!
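The fixed-population rule behind this expanding and contracting radius is easy to sketch: the bandwidth is simply the distance from the mapping point to its $N$th-nearest observation (a minimal illustration; names are not from any released code):

```python
import numpy as np

def fixed_population_bandwidth(x, y, xo, yo, N=500):
    """Return the bandwidth h enclosing exactly N observations: the
    distance from (xo, yo) to its Nth-nearest observation point."""
    r = np.hypot(np.asarray(x, dtype=float) - xo,
                 np.asarray(y, dtype=float) - yo)
    return np.sort(r)[N - 1]      # distance to the Nth-nearest observation
```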
---
class: center
##Optimal $\beta$ Parameter
The imprint of the sampling grid is readily evident. What controls this remaining error?

---
class: center
##Mean Value of the Laplacian Magnitude
The remaining error is due to the mean *curvature* of the surface.

---
class: center
##Close-Up of the True Field
A zoom into the Gulf Stream area, a high mean curvature region. The features leading to the curvature are oceanic eddies.

---
class: center
##Truth Minus Estimate
Errors are occurring because small-scale features (eddies) fall between the cracks of the observations. This can't be solved within the context of local polynomial fitting; thus, it's likely we've done the best this method can do.

---
class: left
##Comparison with Optimal Interpolation

In the oceanographic community, Optimal Interpolation (OI) is by far the most common mapping method. Local polynomial fitting (LPF) and OI differ in both theory and practice. Factors in favor of LPF are numerous, including:

* LPF does not require prior knowledge of the covariance structure of the field being mapped.
* Optimal LPF parameters can be chosen or estimated empirically using theoretical results.
* LPF has several degrees of adjustability, whereas OI, strictly speaking, has zero.
* LPF is much faster because the matrix being inverted is vastly smaller: $Q\times Q$, i.e. $6 \times 6$ for a $p=2$ fit, versus $N\times N$ for OI.
* Although OI is theoretically optimal in some sense, key caveats are that (i) real-world performance may depart from optimality on paper and (ii) the sense of optimality is crucial. For example, OI applied to an eddying ocean is optimal if one wishes to smooth out the eddies to estimate the mean, yet paradoxically it is commonly applied to estimate eddy properties.

---
name: conclusions
class: left
## Conclusions

Local polynomial fitting was adapted for use on the sphere, and a generalized kernel was introduced. The method was applied to the oceanographic problem of mapping along-track altimeter data. A sweep through the parameter space with an observing system simulation experiment led to a best fit, and to the following insights:

---
name: conclusions
class: left
## Conclusions

- Using only a fraction of available altimeter data, the results rival CMEMS with a much simpler, open-source product. Using all available data would thus likely yield a superior product.
- Conventional wisdom says (i) the optimal choice of kernel is a parabola but that (ii) the choice of kernel doesn't matter that much. In the ocean mapping problem, neither of these is true.
- The *inner radius* is the most important parameter, because the scale over which substantial weight is applied should match the scale of the eddy features. *But* you still need an outer radius that can expand and contract to grab enough data.
- The optimal radius is one that is *almost too small*. For this reason, the so-called plug-in estimates of optimal parameter values don't perform very well.

Takeaways:

- Local polynomial fitting with $p>0$ and a variable bandwidth is a great all-purpose mapping option.
- Despite lots of theory for choosing parameters, an OSSE is hard to beat for optimizing performance. Use these!

---
class: middle, center
# Thank you!

Publications and dataset coming in 2026!