# POLYSMOOTH is the jMap module of jLab.

``` POLYSMOOTH  Mapping using local polynomial fitting, also known as loess.

POLYSMOOTH generates a map from scattered data in two dimensions using
a locally weighted least squares fit to a polynomial.

This method is variously known as local polynomial fitting, local
polynomial smoothing, multivariate locally weighted least squares
regression, lowess (originally for LOcally WEighted Scatterplot
Smoothing), and loess.  All of these are essentially synonyms.

POLYSMOOTH has the support for all of the following options:

--- Cartesian or spherical geometry
--- a constant, linear, or quadratic fit in space
--- fixed-bandwidth or fixed population algorithms
--- prescribed spatially-varying bandwidth or population
--- multiple choices of weighting function or kernel
--- additional datapoint weighting factors, e.g. by confidence
--- the median-based robustification of Cleveland (1979)

POLYSMOOTH is implemented using a numerically efficient algorithm that
avoids using explicit loops. The data are pre-sorted so that different
mapping parameters can be tried out at little computational expense.

For algorithm details, see Lilly and Lagerloef (2018).
__________________________________________________________________

Local polynomial fitting on the plane

Let's say we have an array Z of data is at locations X,Y.  X,Y, and Z
can be arrays of any size provided they are all the same size.

The problem is to obtain a mapped field ZHAT on some regular grid
specified by the vectors XO and YO.

Calling POLYSMOOTH is a two-step process:

[DS,XS,YS,ZS]=TWODSORT(X,Y,Z,XO,YO,CUTOFF);
ZHAT=POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,H);

The empty arrays mark locations of optional arguments described later.

In the first step, one calls TWODSORT which returns ZS, a 3D array of
data values at each grid point, sorted by increasing distance DS, and
the corresponding positions XS and YS.  See TWODSORT for more details.

CUTOFF determines the maximum distance included in the sorting and
should be chosen to be greater than H.

In the second step, POLYSMOOTH fits a RHOth order spatial polynomial at
each gridpoint within a neighborhood specified by the "bandwidth" H.

The fit is found by minimizing the weighted mean squared error between
the fitted surface and the observations.  The bandwidth H sets the
decay of this weighting function, described in more detail shortly.

The fit order RHO may be chosen as RHO=0 (fit to a constant), RHO=1
(fit to a plane), or else RHO=2 (fit to a parabolic surface).

The data locations (X,Y) and grid point locations (X0,Y0) shoud have
the same units as the bandwidth H (e.g., kilometers).

Note that H may either be a scalar, or a matrix of size M x N to
specify an imposed spatially-varying bandwidth.

The dimensions of XO and YO are M x N x J, where J is the maximum
number of data points within bandwidth cutoff at any grid point
location.  Then ZHAT is matrix of dimension M x N.
__________________________________________________________________

Choice of weighting function or kernel

POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,{H,KERN}) weights the data points in
the vicinity of each grid point by some decaying function of distance
called the kernel, specified by KERN.

KERN = 'uni' uses the uniform kernel, K=1
KERN = 'epa' uses the Epanechnikov (parabolic) kernel K=1-(DS/H)^2
KERN = 'bis' uses the bisquare kernel K=(1-(DS/H)^2)^2
KERN = 'tri' uses the tricubic kernel K=(1-(DS/H)^3)^3
KERN = 'gau' used the Gaussian kernel, K=EXP(-1/2*(3*DS/H)^2)

Note that all choices of weighting function are set to vanish for DS>H.

The default behavior is STR='gau' for the Gaussian kernel, which is
specified to have a standard deviation of H/3.

KERN may also be an integer, in which the standard deviation of the
Gaussian is set to H/KERN, with the default corresponding to KERN = 3.

KERN may also be a vector (of length greater than one) defining a
custom kernel, with KERN(1) corresponding to DS/H=0 and KERN(end) to
DS/H=1.  Kernel values at any distance are then linearly interpolated.
__________________________________________________________________

Inclusion of temporal variability

It may be that the data contributing to the map is taken at different
times, and that in constructing the map it is important to take into
account temporal variability of the underlying field.

In this case, data points with values Z are taken at locations X,Y and
also at times T. POLYSMOOTH is then called as follows:

[DS,XS,YS,TS,ZS]=TWODSORT(X,Y,T,Z,XO,YO,CUTOFF);
ZHAT=POLYSMOOTH(DS,XS,YS,TS,ZS,[],[RHO MU],H,TAU);

which will fit the data to the sum of spatial polynomial of bandwidth H
and order RHO, and a temporal polynomial of bandwidth TAU and order MU.

TAU, like H, may either be a scalar or an M x N matrix.  Its units
should be the same as those of the times T.

Note that the times T should be given relative to the center of the
time window, that is, time T=0 should correspond to the time at which
you wish to construct the map.

By default the Gaussian kernal is used in time.  One can also employ
a cell array, as in POLYSMOOTH(..,[RHO MU],H,{TAU,KERN}), to specify
other behaviors for the time kernel, as described above.
__________________________________________________________________

BETA contains the estimated field, the same as ZHAT, together with
estimates of all spatial derivatives of the field up to RHOth order:

BETA(:,:,1) = ZHAT     --- The field z
BETA(:,:,2) = ZXHAT    --- The first derivative dz/dx
BETA(:,:,3) = ZYHAT    --- The first derivative dz/dy
BETA(:,:,4) = ZXXHAT   --- The second derivative d^2z/dx^2
BETA(:,:,5) = ZXYHAT   --- The second derivative d^2z/dxdy
BETA(:,:,6) = ZYYHAT   --- The second derivative d^2z/dy^2

The length of the third dimension of BETA is set by the total number of
derivatives of order RHO or less.  This number, called Q, is equal to
Q = 1, 3, and 6 for RHO = 0, 1, and 2 respectively.

After these, in the case that TS is input, the estimated time
derivatives are returned up to order MU:

BETA(:,:,Q+1) = ZT   --- The first time derivative dz/dt
BETA(:,:,Q+2) = ZTT  --- The second time derivative d^2z/dt^2

AUX is an M x N x 5 array of auxiliary fields associated with the fit.

AUX(:,:,1) = P  --- The total number of datapoints employed
AUX(:,:,2) = H  --- The bandwidth used at each point
AUX(:,:,3) = E  --- The rms error between the data and the fit
AUX(:,:,4) = W  --- The total weight used
AUX(:,:,5) = R  --- The weighted mean distance to the data points
AUX(:,:,6) = V  --- The weighted standard deviation of data values
AUX(:,:,7) = C  --- The matrix condition number

P, called the population, is the total number of data points within one
bandwidth distance H from each of the (M,N) grid points.

The root-mean-squared error E and standard deviation V are both
computed using the same weighted kernal applied to the data.

The condition number C arises because a matrix must be inverted at each
(M,N) location for the RHO=1 or RHO=2 fits. C is equal to 1 for RHO=0.

C is computed by COND.  At (M,N) points where C is large, the least
squares solution is unstable, and one should consider using a lower-
order fit RHO or a larger value of the bandwidth H.
__________________________________________________________________

Fixed population

POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,P,'population') varies the spatial
bandwidth H to be just large enough at each grid point to encompass P
points. This is referred to here as the "fixed population" algorithm.

Note that the argument P relaces the bandwidth H.  So, if one chooses
to specify a kernel, one use POLYSMOOTH(...,RHO,{P,KERN},'population').

When employed with the option for including a temporal fit, the fixed
population algorithm only applies to the spatial kernel.  The temporal
kernel remains specified in terms of a temporal bandwidth.

The fixed population algorithm can give good results when the data
spacing is uneven, particularly when used with a higher-order fit.

When using this method, the length of the third dimension of the fields
output by TWODSORT or SPHERESORT must be at least P. If it is greater
than P, it may be truncated to exactly P, thus reducing the size of
those fields and speeding up calculations.  This is done internally,
and also can be done externally by calling POLYSMOOTH_PRESORT.
__________________________________________________________________

Weighted data points

POLYSMOOTH can incorporate an additional weighting factor on the data
points. Let W be an array of positive values the same size as the data
array Z.  One may form a map incorporating these weights as follows:

[DS,XS,YS,ZS,WS]=TWODSORT(X,Y,Z,W,XO,YO,CUTOFF);
ZHAT=POLYSMOOTH(DS,XS,YS,[],ZS,WS,RHO,H);

The weights W could represent the confidence in the measurement values,
or an aggregation of invididual measurements into clusters.  The latter
approach may be used to condense large datasets to a managable size.
__________________________________________________________________

Smoothing on the sphere

POLYSMOOTH supports a local polynomial fit on the sphere, as described
in Lilly and Lagerloef (2018).  As before this is a two-step process:

[DS,XS,YS,ZS]=SPHERESORT(LAT,LON,Z,LATO,LONO,CUTOFF);
ZHAT=POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,H);

The only different is that one firstly one calls SPHERESORT, the
analogue of TWODSORT for the sphere.  See SPHERESORT for more details.

The bandwidth in this case should have units of kilometers.

Note that SPHERESORT and POLYSMOOTH both assume the sphere to be the

The derivatives appearing in BETA are now the derivatives given in a
local tangent plane.  These can be converted into derivatives in terms
of latitude and longitude following Lilly and Lagerloef (2018).
_________________________________________________________________

One grid, many fields

It is often the case that the field to be mapped, Z, consists of many
repeated sets of observations at the same (X,Y) points.

For example, X and Y could be mark the locations of measurements that
are repeated at different times (as in satellite altimetry), or else
there could be multiple fields Z that are measured simultaneously.

There is a simple way to handle this situation without needing to
resort the grid.  First one calls TWODSORT or SPHERESORT as follows:

[DS,XS,YS,INDEX]=TWODSORT(X,Y,XO,YO,CUTOFF);
--- or ---
[DS,XS,YS,INDEX]=SPHERESORT(LAT,LON,LATO,LONO,CUTOFF);

INDEX is now an index into the sorted datapoint locations, such that

ZS=POLYSMOOTH_INDEX(SIZE(DS),INDEX,K);

returns sorted values of Z that can be passed to POLYSMOOTH.

The virtue of this approach is that one only has to call TWODSORT or
SPHERESORT once, no matter how many variable are to be mapped.
__________________________________________________________________

Robustification

ZHAT=POLYSMOOTH(...,'robust',NI) implements the median-based iterative
robust algorithm of Cleveland (1979), p 830--831, using NI iterations.

This can be useful when outliers are present in the data.   Typically
a single iteration is sufficient to remove most outliers.

ZHAT will in this case have NI+1 entries along its third dimension.
The iterative estimates are stored in reserve order, with the last
iteration in ZHAT(:,:,1) and the original estimate in ZHAT(:,:,NI+1).
__________________________________________________________________

Parallelization

POLYSMOOTH(...,'parallel') parallelizes the computation using a PARFOR
loop, by operating on each latitude (or matrix row) separately.   This
requires that Matlab's Parallel Computing toolbox be installed.

POLYSMOOTH will then using an existing parallel pool, or if one does
not exist, a pool will be created using all availabale workers.

POLYSMOOTH(...'parallel',Nworkers) alternately specifies the number of
workers to use. If you run into memory constraints, reduce Nworkers.

If you are working on multiple maps simultaneously, depending on the
size of your problem, it may be faster to use an exterior PARFOR loop,
rather than calling POLYSMOOTH with the 'parallel' flag.
__________________________________________________________________

Quiet option

By default, POLYSMOOTH displays a status message saying what row it is
working on.  POLYSMOOTH(...,'quiet') suppresses this message.
___________________________________________________

'polysmooth --t' runs some tests.
'polysmooth --f' generates some sample figures.

Usage:  [ds,xs,ys,zs]=twodsort(x,y,z,xo,yo,cutoff);
zhat=polysmooth(ds,xs,ys,[],zs,[],rho,H);
[zhat,beta,aux]=polysmooth(ds,xs,ys,[],zs,[],rho,H);
--or--
[ds,xs,ys,zs,ws]=spheresort(lat,lon,z,w,lato,lono,cutoff);
[zhat,beta,aux]=polysmooth(ds,xs,ys,[],zs,[],rho,H);
__________________________________________________________________
This is part of JLAB --- type 'help jlab' for more information
(C) 2008--2020 J.M. Lilly --- type 'help jlab_license' for details
```