POLYSMOOTH Mapping using local polynomial fitting, also known as loess. POLYSMOOTH generates a map from scattered data in two dimensions using a locally weighted least squares fit to a polynomial. This method is variously known as local polynomial fitting, local polynomial smoothing, multivariate locally weighted least squares regression, lowess (originally for LOcally WEighted Scatterplot Smoothing), and loess. All of these are essentially synonyms. POLYSMOOTH has the support for all of the following options: --- Cartesian or spherical geometry --- a constant, linear, or quadratic fit in space --- an additional linear or quadartic fit in time --- fixed-bandwidth or fixed population algorithms --- prescribed spatially-varying bandwidth or population --- multiple choices of weighting function or kernel --- additional datapoint weighting factors, e.g. by confidence --- the median-based robustification of Cleveland (1979) POLYSMOOTH is implemented using a numerically efficient algorithm that avoids using explicit loops. The data are pre-sorted so that different mapping parameters can be tried out at little computational expense. For algorithm details, see Lilly and Lagerloef (2018). __________________________________________________________________ Local polynomial fitting on the plane Let's say we have an array Z of data is at locations X,Y. X,Y, and Z can be arrays of any size provided they are all the same size. The problem is to obtain a mapped field ZHAT on some regular grid specified by the vectors XO and YO. Calling POLYSMOOTH is a two-step process: [DS,XS,YS,ZS]=TWODSORT(X,Y,Z,XO,YO,CUTOFF); ZHAT=POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,H); The empty arrays mark locations of optional arguments described later. In the first step, one calls TWODSORT which returns ZS, a 3D array of data values at each grid point, sorted by increasing distance DS, and the corresponding positions XS and YS. See TWODSORT for more details. CUTOFF determines the maximum distance included in the sorting and should be chosen to be greater than H. In the second step, POLYSMOOTH fits a RHOth order spatial polynomial at each gridpoint within a neighborhood specified by the "bandwidth" H. The fit is found by minimizing the weighted mean squared error between the fitted surface and the observations. The bandwidth H sets the decay of this weighting function, described in more detail shortly. The fit order RHO may be chosen as RHO=0 (fit to a constant), RHO=1 (fit to a plane), or else RHO=2 (fit to a parabolic surface). The data locations (X,Y) and grid point locations (X0,Y0) shoud have the same units as the bandwidth H (e.g., kilometers). Note that H may either be a scalar, or a matrix of size M x N to specify an imposed spatially-varying bandwidth. The dimensions of XO and YO are M x N x J, where J is the maximum number of data points within bandwidth cutoff at any grid point location. Then ZHAT is matrix of dimension M x N. __________________________________________________________________ Choice of weighting function or kernel POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,{H,KERN}) weights the data points in the vicinity of each grid point by some decaying function of distance called the kernel, specified by KERN. KERN = 'uni' uses the uniform kernel, K=1 KERN = 'epa' uses the Epanechnikov (parabolic) kernel K=1-(DS/H)^2 KERN = 'bis' uses the bisquare kernel K=(1-(DS/H)^2)^2 KERN = 'tri' uses the tricubic kernel K=(1-(DS/H)^3)^3 KERN = 'gau' used the Gaussian kernel, K=EXP(-1/2*(3*DS/H)^2) Note that all choices of weighting function are set to vanish for DS>H. The default behavior is STR='gau' for the Gaussian kernel, which is specified to have a standard deviation of H/3. KERN may also be an integer, in which the standard deviation of the Gaussian is set to H/KERN, with the default corresponding to KERN = 3. KERN may also be a vector (of length greater than one) defining a custom kernel, with KERN(1) corresponding to DS/H=0 and KERN(end) to DS/H=1. Kernel values at any distance are then linearly interpolated. __________________________________________________________________ Inclusion of temporal variability It may be that the data contributing to the map is taken at different times, and that in constructing the map it is important to take into account temporal variability of the underlying field. In this case, data points with values Z are taken at locations X,Y and also at times T. POLYSMOOTH is then called as follows: [DS,XS,YS,TS,ZS]=TWODSORT(X,Y,T,Z,XO,YO,CUTOFF); ZHAT=POLYSMOOTH(DS,XS,YS,TS,ZS,[],[RHO MU],H,TAU); which will fit the data to the sum of spatial polynomial of bandwidth H and order RHO, and a temporal polynomial of bandwidth TAU and order MU. TAU, like H, may either be a scalar or an M x N matrix. Its units should be the same as those of the times T. Note that the times T should be given relative to the center of the time window, that is, time T=0 should correspond to the time at which you wish to construct the map. By default the Gaussian kernal is used in time. One can also employ a cell array, as in POLYSMOOTH(..,[RHO MU],H,{TAU,KERN}), to specify other behaviors for the time kernel, as described above. __________________________________________________________________ Additional output arguments [ZHAT,BETA,AUX]=POLYSMOOTH(...) returns two additional arguments. BETA contains the estimated field, the same as ZHAT, together with estimates of all spatial derivatives of the field up to RHOth order: BETA(:,:,1) = ZHAT --- The field z BETA(:,:,2) = ZXHAT --- The first derivative dz/dx BETA(:,:,3) = ZYHAT --- The first derivative dz/dy BETA(:,:,4) = ZXXHAT --- The second derivative d^2z/dx^2 BETA(:,:,5) = ZXYHAT --- The second derivative d^2z/dxdy BETA(:,:,6) = ZYYHAT --- The second derivative d^2z/dy^2 The length of the third dimension of BETA is set by the total number of derivatives of order RHO or less. This number, called Q, is equal to Q = 1, 3, and 6 for RHO = 0, 1, and 2 respectively. After these, in the case that TS is input, the estimated time derivatives are returned up to order MU: BETA(:,:,Q+1) = ZT --- The first time derivative dz/dt BETA(:,:,Q+2) = ZTT --- The second time derivative d^2z/dt^2 AUX is an M x N x 5 array of auxiliary fields associated with the fit. AUX(:,:,1) = P --- The total number of datapoints employed AUX(:,:,2) = H --- The bandwidth used at each point AUX(:,:,3) = E --- The rms error between the data and the fit AUX(:,:,4) = W --- The total weight used AUX(:,:,5) = R --- The weighted mean distance to the data points AUX(:,:,6) = V --- The weighted standard deviation of data values AUX(:,:,7) = C --- The matrix condition number P, called the population, is the total number of data points within one bandwidth distance H from each of the (M,N) grid points. The root-mean-squared error E and standard deviation V are both computed using the same weighted kernal applied to the data. The condition number C arises because a matrix must be inverted at each (M,N) location for the RHO=1 or RHO=2 fits. C is equal to 1 for RHO=0. C is computed by COND. At (M,N) points where C is large, the least squares solution is unstable, and one should consider using a lower- order fit RHO or a larger value of the bandwidth H. __________________________________________________________________ Fixed population POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,P,'population') varies the spatial bandwidth H to be just large enough at each grid point to encompass P points. This is referred to here as the "fixed population" algorithm. Note that the argument P relaces the bandwidth H. So, if one chooses to specify a kernel, one use POLYSMOOTH(...,RHO,{P,KERN},'population'). When employed with the option for including a temporal fit, the fixed population algorithm only applies to the spatial kernel. The temporal kernel remains specified in terms of a temporal bandwidth. The fixed population algorithm can give good results when the data spacing is uneven, particularly when used with a higher-order fit. When using this method, the length of the third dimension of the fields output by TWODSORT or SPHERESORT must be at least P. If it is greater than P, it may be truncated to exactly P, thus reducing the size of those fields and speeding up calculations. This is done internally, and also can be done externally by calling POLYSMOOTH_PRESORT. __________________________________________________________________ Weighted data points POLYSMOOTH can incorporate an additional weighting factor on the data points. Let W be an array of positive values the same size as the data array Z. One may form a map incorporating these weights as follows: [DS,XS,YS,ZS,WS]=TWODSORT(X,Y,Z,W,XO,YO,CUTOFF); ZHAT=POLYSMOOTH(DS,XS,YS,[],ZS,WS,RHO,H); The weights W could represent the confidence in the measurement values, or an aggregation of invididual measurements into clusters. The latter approach may be used to condense large datasets to a managable size. __________________________________________________________________ Smoothing on the sphere POLYSMOOTH supports a local polynomial fit on the sphere, as described in Lilly and Lagerloef (2018). As before this is a two-step process: [DS,XS,YS,ZS]=SPHERESORT(LAT,LON,Z,LATO,LONO,CUTOFF); ZHAT=POLYSMOOTH(DS,XS,YS,[],ZS,[],RHO,H); The only different is that one firstly one calls SPHERESORT, the analogue of TWODSORT for the sphere. See SPHERESORT for more details. The bandwidth in this case should have units of kilometers. Note that SPHERESORT and POLYSMOOTH both assume the sphere to be the radius of the earth, as specified by the function RADEARTH. The derivatives appearing in BETA are now the derivatives given in a local tangent plane. These can be converted into derivatives in terms of latitude and longitude following Lilly and Lagerloef (2018). _________________________________________________________________ One grid, many fields It is often the case that the field to be mapped, Z, consists of many repeated sets of observations at the same (X,Y) points. For example, X and Y could be mark the locations of measurements that are repeated at different times (as in satellite altimetry), or else there could be multiple fields Z that are measured simultaneously. There is a simple way to handle this situation without needing to resort the grid. First one calls TWODSORT or SPHERESORT as follows: [DS,XS,YS,INDEX]=TWODSORT(X,Y,XO,YO,CUTOFF); --- or --- [DS,XS,YS,INDEX]=SPHERESORT(LAT,LON,LATO,LONO,CUTOFF); INDEX is now an index into the sorted datapoint locations, such that ZS=POLYSMOOTH_INDEX(SIZE(DS),INDEX,K); returns sorted values of Z that can be passed to POLYSMOOTH. The virtue of this approach is that one only has to call TWODSORT or SPHERESORT once, no matter how many variable are to be mapped. __________________________________________________________________ Robustification ZHAT=POLYSMOOTH(...,'robust',NI) implements the median-based iterative robust algorithm of Cleveland (1979), p 830--831, using NI iterations. This can be useful when outliers are present in the data. Typically a single iteration is sufficient to remove most outliers. ZHAT will in this case have NI+1 entries along its third dimension. The iterative estimates are stored in reserve order, with the last iteration in ZHAT(:,:,1) and the original estimate in ZHAT(:,:,NI+1). __________________________________________________________________ Parallelization POLYSMOOTH(...,'parallel') parallelizes the computation using a PARFOR loop, by operating on each latitude (or matrix row) separately. This requires that Matlab's Parallel Computing toolbox be installed. POLYSMOOTH will then using an existing parallel pool, or if one does not exist, a pool will be created using all availabale workers. POLYSMOOTH(...'parallel',Nworkers) alternately specifies the number of workers to use. If you run into memory constraints, reduce Nworkers. If you are working on multiple maps simultaneously, depending on the size of your problem, it may be faster to use an exterior PARFOR loop, rather than calling POLYSMOOTH with the 'parallel' flag. __________________________________________________________________ Quiet option By default, POLYSMOOTH displays a status message saying what row it is working on. POLYSMOOTH(...,'quiet') suppresses this message. ___________________________________________________ 'polysmooth --t' runs some tests. 'polysmooth --f' generates some sample figures. Usage: [ds,xs,ys,zs]=twodsort(x,y,z,xo,yo,cutoff); zhat=polysmooth(ds,xs,ys,[],zs,[],rho,H); [zhat,beta,aux]=polysmooth(ds,xs,ys,[],zs,[],rho,H); --or-- [ds,xs,ys,zs,ws]=spheresort(lat,lon,z,w,lato,lono,cutoff); [zhat,beta,aux]=polysmooth(ds,xs,ys,[],zs,[],rho,H); __________________________________________________________________ This is part of JLAB --- type 'help jlab' for more information (C) 2008--2020 J.M. Lilly --- type 'help jlab_license' for details