class: center, middle .title[Continuous Wavelet Analysis 2.0] .author[Jonathan Lilly
1
] .coauthor[Sofia Olhede
2
] .institution[
1
NorthWest Research Associates,
2
University College London] .date[March 9, 2018] .center[[{www.jmlilly.net}](http://www.jmlilly.net)] .footnote[Created with [{Remark.js}](http://remarkjs.com/) using [{Markdown}](https://daringfireball.net/projects/markdown/) + [{MathJax}](https://www.mathjax.org/)] --- class: center ##A Brief History of Wavelets
Wavelets used to be the hot thing. --- class: center ##A Brief History of the EMD
The Empirical Mode Decomposition (EMD) is the new hot thing. --- class: left ##The Method Life Cycle As a result of the formidable challenges in communicating between theoretical and applied communities, a suboptimal cycle emerges. -- 1. A promising new method is proposed, but many mathematical or theoretical details remain unclear. -- 1. There is a great deal of excitement and activity among theoreticians as they work to sort out details. -- 1. More applied communities hear about this method and similarly rush to apply it. -- 1. Researchers in applied communities encounter difficulties not anticipated by theoreticians. -- 1. Owing to gaps between communities, these difficulties are not communicated back to theoreticians. -- 1. Applied researchers feel the method has been oversold, and become disheartened. -- 1. Theoretical researchers, feeling their work is done, have turned to some new methodology. -- 1. The cycle repeats. -- It takes an enormous amount of effort to go from the phase where all the details seem to be sorted, to where a method can actually be used to solve difficult real-world problems (in my experience). --- class: left ##Overview 1. Which continuous wavelet should be used? 1. Wavelet ridge analysis for studying modulated oscillations. 1. ‘Element analysis’ for studying time-localized signals. -- With ridge and element analysis, the wavelet transform becomes the basis for carrying out objective and statistically rigorous analysis on a massive scale. -- The key in moving from a qualitative to a quantitative analysis is, in both cases, the introduction of a *signal model*— that is, a conceptual model for what we are looking for. --- class: center, middle #Part I: Which Wavelet? --- class: left ##Which Wavelet to Use? And Why? Morlet, Cauchy-Klauder-Morse-Paul, Derivative of Gaussian, lognormal or log Gabor, Shannon, or Bessel wavelet? -- In .cite[Lilly and Olhede (2009a, 2012b)], we showed the following: 1. All of the above analytic wavelet types can be subsumed into one much broader family. 2. Within that family, one can objectively say which choice is the ‘best’ for general use. -- The name of this broad ‘superfamily’ of wavelets is the *generalized Morse wavelets.* -- The recommended choice within this wavelet family is that which maximizes frequency-domain symmetry. -- The generalized Morse wavelets have a simple Fourier-domain form `\[\Psi_{\beta,\gamma}(\omega) = a_{\beta,\gamma} \,\omega^\beta e^{-\omega^\gamma}\]` for `$\omega>0$`, with `$\Psi_{\beta,\gamma}(\omega)=0$` elsewhere. Here `$a_{\beta,\gamma} $` is a normalization. --- class: center ##Generalized Morse Wavelets in Time
`$\Psi_{\beta,\gamma}(\omega) = a_{\beta,\gamma} \,\omega^\beta e^{-\omega^\gamma}$` with varying `$\beta$` and `$\gamma$`. --- class: center ##Generalized Morse Wavelets in Frequency
`$\Psi_{\beta,\gamma}(\omega) = a_{\beta,\gamma} \,\omega^\beta e^{-\omega^\gamma}$` with varying `$\beta$` and `$\gamma$`. --- class: center ##Generalized Morse Wavelet Concentration
The time-frequency concentration (Heisenberg area) is maximized near `$\gamma=3$`, where the wavelets are also symmetric in frequency. --- class: left ##Important Parameters It can be readily shown that the generalized Morse wavelet `\[\Psi_{\beta,\gamma}(\omega) = a_{\beta,\gamma} \omega^\beta e^{-\omega^\gamma}\]` has a peak in the frequency domain at frequency `$\omega_{\beta,\gamma}\equiv(\beta/\gamma)^{1/\gamma}$`. Scale can therefore be converted to frequency by `$\omega=s/\omega_{\beta,\gamma}$`. The *half-width* of this peak, as a fraction of its location `$\omega_{\beta,\gamma}$`, is roughly `$1/P_{\beta,\gamma}$` where `$P_{\beta,\gamma}\equiv\sqrt{\beta\gamma}$`. Increasing `$\beta$` has the effect of increasing the number of oscillations in the time domian, or *decreasing* the bandwidth, thus making the wavelet narrower in the frequency domain. Changing `$\gamma$` changes the *shape* of the wavelet in the frequency domain, with a tail to the right for `$\gamma>3$` and to the left for `$\gamma \lt 3$`. `$\beta$` is called the *order* and `$\gamma$` is called the *family*. --- class: left ##The `$\gamma=3$` Family The `$\gamma=3$` family has a number of important properties 1. Maximum frequency-domain symmetry, as measured by a vanishing third derivative at the wavelet peak 2. Smallest Heisenberg area of any integer `$\gamma$` family, i.e., highest time-frequency concentration 3. Close to the absolute minimum curve of Heisenberg area 4. Most Gaussian envelope of any integer `$\gamma$`, i.e. most Morlet-ish 5. Exactly analytic, like all `$\gamma$` families This family can be expressed in terms of the second inhomogeneous Airy function `$\mathrm{Hi}(z)$` `\[\psi_{\beta,3}(t)=a_{\beta,3}(-i)^{\beta}\frac{1} {2}\frac{1}{3^ {1/3}}\frac{d^{\beta}}{dt^{\beta}}\,\left[\mathrm{Hi}\left(\frac{it}{3^ {1/3}}\right)\right]\]` and is therefore called the *Airy family*. As these are all desirable properites, the Airy family is recommended for general use. This means that the only parameter that needs to be adjusted is `$\beta$`, to set the wavelet bandwidth. --- class: left ##The Morlet Wavelet The Morlet wavelet is perhaps the canonical continuous wavelet. The basic idea is to multiply a Gaussian by a complex exponential `\[\psi_{\nu}(t)= a_\nu \, e^{-\frac{1}{2}\,t^2}\left[e^ {i\nu t}-e^ {-\frac{1}{2}\nu^2 }\right].\]` The correction term ensures that the wavelet has zero mean, as is required. The Morlet wavelet, however, has two flaws that become increasingly problematic for narrow time-domain forms. First, the wavelet is not exactly analytic. Second, the correction term deforms the wavelet envelope. The result is that for narrow time-domain settings, transforms with the Morlet wavelet are contaminated by artifacts. It turns out that the Airy wavelet embodies the spirit of the Morlet wavelet more than the Morlet wavelet itself. It is actually more like a Gaussian (for most settings) while remaining exactly analytic. --- class: center ##Morlet (left) vs. Airy (right)
--- class: center ##Morlet (left) vs. Airy (right)
--- class: left ##History of Generalized Morse wavelets 1. Introduced by .cite[Daubechies and Paul (1988b)] in an appendix 1. Discussed and applied by .cite[Bayram and Baraniuk (2000)] 1. Operator properities studied by .cite[Olhede and Walden, (2002)] 1. Roles of `$\beta$` and `$\gamma$` studied by .cite[Lilly and Olhede (2009a)] 1. ‘Superfamily’ status shown by .cite[Lilly and Olhede (2012b)] 1. Adopted by Matlab as default the wavelets for their continuous wavelet routine as of version 2016b, following the recommendations of .cite[Lilly and Olhede (2009a,2012b)] 1. Free implementation in my Matlab toolbox, [{jLab}](http://www.jmlilly.net), since 2004 --- class: center, middle # Part II: Ridge Analysis --- class: left ##Analysis of Modulated Oscillations It often happens that one wishes to analyze a signal of the form `\[x(t)=a(t)\cos \phi(t) \]` which is known as a *modulated oscillation* or *AM/FM* or *quasi-periodic* signal. An extremely powerful method, the *analytic signal* lets an *instantaneous* amplitude, frequency, and bandwidth be uniquely associated with a given modulated oscillation. These ‘instantaneous moments’ decompose the frequency-domain moments of the signal's Fourier spectrum across time. For example, the power-weighted time average of the instantaneous frequency is the mean frequency of the spectrum. This theory is primarily due to .cite[Gabor (1946)], with important contributions from .cite[Bedrosian (1963)], .cite[Vakman (1996)], .cite[Picinbono (1997)], and others. --- class: left ##The Analytic Signal Method The instantenous moments are defined by using the
analytic signal
`\[x_+(t)\equiv x(t) +\mathrm{i}\mathcal{H} x(t) \equiv x(t)+\mathrm{i}\frac{1}{\pi}\,\int_{-\infty}^\infty\frac{x(u)}{t-u}\,\mathrm{d} u\]` with `\(\mathcal{H}\)` being the Hilbert transform. The amplitude is uniquely defined as `$a(t)=|x_+(t)|$` and frequency as `$\omega(t) = \frac{d}{dt}\arg x_+(t)$`. **Uniqueness:** If you wish to assign the instantaneous moments using a
linear filter
, and if you wish your method to recover the correct results for a strictly periodic signal `$x(t)=a_o\cos(\omega_o t)$`—a property called *harmonic correspondence*— there is no alternative to using the analytic signal .cite[(Vakman, 1996)]. **Recoverability:** With `$x(t)$` generated by `$x(t)=a(t)\cos \phi(t)$`, the generating amplitude `$a(t)$` and phase `$\phi(t)$` are exactly recoverable from `$x(t)$` provided `$a(t)$` is supported on strictly lower frequencies than is `$e^{i\phi(t)}$` .cite[(Bedrosian, 1963)]. --- class: left ##Generalization to 2D and 3D Instantaneous moments generalize to multivariate oscillations. A bivariate modulated oscillation `$\mathbf{x}=[x(t)\,\, y(t)]^T$` is equivalent to a time-varying ellipse .cite[(Lilly and Olhede, 2010a)].
A trivariate modulated oscillation `$\mathbf{x}=[x(t)\,\, y(t)\,\, z(t)]^T$` is equivalent to a modulated ellipse within a rotating, tilting plane .cite[(Lilly, 2011)]. The time-varying ellipse properties may be recovered from the vector-valued analytic signal `$\mathbf{x}_+(t)\equiv\mathbf{x}(t)+\mathrm{i}\mathcal{H}\mathbf{x}(t)$`. --- class: left ##Recoverability Condition in 2D Q. Under what conditions are the time-varying properties of an ellipse exactly recoverable from the trajectory of one particle? A. When the ellipse geometry evolves more slowly than circulation of particles around the periphery.
Ellipse parameters for top half are exactly recoverable. --- class: left ##Modulated Oscillations in Noise With noise present, we write `$x(t)=x_o(t)+x_\epsilon(t)$`, where `$x_o(t)$` is an oscillation and `$x_\epsilon(t)$` is noise. In this case, applying the analytic signal method directly fails. The wavelet transform can be used as basis for
recovering
or
estimating
the properties of modulated oscillation immersed in background noise. Note that the analytic wavelet transform implicitly combines filtering, and creation of the analytic signal, into one step. *Wavelet ridge analysis*, due to .cite[Delprat (1992)] and investigated further by .cite[Mallat (1999)], allows the recovery of the oscillation. The idea is to project the time series onto an oscillatory test signal, or
wavelet
, to find a “best fit” frequencies at each moment. The fits are then chained together into a continuous curve called a
ridge.
This method lets us strip off the noise, resulting in an estimate of the analytic part of the oscillatory portion of the signal, `$x_o(t)$`. --- class: left ##Wavelet Ridge Details The wavelet ridge method has two sources of error, *variance* arising form the noise, and *bias* arising from the degree of modulation of the oscillatory signal. An internal estimate of the bias, formed for an unknown modulated oscillation from the wavelet transform itself, was derived by .cite[Lilly and Olhede (2010b)]. The first-order departure from a pure sinusoid is called the *bandwidth*, which has terms like `$a'(t)/a(t)$` (amplitude modulation). The *second-order* departure has terms like `$a''(t)/a(t)$`, which we call the *curvature*. This turns out to control the bias of the estimate. Wavelet ridge analysis was generalized to handle modulated multivariate oscillations by .cite[Lilly and Olhede (2010a)]. A bias estimate for the multivariate case was derived by .cite[Lilly and Olhede (2012)], and is a multivariate analogue of the curvature. Bias estimate is critical for removing false positives! See this [{talk}](http://www.jmlilly.net/talks/lilly18-os.html). --- class: left ##Multivariate Ridge Analysis For a vector-valued signal `\(\mathbf{x}(t)=\begin{bmatrix}x_1(t) & x_2(t) & \cdots & x_N(t)\end{bmatrix}^T\)` in noise `\(\mathbf{x}_\epsilon(t)\)`, `\(\mathbf{x}(t)=\mathbf{x}_o(t)+\mathbf{x}_\epsilon(t)\)`, define the wavelet transform `\[\mathbf{w}(t ,s) \equiv \int_{-\infty}^{\infty} \frac{1}{s} \psi^*\left(\frac{\tau-t}{s}\right)\,\mathbf{x}(\tau)\,\mathrm{d} \tau\]` (also a vector) and then find the
wavelet ridges
`\(s(t)\)` from `\[\frac{\partial}{\partial s}\, \left\|\mathbf{w}(t ,s)\right\| = 0,\quad\quad\frac{\partial^2}{\partial s^ 2}\, \left\|\mathbf{w}(t ,s)\right\| < 0.\]` The oscillation is estimated simply by `\(\widehat{\mathbf{x}}_o(t)\equiv\Re\left\{\mathbf{w}(t,s(t))\right\}.\)` .cite[See Delprat et al. (1992), Lilly and Olhede (2009, 2010, 2012a). ] It is shown by .cite[Lilly (2018)] that ridge analysis is based on the principle of maximizing the *power* (or energy density) of the projection of the signal onto the wavelet. --- class: center ##Example of Multivariate Ridge Analysis
The
ridge
is the curve made by tracing out the maximum modulus of the wavelet transform. --- class: center ##Example of Multivariate Ridge Analysis
The transform value along this curve is an estimate of the oscillatory signal component, visualized as ellipses. Ridge analysis separates modulated elliptical signal from low-frequency meandering and higher-frequency variability. Physically the ellipses quantify the properties of oceanic vortices. --- class: center ### Identifying Vortices from Particle Trajectories
Vortices (loops) are identified using only particle trajectories (dots). --- class: center ### Extraction of Modulated Elliptical Signals
Vortices can be identified, studied, and distinguished from waves. .cite[Lilly, Scott, and Olhede (2011), *GRL*] --- class: center ## Radius-Velocity Distribution of Signals
Coherent eddies are primarily high-frequency and are circularly polarized. Jet meander is lower-frequency and linearly polarized. --- class: center ## Application to the Global Drifter Dataset
Apply multivariate ridge detection algorithm to 20K time series. --- class: center ### Ridge Detection in the Global Drifter Dataset
Red = cyclonic rotation, blue = anticyclonic. --- class: center ### Removing False Positives
After comparison with a null hypothesis of red noise, employing the Matérn process of .cite[Lilly, Sykulski, Early, and Olhede (2017)]. --- class: center, middle # Part III: Element Analysis --- class: left ## Motivation The ridge analysis is intended for signals consisting of modulated oscillations in noise. Another type of signal that may occur is short-duration, isolated ‘bursts’ or ‘impulses’—that is, events which may be represented as a wavelet or the temporal integral of a wavelet. This suggests a model for a real, univariate signal of the form `\begin{equation}\label{signalmodel} x(t) = \sum_{n=1}^N \Re\left\{c_n \psi\left(\frac{t-t_n}{\rho_n}\right)\right\} +x_\epsilon(t) \end{equation}` consisting of superpositions of amplitude scaled, stretched, and phase-shifted versions of some basis signal `$\psi(t)$`, called the *element*. It is assumed that the different realizations of the element are sufficiently separated in time and frequency such that they do not interfere, in a way that will be made precise later. --- class: left ## Ingredients Element analysis consists of several steps: 1. Choice of element 1. Event detection 2. Rejection of statistically insignificant events 3. Rejection of non-isolated events Developed in .cite[Lilly, J. M. (2017). Element analysis: a wavelet-based method for analyzing time-localized events in noisy time series. Proceedings of the Royal Society of London, Series A, 473 (2200): 20160776, 1–28. [[10.7 Mb Pdf]](http://rspa.royalsocietypublishing.org/content/473/2200/20160776)] --- class: center ## Localized Wavelets
The method is suitable for events that themselves resemble wavelets, or the temporal integral of a wavelet (the `$\beta=0$` case). --- class: left ## Signal Model Our signal model, using the generalized Morse wavelets, is `\begin{equation}\label{morseelementmodel} x(t) = \sum_{n=1}^N \Re \left\{ c_n \psi_{\mu,\gamma}\left(\frac{t-t_n}{\rho_n}\right)\right\} +x_\epsilon(t) \end{equation}` where `$\mu$` is the most suitable value of the order or `$\beta$` parameter. The `$n$`th event is then characterized by coefficient `$c_n$`, time `$t_n$`, and scale `$\rho_n$`. We wish to *estimate* these unknown quantities. We then take the wavelet transform with an order `$\beta$` wavelet in the same `$\gamma$` family, leading to `\begin{equation}\label{transformofelementmodel} w_{\beta,\gamma}(\tau,s)=\frac{1}{2}\sum_{n=1}^N c_n\int_{-\infty}^{\infty} \frac{1}{s} \psi_{\beta,\gamma}^*\left(\frac{t-\tau}{s}\right)\psi_{\mu,\gamma}\left(\frac{t-t_n}{\rho_n}\right)\,d t+\varepsilon_{\beta,\gamma}(\tau,s). \end{equation}` Owing to the properties of the generalized Morse wavelets, the integral in the above equation has a closed-form expression in terms of a rescaled `$\psi_{\beta+\mu,\gamma}(t)$` wavelet. --- class: left ## Event Detection We then find *isolated maxima* of the transform, that is, time-scale `$(\tau,s)$` points at which `\begin{multline} \quad\quad\quad\quad\frac{\partial}{\partial \tau}\left|w_{\beta,\gamma}(\tau,s)\right| =\frac{\partial}{\partial s}\left|w_{\beta,\gamma}(\tau,s)\right|=0, \\\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\frac{\partial^2}{\partial \tau^2}\left|w_{\beta,\gamma}(\tau,s)\right|<0,\quad\frac{\partial^2}{\partial s^2}\left|w_{\beta,\gamma}(\tau,s)\right|<0.\label{maxconditions} \end{multline}` Because we have a simple expression for `$w_{\beta,\gamma}(\tau,s)$`, we can relate the event properties `$c_n$` `$t_n$`, and `$\rho_n$` to the value of `$w_{\beta,\gamma}(\tau,s)$` at a maxima. This lets us work backwards from the maxima points to estimates of the event properties. --- class: center ##An Example of Events in White Noise
The events are based on the `$\psi_{2,2}(t)$` wavelet in this case. Grey dots are all maxima, black are significant and isolated. --- class: left ##Assessing Statistical Significance This is the key step for large-scale applications. The obvious approach is to simulate many noise realizations suitable for the time series of interest, and then apply the method. This works great for one time series but not for 20,000. Fortunately, there is a workaround. It turns out that all we need to know is the statistics of a 5-vector, containing this point, the next point, the previous point, the point above, and the point below. Actually, given the assumption of power-law Gaussian noise, having a spectrum of the form `$A^2/\omega^{2\alpha}$`, the covariance statistics of this 5-vector are known analytically. This means that all one needs to do is simulation realizations of this 5-vector in order to determine the probability that a point at a given scale will be a maximum of a certain magnitude, thus establishing statistical significance. --- class: center ##Event Distributions for the Example
Color = 5-vector realizations, gray dots = all detected events Gray squares = values of true events Black circles = estimated values of significant events. Lines are significance curves for event detection rates. --- class: left ##Assessing Isolation This is done through the identification of *regions of influence* around isolated maxima, related to but distinct from concentration regions. Analytic expressions for these regions of influence may be found. An event is rejected if it is within the region of influence (subject to a threshold) of another, larger-amplitude event. This turns out to be mostly important for rejecting artifacts due to closely-spaced events. --- class: center ## A Real-World Application
Measurement tracks for a sea surface height satellite. One of the main windows into the ocean circulation. --- class: center ## A Real-World Application
From an short altimeter track segment crossing the Labrador Sea. --- class: center, middle # Thank you! .center[Visit [{www.jmlilly.net}](http://www.jmlilly.net) for papers, this talk, and a Matlab toolbox of all numerical code.] .center[.footnote[P.S. Like the way this presentation looks? Check out [{Liminal}](https://github.com/jonathanlilly/liminal).]] --- class: left ##The methods paradox The tools we use to look at data matter... but there is a challenge. -- *The methods paradox* -- * The need for powerful and trustworthy data analysis methods increases as datasets become larger and more complex, *but* -- * There are intrinsic barriers that prevent the appropriate methods from reaching those who need them. --- class: left ##The methods paradox What are these barriers? -- 1. Because of the increasing size of fields, those working with data (e.g. oceanographers) and those working on methods for treating data (e.g. statisticians and time series analysts) are frequently unable to find each other. -- 1. Therefore, those on the data side very frequently are forced to ‘reinvent the wheel’, while meanwhile,... -- 1. Those on the methods side do not have enough information to develop the right methods for the most pressing problems. -- 1. Both sides create their own jargon, and cite primarily within their own literature, reinforcing the barriers. This is based on my experience in four different areas: time series analysis, mapping methods, stochastic modeling, and covariance analysis. --- class: left ##The methods paradox Proposed solutions to the methods paradox: -- 1. We need long-term, sustained interactions between the oceanographic community and methods communities such as time series and statistics. -- 1. Those on the applied side need to embrace a spirit of collaboration for time series and other data analysis expertise—rather than only ‘do it yourself ’ or ‘black box ’ approaches. -- 1. We need to prioritize the creation of accessible tutorial literature in modern data analysis methods that is tailored to the needs of our community. -- 1. Just as a class of specialists in numerical modeling is integral to oceanographic research, we also need to train a new class of specialists in data analysis theory and methods. -- The end. Thanks!