I’m working with lake level measurements of Deep Creek Lake that look as shown in Figure 1, a set of discrete values. The measurements are made at discrete intervals, specifically every 10 minute. In reality the lake level signal is an analog signal, being continuous in both space and time.
Figure 1 - 7,500 data points of Deep Creek Lake water level measurement sampled at 10 minute intervals.
The peaks and valleys are reflective of all kinds of physical processes that take place continuously. Some of the effects shown are when the hydro-electric generator is operating and withdraws water from the lake, when groundwater enters the lake via streams and groundwater seepage along the shoreline, and when precipitation, rain and snow and melting snow occurs. The changes look large in the graph, but in reality, the difference between the top and bottom of the y-axis in the graph is only 1.4 ft. When one takes a subset of the data, use some smoothing by any number of procedures, one obtains a signal that looks like the one shown in Figure 2. The x-axis shows ’ticks’, each tick being 10 minutes. The original data has been smoothed and strongly filtered with a Savitzki-Golay type digital filter (more on that later). This reveals a periodicity of about 10 ticks, or a cycle of about 100 minutes. The lake is ‘sloshing.’ This is called a ‘seiche’, a fairly common and well understood phenomenon. It can be calculated that the period of 100 minutes is equivalent to a water body resonating in a pool that has a length of about 15 miles. That the lake is sloshing, therefore, appears to be a reasonable interpretation, given that the length of the lake has been reported anywhere from 12 to 17 miles long.
Figure 2 - A smoothed subset of the lake level data.
The impulse ( or the energy) that causes and maintains the standing wave is probably from the hydroelectric facility when the generator is turned on and off and wind action along the lake. We need more data to verify this, particularly a long period when the generator is off. In itself, the standing wave is of no consequence to our analysis efforts, it having a peak-to-peak height of less than 1.0 inches. However, it does interfere with disclosing the underlying signal. Hence, we would like to get rid of it. Various methods of smoothing have not been successful. We’ve homed-in on using wavelets (to be discussed in more detail later) or possibly Fourier transforms as the method(s) to remove this periodic signal. Figure 3 is an example of what can be done with wavelet transforms. The blue and purple curves are two different wavelet transforms, one showing smaller detail, the other only the main features.
Figure 3 - Application of Daubechies wavelets to ‘denoise’ a signal.
The data in Figure 3 resembles our data and hence we believe it worthwhile to pursue this approach. The problem is our lack of knowhow. In the following I will be trying to understand how to use them properly and how to de-noise the data.
What are we looking for? The measurements shown in Figure 1 reflect an ongoing, continuous, set of physical processes whose outcome is characterized by the water level of the lake as measured by a gage. As one can see from the measurements, the water level changes with time, water is leaving and water is entering the lake at various rates that each with time up and down, and that the changes are very small. Because of the very small changes being measured, the gage measurement contains noise introduced by the nature of the gage itself. We want to remove that noise, because it has nothing to do with the processes that are causing the lake level to fall or rise. We want to arrive at a smooth signal. As mentioned above, the measured signal also contains an effect that has nothing to do with where the water comes from and goes to. This periodic signal, caused by the sloshing of the lake, is a nuisance in our attempts to understand how the lake level varies. What is a wavelet? You can connect all of the data points to create a solid curve. In theory this solid curve can be represented by the sum of a large set of individual functions. Each of these functions is slightly different from each other so that when they are added together, the signal can be bigger or smaller at a given point in time. By correctly selecting these function and choosing appropriate coefficients for each of them, the values of the experimental data points can be replicated exactly. A lot of computation goes into it, but over the years people have found ways to make the process of selecting the functions and determining the coefficients very efficient. The reason this seemingly complex procedure is used is that it really makes the process of analyzing the details of the signal much simpler. I won’t go into the reasons why, but an excellent, imminently readable book on signal process by James S. Walker, “A Primer on Wavelets and Their Scientific Applications, Second Edition“ explains this in very clear terms. For many years the preferred set of function were the trigonometric sine and cosine functions. Fourier was the person who discovered this process. Hence, using sines and cosines to analyze the data is called Fourier analysis. Figure 4 is an illustration of what can be done with trigonometric functions.
Figure 4 - The sum of five sine waves approximates a square wave with amplitude +- 1 (bottom trace). The amplitude of the sine wave decreases with frequency.
A similar result can be obtained with wavelets. Wavelets came from Fourier analysis, and were discovered by Haar. The term wavelet refers to a set of functions that have the following form:
$$\psi_{ab}(x) = |a|^{-1/2} \psi(\frac{x-b}{a})$$where:
The function \(\psi(x)\) is called the mother wavelet. Visually, the mother wavelet appears like a local oscillation, or wave. Figure 4 shows some of these wavelets.
There are actually a large number of wave forms (wavelets) to choose from. The best one for a particular application depends on the nature of the data (signal) and what we require from the analysis. As mentioned above, the dilation parameter a controls the width and rate of this oscillation. The translation parameter b simply moves the wavelet throughout the domain. Figure 5 shows how dilation works, by varying a (a=0.5, 1.0, 2.0) and setting b to 0 in the “Mexican hat” wavelet.
Figure 5 - Four wavelets: (a) First Derivative of a Gaussian; (b) Mexican hat (second derivative of a Gaussian; © Haar; (d) Morlet (real part).
Figure 6 - How dilation affects the Mexican hat function.
The Mexican hat is derived from the second derivative of the Gaussian wave form ( a probability distribution curve) and is defined as:
Figure 7 - How dilation affects the Mexican hat function.