Despiking

This recipe demonstrates how to remove spikes from DAS data.

The Hampel Filter Method

The Hampel filter is effective for removing outliers (spikes) in data by identifying values that deviate significantly from the local median in terms of median absolute deviation (MAD). This makes it particularly useful for cleaning DAS data that may contain various types of noise spikes.

DASCore provides the Patch.hampel_filter. The filter works by:

  1. Computing the local median within a sliding window (optionally along multiple dimensions)
  2. Calculating the median absolute deviation (MAD) within the same window
  3. Replacing values that exceed a threshold (in MAD units) with the local median

Example Data

We’ll start by creating synthetic data with artificial spikes to demonstrate the filter’s effectiveness:

import numpy as np

import dascore as dc

plot_kwargs = dict(scale=(-200, 200), scale_type="absolute")

# Get example patch and add artificial spikes
patch = dc.get_example_patch("example_event_2")
data = patch.data.copy()

# Add some large spikes at various locations
data[50, 100] = 600   # Large positive spike
data[100, 150] = -1000  # Large negative spike
data[150, 200] = 1200.0  # Another positive spike

# Add a channel-wide spike (entire distance channel)
data[:, 75] = 1500.0

# Add time spike
data[75, :] = -1400

# Create patch with spikes
spike_patch = patch.update(data=data)

# Visualize the data with spikes
ax = spike_patch.viz.waterfall(**plot_kwargs)
ax.set_title("Data with Artificial Spikes");

Basic Hampel Filtering

The simplest usage applies the filter along a single dimension. The threshold parameter controls the sensitivity - lower values result in more aggressive filtering:

# Apply Hampel filter along time dimension
filtered_time = spike_patch.hampel_filter(time=0.02, threshold=3.5)

ax = filtered_time.viz.waterfall(**plot_kwargs)
ax.set_title("Filtered along Time (threshold=3.5)");

The resulting signal had some of the spikes removed with minimal distortion to the original signal. To remove the time spike, we need to apply the filter along the distance axis. We could apply two filters separately, but often a multidimensional filter is easier.

Warning

The Hampel filter can be quite slow if large windows are used. We recommend using window sizes that correspond to <10 samples for most cases.

Multi-dimensional Filtering

For more effective spike removal, especially for channel-wide spikes, apply the filter along multiple dimensions:

# Apply filter along both time and distance dimensions
filtered_2d = spike_patch.hampel_filter(time=0.02, distance=5.0, threshold=3.5)

ax = filtered_2d.viz.waterfall(**plot_kwargs)
ax.set_title("Filtered along Time and Distance");

Best Practices for Hampel Filter

  1. Multi-dimensional filtering: For most DAS applications, applying the filter along both time and distance dimensions is more effective than single-dimension filtering.

  2. Threshold selection: Start with a conservative threshold (3.5-5.0) and adjust based on your data characteristics and noise level.

  3. Window size: Choose window sizes that are large enough to provide stable statistics but small enough to preserve local features. Start with ~5-9 samples per dimension.

  4. Performance optimization: approximate=True (default) achieves a 3-4x speedup with minimal quality loss. It is usually good enough for simply removing data spikes, but exact median calculations can be achieved by setting this to False.

  5. Validation: Always visually inspect results and consider domain-specific requirements when choosing parameters.