This recipe demonstrates how to remove spikes from DAS data.
The Hampel Filter Method
The Hampel filter is effective for removing outliers (spikes) in data by identifying values that deviate significantly from the local median in terms of median absolute deviation (MAD). This makes it particularly useful for cleaning DAS data that may contain various types of noise spikes.
Computing the local median within a sliding window (optionally along multiple dimensions)
Calculating the median absolute deviation (MAD) within the same window
Replacing values that exceed a threshold (in MAD units) with the local median
Example Data
We’ll start by creating synthetic data with artificial spikes to demonstrate the filter’s effectiveness:
import numpy as npimport dascore as dcplot_kwargs =dict(scale=(-200, 200), scale_type="absolute")# Get example patch and add artificial spikespatch = dc.get_example_patch("example_event_2")data = patch.data.copy()# Add some large spikes at various locationsdata[50, 100] =600# Large positive spikedata[100, 150] =-1000# Large negative spikedata[150, 200] =1200.0# Another positive spike# Add a channel-wide spike (entire distance channel)data[:, 75] =1500.0# Add time spikedata[75, :] =-1400# Create patch with spikesspike_patch = patch.update(data=data)# Visualize the data with spikesax = spike_patch.viz.waterfall(**plot_kwargs)ax.set_title("Data with Artificial Spikes");
Basic Hampel Filtering
The simplest usage applies the filter along a single dimension. The threshold parameter controls the sensitivity - lower values result in more aggressive filtering:
# Apply Hampel filter along time dimensionfiltered_time = spike_patch.hampel_filter(time=0.02, threshold=3.5)ax = filtered_time.viz.waterfall(**plot_kwargs)ax.set_title("Filtered along Time (threshold=3.5)");
The resulting signal had some of the spikes removed with minimal distortion to the original signal. To remove the time spike, we need to apply the filter along the distance axis. We could apply two filters separately, but often a multidimensional filter is easier.
Warning
The Hampel filter can be quite slow if large windows are used. We recommend using window sizes that correspond to <10 samples for most cases.
Multi-dimensional Filtering
For more effective spike removal, especially for channel-wide spikes, apply the filter along multiple dimensions:
# Apply filter along both time and distance dimensionsfiltered_2d = spike_patch.hampel_filter(time=0.02, distance=5.0, threshold=3.5)ax = filtered_2d.viz.waterfall(**plot_kwargs)ax.set_title("Filtered along Time and Distance");
Best Practices for Hampel Filter
Multi-dimensional filtering: For most DAS applications, applying the filter along both time and distance dimensions is more effective than single-dimension filtering.
Threshold selection: Start with a conservative threshold (3.5-5.0) and adjust based on your data characteristics and noise level.
Window size: Choose window sizes that are large enough to provide stable statistics but small enough to preserve local features. Start with ~5-9 samples per dimension.
Performance optimization: approximate=True (default) achieves a 3-4x speedup with minimal quality loss. It is usually good enough for simply removing data spikes, but exact median calculations can be achieved by setting this to False.
Validation: Always visually inspect results and consider domain-specific requirements when choosing parameters.