Despiking
This recipe demonstrates how to remove spikes from DAS data.
The Hampel Filter Method
The Hampel filter is effective for removing outliers (spikes) in data by identifying values that deviate significantly from the local median in terms of median absolute deviation (MAD). This makes it particularly useful for cleaning DAS data that may contain various types of noise spikes.
DASCore provides the Patch.hampel_filter method. The filter works by:
- Computing the local median within a sliding window (optionally along multiple dimensions)
- Calculating the median absolute deviation (MAD) within the same window
- Replacing values that exceed a threshold (in MAD units) with the local median
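To make these steps concrete, here is a minimal NumPy sketch of the same idea on a one-dimensional array. It is for illustration only (it is not the implementation DASCore uses), and it follows the common convention of scaling the MAD by 1.4826 so the threshold behaves like a number of standard deviations:

import numpy as np

def hampel_1d(x, window=9, threshold=3.5):
    """Illustrative 1D Hampel filter: replace outliers with the local median."""
    half = window // 2
    out = x.copy()
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]  # sliding window around sample i
        med = np.median(seg)                     # local median
        mad = np.median(np.abs(seg - med))       # median absolute deviation
        scaled_mad = 1.4826 * mad                # MAD scaled to act like a std. dev.
        if scaled_mad > 0 and abs(x[i] - med) > threshold * scaled_mad:
            out[i] = med                         # replace the spike with the local median
    return out

# A single large spike is replaced; the surrounding samples are left alone.
x = np.sin(np.linspace(0, 10, 200))
x[50] = 25.0
clean = hampel_1d(x)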
Example Data
We’ll start by creating synthetic data with artificial spikes to demonstrate the filter’s effectiveness:

import numpy as np
import dascore as dc

plot_kwargs = dict(scale=(-200, 200), scale_type="absolute")

# Get example patch and add artificial spikes
patch = dc.get_example_patch("example_event_2")
data = patch.data.copy()

# Add some large spikes at various locations
data[50, 100] = 600  # Large positive spike
data[100, 150] = -1000  # Large negative spike
data[150, 200] = 1200.0  # Another positive spike

# Add a channel-wide spike (entire distance channel)
data[:, 75] = 1500.0

# Add a time spike
data[75, :] = -1400

# Create patch with spikes
spike_patch = patch.update(data=data)

# Visualize the data with spikes
ax = spike_patch.viz.waterfall(**plot_kwargs)
ax.set_title("Data with Artificial Spikes");
Basic Hampel Filtering
The simplest usage applies the filter along a single dimension. The threshold parameter controls the sensitivity; lower values result in more aggressive filtering:
# Apply Hampel filter along time dimension
filtered_time = spike_patch.hampel_filter(time=0.02, threshold=3.5)

ax = filtered_time.viz.waterfall(**plot_kwargs)
ax.set_title("Filtered along Time (threshold=3.5)");
The resulting signal has some of the spikes removed with minimal distortion to the underlying data. To remove the time spike, however, we also need to apply the filter along the distance axis. We could apply two one-dimensional filters in sequence, as sketched below, but a multidimensional filter is often easier.
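For reference, this is roughly what chaining two one-dimensional passes looks like. The window lengths simply reuse the values from the other examples in this recipe and are illustrative rather than tuned:

# Filter along time, then along distance, each as a separate 1D pass
filtered_seq = (
    spike_patch
    .hampel_filter(time=0.02, threshold=3.5)
    .hampel_filter(distance=5.0, threshold=3.5)
)
ax = filtered_seq.viz.waterfall(**plot_kwargs)
ax.set_title("Sequential 1D Hampel Filters");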
The Hampel filter can be quite slow if large windows are used. We recommend using window sizes that correspond to <10 samples for most cases.
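Because the window arguments are given in coordinate units (seconds, meters) rather than samples, one way to target a window of roughly N samples is to multiply the coordinate step by N. A minimal sketch, assuming the coordinates are evenly sampled and expose their spacing through a step attribute on Patch.get_coord (this attribute access is an assumption; the time step is typically a numpy timedelta64):

n_samples = 9
time_step = spike_patch.get_coord("time").step  # sample spacing; assumed timedelta64
time_window = n_samples * time_step / np.timedelta64(1, "s")  # window length in seconds
filtered_small = spike_patch.hampel_filter(time=time_window, threshold=3.5)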
Multi-dimensional Filtering
For more effective spike removal, especially for channel-wide spikes, apply the filter along multiple dimensions:
# Apply filter along both time and distance dimensions
filtered_2d = spike_patch.hampel_filter(time=0.02, distance=5.0, threshold=3.5)

ax = filtered_2d.viz.waterfall(**plot_kwargs)
ax.set_title("Filtered along Time and Distance");
Best Practices for the Hampel Filter
- Multi-dimensional filtering: For most DAS applications, applying the filter along both time and distance dimensions is more effective than single-dimension filtering.
- Threshold selection: Start with a conservative threshold (3.5-5.0) and adjust based on your data characteristics and noise level.
- Window size: Choose window sizes that are large enough to provide stable statistics but small enough to preserve local features. Start with ~5-9 samples per dimension.
- Performance optimization: approximate=True (the default) achieves a 3-4x speedup with minimal quality loss. It is usually good enough for simply removing data spikes, but exact median calculations can be obtained by setting it to False, as shown below.
- Validation: Always visually inspect results and consider domain-specific requirements when choosing parameters.
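To illustrate the last two points, the sketch below compares the default approximate mode with exact median calculations, reusing the window and threshold values from the examples above:

# Default: fast approximate medians (usually sufficient for despiking)
filtered_fast = spike_patch.hampel_filter(time=0.02, distance=5.0, threshold=3.5)

# Exact medians: slower, but avoids any approximation error
filtered_exact = spike_patch.hampel_filter(
    time=0.02, distance=5.0, threshold=3.5, approximate=False
)

# Visually compare the two results before settling on parameters
ax = filtered_exact.viz.waterfall(**plot_kwargs)
ax.set_title("Exact-Median Hampel Filter");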