hampel_filter

function of dascore.proc.hampel

hampel_filter(
    patch: Patch,
    threshold: float = 10.0,
    samples: bool = False,
    approximate: bool = True,
    **kwargs,
) -> 'PatchType'

A Hampel filter implementation useful for removing spikes in data.

Parameters

Parameter Description
patch Input patch.
threshold Outlier threshold in MAD units. Default is 10.0.
samples If True, values specified by kwargs are in samples, not coordinate units.
approximate If True, use fast approximation algorithms for improved performance. This applies 1D median filters sequentially along each dimension instead of a true 2D median filter, providing a 3-4x speedup. The approximation is usually good enough for spike removal purposes.
**kwargs Used to specify the lengths of the filter in each dimension. Each selected dim must be evenly sampled and should represent a window with an odd number of samples.
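
The difference between the two modes described for approximate can be sketched in plain Python. This is only an illustration of the general idea (one exact 2D moving median versus 1D medians applied along rows and then columns); the function names are hypothetical, the windows are simply truncated at the edges, and none of it is taken from the DASCore implementation.

```python
from statistics import median


def median_1d(seq, size=3):
    """Moving median of a 1D sequence; windows are truncated at the edges."""
    half = size // 2
    return [median(seq[max(0, i - half): i + half + 1]) for i in range(len(seq))]


def median_2d(grid, size=3):
    """Exact moving median over a size x size block (truncated at the edges)."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Gather every value in the 2D neighborhood of (r, c).
            block = [
                grid[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(median(block))
        out.append(out_row)
    return out


def median_sequential(grid, size=3):
    """Approximate the 2D median with a 1D pass along rows, then columns."""
    row_pass = [median_1d(row, size) for row in grid]
    # Transpose, filter each column, then transpose back.
    col_pass = [median_1d(list(col), size) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]


# A flat field with a single spike in the middle.
grid = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
exact = median_2d(grid)
approx = median_sequential(grid)
```

For an isolated spike like this one, both versions give the same local medians. On more structured data the sequential version can differ from the exact 2D median, which is the trade-off the approximate flag exposes.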

Warning

Selecting windows with many samples can be very slow. It is recommended that the window size in each dimension be fewer than 10 samples.

Returns

Patch with outliers replaced by local median.
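
As a rough illustration of what "outliers replaced by local median" means, the textbook Hampel rule can be written for a 1D sequence as follows. This is a minimal sketch, not DASCore's code: `hampel_1d` is a hypothetical name, the MAD is used unscaled, and edge windows are simply truncated, all of which are assumptions made for the example.

```python
from statistics import median


def hampel_1d(values, window=5, threshold=10.0):
    """Replace outliers in a 1D sequence with the local median.

    Illustrative only: edge windows are truncated and the MAD is unscaled.
    """
    if window % 2 == 0:
        raise ValueError("window must contain an odd number of samples")
    half = window // 2
    out = list(values)
    for i, value in enumerate(values):
        # Neighborhood centered on sample i (shorter near the edges).
        neighborhood = values[max(0, i - half): i + half + 1]
        local_median = median(neighborhood)
        # Median absolute deviation (MAD) of the neighborhood.
        mad = median(abs(v - local_median) for v in neighborhood)
        # Replace the sample if it lies more than `threshold` MADs away.
        if abs(value - local_median) > threshold * mad:
            out[i] = local_median
    return out


signal = [1.0, 1.1, 0.9, 1.0, 50.0, 1.2, 0.8, 1.0, 1.1]
cleaned = hampel_1d(signal, window=5, threshold=3.5)
```

Only the spike at index 4 exceeds the threshold here, so it is replaced by the median of its window while the remaining samples pass through unchanged.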

Note

When samples=False, even window lengths are bumped to the next odd value to ensure a clean median calculation. When samples=True, an even sample count raises a ParameterError.
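
The window rule in this note can be sketched as a small helper. Everything here is hypothetical: `window_to_samples` is not a DASCore function, the conversion from coordinate units to samples (dividing by the step and rounding) is an assumption, and `ValueError` stands in for DASCore's ParameterError.

```python
def window_to_samples(length, step, samples=False):
    """Return an odd window size in samples.

    Hypothetical helper: `length` is the window in coordinate units (or in
    samples when samples=True); `step` is the coordinate sampling interval.
    """
    if samples:
        n = int(length)
        if n % 2 == 0:
            # DASCore raises ParameterError here; ValueError is a stand-in.
            raise ValueError("window must have an odd number of samples")
        return n
    # Assumed conversion from coordinate units to a whole number of samples.
    n = max(1, round(length / step))
    # Bump an even count to the next odd value.
    return n if n % 2 == 1 else n + 1


# A window spanning four samples in coordinate units becomes five samples.
bumped = window_to_samples(4.0, 1.0)
# An odd sample count passes through unchanged.
unchanged = window_to_samples(5, 1.0, samples=True)
```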

Edge Handling:
- Edge effects may differ slightly between modes because the padding strategy depends on the patch’s dimensionality and on the approximate parameter.

Performance:
- approximate=True provides a 3-4x speedup over exact calculations.
- Installing the bottleneck package can further improve performance (~50%) in both approximate and exact modes.

Examples

import numpy as np
import dascore as dc
# Get an example patch and add artificial spikes
patch = dc.get_example_patch()
data = patch.data.copy()
data[10, 5] = 10  # Add a large spike
patch = patch.update(data=data)

# Apply hampel filter along time dimension with 0.2 unit window
filtered = patch.hampel_filter(time=0.2, threshold=3.5)
assert filtered.data.shape == patch.data.shape
# The spike should be reduced
assert abs(filtered.data[10, 5]) < abs(patch.data[10, 5])

# Apply filter along multiple dimensions using samples and
# default threshold.
filtered_2d = patch.hampel_filter(time=5, distance=5, samples=True)
assert filtered_2d.data.shape == patch.data.shape

# Use exact median calculations (slower, more accurate)
filtered_exact = patch.hampel_filter(
    time=5, distance=5, samples=True, approximate=False
)