Spool

Spools are containers/managers of patches. Spools come in a few varieties which can manage a group of patches loaded into memory, archives of local files, and (in the future) a variety of clients for accessing remote resources.

Data Sources

The simplest way to get the appropriate spool for a specified input is to use the spool function, which should work in the vast majority of cases.

Patches (in-memory)

import dascore as dc

patch_list = [dc.get_example_patch()]

spool1 = dc.spool(patch_list)

A Single File

import dascore as dc
from dascore.utils.downloader import fetch

path_to_das_file = fetch("terra15_das_1_trimmed.hdf5")

spool2 = dc.spool(path_to_das_file)

A Directory of DAS Files

import dascore as dc
from dascore.utils.downloader import fetch

path_to_das_file = fetch("terra15_das_1_trimmed.hdf5")
directory_path = path_to_das_file.parent

spool3 = dc.spool(directory_path).update()


Despite some implementation differences, all spools have common behavior/methods.

Accessing patches

Patches are extracted from the spool via simple iteration or indexing. New spools are returned via slicing.

import dascore as dc

spool = dc.get_example_spool()

patch = spool[0]  # extract first patch

# iterate over patches
for patch in spool:
    ...

# slice the spool to create a new spool which excludes the first patch
new_spool = spool[1:]

get_contents

Returns a dataframe listing contents. This method may not be supported on all spools, especially those interfacing with vast remote resources.

import dascore as dc
spool = dc.get_example_spool()

# Return dataframe with contents of spool (each row has metadata of a patch)
contents = spool.get_contents()
print(contents)
  time_min                 time_max                 d_time                  distance_min  distance_max  d_distance  ...  dims           tag     category
0 2020-01-03 00:00:00.000  2020-01-03 00:00:07.996  0 days 00:00:00.004000  0.0           299.0         1.0         ...  distance,time  random  DAS
1 2020-01-03 00:00:07.996  2020-01-03 00:00:15.992  0 days 00:00:00.004000  0.0           299.0         1.0         ...  distance,time  random  DAS
2 2020-01-03 00:00:15.992  2020-01-03 00:00:23.988  0 days 00:00:00.004000  0.0           299.0         1.0         ...  distance,time  random  DAS

3 rows × 22 columns

select

Selects a subset of a spool and returns a new spool. get_contents will then reflect only the subset of the original data requested by the select operation.

import dascore as dc
spool = dc.get_example_spool()

# select a sub-spool with data trimmed along the time dimension
subspool = spool.select(time=('2020-01-03T00:00:09', None))

In addition to trimming the data along a specified dimension (as shown above), select can be used to filter patches that meet specified criteria.

import dascore as dc
# load a spool which has many diverse patches
spool = dc.get_example_spool('diverse_das')

# Only include patches which are in network 'das2' or 'das3'
subspool = spool.select(network={'das2', 'das3'})

# only include patches whose tag matches a unix-style pattern.
subspool = spool.select(tag='some*')

chunk

Chunk controls how data are grouped together into patches. It can be used to merge contiguous patches together, specify the size of patches for processing, specify overlap between segments, etc.

import dascore as dc
spool = dc.get_example_spool()

# chunk the spool into 3 second segments with 1 second overlaps
# and keep any segments that don't span the full 3 seconds
subspool = spool.chunk(time=3, overlap=1, keep_partial=True)

# merge all contiguous segments along time dimension
merged_spool = spool.chunk(time=None)