Spool
Spools are containers/managers of patches. Spools come in a few varieties which can manage a group of patches loaded into memory, archives of local files, and (in the future) a variety of clients for accessing remote resources.
Data Sources
The simplest way to get the appropriate spool for a specified input is to use the spool
function, which should work in the vast majority of cases.
Patches (in-memory)
import dascore as dc

patch_list = [dc.get_example_patch()]
spool1 = dc.spool(patch_list)
A Single File
import dascore as dc
from dascore.utils.downloader import fetch
path_to_das_file = fetch("terra15_das_1_trimmed.hdf5")
spool2 = dc.spool(path_to_das_file)
A Directory of DAS Files
import dascore as dc
from dascore.utils.downloader import fetch
path_to_das_file = fetch("terra15_das_1_trimmed.hdf5")
directory_path = path_to_das_file.parent
spool3 = dc.spool(directory_path).update()
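For directory spools, update scans the files and builds an index of their contents. A minimal sketch, assuming only that re-running update picks up files added to the directory after the first scan:

import dascore as dc

# the first update() indexes the current contents of the directory
spool3 = dc.spool(directory_path).update()

# ... later, after new files are written to the directory ...

# re-running update() refreshes the index to include the new files
spool3 = spool3.update()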
Despite some implementation differences, all spools have common behavior/methods.
Accessing patches
Patches are extracted from the spool via simple iteration or indexing. New spools are returned via slicing.
import dascore as dc

spool = dc.get_example_spool()

# extract first patch via indexing
patch = spool[0]

# iterate patches
for patch in spool:
    ...

# slice spool to create a new spool which excludes the first patch
new_spool = spool[1:]
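Since spools support indexing, slicing, and iteration, other sequence-style operations also apply; for example, len reports how many patches a spool manages (a small sketch using the example spool from above):

# get the number of patches managed by the spool
print(len(spool))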
get_contents
Returns a dataframe listing contents. This method may not be supported on all spools, especially those interfacing with vast remote resources.
import dascore as dc

spool = dc.get_example_spool()

# return a dataframe with the contents of the spool
# (each row contains the metadata of one patch)
contents = spool.get_contents()
print(contents)
   | time_min                | time_max                | d_time                 | distance_min | distance_max | d_distance | ... | dims          | tag    | category
---|-------------------------|-------------------------|------------------------|--------------|--------------|------------|-----|---------------|--------|---------
 0 | 2020-01-03 00:00:00.000 | 2020-01-03 00:00:07.996 | 0 days 00:00:00.004000 | 0.0          | 299.0        | 1.0        | ... | distance,time | random | DAS
 1 | 2020-01-03 00:00:07.996 | 2020-01-03 00:00:15.992 | 0 days 00:00:00.004000 | 0.0          | 299.0        | 1.0        | ... | distance,time | random | DAS
 2 | 2020-01-03 00:00:15.992 | 2020-01-03 00:00:23.988 | 0 days 00:00:00.004000 | 0.0          | 299.0        | 1.0        | ... | distance,time | random | DAS

3 rows × 22 columns (elided columns include data_type, data_category, data_units, time_units, cable_id, station, network, history, file_version, file_format, and path)
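Because get_contents returns an ordinary pandas DataFrame, standard pandas operations apply. A minimal sketch, assuming the 'tag', 'time_min', and 'time_max' columns shown above:

import dascore as dc

spool = dc.get_example_spool()
contents = spool.get_contents()

# filter rows with normal pandas boolean indexing
random_rows = contents[contents['tag'] == 'random']

# summarize the time range covered by the spool
print(contents['time_min'].min(), contents['time_max'].max())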
select
Selects a subset of a spool and returns a new spool. get_contents will then reflect only the subset of the original data requested by the select operation.
import dascore as dc

spool = dc.get_example_spool()

# select a sub-spool with data from 2020-01-03T00:00:09 onward
subspool = spool.select(time=('2020-01-03T00:00:09', None))
In addition to trimming the data along a specified dimension (as shown above), select can be used to filter patches that meet specified criteria.
import dascore as dc

# load a spool which has many diverse patches
spool = dc.get_example_spool('diverse_das')

# only include patches which are in network 'das2' or 'das3'
subspool = spool.select(network={'das2', 'das3'})

# only include patches whose tag matches a unix-style wildcard pattern
subspool = spool.select(tag='some*')
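Several criteria can also be combined in one call; a sketch assuming select accepts multiple keyword filters at once, mixing a dimension trim with an attribute filter:

# trim along time and filter on network in a single select call
subspool = spool.select(
    time=('2020-01-03T00:00:09', None),
    network={'das2', 'das3'},
)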
chunk
Chunk controls how data are grouped together in patches. It can be used to merge contiguous patches together, specify the size of patches for processing, specify overlap with previous segments, etc.
import dascore as dc

spool = dc.get_example_spool()

# chunk spool into 3 second increments with 1 second overlaps
# and keep any segments that don't have a full 3 seconds
subspool = spool.chunk(time=3, overlap=1, keep_partial=True)

# merge all contiguous segments along the time dimension
merged_spool = spool.chunk(time=None)
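A common workflow is to chunk a spool into uniform segments and then iterate over the resulting patches; a minimal sketch using only the behaviors shown above, with print standing in for real per-patch processing:

import dascore as dc

spool = dc.get_example_spool()

# chunk into uniform 3 second patches, then process each one
for patch in spool.chunk(time=3):
    # any per-patch processing goes here
    print(patch.data.shape)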