Working with Files

This page highlights some DASCore features for reading, writing, and indexing files.

Writing Patches to Disk

Patches can be written to disk using the io namespace. The following shows how to write a Patch to disk in the DASDAE format:

from pathlib import Path
import dascore as dc

write_path = Path("output_file.h5")
patch = dc.get_example_patch()

patch.io.write(write_path, "dasdae")

PosixPath('output_file.h5')

DirectorySpool

The DirectorySpool is used to retrieve data from a directory of dascore-readable files. It has the same interface as other spools and is created with the dascore.spool function.

For example:

import dascore
from dascore import examples as ex

# Get a directory with several files
diverse_spool = dascore.get_example_spool('diverse_das')
path = ex.spool_to_directory(diverse_spool)

# Create a spool for interacting with the files in the directory.
spool = (
  dascore.spool(path)
  .select(network='das2')  # sub-select das2 network
  .select(time=(None, '2022-01-01'))  # keep only data up to 2022-01-01
  .chunk(time=2, overlap=0.5)  # re-chunk into 2 second patches with 0.5 second overlap
)

# Iterate each patch and do something with it
for patch in spool:
  ...

Converting Patches to Other Library Formats

The Patch.io namespace also includes functionality for converting Patch instances to data structures used by other libraries, including Pandas, Xarray, and ObsPy. See the external conversion recipe for examples.

Directory Indexer

The DirectoryIndexer tracks the contents of a directory that contains fiber data. It creates a small, hidden HDF index file at the top of the directory, which can be queried efficiently for the directory's contents (it is used internally by the DirectorySpool).

import dascore
from dascore.io.indexer import DirectoryIndexer
from dascore import examples as ex

# Get a directory with several files
diverse_spool = dascore.get_example_spool('diverse_das')
path = ex.spool_to_directory(diverse_spool)

# Create an indexer and update the index. This will include any new files
# with timestamps newer than the last update, or create a new HDF index file
# if one does not yet exist.
indexer = DirectoryIndexer(path).update()

# Get the contents of the directory's files as a dataframe
df = indexer.get_contents()

# This dataframe can be used to ascertain data availability, detect gaps, etc.

	time_max	file_format	network	dims	time_step	station	path	time_min	file_version	tag
0	1989-05-04 00:00:07.996	DASDAE		distance,time	0 days 00:00:00.004000	wayout	DAS____wayout__random__1989_05_04T00_00_00_000...	1989-05-04	1	random
1	2020-01-03 00:00:07.996	DASDAE		distance,time	0 days 00:00:00.004000		DAS______some_tag__2020_01_03T00_00_00_0000000...	2020-01-03	1	some_tag
2	2020-01-03 00:00:07.996	DASDAE		distance,time	0 days 00:00:00.004000	big_gaps	DAS____big_gaps__random__2020_01_03T00_00_00_0...	2020-01-03	1	random
3	2020-01-03 00:00:07.996	DASDAE		distance,time	0 days 00:00:00.004000	overlaps	DAS____overlaps__random__2020_01_03T00_00_00_0...	2020-01-03	1	random
4	2020-01-03 00:00:07.996	DASDAE	das2	distance,time	0 days 00:00:00.004000		DAS__das2____random__2020_01_03T00_00_00_00000...	2020-01-03	1	random
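As an illustration of gap detection, the sketch below finds uncovered time spans from (time_min, time_max) pairs such as those in the time_min and time_max columns above. It is a standard-library sketch, not part of DASCore: the find_gaps helper, its tolerance parameter, and the sample intervals are all hypothetical. With real index rows, the pairs would usually be grouped by network, station, and tag first, and the tolerance set to at least one time_step, since time_max is the time of the last sample rather than the start of the next file.

```python
from datetime import datetime, timedelta


def find_gaps(intervals, tolerance=timedelta(0)):
    """Return (gap_start, gap_end) pairs not covered by any interval."""
    ordered = sorted(intervals)
    gaps = []
    if not ordered:
        return gaps
    # Track the latest time covered so far so overlapping files are handled.
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start - covered_until > tolerance:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


# Hypothetical (time_min, time_max) pairs standing in for three index rows.
intervals = [
    (datetime(2020, 1, 3, 0, 0, 0), datetime(2020, 1, 3, 0, 0, 8)),
    (datetime(2020, 1, 3, 0, 0, 8), datetime(2020, 1, 3, 0, 0, 16)),
    (datetime(2020, 1, 3, 0, 0, 20), datetime(2020, 1, 3, 0, 0, 28)),
]

gaps = find_gaps(intervals)
for gap_start, gap_end in gaps:
    print(f"gap from {gap_start} to {gap_end}")
```

Sorting first and tracking the latest covered time means overlapping or nested files do not produce false gaps.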