Several steps are required to add support for a new file format. To demonstrate, imagine adding support for a format called jingle which conventionally uses a file extension of jgl.
Adding a New IO Module
First, we create a new io module called ‘jingle’ in DASCore’s io module (dascore/io/jingle). Make sure there is a __init__.py file in this module whose docstring describes basic use of the format and lists any non-obvious implementation details that might debug/improve the parser.
contents of dascore/io/jingle/__init__.py
"""Jingle format support module.Jingle is a really cool new DAS format.It supports all the "bells" and whistles.Examples--------import dascore as dcjingle = dc.read('path_to_file.jgl')"""
'\nJingle format support module.\n\nJingle is a really cool new DAS format.\n\nIt supports all the "bells" and whistles.\n\nExamples\n--------\nimport dascore as dc\n\njingle = dc.read(\'path_to_file.jgl\')\n'
Next, create a core.py file in the new module (dascore/io/jingle/core.py). Start by creating a class called JingleIO, or better yet, JingleIOV1, which subclasses `dascore.io.core.FiberIO. Then all you need to do is implement the supported methods.
Contents of dascore/io/jingle/core.py
"""Core module for jingle file format support."""import dascore.exceptionsfrom dascore.io.core import FiberIOclass JingleV1(FiberIO):""" An IO class supporting version 1 of the jingle format. """# you must specify the format name using the name attribute name ='jingle'# you can also define which file extensions are expected like so.# this will speed up determining the formats of files if not specified. preferred_extensions = ('jgl',)# also specifying a version is good practice so when version 2 is# released you can just make another class in the same module named# JingleV2. version ='1'def read(self, path, jingle_param=1, **kwargs):""" Read should take a path and return a patch or sequence of patches. It can also define its own optional parameters, and should always accept kwargs. If the format supports partial reads, these should be implemented as well. """def get_format(self, path):""" Used to determine if path is a supported jingle file. Returns a tuple of (format_name, file_version) if the file is a supported jingle file, else return False or raise a dascore.exceptions.UnknownFiberFormat exception. """def scan(self, path):""" Used to get metadata about a file without reading the whole file. This should return a list of `PatchFileSummary` objects from the dascore.core.schema module. """def write(self, patch, path, **kwargs):""" Write a patch or spool back to disk in the jingle format. """
All 4 methods are optional; some formats will only support reading, others only writing. If is_format is not implemented the format will not be auto-detectable.
Warning
It is very important that the scan method returns exactly the same min and max values for each dimension as the trace attrs created by read. If this is not the case, the lazy merge planning done by spool can be inaccurate!
Writing Tests
Next we need to write tests for the format (you weren’t thinking of skipping this step were you!?). The tests should go in: test/test_io/test_jingle.py. The hardest part is finding a small (typically no more than 1-2 mb) file to include in the test suite. The Adding test data section details how to do this.
Assuming you added a file called “jingle_test_file.jgl”, then the test file might look something like this:
contents of tests/test_io/test_jingle.py
import pytestimport dascorefrom dascore.utils.downloader import fetchclass TestJingleIO:"""Tests for jingle IO format."""@pytest.fixture(scope='class')def jingle_file_path(self):"""Return the path to the test jingle file.""" path = fetch("jingle_test_file.jgl")return path@pytest.fixture(scope='class')def jingle_patch(self, jingle_file_path):"""Read the jingle test data"""return dascore.read(jingle_file_path, file_format='jingle')[0]def test_read(self, jingle_file_path):"""Ensure the test file can be read."""# fetch will download the file (if not already downloaded) and# test read without specifying format out1 = dascore.read(jingle_file_path)# next assert things about the output, maybe len, attributes etc.assertlen(out1) ==1# test read specifying format out2 = dascore.read(jingle_file_path, file_format='jingle') ...def test_write(self, tmp_path_factory, jingle_patch):"""Ensure jingle format can be written.""" out_path = tmp_path_factory.mktemp('jingle') /'jingle_test.jgl' jingle_patch.io.write(jingle_patch, out_path)# make sure the new file exists and has a sizeassert out_path.exists()assert out_path.stat().size >0def test_scan(self, jingle_file_path):"""Tests for scanning a jingle file""" scan = dascore.scan(jingle_file_path)assertlen(scan) ==1# should also test the scan time/distance min/max values are# the same as the ones in the patch returned by readdef test_is_format(self, jingle_file_path):"""Tests for automatically determining a jingle format.""" file_format, version = dascore.get_format(jingle_file_path)assert file_format.lower() =='jingle'
Register Plugin
Now that the Jingle format support is implemented and tested, the final step is to register the jingle core module in DASCore’s entry points. This is done under the [project.entry-points.”dascore.fiber_io”] section in dascore’s pyproject.toml file. For example, after adding jingle the pyproject.toml section might look like this: