ChunkManager

class of dascore.utils.chunk

ChunkManager(
    overlap = None,
    group_columns = None,
    keep_partial = False,
    tolerance = 1.5,
    conflict = 'raise',
    **kwargs,
)

A class for managing the chunking of data defined in a dataframe.

The chunk manager handles both splitting and joining of contiguous, or near-contiguous, blocks of data.
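As a rough illustration of what joining near-contiguous blocks means, the sketch below merges sorted (start, stop) intervals whenever the gap between neighbours is within tolerance * step. The function name `merge_near_contiguous` and the tuple representation are assumptions made for this example; this is not DASCore's implementation, which operates on dataframes.

```python
def merge_near_contiguous(intervals, step, tolerance=1.5):
    """Merge (start, stop) intervals whose gap is <= tolerance * step.

    Hypothetical helper for illustration only; not part of DASCore.
    """
    merged = []
    for start, stop in sorted(intervals):
        # The gap is measured from the end of the last merged block.
        # Overlapping blocks have a negative gap and so also pass the test.
        if merged and start - merged[-1][1] <= tolerance * step:
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


blocks = [(0, 10), (11, 20), (25, 30)]
# With step=1, the gap of 1 is tolerated but the gap of 5 is not.
result = merge_near_contiguous(blocks, step=1)
```

Here the first two blocks are joined because their gap (1) does not exceed 1.5 * 1, while the third block stays separate.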

Parameters

Parameter Description
overlap The amount of overlap between each segment, starting with the end of the first row. Negative values can be used to induce gaps.
group_columns A sequence of column names which should be used for sorting groups.
keep_partial If True, keep segments which are shorter than the chunk size (at the end of contiguous blocks).
tolerance The upper limit of a gap to tolerate, in terms of the sampling along the desired dimension. E.g., the default value means entities with gaps <= 1.5 * {name}_step will be merged.
conflict Indicates how to handle conflicts in attributes other than those indicated by dim (e.g., tag, history, station, etc.). If “drop”, simply drop conflicting attributes, or attributes not shared by all models. If “raise”, raise an AttributeMergeError (dascore.exceptions.AttributeMergeError) when issues are encountered. If “keep_first”, just keep the first value for each attribute.
**kwargs Specifies the column along which to chunk. The key names the column, typically time or distance, and the value specifies the chunk size. A value of None means to chunk on all available data (e.g., merge all data).
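To make the interaction of chunk size, overlap, and keep_partial concrete, here is a minimal sketch that computes window boundaries over a single contiguous block. The function `chunk_windows` is hypothetical, and its handling of end points is a simplifying assumption; DASCore's actual boundary handling (for example, how sample spacing enters) may differ.

```python
def chunk_windows(start, stop, size, overlap=0, keep_partial=False):
    """Return (start, stop) windows of length `size` within [start, stop].

    Hypothetical helper for illustration only; not part of DASCore.
    """
    stride = size - overlap  # a negative overlap widens the stride (gaps)
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    windows = []
    current = start
    # Emit full-length windows while they still fit in the block.
    while current + size <= stop:
        windows.append((current, current + size))
        current += stride
    # Optionally keep the shorter remainder at the end of the block.
    if keep_partial and current < stop:
        windows.append((current, stop))
    return windows


full_only = chunk_windows(0, 10, 4)
with_partial = chunk_windows(0, 10, 4, keep_partial=True)
overlapping = chunk_windows(0, 10, 4, overlap=1)
gapped = chunk_windows(0, 10, 3, overlap=-1)
```

In this sketch, a positive overlap makes consecutive windows share data, a negative overlap leaves a gap between them, and keep_partial retains the short trailing window that would otherwise be dropped.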
Note

This class is used internally by dc.Spool.chunk.

Methods

Name Description
chunk Chunk a dataframe into new contiguous segments.
get_instruction_df Get a dataframe connecting the chunked dataframe to its origin.