ChunkManager

class of dascore.utils.chunk

ChunkManager(
    overlap = None,
    group_columns = None,
    keep_partial = False,
    tolerance = 1.5,
    conflict = 'raise',
    **kwargs,
)

A class for managing the chunking of data defined in a dataframe.

The chunk manager handles both splitting and joining of contiguous, or near-contiguous, blocks of data.
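
The two operations can be illustrated with a minimal, standalone sketch. It is not dascore's implementation: `Segment`, `merge_segments`, and `split_segment` are hypothetical names invented here, and the treatment of trailing partial windows is an assumption. The sketch only mirrors the documented meaning of `tolerance`, `overlap`, and `keep_partial` on plain numeric intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one dataframe row: [start, stop] sampled at `step`."""
    start: float
    stop: float
    step: float


def merge_segments(segments, tolerance=1.5):
    """Join segments whose gap to the previous one is <= tolerance * step."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and seg.start - merged[-1].stop <= tolerance * merged[-1].step:
            last = merged[-1]
            # Extend the previous segment instead of starting a new one.
            merged[-1] = Segment(last.start, max(last.stop, seg.stop), last.step)
        else:
            merged.append(seg)
    return merged


def split_segment(segment, size, overlap=0, keep_partial=False):
    """Split one segment into (start, stop) windows of length `size`."""
    stride = size - overlap  # a negative overlap widens the stride, leaving gaps
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    windows = []
    start = segment.start
    while start + size <= segment.stop:
        windows.append((start, start + size))
        start += stride
    # Assumption: a trailing window shorter than `size` is kept only on request.
    if keep_partial and start < segment.stop:
        windows.append((start, segment.stop))
    return windows


segments = [Segment(0, 10, 1), Segment(11, 20, 1), Segment(25, 30, 1)]
merged = merge_segments(segments)  # the 1-step gap is bridged, the 5-step gap is not
windows = split_segment(merged[0], size=8, overlap=2)
print(merged)
print(windows)
```

In this sketch a gap is measured from the stop of one segment to the start of the next, so perfectly contiguous data has a gap of exactly one step and falls inside the default tolerance of 1.5 steps.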

Parameters

Parameter Description
overlap The amount of overlap between consecutive segments, starting from the end of the first row. Negative values can be used to introduce gaps between segments.
group_columns A sequence of column names which should be used for sorting groups.
keep_partial If True, keep segments which are shorter than the chunk size (these occur at the end of contiguous blocks).
tolerance The upper limit of a gap to tolerate, expressed in sampling steps along the desired dimension. For example, the default value means entities with gaps <= 1.5 * {name}_step will be merged.
conflict Indicates how to handle conflicts in attributes other than those indicated by dim (e.g. tag, history, station). If “drop”, simply drop conflicting attributes and attributes not shared by all models. If “raise”, raise an AttributeMergeError (dascore.exceptions.AttributeMergeError) when issues are encountered. If “keep_first”, keep the first value for each attribute.
**kwargs The key specifies the column along which to chunk, typically time or distance, and the value specifies the chunk size. A value of None means to chunk on all available data (i.e. merge all data).
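
The three `conflict` policies described above can be sketched on plain dictionaries. This is an illustrative reading of the parameter description, not dascore's code: `merge_attrs` is a hypothetical helper, the local `AttributeMergeError` is only a stand-in for `dascore.exceptions.AttributeMergeError`, and treating an attribute missing from some inputs as an "issue" under "raise" is an assumption.

```python
class AttributeMergeError(ValueError):
    """Local stand-in for dascore.exceptions.AttributeMergeError."""


def merge_attrs(attr_dicts, conflict="raise"):
    """Combine attribute dicts using one of the documented conflict policies."""
    if conflict not in {"drop", "raise", "keep_first"}:
        raise ValueError(f"unknown conflict policy: {conflict!r}")
    # Collect keys in first-seen order so the output is deterministic.
    all_keys = []
    for attrs in attr_dicts:
        for key in attrs:
            if key not in all_keys:
                all_keys.append(key)
    out = {}
    for key in all_keys:
        present = [attrs[key] for attrs in attr_dicts if key in attrs]
        shared_by_all = len(present) == len(attr_dicts)
        agree = all(value == present[0] for value in present)
        if shared_by_all and agree:
            out[key] = present[0]
        elif conflict == "keep_first":
            out[key] = present[0]
        elif conflict == "raise":
            # Assumption: a missing attribute counts as a conflict here.
            raise AttributeMergeError(f"cannot merge attribute {key!r}")
        # conflict == "drop": the key is simply left out.
    return out


first = {"tag": "x", "station": "A", "history": ("h",)}
second = {"tag": "y", "station": "A"}
print(merge_attrs([first, second], conflict="drop"))
print(merge_attrs([first, second], conflict="keep_first"))
```

With these inputs, "drop" keeps only `station`, the one attribute both dictionaries share with the same value, while "keep_first" keeps every attribute using its first observed value.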
Note

This class is used internally by dc.Spool.chunk.

Methods

Name Description
chunk Chunk a dataframe into new contiguous segments.
get_instruction_df Get a dataframe connecting the chunked dataframe to its origin.