ChunkManager

ChunkManager(
    overlap = None,
    group_columns = None,
    keep_partial = False,
    snap_coords = True,
    tolerance = 1.5,
    conflict = "raise",
    **kwargs,
)

A class for managing the chunking of data defined in a dataframe.

The chunk manager handles both splitting and joining of contiguous, or near-contiguous, blocks of data.
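The joining side can be sketched in plain Python. This is not the ChunkManager implementation. It assumes each block is described by a start, an inclusive end (its last sample), and a sampling step. It also assumes that two sorted blocks merge when the gap `next.start - prev.end` is at most `tolerance * step`, following the `tolerance` parameter description. `Segment` and `merge_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one row of the source dataframe."""
    start: float
    end: float  # inclusive: the coordinate of the last sample
    step: float


def merge_segments(segments, tolerance=1.5):
    """Merge segments whose gap is <= tolerance * step (assumed rule)."""
    ordered = sorted(segments, key=lambda s: s.start)
    merged = []
    for seg in ordered:
        if merged:
            prev = merged[-1]
            # A perfectly contiguous pair has a gap of exactly one step.
            gap = seg.start - prev.end
            if gap <= tolerance * prev.step:
                merged[-1] = Segment(prev.start, max(prev.end, seg.end), prev.step)
                continue
        merged.append(seg)
    return merged


blocks = [Segment(25, 30, 1), Segment(0, 9, 1), Segment(10, 19, 1)]
print(merge_segments(blocks))
```

With a step of 1 and the default tolerance, the gap of 1 between the first two blocks is merged, while the gap of 6 before the third block is not.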

Parameters

Parameter	Description
overlap	The amount of overlap between each segment and the next, measured from the end of the first segment. Negative values can be used to induce gaps between segments.
group_columns	A sequence of column names which should be used for sorting groups.
keep_partial	If True, keep segments that are shorter than the chunk size (at the end of contiguous blocks).
tolerance	The upper limit of a gap to tolerate, expressed as a multiple of the sampling step along the chunked dimension. For example, the default value means entities with gaps <= 1.5 * {name}_step will be merged.
conflict	Indicates how to handle conflicts in attributes other than those indicated by dim (e.g. tag, history, station). If “drop”, simply drop conflicting attributes, or attributes not shared by all models. If “raise”, raise an [AttributeMergeError](`dascore.exceptions.AttributeMergeError`) when issues are encountered. If “keep_first”, keep the first value for each attribute.
**kwargs	Specifies the column along which to chunk and the chunk size. The key is the column name, typically `time` or `distance`, and the value is the chunk size. A value of None means to chunk on all available data (i.e. merge all data).
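The splitting side, and the way `overlap`, `keep_partial`, and a chunk size of None interact, can be sketched for a single contiguous block. This is an illustrative sketch, not DASCore's code. `chunk_block` is hypothetical. It assumes windows are half-open `[start, stop)` and that the block's exclusive end is `end + step`. It also assumes a partial window is kept only when it covers data that no full window has already reached.

```python
def chunk_block(start, end, step, size, overlap=0, keep_partial=False):
    """Split the block [start, end] (inclusive, sampled every step) into windows."""
    block_stop = end + step  # exclusive end of the block
    if size is None:
        # None: take all available data as one chunk.
        return [(start, block_stop)]
    stride = size - overlap  # a negative overlap widens the stride, creating gaps
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    windows = []
    current = start
    while current < block_stop:
        stop = current + size
        if stop > block_stop:
            # Only keep the short tail if it reaches data not yet covered.
            if keep_partial and (not windows or windows[-1][1] < block_stop):
                windows.append((current, block_stop))
            break
        windows.append((current, stop))
        current += stride
    return windows


print(chunk_block(0, 9, 1, 4))                     # full windows only
print(chunk_block(0, 9, 1, 4, keep_partial=True))  # plus the short tail
print(chunk_block(0, 9, 1, 4, overlap=2))          # overlapping windows
```

Integer coordinates are used here so the comparisons are exact. Real time coordinates would need care with floating-point or datetime arithmetic.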

Note

This class is used internally by dc.BaseSpool.chunk.

Methods

Name	Description
chunk	Chunk a dataframe into new contiguous segments.
get_instruction_df	Get a dataframe connecting the chunked dataframe to its origin.
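The idea behind an instruction table, linking each new chunk back to the source rows it draws from, can be sketched without pandas. This is a conceptual illustration only. The real `get_instruction_df` returns a dataframe whose columns are not documented here, so `instruction_rows` and its keys (`source_index`, `chunk_index`, `start`, `stop`) are hypothetical. Intervals are assumed to be half-open `[start, stop)`.

```python
def instruction_rows(sources, chunks):
    """Return one row per (source, chunk) pair whose intervals overlap."""
    rows = []
    for chunk_index, (c_start, c_stop) in enumerate(chunks):
        for source_index, (s_start, s_stop) in enumerate(sources):
            # Half-open intervals overlap when each starts before the other stops.
            if s_start < c_stop and c_start < s_stop:
                rows.append({
                    "source_index": source_index,
                    "chunk_index": chunk_index,
                    # The portion of the source that falls inside this chunk.
                    "start": max(s_start, c_start),
                    "stop": min(s_stop, c_stop),
                })
    return rows


sources = [(0, 10), (10, 20)]
chunks = [(0, 8), (8, 16), (16, 20)]
for row in instruction_rows(sources, chunks):
    print(row)
```

The middle chunk spans the boundary between the two sources, so it gets two rows: one for each source segment that must be trimmed and joined to build it.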