ChunkManager

class of dascore.core.spool

ChunkManager(
    overlap,
    group_columns,
    keep_partial,
    tolerance,
    **kwargs,
)

A class for managing the chunking of data defined in a dataframe.

The chunk manager handles both splitting and joining of contiguous, or near-contiguous, blocks of data.
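
To make "near-contiguous" concrete, the following is a minimal pure-Python sketch of the joining idea, not DASCore's implementation. It assumes each block is a (start, stop) pair along one dimension and that two blocks are joined when the gap between them is at most tolerance times the sampling step. The function name merge_segments and its arguments are hypothetical.

```python
def merge_segments(segments, step, tolerance=1.5):
    """Join (start, stop) blocks whose gap is <= tolerance * step.

    Hypothetical helper for illustration only; it is not part of DASCore.
    """
    merged = []
    for start, stop in sorted(segments):
        if merged and start - merged[-1][1] <= tolerance * step:
            # Close enough to the previous block: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            # Gap is too large (or this is the first block): start a new one.
            merged.append([start, stop])
    return [tuple(seg) for seg in merged]


# The first two blocks are one sample apart and are joined;
# the third is five samples away and stays separate.
result = merge_segments([(0.0, 10.0), (11.0, 20.0), (25.0, 30.0)], step=1.0)
```

Here result holds two blocks, (0.0, 20.0) and (25.0, 30.0).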

Parameters

Parameter Description
overlap The amount of overlap between consecutive segments, measured from the end of the first segment. Negative values can be used to induce gaps.
group_columns A sequence of column names which should be used for sorting groups.
keep_partial If True, keep segments that are shorter than the chunk size (these occur at the end of contiguous blocks).
tolerance The upper limit of a gap to tolerate, in terms of the sampling step along the desired dimension. E.g., the default value means entities with gaps <= 1.5 * d_{name} will be merged.
**kwargs The keyword specifies the column along which to chunk, typically time or distance, and the value specifies the chunk size. A value of None means to chunk on all available data (i.e., merge all data).
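
How the chunk size, overlap, and keep_partial interact can be sketched in plain Python. This is an illustration under stated assumptions, not DASCore's implementation: a contiguous block is taken to be a (start, stop) range, each new segment starts at the previous segment's end minus the overlap, and at most one trailing partial segment is kept. The function name split_block is hypothetical.

```python
def split_block(start, stop, chunk_size, overlap=0, keep_partial=False):
    """Split the range [start, stop] into segments of length chunk_size.

    Hypothetical helper for illustration only; it is not part of DASCore.
    """
    # A negative overlap makes the stride longer than the chunk, leaving gaps.
    stride = chunk_size - overlap
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size must be positive and larger than overlap")
    chunks = []
    current = start
    while current < stop:
        end = current + chunk_size
        if end > stop:
            # The last segment would run past the block; keep it only on request.
            if keep_partial:
                chunks.append((current, stop))
            break
        chunks.append((current, end))
        current += stride
    return chunks


full_only = split_block(0, 25, 10)
with_partial = split_block(0, 25, 10, keep_partial=True)
overlapping = split_block(0, 25, 10, overlap=2)
gapped = split_block(0, 25, 10, overlap=-5)
```

In this sketch, full_only drops the trailing 5-unit remainder while with_partial keeps it as a shorter final segment; overlapping segments share 2 units, and gapped segments are separated by 5 units.
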
Note

This class is used internally by dc.Spool.chunk.

Methods

Name Description
chunk Chunk a dataframe into new contiguous segments.
get_instruction_df Get a dataframe connecting the chunked dataframe to its origin.