ChunkManager
ChunkManager(overlap, group_columns, keep_partial, tolerance, **kwargs)
A class for managing the chunking of data defined in a dataframe.
The chunk manager handles both splitting and joining of contiguous, or near-contiguous, blocks of data.
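To make the joining of near-contiguous blocks concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper `merge_near_contiguous`, the `(start, stop)` tuple representation, and the `step` argument (standing in for the sampling step along the chunked dimension) are all assumptions made for illustration. The gap rule follows the `tolerance` description in the parameter table: gaps no larger than `tolerance * step` are merged.

```python
def merge_near_contiguous(segments, step, tolerance=1.5):
    """Merge (start, stop) segments whose gaps are <= tolerance * step.

    Hypothetical helper for illustration only; not part of the library.
    """
    merged = []
    for start, stop in sorted(segments):
        if merged and start - merged[-1][1] <= tolerance * step:
            # Gap is within tolerance: extend the previous block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            # Gap is too large (or this is the first segment): start a new block.
            merged.append((start, stop))
    return merged


# Segments sampled every 1.0 units; the first gap is 1.0, the second is 5.0.
segments = [(0.0, 9.0), (10.0, 19.0), (24.0, 30.0)]
print(merge_near_contiguous(segments, step=1.0))
# [(0.0, 19.0), (24.0, 30.0)]
```

The first two segments are one sampling step apart, so they are joined; the third starts five steps later and stays separate.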
Parameters
Parameter | Description |
---|---|
overlap | The amount of overlap between consecutive segments, starting with the end of the first row. Negative values can be used to induce gaps. |
group_columns | A sequence of column names which should be used for sorting groups. |
keep_partial | If True, keep segments that are shorter than the chunk size (at the end of contiguous blocks). |
tolerance | The upper limit of a gap to tolerate, in terms of the sampling along the desired dimension. E.g., the default value means entities with gaps <= 1.5 * d_{name} will be merged. |
**kwargs | Specifies the column along which to chunk. The key names the column, typically time or distance, and the value specifies the chunk size. A value of None means to chunk on all available data (e.g. merge all data). |
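The interaction of chunk size, `overlap`, and `keep_partial` can be sketched as follows. This is an illustrative approximation under stated assumptions, not the class's actual algorithm: `chunk_bounds` is a hypothetical helper, and it works on a single contiguous block described by plain numeric `start` and `stop` values rather than on a dataframe.

```python
def chunk_bounds(start, stop, size, overlap=0, keep_partial=False):
    """Return (start, stop) bounds of chunks within one contiguous block.

    Hypothetical helper for illustration only; not part of the library.
    """
    stride = size - overlap  # a negative overlap widens the stride, leaving gaps
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    bounds = []
    current = start
    covered = start  # end of the last full chunk emitted so far
    while current + size <= stop:
        bounds.append((current, current + size))
        covered = current + size
        current += stride
    if keep_partial and covered < stop and current < stop:
        # Leftover data at the end of the block, shorter than a full chunk.
        bounds.append((current, stop))
    return bounds


print(chunk_bounds(0, 25, 10))                     # [(0, 10), (10, 20)]
print(chunk_bounds(0, 25, 10, keep_partial=True))  # [(0, 10), (10, 20), (20, 25)]
print(chunk_bounds(0, 25, 10, overlap=2))          # [(0, 10), (8, 18)]
print(chunk_bounds(0, 25, 10, overlap=-5))         # [(0, 10), (15, 25)]
```

With a positive overlap each chunk starts before the previous one ends; with a negative overlap the chunks are separated by gaps. The trailing partial segment is only returned when `keep_partial` is true.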
Note
This class is used internally by dc.Spool.chunk.
Methods
Name | Description |
---|---|
chunk | Chunk a dataframe into new contiguous segments. |
get_instruction_df | Get a dataframe connecting the chunked dataframe to its origin. |