Coordinates and Coordinate Managers

Warning

This page covers advanced DASCore features. Most users will be fine with only the coordinate material presented in the patch tutorial.

In order to manage coordinate labels and array manipulations, DASCore implements two classes, BaseCoordinate, which has several associated subclasses corresponding to different types of coordinates, and CoordManager which manages a group of coordinates. Much like the Patch, instances of both of these classes are immutable (to the extent possible), so they cannot be modified in place but have methods which return new instances.

Coordinates

Coordinates usually keep track of labels along an associated dimension of an array, but they can also be independent of array data. They provide methods for slicing, re-ordering, filtering etc. and are used internally by DASCore for such operations.

Much like DASCore’s Spool, Coordinates are a collection of classes which implement a common interface.

Note

Coordinates are very similar (in concept) to Pandas’ indices, with some significant differences in implementation.

Coordinate Creation

Get Coord

get_coord returns an instance of a subclass of BaseCoord appropriate for the input values. Here are a few examples:

import numpy as np

from dascore.core import get_coord

# Get monotonic, evenly sampled coords from start, stop, step.
range_coord = get_coord(start=0, stop=10, step=1)

# Do the same as above but using an evenly sampled sorted array.
array = np.arange(0, 10, step=1)
array_coord = get_coord(data=array)

# Assert the type and values of the resulting coordinates are the same.
assert range_coord == array_coord

# the Arrays don't have to be evenly sampled, or sorted.
sorted_array = np.sort(np.random.rand(10))
sorted_coord = get_coord(data=sorted_array)

random_array = np.random.rand(10) 
random_coord = get_coord(data=random_array)

Update

Update uses the existing coordinate as a template and returns a coordinate with some part modified.

import numpy as np

import dascore as dc
from dascore.core import get_coord

# Create example coordinate.
start, stop = np.datetime64("2023-01-01"), np.datetime64("2023-01-01T01")
step = np.timedelta64(60, 's')
coord = get_coord(start=start, stop=stop, step=step)

# Update entire array, adding 10s to each element.
new_data = coord.values + np.timedelta64(10, "s")
coord_new_data = coord.update(data=new_data)

# Update step, keeping length and start but shorting end time.
coord_new_step = coord.update(step=np.timedelta64(1800, 's'))

# Change maximum value, keeping length the same and changing step.
coord_new_max = coord.update(max=stop + 10 * step)

Coordinate Attributes

The following tables shows some of the commonly used coordinate attributes:

Coordinate attributes
Attribute Description
sorted True if the coordinate is sorted in ascending order.
reverse_sorted True if the coordinate is sorted in descending order.
evenly_sampled True if the coordinate has uniform step sizes.
dtype The numpy data type of the coordinate.
data Return an array of coordinate values.
units Coordinate units.
degenerate True if the coordinate has a zero length dimension.
min() Return the minimum value in the coordinate.
max() Return the maximum value in the coordinate.

Coordinate Methods

This section highlights some of the coordinate methods. The methods which would cause changes to a data array return a new coordinate and an object that can be used for indexing an array. This can either be a slice instance or another array which uses numpy’s advanced indexing features for sorting or selection.

Sort

sort sorts the values of the coordinate.

import numpy as np

from dascore.core import get_coord

random_array = np.random.rand(10) 
random_coord = get_coord(data=random_array)

# Returns a new array and indexer that can be used to apply the sorting
# opeartion to a dimension of an array.
sorted_coord, indexer = random_coord.sort()

# The array could then be updated as follows.
data = np.random.rand(10, 20)
sorted_data = data[indexer, :] 

Snap

‘snap’ is used to calculate an average spacing between samples and “snap” all values to that spacing. If the coordinate is not sorted, it will be sorted in the process. This method should be used with care since it causes some loss in precision and can introduce inaccuracies in down-stream calculations. The min and max of the coordinate remain unchanged.

import numpy as np

from dascore.core import get_coord

random_array = np.random.rand(10) 
random_coord = get_coord(data=random_array)

sorted_coord, indexer = random_coord.sort()

# The data array can be updated like so:
data = np.random.rand(10, 20)
sorted_data = data[indexer, :] 

Select

select is used for slicing/sub-selecting.

import numpy as np

from dascore.core import get_coord

coord = get_coord(start=0, stop=21, step=1)

new_coord, indexer = coord.select((3, 14))

data = np.random.rand(10, 20)
selected_data = data[:, indexer] 

Most coordinate methods also support units.

import numpy as np

from dascore.core import get_coord
from dascore.units import ft

coord = get_coord(start=0, stop=21, step=1, units='m')

new_coord, indexer = coord.select((14*ft, 50 * ft))
print(new_coord) 
CoordRange( min: 5.0 max: 15.0 step: 1 shape: (11,) dtype: float64 units: m )

Units

convert_units and set_units are used to change/set the units associated with a coordinate.

import numpy as np

from dascore.core import get_coord
from dascore.units import ft

coord = get_coord(start=0, stop=21, step=1, units='m')

# Convert coords to ft.
coord_ft_converted = coord.convert_units("ft")

# Simply change unit label (values remain the same).
coord_ft_set = coord.set_units("ft")

# Create coord with silly units.
silly_units = "10*PI*m/ft * furlongs * fortnight"
coord_silly_units = get_coord(start=10, stop=21, step=1, units=silly_units)

# Simplify the coordinates and modify coordinte values accordiningly.
simple_coord = coord_silly_units.simplify_units()
print(f"Simplified units are: {simple_coord.units}")
print(f"New coord lims are: {simple_coord.limits}") 
Simplified units are: 1 m * s
New coord lims are: (250805152879.9319, 501610305759.8638)

Get Next Index

get_next_index returns the index value (an integer) for where a value would be inserted into the coordinate. It can only be used on a sorted coordinate.

from dascore.core import get_coord
coord = get_coord(start=0, stop=10, step=1)
# Find the index for a value contained by the coordinate.
assert coord.get_next_index(1) == 1
# The next (not closest) index is return for value not in coord.
assert coord.get_next_index(2.000001) == 3

CoordManager

The CoordManager handles a group of coordinates and provides methods for updating managed data arrays.

Coordinate Manager Creation

CoordManager instances can be created from a dictionary of coordinates via the get_coord_manager function.

from dascore.core import get_coord, get_coord_manager

coord_dict = {
    "dim1": get_coord(start=1, stop=10, step=1),
    "dim2": get_coord(start=0.001, stop=1, step=.1),
}

cm = get_coord_manager(coords=coord_dict, dims=("dim1", "dim2"))

# dims are the dimension names (in order).
print(f"dimensions are {cm.dims}")

# coord_map is a mapping of {coord_name: coordinate}.
print(dict(cm.coord_map))

# dim_map is a mapping of {coord_name: (associated_dimensions...)}.
print(dict(cm.dim_map))
dimensions are ('dim1', 'dim2')
{'dim1': CoordRange(units=None, step=1, start=1, stop=10), 'dim2': CoordRange(units=None, step=0.1, start=0.001, stop=1.001)}
{'dim1': ('dim1',), 'dim2': ('dim2',)}

CoordManagers can have non-dimensional coordinates which may or may not be associated with a coordinate dimension.

from dascore.core import get_coord, get_coord_manager

coord_dict = {
    "dim1": get_coord(start=0, stop=10, step=1),
    "dim2": get_coord(start=0.001, stop=1, step=.1),
    # "dim_coord" is a non-dimensional coordinate asscoiated with
    # dim1, so it must have the same shape. Notice how the associated
    # dimension and coordiante values can be specified in a tuple. 
    "dim_coord": ("dim1", get_coord(start=10, stop=20, step=1)),
    # Non-dimensional coordinates are not associated with a dimension
    # and must use None as the first argument in the tuple.
    "non_dim_coord": (None, get_coord(start=1, stop=100, step=1)),
}

cm_many_coords = get_coord_manager(coords=coord_dict, dims=("dim1", "dim2"))
print(cm)
➤ Coordinates (dim1: 9, dim2: 10)
    *dim1: CoordRange( min: 1 max: 9 step: 1 shape: (9,) dtype: int64 )
    *dim2: CoordRange( min: 0.001 max: 0.901 step: 0.1 shape: (10,) dtype: float64 )

Update

update uses an existing CoordinateManager as a template and updates some aspect in the returned coordinate.

import dascore as dc

# Get coordinate manager from default patch.
patch = dc.get_example_patch()
cm = patch.coords

# Add 10 to each distance values create new coord.
dist_coord = cm.get_coord("distance") 
new_dist_array = dist_coord.data + 10
new_dist_coord = dist_coord.update(data=new_dist_array)

# Create new coordinate manager with new distance coord.
new_cm = cm.update(distance=new_dist_coord)

It can also be used to add new coordinates,

distance_length = len(cm.get_coord("distance"))

# Create a new coordinate. 
new_coord = get_coord(data=np.random.rand(distance_length))

# Add it to the coord manager associated with distance dimension.
new_cm_1 = cm.update(new_coord=("distance", new_coord))

# Add the coordinate but dont associate it with any dimension.
new_cm_2 = cm.update(new_coord=(None, new_coord))

and drop or disassociate coordinates.

# Disassociate "new_coord" from dimension distance.
new_cm_3 = new_cm_1.update(new_coord=(None, new_coord))

# Drop coordiante "new_coord".
new_cm_4 = cm.update(new_coord=None)

# Drop dimension "time".
new_cm_5 = cm.update(time=None)
assert "time" not in new_cm_5.dims

Coordinate Manager Methods

Much like BaseCoord, the CoordinateManager class implements a variety of methods for filtering, sorting, modifying units, etc. However, there are some difference. Unlike coordinates, when an operation would change the data array associated with the coordinates, the CoordManager method accepts the array as an argument and returns a new array. Like the Patch methods, CoordManager methods use keyword arguments to specify coordinates by name.

Select

select trims the coordinate manager and, optionally, an associated array.

import dascore as dc

patch = dc.get_example_patch()
cm, data = patch.coords, patch.data

new_cm, new_data = cm.select(data=data, distance=(..., 100))

Sort

sort sorts along one or more axes.

import dascore as dc

patch = dc.get_example_patch()
cm, data = patch.coords, patch.data

# Sort along both dimensions in descending oder.
new_cm, new_data = cm.sort("time", "distance", reverse=True)

Rename Coord

rename_coord renames a coordinate or dimension.

import dascore as dc

patch = dc.get_example_patch()
cm = patch.coords

# Rename time to money.
renamed_cm = cm.rename_coord(time="money")
print(renamed_cm)
➤ Coordinates (distance: 300, money: 2000)
    *distance: CoordRange( min: 0 max: 299 step: 1 shape: (300,) dtype: int64 units: m )
    *money: CoordRange( min: 2017-09-18 max: 2017-09-18T00:00:07.996 step: 0.004s shape: (2000,) dtype: datetime64[ns] units: s )