import numpy as np
from dascore.core import get_coord
# Get monotonic, evenly sampled coords from start, stop, step.
range_coord = get_coord(start=0, stop=10, step=1)
# Do the same as above but using an evenly sampled sorted array.
array = np.arange(0, 10, step=1)
array_coord = get_coord(data=array)
# Assert the type and values of the resulting coordinates are the same.
assert range_coord == array_coord
# The arrays don't have to be evenly sampled or sorted.
sorted_array = np.sort(np.random.rand(10))
sorted_coord = get_coord(data=sorted_array)
random_array = np.random.rand(10)
random_coord = get_coord(data=random_array)
Coordinates and Coordinate Managers
This page covers advanced DASCore features. Most users will be fine with only the coordinate material presented in the patch tutorial.
To manage coordinate labels and array manipulations, DASCore implements two classes: BaseCoord, which has several subclasses corresponding to different types of coordinates, and CoordManager, which manages a group of coordinates. Much like the Patch, instances of both classes are immutable (to the extent possible), so they cannot be modified in place but have methods which return new instances.
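The "return a new instance instead of modifying in place" pattern described above can be sketched with a frozen dataclass. This is only an illustration: `ToyCoord` is a hypothetical stand-in and is not how DASCore implements its coordinate classes.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ToyCoord:
    """Hypothetical immutable coordinate (not a DASCore class)."""

    start: int
    stop: int
    step: int

    def update(self, **kwargs):
        """Return a new instance with the given fields replaced."""
        return replace(self, **kwargs)


coord = ToyCoord(start=0, stop=10, step=1)
new_coord = coord.update(step=2)

# The original is untouched; the update produced a separate object.
assert coord.step == 1
assert new_coord.step == 2

# Attempting an in-place modification raises an error.
try:
    coord.step = 5
    modified_in_place = True
except FrozenInstanceError:
    modified_in_place = False
```

The benefit of this design is that any object holding a reference to `coord` can rely on it never changing underneath it.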
Coordinates
Coordinates usually keep track of labels along an associated dimension of an array, but they can also be independent of array data. They provide methods for slicing, re-ordering, filtering, etc., and are used internally by DASCore for such operations.
Much like DASCore’s Spool, Coordinates are a collection of classes which implement a common interface.
Coordinates are very similar (in concept) to Pandas’ indices, with some significant differences in implementation.
Coordinate Creation
Get Coord
get_coord returns an instance of a subclass of BaseCoord appropriate for the input values. Here are a few examples:
Update
Update uses the existing coordinate as a template and returns a coordinate with some part modified.
import numpy as np
import dascore as dc
from dascore.core import get_coord
# Create example coordinate.
start, stop = np.datetime64("2023-01-01"), np.datetime64("2023-01-01T01")
step = np.timedelta64(60, 's')
coord = get_coord(start=start, stop=stop, step=step)
# Update entire array, adding 10s to each element.
new_data = coord.values + np.timedelta64(10, "s")
coord_new_data = coord.update(data=new_data)
# Update step, keeping length and start but changing the end time.
coord_new_step = coord.update(step=np.timedelta64(1800, 's'))
# Change maximum value, keeping length the same and changing step.
coord_new_max = coord.update(max=stop + 10 * step)
Coordinate Attributes
The following table shows some of the commonly used coordinate attributes:
Attribute | Description |
---|---|
sorted | True if the coordinate is sorted in ascending order. |
reverse_sorted | True if the coordinate is sorted in descending order. |
evenly_sampled | True if the coordinate has uniform step sizes. |
dtype | The numpy data type of the coordinate. |
data | Return an array of coordinate values. |
units | Coordinate units. |
degenerate | True if the coordinate has a zero length dimension. |
min() | Return the minimum value in the coordinate. |
max() | Return the maximum value in the coordinate. |
Coordinate Methods
This section highlights some of the coordinate methods. Methods which would change a data array return a new coordinate and an object that can be used for indexing an array. This can either be a slice instance or another array which uses numpy’s advanced indexing features for sorting or selection.
Sort
sort sorts the values of the coordinate.
import numpy as np
from dascore.core import get_coord
random_array = np.random.rand(10)
random_coord = get_coord(data=random_array)
# Returns a new coordinate and an indexer that can be used to apply the
# sorting operation to a dimension of an array.
sorted_coord, indexer = random_coord.sort()
# The array could then be updated as follows.
data = np.random.rand(10, 20)
sorted_data = data[indexer, :]
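To make the role of the indexer concrete, here is a minimal pure-Python sketch of the same idea (the names and the toy data are illustrative assumptions, not DASCore internals). The indexer records, for each sorted position, the position the value came from, so the same reordering can be applied to the matching axis of a data array. NumPy's `argsort` plays this role for arrays.

```python
values = [0.7, 0.1, 0.9, 0.4]

# For each position in the sorted output, store the original position.
indexer = sorted(range(len(values)), key=values.__getitem__)
sorted_values = [values[i] for i in indexer]

# A toy 4 x 2 "data array" whose first axis matches the coordinate.
data = [[7, 70], [1, 10], [9, 90], [4, 40]]

# Applying the indexer along the first axis keeps rows aligned with labels.
sorted_data = [data[i] for i in indexer]
```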
Snap
snap calculates the average spacing between samples and “snaps” all values to that spacing. If the coordinate is not sorted, it will be sorted in the process. Use this method with care, since it causes some loss of precision and can introduce inaccuracies in downstream calculations. The min and max of the coordinate remain unchanged.
import numpy as np
from dascore.core import get_coord
random_array = np.random.rand(10)
random_coord = get_coord(data=random_array)
sorted_coord, indexer = random_coord.sort()
# The data array can be updated like so:
data = np.random.rand(10, 20)
sorted_data = data[indexer, :]
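The snapping idea described above (sort, compute the average spacing, and place every sample on that even grid while keeping the min and max) can be sketched in plain Python. This is an assumption about the arithmetic based only on that description; `snap_values` is a hypothetical helper, not DASCore's implementation.

```python
def snap_values(values):
    """Return evenly spaced values spanning the same min and max."""
    ordered = sorted(values)
    count = len(ordered)
    if count < 2:
        return list(ordered)
    low, high = ordered[0], ordered[-1]
    # Average spacing between consecutive samples.
    step = (high - low) / (count - 1)
    snapped = [low + i * step for i in range(count - 1)]
    snapped.append(high)  # Pin the last sample to avoid float drift.
    return snapped


uneven = [0.0, 4.0, 0.9, 2.2, 3.1]
snapped = snap_values(uneven)
```

Note how the interior samples move (0.9 becomes 1.0, 2.2 becomes 2.0, and so on); this is the loss of precision mentioned above.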
Select
select is used for slicing/sub-selecting.
import numpy as np
from dascore.core import get_coord
coord = get_coord(start=0, stop=21, step=1)
new_coord, indexer = coord.select((3, 14))
data = np.random.rand(10, 20)
selected_data = data[:, indexer]
Most coordinate methods also support units.
import numpy as np
from dascore.core import get_coord
from dascore.units import ft
coord = get_coord(start=0, stop=21, step=1, units='m')
new_coord, indexer = coord.select((14*ft, 50 * ft))
print(new_coord)
CoordRange( min: 5 max: 15 step: 1 shape: (11,) dtype: int64 units: m )
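The output above can be reproduced by hand, which may help clarify what a unit-aware selection does: the limits are converted into the coordinate's units and then mapped to index positions. The sketch below assumes inclusive limits and an evenly sampled coordinate; `select_range` is a hypothetical helper, not part of DASCore.

```python
import math

FT_TO_M = 0.3048  # Exact definition of the international foot.


def select_range(start, stop, step, low, high):
    """Return a slice covering the values in [low, high].

    The coordinate holds start, start + step, ... up to (excluding) stop.
    """
    length = math.ceil((stop - start) / step)
    first = max(math.ceil((low - start) / step), 0)
    last = min(math.floor((high - start) / step), length - 1)
    return slice(first, last + 1)


# Coordinate in meters: 0, 1, ..., 20.
values = list(range(0, 21, 1))

# Plain selection between 3 and 14.
plain = select_range(0, 21, 1, 3, 14)

# Limits given in feet are converted to meters first:
# 14 ft is about 4.27 m and 50 ft is 15.24 m.
in_feet = select_range(0, 21, 1, 14 * FT_TO_M, 50 * FT_TO_M)
selected = values[in_feet]
```

Here `selected` runs from 5 to 15 and has 11 elements, matching the min, max, and shape printed above.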
Units
convert_units and set_units are used to change/set the units associated with a coordinate.
import numpy as np
from dascore.core import get_coord
from dascore.units import ft
coord = get_coord(start=0, stop=21, step=1, units='m')
# Convert coords to ft.
coord_ft_converted = coord.convert_units("ft")
# Simply change unit label (values remain the same).
coord_ft_set = coord.set_units("ft")
# Create coord with silly units.
silly_units = "10*PI*m/ft * furlongs * fortnight"
coord_silly_units = get_coord(start=10, stop=21, step=1, units=silly_units)
# Simplify the units and modify coordinate values accordingly.
simple_coord = coord_silly_units.simplify_units()
print(f"Simplified units are: {simple_coord.units}")
print(f"New coord lims are: {simple_coord.limits}")
Simplified units are: 1 m * s
New coord lims are: (250805152879.9319, 501610305759.8638)
Get Next Index
get_next_index returns the index value (an integer) for where a value would be inserted into the coordinate. It can only be used on a sorted coordinate.
from dascore.core import get_coord
coord = get_coord(start=0, stop=10, step=1)
# Find the index for a value contained by the coordinate.
assert coord.get_next_index(1) == 1
# The next (not closest) index is returned for a value not in coord.
assert coord.get_next_index(2.000001) == 3
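For the two cases shown, the "insertion index" behaves like a left bisection on the sorted values, which the standard library's `bisect_left` illustrates. This is an analogy rather than a statement about how DASCore computes the index, and it says nothing about values outside the coordinate's range.

```python
from bisect import bisect_left

values = list(range(0, 10))  # 0, 1, ..., 9

# A value contained in the sequence maps to its own index.
index_contained = bisect_left(values, 1)

# A value between samples maps to the next index, not the closest one.
index_between = bisect_left(values, 2.000001)
```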
CoordManager
The CoordManager handles a group of coordinates and provides methods for updating managed data arrays.
Coordinate Manager Creation
CoordManager instances can be created from a dictionary of coordinates via the get_coord_manager function.
from dascore.core import get_coord, get_coord_manager
coord_dict = {
    "dim1": get_coord(start=1, stop=10, step=1),
    "dim2": get_coord(start=0.001, stop=1, step=.1),
}
cm = get_coord_manager(coords=coord_dict, dims=("dim1", "dim2"))
# dims are the dimension names (in order).
print(f"dimensions are {cm.dims}")
# coord_map is a mapping of {coord_name: coordinate}.
print(dict(cm.coord_map))
# dim_map is a mapping of {coord_name: (associated_dimensions...)}.
print(dict(cm.dim_map))
dimensions are ('dim1', 'dim2')
{'dim1': CoordRange( min: 1 max: 9 step: 1 shape: (9,) dtype: int64 ), 'dim2': CoordRange( min: 0.001 max: 0.901 step: 0.1 shape: (10,) dtype: float64 )}
{'dim1': ('dim1',), 'dim2': ('dim2',)}
CoordManagers can have non-dimensional coordinates which may or may not be associated with a coordinate dimension.
from dascore.core import get_coord, get_coord_manager
coord_dict = {
    "dim1": get_coord(start=0, stop=10, step=1),
    "dim2": get_coord(start=0.001, stop=1, step=.1),
    # "dim_coord" is a non-dimensional coordinate associated with
    # dim1, so it must have the same shape. Notice how the associated
    # dimension and coordinate values can be specified in a tuple.
    "dim_coord": ("dim1", get_coord(start=10, stop=20, step=1)),
    # Non-dimensional coordinates are not associated with a dimension
    # and must use None as the first argument in the tuple.
    "non_dim_coord": (None, get_coord(start=1, stop=100, step=1)),
}
cm_many_coords = get_coord_manager(coords=coord_dict, dims=("dim1", "dim2"))
print(cm)
➤ Coordinates (dim1: 9, dim2: 10)
*dim1: CoordRange( min: 1 max: 9 step: 1 shape: (9,) dtype: int64 )
*dim2: CoordRange( min: 0.001 max: 0.901 step: 0.1 shape: (10,) dtype: float64 )
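The shape rule for associated coordinates can be sketched as a small consistency check. The dictionaries below are a simplified, hypothetical bookkeeping scheme loosely modelled on the dims and dim_map shown earlier; in particular, representing an unassociated coordinate with an empty tuple is an assumption made for this example.

```python
def find_mismatches(dim_map, coord_lengths):
    """Return names of coords whose length differs from their dimension's."""
    bad = []
    for name, associated_dims in dim_map.items():
        for dim in associated_dims:
            if coord_lengths[name] != coord_lengths[dim]:
                bad.append(name)
    return bad


# Lengths corresponding to the coordinates created above.
coord_lengths = {"dim1": 10, "dim2": 10, "dim_coord": 10, "non_dim_coord": 99}
dim_map = {
    "dim1": ("dim1",),
    "dim2": ("dim2",),
    "dim_coord": ("dim1",),  # Tied to dim1, so lengths must match.
    "non_dim_coord": (),  # Not tied to any dimension; any length is fine.
}

mismatches = find_mismatches(dim_map, coord_lengths)

# A coordinate tied to dim1 but with the wrong length is flagged.
bad_lengths = dict(coord_lengths, dim_coord=5)
flagged = find_mismatches(dim_map, bad_lengths)
```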
Update
update uses an existing CoordManager as a template and updates some aspect in the returned coordinate manager.
import dascore as dc
# Get coordinate manager from default patch.
patch = dc.get_example_patch()
cm = patch.coords
# Add 10 to each distance value to create a new coord.
dist_coord = cm.get_coord("distance")
new_dist_array = dist_coord.data + 10
new_dist_coord = dist_coord.update(data=new_dist_array)
# Create new coordinate manager with new distance coord.
new_cm = cm.update(distance=new_dist_coord)
It can also be used to add new coordinates,
distance_length = len(cm.get_coord("distance"))
# Create a new coordinate.
new_coord = get_coord(data=np.random.rand(distance_length))
# Add it to the coord manager associated with distance dimension.
new_cm_1 = cm.update(new_coord=("distance", new_coord))
# Add the coordinate but don't associate it with any dimension.
new_cm_2 = cm.update(new_coord=(None, new_coord))
and drop or disassociate coordinates.
# Disassociate "new_coord" from dimension distance.
new_cm_3 = new_cm_1.update(new_coord=(None, new_coord))
# Drop coordinate "new_coord".
new_cm_4 = new_cm_1.update(new_coord=None)
# Drop dimension "time".
new_cm_5 = cm.update(time=None)
assert "time" not in new_cm_5.dims
Coordinate Manager Methods
Much like BaseCoord, the CoordManager class implements a variety of methods for filtering, sorting, modifying units, etc. However, there are some differences. Unlike coordinates, when an operation would change the data array associated with the coordinates, the CoordManager method accepts the array as an argument and returns a new array. Like the Patch methods, CoordManager methods use keyword arguments to specify coordinates by name.
Select
select trims the coordinate manager and, optionally, an associated array.
import dascore as dc
patch = dc.get_example_patch()
cm, data = patch.coords, patch.data
new_cm, new_data = cm.select(data=data, distance=(..., 100))
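The pattern of passing an array in and getting a trimmed array back can be sketched with nested lists. Everything here is a simplified assumption for illustration (a 2D list of rows, inclusive limits, a hypothetical `select` function); it is not DASCore's implementation.

```python
def select(dims, coords, data, **kwargs):
    """Trim coords and 2D data (a list of rows) to inclusive value ranges.

    Use ... (Ellipsis) for an open-ended limit.
    """
    new_coords = dict(coords)
    for name, (low, high) in kwargs.items():
        values = new_coords[name]
        # Positions along this coordinate that fall inside the limits.
        keep = [
            i
            for i, value in enumerate(values)
            if (low is ... or value >= low) and (high is ... or value <= high)
        ]
        new_coords[name] = [values[i] for i in keep]
        # Apply the same trimming along the matching axis of the data.
        if dims.index(name) == 0:
            data = [data[i] for i in keep]
        else:
            data = [[row[i] for i in keep] for row in data]
    return new_coords, data


dims = ("distance", "time")
coords = {"distance": [0, 50, 100, 150], "time": [0.0, 0.1, 0.2]}
data = [[d + t for t in coords["time"]] for d in coords["distance"]]

new_coords, new_data = select(dims, coords, data, distance=(..., 100))
```

The coordinate is named by keyword, so the caller does not need to know which axis of the array it corresponds to; the inputs are left unmodified.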
Sort
sort sorts along one or more axes.
import dascore as dc
patch = dc.get_example_patch()
cm, data = patch.coords, patch.data
# Sort along both dimensions in descending order.
new_cm, new_data = cm.sort("time", "distance", reverse=True)
Rename Coord
rename_coord renames a coordinate or dimension.
import dascore as dc
patch = dc.get_example_patch()
cm = patch.coords
# Rename time to money.
renamed_cm = cm.rename_coord(time="money")
print(renamed_cm)
➤ Coordinates (distance: 300, money: 2000)
*distance: CoordRange( min: 0 max: 299 step: 1 shape: (300,) dtype: int64 units: m )
*money: CoordRange( min: 2017-09-18 max: 2017-09-18T00:00:07.996 step: 0.004s shape: (2000,) dtype: datetime64[ns] units: s )