In this case study we will be using movement to quantify the collective behaviour of a herd of zebras in escape response. The data we will use is part of a larger dataset collected in Mpala (Kenya), in which researchers simulated a predation event to study the group response of the animals. We will demonstrate how we can compute useful metrics for this analysis using movement.
Using the right environment
If you are following along this chapter on your own computer, make sure to run all code snippets with the animals-in-motion-env environment activated (see prerequisites A.3.3).
6.1 Dataset description
The 3.5-min dataset presented here consists of 44 trajectories of zebras (Equus quagga) expressed in a coordinate system fixed to the ground. Each individual has two keypoints (head and tail). The data was obtained as follows: first, a trained SLEAP model was run on a video clip recorded from a camera drone, to obtain the trajectories of the zebras in a coordinate system linked to the drone. Then, the trajectories were expressed in a coordinate system fixed to the ground using Structure-from-motion (with OpenSFM and OpenDroneMap). This transformation between camera and world coordinate systems is necessary to be able to disentangle the motion of the zebras from the movement of the camera drone. After this coordinate transformation, the data was cleaned by removing low-confidence keypoints and implausible data points.
You can find a detailed description of the approach as a collection of notebooks in this repository. Further details about the dataset and the prototype pipeline can be found in Duporge, Miñano et al(2025).
Acknowledgement
The sample video and original SLEAP trajectories were kindly shared by Dr. Isla Duporge from the Rubenstein Lab at Princeton University, with permission to use for this workshop. The trajectories in world coordinate system were computed by Sofía Miñano, Niko Sirmpilatze and Igor Tatarnikov.
6.2 Load and explore the dataset
First, let’s load and explore the dataset
import matplotlib.pyplot as pltimport numpy as npimport xarray as xrfrom movement import sample_datafrom movement.kinematics import compute_pairwise_distances, compute_speedfrom movement.transforms import scalefrom movement.utils.vector import compute_norm, convert_to_unitfrom movement.utils.reports import report_nan_values
Downloading data from 'https://gin.g-node.org/neuroinformatics/movement-test-data/raw/master/metadata.yaml' to file '/home/runner/.movement/data/temp_metadata.yaml'.
SHA256 hash of downloaded file: ab9c9c95a3f8366583e2cdb9c84610c62bd0bb0c6cbbcbbdc6a3390b6108623f
Use this value as the 'known_hash' argument of 'pooch.retrieve' to ensure that the file hasn't changed if it is downloaded again in the future.
We can see the poses datasetds is made up of two data arrays, position and confidence. In this example, we will use the position data array only, which spans four dimensions: time, space, keypoints and individuals. We can verify there are 44 individuals in this dataset (track_0 to track_43) and two keypoints per individual, labelled H (head) and T (tailbase). The data was collected at 29.97 frames per second, and the dataloader used this information to automatically express the time dimension in seconds.
Position units
The position data in ds is expressed in arbitrary units. This is because no GPS data was available for georeferencing or defining ground control points in the structure-from-motion (SfM) analysis. As a result, the scale factor remains a free parameter in the reconstruction of the world coordinates.
Note however that this will not be a problem for our analysis, since the relative positions between the individuals are still correct. Moreover, we will use the median zebra body length to scale the data to more informative units. For more details on the coordinate systems involved in SfM analysis see the OpenSfM documentation.
6.3 Compute the body length per individual
We define the body vector for each individual as the vector going from the T keypoint (tail) to the H keypoint (head).
We can compute the body length of each individual by computing the norm of the body vector.
# Compute body length per individualbody_length = compute_norm(body_vector)
It would be useful to check if there are missing values in the body length array. We can quickly inspect this using movement’s report_nan_values function.
The output shows that the number of missing values per individual varies between 0.27% and 19.92%. This is not necessarily a problem for our analysis, but it is something to keep in mind when interpreting the results. These missing points are likely due to imperfect tracking of one or both of the keypoints required to compute the body vector.
Let’s compute some basic statistics to get a sense of the distribution of the body length values.
Body length mean: 20.45 a.u.
Body length median: 20.43 a.u.
Body length std: 2.32 a.u.
We can also plot the distribution of body lengths.
Code
fig, ax = plt.subplots()# plot histogram of body length valuescounts, bins, _ = body_length.plot.hist(bins=100)# add reference lines for mean and mean +- 2 stdsax.vlines( body_length_mean, ymin=0, ymax=np.max(counts), color="red", linestyle="-", label="mean body length",)lower_bound = body_length_mean -2* body_length_stdupper_bound = body_length_mean +2* body_length_stdfor bound in [lower_bound, upper_bound]: ax.vlines( bound, ymin=0, ymax=np.max(counts), color="red", linestyle="--", label="mean +- 2 std", )ax.set_ylim(0, np.max(counts))ax.set_xlabel("body length (a.u.)")ax.set_ylabel("counts")ax.legend()
Figure 6.1: Distribution of zebra body lengths.
We can see there is some variability in the body lengths per individual. Part of it may reflect the diversity across individuals, but from visual inspection of the video we expect the majority of it to be due to imperfect tracking of the keypoints. To remove some of these outliers, we continue the analysis considering only body vectors that are within 2 standard deviations of the mean.
We would now like to inspect the orientation of each individual in relation to the group while the simulated escape events take place.
For this, we first compute each animal’s unit body vector. These are a scaled version of the body vectors we just computed, normalised to have unit length. movement provides a convenience function to do this:
We now define the herd vector as the mean of the unit body vectors across all individuals. The mean vector of a set of \(n\) vectors is the sum of all the vectors (i.e., the resultant vector) scaled by \(1/n\).
The resulting array has (time, space) dimensions, which means that we have a single herd vector defined at each timestep.
The norm of the herd vector already gives us an intuition of how aligned the whole herd is. When its norm is close to 1, it means that the majority of the unit body vectors are aligned. When its norm is close to 0, it means that the unit body vectors are dispersed. The norm of the herd vector is sometimes called polarization.
polarization = compute_norm(herd_vector)
We can plot the evolution of the polarization over time to get a sense of how the herd’s alignment changes.
Figure 6.2: Evolution of the herd’s polarization over time.
The plot suggests that the herd alternates between periods of higher and lower polarization.
6.5 Compute average speed of the herd
We would also like to inspect how the speed of the herd changes over the course of the simulated escape events.
First, let’s scale the position data to express it in units of body lengths (BL). This will make the results more interpretable. We can use movement’s scale function to do this.
For simplicity, we would also like to reduce the position of each individual to a single point. A good candidate for this is the centroid, which is the mean of all the keypoints per individual. In our case, the centroid will be the midpoint between the head and tail keypoints.
centroid = position_scaled.mean("keypoints")
Exercise A
Use the centroid data array to:
Compute centroid_speed, the speed of each individual’s centroid.
Compute herd_speed, the average speed across all individuals.
Plot the evolution of herd_speed over time.
Click to reveal the answers
# Compute speed of each zebra's centroidcentroid_speed = compute_speed(centroid)# Compute the average speed across all individualsherd_speed = centroid_speed.mean("individuals")# Plot the evolution of the herd speed over timefig, ax = plt.subplots()ax.plot(herd_speed.time, herd_speed)ax.set_ylabel("herd speed (BL/s)")ax.set_xlabel("time (s)")ax.grid()
We can see that there are four periods in the dataset in which the speed of the herd surpasses 2 BL/s for about 10 to 20 seconds.
We can also inspect how the speed of each individual changes over time.
Code
fig, ax = plt.subplots()im = ax.matshow( centroid_speed, aspect="auto", cmap="viridis",)# convert frames to seconds in y-axistime_ticks_step =1498time_ticks = np.arange(0, len(centroid_speed.time), time_ticks_step) time_labels = [f"{t:.0f}"for t in centroid_speed.time.values[0:-1:time_ticks_step]]ax.set_yticks(time_ticks)ax.set_yticklabels(time_labels)ax.tick_params(axis='x', bottom=True, top=False, labelbottom=True, labeltop=False)ax.set_xlabel("individual")ax.set_ylabel("time (s)") # add colorbarcbar = plt.colorbar(im)cbar.set_label("speed (BL/s)")ax.get_images()[0].set_clim(0, 6) # cap values at 6 BL/s
Figure 6.3: Evolution of the speed of each individual over time.
The plot suggests that the individuals are quite coordinated also in their change of speed. We can clearly see the four periods of higher speed, which correspond to the four simulated escape events.
6.6 Polarization vs speed
Let’s put it all together and inspect how polarization changes with the speed of the herd.
We will use the logarithm of the speed to focus on the higher range of speeds.
log10_herd_speed = np.log10(herd_speed)
We can now plot the polarization in time, colouring the points by the logarithm of the speed.
fig, ax = plt.subplots()sc = ax.scatter( x=polarization.time, y=polarization, c=log10_herd_speed, s=5, cmap="turbo",# rescale color map to 1st and 99th percentiles vmin=log10_herd_speed.quantile(0.01).item(), vmax=log10_herd_speed.quantile(0.99).item(),)ax.set_xlabel("time (s)")ax.set_ylabel("polarization")cbar = plt.colorbar(sc)cbar.set_label("log10 herd speed (BL/s)")
Figure 6.4: Periods of highest polarization are associated with higher speeds
The plot shows that for this dataset, the periods of highest polarization are associated with higher speeds. This is consistent with the interpretation that the zebras become more aligned when escaping at speed, and more dispersed when they are at rest.
Duporge, Isla, Sofia Minano, Nikoloz Sirmpilatze, Igor Tatarnikov, Scott Wolf, Adam L. Tyson, and Daniel Rubenstein. 2025. “Tracking the Flight: Exploring a ComputationalFramework for AnalyzingEscapeResponses in PlainsZebra (Equus Quagga).” arXiv. https://doi.org/10.48550/arXiv.2505.16882.