In this case study, we’ll be using the movement package to dive into mouse home cage monitoring data acquired in Smart-Kages and tracked with DeepLabCut. We’ll explore how mouse activity levels fluctuate throughout the day.
Before you get started, make sure you’ve set up the animals-in-motion-env environment (refer to prerequisites A.3.3) and are using it to run this notebook. You’ll also need to download the Smart-Kages.zip archive from Dropbox (see prerequisites A.4) and unzip it.
5.1 Import libraries
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import xarray as xr

from movement import sample_data
from movement.filtering import filter_by_confidence
from movement.kinematics import compute_speed
from movement.plots import plot_occupancy
5.2 The Smart-Kages dataset
Acknowledgement
This dataset was kindly shared by Loukia Katsouri from the O’Keefe Lab, with permission to use for this workshop.
The Smart-Kages dataset comprises home cage recordings from two mice, each housed in a specialised Smart-Kage(Ho et al. 2023)—a home cage monitoring system equipped with a camera mounted atop the cage.
The camera captures data around the clock at a rate of 2 frames per second, saving a video segment for each hour of the day. A pre-trained DeepLabCut model is subsequently employed to predict 8 keypoints on the mouse’s body.
Let’s examine the contents of the downloaded data. You will need to specify the path to the unzipped Smart-Kages folder on your machine.
# Replace with the path to the unzipped Smart-Kages folder on your machine
smart_kages_path = Path.home() / ".movement" / "Smart-Kages"

# Let's visualise the contents of the folder
files = [f.name for f in smart_kages_path.iterdir()]
files.sort()
for file in files:
    print(file)
The tracking data are stored in two .nc (NetCDF) files: kage14.nc and kage17.nc. NetCDF is an HDF5-based file format that can be natively saved/loaded by the xarray library, which makes it convenient to use with movement.
Apart from these, we also have two .png files: kage14_background.png and kage17_background.png, which constitute frames extracted from the videos.
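To get a feel for the environment, we can display these background images. Below is a minimal sketch (the file names come from the folder listing above); it also defines img_paths, which we'll reuse later for the occupancy heatmaps:

img_paths = [
    smart_kages_path / f"{kage}_background.png" for kage in ["kage14", "kage17"]
]

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
for ax, img_path in zip(axes, img_paths):
    ax.imshow(plt.imread(img_path))
    ax.set_title(img_path.stem)
    ax.axis("off")
plt.show()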
Figure 5.1: Top-down camera views of the Smart-Kage habitats
Questions
What objects do you see in the habitat?
What challenges do you anticipate with tracking a mouse in this environment?
What are the trade-offs one has to consider when designing a continuous monitoring system?
Let’s load and inspect the tracking data:
ds_kages = {}  # a dictionary to store kage name -> xarray dataset
for kage in ["kage14", "kage17"]:
    ds_kages[kage] = xr.open_dataset(smart_kages_path / f"{kage}.nc")
ds_kages["kage14"]  # Change to "kage17" to inspect the other dataset
We see that each dataset contains a huge amount of data: at 2 frames per second, over a month of continuous recording adds up to millions of time points (roughly 5.2 million for kage14 and 6.1 million for kage17).
start_date_k14 = pd.to_datetime(ds_kages["kage14"].time.data[0])
end_date_k14 = pd.to_datetime(ds_kages["kage14"].time.data[-1])
duration_k14 = end_date_k14 - start_date_k14

start_date_k17 = pd.to_datetime(ds_kages["kage17"].time.data[0])
end_date_k17 = pd.to_datetime(ds_kages["kage17"].time.data[-1])
duration_k17 = end_date_k17 - start_date_k17

print("Experiment durations:")
print(f"kage-14: from {start_date_k14} to {end_date_k14} ({duration_k14.days} days)")
print(f"kage-17: from {start_date_k17} to {end_date_k17} ({duration_k17.days} days)")
Experiment durations:
kage-14: from 2024-04-08 13:55:40 to 2024-05-10 07:59:59.501205 (31 days)
kage-17: from 2024-04-03 00:00:06 to 2024-05-10 07:59:59.509103 (37 days)
5.3 Datetime Coordinates
You might notice something interesting about the time coordinates in these xarray datasets: they’re given in datetime64[ns] format, which means they’re precise timestamps expressed in “calendar time”.
This is different from what we’ve seen before in other movement datasets, where time coordinates are expressed as seconds elapsed since the start of the video, or “elapsed time”.
How did we get these timestamps?
Many recording systems can output timestamps for each video frame. In our case, the raw data from the Smart-Kage system included the start datetime of each 1-hour-long video segment and the precise time difference between the start of each segment and every frame within it.
Using this information, we were able to reconstruct precise datetime coordinates for all frames throughout the entire experiment. We then concatenated the DeepLabCut predictions from all video segments and assigned the datetime coordinates to the resulting dataset. If you’re interested in the details, you can find the code in the smart-kages-movement GitHub repository.
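As a toy illustration of the arithmetic (the values below are made up for demonstration, not taken from the dataset): the absolute timestamp of each frame is simply the segment's start datetime plus that frame's offset within the segment.

# Toy example with made-up values: one 1-hour segment recorded at 2 fps
segment_start = np.datetime64("2024-04-08 13:00:00")
n_frames = 2 * 60 * 60  # 2 frames/sec for 3600 s = 7200 frames
frame_offsets = pd.to_timedelta(np.arange(n_frames) / 2, unit="s")
frame_datetimes = segment_start + frame_offsets  # datetime64 timestamps
print(frame_datetimes[:3])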
Using “calendar time” is convenient for many applications. For example, we could cross-reference the tracking results against other data sources, such as body weight measurements. It also allows us to easily select time windows by datetime:
ds_14 = ds_kages["kage14"]

# Select a specific month
ds_14.sel(time=slice("2024-04-01", "2024-04-30"))

# Select a specific day
ds_14.sel(time=slice("2024-04-17 00:00:00", "2024-04-17 23:59:59"))

# Select a half-hour window
ds_14.sel(time=slice("2024-04-17 09:30:00", "2024-04-17 10:00:00"))
That said, it’s still useful to also know the total time elapsed since the start of the experiment. In fact, many movement functions will expect “elapsed time” and may not work with datetime coordinates (for now).
Luckily, it’s easy to convert datetime coordinates to “elapsed time” by simply subtracting the start datetime of the whole experiment from each timestamp.
Expand to see how this can be done
# Get the start datetime of the experiment in kage14
experiment_start = ds_14.time.isel(time=0).data

# Subtract the start datetime from each timestamp
time_elapsed = ds_14.time.data - np.datetime64(experiment_start)

# Convert to seconds
seconds_elapsed = time_elapsed / pd.Timedelta("1s")

# Assign the seconds_elapsed coordinate to the "time" dimension
ds_14 = ds_14.assign_coords(seconds_elapsed=("time", seconds_elapsed))
We’ve pre-computed this for convenience and stored it in a secondary time coordinate called seconds_elapsed.
Whenever we want to switch to “elapsed time” mode, we can simply set the seconds_elapsed coordinates as the “index” of the time dimension. This means that seconds_elapsed will be used as the primary time coordinate, allowing us to select data by it.
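For example (a minimal sketch, continuing with the ds_14 dataset from above):

# Make seconds_elapsed the index of the "time" dimension
ds_14_elapsed = ds_14.set_index(time="seconds_elapsed")

# We can now select by elapsed time, e.g. the first 10 minutes
ds_14_elapsed.sel(time=slice(0, 600))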
5.4 Filter by confidence
It looks like the “neck”, “bodycenter”, “spine1”, and “spine2” keypoints are the most confidently detected; all four lie along the mouse’s body. Let’s define this list of “reliable” keypoints for later use, and filter out low-confidence predictions (see the sketch below).
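The filtering cell itself isn’t shown above; here is a minimal sketch of what it likely looked like, assuming each dataset stores position and confidence variables and using movement’s filter_by_confidence (the 0.9 threshold is an assumption, not taken from the original analysis):

reliable_keypoints = ["neck", "bodycenter", "spine1", "spine2"]

for kage, ds in ds_kages.items():
    print(f"Filtering {kage}...")
    # Replace predictions below the confidence threshold with NaN;
    # the threshold value here is an assumption
    ds["position_filtered"] = filter_by_confidence(
        ds.position, ds.confidence, threshold=0.9
    )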
Filtering kage14...
No missing points (marked as NaN) in input.
Missing points (marked as NaN) in output (individual_0):

keypoint     missing
snout        3541459/5236793 (67.63%)
leftear      2326818/5236793 (44.43%)
rightear     2670878/5236793 (51.0%)
neck         1763878/5236793 (33.68%)
spine1       1195647/5236793 (22.83%)
bodycenter   1109151/5236793 (21.18%)
spine2       1762748/5236793 (33.66%)
tailbase     3443803/5236793 (65.76%)

Filtering kage17...
No missing points (marked as NaN) in input.
Missing points (marked as NaN) in output (individual_0):

keypoint     missing
snout        4729215/6063448 (78.0%)
leftear      4296942/6063448 (70.87%)
rightear     4380660/6063448 (72.25%)
neck         3934047/6063448 (64.88%)
spine1       3215592/6063448 (53.03%)
bodycenter   1959611/6063448 (32.32%)
spine2       3568964/6063448 (58.86%)
tailbase     4471744/6063448 (73.75%)
5.5 Plot the mouse’s speed over time
Let’s define a single-point representation of the mouse’s position, which we’ll call the body_centroid. We derive this by taking the mean of the 4 reliable keypoints, using their filtered positions.
for kage, ds in ds_kages.items():
    ds["body_centroid"] = ds.position_filtered.sel(
        individuals="individual_0",  # the only individual in the dataset
        keypoints=reliable_keypoints,
    ).mean(dim="keypoints")
Now let’s compute the body centroid speed as a proxy for the mouse’s overall speed. For compute_speed to work properly, we’ll temporarily switch to “elapsed time” mode.
for kage, ds in ds_kages.items():
    ds["body_centroid_speed"] = compute_speed(
        ds.body_centroid.set_index(time="seconds_elapsed")
    ).assign_coords(time=ds.body_centroid.time)

ds_kages["kage14"].body_centroid_speed
fig, axes = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=True
)
for i, kage in enumerate(["kage14", "kage17"]):
    ds_kages[kage].body_centroid_speed.plot.line(ax=axes[i])
    axes[i].set_title(f"{kage} body centroid")
    axes[i].set_ylabel("speed (pixels/sec)")
plt.show()
Figure 5.3: Body centroid speed
Exercise B
What do you notice about the overall speed fluctuations over time?
What do you think is the reason for this?
Do you think there are any differences between the two kages? Feel free to “zoom in” on specific time windows to investigate this.
5.6 Plot the occupancy heatmap
For the heatmap calculation to work properly, we need to temporarily set the time coordinates to “elapsed time”.
fig, axes = plt.subplots(
    nrows=2, ncols=1, figsize=(8, 8), sharex=True, sharey=True
)
for i, kage in enumerate(["kage14", "kage17"]):
    img = plt.imread(img_paths[i])  # background images defined earlier
    height, width = img.shape[:2]
    axes[i].imshow(img)
    plot_occupancy(
        ds_kages[kage].body_centroid.set_index(time="seconds_elapsed"),
        ax=axes[i],
        cmap="turbo",
        norm="log",  # log-scale the colormap
        vmax=10**6,
        alpha=0.6,  # some transparency
    )
    # invert y-axis to match the video frame
    axes[i].set_ylim([height - 1, 0])
    axes[i].set_xlim([0, width])
    axes[i].set_title(f"{kage} body centroid occupancy (log scale)")
Occupancy heatmaps
Ho, Hinze, Nejc Kejzar, Hiroki Sasaguri, Takashi Saito, Takaomi C. Saido, Bart De Strooper, Marius Bauza, and Julija Krupic. 2023. “A Fully Automated Home Cage for Long-Term Continuous Phenotyping of Mouse Cognition and Behavior.” Cell Reports Methods 3 (7): 100532. https://doi.org/10.1016/j.crmeth.2023.100532.