5  A mouse’s daily activity log

In this case study, we’ll be using the movement package to dive into mouse home cage monitoring data acquired in Smart-Kages and tracked with DeepLabCut. We’ll explore how mouse activity levels fluctuate throughout the day.

Before you get started, make sure you’ve set up the animals-in-motion-env environment (refer to prerequisites A.3.3) and are using it to run this notebook. You’ll also need to download the Smart-Kages.zip archive from Dropbox (see prerequisites A.4) and unzip it.

5.1 Import libraries

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

from movement import sample_data
from movement.filtering import filter_by_confidence
from movement.kinematics import compute_speed
from movement.plots import plot_occupancy
Downloading data from 'https://gin.g-node.org/neuroinformatics/movement-test-data/raw/master/metadata.yaml' to file '/home/runner/.movement/data/temp_metadata.yaml'.
SHA256 hash of downloaded file: cf2876bab4f754d48d3c9f113ce5ac91787304cc587d33d8bf1124d5358e957f
Use this value as the 'known_hash' argument of 'pooch.retrieve' to ensure that the file hasn't changed if it is downloaded again in the future.

5.2 The Smart-Kages dataset

Acknowledgement

This dataset was kindly shared by Loukia Katsouri from the O’Keefe Lab, with permission to use for this workshop.

The Smart-Kages dataset comprises home cage recordings from two mice, each housed in a specialised Smart-Kage (Ho et al. 2023)—a home cage monitoring system equipped with a camera mounted atop the cage.

The camera captures data around the clock at a rate of 2 frames per second, saving a video segment for each hour of the day. A pre-trained DeepLabCut model is subsequently employed to predict 8 keypoints on the mouse’s body.

Let’s examine the contents of the downloaded data. You will need to specify the path to the unzipped Smart-Kages folder on your machine.

# Replace with the path to the unzipped Smart-Kages folder on your machine
smart_kages_path = Path.home() / ".movement" / "Smart-Kages"

# Let's visualise the contents of the folder
files = [f.name for f in smart_kages_path.iterdir()]
files.sort()
for file in files:
    print(file)
kage14.nc
kage14_background.png
kage17.nc
kage17_background.png

The tracking data are stored in two .nc (NetCDF) files: kage14.nc and kage17.nc. NetCDF is an HDF5-based file format that xarray can natively save and load, which makes it convenient to use with movement.
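For reference, the round trip is a one-liner in each direction with xarray. A minimal sketch (the dataset and filename here are made up for illustration):

# Round-trip sketch: save any xarray Dataset to NetCDF and load it back
# (illustrative data and filename; this chapter doesn't write any files)
example = xr.Dataset({"x": ("time", [1.0, 2.0, 3.0])})
example.to_netcdf("example.nc")
example_reloaded = xr.open_dataset("example.nc")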

Apart from these, we also have two .png files, kage14_background.png and kage17_background.png, which are single frames extracted from the videos.

Let’s take a look at them.

Code
kages = ["kage14", "kage17"]
img_paths = [smart_kages_path / f"{kage}_background.png" for kage in kages]

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))

for i, img_path in enumerate(img_paths):
    img = plt.imread(img_path)
    axes[i].imshow(img)
    axes[i].set_title(f"{kages[i]}")
    axes[i].axis("off")
Figure 5.1: Top-down camera views of the Smart-Kage habitats
Questions
  1. What objects do you see in the habitat?
  2. What challenges do you anticipate with tracking a mouse in this environment?
  3. What are the trade-offs one has to consider when designing a continuous monitoring system?

Let’s load and inspect the tracking data:

ds_kages = {}  # a dictionary to store kage name -> xarray dataset

for kage in ["kage14", "kage17"]:
    ds_kages[kage] = xr.open_dataset(smart_kages_path / f"{kage}.nc")

ds_kages["kage14"]   # Change to "kage17" to inspect the other dataset
<xarray.Dataset> Size: 1GB
Dimensions:          (time: 5236793, space: 2, keypoints: 8, individuals: 1)
Coordinates: (5)
Data variables:
    position         (time, space, keypoints, individuals) float64 670MB ...
    confidence       (time, keypoints, individuals) float64 335MB ...
Attributes: (7)

We see that each dataset contains a huge amount of data: over five million tracked frames each!

Code
start_date_k14 = pd.to_datetime(ds_kages["kage14"].time.data[0])
end_date_k14 = pd.to_datetime(ds_kages["kage14"].time.data[-1])
duration_k14 = end_date_k14 - start_date_k14

start_date_k17 = pd.to_datetime(ds_kages["kage17"].time.data[0])
end_date_k17 = pd.to_datetime(ds_kages["kage17"].time.data[-1])
duration_k17 = end_date_k17 - start_date_k17

print("Experiment durations:")
print(f"kage-14: from {start_date_k14} to {end_date_k14} ({duration_k14.days} days)")
print(f"kage-17: from {start_date_k17} to {end_date_k17} ({duration_k17.days} days)")
Experiment durations:
kage-14: from 2024-04-08 13:55:40 to 2024-05-10 07:59:59.501205 (31 days)
kage-17: from 2024-04-03 00:00:06 to 2024-05-10 07:59:59.509103 (37 days)

5.3 Datetime coordinates

You might notice something interesting about the time coordinates in these xarray datasets: they’re given in datetime64[ns] format, which means they’re precise timestamps expressed in “calendar time”.

This is different from what we’ve seen before in other movement datasets, where time coordinates are expressed as seconds elapsed since the start of the video, or “elapsed time”.

Many recording systems can output timestamps for each video frame. In our case, the raw data from the Smart-Kage system included the start datetime of each 1-hour-long video segment and the precise time difference between the start of each segment and every frame within it.

Using this information, we were able to reconstruct precise datetime coordinates for all frames throughout the entire experiment. We then concatenated the DeepLabCut predictions from all video segments and assigned the datetime coordinates to the resulting dataset. If you’re interested in the details, you can find the code in the smart-kages-movement GitHub repository.
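To make this concrete, here is a minimal sketch of how per-frame datetimes can be built for a single segment, assuming the 2 fps rate described above (the segment start time is an invented example; the actual smart-kages-movement code differs in its details):

# A hypothetical 1-hour segment recorded at 2 fps (2 x 3600 = 7200 frames)
segment_start = pd.Timestamp("2024-04-08 13:00:00")  # illustrative value
frame_offsets = pd.to_timedelta(np.arange(7200) * 0.5, unit="s")

# Datetime coordinate for every frame in this segment
frame_datetimes = segment_start + frame_offsets
print(frame_datetimes[:3])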

Using “calendar time” is convenient for many applications. For example, we could cross-reference the tracking results against other data sources, such as body weight measurements. It also allows us to easily select time windows by datetime:

ds_14 = ds_kages["kage14"]

# Select a specific month
ds_14.sel(time=slice("2024-04-01", "2024-04-30"))

# Select a specific day
ds_14.sel(time=slice("2024-04-17 00:00:00", "2024-04-17 23:59:59"))

# Select a half-hour window
ds_14.sel(time=slice("2024-04-17 09:30:00", "2024-04-17 10:00:00"))

That said, it’s still useful to also know the total time elapsed since the start of the experiment. In fact, many movement functions will expect “elapsed time” and may not work with datetime coordinates (for now).

Luckily, it’s easy to convert datetime coordinates to “elapsed time” by simply subtracting the start datetime of the whole experiment from each timestamp.

Expand to see how this can be done
# Get the start datetime of the experiment in kage14
experiment_start = ds_14.time.isel(time=0).data

# Subtract the start datetime from each timestamp
time_elapsed = (ds_14.time.data - np.datetime64(experiment_start))

# Convert to seconds
seconds_elapsed = time_elapsed / pd.Timedelta("1s")

# Assign the seconds_elapsed coordinate to the "time" dimension
ds_14 = ds_14.assign_coords(seconds_elapsed=("time", seconds_elapsed))
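An equivalent shortcut uses xarray’s datetime arithmetic directly (same result as the steps above):

# One-liner: timedeltas divided by one second give float seconds
seconds_elapsed = (ds_14.time - ds_14.time[0]) / np.timedelta64(1, "s")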

We’ve pre-computed this for convenience and stored it in a secondary time coordinate called seconds_elapsed.

print(ds_14.coords["time"].values[:2])
print(ds_14.coords["seconds_elapsed"].values[:2])
['2024-04-08T13:55:40.000000000' '2024-04-08T13:55:40.499044000']
[0.       0.499044]

Whenever we want to switch to “elapsed time” mode, we can simply set the seconds_elapsed coordinates as the “index” of the time dimension. This means that seconds_elapsed will be used as the primary time coordinate, allowing us to select data by it. Note that set_index returns a new object: ds_14 itself keeps its datetime index.

ds_14.set_index(time="seconds_elapsed").sel(time=slice(0, 1800))
Exercise A

For each of the two kages:

  1. Plot the x-axis position of the mouse’s body center over time, for the week starting on April 15th. What do you notice?
  2. Plot the median confidence of the body center for each day, over the entire duration of the experiment.

5.4 Filtering out low-confidence predictions

Let’s examine the range of confidence values for each keypoint.

Code
kage = "kage14"
confidence = ds_kages[kage].confidence.squeeze()

fig, ax = plt.subplots(figsize=(8, 3))
confidence.quantile(q=0.25, dim="time").plot.line("o--", color="gray", ax=ax, label="25% quantile")
confidence.quantile(q=0.75, dim="time").plot.line("o--", color="gray", ax=ax, label="75% quantile")
confidence.median(dim="time").plot.line("o-", color="black", ax=ax, label="median")

ax.legend()
ax.set_title(f"{kage} confidence range")
plt.show()
Figure 5.2: Confidence range by keypoint

It looks like the “neck”, “bodycenter”, “spine1”, and “spine2” keypoints are the most confidently detected; all four lie on the mouse’s body. Let’s define this list of “reliable” keypoints for later use.

reliable_keypoints = ["neck", "bodycenter", "spine1", "spine2"]

We can now filter out low-confidence predictions, replacing every position estimate whose confidence falls below the threshold with NaN.

confidence_threshold = 0.95

for kage, ds in ds_kages.items():
    print(f"Filtering {kage}...")
    ds["position_filtered"] = filter_by_confidence(
        ds.position,
        ds.confidence,
        threshold=confidence_threshold,
        print_report=True,
    )
    print("\n")
Filtering kage14...
No missing points (marked as NaN) in input.
Missing points (marked as NaN) in output (individual_0):

  snout       3541459/5236793 (67.63%)
  leftear     2326818/5236793 (44.43%)
  rightear    2670878/5236793 (51.0%)
  neck        1763878/5236793 (33.68%)
  spine1      1195647/5236793 (22.83%)
  bodycenter  1109151/5236793 (21.18%)
  spine2      1762748/5236793 (33.66%)
  tailbase    3443803/5236793 (65.76%)

Filtering kage17...
No missing points (marked as NaN) in input.
Missing points (marked as NaN) in output (individual_0):

  snout       4729215/6063448 (78.0%)
  leftear     4296942/6063448 (70.87%)
  rightear    4380660/6063448 (72.25%)
  neck        3934047/6063448 (64.88%)
  spine1      3215592/6063448 (53.03%)
  bodycenter  1959611/6063448 (32.32%)
  spine2      3568964/6063448 (58.86%)
  tailbase    4471744/6063448 (73.75%)
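As a quick sanity check on the report, the same missingness fractions can be computed directly from the filtered positions (a frame counts as missing if either x or y is NaN):

# Fraction of NaN frames per keypoint after filtering
nan_frac = (
    ds_kages["kage14"].position_filtered
    .isnull()
    .any(dim="space")
    .mean(dim="time")
    .squeeze()
)
print(nan_frac.to_pandas().round(3))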

5.5 Plot the mouse’s speed over time

Let’s define a single-point representation of the mouse’s position, which we’ll call the body_centroid. We derive this by taking the mean of the 4 reliable keypoints, using their filtered positions.

for kage, ds in ds_kages.items():
    ds["body_centroid"] = ds.position_filtered.sel(
        individuals="individual_0",  # the only individual in the dataset
        keypoints=reliable_keypoints
    ).mean(dim="keypoints")

Now let’s compute the body centroid speed as a proxy of the mouse’s speed. For compute_speed to work properly, we’ll temporarily switch to “elapsed time” mode.

for kage, ds in ds_kages.items():
    ds["body_centroid_speed"] = compute_speed(
        ds.body_centroid.set_index(time="seconds_elapsed")
    ).assign_coords(time=ds.body_centroid.time)

ds_kages["kage14"].body_centroid_speed
<xarray.DataArray 'body_centroid_speed' (time: 5236793)> Size: 42MB
4.352 2.27 0.376 0.2181 0.7072 2.167 ... 0.2808 0.176 0.2587 0.2372 0.4738
Coordinates: (2)

Let’s plot the speed over time.

fig, axes = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=True
)

for i, kage in enumerate(["kage14", "kage17"]):
    ds_kages[kage].body_centroid_speed.plot.line(ax=axes[i])
    axes[i].set_title(f"{kage} body centroid")
    axes[i].set_ylabel("speed (pixels/sec)")

plt.show()
Figure 5.3: Body centroid speed
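Plotting every raw sample makes the daily rhythm hard to see. One option, sketched below using xarray’s resampling on the datetime index, is to summarise the speed per hour before plotting:

# Median speed per hour ("1h" = hourly bins; use "1H" on older pandas)
hourly_speed = ds_kages["kage14"].body_centroid_speed.resample(time="1h").median()
hourly_speed.plot.line(figsize=(10, 3))
plt.show()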
Exercise B
  1. What do you notice about the overall speed fluctuations over time?
  2. What do you think is the reason for this?
  3. Do you think there are any differences between the two kages? Feel free to “zoom in” on specific time windows to investigate this.

5.6 Plot the occupancy heatmap

For the heatmap calculation to work properly, we need to temporarily set the time coordinates to “elapsed time”.

Code
fig, axes = plt.subplots(
    nrows=2, ncols=1, figsize=(8, 8), sharex=True, sharey=True
)

for i, kage in enumerate(["kage14", "kage17"]):
    img = plt.imread(img_paths[i])
    height, width = img.shape[:2]

    axes[i].imshow(img)
    plot_occupancy(
        ds_kages[kage].body_centroid.set_index(time="seconds_elapsed"),
        ax=axes[i],
        cmap="turbo",
        norm="log",  # log scale the colormap
        vmax=10**6,
        alpha=0.6,   # some transparency
    )
    # invert y-axis to match the video frame
    axes[i].set_ylim([height - 1, 0])
    axes[i].set_xlim([0, width])
    axes[i].set_title(f"Body centroid occupancy (log scale)")

Figure 5.4: Occupancy heatmaps
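Since this chapter is about daily activity, a natural follow-up is to restrict the occupancy map to part of the day. The sketch below assumes a 20:00–08:00 dark phase, which is an invented light cycle for illustration (the actual schedule isn’t given in this chapter):

# Boolean mask for an assumed dark phase (20:00-08:00; illustrative)
hours = ds_kages["kage14"].time.dt.hour
night_mask = (hours >= 20) | (hours < 8)

# Occupancy restricted to the dark phase, in "elapsed time" mode
fig, ax = plt.subplots(figsize=(5, 5))
plot_occupancy(
    ds_kages["kage14"].body_centroid.isel(time=night_mask)
    .set_index(time="seconds_elapsed"),
    ax=ax,
    cmap="turbo",
    norm="log",
)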