library(dplyr)
library(tidyr)
library(ggplot2)
library(printr) # for nice table printing

Introduction

[Still in draft …] Zebrafish are fish who swim … add more profound description.

They are interesting why? [DTK: Even though I have a joint paper about the activity of Mauthner neurons in zebrafish, that probably has nothing to do with Mary’s interests.

Data as they arrived for me

Mary, this is just for our internal purposes. It’s here to document where the data came from and provide a “chain of evidence” for the validity of the data to be presented in your research publication. (But I should have saved the original CSV file with a better name to refer to this specific run.)

OriginalForm <- readr::read_csv("Zebrafish.csv")
OriginalForm[1:8,1:6]
cycle_number 1 2 3 4 5
time 0.0 904.2 1804.3 2704.5 3604.6
temp 19.4 27.5 27.8 28.1 27.7
A1 83.0 76.0 62.0 39.0 48.0
A2 107.0 86.0 69.0 63.0 74.0
A3 147.0 136.0 84.0 89.0 69.0
A4 209.0 228.0 214.0 204.0 217.0
A5 77.0 65.0 56.0 49.0 43.0
A6 51.0 41.0 21.0 19.0 19.0

I’m going to translate this to a long form. Think of this as cleaning more than it is wrangling. Cleaning is often ugly.

Tmp <- t(OriginalForm) # transpose
nms <- Tmp[1,]
Tmp <- Tmp[-1,]
nrows <- nrow(Tmp)
Tmp <- data.frame(matrix(as.numeric(Tmp), nrow = nrows),
                  stringsAsFactors=FALSE)
names(Tmp) <- nms
Activity <- gather(Tmp, "fish", "value", -time, -temp)
Activity %>% head(10)
time temp fish value
0.0 19.4 A1 83
904.2 27.5 A1 76
1804.3 27.8 A1 62
2704.5 28.1 A1 39
3604.6 27.7 A1 48
4504.6 28.3 A1 41
5404.6 28.1 A1 73
6304.7 28.1 A1 79
7204.7 28.3 A1 50
8104.7 28.5 A1 56
nrow(Activity)
## [1] 48000

Temperature to Fahrenheit Time to hours

Just fish A1

Just temperatures greater than 28 degrees

Mean and standard deviation of each fish

Arrange by maximum activity for each fish

Convert temp to Fahrenheit

Activity2 <- 
  Activity %>% 
  mutate(tempF = 32 + (9/5) * temp) 
head(Activity2, 10) 
time temp fish value tempF
0.0 19.4 A1 83 66.92
904.2 27.5 A1 76 81.50
1804.3 27.8 A1 62 82.04
2704.5 28.1 A1 39 82.58
3604.6 27.7 A1 48 81.86
4504.6 28.3 A1 41 82.94
5404.6 28.1 A1 73 82.58
6304.7 28.1 A1 79 82.58
7204.7 28.3 A1 50 82.94
8104.7 28.5 A1 56 83.30
Activity2 %>% 
  filter(fish == "A3") %>%
  head()
time temp fish value tempF
0.0 19.4 A3 147 66.92
904.2 27.5 A3 136 81.50
1804.3 27.8 A3 84 82.04
2704.5 28.1 A3 89 82.58
3604.6 27.7 A3 69 81.86
4504.6 28.3 A3 95 82.94
Activity2 %>% 
  group_by(fish) %>%
  summarise(mean_temp = mean(tempF, na.rm = TRUE)) 
fish mean_temp
A1 82.73727
A2 82.73727
A3 82.73727
A4 82.73727
A5 82.73727
A6 82.73727
A7 82.73727
A8 82.73727
A9 82.73727
A10 82.73727
A11 82.73727
A12 82.73727
B1 82.73727
B2 82.73727
B3 82.73727
B4 82.73727
B5 82.73727
B6 82.73727
B7 82.73727
B8 82.73727
B9 82.73727
B10 82.73727
B11 82.73727
B12 82.73727
C1 82.73727
C2 82.73727
C3 82.73727
C4 82.73727
C5 82.73727
C6 82.73727
C7 82.73727
C8 82.73727
C9 82.73727
C10 82.73727
C11 82.73727
C12 82.73727
D1 82.73727
D2 82.73727
D3 82.73727
D4 82.73727
D5 82.73727
D6 82.73727
D7 82.73727
D8 82.73727
D9 82.73727
D10 82.73727
D11 82.73727
D12 82.73727
E1 82.73727
E2 82.73727
E3 82.73727
E4 82.73727
E5 82.73727
E6 82.73727
E7 82.73727
E8 82.73727
E9 82.73727
E10 82.73727
E11 82.73727
E12 82.73727
F1 82.73727
F2 82.73727
F3 82.73727
F4 82.73727
F5 82.73727
F6 82.73727
F7 82.73727
F8 82.73727
F9 82.73727
F10 82.73727
F11 82.73727
F12 82.73727
G1 82.73727
G2 82.73727
G3 82.73727
G4 82.73727
G5 82.73727
G6 82.73727
G7 82.73727
G8 82.73727
G9 82.73727
G10 82.73727
G11 82.73727
G12 82.73727
H1 82.73727
H2 82.73727
H3 82.73727
H4 82.73727
H5 82.73727
H6 82.73727
H7 82.73727
H8 82.73727
H9 82.73727
H10 82.73727
H11 82.73727
H12 82.73727
Activity2 %>% 
ggplot(aes(x = value, y = tempF)) +
  geom_point(aes(color = fish))
## Warning: Removed 10752 rows containing missing values (geom_point).

Time Series

As I (DTK) understand things, we’re looking for temporal patterns in fish activity. Here’s a quick plot for each fish:

Activity %>%
  ggplot(aes(x=time, y=value)) + 
  geom_line(aes(color=fish))
## Warning: Removed 10752 rows containing missing values (geom_path).

DTK: There’s something going on, but with 96 fish it’s a very crowded plot.

MH: Could we plot out the fish individually?

DTK: Yes, but what are you going to do with 96 different plots?

MH: Perhaps divide up the fish by treatment group. You can tell from the letter of the fish name what treatment that fish received.

DTK: OK. What’s the code?

MH: It’s a bit complicated. I’m putting a CSV file that translates between fish name and code. There are 96 cases, one for each well. (Some of the wells were empty.) Here it is:

Treatments <- read.csv("FishTx.csv",
                       stringsAsFactors = FALSE)
Treatments %>% sample_n(size=5)
fish Tx
61 F1 LD
90 H6 LD
37 D1 Empty
83 G11 LD
19 B7 DD
nrow(Treatments)
## [1] 96

Joining the treatment with the activity data

Tmp <- 
  Activity %>%
  inner_join(Treatments)
## Joining by: "fish"
## Warning in inner_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
Tmp %>% sample_n(5)
time temp fish value Tx
22267 239413.0 28.1 D9 51 Empty
29201 180010.9 28.4 E11 18 Empty
7005 3604.6 27.7 B3 41 DD
10581 72007.0 28.5 B10 74 DD
25902 NA NA E4 NA Empty
nrow(Tmp)
## [1] 48000
Tmp %>%
  ggplot(aes(x=time, y=value)) + 
  geom_line(aes(color=fish)) + 
  facet_grid( Tx ~ .)
## Warning: Removed 4032 rows containing missing values (geom_path).
## Warning: Removed 2688 rows containing missing values (geom_path).
## Warning: Removed 4032 rows containing missing values (geom_path).

DTK: Some observations from this simple graph …

  1. The empty wells have a different level of behavior than many of the fish. (That’s expected, right?)
  2. The time pattern for the empty wells is strikingly similar to the the pattern for the wells with fish. Why? Might this be leakage between the reading for the wells? Perhaps a fish was accidentally put into one of the nominally empty wells?
  3. There seem to be some differences between the DD treated fish and LD treated fish in the level of activity even though the pattern over time is similar.
  4. Fish with higher overall activity seem to have higher fluctuations in activity.
  5. There are color bands in the plots. I’m thinking this is because each 12-well row in the 96-well plate is being assigned a similar color in the plot. So perhaps there is different activity in each column. Would this be a fish-activity effect or an instrumentation effect.

Some suggestions:

  • Let’s see how similar the 24 empty wells are.
  • Look at the log of activity.
  • Subtract out the empty-well level of activity from each fish. Maybe do this in logarithm land.
  • Transform the time axis to be scaled in days. Were the fish exposed to a 24-hour zeitgeber? If not, do they have a circadian rhythm different from 24 hours?
  • Let’s build a model of activity using plate row and column as covariates variables. We’d also include treatment and time of (circadian) day. This will be a heirarchical model, since well column is nested within treatment. Or maybe leave time out of the model and use the residuals to display the time pattern.