library(dplyr)
library(tidyr)
library(ggplot2)
library(printr) # for nice table printing
[Still in draft …] Zebrafish are fish who swim … add more profound description.
They are interesting why? [DTK: Even though I have a joint paper about the activity of Mauthner neurons in zebrafish, that probably has nothing to do with Mary’s interests.
Mary, this is just for our internal purposes. It’s here to document where the data came from and provide a “chain of evidence” for the validity of the data to be presented in your research publication. (But I should have saved the original CSV file with a better name to refer to this specific run.)
OriginalForm <- readr::read_csv("Zebrafish.csv")
OriginalForm[1:8,1:6]
cycle_number | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
time | 0.0 | 904.2 | 1804.3 | 2704.5 | 3604.6 |
temp | 19.4 | 27.5 | 27.8 | 28.1 | 27.7 |
A1 | 83.0 | 76.0 | 62.0 | 39.0 | 48.0 |
A2 | 107.0 | 86.0 | 69.0 | 63.0 | 74.0 |
A3 | 147.0 | 136.0 | 84.0 | 89.0 | 69.0 |
A4 | 209.0 | 228.0 | 214.0 | 204.0 | 217.0 |
A5 | 77.0 | 65.0 | 56.0 | 49.0 | 43.0 |
A6 | 51.0 | 41.0 | 21.0 | 19.0 | 19.0 |
I’m going to translate this to a long form. Think of this as cleaning more than it is wrangling. Cleaning is often ugly.
Tmp <- t(OriginalForm) # transpose
nms <- Tmp[1,]
Tmp <- Tmp[-1,]
nrows <- nrow(Tmp)
Tmp <- data.frame(matrix(as.numeric(Tmp), nrow = nrows),
stringsAsFactors=FALSE)
names(Tmp) <- nms
Activity <- gather(Tmp, "fish", "value", -time, -temp)
Activity %>% head(10)
time | temp | fish | value |
---|---|---|---|
0.0 | 19.4 | A1 | 83 |
904.2 | 27.5 | A1 | 76 |
1804.3 | 27.8 | A1 | 62 |
2704.5 | 28.1 | A1 | 39 |
3604.6 | 27.7 | A1 | 48 |
4504.6 | 28.3 | A1 | 41 |
5404.6 | 28.1 | A1 | 73 |
6304.7 | 28.1 | A1 | 79 |
7204.7 | 28.3 | A1 | 50 |
8104.7 | 28.5 | A1 | 56 |
nrow(Activity)
## [1] 48000
Temperature to Fahrenheit Time to hours
Just fish A1
Just temperatures greater than 28 degrees
Mean and standard deviation of each fish
Arrange by maximum activity for each fish
Convert temp to Fahrenheit
Activity2 <-
Activity %>%
mutate(tempF = 32 + (9/5) * temp)
head(Activity2, 10)
time | temp | fish | value | tempF |
---|---|---|---|---|
0.0 | 19.4 | A1 | 83 | 66.92 |
904.2 | 27.5 | A1 | 76 | 81.50 |
1804.3 | 27.8 | A1 | 62 | 82.04 |
2704.5 | 28.1 | A1 | 39 | 82.58 |
3604.6 | 27.7 | A1 | 48 | 81.86 |
4504.6 | 28.3 | A1 | 41 | 82.94 |
5404.6 | 28.1 | A1 | 73 | 82.58 |
6304.7 | 28.1 | A1 | 79 | 82.58 |
7204.7 | 28.3 | A1 | 50 | 82.94 |
8104.7 | 28.5 | A1 | 56 | 83.30 |
Activity2 %>%
filter(fish == "A3") %>%
head()
time | temp | fish | value | tempF |
---|---|---|---|---|
0.0 | 19.4 | A3 | 147 | 66.92 |
904.2 | 27.5 | A3 | 136 | 81.50 |
1804.3 | 27.8 | A3 | 84 | 82.04 |
2704.5 | 28.1 | A3 | 89 | 82.58 |
3604.6 | 27.7 | A3 | 69 | 81.86 |
4504.6 | 28.3 | A3 | 95 | 82.94 |
Activity2 %>%
group_by(fish) %>%
summarise(mean_temp = mean(tempF, na.rm = TRUE))
fish | mean_temp |
---|---|
A1 | 82.73727 |
A2 | 82.73727 |
A3 | 82.73727 |
A4 | 82.73727 |
A5 | 82.73727 |
A6 | 82.73727 |
A7 | 82.73727 |
A8 | 82.73727 |
A9 | 82.73727 |
A10 | 82.73727 |
A11 | 82.73727 |
A12 | 82.73727 |
B1 | 82.73727 |
B2 | 82.73727 |
B3 | 82.73727 |
B4 | 82.73727 |
B5 | 82.73727 |
B6 | 82.73727 |
B7 | 82.73727 |
B8 | 82.73727 |
B9 | 82.73727 |
B10 | 82.73727 |
B11 | 82.73727 |
B12 | 82.73727 |
C1 | 82.73727 |
C2 | 82.73727 |
C3 | 82.73727 |
C4 | 82.73727 |
C5 | 82.73727 |
C6 | 82.73727 |
C7 | 82.73727 |
C8 | 82.73727 |
C9 | 82.73727 |
C10 | 82.73727 |
C11 | 82.73727 |
C12 | 82.73727 |
D1 | 82.73727 |
D2 | 82.73727 |
D3 | 82.73727 |
D4 | 82.73727 |
D5 | 82.73727 |
D6 | 82.73727 |
D7 | 82.73727 |
D8 | 82.73727 |
D9 | 82.73727 |
D10 | 82.73727 |
D11 | 82.73727 |
D12 | 82.73727 |
E1 | 82.73727 |
E2 | 82.73727 |
E3 | 82.73727 |
E4 | 82.73727 |
E5 | 82.73727 |
E6 | 82.73727 |
E7 | 82.73727 |
E8 | 82.73727 |
E9 | 82.73727 |
E10 | 82.73727 |
E11 | 82.73727 |
E12 | 82.73727 |
F1 | 82.73727 |
F2 | 82.73727 |
F3 | 82.73727 |
F4 | 82.73727 |
F5 | 82.73727 |
F6 | 82.73727 |
F7 | 82.73727 |
F8 | 82.73727 |
F9 | 82.73727 |
F10 | 82.73727 |
F11 | 82.73727 |
F12 | 82.73727 |
G1 | 82.73727 |
G2 | 82.73727 |
G3 | 82.73727 |
G4 | 82.73727 |
G5 | 82.73727 |
G6 | 82.73727 |
G7 | 82.73727 |
G8 | 82.73727 |
G9 | 82.73727 |
G10 | 82.73727 |
G11 | 82.73727 |
G12 | 82.73727 |
H1 | 82.73727 |
H2 | 82.73727 |
H3 | 82.73727 |
H4 | 82.73727 |
H5 | 82.73727 |
H6 | 82.73727 |
H7 | 82.73727 |
H8 | 82.73727 |
H9 | 82.73727 |
H10 | 82.73727 |
H11 | 82.73727 |
H12 | 82.73727 |
Activity2 %>%
ggplot(aes(x = value, y = tempF)) +
geom_point(aes(color = fish))
## Warning: Removed 10752 rows containing missing values (geom_point).
As I (DTK) understand things, we’re looking for temporal patterns in fish activity. Here’s a quick plot for each fish:
Activity %>%
ggplot(aes(x=time, y=value)) +
geom_line(aes(color=fish))
## Warning: Removed 10752 rows containing missing values (geom_path).
DTK: There’s something going on, but with 96 fish it’s a very crowded plot.
MH: Could we plot out the fish individually?
DTK: Yes, but what are you going to do with 96 different plots?
MH: Perhaps divide up the fish by treatment group. You can tell from the letter of the fish name what treatment that fish received.
DTK: OK. What’s the code?
MH: It’s a bit complicated. I’m putting a CSV file that translates between fish name and code. There are 96 cases, one for each well. (Some of the wells were empty.) Here it is:
Treatments <- read.csv("FishTx.csv",
stringsAsFactors = FALSE)
Treatments %>% sample_n(size=5)
fish | Tx | |
---|---|---|
61 | F1 | LD |
90 | H6 | LD |
37 | D1 | Empty |
83 | G11 | LD |
19 | B7 | DD |
nrow(Treatments)
## [1] 96
Tmp <-
Activity %>%
inner_join(Treatments)
## Joining by: "fish"
## Warning in inner_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
Tmp %>% sample_n(5)
time | temp | fish | value | Tx | |
---|---|---|---|---|---|
22267 | 239413.0 | 28.1 | D9 | 51 | Empty |
29201 | 180010.9 | 28.4 | E11 | 18 | Empty |
7005 | 3604.6 | 27.7 | B3 | 41 | DD |
10581 | 72007.0 | 28.5 | B10 | 74 | DD |
25902 | NA | NA | E4 | NA | Empty |
nrow(Tmp)
## [1] 48000
Tmp %>%
ggplot(aes(x=time, y=value)) +
geom_line(aes(color=fish)) +
facet_grid( Tx ~ .)
## Warning: Removed 4032 rows containing missing values (geom_path).
## Warning: Removed 2688 rows containing missing values (geom_path).
## Warning: Removed 4032 rows containing missing values (geom_path).
DTK: Some observations from this simple graph …
Some suggestions: