---
title: "Clocks and Rocks"
author: "Matheus Januario and Jennifer Auler"
date: "Dec/2023"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{clocks_and_rocks}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r include=FALSE}
options(rmarkdown.html_vignette.check_title = FALSE)
```


```{r}
library(evolved)
```

## The fossil record:

Here we will use data on fossilized organism, which we call "fossil occurrences". the data was originally obtained from the Paleobiology database, a free-to-use resource that is the standard repository for such data and which has been used by hundreds or thousands of scientific research articles to study the history and dynamics of biodiversity in Deep Time.

```{r}
data("mammals_fossil")
head(mammals_fossil)
```

Students can use functions in the package to calculate the diversity of a certain taxonomic rank, and plot it through time. 

To make it more fun, we will compare tow different taxonomic levels, and will plot them in a relative, log scale:

```{r,fig.height=5, fig.width=6, , fig.align='center'}
spDTT = calcFossilDivTT(mammals_fossil, tax.lvl = "species")
genusDTT = calcFossilDivTT(mammals_fossil, tax.lvl = "genus")
famDTT = calcFossilDivTT(mammals_fossil, tax.lvl = "family")

# And to allow comparisons, we will use relative richness:
plot(x=genusDTT$age, xlim = rev(range(genusDTT$age)),
     y=log(genusDTT$div)-log(max(genusDTT$div)), 
     xlab="Time (Million years ago)",
     ylab="Log relative diversity",
     type="l", col="blue", ylim=c(-7,0))

lines(x=famDTT$age,
     y=log(famDTT$div)-log(max(famDTT$div)), 
     col="red")

lines(x=spDTT$age,
     y=log(spDTT$div)-log(max(spDTT$div)), 
     col="black")
```

We can also visualize fossil records by running:

```{r,fig.height=5, fig.width=6, , fig.align='center'}
# Family-level:
plotRawFossilOccs(mammals_fossil, tax.lvl = "family", knitr = TRUE)

# Genus level:
plotRawFossilOccs(mammals_fossil, tax.lvl = "genus", knitr = TRUE)

# Species level:
plotRawFossilOccs(mammals_fossil, tax.lvl = "species", knitr = TRUE)
```

Are they different? how could we test? Hint: look at the column names of the fossil object:

```{r}
colnames(mammals_fossil)
```

Now, how complete is the record? We can use another dataset to explored this:

```{r}
data("mammals_spp")
```

And see the proportion of living species with a fossil occurrence.

```{r}
sum_w_fossil = sum(mammals_spp %in% mammals_fossil$species)

sum_w_fossil / length(mammals_spp)
```

Students can for instance explore how this varies across different mammal groups, and which type of factors (e.g. biological, geological factors) seem to be influencing this the most.

Results from this dataset can also be compared (when relevant) with other fossil records:

```{r}
data("trilob_fossil")
data("ammonoidea_fossil")
data("dinos_fossil")
data("birds_spp")
```

We can also compare the temporal trends of species number, for instance. To help in that way, we also provide biodiversity timeseries for more clades:

```{r}
data("timeseries_fossil")
head(timeseries_fossil)
```

It contains many fossil datasets, and we will only plot the first 4 of them

```{r fig.height=5, fig.width=6, , fig.align='center'}
clades = unique(timeseries_fossil$clade)[1:4]

cols= c("#ffd353", "#ef8737", "#bb292c", "#62205f")

par(mfrow=c(2,2))
for(i in 1:length(clades)){
  aux = timeseries_fossil[timeseries_fossil$clade==clades[i], ]
  plot(aux$time_ma, log(aux$richness), col=cols[i], lwd=3,
       main=clades[i], type="l", frame.plot = F,
       xlab="Time (Mya)", ylab="Log richness",
       xlim=rev(range(aux$time_ma)))
}
```

Students can explore this other dataset, and compare it with the previously discussed ones. They can compare richness thought time, calculate statistics related to richness change. Explore factors (e.g. mass extinction) that might have affected some clades, among other learning problems.

But fossils are not the only way to explore the timescale of evolution. Below, we are going to show how some functions that use this type of data work.

## Comparing molecular patterns

With other functions, students can also explore molecular sequences and compare species.

First we load the dataset of protein sequences from the cytochrome oxidase 1 gene. This gene, often known as `CO1`, is a mitochondrial gene that plays a key role in cellular respiration (e.g., the primary aerobic pathway to energy ( ATP ) generation). `CO1` contains approximately 513 aminoacids (AA) and has been used by previous studies for reconstructing phylogenetic trees and estimating divergence times between taxa by assuming a molecular clock:

```{r}
data(cytOxidase)

summary(cytOxidase)

head(cytOxidase)
```

We can compare two sequences in terms AA difference number. For instance if we want to compare a species of snake with one species of bird, we type:

```{r}
countSeqDiffs(cytOxidase, "snake", "bird")
```

And to calculate the proportion of differences, we type:

```{r}
countSeqDiffs(cytOxidase, "snake", "bird")/nchar(cytOxidase["snake"])
```

We can also quickly visualize the sequences by directly handling all elements in this object:

```{r,fig.height=5, fig.width=6, , fig.align='center'}
plotProteinSeq(cytOxidase, c("snake", "bird", "cnidaria"), knitr = TRUE)
```

And using patterns of molecular divergence, as well as fossil occurrences, we can build and, specially, date, molecular phylogenies.