Chapter 12 Comparisons 2

This chapter compares the mutations detected in Sixto and Angela, using filtering settings that are as similar as possible between the two trees.

12.1 Genomes

The genome of Sextonia rubra is larger in total size than that of Dicorynia guyanensis, but the Dicorynia guyanensis assembly is more contiguous, with fewer scaffolds (Fig. 12.1).

Figure 12.1: Caption.

Figure 12.2: Caption.

12.2 Libraries

As expected, Angela’s sequencing depth is roughly twice Sixto’s, averaging about 150X versus 80X (Fig. 12.3). This translates into a greater accepted sequencing depth in Angela; note that, as a result, Sixto’s libraries are less stringently filtered on coverage (Fig. 12.4).

Figure 12.3: Caption.

Figure 12.4: Caption.

12.3 Mutations

Mutations originate throughout the crown. Under basic filtering, most come from the base of the crown, followed by the main scaffold branches, the branches and finally the tips (Fig. 12.5). Interestingly, stronger filtering favoured mutations in the tips as well as at the base of the crown.

Figure 12.5: Caption.

12.4 Frequencies

Surprisingly, the allelic frequencies of mutations occurring at the base of the crown are not significantly higher than those at the tips (Fig. 12.6).

Figure 12.6: Caption.

Figure 12.7: Caption.

Figure 12.8: Caption.

12.5 Phylogeny

Mutations are therefore widely shared across the crown and do not always follow its architecture. As a result, the phylogeny of the mutations does not match the physical architecture of the tree, with the exception of branch I, which is conserved in Sixto (Fig. 12.9 & Fig. ??).

## 
## Setting initial dates...
## Fitting in progress... get a first set of estimates
##          (Penalised) log-lik = -6.362752 
## Optimising rates... dates... -6.362752 
## Optimising rates... dates... -6.361982 
## 
## log-Lik = -6.306181 
## PHIIC = 66.66

Figure 12.9: Caption.

12.6 Light

The light condition of the sampled tips at the time of sampling did not affect the number of mutations observed per library (Fig. ??).

Figure 12.10: Caption.

Figure 12.11: Caption.

Figure 12.12: Caption.

Figure 12.13: Caption.

Figure 12.14: Caption.

12.7 Type

Mutation types are similar between Angela and Sixto, except that Sixto shows more C->A and C->T but fewer T->A mutations than Angela. These differences are probably due to a sampling effect (Fig. 12.15; to be investigated further).

Figure 12.15: Caption.

12.8 Spectra

The mutation spectra are similar between Angela and Sixto, with a few exceptions that are probably due to a sampling effect (Fig. 12.16, to be explored further).

Figure 12.16: Caption.

12.9 Rates

Angela and Sixto show between ten and ten thousand mutations, depending on the filtering and the minimum accepted allelic frequency (Fig. 12.17). With basic filtering, Angela has more mutations than Sixto, due to its twofold sequencing depth. Nevertheless, ev filtering gave similar numbers of mutations, on the order of thousands, in both trees.

Figure 12.17: Caption.

12.10 Annotations

In progress.

12.11 Genes

In progress.

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## tree           1  15.52  15.522   87.89  < 2e-16 ***
## synonymy       1  10.63  10.633   60.20 1.47e-14 ***
## Residuals   1698 299.90   0.177                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
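
As a consistency check on the table above, the F statistics can be recomputed from the printed values: each F value is the term's mean square divided by the residual mean square. The sketch below uses only the rounded numbers shown in the table, so agreement is expected to about two decimals only; the variable names are illustrative.

```r
# Values copied from the printed ANOVA table (rounded as displayed)
df_resid <- 1698
ss_resid <- 299.90
ms_tree  <- 15.522   # mean square for `tree` (Df = 1)
ms_syn   <- 10.633   # mean square for `synonymy` (Df = 1)

# Residual mean square = residual sum of squares / residual Df
ms_resid <- ss_resid / df_resid

# F statistic = term mean square / residual mean square
f_tree <- ms_tree / ms_resid
f_syn  <- ms_syn / ms_resid

# Upper-tail p-values of the F distribution with (1, 1698) Df
p_tree <- pf(f_tree, df1 = 1, df2 = df_resid, lower.tail = FALSE)
p_syn  <- pf(f_syn,  df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(round(c(ms_resid = ms_resid, f_tree = f_tree, f_syn = f_syn), 3))
print(signif(c(p_tree = p_tree, p_syn = p_syn), 3))
```

The recomputed F values should land within rounding error of the 87.89 and 60.20 reported in the table.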

12.12 Fruits

12.12.1 From SSRseq Olivier

12.12.1.1 Align genotypes

library(tidyverse)  # read_tsv, bind_rows, mutate, separate, ...
library(Biostrings) # DNAStringSet, writeXStringSet
candidates <- bind_rows(
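  read_tsv("data/mutations/fruits/angela_fruits_candidate_mutations.tsv") %>% 
    mutate(mutation = row_number(), tree = "Angela"),
  # Sixto IDs continue after Angela's: 124 is a hard-coded offset
  # (presumably the number of Angela candidates)
  read_tsv("data/mutations/fruits/sixto_fruits_candidate_mutations.tsv") %>% 
    mutate(mutation = 124 + row_number(), tree = "Sixto")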
) %>% 
  dplyr::select(CHROM, POS, REF, ALT, af, branch, mutation, tree) %>% 
  mutate(CHROM = as.numeric(gsub("Super-Scaffold_", "", CHROM)))
alleles <- readxl::read_xlsx("data/mutations/fruits/SSRseq_TreeMutation_DataAnalysis.xlsx", "AlleleInformation") %>% 
  dplyr::rename(locus = Locus, genotype = AlleleSeqCode, sequence = AlleleSequence) %>% 
  dplyr::select(locus, genotype, sequence) %>% 
  mutate(locus2 = gsub("TreeMut_SNP-", "", locus)) %>% 
  separate(locus2, c("mutation", "pos"), "_Super-Scaffold_", convert = TRUE) %>% 
  separate_rows(mutation, convert = TRUE) %>% 
  separate(pos, "CHROM", convert = TRUE) %>% 
  left_join(candidates, by = c("mutation", "CHROM")) %>% 
  mutate(name = paste0("SNV", sprintf("%03d", mutation), "_A", genotype))
angela_alleles <- DNAStringSet(filter(alleles, tree == "Angela")$sequence)
names(angela_alleles) <- filter(alleles, tree == "Angela")$name
writeXStringSet(angela_alleles, "data/mutations/fruits/angela_alleles.fa")
sixto_alleles <- DNAStringSet(filter(alleles, tree == "Sixto")$sequence)
names(sixto_alleles) <- filter(alleles, tree == "Sixto")$name
writeXStringSet(sixto_alleles, "data/mutations/fruits/sixto_alleles.fa")
refs <- bind_rows(readxl::read_xlsx("data/mutations/fruits/treemutation_fruits.xlsx", "Sixto") %>% 
            mutate(tree = "Sixto"),
          readxl::read_xlsx("data/mutations/fruits/treemutation_fruits.xlsx", "Angela") %>% 
            mutate(tree = "Angela")) %>% 
  mutate(library = paste0(Code_Espece, Tube), tissue = Tissu) %>%
  dplyr::select(library, tree, tissue) %>% 
  bind_rows(readxl::read_xlsx("data/mutations/fruits/Angela.xlsx", "libraries") %>% 
              mutate(tissue = paste("branch", branch), library = idOld, tree = "Angela") %>% 
              dplyr::select(library, tree, tissue)) %>% 
  bind_rows(readxl::read_xlsx("data/mutations/fruits/Sixto.xlsx", "samples") %>% 
              mutate(tissue = paste("branch", Branch), library = id, tree = "Sixto") %>% 
              dplyr::select(library, tree, tissue))
# Shell, run from data/mutations/fruits/: align the allele sequences to each genome
bwa mem -t 2 ../angela/genome/Dgu_HS1_HYBRID_SCAFFOLD.fa angela_alleles.fa | samtools sort > angela_alleles_aligned.bam
samtools index angela_alleles_aligned.bam
bwa mem -t 2 ../sixto/genome/HS1_Sru_omap1_hap1_HYBRID_SCAFFOLD.fa sixto_alleles.fa | samtools sort > sixto_alleles_aligned.bam
samtools index sixto_alleles_aligned.bam
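
To illustrate how the R step above turns a locus label into an allele name, here is a minimal base-R sketch. The locus string is hypothetical (its exact format in the SSRseq spreadsheet is an assumption inferred from the patterns used in the code), and so is the genotype code 1.

```r
# Hypothetical locus label; the real format may differ
locus    <- "TreeMut_SNP-6_Super-Scaffold_12_34567"
genotype <- 1  # hypothetical allele code

# Drop the prefix, then split on the scaffold marker
locus2 <- sub("TreeMut_SNP-", "", locus, fixed = TRUE)
parts  <- strsplit(locus2, "_Super-Scaffold_", fixed = TRUE)[[1]]

# Left part: mutation index. Right part: scaffold number is the first
# alphanumeric piece (mirrors tidyr::separate's default separator).
mutation <- as.integer(parts[1])
chrom    <- as.integer(strsplit(parts[2], "[^[:alnum:]]+")[[1]][1])

# Zero-padded name, as in paste0("SNV", sprintf("%03d", mutation), "_A", genotype)
name <- paste0("SNV", sprintf("%03d", mutation), "_A", genotype)
print(list(mutation = mutation, chrom = chrom, name = name))
```

The zero padding keeps names such as SNV006 and SNV107 at the same width, so they sort consistently in the FASTA files and in the tables of this section.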

12.12.1.2 Automate IGV

conda create -n igvreports python=3.7.1
conda activate igvreports
pip install igv-reports
conda deactivate
conda activate igvreports
create_report data/mutations/fruits/angela_fruits_candidate_mutations.tsv \
data/mutations/angela/genome/Dgu_HS1_HYBRID_SCAFFOLD.fa \
--sequence 2 --begin 3 --end 3 \
--flanking 1000 \
--info-columns SNV CHROM POS REF ALT af replicate branch rank \
--tracks data/mutations/fruits/angela_alleles_aligned.bam data/mutations/angela/annotation/Dgu_HS1_HYBRID_SCAFFOLD.fa.out.gff \
--output data/mutations/fruits/angela_fruits_aligned.html
conda deactivate
conda activate igvreports
create_report data/mutations/fruits/sixto_fruits_candidate_mutations.tsv \
data/mutations/sixto/genome/HS1_Sru_omap1_hap1_HYBRID_SCAFFOLD.fa \
--sequence 2 --begin 3 --end 3 \
--flanking 1000 \
--info-columns SNV CHROM POS REF ALT af replicate branch rank \
--tracks data/mutations/fruits/sixto_alleles_aligned.bam data/mutations/sixto/annotation/HS1_Sru_omap1_hap1_HYBRID_SCAFFOLD.fa.out.gff \
--output data/mutations/fruits/sixto_fruits_aligned.html
conda deactivate

12.12.1.3 Results

##             Type Angela Sixto Total Percentage
## 1       mutation     16     5    21         15
## 2 reference only     81    20   101         72
## 3        suspect     11     6    17         12
## 4      unaligned     16     5    21         15
# Write one `title` panel per PDF page; `graph` is a ggplot object defined elsewhere
pdf("data/mutations/fruits/fruits_mutations.pdf")
for (i in seq_len(ggforce::n_pages(graph))) {
  print(graph + ggforce::facet_wrap_paginate(~title, scales = "free",
                                             ncol = 1, nrow = 1, page = i))
}
dev.off()

12.12.1.4 Tissues

SNV     FALSE  TRUE  agreement (%)
SNV006      0    52            100
SNV013      0    50            100
SNV031      0    52            100
SNV054      1    51             98
SNV057      8    44             85
SNV107      6    46             88
SNV128      0    19            100
SNV132      0    19            100
SNV151      0    20            100
SNV160      0    19            100
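
The agreement column appears to be the share of TRUE counts among all counts for each SNV, expressed as a rounded percentage. The sketch below recomputes it for a few rows of the table; reading FALSE and TRUE as disagreeing and agreeing calls is an assumption, and the column names are illustrative.

```r
# A few rows copied from the table above
agreement <- data.frame(
  SNV     = c("SNV006", "SNV054", "SNV057", "SNV107"),
  n_false = c(0, 1, 8, 6),
  n_true  = c(52, 51, 44, 46)
)

# Percentage of TRUE among all calls, rounded to the nearest integer
agreement$pct <- round(100 * agreement$n_true /
                         (agreement$n_true + agreement$n_false))
print(agreement)
```

For these four rows the recomputed percentages match the table (100, 98, 85 and 88).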
fruit  cotyledon     embryo sac
A1     NA            homozygous
A10    homozygous    homozygous
A11    homozygous    homozygous
A12    homozygous    heterozygous
A13    homozygous    homozygous
A14    homozygous    homozygous
A15    heterozygous  heterozygous
A16    homozygous    homozygous
A17    NA            homozygous
A18    homozygous    heterozygous
A19    homozygous    homozygous
A2     homozygous    homozygous
A20    homozygous    homozygous
A3     homozygous    homozygous
A4     homozygous    homozygous
A5     homozygous    heterozygous
A6     homozygous    heterozygous
A7     homozygous    homozygous
A8     homozygous    homozygous
A9     homozygous    homozygous
B1     homozygous    homozygous
B10    homozygous    homozygous
B11    homozygous    homozygous
B12    homozygous    homozygous
B13    homozygous    homozygous
B14    homozygous    homozygous
B15    homozygous    homozygous
B16    homozygous    heterozygous
B17    homozygous    homozygous
B18    homozygous    heterozygous
B19    heterozygous  heterozygous
B2     homozygous    homozygous
B20    homozygous    homozygous
B21    homozygous    homozygous
B22    homozygous    homozygous
B23    homozygous    NA
B24    homozygous    homozygous
B25    homozygous    homozygous
B3     homozygous    homozygous
B4     homozygous    homozygous
B5     homozygous    heterozygous
B6     homozygous    homozygous
B7     homozygous    homozygous
B8     homozygous    homozygous
B9     homozygous    homozygous
C1     NA            homozygous
C2     homozygous    homozygous
C3     homozygous    homozygous
D1     homozygous    homozygous
D2     homozygous    homozygous
D3     homozygous    heterozygous
D5     NA            homozygous
SNV     cotyledon  embryo sac  endocarp  pericarp
SNV006          7           7         1        NA
SNV013          2           3        NA        NA
SNV031         NA           1        NA        NA
SNV054         NA           1         3        NA
SNV057          2          10         6        NA
SNV107          1           6         1        NA
SNV128         NA           1        NA        NA
SNV132         NA           1        NA         2
SNV151         NA           1        NA         3
SNV160         NA           1        NA         3

12.12.1.5 Frequencies