Species

Allometric parameters and species functional traits for Paracou and Tapajos were used both as input to the TROLL assessment and as output to assess the functional composition of the forest. Many allometric parameters and functional traits did not exist and were inferred from dbh and height data or by using the relationship between traits with a predictive mean match.

Allometric parameters

We took advantage of the TALLO database (Jucker et al. 2022) to infer species allometric parameters of the 496 species present in Amazonia (latitude between 10 and -18 and longitude between -78 and -39).

Code
ggplot(map_data("world"), aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill="lightgray", colour = "white") +
  geom_point(data = read_csv("data/species/tallo.csv") %>% 
               mutate(amazonia = ifelse(longitude <= -39 & 
                                          longitude >= -79 &
                                          latitude >= -18 &
                                          latitude <= 10, "amazonia", "other")),
  aes(x = longitude, y = latitude, col = amazonia)) +
  theme_bw() +
  geom_hline(yintercept = c(10, -18)) +
  geom_vline(xintercept = c(-39, -79)) +
  theme(axis.title = element_blank(), legend.position = "bottom") +
  scale_color_manual("", values = c("darkgreen", "black")) +
  coord_equal()

Available data in Amazonia.
Code
read_csv("data/species/tallo.csv") %>% 
  filter(longitude <= -39, longitude >= -79, latitude >= -18, latitude <= 10) %>% 
  ggplot(aes(stem_diameter_cm, height_m)) +
  geom_point(aes(col = species), alpha = 0.25) +
  theme_bw() +
  geom_smooth() +
  scale_color_discrete(guide = "none") +
  xlab("diameter (cm)") + ylab("height (m)")

Height (m) and diameter (cm) relationship for observed amazonian trees.

We inferred the following allometry using brms:

\[ \begin{array}{c} log(height) \sim N [ log(h_{max,s} \times \frac{dbh}{a_{h,s}+dbh^2}), \sigma^2] \\ h_{max,s} \sim N (h_{max}, \sigma_h^2) \\ a_{h,s} \sim N (a_{h}, \sigma_a^2) \end{array} \]

Code
library(brms)
mdata <- read_csv("data/species/tallo.csv") %>% 
  filter(longitude <= -39, longitude >= -79, latitude >= -18, latitude <= 10) %>% 
  select(species, height_m, stem_diameter_cm) %>% 
  rename(h = height_m, d = stem_diameter_cm) %>% 
  mutate(d = d/100) %>% 
  mutate(species = ifelse(is.na(species), "unknown", species))
fit <- brm(bf(log(h) ~ log(hmax * (d/ (ah + d))), 
              hmax ~ 1 + 1|species,
              ah ~ 1 + 1|species,
              nl = TRUE), 
           prior = c(
             prior(normal(40, 10), lb = 10, nlpar = "hmax"),
             prior(normal(0.5, 0.5), lb = 0, nlpar = "ah")
             ),
           data = mdata, chains = 2, cores = 2, threads = 10)
save(fit, file = "outputs/brms_tallo.Rdata")
fit %>% 
  tidybayes::spread_draws(b_hmax_Intercept, r_species__hmax[species,],
               b_ah_Intercept, r_species__ah[species,]) %>% 
  mutate(ah = b_ah_Intercept + r_species__ah) %>% 
  mutate(hmax = b_hmax_Intercept + r_species__hmax) %>% 
  group_by(species) %>% 
  summarise(ah = median(ah), hmax = median(hmax)) %>% 
  filter(species != "unknown") %>% 
  mutate(species = gsub(".", " ", species, fixed = TRUE)) %>% 
  left_join(read_csv("data/species/tallo.csv") %>% 
  filter(longitude <= -39, longitude >= -79, latitude >= -18, latitude <= 10) %>% 
    select(family, genus, species) %>% unique()) %>% 
  write_tsv("outputs/tallo_pars.tsv")

We had good convergence and good post-predictive checks.

Code
load('outputs/brms_tallo.Rdata')
bayesplot::mcmc_rhat(bayesplot::rhat(fit$fit))

Goodness of convergence assessed by Rhat. All Rhat are below 1.05 indicating a very good convergence of all chains.
Code
load('outputs/brms_tallo.Rdata')
bayesplot::pp_check(fit)

Posterior post-predictive check of simulated response variable against observed values indicating a good representation of the whole distribution in posteriors.

We thus inferred 496 additional \(a_h\), \(h_{max}\) couple of parameters.

Code
load('outputs/brms_tallo.Rdata')
sjPlot::tab_model(fit)
  log(h)
Predictors Estimates CI (95%)
hmax_Intercept 40.52 38.69 – 42.38
ah_Intercept 0.29 0.27 – 0.31
Observations 24609
R2 Bayes 0.742
Code
fit %>% 
  tidybayes::spread_draws(b_hmax_Intercept, r_species__hmax[species,],
               b_ah_Intercept, r_species__ah[species,]) %>% 
  mutate(ah = b_ah_Intercept + r_species__ah) %>% 
  mutate(hmax = b_hmax_Intercept + r_species__hmax) %>% 
  group_by(species) %>% 
  summarise(ah = median(ah), hmax = median(hmax)) %>%
  filter(species != "unknown") %>% 
  ggplot(aes(ah, hmax)) +
  geom_point() +
  theme_bw() +
  xlab(expression(a[h])) + ylab(expression(h[max]))

Inferred species allometric parameters hmax and ah and their relations.

Functional traits

We further combined the obtained coefficient from TALLO with the following datasets:

  • Jucker et al. (2022)
  • Maréchaux et al. (2015)
  • Guillemot et al. (2022)
  • Vleminckx et al. (2021)
  • TROLL version 4 species data
  • Tapajos functional traits data retrieved by Jérôme Chave (pers. com.) from the TRY database (Kattge, Bönisch, and al. 2020)

TROLL version 4 species data were gathered from:

  • TROLL version 3 species (partly from Baraloto et al. (2010), described in Maréchaux and Chave (2017), and available in @schmitt2023 and )
  • Maréchaux et al. (2015)
  • Maréchaux et al. (2019)
  • Schmitt and Boisseaux (2023)
  • Boisseaux et al. submitted
  • Ziegler et al. (2019)

All databases together included 20 traits across 2,921 species but with numerous missing data. Traits distribution were similar across datasets. However, for the same species, allometric parameters inferred from TALLO were lower than those inside TROLL V4 and leaf area from Guillemot was higher than those from TROLL and Vleminckx. As all Paracou TROLL species are in Vleminckx dataset and 109 species from Tapajos we will use only the species from Vleminckx to limit missing data in predictive mean matching (PMM) data imputation.

Code
# Maréchaux et al., (2015)
marechaux <- readxl::read_xls("data/species/marechaux.xls", "data") %>% 
  mutate(Pi_tlp = as.numeric(Pi_tlp)) %>% 
  separate(Binomial, c("genus", "species"), remove = FALSE) %>% 
  mutate(species = gsub("_", " ", Binomial)) %>% 
  rename(family = Family, TLP = Pi_tlp) %>% 
  dplyr::select(family, genus, species, TLP) %>% 
  mutate(dataset = "Maréchaux 2015") %>% 
  gather(trait, value, -dataset, -family, -genus, -species)
# Jucker et al., (2022) inferred
tallo <- read_tsv("outputs/tallo_pars.tsv") %>% 
  mutate(dataset = "Tallo inferred") %>% 
  gather(trait, value, -dataset, -family, -genus, -species)
# Vleminckx et al., (2021)
vleminckx <- readxl::read_xlsx("data/species/vleminckx.xlsx",
                  "App.S6-ok", skip = 3) %>% 
  mutate(species = sub("_", " ", Species)) %>% 
  mutate(species = gsub("[[:punct:]]", "", species)) %>% 
  rename(family = Family, genus = Genus, CC = Chlorophyll_content,
         LT = Thickness, LA = Leaf_Area, d13C = `13C`,
         BT = Trunk_bark_thickness) %>% 
    mutate(N = N/100, P = P/10^6, LMA = 1/SLA*1000) %>% 
  mutate(LMA = ifelse(LMA > 200, NA, LMA)) %>% 
  dplyr::select(family, genus, species, CC, LT, 
                Toughness, LA, LMA, C, N, d13C, Ca, P, K, 
                BT, WSG, SRL, SRTA) %>% 
   mutate(dataset = "Vleminckx 2021") %>% 
  gather(trait, value, -dataset, -family, -genus, -species)
# Guillemot et al., (2022)
guillemot <- readxl::read_xlsx("data/species/guillemot.xlsx", "Dataset") %>% 
  mutate(max_height = as.numeric(gsub(',', '.', max_height))) %>% 
  rename(LA = leaf_size, N = Leaf_N, P = Leaf_P, 
         WSG = Wood_density, hmax = max_height) %>% 
  mutate(TLP = ifelse(TLP < -3, NA, TLP)) %>% 
  mutate(P50 = ifelse(P50 < -10, NA, P50)) %>% 
  mutate(LA = ifelse(LA > 1000, NA, LA)) %>% 
  mutate(LMA = ifelse(LMA > 200, NA, LMA)) %>% 
  select(family, genus, species, TLP, P50, LMA, LA, WSG) %>% 
  mutate(dataset = "Guillemot 2022") %>% 
  gather(trait, value, -dataset, -family, -genus, -species)
# TROLL V4
troll <- rcontroll::TROLLv4_species %>% 
  rename_all(~ gsub("s_", "", .)) %>% 
  rename(species = name, N =Nmass, P = Pmass,
         WSG = wsg, TLP = tlp, LA = leafarea) %>% 
   mutate(species = gsub("_", " ", species)) %>% 
  select(-seedmass, -regionalfreq) %>% 
  left_join(select(vleminckx, family, genus, species) %>% unique()) %>% 
  mutate(dataset = "TROLL V4") %>% 
  gather(trait, value, -dataset, -family, -genus, -species)
# Tapajos
tapajos <- vroom::vroom("data/species/tapajos_try.csv") %>% 
  separate(species, c("genus", "Species"), remove = FALSE) %>% 
  mutate(family = BIOMASS::getTaxonomy(genus, findOrder = FALSE)$family)  %>% 
  filter(!is.na(Species)) %>% 
  filter(!is.na(family)) %>% 
  mutate(LMA = 1/SLA*1000, N = `N_try(mg/g)`/1000, WSG = wood_density,
         P = `P_try(mg/g)`/1000, dbhmax = dbh_max/100) %>% 
  select(family, genus, species, LMA, N, P, WSG, dbhmax) %>% 
  mutate(dataset = "Tapajos TRY") %>% 
  gather(trait, value, -dataset, -family, -genus, -species)
# All
raw <- bind_rows(marechaux, 
                 tallo,
                 vleminckx, 
                 guillemot, 
                 troll,
                 tapajos) %>% 
  na.omit()  
filtered <- raw %>% 
  filter(species %in% vleminckx$species) %>% 
  group_by(family, genus, species, dataset, trait) %>% 
  summarise(value = median(value)) %>% 
  group_by(family, genus, species, trait) %>% 
  summarise(value = median(value)) %>% 
  pivot_wider(names_from = trait, values_from = value) %>% 
  ungroup()
Code
filtered %>% 
  select(-family, -genus) %>% 
  gather(trait, value, -species) %>% 
  na.omit() %>% 
  group_by(trait) %>% 
  summarise(N = n()) %>% 
  knitr::kable(caption = "Available functional traits data per sepcies.")
Available functional traits data per sepcies.
trait N
BT 2043
C 2043
CC 2043
Ca 2043
K 2043
LA 2057
LMA 2028
LT 2043
N 2044
P 2044
P50 43
SRL 2043
SRTA 2043
TLP 125
Toughness 2043
WSG 2053
ah 252
d13C 2043
dbhmax 192
hmax 252
Code
options(knitr.kable.NA = '')
raw %>% 
  group_by(species, dataset) %>% 
  summarise(value = 1) %>% 
  pivot_wider(names_from = dataset, values_from = value) %>% 
  mutate(`Vleminckx 2021` = ifelse(is.na(`Vleminckx 2021`), "Vleminckx: no", "Vleminckx: yes")) %>% 
  ungroup() %>% 
  select(-species) %>% 
  gather(dataset, value, -`Vleminckx 2021`) %>% 
  na.omit() %>% 
  group_by(`Vleminckx 2021`, dataset) %>% 
  summarise(N = n()) %>% 
  pivot_wider(names_from = `Vleminckx 2021`, values_from = N) %>% 
  knitr::kable(caption = "Available functional traits data per species depending on the dataset and its overlap with Vleminchkx's data.")
Available functional traits data per species depending on the dataset and its overlap with Vleminchkx’s data.
dataset Vleminckx: no Vleminckx: yes
Guillemot 2022 538 57
Maréchaux 2015 9 62
Tallo inferred 302 193
Tapajos TRY 85 109
TROLL V4 107
Code
raw %>% 
  mutate(trait = gsub("_", " ", trait)) %>% 
  ggplot(aes(dataset, value, col = dataset)) +
  geom_boxplot() +
  facet_wrap(~ trait, scales = "free") +
  theme_bw() +
  theme(axis.title = element_blank(), axis.text.x = element_blank(),
        legend.position = "bottom") 

Raw distributions of functional traits data per dataset origin.
Code
raw %>% 
  select(species, dataset, trait, value) %>% 
  group_by(species, dataset, trait) %>% 
  summarise(value = median(value)) %>% 
  group_by(species, trait) %>%
  filter(n() > 1) %>%
  mutate(median = median(value, na.rm = TRUE)) %>% 
  ggplot(aes(median, value, col = dataset)) +
  geom_point() +
  facet_wrap(~ trait, scales = "free") +
  theme_bw() +
  geom_smooth(method = "lm", se = FALSE)

Species traits means distributions per datasets in function of their median across datasets.
Code
filtered %>% 
  select(hmax, ah, dbhmax, LA, LMA, N, P, TLP, WSG) %>% 
  rename(hlim = hmax, dbhtres = dbhmax) %>% 
  cor(use = "pairwise.complete.obs") %>% 
  corrplot::corrplot(type = "upper", diag = FALSE)

Code
filtered %>% 
  select(-family, -genus, -species) %>% 
  cor(use = "pairwise.complete.obs") %>% 
  corrplot::corrplot(type = "upper", diag = FALSE)

Species traits pairwise correlations.

Imputation

Missing data were too important to use random forest for imputation with missForest even with phylogenetic eigen values. Thus, we used simple predictive mean matching (PMM) data imputation with mice without phylogenetic eigen values. However, phylogenetic eigen values could also been used with phylomice (not done here). As we focused on species in common with Vlemincks data we only had to impute ah, dbhmax, hmax and TLP with values from Maréchaux, Tapajos and inferred with TALLO. Obtained imputed distributions matched well the observed distribution of the raw data and imputed functional space match the one of the raw data (PCA). The PCA also revealed similar functional spaces between species from Paracou and Tapajos, whereas Tapajos functional space is a bit wider.

Code
imputation <- filtered %>% 
  mice::mice(maxit = 100)
complete(imputation) %>% 
  gather(trait, value, -family, -genus, -species) %>% 
  mutate(type = "imputed") %>% 
  bind_rows(gather(filtered, trait, value, -family, -genus, -species) %>% 
              mutate(type = "raw")) %>% 
  na.omit() %>% 
  write_tsv("outputs/imputation.tsv")
Code
t <- vroom::vroom("outputs/imputation.tsv") %>% 
  filter(trait %in% c("ah", "dbhmax", "hmax", "TLP")) %>% 
  filter(species %in% unique(troll$species, tapajos$species)) %>% 
  mutate(trait_long = recode(trait, "ah" = "a[h]", "dbhmax" = "dbh[thres]", "hmax" = "h[lim]", "TLP" = "pi[~~tlp]"))
labels <- t %>% 
  group_by(trait, type) %>% 
  summarise(N = n()) %>% 
  pivot_wider(names_from = type, values_from = N) %>% 
  mutate(label = paste0("Raw: ", raw, "\nImputed:", imputed)) %>% 
  ungroup() %>% 
  mutate(value = c(-1.8, 0.25, 0.35, 40),
         density = c(0.5, 10, 1, 0.05)) %>% 
  mutate(trait_long = recode(trait, "ah" = "a[h]", "dbhmax" = "dbh[thres]", "hmax" = "h[lim]", "TLP" = "pi[~~tlp]"))
g <- ggplot(t, aes(value)) +
  geom_density(aes(col = type)) +
  geom_text(data = labels, aes(label = label, y = density)) +
  theme_bw() +
  facet_wrap(~ trait_long, scales = "free", labeller = label_parsed) +
  scale_color_discrete("") +
  theme(legend.position = "bottom", axis.title = element_blank())
ggsave("figures/fa01.png", g, dpi = 300, width = 5, height = 5, bg = "white")
g

Traits density distribution before and after species trait values imputation by predictive means matching.
Code
data <- vroom::vroom("outputs/imputation.tsv") %>% 
  filter(trait %in% troll$trait) %>% 
  pivot_wider(names_from = trait, values_from = value) %>% 
  na.omit()
autoplot(princomp(select(data, -family, -genus, -species, -type), cor = TRUE), 
         data = data, alpha = 0.25, col = "type",
         loadings.label.size = 6,
         loadings.label.colour = 'red', loadings.label.vjust = 1.1,
         loadings.label.repel   = TRUE,
         loadings = T, loadings.label = T, loadings.colour = 'red') +
  coord_equal() +
  geom_hline(aes(yintercept = 0), col = 'black', linetype = "dotted") +
  geom_vline(aes(xintercept = 0), col = 'black', linetype = "dotted") +
  theme_bw()

Traits distribution in the first plan of a PCA before and after species trait values imputation by predictive means matching.
Code
data2 <- data %>% 
  filter(type == "imputed") %>% 
  mutate(tapajos = (species %in% tapajos$species)) %>% 
  mutate(paracou = (species %in% troll$species)) %>% 
  filter(tapajos | paracou) %>% 
  mutate(forest = ifelse(tapajos, "tapajos", "paracou")) %>% 
  mutate(forest = ifelse(paracou & tapajos, "both", forest)) %>% 
  select(-tapajos, -paracou, -type)
autoplot(princomp(select(data2, -family, -genus, -species, -forest), cor = TRUE), 
         data = data2, alpha = 0.5, col = "forest",
         loadings.label.size = 6,
         loadings.label.colour = 'red', loadings.label.vjust = 1.1,
         loadings.label.repel   = TRUE,
         loadings = T, loadings.label = T, loadings.colour = 'red') +
  coord_equal() +
  geom_hline(aes(yintercept = 0), col = 'black', linetype = "dotted") +
  geom_vline(aes(xintercept = 0), col = 'black', linetype = "dotted") +
  theme_bw() +
  stat_ellipse(aes(col = forest))

Traits distribution in the first plan of a PCA among sites after imputation by predictive means matching.
Code
data2 %>% 
  filter(forest != "paracou") %>% 
  select(-forest, -family, -genus) %>% 
  mutate(seedmass = 1, regionalfreq = 1/n()) %>% 
  mutate(species = gsub(" ", "_", species)) %>% 
  rename(s_name = species, s_LMA = LMA, s_Nmass = N, s_Pmass = P, s_wsg = WSG,
         s_dbhmax = dbhmax, s_hmax = hmax, s_ah = ah, 
         s_seedmass = seedmass, s_regionalfreq = regionalfreq, 
         s_tlp = TLP, s_leafarea = LA) %>% 
  write_tsv("outputs/Tapajos_species_troll.tsv")
data2 %>% 
  filter(forest != "tapajos") %>% 
  select(-forest, -family, -genus) %>% 
  mutate(seedmass = 1, regionalfreq = 1/n()) %>% 
  mutate(species = gsub(" ", "_", species)) %>% 
  rename(s_name = species, s_LMA = LMA, s_Nmass = N, s_Pmass = P, s_wsg = WSG,
         s_dbhmax = dbhmax, s_hmax = hmax, s_ah = ah, 
         s_seedmass = seedmass, s_regionalfreq = regionalfreq, 
         s_tlp = TLP, s_leafarea = LA) %>% 
  write_tsv("outputs/Paracou_species_troll.tsv")

Functional composition

Forest functional composition was simply assessed using functional trait distribution in each plots (repeted lines in Paracou data) at the species level in Paracou and genus level in Tapajos.

Code
traits <- bind_rows(read_tsv("outputs/Tapajos_species_troll.tsv"),
          read_tsv("outputs/Paracou_species_troll.tsv")) %>% 
  unique() %>% 
  rename_all(~ gsub("s_", "", .)) %>% 
  select(-seedmass, -regionalfreq) %>% 
  rename(species = name) %>% 
  mutate(species = gsub("_", " ", species)) %>% 
  gather(trait, trait_value, -species)
paracou <- read_tsv("outputs/inventories.tsv") %>% 
  filter(site == "Paracou", type == "default") %>% 
  mutate(plot = ifelse(site == "Tapajos", 1, plot)) %>% 
  filter(type == "default") %>% 
  select(site, plot, species) %>% 
  left_join(traits) %>% 
  na.omit()
tapajos <- read_tsv("outputs/inventories.tsv") %>% 
  filter(site == "Tapajos", type == "default") %>% 
  mutate(site = "Tapajos", plot = 1) %>% 
  separate(species, "species") %>% 
  select(plot, site, species) %>% 
      left_join(traits %>% 
  separate(species, "species") %>% 
  group_by(species, trait) %>% 
  summarise_all(mean)) %>% 
  na.omit()
bind_rows(paracou, tapajos) %>% 
  write_tsv("outputs/functional_composition.tsv")
Code
read_tsv("outputs/functional_composition.tsv") %>% 
  filter(site == "Paracou") %>% 
  ggplot(aes(trait_value, group = plot)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ trait, scales = "free") +
  theme_bw() +
  ggtitle("Paracou", "546,498 inds represented on 577,761 (94.59%)") +
  theme(axis.title = element_blank(), legend.position = "bottom") +
  scale_color_discrete("")

Functional composition at the Paracou site expressed in terms of density distribution per trait. The analyses have been done at the species level in Paracou.
Code
read_tsv("outputs/functional_composition.tsv") %>% 
  filter(site == "Tapajos") %>% 
  ggplot(aes(trait_value, group = plot)) +
  geom_density() +
  facet_wrap(~ trait, scales = "free") +
  theme_bw() +
  ggtitle("Tapajos Genus level", "19,188 inds represented on 19,499 (98.65%)") +
  theme(axis.title = element_blank(), legend.position = "bottom") +
  scale_color_discrete("")

Functional composition at the Tapajos site expressed in terms of density distribution per trait. The analyses have been done at the genus level in Tapajos.
Code
library(tidyverse)
library(rcontroll)
library(vroom)
climate <- vroom("results/data/Paracou_climate.tsv") %>% 
  arrange(time) %>% 
  filter(paste0(month(time), "-", day(time)) != "2-29") %>% 
  mutate(snet = ifelse(snet <= 1.1, 1.1, snet)) %>% 
  mutate(vpd = ifelse(vpd <= 0.011, 0.011, vpd)) %>% 
  mutate(ws = ifelse(ws <= 0.11, 0.11, ws))
clim <- generate_climate(climate, 
                         daytime_start = 7, 
                         daytime_end = 19)
day <- generate_dailyvar(climate, 
                         daytime_start = 7, 
                         daytime_end = 19)
spinup <-  load_output(name = "Paracou_1.8_-0.25_0.035",
                       path = "results/calib_str4_run/Paracou_1.8_-0.25_0.035/")
n <- as.numeric(nrow(clim))
parameters <- spinup@inputs$global %>% 
  mutate(value = ifelse(param == "nbiter", n, value))
sim <- troll(
  name = "old",
  path = "test_species",
  global = parameters,
  species = spinup@inputs$species, 
  climate = clim,
  daily = day,
  pedology = spinup@inputs$pedology, 
  forest = get_forest(spinup),
  soil = get_soil(spinup), 
  load = FALSE,
  verbose = TRUE,
  overwrite = TRUE
)

species_new <- read_tsv("outputs/Paracou_species_troll.tsv")
sim <- troll(
  name = "new",
  path = "test_species",
  global = parameters,
  species = species_new, 
  climate = clim,
  daily = day,
  pedology = spinup@inputs$pedology, 
  forest = get_forest(spinup),
  soil = get_soil(spinup), 
  load = FALSE,
  verbose = TRUE,
  overwrite = TRUE
)
Code
library(tidyverse)
stack <- rcontroll::load_stack("test_species", "test_species")
g <- rcontroll::autoplot(stack)
ggsave(plot = g, filename = "test_species.png", height = 30)