Ecological data pose many challenges to statistical inference. Most data come from observational studies rather than designed experiments; observational units are frequently sampled repeatedly over time, resulting in multiple, non-independent measurements; response data are often binary (e.g., presence-absence data) or non-negative integers (e.g., counts), and therefore, the data do not fit the standard assumptions of linear regression (Normality, independence, and constant variance). This book will familiarize readers with modern statistical methods that address these complexities using both frequentist and Bayesian frameworks.
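The constant-variance problem mentioned above can be seen directly by simulation. A minimal sketch (in Python, with illustrative parameter values chosen here for demonstration): count responses generated with a log-link mean, as in Poisson regression, have variance equal to their mean, so the residual spread grows with the fitted value rather than staying constant.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate count responses whose mean depends on a predictor x
# through a log link, as in Poisson regression.
x = np.linspace(0, 3, 2000)
mu = np.exp(0.5 + 0.8 * x)   # mean increases with x
y = rng.poisson(mu)

# For a Poisson response, Var(y) = E(y): the spread of the counts
# grows with the mean, violating the constant-variance assumption
# of ordinary linear regression.
low = y[mu < 5]     # observations with small expected counts
high = y[mu > 15]   # observations with large expected counts
print("variance at low means: ", round(low.var(), 1))
print("variance at high means:", round(high.var(), 1))
```

Running this shows a much larger variance among observations with large means, which is the kind of assumption violation the generalized linear models in Part IV are designed to handle.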
Conditions of Use
This book is licensed under a Creative Commons Attribution (CC BY) license. You can download the ebook Statistics for Ecologists for free.
- Title: Statistics for Ecologists
- Subtitle: A Frequentist and Bayesian Treatment of Modern Regression Models
- Publisher: University of Minnesota Libraries Publishing
- Author(s): John Fieberg
- Published: 2024-01-31
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 571
- Language: English
- ISBN-13: 9781959870029
- License: CC BY
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Contents

About the Author
Preface
Formatting and conventions
Pre-requisites
Learning objectives
Skills objectives
Real versus simulated data
Standing on the shoulders of…
How to use this book and associated resources for teaching
Feedback
Acknowledgements

I Models for Normally Distributed Responses

1 Linear regression review
1.1 R Packages
1.2 Data example: Sustainable trophy hunting of male African lions
1.3 Interpreting slope and intercept
1.4 Linear regression assumptions
1.5 Evaluating assumptions
1.6 Statistical inference: Sampling distributions
1.7 Sampling distribution of the t-statistic
1.8 Confidence intervals
1.9 Confidence intervals versus prediction intervals
1.10 Hypothesis tests and p-values
1.11 R2: Amount of variance explained
1.12 Coded answers to exercises

2 Bootstrapping
2.1 R Packages
2.2 Motivating data example: RIKZ [Dutch governmental institute] data
2.3 Consequences of assumption violations
2.4 Introduction to the Bootstrap
2.5 Replicating the sampling design
2.6 Utility of the bootstrap

3 Multiple regression
3.1 R Packages
3.2 Introduction to multiple regression
3.3 Matrix notation for regression
3.4 Parameter estimation, sums-of-squares, and R2
3.5 Parameter interpretation: Multiple regression with RIKZ data
3.6 Categorical predictors
3.7 Categorical variables with >2 levels or categories
3.8 Models with interactions
3.9 Pairwise comparisons
3.10 Multiple degree-of-freedom hypothesis tests
3.11 Regression F-statistic
3.12 Contrasts: Estimation of linear combinations of parameters
3.13 Aside: Revisiting F-tests and comparing them to Wald χ2 tests
3.14 Visualizing multiple regression models

4 Modeling Non-linear relationships
4.1 R Packages
4.2 Modeling non-linear relationships
4.3 Polynomials
4.4 Basis functions/vectors
4.5 Splines
4.6 Splines versus polynomials
4.7 Visualizing non-linear relationships
4.8 Generalized additive models (GAMs)
4.9 Generalizations to multiple non-linear relationships
4.10 Non-linear models with a mechanistic basis
4.11 Aside: Orthogonal polynomials
4.12 Aside: Segmented and piecewise regression

5 Generalized least squares (GLS)
5.1 R Packages
5.2 Why cover models fit using generalized least squares?
5.3 Generalized least squares (GLS): Non-constant variance
5.4 Variance heterogeneity among groups: T-test with unequal variances
5.5 Variance increasing with Xi or μi
5.6 Approximate confidence and prediction intervals
5.7 Modeling strategy

II What Variables to Include in a Model?

6 Multicollinearity
6.1 R Packages
6.2 Multicollinearity
6.3 Motivating example: What factors predict how long mammals sleep?
6.4 Variance inflation factors (VIF)
6.5 Understanding confounding using DAGs: A simulation example
6.6 Strategies for addressing multicollinearity
6.7 Applied example: Modeling the effect of correlated environmental factors on the distribution of subtidal kelp
6.8 Residual and sequential regression
6.9 Principal components regression
6.10 Other methods

7 Causal Inference
7.1 R Packages
7.2 Introduction to causal inference
7.3 Directed acyclic graphs and conditional independencies
7.4 d-separation
7.5 Estimating causal effects (direct, indirect, and total effects)
7.6 Some (summary) comments

8 Modeling Strategies
8.1 R Packages
8.2 Goals of multivariable regression modeling
8.3 My experience
8.4 Stepwise selection algorithms
8.5 Degrees of freedom (df) spending: One model to rule them all
8.6 AIC and model-averaging
8.7 Regularization using penalization
8.8 Evaluating model performance
8.9 Summary

III Frequentist and Bayesian Inferential Frameworks

9 Introduction to probability distributions
9.1 Statistical distributions and regression
9.2 Probability rules
9.3 Probability trees
9.4 Sample space, random variables, probability distributions
9.5 Cumulative distribution functions
9.6 Expected value and variance of a random variable
9.7 Expected value and variance of sums and products
9.8 Joint, marginal, and conditional distributions
9.9 Probability distributions in R
9.10 A sampling of discrete random variables
9.11 A sampling of continuous probability distributions
9.12 Some probability distributions used for hypothesis testing
9.13 Choosing an appropriate distribution
9.14 Summary of statistical distributions

10 Maximum likelihood
10.1 R packages
10.2 Parameter estimation
10.3 Introductory example: Estimating slug densities
10.4 Probability to the likelihood
10.5 Maximizing the likelihood
10.6 Properties of maximum likelihood estimators
10.7 A more complicated example: Fitting a weight-at-age model
10.8 Confidence intervals for functions of parameters
10.9 Likelihood ratio test
10.10 Profile likelihood confidence intervals
10.11 Aside: Least squares and maximum likelihood

11 Introduction to Bayesian statistics
11.1 R packages
11.2 Review of frequentist statistics
11.3 Bayesian statistics
11.4 Comparing frequentist and Bayesian inference for a simple model

12 A Brief introduction to MCMC sampling and JAGS
12.1 R packages
12.2 Introduction to MCMC sampling
12.3 Metropolis algorithm
12.4 Aside: Sampler performance
12.5 Specifying a model in JAGS
12.6 Fitting a model using JAGS
12.7 Density and traceplots for assessing convergence
12.8 Tips on running models in JAGS

13 Bayesian linear regression
13.1 R packages
13.2 Bayesian linear regression
13.3 Testing assumptions of the model
13.4 Goodness-of-fit testing: Bayesian p-values
13.5 Credible intervals
13.6 Prediction intervals
13.7 Credible and prediction intervals the easy way

IV Models for Non-Normal Data

14 Introduction to generalized linear models (GLMs)
14.1 R packages
14.2 The Normal distribution as a data-generating model
14.3 Generalized linear models
14.4 Describing the data-generating mechanism
14.5 Example: Poisson regression for pheasant counts

15 Regression models for count data
15.1 R packages
15.2 Introductory examples: Slugs and Fish
15.3 Parameter interpretation
15.4 Evaluating assumptions and fit
15.5 Quasi-Poisson model for overdispersed data
15.6 Negative Binomial regression
15.7 Model comparisons
15.8 Bayesian implementation of count-based regression models
15.9 Modeling rates and densities using an offset

16 Logistic regression
16.1 R packages
16.2 Introduction to logistic regression
16.3 Parameter interpretation: Application to moose detection data
16.4 Evaluating assumptions and fit
16.5 Model comparisons
16.6 Effect plots: Visualizing generalized linear models
16.7 Logistic regression: Bayesian implementations
16.8 Aside: Logistic regression with multiple trials
16.9 Aside: Complete separation

17 Models for zero-inflated data
17.1 R packages
17.2 Zero-inflation
17.3 Testing for excess zeros
17.4 Zeros and the Negative Binomial model
17.5 Hurdle models
17.6 Zero-inflated mixture models
17.7 Example: Fishing success in state parks
17.8 Goodness-of-fit
17.9 Bayesian implementation
17.10 Implementation using glmmTMB

V Models for Correlated Data

18 Linear Mixed Effects Models
18.1 R packages
18.2 Revisiting the independence assumption
18.3 Optional: Mixed-effect models versus simple alternatives
18.4 What are random effects, mixed-effect models, and when should they be considered?
18.5 Two-step approach to building a mixed-effects model
18.6 Random-intercept versus random intercept and slope model
18.7 Fitting mixed-effects models in R
18.8 Site-specific parameters (BLUPs)
18.9 Fixed effects versus random effects and shrinkage
18.10 Fitted/predicted values from mixed-effects models
18.11 Model assumptions
18.12 Model comparisons, hypothesis tests, and confidence intervals for fixed-effects and variance parameters
18.13 Modeling strategies revisited
18.14 Induced correlation, marginal model, generalized least squares
18.15 Random effects specified using multiple grouping variables
18.16 Implementing mixed-effects models in JAGS

19 Generalized linear mixed effects models (GLMMs)
19.1 R packages
19.2 Case study: A comparison of single- and double-cylinder nest structures
19.3 Review of approaches for modeling correlated data for Normally distributed response variables
19.4 Extensions to count and binary data
19.5 Generalized linear mixed effects models
19.6 Modeling nest occupancy using generalized linear mixed effects models
19.7 Parameter estimation
19.8 Parameter interpretation
19.9 Hypothesis testing and modeling strategies

20 Generalized Estimating Equations (GEE)
20.1 R packages
20.2 Introduction to Generalized Estimating Equations
20.3 Motivating GEEs
20.4 GEE applied to our previous simulation examples
20.5 GEEs versus GLMMs

Appendix A Projects and Reproducible Reports in R
A.1 Introduction to R and RStudio
A.2 Reproducible projects with RStudio and R markdown
A.3 RStudio projects
A.4 R markdown
A.5 More on R markdown

References