Ecological data pose many challenges to statistical inference. Most data come from observational studies rather than designed experiments; observational units are frequently sampled repeatedly over time, resulting in multiple, non-independent measurements; response data are often binary (e.g., presence-absence data) or non-negative integers (e.g., counts), and therefore, the data do not fit the standard assumptions of linear regression (Normality, independence, and constant variance). This book will familiarize readers with modern statistical methods that address these complexities using both frequentist and Bayesian frameworks.
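The constant-variance problem mentioned above can be seen directly by simulation. A minimal sketch (in Python, with illustrative parameter values chosen here for demonstration): count responses generated with a log-link mean, as in Poisson regression, have variance equal to their mean, so the residual spread grows with the fitted value rather than staying constant.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate count responses whose mean depends on a predictor x
# through a log link, as in Poisson regression.
x = np.linspace(0, 3, 2000)
mu = np.exp(0.5 + 0.8 * x)   # mean increases with x
y = rng.poisson(mu)

# For a Poisson response, Var(y) = E(y): the spread of the counts
# grows with the mean, violating the constant-variance assumption
# of ordinary linear regression.
low = y[mu < 5]     # observations with small expected counts
high = y[mu > 15]   # observations with large expected counts
print("variance at low means: ", round(low.var(), 1))
print("variance at high means:", round(high.var(), 1))
```

Running this shows a much larger variance among observations with large means, which is the kind of assumption violation the generalized linear models in Part IV are designed to handle.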
Conditions of Use
This book is licensed under a Creative Commons Attribution (CC BY) license. You can download the ebook Statistics for Ecologists for free.
- Title: Statistics for Ecologists
- Subtitle: A Frequentist and Bayesian Treatment of Modern Regression Models
- Publisher: University of Minnesota Libraries Publishing
- Author(s): John Fieberg
- Published: 2024-01-31
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 571
- Language: English
- ISBN-13: 9781959870029
- License: CC BY
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Contents

About the Author
Preface
Formatting and conventions
Pre-requisites
Learning objectives
Skills objectives
Real versus simulated data
Standing on the shoulders of…
How to use this book and associated resources for teaching
Feedback
Acknowledgements

I Models for Normally Distributed Responses

1 Linear regression review
1.1 R Packages
1.2 Data example: Sustainable trophy hunting of male African lions
1.3 Interpreting slope and intercept
1.4 Linear regression assumptions
1.5 Evaluating assumptions
1.6 Statistical inference: Sampling distributions
1.7 Sampling distribution of the t-statistic
1.8 Confidence intervals
1.9 Confidence intervals versus prediction intervals
1.10 Hypothesis tests and p-values
1.11 R2: Amount of variance explained
1.12 Coded answers to exercises

2 Bootstrapping
2.1 R Packages
2.2 Motivating data example: RIKZ [Dutch governmental institute] data
2.3 Consequences of assumption violations
2.4 Introduction to the Bootstrap
2.5 Replicating the sampling design
2.6 Utility of the bootstrap

3 Multiple regression
3.1 R Packages
3.2 Introduction to multiple regression
3.3 Matrix notation for regression
3.4 Parameter estimation, sums-of-squares, and R2
3.5 Parameter interpretation: Multiple regression with RIKZ data
3.6 Categorical predictors
3.7 Categorical variables with >2 levels or categories
3.8 Models with interactions
3.9 Pairwise comparisons
3.10 Multiple degree-of-freedom hypothesis tests
3.11 Regression F-statistic
3.12 Contrasts: Estimation of linear combinations of parameters
3.13 Aside: Revisiting F-tests and comparing them to Wald χ2 tests
3.14 Visualizing multiple regression models

4 Modeling Non-linear relationships
4.1 R Packages
4.2 Modeling non-linear relationships
4.3 Polynomials
4.4 Basis functions/vectors
4.5 Splines
4.6 Splines versus polynomials
4.7 Visualizing non-linear relationships
4.8 Generalized additive models (GAMs)
4.9 Generalizations to multiple non-linear relationships
4.10 Non-linear models with a mechanistic basis
4.11 Aside: Orthogonal polynomials
4.12 Aside: Segmented and piecewise regression

5 Generalized least squares (GLS)
5.1 R Packages
5.2 Why cover models fit using generalized least squares?
5.3 Generalized least squares (GLS): Non-constant variance
5.4 Variance heterogeneity among groups: T-test with unequal variances
5.5 Variance increasing with Xi or μi
5.6 Approximate confidence and prediction intervals
5.7 Modeling strategy

II What Variables to Include in a Model?

6 Multicollinearity
6.1 R Packages
6.2 Multicollinearity
6.3 Motivating example: What factors predict how long mammals sleep?
6.4 Variance inflation factors (VIF)
6.5 Understanding confounding using DAGs: A simulation example
6.6 Strategies for addressing multicollinearity
6.7 Applied example: Modeling the effect of correlated environmental factors on the distribution of subtidal kelp
6.8 Residual and sequential regression
6.9 Principal components regression
6.10 Other methods

7 Causal Inference
7.1 R Packages
7.2 Introduction to causal inference
7.3 Directed acyclic graphs and conditional independencies
7.4 d-separation
7.5 Estimating causal effects (direct, indirect, and total effects)
7.6 Some (summary) comments

8 Modeling Strategies
8.1 R Packages
8.2 Goals of multivariable regression modeling
8.3 My experience
8.4 Stepwise selection algorithms
8.5 Degrees of freedom (df) spending: One model to rule them all
8.6 AIC and model-averaging
8.7 Regularization using penalization
8.8 Evaluating model performance
8.9 Summary

III Frequentist and Bayesian Inferential Frameworks

9 Introduction to probability distributions
9.1 Statistical distributions and regression
9.2 Probability rules
9.3 Probability trees
9.4 Sample space, random variables, probability distributions
9.5 Cumulative distribution functions
9.6 Expected value and variance of a random variable
9.7 Expected value and variance of sums and products
9.8 Joint, marginal, and conditional distributions
9.9 Probability distributions in R
9.10 A sampling of discrete random variables
9.11 A sampling of continuous probability distributions
9.12 Some probability distributions used for hypothesis testing
9.13 Choosing an appropriate distribution
9.14 Summary of statistical distributions

10 Maximum likelihood
10.1 R packages
10.2 Parameter estimation
10.3 Introductory example: Estimating slug densities
10.4 Probability to the likelihood
10.5 Maximizing the likelihood
10.6 Properties of maximum likelihood estimators
10.7 A more complicated example: Fitting a weight-at-age model
10.8 Confidence intervals for functions of parameters
10.9 Likelihood ratio test
10.10 Profile likelihood confidence intervals
10.11 Aside: Least squares and maximum likelihood

11 Introduction to Bayesian statistics
11.1 R packages
11.2 Review of frequentist statistics
11.3 Bayesian statistics
11.4 Comparing frequentist and Bayesian inference for a simple model

12 A Brief introduction to MCMC sampling and JAGS
12.1 R packages
12.2 Introduction to MCMC sampling
12.3 Metropolis algorithm
12.4 Aside: Sampler performance
12.5 Specifying a model in JAGS
12.6 Fitting a model using JAGS
12.7 Density and traceplots for assessing convergence
12.8 Tips on running models in JAGS

13 Bayesian linear regression
13.1 R packages
13.2 Bayesian linear regression
13.3 Testing assumptions of the model
13.4 Goodness-of-fit testing: Bayesian p-values
13.5 Credible intervals
13.6 Prediction intervals
13.7 Credible and prediction intervals the easy way

IV Models for Non-Normal Data

14 Introduction to generalized linear models (GLMs)
14.1 R packages
14.2 The Normal distribution as a data-generating model
14.3 Generalized linear models
14.4 Describing the data-generating mechanism
14.5 Example: Poisson regression for pheasant counts

15 Regression models for count data
15.1 R packages
15.2 Introductory examples: Slugs and Fish
15.3 Parameter interpretation
15.4 Evaluating assumptions and fit
15.5 Quasi-Poisson model for overdispersed data
15.6 Negative Binomial regression
15.7 Model comparisons
15.8 Bayesian implementation of count-based regression models
15.9 Modeling rates and densities using an offset

16 Logistic regression
16.1 R packages
16.2 Introduction to logistic regression
16.3 Parameter interpretation: Application to moose detection data
16.4 Evaluating assumptions and fit
16.5 Model comparisons
16.6 Effect plots: Visualizing generalized linear models
16.7 Logistic regression: Bayesian implementations
16.8 Aside: Logistic regression with multiple trials
16.9 Aside: Complete separation

17 Models for zero-inflated data
17.1 R packages
17.2 Zero-inflation
17.3 Testing for excess zeros
17.4 Zeros and the Negative Binomial model
17.5 Hurdle models
17.6 Zero-inflated mixture models
17.7 Example: Fishing success in state parks
17.8 Goodness-of-fit
17.9 Bayesian implementation
17.10 Implementation using glmmTMB

V Models for Correlated Data

18 Linear Mixed Effects Models
18.1 R packages
18.2 Revisiting the independence assumption
18.3 Optional: Mixed-effect models versus simple alternatives
18.4 What are random effects, mixed-effect models, and when should they be considered?
18.5 Two-step approach to building a mixed-effects model
18.6 Random-intercept versus random intercept and slope model
18.7 Fitting mixed-effects models in R
18.8 Site-specific parameters (BLUPs)
18.9 Fixed effects versus random effects and shrinkage
18.10 Fitted/predicted values from mixed-effects models
18.11 Model assumptions
18.12 Model comparisons, hypothesis tests, and confidence intervals for fixed-effects and variance parameters
18.13 Modeling strategies revisited
18.14 Induced correlation, marginal model, generalized least squares
18.15 Random effects specified using multiple grouping variables
18.16 Implementing mixed-effects models in JAGS

19 Generalized linear mixed effects models (GLMMs)
19.1 R packages
19.2 Case study: A comparison of single- and double-cylinder nest structures
19.3 Review of approaches for modeling correlated data for Normally distributed response variables
19.4 Extensions to count and binary data
19.5 Generalized linear mixed effects models
19.6 Modeling nest occupancy using generalized linear mixed effects models
19.7 Parameter estimation
19.8 Parameter interpretation
19.9 Hypothesis testing and modeling strategies

20 Generalized Estimating Equations (GEE)
20.1 R packages
20.2 Introduction to Generalized Estimating Equations
20.3 Motivating GEEs
20.4 GEE applied to our previous simulation examples
20.5 GEEs versus GLMMs

Appendix A Projects and Reproducible Reports in R
A.1 Introduction to R and RStudio
A.2 Reproducible projects with RStudio and R markdown
A.3 RStudio projects
A.4 R markdown
A.5 More on R markdown

References