This book discusses what is currently known about software engineering, based on an analysis of all the publicly available data. This aim is not as ambitious as it sounds, because there is not a great deal of data publicly available.
The intent is to provide material that is useful to professional developers working in industry; until recently, researchers in software engineering have been more interested in vanity work, promoted by ego and bluster.
The material is organized in two parts, the first covering software engineering and the second the statistics likely to be needed for the analysis of software engineering data.
Conditions of Use
This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook Evidence-based Software Engineering for free.
- Title: Evidence-based Software Engineering
- Subtitle: based on the publicly available data
- Publisher: Knowledge Software
- Author(s): Derek M. Jones
- Published: 2022-02-03
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 455
- Language: English
- ASIN: B09RQ98SCL
- ISBN-13: 9781838291303
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Table of Contents
- Introduction: What has been learned?; Replication; Software markets; The primary activities of software engineering; History of software engineering research; Folklore; Research ecosystems; Overview of contents; Why use R?; Terminology, concepts and notation; Further reading
- Human cognition: Introduction; Modeling human cognition; Embodied cognition; Perfection is not cost-effective; Motivation; Built-in behaviors; Cognitive effort; Attention; Visual processing; Reading; Memory systems; Short term memory; Episodic memory; Recognition and recall; Serial order information; Forgetting; Learning and experience; Belief; Expertise; Category knowledge; Categorization consistency; Reasoning; Deductive reasoning; Linear reasoning; Causal reasoning; Number processing; Numeric preferences; Symbolic distance and problem size effect; Estimating event likelihood; High-level functionality; Personality & intelligence; Risk taking; Decision-making; Expected utility and Prospect theory; Overconfidence; Time discounting; Developer performance; Miscellaneous
- Cognitive capitalism: Introduction; Investment decisions; Discounting for time; Taking risk into account; Incremental investments and returns; Investment under uncertainty; Real options; Capturing cognitive output; Intellectual property; Bumbling through life; Expertise; Group dynamics; Maximizing generated surplus; Motivating members; Social status; Social learning; Group learning and forgetting; Information asymmetry; Moral hazard; Group survival; Group problem solving; Cooperative competition; Software reuse; Company economics; Cost accounting; The shape of money; Valuing software; Maximizing ROI; Value creation; Product/service pricing; Predicting sales volume; Managing customers as investments; Commons-based peer-production
- Ecosystems: Introduction; Funding; Hardware; Evolution; Diversity; Lifespan; Entering a market; Population dynamics; Growth processes; Estimating population size; Closed populations; Open populations; Organizations; Customers; Culture; Software vendors; Career paths; Applications and Platforms; Platforms; Pounding the treadmill; Users' computers; Software development; Programming languages; Libraries and packages; Tools; Information sources
- Projects: Introduction; Project culture; Project lifespan; Pitching for projects; Contracts; Resource estimation; Estimation models; Time; Size; Paths to delivery; Development methodologies; The Waterfall/iterative approach; The Agile approach; Managing progress; Discovering functionality needed for acceptance; Implementation; Supporting multiple markets; Refactoring; Documentation; Acceptance; Deployment; Development teams; New staff; Ongoing staffing; Post-delivery updates; Database evolution
- Reliability: Introduction; It's not a fault, it's a feature; Why do fault experiences occur?; Fault report data; Cultural outlook; Maximizing ROI; Experiencing a fault; Input profile; Propagation of mistakes; Remaining faults: closed populations; Remaining faults: open populations; Where is the mistake?; Requirements; Source code; Libraries and tools; Documentation; Non-software causes of unreliability; System availability; Checking for intended behavior; Code review; Testing; Creating tests; Beta testing; Estimating test effectiveness; Cost of testing
- Source code: Introduction; Quantity of source; Experiments; Exponential or Power law; Folklore metrics; Desirable characteristics; The need to know; Narrative structures; Explaining code; Memory for material read; Integrating information; Visual organization; Consistency; Identifier names; Programming languages; Build bureaucracy; Patterns of use; Language characteristics; Runtime characteristics; Statements; Control flow; Loops; Expressions; Literal values; Use of variables; Calls; Declarations; Unused identifiers; Ordering of definitions within aggregate types; Evolution of source code; Function/method modification
- Stories told by data: Introduction; Finding patterns in data; Initial data exploration; Guiding the eye through data; Smoothing data; Densely populated measurement points; Visualizing a single column of values; Relationships between items; 3-dimensions; Communicating a story; What kind of story?; Technicalities should go unnoticed; People have color vision; Color palette selection; Plot axis: what and how; Communicating numeric values; Communicating fitted models
- Probability: Introduction; Useful rules of thumb; Measurement scales; Probability distributions; Are two sample drawn from the same distribution?; Fitting a probability distribution to a sample; Zero-truncated and zero-inflated distributions; Mixtures of distributions; Heavy/Fat tails; Markov chains; A Markov chain example; Social network analysis; Combinatorics; A combinatorial example; Generating functions
- Statistics: Introduction; Statistical inference; Samples and populations; Effect-size; Sampling error; Statistical power; Describing a sample; A central location; Sensitivity of central location algorithms; Geometric mean; Harmonic mean; Contaminated distributions; Compositional data; Meta-Analysis; Statistical error; Hypothesis testing; p-value; Confidence intervals; The bootstrap; Permutation tests; Comparing samples; Building regression models; Comparing sample means; Comparing standard deviation; Correlation; Contingency tables; ANOVA
- Regression modeling: Introduction; Linear regression; Scattered measurement values; Discrete measurement values; Uncertainty only exists in the response variable; Modeling data that curves; Visualizing the general trend; Influential observations and Outliers; Diagnosing problems in a regression model; A model's goodness of fit; Abrupt changes in a sequence of values; Low signal-to-noise ratio; Moving beyond the default Normal error; Count data; Continuous response variable having a lower bound; Transforming the response variable; Binary response variable; Multinomial data; Rates and proportions response variables; Multiple explanatory variables; Interaction between variables; Correlated explanatory variables; Penalized regression; Non-linear regression; Power laws; Mixed-effects models; Generalised Additive Models; Miscellaneous; Advantages of using lm; Very large datasets; Alternative residual metrics; Quantile regression; Extreme value statistics; Time series; Cleaning time series data; Modeling time series; Building an ARMA model; Non-constant variance; Smoothing and filtering; Spectral analysis; Relationships between time series; Miscellaneous; Survival analysis; Kinds of censoring; Input data format; Survival curve; Regression modeling; Cox proportional-hazards model; Time varying explanatory variables; Competing risks; Multi-state models; Circular statistics; Circular distributions; Fitting a regression model; Linear response with a circular explanatory variable; Compositional data
- Miscellaneous techniques: Introduction; Machine learning; Decision trees; Clustering; Sequence mining; Ordering of items; Seriation; Preferred item ordering; Agreement between raters; Simulation
- Experiments: Introduction; Measurement uncertainty; Design of experiments; Subjects; The task; What is actually being measured?; Adapting an ongoing experiment; Selecting experimental options; Factorial designs; Benchmarking; Following the herd; Variability in today's computing systems; Hardware variation; Software variation; The cloud; End user systems; Surveys
- Data preparation: Introduction; Documenting cleaning operations; Outliers; Malformed file contents; Missing data; Handling missing values; NA handling by library functions; Restructuring data; Reorganizing rows/columns; Miscellaneous issues; Application specific cleaning; Different name, same meaning; Multiple sources of signals; Duplicate data; Default values; Resolution limit of measurements; Detecting fabricated data
- Overview of R: Your first R program; Language overview; Differences between R and widely used languages; Objects; Operations on vectors; Creating a vector/array/matrix; Indexing; Lists; Data frames; Symbolic forms; Factors and levels; Operators; Testing for equality; Assignment; The R type (mode) system; Converting the type (mode) of a value; Statements; Defining a function; Commonly used functions; Input/Output; Graphical output; Non-statistical uses of R; Very large datasets