Harvey J. Motulsky, 2014 (3rd Edition)

Statistics is not intuitive.

People frequently see patterns in random data and often jump to unwarranted conclusions.
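A small simulation can make this concrete. The sketch below flips a fair coin 100 times and records the longest streak of identical outcomes; the function name `longest_run`, the streak threshold of 6, and the seed are illustrative choices, not taken from the book.

```python
import random

def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    if not flips:
        return 0
    best = current = 1
    for previous, following in zip(flips, flips[1:]):
        current = current + 1 if following == previous else 1
        best = max(best, current)
    return best

rng = random.Random(0)  # fixed seed so the run is repeatable
n_sequences, n_flips = 2000, 100
runs = [
    longest_run([rng.random() < 0.5 for _ in range(n_flips)])
    for _ in range(n_sequences)
]
# Share of purely random sequences that contain a streak of 6 or more.
share_with_streak = sum(run >= 6 for run in runs) / n_sequences
print(f"sequences with a streak of 6 or more: {share_with_streak:.0%}")
```

Long streaks turn up in most of the simulated sequences, even though nothing but chance generates them.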

Decisions about how to analyze data should be made in advance.

Beware of "HARKing": hypothesizing after results are known.

Statistical inference lets you make general conclusions from limited data. Statistical conclusions are always presented in terms of probability.

A confidence interval quantifies precision, and is easy to interpret.
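As a rough illustration, a confidence interval for a mean can be sketched with the standard library alone. The measurements are made up, and the normal approximation is an assumption of this sketch: for a sample this small, a t quantile would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = mean(sample)
    sem = stdev(sample) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    return center - z * sem, center + z * sem

sample = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.4, 5.0, 4.6]  # made-up data
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

A narrower interval means a more precise estimate; asking for a higher confidence level widens it.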

All statistical tests are based on assumptions.

If your data are not representative of a larger set of data you could have collected (but did not), then statistical inference makes no sense.

A p-value tests a null hypothesis, and is hard to understand at first. "Statistically significant" does not mean the effect is large or scientifically important. "Not significantly different" does not mean the effect is absent, small or scientifically irrelevant.

The concept of statistical significance is designed to help make a decision based on one result.

If a difference is not statistically significant, you can conclude that the observed results are not inconsistent with the null hypothesis. You cannot conclude that the null hypothesis is true.

Multiple comparisons (tests) make it hard to interpret statistical results.
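One way to see the problem: if every null hypothesis is true and the tests are independent (both are assumptions of this sketch), the chance of at least one false positive among $k$ tests at level $\alpha$ is $1 - (1 - \alpha)^k$. The function names below are illustrative.

```python
import random

def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) for independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def simulated_rate(n_tests, alpha=0.05, n_experiments=20_000, seed=0):
    """Monte Carlo check: under a true null, a p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_experiments

for k in (1, 5, 20):
    print(k, round(familywise_error_rate(k), 3), round(simulated_rate(k), 3))
```

With 20 comparisons the rate is well above one half, so an isolated "significant" result among many tests carries little weight.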

Correlation does not mean causation.

Published statistics tend to be optimistic.

Elements of experimental design:

- sample size;
- parametric/nonparametric test;
- outlier handling;
- data transformation;
- normalization to external control values;
- adjustment for regressors;
- weighting factors in regression;

In mathematical statistics, **design of experiments** (DOE) [@Fisher1935] deals with
the optimal configuration of variables to be used in an experiment subject to measurement errors.
Example designs:

- factorial (all possible combinations of factor levels), fractional factorial, one-factor-at-a-time;
- block (e.g. Latin square, Latin hypercube, orthogonal array);
- response surface (optimal design for regression models);
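To make the factorial and one-factor-at-a-time designs concrete, here is a minimal sketch that enumerates the runs of each. The factor names and levels are hypothetical, and the one-factor-at-a-time baseline is assumed to be the first level of every factor.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
factors = {
    "temperature": [20, 37],
    "dose": ["low", "high"],
    "strain": ["A", "B", "C"],
}

def full_factorial(factors):
    """Every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def one_factor_at_a_time(factors):
    """Vary one factor at a time around a baseline of first levels."""
    baseline = {name: levels[0] for name, levels in factors.items()}
    runs = [baseline]
    for name, levels in factors.items():
        for level in levels[1:]:
            runs.append({**baseline, name: level})
    return runs

print(len(full_factorial(factors)), "factorial runs")
print(len(one_factor_at_a_time(factors)), "one-factor-at-a-time runs")
```

The factorial design needs more runs, but only it observes the factors in combination, which is what makes interactions estimable.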

Some principles in design of experiments:

- comparison/control: the effect of a cause is always relative to another cause [@Holland1986];
- randomization: averages out the effects of nuisance variables;
- blocking: groups similar units to control for apparent variability (nuisance variables);
- blinding: withholds information to avoid subjective bias (open-label, single-blind, double-blind);
- orthogonality: independent variables are uncorrelated;
- factorial: efficient at evaluating the effects and interactions of several factors;
- replication;

Factors are the independent variables. Nuisance factors are those that may affect the measured result but are not of primary interest.
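Randomization and blocking can be combined. The sketch below groups units by a nuisance factor and randomizes treatments within each block; the subjects, the blocking variable (sex), and the function name are hypothetical.

```python
import random
from collections import Counter

def randomized_block_assignment(units, block_of, treatments, seed=0):
    """Shuffle units within each block, then deal treatments in rotation."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # randomization happens inside the block
        for position, unit in enumerate(shuffled):
            assignment[unit] = treatments[position % len(treatments)]
    return assignment

# Hypothetical subjects: (identifier, sex), with sex as the nuisance factor.
subjects = [(f"s{i}", "female" if i % 2 else "male") for i in range(8)]
assignment = randomized_block_assignment(
    subjects, block_of=lambda subject: subject[1], treatments=["control", "drug"]
)
counts = Counter((subject[1], arm) for subject, arm in assignment.items())
print(counts)
```

Each block contributes equally to every treatment arm, so the nuisance factor cannot be confounded with the treatment.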

Compared to most experiments, observational studies often require more complicated analyses and yield less certain results.

Observational studies can be useful but are rarely definitive.

- Don't get distracted by p-values and conclusions about statistical significance without also thinking about the
**effect size**.

Think of statistical significance (and the p-value) as the resolution of your observation of an uncertain quantity: it only tells you whether the sample is large enough to distinguish an effect from noise. The effect size (the size of the difference, association, or correlation) must be compared to a pre-determined reference value to judge whether the effect is nontrivial.
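The following sketch holds the effect size fixed and changes only the sample size. It assumes a two-sample z-test with a known common standard deviation, which is a simplification, and reports the standardized mean difference (commonly called Cohen's d); all numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, known common SD assumed."""
    se = sd * sqrt(2 / n_per_group)
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def standardized_difference(mean_diff, sd):
    """Effect size as the mean difference in SD units (Cohen's d)."""
    return mean_diff / sd

mean_diff, sd = 0.1, 1.0  # the same small effect in both scenarios
effect = standardized_difference(mean_diff, sd)
p_small_sample = two_sample_z_pvalue(mean_diff, sd, n_per_group=50)
p_large_sample = two_sample_z_pvalue(mean_diff, sd, n_per_group=5000)
print(f"d = {effect}, p(n=50) = {p_small_sample:.3f}, p(n=5000) = {p_large_sample:.2g}")
```

The effect is identical in both cases; only the resolution changes. Whether a difference of 0.1 standard deviations matters is a scientific question that the p-value cannot answer.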

The null is typically the hypothesis to be rejected, and hypothesis testing is useful only for that purpose: it can reject the null, but it cannot confirm it.

- Don't fall for the
**ecological fallacy**.

Conclusions about individuals cannot be drawn from group-level data.

- Don't focus only on mean values while ignoring
**variation and outliers**.

Classic regression, however, models only the conditional expectation function (the mean of the response given the predictors).

- Statistical significance of differences is not transitive.

Inference about the difference between two differences needs to be based on a single test on that exact quantity, not tests on the component differences.
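A numeric sketch of this point, assuming two independent estimates with normally distributed errors; the estimates and standard errors are made up.

```python
from math import sqrt
from statistics import NormalDist

def z_test_pvalue(estimate, se):
    """Two-sided p-value for H0: the true value is zero."""
    return 2 * (1 - NormalDist().cdf(abs(estimate / se)))

def difference_pvalue(est_a, se_a, est_b, se_b):
    """Single test on the difference itself (independent estimates assumed)."""
    se_diff = sqrt(se_a**2 + se_b**2)
    return z_test_pvalue(est_a - est_b, se_diff)

# Hypothetical effects measured in two groups.
est_a, se_a = 0.25, 0.10
est_b, se_b = 0.10, 0.10

p_a = z_test_pvalue(est_a, se_a)        # significant at 0.05
p_b = z_test_pvalue(est_b, se_b)        # not significant at 0.05
p_diff = difference_pvalue(est_a, se_a, est_b, se_b)
print(f"p_a = {p_a:.3f}, p_b = {p_b:.3f}, p_diff = {p_diff:.3f}")
```

One effect is significant and the other is not, yet the test on their difference is not significant: the two component results do not license the claim that the effects differ.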

- Don't pool data from different
**populations** (whenever distinguishable).

Two populations are distinct if they differ in population-level attributes, in unit-level attributes that are homogeneous within each population, or in the mechanism that generates the observed attributes. Combining samples from such populations may confound population-specific trends or mechanisms.
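A toy example shows how pooling can reverse a trend (often called Simpson's paradox). The two populations and their data are made up, and `slope` is a plain least-squares helper written for this sketch.

```python
from statistics import mean

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Two hypothetical populations, each with a decreasing trend.
pop1_x, pop1_y = [1, 2, 3], [6, 5, 4]
pop2_x, pop2_y = [6, 7, 8], [11, 10, 9]

slope_pop1 = slope(pop1_x, pop1_y)
slope_pop2 = slope(pop2_x, pop2_y)
slope_pooled = slope(pop1_x + pop2_x, pop1_y + pop2_y)
print(slope_pop1, slope_pop2, round(slope_pooled, 3))
```

Within each population the slope is negative, but the pooled slope is positive because the populations differ in their overall levels of both variables.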