Suggestions for Researchers

*The most common error
in statistics is to assume that
statistical procedures can take the place of sustained effort.*

Good and Hardin (2006) Common Errors in Statistics, p. 186

The hard part, and the one where training is so poor, is the a priori thinking about the science of the matter before data analysis -- even before data collection. It has been too easy to collect data on a large number of variables in the hope that a fast computer and sophisticated software will sort out the important things -- the "significant" ones (the "just the numbers" approach). Instead, a major effort should be mounted to understand the nature of the problem by critical examination of the literature, talking with others working on the general problem, and thinking deeply about alternative hypotheses. Rather than "test" dozens of trivial matters ... there must be a more concerted effort to provide evidence on meaningful questions that are important to a discipline. This is the critical point: the common failure to address important science questions in a fully competent fashion.

Burnham and
Anderson (2002) Model Selection and
Multimodel Inference, pp. 144 - 145

- Decide what questions you
will be studying.
- Trying to study too many
things at once is likely to create problems with multiple testing, so
it may be wise to limit your study.
- If you will be gathering
data, think about how you will gather
*and* analyze it *before* you start to gather the data.
- Read reports on related
research, focusing on problems that were encountered and how you might
get around them and/or how you might plan your research to fill in gaps
in current knowledge in the area.
- If you are planning an
experiment, look for possible sources of variability and design
your experiment to take these into account as much as possible.
- The design will depend on
the particular situation.
- The literature on design
of experiments is extensive; consult it.
- Remember that the design affects what method of
analysis is appropriate.
- If you are gathering
observational data, think about possible confounding factors and plan
your data gathering to reduce confounding.
- Be sure to record any time and spatial variables present,
or any other variables that might influence outcome, whether or not you
initially plan to use them in your analysis.
- Also think about any
factors that might make the sample biased.
- You may need to limit
your study to a smaller population than originally intended.
- Think carefully about
what measures you will use.
- If your data gathering
involves asking questions, put careful thought into choosing and phrasing them. Then
check them out with a test-run and revise as needed.
- Think carefully about how you will randomize (for an experiment) or sample (for an observational study).
- Think carefully about
whether or not the model assumptions
of your intended method of analysis are likely to be reasonable.
- If not, revise either
your plan for data gathering or your plan for analysis, or both.
- Conduct a pilot study to
troubleshoot and obtain variance estimates for a power
analysis.
- Revise plans as needed.
- Decide on appropriate levels of Type I and Type II error, taking into account consequences of each type of error.
- Plan how to deal with multiple inferences, including "data snooping" questions that might arise later.
- Do a power
analysis to estimate what sample size you need to detect meaningful
differences.
- Take into account any relevant considerations such as multiple inference, Intent-to-Treat analysis or how you plan to handle missing data.
- Revise plans as needed.
- If you plan to use
existing data, modify the suggestions above, as in the suggestions
under Item II(b) of Data Snooping. See
Burnham and Anderson (2002), Model
Selection and Multimodel Inference, for advice on model
selection. If
your interest is causal inference, see Rubin,
Donald B. (2008), For objective causal inference, design trumps
analysis, Annals of Applied
Statistics 2(3), 808 - 840.
- For additional suggestions, see Chapter 8 of van Belle (2008), Statistical Rules of Thumb.
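
The power-analysis step above can be made concrete with a rough calculation. The following is a minimal sketch, not a substitute for dedicated power-analysis software. It assumes a two-sided comparison of two group means with equal group sizes and a common standard deviation, uses the normal approximation, and applies a Bonferroni adjustment when several inferences are planned. The function name `sample_size_per_group` and its parameters are hypothetical, chosen for illustration; `sigma` would come from a pilot study and `delta` is the smallest difference you consider meaningful.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80, n_tests=1):
    """Approximate sample size per group for comparing two means.

    ASSUMPTIONS (not from the original text): two-sided test, equal group
    sizes, common standard deviation `sigma`, normal approximation, and a
    Bonferroni adjustment of alpha across `n_tests` planned inferences.
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    if not (0 < alpha < 1 and 0 < power < 1) or n_tests < 1:
        raise ValueError("alpha and power must be in (0, 1); n_tests >= 1")

    adjusted_alpha = alpha / n_tests          # Bonferroni adjustment
    z = NormalDist()                          # standard normal distribution
    z_alpha = z.inv_cdf(1 - adjusted_alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)                    # quantile for desired power

    # n = 2 * ((z_alpha + z_power) * sigma / delta)^2, rounded up
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)
```

For example, with a pilot estimate of `sigma = 1.0` and a meaningful difference of `delta = 0.5`, `sample_size_per_group(1.0, 0.5)` gives 63 per group. A t-based calculation gives a slightly larger number, so treat the result as a lower bound. Planning for more inferences (a larger `n_tests`) raises the required sample size.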

- Before doing any formal analysis, ask whether or not the model assumptions of the procedure are plausible in the context of the data.
- Plot the data (or residuals, as appropriate) where possible to get additional checks on whether or not model assumptions hold.
- If model assumptions appear to be violated, consider transformations of the data, or use alternate methods of analysis as appropriate.
- If more than one statistical inference is used, be sure to take that into account by using appropriate methodology for multiple inference.
- If you use hypothesis tests, be sure to calculate corresponding confidence intervals as well.
- But be aware that there may also be other sources of uncertainty not captured by confidence intervals.
- Keep careful records of
decisions made in data cleaning and in using software.^{1}
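
One common way to take multiple inferences into account is to adjust the p-values before interpreting them. A minimal sketch of the Holm step-down adjustment follows. It is one option among several and may not be the best choice for a given study. The function name `holm_adjust` is hypothetical, and the sketch assumes you have already computed one p-value per planned test.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order.

    The i-th smallest p-value (i = 0, 1, ...) is multiplied by (m - i),
    capped at 1, and the results are forced to be non-decreasing so the
    adjusted values preserve the ordering of the raw p-values.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

Comparing each adjusted value with the chosen Type I error level (for example 0.05) then controls the familywise error rate across all the tests, rather than the error rate of each test separately.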

*Critics
may complain that we advocate interpreting reports not merely with a
grain of
salt but with an entire shaker; so be it. ... Neither society nor we
can afford
to be led down false pathways*.

Good and Hardin (2006), Common Errors in Statistics, p.
119

David
Freedman (2008, p. 61)

- Aim for
transparency and reproducibility.
- Include enough detail so
the reader can critique both the data gathering and the analysis.
- Look for and report
possible sources of bias^{2}
or other sources of additional uncertainty in results.
- For more detailed
suggestions on recognizing and reporting bias, see Chapter 1 and
pp. 113 - 115 of Good and Hardin (2006). All of Chapter 7 of that book
is a good supplement to the suggestions here.
- Consider including a
"limitations" section, but be sure to reiterate or summarize the
limitations in stating conclusions -- including in the abstract.
- Include enough detail so
that another researcher could replicate both the data gathering and the
analysis.
- For example, "SAS Proc
Mixed was used" is
*not* adequate detail. You also need to explain which factors were fixed, which random, which nested, etc. Refer to the notes you have made when performing the analysis.
- If space limitations do
not permit all the detail needed to be included in the actual paper,
provide it in a website to accompany the article.
- Some journals now include
websites for supplementary information; publish in these when
possible.
- When citing sources, give
explicit page numbers, especially for books.
- Include discussion of
*why* the analyses used are appropriate, i.e., why model
assumptions are well enough satisfied for the robustness criteria for
the specific technique, or whether they are iffy.
- This might go in a
supplementary information website.
- If you do hypothesis
testing, be sure to report p-values (rather than just phrases such as
"significant at the .05 level")
*and also* give confidence intervals.
- In some situations, other
measures such as "number needed to treat" would be appropriate. See pp.
151 - 153 of van Belle (2008).
- Be careful to use
language (both in the abstract and in the body of the article) that
expresses any uncertainty and limitations.
- If you have built a
model, be sure to explain the decisions that went into the selection of
that model.
- See Good and Hardin
(2006, pp. 181 – 182) for more suggestions.
- For more suggestions and
details, see
- Chapters 8 and 9 of van
Belle (2008)
- Chapters 7 and 9 of Good
and Hardin (2006)
- Harris et al (2009)
- Miller (2004)
- Robbins (2004)
- Strasak et al (2007)
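
The advice above to report p-values and confidence intervals together can be illustrated with a short sketch. It is a rough illustration only: it assumes two independent samples that are large enough for a normal approximation to be reasonable (for small samples a t-based method would be more appropriate), and the function names `summarize_mean_difference` and `format_summary` are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def summarize_mean_difference(a, b, conf_level=0.95):
    """Estimate mean(a) - mean(b) with a confidence interval and p-value.

    ASSUMPTION: independent samples, large enough for a normal
    approximation; variances are not assumed equal.
    """
    diff = mean(a) - mean(b)
    # Standard error of the difference, using each sample's own variance.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("no variability in either sample")

    z = NormalDist()
    crit = z.inv_cdf(1 - (1 - conf_level) / 2)     # two-sided critical value
    p_value = 2 * (1 - z.cdf(abs(diff) / se))      # two-sided p-value
    return {
        "difference": diff,
        "ci": (diff - crit * se, diff + crit * se),
        "p_value": p_value,
    }


def format_summary(summary, conf_level=0.95):
    """Render the estimate, interval, and p-value in one reportable line."""
    lo, hi = summary["ci"]
    return (f"difference = {summary['difference']:.2f}, "
            f"{conf_level:.0%} CI ({lo:.2f}, {hi:.2f}), "
            f"p = {summary['p_value']:.3f}")
```

Reporting the whole line, rather than only "significant" or "not significant", lets the reader see both the size of the estimated effect and how precisely it was estimated.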

K. P. Burnham and D. R. Anderson (2002), Model selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed., Springer

D. Freedman (2008), Editorial: Oasis or Mirage?, Chance 21(1), pp. 59 - 61

P. Good and J. Hardin (2006), Common Errors in Statistics (and How to Avoid Them), Wiley

Harris, A. H. S., R. Reeder and J. K. Hyun (2009), Common statistical and research design problems in manuscripts submitted to high-impact psychiatry journals: What editors and reviewers want authors to know, Journal of Psychiatric Research 43(15), 1231 - 1234

Miller, Jane (2004), The Chicago Guide to Writing about Numbers: The Effective Presentation of Quantitative Information, University of Chicago Press

Robbins, N. (2004), Creating More Effective Graphs, Wiley

Strasak, A. M. et al (2007), The Use of Statistics in Medical Research, The American Statistician, February 1, 2007, 61(1): 47-55

van Belle, G. (2008) Statistical Rules of Thumb, Wiley

Notes:

1. For more discussion, see:

- K. Baggerly and D. Berry, Reproducible
Research, AMSTATNEWS Science Policy Column, January 2011

- A. Gelman (2010) and commentators, Forensic bioinformatics, or, Don't believe everything you read in the (scientific) papers, and references therein.

"The only way to have real success
in science ... is to describe the
evidence very carefully without regard to the way you feel it should
be. If you have a theory, you must try to explain what's good and
what's bad about it equally. In science, you learn a kind of standard
integrity and honesty."

Richard Feynman, What Do You Care What Other People Think? (1988), p. 217

There is one feature I notice that
is generally missing in "cargo cult
science"... It's a kind of scientific integrity, a principle of
scientific thought that corresponds to a kind of utter honesty —
a kind of leaning over backwards... For example, if you're doing an
experiment, you should report everything that you think might make it
invalid — not only what you think is right about it... Details
that could throw doubt on your interpretation must be given, if you
know them. ... If you make a theory, for example, and advertise it, or
put it out, then you must also put down all the facts that disagree
with it, as well as those that agree with it. ... In summary, the idea
is to try to give all of the
information to help others to judge the value of your contribution; not
just the information that leads to judgment in one particular direction
or another. ... The first principle is that you must not fool yourself
-- and you are the easiest person to fool. So you have to be very
careful about that.

Richard Feynman, "Cargo Cult Science", adapted from a commencement address given at Caltech (1974)

Last updated September 8, 2012