Translating statistical findings into plain English
Comment
www.thelancet.com Published online April 16, 2009 DOI:10.1016/S0140-6736(09)60499-2
Clinical trial reports usually give estimates of treatment
effects, their confidence intervals, and p values.
The
statistical methods and their technical meaning are
well established.
There is less clarity about the concise
interpretative wording that authors should use,
especially in the abstract and conclusions and by others
in commentaries.
The following guidance assumes that
one short sentence needs to capture the essence of a
trial’s findings for the primary endpoint.
Various scenarios can arise (figure).
Scenario A has a very highly statistically significant
treatment effect (p<0·001): for example, the comparison of
everolimus with placebo for progression-free survival
in advanced renal-cell carcinoma.¹ Such strong evidence
provides proof of treatment efficacy beyond reasonable
doubt, justifying the statement “everolimus prolongs
progression-free survival”.
However, even extreme
p values are not definitive proof.
Scenario B has greater uncertainty even though the
(artificial) barrier of p<0·05 is reached: eg, the LIFE trial.²
There is some evidence of efficacy but 0·01<p<0·05
means the play of chance (ie, no true effect) cannot be
dismissed, and the lower confidence limit close to zero
means the true effect might be small.
Hence some doubt
is appropriate: “treatment X seems superior to treatment
Y” or “patients receiving treatment X had significantly
fewer primary events”.
The absolute benefit and its
confidence interval³ are an important guide to clinical
interpretation.
In LIFE,² treatment with losartan led to
4·1 fewer cardiovascular events per 1000 patient-years
than did atenolol (95% CI 0·6–7·6, p=0·021), which
is small enough to justify “losartan confers modest
benefits”.
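As a rough consistency check on the LIFE figures quoted above, the p value implied by a reported estimate and its confidence interval can be back-calculated. The sketch below assumes the interval is symmetric about the estimate and rests on a normal approximation, which may not be how the trial report derived it; the function name `p_from_ci` is hypothetical and is used only for illustration.

```python
from statistics import NormalDist

def p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p value implied by an estimate and its CI.

    Assumption: the CI is symmetric and normal-theory based, so its
    half-width equals z_crit * standard error.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    standard_error = (upper - lower) / (2 * z_crit)
    z = estimate / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# LIFE figures as quoted in the text: 4.1 fewer events per 1000
# patient-years, 95% CI 0.6 to 7.6.
life_p = p_from_ci(4.1, 0.6, 7.6)
print(f"implied two-sided p is about {life_p:.3f}")
```

Under these assumptions the implied p value lands close to the reported p=0·021, which illustrates how a lower confidence limit near zero and a p value only a little below 0·05 are two views of the same modest evidence.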
Scenario C casts further doubt on whether true efficacy
exists, with a p value slightly above 0·05: eg, the TORCH
trial⁴ with p=0·052 for mortality.
Remember the correct
interpretation of p values:⁵
statistical significance is on
a continuous scale, the smaller the p value the stronger
the evidence, and p<0·05 is an arbitrary cutoff with
no rational justification.
p=0·049 and p=0·051 carry
essentially the same information, but in view of the
misguided (but seemingly inevitable) wish to interpret
them differently, some extra doubt can be expressed
when p is slightly above 0·05.
Such weak evidence
means treatment X “might be superior” or “this trial is
inconclusive”.
TORCH’s conclusion that “the reduction in
death … did not reach the predetermined level of statistical
significance” seems too guarded.
The word “trend” is
sometimes used in this context, but is best avoided
because it implies special pleading when evidence is slim.
After all, authors usually decline to mention trends in the
opposite (harmful or “wrong”) direction.
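The point that p=0·049 and p=0·051 carry essentially the same information can be made concrete by converting each to the test statistic it corresponds to. This is a minimal sketch that assumes a two-sided test based on a standard normal statistic; `z_from_p` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_from_p(p):
    """Absolute z statistic corresponding to a two-sided p value.

    Assumption: the test statistic is standard normal under the null.
    """
    return NormalDist().inv_cdf(1 - p / 2)

z_049 = z_from_p(0.049)
z_051 = z_from_p(0.051)
print(f"p=0.049 gives z={z_049:.3f}; p=0.051 gives z={z_051:.3f}")
print(f"difference in z: {z_049 - z_051:.3f}")
```

The two statistics differ by less than 0·02 standard errors, far too little to support different conclusions about the treatment.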
Scenario D depicts the disappointing situation in
which the p value is quite large (eg, p=0·3), which
indicates no evidence of a treatment difference, and
one concludes “the trial did not show superiority” or
“treatment X seems not to be superior”.
However, if the
trial was too small (underpowered) to reliably detect
clinically important effects, one might state there was
insufficient evidence and the trial was “inconclusive”.
p>0·05 should not be labelled as a “negative” finding,
because the possibility of a true treatment difference
cannot be dismissed.
Equally the label “positive” trial is
best avoided when p<0·05.
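The wording suggested for scenarios A to D can be summarised as a simple lookup. The sketch below is illustrative only: the text fixes p<0·001 for scenario A and treats 0·05 as an arbitrary convention, so the remaining cut points (grouping 0·001≤p<0·01 with scenario B, and reading “slightly above 0·05” as p<0·10) are assumptions introduced here, as are the function name and the `underpowered` flag. It also assumes the estimated effect favours the new treatment.

```python
def suggested_wording(p, underpowered=False):
    """Suggest concise wording for a superiority trial's primary endpoint.

    Assumptions (not from the source): p values from 0.001 up to 0.01 are
    grouped with scenario B, and 0.10 is taken as the upper bound of
    "slightly above 0.05". The estimated effect is assumed to favour the
    new treatment.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        return "is superior"               # scenario A: strong evidence
    if p < 0.05:
        return "seems superior"            # scenario B: some evidence
    if p < 0.10:
        return "might be superior"         # scenario C: weak evidence
    if underpowered:
        return "inconclusive"              # scenario D, trial too small
    return "seems not to be superior"      # scenario D: no evidence

for p in (0.0005, 0.021, 0.052, 0.3):
    print(p, "->", suggested_wording(p))
```

Because strength of evidence is continuous, hard boundaries of this kind should be read as a prompt for suitably hedged language and not as a decision rule.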
Non-inferiority (or equivalence) trials, designed to
examine whether a new treatment has comparable
efficacy to an active control, are increasingly common
and present particular interpretive challenges.⁶
Key is
whether the 95% CI for the primary endpoint’s treatment
difference excludes a prespecified non-inferiority
margin, δ: any true inferiority less than δ is deemed
acceptable.
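The non-inferiority criterion described above reduces to comparing one confidence limit with δ. The following sketch assumes the treatment difference is expressed as new minus standard on a scale where larger values mean the new treatment is worse (for example, a difference in event rates), so that the relevant limit is the upper one. The function name and the numbers in the usage lines are hypothetical.

```python
def non_inferiority_shown(ci_upper, delta):
    """Return True if the 95% CI excludes the non-inferiority margin.

    Assumptions: the difference is (new minus standard) on a scale where
    positive values mean the new treatment is worse, and delta > 0 is the
    prespecified margin.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return ci_upper < delta

# Hypothetical illustrations with a margin of 4 percentage points:
print(non_inferiority_shown(ci_upper=2.5, delta=4.0))  # CI lies below delta
print(non_inferiority_shown(ci_upper=5.1, delta=4.0))  # CI crosses delta
```

When the interval crosses δ, the appropriate reading is that non-inferiority has not been established, which is not the same as showing that the new treatment is inferior.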
Scenario E presents such an outcome, the LEADERS
trial⁷ that compared new and standard coronary
Figure: Six scenarios for primary endpoint of randomised trial comparing new and standard treatment groups
Each displays estimated treatment difference and its 95% CI, p value, strength of evidence, and appropriate
comment for use in conclusions.
For non-inferiority scenarios, non-inferiority margin δ is shown, and pNI is for
consequent test of non-inferiority.
[Figure panel not reproduced. Axis labels: “New treatment better”, 0, “New treatment worse”; δ marked for the non-inferiority scenarios.]
Superiority trials
A: p<0·001; strong evidence; “Is superior”
B: p=0·02; some evidence; “Seems superior”
C: p=0·06; weak evidence; “Might be superior”
D: p=0·3; no evidence; “Seems not superior”
Non-inferiority trials
E: pNI=0·02; evidence of non-inferiority; “Seems non-inferior”
F: pNI=0·2; insufficient evidence; “Inconclusive whether non-inferior”