DELTA2 guidance on choosing the target difference and undertaking and reporting the sample size calculation for a randomized controlled trial part 1

DELTA2 guidance on choosing the target difference and undertaking and reporting the sample size calculation for a randomized controlled trial part 1#

https://lh4.googleusercontent.com/pno7K44oyUgCTg8M6IuAgTSCyqV_GWwY1Cx8Oky6SMjnzEgzQlLYdtP3wmXBmjnLOttizR_SfRkzN6T6mUumMsJDvsBlhFmfW8o0mZnuFJ8qiCD_57m-8NDlrzAEeStdHL64lusoLoBWIP7t1jdtHbk

2023년 07월 31일

Information#

논문 Link
- https://www.bmj.com/content/363/bmj.k3750
DELTA2 Project Link
- https://www.ndorms.ox.ac.uk/csm/delta2-guidance

논문 소개#

DELTA2 Guidance는 RCT(Randomised Clinical Trial) 임상 연구에서 목표하는 차이값(target difference)과 이에 따라 계산되는 표본 수 산출(sample size calculation)에 대한 Guidance로 University of Oxford의 Centre for Statistics in Medicine 에서 2019년에 최종 버전이 발표되었습니다.

이 논문에서는 이 Guidance의 주요 내용(target difference 설정 및 표본수 산출)에 대해 설명하고 있습니다.

The DELTA2 project#

the United Kingdom’s Medical Research Council/National Institute for Health Research Methodology Research Programme 의 지원을 받아 진행되었음
기존 Guidance(Delta guidance)의 개정 (target difference, sample size calculation을 다룸)
- on specifying and reporting the target difference (the effect size)
  - in the sample size calculation of a randomized controlled trial.

Box 1: DELTA2 recommendations for researchers undertaking a sample size calculation and choosing the target difference#

Target difference을 찾기 위해 우선 문헌 고찰을 시작할 것
- Relevant literature can: relate to a candidate primary outcome or the comparison of interest, and
  - inform what is an important or realistic difference
    - for that outcome, comparison, and population.
주 평가변수 후보지(Candidate primary outcomes)를 검토할 것
- the corresponding sample size explored.
- Where multiple candidate outcomes are considered,
  - the choice of the primary outcome and target difference should be based on
    - consideration of the views of relevant stakeholder groups (eg, patients), as well as
    - the practicality of undertaking such a study with the required sample size.
  - The choice should not be based solely on
    - which outcome yields the minimum sample size.
    - Ideally, the final sample size will be sufficient for all key outcomes,
      - although this is not always practical.
The importance of observing a particular magnitude of a difference in an outcome
결과변수의 차이값 크기를 특정하는 것(Particular magnitude of a difference)은 중요함
- with the exception of mortality and other serious adverse events,
- cannot be presumed to be self evident.
- Therefore, the target difference for all other outcomes needs
  - additional justification to infer importance to a stakeholder group.
The target difference for a definitive trial (eg, phase III) should be one
- considered to be important to at least one key stakeholder group.
목표 차이값은 필요한 값의 최소값(the minimum value)보다 커도 무방함
- that would be considered important
- if a larger difference is considered a realistic possibility or would be necessary to alter practice.
추가 연구가 필요할 경우, anchor and opinion seeking method가 더 나은 방법임
- the anchor and opinion seeking methods are to be favored.
  - “anchor method”: The anchor method is also called “external reference” method, that is, one anchor is selected as the external indicator and examines the relationship between scores on the instrument whose interpretation is under question (the target instrument) and some independent measure (an anchor)
  - The anchor design of anchor-based method to determine the minimal clinically important difference: a systematic review
- The distribution method should not be used.

The anchor and the distribution method 관련 link

Distribution- and anchor-based methods to determine the minimally important difference on patient-reported outcome questionnaires in oncology: a structured review
Specifying the target difference based solely on a
- standardized effect size approach should be considered a last resort,
Where additional research is needed to inform what would be a realistic difference,
- The opinion seeking and the review of the evidence base methods are recommended.
- Pilot trials are typically too small to inform what would be a realistic difference and
  - primarily address other aspects of trial design and conduct.
기존 연구(existing studies) 결과치(e.g., 탐색 연구 pilot trial)를 표본수 계산에 활용할 것
- that are part of the sample size calculation.
- For example, a pilot trial can be used
  - to inform the choice of the standard deviation value for a continuous outcome and
  - the control group proportion for a binary outcome,
  - along with other relevant inputs such as the amount of missing outcome data.
Sensitivity analyses, used in the sample size calculation, should be carried out.
표본수 계산에 사용된 값(e.g., the target difference, 대조군 반응율 등)들에 대한 민감도 분석(Sensitivity Analysis)을 수행해야 함
- Sensitivity Analysis: “a method to determine the robustness of an assessment by examining the extent to which results are affected by changes in methods, models, values of unmeasured variables, or assumptions”
  - A tutorial on sensitivity analyses in clinical trials: the what, why, when and how
- which consider the effect of uncertainty around key inputs
  - (eg, the target difference and the control group proportion for a binary outcome)
Specification of the sample size calculation, including the target difference,
표본수 계산에 사용된 값들(e.g., target difference)은 양식에 따라 관련 문서(임상시험 계획서 등)에 보고되어야 함

The target difference and sample size calculations in randomised controlled trials#

RCT 연구에서의 목표 차이값과 표본수 산출#

표본수 산출(sample size calculation)의 역할
- 이 연구에 몇 명의 피험자가 필요한지 결정
- 주 평가변수(primary outcome) 기준으로 정함
- It is typically achieved by
  - specifying a target difference for the key (primary) outcome
  - that can be reliably detected and the required sample size calculated
The precise research question that the trial is primarily set up to answer
- will determine what needs to be estimated in the planned primary analysis,
- which is known formally as the “estimand”
  - The target difference should be a difference that is appropriate for that estimand.
The target difference should be viewed as important by
- at least one (and preferably more) key stakeholder groups—
  - that is, patients, health professionals, regulatory agencies, and healthcare funders.
- In practice, the target difference is not always formally considered and
  - in many cases appears, at least from trial reports, to be determined on convenience, the research budget, or some other informal basis.
The target difference can be expressed as an
- absolute difference
  - (eg, mean difference or difference in proportions) or
- relative difference
  - (eg, hazard or risk ratio)
- is also often referred to, rather imprecisely, as the trial “effect size
Statistical calculation of the sample size is far from an exact science
통계적인 피험자수 산출의 한계 (가정 assumption이 사용되고 가정된 값의 차이에 민감)
- Firstly, investigators typically make assumptions
  - that is a simplification of the anticipated analysis.
  - For example, the impact of adjusting for baseline factors is difficult to quantify upfront,
    - and even though the analysis is intended to be an adjusted one
      - (such as when randomisation has been stratified or minimized),
    - the sample size calculation is often conducted on the basis of an unadjusted analysis.
- Secondly, the calculated sample size can be sensitive to the assumptions made in the calculations
  - a small change in one of the assumptions can lead
    - to substantial change in the calculated sample size.
  - Often a simple formula can be used to calculate the required sample size.
통계적 가설 검정 setting에서 제1종 오류와 제2종 오류 사이에서 균형을 맞춰야 함
- the risk of incorrectly concluding that there is a difference (Type I error)
  - when no actual difference between the treatments exists,
- with the risk of failing to identify a meaningful treatment difference when the treatments do differ(Type II error)

[Example]

In CXR 121 study, we set a hypothesis for comparison of Readers’ performance on Test 1(without CAD) and Test 2(with CAD).

H0: Test1 AUC = Test2 AUC (NO DIFFERENCE)

H1: Test1 AUC != Test2 AUC (THERE IS DIFFERENCE)

Rejecting H0 means we have stronger evidence toward H1, and we make the decision to reject H0 by p-value from the study result if p-value < 0.05.

(e.g. There is a difference between Test 1 and Test 2 , so that we can claim for performance improvement from our study).

[Type I error ]

If H0 is true in reality(e.g., no difference between Test 1 and Test 2), but

we reject H0 from the study results(p-value < 0.05), then we made the wrong decision that there was a difference.

[Type II error]

We fail to reject the null hypothesis(no difference) from the study results(p-value > 0.05), but actually there is difference(favoring H1), then we made the wrong decision that there is no difference

Under the conventional approach, referred to as the statistical hypothesis testing framework
- the probabilities of these two errors are controlled by setting
  - the significance level (type I error) and
  - statistical power (1 minus type II error) at appropriate levels
  - (typical values are two sided 5% significance and 80% or 90% power, respectively).
- Once these two inputs have been set, the sample size can be determined given
  - the magnitude of the between group difference in the outcome it is desired to detect
    - (the target difference).
- The calculation (reflecting the intended analysis) is conventionally done
  - on the basis of testing for a difference of any magnitude
A key question of interest is what magnitude of difference can be ruled out.
- The expected (predicted) width of the confidence interval can be determined
  - for a given target difference and sample size calculation,
- The required sample size is very sensitive to the target difference.
  - Under the conventional approach,
    - 차이값을 절반으로 설정하면, 표본수 산출 결과는 4배로 커짐(*아래 공식 참조)
  - Appropriate sample size formulas vary depending on
    - the proposed trial design and
    - statistical analysis
더 복잡한 연구 시나리오의 경우, Simulations도 사용 가능함
It is prudent to undertake sensitivity calculations to assess
가정값들 확인을 위해 민감도 분석, 계산 (sensitivity calculations) 을 하는 것이 현명함
- the potential effect of misspecification of key assumptions such as
  - the control response rate for a binary outcome or
  - the anticipated variance of a continuous outcome