Clinical Review & Education

JAMA Facial Plastic Surgery Guide to Statistics and Methods

Thoughtful Methods to Increase Evidence Levels and Analyze Nonparametric Data

Lisa E. Ishii, MD, MHS

In this issue of JAMA Facial Plastic Surgery, Sapthavee et al1 present a retrospective cohort study designed to compare the effectiveness of skin graft reconstruction and local flap reconstruction for nasal defects. This article is an excellent example of how level of evidence (LOE) can be increased from level 4 (case series) to level 3 (retrospective cohort study) by adding a comparison cohort group to the analysis (Table 1).2 It is also a good example of using a nonparametric statistical test, the Wilcoxon rank sum test, to evaluate nonparametric data.

Related article page 270

Table 1. Levels of Evidence Provided by Different Study Types(a)

Level of Evidence    Study Type
1                    High-quality, properly powered and conducted RCT; systematic review or meta-analysis of these studies
2                    Well-designed controlled trial without randomization; prospective comparative cohort trial
3                    Retrospective cohort study, case-control study, or systematic review of these studies
4                    Case series with or without intervention; cross-sectional study
5                    Expert opinion, case report, or bench research

Abbreviation: RCT, randomized clinical trial.

(a) Adapted from Oxford Centre for Evidence-Based Medicine (http://www.cebm.net/2011-oxford-cebm-levels-evidence-introductory-document/).

Observational vs Experimental Study Design

Sapthavee et al1 conduct an observational study rather than an experimental study. These 2 study designs are distinguished by whether a defined group is being observed without any intervention by the investigator (observational) or a group is distinguished by the presence or absence of an intervention (experimental).3 Examples of experimental study designs are randomized clinical trials (RCTs) and nonrandomized trials. Although experimental study designs like RCTs are historically thought of as producing the highest levels of evidence, in facial plastic surgery there are many cases where this type of study design is unrealistic. There has been concern that observational studies are open to bias, whereby changes seen may be due to unmeasured differences other than the one under examination. However, it has been demonstrated that well-designed observational studies can hold value similar to that of experimental studies.4

Cohort Study Design

The 3 types of observational studies are cohort, case-control, and cross-sectional studies. I focus herein on the cohort design used by Sapthavee et al.1 In a cohort study, the population of interest is defined based on the exposure of interest and observed until the outcome occurs. Subjects in the cohort all share something, such as an experience or an exposure. Cohort studies are particularly good for identifying associations between a cause and effect. One of the best-known medical cohort studies is the Framingham Heart Study.5

Cohort studies can be prospective or retrospective. Prospective studies have a defined starting point, and the researchers record events as they unfold in time, knowing that they are collecting data to be analyzed for the ongoing study. While prospective cohort studies are advantageous for the deliberate measuring of outcomes, they require a priori knowledge of the research plan. In retrospective cohort studies, the intervention of interest is defined in the present, and researchers look to records of the past to identify the associated outcomes data. The obvious limitation of retrospective analysis is the risk of missing outcome data not collected during the encounters; analysis is limited by the types of outcomes that were recorded at the time of the encounter. In the retrospective cohort study by Sapthavee et al,1 skin graft and local flap nasal defect reconstructions were the interventions of interest, and the researchers looked back in time to collect the medical records and photographs for those patients who underwent these interventions.

The critical distinction between a cohort study (LOE, 3) and a case series (LOE, 4) is that a cohort study has a comparison group. In a cohort study, there is always an additional group that serves as a control group. Sapthavee et al1 selected the group of patients with nasal defects reconstructed with skin grafts as their control group to compare with the patients with similar nasal defects reconstructed with local flaps. An important requirement of cohort group selection is that the 2 groups be selected from the same sample population. In the current study, both groups were selected from a sample of patients with nasal defects of similar characteristics (eg, size, location, depth).

Parametric vs Nonparametric Statistical Tests

The 2 general types of statistical tests are parametric and nonparametric. Parametric tests make specific assumptions about the data set under evaluation, such as the assumption that the data set is normally distributed; nonparametric tests do not make these assumptions.6 Generally, the types of data analyzed with parametric tests are continuous data, while the data analyzed with nonparametric tests are ordinal, ranked, or continuous data that are skewed or otherwise do not meet the normality assumptions of parametric tests. Continuous data are those that can take any value within a range. Sapthavee et al1 were working with ordinal data: they asked reviewers to rate the appearance of the reconstructions for the 2 groups using an ordinal 5-point Likert scale.

(Reprinted) JAMA Facial Plastic Surgery July/August 2015 Volume 17, Number 4

Copyright 2015 American Medical Association. All rights reserved.


Table 2. Parametric Tests for Significance and Their Nonparametric Counterparts

Parametric                               Nonparametric
t Test and independent samples t test    Wilcoxon rank sum test; Mann-Whitney U test
Paired samples t test                    Wilcoxon signed rank test
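To make the rank-based idea behind the rank sum test in Table 2 concrete, the following is a minimal Python sketch, not the analysis code used by Sapthavee et al. It pools 2 independent samples, assigns midranks to tied values (common with Likert ratings), and computes a 2-sided P value from the normal approximation with a tie correction and no continuity correction. The function names and the ratings are hypothetical and invented for illustration; with small samples an exact test from a statistics package would usually be preferable.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(group_a, group_b):
    """Two-sided rank sum test via the normal approximation (tie-corrected).

    Returns (w, z, p), where w is the sum of the pooled ranks in group_a.
    """
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2

    # Tie correction: sum of (t^3 - t) over each set of t tied observations.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if var_w <= 0:  # every observation identical: no evidence of a difference
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


if __name__ == "__main__":
    # Hypothetical 5-point Likert ratings, invented for illustration only.
    local_flap = [4, 5, 3, 4, 5, 4]
    skin_graft = [3, 4, 2, 3, 4, 3]
    w, z, p = wilcoxon_rank_sum(local_flap, skin_graft)
    print(f"W = {w}, z = {z:.3f}, two-sided P = {p:.3f}")
```

Because the test uses only the ordering of the ratings, it does not rely on the spacing between Likert categories being equal, which is the reason it suits ordinal data.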

Wilcoxon Rank Sum Test

Many investigators are familiar with the 2-sample t test, formerly known as the Student t test, which is a statistical test used to determine if there are significant differences between 2 samples. The Wilcoxon rank sum test is the nonparametric equivalent to the parametric 2-sample t test.7 Sapthavee et al1 appropriately selected the Wilcoxon rank sum test as opposed to the 2-sample t test to evaluate for differences in the Likert scale ratings for the 2 independent groups, local flap and skin graft. It would have been inappropriate to use the 2-sample t test for these ordinal data. The Wilcoxon rank sum test is valid for data with any distribution, and it is less sensitive to outliers than the 2-sample t test. The disadvantage to the Wilcoxon rank sum test, which is true for nonparametric tests in general, is that given the lack of assumptions about the data set, it is less powerful than its parametric counterpart when that test's assumptions hold and so may require a larger sample size to detect differences. It also generally cannot be used to determine cause-effect relationships.8 Sapthavee et al1 did not see differences in the ratings scale data for the nasal defects reconstructed with local flaps vs skin grafts. Differences may exist, but the sample size may have been too small for the Wilcoxon rank sum test to detect them in the 2 groups.

Notably, the Mann-Whitney U test is equivalent to the Wilcoxon rank sum test, ie, a nonparametric test used to analyze ordinal data from 2 independent samples to learn if they are significantly different. The Wilcoxon signed rank test is analogous to the Wilcoxon rank sum test but is used instead to analyze differences in paired, rather than independent, samples. The Wilcoxon signed rank test is the nonparametric equivalent to the paired t test (Table 2).

Conclusions

In summary, a well-designed observational study can provide valuable data and is a highly desirable option for studies where experimental designs like RCTs and non-RCTs are not feasible. With careful consideration, such as that demonstrated by Sapthavee et al,1 it is possible to raise the LOE with the appropriate study design (Table 1). When considering statistical analysis, it is important to consider the type of data that were collected and select a statistical test appropriate for the specific data type. The Wilcoxon rank sum test is a statistical test to evaluate for differences in independent samples of ordinal data, while the Wilcoxon signed rank test is the equivalent test for paired sample data (Table 2).

ARTICLE INFORMATION

Author Affiliation: Department of Otolaryngology–Head and Neck Surgery, Johns Hopkins School of Medicine, Baltimore, Maryland.

Corresponding Author: Lisa E. Ishii, MD, MHS, Department of Otolaryngology–Head and Neck Surgery, Johns Hopkins School of Medicine, 601 N Caroline St, Ste 6231, Baltimore, MD 21287 ([email protected]).

Section Editor: Lisa E. Ishii, MD, MHS.

Published Online: May 28, 2015. doi:10.1001/jamafacial.2015.0465.

Conflict of Interest Disclosures: None reported.

REFERENCES

1. Sapthavee A, Munaretto N, Toriumi DM. Skin grafts vs local flaps for reconstruction of nasal defects: a retrospective cohort study [published online May 28, 2015]. JAMA Facial Plast Surg. doi:10.1001/jamafacial.2015.0444.

2. Rhee JS. Evidence and quality initiative: moving beyond levels of evidence. JAMA Facial Plast Surg. 2015;17(2):80-81.

3. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342(25):1878-1886.

4. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887-1892.

5. Dawber TR, Meadors GF, Moore FE Jr. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health. 1951;41(3):279-281.

6. Qualls M, Pallin DJ, Schuur JD. Parametric versus nonparametric statistical tests: the length of stay example. Acad Emerg Med. 2010;17(10):1113-1121.

7. Haldane JB. The Wilcoxon and related tests of significance. Experientia. 1956;12(6):205.

8. Wu P, Han Y, Chen T, Tu XM. Causal inference for Mann-Whitney-Wilcoxon rank sum and other nonparametric statistics. Stat Med. 2014;33(8):1261-1271.
