Acad Psychiatry DOI 10.1007/s40596-014-0213-9
IN DEPTH ARTICLE: COMMENTARY
The New Milestones: Do We Need to Take a Step Back to Go a Mile Forward?

Mantosh Dewan · John Manring · Usha Satish
Received: 31 March 2014 / Accepted: 17 July 2014
© Academic Psychiatry 2014
Abstract The Milestones Project, like all previous systems and changes in graduate psychiatric education, for example, moving from 3 to 4 years of training or adopting six competency domains, has been devised without any supporting data and does not assess meaningful outcomes, such as improved patient outcomes. No evidence is presented that Milestones-based training will produce better psychiatrists. There is a path forward. First, replace unproven expert consensus with scientific and evidence-based approaches. Second, exchange endpoints that are easy to assess but uncorrelated with real world functioning (e.g., multiple-choice examinations) for outcomes that are meaningful and external to the training program (e.g., patient outcomes). Finally, to prevent possible waste, excess burden, or harm, no changes should be mandated until proven in prospective studies.

Keywords Milestones Project · Meaningful outcomes · Proven outcomes

As we work to train better psychiatrists using Milestones [1, 2] as our guide, let us stop and think about the word milestones itself. Milestones suggest a journey through sequential, developmental achievements that culminates at a desired destination. Although each of the 324 milestones in Psychiatry is described in admirable detail [2], the desired destination, that is, a better psychiatrist in unsupervised practice, is never defined, and no meaningful, external criteria exist by which we could identify one [1, 3]. What does make a better psychiatrist? How will we know—and prove—that we got there?
M. Dewan (*) · J. Manring · U. Satish
Upstate Medical University, Syracuse, NY, USA
e-mail: [email protected]

In fact, first defining a destination and then charting our journey would make milestones meaningful. Without a destination defined in advance, it may be possible to confirm that a trainee has met or even exceeded a milestone without knowing if this leads to a meaningful, positive outcome in unsupervised practice, such as one's patients getting better [3]. Will the Milestones Project lead to training better psychiatrists? We address this question in the context of previous changes in graduate psychiatric education and suggest adjustments on the basis of resulting insights.
Have Previous Changes Led to Better Psychiatrists?

Like the Milestones, we have always been prescribed steps that were intended to lead to better psychiatrists. Fifty years ago, expert consensus declared that 3 years of training with broad exposure to clinical settings (inpatient, outpatient, consultation-liaison, etc.) and patients (diagnoses, socioeconomic issues, cultural diversity, etc.) were needed. Broad areas of knowledge and skills thought to be important were suggested. In the 1960s, a fourth year was added, first as an elective and then made mandatory, with 6 months dedicated to medicine and neurology [4]—again, by expert consensus but without any data that this longer journey would lead to a better psychiatrist. Next, in the 1980s, both clinical experiences and the specific "knowledge, skills, and attitudes" to be mastered were detailed in elaborate checklists; for example, at least 9 but not more than 18 months of inpatient psychiatry were required, and the history of psychiatry had to be taught. Did all or any of these changes lead to increasingly more effective psychiatrists?

In 1999, there was a paradigm shift toward competency-based training with six domains: Patient Care, Medical Knowledge, Practice-based Learning and Improvement,
Interpersonal and Communication Skills, Professionalism, and Systems-Based Practice [5]. The previous informal domains of knowledge, skills, and attitudes were incorporated in each of the six formal domains. Did these successive changes produce better psychiatrists? As Magen et al. [6] recently lamented, “Importantly, and perhaps dismayingly, there is almost no literature we know of that documents meaningful changes in residency graduates in terms of behavior, practice distribution, ethics, or other measures, despite all the changes in undergraduate and graduate medical education over the last 20 years” (p. 375).
The Essential Question

Why, "despite all the changes" and thousands of papers (a Medline search of medical education found 16,277 papers from 1994 through 2013), is there "almost no literature we know of that documents meaningful changes in residency graduates"? In our view it is because we conducted these educational experiments incorrectly. We failed to state a specific experimental hypothesis and failed to study "the most relevant educational outcomes (effects on clinical practice and patient care, rather than measures of knowledge, skills, or attitudes)" ([7], p. 918; [9]). We focused entirely on the input side during residency (e.g., what to do, where to do it, how long or how many times to do it). The outcome, which is to create increasingly better psychiatrists in unsupervised practice [1], was never defined or studied. There were no post-residency endpoints that were externally validated and meaningful; for example, did the unsupervised psychiatrist provide ethical care, help his or her patients get better, and make fewer errors?

Because we did not complete this experiment appropriately, we do not have any data to support the effectiveness, harm, or wastefulness of any of these changes. There is an absence of evidence of meaningful effects rather than evidence of absence. Numerous studies [8] show how a change improved parameters such as student interest, faculty satisfaction, or, most commonly, examination scores. While helpful, these do not correlate with the primary and most meaningful patient outcomes [7, 9–11].
Will the Milestones Project Lead to Better Psychiatrists?

The Milestones Project of the Next Accreditation System (NAS) from the Accreditation Council for Graduate Medical Education (ACGME) [1] focuses on "measurement-based outcomes" and has created sophisticated, objective anchor points for assessing each of the six domains in all specialties. In psychiatry, the map that guides our journey consists of six competency domains with 22 subcompetency domains that have 65 threads, and 324 milestones distributed across 5 levels [3]. Will this transform psychiatric education and produce better psychiatrists? As currently structured, we will never know. The Milestones Project, like all previous changes, lacks the essential elements needed to answer this question. There are no validated, objective, meaningful outcome measures being collected at baseline that will be repeated in, say, 5 or 10 years after graduation, to show that the Milestones Project has made a difference.

The paradigm shift underlying the Milestones Project and the NAS is the use of "measurement-based outcomes." It is the first time that "outcomes" have been specifically incorporated into our educational system. Is this not enough? It is problematic on at least two counts. First, verifying completion of a milestone is a measure of adherence to a requirement on the input side. It is not an outcome. Mistaking input for outcome prevents us from building in true, meaningful outcome measures such as patient outcomes or adverse events in unsupervised practice [7, 9, 11]. Second, it has been consistently shown that "objective" assessments done by internal faculty are notoriously "subjective," to the point of passing students who should have failed [10]. Milestones will indeed be better at tracking the developmental stage of a resident [12], but grade inflation will continue [13], and essentially all residents will be labeled "competent" and graduate as before. The medical education literature reassures us that familiarity breeds graduates. Will the elaborate Milestones Project fulfill its many promises? Norman et al. have a categorical response: "Regrettably, these declarations appear to be more a matter of faith than of evidence" [14].

The Path Forward

No data show whether the previous series of mandated changes have been beneficial or, worse, harmful. This must change. There are two essential and urgently required next steps. First, a big step forward: define competent functioning of a real world psychiatrist using validated, meaningful, national measures that are external to the training program. Second, a small step back: as with clinical medicine, evidence, and not expert consensus or faith, must guide whether we adopt changes in medical education [15].

A Big Step Forward: Defining a Good Psychiatrist

ACGME dictates curriculum and sets targets to achieve before entering unsupervised practice but does not set graduation standards [1, 3]. Residents become board-eligible and capable of independent practice after being declared competent in all domains by training directors [3]. However, even this is not an acceptable endpoint: appropriately, we entrust this to a
separate national body, the American Board of Psychiatry and Neurology (ABPN), that was designed to be independent of both ACGME and training programs. ABPN certifies psychiatrists via national board examinations. These have been criticized for not being objective (especially the old part II oral exam), being too weighted toward one competency (knowledge), and not being meaningful, because board results do not correlate with real world functioning, such as better patient outcomes [8, 9].

It is now recognized that the primary goal of medical education, including in psychiatry, is to produce physicians whose patients have good clinical outcomes [7, 9, 11, 15]. Training programs and individuals must be judged on this. However, there are no studies of the impact of residency training in psychiatry on the clinical outcomes of its graduates.

Substantial differences among groups and training programs have been detected in other fields and can serve as models for psychiatry. Norcini et al. [9] studied mortality rates of 244,153 patients admitted for congestive heart failure or myocardial infarction treated by 6,113 physicians. They found a 9–16 % difference in patient outcome among groups depending on where the treating physician went to medical school. They write: "Much of the research on the competence of … graduates has focused largely on educational measures of quality. A more fundamental question is: Are there differences in clinical outcomes for patients cared for by these physicians?" ([9], p. 1462). In another example, Asch et al. [11] compared the complication rates for 4.9 million deliveries of 4,124 graduates from 107 US obstetrical residencies (after vaginal delivery, after cesarean delivery, and for all deliveries). They found that patients of graduates from the residencies in the bottom quintile had one-third more complications than those of graduates from the top quintile, showing meaningful differences in patient outcome depending on their residency program.
Individuals, too, must be judged on clinical outcomes. Because clinical outcomes of individual graduates are difficult to obtain and become available years after training is completed, surrogate markers, such as performance on examinations, need to be developed. Surrogates are practical and easier to administer but are meaningful only if they accurately and reliably predict primary clinical outcomes [15]. The richness of independent practice will probably require a portfolio of surrogate markers [16]. For instance:

Knowledge is easiest to assess, and passing a national examination is an acceptable minimum standard; however, examination results currently do not correlate with clinical functioning in the real world [8, 9, 11, 17, 18], so they cannot and must not remain the only measure. They can be made meaningful by repeatedly refining them until they reflect clinical outcomes.

Clinical skills are of critical importance but difficult to measure. Multiple objective structured clinical examination (OSCE) stations, with live, video, or standardized patients to evaluate a range of skills, judged by independent raters, make practical surrogates. These also become meaningful only when tied to clinical outcomes.

Attitudes and professionalism can be assessed via 360° evaluations and (adverse) reports to the national data bank, state boards, or medical staff.

Integrative functioning could be evaluated via Entrustable Professional Activities [19] using external examiners. Alternatively, Satish et al. [17] have shown that the best-ranked psychiatry residents about to enter unsupervised practice could not be identified by their examination scores but could be by a simulation exercise that has been validated for real world functioning.

We need studies to determine which minimum combination of examinations, OSCEs, 360° evaluations, results on simulations, and so on best predicts good patient outcomes in unsupervised practice.
A Small Step Back: Let Data Drive Changes in Medical Education

As with clinical medicine, evidence must guide changes in medical education. Credible prospective studies must show positive results before new changes are dictated. As with any untested proposed change, there are concerns regarding the usefulness [15], validity [14], practicality [2, 14], and substantial added administrative burden [2] of implementing the Milestones. This is particularly concerning because there is no evidence that Milestones lead to our desired outcome. Do we need any Milestones? Or 324 milestones in psychiatry, when all of internal medicine has just 278 [20]? We do not know. What we do need are data.

The burden of Milestones (dubbed "millstones" [14]) becomes even more important as we consider adding two more domains (with many more milestones) to the six domains we already have [21]. Interprofessional Collaboration aims to ensure the ability to engage in an interprofessional team in a manner that optimizes safe, effective patient- and population-centered care. Personal and Professional Development includes critical competencies such as trustworthiness, the ability to manage stress, flexibility, understanding one's limits, and capacity for leadership [21].

Could we take a step back and collect meaningful data first? Instead of one major untested change every decade, we recommend the continuous quality improvement model. For instance, the Model for Improvement's Plan-Do-Study-Act (PDSA) cycles for rapid-cycle tests of change ask: What are we trying to accomplish? How will we know if a change is an improvement? What changes can we make that will result in improvement? Data from PDSA cycles can then guide steady, demonstrably effective improvements.

Specifically, we would define meaningful clinical measures for a psychiatrist in unsupervised practice. We would
determine how our existing six-domain, pre-Milestones system contributes to these outcome measures and identify gaps and shortcomings. We would then specifically modify the curriculum to address these deficits. This can also be done at the program level. For instance, to address a lack of services in rural areas, we would build a specific curriculum and rotations in rural/underserved psychiatry, with telepsychiatry as a key modality. Then we can monitor improvement in availability of services in these underserved areas, refine our intervention as necessary, and collect data again.

Could we take a specific step back? As in any scientific system, could we pilot our new Milestones and prove that we produced better psychiatrists (for instance, that their patients had better clinical outcomes than the patients of those who were trained without Milestones) before we make this a national requirement? And could we show that eight domains are better than six before we institute them?

Major changes are being instituted, including Milestones in graduate education and a new Medical College Admission Test (MCAT) [22] in undergraduate education. There are no data to show that any of these changes will produce better physicians with improved clinical outcomes. To actually transform medical education, academic leaders must do three things. First, replace unproven expert consensus with scientific and evidence-based approaches. Studies are expensive but necessary; without them, unproven mandates distribute significant costs to all programs [2]. Recognizing this, the ACGME has taken bold steps to study the impact of a previously unproven mandate, restricted resident duty hours [15]. Second, exchange endpoints that are easy to assess but uncorrelated with real world functioning (e.g., multiple-choice examinations) for measures that are meaningful and external to the training program (e.g., patient outcomes).
This is difficult, but new technologies and big data enable a move in this direction [9, 11], a direction that the ACGME plans as its next step and "the accreditation system after the 'next accreditation system'" [10]. Third, academic leaders must recommend only those changes that are proven in rigorous studies before they are enforced. Only then will we be able to affirm—for the first time—that we are producing better physicians and psychiatrists today than ever before.
Implications for Academic Leaders

• As we work to transform medical education, academic leaders are encouraged to move away from unproven expert consensus to new directions that are scientific and evidence-based.

• Academic leaders must ensure that we reject endpoints that are easy to assess but uncorrelated with real-world functioning, such as multiple-choice examinations, and choose outcomes that are meaningful and external to the training program, such as patient outcomes.

• Academic leaders must ensure that recommended changes are proven in pilot studies before they are enforced. This would give these changes a good chance to achieve the desired outcome and avoid implementation of changes that may be wasteful or even harmful.
Disclosure Dr. Dewan has received a grant funded by the Health Resources and Services Administration (HRSA) and receives royalties from American Psychiatric Press, Inc.; John Wiley; and Taylor & Francis Group, LLC. He serves as a consultant to Streufert Consulting, LLC. Dr. Satish has received grants funded by Syracuse University/Center of Excellence/EPA and serves as a consultant to Streufert Consulting, LLC.
References

1. Nasca TJ, Philibert I, Brigham T, et al. The Next GME Accreditation System—rationale and benefits. NEJM. 2012:1–6.
2. Beresin E, Balon R, Coverdale J. The Psychiatry Milestones: new developments and challenges. Acad Psychiatry. 2014;38:249–52.
3. www.acgme.org/acgmeweb/Portals/0/PDFs/Milestones/PsychiatryMilestones.pdf; November 2013.
4. Hollender MH, Kaplan EA. Psychiatry as part of a mixed internship: a report based on five years of experience. Arch Gen Psychiatr. 1965;12:18–22.
5. Leach DC. The ACGME competencies: substance or form? J Am Coll Surg. 2001;192:396–8.
6. Magen J, Richards M, Ley AF. A proposal for the "Next-Generation Psychiatry Residency": responding to challenges of the future. Acad Psychiatry. 2013;37:375–9.
7. Golub R. Medical education theme issue 2014. Editorial. JAMA. 2014;311:918.
8. Cooke M, Irby D, O'Brien B. Educating physicians: a call for reform of medical school and residency. Jossey-Bass; 2010.
9. Norcini JJ, Boulet JR, Dauphinee WD, et al. Evaluating the quality of care provided by graduates of international medical schools. Health Aff. 2010;29:1461–8.
10. Nasca T, Weiss K, Bagian JP, et al. The accreditation system after the "next accreditation system". Acad Med. 2014;89:27–9.
11. Asch DA, Nicholson S, Srinivas S, et al. Evaluating obstetrical residency programs using patient outcomes. JAMA. 2009;302:1277–83.
12. Fazio SB, Papp KK, Torre DM, et al. Grade inflation in the internal medicine clerkship: a national survey. Teach Learn Med. 2013;25:71–6.
13. Carter W. Milestone myths and misperceptions. J Grad Med Educ. 2014 March:18–20.
14. Norman G, Norcini J, Bordage G. Competency-based education: milestones or millstones? J Grad Med Educ. 2014 March:1–6.
15. Dewan M, Manring J, Satish U. The Milestones hypothesis. J Grad Med Educ (in press).
16. Carraccio C, Englander R. Evaluating competence using a portfolio: a literature review and web-based application to the ACGME competencies. Teach Learn Med. 2004;16:381–7.
17. Satish U, Manring J, Gregory R, et al. Novel assessment of psychiatric residents: SMS simulations. ACGME Bulletin. 2009;1:18–23.
18. Krishnamurthy S, Satish U, Foster T, et al. Components of critical decision making and ABSITE assessment: toward a more comprehensive evaluation. J Grad Med Educ. 2009;1:273–7.
19. ten Cate O. Entrustability of professional activities and competency-based training. Med Educ. 2005;39:1176.
20. www.acgme.org/acgmeweb/Portals/0/PDFs/Milestones/InternalMedicineMilestones.pdf; 2012.
21. Englander R, Cameron T, Ballard A, et al. Toward a common taxonomy of competency domains for the health professions and competencies for physicians. Acad Med. 2013;88:1088–94.
22. Kaplan R, Satterfield J, Kington R. Building a better physician—the case for the new MCAT. NEJM. 2010;366:1265–8.