EDITORIALS

Practice Corner
The Science and Art of Measuring the Impact of an Article
Suresh Maximin, MD • Douglas Green, MD

“Not everything that can be counted counts, and not everything that counts can be counted.” —William Bruce Cameron

Measuring the impact of a scholar’s work is no mere academic exercise. To the scholar, impact metrics are essential for grants and promotions. A publisher relies on impact metrics to attract submissions and subscriptions. Impact metrics inform institutional decisions about hiring and funding. Impact is the currency of the academic economy.

Impact is best measured at the level of the individual article (1). The article-level impact can then be used to calculate author-, journal-, and institutional-level impact. But what should article-level metrics be measuring? That is, what “counts”? A paper published in a research-oriented journal such as Radiology stands on the validity of its results and the utility of its conclusions (2). A paper in an educational journal such as RadioGraphics stands on its effectiveness in helping readers become better radiologists and teachers. It is fair to say that none of the available article-level impact metrics count what counts. This has not deterred bibliometricians from striving to develop impact metrics that do count what counts. A look at the past may aid comprehension of present and potential future trends in impact metrics.

Impact 1.0

Traditional measures of impact date from an era when scientific communication was stored on paper. Eugene Garfield created the first citation index in the 1960s. Workers scanned reference lists and collated citations to published articles (3,4). A high citation count implied that an article was impactful. However, citation counts have important limitations. Citation counts measure the extent to which particular articles are integrated into the literature, not the validity of those articles. Thus, an article that purportedly linked vaccination to autism but that was later discredited has over 1700 citations. Citations within articles that report contradictory results or contain rebuttals are counted. An article may be cited solely for its methods, not its results. Citation counts also can be gamed by self-citing authors. Traditional citation indices do not weight the quality of the citing article. Finally, citations reflect the activity of authors, who represent less than 1% of an article’s users (5).

Garfield, along with Irving Sher, also conceived of the journal impact factor (JIF), defined as the ratio of the number of citations of any items in a journal to the number of source articles published in that journal (6). The JIF was intended to enable a comparison of journals within the same field. However, it quickly became a surrogate measure for the impact of an article, because citation counts took time to accrue and were expensive to obtain. This use of the JIF was problematic for several reasons. First, a small number of articles with metastatic citations can skew the JIF upward. Second, the JIF varies widely according to the number of active, publishing scholars in the particular field; for example, citations of an article published in a biology journal will be far more numerous than those of an article published in a mathematics journal (7). Last, the JIF can be gamed by publishers, who have been known to encourage authors to cite other articles published in their journals to boost the JIF (8).
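
To make the definition above concrete, the JIF is conventionally computed over a two-year window (a detail not spelled out in the text above); the figures in the example that follows are invented purely for illustration:

\[
\mathrm{JIF}_{Y} = \frac{\text{citations received in year } Y \text{ to items published in years } Y-1 \text{ and } Y-2}{\text{citable items published in years } Y-1 \text{ and } Y-2}
\]

For example, if a journal published 200 citable items in 2012 and 2013, and those items were cited 500 times during 2014, its 2014 JIF would be 500/200 = 2.5. A handful of heavily cited articles in the numerator can raise that figure even when most articles in the denominator are rarely cited, which is the skewing problem noted above.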

Impact 2.0

Now that scientific communications are stored on networks, the way impact is measured is changing. Citation counts are now readily available from sources such as CrossRef, Scopus, PubMed Central, ISI Web of Science, and Google Scholar. These sources use software programs that can identify and automatically extract all references, regardless of format (3). Newer measures of impact include usage metrics and alternative metrics (“altmetrics”). These impact metrics have two significant advantages over citation counts. First, they track the entire population of an article’s users, not just authors. Second, they are continuously updated. However, they also have limitations.
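
As a minimal illustration of how such counts can now be retrieved programmatically, the sketch below queries the public CrossRef REST API for a single article. The endpoint and the “is-referenced-by-count” field reflect that API as generally documented and may change; this is an illustrative sketch, not part of any workflow described in this article.

    # Sketch: retrieve a CrossRef-indexed citation count for one DOI.
    # Assumes the public CrossRef REST API and its "is-referenced-by-count"
    # field; both could change, so treat this as illustrative only.
    import requests

    def crossref_citation_count(doi: str) -> int:
        """Return the number of CrossRef-indexed citations pointing at the DOI."""
        response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        response.raise_for_status()
        return response.json()["message"].get("is-referenced-by-count", 0)

    if __name__ == "__main__":
        # DOI of this editorial, used here only as a convenient example.
        print(crossref_citation_count("10.1148/rg.341134008"))

Counts retrieved this way inherit every limitation of citation counting discussed above; easier retrieval does not make the number mean more.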

Basic usage metrics include the number of views of an HTML article page and the number of downloads of the article in PDF or XML format. In the past, RadioGraphics has provided these kinds of metrics, allowing readers to see which articles have been “most read” within a specified period of time and allowing authors to see how often their articles have been viewed. Public Library of Science (PLoS) journals have gone one step further, permitting readers to see exactly how many users have viewed or downloaded an article and to compare that number with a reference standard for other articles published in the same subject area. However, these basic usage metrics do not indicate whether an article was actually read. Advanced usage metrics that chart how long a user dwells on a page in a browser window and record the user’s “highlighting” of specific sections of text may somewhat compensate for that deficit. The MESUR project is working to clear away the roadblocks obstructing the collection of such usage data and to increase our understanding of that type of data (9).

Alternative metrics, or altmetrics, measure nontraditional usage. They track social media activity; count bookmarking “saves”; record nonscholarly citations in blogs, news articles, and wikis; and follow community input in the form of comments and ratings (1).

Social media activities include the posting of “tweets” on Twitter and “likes” and “shares” on Facebook pages. RadioGraphics readers can share links to articles via Twitter and Facebook by clicking on an icon in the browser window. Social media buzz marks attention to an article (ie, popularity), which may be less a function of the intrinsic value of the article than it is a function of the nature of “the journal or venue it appears in, the community built around the journal, and how the scholarly information is marketed around the journal” (10). The popularity of social media sites varies over time, and the significance of a tweet or a “like” depends on the month and year.

Popular social bookmarking sites include CiteULike, Connotea, and Mendeley. Users can save a reference to one of these sites, then access the reference from any of their network-connected devices. The sites have the added value of directing users to other articles of interest by using predictive algorithms, such as those used by Amazon and Netflix. However, saving a reference to one of these sites signals an intention to read but does not mean that an article was in fact read.

Nonscholarly citations include blogs, news articles, and wikis. Science blogs posted to ResearchBlogging are formatted to allow citation analysis, a feature that attracts the professional science writer. Such blogs broadcast the hallway conversations of the “invisible college,” the virtual community of network-connected scientists. These postings can be counted. Wikipedia citations are presented in a standardized format and are therefore easily counted. PLoS allows readers to see how many blogs and Wikipedia articles cite an article that they are viewing; interested readers can then “drill down” to see what is being said in those blogs and written about in Wikipedia (which is what counts). News stories that do not clearly identify the articles on which the reporting is based make counting more difficult (1).

Community input can take several forms. Commenting threads that follow an article are a familiar feature of online publications. Internet-enabled commercial Web sites such as eBay and Amazon make extensive use of community ranking systems; physicians who search for themselves on the Web will quickly realize that patients are ranking them by using the same systems. However, community rating systems and commenting threads do not usually provide information about the reputations of those producing the ratings or comments. Although PLoS journals have incorporated both features, readers are not using them (1).
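
Because the events described above (tweets, saves, blog citations, comments) arrive as simple counts, it is easy to see how a provider might roll them into a single score, and also why such a score is hard to interpret. The sketch below tallies hypothetical event counts with arbitrary weights chosen only for illustration; it is not the formula used by PLoS or any other provider.

    # Illustrative altmetric tally. The event names and weights are arbitrary
    # assumptions for this sketch, not any provider's actual scoring formula.
    from collections import Counter

    ILLUSTRATIVE_WEIGHTS = {
        "tweet": 1,               # social media attention (popularity)
        "facebook_share": 1,
        "bookmark_save": 2,       # signals an intention to read, not a reading
        "blog_citation": 5,       # nonscholarly citations in blogs, news, wikis
        "wikipedia_citation": 5,
        "comment": 3,             # community input
    }

    def altmetric_tally(events):
        """Count each event type and compute a weighted composite score."""
        counts = Counter(events)
        score = sum(ILLUSTRATIVE_WEIGHTS.get(name, 0) * n for name, n in counts.items())
        return counts, score

    if __name__ == "__main__":
        sample = ["tweet"] * 40 + ["bookmark_save"] * 12 + ["blog_citation"] * 2
        counts, score = altmetric_tally(sample)
        print(counts, score)  # a high score marks attention, not validity

Changing the weights changes the ranking of articles, which is exactly the problem: such a score measures whatever the weights say it measures, not necessarily what counts.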

Impact 3.0

In the “information-rich, user-engaged environment” that we are moving toward, impact measurements will incorporate many more variables. In the article “The New Metrics of Scholarly Authority,” Michael Jensen listed among these variables the reputation of the publisher (if any); the reputation of prepublication peer reviewers (if any); the reputation of commenters; the percentage of the document that is quoted by other documents; links to the document, weighted by the reputation of the source of the links; attention in blogs or comment threads, with the nature of the language in the discussion, either positive or negative, noted; the quality of the author’s institutional affiliation; the significance of the author’s other work; the significance rating of all the information that the author has used in creating the document; and the inclusion of the document “in lists of ‘best of,’ in syllabi, indexes, and other human distillations” (11). Obviously, it will be a while before all these tools are available, and longer still before we are able to determine whether they count what counts.

Using Experts to Determine What Counts

As Nobel laureate and metric skeptic Sydney Brenner argued, “What matters absolutely is the scientific content of a paper, and nothing will substitute for either knowing or reading it.” It is not possible for an individual to know or read all the work by a prolific author, let alone all the content included in a journal or published by an organization. Still, we need to know whether that work counts.

Passing through prepublication peer review does not mean that an article counts. An article that was rejected by several journals can get published eventually, especially online, where the cost of publication may be lower than it is for print. An intentionally bogus manuscript from a nonexistent academic institution can pass through the peer-review process of multiple journals (12). There are even fraudulent journals with sham peer review processes (13).

Postpublication peer review, however, is a promising new development. Postpublication peer review is an Internet-enabled combination of community rating systems and commenting threads in which only prequalified authorities are allowed to comment. An example is Faculty of 1000 (F1000), a subscription service that has enlisted 6000 expert scientists and clinical researchers (radiology is not yet one of the clinical specialties included), assisted by 5000 designated associates, to canvass 4000 peer-reviewed journals, score articles (as good *, very good **, or exceptional ***), and provide brief comments (14). This model has two substantial advantages over traditional peer review: because it opens up the process to a larger set of reviewers, it is scalable; in addition, it is transparent. Transparency reduces the risk that a reviewer may criticize an article for reasons other than its content but heightens the risk that a reviewer who fears blowback may temper criticism of the article.
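
The kind of record this model produces (an identified reviewer, a one-to-three-star rating, and a brief comment) can be sketched in a few lines. The class, field names, and sample entries below are this sketch’s own inventions for illustration; they are not F1000’s implementation.

    # Sketch of a postpublication review record in the style described above:
    # an identified reviewer, a 1-3 star rating, and a brief comment.
    # The structure and the sample data are invented for illustration.
    from dataclasses import dataclass
    from statistics import mean

    STAR_LABELS = {1: "good", 2: "very good", 3: "exceptional"}

    @dataclass
    class Recommendation:
        reviewer: str   # named, prequalified authority (transparency)
        stars: int      # 1 = good, 2 = very good, 3 = exceptional
        comment: str

    def summarize(recommendations):
        """Summarize the postpublication reviews of one article."""
        avg = mean(r.stars for r in recommendations)
        return f"{len(recommendations)} recommendations, mean rating {avg:.1f}"

    if __name__ == "__main__":
        reviews = [
            Recommendation("Reviewer A", 3, "Changes how I interpret these studies."),
            Recommendation("Reviewer B", 2, "Sound methods; modest clinical impact."),
        ]
        print(summarize(reviews))

Because each entry names its reviewer, the record is transparent in the way described above; that same transparency is what may tempt a reviewer to soften criticism.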

Meta-Impact: Is Counting Really Such a Good Idea?

One problem in measuring impact is the “observer effect,” the change in measurements, states, or activities that is caused by the very act of observation. The effect has been noted in many fields, including quantum physics (the quantum Zeno effect), behavioral psychology (the Hawthorne effect), and programming (the Heisenbug). When measuring the impact of scientific output, there can be little doubt that an important effect of that measurement will be an increase in whatever is being measured. The effect may be negative (eg, encouraging publishers to “game” citation counts) or positive (eg, encouraging authors to maximize the accessibility and relevance of their work so as to improve social media altmetrics).

Of greater concern is the possibility that emphasizing particular metrics of impact may affect the direction in which science evolves. Herding behavior, rushing to the hottest and latest, is best known in behavioral finance but also occurs in scientific research. Modern metrics might potentiate and intensify this behavior by steering scholars to “hot” fields. Still, there is a perceived need for assessments of scientific impact. Bibliometricians of the future may duplicate the feat of the baseball sabermetricians and devise a suite of moneyball metrics for scholarly work. Until that time, we will have to form our own opinions by reading the sources ourselves, consulting other experts who have established a reputation as trustworthy peer reviewers, and eavesdropping on hallway chatter in the invisible college.

References
1. Binfield P. Article level metrics at PLoS & beyond [Webcast]. http://www.sparc.arl.org/news/now-online-peter-binfield-webcast-article-level-metrics. Published April 12, 2012. Accessed October 25, 2013.
2. Ioannidis JP. Evolution and translation of research findings: from bench to where? PLoS Clin Trials 2007;1(7):e36.
3. Adam D. The counting house. Nature 2002;415(6873):726–729.
4. Jiménez-Contreras E, Delgado López-Cózar E, Ruiz-Pérez R, Fernández VM. Impact-factor rewards affect Spanish research [comment on Adam D, The counting house]. Nature 2002;417(6892):898.
5. Buschman M, Michalek A. Are alternative metrics still alternative? Bull Am Soc Inf Sci Technol 2013;39(4):35–39.
6. Garfield E. The agony and the ecstasy: the history and meaning of the journal impact factor. Presented at the International Congress on Peer Review and Biomedical Publication, Chicago, Ill, September 16, 2005.
7. Mabe MA. Looking at metrics: counting and what counts. http://www.casalini.it/retreat/2013_docs/7_Mabe.pdf. Accessed October 25, 2013.
8. Arnold DN, Fowler KK. Nefarious numbers. Not Am Math Soc 2011;58(3):434–437.
9. Bollen J, Van de Sompel H, Rodriguez MA. Towards usage-based impact metrics: first results from the MESUR project. In: JCDL ’08: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries, Pittsburgh, Pa, June 16–20, 2008.
10. Eysenbach G. Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact. J Med Internet Res 2011;13(4):e123.
11. Jensen M. The new metrics of scholarly authority. Chron High Educ 2007;53(41):B6.
12. Science’s Sokal moment. The Economist. October 5–11, 2013.
13. Looks good on paper. The Economist. September 28–October 4, 2013.
14. Faculty of 1000 Web site. http://f1000.com/prime?hp=1. Accessed October 25, 2013.
