
Accountability in Research, 21:241–264, 2014 ISSN: 0898-9621 print / 1545-5815 online DOI: 10.1080/08989621.2013.848798

Analysis of Three Factors Possibly Influencing the Outcome of a Science Review Process

John Araujo, Ph.D., M.H.S.A., Neelam D. Ghiya, M.P.H., Angela Calugar, M.D., M.P.H., and Tanja Popovic, M.D., Ph.D.

Office of the Associate Director for Science, Office of the Director, Centers for Disease Control and Prevention, Atlanta, Georgia, USA

We analyzed a process for the annual selection of a Federal agency's best peer-reviewed scientific papers, with the goal of developing a relatively simple method that would use publicly available data to assess the presence of factors, other than scientific excellence and merit, in an award-making process that is meant to recognize scientific excellence and merit. Our specific goals were (a) to determine whether journal, disease category, or major paper topics affected the scientific-review outcome by (b) developing design and analytic approaches to detect potential bias in the scientific review process. While journal, disease category, and major paper topics were indeed unrelated to winning, our methodology was sensitive enough to detect differences between the ranks of journals for winners and non-winners.

INTRODUCTION

In science, one of the several functions served by review is the assurance of scientific excellence. The principle of scientific review, and especially peer review, establishes this function. The implementation of the review process (whether for publication in journals or awarding funding, as the two most frequent reasons for conducting the review) is a challenge well recognized both by scientists and those conducting the review (Fuller, 2002). Peers, in the sense of "mutual trust and respect" and recognized expertise (Fuller, 2002), are particularly suited for the assurance function of scientific review.

This article not subject to US copyright law. Address correspondence to Tanja Popovic, Deputy Associate Director for Science, CDC, 1600 Clifton Road, MS D50, Atlanta, Georgia 30333, USA. E-mail: [email protected] Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/gacr.


However, in some sense, the "peer" component of scientific review is absent when that review is conducted by prestigious scientists with broad knowledge in an area of inquiry but not necessarily recognized expertise in the specific topics or disciplines within that area on which they are invited to review, even when these scientists have garnered the "mutual trust and respect" of their science peers. Thus, it becomes particularly important and useful to evaluate this type of departure from the usual peer review in science to determine if this departure still can uphold the assurance function of scientific excellence. Assessing a review process can be especially difficult when the review process already has been institutionalized. Even so, if it is possible to develop an approach to evaluate a review process, then certain conclusions might be possible about the assurance of scientific excellence, which in turn could be matched to a principle of scientific review. This intriguing and challenging question of assessing a review process is the purpose of this paper.

The motivation for this study arose from the review process associated with the most prestigious science award available to scientists at the U.S. Centers for Disease Control and Prevention (CDC), the Charles C. Shepard Science Award. This science award seeks to recognize the premier science conducted by CDC scientists, alone or in collaboration with scientists around the world, and the award attests to scientific excellence via the published work of CDC's scientists. The award has been made annually since its inception in 1986. The awardees, one from each of three public health science categories (Assessment and Epidemiology [AE], Laboratory and Methods [LM], and Prevention and Control [PC]), are selected via formal nomination by the agency's organizational units. The review process is conducted by a cross-cutting matrix of agency scientists who carry out reviews in one of the three public health science categories. Their objective is to identify the best representation of scientific excellence among a heterogeneous set of papers within each of these public health science categories based on specific criteria related to scientific merit (originality, difficulty, efficiency or quality, and clarity) and impact (importance and significance). This means that no one reviewer will be a peer with recognized expertise (regarding merit and impact) across the whole heterogeneous cadre of papers within a public health science category; even so, there is little doubt that identifying excellence is aided by assembling the agency's preeminent scientists (Honjo, 2005) to serve as reviewers.

We leveraged 10-year publication patterns of the published papers nominated for this award, from among which the three annual winners are selected. We analyzed publicly available data (i.e., publication metadata) and constructed comparisons from these naturally occurring data that appeared to support reasonable, parsimonious conclusions about the susceptibility of the process to potential bias that could compromise the integrity of this review process. Our goals were to develop a relatively simple method that would use publicly available data and be able to assess the presence of factors, other than scientific excellence and merit, in an award-making process that is to recognize scientific excellence and merit.


The distinction that we draw between our work and others is that we were not attempting to independently verify and validate the actual decision about the award; rather, our assessment was to determine if three factors that should not be involved in the award-making process had, in fact, seeped into it. Further, we wanted to share this approach so that other reviewers interested in evaluating their review processes could adopt, adapt, or extend it for their own review processes. Bornmann and colleagues also have assessed the panel review process. However, their approach is based on actually assessing the scientific impact of research, while we were interested in whether factors other than the excellence of the paper itself could affect scientific award making (Bornmann et al., 2008; Bornmann et al., 2010; Bornmann, 2012; Bornmann et al., 2012).

METHOD

Data Collection Approach
CDC maintains a cumulative, year-by-year electronic bibliography of finalists' papers (including winners' papers, which are a subset of the finalists' papers) for the Charles C. Shepard Science Award (SSA) (Office of the Associate Director for Science, 2011). This SSA bibliography arises from the nomination and review process that produces the annual set of finalists in each of the three public health science categories, from approximately 3,000 publications that CDC staff author each year. The three public health science categories are AE, LM, and PC. AE papers report on "applied studies characterizing diseases and other health-related parameters and address some aspect of an infectious or chronic disease or other health conditions (e.g., injuries)." "Studies that address methodological problems or describe new methods or procedures" are in the LM space, and PC papers arise from "applied studies of the control or prevention of disease and other health conditions." The nomination and review process for selecting finalists occurs within CDC's major programmatic units: centers, institute, or offices (CIO).1 The scientific review process for selecting winners from the CIO finalists is carried out by an average of 15 agency scientists (drawn from across CDC) per public health science category, for a total of 45 reviewers. The entries in this bibliography of finalists' papers eligible for winning in the SSA years from 2001 through 2010 (n = 574) were the publications analyzed to study the scientific review process for selecting winners.


Via Perl scripting (Wall, 2007; Berman, 2007), we constructed a tag search strategy from the SSA bibliography to retrieve the bibliographic citation record (BCR) from MEDLINE®/PubMed® (hereafter "MEDLINE") for each finalist SSA paper indexed in MEDLINE (U.S. National Library of Medicine, 2007b). Author, title, and publication year data from each entry in the CDC SSA bibliography were the basis for the tag search strategy, which returned the BCR data in Extensible Markup Language (XML) format from MEDLINE (U.S. National Library of Medicine, 2008). The total number of SSA papers indexed in MEDLINE was 559 (or 97% of all finalists' papers) for the 10 years (2001–2010). The remaining 15 papers not indexed in MEDLINE were on topics outside the MEDLINE scope (i.e., "journal articles in life sciences with a concentration in biomedicine"). Examples of SSA papers not indexed in MEDLINE appeared in outlets such as the Journal of Hydrologic Engineering, The American Statistician, and Human and Ecological Risk Assessment. Since the scope of public health science extends beyond "biomedicine" in the "life sciences," it was not surprising that not all of the nominations were indexed in MEDLINE.

The XML format of the citation record returns elements, or metadata, indexed in MEDLINE that were used in our analyses and included all of the usual citation elements (e.g., authors, article title, journal name and issue, International Standard Serial Number [ISSN], etc.) as well as the Medical Subject Headings (MeSH®) indexed with each BCR. MeSH® make up the "controlled vocabulary thesaurus" of the National Library of Medicine used to index articles appearing in MEDLINE (U.S. National Library of Medicine, 2011). MeSH® are the concepts tagged (i.e., attached or indexed) to each citation record and by which articles can be identified in MEDLINE via a tag search strategy. This means, for example, that articles on "population surveillance" could be identified by creating a search query that would ask for all BCRs tagged (indexed) with the "population surveillance" MeSH® term. In addition to BCR data from MEDLINE, we obtained a 2010 copy of the entire MeSH® vocabulary in XML format and the 2010 MeSH® tree structure in American Standard Code for Information Interchange (ASCII) format (U.S. National Library of Medicine, 2007a), which enabled analyses of relationships between (a) paper category and winner selection and (b) paper major topics and winner selection. How we defined category and major topics appears below.

Journal Impact Factor (JIF) is one of at least 39 indicators of publication influence, and JIF may represent degrees of "prestige" vs. "popularity" (Bollen et al., 2009). JIF "is a measure of the frequency with which the 'average article' in a journal has been cited in a particular year or period" (Thomson Reuters, 2013); specifically, in our case, the average number of times papers published in a particular journal were cited over a previous two-year period (Garfield, 1999). For the purpose of our analyses, JIF was analyzed because agency scientists may view it both as an indicator of scientific merit and impact and as a possible "reason" why a paper wins. The JIF data set for 2008 was used in our analyses and was provided by Thomson Reuters (Thomson Reuters, 2011).
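The article does not reproduce the tag search strategy itself. As a rough illustration of the retrieval step, the sketch below (in Python rather than the authors' Perl) combines author, title, and publication-year field tags into a PubMed query and fetches the matching bibliographic citation record as XML through NCBI's public E-utilities. The specific field tags and helper name are illustrative assumptions, not the authors' script.

```python
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_citation_xml(first_author, title, year):
    """Look up one SSA paper in PubMed and return its citation record as XML.

    Sketch only: the field tags below ([au], [ti], [dp]) are an assumed,
    generic way to combine author, title, and publication year; the authors'
    actual Perl tag search strategy is not published in the article.
    """
    term = f"{first_author}[au] AND {title}[ti] AND {year}[dp]"
    search_url = EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmax": "1"})
    with urllib.request.urlopen(search_url) as resp:
        search_xml = resp.read().decode()

    # Crude extraction of the first PMID from the esearch result.
    start = search_xml.find("<Id>")
    if start == -1:
        return None  # not indexed in MEDLINE (true for 15 of the 574 papers)
    pmid = search_xml[start + 4:search_xml.find("</Id>", start)]

    fetch_url = EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(
        {"db": "pubmed", "id": pmid, "retmode": "xml"})
    with urllib.request.urlopen(fetch_url) as resp:
        return resp.read().decode()  # full bibliographic citation record (BCR)
```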


When required by the analytic techniques, MEDLINE and Thomson Reuters data were linked via journal ISSN or via journal title, when JIF could not be assigned via ISSN.2 Thomson Reuters JIF data did not cover 10 journals that published 13 SSA articles. The XML MEDLINE BCR and MeSH® data were mapped into SAS data tables with SAS XML Mapper (SAS Institute Inc., 2009), and then data sets were constructed with SAS 9.2 (SAS Institute Inc., 2008). Thomson Reuters’ data were read directly into SAS data sets.
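A minimal sketch of that linkage logic, assuming simple tabular inputs (the authors used SAS; the column names here are illustrative, not the study's actual layout): ISSN is tried first, and the journal title is used as a fallback where the ISSN match fails (e.g., electronic vs. print ISSNs).

```python
import pandas as pd

def attach_jif(bcr: pd.DataFrame, jif: pd.DataFrame) -> pd.DataFrame:
    """Attach a Journal Impact Factor to each citation record.

    bcr: one row per SSA paper, with columns 'issn' and 'journal_title'.
    jif: one row per journal, with columns 'issn', 'journal_title', 'jif'.
    """
    # First pass: match on ISSN.
    merged = bcr.merge(jif[["issn", "jif"]], on="issn", how="left")

    # Second pass: where the ISSN match failed, fall back to the journal title.
    by_title = jif.drop_duplicates("journal_title").set_index("journal_title")["jif"]
    missing = merged["jif"].isna()
    merged.loc[missing, "jif"] = merged.loc[missing, "journal_title"].map(by_title)
    return merged
```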

Analytic Framework, Indicators, and Analytic Methods

Analytic Framework
We created an analytic framework driven by three overarching indicator questions related to the selection of winners from the finalists, which were based on generally accepted indicators of scientific merit and impact at CDC. Each of the indicator questions is developed and described in the Indicators section below.

Indicators
The indicators we analyzed have been recognized both formally and informally by agency scientists as related to the merit and impact of agency science. We questioned whether the same indicators also could have a certain influence during the process of selecting the winners; in other words, was the winner selected entirely for its scientific merit, or were other factors, such as the journal in which the article appeared, influencing the selection of winners? Thus, the design and analysis tasks were to separate these two attributions: an indicator as a "cause" of winning (which is a bias-like effect) vs. an indicator as a "correlate" of winning (which is not a bias-like effect), or even an indicator unrelated to winning. To perform this separation of attributions (because we could not perform the preferred experimental design approach), we developed an approach using three indicator questions to understand the scientific review process for the selection of the winners after the three pools of finalists (i.e., AE, LM, and PC) were determined from the nomination, review, and selection processes. This process occurred within CIOs of the agency, with guidance from the Office of the Associate Director for Science (OADS) (e.g., the maximum number of finalists per CIO derived from the overall number of peer-reviewed publications per CIO, authorship, and publication year of the paper in a peer-reviewed journal). Identifying the finalists, including the assignment of papers to one of the three SSA public health science categories, was an internal CIO responsibility, and the CIO identification and narrowing processes were not within the scope of this paper. It also is clear that this analysis began with the outputs of the CIO nomination process, and any factors that might have systematically skewed a CIO nomination process over the 10 years from which the study data were drawn were not known to this analysis.


Indicator Question 1: Journal Influence. Journal influence was defined by both frequency and JIF. For frequency, the unit of analysis was the journal title,3 which led to tabulating paper frequencies by journal title and then ranking the journals by the number of papers published in them. The paper frequencies by journal title produced rank-frequency profiles of journals for winners and non-winners for analysis with an approach based on Zipf's empirical law (Salton and McGill, 1983). Zipf's law, referenced in lexical statistics (i.e., the "statistical study of the distribution of words and other units" in a corpus) (Baroni and Evert, 2006), states that the product of rank and frequency approximately equals a constant: as the numerical rank moves from the first position (i.e., from 1, the highest rank) to V (the lowest rank), the frequency of each journal title decreases.4 The method of ordinary least squares regression provides a simple interpretation of the rank-frequency relationship as a slope, and the slopes can be compared by testing the equality of two slopes (Edwards, 1976), where each slope represents a rank-frequency relationship, one for the winners and the other for the non-winners. The rank-frequency relationships are illustrated in Table 1. The rank-frequency slopes may vary depending upon the distribution of frequencies; thus, it is possible to determine whether the journal frequency distributions for winners and non-winners were similar. If the slopes of the two plots are not statistically different from each other, they are parallel, meaning that the journal title did not influence (in the meaning of "cause") winner selection. When the slopes are not parallel, they diverge on the y-axis and two possibilities exist. If the slope of the non-winners is steeper than that of the winners (a situation arithmetically defined as a positive difference), this suggests that the journal title did not influence winner selection. Alternatively, if the difference between the slopes is negative (meaning that the winners' slope is steeper than that of the non-winners), then the distribution of winners' papers is concentrated in higher, frequency-ranked journals as compared to non-winners.

We also compared the mean JIFs for non-winners' and winners' papers and used the outcome to indicate whether winning was a JIF-biased event, based on the sign (+/-) of the difference, if one was detected. If the average winners' JIF was equal to or significantly less than the non-winners' JIF, then we concluded that JIF was not an influencer during the winner selection process. The remaining, greater-than result, however, does not make it possible to differentiate the two attributions, cause or correlate, of winning.

Matching papers based on a common attribute, such as the same journal titles for both winners and non-winners, creates a degree of similarity that must be overcome by paper characteristics in order to distinguish between them.
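A sketch of the rank-frequency construction and the regression it feeds, assuming (as Fig. 2 suggests) that both frequency and rank are log10-transformed before the ordinary least squares fit; the helper names and the tie handling shown are illustrative:

```python
from collections import Counter
import numpy as np

def rank_frequency(journal_titles):
    """Rank-frequency profile: journals ranked by how many papers they published.

    Ties share a rank, as in Table 1 (e.g., the three journals with 20 finalist
    papers all take rank 3, and the next journal takes rank 4).
    """
    counts = Counter(journal_titles)
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of = {freq: rank for rank, freq in enumerate(distinct, start=1)}
    return [(rank_of[freq], freq) for freq in counts.values()]

def loglog_slope(pairs):
    """OLS fit of log10(frequency) on log10(rank); returns (slope, SE, SSx, SSE, n)."""
    x = np.log10([rank for rank, _ in pairs])
    y = np.log10([freq for _, freq in pairs])
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, sse = len(pairs), float(np.sum(resid ** 2))
    ssx = float(np.sum((x - x.mean()) ** 2))
    se = np.sqrt((sse / (n - 2)) / ssx)
    return coef[1], se, ssx, sse, n
```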

Table 1: The 13 journals publishing SSA winners' papers

Journal                                                  JIF      Finalists         Non-Winners       Winners
                                                                  Freq.    Rank     Freq.    Rank     Freq.    Rank
Journal of the American Medical Association              31.718    58       1        53       1        5        2
The New England Journal of Medicine                      50.017    46       2        37       2        9        1
American Journal of Epidemiology                           5.454    20       3        19       3        1        6
Lancet                                                    28.409    20       3        16       5        4        3
The Journal of Infectious Diseases                         5.682    20       3        19       3        1        6
The American Journal of Tropical Medicine and Hygiene      2.450    18       4        17       4        1        6
American Journal of Preventive Medicine                    3.766    15       5        13       6        2        5
Journal of Virology                                        5.308    11       6        10       7        1        6
Science                                                   28.103     8       7         5       8        3        4
Analytical Chemistry                                        5.712     5       8         4       9        1        6
Vaccine                                                     3.298     2       9         1      10        1        6
Inquiry: Journal of Medical Care Organization               .528     1      10         0       -        1        6
AIDS Research and Human Retroviruses                       2.024     1      10         0       -        1        6
Paper Totals                                                        225               194               31
JIF (M ± SD)                                                      23.790 ± 17.753   23.237 ± 17.613   27.247 ± 18.529

Note. These journals, publishing SSA winners' papers, served as the basis for creating a matched set of journal titles for analyzing the two journal influence indicators: JIF and frequency-based rank. Reported are the mean JIFs as well as the number of finalists, non-winners, and winners' papers for each journal, from 2001-2010, which created rank-frequency profiles for each of these three competition categories. Except for 2010, there was one winner in each of the SSA public health science categories. In 2010, there was a two-way tie in the PC science category. JIF = Journal Impact Factor. M = Mean. PC = Prevention and Control. SD = Standard Deviation. SSA = Shepard Science Award.


Matching on the common attribute of journal title addresses what could be, in our view, the single most significant source of bias in the winner selection process. Matching also is a form of statistical control for a factor that could be introduced into winner selection. While matching based on a common attribute (which for the indicator of journal influence was journal title) only begins to approximate our preferred approach (i.e., the a priori assurance that all finalists' papers were equivalent on potential sources of selection bias), it is not equivalent to matching followed by random assignment. An analysis based on matching followed by actual random assignment to winner and non-winner groups, however, would be impossible for this research because the winner selection process itself determines the winner and non-winner groups, as described previously. Therefore, we worked from the premise that the indicator of journal influence could reveal potential sources of bias if we created an appropriate analytic data set. For the 574 papers that entered the SSA competition, a journal in which both winners and non-winners had been published was used as a common attribute. There were 31 winners published in 13 different journals and 194 non-winners also published in the same 13 journals, bringing our analytic data set to 225 papers. The potential influence of the journal in which a paper has been published could be measured by slope analysis or by JIF and could suggest whether winning and a journal influence indicator were related.

Indicator Question 2: Paper Disease Category. CDC has a broad role that covers the entire spectrum of diseases and conditions that affect the public, not only its better-known but narrower focus on infectious diseases. We used the broadest and most widely recognized indexing method, based on MeSH®. Papers indexed in MEDLINE are assigned MeSH® terms, drawn from a controlled, structured vocabulary of approximately 26,000 biomedical concepts or topics in the life sciences, based on the concepts or topics covered in those papers. The terms in this controlled vocabulary are grouped to form organizational schemas, and the most general grouping of terms forms 16 categories, such as "Anatomy," "Organisms," and "Diseases," as three examples of the 16 (U.S. National Library of Medicine, 2011). Because each category within itself forms a "branching" hierarchical arrangement of concepts with increasing specificity, categories also carry the name "trees." Since MeSH® terms are assigned to one or more trees, via these terms we could determine the tree or trees to which each SSA paper belonged and then follow the branching hierarchy of the "Diseases" tree to determine whether the paper could be classified as IF or NIF, or as being about another (i.e., "Other") non-disease topic when none of the MeSH® terms for the major concepts in an SSA paper were located in the "Diseases" tree. The general algorithm for this approach is illustrated in Fig. 1. The details of this general approach follow.5


Figure 1: The approach for paper categorization by disease category. The approach leverages the MeSH® terms assigned to each SSA paper and then identifies the trees in which those terms appear, leading to categorization as IF or NIF or a non-disease (i.e., Other) topic.

Numerous MeSH® terms are assigned to any paper indexed in MEDLINE. These MeSH® terms are called "descriptors," and when the concept represented by the "descriptor" is refined, a "qualifier" is attached to the MeSH® term (cf. Table 2). The indexing process, where MeSH® terms are assigned, also specifically identifies the major concepts covered in each paper. A major-concept indicator (i.e., "Y" or "N") is attached to each indexed term. A major concept can be represented as a stand-alone, unqualified "descriptor" concept or can be represented by the combination of descriptor and qualifier together (refer to Table 2 for an example of each of these representations of a major concept). Even though a major topic can be represented by a stand-alone or a qualified MeSH® term (via the use of qualifiers), only stand-alone terms, or descriptors, can be located in a tree. Therefore, all MeSH® terms denoting major concepts and assigned to each SSA paper were "de-qualified" and de-duplicated, as necessary, in order to place each bare, major-concept MeSH® term within all of its trees. If at least one of these major concept terms appeared in the "Bacterial Infections and Mycoses," "Virus Diseases," or "Parasitic Diseases" branch of the "Diseases" tree, then the SSA paper was classified IF.


Table 2: An example of the application of the paper disease categorization method

Descriptor Name*                         Major Topic   Qualifier Name                 Major Topic   Descriptor Qualified                           Tree Search String                       Tree Set Location(s)
Adolescent Behavior                      Y             Physiology                     N             Adolescent Behavior/Physiology                 Adolescent Behavior                      F01.145.022
Condoms                                  N             Utilization                    Y             Condoms/Utilization                            Condoms                                  E07.190.270.150
HIV Infections                           N             Prevention & Control           Y             HIV Infections/Prevention & Control            HIV Infections                           C02.782.815.616.400; C02.800.801.400; C20.673.480
Health Knowledge, Attitudes, Practice    Y             -                              -             Health Knowledge, Attitudes, Practice          Health Knowledge, Attitudes, Practice    F01.100.150.500; N05.300.150.410
Persuasive Communication                 Y             -                              -             Persuasive Communication                       Persuasive Communication                 F01.145.209.631; L01.143.762
Sexual Behavior                          N             Statistics & Numerical Data    Y             Sexual Behavior/Statistics & Numerical Data    Sexual Behavior                          F01.145.802

*Descriptor Name and Qualifier Name capitalization complies with MeSH®.
Note. An SSA article (Kennedy et al., 2000) illustrating the application of the disease categorization method. This paper was indexed in MEDLINE with 19 MeSH® entries; however, the major paper topics were indexed by three descriptors (Adolescent Behavior; Health Knowledge, Attitudes, Practice; and Persuasive Communication) and three qualified descriptors (Condoms/Utilization; HIV Infections/Prevention & Control; and Sexual Behavior/Statistics & Numerical Data), indicated with Y's. These six unique descriptors were located within one or more "trees," and since at least one major concept MeSH® term appeared in the "Virus Diseases" branch (i.e., C02), this SSA article was classified IF.


If one of the major concept terms was located within any of the other branches of the "Diseases" tree and none of the major concept terms appeared in any one of the three IF branches, then the paper was classified NIF. If none of the major concept terms appeared in the "Diseases" tree, then the paper was classified "Other," as a non-disease paper. Examples of "Other" topics can be adduced from inspecting Fig. 1. A few specific "Other" topics from the MEDLINE-indexed data set are "War" (Spiegel and Salama, 2000), "Clinical Laboratory Techniques" (Shahangian and Cohn, 2000), and "Health Insurance" (Decker, 2009). The paper categorization approach based on MeSH® is illustrated in Table 2 with one of the 2001 SSA IF papers (Kennedy et al., 2000).

To validate the MEDLINE-based algorithm for paper disease categorization, which used MeSH® terms and the associated tree structures, a senior scientist experienced in peer review (i.e., the senior author, serving as a subject matter expert [SME]) independently categorized the SSA winners' papers (n = 31) by examining each paper title and then making the disease categorization assignment of IF or NIF. Because SME categorization assigned papers only to the IF or NIF categories, MEDLINE "Other" winners' papers (n = 6) were assigned the NIF category for the reliability analyses. Reliability between the two approaches (categorization by title or categorization by MEDLINE) was based on overall percentage agreement (84%) and the computation of kappa (.63) (Cohen, 1960).

The analysis consisted of comparing the observed frequencies of papers in each paper disease category and winning status to their expected frequencies, which were derived as if paper disease category and winning status were independent of each other. The data set for this analysis consisted of all papers with MeSH® assigned (n = 557) and used the chi square distribution as the basis for the test of independence between paper disease category and winning status. We would conclude that paper disease category and winning status were independent of each other if observed and expected frequencies were not significantly different from each other.
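As a compact restatement of the rule above, the sketch below classifies one paper from the tree numbers of its de-qualified major-concept descriptors; the C01, C02, and C03 prefixes correspond to the three IF branches (cf. Note 5). The function name and input format are illustrative assumptions.

```python
IF_BRANCHES = ("C01", "C02", "C03")  # Bacterial Infections and Mycoses,
                                     # Virus Diseases, Parasitic Diseases

def disease_category(major_descriptor_trees):
    """Classify one SSA paper as 'IF', 'NIF', or 'Other'.

    major_descriptor_trees: for each major-concept descriptor (qualifiers
    stripped), the list of MeSH tree numbers in which it appears, e.g.
    [["F01.145.022"], ["C02.782.815.616.400", "C20.673.480"], ...].
    """
    trees = [t for descriptor in major_descriptor_trees for t in descriptor]
    if any(t.startswith(IF_BRANCHES) for t in trees):
        return "IF"     # at least one major concept in an IF branch
    if any(t.startswith("C") for t in trees):
        return "NIF"    # in the "Diseases" tree, but outside the IF branches
    return "Other"      # no major concept located in the "Diseases" tree

# The Kennedy et al. (2000) example in Table 2 is classified IF because
# HIV Infections appears in the "Virus Diseases" branch (C02...).
```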


Indicator Question 3: Paper Major Topics. The third question leveraged the major topic (i.e., concept) determination method that was described and completed for question 2. This means that all of the bare major topics in all of the SSA papers indexed in MEDLINE were identified. Since a paper could contribute multiple topics, counting a topic was equivalent to counting a "paper" in these analyses, and we did not attempt to limit or normalize the number of major topics covered in each paper. A winning topic was indexed (but not uniquely) to at least one winning paper, because winning topics did not appear only in winning papers. Among the 557 papers there were 1,144 unique major topics. We compared the observed frequency of major topics per paper and winning status to their expected frequencies using the chi square distribution as a decisional aid.

Next we identified all topics (n = 15) with at least two winners to determine if winners published differentially on these topics compared to non-winners. A topic with at least two winners means that the winning outcome, if independent of the topic, reasonably would be an equal number of winners and non-winners, or a 50% chance of winning, assuming substantially equivalent scientific merit and impact and no biasing factors. Therefore, we compared the expected chance of winning at 50% to the observed average actual winning percentage for the 15 major topics. The observed average actual winning percentage was calculated by computing the winning percentage for each year the topic appeared and averaging the winning percentages over only the years in which the major topic appeared. For example, the major topic of Rotavirus Vaccines appeared in 2002 and 2010. Its winning percentage in each year was 100%, making the average actual winning percentage also 100%. In addition to comparing the average actual and unbiased estimates of winning, we examined the frequencies with which these 15 topics were associated with winning using the chi square distribution as a decisional aid. Because of the small frequencies associated with these 15 major topics, the exact test of significance was computed as described by R. A. Fisher. The data serving as the basis for the major topic analyses appear in Table 3.

We did not use the chi square test to establish statistical significance of the frequencies of the winning topics, but to indicate to what extent each of the 15 major topics was expected to be in a winning paper. The larger the chi square value for a topic, the more unexpected the number of winners with that topic (regardless of whether one would expect more or fewer papers with that topic). When the percentage of a topic's finalist papers that went on to be winning papers is high (and the chi square value is large), that translates to a topic winning more than expected. When the percentage is low, while the chi square value can still be large, that translates to a topic appearing in fewer winning papers than expected.
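A sketch of the two topic-level quantities, with input formats that are illustrative assumptions. The expected winner counts here are derived from the marginal totals of the 15-topic table (a topic's finalist count times the overall share of winner topic-slots), which reproduces the "Winners Expected" and chi square columns of Table 3; that derivation is inferred from the reported values rather than stated in the article.

```python
def topic_expectations(topics):
    """Expected winners and chi square contribution for each major topic.

    topics: dict mapping topic -> (finalist_count, winner_count), i.e., the
    finalist and winner count columns of Table 3 for all 15 topics.
    """
    total_finalists = sum(f for f, _ in topics.values())
    total_winners = sum(w for _, w in topics.values())
    share = total_winners / total_finalists      # 35/161 for Table 3
    result = {}
    for topic, (finalists, winners) in topics.items():
        expected = finalists * share
        result[topic] = (expected, (winners - expected) ** 2 / expected)
    return result

def average_winning_percentage(yearly_counts):
    """Average of per-year winning percentages over the years a topic appeared.

    yearly_counts: dict year -> (finalists_with_topic, winners_with_topic).
    E.g., Rotavirus Vaccines appeared in 2002 and 2010 and won in both years,
    so its average winning percentage is 100%.
    """
    pcts = [100.0 * winners / finalists
            for finalists, winners in yearly_counts.values() if finalists]
    return sum(pcts) / len(pcts)
```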

Analytic Design Approach
The entire analysis centered on how to distinguish most clearly between the two attributions, cause or correlate, of winning based upon analyzing three indicators of scientific merit and impact. The design we were handed by the longstanding SSA history could support the analysis and conclusion of correlation, but not causation, because we could not devise an experimental design that would not disrupt the peer review process itself that was under evaluation. However, we could discard the notion of cause, on an indicator-by-indicator basis, if the indicator was not correlated with winning. This means that if a correlation did not exist, then it would be reasonable to conclude that the indicator was not biasing the selection process by "causing" the selection of winners.


Table 3: The number of papers by winning status for the 15 major topics associated with at least two winning papers

Major Topic                                     Finalists (n)   Non-Winners (n)   Winners (n)   Winners Expected (n)   Winners χ2   Winners (%)
Hemagglutinin Glycoproteins, Influenza Virus          2               0                2                 .4                5.6         100.0
Orthomyxoviridae Infections                           2               0                2                 .4                5.6         100.0
Rotavirus Vaccines                                    2               0                2                 .4                5.6         100.0
War                                                   2               0                2                 .4                5.6         100.0
AIDS Serodiagnosis                                    3               1                2                 .7                2.8          75.0
Antigens, Viral                                       3               1                2                 .7                2.8          66.7
Influenza A Virus, H1N1 Subtype                       7               4                3                1.5                1.4          56.3
Influenza, Human                                     15              10                5                3.3                 .9          36.9
Pneumococcal Vaccines                                 8               6                2                1.7                 .0          33.3
Developing Countries                                  6               4                2                1.3                 .4          30.0
Influenza Vaccines                                    6               4                2                1.3                 .4          30.0
Diarrhea                                              9               7                2                2.0                 .0          26.7
HIV-1                                                17              15                2                3.7                 .8           9.3
Disease Outbreaks                                    36              34                2                7.8                4.3           6.5
HIV Infections                                       43              40                3                9.3                4.3           4.9
Total                                               161             126               35                                  40.7   Average: 51.7%

Note. There were 15 major topics associated with at least two winning papers, of 1,144 major topics in 557 papers. The number of expected winning papers ("Winners Expected") and the "Winners χ2" contribution were derived as if major topic and winning status were independent of each other. "Winners (%)" was calculated using the actual years in which the major topic appeared, as described previously. It is important to recall that papers often contributed more than one major topic, so the paper totals in this table will not match the actual paper totals.


Interpreting the indicator of journal influence leveraged one-tailed statistical tests, because if journal "prestige" (or "popularity") or rank were the same (i.e., no difference) or lower among winners compared to non-winners, then it would seem unlikely that this type of journal influence impacted the selection of winners. The same reasoning framework was applied to the chances of winning based on the paper major topic: if the chances of winning were not improved for papers covering certain topics, then it is possible to rule out that the scientific review process favored those topics. This is not equivalent to asserting that if certain topics were favored, then the scientific review process favored them, because the strongest relationship that we could test was correlational.

Because we wanted confirmation that a correlation was detectable by our methods, we adjusted critical values obtained at p ≤ .05 to critical values obtained at p ≤ .01. Doing so could indicate whether we were accepting the conclusion of no correlation when in fact there was an association between one of the indicators and winning, when the decision point fell between these two thresholds. In this situation, this technique could serve as a positive control for our data analytic methods. It is important to note, however, that the existence of a correlation, or association, when α was adjusted to less than .01, is not necessarily causal for any of the indicators analyzed; it merely suggests that our methods could detect the existence of a correlation or association in the data set when the decision acceptance criterion makes it less difficult not to find the correlation or association (or more difficult to reject the null hypothesis).

RESULTS

Overview. The Shepard Science Award fielded 574 finalists' papers published in peer-reviewed journals for the award years of 2001-2010: AE papers accounted for 48% (n = 274) of the total, followed by LM (30%, n = 170) and then PC (23%, n = 130). Annual paper totals ranged from 45 in 2001 to 72 in 2007. The cell frequencies of public health science category by year were not different, χ2(18, N = 574) = 6.79, p > .05.

Indicator Question 1: Journal Influence. Finalists and winners published in 181 and 13 different journals, respectively. The number of papers per journal ranged from 58 in the Journal of the American Medical Association (JAMA) to one paper each in 109 different journals. One-half of finalists' papers appeared in 14 (7.7% of 181) different journals, and 45% of winners' papers appeared in two (14% of 13) different journals. For a common, matched set of journal titles, JIF and frequency-based rank were the indicators of journal influence. The average JIFs (see Table 1) of the non-winners (23.237 ± 17.613) and winners (27.247 ± 18.529) were not statistically different, t(223) = −1.17, p > .05 (one-tailed), suggesting that the prominence of the journal title, as measured by JIF, did not favor winners. The 13 journal titles publishing both finalists' and winners' papers and the associated number of finalists' (n = 225) and winners' (n = 31) papers appear in Table 1, ranked by the paper frequencies in each competition category.


Figure 2: The rank-frequency profiles of finalists and winners graphed on double logarithmic axes. The profiles are based on the journal rank for the frequency of finalists' publications. The log relationships between frequency and rank have significant linear components, and the slopes are not parallel for finalists and winners. However, when one relationship is merely a subset of another, the slopes are parallel, as illustrated by the "Winners Modeled" series at 10% of the frequency of the "Finalists" data. Some 10% values were between 0 and 1; hence the log10-transformed values would be negative. While those values do not appear in the graph, they were used in the regression analysis of the 10% model.

We employed a technique based on Zipf's law to investigate the journal rank-frequency relationships for finalists vs. winners, as illustrated in Fig. 2, and for non-winners vs. winners (see Table 4). Each of these journal rank-frequency relationships has a significant linear component. The slopes of the linear components for finalists and winners were not parallel, t(22) = 2.82, p < .05 (one-tailed), as were the slopes of the non-winners' and winners' rank-frequency relationships, t(18) = 1.87, p < .05 (one-tailed); specifically, the negative slope of the winners' line was greater than the negative slope of the finalists' or non-winners' lines. This finding has two-fold significance. From the outcome standpoint, it means that winners published less frequently in higher-ranked journals compared to the other two competition categories; thus, journal title, as measured by paper frequency, was less important to winning when compared to all finalists or non-winners publishing in the common set of journals. From the methodology standpoint, this finding could have even greater value in that the proposed (null) hypothesis was rejected, and hence the method did detect a true difference between the two groups. In other instances, due to the way the hypotheses were defined, the method did not detect a difference between the two groups, and therefore it may have appeared that it is not sensitive enough to do so.
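The slope comparison reported here follows the standard test for the equality of two independent regression slopes (Edwards, 1976), with df = n1 + n2 - 4 (matching the reported t(22) and t(18)); a sketch, assuming the inputs are the log10 rank-frequency pairs for each competition category:

```python
import numpy as np

def slope_equality_test(x1, y1, x2, y2):
    """t statistic for the equality of two independent regression slopes.

    Pooled-residual form: t = (b1 - b2) / sqrt(s2 * (1/SSx1 + 1/SSx2)),
    with s2 = (SSE1 + SSE2) / (n1 + n2 - 4). Returns (t, df).
    """
    def fit(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        ssx = np.sum((x - x.mean()) ** 2)
        slope = np.sum((x - x.mean()) * (y - y.mean())) / ssx
        intercept = y.mean() - slope * x.mean()
        sse = np.sum((y - (intercept + slope * x)) ** 2)
        return slope, ssx, sse, len(x)

    b1, ssx1, sse1, n1 = fit(x1, y1)
    b2, ssx2, sse2, n2 = fit(x2, y2)
    df = n1 + n2 - 4
    pooled = (sse1 + sse2) / df
    t = (b1 - b2) / np.sqrt(pooled * (1 / ssx1 + 1 / ssx2))
    return t, df
```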


Table 4: Summarized Zipf law regression analyses for two sets of profiles: finalists vs. winners and non-winners vs. winners

                                                    Ranked by Finalists' Frequency                 Ranked by Non-Winners' Frequency
Competition Category   Papers (n)/Journals (n)      Slope ± SE     Multiple R   F(1, 11)           Slope ± SE     Multiple R   F(1, 9)
Finalists              225/13                       −1.73 ± .24    .91          50.1, p < .05
Non-Winners            194/11                                                                      −1.37 ± .25    .88          29.5, p < .05
Winners                31/11 or 31/13               −.73 ± .26     .65          8.1, p < .05       −.60 ± .33     .52          3.4, p = .098

Note. The number of winners' journals, 11 or 13, depended on the profile comparison, since journal title was used as the matching variable. Finalists' and winners' slopes were not parallel, t(22) = 2.82, p < .05 (one-tailed), as were non-winners' and winners' slopes, t(18) = 1.87, p < .05 (one-tailed). The decisional tool employed one-tailed tests because the statistical test outcome could not disambiguate journal rank as an indicator of bias vs. scientific excellence if winners published in higher ranked journals. SE = Standard Error of the Estimate. R = Regression Coefficient. F = F Test.


Indicator Question 2: Paper Disease Category. The second indicator question examined the disease category covered in the paper (infectious [IF], noninfectious [NIF], or "Other") in relation to winning (see Table 5). Via chi square analysis (only with those papers appearing in MEDLINE, because of this analysis's dependency on MeSH®), winning was independent of paper disease category, χ2(2, N = 557) = 4.75, p > .05.
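The independence test can be reproduced directly from the counts in Table 5; a brief sketch using scipy, which recovers the reported statistic up to rounding:

```python
from scipy.stats import chi2_contingency

# Counts from Table 5: rows are winners and non-winners; columns are
# infectious, noninfectious, and "Other" papers.
observed = [
    [19, 6, 6],        # winners
    [218, 144, 164],   # non-winners
]

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2({dof}, N = 557) = {chi2:.2f}, p = {p:.3f}")
# Approximately chi2(2, N = 557) = 4.75, p > .05, as reported above.
```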

Table 5: The number and percentage of papers by disease category and winning status

                             Winning Status
Paper Disease Category       Winner         Non-Winner       Total
Infectious                   19 (8.0%)      218 (92.0%)      237
Noninfectious                 6 (4.0%)      144 (96.0%)      150
Other                         6 (3.5%)      164 (96.5%)      170
Total                        31 (5.6%)      526 (94.4%)      557

Note. Percentages are based on row totals.

Table 6: The number of papers grouped by the number of major topics per paper for winners and non-winners

Major Topics    Finalists                     Winners                      Non-Winners
per Paper       Number of     Cumulative      Number of     Cumulative     Number of     Cumulative
                Papers        %               Papers        %              Papers        %
1                 26            4.7              1            3.2            25            4.8
2                 74           18.0              6           22.6            68           17.7
3                150           44.9              9           51.6           141           44.5
4                113           65.2              7           74.2           106           64.6
5                 97           82.6              5           90.3            92           82.1
6                 58           93.0              2           96.8            56           92.8
7                 30           98.4              0           96.8            30           98.5
8                  5           99.3              1          100.0             4           99.2
9                  4          100.0              0          100.0             4          100.0
Totals           557                            31                          526


Indicator Question 3: Paper Major Topics. There were 1,144 unique major concepts (topics) in the 557 SSA papers (appearing in MEDLINE). There was no difference in the distribution of the number of major topics per paper between non-winners and winners, χ2(8, N = 557) = 5.71, p > .05 (see Table 6). The most frequently appearing topic was "HIV Infections" (n = 43), and 745 topics appeared only once. Of the 1,144 unique major topics, 93 appeared in the 31 winners' papers. In the winners' papers, "Human Influenza" was indexed most frequently (n = 5), and 78 topics appeared only once.

Among the 1,144 topics there were only 15 that appeared in at least two winning papers (see Table 3). For these 15 major topics, the average chance of winning (51.7 ± 13.1%) was not different from 50%, t(14) = .18, p > .05 (one-tailed). However, six topics (the first six in the table) were more likely to be associated with winning, as indicated by both their chi square contribution and winning percentage (ranging from 66.7% to 100%). There was only a single paper that included two of these six major topics: "Antigens, Viral" and "Hemagglutinin Glycoproteins, Influenza Virus." Papers covering the remaining nine topics were not strong winning candidates. These results suggest that crafting a paper motivated by a specific major topic may not improve the chances of winning: the pool of specific topics associated with better than even chances of winning was limited.

DISCUSSION

The Shepard Science Awards have been made since 1986, and we took a 10-year slice of data to analyze three factors that might influence the selection of winners from the finalists rather than the scientific excellence of the work per se. The three factors of importance to the cross-cutting matrix of scientists involved in selecting the winners, and that might introduce bias into their choices, were the journal, disease category, and major topics associated with the winning papers. We investigated the possible influence of these factors upon winner selection, as we noted above.

CDC science regularly appears in journals with high visibility, such as the New England Journal of Medicine (JIF = 50.017), Journal of the American Medical Association (JIF = 31.718), Lancet (JIF = 28.409), and Science (JIF = 28.103). However, we did not observe a bias toward winner selection based on the journal title. Our analyses suggest that winning papers were not merely a subset, by journal title, of the overall scientific community of SSA papers, because the frequency of appearance by journal title differed between finalists and winners, and winners published more frequently in lower-ranked journals. Our other measure of journal importance, JIF, did not reveal a preference for papers appearing in high-impact journals. This piece of evidence indicates that members of the three SSA winner selection committees have, year after year, executed their responsibility for identifying award-winning work based on scientific merit and impact rather than on the "popularity" or "prestige" of the journal in which papers have been published, thus reinforcing the thinking that excellence should be worthy of inherent recognition, independent of the vehicle making it visible to a community of accomplished scientists.

The reader should note that the Zipfian analysis was based on the empirical rank of journal importance within a set of papers the agency viewed as its best science. Journals deemed important to CDC public health scientists may not necessarily correspond to journal importance as measured by citation patterns, represented by JIF, for a worldwide community of biomedical scientists.


JIF is not stable over time, as journal policies and editorial leadership are reflected in the types of articles accepted for publication, and electronic and print journal versions may report different JIFs. Even the lay press may influence citation rate (Phillips et al., 1991). Furthermore, there will never be a single "correct" JIF for a group of papers published in the same journal over time, and the beginning boundary of the JIF two-year calculation interval may not coincide with the appearance of one of our papers or with the journal publication schedule. JIF is more or less a numerical indication of "popularity," and while the actual value of JIF may vary over time, if the differences between frequency-based ranks remain roughly constant, then the conclusion would be the same. In a sense, the journal rank-frequency approach represents an internal vote by agency scientists on journal rank vs. an external expression of importance via JIF, even with the limitations noted previously. However, neither approach found support for a journal factor influencing the selection of winners. Even so, there are acknowledged, complex sets of factors that can inform a reviewer's decision, and while journal title might not be one for SSA winners, we do not know if author status was associated with the reviewers' decisions (Newton, 2010). Therefore, it is possible that the review for SSA winners is less influenced by factors that might bear upon a selection for publication. "Merit" and "impact" are not necessarily the same criteria used by journal reviewers when judging acceptance. Among the implicit and explicit journal acceptance criteria would be papers increasing JIF and the "stated aims of the journal" (Newton, 2010). It would be favorable for "merit" and "impact" to be independent of journal rank.

Where public health science is published and where all science is published may not necessarily be in a uniform, conjoint space, especially over long periods of time. While two-thirds of CDC's "programmatic and research budgets" have been outlaid on infectious diseases (Curry et al., 2006), the breadth of the activities for which the agency has responsibility spans from injury prevention to environmental health, to preparedness, to chronic disease including obesity and cancer, to closing the gap in health disparities across the lifespan. We developed an algorithm for categorizing papers (as IF, NIF, or "Other"), based on MeSH® indexing, and subsequently validated the process by comparing its categorization of the winning papers to categorization of the same papers by an SME. Satisfied that the algorithm performed correctly, we assigned all 557 papers a disease category in order to determine if winning was related to disease category. Our results turned up no relationship between winning and paper disease category, and this may alleviate concerns about whether working on the most pressing health problems carries the risk of failing to be recognized for scientific excellence. We made the ancillary observation that IF or NIF diseases are not uniformly favored among the three public health science categories, and 31% of CDC SSA science is not related to a specific disease or condition but is broader in scope, χ2(4, N = 557) = 42.3, p < .05.


Similar to disease categories, environmental drivers of major paper topics (e.g., Administration priorities) are possible sources for the introduction of bias into the scientific review process, so that scientific recognition is based less on excellence and more on alignment with strategic priorities. However, over the course of 10 years and among 1,144 topics, there were only 15 topics that appeared in at least two winning papers. We further observed a better than even chance of winning for only six of these 15 topics. These topics appeared in 13 different papers that were published for every SSA year except 2004, indicating that topics reappeared over this 10-year interval. The interval between winning generally was either relatively immediate, within two years (for "AIDS Serodiagnosis," "Orthomyxoviridae Infections," and "Hemagglutinin Glycoproteins, Influenza Virus"), or delayed, by seven or more years (for "Antigens, Viral" and "Rotavirus Vaccines"). In contrast, "HIV Infections," with a 4.9% chance of winning, appeared in 43 different papers and three or more times per SSA year. Thus, some appear to be longstanding topics of interest regardless of strategic priorities, the cadre of peer reviewers, or the chances of winning, while a few might be "hot" for a shorter lifecycle. Among the first five author positions of these 13 papers, only two authors (of 60) appeared twice: one was the first author twice, and the other was the fourth co-author twice. Thus, winning among these topics was associated with a set of papers virtually never authored by the same primary cohort of authors. While these six topics were more likely to win, they were far from the most popular topics, and the data indicate that the chances of winning decreased with more papers or interest in the topic. Thus, a paper or position largely, and perhaps indiscriminately, topic-driven and based on popularity is not necessarily a winning strategy in the face of selection criteria based on merit and impact.

Of course, the generalizability of the present findings must be clarified. We analyzed a single case and used significance testing solely for the purpose of assessing whether or not the three indicators we focused on were possible sources of bias when selecting the SSA winners; that is, we primarily wanted to develop a method that would provide for defensible conclusions about the scientific review process for the agency's most prestigious scientific award. Hence, our focus was on threats to internal validity (Campbell and Stanley, 1966). At the same time, the process and methodology themselves could be helpful to others who want to assess their own scientific review processes, even when different indicators are the focus of interest or when the review process is focused on grant applications rather than on published papers. This is not to say that the indicators (or criteria) we used would be of the exact same interest to those working on other review processes, and naturally, the outcomes of any other analyses would depend on the data used in those analyses. In other words, it is the methodology that we offer as the potential generalizable product. Hence, different kinds of correlations and different kinds of biases (or lack of them) could be detected in different review processes using different criteria.


We believe that this paper illustrates an approach to evaluating a longstanding review process that allows identification or negation of the presence of bias with respect to the three specific indicators we selected for this study. While journal, disease category, and major paper topics were indeed unrelated to winning, our methodology was sensitive enough to detect differences between the ranks of journals for winners and non-winners. That finding is particularly important because it reaffirms the ability of the methodology to detect a true difference between two groups by rejecting the proposed (null) hypothesis. In all other instances our hypotheses were accepted. This ability of the method both to accept and to reject a proposed hypothesis allows for positive and negative outcomes. In those other instances, due to the way the hypotheses were defined, the method did not detect a difference between the two groups, and therefore it may have appeared that it is not sensitive enough to do so.

Realizing that this could be a challenging concept, we conducted further analyses to reaffirm the method's capability by bringing in the idea and value of a positive control. Because we wanted to further ascertain whether a correlation is indeed detectable by our methods, as described in the Method section, we adjusted critical values obtained at p ≤ .05 to critical values obtained at p ≤ .01. Doing so could indicate whether we were accepting the conclusion of no correlation when in fact there was an association between one of the indicators and winning, when the decision point fell between these two thresholds. In this situation, this technique could serve as a positive control for our data analytic methods. If a decision point appeared between these two commonly accepted α values, then it would be possible to document the existence of an association; thus, an association between winning and an indicator would exist. Our data set yielded one such outcome, for the null hypothesis of no difference between the non-winners' and winners' slopes (cf. Table 4). For all other statistically guided decisions, the decision to reject the null hypothesis was not influenced by these two α values. That finding strengthened our belief that our methods really could detect positive controls, or bias, if present in the analytic data set. Even so, because of the relationship between correlation and causation, it was still possible to gather evidence suggesting the absence of bias, as we did. This approach was especially useful for this question of bias because we viewed it as a meta-analysis of a variegated group where scientific excellence was the primary common denominator, as opposed to a disease, condition, or other subject area of interest.

To CDC scientists and the SSA scientific review process, we provide reassurance that the scientific review process for identifying the "best of the best" agency science was not influenced by three potentially strong sources of bias. When selecting and recognizing the agency's premier science, the answer to the question "Does it matter where I publish or what my paper is about?" suggested by our data is "no."


This approach was especially useful for this question of bias because we viewed it as a meta-analysis of a variegated group in which scientific excellence, rather than a disease, condition, or other subject area of interest, was the primary common denominator. To CDC scientists and the SSA scientific review process, we provide reassurance that the scientific review process for identifying the “best of the best” agency science was not influenced by three potentially strong sources of bias. When selecting and recognizing the agency’s premier science, the answer suggested by our data to the question, “Does it matter where I publish or what my paper is about?”, is “no.” This conclusion is all the more valuable because the scientific reviewers are members of the CDC community and its scientific culture and therefore execute their review task without the benefit of being blinded to journal title or to where CDC outlays are made. Furthermore, on a broader scale, we have offered a workable methodology for the evaluation of review processes, which invariably arise during the life cycles of organizations and which others can adopt, adapt, and extend to suit their peer-review needs.

ACKNOWLEDGMENT

The authors extend most sincere thanks and appreciation to Dr. Charles Heilig, CDC, for his sharp and deep probing, and hours of critical reviews, discussions, and analysis, all of which profoundly influenced this work.

NOTES

1. The top-level organizational structure of the U.S. Centers for Disease Control and Prevention is composed of several national centers, one institute (i.e., the National Institute for Occupational Safety and Health), and multiple offices. Hence, the acronym for this collective structure of centers, institute, and offices is CIO.

2. As an example, JIFs may not be available for both print and electronic formats, where each format is assigned a unique ISSN even for the same common journal title; The New England Journal of Medicine is an example of this scenario, and since an electronic-format JIF for The New England Journal of Medicine was not available, the print-format JIF, based on the common journal title, was used in the analyses.

3. The unit of analysis is not necessarily the “element surveyed” (Nardi, 2003), which in our case was an SSA published paper in a peer-reviewed journal.

4. The reader is referred to Fig. 2, which is a graph with double logarithmic axes, as an aid to understanding the visual expression of Zipf’s law.

5. Each tree carries a letter identifier. For example, the “Anatomy,” “Organism,” and “Diseases” tree identifiers are “A,” “B,” and “C,” respectively. Tuples, based on the first-letter tree identifier, indicate the first-level branch of each tree. “Bacterial Infections and Mycoses,” “Virus Diseases,” and “Parasitic Diseases” are identified with “C01,” “C02,” and “C03,” respectively, and are three of the first-level branches of the “Diseases” tree.
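As an illustration of note 5, the short Python sketch below extracts the tree letter and first-level branch from a MeSH tree number. The example tree number is hypothetical; the branch codes (“C01,” “C02,” “C03”) follow the MeSH convention described above, and real values come from the NLM MeSH files cited in the references.

```python
def first_level_branch(tree_number: str) -> tuple:
    """Return the tree letter and first-level branch code from a MeSH tree number."""
    tree_letter = tree_number[0]          # e.g., "C" for the Diseases tree
    branch = tree_number.split(".")[0]    # e.g., "C02" for Virus Diseases
    return tree_letter, branch

# Hypothetical tree number used only for illustration.
print(first_level_branch("C02.081.343"))  # -> ("C", "C02")
```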

REFERENCES

Baroni, M. and Evert, S. (2006). Counting Words (introductory course at ESSLLI ’06). Available at http://zipfr.r-forge.r-project.org/ Last accessed April 11, 2011.
Berman, J. J. (2007). Perl Programming for Medicine and Biology. 1st ed., Jones and Bartlett Series in Biomedical Informatics. Sudbury, MA: Jones and Bartlett Publishers.
Bollen, J., Van de Sompel, H., Hagberg, A., and Chute, R. (2009). A principal component analysis of 39 scientific impact measures. PLoS ONE 4: e6022.

Bornmann, L. (2012). Measuring the societal impact of research: Research is less and less assessed on scientific impact alone–we should aim to quantify the increasingly important contributions of science to society. EMBO Rep. 13: 673–676.
Bornmann, L., Marx, W., Gasparyan, A. Y., and Kitas, G. D. (2012). Diversity, value and limitations of the journal impact factor and alternative metrics. Rheumatol. Int. 32: 1861–1867.


Bornmann, L., Mutz, R., and Daniel, H. D. (2010). A reliability-generalization study of journal peer reviews: A multilevel meta-analysis of inter-rater reliability and its determinants. PLoS ONE 5: e14331.
Bornmann, L., Wallon, G., and Ledin, A. (2008). Does the committee peer review select the best applicants for funding? An investigation of the selection process for two European Molecular Biology Organization programmes. PLoS ONE 3: e3480.
Campbell, D. T. and Stanley, J. C. (1966). Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20: 37–46.
Curry, C. W., Anindya, K. D., Ikeda, R. M., and Thacker, S. B. (2006). Health burden and funding at the Centers for Disease Control and Prevention. American Journal of Preventive Medicine 30: 269–276.
Decker, S. L. (2009). Changes in Medicaid physician fees and patterns of ambulatory care. Inquiry 46: 291–304.
Edwards, A. L. (1976). An Introduction to Linear Regression and Correlation. San Francisco: W. H. Freeman and Company.
Fuller, S. (2002). Knowledge Management Foundations. Boston: KMCI Press.
Garfield, E. (1999). Journal impact factor: A brief review. Canadian Medical Association Journal 161: 979–980.
Honjo, T. (2005). In search of the best grant system. Science 309: 1329.
Kennedy, M. G., Mizuno, Y., Seals, B. F., Myllyluoma, J., and Weeks-Norton, K. (2000). Increasing condom use among adolescents with coalition-based social marketing. AIDS 14: 1809–1818.
Nardi, P. M. (2003). Doing Survey Research: A Guide to Quantitative Methods. Lasser, J., ed. Boston: Pearson Education, Inc.
Newton, D. P. (2010). Quality and peer review of research: An adjudicating role for editors. Accountability in Research 17: 130–145.
Office of the Associate Director for Science. (2011). Charles C. Shepard Science Award. Available at http://www.cdc.gov/od/science/aboutus/shepard/ Last accessed April 4, 2011.
Phillips, D. P., Kanter, E. J., Bednarczyk, B., and Tastad, P. L. (1991). Importance of the lay press in the transmission of medical knowledge to the scientific community. N. Engl. J. Med. 325: 1180–1183.
Salton, G. and McGill, M. J. (1983). Introduction to Modern Information Retrieval. New York: McGraw-Hill Book Company.
SAS for Windows, SAS 9.2 TS Level 2M0. Cary, NC: SAS Institute.
SAS XML Mapper 9.2 902000.12.5.20090116170000_v920. Cary, NC: SAS Institute.


Shahangian, S. and Cohn, R. D. (2000). Variability of laboratory test results. American Journal of Clinical Pathology 113: 521–527.
Spiegel, P. B. and Salama, P. (2000). War and mortality in Kosovo, 1998–99: An epidemiological testimony. The Lancet 355: 2204–2209.


Thomson Reuters. (2011). Science-Thomson Reuters 2011. Available at http://thomsonreuters.com Last accessed March 6, 2011.
Thomson Reuters. (2013). The Thomson Reuters Impact Factor. Available at http://thomsonreuters.com/products_services/science/free/essays/impact_factor/ Last accessed February 11, 2013.
U.S. National Library of Medicine. (2007a). Medical Subject Headings (files available to download). Health and Human Services. Available at http://www.nlm.nih.gov/mesh/filelist.html Last accessed December 3, 2010.
U.S. National Library of Medicine. (2007b). PubMed Help. Available at http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/helppubmed/pubmedhelp.pdf.
U.S. National Library of Medicine. (2008). XML MeSH Data Elements. Health and Human Services. Available at http://www.nlm.nih.gov/mesh/xml_data_elements.html Last accessed July 7, 2010.
U.S. National Library of Medicine. (2011). Fact Sheet: Medical Subject Headings (MeSH®). Available at http://www.nlm.nih.gov/pubs/factsheets/mesh.html Last accessed February 10, 2013.
Wall, L. (2007). Perl v5.8.8, Binary build 820 [274739] (MSWin32-x86-multi-thread). ActiveState.
