DOI:10.1093/jncimonographs/lgt020

© The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected].

Sentiment Analysis to Determine the Impact of Online Messages on Smokers’ Choices to Use Varenicline

Nathan K. Cobb, Darren Mays, Amanda L. Graham

Correspondence to: Nathan K. Cobb, MD, Schroeder Institute for Tobacco Research and Policy Studies, American Legacy Foundation, 1724 Massachusetts Ave NW, Washington, DC 20036 (e-mail: [email protected]).

Background

Social networks are a prominent component of online smoking cessation interventions. This study applied sentiment analysis, a data processing technique that codes textual data for emotional polarity, to examine how exposure to messages about the cessation drug varenicline affects smokers’ decision making around its use.

Methods

Data were from QuitNet, an online social network dedicated to smoking cessation and relapse prevention. Self-reported medication choice at registration and at 30 days was coded among new QuitNet registrants who participated in at least one forum discussion mentioning varenicline between January 31, 2005 and March 9, 2008. Commercially available software was used to code the sentiment of forum messages mentioning varenicline that occurred during this time frame. Logistic regression analyses examined whether forum message exposure predicted medication choice.

Results

The sample of 2132 registrants comprised mostly women (78.3%) and white participants (83.4%), averaged 41.2 years of age (SD = 10.9), and smoked on average 21.5 (SD = 9.7) cigarettes/day. After adjusting for potential confounders, as exposure to positive varenicline messages outweighed negative messages, the odds of switching to varenicline (odds ratio = 2.05, 95% confidence interval = 1.66 to 2.54) and continuing to use varenicline (odds ratio = 2.46, 95% confidence interval = 1.96 to 3.10) statistically significantly increased.

Conclusions

Sentiment analysis is a useful tool for analyzing text-based data to examine their impact on behavior change. Greater exposure to positive sentiment in online conversations about varenicline is associated with a greater likelihood that smokers will choose to use varenicline in a quit attempt.

J Natl Cancer Inst Monogr 2013;47:224–230

Online social networks (OSNs) are widely used resources for health-related behaviors (1–3). The Internet is a primary source for health information for many people (2,4). Health-related OSNs enable people to share personal experiences and connect with others in similar circumstances, such as coping with a health issue or beginning a new medication (1,3,5). Despite their popularity and anecdotal evidence regarding impact, there are few published reports linking interpersonal communications in OSNs with behavior change. Evaluating the impact of online communications on health behaviors and decision making and identifying influential communications aspects are important research priorities for intervention design and evaluation.

Health communication theories provide a framework for understanding how OSN communications may affect behavior (6,7). Messages with salient emotion (eg, risk, fear) may generate an affective response; when messages are relevant to the individual they may elicit change in attitudes and behaviors (6,7). Expression of emotion is common in OSNs (1,8–10), and research demonstrates that positive and negative emotions can propagate from one user to another within OSNs (10). Although there is powerful evidence that health behavior spreads through OSNs (11), there has been little research examining the mechanisms through which emotional content in online networks influences health behavior.

Contemporary computing technology permits textual analysis of large-scale data sets from online forum discussions (12,13). Sentiment analysis is a method to analyze textual data and quantitatively code positive and negative emotions expressed in written text such as user-generated content (12). Forum messages may contain mentions of a specific topic of interest (eg, smoking), which can be coded as individual expressions of emotion, or the entire message itself can be coded for polarity (12). Similar approaches have been used to examine emotional content of health-related messages in publicly available data sources (eg, blogs, social networking sites), demonstrating the utility of sentiment analysis to model emotional content in online communities (13–15). However, applications of sentiment analysis to examine how exposure to emotional content in OSNs may influence behavior remain limited.

Using sentiment analysis, we examined whether exposure to emotional content in public forum discussions in an online health-related community influenced decision making about medication use. We hypothesized that the emotional polarity of discussions mentioning specific medications would affect later decision making about their use. We used longitudinal text-based data from a long-standing OSN dedicated to smoking cessation and relapse prevention (16) and focused on smokers’ decisions about the use of varenicline. Varenicline is a partial nicotine agonist that

METHODS

Setting

This study was a secondary analysis of deidentified data obtained from QuitNet (http://www.quitnet.com). QuitNet is a large, continuously operated online social network for smoking cessation and relapse prevention. Characteristics of QuitNet users and details regarding its development, evolution, and efficacy were published previously (16,19–22). QuitNet enables multiple forms of social interaction among users, including asynchronous communication through private internal e-mail (“Qmail”) and one-to-many messaging in threaded discussion forums, which contain multiple user messages related to a specific topic within each thread. Users can self-affiliate into “clubs,” which are user-initiated minisites complete with dedicated forums; buddy lists allow users to keep track of their friends. Social influence regarding cessation occurs through profile pages, blogs, anniversary lists, and personal testimonials. Users are encouraged to publicly share quit dates, which are set through a wizard tool, and are prompted for quit date updates at each login (16). At registration, all users are asked if they intend to use a cessation medication and to indicate which one (“medication plan”). This medication plan can be updated at any point, usually when the user changes a quit date.

Sample and Data Extraction

QuitNet maintains a complete history of communications that occur throughout the site; each message is stored in a database and associated with individual users. We limited our initial observation to the period from January 31, 2005 to March 9, 2008. This date range spanned a period before varenicline’s approval, the date of approval, the controversy surrounding adverse side effects, and the month following the FDA’s alert and revision of the drug’s safety information (18).
We extracted data on US QuitNet members who 1) registered as new users during the observation period, 2) indicated during registration that they were looking for cessation help, 3) were actively smoking at registration, and 4) participated in at least one forum discussion where varenicline was mentioned during the observation period. For descriptive purposes, we first examined all messages occurring during the observation period to quantify forum activity before and after approval of varenicline. For our main analyses, we limited the data to users who registered following approval of varenicline and who would therefore have had access to the drug. For each individual, we extracted registration data, including demographic (gender, age, race, and education) and smoking information (cigarettes/day, previous 12-month quit attempts), the medication plan set at registration, and their last known medication choice 30 days after registration. The change between the two time points (eg, from no choice to varenicline) was the primary outcome. A comparison sample was generated using the same criteria but with bupropion as the medication of interest.

Text Extraction and Sentiment Coding

We extracted a data set consisting of all forum threads and all messages within threads where varenicline was mentioned and one of our eligible users actively participated (ie, posted a message). Sentiment coding was conducted using Salience Engine 4.1 (Lexalytics, Amherst, MA). The software uses a key-word approach to extract and code parts of speech from free-form text. Sentiment is determined using a precoded dictionary where distinctive portions within a message (eg, phrases) are assigned a unitless polarity score and then combined to determine overall message sentiment. For example, the words “hate,” “fear,” or “hurt” are coded as negative, whereas “love” and “comfort” are coded as positive. We modified the default polarity dictionary to create customized sentiment coding definitions for this study.
For example, the word “quit” was changed from a negative contribution to neutral to avoid negative coding for phrases such as “I quit smoking with Chantix!” We also modified the key-word dictionary to identify brand names (ie, Chantix or Champix, Zyban for bupropion) and common misspellings. Based on the vendor’s specifications and visual inspection of message sentiment distribution, messages were categorized as having positive (>0.35), neutral (0–0.35), or negative (
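
The dictionary-based coding described in this section can be illustrated with a minimal Python sketch. It is not the Salience Engine API or the authors' code: the function names, the numeric word weights, the averaging rule, and the example misspelling are all hypothetical choices made for illustration. Only the example words, the brand names, the "quit" override, and the positive (>0.35) and neutral (0–0.35) cutoffs come from the text. The negative cutoff is truncated in the source, so the sketch simply assumes that any score below 0 is negative.

```python
import re

# Hypothetical toy polarity dictionary. The words are the examples given in
# the text; the numeric weights are invented for illustration only.
DEFAULT_POLARITY = {
    "hate": -1.0, "fear": -1.0, "hurt": -1.0, "quit": -0.5,
    "love": 1.0, "comfort": 1.0,
}

# Study-specific override described in the text: "quit" becomes neutral.
CUSTOM_POLARITY = {**DEFAULT_POLARITY, "quit": 0.0}

# Brand names from the text mapped to generic drug names, plus one
# made-up misspelling ("chantx") to show how variants could be handled.
DRUG_ALIASES = {
    "chantix": "varenicline", "champix": "varenicline",
    "chantx": "varenicline",
    "zyban": "bupropion",
}
DRUGS_OF_INTEREST = {"varenicline", "bupropion"}

POSITIVE_CUTOFF = 0.35  # from the text: positive is > 0.35
NEUTRAL_FLOOR = 0.0     # from the text: neutral is 0 to 0.35


def tokenize(message):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def mentioned_drugs(message):
    """Return the generic names of the drugs of interest named in a message."""
    normalized = {DRUG_ALIASES.get(tok, tok) for tok in tokenize(message)}
    return normalized & DRUGS_OF_INTEREST


def message_score(message, polarity):
    """Average the polarity of dictionary words; 0.0 if none are present.

    Averaging is an assumption; the text only says scores are "combined".
    """
    hits = [polarity[tok] for tok in tokenize(message) if tok in polarity]
    return sum(hits) / len(hits) if hits else 0.0


def categorize(score):
    """Map a unitless score to a sentiment category."""
    if score > POSITIVE_CUTOFF:
        return "positive"
    if score >= NEUTRAL_FLOOR:
        return "neutral"
    # Assumption: the source's negative cutoff is truncated; below 0 is used.
    return "negative"


if __name__ == "__main__":
    text = "I quit smoking with Chantix!"
    print(mentioned_drugs(text))
    print(categorize(message_score(text, DEFAULT_POLARITY)))
    print(categorize(message_score(text, CUSTOM_POLARITY)))
```

Under these invented weights, the example message is coded negative with the default dictionary and neutral once "quit" is overridden, which is the kind of miscoding the dictionary modification was meant to avoid.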
