DOI: 10.1111/hir.12111

Could we do better? Behavioural tracking on recommended consumer health websites

Jacquelyn Burkell & Alexandre Fortier
Faculty of Information and Media Studies, The University of Western Ontario, London, ON, Canada

Abstract

Objective: This study examines behavioural tracking practices on consumer health websites, contrasting tracking on sites recommended by information professionals with tracking on sites returned by Google.

Methods: Two lists of consumer health websites were constructed: sites recommended by information professionals and sites returned by Google searches. Sites were grouped by source (Recommended-Only, Google-Only or both) and by type (Government, Not-for-Profit or Commercial). Behavioural tracking practices on each website were documented using a protocol that detected cookies, Web beacons and Flash cookies. The presence and the number of trackers that collect personal information were contrasted across source and type of site; a second set of analyses specifically examined Advertising trackers.

Results: Recommended-Only sites show lower levels of tracking, especially tracking by advertisers, than do Google-Only sites or sites found through both sources. Government and Not-for-Profit sites have fewer trackers, particularly from advertisers, than do Commercial sites.

Conclusions: Recommended sites, especially those from Government or Not-for-Profit organisations, present a lower privacy threat than sites returned by Google searches. Nonetheless, most recommended websites include some trackers, and nearly half include at least one Advertising tracker.

Implications: To protect patron privacy, information professionals should examine the tracking practices of the websites they recommend.

Keywords: consumer health information; critical appraisal; digital information resources; information literacy; websites.

Key Messages

• Some guidelines for assessing web resources take privacy issues into account, but none of them includes the monitoring of behavioural tracking practices.
• Information professionals should take behavioural tracking practices into account when recommending consumer health information resources.
• Research indicates that online resources currently recommended by information professionals show a substantial presence of online behavioural tracking.
• Information professionals should educate themselves about behavioural tracking mechanisms and the measures that can be taken to identify and even block behavioural tracking.
• Information professionals should educate their patrons about behavioural tracking mechanisms and the measures that can be taken to identify and even block behavioural tracking.

Correspondence: Alexandre Fortier, Faculty of Information and Media Studies, The University of Western Ontario, London, ON N6C 5B7, Canada, E-mail: [email protected]

Support for this project was provided by the Office of the Privacy Commissioner of Canada through its Contributions Program. The views expressed in this document are those of the researchers and do not necessarily reflect the views of the Office of the Privacy Commissioner of Canada.


© 2015 Health Libraries Group Health Information & Libraries Journal, 32, pp. 182–194

Introduction

In a context of patient-centred health decision-making where health information is more available than ever, patients are increasingly faced with the responsibility, or the desire, to become informed participants in their own health and health care decisions.1,2 Many choose to seek information on their own, often conducting online searches to identify health information resources.3 Although consumer health information is readily and freely available online, selecting and interpreting this information is a challenge for many.4 It is not surprising, therefore, that many lay consumers of health information turn to information intermediaries who can help them identify high-quality health information resources.4,5 Libraries and the information professionals who work within them are key information intermediaries, and they maintain this role with respect to consumer health information.5–8 Information professionals help information consumers find, evaluate and make sense of consumer health resources; they teach their patrons the essential skills to perform these activities independently; and their services are being directly integrated into clinical practice to support information provision at the point of care.9–11

Increasingly, the best and most up-to-date consumer health information is available on the Internet, and information professionals are directing patrons to online health information resources. In identifying the best possible online health resources, information professionals pay attention to a variety of factors, including intended audience and information accuracy, currency and completeness.12–15 They also attend to advertising,12,14 because such sponsorship could affect the balance, coverage and objectivity of the information delivered on the site. More rarely, librarians advise users to read privacy policies for disclosures about the collection and use of personally identifiable information such as name or address.16 To date, however, guidelines for consumer health information recommendations have not considered the privacy risks associated with a relatively new, and increasingly frequent, advertising tactic: behavioural tracking.
It has long been recognised that Internet users face privacy risks as they navigate online spaces. Historically, the associated privacy concerns have focused on the collection, use and retention of personally identifying information that users explicitly provide in the course of online activities (e.g. registration information that includes name and email). More recently, however, websites and associated advertisers have increased their use of behavioural tracking through tools such as HTTP cookies, Web beacons and Flash cookies. These mechanisms collect non-personally identifying information such as IP address, browser configuration information and details of browsing behaviour.17–21 This information is often integrated across multiple visits to a website, and even across visits to different websites, to assemble a detailed profile of online activities. While this information cannot typically be used to identify a specific individual, it provides advertisers and others with a rich profile that reveals a great deal about a person, raising potential privacy issues.22

Behavioural tracking is often justified as a tool that supports positive outcomes such as website personalisation and targeted advertising that delivers information on products and services of interest to the user. The information gathered through this tracking, however, can also be used to discriminate against consumers through practices such as ‘price steering’ (manipulating the products shown23), price discrimination (charging some users more for the same product23–25) or even denial or discontinuation of service.22,26,27 The detailed personal profile that can be built from behavioural tracking is of potential interest to employers, insurers and providers of financial services: in fact, to anyone who would derive value from segmenting Internet users according to their online behaviour and the characteristics inferred from that behaviour.28 Privacy threats associated with this profiling are particularly acute in the context of health information, because the searches that individuals conduct can reveal sensitive and potentially damaging information about their health-related concerns and interests.29–31

There exist various types of trackers that present different privacy risks. One primary distinction is between first-party trackers and third-party trackers. Information collected by first-party trackers is available only to the website itself; as a result, these trackers present a relatively low privacy risk. Third-party trackers are set and read by third-party organisations. Some third-party trackers are used to provide page functionality, usually through communication between the


2 The consolidated set of websites returned on the first two pages of Google searches for the ten most commonly searched conditions, as identified by the Pew Research Center’s Internet & American Life Project.45 The Google searches were repeated three times, using a different computer each time. Obviously irrelevant results (e.g. sites for roofing companies returned by the ‘shingles’ search) were eliminated from the Google results, as were any ‘dead’ links.

The consolidated set of Recommended websites contained 77 unique websites, and the Google searches returned 92 unique websites. Eighteen websites were present in both lists. Websites were therefore divided into three groups by source: Recommended-Only (59 websites), Google-Only (74 websites) and Recommended-and-Google (18 websites). For the purpose of the analysis, websites were also divided into three categories by type: Government (38 websites), Not-for-Profit (58 websites) and Commercial (55 websites). These categories were assigned manually using the information about the website publisher available in the ‘About’ section of each website. In the absence of an ‘About’ section, the website was examined in detail to determine its type.

Data collection and analysis

The data for each website were collected following a protocol, based on McDonald and Cranor,46 that was developed to avoid any contamination of tracking results between websites. While other available approaches offer extensive information regarding data collection, data flow and data use,47 this approach was well suited to the goal of this study, which was to document the prevalence of trackers (including third-party trackers) on consumer health websites. Each website was visited in an independent session. Each session began with the browser at an about:blank page, with clean data directories (no HTTP or Flash cookies, and an empty cache).
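The grouping by source described above amounts to a simple set partition of the two site lists. The Python sketch below illustrates it under stated assumptions: the site identifiers are hypothetical placeholders generated only to match the reported counts (77 Recommended sites, 92 Google sites, 18 shared), not the study's actual URLs, and the manual classification by type is not modelled.

```python
def partition_by_source(recommended, google):
    """Split two collections of site identifiers into the three source groups."""
    recommended, google = set(recommended), set(google)
    return {
        "Recommended-Only": recommended - google,
        "Google-Only": google - recommended,
        "Recommended-and-Google": recommended & google,
    }

# Hypothetical placeholder identifiers sized to match the reported counts:
# 77 Recommended sites and 92 Google sites, 18 of which are shared.
shared = {f"shared-site-{i}" for i in range(18)}
recommended = shared | {f"recommended-site-{i}" for i in range(59)}
google = shared | {f"google-site-{i}" for i in range(74)}

groups = partition_by_source(recommended, google)
for name, sites in groups.items():
    print(f"{name}: {len(sites)}")
print(f"Total unique sites: {len(recommended | google)}")
```

With these counts the three groups hold 59, 74 and 18 sites, for 151 unique websites in total, which is the N used in the analyses that follow.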
The website was then accessed directly by entering the domain name into the browser’s navigation bar. A typical user interaction with the website was mimicked by visiting approximately 10 pages on the site. Search functions on the site were used, and any surveys that did not ask for personal information were completed (e.g. ‘Question of the day’ surveys). We did not click through on any ads or follow any external links; thus, user interaction was confined to the website in question. At the end of the session, all HTTP cookies in the browser cookie file were recorded, along with any Flash cookies stored in Adobe’s Website Storage Settings panel. These results were augmented by those returned by Ghostery,48 a browser extension that records Web beacons, and Charles,49 an application that captures and analyses data sent between the browser and the visited website and between the browser and third-party sites. Using these sources, we created a list of the third-party tracking mechanisms (HTTP cookies, Flash cookies and Web beacons) present on the site and the domains from which these trackers originated. After these data were recorded, the browser cache was cleared, all HTTP cookies were removed, and the Flash cookie folder was emptied using Adobe’s Website Storage Settings panel, in preparation for a new data collection session.

Once all data collection was complete, a consolidated list of the third-party tracking domains identified on all websites was created. Across all sites, trackers from a total of 127 different tracking domains were identified. Using a combination of the information about trackers provided by Ghostery48 and PrivacyChoice,32 we assigned each domain to one of five categories:

1 Advertising: trackers that deliver advertisements (153 different domains; e.g. [x + 1], AdRoll);
2 Analytics: trackers that provide analytics for a website (47 different domains; e.g. ChartBeat, Crazy Egg);
3 Audience Segmentation: trackers that serve no purpose other than audience segmentation (67 different domains; e.g. Adloox, BlueCava);
4 Privacy: trackers that deliver privacy notices (4 different domains; e.g. GDN Notice, TRUSTe Notice);
5 Widgets: trackers that provide page functionality (42 different domains; e.g. Gigya Socialize, ShareThis).

Our analysis excluded the trackers in the Privacy and Widgets categories, which present low privacy risk, and focused on those in the Advertising, Analytics and Audience Segmentation categories. For the purpose of analysis, these were collapsed into a single category termed ‘Problematic’ trackers. An additional, separate analysis was conducted on Advertising trackers, because these trackers create the greatest privacy risk.

Results

Comparison between Recommended sites and sites returned by Google

Overall, 88.1% of the websites (n = 133) had Problematic trackers. Among the 59 Recommended-Only sites, 86.4% (n = 51) were associated with at least one Problematic tracking domain, compared to 87.8% (n = 65) of the Google-Only sites and 94.4% (n = 17) of the Recommended-and-Google sites. There was no difference between Recommended-Only, Google-Only and Recommended-and-Google sites with respect to the presence of Problematic trackers (χ²(2, N = 151) = 0.85, P = 0.65).

It is not just the presence of trackers that matters: it is also important to understand the extent of tracking by looking at the number of different tracking domains on a site. Recommended-Only sites had trackers from an average of 3.5 domains (Min = 0, Max = 42), Google-Only sites from an average of 13.2 domains (Min = 0, Max = 58), and Recommended-and-Google sites from an average of 8.9 domains (Min = 0, Max = 55) (see Figure 1). The difference between these groups was statistically significant (F(2,148) = 8.16, P < 0.001, η² = 0.09). A Tukey post hoc test revealed that the number of Problematic trackers was significantly lower for Recommended-Only sites (M = 4.9, SD = 6.7) than for Google-Only sites (M = 15, SD = 18.4, P < 0.001); there were, however, no significant differences between Recommended-Only and Recommended-and-Google sites (P = 0.35), or between Google-Only and Recommended-and-Google sites (P = 0.42).

Figure 1 Average of third-party domains

Trackers from advertisers were present on 55.6% of the websites (n = 84). Among the 59 Recommended-Only sites, 44.1% (n = 26) were associated with at least one Advertising tracking domain, compared to 64.9% (n = 48) of the Google-Only sites and 55.6% (n = 10) of the Recommended-and-Google sites. The difference between Recommended-Only, Google-Only and Recommended-and-Google sites with respect to the presence of trackers from advertisers narrowly missed statistical significance (χ²(2, N = 151) = 5.75, P = 0.056, V = 0.2). Once again, the number of Advertising tracking domains was also compared to give a more detailed portrait. Recommended-Only sites had trackers from an average of 1.4 advertising domains (Min = 0, Max = 16), Google-Only sites from an average of 8.1 domains (Min = 0, Max = 38), and Recommended-and-Google sites from an average of 5.2 domains (Min = 0, Max = 32) (see Figure 2). The difference between these groups was significant (F(2,148) = 10.33, P < 0.001, η² = 0.12). A Tukey post hoc test revealed that the number of Advertising trackers was significantly lower for Recommended-Only sites (M = 1.4, SD = 2.8) than for Google-Only sites (M = 8.1, SD = 10.9, P < 0.001). There were no significant differences between Recommended-Only and Recommended-and-Google sites (P = 0.22), or between Google-Only and Recommended-and-Google sites (P = 0.39).

Figure 2 Average of advertising domains

These results demonstrate that third-party tracking by Problematic trackers is widespread on consumer health websites, and that Recommended sites are as likely as those returned by Google searches to include such trackers. The results are similar for Advertising trackers, although the difference between Recommended-Only, Google-Only and Recommended-and-Google sites approaches significance (P = 0.056), with Advertising trackers least commonly observed on Recommended sites. The results also indicate that websites recommended by librarians contain fewer Problematic trackers and fewer Advertising trackers than websites returned by Google.

Comparison among Government, Not-for-Profit and Commercial sites

In a second set of analyses, we examined differences in tracking across Government, Not-for-Profit and Commercial websites. Among the Government sites, 81.6% (n = 31) had at least one Problematic tracker, compared to 87.9% (n = 51) of the Not-for-Profit sites and 92.7% (n = 51) of the Commercial sites. Across the three groups, there was no significant difference in the presence of Problematic trackers (χ²(2, N = 151) = 2.66, P = 0.26). Analysis of the number of tracking domains, however, revealed a different picture. Government sites had trackers from an average of 1.8 domains (Min = 0, Max = 5), Not-for-Profit sites from an average of 4.5 domains (Min = 0, Max = 32), and Commercial sites from an average of 18.2 domains (Min = 0, Max = 56) (see Figure 3). There was a statistically significant difference between these groups (F(2,148) = 25.05, P < 0.001, η² = 0.25). A Tukey post hoc test revealed that the number of Problematic trackers was significantly lower for Government (M = 2.7, SD = 2.1, P < 0.001) and Not-for-Profit websites (M = 6.4, SD = 8.1, P < 0.001) than for Commercial websites (M = 20.2, SD = 19.9). There was no significant difference between Government and Not-for-Profit websites (P = 0.53).

The difference between Government, Not-for-Profit and Commercial websites is even more evident when the focus shifts to Advertising trackers. Among the Government sites, 21.1% (n = 8) had at least one tracker from an advertiser, compared to 51.7% (n = 30) of the Not-for-Profit sites and 83.6% (n = 46) of the Commercial sites. This difference was significant (χ²(2, N = 151) = 36.24, P < 0.001, V = 0.49).
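The categorisation described in the Methods reduces each site to two counts: distinct Problematic domains (Advertising, Analytics and Audience Segmentation) and distinct Advertising domains, each of which also yields a presence/absence flag. The Python sketch below shows that bookkeeping under stated assumptions. The domain names and the `CATEGORY` lookup are hypothetical (the study assigned categories using Ghostery and PrivacyChoice information, which is not reproduced here), and domains missing from the lookup are simply ignored.

```python
from collections import Counter

# Hypothetical lookup from third-party tracking domain to category. In the study,
# categories came from Ghostery and PrivacyChoice; these domains are placeholders.
CATEGORY = {
    "ads.adnet.example": "Advertising",
    "retarget.adnet.example": "Advertising",
    "stats.analytics.example": "Analytics",
    "segments.dmp.example": "Audience Segmentation",
    "notice.privacy.example": "Privacy",
    "share.widget.example": "Widgets",
}
# Privacy and Widgets trackers are excluded as low risk.
PROBLEMATIC = {"Advertising", "Analytics", "Audience Segmentation"}

def summarise_site(tracker_domains):
    """Summarise one site's third-party tracking domains (unknown domains are ignored)."""
    domains = set(tracker_domains)  # count each tracking domain once per site
    per_category = Counter(CATEGORY[d] for d in domains if d in CATEGORY)
    problematic = sum(n for cat, n in per_category.items() if cat in PROBLEMATIC)
    advertising = per_category["Advertising"]  # Counter returns 0 for a missing key
    return {
        "problematic_domains": problematic,
        "advertising_domains": advertising,
        "has_problematic": problematic > 0,
        "has_advertising": advertising > 0,
    }

# Domains recorded for one hypothetical site visit (duplicates collapse to one).
site = ["ads.adnet.example", "stats.analytics.example",
        "share.widget.example", "ads.adnet.example"]
print(summarise_site(site))
# → {'problematic_domains': 2, 'advertising_domains': 1, 'has_problematic': True, 'has_advertising': True}
```

Counting distinct domains rather than individual cookies or beacons mirrors the text's focus on "the number of different tracking domains on a site".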


Figure 3 Average of third-party trackers

Figure 4 Average of Advertising trackers
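The presence comparisons in this section are Pearson chi-square tests of independence with Cramér's V as the effect size, and the ANOVA effect sizes are eta squared. As a hedged sketch, the Python below recomputes these statistics from the counts reported in the text and in Table 1 using only the standard library. The closed-form p-value helper is an assumption that holds only for even degrees of freedom, which covers the 2 × 3 and 3 × 3 tables used here; a general implementation would use a full chi-square distribution function.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, n

def chi2_p_value_even_dof(x, dof):
    """Upper-tail chi-square probability; this closed form holds only for even dof."""
    if dof % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V effect size for an r x c chi-square test."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

def eta_squared_from_f(f, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from its F statistic."""
    return f * df_between / (f * df_between + df_within)

# Advertising trackers by site type: rows = with / without an Advertising tracker;
# columns = Government (38 sites), Not-for-Profit (58), Commercial (55).
ads_by_type = [[8, 30, 46], [38 - 8, 58 - 30, 55 - 46]]

# Site type by source (Table 1 counts): rows = Recommended-Only,
# Recommended-and-Google, Google-Only; columns = Government, Not-for-Profit, Commercial.
type_by_source = [[22, 29, 8], [8, 8, 2], [8, 21, 45]]

for name, table in (("ads_by_type", ads_by_type), ("type_by_source", type_by_source)):
    chi2, dof, n = chi_square_independence(table)
    p = chi2_p_value_even_dof(chi2, dof)
    v = cramers_v(chi2, n, len(table), len(table[0]))
    print(f"{name}: chi2({dof}, N = {n}) = {chi2:.2f}, P = {p:.1e}, V = {v:.2f}")

# Problematic trackers by site type: F(2, 148) = 25.05.
print(f"eta squared = {eta_squared_from_f(25.05, 2, 148):.2f}")
```

Run as written, this reproduces χ² = 36.24 and V = 0.49 for Advertising trackers by type, χ² = 39.76 and V = 0.36 for the source-by-type association, and η² = 0.25 for the Problematic tracker ANOVA. Note that a 3 × 3 table has (3 − 1)(3 − 1) = 4 degrees of freedom.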

Government sites had an average of Advertising trackers of 0.3 domains (Min = 0, Max = 3), Notfor-Profit sites had an average of 2.5 domains (Min = 0; Max = 17), and Commercial sites had trackers from an average of 11.2 domains (Min = 0; Max = 38) (See Figure 4). There was a statistically significant difference between these groups, as determined by one-way analysis of variance (F(2,148) = 28.71, P < 0.001, g² = 0.28). A Tukey post hoc test revealed that the number of Advertising trackers was significantly lower for Government (M = 0.3, SD = 0.7, P < 0.001) and Not-for-Profit websites (M = 2.5, SD = 0.3, P < 0.001) compared to Commercial websites (M = 11.2, SD = 18.6). There were no statistically significant differences between Government and Not-for-Profit websites (P = 0.36). These results demonstrate that Government, Not-for-Profit and Commercial sites are equally

likely to contain at least one Problematic tracker; the three groups differ, however, with respect to the presence of Advertising trackers, with these being most common on Commercial sites. With respect to the number of trackers, Commercial sites have more Problematic trackers and more Advertising trackers than do Government or Notfor-Profit sites, which do not differ from each other in these respects. Why are Recommended sites better? Our results thus far indicate the RecommendedOnly sites fare better with respect to tracking than do either Google-Only sites or sites that are identified in both sources. The results also demonstrate, however, that Commercial sites have higher levels of tracking than do either Government or Not-for-Profit sites. This raises an © 2015 Health Libraries Group Health Information & Libraries Journal, 32, pp. 182–194

Could we do better?, Jacquelyn Burkell & Alexandre Fortier

important question: Can the lower level of tracking on Recommended-Only sites be attributed to differences in the types of cites (Government, Not-for-Profit or Commercial) that appear in the Recommended-Only list? To address this question, we examined the relationship between source (Recommended-Only, Google-Only or Recommended-and-Google) and type (Government, Not-for-Profit or Commercial) of website. A chi-square analysis revealed a significant association (v2(2, N = 151) = 39.76, P < 0.001, V = 0.36): the Google-Only category is dominated by Commercial websites, while the Recommended-Only and Recommended-andGoogle categories both have larger and approximately equal proportions of Government and Not-for-Profit sites (see Table 1 for the complete distribution). Given the small number of sites in some categories (e.g. there are only two sites in the Recommended-Only group that are Commercial), we cannot statistically test the joint impact of source and type on behavioural tracking. Instead, we provide a descriptive overview of Problematic trackers (Table 2) and Advertising trackers (Table 3) in each combination of source (Recommended-Only, Recommended-and-Google and Google-Only) and type (Government, Not-forProfit, and Commercial). A quick perusal reveals that Problematic trackers are present in the large majority of all groups, with Google-Only Government sites showing the lowest level (Problematic trackers are present on only 62% of these sites). Furthermore, while Commercial websites that appear on the Recommended-Only list are somewhat less likely to include Problematic trackers than are the Google-Only Commercial sites (Problematic trackers appear on 75% of the Commercial sites on the Recommended-Only list compared to 96% of the Commercial sites on the Google-Only list), all of the Commercial sites identified by both sources

(admittedly, only two in number) include Problematic trackers. When we concentrate on Advertising trackers (Table 3), it appears that Recommended-Only Government and Not-forProfit sites are more likely to include these trackers than are Google-Only sites, and Recommended-Only Commercial sites are only slightly less likely (75% compared to 84%) to include Advertising trackers. Overall, it appears that librarians tend to recommend Government and Not-for-Profit sites over Commercial sites. Given that these same types of sites (Government and Not-for-Profit) also show lower levels of tracking, the overall lower levels of tracking on Recommended-Only sites could be accounted for by a tendency to recommend a different type of site (Government or Not-for-Profit), rather than better oversight of the tracking practices of recommend sites. This conclusion is strengthened by the fact that within each type of site (Government, Not-for-Profit and Commercial), similar levels of Problem tracking and tracking by advertisers are observed. Thus, it does not appear that information professionals are selectively recommending those Commercial, Notfor-Profit or Government sites that are more privacy-respecting in that they are less likely to engage in tracking; instead, information professionals simply tend to recommend the types of sites (Government and Not-for-Profit) that are less likely to engage in tracking. Discussion The contrast between Recommended-Only, Google-Only and Google-and-Recommended provides important information, but it does not give an overview of actual user experience. Information seekers following the recommendations of information professionals would in fact encounter both the sites identified as

Table 1 Distribution of types of websites by source

Recommended-Only Recommended-and-Google Google-Only

Government

Not-for-Profit

Commercial

Total

37.3% (n = 22) 44.4% (n = 8) 10.8% (n = 8)

49.2% (n = 29) 44.4% (n = 8) 28.4% (n = 21)

14.5% (n = 8) 11.1% (n = 2) 60.8% (n = 45)

100% (n = 59) 100% (n = 18) 100% (n = 72)

© 2015 Health Libraries Group Health Information & Libraries Journal, 32, pp. 182–194

189

190

Could we do better?, Jacquelyn Burkell & Alexandre Fortier Table 2 Proportion of websites with Problematic trackers by website source and type

Recommended-Only Recommended-and-Google Google-Only

Government

Not-for-Profit

Commercial

81.8% (18 of 22) 100% (8 of 8) 62.5% (5 of 8)

93.1% (27 of 29) 87.5% (7 of 8) 81% (17 of 21)

75% (6 of 8) 100% (2 of 2) 95.6% (43 of 45)

Table 3 Proportion of websites with Advertising trackers by website source and type

Recommended-Only Recommended-and-Google Google-Only

Government

Not-for-Profit

Commercial

22.7% (5 of 22) 25% (2 of 8) 12.5% (1 of 8)

51.7% (15 of 29) 75% (6 of 8) 42.8% (9 of 21)

75% (6 of 8) 100% (2 of 2) 84.4% (38 of 45)

Table 4 Summary of tracking on Recommended-and-Google sites

Presence of at least one Problem tracker Average number of Problem trackers Range of Problem trackers Presence of at least one Advertising tracker Average number of Advertising tracker Range of Advertising tracker

Recommended sites (Recommended-Only and Recommended-and-Google)

Google sites (Google-Only and Recommended-and-Google)

88.3% (n = 68) 6.2 0–55 46.8% (n = 36) 2.3 0–32

89.1% (n = 82) 14.1 0–58 63% (n = 58) 7.5 0–38

Recommended-Only and Recommended-and-Google in our study; those seeking information through Google searches would encounter the sites identified as Google-Only and Recommended-and-Google. As indicated above, 18 websites appeared both in the Recommended lists and in Google searches; thus, there is a total of 77 Recommended sites (59 Recommended-Only and 18 Recommended-and-Google) and a total of 92 Google sites (74 Google-Only and 18 Recommended-and-Google). Because of the overlap between the two sets of sites (the 18 Recommended-and-Google sites belong to both, so the two sets are not independent samples), we cannot conduct statistical tests to compare tracking; we can, however, provide a descriptive overview of the level of tracking that an information seeker will encounter when seeking information on Recommended sites versus those returned by Google. Table 4 presents an overview of tracking practices on Recommended sites (Recommended-Only and Recommended-and-Google) and Google sites (Google-Only and Recommended-and-Google). It is starkly clear from this summary that those seeking health information online are overwhelmingly likely to be subject to behavioural tracking, whether they turn to information professionals for website recommendations or identify sources themselves through Google searches. Although Advertising trackers, which present the most significant privacy threat, are present on a smaller proportion of Recommended sites, users who follow the recommendations of information professionals will still encounter these trackers on half of the websites they visit. The bottom line is clear: when it comes to protecting patrons from behavioural tracking of their consumer health information seeking, information professionals could do better.

© 2015 Health Libraries Group Health Information & Libraries Journal, 32, pp. 182–194

The detailed analyses presented above demonstrate that behavioural tracking, especially by advertisers, is in general less prevalent and less intense (in terms of the number of trackers observed) on Recommended sites than on sites identified by Google searches. The same can be said, however, of Government and Not-for-Profit sites compared to Commercial sites: less tracking, and particularly less tracking by advertisers, is present on these types of sites. Further examination suggests that the advantage observed for Recommended sites can be attributed to the types of sites that tend to be recommended by information professionals (Government and Not-for-Profit sites), rather than to active selection of privacy-respecting online resources. In other words, to the extent that we get it right (recommending resources that protect patron privacy), we are benefiting from the other criteria we use to select good resources.

Commercial websites are an especially important case. Our results indicate that these sites have the highest level of tracking. This is not surprising, considering the importance of behavioural tracking in the current business model of most, if not all, Commercial websites.50 Furthermore, more popular sites that appear high in Google rankings are also more likely to contain trackers, precisely because their popularity makes them lucrative sites for advertisers.46 Consumer health information sites are likely to become popular precisely because they present useful information in a way that is accessible to consumers. Thus, the natural tendency is for Commercial sites to pit privacy interests against site quality: the more popular (and presumably better) the site is, the more likely it is to include behavioural tracking. From a privacy perspective, it appears that recommending Commercial websites, particularly very popular consumer websites, is a risky practice.
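The site counts reported at the start of this discussion follow from simple set arithmetic over the three source groups. The minimal Python sketch below uses only the group sizes stated in the text (59 Recommended-Only, 74 Google-Only, 18 Recommended-and-Google); the `GROUPS` dictionary and the `source_totals` function are illustrative names, not part of the study's analysis.

```python
# Group sizes as reported in the text.
GROUPS = {
    "Recommended-Only": 59,
    "Google-Only": 74,
    "Recommended-and-Google": 18,
}


def source_totals(groups: dict[str, int]) -> dict[str, int]:
    """Derive per-source totals and the number of distinct websites."""
    shared = groups["Recommended-and-Google"]
    recommended = groups["Recommended-Only"] + shared
    google = groups["Google-Only"] + shared
    # Shared sites appear in both source totals, so subtract them once
    # to obtain the number of distinct websites across all three groups.
    distinct = recommended + google - shared
    return {"recommended": recommended, "google": google, "distinct": distinct}


if __name__ == "__main__":
    print(source_totals(GROUPS))  # {'recommended': 77, 'google': 92, 'distinct': 151}
```

Because the 18 shared sites are counted in both the Recommended total and the Google total, the two totals cannot be treated as independent samples, which is why the comparison of Recommended and Google sites is descriptive rather than inferential.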
Behavioural tracking has a variety of purposes, some positive for both individuals and the larger society,51–54 and analysis of behavioural tracking data can reveal important information about health conditions and situations.55,56 Even in the realm of research, however, the use of these behavioural data presents ethical challenges,57 and these challenges are exacerbated in the context of advertising applications of tracking data.58 In particular, behavioural tracking by third-party advertisers presents significant privacy concerns to those seeking information online.35,58 There are, of course, other threats to online privacy, including government surveillance,59 and these threats may raise greater levels of concern in some countries.60 The use of behavioural tracking by third-party advertisers to assemble fine-grained individual profiles for various marketing purposes, however, presents significant privacy risks to anyone participating in the online environment, and these privacy concerns must be addressed.

Conclusion

Responses to the privacy issues raised by behavioural tracking are multi-faceted and include technological approaches (e.g., ref. 61, but see ref. 62 on the limits of these approaches), regulatory responses37 and user education.63 Information professionals have a responsibility to be aware of the privacy risks associated with online surveillance, and they should keep abreast of technological and regulatory developments that relate to online privacy. They also have a key role to play in information literacy training that helps users become active in protecting their own privacy online. This study focuses on a fourth avenue of response: minimising the behavioural tracking privacy risks associated with recommended resources. Our research demonstrates that the health information resources recommended by information professionals show high levels of behavioural tracking, including tracking by third-party advertisers. While recommended sites show lower levels of tracking than sites returned by Google searches, the difference can be attributed primarily to the types of sites recommended (Government and Not-for-Profit rather than Commercial), rather than to careful selection of resources to minimise behavioural tracking risks. Privacy protection is a core professional value for information professionals, and selection of recommended resources should proceed in the light of this professional commitment. To this end, we have several recommendations.
First, information professionals should educate themselves with respect to behavioural tracking mechanisms and the measures that can be taken to identify and even block behavioural tracking.21 Second, armed with this information and appropriate tools, information professionals should assess the behavioural tracking practices of the consumer health websites they recommend. At minimum, tracking practices should be disclosed to consumers; in addition, behavioural tracking should be one of the factors taken into account in website recommendations, and where possible, patrons should be offered a choice of information resources that do not participate in online tracking. We are not so naïve as to suggest that information professionals can or should restrict their consumer health information website recommendations to sites that are completely free of behavioural tracking. We recognise that tracking is widespread and that even the most conscientious selection of consumer health websites will not entirely eliminate the risk; moreover, in some cases information quality and information privacy will work at cross purposes, with the best consumer health information sites including the highest levels of tracking. Instead, our goal is much more modest: we encourage information professionals to take behavioural tracking practices into account when recommending consumer health information sites.
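As one illustration of what a first-pass assessment of a site's tracking might involve, the Python sketch below takes the list of request URLs a page makes (for example, as exported from an intercepting proxy or the browser's developer tools), treats requests to other domains as third-party, and groups them by category using a tracker list. It is a hypothetical sketch, not the detection protocol used in this study: it looks only at request URLs, not at cookies or Flash cookies; the tracker list, the `.example` domains and the function names are placeholders; and the first-party test uses a naive "last two labels" rule rather than the Public Suffix List.

```python
from urllib.parse import urlparse

# Hypothetical tracker list keyed by base domain. A real assessment would
# rely on a maintained list rather than these placeholder entries.
TRACKER_DOMAINS = {
    "adnetwork.example": "Advertising",
    "analytics.example": "Analytics",
}


def base_domain(host: str) -> str:
    """Naive registrable domain: the last two labels of the hostname.

    This is wrong for suffixes such as .co.uk; a real tool would consult
    the Public Suffix List instead.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def classify_requests(page_url: str, request_urls: list[str]) -> dict[str, list[str]]:
    """Group third-party request hosts by tracker category.

    First-party requests are ignored; third-party hosts whose base domain
    is not on the tracker list are grouped under 'Unknown third party'.
    """
    first_party = base_domain(urlparse(page_url).hostname or "")
    found: dict[str, list[str]] = {}
    for url in request_urls:
        host = (urlparse(url).hostname or "").lower()
        if not host or base_domain(host) == first_party:
            continue  # same site as the page, so not a third-party request
        category = TRACKER_DOMAINS.get(base_domain(host), "Unknown third party")
        hosts = found.setdefault(category, [])
        if host not in hosts:
            hosts.append(host)  # record each distinct host once per category
    return found


if __name__ == "__main__":
    page = "https://www.consumer-health.example/topics/diabetes"
    requests = [
        "https://www.consumer-health.example/style.css",
        "https://ads.adnetwork.example/pixel.gif?id=123",
        "https://stats.analytics.example/collect",
        "https://cdn.fonts.example/font.woff2",
    ]
    print(classify_requests(page, requests))
    # {'Advertising': ['ads.adnetwork.example'], 'Analytics': ['stats.analytics.example'], 'Unknown third party': ['cdn.fonts.example']}
```

Separating Advertising hosts from other third parties mirrors the distinction drawn in this study between trackers in general and Advertising trackers in particular; an "Unknown third party" entry is a prompt for closer inspection, not evidence of tracking.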

References

1 McMullan, M. Patients using the Internet to obtain health information: how this affects the patient-health professional relationship. Patient Education and Counseling 2006, 63, 24–28.
2 Rodriguez-Osorio, C. A. & Dominguez-Cherit, G. Medical decision making: paternalism versus patient-centered (autonomous) care. Current Opinion in Critical Care 2008, 14, 708–713.
3 Fox, S. & Jones, S. The Social Life of Health Information [Internet]. Washington, DC: Pew Internet and American Life Project, 2009 [cited 10 August 2014]. Accessible at: http://www.pewinternet.org/2009/06/11/the-social-life-of-health-information/.
4 Johnson, J. D. & Case, D. O. Health Information Seeking. New York, NY: Peter Lang, 2012.
5 Harris, R., Henwood, F., Marshall, A. & Burdett, S. “I’m not sure if that’s what their job is”: consumer health information and emerging “healthwork” roles in the public library. Reference & User Services Quarterly 2010, 49, 239–252.
6 Henwood, F., Harris, R., Burdett, S. & Marshall, A. Health intermediaries? Positioning the public library in e-health discourse. In: Wathen, C. N., Wyatt, S. & Harris, R. (eds). Mediating Health Information. Houndmills, UK: Palgrave Macmillan, 2008: 38–55.
7 Rubenstein, E. From social hygiene to consumer health: libraries, health information, and the American public from the late nineteenth century to the 1980s. Library & Information History 2012, 28, 202–219.
8 Bakker, S. The changing context of health for library and information professionals. In: Brettle, A. & Urquhart, C. (eds). Changing Roles and Contexts for Health Library and Information Professionals. London: Facet, 2012: 3.
9 Murray, S. Consumer health information services in public libraries in Canada and the US. The Journal of the Canadian Health Libraries Association 2008, 29, 141–143.
10 MacDonald, S. L., Winter, T. & Luke, R. Roles for information professionals in patient education: librarians’ perspective. Partnership: The Canadian Journal of Library and Information Practice & Research 2010, 5.
11 Spatz, M. History of consumer and patient health librarianship. In: Spatz, M. (ed.). The Medical Library Association Guide to Providing Consumer and Patient Health Information. Lanham, MD: Rowman and Littlefield, 2014: 1–10.
12 Medical Library Association A user’s guide to finding and evaluating health information on the web [Internet]. Chicago, IL: The Association; [cited 10 August 2014]. Accessible at: http://www.mlanet.org/resources/userguide.html.
13 Fox, S. & Rainie, L. Vital decisions: how Internet users decide what information to trust when they or their loved ones are sick [Internet]. Washington, DC: Pew Internet and American Life Project, 2002 [cited 10 August 2014]. Accessible at: http://www.pewinternet.org/2002/05/22/vital-decisions-a-pew-internet-health-report/.
14 Dickenson, N., Huddleston, C., Johnson, J., Kumagai, G. & Lopez, E. Health reference service. In: Spatz, M. (ed.). The Medical Library Association Guide to Providing Consumer and Patient Health Information. Lanham, MD: Rowman and Littlefield, 2014: 97–119.
15 Papadakos, J., Trang, A., Wiljer, D., Mis, C. C., Cyr, A., Friedman, A., Mazzocut, M., Snow, M., Raivich, V. & Catton, P. What criteria do consumer health librarians use to develop library collections? A phenomenological study. Journal of the Medical Library Association 2014, 102, 78–84.
16 MedlinePlus Guide to healthy web surfing [Internet]. Bethesda, MD: U.S. National Library of Medicine; [rev 18 Apr 2012; cited 10 August 2014]. Accessible at: http://www.nlm.nih.gov/medlineplus/healthywebsurfing.html.
17 Soltani, A., Canty, S., Mayo, Q., Thomas, L. & Hoofnagle, C. J. Flash cookies and privacy [Internet]. Rochester, NY: Social Science Research Network, 2009 [cited 10 August 2014]. Accessible at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1446862.
18 McDonald, A. M. & Cranor, L. F. Beliefs and behaviors: Internet users’ understanding of behavioural advertising [Internet]. Rochester, NY: Social Science Research Network, 2010 [posted 22 Jan 2012; cited 10 August 2014]. Accessible at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1989092.
19 Ayenson, M., Wambach, D. J., Soltani, A., Good, N. & Hoofnagle, C. J. Flash cookies and privacy II: now with HTML5 and ETag respawning [Internet]. Rochester, NY: Social Science Research Network, 2011 [cited 10 August 2014]. Accessible at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1898390.
20 Chester, J. Cookie wars: how new data profiling and targeting techniques threaten citizens and consumers in the ‘Big Data’ era. In: Gutwirth, S., Leenes, R., de Hert, P. & Poullet, Y. (eds). European Data Protection: In Good Health? Dordrecht, The Netherlands: Springer, 2012: 53–77.
21 Fortier, A. & Burkell, J. What librarians should know to protect their privacy and that of their patrons. Information Technology and Libraries. In press.
22 Castelluccia, C. & Narayanan, A. Privacy considerations of online behavioural tracking. Heraklion, Greece: European Union Agency for Network and Information Security, 2012 [cited 10 August 2014]. Accessible at: http://www.enisa.europa.eu/activities/identity-and-trust/library/deliverables/privacy-considerations-of-online-behavioural-tracking.
23 Hannak, A., Soeller, G., Lazer, D., Mislove, A. & Wilson, C. Measuring price discrimination and steering on E-commerce web sites. 14th Internet Measurement Conference; 5–7 November 2014, Vancouver, B.C., Canada. New York: ACM. doi: 10.1145/2663716.2663744.
24 Mikians, J., Gyarmati, L., Erramilli, V. & Laoutaris, N. Crowd-assisted search for price discrimination in e-commerce: first results. 9th International Conference on Emerging Networking EXperiments and Technologies; 9–12 December 2013, Santa Barbara, California. New York: ACM. doi: 10.1145/2535372.2535415.
25 Newman, N. The costs of lost privacy: consumer harm and rising economic inequality in the age of Google. William Mitchell Law Review 2014, 40, article 12. doi: 10.2139/ssrn.2310146.
26 Andrejevic, M. The big data divide. International Journal of Communication 2014, 8, 1673–1689.
27 Center for Digital Democracy, Consumer Federation of America, Consumers Union, Consumer Watchdog, Electronic Frontier Foundation, Privacy Lives, Privacy Rights Clearinghouse, Privacy Times, U.S. Public Interest Research Group, The World Privacy Forum Online behavioural tracking and targeting concerns and solutions: legislative primer [Internet]. The Coalition, 2009 [cited 10 August 2014]. Accessible at: https://www.eff.org/sites/default/files/OnlinePrivacyLegPrimerSEPT09.pdf.
28 Kosinski, M., Stillwell, D. & Graepel, T. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences of the USA 2013, 110, 5802–5805.
29 Anderson-Inman, L. & Horney, M. A. Transforming text for at-risk readers. In: Reinking, D., McKenna, M. C., Labbo, L. D. & Kieffer, R. D. (eds). Handbook of Literacy and Technology: Transformations in a Post-Typographic World. Mahwah, NJ: Lawrence Erlbaum Associates, 1998: 15–44.
30 Berger, M., Wagner, T. H. & Baker, L. C. Internet use and stigmatized illness. Social Science & Medicine 2005, 61, 1821–1827.
31 Cline, R. J. & Haynes, K. M. Consumer health information seeking on the Internet: the state of the art. Health Education Research 2001, 16, 671–692.
32 PrivacyChoice [Internet]. The Organization; [cited 10 August 2014]. Accessible at: http://www.privacychoice.org/.
33 Federal Trade Commission FTC staff revises online behavioral advertising principles [Internet]. Washington, DC: The Commission; 12 February 2009 [cited 10 August 2014]. Accessible at: https://www.ftc.gov/news-events/press-releases/2009/02/ftc-staff-revises-online-behavioral-advertising-principles.
34 Federal Trade Commission Self-regulatory principles for online behavioral advertising: tracking, targeting, and technology [Internet]. Washington, DC: The Commission, 2009 [cited 10 August 2014]. Accessible at: https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-staff-report-self-regulatory-principles-online-behavioral-advertising/p085400behavadreport.pdf.
35 Office of the Privacy Commissioner of Canada Policy position on online behavioural advertising [Internet]. Ottawa, Canada: The Office, 2012 [cited 10 August 2014]. Accessible at: https://www.priv.gc.ca/information/guide/2012/bg_ba_1206_e.asp.
36 Office of the Privacy Commissioner of Canada Use of sensitive health information for targeting of Google ads raises privacy concerns [Internet]. Ottawa, Canada: The Office; 14 January 2014 [cited 10 August 2014]. Accessible at: https://www.priv.gc.ca/cf-dc/2014/2014_001_0114_e.asp.
37 Mayer, J. R. & Mitchell, J. C. Third-party web tracking: policy and technology. 33rd IEEE Symposium on Security and Privacy; 20–23 May 2012, San Francisco, California. Washington, DC: IEEE Computer Society. doi: 10.1109/SP.2012.47.
38 International Federation of Library Associations and Institutions Riding the waves or caught in the tide? Navigating the evolving information environment [Internet]. The Hague, The Netherlands: The Federation, 2013 [cited 10 August 2014]. Accessible at: http://trends.ifla.org/insights-document.
39 American Library Association Code of Ethics of the American Library Association, 2008. Chicago, IL: The Association, 1939 [amended 22 Jan 2008; cited 10 August 2014]. Accessible at: http://www.ala.org/advocacy/proethics/codeofethics/codeethics.
40 American Library Association Choose Privacy Week. Why Libraries [Internet]. Chicago, IL: The Association; [cited 10 August 2014]. Accessible at: http://chooseprivacyweek.org/our-story/why-libraries/.
41 Slobogin, C. Privacy at Risk: The New Government Surveillance and the Fourth Amendment. Chicago, IL: University of Chicago Press, 2007.
42 Libert, T. Privacy implications of health information seeking on the web [Internet]. Rochester, NY: Social Science Research Network, 2014 [cited 10 August 2014]. Accessible at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2423006.
43 Medical Library Association Top 100 list health websites you can trust [Internet]. Chicago, IL: The Association, 2013 [cited 10 August 2014]. Accessible at: http://caphis.mlanet.org/consumer/top100all.pdf.
44 Canadian Health Libraries Association Top 10 Canadian consumer health websites [Internet]. The Association, 2010 [cited 10 August 2014]. Accessible at: http://www.chla-absc.ca/chipig/Events/CHLA2010poster.pdf.
45 Fox, S. Health topics [Internet]. Washington, DC: Pew Research Center’s Internet & American Life Project, 2011 [cited 10 August 2014]. Accessible at: http://www.pewinternet.org/2011/02/01/health-topics-2/.
46 McDonald, A. M. & Cranor, L. F. A survey of the use of Adobe Flash Local Shared Objects to respawn HTTP cookies [Internet]. Pittsburgh, PA: Carnegie Mellon University, CyLab, 2011 [cited 10 August 2014]. Accessible at: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1078&context=cylab.
47 Englehardt, S., Eubank, C., Zimmerman, P., Reisman, D. & Narayanan, A. Web privacy measurement: scientific principles, engineering platform, and new results [Internet]. Princeton, NJ: Princeton University, 2014 [cited 1 March 2015]. Accessible at: http://randomwalker.info/publications/WebPrivacyMeasurement.pdf.
48 Ghostery [Internet]. New York, NY: The Organization; [cited 10 August 2014]. Accessible at: https://www.ghostery.com/.
49 Charles [Internet]. The Organization; [cited 10 August 2014]. Accessible at: http://www.charlesproxy.com.
50 Angwin, J. The web’s new gold mine: your secrets [Internet]. New York, NY: Wall Street J, 2010 [cited 10 August 2014]. Accessible at: http://online.wsj.com/news/articles/SB10001424052748703940904575395073512989404.
51 Dennison, L., Morrison, L., Conway, G. & Yardley, L. Opportunities and challenges for smartphone applications in supporting health behavior change: qualitative study. Journal of Medical Internet Research 2013, 15, e86.
52 Exeter, D. J., Rodgers, S. & Sabel, C. E. “Whose data is it anyway?” The implications of putting small area-level health and social data online. Health Policy 2014, 114, 88–96.
53 Eysenbach, G. Infodemiology and infoveillance: tracking online health information and cyberbehavior for public health. American Journal of Preventive Medicine 2011, 40, S154–S158.
54 Toch, E., Wang, Y. & Cranor, L. F. Personalization and privacy: a survey of privacy risks and remedies in personalization-based systems. User Modeling and User-Adapted Interaction 2012, 22, 203–220.

55 Ayers, J. W., Althouse, B. M., Allem, J. P., Rosenquist, J. N. & Ford, D. E. Seasonality in seeking mental health information on Google. American Journal of Preventive Medicine 2013, 44, 520–525.
56 Milinovich, G. J., Williams, G. M., Clements, A. C. A. & Wenbiao, H. Internet-based surveillance systems for monitoring emerging infectious diseases. The Lancet Infectious Diseases 2014, 14, 160–168.
57 Vayena, E., Mastroianni, A. & Kahn, J. Ethical issues in health research with novel online sources. American Journal of Public Health 2012, 102, 2225–2230.
58 Hoofnagle, C. J., Soltani, A., Good, N., Wambach, D. J. & Ayenson, M. Behavioral advertising: the offer you cannot refuse. Harvard Law & Policy Review 2012, 273. Rochester, NY: Social Science Research Network; 28 August 2012 [cited 1 March 2015]. Accessible at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2137601.
59 Richards, N. M. The dangers of surveillance. Harvard Law Review 2012, 126, 1934–1965.
60 Morozov, E. The Net Delusion: The Dark Side of Internet Freedom. New York: Public Affairs, 2011.
61 Balebako, R., Leon, P., Shay, R., Ur, B., Wang, Y. & Cranor, L. Measuring the effectiveness of privacy tools for limiting behavioral advertising. Web 2.0 Security and Privacy Workshop; 24 May 2012, San Francisco, California. Accessible at: http://www.w2spconf.com/2012/papers/w2sp12final2.pdf.
62 Leon, P., Ur, B., Shay, R., Wang, Y., Balebako, R. & Cranor, L. Why Johnny can’t opt out: a usability evaluation of tools to limit online behavioral advertising. Conference on Human Factors in Computing Systems; 5–10 May 2012, Austin, TX. New York: ACM. doi: 10.1145/2207676.2207759.
63 Rader, E. Awareness of behavioral tracking and information privacy concern in Facebook and Google. 10th Symposium on Usable Privacy and Security; 9–11 July 2014, Menlo Park, California. Berkeley, California: Usenix. Accessible at: https://www.usenix.org/conference/soups2014/proceedings/presentation/rader.
Received 15 October 2014; Accepted 29 May 2015
