Does Query Expansion Limit Our Learning? A Comparison of Social-Based Expansion to Content-Based Expansion for Medical Queries on the Internet

Christopher Pentoney1, Jeff Harwell1, Gondy Leroy PhD1,2
1 Claremont Graduate University, Claremont, CA; 2 University of Arizona, Tucson, AZ

Abstract

Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. the content-based method (6.73 different domains, on average).

Introduction

Searching for medical and healthcare information is the third most popular online activity1. However, it poses several difficulties for online health information consumers: composing good queries, identifying objective information, and understanding the content. The second and third problems have been the topic of much research.
Evaluating the quality of websites automatically and manually has been shown not only to be possible but also effective2, and tools and frameworks exist for evaluating health information online specifically3,4. Improving understanding has relied on readability formulas and, more recently, on evidence-based, corpus-driven simplification. The readability formulas are applied to a variety of texts, e.g., patient education materials5,6, bereavement materials7, informed consent forms8, and even survey instruments9. The formulas are recommended to help evaluate text10 by government agencies and in IRB training courses. More recent corpus-based approaches use semi-automated simplification and have demonstrated that using more familiar terms, more familiar grammar structures, or shorter noun phrases is associated with both perceived and actual text difficulty11,12.

The first problem, composing good queries, is less frequently addressed or set as the goal of information systems development in the medical field. This is an issue because knowing what type of information is best to present to a user can be difficult. Especially with complex medical terms and conceptual relationships, naïve users may not have enough knowledge to form an effective search query. A current, popular solution offered by generic search engines, e.g., Google, is to automatically offer additional terms that might help users better define their search. Relevant terms are typically appended to the user's search based on the popularity of similar searches conducted by other users13. While query formation has also been the subject of considerable research, query expansion only became a commonly used tool when Google introduced automatic query completion. This study compares Google's method of query expansion to another method that focuses on the information within the documents being searched. Our goal is to evaluate the effect of using the different query expansion terms on the query and on the query results.
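The readability formulas mentioned above reduce to simple arithmetic over surface counts of a text. As a purely illustrative sketch (the cited studies do not specify this particular formula), the widely used Flesch-Kincaid grade level can be computed from word, sentence, and syllable counts:

```python
def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw surface counts of a text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# A 100-word passage with 10 sentences and 150 syllables scores at
# roughly a 6th-grade reading level.
print(round(fk_grade(100, 10, 150), 2))  # 6.01
```

Counting syllables reliably requires a dictionary or heuristic, which is one reason corpus-based simplification approaches such as those cited above go beyond surface formulas.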
Since the social-based queries rely on other people's queries, they may make searching less diverse. At the population level, this may lead to less information being read and digested. If this limited information happens to be biased or untrue, the longitudinal effects may affect the general population. To evaluate social-based expansion, we compare it with an objective, text-based query expansion baseline that uses the existing text on the Internet as the basis for expansion. In order to explore information, it may be more useful to expand based on what exists in the documents rather than on what other people are searching. If searchers are already naïve, using their queries would seem a somewhat misguided way to find reliable health information online.


Background and Significance

Online Medical Search

With 72% of internet users in the U.S. responding in 2013 that they have searched online for health information14, there is a clear need to deliver this knowledge in a way that presents internet users with reliable information. Furthermore, 77% of people who searched online for health information said that they began the search with a common search engine (e.g., Google, Bing, Yahoo), and 52% of smartphone owners say that they have even used their phones to search for medical information14. People are constantly looking for health information, even on-the-go, and it needs to be not only reliable but accessible. It is important for search engines to return medical knowledge that is useful and trustworthy. One way to make this information accessible is to guide people to useful information through query expansion. By helping users form their searches, a search engine can provide relevant information.

Query Expansion

Returning relevant and reliable information can be difficult. Natural language itself is often ambiguous, and health language is no exception; it is further complicated by medical jargon. Query expansion helps guide a user in generating a query by suggesting search terms that may be relevant to their search. Historically, these additional terms have been generated through methods such as relevance feedback, co-occurrence analysis, and ontologies15. This is not a trivial problem, since words can have multiple meanings and different words can share similar meanings. Many expansion techniques were evaluated at the Text Retrieval Conferences (http://trec.nist.gov/) with varying results; however, regardless of technique, most expansion methods showed improved search results. Today, query expansion as offered by Google is based on the overall popularity of search queries already submitted by others13.
While it has been shown that successive querying (more queries) leads to looking for increasingly serious conditions16, the effect of query expansion (more terms) is unknown. For the individual, seeing alternate query options may lead to more diverse searching or to a different search than originally intended. For the population of all searching individuals, reusing queries may lead to less diverse search and even more limited medical domain knowledge. While other research has explored different methods of expanding queries in medical text, none has used the actual document text to suggest additional terms. Knowledge-based types of query expansion that leveraged the Unified Medical Language System (UMLS) as a related information source for expanding queries led to increases in precision and recall17. Expansion methods that employ the UMLS Metathesaurus for expansion with synonyms or hierarchically related terms18 showed similar improvements. These content-based or knowledge-based methods of query expansion have advantages over the social-based methods, and they should be compared.

Research Interests

Our overall interest was to determine the differences between two methods of expanding medical queries. It is possible that the social-based expansions do not offer diverse enough results, which may give searchers a misleading picture of the results. This project compared the status quo (Google's social-based expansion) with a content-based expansion method. Given these two types of expansion methods, we conducted a study that explored two questions: (1) how do the expansion terms for medical queries differ based on expansion type, and (2) how do the results differ based on expansion type? Google searches were conducted using each of the methods on 1% of the queries identified as medically relevant.
Using existing knowledge, as found in text, to expand medical queries may provide some benefit over socially popular expansion types. Current social-based methods may help to spread common misunderstandings or misinformation, while content-based methods show term relationships that already exist in text. Furthermore, as the popularity of a term increases, it may become easier for people to come across information that is only popularly searched, and much more difficult to find new knowledge. This study investigates both content-based and social-based expansion types.


Social and Content-based Query Expansion

Nearly a quarter (24.2%) of all queries in the AOL User Search Collection were identified as trigrams, whereas only 19.9% were bigrams and 17.4% were unigrams. Since trigrams were the most frequent, we decided to limit the study to three-keyword searches. Social-based expansion, i.e., Google's Autocomplete function, involves appending to the search terms that have been popularly used by other individuals who searched with similar queries. To perform this type of expansion, we programmatically submitted queries to Google. Google's suggestions were obtained using Google's Autocomplete API19 and a Python 2.7 script that sent each search with its expansions to Google Search (URL: https://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=8&q=). The returned set contains ten suggested queries. We used all of these top ten suggestions, although it is unclear what their order is based on. Each suggested term is added to the initial query to provide a more precise search.

Content-based expansion depends on the text of online documents themselves. We used the 2006 Google Web Corpus20, which reflects the index of all sites available to Google. Since we used trigram queries, the four-gram data contained in the corpus (approximately 1.3 million) was needed to suggest the additional terms: the original trigram plus one extra term. Punctuation and symbols were filtered out before matching the queries to the Web Corpus in order to find text matches.

Table 1 below shows how the two types of expansion yielded different searches. Typically, the content-based expansion led to broader expansion terms, which have more Google results overall. Even when terms were related between the expansion types (e.g., "seroconversion" and "labor"; "figures" and "info"), the content-based ones were often broader and returned a larger number of results from Google. Table 1.
Examples of social-based and content-based query expansions with the number of results returned by Google.

Original Search              Social-Based Expansion                                Content-Based Expansion
HIV testing during           HIV testing during seroconversion (122,000 results)  HIV testing during labor (2,200,000 results)
Obesity and blood pressure   Obesity and blood pressure cuff (1,050,000 results)  Obesity and blood pressure baby (15,900,000 results)
Diabetes facts and           Diabetes facts and figures (1,690,000 results)       Diabetes facts and info (21,900,000 results)

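The content-based expansion described above can be sketched as a lookup into a four-gram table. The toy corpus below stands in for the Google Web Corpus, and ranking candidates by corpus frequency is our assumption; the paper does not state how multiple matching four-grams were ordered:

```python
# Toy four-gram counts standing in for the 2006 Google Web Corpus;
# the real corpus contains orders of magnitude more entries.
FOURGRAMS = {
    ("hiv", "testing", "during", "labor"): 2200,
    ("hiv", "testing", "during", "pregnancy"): 1800,
    ("diabetes", "facts", "and", "info"): 3100,
    ("diabetes", "facts", "and", "statistics"): 900,
}

def content_expansions(query, fourgrams, k=10):
    """Suggest expansion terms: the fourth gram of every four-gram whose
    first three grams match the normalized trigram query. Punctuation
    filtering, applied in the study, is omitted here for brevity."""
    trigram = tuple(query.lower().split())
    candidates = [(count, grams[3]) for grams, count in fourgrams.items()
                  if grams[:3] == trigram]
    # Rank candidate terms by corpus frequency, most frequent first
    # (an assumption; the ordering criterion is not given in the paper).
    candidates.sort(reverse=True)
    return [term for count, term in candidates[:k]]

print(content_expansions("HIV testing during", FOURGRAMS))
# ['labor', 'pregnancy']
```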
Study Design

Stimuli

Medical Queries: Queries were obtained from the AOL 500k User Session Collection, a set of approximately 20M queries from 650K AOL users gathered during a three-month period in 2006. We selected queries that matched terms from the UMLS associated with the following UMLS semantic types: antibiotic, disease or syndrome, injury or poisoning, or body substance. We chose these semantic types because they are related to top medical concerns as described on Health.gov. The fourth gram of a matching four-gram was used to expand the base three-gram queries for content-based expansion. For each query, we collected up to ten expansion terms using both methods. This resulted in a total of 346,145 expanded queries. However, since each query had to be submitted to Google for the results evaluation, to complete the project in a reasonable time frame we extracted a random 1% sample
for the analysis. This provided 1,731 expanded searches for social-based expansions and 1,731 expanded searches for corpus-based expansions.

Variables

There was one independent variable in the study: query expansion type. All medical queries were expanded in two fashions: using social query expansion and using content query expansion. We evaluated the effect of expansion on the query itself using specificity and ambiguity, and on the results using domain diversity and expansion edit distance. These metrics were chosen because they provide an objective, if not complete, evaluation of queries and because they can be computed for large sets of queries in an automated fashion.

The first two dependent variables (ambiguity and specificity) evaluate the effect of expansion on the queries. They are of interest because they measure how easily understood a word might be. Words that are more ambiguous have more possible meanings and would be expected to lead to more diverse results. Words that are more specific are further down a conceptual hierarchy (e.g., German Shepherd is more specific than dog) and would be expected to result in a more focused result set (even though that set of results can be large). Both are calculated using WordNet 3.0 through NLTK for Python 2.7. Ambiguity was calculated as the number of synsets for a term, i.e., how many different concepts that word can denote. For example, the word "dog" has eight synsets, or cognitive synonyms, in WordNet, while the word "beagle" has only one, suggesting that "dog" is more ambiguous than "beagle." Specificity was calculated as the depth in the hypernym path of the first synset for each word.
In this case, "dog" is found at the ninth level in WordNet's tree (the top term is simply "entity"), while "beagle" is found at the twelfth, indicating that "beagle" is conceptually more specific than "dog."

The effect of expansion on the results themselves was measured by the diversity of domains in the results. This measure was chosen because it is a straightforward way to evaluate the number of different domains returned by a search. It was measured using the Google Search API, which returns results as a JSON object given a query. The returned results show the base URL of each result, as seen in Table 2.

Table 2. Example of the top three domains returned by a social-expanded query and a content-expanded query.

Query                                              Top Three Domains
HIV testing during seroconversion (social-based)   www.healthline.com, www.aidsmap.com, www.jhsph.edu
HIV testing during labor (content-based)           www.aids.gov, www.cdc.gov, www.jiasociety.org
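The ambiguity and specificity metrics described under Variables can be illustrated without NLTK by using a toy sense inventory and toy hypernym paths. The study itself used WordNet 3.0 through NLTK; the entries below are hypothetical stand-ins that only mimic the dog/beagle example:

```python
# Toy sense inventory: term -> list of senses (synsets in the study).
SENSES = {
    "dog": ["domestic dog", "cad", "frank", "pawl", "andiron",
            "frump", "unpleasant woman", "chase"],
    "beagle": ["beagle (hound)"],
}

# Toy hypernym chains: first sense -> path from the root "entity" down.
# (Illustrative only; real WordNet paths differ in their labels.)
HYPERNYM_PATH = {
    "dog": ["entity", "physical entity", "object", "whole", "living thing",
            "organism", "animal", "chordate", "dog"],
    "beagle": ["entity", "physical entity", "object", "whole", "living thing",
               "organism", "animal", "chordate", "carnivore", "canine",
               "dog", "beagle"],
}

def ambiguity(term):
    """Number of senses (synsets) the term can denote."""
    return len(SENSES[term])

def specificity(term):
    """Depth of the term's first sense in the hypernym hierarchy."""
    return len(HYPERNYM_PATH[term])

print(ambiguity("dog"), ambiguity("beagle"))      # 8 1
print(specificity("dog"), specificity("beagle"))  # 9 12
```

With NLTK installed, the equivalent calls would be `len(wordnet.synsets(term))` for ambiguity and the length of the first synset's hypernym path for specificity.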

Finally, the expansion edit distance was calculated to compare the results of an expanded query to those of the original (unexpanded) query. This was done in both a ranked and an unranked manner. In the unranked method, the order of results was not taken into account, only the domains themselves. A search that returned one different result with the expansion included compared to without it would have a distance of 1. For example, in the results above, if the query "HIV testing during seroconversion" had returned "www.thebody.com" instead of "www.jhsph.edu," but the other two returned domains had been the same, the unranked edit distance would be 1. For the ranked edit distance, order was taken into account. For example, if the query had this time returned
"www.thebody.com" instead of "www.jhsph.edu," and "www.healthline.com" and "www.aidsmap.com" had been switched in order, the edit distance would be 2.

Results

Searching the 3,462 expanded queries required approximately 38.2 hours using a Python script and the Google Search API described previously. Table 3 shows the effect of expansion type on the ambiguity and specificity of queries. We provide the numbers for the entire set and for the 1% subset that was used to evaluate the results. Since only the expansion terms changed, not the original query, these numbers are averages over the expansion terms only. Overall, the social-based expansion terms were almost an entire level more specific in the WordNet 3.0 hierarchy. The difference in ambiguity was larger: on average, there was more than one extra meaning (sense) for content-based expansion terms versus social-based expansion terms. A similar trend is seen for the 1% sample, although the differences are smaller.

Table 3. Query evaluation: average specificity and ambiguity for the two query expansion types.

                               Specificity   Ambiguity
Complete Set (N = 346,145)
  Social-based (N = 277,319)       7.89         6.37
  Content-based (N = 68,826)       7.01         7.68
1% Sample (N = 3,462)
  Social-based (N = 1,731)         7.16         7.37
  Content-based (N = 1,731)        7.10         7.50

This means that, for each individual using this query expansion method, the social-based method adds a term that narrows the search to a subcategory of the original search. These words also tend to have fewer distinct meanings.

We also evaluated the effect of social-based and content-based expansion on the search results. The number of unique domains did not vary with the type of expansion used: content-based expansion returned an average of 6.73 (SD = 1.53) unique domains, and social-based expansion an average of 6.72 (SD = 1.55). A Welch's t-test showed no significant difference in domain diversity between the two expansion types (t = .04, p = 0.96, d < 0.01).

Table 4 shows the expansion edit distance (steps needed to go from the unexpanded to the expanded result set) for the two expansion types. There was a statistically significant difference between ranked social-based and content-based expansion: social-based expansion had a slightly higher edit distance (M = 6.67, SD = 1.57) than content-based expansion (M = 6.44, SD = 1.73), t(3,460) = 4.07, p < .001. This means that, on average, social-based expansion changed the results before and after expansion more than content-based expansion did. It is important to note, however, that the mean difference was small (0.23) and, although statistically significant, may not be meaningful. The difference in unranked edit distance was also significant in the same direction: social-based queries had a slightly higher unranked edit distance (M = 4.61, SD = 2.00) than content-based queries (M = 4.38, SD = 2.20), t(3,460) = 3.22, p < .01. In other words, query expansion matters: expanding a query by a single term caused more than four of the ten results to change, on average.
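The Welch's t statistics reported above can be computed directly from the sample means and unbiased variances. A minimal sketch using only the standard formulas (scipy.stats.ttest_ind with equal_var=False would give the same statistic):

```python
import math

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples
    with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Identical samples give t = 0, in line with the near-identical
# domain-diversity means reported above.
t, df = welch_t([6.0, 7.0, 8.0], [6.0, 7.0, 8.0])
print(round(t, 6))  # 0.0
```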


Table 4. Average edit distances for the top ten results from content-based and social-based expanded queries.

                            Ranked                Unranked
Content-based (n = 1,731)   M = 6.44, SD = 1.73   M = 4.38, SD = 2.20
Social-based (n = 1,731)    M = 6.67, SD = 1.59   M = 4.61, SD = 2.00
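The paper does not give formulas for the two edit distances, but one reading consistent with the worked example (one substitution plus one adjacent swap giving a ranked distance of 2) treats the ranked distance as an optimal string alignment distance over the result lists and the unranked distance as a set difference. The functions below are that assumed reading, with a domain-diversity count included for completeness:

```python
def domain_diversity(results):
    """Number of unique domains among the returned results."""
    return len(set(results))

def unranked_distance(before, after):
    """Domains in the original result list that no longer appear
    after expansion; order is ignored."""
    return len(set(before) - set(after))

def ranked_distance(before, after):
    """Optimal string alignment distance over result lists: insertions,
    deletions, substitutions, and adjacent transpositions each cost 1."""
    n, m = len(before), len(after)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if before[i - 1] == after[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and before[i - 1] == after[j - 2]
                    and before[i - 2] == after[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]

# The worked example from the Variables section: one domain replaced
# and two adjacent domains swapped.
base = ["www.healthline.com", "www.aidsmap.com", "www.jhsph.edu"]
expanded = ["www.aidsmap.com", "www.healthline.com", "www.thebody.com"]
print(unranked_distance(base, expanded))  # 1
print(ranked_distance(base, expanded))    # 2
```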

Discussion

Two main findings stand out from this study. First, there is an interesting difference in specificity and ambiguity between content-based and social-based expansion. Terms generated through social-based expansion tended to be less ambiguous (fewer distinct meanings) and more specific (further down the concept hierarchy) than terms generated through content-based expansion. Since social-based expansion takes into account the popularity of searched phrases, it is likely that people use more specific terms when searching for information than actually appear in the text online. Information on the web is written to express a particular point or relationship intended by the author.

Second, although the findings above show that the two types of expansion produced different types of terms on average, there was almost no difference in the diversity of domains returned. However, since the expansion terms in the 1% sample differed less between methods, the results have most likely been affected. There is also the possibility that Google purposely places a similar number of domains in every result set in order to keep results diverse. In addition, a limitation of our work is that we could not directly compare the domains returned by social-based and content-based queries. On an individual query basis there was no good way to pair queries for a matched comparison. That is, a social-based expansion "hiv during seroconversion" asks for different, more specific information than "hiv during pregnancy," and would obviously return different domains. Because of this, we were only able to compare the number of different domains returned by each expansion type, on average.

Edit distances provided some interesting information. Although the small difference in edit distance between the two expansion types was statistically significant, the raw difference was small.
It is worth noting, however, that adding expansion terms to the base query did change the results considerably. Not only was the order of existing results changed, but nearly half of the results changed completely depending upon whether the query was expanded or not. At the very least, it can be argued that expanding matters, even if the method of expansion matters less. It is unclear how much the expansion would affect user behavior, and subjective evaluation from human raters is necessary to investigate these results further.

Furthermore, there may be some impact of Google's AdWords on results in a real user setting. Certain search results are paid advertisements selected as a function of the searched keywords. These were not collected in this study, but would likely influence actual search behavior nonetheless. It may be hypothesized that users form subsequent queries based on the text of initial results; as such, paid advertisements within the results may influence what is searched next.

Overall, the small size of these effects is surprising, but there may still be an impact on the diversity of information people read. More studies are needed to evaluate the effects of current query expansion on health information consumption. A further limitation is that this study measured only the effects of one query followed by an expanded query. The effect may be much more pronounced when three or four searches on one topic, as would be common for someone searching online, are combined; an engaged user is very likely to have interest in a topic beyond a single query. The effect could be such that results converge, or diverge even further. The actual impact of the human behavioral process of searching is a complex question that requires further study.
Current directions in this work involve exploring how an individual user forms multiple queries upon the same search topic.


A final limitation of this study was the sole use of objective measures, so there was no direct human interaction with the search process. Subjective evaluations from human raters will be added to fully evaluate the results and to look at the quality of information and the ability of readers to form a comprehensive picture.

Conclusion

Queries can be expanded using content from the documents being searched, which on average yields broader expansion terms than the popular social-based expansion methods. Neither social-based nor content-based expansion led to differences in the diversity of information returned from the internet, but whether the results differ in other ways is unknown. It is possible that one of the two methods returns better results depending on the criteria used; further research is needed to explore this possibility. Future work will include subjective ratings of results and expansion terms, in addition to analysis of temporal trends and human interaction studies with concrete search tasks. Specifically, the effects of multiple queries on a search topic need to be investigated to provide a more complete picture of how different types of search expansion affect the results returned; it is unlikely that an interested individual would conduct a search on a topic using a single query. In addition to research on the process of searching, evaluation of the search results needs to be refined and measured using broader criteria, such as subjective ratings of the quality and diversity of webpages. Ultimately, human raters and searchers will be a crucial part of next steps in this research, including the formation of queries and the multiple-search process, as well as the subjective evaluation of search results. Subjective ratings will give insight into human understanding and initial response to results returned by the different expansion types. Exploring results from other search engines (e.g.,
Bing or DuckDuckGo) may also yield different results, but it is unclear in what direction those results might differ. More interaction from human raters is needed to draw strong conclusions about the different expansion types.


References

1. Fox S. Health topics. Pew Research Internet Project. February 2011. [Online]. Available: http://www.pewinternet.org/2011/02/01/health-topics-2/
2. Hasan L, Abuelrub E. Assessing the quality of web sites. Applied Computing and Informatics. 2011;9.1:11-29.
3. Weitzel L, Quaresma P, de Oliveira JPM. Evaluating quality of health information sources. Advanced Information Networking and Applications (AINA), 2012.
4. Wang Y, Zhenkai L. Automatic detecting indicators for quality of health information on the Web. International Journal of Medical Informatics. 2007;76.8:575-582.
5. Maples P, Franks A, Ray S, Stevens A, Wallace L. Development and validation of a low-literacy Chronic Obstructive Pulmonary Disease knowledge Questionnaire (COPD-Q). Patient Education and Counseling. 2009;81:19-22.
6. Weiss BD. Health literacy and patient safety: Help patients understand (manual for clinicians). American Medical Association; 2007.
7. Leroy G, Endicott JE. Term familiarity to indicate perceived and actual difficulty of text in medical digital libraries. International Conference on Asia-Pacific Digital Libraries (ICADL 2011). 2011.
8. Leroy G, Endicott JE. Combining NLP with evidence-based methods to find text metrics related to perceived and actual text difficulty. In: 2nd ACM SIGHIT International Health Informatics Symposium (ACM IHI 2012); 2012.
9. Leroy G, Endicott JE, Mouradi O, Kauchak D, Just M. Improving perceived and actual text difficulty for health information consumers using semi-automated methods. In: American Medical Informatics Association (AMIA) Fall Symposium. 2012.
10. Leroy G, Endicott JE, Kauchak D, Mouradi O, Just M. User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning and information retention. Journal of Medical Internet Research (JMIR). 2013;15:e144.
11. Mouradi O, Leroy G, Kauchak D, Endicott JE. Influence of text and participant characteristics on perceived and actual text difficulty. Hawaii International Conference on System Sciences. 2013.
12. Leroy G, Kauchak D, Mouradi O. A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty. Int J Med Inform. 2013;82:717-30.
13. Sullivan D. How Google's instant autocomplete suggestions work. Search Engine Land. April 2011. [Online]. Available: http://searchengineland.com/how-google-instant-autocomplete-suggestions-work62592
14. Health Fact Sheet. Pew Research Internet Project. December 2013. [Online]. Available: http://www.pewinternet.org/fact-sheets/health-fact-sheet/
15. Bhogal J, Macfarlane A, Smith P. A review of ontology based query expansion. Information Processing and Management. 2007;43:866-886.
16. White RW, Horvitz E. Cyberchondria: studies of the escalation of medical concerns in web search. ACM Transactions on Information Systems (TOIS). 2008;27:23.
17. Liu Z, Chu WW. Knowledge-based query expansion to support scenario-specific retrieval of medical free text. Information Retrieval. 2007;10:173-202.
18. Hersh W, Price S, Donohoe L. Assessing thesaurus-based query expansion using the UMLS Metathesaurus. Proceedings of AMIA. 2000.
19. Chand S. Google Autocomplete API. Shreyas Chand. January 2013. [Online]. Available: http://shreyaschand.com/blog/2013/01/03/google-autocomplete-api/
20. Franz A, Brants T, Norvig P. All our n-gram are belong to you. Research blog: the latest news from research at Google. August 2006. [Online]. Available: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html