Using Google Reverse Image Search to Decipher Biological Images

UNIT 19.13

Jennifer L. Mamrosh1 and David D. Moore1

1Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas

Despite the range of tasks performed by biological image-processing software, current versions cannot find matches for the image in question among the huge range of biological images that exist in the literature and elsewhere on the Internet. Google’s Reverse Image Search is designed for this, and it is a simple yet powerful tool that can be applied to decipher the contents of biological images. For images that contain unfamiliar or unknown elements, for example, Reverse Image Search can identify similar features in published images. Here we describe general guidelines for using this freely available tool to search published images in National Center for Biotechnology Information’s (NCBI’s) image database. These guidelines can be applied to a variety of types of biological images, including immunohistochemistry and electron microscopy, to facilitate straightforward and rapid searches using Google’s Reverse Image Search. © 2015 by John Wiley & Sons, Inc.

Keywords: image analysis • reverse image search • unknown mechanism

How to cite this article: Mamrosh, J.L. and Moore, D.D. 2015. Using Google Reverse Image Search to decipher biological images. Curr. Protoc. Mol. Biol. 111:19.13.1-19.13.4. doi: 10.1002/0471142727.mb1913s111

Deciphering the meaning of the individual contents of biological images can be difficult. Many types of software exist to analyze different aspects of an image, from simple quantifiers of parameters like staining intensity to automated cell or organelle counters and machine learning phenotype assessments (Eliceiri et al., 2012). However, these software packages are unable to access the largest source of biological information: the millions of published biological images and their associated manuscripts. We have found that Google Reverse Image Search (also called Search by Image or Inside Search) provides a simple but powerful means to access this huge resource, and to find biological images that are similar to an image in question. Here we suggest guidelines to maximize matching of relevant images, which should be of use to researchers with a range of biological image types.

BASIC PROTOCOL

1. Prepare the image. If the image type is not supported, convert it to a supported image type (e.g., JPG, TIFF, PNG, GIF, BMP) using software like Adobe Photoshop or GNU Image Manipulation Program (GIMP). Crop the image to the most interesting or distinctive features. An image can be used as-is, although hits are often more relevant if the image is cropped.
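The preparation in step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the extension set is a hand-made mapping of the formats named above, and the helper names `needs_conversion` and `crop_box` are hypothetical. The crop box is returned as (left, upper, right, lower), the convention used by common imaging libraries such as Pillow, so it could be passed to whatever tool performs the actual crop.

```python
from pathlib import Path

# Extensions for the formats listed in step 1 (JPG, TIFF, PNG, GIF, BMP).
# Assumption: this mapping is ours, not an official list from Google.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".gif", ".bmp"}


def needs_conversion(filename: str) -> bool:
    """Return True if the file extension is not one of the supported types."""
    return Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS


def crop_box(center_x, center_y, crop_w, crop_h, img_w, img_h):
    """Return a (left, upper, right, lower) box around a feature of interest.

    The box is shifted as needed so that it stays inside the image bounds.
    """
    crop_w = min(crop_w, img_w)
    crop_h = min(crop_h, img_h)
    left = min(max(center_x - crop_w // 2, 0), img_w - crop_w)
    upper = min(max(center_y - crop_h // 2, 0), img_h - crop_h)
    return (left, upper, left + crop_w, upper + crop_h)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for name in ("liver_em.TIF", "hepatocytes.czi"):
        print(name, "needs conversion:", needs_conversion(name))
    print(crop_box(100, 100, 50, 50, 200, 200))
```

Keeping the crop box separate from the cropping itself lets the same coordinates be reused across channels or replicate images of the same field.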

2. Upload the image by clicking the camera icon in the Google Reverse Image Search page (http://images.google.com). Then choose the “Upload an image” tab and browse for the image.

Current Protocols in Molecular Biology 19.13.1-19.13.4, July 2015. Published online July 2015 in Wiley Online Library (wileyonlinelibrary.com). doi: 10.1002/0471142727.mb1913s111. Copyright © 2015 John Wiley & Sons, Inc.

Informatics for Molecular Biologists


Figure 19.13.1 Example of reverse image searching using an electron microscopy image. Input image is from mouse liver at 8000× magnification. This image was uploaded and the sixth hit investigated (White and Estensen, 1974). This hit suggested the input image is of a vacuole. Interestingly, most images of vacuoles found online do not contain electron-dense material inside, as is found in both the input image and the matched image. This substantiates both the specificity of Google Reverse Image Search and its potential to find mechanistically relevant images. For the input image, associated discussion from the match suggests that the vacuole is responsible for degradation of large amounts of cellular material in the liver.

Figure 19.13.2 Example of reverse image searching using an immunofluorescence image. Input image is from mouse primary hepatocytes stained with a Golgi marker (RCAS1; green) and mounted with DAPI to visualize nuclei (blue). This image was captured at 1000× magnification and cropped. This image was uploaded and the sixth hit investigated, which depicts staining with a commercial antibody used to label the Golgi (Abcam ab2809; http://www.abcam.com/tgn46-antibody-2f71-ab2809.html). Results such as this can be helpful to determine the localization of uncharacterized proteins.

3. Specify the search for only biological images:

a. Type site:www.ncbi.nlm.nih.gov in the search bar, and then click the search icon.

b. Alternatively, limit results to mostly biological images by including words like cell or microscopy within the search bar. If there are specific attributes of the uploaded image (e.g., “nucleus,” “GFP,” or “dark spots”), add these to the search bar to help refine the matched images once you click the search icon.

Google Reverse Image Search can find images anywhere on the Internet, but the most relevant results come from specifying searches within NCBI’s database of images, which includes images from nearly all papers on PubMed.

Google Reverse Image Search

These criteria must be added to the search bar, and searched, before clicking on “Visually similar images”. Once this link is clicked, adding anything additional to the search bar will unfortunately restart the search and results will no longer be based on the uploaded image. Examples of search results are provided in Figures 19.13.1 and 19.13.2.
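The refinement text described in step 3 can be composed ahead of time. The sketch below is a minimal illustration, not part of the published protocol: `build_refinement` is a hypothetical helper, it does not contact Google in any way, and the resulting string still has to be typed or pasted into the search bar by hand. Wrapping multi-word attributes in double quotes is our assumption, based on Google's general exact-phrase syntax.

```python
NCBI_SITE = "www.ncbi.nlm.nih.gov"  # site restriction given in step 3a


def build_refinement(site=NCBI_SITE, keywords=()):
    """Compose the text to add to the search bar after uploading an image.

    site:     domain for a site: restriction, or None to search everywhere.
    keywords: descriptive terms such as "cell", "microscopy", or "GFP".
    """
    terms = []
    if site:
        terms.append(f"site:{site}")
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue
        # Assumption: quote multi-word attributes so they are kept together.
        terms.append(f'"{keyword}"' if " " in keyword else keyword)
    return " ".join(terms)


if __name__ == "__main__":
    print(build_refinement())
    print(build_refinement(site=None, keywords=["cell", "microscopy"]))
    print(build_refinement(keywords=["GFP", "dark spots"]))
```

Because the refinement must be entered before clicking “Visually similar images”, preparing the full string in advance avoids having to restart the search.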


4. Filter results for desired hits. Currently, this must be done manually. It is recommended to select about a dozen images considered to be the most relevant matches, although this can vary. Browse the top 100-200 matched images and select what you consider to be the closest matches. Click on these images, and most typically, you will be directed to a published figure where general attributes about the matched image can be inferred from the figure legend. Carefully read the manuscript main text and discussion for mechanistic clues as to the relevance of your match.

COMMENTARY

Background Information

Currently, no biological image analysis software takes into account previously published biological images when considering the input image. However, Google Reverse Image Search is designed for such queries, and can facilitate the finding of published images mechanistically similar to the input image. Other reverse image search programs, including TinEye and Image Raider, may similarly accomplish this goal, but Google Reverse Image Search is generally thought to be the most accurate.

Future development of Google Reverse Image Search or other image-matching algorithms, to better suit biological data, could revolutionize analysis of biological images. For example, including parameters like specification of image magnification, type of image (e.g., electron microscopy or fluorescence), organism or cell type, and type of processing performed could improve the relevance of image matches obtained. This could even be extended to include adaptation for automated image analysis for use by clinicians to assess pathology of unknown diseases. Regardless of its future developments, Google Reverse Image Search in its current form is still likely to benefit many basic and applied science researchers by allowing them to find similar images with associated discussion on type of structure, disease relevance, or implications of changes in biological pathways.

Critical Parameters and Troubleshooting

Some searches may not yield relevant or useful matches, for a variety of potential reasons, including nonbiological artifacts in the image. Strategies to maximize your chances of success include: (1) input the most specific image possible, which might mean cropping the image to include only the most relevant or distinctive portion; (2) specify search parameters to include only images likely to be relevant; and (3) compare the results of searches with multiple examples of a particular unknown structure. Some images may yield different hits when input as color images versus black and white. Colored images assembled from images taken in multiple channels (for example, immunofluorescence images also co-stained with a nuclear or cell perimeter marker) may work better when input as only the channel of interest.

Matched images also cannot be guaranteed to be functionally the same as the input. There is wide variability across organisms, and even across cell types, in the way many things look (for example, the size of the nucleus or other organelles is variable). This is expected to be the largest source of biologically nonrelevant matches produced by this method. One suggestion is to include the organism, cell type, or cell line in the search criteria to essentially normalize the input image to the cell type. Any potentially interesting results obtained should be verified by experimentation when possible.

One potential concern is that unpublished images uploaded might be retained by Google. However, the authors have never detected their own uploaded images when performing multiple searches designed to find them, suggesting that uploaded images are not searchable by other groups. Therefore, the authors believe this search is secure enough for analysis of unpublished data. Additional security, such as use of a virtual private network, could also be employed.

Researchers with large sets of unpublished images might benefit from searching within these images for similarity to the uploaded image. These unpublished images could be hosted online by any available means and the search restrictions changed from site:www.ncbi.nlm.nih.gov to site:HOSTING LOCATION (for example, site:www.dropbox.com/FOLDER). A multitude of options exist for storing photos online so as to be searchable, although a location not publicly searchable would be advised (for example, private domains not indexed by Google or a temporary public link generated in Google Drive or Dropbox).

Finally, the use of this tool without restrictions sometimes yields images that some may find undesirable or offensive. Restricting a search to images found in the PubMed image database is a good practice to avoid this.
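Two of the troubleshooting suggestions in this section, inputting only the channel of interest and comparing color versus black-and-white versions of an image, can be sketched without any imaging library. The example below is an illustration under simplifying assumptions: the image is represented as nested lists of (R, G, B) tuples with 8-bit values, the function names are hypothetical, and the grayscale weights are the common ITU-R BT.601 luma coefficients rather than anything specified by the protocol. In practice the same operations would normally be done in image-editing software before upload.

```python
CHANNEL_INDEX = {"red": 0, "green": 1, "blue": 2}


def keep_channel(pixels, channel):
    """Zero out every channel except the one of interest.

    pixels:  rows of (R, G, B) tuples (assumed 8-bit values).
    channel: "red", "green", or "blue".
    """
    index = CHANNEL_INDEX[channel]
    return [
        [tuple(value if i == index else 0 for i, value in enumerate(px)) for px in row]
        for row in pixels
    ]


def to_grayscale(pixels):
    """Convert (R, G, B) pixels to single gray values using BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


if __name__ == "__main__":
    # Tiny made-up image: one green-dominant pixel and one blue-dominant pixel,
    # standing in for a Golgi stain (green) with a DAPI counterstain (blue).
    image = [[(10, 200, 30), (0, 50, 255)]]
    print(keep_channel(image, "green"))
    print(to_grayscale(image))
```

Searching with each derived version in turn, and comparing the hits, is one way to apply strategy (3) above to a single input image.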

Literature Cited

Eliceiri, K.W., Berthold, M.R., Goldberg, I.G., Ibanez, L., Manjunath, B.S., Martone, M.E., Murphy, R.F., Peng, H., Plant, A.L., Roysam, B., Stuurman, N., Swedlow, J.R., Tomancak, P., and Carpenter, A.E. 2012. Biological imaging software tools. Nat. Methods 9:697-710.

White, J.F. and Estensen, R.D. 1974. Selective labilization of specific granules in polymorphonuclear leukocytes by phorbol myristate acetate. Am. J. Pathol. 75:45-60.
