Diving deeper into Zebrafish development of social behavior: analyzing high resolution data.

Journal of Neuroscience Methods 234 (2014) 66–72


Basic Neuroscience

Christine Buske a,b,∗, Robert Gerlai b

a Papers/Springer SBM, London, UK (previously: University of Toronto, Department of Cell & Systems Biology)
b University of Toronto Mississauga, Department of Psychology, Toronto, Canada

Highlights

• Zebrafish are a high throughput, cost effective vertebrate model.
• Behavioral data collection/analysis is time-consuming.
• The R programming language is a powerful tool in data analytics.
• R was used to analyze a large behavioral dataset.

Article info

Article history: Received 2 May 2014; received in revised form 16 June 2014; accepted 16 June 2014; available online 23 June 2014

Keywords: Zebrafish; Behavior; Methods

Abstract

Vertebrate model organisms have been utilized in high throughput screening, but only with substantial cost and human capital investment. The zebrafish is a vertebrate model species that is a promising and cost effective candidate for efficient high throughput screening. Larval zebrafish have already been successfully employed in this regard (Lessman, 2011), but adult zebrafish also show great promise. High throughput screening requires the use of a large number of subjects and the collection of a substantial amount of data. Collection of data is only one of the demanding aspects of screening. In most screening approaches that involve behavioral data, the main bottleneck that slows throughput is the time consuming analysis of the collected data. Some automated analytical tools do exist, but often they only work for one subject at a time, eliminating the possibility of fully utilizing zebrafish as a screening tool. This is a particularly important limitation for such complex phenotypes as social behavior. Testing multiple fish at a time can reveal complex social interactions, and it may also allow the identification of outliers from a group of mutagenized or pharmacologically treated fish. Here, we describe a novel method using a custom software tool developed within our laboratory, which enables tracking multiple fish, in combination with a sophisticated analytical approach for summarizing and analyzing high resolution behavioral data. This paper focuses on the latter, the analytic tool, which we have developed using the R programming language and environment for statistical computing. We argue that combining sophisticated data collection methods with appropriate analytical tools will propel zebrafish into the future of neurobehavioral genetic research. © 2014 Published by Elsevier B.V.

∗ Corresponding author: C. Buske. Tel.: +44 7867410264.
http://dx.doi.org/10.1016/j.jneumeth.2014.06.019
0165-0270/© 2014 Published by Elsevier B.V.

1. Introduction

Larval zebrafish have been an excellent vertebrate model for high throughput studies (Lessman, 2011), but adult zebrafish may also hold substantial promise in this regard (Brennan, 2014). The limiting factors in using adult zebrafish for high throughput studies lie not in maintenance or acquisition costs, but rather in time consuming experimental procedures and analysis. Nevertheless, adult zebrafish have become increasingly popular in behavioral neuroscience, as mutation or drug-induced alterations in brain function may be best identified with behavioral test paradigms (Gerlai, 2010). While larval zebrafish offer behavioral endpoints, adult zebrafish provide the advantage of a far wider ranging behavioral repertoire (Norton and Bally-Cuif, 2010). Behavior is particularly appropriate when modeling human conditions for which abnormal behavior is a core symptom (Gerlai, 2012; Brennan, 2014). Previous work has also indicated a correlation between neurochemical changes and behavioral maturation in zebrafish (Buske and Gerlai, 2012). This supports the argument that changes in behavior are accompanied by changes in neurochemistry, and as we further develop our understanding of how these differences interplay we may gain further understanding of the brain and behavior.

Zebrafish offer a rich behavioral repertoire, of which shoaling is one component (Buske and Gerlai, 2011a,b; Miller and Gerlai, 2011). Several behaviors have been characterized in zebrafish and are being further investigated within a behavioral neuroscience context. These have been previously reviewed and include reward, learning and memory, aggression, locomotion, anxiety, mating, and sleep (Norton and Bally-Cuif, 2010). Shoaling is a complex behavior to study and quantify: a number of protocols mimic a shoaling situation, whereas quantifying shoaling in an open field setting (where subjects are able to interact with each other) has been difficult in the past (Buske and Gerlai, 2011a,b; Gerlai, 2014).

While behavior offers insights into brain function, recording and analysis of behavioral trials can be time consuming. Throughput in behavioral screening may be successfully increased by running several testing arenas in parallel, and the cost of setting up such systems is decreasing as technology advances. This makes parallel behavioral test systems a reality even for the average academic laboratory. The bottleneck in such behavioral screening then becomes the proper extraction and analysis of the acquired data.

Several commercially available tracking systems exist for testing a single subject at a time (e.g. Noldus’ Ethovision, and CleverSys). These systems are highly sophisticated, but still present some trade-offs: while they accurately track a single fish in optimal conditions, they have difficulty tracking very small fish in a large area (in larger tanks), and they require optimal light conditions. An additional disadvantage is that most of these systems have difficulties with tracking multiple fish, particularly shoals with more than four members.
The final, but certainly not the most trivial, limiting factor is cost. With a single license one can analyze only one trial at a time, which places a significant time constraint on data extraction from videotaped trials. Processing video files at a higher throughput rate would require the purchase of multiple licenses and an equal number of computers. With tracking systems costing $5–15 thousand per license, parallel processing (multiple licenses) is not feasible for most academic laboratories.

Several commercial software applications exist that allow for automated tracking of zebrafish and quantification of several behavioral outputs. As discussed above, these can be costly and the cost can limit throughput. Having said that, these methods do offer increasingly sophisticated means of measuring behavior: tracking of animal behavior has been possible in a two dimensional plane for some time (Noldus and Spink, 2001). More recently, various groups have started applying 3D video behavioral tracking (Maaswinkel et al., 2013). These methods have provided insight into various behavioral outputs, particularly as a result of drug exposure (Cachat et al., 2013; Maaswinkel et al., 2013).

When developing our in-house tools, we first considered the requirements of the tracking system and the specifics of behavioral analysis as they pertain to zebrafish. High throughput capabilities require a simple tracking system and reliable behavioral paradigms (Blaser and Gerlai, 2006). The tracking system should ideally be capable of tracking shoals of fish, and of tracking even under sub-optimal conditions (e.g. when reflections cannot be avoided or under sub-optimal light/background conditions). Under such conditions, commercially available tracking systems frequently mistake small particles, e.g. air bubbles or debris, for the subject, which introduces substantial errors. The errors need to be corrected by the experimenter, and this requires continuous monitoring and time. Preferably, such systems should also be capable of processing video data in real time and with a small number of computers.

Recent improvements in optics and computing power have revolutionized the analysis of behavior. Just under 30 years ago, tracking with one camera and one computer cost thousands of dollars and was only able to generate a position for the animal once per second (Dusenbery, 1985). Now the cost for basic equipment is in the few hundred dollar range and tracking occurs at 30 times per second with a simple inexpensive camera. The rapid improvements in electronics and video equipment have opened up the possibility of extracting large amounts of data. Extracting positional data from a video recorded at 30 frames per second results in 18,000 data points in a short 10-min video. Processing and analysis that requires manual input would be prohibitively time consuming for the number of trials required in the average experiment. As technology improves, Big Data is becoming the new bottleneck across many disciplines (Marx, 2013).

Shoaling behavior is complex, involving subjects leaving and rejoining the shoal, and varying distances between shoal members. Several research groups have attempted to describe and quantify shoaling behavior using multiple zebrafish simultaneously, but these approaches typically involve either counting the number of individuals occupying an arbitrary area of space within the testing tank at different time intervals (Echevarria et al., 2011), or assigning a shoal cohesion ‘score’, a subjective judgment made by the observer, at different time points during the trial (Piato et al., 2011). In other studies, a single fish is exposed to a shoal of conspecifics separated by a glass divider, and the time spent within a preset compartment next to the stimulus is measured (Savio et al., 2012). Aside from being time consuming, these methods do not offer an objective or particularly informative description of shoal cohesion. For example, a group of fish may be divided across two different arbitrary areas as defined by the experimenter, but be physically very close to each other.
In another scenario, the group may be present within the same arbitrary area of the testing arena, but be physically further apart. Shoal cohesion would be rated lower in the first scenario than in the second, a rating that does not accurately represent how close the shoal members truly are. In cases where a shoal cohesion score is assigned to each time interval sampled, substantial information is also lost, and there is the possibility of experimenter bias. Subtle differences in shoal cohesion would be missed by such a method, even when cohesion is assessed by multiple raters and inter-rater reliability is high. Needless to say, these methods do not allow for high throughput analysis of behavioral trials, aside from not providing objective and precise measurements of group behavior. Reproducibility of these methods across laboratories is also of concern (Benjamini et al., 2010).

Notably, methods where a single fish is tested in the presence of a stimulus shoal (Savio et al., 2012) do not describe shoal behavior properly, as there is no possibility of interactive communication between the subject and the stimulus. For example, even when live stimulus fish are used, the experimental zebrafish will not be able to sense the presence of the stimulus fish with their lateral line, as a glass barrier is placed between the test subject and the stimulus fish. Similarly, while larval zebrafish can be observed and tracked in multiwell plates, each subject is isolated in its own well, and thus the subjects do not interact (Cario et al., 2011).

For measuring shoaling in a group setting, automated tracking methods have so far fallen short. An earlier tracking program developed within our laboratory (described in detail previously by Miller and Gerlai, 2007) has allowed for more objective and accurate measurement of inter-individual distance, and other parameters of group behavior, in zebrafish. This method has been successfully employed in previous studies (Buske and Gerlai, 2011a,b).
Notably, it relied on manually identifying each individual in a shoal at each time interval sampled, which required a human observer, and thus the method was highly time consuming. Because the observer was only identifying the location of the subject on a screen by clicking on it, and not making any assessment of shoal cohesion, this method was arguably more objective than several prior manual scoring or rating methods. However, extracting high-resolution data from trials using this method is prohibitively time consuming, and as a consequence it is impossible to avoid loss of information unless a substantial time investment is made.

More recently, a sophisticated yet simple tracking system was developed in our laboratory, internally named ‘Real Fish Tracker’. It was developed by James McCrae, a computer science PhD Candidate at the University of Toronto. This tracking software is able to track multiple subjects within the same environment, and records precise location data (x-y coordinates) for each fish at the frame rate of the video being sampled. This translates to a sampling rate of 29 times per second for videos created with conventional digital cameras. The program not only records location data at 29× per second, but it does so in real time on computers with average processing speeds. In addition, multiple sessions can run simultaneously on the same computer, allowing for tracking of up to five different trials in unison on a laptop computer with a modest processor and memory card. It should be noted that when running this many instances of the program, tracking does not proceed in real time, but some time is saved by calibrating a few trials at once every half an hour instead of calibrating one file every few minutes.

The program supports a range of conventional video formats, and tracks the fish by comparing the frame being sampled with an average image computed over the previous several frames in the video. The difference between the images allows the program to identify the change in pixels, i.e. the location of the fish, and to assign x-y coordinates to each subject. The algorithms take into account the previously known position of the subject, the current image and the average images for previous frames.
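The frame-comparison idea described above can be illustrated with a minimal sketch. This is not the Real Fish Tracker code: it is written in Python for illustration, handles a single subject only, ignores the previously known position, and uses a hypothetical brightness threshold and toy frames stored as nested lists.

```python
# Illustrative sketch only; names, threshold and frame layout are assumptions.

def average_frame(frames):
    """Pixel-wise mean of a list of equally sized grayscale frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]

def locate_subject(frame, background, threshold=50):
    """Centroid (x, y) of pixels that differ from the background average
    by more than `threshold`; None if no pixel changed enough."""
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if abs(value - background[y][x]) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Three empty 4x5 frames act as the recent history (bright background).
history = [[[200] * 5 for _ in range(4)] for _ in range(3)]

# The current frame contains a dark two-pixel "fish".
current = [[200] * 5 for _ in range(4)]
current[2][3] = 40
current[2][4] = 40

background = average_frame(history)
position = locate_subject(current, background)
print(position)  # (3.5, 2.0)
```

A multi-subject tracker would additionally have to split the changed pixels into separate blobs and match each blob to the nearest previously known position, which this sketch leaves out.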
Combining these sources of information allows for a highly reliable determination of the position of each subject, and minimizes instances where a subject might be ‘lost’ by the program due, for example, to two subjects crossing into the path of each other. Although such instances are minimized, they do occur. The software, as with any other application, does not guarantee consistent identification of the same fish. This would be of concern in paradigms where an individual mutant or differentially treated fish is exposed to a shoal of control fish. The experiments discussed within this article are focused on characterizing the dynamics of movement of the shoal and as such do not require individual labeling of each shoal member, an additional goal that will require future software development.

The current software program provides high-resolution positional data for each of the subjects in the trial. However, it does not provide any quantification of particular (established) behavioral endpoints, such as inter-individual distance, nearest neighbor distance, distance from the closest wall or center, distance from a corner, etc. As described before, some sophisticated software packages exist that can analyze select behavioral outputs in zebrafish (Ahmad et al., 2012), but for many research groups the cost of acquisition of these packages can still be prohibitive. In addition, these packages offer a fixed set of behavioral outputs that can be quantified, and no flexibility beyond these predetermined behavioral measures. The current method was deployed in a different study validating its reliability for the endpoints measured; the description of shoaling behavior obtained with the currently described method (Buske and Gerlai, 2012; Mahabir et al., 2013) was comparable to that of older studies using the previously discussed methods (Buske and Gerlai, 2011a,b).
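To make the endpoints named above concrete, the following Python sketch computes three of them from one frame of x-y positions. It is only an illustration under stated assumptions: inter-individual distance is taken here as the mean of all pairwise distances, the tank is treated as a rectangle with its origin in one corner, and the function names are hypothetical rather than part of the authors' R module.

```python
import math
from itertools import combinations

def inter_individual_distance(points):
    """Mean distance over all pairs of shoal members (assumed definition)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def nearest_neighbor_distances(points):
    """For each fish, the distance to its closest shoal mate."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def distance_to_closest_wall(point, width, height):
    """Distance from a point to the nearest wall of a rectangular tank
    spanning (0, 0) to (width, height)."""
    x, y = point
    return min(x, width - x, y, height - y)

# One frame with three fish, coordinates in arbitrary units.
shoal = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

iid = inter_individual_distance(shoal)      # (5 + 10 + 5) / 3
nnd = nearest_neighbor_distances(shoal)     # [5.0, 5.0, 5.0]
wall = distance_to_closest_wall(shoal[1], width=20.0, height=10.0)  # 3.0
print(iid, nnd, wall)
```

Applying such functions to every sampled frame yields one time series per endpoint, which can then be averaged over user-defined intervals.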
Several other tracking programs exist that are either custom developed or open source (Aguiar et al., 2007; Wolfer et al., 2001; Kane et al., 2004). The community faces the challenge of frequently reinventing the wheel, with several tracking programs being developed with sometimes overlapping features. Most of these programs produce data files consisting of x-y coordinate data for the subject being tracked. At this stage, the analysis of the data presents the second major challenge. Working directly with the raw data output from the application developed in-house affords the flexibility that predetermined behavioral measures lack. We have developed a quantification module based on the R environment for statistical computing that allows us to extract numerous behavioral endpoints from the raw data output in a highly flexible, user-specific manner.

In our behavioral experiments, we recorded the behavior of our fish for 8 min. Each 8-min trial resulted in data files consisting of 13,920 rows of data (sampled 29× per second). More complex behavioral studies, e.g. those requiring following the subject’s behavior across extended periods of time, generate even larger data matrices. A recently completed longitudinal study, assessing the effects of embryonic ethanol exposure over the course of development, generated 1300 data files (13 age points, n = 20 for five treatment groups). This resulted in a total of 18 million data points (Buske and Gerlai, 2011a,b). Processing and computation of many hundreds or thousands of data files, corresponding to an equal number of trials, requires a sophisticated approach.

The R programming language and environment for statistical computing is particularly suitable for this purpose (Venables and Smith, 2012). R can be regarded as an implementation of the S language, which was developed at Bell Laboratories by Rick Becker, John Chambers and Allan Wilks, and which also forms the basis of the S-Plus systems. It is an effective programming language that facilitates data manipulation, calculation, and graphics. The software suite is referred to as an ‘environment’, as it is a system for developing methods of interactive data analysis, rather than a typical statistical package or data analysis software. R provides the researcher with flexibility in designing analytical tools suitable for very specific data sets or goals unique to a particular project.
By creating programs in R for the manipulation and analysis of high resolution positional data, as acquired with the Real Fish Tracker program, it is possible to process and analyze hundreds of output files in a very short time frame, and with minimal intervention by the experimenter.
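The data volumes quoted in this section follow from simple arithmetic, reproduced in the short Python check below. It assumes that one "data point" corresponds to one sampled row of a data file; the variable names are illustrative only.

```python
# Rows in one 8-min trial sampled 29 times per second.
SAMPLING_RATE_HZ = 29
TRIAL_MINUTES = 8
rows_per_trial = SAMPLING_RATE_HZ * 60 * TRIAL_MINUTES

# Files in the longitudinal study: 13 age points, n = 20, five treatment groups.
age_points, n_per_group, treatment_groups = 13, 20, 5
data_files = age_points * n_per_group * treatment_groups

# Total rows across the study (roughly 18 million).
total_rows = rows_per_trial * data_files

# For comparison: a 10-min video sampled at 30 frames per second.
short_video_points = 30 * 60 * 10

print(rows_per_trial)      # 13920
print(data_files)          # 1300
print(total_rows)          # 18096000
print(short_video_points)  # 18000
```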

2. Data processing and analysis

Similar to several other tracking programs, the Real Fish Tracker produces individual .txt format output files with high resolution positional time series data for each individual tracked within the same arena. Video files are sampled at a rate of 29× per second, creating data files with thousands of rows of data. Each data file corresponds to a single trial, and from the positional data recorded, various behavioral measures can be computed in a user-defined and highly flexible manner. For our own purposes we decided to compute the following behavioral measures: inter-individual distance, nearest neighbor distance, distance from the closest wall or center, distance from a corner, distance traveled, time spent in a perimeter or particular zone, etc. As only positional data are provided, further processing is both a necessity and a great advantage, as it provides flexibility.

The 29 Hz sampling rate generates many time points, and the average of these (the temporal mean) is calculated for the above measures over user defined intervals. Importantly, the variance of the time point data is also calculated. It represents the within-individual temporal variance of the behavior. The x-y coordinates are in screen units and need to be converted to an experimenter defined distance unit, e.g. cm or body lengths. The latter is commonly used in animal, and particularly fish, research (Partridge, 1981). Subsequently, a new data frame can be constructed with the means of these time intervals or trials for statistical analysis. Processing each data file individually is prohibitively time consuming, and an automated procedure for high throughput processing is a necessity. The development of an analytical tool in the R programming language to complement the tracking program solves this issue and is described below.

3. Requirements

Batch automation of all script files applied to a selection of data files has been accomplished using the args package (Piipari, 2010). The R environment runs on both Macintosh and Windows based operating systems. Basic knowledge of data processing in R is required for basic analysis.

3.1. Data processing workflow

Raw data files are in .txt format and contain the following variables:

• Tracking area (coordinate points)
• Calibration rulers x and y (coordinate points), and length (in cm or other measure defined by the user)
• Seconds
• FishX coordinates for each subject
• FishY coordinates for each subject
• Confidence (a confidence measure for detecting the subject in the trial)
• Ruler 1 and Ruler 2 calibrated positions for each subject

R scripts are written to process and analyze the data obtained in an automated manner.

3.2. Data pre-processing

The first R script applied to the data reshapes the data for further analysis. It is possible to perform ANOVAs and other statistical analyses in R using data in a long format. The data obtained from the tracking program is already in the long format, but contains calibration information in the first six rows of the matrix. The original data sets can be simplified first for easier processing and behavioral quantification. The calibration information is transferred to a new column in the data matrix. For functions to be applied across the entire data matrix, the R script requires the matrix to be of equal width and length. All pertinent information retained in the first rows of the output file is therefore reshaped to fit this requirement.
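The reshaping step can be pictured with the following Python sketch, which moves calibration information from the top of a file into a column of the data rows. The file layout used here (six tab-separated header rows followed by a column header and long-format data rows), the field names, and the choice of which columns to keep are all assumptions made for illustration; the actual Real Fish Tracker output and the authors' R script may differ.

```python
import csv
import io

HEADER_ROWS = 6  # the calibration block occupies the first six rows

def reshape_trial(text):
    """Split off the calibration rows and attach the ruler length
    to every data row as a new column."""
    lines = text.strip().splitlines()

    # Hypothetical "name<TAB>value" calibration rows.
    calibration = {}
    for line in lines[:HEADER_ROWS]:
        name, value = line.split("\t", 1)
        calibration[name] = value

    # Remaining lines: a column header followed by long-format data.
    reader = csv.DictReader(io.StringIO("\n".join(lines[HEADER_ROWS:])),
                            delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            "seconds": float(record["Seconds"]),
            "fish": int(record["Fish"]),
            "x": float(record["FishX"]),
            "y": float(record["FishY"]),
            "ruler_cm": float(calibration["RulerLength"]),
        })
    return rows

# Toy file contents in the assumed layout.
sample = "\n".join([
    "TrackingArea\t0,0,640,480",
    "Ruler1X\t10",
    "Ruler1Y\t10",
    "Ruler2X\t210",
    "Ruler2Y\t10",
    "RulerLength\t20",
    "Seconds\tFish\tFishX\tFishY\tConfidence",
    "0.000\t1\t100\t200\t0.98",
    "0.000\t2\t130\t240\t0.95",
])

reshaped = reshape_trial(sample)
print(len(reshaped))            # 2
print(reshaped[0]["ruler_cm"])  # 20.0
```

Because every row now carries its own calibration value, the result is a rectangular table to which a function can be applied row by row without consulting the original header.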
In the interest of processing speed, non-essential information is removed from the new data matrix. All original data files are preserved, and the script outputs a newly reshaped data matrix. First, the calibration information is called from each trial: tanksize
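As a rough illustration of how calibration information might be used once it has been read, the Python sketch below converts screen units to centimetres and then computes the temporal mean and variance of a measure per user-defined interval. The ruler coordinates, the function names, and the use of the sample variance (as R's `var` would give) are assumptions for this sketch, not details taken from the authors' scripts.

```python
import math
from statistics import mean, variance

def pixels_per_cm(ruler_start, ruler_end, ruler_length_cm):
    """Scale factor from a calibration ruler of known physical length."""
    return math.dist(ruler_start, ruler_end) / ruler_length_cm

def summarize_by_interval(times, values, interval_s):
    """Temporal mean and sample variance of `values` within consecutive
    intervals of `interval_s` seconds, keyed by interval index."""
    bins = {}
    for t, v in zip(times, values):
        bins.setdefault(int(t // interval_s), []).append(v)
    return {
        index: (mean(vals), variance(vals) if len(vals) > 1 else 0.0)
        for index, vals in sorted(bins.items())
    }

# Hypothetical ruler: 200 screen units long, representing 20 cm.
scale = pixels_per_cm((10, 10), (210, 10), ruler_length_cm=20)

# A measure (e.g. nearest neighbor distance) in screen units over time.
times = [0.0, 0.5, 1.0, 1.5]
values_px = [50, 70, 100, 120]
values_cm = [v / scale for v in values_px]

summary = summarize_by_interval(times, values_cm, interval_s=1.0)
print(scale)    # 10.0
print(summary)  # {0: (6.0, 2.0), 1: (11.0, 2.0)}
```

Dividing by a mean body length instead of the ruler scale would express the same measure in body lengths.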
