Theory Biosci. DOI 10.1007/s12064-015-0213-7

ORIGINAL ARTICLE

Characterization of the crawling activity of Caenorhabditis elegans using a hidden Markov model

Sang-Hee Lee¹ • Seung-Ho Kang²

Received: 29 May 2014 / Accepted: 26 July 2015 © Springer-Verlag Berlin Heidelberg 2015

Abstract The locomotion behavior of Caenorhabditis elegans has been studied extensively to understand the respective roles of neural control and biomechanics as well as the interaction between them. Constructing a mathematical model helps to understand the locomotion behavior under various surrounding conditions that are difficult to realize in experiments. In this study, we built three hidden Markov models (HMMs) for the crawling behavior of C. elegans: one for a controlled environment with no chemical treatment and two for formaldehyde-treated environments (0.1 and 0.5 ppm). The organism’s crawling activity was recorded with a digital camcorder for 20 min at a rate of 24 frames per second. All shape patterns were quantified by branch length similarity (BLS) entropy and classified into four groups using the self-organizing map (SOM). A comparison of the simulated behavior generated by the HMMs with the actual crawling behavior demonstrated that the HMM coupled with the SOM successfully characterized the crawling behavior. In addition, we briefly discuss the possibility of using the HMM together with BLS entropy to develop bio-monitoring systems for determining water quality.

Keywords Caenorhabditis elegans · Branch length similarity (BLS) entropy · Hidden Markov models · Self-organizing map

Sang-Hee Lee (corresponding author)

¹ Division of Mathematical Modeling, National Institute for Mathematical Sciences, Daejeon 305-811, Republic of Korea

² Department of Information Security, Dongshin University, Naju 520-714, Republic of Korea

Introduction

The locomotion behavior of Caenorhabditis elegans has been studied extensively because it provides information not only on the underlying simplicity of complex motor systems but also on the indirect physiological relationship between the nervous system and the muscles. C. elegans is one of the simplest multicellular organisms, and its genome and neural circuit have been elucidated (Brenner 1974; Kaletta and Hengartner 2006), which makes such a study tangible. Nevertheless, despite extensive study, the relationship between the dynamics of the nervous system and the organism’s behavior is still not understood. Experimental studies of the behavior have been carried out vigorously, for example, at low Reynolds number (Cohen and Boyle 2010), in a soil-like particulate system (Jung 2010), and on a solid surface or in a liquid environment (Ghosh and Emmons 2008). Stephens et al. (2008) showed that the space of shapes adopted by C. elegans is low dimensional, with just 4 dimensions accounting for 95 % of the shape variance. Other researchers have focused on thermotaxis (Ryu and Samuel 2002; Zariwala et al. 2003), pirouettes in chemotaxis (Pierce-Shimomura et al. 1999), and the foraging response (Sawin et al. 2000). Srivastava et al. (2009) performed a temporal analysis of the stochastic turning behavior of the organism. Despite this growing body of experimental data, the relationship remains unclear, because the behavior varies in complex ways with the environmental conditions and the functional mechanism, as a result of the complicated flow of information from neurons to muscles. For this reason, computer simulation models that mimic the organism’s behavior have attracted considerable attention and have helped to understand the complicated behavior (Suzuki et al. 2005). In other words, a virtual organism, in the form of a simulation model, can be used to explore the actual organism’s behavior in response to various environmental conditions that are difficult or expensive to realize in experiments (Ferree et al. 1997). The simulation models suggested for C. elegans can be classified into three main groups: (1) models for understanding how stimulus information is processed in the neural circuit (Wicks et al. 1996; Morita et al. 2001), (2) models that describe in detail the muscles that generate motion (Niebur and Erdos 1993; Bryden and Cohen 2004), and (3) models that integrate the information flow and the muscle movement (Lee 2010). These studies qualitatively reproduced four fundamental patterns of locomotion, namely, forward and backward movement, rest, omega-type turning, and coil-type turning. Simple stochastic models were suggested by Stephens et al. (2008), who showed numerically that up to 95 % of the movement patterns of C. elegans can be described by 4 eigenmodes and that each eigenmode corresponds to one of the 4 fundamental patterns. However, these models are still limited in that they do not address the underlying physiological processes, because it is difficult to reveal, at the level of neural-circuit structure and function, the physiological causality that produces a given type of movement pattern (Srivastava et al. 2009). In this study, as an alternative way to overcome this difficulty, we built hidden Markov models (HMMs) to simulate the organism’s crawling behavior under different environmental conditions, namely, a controlled (untreated) environment and a chemically treated environment. Each observed shape of the organism was characterized by calculating branch length similarity (BLS) entropies (Lee 2010; Lee et al. 2010; Kang et al. 2012) for 13 points evenly placed along the organism’s length.
Shape pattern transition probability matrices were built from shape sequences generated from the three trained HMMs by Monte Carlo sampling. These matrices were compared with those built from the experimental shape sequences recorded from the specimens to evaluate whether stochastic processes govern the movement of C. elegans. We also compared the shape sequences using Levenshtein distances (Gusfield 1997) to find the differences between the temporal shape patterns of the organism in the controlled and the chemically treated environments.

Model description and analysis

Organisms and experimental setup

In this study, 41 adult individuals of the wild-type N2 strain of C. elegans were used. They were cultivated in Petri dishes (60 mm in diameter and 15 mm in height) filled with Nematode Growth Medium (NGM) in an incubator at 20 °C and were fed with Escherichia coli strain OP50. All worms analyzed in these experiments were young adults grown from fourth-stage (L4) larvae. The test worms were allowed to acclimate for 15 min before their behavior was analyzed. Of the 41 specimens, 21 were placed in the controlled environment and the other 20 in the formaldehyde-treated environment (11 individuals for the 0.1 ppm treatment and 9 individuals for the 0.5 ppm treatment). The crawling activity of each individual was recorded for 20 min with a Sony digital camcorder mounted above the dish, which captured one frame every 1/24 s. The recording resolution was selected with reference to the experiment of Ramakrishnan and Okkema (2014). In each image of the selected clips, 13 points were placed at equal intervals along the length of the skeletonized test specimen; the skeletonization method followed Korta et al. (2007).

Feature extraction

In the study of Lee et al. (2010), the branch length similarity (BLS) entropy was defined on a simple branch network consisting of a single node and its branches (see Fig. 1) as

$$S = -\frac{1}{\log(n)} \sum_{i=1}^{n} p_i \log(p_i).$$

Here, the probability of the i-th branch of the simple network is defined as

$$p_i = \frac{L_i}{\sum_{k=1}^{n} L_k},$$

where n is the number of branches in the simple network and $L_k$ is the length of the k-th branch (k = 1, 2, 3, …, n). The denominator log(n) normalizes the entropy to the range [0, 1].

The shape of C. elegans is represented by 13 points placed along the organism’s length at equal intervals, so simple networks can be formed by connecting points in the shape to each other. The BLS entropy profile, which is the set of BLS entropies for the 13 points, is then used as a descriptor to characterize the shape of C. elegans. In the present study, we formed 2 simple networks based on the 13 points, which served as nodes, with the edges between nodes extracted from the shape of C. elegans. We used 13 points because the neuronal control of the body-wall muscles has been divided into 12 parts (Pierce-Shimomura et al. 1999; Suzuki et al. 2005), and 13 equally spaced points delimit 12 body segments. One network was formed by connecting the center point to all other points, and the other was generated from the angles between the edges (see Fig. 2). For the former network, the connection distances were used as the branch lengths in the definition of the BLS entropy; for the latter, the angles played the role of the branch lengths. The resulting BLS entropy values are denoted $S_L$ for the former network and $S_\theta$ for the latter. $S_L$ captures the degree of bending of the organism, and $S_\theta$ characterizes the degree of coiling. We believe that these two values contain the minimum information needed to characterize the shapes, because the crawling behavior observed here appeared to consist of bending and coiling movements.

Fig. 1 Definition of branch length similarity (BLS) entropy
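The BLS entropy and the two shape descriptors can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors’ code: the function names are hypothetical, the center point is assumed to be the seventh of the 13 skeleton points (as in Fig. 2), and the angle network is assumed to use the 11 interior angles between consecutive skeleton segments as its branch “lengths”, a detail the text does not pin down.

```python
import numpy as np


def bls_entropy(lengths):
    """Normalized BLS entropy S = -sum(p_i log p_i) / log(n), with p_i = L_i / sum_k L_k."""
    L = np.asarray(lengths, dtype=float)
    n = L.size
    p = L / L.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(n))


def shape_features(points):
    """Return (S_L, S_theta) for a (13, 2) array of skeleton points.

    Assumptions (hypothetical, for illustration only):
      S_L     uses the distances from the 7th (center) point to the other 12 points;
      S_theta uses the 11 interior angles between consecutive skeleton segments.
    """
    pts = np.asarray(points, dtype=float)
    centre = pts[6]
    others = np.delete(pts, 6, axis=0)
    s_l = bls_entropy(np.linalg.norm(others - centre, axis=1))

    seg = np.diff(pts, axis=0)        # 12 segment vectors along the body
    a, b = -seg[:-1], seg[1:]         # from each interior node to its two neighbours
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    angles = np.arccos(np.clip(cos, -1.0, 1.0))  # interior angle; pi where the body is straight
    s_theta = bls_entropy(angles)
    return s_l, s_theta
```

Applied frame by frame, this yields the ($S_L$, $S_\theta$) pairs that are then scaled with Eq. (1) and fed to the SOM. For a perfectly straight skeleton every interior angle equals π, so this version of $S_\theta$ is exactly 1.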

Self-organizing map and k-means clustering

We used a two-level approach to cluster the shapes of C. elegans. We first projected the dataset onto a self-organizing map (SOM) (Kohonen 1989) by unsupervised training and then clustered the trained map units using the k-means clustering algorithm (Hartigan and Wong 1979). The SOM is a type of neural network that has been used as a valuable tool for dealing with complex or large datasets. It consists of a set of neurons (or nodes) connected to one another in a rectangular or hexagonal topology. The connections between the inputs and the nodes have weights, so each neuron has a corresponding set of weights, represented as a prototype vector $w_j = [w_{1j}, w_{2j}, w_{3j}, \ldots, w_{dj}]$, where d is the dimension of the input vectors. Adjacent neurons are connected by a neighborhood relation. In this study, we heuristically chose a 14 × 10 hexagonal lattice as the structure of the SOM. Each input vector consists of the two BLS entropies, $S_L$ and $S_\theta$. The entropy values were scaled to values between 0 and 1 using the following scaling function:

$$\hat{s}_i = \frac{s_i - s_i^{\min}}{s_i^{\max} - s_i^{\min}} \quad (i = L \text{ or } \theta) \qquad (1)$$

where $s_L$ and $s_\theta$ are the entropy values for the network formed by connecting the center point to all other points and for the network generated from the angles between the edges, respectively, and $s_i^{\min}$ and $s_i^{\max}$ denote the minimum and maximum entropy values in the input dataset. The SOM is trained iteratively. At each training step, an input vector x is chosen at random from the input dataset, and the distances between x and all the prototype vectors are computed as

$$d_j(t) = \sum_{i=1}^{13} \left[ s_i - w_{ij}(t) \right]^2 \qquad (2)$$

The best matching unit (BMU) is the neuron whose prototype vector is closest to the input vector x. The prototype vectors of the BMU and its topological neighbors are updated so that these units move closer to the input vector in the input space. The update rule for the prototype vector of unit j is

$$w_j(t+1) = \begin{cases} w_j(t) + \alpha(t)\,[x(t) - w_j(t)], & j \in N(t) \\ w_j(t), & j \notin N(t) \end{cases} \qquad (3)$$

where t and α(t) are the time step and the learning rate, respectively, N(t) denotes the neighborhood of the BMU (including the BMU itself), and $w_j(t)$ is the weight (prototype) vector of the j-th neuron. After an adequate number of learning steps, the map becomes spatially organized according to the structure of the input dataset: the locations of the units become ordered in accordance with the topological relations among the patterns in the input data, so the topological relationships of the input space are preserved on the map.

Fig. 2 Definition of the BLS entropy and the formation of two simple networks. One is generated using the distances between the center node (seventh node) and other nodes, and the other is constructed from the angles between neighboring branches. The darker shape represents a crawling pattern of C. elegans

To show the degree of association between SOM units and to use the information provided by the SOM effectively, a method that yields good clusters of map units is required. There are two main approaches to clustering data (here, SOM units): hierarchical and partitive. Hierarchical methods such as Ward’s linkage method cluster data using a dendrogram (Vesanto and Alhoniemi 2000). However, a dendrogram does not define a unique clustering; instead, a partition is obtained by cutting the dendrogram at an arbitrarily chosen level. To avoid this arbitrariness, we used a partitive algorithm, the k-means algorithm, which minimizes the error function

$$E = \sum_{i=1}^{K} \sum_{x \in Q_i} \lVert x - c_i \rVert^2 \qquad (4)$$

where K is the number of clusters, $Q_i$ is the set of data vectors assigned to cluster i, and $c_i$ is the center of cluster i. The k-means algorithm was run for numbers of clusters ranging from 2 to $\sqrt{14 \times 10}$. To select the best among the resulting partitions, we evaluated each of them with the Davies–Bouldin (DB) index. According to the DB validity index, the best clustering minimizes

$$\frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left\{ \frac{S_k(Q_i) + S_k(Q_j)}{S(Q_i, Q_j)} \right\} \qquad (5)$$

where K is the number of clusters, $S_k(Q_i)$ is the average distance of all objects in cluster i to their cluster center, and $S(Q_i, Q_j)$ is the distance between the centers of clusters i and j.
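The two-level procedure of this section (min–max scaling with Eq. 1, SOM training with Eqs. 2 and 3, k-means with Eq. 4, and selection of K with Eq. 5) can be sketched with NumPy as below. This is a simplified illustration under stated assumptions, not the authors’ implementation: it uses a rectangular rather than hexagonal 14 × 10 grid, a hypothetical linearly decaying learning rate and neighborhood radius, plain Lloyd iterations with farthest-first initialization instead of the Hartigan–Wong variant, and hypothetical function names.

```python
import numpy as np


def minmax_scale(X):
    """Eq. (1): scale each column (S_L, S_theta) to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


def train_som(X, rows=14, cols=10, steps=5000, alpha0=0.5, seed=0):
    """Eqs. (2)-(3) on a rectangular grid (the paper uses a hexagonal lattice)."""
    rng = np.random.default_rng(seed)
    W = rng.random((rows * cols, X.shape[1]))                  # prototype vectors w_j
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    radius0 = max(rows, cols) / 2
    for t in range(steps):
        frac = t / steps
        alpha = alpha0 * (1 - frac)                            # assumed decay of alpha(t)
        radius = max(1.0, radius0 * (1 - frac))                # assumed shrinking N(t)
        x = X[rng.integers(len(X))]                            # random input vector x(t)
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))            # Eq. (2): best matching unit
        in_hood = np.linalg.norm(grid - grid[bmu], axis=1) <= radius
        W[in_hood] += alpha * (x - W[in_hood])                 # Eq. (3); others unchanged
    return W


def kmeans(P, k, iters=100):
    """Lloyd iterations minimizing Eq. (4); returns (labels, centres)."""
    C = [P[0]]                                                 # farthest-first initialization
    for _ in range(1, k):
        d = np.min([((P - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(P[np.argmax(d)])
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - C[None]) ** 2).sum(axis=2), axis=1)
        new_C = np.array([P[labels == i].mean(axis=0) if np.any(labels == i) else C[i]
                          for i in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    labels = np.argmin(((P[:, None, :] - C[None]) ** 2).sum(axis=2), axis=1)
    return labels, C


def davies_bouldin(P, labels, C):
    """Eq. (5): mean over clusters of the worst (S_i + S_j) / distance(c_i, c_j)."""
    k = len(C)
    if any(not np.any(labels == i) for i in range(k)):
        return float("inf")                                    # degenerate partition
    S = np.array([np.linalg.norm(P[labels == i] - C[i], axis=1).mean() for i in range(k)])
    D = np.linalg.norm(C[:, None, :] - C[None], axis=2)
    ratios = [max((S[i] + S[j]) / D[i, j] for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(ratios))


def best_k(P, k_max):
    """Run k-means for K = 2..k_max and keep the K with the smallest DB index."""
    scores = {k: davies_bouldin(P, *kmeans(P, k)) for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores
```

In the paper’s setting, `best_k` would be applied to the 140 trained prototype vectors, with `k_max = int(np.sqrt(140))`; the map units in each resulting cluster then define one shape pattern.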

Hidden Markov models

Hidden Markov models (HMMs) are widely used in applied sciences and engineering (Rabiner 1989). An HMM involves two types of states: hidden states and observable states (often called ‘‘observation events’’). The hidden states evolve according to a Markov chain, and each observation depends only on the hidden state at the same time step. An HMM is thus characterized by two stochastic processes: an unobservable Markov chain, described by a finite set of hidden states, an initial state probability distribution, and a state transition probability matrix; and an observable process, defined by a set of emission probability distributions over the events, one associated with each state. An HMM is characterized by the 5-tuple ⟨N, M, A, B, π⟩, where N is the number of states in the model and M is the number of distinct observation events (Rabiner 1989). In general, an HMM can be described with the following notation:

1. $\Omega_q = \{S_i \mid i = 1, 2, \ldots, N\}$: the finite set of N possible states.
2. $\Omega_o = \{E_i \mid i = 1, 2, \ldots, M\}$: the finite set of M distinct observation events.
3. $A = \{a_{ij}\}$, $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$: the transition probability matrix (TPM) for the states, where $q_t$ denotes the state at time t.
4. $B = \{b_{ij}\}$, $b_{ij} = P(o_t = E_j \mid q_t = S_i)$, $1 \le i \le N$, $1 \le j \le M$: the emission probability matrix (EPM) for the events, where $o_t$ denotes the event at time t.
5. $\pi = \{\pi_i\}$, $\pi_i = P(q_1 = S_i)$, $1 \le i \le N$: the initial state distribution.
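To make the notation concrete, the sketch below samples an event sequence from a discrete HMM ⟨N, M, A, B, π⟩ by Monte Carlo simulation and computes the long-run event frequencies analytically as the stationary distribution of A multiplied by B. The state-occupancy weights $W_i$ described in the Results are a Monte Carlo estimate of that stationary distribution. This is an illustration only: the function names are hypothetical, states and events are 0-based indices, and the check uses the rounded control-condition values of Table 2 with each row renormalized to sum to 1.

```python
import numpy as np


def sample_hmm(A, B, pi, T, rng):
    """Draw hidden states q_1..q_T and events o_1..o_T (0-based indices)."""
    N, M = B.shape
    q = np.empty(T, dtype=int)
    o = np.empty(T, dtype=int)
    q[0] = rng.choice(N, p=pi)                   # initial state from pi
    for t in range(T):
        if t > 0:
            q[t] = rng.choice(N, p=A[q[t - 1]])  # Markov transition via row of A
        o[t] = rng.choice(M, p=B[q[t]])          # emission via row of B
    return q, o


def stationary_event_frequencies(A, B):
    """Long-run event frequencies: stationary state distribution of A times B."""
    vals, vecs = np.linalg.eig(A.T)
    w = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])  # left eigenvector for eigenvalue 1
    w = w / w.sum()                                       # state-occupancy weights W_i
    return w @ B                                          # sum_i W_i * b_ij for each event j
```

With the Table 2 values, the analytic frequencies for pattern 1 (index 0) and pattern 4 (index 3) come out close to the 32.60 % and 63.16 % reported there, and a 28,800-step sample (one 20-min recording) gives empirical frequencies near the same values.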

The TPMs, EPMs, and initial state distributions were estimated by training the HMMs with the Baum–Welch algorithm (Baum et al. 1970). In the learning process, the initial values of π = {π_i} were generated randomly, subject to the condition $\sum_i \pi_i = 1$. The number of hidden states, N, was set to 5.
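For readers who want to reproduce the training step, a compact scaled Baum–Welch (expectation–maximization) iteration for a discrete HMM is sketched below, following the standard forward–backward formulation (Rabiner 1989). It is a minimal sketch, not the authors’ implementation: it handles a single observation sequence, uses per-step scaling to avoid numerical underflow on long sequences such as the 28,800-frame recordings, and the function names are hypothetical. Training on several sequences, as in this study, would sum the expected counts over sequences before normalizing.

```python
import numpy as np


def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass; returns alpha_hat, beta_hat and scale factors c_t."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c


def baum_welch_step(A, B, pi, obs):
    """One EM iteration; returns updated (A, B, pi) and log P(obs | current model)."""
    obs = np.asarray(obs)
    alpha, beta, c = forward_backward(A, B, pi, obs)
    gamma = alpha * beta                               # P(q_t = S_i | obs)
    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | obs)
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)      # expected emissions of event k
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, float(np.log(c).sum())
```

In practice the step is repeated until the log-likelihood stops increasing; because EM only reaches a local optimum, restarting from several random initializations is common.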

Levenshtein distance

The Levenshtein distance (LD) measures how similar two sequences are, and it has been widely used in many fields such as bioinformatics and natural language processing. The LD between two sequences is defined as the minimum number of operations required to transform one sequence into the other, where the allowable operations are insertion, deletion, and replacement of a single symbol (Gusfield 1997). For example, two insertions, of ‘‘i’’ and ‘‘t’’, are sufficient to transform ‘‘physics’’ into ‘‘physicist’’, so the LD between ‘‘physics’’ and ‘‘physicist’’ is two. In this study, the LD was used to quantify the similarity of shape pattern sequences within and between groups of pattern sequences.
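The dynamic-programming recurrence behind the LD can be sketched as below. This is the generic textbook version (see Gusfield 1997), not the code used in the study, and the function name is hypothetical. It keeps only two rows of the distance table, so memory grows linearly with sequence length, but the running time is quadratic: comparing two full 28,800-frame sequences takes about 8.3 × 10^8 cell updates, which is slow in pure Python, so a compiled implementation would be preferable at that scale.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a                          # LD is symmetric; keep the inner row short
    prev = list(range(len(b) + 1))           # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]
```

The function works on any sequences, so shape pattern sequences can be passed directly as lists of pattern labels, for example `levenshtein([1, 4, 2, 2, 3], [1, 4, 2, 3])`, which returns 1.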

Results

Shape patterns

Four shape patterns were identified using the SOM and the k-means clustering algorithm, because four groups minimized the DB index (Table 1); a lower DB index indicates better clustering. Figure 3 shows the SOM clustered by the k-means algorithm for the shape patterns of the crawling behavior. The movement shapes of the organism in the controlled and the chemically treated environments were grouped accordingly. The outer subfigures show the typical shape patterns of each group. The shape patterns appear to be classified according to the relative degree of bending of the 13 points, as reflected by the BLS entropy. Linear and V-shaped patterns were grouped together in group I, and the shapes in group II typically show the ‘‘omega turning’’ pattern, which is well known in studies of the organism’s locomotion (Suzuki et al. 2005). Group III contains shapes with a coiled-up end, while the shapes in group IV have an S-like pattern. The temporal movement of C. elegans is thus characterized by a series of shapes, each belonging to one of these patterns, at intervals of 1/24 s, for example ⟨p1, p4, p2, p2, p3, p1, p3, …, p2⟩.

Table 1 Davies–Bouldin (DB) indices for different numbers of groups

Number of groups | 2 | 3 | 4 | 5 | 6 | 7
DB index | 0.881 | 0.921 | 0.865 | 0.903 | 0.917 | 0.928

Fig. 3 Movement shape patterns of C. elegans clustered by the self-organizing map and k-means clustering algorithm. The center and the outer figures show the clustered SOM and the shape patterns for the 4 pattern clusters, respectively

Shape transition probability

The parameters of the three HMMs were learned with the well-known Baum–Welch algorithm. Forty-one shape sequences (21 for the control group, 9 for the group treated with 0.1 ppm formaldehyde, and 11 for the group treated with 0.5 ppm formaldehyde), each consisting of 28,800 shapes (corresponding to 20 min), were provided to the HMMs as training data. In the control case, the organisms mainly showed active forward movement; when they approached the wall, they moved backward while turning the body. Because forward and backward movements are produced by repeating the sequence ‘‘pattern 4–pattern 1–pattern 4’’, the frequency of pattern 4 is higher than that of pattern 1 (Table 2). The percentages were calculated by generating state sequences from the TPMs and computing the ratio of the number of occurrences of each state to the total length of the sequence. This ratio was used as a weight (W) for the occurrence of events. For example, the percentage of pattern 1 was calculated as $\sum_i W_i \times \text{pattern1}_i$, where $\text{pattern1}_i$ (i = 1, …, 5) is the probability that state i emits pattern 1.

In the HMM for the 0.1 ppm formaldehyde treatment (Table 3), the frequency of pattern 1 became higher than that of pattern 4. When moving forward and backward, the individuals bend (pattern 4) and stretch (pattern 1) their bodies. Thus, the change in the percentages reflects a marked decrease in the movement activity of the individuals during crawling; in particular, the much higher occurrence of pattern 1 indicates resting behavior. For the 0.5 ppm formaldehyde treatment, the organisms showed similar percentages of pattern 1 and pattern 4, which means that the forward and backward activity of the organisms increased, and the turning behavior (patterns 2 and 3) was also enhanced. The organisms sporadically showed strong forward movement for a short time and then rested for 2–3 min, which is reflected in the similar percentages of patterns 1 and 4 and in a higher percentage of pattern 3 than in the other cases (Table 4). It would be interesting to investigate the physiological basis of this behavioral change.

Table 2 The hidden Markov model trained with pattern sequences of C. elegans under normal (control) conditions: (a) initial state probabilities, (b) transition probability matrix (TPM), and (c) emission probability matrix (EPM) of the trained hidden Markov model (HMM). The notation ‘‘%’’ represents the percentage frequency with which an event pattern occurred in the simulated crawling behavior

(a)
State | State 1 | State 2 | State 3 | State 4 | State 5
Probability | 0.184 | 0.094 | 0.047 | 0.315 | 0.358

(b)
t \ t+1 | State 1 | State 2 | State 3 | State 4 | State 5
State 1 | 0.101 | 0.559 | 0.002 | 0.317 | 0.019
State 2 | 0.315 | 0.147 | 0.000 | 0.525 | 0.011
State 3 | 0.001 | 0.006 | 0.962 | 0.008 | 0.021
State 4 | 0.234 | 0.524 | 0.001 | 0.226 | 0.013
State 5 | 0.011 | 0.005 | 0.003 | 0.010 | 0.969

(c)
State | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4
State 1 | 0.011 | 0.001 | 0.000 | 0.986
State 2 | 0.008 | 0.001 | 0.000 | 0.990
State 3 | 0.037 | 0.278 | 0.664 | 0.020
State 4 | 0.010 | 0.001 | 0.000 | 0.987
State 5 | 0.973 | 0.000 | 0.001 | 0.024
% | 32.60 | 1.23 | 2.83 | 63.16

Table 3 The hidden Markov model trained with pattern sequences of C. elegans treated with 0.1 ppm formaldehyde: (a) initial state probabilities, (b) transition probability matrix (TPM), and (c) emission probability matrix (EPM) of the trained hidden Markov model (HMM). The notation ‘‘%’’ represents the percentage frequency with which an event pattern occurred in the simulated crawling behavior

(a)
State | State 1 | State 2 | State 3 | State 4 | State 5
Probability | 0.276 | 0.018 | 0.062 | 0.108 | 0.533

(b)
t \ t+1 | State 1 | State 2 | State 3 | State 4 | State 5
State 1 | 0.819 | 0.057 | 0.001 | 0.033 | 0.087
State 2 | 0.026 | 0.213 | 0.000 | 0.751 | 0.009
State 3 | 0.002 | 0.000 | 0.990 | 0.000 | 0.006
State 4 | 0.018 | 0.670 | 0.000 | 0.300 | 0.011
State 5 | 0.010 | 0.001 | 0.000 | 0.002 | 0.985

(c)
State | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4
State 1 | 0.534 | 0.050 | 0.001 | 0.413
State 2 | 0.002 | 0.000 | 0.000 | 0.997
State 3 | 0.007 | 0.065 | 0.927 | 0.000
State 4 | 0.001 | 0.000 | 0.000 | 0.997
State 5 | 0.998 | 0.000 | 0.001 | 0.000
% | 69.95 | 0.40 | 0.84 | 28.69

Table 4 The hidden Markov model trained with pattern sequences of C. elegans treated with 0.5 ppm formaldehyde: (a) initial state probabilities, (b) transition probability matrix (TPM), and (c) emission probability matrix (EPM) of the trained hidden Markov model (HMM). The notation ‘‘%’’ represents the percentage frequency with which an event pattern occurred in the simulated crawling behavior

(a)
State | State 1 | State 2 | State 3 | State 4 | State 5
Probability | 0.078 | 0.066 | 0.136 | 0.218 | 0.500

(b)
t \ t+1 | State 1 | State 2 | State 3 | State 4 | State 5
State 1 | 0.092 | 0.543 | 0.002 | 0.311 | 0.050
State 2 | 0.257 | 0.162 | 0.000 | 0.562 | 0.017
State 3 | 0.001 | 0.002 | 0.976 | 0.003 | 0.016
State 4 | 0.192 | 0.548 | 0.000 | 0.235 | 0.022
State 5 | 0.011 | 0.004 | 0.002 | 0.009 | 0.971

(c)
State | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4
State 1 | 0.025 | 0.007 | 0.001 | 0.964
State 2 | 0.018 | 0.006 | 0.000 | 0.974
State 3 | 0.013 | 0.105 | 0.875 | 0.005
State 4 | 0.017 | 0.006 | 0.000 | 0.975
State 5 | 0.974 | 0.001 | 0.000 | 0.023
% | 48.36 | 0.89 | 4.68 | 45.85

Similarity analysis for temporal shape sequences

We computed the LD for all pairs of the 41 shape pattern sequences. Table 5 shows the average LDs within and between data groups. For the experimental sequences (Table 5a), the average LD within the control group was smaller than the average LDs within and between the treated groups and between the control and treated groups. This means that the crawling behavior differs distinctly before and after treatment. The LD between the control group and the 0.1 ppm formaldehyde group was higher than that between the control group and the 0.5 ppm formaldehyde group. As mentioned in the previous subsection, this is because the organisms treated with 0.1 ppm formaldehyde were less active, whereas those treated with 0.5 ppm formaldehyde showed a combination of resting and sudden, strong forward movement. The LD values for the estimated (simulated) sequences showed a trend similar to that of the experimental sequences (Table 5b). The LD values for comparisons between experimental and estimated sequences were smaller than those for comparisons between experimental sequences (Table 5c), which indicates that the HMMs successfully simulated the crawling behavior.

Table 5 The average Levenshtein distances within and between pattern sequence groups: (a) measured data, (b) simulated data, and (c) measured vs. simulated data

(a) | Normal | Formaldehyde 0.1 ppm | Formaldehyde 0.5 ppm
Normal | 6813.7 | 8203.8 | 7353.6
Formaldehyde 0.1 ppm | | 6839.4 | 7477.7
Formaldehyde 0.5 ppm | | | 7498.6

(b) | Normal | Formaldehyde 0.1 ppm | Formaldehyde 0.5 ppm
Normal | 5326.2 ± 64.3 | 7217.9 ± 102.6 | 6001.8 ± 44.8
Formaldehyde 0.1 ppm | | 5437.9 ± 160.2 | 6286.5 ± 30.1
Formaldehyde 0.5 ppm | | | 5888.5 ± 74.9

(c) | Normal | Formaldehyde 0.1 ppm | Formaldehyde 0.5 ppm
Normal | 6371.8 ± 39.0 | 7907.7 ± 45.5 | 6985.4 ± 28.8
Formaldehyde 0.1 ppm | | 6362.4 ± 98.7 | 7187.3 ± 59.2
Formaldehyde 0.5 ppm | | | 6968.3 ± 37.3

Discussion and conclusions

In this study, we proposed a new model of the crawling behavior of C. elegans and, using the model, elucidated the differences between the behavior before and after chemical treatment. To characterize the movement patterns of C. elegans, we applied the ‘‘branch length similarity’’ (BLS) entropy, which is defined on a simple network consisting of a single node and its branches. By placing 13 nodes (and hence 12 edges) at equal intervals along the organism’s length, we constructed 2 networks: one based on the nodes and the other on the edges. The BLS entropy values were calculated as $S_L$ for the former network and $S_\theta$ for the latter. The pair of $S_L$ and $S_\theta$ values successfully captured the characteristics of the motion patterns. This method has advantages over conventional metrics such as the distance between the head and the tail or the distance between the center point of the body and the center of mass. In the self-organizing map (SOM) analysis, all shapes were clustered into 4 patterns. Three HMMs were trained on a dataset consisting of 21 temporal shape sequences for the no-chemical condition, 9 sequences for the 0.1 ppm formaldehyde treatment, and 11 sequences for the 0.5 ppm formaldehyde treatment. We generated 100 samples of the same length as the training sequences from each trained HMM and showed that the HMMs can successfully characterize the crawling behavior of C. elegans.

Our study is relevant to research on aquatic bio-monitoring systems, which infer the ecological condition of rivers, lakes, and wetlands by examining behavioral changes in the organisms that live there. To raise an alarm quickly when the ecological condition changes, a bio-monitoring system should be sensitive to behavioral changes and should use a simple process that keeps the computational burden of behavioral analysis low. Thus, fast recognition of behavioral changes is as important as their correct identification. Most previous studies of the response behaviors of organisms rely on the movement tracks of the organisms. However, extracting features such as speed, angle, and stop duration from the movement tracks, and training a learning system on such features, is time consuming. Above all, these methods are expected to take a long time to determine the water quality in real environments. These drawbacks make it difficult to use track-based methods in bio-monitoring systems. In contrast, the BLS entropy can be calculated immediately with little computing power. As a result, the BLS entropy of shapes, the SOM, and the HMM were efficiently combined to analyze movement patterns and could serve as a means of in situ, real-time behavioral monitoring. It is not difficult to imagine how such a monitoring system could be built.
As a first step, a small cage (~1 cm) with inlet and outlet tubing would be made, through which water from an aquatic system (e.g., a river) can flow into the cage. After the flow in the cage has stabilized, a test worm is automatically introduced into the cage, and its behavior is recorded and analyzed. Finally, the behavior of a simulated worm generated by the HMM and that of the real worm can be compared statistically, and the comparison can provide information on the water quality of the aquatic system. In summary, our approach and results are valuable not only because they provide a novel tool for quantitatively characterizing the behavioral patterns of C. elegans and other slender-bodied organisms, but also because they show the possibility of applying this tool to bio-monitoring systems.

Acknowledgments

We thank the anonymous referee for the careful reading of the manuscript and the valuable comments. This research was supported by the research project (B21502-1) of the National Institute for Mathematical Sciences, Republic of Korea.

References

Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41:164–171
Brenner S (1974) The genetics of Caenorhabditis elegans. Genetics 77:71–94
Bryden JA, Cohen N (2004) A simulation model of the locomotion controllers for the nematode Caenorhabditis elegans. In: From animals to animats 8. MIT Press, Cambridge, pp 183–192
Cohen N, Boyle JH (2010) Swimming at low Reynolds number: a beginners guide to undulatory locomotion. Contemp Phys 51:103–123
Ferree TC, Marcotte BA, Lockery SR (1997) Neural network models of chemotaxis in the nematode Caenorhabditis elegans. In: Advances in neural information processing systems 9. MIT Press, Cambridge, pp 55–60
Ghosh R, Emmons SW (2008) Episodic swimming behavior in the nematode C. elegans. J Exp Biol 211:3703–3711
Gusfield D (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge
Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. Appl Stat 28:100–108
Jung S (2010) Caenorhabditis elegans swimming in a saturated particulate system. Phys Fluids 22:031903
Kaletta T, Hengartner MO (2006) Finding function in novel targets: C. elegans as a model organism. Nat Rev Drug Discov 5:387–399
Kang S-H, Jeon W, Lee S-H (2012) Butterfly species identification by branch length similarity entropy. J Asia-Pac Entomol 15:437–441
Kohonen T (1989) Self-organization and associative memory, 3rd edn. Springer, New York
Korta J, Clark DA, Gabel CV, Mahadevan L, Samuel AD (2007) Mechanosensation and mechanical load modulate the locomotory gait of swimming C. elegans. J Exp Biol 210:2383–2389
Lee S-H (2010) Robustness of the branch length similarity entropy approach for noise-added shape recognition. J Kor Phys Soc 57:501–505
Lee S-H, Bardunias P, Su NY (2010) A novel approach to shape recognition using the shape outline. J Kor Phys Soc 56:1016–1019
Morita S, Oshio KI, Osana Y, Funabashi Y, Oka K, Kawamura K (2001) Geometrical structure of the neuronal network of Caenorhabditis elegans. Phys A 298:553–561
Niebur E, Erdos P (1993) Modeling locomotion and its neural control in nematodes. Commun Theor Biol 3:109–139
Pierce-Shimomura JT, Morse TM, Lockery SR (1999) The fundamental role of pirouettes in Caenorhabditis elegans chemotaxis. J Neurosci 19:9557–9569
Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77:257–286
Ramakrishnan K, Okkema RG (2014) Regulation of C. elegans neuronal differentiation by the ZEB-Family factor ZAG-1 and the NK-2 homeodomain factor CEH-28. PLoS One 9:e113893
Ryu WS, Samuel AD (2002) Thermotaxis in Caenorhabditis elegans analyzed by measuring responses to defined thermal stimuli. J Neurosci 22:5727–5733
Sawin ER, Ranganathan R, Horvitz HR (2000) C. elegans locomotory rate is modulated by the environment through a dopaminergic pathway and by experience through a serotonergic pathway. Neuron 26:619–631
Srivastava N, Clark DA, Samuel AD (2009) Temporal analysis of stochastic turning behavior of swimming C. elegans. J Neurophysiol 102:1172–1179
Stephens GJ, Johnson-Kerner B, Bialek W, Ryu WS (2008) Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput Biol 4:e1000028
Suzuki M, Tsuji T, Ohtake H (2005) A model of motor control of the nematode C. elegans with neuronal circuits. Artif Intell Med 35:75–86
Vesanto J, Alhoniemi E (2000) Clustering of the self-organizing map. IEEE Trans Neural Netw 11:586–600
Wicks SR, Roehrig CJ, Rankin CH (1996) A dynamic network simulation of the nematode tap withdrawal circuit: predictions concerning synaptic function using behavioral criteria. J Neurosci 16:4017–4031
Zariwala HA, Miller AC, Faumont S, Lockery SR (2003) Step response analysis of thermotaxis in Caenorhabditis elegans. J Neurosci 23:4369–4377
