
Linguistic Decision Making for Robot Route Learning

Hongmei He, Member, IEEE, Thomas Martin McGinnity, Member, IEEE, Sonya Coleman, and Bryan Gardiner, Member, IEEE

Abstract—Machine learning enables the creation of a nonlinear mapping that describes robot-environment interaction, whereas computing linguistics makes the interaction transparent. In this paper, we develop a novel application of a linguistic decision tree for a robot route learning problem by dynamically deciding the robot's behavior, which is decomposed into atomic actions in the context of a specified task. We examine the real-time performance of training and control of a linguistic decision tree, and explore the possibility of training a machine learning model in an adaptive system without dual CPUs for parallelization of training and control. A quantified evaluation approach is proposed, and a score is defined to evaluate a model's robustness with respect to the quality of the training data. Compared with the nonlinear system identification approach based on the nonlinear autoregressive moving average with exogenous inputs (NARMAX) model structure with offline parameter estimation, the linguistic decision tree model with online linguistic ID3 learning achieves much better performance, robustness, and reliability.

Index Terms—Atomic action, dynamic behavior decision, linguistic decision tree, robot route learning, task decomposition.

I. INTRODUCTION

The ability to learn automatically is a major step forward in robotics, enabling a more widespread adoption of robots in a range of applications. Machine learning technologies enable us to train a model with a set of training samples, and thus to obtain approximately accurate model parameters. Hence, they offer a good approach to robot learning. Machine learning enables the creation of a nonlinear mapping that describes robot-environment interaction, and the trained models are validated by comparing the behavior of the robot with that represented by the training data.

There is some related work in the literature. For example, in the earlier literature, Krishna and Kalra [11], [12] proposed a real-time collision avoidance algorithm by classifying the environment based on spatio-temporal sensory sequences. They used a double-layered classification scheme, in which a


fuzzy rule base is used for spatial classification at the first level, and at the second level Kohonen's self-organizing map and a fuzzy ART network are used for temporal classification. Recently, Freire et al. [3] investigated four structures of neural network (NN) classifiers for the wall-following navigation task with a real mobile robot, and examined the influence of short-term memory mechanisms on the performance of the neural classifiers. Kostavelis et al. [10] used a support vector machine (SVM) classifier to detect nontraversable scenes using solely stereo vision input. Mulero-Martinez [17] developed a GRBF static NN controller supervised by a switch logic, allowing arbitration between a NN and a robust proportional-derivative controller, for stable adaptive tracking of a robot manipulator.

The approaches mentioned above are essentially classification-based. An alternative methodology is based on the nonlinear autoregressive moving average with exogenous inputs (NARMAX) model or equivalent techniques. A NARMAX model [2] is a general and natural representation of nonlinear systems. When a nonlinear system is represented with a polynomial, parameter estimation becomes a linear regression problem, which can be solved with least squares algorithms. Hence, a NARMAX model polynomial can be used to directly produce the mapping between control code (the velocity of a robot in the implementation) and environment perceptions, and it has been applied to simulate a robot's behavior for the robot route learning problem [9], [13], [19].

Generally, the ability of a robot simulator to predict accurately depends on three models: 1) the robot model; 2) the task model; and 3) the environment model [13]. The environment model provides the robot's sensory perception, based on the robot's position and orientation. For the approaches above, the environment model is of most interest. However, it requires robots to randomly learn the whole environment. As in [9], the sensory perception of a robot is calculated by the environment model, and the estimated sensory perception serves as the input of a control model, so errors accumulate. If the training data do not represent the environment sufficiently accurately, then the inaccurate environment model can lead to inaccurate robot behavior.

A human driving a robot to learn a route is analogous to an adult leading a child along a path. Thus, fast learning only requires obtaining the perception of the robot relative to the environment on a specified route, instead of over the whole environment.

To improve the accuracy of a robot's behavior for a specified task, a reasonable approach is to decompose the behavior of


the robot in the task into atomic actions. The key requirement is then to decide the critical points where the robot switches from one atomic action to another. Because of the uncertainty inherent in environmental conditions, and given that the perception of the robot's current position in the environment is a consequence of the robot's last behavior in the environment, including any errors, dynamic decision making based on the perception is necessary.

Decision trees were popularized by Quinlan [23] with the ID3 induction algorithm. Decision tree techniques have been shown to be interpretable, efficient, problem independent, and able to handle large-scale applications [21]. A linguistic decision tree (LDT) [15], [22] is a type of probabilistic tree that combines a decision tree with label semantics [14]. Machine learning is concerned equally with classification and regression problems, and considerable research on classification and regression is based on static databases [5], [6], although there is some research on machine learning for real-time control. The introduction of LDTs was also based on static databases [15], [22]. Lan and Liu [24] investigated a neurophysiological decision-making mechanism for robot behavior coordination, demonstrated by the pushing-stick task. However, such decision making lacks a linguistic interpretation. A linguistic decision tree can improve the transparency of the interaction between the robot and the environment. In this paper, we develop an LDT to dynamically control a robot's behavior by using the robot's sensory perception in the specified environment as the input attributes of the LDT for the robot route learning problem.

Alippi et al. [1] proposed an effective just-in-time adaptive classifier. However, as the computational complexity of training a machine learning model poses a big challenge when applying it to real-time control, we examine the real-time performance of training, and explore the possibility of dynamic training through experimentation without updating the training data in each run. We also compare the offline and online performance. In addition, we propose a quantified approach to evaluating the performance of the robot's learning.

Unlike the simultaneous localization and mapping (SLAM) method for mobile robot navigation, in which a map of the environment is necessary, the LDT approach does not need any map, only the perception of the environment. In addition, SLAM has two main problems. The first is the computational complexity due to the growing state vector with each added landmark in the environment. The second is the data association that matches the observations and landmarks in the state vector [25]. In contrast, once the LDT model is learned, the complexity of the LDT model is almost fixed, as the number of input attributes in the trained LDT and the number of labels for each attribute are fixed.

Mucientes et al. [16] used a genetic algorithm to produce weighted linguistic rules to control an autonomous robot. However, the genetic search is clearly an offline process. Nguyen-Tuong and Peters [18] employed local kernel-based learning for the online approximation of a multivalued mapping, and they argued that learning models for task-space tracking control from sampled data is an ill-posed problem, as the same input data point can yield many different output

values. We will show that the proposed LDT model for robot route learning is robust, whereas for a NARMAX model with offline learning, the ill-posed problem remains.

This paper is organized as follows. In Section II, we briefly introduce the LDT based on label semantics. In Section III, we explain the decomposition of the robot's behavior for a specified task. In Section IV, the robot's controller is implemented with the trained LDT for a specified task. In Section V, we propose an approach to evaluating the learning performance for the robot route learning problem, and in Section VI, the experiments are presented and the results are evaluated. Finally, we conclude the work in Section VII.

II. LINGUISTIC DECISION TREE

Label semantics, proposed by Lawry [14], provides a means of using linguistic expressions (defined by fuzzy sets) to label numerical values. An LDT is a type of probabilistic tree that combines label semantics and a decision tree, and thus provides transparent rules for decision making or classification [5], [15]. It expands with focal elements from level to level, guided by information heuristics. For each branch, the class probabilities are evaluated based on the training data. In this section, we briefly describe label semantics, the LDT, and the induction algorithm, linguistic ID3 (LID3) [5], [22].

A. Label Semantics

Label semantics [14] proposes two fundamental and interrelated measures of the appropriateness of labels as descriptions of an object in an underlying domain $\Omega$. Label semantics assumes a finite set of labels $\mathcal{L}$. Each label corresponds to a fuzzy interval [Fig. 1(a)]. The measure of appropriateness of a label $L$ as a description of instance $x$ is denoted by $\mu_L(x)$ and quantifies the agent's subjective belief that the label $L$ can be used to describe $x$, based on its (partial) knowledge of the current labeling conventions of the population. A mass assignment $m_x$ on sets of labels quantifies an agent's belief that any particular subset of labels (called a focal element) contains all and only the labels with which it is appropriate to describe $x$.

Definition 1 (Mass Assignment on Label Sets): For all $x \in \Omega$, a mass assignment on labels is a function $m_x : 2^{\mathcal{L}} \to [0, 1]$ such that $\sum_{S \subseteq \mathcal{L}} m_x(S) = 1$.

Depending on labeling conventions, there may be certain combinations of labels that cannot all be appropriate to describe any object. For example, given the label set $\mathcal{L}$ = {small, medium, large}, small and large cannot both be appropriate to describe an object, but {small, medium} and {medium, large} could be used to describe an object.

Definition 2 (Set of Focal Elements): Given a label set $\mathcal{L}$ together with an associated mass assignment $m_x$, the set of focal elements for $\mathcal{L}$ is given by $\mathcal{F} = \{S \subseteq \mathcal{L} : m_x(S) > 0\}$.

A $\lambda$-mapping is introduced as $\lambda(L_i) = \{F \in \mathcal{F} : L_i \in F\}$ for all $L_i \in \mathcal{L}$. Namely, $\lambda(L_i)$ covers those focal elements that include the label $L_i$.

An appropriateness measure $\mu_L(x)$ quantifies the degree of our belief that the label $L$ is appropriate for $x \in \Omega$. $\mu_L(x)$ is



evaluated as the sum of mass assignments $m_x$ over $\lambda(L)$:

$$\mu_L(x) = \sum_{F \in \lambda(L)} m_x(F). \quad (1)$$

Appropriateness measures are not a one-to-one function of mass assignments, as $m_x$ cannot be uniquely determined from $\mu_L(x)$, $L \in \mathcal{L}$. However, based on the following assumption, the calculus can be functional.

Definition 3 (Consonance in Label Semantics): Given nonzero appropriateness measures on basic labels $\mathcal{L} = \{L_1, L_2, \ldots, L_k\}$ ordered such that $\mu_{L_i}(x) \ge \mu_{L_{i+1}}(x)$ for $i = 1, \ldots, k-1$, the consonant mass assignment has the form

$$m_x(\{L_1, \ldots, L_k\}) = \mu_{L_k}(x), \qquad m_x(\emptyset) = 1 - \mu_{L_1}(x)$$
$$m_x(\{L_1, \ldots, L_i\}) = \mu_{L_i}(x) - \mu_{L_{i+1}}(x) \quad \text{for } i = 1, \ldots, k-1.$$

In this context, the consonance assumption is that, for each $x \in \Omega$, we first identify a total ordering on the appropriateness of labels. We then evaluate our belief value $m_x$ about which labels are appropriate to describe $x$ consistently with this ordering.

Fig. 1(a) and (b) shows the relationship between the appropriateness measure and the mass assignment of focal elements. Fig. 1(a) shows the appropriateness of attribute $x \in [0, 10]$ on the label set $\mathcal{L}$ = {vl, l, m, h, vh}. Each label represents an interval, and adjacent labels overlap by 50%. Fig. 1(b) shows the mass assignments on focal elements, corresponding to the appropriateness of $x$ on the labels in Fig. 1(a). For example, in Fig. 1(a), $\mu_{vl}(x) = 1$ and $\mu_l(x) = x - 1$ for $x \in [1, 2]$. According to consonance in label semantics, we have [see Fig. 1(b)] $m_x(\{vl, l\}) = x - 1$ and $m_x(\{vl\}) = 1 - (x - 1) = 2 - x$ for $x \in [1, 2]$.
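To make the consonant mapping concrete, the following minimal Python sketch (our own illustration, not code from the paper) builds five 50%-overlapping trapezoidal labels and computes the consonant mass assignment of Definition 3. The breakpoints are our reading of Fig. 1(a); they reproduce $m_x(\{vl\}) = 2 - x$ and $m_x(\{vl, l\}) = x - 1$ on [1, 2].

def trapezoid(x, a, b, c, d):
    """Appropriateness measure with core [b, c] and support [a, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Five labels {vl, l, m, h, vh} on [0, 10]; adjacent labels overlap by 50%.
LABELS = {
    "vl": (0, 0, 2, 3), "l": (1, 2, 3, 4), "m": (3, 4, 5, 6),
    "h": (5, 6, 7, 8), "vh": (7, 8, 10, 10),
}

def appropriateness(x):
    return {L: trapezoid(x, *pts) for L, pts in LABELS.items()}

def consonant_mass(x):
    """Definition 3: order labels by appropriateness, then take differences."""
    mu = sorted(appropriateness(x).items(), key=lambda kv: -kv[1])
    mass = {}
    if mu[0][1] < 1.0:
        mass[()] = 1.0 - mu[0][1]                # m_x(empty set)
    for i, (_, level) in enumerate(mu):
        nxt = mu[i + 1][1] if i + 1 < len(mu) else 0.0
        if level - nxt > 0:
            mass[tuple(L for L, _ in mu[: i + 1])] = level - nxt
    return mass

print(consonant_mass(1.4))  # ~{('vl',): 0.6, ('vl', 'l'): 0.4}, cf. Fig. 1(b)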


Fig. 1. Two fundamental and interrelated measures in label semantics. (a) Appropriateness. (b) Mass assignments.


B. Semantics of an LDT

In an LDT based on label semantics, the nodes are attributes, the edges are focal elements describing attributes, and a branch is a path from the root to a leaf of the LDT. A branch $B$ with depth $d$ is a conjunction of focal elements $F_1 \wedge \cdots \wedge F_d$, where $F_k$ is the focal element of an edge on branch $B$, for $k = 1, \ldots, d$. Each branch is augmented by a set of conditional mass values $m(F|B)$ for each focal element $F \in \mathcal{F}_y$, where $\mathcal{F}_y$ is the set of focal elements describing the decision variable $y$. Therefore, an LDT can present the transparent linguistic probability rules specified in Definition 4. Through these rules, for a given sample, the probabilities of the goal description (e.g., the robot's action) on labels can be estimated.

Definition 4 (Linguistic Definitions): Given an LDT in which the rules corresponding to branch $B_i$ are $F_{i1} \wedge \cdots \wedge F_{id} \to F : m(F|B_i)$ for each focal element $F \in \mathcal{F}_y$, the mass assignment $m_y$ for a given example with attribute values $\vec{x} = (x_1, \ldots, x_m)$ can be determined according to Jeffrey's rule [7] by

$$m_y(F) = \sum_{i=1}^{t} \mu_{B_i}(\vec{x}) \, m(F|B_i) \quad (2)$$

where $t$ is the number of branches in the LDT, and $m(F|B_i)$ is equivalent to the conditional probability $p(F|B_i)$. Assume $x_{ij}$ is described with a focal element $F_{ij}$ in branch $B_i$. For branch $B_i$, we have

$$\mu_{B_i}(\vec{x}) = \prod_{j=1}^{d} m_{x_{ij}}(F_{ij}). \quad (3)$$

Fig. 2. An LDT.

For example, Fig. 2 shows an LDT for a classifier with a goal variable (representing the robot's behavior) that can be described with the label set $\mathcal{L}_y$ = {F, R, L}, where F represents the robot's action Forward, R represents Turn Right, and L represents Turn Left. There is no fuzzy action; namely, there are no samples where the robot's action can be described with the subsets {F, R} or {R, L}. Therefore, the focal set of the goal variable is {{F}, {R}, {L}}. The input attributes $x_1$ and $x_2$ (e.g., laser readings, the robot's position, or its heading angle) can be described with the label set {s, m, l}, where s = small, m = medium, and l = large. The focal set is {{s}, {s, m}, {m}, {m, l}, {l}}. There are 13 branches in the LDT in Fig. 2, denoted $B_0, \ldots, B_{12}$ from the left to the right of the LDT. $B_3$ represents the following rule:

[$B_3$] ($x_2$ is {m}) $\wedge$ ($x_1$ is {s, m}) $\to$ {F}: 0.00, {R}: 1.00, {L}: 0.00.

Now consider an example, $\vec{x} = (3.0, 1)$. We can calculate the mass assignments for each attribute as

$$m_{x_1} = (\{s\}\!: 0, \{s, m\}\!: 0.644, \{m\}\!: 0.356, \{m, l\}\!: 0, \{l\}\!: 0)$$
$$m_{x_2} = (\{s\}\!: 0, \{s, m\}\!: 0.333, \{m\}\!: 0.667, \{m, l\}\!: 0, \{l\}\!: 0).$$

Then

$$m_y(\{R\}) = m_{x_2}(\{s, m\}) \, p(\{R\}|B_1) + m_{x_2}(\{m\}) \, m_{x_1}(\{s, m\}) \, p(\{R\}|B_3) + m_{x_2}(\{m\}) \, m_{x_1}(\{m\}) \, p(\{R\}|B_4) = 0.333 \times 1 + 0.667 \times 0.644 \times 1 + 0.667 \times 0.356 \times 1 = 1.$$

Therefore, given this sample, the probability that the robot's action is Turn Right is one.
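The same computation can be written compactly. The Python sketch below (names are ours) applies (2) and (3) to the three branches that contribute to {R} in this example, using the mass assignments just derived.

# Each branch: a list of (attribute, focal element) edges, plus m(F|B).
# Only the branches with nonzero contribution to {R} are listed here.
mx = {
    "x1": {("s", "m"): 0.644, ("m",): 0.356},
    "x2": {("s", "m"): 0.333, ("m",): 0.667},
}

branches = [
    ([("x2", ("s", "m"))],                 {"R": 1.0}),  # B1
    ([("x2", ("m",)), ("x1", ("s", "m"))], {"R": 1.0}),  # B3
    ([("x2", ("m",)), ("x1", ("m",))],     {"R": 1.0}),  # B4
]

def mu_branch(edges):
    """Eq. (3): product of attribute masses along the branch."""
    prod = 1.0
    for attr, focal in edges:
        prod *= mx[attr].get(focal, 0.0)
    return prod

def m_y(label):
    """Eq. (2): branch memberships weighted by the conditional masses."""
    return sum(mu_branch(e) * cm.get(label, 0.0) for e, cm in branches)

print(round(m_y("R"), 3))  # 1.0, matching the worked example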


C. LID3 Algorithm for the Induction of an LDT

LID3 [5], [22], an extension of the well-known ID3 algorithm [23], is used to build the LDT from a given linguistic database. The search is guided by a modified measure of information gain in accordance with label semantics.

Definition 5 (Branch Entropy): The entropy of branch $B$, for a given goal variable belonging to the class set $C = \{C_1, \ldots, C_t\}$, is

$$E(B) = -\sum_{i=1}^{t} P(C_i|B) \log_2 P(C_i|B). \quad (4)$$

Consider a branch $B$, and suppose that attribute $x_j$ is appended to it. The expected entropy is then defined as follows.

Definition 6 (Expected Entropy): When $x_j$ is appended to branch $B$, the expected entropy is

$$EE(B, x_j) = \sum_{F_j \in \mathcal{F}_j} E(B \cup F_j) P(F_j|B) \quad (5)$$

where $B \cup F_j$ represents the new branch obtained by appending the focal element $F_j$ to the end of branch $B$. The probability of $F_j$ given $B$ can be calculated as

$$P(F_j|B) = \frac{\sum_{\vec{x} \in D} P(B \cup F_j|\vec{x})}{\sum_{\vec{x} \in D} P(B|\vec{x})} \quad (6)$$

where $P(B|\vec{x}) = \mu_B(\vec{x})$ (see Definition 4).

Definition 7 (Information Gain): Using the notation of Definitions 5 and 6, the information gain is defined as

$$IG(B, x_j) = E(B) - EE(B, x_j). \quad (7)$$
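A minimal Python rendering of these heuristics follows (our own sketch; in practice, the class probabilities $P(C|B)$ and focal-element probabilities $P(F_j|B)$ would be estimated from the training data via (4)-(6)).

import math

def branch_entropy(p_class_given_b):
    """Eq. (4): E(B) = -sum_i P(C_i|B) log2 P(C_i|B)."""
    return -sum(p * math.log2(p) for p in p_class_given_b if p > 0)

def expected_entropy(children):
    """Eq. (5): children is a list of (P(F_j|B), P(C|B u F_j)) pairs,
    one per focal element F_j of the candidate attribute x_j."""
    return sum(p_fj * branch_entropy(p_cls) for p_fj, p_cls in children)

def information_gain(p_class_given_b, children):
    """Eq. (7): IG(B, x_j) = E(B) - EE(B, x_j)."""
    return branch_entropy(p_class_given_b) - expected_entropy(children)

# Example: a branch with P(C1|B) = P(C2|B) = 0.5, split by an attribute
# whose two focal elements separate the classes perfectly -> IG = 1 bit.
children = [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]
print(information_gain([0.5, 0.5], children))  # 1.0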

The most informative attribute forms the root of the LDT, and the tree expands into branches associated with all possible focal elements of this attribute. For each branch, the free attribute with maximal information gain becomes the next node, until the branch reaches the specified maximum depth or the maximum class probability reaches the given threshold. The process forms a level-order traversal.

III. DECOMPOSING THE BEHAVIOR OF A ROBOT

The trajectory a robot follows may be considered a combination of a set of simple actions. We define an atomic action of a robot as an action that cannot be decomposed further, and that can be represented by a set of parameters used to drive the robot. The behavior of a mobile robot is

Fig. 3. A U-path and an S-path.

decided by its linear velocity and angular velocity. Therefore, the atomic action of a mobile robot is represented by the pair of linear velocity and angular velocity $(v, w)$. No matter what shape of trajectory a mobile robot is traversing, the robot's behavior can always be divided into four types of atomic actions, which correspond to pairs $(v, w)$:

1) Forward (F): linear velocity $v > 0$, angular velocity $w = 0$;
2) Backward (B): linear velocity $v < 0$, angular velocity $w = 0$;
3) Turn left (L): linear velocity $v = C$, angular velocity $w > 0$;
4) Turn right (R): linear velocity $v = C$, angular velocity $w < 0$.

Here, $C$ is a constant larger than zero. Usually, we do not set the rotational radius to zero. Hence, the goal variable can be described with at most four action labels. Namely, for any shape of route, a robot's behavior can be decomposed into a sequence of the atomic actions turn-left, turn-right, forward, and backward, so the classifier deals with at most four classes.

For example, suppose a robot travels along a U-path as shown in Fig. 3(a). The U-path can be divided into five stages, and the robot has five corresponding atomic actions. We assume a constant linear velocity of 0.25 and a constant angular velocity of 0.524 for simplicity:

1) F: $v = 0.25$, $w = 0$ ($a_1$);
2) R: $v = 0.25$, $w = -0.524$ ($a_2$);
3) F: $v = 0.25$, $w = 0$ ($a_3$);
4) R: $v = 0.25$, $w = -0.524$ ($a_4$);
5) F: $v = 0.25$, $w = 0$ ($a_5$).

In the U-path case, there exist only two types of action, forward and turn right, during the robot's journey. The goal variable can be described with the two labels {F, R} for this task, and the task is transformed into a decision making problem. Fig. 3(b) shows an S-path. Here, the behavior of the robot is decomposed into nine actions involving three atomic actions: forward, turn right, and turn left. Therefore, the S-path task is transformed into a classification problem.
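As a short Python sketch of this decomposition (our own illustration): each logged $(v, w)$ pair is labeled with one of the atomic actions, and consecutive duplicates are collapsed into the action sequence.

def label_action(v, w, eps=1e-3):
    """Map a (v, w) pair to one of the four atomic-action labels."""
    if abs(w) <= eps:
        return "F" if v > 0 else "B"
    return "L" if w > 0 else "R"

def decompose(log):
    """Collapse a list of (v, w) samples into a sequence of atomic actions."""
    seq = []
    for v, w in log:
        a = (label_action(v, w), v, w)
        if not seq or seq[-1] != a:
            seq.append(a)
    return seq

# U-path example from the text: v = 0.25 throughout, w = -0.524 when turning.
log = [(0.25, 0)] * 3 + [(0.25, -0.524)] * 2 + [(0.25, 0)] * 3 \
    + [(0.25, -0.524)] * 2 + [(0.25, 0)] * 3
print([a for a, _, _ in decompose(log)])  # ['F', 'R', 'F', 'R', 'F']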


For mobile robots, the relationship between linear velocity $v$ and angular velocity $w$ is $v = r w$, or $r = v/w$, where $r$ is the rotation radius. When $w$ is constant, as $v$ increases, $r$ increases; when $v$ is constant, as $w$ increases, $r$ decreases. By making use of this relationship, we can change the shape of the robot's path by changing the linear velocity $v$ while fixing $w$, or by changing the angular velocity $w$ while fixing $v$. For a physical mobile robot, the linear velocity and the angular velocity may restrict each other, and highly frequent fluctuation of the angular velocity could cause the robot to wobble or vibrate.

The behavior of a robot $R$ can be described with a sequence of actions $B(R) = \{a_1, a_2, \ldots, a_n\}$, where $a_i = (v_i, w_i)$ is an atomic action, as each subtask is completed by $a_i$ without changing the robot's action. Correspondingly, the subtask is called an atomic task; in other words, $v_i$ and $w_i$ are kept constant during the subtask. For the previous U-path example, given $T$ = U-path, the robot's behavior includes two atomic actions, $a_1 = (0.25, 0)$ and $a_2 = (0.25, -0.524)$; hence, the sequence of the robot's actions is $\{a_1, a_2, a_1, a_2, a_1\}$.

When a mobile robot equipped with a laser sensor travels along a specific trajectory, the laser readings of the robot reflect the behavior of the robot in the specified environment. With the introduction of atomic actions, the robot's behavior can be decomposed into the sequential execution of a series of atomic actions. Hence, finding the relationship between the actions of a robot and its received laser readings becomes a decision making or classification problem. Because of noise in the environment and the mechanical features of robots, the relationship between the robot's actions in the real world and the laser perceptions is uncertain and cannot be reliably and precisely expressed with a mathematical equation. Hence, an LDT may be an appropriate approach to solving the problem by combining a decision tree model and label semantics. Namely,

$$a = \mathrm{LDT}(L_1, \ldots, L_M) \quad (8)$$

where $L_1, \ldots, L_M$ are the laser readings, $a$ is the action of the robot, $a \in A$, and $A$ is the set of atomic actions obtained from the training data.

IV. ROBOT CONTROL

Assume that an LDT is trained with the LID3 algorithm on training data $D_{trn}$ obtained through human control. Given a sample (e.g., laser readings $L_{0..M}$), the robot's action can be obtained with the trained LDT. As in the example in Section II-B, given a sample, the probability distribution over all actions can be calculated with the trained LDT, and the decision is the action with the largest probability among all atomic actions.

Initially, the robot is at position $<x_0, y_0, \varphi_0>$. The robot controller reads the laser values and decides its action with the LDT. Assume $A$ is a set of atomic actions, $A = \{a_1, \ldots, a_k\}$, $a_i = (v_i, w_i)$. The most appropriate action decided by the LDT corresponds to a pair of linear and angular velocities $(v, w) \in A$, which is used to drive the robot. In the implementation, the robot's action is replaced with the index $i$ that indicates the position of the decided action in the array $A$ of atomic actions. This procedure is repeated until the task is completed. In the controller, the robot is driven every 100 ms with a new action obtained by the trained LDT for a specific trajectory. Considering the relative computational times associated with the LDT algorithm and the laser reading, the sleep time is set to 90 ms. Algorithm 1 shows the pseudocode for the learning process of robot $R$ with an LDT and a set of atomic actions $A$.

Algorithm 1 RobotLearning(R, LDT, A)
1:  Initialization(R);
2:  LP = LaserProxy(R);
3:  PP = Position2DProxy(R);
4:  v = V;
5:  w = W;
6:  k = 1, i = 0;
7:  PP.SetSpeed(v, w);
8:  while (not STOP) do
9:      R.Reading();
10:     [L_0...M] = LP.GetLaserReadings();
11:     i = LDT(L_0...M);
12:     (v, w) = A(i);
13:     PP.SetSpeed(v, w);
14:     Sleep(90 ms);
15: end while
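As a concrete illustration of the decision step in lines 11-12 of Algorithm 1, here is a minimal Python sketch (the action table and function names are ours, using the U/S-path atomic actions from Section III with fixed $v$ = 0.25 m/s): the trained LDT yields a mass $m_y$ over action labels, and the controller drives the robot with the $(v, w)$ pair of the most probable action.

ACTIONS = {"F": (0.25, 0.0), "R": (0.25, -0.524), "L": (0.25, 0.524)}

def select_action(m_y):
    """Pick the atomic action with the largest estimated probability."""
    best = max(m_y, key=m_y.get)
    return best, ACTIONS[best]

label, (v, w) = select_action({"F": 0.0, "R": 1.0, "L": 0.0})
print(label, v, w)  # R 0.25 -0.524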

V. PERFORMANCE EVALUATION APPROACHES

For a classifier with $k$ classes ($k \ge 2$), we define a true estimate as one where a sample with measure $\vec{x}$ belongs to class $C$ and the estimated probability $p(C|\vec{x})$ is the largest among the probabilities for all classes $C_i$, $i = 1, \ldots, k$. The simplest way to evaluate a classifier is accuracy. The ordinary accuracy $A$ is the ratio of the number of true estimates for all classes ($\sum_{i=1}^{k} N_i$) to the number of testing samples $M$, and the conditional accuracy on an action $a$ (e.g., $A_a$) is the ratio of the number $N_a$ of true estimates for the action $a$ to the number $M_a$ of samples where the robot's behavior is action $a$.

We propose an approach to evaluating the learning performance of a robot driven by a trained model in the target environment. Assume we have sampled $n$ runs of a path for training, and each run contains $N$ sample points, where each point represents a pair of coordinates of the robot's location in the test environment. The $i$th sample point in the $j$th run is denoted $s_{ij} = <x_{ij}, y_{ij}>$. Usually, we can specify an expected path, or the expected path can be obtained by averaging the coordinates at each point over all $n$ training paths:

$$\bar{s}_i = <\bar{x}_i, \bar{y}_i> = \Big< \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \ \frac{1}{n}\sum_{j=1}^{n} y_{ij} \Big>. \quad (9)$$
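In code, (9) is simply a pointwise mean over the runs (a sketch, assuming every run has the same number $N$ of sample points, as stated below):

import numpy as np

def expected_path(runs):
    """runs: array-like of shape (n, N, 2) holding <x, y> per point per run."""
    return np.asarray(runs, dtype=float).mean(axis=0)  # shape (N, 2)

runs = [[(0, 0), (1, 0)], [(0, 0.2), (1, -0.2)]]
print(expected_path(runs))  # [[0., 0.1], [1., -0.1]]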

We assume the number of samples for all runs in the training data is the same. During the robot’s learning procedure, two runs of a path could have quite different numbers of sample points. For example, the robot may stop at some points due to stochastic mechanical errors. In this case, several points are sampled at the same place. In particular, paths learned by


different models could have very different numbers of sample points. The area between two paths can directly reflect the deviation of the two paths. When two curves are both in the first quadrant of a Cartesian coordinate system, we can calculate the difference of the areas that the two curves form with the $x$ axis as

$$\varepsilon_A = \Big| \int_a^b f_2(x) - f_1(x)\, dx \Big| = \Big| \int_a^b f_2(x)\, dx - \int_a^b f_1(x)\, dx \Big|, \quad x \in [a, b]. \quad (10)$$

Assuming the whole experimental environment is in the first quadrant of the Cartesian coordinate system, all paths the robot learned are in the first quadrant. As the robot's positions $<x, y>$ are sampled discretely, the area difference between two paths with respect to the $x$ axis can be calculated as

$$A_x = \Big| \sum_{p_2} y_i |x_{i+1} - x_i| - \sum_{p_1} y_i |x_{i+1} - x_i| \Big|. \quad (11)$$

Similarly, the area difference between two paths with respect to the $y$ axis can be calculated as

$$A_y = \Big| \sum_{p_2} x_i |y_{i+1} - y_i| - \sum_{p_1} x_i |y_{i+1} - y_i| \Big|. \quad (12)$$

For the path $p_1$, the distance that the robot travels along the $x$ axis is

$$L_x = \sum_{p_1} |x_{i+1} - x_i|. \quad (13)$$

For the path $p_1$, the distance that the robot travels along the $y$ axis is

$$L_y = \sum_{p_1} |y_{i+1} - y_i|. \quad (14)$$

The integrated error of a path $p_2$ relative to the expected path $p_1$ can be calculated as

$$\varepsilon(p_2, p_1) = \sqrt{ \Big(\frac{A_x}{L_x}\Big)^2 + \Big(\frac{A_y}{L_y}\Big)^2 }. \quad (15)$$

Assume that we execute the trained model (e.g., an LDT) with specified parameters several times. The average error of the $n$ learned paths relative to the expected path $p$ is

$$\bar{\varepsilon} = \frac{1}{n} \sum_{p_i} \varepsilon(p_i, p). \quad (16)$$

With a machine learning approach, performance is usually related to the training data: if a model works well with noisy training data, the model is more robust. Hence, the proposed evaluation system takes the training data into account. The training data can be evaluated with the approach above as well. Assume the average error of the training paths is $\bar{\varepsilon}(P_{trn})$ and the average error of the paths learned with a model is $\bar{\varepsilon}(P_{lrn})$. Generally, we expect the learned paths to have a small average error even when the average error of the training data is large. Hence, we define a score $\chi$ to evaluate the model's robustness

$$\chi = \frac{\bar{\varepsilon}(P_{trn})}{\bar{\varepsilon}(P_{lrn})} \quad (17)$$

where $P_{trn}$ is the set of training paths and $P_{lrn}$ is the set of learned paths. The higher the score, the more robust the model.
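The whole evaluation pipeline (11)-(17) is straightforward to implement. The following Python sketch (function names are ours; each path is assumed to be an array of sampled $<x, y>$ positions, possibly of different lengths) computes the integrated error of a learned path against the expected path and the robustness score.

import numpy as np

def area_x(p):
    """Inner sum of Eq. (11) for one path: sum_i y_i |x_{i+1} - x_i|."""
    x, y = p[:, 0], p[:, 1]
    return np.sum(y[:-1] * np.abs(np.diff(x)))

def area_y(p):
    """Inner sum of Eq. (12): sum_i x_i |y_{i+1} - y_i|."""
    x, y = p[:, 0], p[:, 1]
    return np.sum(x[:-1] * np.abs(np.diff(y)))

def integrated_error(p2, p1):
    """Eq. (15) for learned path p2 against expected path p1."""
    ax = abs(area_x(p2) - area_x(p1))          # Eq. (11)
    ay = abs(area_y(p2) - area_y(p1))          # Eq. (12)
    lx = np.sum(np.abs(np.diff(p1[:, 0])))     # Eq. (13)
    ly = np.sum(np.abs(np.diff(p1[:, 1])))     # Eq. (14)
    return np.hypot(ax / lx, ay / ly)

def robustness_score(train_paths, learned_paths, expected):
    """Eq. (17): average training error over average learned error."""
    err = lambda paths: np.mean([integrated_error(p, expected) for p in paths])
    return err(train_paths) / err(learned_paths)   # via Eq. (16)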

Fig. 4. Swilly, the Metralabs SCITOS G5 mobile robot used in the experiments. (a) Mobile robot Swilly. (b) Laser configuration.

VI. EXPERIMENTS AND EVALUATIONS

A. Configuration

In the experiments, we used a Metralabs SCITOS G5 robot [Fig. 4(a)]. The robot is equipped with 24 sonar sensors distributed around its circumference and a SICK laser range finder, which can scan the front of the robot ([0°, 270°]) with a radial resolution of 0.5°. In our experiments, the laser range finder is configured to scan the front semicircle of the robot in the range [0°, 180°]; see Fig. 4(b). The robotics arena is configured with artificial walls to form a working environment measuring 5 × 5 m. During the experiments, the data from the robot's laser, position, orientation, and linear and angular velocities are logged every 250 ms, and the robot is driven once per 100 ms. We do not need to use a Vicon tracking system, which is used by many other researchers [9], [13], [19], [20].

B. Experiment Approach

We simplify the experiment by driving a robot along the specified trajectory to obtain the sensorgraph, instead of driving the robot to learn the whole environment as in previous approaches [9], [13], [19], [20]. The experiment involves the four stages outlined below.

Stage 1 (Obtaining the Sensorgraph on a Specified Trajectory): A user demonstrates a desired trajectory by performing a task in the target environment. This demonstration is completed by driving a mobile robot along a specified path. During this demonstration period, the robot's behavior is recorded, including the robot's linear velocity $v$, angular velocity $w$, position $<x, y>$, heading angle $\varphi$, and laser values $L_{0..M}$. The positions can be used for validation of the robot's learning behavior.

Stage 2 (Behavior Decomposition): Using the linear velocity and angular velocity obtained in the previous stage, we can index the actions of the robot. As we fix the linear velocity to 0.25 m/s, the actions directly indicate different angular velocities.


Stage 3 (Training an LDT): To drive a robot, the goal is to decide which atomic action should be performed by the robot. An LDT is produced by training it with the data sampled in the previous stage, where the laser readings, the robot's position $<x, y>$, or its heading angle $\varphi$ are input attributes, and the action index is the output attribute.

Stage 4 (Robot Learning Behavior): Finally, the implemented controller described in Section IV is used to drive the robot along the specified trajectory in the target environment.

A notation for sample features is defined for simplicity of description. The notation encodes the path shape, the number of laser readings, and whether the robot's position and heading angle are contained in a sample. For example, in the notation U-3FT, U indicates that the training data are sampled through a U-path walk; the number 3 indicates that three laser readings are evenly extracted from the 181 laser readings; the letter following the number of laser readings indicates whether the robot's position $<x, y>$ is included; and the last letter indicates whether the heading angle $\varphi$ is included in a sample. We use F to denote exclusion and T to denote inclusion. When the notation is used to represent a training set, a number may be appended to indicate the number of runs in the training data. For example, U-6FF5 indicates that the training data are composed of five runs of a U-path with only six laser readings, without the robot's position and heading angle. The default is ten runs of path data. For the U-path and S-path learning, the robot starts from the upper-left corner.

C. Samples for the U-Path and the S-Path

The sampling cycle is 250 ms, and each run of the U-path is sampled with 128 sample points over 32 s. Each sample point has 189 values: time (s), real linear velocity (rv), real angular velocity (rw), command linear velocity (cv), command angular velocity (cw), the robot's coordinates $<x, y>$, heading angle $\varphi$, and laser readings $L_0, \ldots, L_{180}$.

Fig. 5(a) shows the training data with 10 runs of the robot along a U-path. Because of human errors, the initial positions and initial heading angles are slightly different in each run, and stochastic errors, floor friction, mechanical features, etc., cause slight differences between the robot's runs. Fig. 5(b) shows the training data with 10 runs of the robot along an S-path. The differences between runs are quite large; we will observe the robustness of the LDT approach. The curves look very smooth because of the fixed linear velocity of 0.25 m/s and the scale of the figures.

As shown in Fig. 3(a), the behavior of the robot is decomposed into five actions, and the sequence of the robot's behavior is given by {F, R, F, R, F}. Therefore, for the U-path task, the LDT is used to solve a decision making problem, and the decision variable can be described with the two labels {F, R}. As there is no fuzzy action for the mobile robot, the set of focal elements is {{F}, {R}}. For the S-path shown in Fig. 3(b), the sequence of robot actions is {F, R, F, R, F, L, F, L, F}. Therefore, for the S-path task, the LDT is used to solve a classification problem; the goal variable can be described with the labels {F, R, L}, and the set of focal elements is {{F}, {R}, {L}}.


Fig. 5. Training samples for U-path and S-path. (a) Ten runs of U-path. (b) Ten runs of S-path.

All input attributes of the LDT, such as the laser readings, the robot's position $<x, y>$, or the $\sin(\varphi)$ of its heading angle, are described with a set of five labels, {vl, l, m, h, vh}. Correspondingly, the set of focal elements is {{vl}, {vl, l}, {l}, {l, m}, {m}, {m, h}, {h}, {h, vh}, {vh}}. Each attribute has minimum and maximum values according to the training data. Hence, it can be evenly divided into five intervals between the minimum and maximum values, corresponding to the five labels, which overlap 50% with each other; see Fig. 1(a) and (b).

D. Preliminary Offline Experiments

With the sampled data, we conducted some preliminary offline experiments. The offline test results show that all decision makers or classifiers with different input attributes can obtain an accuracy higher than 90%. For the LDTs with only laser readings, the accuracy slightly increases as the number of input attributes increases. For the U-path learning, LDT(U-4TT) achieves an accuracy of 96.93%, and for the S-path learning, LDT(S-6TT) achieves an accuracy of 96.42%. However, for U-path learning, the accuracy on turning right is less than 80% for LDTs with only laser readings, and LDT(S-3FF) obtains an accuracy of only 67.8% on turning right. This may be because some laser readings change greatly when the robot turns in a squared area, and the fuzzy intervals do not cover the changes of all laser readings. The uniform trapezoidal fuzzy labels describing the laser readings might not be good enough; therefore, to improve performance, other forms of nonuniform fuzzy labels may need to be considered in the future.

Because of the nature of LDTs, the computational complexity increases as the number of attributes rises. Although the number of branches of the trained LDT for more attributes is quite large, the response for one run of decision making is less than 10 ms. As the driving period in the experiments is set to 100 ms, the response is fast enough for controlling a robot.

Freire et al. [3] used an NN to solve the robot navigation problem as a classification problem. Therefore, we also trained NNs offline with 20 neurons in the hidden layer for the different input attributes, respectively. The overall accuracies of LDTs with different input attributes are slightly better than or comparable with those of the NN classifiers.


Fig. 6. U-paths and S-paths learned by the robot, which is driven by the (a) LDT(U-4TT) and (b) LDT(S-6TT), respectively.

For the U-path learning, NN(U-5TT) achieved the best accuracy of 96.95%, whereas for the S-path learning, NN(S-4TT) achieved the best accuracy of 95.69%. We also conducted offline experiments for SVM(U-3FF), SVM(U-6FF), and SVM(U-6TT); the accuracies are 86.11%, 84.50%, and 87.94%, respectively. We will examine the online performance of other machine learning approaches in future work.

E. Experiments on a Physical Robot

The LDT model can be viewed as a function that maps input attributes to the robot's behavior, which is represented by the angular velocity of the robot, as the linear velocity is fixed to a constant 0.25 m/s. For simplicity, we denote the LDT with input sample X as LDT(X). According to the offline experiments, U-4TT and S-6TT are the best feed sets for the LDTs for the U-path and S-path learning, respectively. Therefore, we first examine the performance of the robot's learning driven by LDT(U-4TT) and LDT(S-6TT), whose training data are extracted from the 10 runs of the U-path and S-path in Fig. 5(a) and (b); the paths learned by the robot are shown in Fig. 6(a) and (b), respectively. The U-paths learned by the robot driven by LDT(U-4TT) deviate little from each other, although the ten runs of training data have quite large deviations. It can also be seen that the S-paths of the robot driven by LDT(S-6TT) are quite stable.

Player/Stage [4] is a robot simulation tool, in which Player is a hardware abstraction layer. It can send instructions to a robot and get sensor data from the robot. The robot's position $<x, y>$ and heading angle $\varphi$ are estimated by the Player system, so errors may arise from the Player system. The robot's laser readings directly reflect changes in the environment, and of most interest is the relationship between sensor perception and the robot's behavior. Therefore, the following experiments mainly examine the performance of the LDT with only the laser readings for robot route learning.

1) LDT With Input Samples of Different Attributes: In this section, we examine the effect of different attributes on the learning performance of the robot by changing the attributes of the samples utilized in the LDT. To guarantee that the training data are sufficient, ten runs of paths are used for both the U-path and the S-path learning.

a) U-path learning: The experiment for each dataset of the LDT is performed three times. Fig. 7(a)-(d) shows the U-paths learned by the robot driven by LDT(U-3FF), LDT(U-4FF), LDT(U-5FF), and LDT(U-6FF), respectively.

Fig. 7. U-paths learned by the robot, which is driven by the (a) LDT(U-3FF), (b) LDT(U-4FF), (c) LDT(U-5FF), and (d) LDT(U-6FF), respectively.

Fig. 8. S-paths learned by the robot, which is driven by the (a) LDT(S-3FF), (b) LDT(S-4FF), (c) LDT(S-5FF), and (d) LDT(S-6FF), respectively.

For the U-path learning, three laser readings are enough to make the LDT work quite well. Although it is not easy to compare the paths of the robot driven by LDT(U-3FF), LDT(U-4FF), and LDT(U-5FF), it can be seen that the U-paths of the robot driven by LDT(U-6FF) are more consistent than the others, and the Turn Right action is more accurate, close to 90°, as the deviation mainly comes from the initial heading angle of the robot.

b) S-path learning: Similarly, the experiment for each feed set of the LDT is done three times. Fig. 8(a)-(d) shows the S-paths learned by the robot driven by


Fig. 9. U-paths learned with the LDT(U-6FF), which is trained with 10, 5, 2, and 1 runs of the U-path, respectively. (a) LDT trained with U-6FF10. (b) LDT trained with U-6FF5. (c) LDT trained with U-6FF2. (d) LDT trained with U-6FF1.

Fig. 10. S-paths learned with the LDT(S-6FF), which is trained with 10, 5, 2, and 1 runs of the S-path, respectively. (a) LDT trained with S-6FF10. (b) LDT trained with S-6FF5. (c) LDT trained with S-6FF2. (d) LDT trained with S-6FF1.

LDT(S-3FF), LDT(S-4FF), LDT(S-5FF), and LDT(S-6FF), respectively. For S-path learning, it can be clearly seen that as the number of feed attributes increases, the S-paths that the robot learned improve. Fig. 8(a) shows the learned paths of the robot driven by LDT(S-3FF); the performance is poor. The performance is slightly better when the robot is driven by LDT(S-4FF), illustrated by the run shown with the black solid line in Fig. 8(b), but the learned paths are quite distinct from each other. When the robot is driven by LDT(S-5FF), the learned S-paths are quite good [Fig. 8(c)], although the second turn right in the learned path is not accurate enough. When the robot is driven by LDT(S-6FF), the learned paths are the best among all experiments [Fig. 8(d)].

2) Effect of Training Data Size on the LDT: The size of a training set directly affects the training time. To examine the training performance, the effect of training data size on the performance of an LDT is investigated in this section. In each case, we train the LDT with one, two, five, and ten runs of the training path and compare the performance.

a) U-path learning: Fig. 9(a)-(d) shows the learned paths of the robot driven by LDT(U-6FF) for different sizes of training data. Obviously, one run of U-path data is not sufficient to train LDT(U-6FF) [Fig. 9(d)].

b) S-path learning: Fig. 10(a)-(d) shows the learned paths of the robot driven by LDT(S-6FF) for different sizes of training data. Again, one run of S-path data is not sufficient to train LDT(S-6FF), although occasionally LDT(S-6FF) achieved a very good S-path, such as the run shown with the dashed line in Fig. 10(d). It can be concluded that one run of path data is not sufficient for training an LDT for either U-path or S-path learning, and at least two runs of a path are needed for LDT training. Table I shows the training time of the LDT with different sizes of dataset.

TABLE I
TRAINING TIME (ms) OF LDTs WITH DIFFERENT SIZES OF DATA

Runs         10           5            2          1
LDT(S-6FF)   3613 ± 600   1313 ± 202   247 ± 9    63 ± 8
LDT(U-6FF)   2703 ± 6     897 ± 12     113 ± 6    27 ± 7

The average training time of an LDT with two runs of a path, for both the U-path and the S-path, is less than 250 ms. This makes it possible to seamlessly embed the training algorithm into the controller without multiprocessing, as the sample time is 250 ms. Otherwise, the training algorithm must be improved, or the number of input attributes could be further reduced.

3) Robot's Position and Heading Angle as the Input Attributes of an LDT: The final experiments observe the performance of the robot's learning driven by an LDT with the robot's position $<x, y>$ and heading angle $\varphi$ as input attributes. The LDT is trained with 5 runs of path data, randomly selected from the 10 runs of path data. This size of data is enough for training an LDT according to the experiments above.

a) U-path learning: Fig. 11(a) and (b) shows the U-paths of the robot driven by LDT(U-0TT) and LDT(U-1TT). In the featured U-1TT samples, the single laser reading is the average value of all 181 laser readings. If the LDT is supplied only with the robot's position $<x, y>$ and the heading angle $\varphi$, the trajectory of the robot can form a good U-path. When the robot's position and heading angle are combined with the average value of the 181 laser readings, the learned paths are more stable.

b) S-path learning: Fig. 12(a) and (b) shows the paths of the robot driven by LDT(S-0TT) and LDT(S-1TT), where the 1 indicates the average value of the 181 laser readings.


Fig. 11. U-paths learned by the robot, which is driven by the LDTs with the feeds of (a) U-0TT and (b) U-1TT, respectively.

Fig. 12. S-paths learned by the robot, which is driven by the LDTs with the feeds of (a) S-0TT and (b) S-1TT, respectively.

Fig. 13. (a) Environmental setup and (b) the training paths.

It can be seen that the learned paths driven by LDT(S-1TT) are more stable than those of LDT(S-0TT). Although the offline testing also shows that the laser readings and the robot's position plus heading angle do not have a consistent effect on the robot's learning, the average laser reading improved the performance of the robot's learning based on its position and heading angle. This may be because noise is reduced by taking the average of the laser readings.

We recorded the training times of the LDTs with feeds U-0TT, U-1TT, S-0TT, and S-1TT with 5 runs of the training paths; they are 10, 70, 28, and 133 ms, respectively. As the number of attributes in the LDT is reduced to three or four, the training time is greatly reduced, compared with the experiments on 5 runs of the training path in Table I in Section VI-E.2.

F. Comparing With NARMAX System Identification

In this section, we compare the performance of the LDT model with that of the nonlinear system identification (NARMAX) model.

Fig. 14. Comparison of LDT with NARMAX. (a) Paths learned with the LDT(1TT) model. (b) Paths learned with the NARMAX(6FFA) model.

The environment is set up as in Fig. 13(a), where four blocks are symmetrically and evenly placed in the squared 5 × 5 m environment. The robot's initial position is at the middle left, with coordinates and heading angle $<1, 2.5, 0>$. Fig. 13(b) shows the training paths in the environment, where the thick line is the expected path. For the collection of training data, the linear velocity of the robot is kept constant at 0.2 m/s while the angular velocity is controlled by a human operator using a keyboard; the angular velocity can be 0, +0.4, or -0.4 rad/s. The data are collected using the human operator method in [8], where a human operator drives the robot to traverse the expected path, and the laser readings, the robot's positions, and its heading angles are logged, as described in Stage 1 in Section VI-B. The collected samples are used for training both the LDT and the NARMAX models.

The LDT is supplied with sample 1TT, namely, the average value of all laser readings, the robot's position $<x, y>$, and its heading angle $\varphi$. Fig. 14(a) shows the paths learned with LDT(1TT). We also tested LDT(6FF), where the six laser readings are the middle values [i.e., u1-u6 in Fig. 4(b)] evenly extracted from the 181 laser readings. The performance of LDT(6FF) is similar to that of LDT(1TT), but LDT(1TT) achieves a more stable performance, similar to the experiments in Section VI-E.3.

With the same training data, a NARMAX model, representing the mapping between the robot's velocity and the laser readings, is trained with the algorithm used in [9]. We use the same notation for describing the sample features for the NARMAX model as for the LDT model: if a NARMAX model is fed with input sample x, it is denoted as NARMAX(x). We conducted experiments with NARMAX(1TT) and NARMAX(6FF), where the 6 laser readings are the middle values evenly extracted from the 181 laser readings; both failed to traverse the path correctly. Therefore, we used the strategy in [8], where the 6 laser readings are obtained as the average values of evenly divided sections, denoted as 6FFA. A hill-climbing strategy, trying different regression orders and degrees of inputs, is used to find the best NARMAX model structure with the smallest sum of squared errors. The best NARMAX model polynomial, with 183 coefficients, is obtained when the regression order is set to 4 and the degree of inputs is set to 2.
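For reference, a schematic version of such a polynomial fit is shown below. It is a simplified NARX-style illustration under our own assumptions (lags 1 to the regression order of both inputs and output, monomials up to the given degree, no explicit noise model, and no structure selection), not the exact procedure of [2], [9]; coefficients are estimated by ordinary least squares.

import numpy as np
from itertools import combinations_with_replacement

def regressor_matrix(u, w, order=4, degree=2):
    """u: (T, m) inputs; w: (T,) output. Monomials of lags 1..order."""
    T = len(w)
    sig = np.column_stack([u, w])              # past outputs join the inputs
    lagged = np.hstack([sig[order - 1 - k: T - 1 - k] for k in range(order)])
    cols = [np.ones(T - order)]                # constant term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(lagged.shape[1]), d):
            cols.append(np.prod(lagged[:, list(idx)], axis=1))
    return np.column_stack(cols)

def fit_narmax(u, w, order=4, degree=2):
    """Least-squares estimate of the polynomial coefficients for w."""
    X = regressor_matrix(u, w, order, degree)
    theta, *_ = np.linalg.lstsq(X, w[order:], rcond=None)
    return theta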


Fig. 15. Comparison with the expected U-path and S-path. (a) Expected U-path and average U-paths of the robot driven by LDT(U-6FF) (gray solid), LDT(U-4TT) (dashdotted), LDT(U-0TT) (dashed), and LDT(U-1TT) (black solid). (b) Expected S-path and average S-paths of the robot driven by LDT(S-6FF) (gray solid), LDT(S-6TT) (dashdotted), LDT(S-0TT) (dashed), and LDT(S-1TT) (black solid).

To avoid wobbling of the robot, we used a low-pass filter. In the control system, the estimated angular velocity is limited to a maximum of 0.4 rad/s, and if the absolute value of the estimated angular velocity is less than 0.05 rad/s, it is set to 0. The experiments show that the NARMAX model is very sensitive: it is not easy to make the model work as expected, and at least 50% of the runs of the NARMAX model are not appropriate. Fig. 14(b) shows the three best paths learned with NARMAX(6FFA). It can be seen that the shapes of the learned paths are quite different from the expected path, and the robot almost drives through the upper-left block in the environment.

We also conducted experiments for learning other paths, with the blocks distributed differently in the environment, such as a path of the ∞ shape, using both the LDT and NARMAX models. Although the NARMAX model always failed, the LDT showed strong robustness on those paths. As the NARMAX model is trained offline, it is not adaptable to the system. The accumulated errors in angular velocity caused by the inherent ill-posed problem [18] result in the failure of the route learning. For the LDT model, in contrast, the discrete states of velocity described with fuzzy labels reduce such accumulated errors caused by the model itself. Therefore, the LDT model achieved much better performance, robustness, and reliability than the NARMAX model. We will further compare the performance of the classification-based LDT model and the regression-based NARMAX model with the proposed evaluation approach in the next section.

G. Evaluation

We have performed a substantial number of experiments for the robot's route learning, and presented a qualitative analysis of the performance of the LDT model with different sensory data inputs and different sizes of training data. The performance of the LDT model and the NARMAX model has also been compared qualitatively through an experiment based on the same training data in the same environment. We now use the approach proposed in Section V to evaluate the learning performance quantitatively.

1) Evaluation of LDTs With Different Sensory Data Inputs: Fig. 15(a) shows the average U-paths of the robot driven by LDT(U-6FF) (gray solid line), LDT(U-4TT) (dashdotted


line), LDT(U-0TT) (dashed line), and LDT(U-1TT) (black solid line), compared with the expected U-path (thick line). Fig. 15(b) shows the average S-paths of the robot driven by LDT(S-6FF) (gray solid line), LDT(S-6TT) (dashdotted line), LDT(S-0TT) (dashed line), and LDT(S-1TT) (black solid line), respectively, compared with the expected S-path (thick line). It should be noted that the expected paths are the average paths of the 10 runs of the training U-path and S-path, respectively. For U-path learning, the average path learned with LDT(U-6FF) is the closest to the expected path among all paths learned with different LDTs. For S-path learning, the average path learned with LDT(S-6TT) is the closest to the expected path among all paths learned with different LDTs.

Table II presents the integrated errors (15) for each learned path, and the average errors (16) compared with the expected path. With the proposed evaluation approach, the average error of the 10 runs of the U-path is 0.3719, the average error of the 10 runs of the S-path is 0.2654, the average error of the 5 runs of the U-path is 0.4851, and the average error of the 5 runs of the S-path is 0.2562. Therefore, robustness scores for the LDTs with different sensory data inputs can be calculated with (17); they are listed in the rightmost column of the table. In the table, $r_i$ is the $i$th run of a U-path or S-path, and $r_{av}$ is the average path of the three runs of the U-path or S-path.

If the average error of a path learned with LDT(x) is less than that learned with LDT(y), we denote this as x ≺ y. Hence, based on the average errors, the relations for the different LDTs are as follows:

U-6FF ≺ U-4TT ≺ U-0TT ≺ U-1TT;
S-6TT ≺ S-1TT ≺ S-6FF ≺ S-0TT.

We denote x ≻ y if the robustness score of a path learned with LDT(x) is larger than that learned with LDT(y). With the robustness scores, the relations for the different LDTs are as follows:

U-0TT ≻ U-6FF ≻ U-1TT ≻ U-4TT;
S-6TT ≻ S-1TT ≻ S-6FF ≻ S-0TT.

It can be seen that LDT(U-0TT) achieved the largest robustness score, although the average error of its learned paths is quite high. This is because the average error of the 5 runs of training paths is larger than that of the 10 runs of training paths. Therefore, for the U-path learning, the order of LDTs based on average errors is different from that based on the robustness scores. For the S-path learning, the order of LDTs based on the average errors is the same as that based on the robustness scores, as the average error of the 5 runs of training paths is similar to that of the 10 runs.

From Table II, it can also be seen that the average errors of the three paths learned with LDT(U-1TT) are similar. This is consistent with Fig. 11(b), where the three paths appear close on visual inspection. Similarly, LDT(S-1TT) obtained similar average errors for the three learned paths [Fig. 12(b)]. This might indicate that the single average laser reading, the robot's position, and the heading angle complement each other, so the random errors are reduced. However, there may exist systematic errors, which could be further corrected.


TABLE II
EVALUATION OF THE PATHS LEARNED WITH LDTs FED WITH DIFFERENT ATTRIBUTES FOR THE U-PATH AND S-PATH LEARNING

SF       ε(r1)    ε(r2)    ε(r3)    ε(rav)   ε̄        χ̄
U-6FF    0.2675   0.8077   0.1824   0.3631   0.4192   0.8871
U-4TT    0.5570   0.7567   0.4760   0.3754   0.4635   0.8024
U-0TT    0.2610   0.9131   0.3955   0.4127   0.5232   0.9271
U-1TT    0.5727   0.5263   0.6793   0.5540   0.5928   0.8131
S-6FF    0.4111   0.0634   0.4751   0.2858   0.3165   0.8384
S-6TT    0.2290   0.0580   0.1057   0.0397   0.1307   2.0275
S-0TT    0.3840   0.8038   0.2865   0.2884   0.4914   0.5214
S-1TT    0.2952   0.3043   0.2770   0.2513   0.2922   0.8769

TABLE III
EVALUATION OF THE PATHS LEARNED WITH LDT(1TT) AND NARMAX(6FFA)

SF             ε(r1)    ε(r2)    ε(r3)    ε̄        χ̄
LDT(1TT)       0.2816   0.2752   0.2052   0.2607   0.8715
NARMAX(6FFA)   0.4048   0.4324   0.2837   0.3736   0.6080

Briefly, LDT(S-6TT) is the best solution for the S-path learning, whereas LDT(S-4TT) is not the best for the U-path learning. 2) Evaluation of the LDT(1TT) and the NARMAX(6FFA): There are six training paths, including the expected path (the thick line) shown in Fig. 13(b). The average error of all the training paths is 0.2272, comparing with the expected path. The integrated error of each learned path for the LDT model and the NARMAX model can be calculated with (15). Table III lists the errors of all learned paths, the average errors and the robustness scores for the LDT model and the NARMAX model, respectively. From Table III, it can be clearly seen that the integrated errors of all paths learned with LDT(1TT) are smaller than that learned with NARMAX(6FFA), and LDT(1TT) achieves a higher robustness score than NARMAX(6FFA). VII. C ONCLUSION In this paper, we proposed and implemented a new application of an LDT in robotic control. We modeled the robot’s route learning problem as a classification problem with an LDT, which made the procedure of a robot’s learning transparent. We examined the real-time performance of LDT training and mobile robot control, and explore the possibility of training a machine learning model in an adaptive system. The effect of different sizes of training data on the performance of the LDT was investigated. The experimental results showed that one run of path data was not enough for training an LDT, and at least two runs were needed. To make online training possible, namely to seamlessly embed a training algorithm in the robot control system, it was necessary to select a suitable size of training data and refine the feed attributes of the model. We proposed an evaluation approach that enabled us to quantify the evaluation of the robot’s learning. A robustness

A robustness score was defined with respect to the quality of the training data, and the model with the higher score was considered more robust. This approach can also be used to evaluate the performance of the robot's learning on a subtask.

In the online experiments, we examined the performance of LDTs with different feed attributes. The experimental results showed that different tasks require LDTs with different feed attributes to obtain good performance, and that accuracy of the turning action may be more important, as it determines the direction of the robot's movement. Task-oriented fuzzy intervals may be useful for improving turning accuracy. Compared with the NARMAX model as a robot controller working on a specified path, the LDT model achieved much better performance, robustness, and reliability.

Future work will address task-adaptive learning, especially for more complicated learning tasks, which may need to be decomposed hierarchically. Other machine learning approaches will also be investigated. Although appropriate sample features can improve the performance of a model, automatic online adaptive training is necessary for improving the performance of the system as a whole.


Hongmei He (M’05) received the B.Eng. degree in computer engineering from Anhui University of Technology and Science, Hefei, China, in 1997, and the M.Sc. degree in multimedia and internet computing and the Ph.D. degree in computer science from Loughborough University, Loughborough, U.K., in 2003 and 2006, respectively. She is currently a Research Fellow with the School of Engineering and Digital Arts, University of Kent, Kent, U.K. Previously, she was a Research Associate with the University of Ulster, Derry, Northern Ireland, and the University of Bristol, Bristol, U.K. Her current research interests include computational intelligence for a wide range of applications, such as cognitive robotics, network-based data mining, data fusion, optimization, and autonomous wireless sensor networks. Dr. He has been a representative of the IEEE Region 8 Ireland Section since 2009 and has been an Editorial Board Member of Advances in Computing since 2011. She was a recipient of the Best Paper Award at the IAENG International Conference on Artificial Intelligence and Applications, Hong Kong, in 2009, and the Certificate of Merit for the Best Student Paper Award at the International Multi-Conference of Engineers and Computer Scientists, Hong Kong, in 2006.

Thomas Martin McGinnity (S’09–M’83) received the degree (First Class Hons.) in physics and the Ph.D. degree from the University of Durham, Durham, U.K., in 1975 and 1979, respectively. He is a Professor of intelligent systems engineering with the Faculty of Computing and Engineering, University of Ulster, Derry, Northern Ireland. He is currently the Director of the Intelligent Systems Research Centre, which encompasses the research activities of approximately 100 researchers. Formerly, he was an Associate Dean of the Faculty and Director of the University's technology transfer company, Innovation Ulster, and of a spin-out company, Flex Language Services. He has authored or coauthored more than 275 research papers. His current research interests are focused on computational intelligence, and in particular on computational systems that explore and model biological signal processing, specifically in relation to cognitive robotics and computational neuroscience. Prof. McGinnity was a recipient of the Senior Distinguished Research Fellowship and a Distinguished Learning Support Fellowship in recognition of his contribution to teaching and research. He is a Fellow of the IET and a Chartered Engineer.

Sonya Coleman received the Ph.D. degree from the University of Ulster, Derry, Northern Ireland, in May 2003. She is currently a Senior Lecturer with the Faculty of Computing and Engineering, University of Ulster. She has authored more than 80 publications in her research area. Her current research interests include robotics, machine vision, digital image processing, and pattern recognition. Dr. Coleman is a member of the Irish Pattern Recognition and Classification Society and the London Mathematical Society.

Bryan Gardiner (M’08) received the B.Eng. (Hons.) degree in electronics and computer systems and the Ph.D. degree from the University of Ulster, Derry, Northern Ireland, in 2006 and 2010, respectively. He is currently a Lecturer with the School of Computing and Intelligent Systems, University of Ulster. His current research interests include mobile robotics, digital image processing, computer vision, and pattern recognition. Dr. Gardiner is a member of the International Association for Pattern Recognition and the Irish Pattern Recognition and Classification Society.
