sensors Article

Generic Dynamic Environment Perception Using Smart Mobile Devices

Radu Danescu *, Razvan Itu and Andra Petrovai

Computer Science Department, Technical University of Cluj Napoca, 28 Memorandumului Street, Cluj Napoca 400114, Romania; [email protected] (R.I.); [email protected] (A.P.)
* Correspondence: [email protected]; Tel.: +40-740-502-223
Academic Editor: Vittorio M. N. Passaro
Received: 28 July 2016; Accepted: 12 October 2016; Published: 17 October 2016

Abstract: The driving environment is complex and dynamic, and the attention of the driver is continuously challenged; therefore, computer-based assistance achieved by processing image and sensor data may increase traffic safety. While active sensors and stereovision have the advantage of obtaining 3D data directly, monocular vision is easy to set up and can benefit from the increasing computational power of smart mobile devices, and from the fact that almost all of them come with an embedded camera. Several driving assistance applications are available for mobile devices, but they are mostly targeted at simple scenarios and a limited range of obstacle shapes and poses. This paper presents a technique for generic, shape-independent, real-time obstacle detection on mobile devices, based on a dynamic, free-form 3D representation of the environment: the particle-based occupancy grid. Images acquired in real time from the smart mobile device's camera are processed by removing the perspective effect and segmenting the resulting bird's-eye view image to identify candidate obstacle areas, which are then used to update the occupancy grid. The tracked occupancy grid cells are grouped into obstacles depicted as cuboids having position, size, orientation and speed. The easy-to-set-up system is able to reliably detect most obstacles in urban traffic, and its measurement accuracy is comparable to that of a stereovision system.

Keywords: obstacle detection; occupancy grid; monocular vision; advanced driving assistance system; mobile device

1. Introduction

Driving is a process of continuous sensing, processing the sensory information, making decisions and putting these decisions into action. Some parts of this loop can be done by humans, and some by machines. As all decisions related to driving are based on sensory information, sensing is perhaps the most important part of the driving process. Humans are well equipped for sensing, but they are not perfect. This is why, for decades, researchers have tried to devise better sensors, and better and faster algorithms, to help perceive the environment, help the driver and increase traffic safety. While most of the research and development effort has been dedicated to adding intelligent sensing capabilities to the vehicles themselves, especially to the higher end ones, driving assistance applications on mobile devices are a less costly alternative. Facilitated by the rapid evolution of the processing power of smart mobile devices, and by the increasingly complex and accurate sensors embedded in these devices, driving assistance based on image and signal processing on mobile devices may quickly become a reliable tool for traffic safety. This paper presents a method for perceiving the obstacles in the dynamic driving environment based on on-board, real-time processing of the images from the camera of a smart mobile device (smartphone or tablet) that can be fitted behind the windshield of a car. Besides the image sequence stream, the algorithms use other sensors, such as the GPS receiver, the gyroscope, and the accelerometer, to know the motion of the user's car and to delimit this motion from the proper motion of the obstacles.

The method proposed in this paper relies on segmenting obstacle areas from an image, after the perspective effect is removed, followed by using these areas to update a probabilistic occupancy grid, which is able to filter out some detection errors and to extract dynamic properties for each grid cell. The estimated occupied grid cells, which also have speed and orientation, are grouped into obstacles. In this way, no assumption about the nature or shape of the obstacle is made, and the detected obstacles produced by the system have position, size, speed and orientation assigned to them.

2. Related Work

The set of driving assistance applications available includes iOnRoad, Drivea, and Movon FCW. iOnRoad [1] is one of the first augmented driving solutions available for iOS and Android, using the smartphone's camera and GPS receiver to provide the time to collision to the detected vehicle in front of the ego-vehicle, and for lane departure warning. A similar application is Drivea [2], which provides obstacle collision warning, lane departure warning, and warning about speeding. A newer application, Movon FCW [3], is one of the few applications that can detect incoming cars, or cars that are not in full view, and can provide actual distance information.

In the research community, most of the recent obstacle detection efforts have been focused on sensors or sensory solutions that are able to deliver 3D data directly: radar, laser, or stereovision [4]. While some mobile devices are equipped with stereo cameras, which can be used for short distance accurate detection of obstacles [5], most devices are only capable of monocular vision. Monocular obstacle detection faces multiple challenges, the most important being that there is no direct 3D information, and no direct, geometrical way of separating the obstacles from the road features, or the closer obstacles from the background. For the identification of the obstacle regions in the image, solutions include region segmentation [6,7], motion based analysis [8,9], or appearance analysis, either through heuristics such as symmetry [10,11], or through the use of machine learning [12,13]. For extracting the 3D position of the obstacle, monocular vision uses constraints imposed on the structure of the environment, such as the condition that the road is flat, which leads to the measurement of the points on the road by Inverse Perspective Mapping (IPM) [14,15].

The lack of 3D information considerably affects the robustness of an obstacle detection algorithm, and therefore most monocular applications make assumptions about the obstacles. Generic obstacle detection, independent of type or size, is much more difficult to achieve. While stereovision, or other ranging solutions, can also employ assumptions about the obstacle's nature, they also allow generic environment perception, such as in the form of dynamic occupancy grids [16], especially if the stereovision information is dense. The solution proposed in this paper brings this power of generic environment perception to the monocular sensor as well, and it does so by using the dynamic particle-based occupancy grid as the common intermediate representation, a middle environment layer between the image data and the discrete obstacle layer.
The use of a probabilistic, dynamic middle layer allows inference about the presence of obstacle parts in the scene and about their motion, and also allows modeling the uncertainties of the sensing methodology. In order to achieve generic environment perception, a generic scene segmentation algorithm has to be employed. This paper presents such a solution, which is a refinement of a preliminary result of ours, based on IPM images, described in [17].

3. Solution Description

3.1. Overview

The process starts with the acquisition of an image from the mobile device's camera. The image is then transformed into a bird's eye view of the road by Inverse Perspective Mapping, using the camera calibration parameters. The IPM image is segmented along rays starting from the camera position, and a hypothetical obstacle map is generated.


The obstacle map is used to update the particle-based occupancy grid. In the update process, the vehicle motion information (speed and yaw rate) is obtained from the available sensors of the mobile device (GPS for speed, gyroscope or accelerometer for yaw rate). The occupancy grid attaches to each cell of the top view world an occupancy probability and a speed probability density. The estimated grid cell occupancy probabilities, and the speed vectors, are used in the process of obstacle extraction. The process is depicted in Figure 1.

Figure 1. Overview of the obstacle detection process.

3.2. Removing the Perspective Effect

Assuming that the road is flat, the perspective image can be remapped in such a way that the features that belong to the road surface are displayed as they are seen from above, and their lateral and longitudinal coordinates in the real world are proportional to the pixel coordinates (Figure 2).

In the IPM image, the obstacles are deformed, since they do not belong to the road surface and therefore their features do not obey the road geometry. For this reason, only the point of contact between the obstacle and the road is relevant for the next steps of the obstacle detection algorithm, as its position can be directly related to the position of the obstacle in the real world. The camera is assumed to be calibrated (the calibration methodology is described in Section 4), therefore the correspondence between any point in the 3D world and its corresponding projected pixel in the image plane is known. The 3D coordinate system is assumed to have the origin on the ground, below the camera, the x axis pointing to the driver's right, the z axis indicating the direction of travel, and the y axis the height above the road (xOz) plane.

The removal of the perspective effect is performed using Algorithm 1:

Algorithm 1: IPM Transformation
Input: Source image I
Output: IPM image IT
For each pixel of coordinates (uT, vT) of IT
    x = k·uT + x0
    z = j·vT + z0
    y = 0
    (u, v) = Projection(x, y, z)
    IT(uT, vT) = I(u, v)
End For


The constants k, j, x0 and z0 are chosen in such a way that the most relevant portion of the road plane is displayed in the remapped image. The projection function uses the camera intrinsic and extrinsic parameters, which compose the projection matrix, for mapping a 3D point (expressed in the 3D world coordinates x, y and z) to the image plane.
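As a concrete illustration of Algorithm 1 and of the projection described above, the following C++ sketch implements the remapping loop on plain grayscale buffers. The image container, the projection-matrix structure and all names are our own simplifications under the assumptions stated in the comments, not the actual implementation.

```cpp
#include <cstdint>
#include <vector>

// Minimal grayscale image container used only for this sketch.
struct Image {
    int width = 0, height = 0;
    std::vector<uint8_t> data;                       // row-major, one byte per pixel
    uint8_t at(int u, int v) const { return data[v * width + u]; }
    void set(int u, int v, uint8_t g) { data[v * width + u] = g; }
};

// 3x4 projection matrix P = A[R|T] (see Section 4.2), filled in by calibration.
struct ProjectionMatrix { double p[3][4]; };

// Projects a 3D point in car coordinates to pixel coordinates; returns false if
// the point is behind the camera or falls outside the source image.
bool projectToImage(const ProjectionMatrix& P, const Image& I,
                    double x, double y, double z, int& u, int& v)
{
    double uh = P.p[0][0] * x + P.p[0][1] * y + P.p[0][2] * z + P.p[0][3];
    double vh = P.p[1][0] * x + P.p[1][1] * y + P.p[1][2] * z + P.p[1][3];
    double wh = P.p[2][0] * x + P.p[2][1] * y + P.p[2][2] * z + P.p[2][3];
    if (wh <= 0.0) return false;
    u = static_cast<int>(uh / wh);
    v = static_cast<int>(vh / wh);
    return u >= 0 && u < I.width && v >= 0 && v < I.height;
}

// Algorithm 1: builds the IPM (bird's-eye view) image IT from the source image I.
// k, j, x0, z0 map IPM pixel coordinates to metric road-plane coordinates.
void inversePerspectiveMapping(const Image& I, Image& IT, const ProjectionMatrix& P,
                               double k, double j, double x0, double z0)
{
    for (int vT = 0; vT < IT.height; ++vT) {
        for (int uT = 0; uT < IT.width; ++uT) {
            double x = k * uT + x0;                  // lateral road-plane coordinate
            double z = j * vT + z0;                  // longitudinal road-plane coordinate
            double y = 0.0;                          // flat road assumption
            int u, v;
            if (projectToImage(P, I, x, y, z, u, v))
                IT.set(uT, vT, I.at(u, v));
        }
    }
}
```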

Figure 2. Removal of the perspective effect.

The completely flat road assumption is restrictive, and the world will often not comply with it, due to the road geometry and, most often, to the balancing of the observer vehicle. This error is modeled in the grid measurement model used for the grid state update.

3.3. Finding the Candidate Obstacle Regions

From the top view IPM image we have to identify the areas corresponding to the contact points between the obstacle and the road. Since we want our system to be as form-independent as possible, we will not use shape-based clues, such as symmetry.

In the IPM image, an upright obstacle has the tendency to expand radially from the camera. We use this property to scan the IPM image with rays for each angular degree of the field of view, with each ray converging on the camera position point (Figure 3).

Figure 3. Scanning the Inverse Perspective Mapping (IPM) image with rays starting from the camera location.

For each scan ray, the angle and the minimum and maximum distance from the camera are known. If an obstacle is present on such a ray, its position along the ray must be found. As the obstacles do not comply with the flat road assumption of the IPM transformation, they will be deformed such that they will most likely stretch for the whole remainder of the ray. Thus, the only relevant information about the obstacle is its starting distance along the ray, the point where the obstacle touches the ground.

In order to find the obstacle distance along the scan ray, the grayscale values of the IPM image corresponding to the ray are recorded in a vector gα(d), α denoting the angle of the ray, and d being the distance from the camera. For each candidate obstacle distance d, the following values are computed:

\mu_{\alpha,P}(d) = \frac{1}{d - 3 - d_{min} + 1} \sum_{k=d_{min}}^{d-3} g_\alpha(k) \quad (1)

\mu_{\alpha,D}(d) = \frac{1}{d_{max_\alpha} - d - 3 + 1} \sum_{k=d+3}^{d_{max_\alpha}} g_\alpha(k) \quad (2)

\mu_{\alpha,M}(d) = \frac{1}{7} \sum_{k=d-3}^{d+3} g_\alpha(k) \quad (3)

These quantities are the mean intensity values for the proximal region (μα,P), middle region (μα,M) and distal region (μα,D) of a ray of angle α with respect to the camera orientation, and the hypothetical obstacle distance d. Based on these mean values, we have to decide if a distance d may in fact be a distance to an obstacle touching the road. First, a binary function of d is designed, indicating whether the intensity profile of the ray of angle α supports the hypothesis of an obstacle being present at the distance d:

\omega_\alpha(d) = \begin{cases} 1, & \text{if } (\mu_{\alpha,P}(d) - \mu_{\alpha,M}(d) > \sigma) \text{ and } ((\mu_{\alpha,D}(d) - \mu_{\alpha,M}(d) > \sigma) \text{ or } (\mu_{\alpha,P}(d) - \mu_{\alpha,D}(d) > \sigma)) \\ 0, & \text{otherwise} \end{cases} \quad (4)

In Equation (4), σ is the graylevel standard deviation of the IPM image, a measure of the image contrast. Intuitively, Equation (4) states that an obstacle point of contact with the road should be darker than the road leading to it, and also darker than the rest of the obstacle along the ray. The shadow area below a car passes this test. Darker areas that continue along the ray are also taken into consideration. Of course, this assumption is valid only during daytime and will not hold at night, so this solution is not applicable at night.

If multiple candidate distances for a ray have ωα(d) > 0, the best candidate has to be selected. A continuous score function is defined:

\varsigma_\alpha(d) = \left( |\mu_{\alpha,P}(d) - \mu_{\alpha,M}(d)| \cdot |\mu_{\alpha,D}(d) - \mu_{\alpha,M}(d)| \cdot |\mu_{\alpha,P}(d) - \mu_{\alpha,D}(d)| \right)^{1/3} \quad (5)

Out of all the values d that have ωα(d) > 0, the one having the maximum ςα(d) is kept as the obstacle position. The process is depicted in Figure 4: the first row shows the binary function ωα(d), and the second row depicts the continuous score function. The red line indicates the choice for the obstacle's position.

Figure 4. Finding the obstacle's position along a ray.

After the obstacles are located on each ray, a binary grid is constructed from these results (Figure 5).
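The per-ray search of Equations (1)–(5) can be summarized by the following C++ sketch, which assumes the gray levels of one ray have already been sampled into a vector indexed by the distance cell; the function names and the bookkeeping are ours, not the paper's code.

```cpp
#include <cmath>
#include <vector>

// Mean gray level of g over the closed interval [from, to].
static double meanOver(const std::vector<double>& g, int from, int to)
{
    double sum = 0.0;
    for (int k = from; k <= to; ++k) sum += g[k];
    return sum / (to - from + 1);
}

// Finds the obstacle contact point along one ray, following Equations (1)-(5).
// g holds the IPM gray levels sampled along the ray, indexed by the distance
// cell, with valid indices covering [dMin, dMax]; sigma is the gray-level
// standard deviation of the whole IPM image. Returns -1 if no candidate
// distance passes the binary test of Equation (4).
int findObstacleOnRay(const std::vector<double>& g, int dMin, int dMax, double sigma)
{
    int bestD = -1;
    double bestScore = 0.0;
    for (int d = dMin + 3; d <= dMax - 3; ++d) {
        double muP = meanOver(g, dMin, d - 3);       // proximal region, Eq. (1)
        double muD = meanOver(g, d + 3, dMax);       // distal region,   Eq. (2)
        double muM = meanOver(g, d - 3, d + 3);      // middle region,   Eq. (3)

        // Binary support function, Eq. (4): the contact point must be darker
        // than the road leading to it and than the rest of the obstacle.
        bool supported = (muP - muM > sigma) &&
                         ((muD - muM > sigma) || (muP - muD > sigma));
        if (!supported) continue;

        // Continuous score, Eq. (5): geometric mean of the absolute contrasts.
        double score = std::cbrt(std::fabs(muP - muM) *
                                 std::fabs(muD - muM) *
                                 std::fabs(muP - muD));
        if (score > bestScore) { bestScore = score; bestD = d; }
    }
    return bestD;   // distance cell of the obstacle contact point, or -1
}
```

When the returned distance is not −1, the corresponding cell of the binary measurement grid of Figure 5 would be marked as occupied.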


Figure 5. The measurement binary obstacle grid.

3.4. The Particle-Based Occupancy Grid

The particle based occupancy grid is a freeform dynamic world representation, which tracks a bird's eye grid view of the world, in terms of occupancy (presence of an obstacle) and speed vector probability density for each grid cell.

The dynamic occupancy grid depicts the obstacles as composed of particles having individual position and speed. Each discrete grid cell will thus contain a certain number of particles, each particle having its own speed vector. The occupancy probability of each grid cell is described by the number of particles in that cell (the more particles, the more likely the cell is occupied by an obstacle). The particles in this algorithm can be regarded in two ways: as hypotheses, such as in the case of particle filtering algorithms such as CONDENSATION [18], but also as actual building blocks of the environment, which can move from one cell to another in order to model the dynamic evolution of the environment (and, by doing this, achieve the prediction of the environment's state).

The world is represented by a 2D grid, mapping the bird's eye view 3D space into discrete 20 cm × 20 cm cells. The aim of the grid tracking algorithm is to estimate the occupancy probability of each grid cell, and the speed vector components on each axis (longitudinal and lateral).

Considering a coordinate system where the z axis points towards the direction of travel of the ego-vehicle, and the x axis points to the right, the obstacles in the world model are represented by a set of particles:

S = \{ p_i \mid p_i = (c_i, r_i, vc_i, vr_i, a_i),\; i = 1 \ldots N_S \} \quad (6)

Each particle i has a position in the grid, described by the row r_i (a discrete value of the distance in the 3D world, z) and the column c_i (a discrete value of the lateral position, x), and a speed, described by the speed components vc_i and vr_i. An additional parameter, a_i, describes the age of the particle since its creation. The purpose of this parameter is to facilitate the validation and the speed estimation process, as only particles that survive in the field for several frames are taken into consideration.

The total number of particles in the scene, N_S, depends on the scene load (the number of obstacles in the field), and is not known a priori, but updated each time a new obstacle cell is discovered by measurement. Having the population of particles in place, the occupancy probability of a cell C is estimated as the ratio between the number of particles whose position coincides with the position of the cell C and the total number of particles allowed for a single cell, N_C:

P_O(C) = \frac{|\{ p_i \in S \mid r_i = r_c, c_i = c_c \}|}{N_C} \quad (7)

The number of allowed particles per cell, N_C, is a constant of the system. In setting its value, a trade-off between accuracy and time performance should be considered. A large number means that in a single cell multiple speed hypotheses can be maintained, and therefore the tracker can have a better speed estimation, and can handle fast moving objects better. However, the total number of particles in the scene will be directly proportional to N_C, and therefore the speed of the algorithm will decrease. The current value for N_C is 50.

The speed of a grid cell can be estimated as the average speed of its associated particles, if we assume that only one obstacle is present in that cell:

(vc_C, vr_C) = \frac{\sum_{p_i \in S,\, x_i = x_c,\, z_i = z_c} (vc_i, vr_i)}{|\{ p_i \in S \mid r_i = r_c, c_i = c_c \}|} \quad (8)

Thus, the population of particles is sufficiently representative for the probability density of occupancy and speed for the whole grid. Multiple speed hypotheses can be maintained simultaneously for a single cell, and the occupancy uncertainty is represented by the varying number of particles associated to the cell. Based on the world model, the environment tracking algorithm becomes the continuous creation, moving and destroying of the particles, a process driven by the measurement information, which is the binary grid described in the previous section.

The first step of the algorithm is the prediction, which is applied to each particle in the set. The positions of the particles are altered (see Figure 6) according to their speed (or we can say that the particles move according to their own speed vector), and to the motion parameters of the ego vehicle (speed and yaw rate), read from the mobile device's sensors, GPS and accelerometer/gyroscope. In addition, a random amount is added to the position and speed of each particle, for the effect of stochastic diffusion, modelling the uncertainty of the motion model with respect to the real-world movement of the obstacles. The second step is the processing of the measurement information. This step is based on the raw occupancy cells provided by processing the IPM image, and provides the measurement model for each cell.
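As an illustration of the world model of Equations (6)–(8), the following C++ sketch defines a particle and computes the per-cell occupancy probability and average speed. The flat particle list scanned per query is a simplification chosen for readability, not the data structure of the actual implementation.

```cpp
#include <vector>

// One particle of the set S in Equation (6): grid position (row, column),
// speed components on the two grid axes, and age in frames.
struct Particle {
    int row = 0, col = 0;       // r_i, c_i : discrete distance and lateral position
    float vr = 0.f, vc = 0.f;   // vr_i, vc_i : speed components (cells per frame)
    int age = 0;                // a_i : frames survived since creation
};

constexpr int kMaxParticlesPerCell = 50;   // N_C, the per-cell particle budget

// Occupancy probability of cell (row, col), Equation (7):
// particles in the cell divided by the per-cell budget N_C.
float cellOccupancy(const std::vector<Particle>& S, int row, int col)
{
    int count = 0;
    for (const Particle& p : S)
        if (p.row == row && p.col == col) ++count;
    return static_cast<float>(count) / kMaxParticlesPerCell;
}

// Average speed vector of cell (row, col), Equation (8), assuming a single
// obstacle in the cell. Returns (0, 0) if the cell holds no particles.
void cellSpeed(const std::vector<Particle>& S, int row, int col,
               float& vrOut, float& vcOut)
{
    float sumVr = 0.f, sumVc = 0.f;
    int count = 0;
    for (const Particle& p : S)
        if (p.row == row && p.col == col) { sumVr += p.vr; sumVc += p.vc; ++count; }
    vrOut = count ? sumVr / count : 0.f;
    vcOut = count ? sumVc / count : 0.f;
}
```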
This step respect of to the the real-world real-world of the the obstacles. The second secondofstep step is the the processing processing of measurement measurement respect to the movement of obstacles. The is of is based on the This raw step occupancy cells by processing theby IPM image,the and provides information. This step is based based on on theprovided raw occupancy occupancy cells provided provided by processing the IPM image, the information. is the raw cells processing IPM image, measurement model for each cell. and provides provides the the measurement measurement model model for for each each cell. cell. and

Figure 6. Migration of particles from one cell to another, as prediction is applied [16]. © 2011 IEEE. Reprinted, with permission, from IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 4.

The measurement model information is used to weight the particles, and resample them in the same step (see Figure 7). By weighting and resampling, the particles in a cell can be multiplied or reduced. The final step is to estimate the occupancy and speeds for each cell. More details about the grid tracking process are provided in our previously published work [16].

Figure 7. Weighting and resampling: The weight of the occupied hypothesis is encoded in the darkness of the cell of the left grid. In the right grid, the effect of resampling is shown, as particles are multiplied or deleted [16]. © 2011 IEEE. Reprinted, with permission, from IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 4.
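The weighting and resampling step of Figure 7 is described in detail in [16]; the following C++ fragment is only a plausible simplified sketch of how a cell's predicted particles could be multiplied or decimated according to its measurement weight, under our own assumption of sampling with replacement toward a target count. It is not the exact scheme of the published tracker.

```cpp
#include <random>
#include <vector>

// A plausible per-cell resampling step, in the spirit of Figure 7: the number of
// particles kept in a cell is driven toward maxPerCell * weight, where weight is
// the measurement probability of the cell. Particles are duplicated or dropped
// at random; this is our own simplified illustration, not the scheme of [16].
// cellParticles holds the indices (into the global particle set) of the
// particles currently predicted into this cell.
std::vector<int> resampleCell(const std::vector<int>& cellParticles,
                              double weight, int maxPerCell, std::mt19937& rng)
{
    int target = static_cast<int>(weight * maxPerCell + 0.5);
    std::vector<int> out;
    if (cellParticles.empty() || target <= 0)
        return out;                       // cell decimated: no particles survive

    std::uniform_int_distribution<size_t> pick(0, cellParticles.size() - 1);
    out.reserve(target);
    for (int i = 0; i < target; ++i)
        out.push_back(cellParticles[pick(rng)]);  // sampling with replacement:
                                                  // duplicates = multiplication,
                                                  // omissions = deletion
    return out;
}
```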

3.5. Updating the Dynamic Occupancy Grid from the IPM-Based Measurement

The update process uses the measured binary grid, and the uncertainty associated to the creation of this grid, in the process of adjusting the particle population. In our previously published work, the binary grid was created from a digital map generated by processing dense stereovision data. The position uncertainty of an obstacle cell is derived from the uncertainty of the stereovision process. The standard deviation of a distance z estimated by a stereovision system of baseline b and focal length f, having a matching uncertainty of σd pixels, is:

\sigma_z = \frac{z^2 \sigma_d}{bf} + \sigma_{z0} \quad (9)

The standard deviation of the other grid coordinate, the lateral position x, is:

\sigma_x = \frac{x}{z} \sigma_z + \sigma_{x0} \quad (10)

The lateral position uncertainty depends on the distance uncertainty, which is the most important uncertainty of the measurement process. If we want to adapt the occupancy grid tracking mechanism to use binary measurement grids generated by processing IPM images, the computation of the distance uncertainty has to be adapted to the IPM method of distance computation.

The process of estimating the distance of an obstacle from the camera when using IPM images is described in Figure 8. The height of the camera is known, along with its orientation. A pixel in the image corresponds to a line from the camera, which makes with the vertical the angle θ. This angle depends on several factors: the angle between the camera's optical axis and the vertical, the position of the pixel in the image plane, the focal length of the camera, and the position of the principal point. We are not concerned with these details. For all practical purposes, we can say that we know θ, and thus, from the red triangle described in Figure 8, we can compute z.

Figure 8. The principle of IPM distance measurement.

The IPM image is not constructed by using angles and heights, but by using the projection matrix, as described in Section 3.2. However, the simplified model is useful in understanding the mechanism of uncertainty. As the height of the camera above the ground can be considered, to a reasonable degree, to be fixed, the uncertainty affects mostly the angle θ. This uncertainty comes from the pitching of the observing vehicle, as the car encounters uneven surfaces, accelerates or brakes. Small and temporary deviations of the real road surface from the flat road assumption can also be modeled as errors in the θ angle, while long term, significant deviations cannot be modeled and will lead, eventually, to wrong estimations. An experimentally tuned value of σθ = 0.25° was chosen to account for these situations.

The relation between z and θ is:

z = h \tan\theta \quad (11)

The relation between the uncertainty of z and the uncertainty of θ is:

\sigma_z = \frac{dz}{d\theta} \sigma_\theta \quad (12)


This can be written as:

\sigma_z = h (1 + \tan^2\theta)\, \sigma_\theta \quad (13)

Using Equation (11), we can replace the tangent, and obtain the final equation for the uncertainty of z:

\sigma_z = h \left( 1 + \frac{z^2}{h^2} \right) \sigma_\theta + \sigma_{z0} \quad (14)

The quantity σz0 is a constant covering other sources of uncertainty that cannot be modeled analytically. For each cell in the grid, the difference dz in grid row coordinates and the difference dx in grid column coordinates are computed between the position of the grid cell and the position of the nearest obstacle cell in the binary obstacle map segmented from the IPM image. Using these distances and the uncertainty values computed from Equations (14) and (10), we can compute the measurement probability density under the assumption of the cell being occupied, by employing the bivariate Gaussian function:

p_{occupied} = \frac{1}{2\pi\sigma_z\sigma_x} e^{-\frac{1}{2}\left(\left(\frac{dz}{\sigma_z}\right)^2 + \left(\frac{dx}{\sigma_x}\right)^2\right)} \quad (15)
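The following C++ sketch gathers Equations (10), (14) and (15) into one measurement model. It is a minimal illustration assuming metric units for the offsets dz and dx; the camera height and the residual terms σz0 and σx0 are placeholder values, since they are not specified in the text.

```cpp
#include <cmath>

// Measurement uncertainty and occupancy likelihood for one grid cell, gathering
// Equations (10), (14) and (15). Units are meters and radians. sigmaTheta follows
// the value given in the text; h, sigmaZ0 and sigmaX0 are placeholder values.
struct MeasurementModel {
    double h = 1.3;                                        // camera height above the road [m] (example value)
    double sigmaTheta = 0.25 * 3.14159265358979 / 180.0;   // ray angle uncertainty [rad]
    double sigmaZ0 = 0.1;                                  // assumed residual distance uncertainty [m]
    double sigmaX0 = 0.1;                                  // assumed residual lateral uncertainty [m]

    // Equation (14): distance uncertainty grows roughly quadratically with distance.
    double sigmaZ(double z) const {
        return h * (1.0 + (z * z) / (h * h)) * sigmaTheta + sigmaZ0;
    }
    // Equation (10): lateral uncertainty follows the distance uncertainty.
    double sigmaX(double x, double z) const {
        return (std::fabs(x) / z) * sigmaZ(z) + sigmaX0;
    }
    // Equation (15): likelihood of the cell being occupied, given the offsets
    // dz, dx (same units as the sigmas) to the nearest measured obstacle cell.
    double pOccupied(double dz, double dx, double x, double z) const {
        const double pi = 3.14159265358979323846;
        double sz = sigmaZ(z), sx = sigmaX(x, z);
        double e = 0.5 * ((dz / sz) * (dz / sz) + (dx / sx) * (dx / sx));
        return std::exp(-e) / (2.0 * pi * sz * sx);
    }
};
```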

Based on the measurement probability value, the predicted particles in a cell may be multiplied or decimated. The particle creation, movement, multiplication and destruction mechanisms of the dynamic occupancy grid [16] can now be used with the IPM binary obstacle grid as measurement data.

3.6. Detection of Individual Obstacles

The dynamic grid's cells that hold a significant number of particles are considered occupied, and therefore parts of obstacles. A grouping algorithm, which takes into account the proximity of occupied cells, but also the agreement of their speed vectors, extracts connected regions from the grid, regions that describe individual objects. An oriented rectangle shape is matched to these connected regions, and the 3D object is extracted as a cuboid (obviously, the height of the object above the road cannot be computed, and a fixed value is set).

Due to the nature of the occupancy grid, its cells will have a speed vector, computed as the resultant of the individual speed vectors of the particles. After the grid cells are grouped into individual obstacles, a resultant speed vector for the cell group is computed, and this vector is assigned to the detected object. The resulting objects have length, width, orientation and speed, and can therefore be classified into static and dynamic. The process is shown in Figure 9, where the stationary vehicles are shown in green, and the moving vehicle is shown in red.
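The grouping step described above can be illustrated with the following C++ sketch, which extracts connected regions of occupied cells using a 4-connected flood fill with a speed-agreement gate. The thresholds and the axis-aligned bounding box are our own simplifications; the published method fits an oriented rectangle to each region, which is not shown here.

```cpp
#include <algorithm>
#include <cmath>
#include <queue>
#include <utility>
#include <vector>

// One estimated grid cell: occupancy probability and average speed components.
struct Cell { float occ = 0.f; float vr = 0.f, vc = 0.f; };

// A detected obstacle: bounding box in grid cells plus a resultant speed vector.
struct Obstacle { int minRow, maxRow, minCol, maxCol; float vr, vc; };

// Groups occupied cells into obstacles with a 4-connected flood fill, requiring
// neighbouring cells to agree in speed. occThr and speedThr are illustrative
// thresholds; the oriented-rectangle fitting of the published method is omitted.
std::vector<Obstacle> extractObstacles(const std::vector<std::vector<Cell>>& grid,
                                       float occThr, float speedThr)
{
    const int rows = static_cast<int>(grid.size());
    const int cols = rows > 0 ? static_cast<int>(grid[0].size()) : 0;
    std::vector<std::vector<bool>> visited(rows, std::vector<bool>(cols, false));
    std::vector<Obstacle> result;

    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (visited[r][c] || grid[r][c].occ < occThr) continue;
            Obstacle ob{r, r, c, c, 0.f, 0.f};
            int count = 0;
            std::queue<std::pair<int, int>> q;
            q.push({r, c});
            visited[r][c] = true;
            while (!q.empty()) {
                auto [cr, cc] = q.front();
                q.pop();
                const Cell& cell = grid[cr][cc];
                ob.minRow = std::min(ob.minRow, cr); ob.maxRow = std::max(ob.maxRow, cr);
                ob.minCol = std::min(ob.minCol, cc); ob.maxCol = std::max(ob.maxCol, cc);
                ob.vr += cell.vr; ob.vc += cell.vc; ++count;
                const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
                for (int k = 0; k < 4; ++k) {
                    int nr = cr + dr[k], nc = cc + dc[k];
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (visited[nr][nc] || grid[nr][nc].occ < occThr) continue;
                    // Speed agreement gate between neighbouring cells.
                    if (std::hypot(grid[nr][nc].vr - cell.vr,
                                   grid[nr][nc].vc - cell.vc) > speedThr) continue;
                    visited[nr][nc] = true;
                    q.push({nr, nc});
                }
            }
            // Resultant speed of the group is assigned to the detected object.
            ob.vr /= static_cast<float>(count);
            ob.vc /= static_cast<float>(count);
            result.push_back(ob);
        }
    return result;
}
```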


Figure 9. Obstacle detection results: (Left) the dynamic occupancy grid; and (Right) the extracted cuboids.

4. Implementation for Android Mobile Devices

4.1. Software Architecture of the Solution

The solution was implemented as an Android application, using Java for the user interface, and native C++ for the algorithm processing workload (Figure 10).

The calibration process is simplified by the automatic retrieval of the camera's horizontal and vertical field of view angles using the Android API. The fixed image resolution of 640 × 480 pixels, combined with the known field of view, gives us the focal distance in pixels. The position of the principal point is assumed to be in the center of the image. The user is asked to tune the pitch angle and the camera height, while the IPM image is shown with a distance grid superimposed, so that the user can adjust these parameters until the top view scene is in accordance with the grid.

The grid tracking algorithm requires the motion parameters of the host vehicle: the speed and the yaw rate. The speed is provided by the mobile device's GPS receiver. For the yaw rate, the on-board gyroscope is used when available; otherwise, the lateral acceleration combined with the speed is used to infer the yaw rate.

Figure 10. Architecture of the solution.
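The yaw rate fallback mentioned above (lateral acceleration combined with speed) amounts to the circular-motion relation a_lat = v·ω. The C++ sketch below illustrates this fallback; the function name, the speed guard and the sensor-selection boundary are our own assumptions, not the application's actual code.

```cpp
#include <cmath>

// Yaw rate of the host vehicle: taken from the gyroscope when available,
// otherwise inferred from the lateral acceleration and the GPS speed using the
// circular-motion relation a_lat = v * omega. Minimal sketch; thresholds and
// sensor plumbing are placeholders.
double estimateYawRate(bool hasGyro, double gyroYawRate,
                       double lateralAccel, double speed)
{
    if (hasGyro)
        return gyroYawRate;                     // [rad/s] directly from the sensor
    const double minSpeed = 1.0;                // [m/s] avoid dividing by ~0
    if (std::fabs(speed) < minSpeed)
        return 0.0;                             // standing still: assume no rotation
    return lateralAccel / speed;                // omega = a_lat / v  [rad/s]
}
```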

4.2. Calibration


4.2.1. Calibration of the Intrinsic Camera Parameters

The intrinsic parameters of the camera are the focal length (usually expressed in pixels), the position of the principal point (the intersection between the optical axis and the image plane), and the distortion coefficients. All these parameters can be computed using a calibration toolbox [19]. However, using a calibration toolbox requires a certain expertise and a significant amount of effort from the part of the user, and makes the system hard to set up and use.


An alternative is to make several simplifying assumptions, and use the information provided by the Android API. We can assume that the principal point is in the center of the image (modern camera technology ensures that the principal point is close to the image center, and a precise estimation of this point is not required for monocular applications), and ignore the distortion coefficients of the lens. The only intrinsic parameter that still needs to be calibrated is the focal distance.

Using the available Android API, developers can access the vertical and the horizontal view angles by calling the getVerticalViewAngle() or the getHorizontalViewAngle() methods of the Camera.Parameters object. Denoting the horizontal view angle by α, and the image width in pixels by w, the focal distance in pixels f can be computed as:

f = \frac{w}{2\tan(\alpha/2)} \quad (16)

The same equation can be applied for the vertical field of view and the image height h, and the resulting focal length should be the same, if the camera pixel is square. If the camera supports variable zoom, the focal length can be adjusted to take into account the zoom level. Using Equation (16), the intrinsic parameters of the camera can be computed automatically when the system is started. The intrinsic parameter matrix A is thus expressed by Equation (17), which assumes that the principal point of the camera model is in the image center:

A = \begin{pmatrix} f & 0 & w/2 \\ 0 & f & h/2 \\ 0 & 0 & 1 \end{pmatrix} \quad (17)

In order to test the automatic approach for focal distance calibration, we have performed classic calibration, using the OpenCV calibration toolbox [19], for several mobile devices. The results are shown in Table 1.

Table 1. Focal distance calibration comparison.

                                        Samsung Galaxy S5   Sony Xperia Z1   Motorola Moto G   HTC One Mini 2
f computed using Android API                   697                484               598               530
f computed using Calibration Toolbox           686                502               617               520

While not perfectly identical, the results are in good agreement, with less than 4% error. These values can be used in a monocular vision application.

4.2.2. Calibration of the Extrinsic Camera Parameters

The extrinsic parameters relate the 3D coordinate system of the camera to the 3D coordinate system of the vehicle. The detection is performed in the coordinate system of the host vehicle, which has the origin on the longitudinal symmetry plane of the car, whose intersection with the road plane is the distance axis, Z. The lateral coordinate, X, points to the right of the direction of travel, and the Y coordinate points towards the road. The position of the origin coincides with the closest point of the road in front of the vehicle, as seen in Figure 11.


Figure 11. The coordinate system of the vehicle.

The extrinsic parameters are represented by the rotation matrix R and the translation vector T. A full calibration of these parameters can be performed using a calibration toolbox, and a reference scene with objects at known positions in the car coordinate system. Unfortunately, this again means that the system cannot be set up easily. Instead, we have made several simplifying assumptions, and designed a simple interface for allowing the user to easily tune a simplified extrinsic parameter set.

The user has to set up the mobile device in the vehicle, aligning the camera with the longitudinal axis of the car as best as possible. This alignment is not critical, and can be easily achieved by the user. The calibration view of the application allows the user to input the height of the camera above the ground (H), and the distance from the camera to the front end of the car (L). These parameters can be easily measured. A very critical parameter is the pitch angle θ, which must also be input by the user.

To assist the user, the system performs live Inverse Perspective Mapping on the captured images, in real time. A grid is superimposed on the IPM result, and the horizon line, resulting from the pitch angle, is projected on the perspective image, as seen in Figure 12. This way, the user can adjust the parameters until the horizon line matches the true horizon of the observed scene, and if the observed scene has objects at known distances, these distances can be compared with the metric grid during parameter tuning. If road markings are visible, a good indication for the correctness of the pitch angle is that they must appear parallel in the IPM image.

Figure 12. The calibration user interface.

The parameters H, L and θ input by the user are used to generate the translation vector and the rotation matrix:

$$T = \begin{pmatrix} 0 & H & L \end{pmatrix}^{T} \qquad (18)$$


$$R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \qquad (19)$$

Now the intrinsic and the extrinsic parameters can be used to build the projection matrix, which will relate any 3D point in the car coordinate system to a point in the image. This matrix is used for computing the Inverse Perspective Transform image:

$$P = A\,[R \mid T] \qquad (20)$$
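To make the use of Equations (18)–(20) concrete, the following sketch builds the projection matrix from the user-supplied H, L and θ, and uses its ground-plane restriction as a homography for the Inverse Perspective Mapping step. The intrinsic parameters (f, cx, cy), the bird's-eye-view ranges and the grid resolution are illustrative placeholders, not the values used by the authors.

```python
import numpy as np
import cv2

def build_projection(f, cx, cy, H, L, theta):
    """Build P = A[R|T] from the user-supplied H, L and pitch angle theta (Eqs. 18-20)."""
    A = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    T = np.array([[0.0], [H], [L]])                  # Eq. (18)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[1.0, 0.0, 0.0],                   # Eq. (19): pitch rotation
                  [0.0, c, -s],
                  [0.0, s,  c]])
    return A @ np.hstack([R, T])                     # Eq. (20)

def ipm(image, P, x_range=(-8.0, 8.0), z_range=(0.0, 40.0), res=0.05):
    """Remove the perspective effect: warp the ground plane (height 0) to a metric bird's-eye view."""
    # Ground-plane homography: drop the column of P that multiplies the (zero) height coordinate.
    G = P[:, [0, 2, 3]]                              # maps (x, z, 1) -> image pixel (homogeneous)
    bev_w = int((x_range[1] - x_range[0]) / res)
    bev_h = int((z_range[1] - z_range[0]) / res)
    # Affine map from a bird's-eye-view pixel (u, v, 1) to ground coordinates (x, z, 1);
    # the far range is placed at the top row of the bird's-eye-view image.
    M = np.array([[res, 0.0, x_range[0]],
                  [0.0, -res, z_range[1]],
                  [0.0, 0.0, 1.0]])
    H_bev_to_img = G @ M
    # WARP_INVERSE_MAP: the matrix maps destination (bird's-eye-view) pixels back into the source image.
    return cv2.warpPerspective(image, H_bev_to_img, (bev_w, bev_h),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)

# Example with hypothetical parameters: 1.3 m camera height, 1.6 m to the front bumper, 2 degrees pitch.
# P = build_projection(f=700.0, cx=640.0, cy=360.0, H=1.3, L=1.6, theta=np.radians(2.0))
# bev = ipm(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE), P)
```

Since the ground plane has zero height, only three columns of P are needed, so the whole remapping reduces to a single 3 × 3 homography that can be applied once per frame with a standard perspective warp.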

The calibration user interface can easily be extended to include sliders for the other two rotation angles, yaw and roll. Each of these angles will create a rotation matrix, which can be multiplied with the rotation matrix of the pitch to form the total rotation. However, the user can position the mobile device to a reasonably accurate heading (yaw), and level it to a reasonable roll angle. Errors on the order of magnitude of degrees (±5°) in these angles will have a certain effect on the final results, causing the obstacles to become slightly misaligned, or having their lateral position slightly off, but errors of much smaller magnitude in the pitch angle will have significant effects on distance estimation. For this reason, and in order to keep the user interface as simple as possible, the compromise of having the user tune only the pitch angle was made.
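If such sliders were added, the extra angles could be folded into the extrinsic rotation as sketched below; the axis assignment and the multiplication order are assumptions chosen for illustration, not a specification of the authors' implementation.

```python
import numpy as np

def total_rotation(pitch, yaw=0.0, roll=0.0):
    """Compose the pitch rotation of Eq. (19) with optional yaw and roll rotations.

    Assumed axes: x lateral, y vertical, z longitudinal; with yaw = roll = 0
    this reduces to the pitch-only matrix used above."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    r_pitch = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    r_yaw = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    r_roll = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return r_yaw @ r_pitch @ r_roll
```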
5. Experimental Setup and Evaluation

The algorithm was tested on recorded sequences from mostly urban scenarios, under a wide range of lighting conditions. Obstacles of a wide range of shapes, speeds and orientations have been successfully detected, and their static and dynamic properties successfully measured. The IPM image, the binary obstacle grid, the estimated occupancy grid with color coding of speed and orientation for each cell (see Figure 13), and the projected cuboids on the original perspective image are shown for several real traffic scenarios in Figures 14–16. In Figures 14 and 15, successful situations are shown in a variety of challenging scenarios, and two clear failure cases are shown in Figure 16. The scenarios are explained in the captions of the figures. The detected moving objects are displayed with the color red, and the stationary objects with green.

Figure 13. Color coding for occupancy grid speed vectors (full and half occupancy). The hue represents the direction (i.e., red means the obstacle cell is moving to the right), the saturation represents the speed magnitude, and the intensity represents the occupancy probability of the grid [16]. © 2011 IEEE. Reprinted, with permission, from IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 4.
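As a rough illustration of this color coding, the sketch below maps a cell's speed vector and occupancy probability to an HSV triple and then to RGB; the speed normalization constant is an arbitrary choice for the example, not a value taken from [16].

```python
import colorsys
import numpy as np

def cell_color(vx, vy, occupancy, v_max=25.0):
    """Map a grid cell's speed vector (m/s) and occupancy probability to an RGB color.

    Hue encodes the motion direction, saturation the speed magnitude (clipped at
    v_max), and value the occupancy probability, mirroring the coding described
    in the caption of Figure 13."""
    hue = (np.arctan2(vy, vx) % (2.0 * np.pi)) / (2.0 * np.pi)
    sat = min(np.hypot(vx, vy) / v_max, 1.0)
    val = float(np.clip(occupancy, 0.0, 1.0))
    return colorsys.hsv_to_rgb(hue, sat, val)

# A fully occupied cell moving to the right at 10 m/s maps to a desaturated red:
# cell_color(10.0, 0.0, 1.0)  # -> (1.0, 0.6, 0.6)
```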

The qualitative analysis shows that most of the obstacles are correctly detected, but there are still problems with objects of low contrast (especially at the point of contact with the road, which is the point that matters for IPM), and with strong shadows in bright daylight. A video showing the results for an 11 min drive sequence (recorded and processed offline) can be seen at http://vimeo.com/107588262. Obstacles on the road are detected at various distances, up to 40 m.

Each obstacle is bounded by a cuboid, and the distance from the ego-vehicle to the obstacle is written below the cuboid. Stationary obstacles are shown as green cuboids, while moving obstacles are shown as red cuboids. On the top left of the frame, the occupancy grid is displayed, using the color coding for speed and orientation of each cell, as described by Figure 13.

The system was tested on multiple mobile devices: the Samsung Galaxy S5 phone, and the LG GPad 8.3 and the Samsung Galaxy Tab Pro tablets. Tests were performed in city traffic with the devices plugged in, allowing them to run at full processing power. The usage of vocal assistance and feedback will reduce the necessity of the device screen to be on. Depending on the complexity of the scene, the system's processing time ranged from 50 to 80 ms. The resulting frame rate ranged between 9 and 15 fps, as the whole frame cycle includes the acquisition process and the result display. A frame rate comparison, using the same recorded video, is presented in Table 2.

Table 2. Frame rate comparison using offline processing of a recorded sequence.

                               Samsung Galaxy S5    Sony Xperia Z1    Motorola Moto G    HTC One Mini 2
Average frame rate (fps)             17.6                11.4               8.7               14.5

In order to assess the battery consumption of the system, we have conducted power tests with the devices running on battery power during 20 min runs, and the overall battery drain was measured. The results are presented in Table 3. Battery consumption is also influenced by the fact that currently the output of the system is visual, requiring the device screen to be on all the time. However, in real use such a driving assistance system will act either as an audio warning system, or as a data source for other devices (such as the car's computer itself), connected into an Internet of Things configuration, and the need for the screen to be on all the time will disappear.

Table 3. Battery drain on a 20 min test run.

                               Samsung Galaxy S5    Samsung Galaxy S6    Motorola Moto G    HTC One M8
Battery usage (% of total)            13%                  11%                 13%              12%

Figure 14. (Left) Three dynamic objects, going in the same direction as the observing vehicle; and (Right) pedestrian on a worn out crossing. In the distance, a vehicle is detected as stationary, as it waits to enter the roundabout. The IPM image, the segmentation result and the occupancy grid estimation are shown on top of the perspective image.


Figure 15. (Left) Detected incoming vehicle, at a relative speed of around 100 km/h; and (Right) diagonally crossing object, on a patched road. The uneven road surface does not impede detection.

Figure 16. Failure cases: (Left) The bicyclist on the right side lane is not detected, due to the low contrast between the wheels and the road; and (Right) false positives due to strong shadows in very bright light.

Figures 17–19 show the application running on mobile devices, in urban traffic.


Figure 17. Obstacle detection on the Samsung Galaxy S5 smartphone: Stop and go traffic.

Figure 18. Obstacle detection on the Samsung Galaxy Tab Pro, waiting to cross an intersection: (Left) perspective projection of the detection results; and (Right) top view of the detected obstacles.


Figure 19. Detection of moving and parked obstacles. The speed vector of the parked vehicles is not correctly estimated, as their limits in the direction of our motion are not clear.

Due to the monocular, grayscale nature of the information used for detection, the results are sometimes affected by false positives, due to shadows on the road (Figure 20), or by false negatives when the contrast between the road and the obstacle is not high enough.

Figure 20. False positives due to shadows on the road.

In order to assess the accuracy of the obstacle measurement process, the obstacle range obtained from the proposed solution is compared with the range provided by a stereovision-based obstacle tracking system. The monocular solution was tested on the left frame of the stereo pair, and the stereo system used both images. The stereo images were acquired with a 22 cm baseline stereo rig, using two 2/3" CCD cameras (8.3 µm × 8.3 µm pixel size) equipped with 8 mm lenses, the images being down sampled to a resolution of 512 × 383 pixels.


This stereovision system is a custom rig, built for accuracy, not the stereo system of a mobile device, whose baseline would be too small to be used for measuring distances in the ranges required for the driving environment.

An obstacle in front of the observing vehicle was tracked for approximately 800 frames, or about 80 s. During this time, the target vehicle increased and decreased its distance from the observer multiple times, and the observing vehicle was following. The target vehicle was not knowingly involved in the experiment, and its actions were independent. The track was lost when the target vehicle passed beyond 35 m, which seems to be the detection limit of our system.

The range estimation results are shown in Figure 21. While noisier, the estimations of the monocular system closely follow the stereovision results. For the whole sequence, the Root Mean Square Error was 1.33 m, and the Mean Absolute Error 1.02 m.

In order to estimate the detection rate, we have used the sequence presented in the video available at the link http://vimeo.com/107588262, for which we also have the right image of the stereo pair, and therefore a comparison between the monocular and the stereovision method can be performed. A manual identification of the objects in the distance range of 0 to 40 m, and spanning a lateral distance of three lanes (the current lane and two side lanes), was performed on 40 frames of the sequence, equally spaced in time at approximately 15 s, covering a driving time of 11 min. The obstacles were heterogeneous, including cars, pedestrians, and scenery. The detection results are presented in Table 4.
As expected, the monocular solution shows a poorer detection rate performance compared with stereovision, but this rate was computed taking into account all types of obstacles, including scenery such as poles, which have a low contrast with respect to the road surface, and also the obstacles which are only partially visible in the image, such as overtaking/overtaken cars, which are easily detected by stereovision, but cannot be reliably detected by the monocular solution. If the comparison were limited to only cars and pedestrians clearly visible in front of the vehicle (at risk of being hit), the detection rates would be much more similar.
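For reference, the two error figures quoted above can be computed from the per-frame distance series of the two systems as sketched below; the series names are placeholders for logged data.

```python
import numpy as np

def range_errors(mono_dist, stereo_dist):
    """Root mean square error and mean absolute error (in meters) between the
    monocular and the stereovision per-frame range estimates; frames where
    either tracker lost the target should be removed beforehand."""
    diff = np.asarray(mono_dist, dtype=float) - np.asarray(stereo_dist, dtype=float)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    mae = float(np.mean(np.abs(diff)))
    return rmse, mae

# Hypothetical usage with two logged series of equal length:
# rmse, mae = range_errors(mono_log, stereo_log)  # e.g., 1.33 m and 1.02 m for the test sequence
```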

[Figure 21 plot: obstacle distance (m) versus frame number, for the Monocular and Stereo-KF estimates.]

Figure 21. Distance estimation comparison between the proposed monocular solution and a stereovision-based solution.

Table 4. Detection rate comparison between the monocular approach and stereovision.

                       Ground Truth    Stereovision: True Positives    Monocular: True Positives    Stereovision: False Positives    Monocular: False Positives
Number of obstacles        162                    157                             110                              6                               3
Detection rates             -                     96.7%                           67.9%                            3.7%                            1.8%

The distance measuring capabilities of the monocular application were also tested against ground truth data acquired with laser measuring equipment. In the absence of a laser scanner, distances to stationary targets were measured using a Leica Disto D5 laser ranging tool, having a measurement error of ±1 mm, and an effective range of 0.05 m to 200 m. The mobile application was run on a Samsung Galaxy S6 mobile phone. A comparison between the distance reported by the monocular mobile application for the detected objects, and the ground truth from laser measurements, is provided in Table 5. The relative error of distance measurement is around 10%, which is expected from a monocular vision system.

Table 5. Distance measurement accuracy—comparison with laser data.

Ground Truth (m)    Monocular Estimated Distance (m)    Absolute Error (m)    Relative Error (%)
      6.29                        6.7                          0.41                 6.52%
     11.87                       11.6                          0.27                 2.27%
     13.95                       14.0                          0.05                 0.36%
     14.08                       15.7                          1.62                11.51%
     14.32                       16.1                          1.78                12.43%
     17.32                       18.1                          0.78                 4.50%
     18.89                       17.3                          1.59                 8.42%
     22.81                       21.7                          1.11                 4.87%
     31.32                       33.9                          2.58                 8.24%
     31.34                       34.7                          3.36                10.72%
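The error columns of Table 5 follow directly from the first two columns; a minimal check, using the table values, is sketched below.

```python
import numpy as np

# Ground-truth (laser) distances and monocular estimates from Table 5, in meters.
gt = np.array([6.29, 11.87, 13.95, 14.08, 14.32, 17.32, 18.89, 22.81, 31.32, 31.34])
est = np.array([6.7, 11.6, 14.0, 15.7, 16.1, 18.1, 17.3, 21.7, 33.9, 34.7])

abs_err = np.abs(est - gt)            # absolute error column (m)
rel_err = 100.0 * abs_err / gt        # relative error column (%)

print(np.round(abs_err, 2))           # [0.41 0.27 0.05 1.62 1.78 0.78 1.59 1.11 2.58 3.36]
print(np.round(rel_err, 2))           # worst case about 12.4%, for the 14.32 m target
```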

6. Conclusions and Future Work

This paper describes a mobile device based, monocular computer vision solution for detecting the position, size and speed of the obstacles in traffic. The detection method uses a simple and fast segmentation technique, followed by free-form tracking of the obstacle areas using the particle based occupancy grid. The tracked grid cells are grouped into discrete obstacles. As the grouped cells already have a speed vector, the resulting obstacles will have a speed vector as well, with magnitude (absolute speed) and orientation. While most existing solutions are able to detect clearly visible vehicles, especially those that are followed by the host vehicle, the proposed system makes few assumptions about the appearance of the obstacles. The system was tested in real, heavy urban traffic, and shows promising results, turning a mobile device into a driving assistance sensory system.

There is significant room for improvement. Due to monocular vision limitations, false positives and false negatives are always a threat, especially for a generic obstacle detection system. The reduction of the error rate can be achieved by using color data (as it is already provided by the device's camera) or optical flow. The integration of a lane tracker algorithm into the system is in progress, and the estimated pitch angle of the lane can reduce the distance measurement errors caused by forward-backward balancing. In order to transform the method into a useful system for driving assistance, the detection results have to be communicated non-disruptively. A basic audio warning signal can be issued when a collision is likely, but a more detailed description of the obstacles, including all their parameters, can be transmitted using Bluetooth or WiFi, to be used by other systems which may take active steps to prevent accidents, log data for traffic analysis or real time mapping, or monitor the driver's behavior.


Acknowledgments: Research supported by grants of the Romanian Authority for Scientific Research, Projects PN-II-PCE-2011-3-1086 (MultiSens) and PN-II-PCCA-2011-3.2-0742 (SmartCoDrive).

Author Contributions: R.D. designed the detection algorithms; R.D. and R.I. designed the calibration methodology; R.I. and A.P. implemented the algorithms for mobile devices; R.D., R.I. and A.P. designed and performed the experiments, and wrote the paper.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. iOnRoad. iOnRoad Augmented Driving Pro. Available online: http://www.ionroad.com/ (accessed on 28 July 2016).
2. Drivea. Drivea—Driving Assistant App. Available online: http://www.appszoom.com/android_applications/transportation/drivea-driving-assistant-app_bdwmk.html (accessed on 28 July 2016).
3. Movon Corporation. Movon FCW. Available online: https://play.google.com/store/apps/details?id=com.movon.fcw (accessed on 28 July 2016).
4. Sivaraman, S.; Trivedi, M.M. Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1773–1794.
5. Petrovai, A.; Costea, A.; Oniga, F.; Nedevschi, S. Obstacle detection using stereovision for Android-based mobile devices. In Proceedings of the 2014 IEEE 10th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, Romania, 4–6 September 2014; pp. 141–147.
6. Haltakov, V.; Belzner, H.; Ilic, S. Scene Understanding from a Moving Camera for Object Detection and Free Space Estimation. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 105–110.
7. Wybo, S.; Tsishkou, D.; Vestri, C.; Abad, F.; Bougnoux, S.; Bendahan, R. Monocular vision obstacles detection for autonomous navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, 22–26 September 2008; pp. 4190–4196.
8. Reisman, P.; Mano, O.; Avidan, S.; Shashua, A. Crowd detection in video sequences. In Proceedings of the 2004 IEEE Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 66–71.
9. Zhang, X.; Jiang, P.; Wang, F. Overtaking Vehicle Detection Using a Spatio-temporal CRF. In Proceedings of the 2014 IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, 8–11 June 2014; pp. 338–343.
10. Graefe, V.; Efenberg, W. A Novel Approach for the Detection of Vehicles on Freeways by Real-Time Vision. In Proceedings of the 1996 IEEE Intelligent Vehicles Symposium, Tokyo, Japan, 19–20 September 1996; pp. 363–368.
11. Lim, K.H.; Ang, L.M.; Seng, K.P.; Chin, S.W. Lane-Vehicle Detection and Tracking. In Proceedings of the 2009 International MultiConference of Engineers and Computer Scientists (IMECS), Hong Kong, China, 18–20 March 2009; pp. 1124–1129.
12. Sivaraman, S.; Trivedi, M.M. A General Active-Learning Framework for On-Road Vehicle Recognition and Tracking. IEEE Trans. Intell. Transp. Syst. 2010, 11, 267–276.
13. Arrospide, J.; Salgado, L.; Marinas, J. HOG-like Gradient-based Descriptor for Visual Vehicle Detection. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 223–228.
14. Bertozzi, M.; Broggi, A. GOLD: A Parallel Real-Time Stereo Vision System for Generic Obstacle and Lane Detection. IEEE Trans. Image Process. 1998, 7, 62–81.
15. Tuohy, S.; O'Cualain, D.; Jones, E.; Glavin, M. Distance Determination for an Automobile Environment Using Inverse Perspective Mapping in OpenCV. In Proceedings of the 2010 IET Signal and Systems Conference (ISSC), Cork, Ireland, 23–24 June 2010; pp. 100–105.
16. Danescu, R.; Oniga, F.; Nedevschi, S. Modeling and Tracking the Driving Environment with a Particle-Based Occupancy Grid. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1331–1342.
17. Itu, R.; Danescu, R. An Efficient Obstacle Awareness Application for Android Mobile Devices. In Proceedings of the IEEE 10th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 4–6 September 2014; pp. 157–163.
18. Isard, M.; Blake, A. CONDENSATION—Conditional density propagation for visual tracking. Int. J. Comput. Vis. 1998, 29, 5–28.
19. OpenCV. Camera Calibration with OpenCV. Available online: http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html (accessed on 28 July 2016).

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
