The Jonker-Volgenant algorithm applied to click-train separation.

The Jonker-Volgenant algorithm applied to click-train separation (L) Paul M. Baggenstossa) Department 1511, Naval Undersea Warfare Center, 1176 Howell Street, Newport, Rhode Island 02841

(Received 8 October 2013; revised 7 February 2014; accepted 10 March 2014) The problem of click-train separation is cast as a linear assignment problem to obtain a faster solution guaranteed to achieve the global minimum error. It is shown how the problem can be cast in a compact matrix form that is solvable by an off-the-shelf algorithm, the Jonker-Volgenant algorithm. [http://dx.doi.org/10.1121/1.4869677] PACS number(s): 43.30.Sf, 43.30.Wi, 43.60.Jn, 43.60.Lq [MS]

For each click-pair, we calculate a pairing error

I. INTRODUCTION AND BACKGROUND

The problem of click-train separation, or segregation, is important for classification and localization of clicking marine mammals. When multiple animals are vocalizing and/or there is multipath, the resulting output from a click detector can be confusing. Separating the various click-trains into individual whales and/or paths is key to localization and classification.1,2 In two previous publications, we cast click separation as an optimization problem.3,4 In this short paper, we apply the Jonker-Volgenant algorithm to obtain a much faster solution that is guaranteed to achieve the global error minimum. II. PROBLEM DEFINITION

There are two formulations of the click-separation problem. The first is based on click-pairs, and the second is based on click-triples. Using click-triples has the advantage that inter-click interval (ICI) can be quantified, but may result in ambiguous solutions. A. Click-pair formulation

Let there be Nd detected clicks received at a single sensor with detection times tðiÞ;

1 i Nd :

We assume that t(j) > t(i) if j > i. We can immediately examine the list of click-times and determine a list of potential pairings. For two clicks to be paired means that the two clicks could be two consecutive clicks in a click-train. Therefore, the time difference t(j) t(i) is consistent with the ICI of the given species. Furthermore, we can examine the amplitude ratio a(i)/a(j) to see if the amplitude change is within acceptable bounds. Let there be Np potential click-pairs fik ; jk g

1 k Np ;

where ik, jk are the first and second detections for pair k.

a)

Author to whom correspondence should be addressed. Electronic mail: [email protected]

J. Acoust. Soc. Am. 135 (5), May 2014

Ek ¼

ðtðjk Þ tðik Þ sÞ2 ½logfaðjk Þ=aðik Þg2 þ þ ; r2t r2a

where s is the average ICI for the given species, r2t is the ICI variance, and r2a is the variance of the log amplitude ratio. Any additional click-pairing error statistics can be used including correlation coefficient and spectral difference measures. Alternatively, a statistical error metric Lik,jk, as presented in Ref. 4, Eq. (2), is also possible. Define a grouping g as a potential click-separation solution. It is a proposed organization of the available Nd detections into groups such that: (1) Each click is in exactly one group. (2) Each group has at least one click. (3) For groups of two clicks or more, all clicks in the group are ordered into a list of clicks such that each consecutive click pair is in the list of Np available pairs. The maximum number of groups is Nd if each click is in a group by itself. The error associated with grouping g, denoted by E(g), is the sum of the click-pair errors Ek for all the clickpairs in all the groups in grouping g. The last click in each group is assessed a penalty p. The error contribution for one group is Ek þ El þ Em þ þ p;

1 i Nd ;

and amplitudes aðiÞ;

Pages: 2485–2487

(1)

where by definition, the second click in pair k must be the same as the first click in pair l, and so on. Each group that contains just one click contributes error p to the total. Typically, p is set to the largest acceptable pairing error, the result being that pairs with error greater than that will not be utilized because the algorithm can achieve a lower total error by separating the clicks. We define the best grouping g* as the grouping that achieves the minimum of E(g) over all valid groupings. B. Click-triple formulation

The click-pair formulation does not account for ICI consistency. For example, it can assign click i to click j with ICI s1 ¼ tj ti, then assign j to click k with ICI s2 ¼ tk tj, without considering if s1 and s2 are consistent. By formulating the problem as triples, we can consider ICI consistency. Let us consider the Nt triples created by pairing each click-pair k with another click-pair l such that jk ¼ il. Thus, triple m is defined by the two click-pairs

0001-4966/2014/135(5)/2485/4/$30.00

2485

fkm lm g; where km, lm are the click-pair indexes 1 km N p ;

1 lm Np ;

1 m Nt :

Let the total error associated with triple m be ETm ¼ Ekm þ Elm þ vm ; where vm is the squared ICI consistency error for triple m:

log

vm ¼

tðikm Þ tðjkm Þ tðilm Þ tðjlm Þ r2s

2 ;

where r2s is the variance of the ICI consistency (see Ref. 4, Fig. 2), and the double-subscript ikm , jkm are the left and right clicks for the click-pair km, and il m , jlm are the left and right clicks for the click-pair lm, associated with triple m. III. PROBLEM SOLUTION A. The linear assignment problem

The linear assignment (LA) problem is a mathematical optimization problem with a long and interesting history in which there are a fixed number n of workers or agents and a number m of tasks. Let ci,j be the cost associated with assigning worker i to task j. The solution of the LA problem is the list of assignments that minimizes the total cost. If m > n, then some tasks go unassigned, and if n > m, some workers have nothing to do. As simple as the problem appears on the surface, the brute-force solution is to search all m!/(m - n)! combinations (assuming m n). Fortunately, fast algorithms have been developed. Most notably, the Hungarian algorithm by Kuhn which is based on the work of two Hungarian mathematicians K~onig and Egervary.5 It is also named after Munkres due to his studies of the algorithm.6 Interestingly, Kuhn himself discovered in 2006 that the solution was published posthumously in Latin by Carl Gustav Jacobi in 1890.7,8 For n ¼ m, the original algorithm was O(n4); however, improvements achieved O(n3) complexity. The Jonker-Volgenant algorithm9 makes further improvements, and is reported to be ten times faster than a similar coding of the Munkres code for n ¼ 1000.10

pair is only specified for the forward time order—the reverse order of the same two clicks is invalid. If interference is present (it always is), we must also allow for the possibility that a click is not used in any pairing. To do this, we add an additional “task” for each worker, the null assignment. Since by definition only one worker can be assigned to a task, we need a different null assignment for each click. Thus, we have Nd workers and 2Nd tasks. An arbitrary null assignment cost [p in Eq. (1)] must be chosen for null assignments; otherwise the algorithm could assign all clicks to the null task. This is typically set to the value of the highest acceptable click-pairing error. C. Matrix formulation

We now provide specifics of the matrix formulation. Let the Nd Nd pairing error matrix D be such that, for each click pair k, Dik ;jk ¼ Ek : Each unused element of D is set to a value u equal to infinity or a large number, indicating that a pairing is not possible. A large number may be preferable to infinity if the numerical implementation cannot handle infinite values. Define the Nd 2Nd augmented pairing error matrix A ¼ ½D; ðpI þ u~IÞ; where I is the Nd Nd identity matrix, p is the null assignment cost, and ~I is the complement Nd Nd identity matrix (ones everywhere except the diagonal which is zero), and u is a large number. Define the Nd 2Nd assignment matrix C, a sparse matrix of zeros and ones, such that if Ci,j ¼ 1, it means that click i is assigned to click j as a pairing. Each row of C has exactly one non-zero entry. There is a null assignment i þ Nd defined for each click i such that if click i is assigned to null click Nd þ i, it means click i has no right-hand neighbor, so it is the last (or only) click in a group. The solution of the above grouping problem, that is the finding of g*, is equivalent to the following problem: Create the matrix C as defined above in order to minimize the total assignment error, Etot ¼

Nd X 2Nd X

Ci;j Ai;j :

i¼1 j¼1

B. Solving the click-pair problem

The LA algorithm can be adapted to solve the click-pair version of the click-train separation problem if we allow each click to be both a worker and a task. A click is thought of as a worker looking for a task when it seeks the next click in the click-train sequence. But, when a click seeks the previous click in the sequence, it is thought of as a task looking for a worker. There are therefore n ¼ Nd workers and m ¼ Nd tasks and the cost associated with each assignment is Ek where k is the respective click-pair. We must specify the cost of every pairing, so we specify an arbitrary large number for the cost of pairing two clicks that do not meet the criteria for pairing (invalid pairings). Note that each valid click 2486

J. Acoust. Soc. Am., Vol. 135, No. 5, May 2014

The following code fragment implements the algorithm, where variables are Nd ¼ Nd, D2 ¼ (pI þ u~I), C ¼ C, u ¼ u, p ¼ p. function C ¼ do_assignment (D, p, u); Nd ¼ size(D, 1); D2 ¼ eye(Nd)*p; D2 (find(D2(:)¼ ¼0)) ¼ u; A ¼ [D D2]; [C, cost] ¼ lapjv (A, 0.00001); return. The returned matrix C is then searched for non-zero elements (ones) to discover the chosen click pairings. Paul M. Baggenstoss: Letters to the Editor

FIG. 1. Result of synthetic clickpairing solution. Top: All clicks. Remaining graphs: Chains 1 to 4.

The Jonker-Volgenant algorithm9 solves this LA problem extremely rapidly. The MATLAB program lapjv.m is freely available online.10 The resulting assignment matrix C can be used to determine the click-trains by following the assignments. For example, (left) click i assigned to (right) click j, then (left) click j assigned to (right) click k, etc., continuing until a click is assigned to its null click. D. Synthetic example

We created three synthetic click-trains with ICIs of 0.3, 0.35, and 0.4 s, and amplitudes of 0.9, 2, and 4, respectively. We did not create synthetic time-series, but just the click arrival time and amplitude values in a 12-s window. We also added ten spurious clicks with random arrival times and amplitudes. We gathered all the clicks into a list sorted by time. The graph of all clicks and amplitudes are shown in Fig. 1, top. As a clickpairing error, we used the absolute value of the log of the amplitude ratio plus the log of the Gaussian distribution for ICI: Ei;j ¼ jlogðai =aj Þj þ ðtj ti sÞ2 = 2r2t ; where s ¼ 0.35 and r2t ¼ 0.2. In practice, additional features would be used to develop a more comprehensive measure of click-pairing error. The clicks assigned to four different chains are also shown in Fig. 1. It is interesting to notice that three of the ten spurious clicks were assigned to a fourth chain, while the remaining seven were not assigned (assigned to the null task). The MATLAB listing for generating this graph is provided in the Appendix. E. Solving the click-triple formulation

If we substitute the notion of click triple for click-pair, the above problem can be formulated in matrix form, just J. Acoust. Soc. Am., Vol. 135, No. 5, May 2014

like the click-pair formulation in Sec. III C. As such we define the Np Np click-triple error matrix DT with entry DTkm ;lm being the click triple error for click triple m. All unused elements of DT are, as before, set to a large number to indicate no triple is possible. We then form the augmented matrix T AT 5½DT ; ðpI1u~I Þ;

where I is the Np Np identity matrix, and ~I T is the complement Np Np identity matrix (ones everywhere except the diagonal which is zero), and u is a large number. The click-triple assignment matrix CT is defined analogously to C above. One inconsistency with the click-triple formulation is that although each click-pair may appear only once in a grouping, it is possible for a given click to appear more than once in a solution, causing an ambiguous solution. We have not found this to be detrimental to the results, however. IV. CONCLUSIONS

We have presented a matrix formulation of the clicktrain separation problem that yields to an efficient solution attaining the global error minimum using an off-the-shelf linear assignment algorithm. We presented two formulations, one based on click-trains and one based on click-triples. We provided a synthetic example of the click-pair method and the corresponding program MATLAB listing.

ACKNOWLEDGMENTS

The author would like to acknowledge support from both the ONR marine mammal Advanced Detection Classification and Localization (ADCL) program and the Living Marine Resources (LMR) program. Paul M. Baggenstoss: Letters to the Editor

2487

chain_nr(j) ¼ nchain; chain_nr(i) ¼ nchain; end; end; end; end;

APPENDIX % Parameters of the three click-trains ICI ¼ [0.3 0.35 0.4]; % ICI Ampl ¼ [0.9 2 4]; % Amplitude T ¼ 12; % total time window % generates all the click times Ts ¼ []; As ¼ []; for i ¼ 1: length (ICI), t ¼ rand þ [0:ICI(i): T/ICI(i)]; t ¼ t (find(t > 0 & t < T)); Ts ¼ [Ts; t(:)]; As ¼ [As; t(:)*0 þ Ampl(i)]; end;

% plot the results subplot (511); for i ¼ 1:length (Ts), plot ([Ts(i) Ts(i)],[0 As(i)],‘bo-’); hold on end; hold off ax ¼ axis; % save graph axis for ich ¼ 1:nchain, subplot (5,1,ich þ 1); ii ¼ find(chain_nr ¼ ¼ ich); for i ¼ 1:length (ii), plot ([Ts(ii(i)) Ts(ii(i))], [0 As(ii(i))],‘bo-’); hold on end; hold off axis (ax); xlabel (‘Time, sec’); ylabel (‘Amplitude’); ylabel ([‘Chain’ num2str(ich)’ Ampl’]); end; return

% add ten spurious clicks for i ¼ 1:10, Ts ¼ [Ts; rand*T]; As ¼ [As; rand*6]; end; % sort in time-increasing order [Ts,is] ¼ sort(Ts); As ¼ As(is); % Compute pairing error based on amplitude ratio % and ICI Nd ¼ length (Ts); D ¼ 1e9*ones (Nd,Nd); ICIstd ¼ 0.2; ICImean ¼ 0.35; for i ¼ 1: Nd, for j ¼ i þ 1: Nd, D (i,j) ¼ abs(log(As(i)/As(j))) þ (Ts(j) - Ts(i) - ICImean)^2/2/ ICIstd^2; end; end; % solve LA problem C ¼ do_assignment (D,0.2, 1000); % unravel the chains. When finished, nchain will % be the number of chains and chain_nr(i) will % be the assignment of click i to a chain nchain ¼ 0; chain_nr ¼ zeros(Nd,1); for i ¼ 1:Nd, j ¼ find(C(i,1:Nd)); if (isempty(j)), if (chain_nr(i) && any (chain_nr(j))), chain_nr(j) ¼ chain_nr(i); else, if(chain_nr(i)¼¼0 && any (chain_nr(j))), nchain ¼ nchain þ 1;

2488

J. Acoust. Soc. Am., Vol. 135, No. 5, May 2014

1

R. Bahl and T. Ura, “Automatic real-time segregation and classification of multiple vocalizing sperm whales,” Seisan-Kenkyu Bimonthly J. Inst. Ind. Sci. 55, 61–64 (2003). 2 R. Bahl, T. Ura, and T. Fukuchi, “Techniques for segregation and classification of several vocalizing sperm whales for auv-based localization applications,” in Proceedings OCEANS 2003 (Institute of Electrical and Electronics Engineers, 445 Hoes Lane, Piscataway, NJ, 2003), Vol. 1, pp. 457–463. 3 P. M. Baggenstoss, “Joint localization and separation of sperm whale clicks,” Can. J. Acoust. 36, 125–131 (2008). 4 P. M. Baggenstoss, “Separation of sperm whale click-trains for multipath rejection,” J. Acoust. Soc. Am. 129, 3598–3609 (2011). 5 H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Res. Log. Q. 2, 83–97 (1955). 6 J. Munkres, “Algorithms for the assignment and transportation problems,” J. Soc. Ind. Appl. Math. 5, 32–38 (1957). 7 H. W. Kuhn, “The Hungarian method for the assignment problem and how Jacobi beat me by 100 years,” Concordia University Seminar (2006). 8 S. Martello, “Jeno Egervary: From the origins of the Hungarian algorithm to satellite communications,” Central European J. Op. Res. 18, 47–58 (2010). 9 R. Jonker and A. Volgenant, “A shortest augmenting path algorithm for dense and spare linear assignment problems,” Comput. 38, 325–340 (1987). 10 Y. Cao, “LAPJV—Jonker-Volgenant algorithm for linear assignment problem, v3.0,” Mathworks File Exchange, http://www.mathworks.com/ matlabcentral/fileexchange 26836 (2013).

Paul M. Baggenstoss: Letters to the Editor

Copyright of Journal of the Acoustical Society of America is the property of American Institute of Physics and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

Multiobjective Genetic Algorithm applied to dengue control.

Improved bat algorithm applied to multilevel image thresholding.

ATria: a novel centrality algorithm applied to biological networks.

Cognitive state prediction using an EM algorithm applied to Gamma distributed data.

An advanced shape-fitting algorithm applied to quadrupedal mammals: improving volumetric mass estimates.

A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers.

Improved PDR Integration Algorithm Applied to Dynamic Indoor Positioning for Pedestrians.

A quantum chemistry study for ionic liquids applied to gas capture and separation.

Evaluation of a metal artifacts reduction algorithm applied to postinterventional flat panel detector CT imaging.

Multisurface method of pattern separation for medical diagnosis applied to breast cytology.

An algorithm for separation of mixed sparse and Gaussian sources.

Windowed persistent homology: A topological signal processing algorithm applied to clinical obesity data.

Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.

Neural networks and differential evolution algorithm applied for modelling the depollution process of some gaseous streams.

The maximum separation cluster analysis algorithm for atom-probe tomography: parameter determination and accuracy.

The Camera Lucida applied to the Ophthalmoscope.

A Novel Algorithm for Movement Artifact Removal in ECG Signals Acquired from Wearable Systems Applied to Horses.

Separation of botulinum-positive and -negative fish samples by means of a pattern recognition method applied to headspace gas chromatograms.

Independent component analysis-based algorithm for automatic identification of Raman spectra applied to artistic pigments and pigment mixtures.

The Binet Tests Applied to Delinquent Girls.

The Binet Tests Applied to Colored Children.

The random phase approximation applied to ice.

Reverse pharmacology applied to the cannabinoid receptor.

Fever: thermodynamics applied to the leucocyte.