Original Article

Computationally Efficient Multidimensional Analysis of Complex Flow Cytometry Data Using Second Order Polynomial Histograms

John Zaunders,1,2* Junmei Jing,3 Michael Leipold,4 Holden Maecker,4 Anthony D. Kelleher,1,2 Inge Koch5

1 St Vincent’s Centre for Applied Medical Research, St Vincent’s Hospital, Darlinghurst, New South Wales 2010, Australia

2 Kirby Institute, UNSW Australia, Kensington, New South Wales 2052, Australia

3 Centre for Bioinformatics Science, Mathematical Sciences Institute, Australian National University, Canberra, Australian Capital Territory 2600, Australia

4 Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, California 94305

5 School of Mathematical Sciences, University of Adelaide, South Australia 5005, Australia

Received 30 September 2015; Revised 7 April 2015; Accepted 18 May 2015

Grant sponsor: Australian National Health and Medical Research Council (NHMRC)

Additional Supporting Information may be found in the online version of this article.

*Correspondence to: John Zaunders, Centre for Applied Medical Research, St Vincent’s Hospital, 405 Liverpool St, Darlinghurst, NSW 2010 Australia. E-mail: [email protected]

Published online 11 June 2015 in Wiley Online Library (wileyonlinelibrary.com)

DOI: 10.1002/cyto.a.22704

© 2015 International Society for Advancement of Cytometry

Cytometry Part A 89A: 44–58, 2016

Abstract

Many methods have been described for automated clustering analysis of complex flow cytometry data, but so far the goal of efficiently estimating multivariate densities and their modes for a moderate number of dimensions and potentially millions of data points has not been attained. We have devised a novel approach to describing modes using second order polynomial histogram estimators (SOPHE). The method divides the data into multivariate bins and determines the shape of the data in each bin based on second order polynomials, which is an efficient computation. These calculations yield local maxima and allow adjacent bins to be joined to identify clusters. Second order polynomials also make optimal use of wide bins, such that in most cases each parameter (dimension) need only be divided into 4–8 bins, again reducing computational load. We have validated this method using defined mixtures of up to 17 fluorescent beads in 16 dimensions, correctly identifying all populations in data files of 100,000 beads in
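The binning and per-bin polynomial step described in the abstract can be illustrated with a minimal one-dimensional sketch. It rests on assumptions that are not stated in this excerpt: the quadratic in each bin is chosen so that its zeroth, first and second moments over the bin match those of the data, and a mode is reported only where the fitted quadratic has an interior maximum. The function name `polynomial_histogram_modes` and its parameters are hypothetical, and the multivariate fitting and the joining of adjacent bins into clusters are not shown.

```python
import numpy as np

def polynomial_histogram_modes(x, lo, hi, n_bins):
    """1D sketch: fit a quadratic density in each bin by moment matching
    (an assumption of this example) and return interior local maxima
    as (location, estimated density) pairs."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = (hi - lo) / n_bins                      # bin width
    inside = (x >= lo) & (x < hi)               # ignore points outside the grid
    idx = np.floor((x[inside] - lo) / h).astype(int)
    idx = np.minimum(idx, n_bins - 1)           # guard against rounding at the top edge
    centres = lo + (np.arange(n_bins) + 0.5) * h
    u = x[inside] - centres[idx]                # offset of each point from its bin centre

    # Empirical moments per bin, normalised by the total sample size.
    m0 = np.bincount(idx, minlength=n_bins) / n
    m1 = np.bincount(idx, weights=u, minlength=n_bins) / n
    m2 = np.bincount(idx, weights=u ** 2, minlength=n_bins) / n

    # Coefficients of p(u) = a0 + a1*u + a2*u^2 on [-h/2, h/2] whose
    # integrals against 1, u and u^2 equal m0, m1 and m2.
    a2 = 180.0 * (m2 - m0 * h ** 2 / 12.0) / h ** 5
    a1 = 12.0 * m1 / h ** 3
    a0 = m0 / h - a2 * h ** 2 / 12.0

    modes = []
    for c, b0, b1, b2 in zip(centres, a0, a1, a2):
        if b2 < 0:                              # concave quadratic has a maximum
            u_star = -b1 / (2.0 * b2)           # vertex of the parabola
            if abs(u_star) < h / 2:             # keep it only if it lies inside the bin
                height = b0 + b1 * u_star + b2 * u_star ** 2
                modes.append((c + u_star, height))
    return modes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(0.5, 1.0, 200_000)      # synthetic data, not from the paper
    for location, density in polynomial_histogram_modes(sample, -4.0, 4.0, 4):
        print(f"mode near {location:.2f}, estimated density {density:.3f}")
```

Only three running sums per bin are needed, which is one way to see why a per-bin quadratic can be cheap to compute even with wide bins. In this sketch the reported mode is the vertex of the fitted parabola, so it is an approximation whose accuracy depends on bin width and on where the true mode falls within the bin.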
