J Med Syst (2015) 39: 44 DOI 10.1007/s10916-015-0232-4

SYSTEMS-LEVEL QUALITY IMPROVEMENT

Metadata from Data: Identifying Holidays from Anesthesia Data

Joseph R. Starnes & Jonathan P. Wanderer & Jesse M. Ehrenfeld

Received: 26 December 2014 / Accepted: 11 February 2015 / Published online: 3 March 2015 © Springer Science+Business Media New York 2015

Abstract The increasingly large databases available to researchers necessitate high-quality metadata that is not always available. We describe a method for generating this metadata independently. Cluster analysis and expectation-maximization were used to separate days into holidays/weekends and regular workdays using anesthesia data from Vanderbilt University Medical Center from 2004 to 2014. This classification was then used to describe differences between the two sets of days over time. We evaluated 3802 days and correctly categorized 3797 based on anesthesia case time (representing an error rate of 0.13%). Use of other metrics for categorization, such as billed anesthesia hours and number of anesthesia cases per day, led to similar results. Analysis of the two categories showed that surgical volume increased more quickly with time for non-holidays than holidays (p [...] 0.5) was assumed to be correct without consideration of how high the calculated probability was. The cluster having the smaller values according to each metric was deemed to be the holiday group. For some analyses, the linear trend was removed from the dataset. To do this, a least squares linear regression was used to create a fit for aggregated daily data. The residual of this regression was then used in place of the raw aggregates. To compare regressions between non-holidays and holidays/weekends, analysis of covariance (ANCOVA) was used.
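The classification and detrending steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports using R); it assumes a one-dimensional, two-component Gaussian mixture fitted by expectation-maximization, assigns each day to whichever cluster is more probable, and treats the lower-valued cluster as the holiday group. The function names `detrend`, `fit_two_gaussians`, and `classify_holidays` are hypothetical.

```python
import math


def detrend(values):
    """Residuals of a least-squares line fitted to values against day index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in enumerate(values)]


def _normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, max_iter=500, tol=1e-9):
    """EM for a 1-D two-component Gaussian mixture (illustrative assumption).

    Returns (means, sds, weights, resp), where resp[i] is the posterior
    probability that values[i] belongs to the lower-mean component.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    means = [lo + 0.25 * spread, lo + 0.75 * spread]
    sds = [spread / 4, spread / 4]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    resp = []
    for _ in range(max_iter):
        # E-step: posterior probability of component 0 for every day.
        resp, ll = [], 0.0
        for x in values:
            p0 = weights[0] * _normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * _normal_pdf(x, means[1], sds[1])
            total = p0 + p1 + 1e-300  # guard against underflow to zero
            resp.append(p0 / total)
            ll += math.log(total)
        # M-step: re-estimate mean, spread, and weight of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1 - v for v in resp]
            nk = sum(r) + 1e-300
            means[k] = sum(ri * x for ri, x in zip(r, values)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # keep the spread positive
            weights[k] = nk / len(values)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    if means[0] > means[1]:
        # Make component 0 the lower-valued cluster before returning.
        means.reverse()
        sds.reverse()
        weights.reverse()
        resp = [1 - v for v in resp]
    return means, sds, weights, resp


def classify_holidays(values):
    """True where a day is more likely to belong to the lower-valued cluster."""
    *_, resp = fit_two_gaussians(values)
    return [r > 0.5 for r in resp]
```

For the detrended variant, one would call `classify_holidays(detrend(daily_totals))`, where `daily_totals` is a hypothetical list of one aggregate value per day in date order.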

Data source For analysis we used a de-identified dataset containing all anesthesia cases at VUMC between January 1st, 2004 and May 29th, 2014. This set included 725,144 individual cases covering 3802 days. The average number of cases per day was 190.7, and the average length of a case was 1.83 h. We excluded cases with a duration greater than 24 h, as they were most likely caused by data entry errors. Cases that were missing times necessary for calculation of case length, representing 7.66% of cases, were coded as lasting zero minutes. This missing data did not significantly affect the ability of the method to classify days.

Data analysis R, supplemented with the eXtensible Time Series (xts) [12], Normal Mixture Modeling for Model-Based Clustering,
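The day-level preparation described under Data source can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: each case record is assumed to be a hypothetical `(case_date, start, end)` tuple in which the date is always present and either time may be missing (`None`). Cases longer than 24 h are dropped, and cases with missing times are kept but counted as zero hours.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

MAX_CASE = timedelta(hours=24)  # longer cases are treated as entry errors


def case_hours(start, end):
    """Case length in hours; 0.0 if a time is missing, None if excluded."""
    if start is None or end is None:
        return 0.0  # missing times are coded as lasting zero minutes
    duration = end - start
    if duration > MAX_CASE:
        return None  # signal that the case should be excluded
    return duration.total_seconds() / 3600


def aggregate_by_day(cases):
    """Map each date to (number of cases, total case hours).

    cases: iterable of (case_date: date, start: datetime or None,
    end: datetime or None). This record layout is hypothetical.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for case_date, start, end in cases:
        hours = case_hours(start, end)
        if hours is None:
            continue
        totals[case_date][0] += 1
        totals[case_date][1] += hours
    return {day: tuple(v) for day, v in sorted(totals.items())}
```

The per-day totals from `aggregate_by_day` give one value per day for each metric, which is the form the clustering step expects.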

Results All three metrics (case time, number of cases, and billed anesthesia time), aggregated by day, created histograms with two easily distinguishable groups (Fig. 1). At least bimodality was shown for each using Hartigan’s dip statistic (D ≈ 1, p
