It's distributions all the way down!: Second order changes in statistical distributions also occur
doi:10.1017/S0140525X13001763
Mark T. Keane^a and Aaron Gerow^b
^aSchool of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland; ^bSchool of Computer Science and Statistics, University of Dublin, Trinity College, College Green, Dublin 2, Ireland.
[email protected] [email protected]
www.csi.ucd.ie/users/mark-keane www.scss.tcd.ie/~gerowa

Abstract: The textual, big-data literature misses Bentley et al.'s message on distributions; it largely examines the first-order effects of how a single, signature distribution can predict population behaviour, neglecting second-order effects involving distributional shifts, either between signature distributions or within a given signature distribution. Indeed, Bentley et al. themselves under-emphasise the potential richness of the latter, within-distribution effects.

It has been reported, possibly as apocrypha, that when South Sea islanders were asked to explain their cosmology about what supported the World Turtle (who supported the world), one bright spark replied: "It's turtles all the way down!" The take-home message from the target article is analogously: "It's distributions all the way down!" Though the message may seem obvious, it is interesting to note that most of the current textual, big-data literature either doesn't get it or fails to appreciate its richness. The lion's share of the textual, big-data literature (that analyses, e.g., words, sentiment terms, tweet tags, phrases, articles, blogs) really concentrates on what could be called first-order effects, rather than on second-order effects. First-order effects are changes in a single, statistical distribution of some measured text-unit (word, article, sentiment term, tag) - usually from some large, unstructured, online dataset - that can act as a predictive proxy for some population-level behaviour. So, for instance, Michel et al. (2011) show how relative word frequencies in Google Books can reflect cultural trends; others have shown that distributions of Google search-terms can predict the spread of flu (Ginsberg et al. 2009) and motor sales (Choi & Varian 2012), and the numbers of news articles and of tweet rates can both be used to predict movie box-office revenues (Asur & Huberman 2010). In these cases, the actual distribution of the text data is directly reflected in some population behaviour or decision. Second-order distributional effects are changes across several, separate distributions that can predict population-level behaviour; usually, where these separate distributions span different time periods.
Big-data research on such effects is a lot less common; to put it succinctly, most of the work focuses on either the wisdom of crowds (Surowiecki 2005) or herd-like behaviour (Huang & Chen 2006) but not on how a wise crowd becomes a stupid herd (for rare exceptions, see Bentley & Ormerod 2010; Bentley et al. 2012; Onnela & Reed-Tsochas 2010; Salganik et al. 2006). Bentley et al. recognise the importance of such second-order effects in stressing that population decision-making may shift from one signature distribution to another.

However, we believe that there is more to be said about such second-order effects. For instance, second-order effects can be further divided into between-distribution and within-distribution effects. Between-distribution effects concern the types of changes, mentioned by Bentley et al., where there is change from one signature distribution to another over time (movement between the quadrants of their map): for example, from the Gaussian distribution of the wisdom of crowds to the power-law distribution of herd behaviour. In physical phenomena, such distributional changes are often cast as phase transitions; for example, when water turns to ice or when a heated iron bar becomes magnetised (see Barabasi & Gargos 2002). Bentley et al. raise the prospect that there are a host of similar phase transitions in human population behaviour, transitions that can be tracked and predicted by mining various kinds of big data. We find this prospect very exciting, given the notable gap in the big-data literature on such effects.

Bentley et al. say a lot less about within-distribution effects, the systematic changes that may occur in the properties of a given signature distribution from one time-period to the next; for example, the specific properties of separate power-law distributions of some measured item might change systematically over time. Research on such within-distribution effects in textual, big-data research is even rarer than that on between-distribution effects - one of these rarities being our own work on stock-market trends (Gerow & Keane 2011). In this study, we found systematic changes in the power-law distributions of the language in finance articles (from the BBC, the New York Times, and the Financial Times) during the emergence of the 2007 stock-market bubble and crash, based on a corpus analysis of 18,000 online news articles over a 4-year period (10M+ words).
Specifically, we found that week-to-week changes in the power-law distributions of verb-phrases correlated strongly with market movements of the major indices; for example, the Dow Jones Index correlated (r=0.79) with changes in the 8-week-windowed, geometric mean of the alpha terms of the power laws. These within-distribution changes showed that, week-on-week, journalists were using a progressively narrower set of words to describe the market, reporting on the same small set of companies using the same overwhelmingly positive language. So, as the agreement between journalists increases, the power-law distributions change systematically in ways that track population decisions to buy stocks. In Bentley et al.'s map, this analysis is about tracking movement within the southeast quadrant. Naturally, we also find the prospect of such within-distribution effects very exciting, especially given the lack of attention to them thus far. So, it really is "Distributions all the way down!" Current research is only scraping the surface of the possible distributional effects that may be mined from various online sources. Bentley et al. have provided a framework for thinking about such second-order effects, especially of the between-distribution kind, but there is also a whole slew of within-distribution phenomena yet to be explored.
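
As a rough illustration of the within-distribution analysis described above (fit a power law per week, smooth the alpha terms with an 8-week windowed geometric mean, then correlate with an index), the following Python sketch may help. It is not the procedure of Gerow & Keane (2011): the maximum-likelihood estimator for alpha, the function names, the `xmin` cutoff, and all of the toy data are assumptions made purely for illustration.

```python
import math


def powerlaw_alpha(counts, xmin=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Assumed estimator: alpha = 1 + n / sum(ln(x / xmin)) over all x >= xmin.
    """
    tail = [c for c in counts if c >= xmin]
    log_sum = sum(math.log(c / xmin) for c in tail)
    if log_sum == 0:
        raise ValueError("need at least one count strictly above xmin")
    return 1.0 + len(tail) / log_sum


def windowed_geometric_mean(values, window=8):
    """Geometric mean of each trailing window of `window` positive values."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        out.append(math.exp(sum(math.log(v) for v in chunk) / window))
    return out


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data: 10 weeks of verb-phrase counts whose
    # rank-frequency curve steepens a little each week.
    weekly_counts = [
        [round(1000 / rank ** (1.0 + 0.05 * week)) for rank in range(1, 51)]
        for week in range(10)
    ]
    alphas = [powerlaw_alpha(counts) for counts in weekly_counts]
    smoothed = windowed_geometric_mean(alphas, window=8)

    # Hypothetical index levels aligned with the last three weeks.
    index_levels = [100.0, 103.0, 107.0]
    print("r =", pearson_r(smoothed, index_levels))
```

The trailing window means the first seven weeks produce no smoothed value, so the index series has to be aligned with the remaining weeks before correlating.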

Keeping conceptual boundaries distinct between decision making and learning is necessary to understand social influence
doi:10.1017/S0140525X13001775
Gael Le Mens
Department of Economics and Business, Universitat Pompeu Fabra, 08005 Barcelona, Spain.
[email protected]
http://www.econ.upf.edu/~giemens/

Abstract: Bentley et al. make the deliberate choice to blur the distinction between learning and decision making. This obscures the social influence

BEHAVIORAL AND BRAIN SCIENCES (2014) 37:1 87

Copyright of Behavioral & Brain Sciences is the property of Cambridge University Press and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.


