It's distributions all the way down!: second order changes in statistical distributions also occur.

Commentary/Bentley et al.: Mapping collective behavior when we move to the south ofthe map. Thus, while Bentley et al. consider the south-east of tlieir map as a region of "herding" that will not reach any efficient outcome if eveiybody is just following the others, we tliink that the south-east, if interpreted as a region with very ambiguous payoffs but where players are strongly tied, can lead to more effieiency than the norili. ACKNOWLEDGMENT Funding through tlie Agence Nationale de la Recherche (ANR: 010 JCJC 1803 01) is gratefully acknowledged.

It's distributions all the way down!: Second order changes in statistical distributions also occur doi:10.1017/S0140525X13001763 Mark T. Keane^ and Aaron Gerow" '^School of Computer Science and Informatics, University College Dublin, Beifieid, Dubiin 4, ireiand; ''Schooi of Computer Science and Statistics, University of Dubiin, Trinity College, College Green, Dublin 2, Ireiand. [email protected] [email protected] www.csi.ucd.ie/users/mark-keane www.scss.tcd.ie/~gerowa

Abstract: The textual, big-data literature misses Bentley et al.'s message on distributions; it largely examines tlie first-order effects of how a single, signature distribution can predict population behaviour, neglecting second-order effects involving distrihntíonal shifts, either between signature distributions or within a given signature distribution. Indeed, Bendey et al. themselves under-emphasise the potential richness of tlie latter, within-distribution effects.

It has been reported, possibly as apocrypha, that when South Sea islanders were asked to explain their cosmology about what supported the World Turtle (who supported the world), one bright spark replied: "It's turtles all tlie way down!" The take-home message from the target article is analogously: "It's distributions all the way down!" Though the message may seem obvious, it is interesting to note diat most ofthe current textual, big-data literature either doesn't get it or fails to appreciate its richness. The hon's share ofthe textual, big-data literature (that analyses, e.g., words, sentiment terms, tweet tags, phrases, articles, blogs) really concentrates on what could be called first-order effects, rather tlian on second-order effects. First-order effects are changes in a single, statistical distribution of some measured text-unit (word, artiele, sentiment term, tag) - usually from some large, unstruetured, online dataset - that can act as a predictive proxy for some population-level behaviour. So, for instance, Michel et al. (2011) show how relative word frequencies in Google Books can refiect cultural trends; others have shown diat distributions of Google search-terms can predict the spread of fiu (Ginsberg et al. 2009) and motor sales (Ghoi & Varian 2012), and the numbers of news articles and of tweet rates can both be used to predict movie box-offiee revenues (Asur & Huberman 2010). In these cases, the actual distribution of the text data is directly refieeted in some population behaviour or decision. Second-order distributional effects are changes across several, separate distributions that can predict population-level behaviour; usually, where these separate distributions span different time periods. Big-data research on such effects is a lot less common; to put it succinctly, most of the work focuses on eitlier the wisdom of crowds (Surowiecki 2005) or herd-like behaviour (Huang & Ghen 2006) but not qn how a wise crowd becomes a stupid herd (for rare exceptions, see Bentley & Ormerod 2010; Bentley et al. 2012; Onnela & Reed-Tsochas 2010; Salganik et al. 2006). Bendey et al. recognise the importance of such second-order effects in stressing that population decision-

making may shift from one signature distribution to another. However, we believe that there is more to be said about such second-order effects. For instance, seeond-order effects can be further divided into between-distribution and within-distribution effects. Between-distribution effects concem die types of changes, mentioned by Bentley et al., where there is change from one signature distribution to another over time (movement between the quadrants of their map): for example, from the Gaussian distribution of the wisdom of erowds to the power-law distribution of herd behaviour. In physical phenomena, such distributional changes are often cast as phase transitions; for example, when water tiims to ice or when a heated iron bar becomes magnetised (see Barabasi & Gargos 2002). Bendey et al. raise the prospect that there are a host of similar phase transitions in human population behaviour, transitions that can be tracked and predieted by mining various kinds of big data. We find this prospeet very exciting, given the notable gap in die big-data literature on such effects. Bentley et al. say a lot less about within-distribution effects, the systematie ehanges that may occur in die propeiües of a given signature distribution from one time-period to the next; for example, the specific properties of separate power-law distributions of some measured item might change systematically over time. Research on such widiin-distribution effects in textual, big-data research is even rarer than that on between-distribution effects - one of diese raiities being our own work on stock-market trends (Gerow & Keane 2011). In tMs study (Gerow & Keane 2011), we found systematic changes in die power-law distributions of the language in finance articles (from the BBG, the Nexo York Times, and the Financial Times) duiing the emergence of the 2007 stock-market bubble and crash, based on a coipus analysis of 18,000 onBne news articles over a 4-yeai- period (10M+ words). Specifically, we foimd that week-to-week changes in tlie power-law distributions of verb-phrases correlated strongly with market movements of the major indiees; for example, the Dow Jones Index correlated (r=0.79) with changes in the 8-week-windowed, geometric mean of die alpha terms of the power laws. These within-distribution ehanges showed that, week-on-week, journalists were using a progressively narrower set of words to describe the maifct, reporting on the same small set of companies using the same overwhelmingly positive language. So, as the agreement between journalists increases, the power-law distributions ehange systematically in ways that track population decisions to buy stocks. In Bendey et al.'s map, this analysis is about tracking movement within die southeast quadrant. Naturally, we also find the prospect of such within-distribution effects very exciting, especially given the lack of attention to them thus far. So, it really is "Distributions all the way down!" Gurrent research is only scraping die siufaee of die possible distributional effeets that may be mined from various online sources. Bendey et al. have provided a framework for thinking about such second-order effects, especially of the between-distribution kind, but diere is also a whole slew of within-distribution phenomena yet to be explored.

Keeping conceptual boundaries distinct between decision making and learning is necessary to understand social influence doi:10.1017/S0140525X13001775 Gael Le Mens Department of Economics and Business, Universität Pompeu Fabra, 08005 Barceiona, Spain. [email protected] http://www.econ.upf.edu/~giemens/

Abstract: Bentley et al. make the deliberate choice to binr the distinction between learning and decision making. This obscures the social influence BEHAVIORAL AND BRAIN SCIENCES (2014) 37:1

87

Copyright of Behavioral & Brain Sciences is the property of Cambridge University Press and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

Species abundance distributions, statistical mechanics and the priors of MaxEnt.

Correction: Distributions of Autocorrelated First-Order Kinetic Outcomes: Illness Severity.

Distributions of Autocorrelated First-Order Kinetic Outcomes: Illness Severity.

New Method for Measuring Statistical Distributions of Partial Discharge Pulses.

Normal distributions.

The universal statistical distributions of the affinity, equilibrium constants, kinetics and specificity in biomolecular recognition.

Sampling over Nonuniform Distributions: A Neural Efficiency Account of the Primacy Effect in Statistical Learning.

The S-transform of distributions.

Sampling distributions and the bootstrap.

Rapidity distributions in Drell-Yan and Higgs productions at threshold to third order in QCD.

A statistical approach for identifying differential distributions in single-cell RNA-seq experiments.

Length distributions in loop soups.

Generalised extreme value distributions provide a natural hypothesis for the shape of seed mass distributions.

Factor analysis and score distributions of the HIP--replication by a second examiner.

Statistical analyses of counts and distributions of restriction sites in DNA sequences.

Dose distributions in regions containing beta sources: irregularly shaped source distributions in homogeneous media.

A statistical approach to quantification of genetically modified organisms (GMO) using frequency distributions.

4 determination of dose distributions.

On the asymmetry of biological frequency distributions.

Patchy distributions: Optimising sample size.

Moving forward with species distributions.

Skewed distributions and parametric tests.

Methodology-specific Rayleigh-match distributions.

Wandering distributions and the electrophoretic profile.