Logistic Regression in Rare Events Data |
Authors: | King, Gary; Zeng, Langche |
Affiliation: | Center for Basic Research in the Social Sciences, 34 Kirkland Street, Harvard University, Cambridge, MA 02138; e-mail: King{at}Harvard.Edu; http://GKing.Harvard.Edu. Department of Political Science, George Washington University, Funger Hall, 2201 G Street NW, Washington, DC 20052; e-mail: lzeng{at}gwu.edu |
Abstract: | We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed. |
This article has been indexed by Oxford and other databases.
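The abstract's sampling argument (keep every event, keep only a tiny fraction of nonevents) can be illustrated with a minimal sketch. It assumes a single covariate, a population event fraction `tau` that is known, and stdlib-only Python. Under such a design the logit slope is still estimated consistently, but the intercept absorbs the sampling ratio; the sketch removes it with the standard case-control ("prior") intercept correction b0_hat - ln[((1 - tau)/tau) * (ybar/(1 - ybar))], where `ybar` is the sample event fraction. The function names (`fit_logit`, `prior_correct_intercept`, `simulate`) and all simulation settings are hypothetical illustrations, not the authors' software, and the sketch does not implement the paper's finite-sample rare-events bias correction.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logit with an intercept and one covariate (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = a = b = c = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            a += w                # entries of the 2x2 information matrix
            b += w * xi
            c += w * xi * xi
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det   # Newton step = inverse information times score
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def prior_correct_intercept(b0_hat, tau, ybar):
    """Shift the intercept from the sample event fraction ybar to the population fraction tau."""
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def simulate(seed=1, n_pop=200_000, beta0=-5.0, beta1=1.0, keep_nonevents=0.02):
    """Hypothetical simulation: rare-event population, then a case-control sample."""
    rng = random.Random(seed)
    pop_x, pop_y = [], []
    for _ in range(n_pop):
        xi = rng.gauss(0.0, 1.0)
        pop_x.append(xi)
        pop_y.append(1 if rng.random() < sigmoid(beta0 + beta1 * xi) else 0)
    tau = sum(pop_y) / n_pop  # population event fraction, assumed known here

    # Keep every event and a small random share of nonevents.
    x, y = [], []
    for xi, yi in zip(pop_x, pop_y):
        if yi == 1 or rng.random() < keep_nonevents:
            x.append(xi)
            y.append(yi)
    ybar = sum(y) / len(y)

    b0_hat, b1_hat = fit_logit(x, y)
    return {
        "tau": tau,
        "ybar": ybar,
        "n_sample": len(y),
        "b0_uncorrected": b0_hat,
        "b0_corrected": prior_correct_intercept(b0_hat, tau, ybar),
        "b1": b1_hat,
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

In this setup roughly 1% of the simulated population are events, yet the case-control sample keeps only a few thousand of the 200,000 observations. The uncorrected intercept is pushed up by about ln(1/0.02), while the corrected intercept and the slope land near the simulated true values. Using the realized fractions `tau` and `ybar` makes the correction equal to the log of the realized sampling ratio of events to nonevents.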