Similar literature
20 similar documents found (search time: 31 ms)
1.
Theory predicts that regression discontinuity (RD) provides valid causal inference at the cutoff score that determines treatment assignment. One purpose of this paper is to test RD's internal validity across 15 studies. Each of them assesses the correspondence between causal estimates from an RD study and a randomized control trial (RCT) when the estimates are made at the same cutoff point where they should not differ asymptotically. However, statistical error, imperfect design implementation, and a plethora of different possible analysis options, mean that they might nonetheless differ. We test whether they do, assuming that the bias potential is greater with RDs than RCTs. A second purpose of this paper is to investigate the external validity of RD by exploring how the size of the bias estimates varies across the 15 studies, for they differ in their settings, interventions, analyses, and implementation details. Both Bayesian and frequentist meta‐analysis methods show that the RD bias is below 0.01 standard deviations on average, indicating RD's high internal validity. When the study‐specific estimates are shrunken to capitalize on the information the other studies provide, all the RD causal estimates fall within 0.07 standard deviations of their RCT counterparts, now indicating high external validity. With unshrunken estimates, the mean RD bias is still essentially zero, but the distribution of RD bias estimates is less tight, especially with smaller samples and when parametric RD analyses are used.

2.
The ability of nonexperimental estimators to match impact estimates derived from random assignment is examined using data from the evaluation of two interdistrict magnet schools. As in previous within‐study comparisons, nonexperimental estimates differ from estimates based on random assignment when nonexperimental estimators are implemented without pretreatment measures of academic performance. With comparison groups consisting of students drawn from the same districts or districts with similar student body characteristics as the districts where treatment group students reside, using pretreatment test scores reduces the bias in nonexperimental methods between 64 and 96 percent. Adding pretreatment test scores does not achieve as much bias reduction when the comparison group consists of students drawn from districts with different student body characteristics than the treatment group students’ districts. The results suggest that using pretreatment outcome measures and comparison groups that are geographically aligned with the treatment group greatly improves the performance of nonexperimental estimators.

3.
Evaluations of the impact of social programs are often carried out in multiple sites, such as school districts, housing authorities, local TANF offices, or One‐Stop Career Centers. Most evaluations select sites purposively following a process that is nonrandom. Unfortunately, purposive site selection can produce a sample of sites that is not representative of the population of interest for the program. In this paper, we propose a conceptual model of purposive site selection. We begin with the proposition that a purposive sample of sites can usefully be conceptualized as a random sample of sites from some well‐defined population, for which the sampling probabilities are unknown and vary across sites. This proposition allows us to derive a formal, yet intuitive, mathematical expression for the bias in the pooled impact estimate when sites are selected purposively. This formula helps us to better understand the consequences of selecting sites purposively, and the factors that contribute to the bias. Additional research is needed to obtain evidence on how large the bias tends to be in actual studies that select sites purposively, and to develop methods to increase the external validity of these studies. © 2012 by the Association for Public Policy Analysis and Management.

4.
This paper analyzes 12 recent within‐study comparisons contrasting causal estimates from a randomized experiment with those from an observational study sharing the same treatment group. The aim is to test whether different causal estimates result when a counterfactual group is formed, either with or without random assignment, and when statistical adjustments for selection are made in the group from which random assignment is absent. We identify three studies comparing experiments and regression‐discontinuity (RD) studies. They produce quite comparable causal estimates at points around the RD cutoff. We identify three other studies where the quasi‐experiment involves careful intact group matching on the pretest. Despite the logical possibility of hidden bias in this instance, all three cases also reproduce their experimental estimates, especially if the match is geographically local. We then identify two studies where the treatment and nonrandomized comparison groups manifestly differ at pretest but where the selection process into treatment is completely or very plausibly known. Here too, experimental results are recreated. Two of the remaining studies result in correspondent experimental and nonexperimental results under some circumstances but not others, while two others produce different experimental and nonexperimental estimates, though in each case the observational study was poorly designed and analyzed. Such evidence is more promising than what was achieved in past within‐study comparisons, most involving job training. Reasons for this difference are discussed. © 2008 by the Association for Public Policy Analysis and Management.

5.
The sharp regression discontinuity design (RDD) has three key weaknesses compared to the randomized clinical trial (RCT). It has lower statistical power, it is more dependent on statistical modeling assumptions, and its treatment effect estimates are limited to the narrow subpopulation of cases immediately around the cutoff, which is rarely of direct scientific or policy interest. This paper examines how adding an untreated comparison to the basic RDD structure can mitigate these three problems. In the example we present, pretest observations on the posttest outcome measure are used to form a comparison RDD function. To assess its performance as a supplement to the basic RDD, we designed a within‐study comparison that compares causal estimates and their standard errors for (1) the basic posttest‐only RDD, (2) a pretest‐supplemented RDD, and (3) an RCT chosen to serve as the causal benchmark. The two RDD designs are constructed from the RCT, and all analyses are replicated with three different assignment cutoffs in three American states. The results show that adding the pretest makes functional form assumptions more transparent. It also produces causal estimates that are more precise than in the posttest‐only RDD, but that are nonetheless larger than in the RCT. Neither RDD version shows much bias at the cutoff, and the pretest‐supplemented RDD produces causal effects in the region beyond the cutoff that are very similar to the RCT estimates for that same region. Thus, the pretest‐supplemented RDD improves on the standard RDD in multiple ways that bring causal estimates and their standard errors closer to those of an RCT, not just at the cutoff, but also away from it.

6.
Political scientists often cite the importance of mechanism‐specific causal knowledge, both for its intrinsic scientific value and as a necessity for informed policy. This article explains why two common inferential heuristics for mechanism‐specific (i.e., indirect) effects can provide misleading answers, such as sign reversals and false null results, even when linear regressions provide unbiased estimates of constituent effects. Additionally, this article demonstrates that the inferential difficulties associated with indirect effects can be ameliorated with the use of stratification, interaction terms, and the restriction of inference to subpopulations (e.g., the indirect effect on the treated). However, indirect effects are inherently not identifiable, even when randomized experiments are possible. The methodological discussion is illustrated using a study on the indirect effect of Islamic religious tradition on democracy scores (due to the subordination of women).

7.
This paper analyzes the effect of a change in the status of housing equity as a protected asset for Medicaid long‐term care payment eligibility. A difference‐in‐difference‐in‐differences strategy is employed to estimate the effect of the policy on the housing equity holdings of potentially treated individuals. Using a panel of unmarried homeowners, the policy induced treated individuals who were likely to require long‐term care to hold less housing equity by values of $82,000 to $193,000 relative to control individuals. This equates to relative reductions of 12 to 29 percent for treated individuals after the policy change. Similar effects are not observed when considering health measures less predictive of long‐term care services and for a sample of married households who were unlikely to be affected by the policy. These estimates confirm the importance of the housing asset as a shelter for Medicaid eligibility.

8.
While the popularity of using the item count technique (ICT) or list experiment to obtain estimates of attitudes and behaviors subject to social desirability bias has increased in recent years among political scientists, many of the empirical properties of the technique remain untested. In this paper, we explore whether estimates are biased due to the different list lengths provided to control and treatment groups rather than due to the substance of the treatment items. By using face-to-face survey data from national probability samples of households in Uruguay and Honduras, we assess how effective the ICT is in the context of face-to-face surveys (where social desirability bias should be strongest) and in developing contexts (where literacy rates raise questions about the capability of respondents to engage in the cognitively taxing process required by the ICT). We find little evidence that the ICT overestimates the incidence of behaviors and instead find that the ICT provides extremely conservative estimates of high incidence behaviors. Thus, the ICT may be more useful for detecting low prevalence attitudes and behaviors and may overstate social desirability bias when the technique is used for higher frequency socially desirable attitudes and behaviors. However, we do not find strong evidence of variance in deflationary effects across common demographic subgroups, suggesting that multivariate estimates using the ICT may not be biased.
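
For readers unfamiliar with the technique, the standard list-experiment estimator can be sketched as follows. This is a minimal illustration of the usual difference-in-means estimator, not the authors' analysis; the function name and the response counts are hypothetical.

```python
from statistics import mean


def list_experiment_estimate(control_counts, treatment_counts):
    """Estimate the prevalence of the sensitive item.

    Control respondents report how many of J baseline items apply to them;
    treatment respondents see the same J items plus the sensitive item.
    The difference in mean counts estimates the share endorsing the
    sensitive item.
    """
    return mean(treatment_counts) - mean(control_counts)


# Hypothetical item counts, invented purely for illustration.
control = [1, 2, 2, 3]      # mean count = 2.0
treatment = [2, 2, 3, 3]    # mean count = 2.5
estimate = list_experiment_estimate(control, treatment)
print(estimate)
```

If the longer list itself changes how respondents count baseline items, this difference no longer isolates the sensitive item, which is the kind of list-length effect the abstract describes testing.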

9.
This paper uses meta‐analysis to investigate whether random assignment (or experimental) evaluations of voluntary government‐funded training programs for the disadvantaged have produced different conclusions than nonexperimental evaluations. Information includes several hundred estimates from 31 evaluations of 15 programs that operated between 1964 and 1998. The results suggest that experimental and nonexperimental evaluations yield similar conclusions about the effectiveness of training programs, but that estimates of average effects for youth and possibly men might have been larger in experimental studies. The results also suggest that variation among nonexperimental estimates of program effects is similar to variation among experimental estimates for men and youth, but not for women (for whom it seems to be larger), although small sample sizes make the estimated differences somewhat imprecise for all three groups. The policy implications of the findings are discussed. © 2006 by the Association for Public Policy Analysis and Management.

10.
We use a field experiment to evaluate the impact of two informational get‐out‐the‐vote campaigns to boost female electoral participation in rural areas of Paraguay. We find that public rallies had a small and insignificant effect on either registration or voter turnout in the 2013 presidential elections. Households that received door‐to‐door canvassing treatment were 4.6 percentage points more likely to vote. Experimental variation on the intensity of the treatment at the locality level allows us to estimate spillover effects, which are present in localities that are geographically more concentrated, which may favor social interactions and diffusion of information. Reinforcement effects on the already treated population are twice as large as diffusion effects on the untreated. Our results underscore the importance of taking into account urbanization patterns when designing informational campaigns.

11.
In real‐world bureaucratic encounters the Weberian goal of perfect impersonal administration is not completely attained and unfairness sometimes results. Theories of bias attribute unfairness to social characteristics such as income, education, ethnicity, and gender. A random theory characterizes unfairness as the result of idiosyncratic conditions that give everyone an equal probability of being treated unfairly regardless of their social characteristics. In Latvia, bias would be expected on grounds of ethnicity as well as social characteristics, since its population is divided politically by citizenship, language, and ethnicity as well as socioeconomic characteristics. Survey data from the New Baltic Barometer shows that a majority of both Latvians and Russians expect fair treatment in bureaucratic encounters and multivariate statistical analysis confirms the random hypothesis. Insofar as unfair treatment occurs it tends to be distributed according to idiosyncratic circumstances rather than being the systematic fate of members of a particular social group. The evidence indicates that the professional norms and training of service deliverers are more important in bureaucratic encounters than individual attributes of claimants, even in a clearly divided society.

12.
A large literature has developed in which labor market contracts are used to estimate the value of a statistical life (VSL). Reported estimates of the VSL vary substantially, from less than $100,000 to more than $25 million. This research uses meta‐analysis to quantitatively assess the VSL literature. Results from existing studies are pooled to identify the systematic relationships between VSL estimates and each study's particular features, such as the sample composition and research methods. This meta‐analysis suggests that a VSL range of approximately $1.5 million to $2.5 million (in 1998 dollars) is what can be reasonably inferred from past labor‐market studies when “best practice” assumptions are invoked. This range is considerably below many previous qualitative reviews of this literature. © 2002 by the Association for Public Policy Analysis and Management.

13.
This article argues that a key step in King's iterative approach to R x C ecological inference problems (the aggregation of groups into broad conglomerate categories) can introduce problems of aggregation bias and multimodality into data, inducing model violations. As a result, iterative EI estimates can be considerably biased, even when the original data conform to the assumptions of the model. I demonstrate this problem intuitively and through simulations, show the conditions under which it is likely to arise, and illustrate it with the example of Coloured voting during the 1994 elections in South Africa. I then propose an easy fix to the problem, demonstrating the usefulness of the fix both through simulations and in the specific South African context.

14.
In principle, experiments offer a straightforward method for social scientists to accurately estimate causal effects. However, scholars often unwittingly distort treatment effect estimates by conditioning on variables that could be affected by their experimental manipulation. Typical examples include controlling for posttreatment variables in statistical models, eliminating observations based on posttreatment criteria, or subsetting the data based on posttreatment variables. Though these modeling choices are intended to address common problems encountered when conducting experiments, they can bias estimates of causal effects. Moreover, problems associated with conditioning on posttreatment variables remain largely unrecognized in the field, which we show frequently publishes experimental studies using these practices in our discipline's most prestigious journals. We demonstrate the severity of experimental posttreatment bias analytically and document the magnitude of the potential distortions it induces using visualizations and reanalyses of real‐world data. We conclude by providing applied researchers with recommendations for best practice.

15.
Most evaluations are still quasi‐experimental and most recent quasi‐experimental methodological research has focused on various types of propensity score matching to minimize conventional selection bias on observables. Although these methods create better‐matched treatment and comparison groups on observables, the issue of selection on unobservables still looms large. Thus, in the absence of being able to run randomized controlled trials (RCTs) or natural experiments, it is important to understand how well different regression‐based estimators perform in terms of minimizing pure selection bias, that is, selection on unobservables. We examine the relative magnitudes of three sources of pure selection bias: heterogeneous response bias, time‐invariant individual heterogeneity (fixed effects [FEs]), and intertemporal dependence (autoregressive process of order one [AR(1)]). Because the relative magnitude of each source of pure selection bias may vary in different policy contexts, it is important to understand how well different regression‐based estimators handle each source of selection bias. Expanding simulations that have their origins in the work of Heckman, LaLonde, and Smith (1999), we find that difference‐in‐differences (DID) using equidistant pre‐ and postperiods and FEs estimators are less biased and have smaller standard errors in estimating the Treatment on the Treated (TT) than other regression‐based estimators. Our data analysis using the Job Training Partnership Act (JTPA) program replicates our simulation findings in estimating the TT.

16.
Rather than exhibiting bias or open‐minded reasoning at baseline, we argue that information processing is motivated by whatever goals a context makes salient. Thus, if politics feels like debate, people will be motivated to argue for their side. If politics feels like deliberation, they will be motivated to seek consensus through open‐minded processing. Results from three experiments demonstrate: (1) Politics evokes thoughts similar to conflictual contexts and dissimilar from deliberative contexts. (2) Consequently, information labeled “political” primes the motivation to counterargue, leading to opinion polarization. Absent such labeling, no such motivation is evident, explaining why bias is common but not inherent to politics. (3) Despite this capacity for bias, people can be motivated to actively process and accept counterattitudinal information by simply making the value of open‐mindedness salient. This suggests open‐minded discourse is possible even absent motivation to evaluate information accurately. We conclude by discussing the implications of our research for political discourse.

17.
Randomized experiments provide unbiased estimates of treatment effects, but are costly and time consuming. We demonstrate how a randomized experiment can be leveraged to measure selection bias by conducting a subsequent observational study that is identical in every way except that subjects choose their treatment: a quasi‐doubly randomized preference trial (quasi‐DRPT). Researchers first strive to think of and measure all possible confounders and then determine how well these confounders as controls can reduce or eliminate selection bias. We use a quasi‐DRPT to study the effect of class time on student performance in an undergraduate introductory microeconomics course at a large public university, illustrating its required design elements: experimental and choice arms conducted in the same setting with identical interventions and measurements, and all confounders measured prospectively to treatment assignment or choice. Quasi‐DRPTs augment randomized experiments in real‐world settings where participants choose their treatments.

18.
With an unrepresentative sample, the estimate of a causal effect may fail to characterize how effects operate in the population of interest. What is less well understood is that conventional estimation practices for observational studies may produce the same problem even with a representative sample. Causal effects estimated via multiple regression differentially weight each unit's contribution. The “effective sample” that regression uses to generate the estimate may bear little resemblance to the population of interest, and the results may be nonrepresentative in a manner similar to what quasi‐experimental methods or experiments with convenience samples produce. There is no general external validity basis for preferring multiple regression on representative samples over quasi‐experimental or experimental methods. We show how to estimate the “multiple regression weights” that allow one to study the effective sample. We discuss alternative approaches that, under certain conditions, recover representative average causal effects. The requisite conditions cannot always be met.
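
The idea of an "effective sample" can be illustrated with a short sketch. It assumes one common formulation, which may differ from the authors' exact procedure: each unit's weight is the squared residual from a linear regression of the treatment on the covariates. The function name, the single-covariate setup, and the data are hypothetical.

```python
def effective_sample_weights(treatment, covariate):
    """Squared residuals from an OLS fit of treatment on one covariate.

    Under the assumed formulation, units whose treatment status is well
    predicted by the covariate receive weights near zero and contribute
    little to the regression estimate of the treatment effect.
    """
    n = len(treatment)
    mean_x = sum(covariate) / n
    mean_d = sum(treatment) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxd = sum((x - mean_x) * (d - mean_d)
              for x, d in zip(covariate, treatment))
    slope = sxd / sxx
    intercept = mean_d - slope * mean_x
    return [(d - (intercept + slope * x)) ** 2
            for x, d in zip(covariate, treatment)]


# Hypothetical data: in stratum x=0 half the units are treated;
# in stratum x=1 every unit is treated.
covariate = [0, 0, 0, 0, 1, 1, 1, 1]
treatment = [0, 1, 0, 1, 1, 1, 1, 1]
weights = effective_sample_weights(treatment, covariate)
print(weights)
```

In this toy example all of the weight falls on the stratum where treatment varies, so the regression estimate would describe that stratum rather than the full sample.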

19.
Many enduring questions in international relations theory focus on power relations, so it is important that scholars have a good measure of relative power. The standard measure of relative military power, the capability ratio, is barely better than random guessing at predicting militarized dispute outcomes. We use machine learning to build a superior proxy, the Dispute Outcome Expectations (DOE) score, from the same underlying data. Our measure is an order of magnitude better than the capability ratio at predicting dispute outcomes. We replicate Reed et al. (2008) and find, contrary to the original conclusions, that the probability of conflict is always highest when the state with the least benefits has a preponderance of power. In replications of 18 other dyadic analyses that use power as a control, we find that replacing the standard measure with DOE scores usually improves both in‐sample and out‐of‐sample goodness of fit.

20.
This article evaluates the feasibility of performing natural resource damage assessments under the current Superfund legislation. Using the analyses developed for two recent cases, it explains the sources of the substantial divergences between plaintiffs' and defendants' estimates of these damages. Three factors explain the differences in damage estimates: (1) the time horizon used and treatment of capitalization effects of past damages; (2) the extent of the market assumed in estimating the effects of a release of hazardous wastes on the demand for the affected natural resource; and (3) the character and availability of substitutes for the resource involved.
