Current Status: Working Paper
Winner's Curse Drives False Promises in Data-Driven Decisions: A Case Study in Refugee Matching
with Hamsa Bastani and Osbert Bastani
Abstract: A major challenge in data-driven decision-making is accurate policy evaluation, i.e., guaranteeing that a learned decision-making policy achieves the promised benefits. A popular strategy is model-based policy evaluation, which estimates a model from data to infer counterfactual outcomes. This strategy is known to produce overly optimistic estimates of the true benefit due to the winner's curse. We searched the recent literature on data-driven decision-making, identifying a sample of 55 papers published in Management Science in the past decade; all but two relied on this flawed methodology. Several common justifications are provided: (1) the estimated models are accurate, stable, and well-calibrated, (2) the historical data uses random treatment assignment, (3) the model family is well-specified, and (4) the evaluation methodology uses sample splitting. Unfortunately, we show that no combination of these justifications avoids the winner's curse. First, we provide a theoretical analysis demonstrating that the winner's curse can cause large, spurious reported benefits even when all of these justifications hold. Second, we perform a simulation study based on the recent and consequential data-driven refugee matching problem. We construct a synthetic refugee matching environment, calibrated to closely match the real setting, but designed so that no assignment policy can improve expected employment over random assignment. Model-based methods report large, stable gains of around 60% even though the true effect is zero; these gains are on par with the improvements of 22-75% reported in the literature. Our results provide strong evidence against model-based evaluation.
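The mechanism is easy to reproduce. The following minimal sketch is our own illustration, not the paper's actual simulation; all variable names and parameters are made up. By construction the outcome is independent of treatment, yet model-based evaluation on a held-out split still reports a positive gain, because the learned policy selects the larger of two noisy predictions and the average of a maximum exceeds the maximum of the averages:

    # Illustrative sketch (not the paper's simulation): model-based evaluation
    # reports a spurious gain even though no policy can beat random assignment.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n, d = 2000, 10
    X = rng.normal(size=(n, d))                      # covariates
    T = rng.integers(0, 2, size=n)                   # randomized treatment assignment
    Y = X @ rng.normal(size=d) + rng.normal(size=n)  # outcome does not depend on T

    train = np.arange(n) < n // 2                    # sample splitting, as in justification (4)
    test = ~train
    models = [LinearRegression().fit(X[train & (T == t)], Y[train & (T == t)])
              for t in (0, 1)]

    preds = np.column_stack([m.predict(X[test]) for m in models])
    policy = preds.argmax(axis=1)                    # learned policy: pick the better-looking arm

    model_value = preds[np.arange(test.sum()), policy].mean()  # model-based value of the policy
    random_value = preds.mean()                                # model-based value of random assignment
    print(f"reported gain over random: {model_value - random_value:.3f}  (true gain: 0)")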
Current Status: Working Paper
Beating the Winner’s Curse via Inference-Aware Policy Optimization
with Hamsa Bastani and Osbert Bastani
Abstract: There has been a surge of recent interest in automatically learning policies that target treatment decisions based on rich individual covariates. In addition, practitioners want confidence that the learned policy performs better than the incumbent policy according to downstream policy evaluation. However, due to the winner's curse (an issue where the policy optimization procedure exploits prediction errors rather than finding actual improvements), predicted performance improvements are often not substantiated by downstream policy evaluation. To address this challenge, we propose a novel strategy called inference-aware policy optimization, which modifies policy optimization to account for how the policy will be evaluated downstream. Specifically, it optimizes not only the estimated objective value, but also the probability that the estimate of the policy's improvement passes a significance test during downstream policy evaluation. We mathematically characterize the Pareto frontier of policies according to the tradeoff between these two goals. Based on our characterization, we design a policy optimization algorithm that estimates the Pareto frontier using machine learning models; the decision-maker can then select the policy that achieves their desired tradeoff, after which policy evaluation can be performed on the test set as usual. Finally, we perform simulations to illustrate the effectiveness of our methodology.
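A toy version of the two objectives conveys the tradeoff. This is our own illustration, not the paper's algorithm: the candidate policies, their estimated gains, and their standard errors are synthetic, whereas the paper estimates the frontier with machine learning models. For each candidate we compute the probability of passing a one-sided significance test (assuming a roughly unbiased test-set estimate) and keep the Pareto-optimal ones:

    # Toy sketch of the estimated-gain vs. pass-probability tradeoff (illustrative only).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    z_crit = norm.ppf(0.95)                      # one-sided test at level 0.05

    est_gain = rng.normal(0.5, 0.5, size=50)     # estimated improvements over incumbent
    std_err = rng.uniform(0.1, 1.0, size=50)     # test-set standard errors

    # P(pass) = P(gain_hat / se > z_crit) when the estimate is centered at the true gain.
    p_pass = 1 - norm.cdf(z_crit - est_gain / std_err)

    # A candidate is Pareto-optimal if no other candidate weakly dominates it on
    # both objectives and strictly beats it on at least one.
    pareto = [i for i in range(50)
              if not any((est_gain[j] >= est_gain[i]) & (p_pass[j] >= p_pass[i])
                         & ((est_gain[j] > est_gain[i]) | (p_pass[j] > p_pass[i]))
                         for j in range(50))]

    for i in sorted(pareto, key=lambda i: est_gain[i]):
        print(f"policy {i:2d}: est. gain {est_gain[i]:5.2f}, P(pass test) {p_pass[i]:.2f}")

The decision-maker then picks a point on the printed frontier, trading some estimated gain for a higher chance that the improvement survives the downstream test.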
Current Status: Working Paper
Additional Material: Link to preanalysis plan
Designing Algorithmic Recommendations to Achieve Human–AI Complementarity
with Jann Spiess
Abstract: Algorithms often assist, rather than replace, human decision-makers. However, these algorithms typically address the problem the decision-maker faces without modeling how their outputs cause the human to take different decisions. This discrepancy between the design and the role of algorithmic assistants becomes particularly apparent in light of empirical evidence suggesting that algorithmic assistants often fail to improve human decisions. In this article, we formalize the design of recommendation algorithms that assist human decision-makers without making restrictive assumptions about how humans use these recommendations. We formulate an algorithmic design problem that leverages the potential-outcomes framework from causal inference to model the effect of recommendations on a human's binary treatment choice. We introduce a monotonicity assumption that gives intuitive structure to the feasible responses the human could have to the recommendation. Under this monotonicity assumption, we can express the human's response to an algorithmic recommendation in terms of their compliance with the algorithm and the decision they would take if unassisted, both of which can be estimated from the human's decision data. We showcase our framework using data from an online hiring experiment to explain why subjects who received a recommendation that complemented the structure of their private information outperformed counterparts who received the optimal decision algorithm as a recommendation.
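In potential-outcomes notation, which we assume here based on the abstract's description (the paper's exact notation may differ), write D(r) in {0,1} for the treatment the human chooses when shown recommendation r in {0,1}. The response then decomposes into the unassisted decision plus a compliance term:

    \[
      \underbrace{D(1) \ge D(0)}_{\text{monotonicity}}
      \quad\Longrightarrow\quad
      D(r) \;=\; \underbrace{D(0)}_{\text{unassisted decision}}
      \;+\; r \cdot \underbrace{\bigl(D(1) - D(0)\bigr)}_{\text{compliance}} ,
    \]

where monotonicity ensures the compliance term is a 0/1 indicator, so the only units whose decision the recommendation moves are compliers with D(0) = 0 and D(1) = 1.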
Current Status: Working Paper
(previously accepted, presented, and published as an extended abstract at EC’23)
(Minor Revision at Management Science)
Algorithmic Assistance with Recommendation-Dependent Preferences
with Jann Spiess
Abstract: When we use algorithms to produce risk assessments, we typically think of these predictions as providing helpful input to human decisions, such as when risk scores are presented to judges or doctors. But when a decision-maker obtains algorithmic assistance, they may not only react to the information it conveys. The decision-maker may view the algorithm's input as recommending a default action, making it costly for them to deviate, such as when a judge is reluctant to overrule a high-risk assessment of a defendant or a doctor fears the consequences of deviating from recommended procedures. In this article, we consider the effect and design of algorithmic recommendations when they affect choices not just by shifting beliefs, but also by altering preferences. We motivate our model with institutional factors, such as a desire to avoid audits, as well as with well-established models from behavioral science that predict loss aversion relative to a reference point, which here is set by the algorithm. We show that recommendation-dependent preferences create inefficiencies in which the decision-maker is overly responsive to the recommendation. As a potential remedy, we discuss algorithms that strategically withhold recommendations, and we show how they can improve the quality of final decisions.
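A stylized one-line version of such a payoff (our notation, not necessarily the paper's model) adds a deviation cost, anchored at the recommendation, to the usual belief-based expected utility:

    \[
      U(a \mid r) \;=\; \mathbb{E}\bigl[\,u(a,\omega) \,\big|\, \text{signal},\, r\,\bigr]
      \;-\; \lambda \,\mathbf{1}\{a \neq r\}, \qquad \lambda \ge 0,
    \]

where a is the action, omega the unknown state, and lambda captures the reference-dependent cost of deviating from the recommended default (audit risk, loss aversion); lambda = 0 recovers the purely informational benchmark in which the recommendation only shifts beliefs.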
Current Status: Working Paper
(previously accepted, presented, and published as an extended abstract at FAccT’22)
Additional Material: Link to preanalysis plan
On the Fairness of Machine-Assisted Human Decisions
with Talia Gillis and Jann Spiess
Abstract: When machine-learning algorithms are deployed in high-stakes decisions, we want to ensure that their deployment leads to fair and equitable outcomes. This concern has motivated a fast-growing literature that focuses on diagnosing and addressing disparities in machine predictions. However, many machine predictions are deployed to assist in decisions where a human decision-maker retains the ultimate decision authority. In this article, we therefore consider, in a formal model and in a lab experiment, how properties of machine predictions affect the resulting human decisions. In our formal model of statistical decision-making, we show that the inclusion of a biased human decision-maker can reverse common relationships between the structure of the algorithm and the quality of resulting decisions. Specifically, we document that excluding information about protected groups from the prediction may fail to reduce, and may even increase, ultimate disparities. In the lab experiment, we demonstrate how predictions informed by gender-specific information can reduce average gender disparities in decisions. While our concrete theoretical results rely on specific assumptions about the data, algorithm, and decision-maker, and the experiment focuses on a particular prediction task, our findings show more broadly that any study of critical properties of complex decision systems, such as the fairness of machine-assisted human decisions, should go beyond focusing on the underlying algorithmic predictions in isolation.
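A stylized simulation conveys the direction of the effect. This is our own construction, chosen only for illustration; the paper's formal model and experiment differ. Here a signal is systematically deflated for one group, a group-aware prediction corrects the shift while a group-blind one cannot, and a biased human's adjustment then compounds rather than offsets the uncorrected shift:

    # Stylized construction (not the paper's model): removing group information
    # from the prediction *increases* the final disparity in this example.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    G = rng.integers(0, 2, size=n)                   # protected group indicator
    Q = rng.normal(size=n)                           # true quality, identical across groups
    Z = Q - 0.8 * G + rng.normal(scale=0.5, size=n)  # signal is deflated for G = 1

    # Group-aware prediction corrects the known shift; group-blind cannot.
    # (A full model would shrink toward the prior; we keep the debiased signal for simplicity.)
    aware = Z + 0.8 * G      # unbiased across groups
    blind = Z                # under-predicts group 1 by 0.8 on average

    bias = -0.3              # human's discriminatory adjustment against G = 1
    for name, pred in [("group-aware", aware), ("group-blind", blind)]:
        decision = pred + bias * G                   # human combines prediction and own bias
        gap = decision[G == 0].mean() - decision[G == 1].mean()
        print(f"{name}: disparity in final decisions = {gap:.2f}")
    # group-aware: ~0.30 (human bias only); group-blind: ~1.10 (signal shift + bias)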
Current Status: Published by The Electronic Journal of Combinatorics in 2022 (link)
On Distinct Distances Between a Variety and a Point Set
with Mohamed Omar
Abstract: We consider the problem of determining the number of distinct distances between two point sets in the real plane, where one point set of size m lies on a real algebraic curve of fixed degree r, and the other point set of size n is arbitrary. We generalize lower bounds formulated by Pohoata and Sheffer to a much looser set of restrictions on the point set arrangement. This complements the work of Pach and de Zeeuw.