John Cochrane posted a nice piece on replication in economics. Worth checking out.
On replication in economics. Just in time for bar-room discussions at the annual meetings.
“I have a truly marvelous demonstration of this proposition which this margin is too narrow to contain.” – Fermat
“I have a truly marvelous regression result, but I can’t show you the data and won’t even show you the computer program that produced the result” – Typical paper in economics and finance.
Science demands transparency. Yet much research in economics and finance uses secret data. The journals publish results and conclusions, but the data and sometimes even the programs are not available for review or inspection. Replication, even just checking what the author(s) did given their data, is getting harder.
Quite often, when one digs in, empirical results are nowhere near as strong as the papers make them out to be.
- Simple coding errors are not unknown. Reinhart and Rogoff are a famous example — which only came to light because they were honest and ethical and posted their data.
- There are data errors.
- Many results are driven by one or two observations, which at least tempers the interpretation of the results. Often a simple plot of the data, not provided in the paper, reveals that fact.
- Standard-error computation is a dark art, producing t-statistics of 2.11 and the requisite two or three stars suspiciously often.
- Small changes in sample period or specification destroy many “facts.”
- Many regressions involve a large set of extra right-hand variables, with no strong reason for inclusion or exclusion, and the fact is often quite sensitive to those choices. Which instruments you use and how you transform the variables changes the results.
- Many large-data papers take differences, run difference-in-differences, add dozens of controls and fixed effects, and so forth, throwing out most of the variation in the data in the admirable quest for cause-and-effect interpretability. Alas, that procedure can load the results up on measurement errors, and slightly different, equally plausible variations can produce very different results.
- There is often a lot of ambiguity in how to define variables, which proxies to use, which data series to use, and so forth, and equally plausible variations change the results.
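The point about one or two observations driving a result is easy to see in a toy example. Below is a minimal sketch with hypothetical, simulated data: thirty observations with no true relationship plus one extreme point. With the outlier, the OLS slope looks strongly “significant”; drop it and the effect vanishes. The data and the helper function are invented for illustration, not taken from any paper.

```python
# Illustrative sketch (simulated data): one influential observation
# can manufacture a "significant" regression result.
import numpy as np

rng = np.random.default_rng(0)

def slope_and_t(x, y):
    """OLS slope and its t-statistic for y = a + b*x + e."""
    n = len(x)
    xd = x - x.mean()
    b = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    se = np.sqrt((resid ** 2).sum() / (n - 2) / (xd ** 2).sum())
    return b, b / se

# Thirty observations with no true relationship between x and y...
x = rng.normal(size=30)
y = rng.normal(size=30)
# ...plus one extreme point that dominates the fit.
x_full = np.append(x, 10.0)
y_full = np.append(y, 10.0)

b_full, t_full = slope_and_t(x_full, y_full)
b_drop, t_drop = slope_and_t(x, y)
print(f"with outlier:    slope={b_full:.2f}, t={t_full:.1f}")
print(f"without outlier: slope={b_drop:.2f}, t={t_drop:.1f}")
```

A simple scatter plot would expose the problem immediately, which is exactly why the absence of such plots, and of the underlying data, matters.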
I have seen many examples of these problems, in papers published in top journals. Many facts that you think are facts are not facts. Yet as more and more papers use secret data, it’s getting harder and harder to know.
The solution is pretty obvious: to be considered peer-reviewed “scientific” research, authors should post their programs and data. If the world cannot see your lab methods, you have an anecdote, an undocumented claim; you don’t have research. An empirical paper without data and programs is like a theoretical paper without proofs.