Chris Stucchio

Boosting as a scheme for transfer learning

Here's a scenario that I believe to be common. I've got a dataset I've been collecting over time, with features \(x_1, \ldots, x_m\). This dataset will generally represent decisions I want to make at a certain time. This data is not a time series; it's just data I happen to have …

Calibrating a classifier when the base rate changes

In a previous job, I built a machine learning system to detect financial fraud. Fraud was a big problem at the time - for simplicity of having nice round numbers, suppose 10% of attempted transactions were fraudulent. My machine learning system worked great - as a further set of made-up round numbers …

Shareholder Short-Termism Theory has Died of COVID-19

It's become a popular meme that "shareholders only care about the next quarter". Lots of people make arguments like this - for example, Jamie Dimon and Warren Buffett. As the meme goes, shareholders only care about the next quarter of earnings, and CEOs make decisions accordingly - sacrificing long-term profitability to …

Scalably Detecting Odd-looking Histograms

A lot of suspicious behavior can be detected simply by looking at a histogram. Here's a nice example. There's a paper, "Distributions of p-values smaller than .05 in Psychology: What is going on?", which attempts to characterize the level of data manipulation performed in academic psychology. Now under normal circumstances …

Isotonic: A Python package for doing fancier versions of isotonic regression

Frequently in data science, we have a relationship between X and y where (probabilistically) y increases as X does. The relationship is often not linear, but rather reflects something more complex. Here's an example of a relationship like this:

In this plot of synthetic data we have a non-linear but increasing …

Cost Matters: Why Lambda School should have a lower success rate than college

Lambda School has recently come under fire by the mainstream media for having success rates smaller than 100%, as well as for having a founder who is a nerd. The articles imply that Lambda School is somehow ripping off its students - possibly by, um, tricking hedge funds into paying for …

Notes on setting up a Data Science app on Azure

I have recently been working on setting up a trading strategy and running it in the cloud. Although I haven't used Azure before, I wanted to try it out - some of the data science features that Microsoft advertises look pretty nice. This post is not of general interest, and most …

Backtest your SQL queries - they are models too

I was recently discussing a project with a younger data scientist and I noticed a curious mismatch in our language. We had an API that we wanted to impose rate limits on. We wanted to ensure that 99% of our good customers have a good experience and never hit the …

The Final Stage of Grief (about bad data) is Acceptance

I recently gave a talk at the Fifth Elephant 2019. The talk was a discussion about how to use math to handle unfixably bad data. The slides are available here. Go check them out.

Don't believe the hype: Basic Income reduces labor supply by 10%, which is a lot

With Andrew Yang's presidential candidacy moving forward, people are discussing basic income again. One common meme about a Basic Income is that by removing the implicit high marginal tax rates that arise from the withdrawal of welfare benefits, disincentives for labor would be reduced and therefore a Basic Income would …
