Selected blog posts

Posts listed on the homepage are the more popular or interesting ones. Click here for the chronological blog listing.

Boosting as a scheme for transfer learning

Here's a scenario that I believe to be common. I've got a dataset I've been collecting over time, with features \(x_1, \ldots, x_m\). This dataset will generally represent decisions I want to make at a certain time. This data is not a timeseries, it's just data I happen to have …

more ...

Calibrating a classifier when the base rate changes

In a previous job, I built a machine learning system to detect financial fraud. Fraud was a big problem at the time - for simplicity of having nice round numbers, suppose 10% of attempted transactions were fraudulent. My machine learning system worked great - as a further set of made-up round numbers …

more ...

Wingify releases Bayesian A/B tester

I've written a number of posts here about A/B testing, and readers have probably observed that I favor the Bayesian approach. I'm very happy to announce that Wingify (my employer) has released SmartStats - a fully Bayesian A/B testing engine. I've always maintained that you should A/B test …

more ...

Don't use Hadoop - your data isn't that big

image possibly inspired by this post

"So, how much experience do you have with Big Data and Hadoop?" they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I'm basically a big data neophyte - I know the concepts, I've written code, but never at …

more ...

A High Frequency Trader's Apology, Pt 1

I'm a former high frequency trader. And following the tradition of G.H. Hardy, I feel the need to make an apology for my former profession. Not an apology in the sense of a request for forgiveness of wrongs performed, but merely an intellectual justification of a field which is …

more ...
