How to measure a changing conversion rate (with Python code)

As the owner of the spamblog http://www.iwishiwastaller.com, I've run into the following problem. I'm selling some height-enhancing pills full of organic, free-range snake oil. I've come up with several different calls to action:

  • Tired of finding pants that fit? Click here for the solution.
  • Click …
more ...




Postgres NOTIFY for cache busting and more

"There are only two hard things in Computer Science: cache invalidation and naming things."

Phil Karlton

For those of us using Postgres as a data store, cache invalidation has become significantly easier. Postgres provides the NOTIFY command, which can be used to tell the cache when an entry needs to be invalidated.
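As a rough illustration of the idea, here is a minimal sketch of a listener that drops cache entries when a notification arrives. It assumes the psycopg2 driver, a channel named cache_invalidation, a plain dict standing in for the cache, and a JSON payload of the form {"key": ...}. All of these are illustrative choices, not anything Postgres prescribes.

```python
import json

# Hypothetical in-process cache; a real deployment might use memcached or Redis.
cache = {"user:1": {"name": "alice"}, "user:2": {"name": "bob"}}


def handle_notification(payload, cache):
    """Drop the cache entry named in a NOTIFY payload.

    Assumes the payload is JSON like {"key": "user:1"} (an illustrative format).
    Returns True if an entry was actually removed.
    """
    key = json.loads(payload)["key"]
    return cache.pop(key, None) is not None


def listen_forever(dsn, channel="cache_invalidation"):
    """Block on LISTEN and invalidate entries as notifications arrive.

    The channel name is an assumption; it is interpolated into SQL, so it
    must be a trusted identifier.
    """
    import select

    import psycopg2  # third-party driver, imported lazily
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    # Autocommit so that notifications are delivered without an open transaction.
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute(f"LISTEN {channel};")

    while True:
        # Wait up to 5 seconds for the connection's socket to become readable.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            handle_notification(notify.payload, cache)
```

On the database side, something like `NOTIFY cache_invalidation, '{"key": "user:1"}';` (typically issued from a trigger or from the code that writes the row) would cause the listener above to evict that entry.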

The old …

more ...

Compound Aggregates in Hadoop/Scalding

Consider the following problem. I have an extremely large number of servers, each of which uploads its logs to a Hadoop cluster. Each line of the log file contains a server IP address, and represents a single message in Hadoop. I'm investigating a network intrusion. One of my network admins …

more ...

Don't use Hadoop - your data isn't that big

image possibly inspired by this post

"So, how much experience do you have with Big Data and Hadoop?" they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I'm basically a big data neophyte - I know the concepts, I've written code, but never at …

more ...

java.lang.OutOfMemoryError, GC overhead limit exceeded

One annoying error which I often see when running Hadoop jobs is this:

java.lang.OutOfMemoryError: GC overhead limit exceeded

The cause of this error is that the JVM is spending most of its time inside the garbage collector while reclaiming very little memory. When this error …

more ...

Mechanical Turk and Error Correcting Codes

In a recent blog post, Panos Ipeirotis asks the following question (loosely paraphrased).

Consider a set of n oracles or mechanical turks, each with a probability q of returning an erroneous response. The capacity of a channel with error probability q is C(q,n), and information theoretic channel capacity …
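To make the setup concrete, here is a small sketch under simplifying assumptions: responses are binary, errors are independent, and each oracle behaves like a binary symmetric channel. Under those assumptions the textbook capacity of a single oracle is 1 - H(q), which is not necessarily the C(q,n) referred to above. The function names are made up for illustration.

```python
from math import comb, log2


def binary_entropy(q):
    """Entropy in bits of a coin that comes up heads with probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def bsc_capacity(q):
    """Capacity in bits per use of a binary symmetric channel with error rate q."""
    return 1.0 - binary_entropy(q)


def majority_vote_error(q, n):
    """Probability that a majority of n independent oracles is wrong.

    This is the error rate of the simplest code, repetition: ask the same
    question n times and take the majority answer. n must be odd so that
    ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    return sum(
        comb(n, k) * q**k * (1 - q) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

Comparing `majority_vote_error(q, n)` for growing n against `bsc_capacity(q)` gives a feel for how much redundancy simple repetition spends relative to what the channel would in principle allow.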

more ...