The magic of conjugate priors (for online learning)
In Bayesian reasoning, the fundamental problem is the following. Given a prior distribution $@p(x)$@ and some set of evidence $@E$@, compute the posterior distribution on $@x$@, namely $@p(x | E)$@. For example, $@x$@ might be the conversion rate of some email. Before you have any evidence, you might expect the conversion rate to lie somewhere between $@5\%$@ and $@50\%$@. After you have evidence, you update your belief: if you sent out thousands of emails and observed an empirical $@16.5\%$@ conversion rate, you are now reasonably confident that the true conversion rate lies roughly in the range of $@16\%-17\%$@.
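To make the prior-to-posterior update concrete, here is a minimal sketch that brute-forces the posterior on a grid of candidate conversion rates. Several things in it are assumptions for illustration rather than anything stated above: the prior is taken to be uniform on the 5% to 50% range, the evidence is taken to be 3,300 conversions out of 20,000 emails (an empirical 16.5%), and the names `grid_posterior` and `credible_interval` are hypothetical helpers.

```python
import math

def grid_posterior(successes, trials, lo=0.05, hi=0.50, points=2001):
    """Posterior over a conversion rate on a grid, assuming a uniform prior on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    # Binomial log-likelihood of the evidence at each candidate rate
    # (the binomial coefficient is constant in x, so it is dropped).
    log_like = [
        successes * math.log(x) + (trials - successes) * math.log(1 - x)
        for x in grid
    ]
    # Subtract the peak before exponentiating to avoid floating-point underflow.
    peak = max(log_like)
    # The uniform prior is a constant factor, so it cancels when normalizing.
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    return grid, [w / total for w in weights]

def credible_interval(grid, probs, mass=0.95):
    """Central interval containing roughly `mass` of the posterior probability."""
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for x, p in zip(grid, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = x
        if upper is None and cumulative >= 1 - tail:
            upper = x
            break
    return lower, upper

# Assumed evidence: 3,300 conversions out of 20,000 emails sent.
grid, probs = grid_posterior(successes=3300, trials=20000)
low, high = credible_interval(grid, probs)
print(f"95% credible interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval should land close to the 16% to 17% range described above. The grid approach works for a single parameter, but it recomputes everything from the full evidence each time, which is part of what a conjugate prior lets you avoid.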
In mathematics, a conjugate prior is defined as follows. Consider a family of probability distributions characterized by some parameter $@\theta$@ (possibly a single number, possibly a tuple). A prior is a conjugate prior, relative to a given likelihood for the evidence, if it is a member of this family and if every posterior distribution that can result from observing evidence is also a member of this family.
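As one illustration of the definition, consider the Beta family with success/failure evidence, a standard conjugate pair that is chosen here as an assumption and not taken from the text above. The parameter is the tuple $@\theta = (\alpha, \beta)$@, and observing $@s$@ successes and $@f$@ failures maps a Beta$@(\alpha, \beta)$@ prior to a Beta$@(\alpha + s, \beta + f)$@ posterior. The `Beta` class below is a hypothetical helper, a sketch rather than a reference implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    """A Beta(alpha, beta) belief about a success probability (hypothetical helper)."""
    alpha: float
    beta: float

    def update(self, successes, failures):
        # Conjugacy: the posterior is again a Beta distribution, so updating
        # only requires adding the observed counts to the parameters.
        return Beta(self.alpha + successes, self.beta + failures)

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

# Assumed prior: Beta(1, 1), which is uniform on [0, 1].
prior = Beta(1.0, 1.0)

# All of the evidence at once: 3,300 conversions and 16,700 non-conversions.
posterior = prior.update(3300, 16700)

# The same evidence arriving in two batches gives the same posterior.
online = prior.update(1000, 5000).update(2300, 11700)

print(posterior, online == posterior)
```

Because the posterior stays in the family, the whole belief is summarized by two numbers, and each new batch of evidence can be folded in without revisiting earlier data. That is what makes this structure convenient for online learning.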