This is a set of reading notes for *Probabilistic Programming and Bayesian Methods for Hackers*, Chapter 1, "The Philosophy of Bayesian Inference". The GitHub link is: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers .
Note: every tip taken from the book has a sign next to it.
All code examples in this post come from the book.
The Philosophy of Bayesian Inference
Bayesian inference is simply updating your beliefs after considering new evidence. We update our beliefs about an outcome; rarely can we be absolutely sure unless we rule out all other alternatives.
In the Bayesian worldview, probability is interpreted as the believability of an event, i.e. how confident we are that it occurs.
- Frequentists interpret probability as the long-run frequency of events.
- Bayesians interpret probability as a measure of belief. Note that beliefs are assigned individually, which means different people can hold conflicting beliefs.
- P(A|X) denotes the probability of A given the evidence X.
- For example:
  - P(A): the code likely has a bug in it.
  - P(A|X): the code passed all X tests; there still might be a bug, but its presence is less likely now.
Every time new evidence X arrives, we re-weight the prior probability to incorporate it. After this process, our guesses become less wrong, i.e. we try to be more right after each guess.
A frequentist will return a number, while a Bayesian will return a probability. For the example above: if asked "My code passes all X tests. Is my code bug-free?", a frequentist will answer "Yes." If asked "Often my code has bugs. (prior parameter) My code passed all X tests. Is my code bug-free?", a Bayesian will answer "Yes, with probability 0.8; no, with probability 0.2."
Denote N as the number of instances of evidence we possess. As we gather an infinite amount of evidence, say as N -> Infinity, our Bayesian results (often) align with frequentist results.
- So for large N, statistical inference is more or less objective.
- For small N, inference is much more unstable: frequentist estimates have more variance and larger confidence intervals. Because the Bayesian approach returns probabilities, it can preserve the uncertainty that reflects the instability of statistical inference on a small-N dataset.
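The small-N instability can be seen in a quick simulation (this sketch is not from the book; the true frequency 0.6 and the sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.6  # hypothetical true frequency of the event


def estimate_spread(n, trials=2000):
    """Std. dev. of the sample-frequency estimate across many repeated experiments."""
    estimates = rng.binomial(n, true_p, size=trials) / n
    return estimates.std()


small_n_spread = estimate_spread(5)
large_n_spread = estimate_spread(5000)
print(small_n_spread, large_n_spread)
# The small-N estimates spread out far more than the large-N ones,
# mirroring the wider confidence intervals mentioned above.
```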
If our code passes X tests, we want to update our belief to incorporate this evidence. We call this new belief the posterior probability. Bayes' Theorem is used for the updating process:

P(A|X) = P(X|A) P(A) / P(X)

Bayesian inference merely uses the theorem to connect the prior probability P(A) with the updated posterior probability P(A|X). (The definition of conditional probability can be found on Wikipedia.)
Understanding the Variables in the Coding Example
- A means the event that the code has no bugs, and X means the event that the code passes all X tests.
- P(A) = p: our prior belief that the code has no bugs.
- P(A|X): the probability that the code has no bugs, given that it passes the tests.
- P(X|A): the probability that the code passes the X tests given that there are no bugs. This is always 1.
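With these quantities, the posterior can be computed directly via Bayes' Theorem. A minimal sketch — the value 0.5 for P(X | bugs), the chance that buggy code still passes all tests, is an illustrative assumption, not something fixed by the notes above:

```python
def posterior_no_bugs(p, p_pass_given_bugs=0.5):
    """Posterior probability the code is bug-free, given it passed all tests.

    p: prior P(A) that the code has no bugs.
    p_pass_given_bugs: assumed P(X | no A) -- buggy code can still pass.
    """
    p_pass_given_no_bugs = 1.0  # P(X|A) is always 1
    # P(X) by the law of total probability
    evidence = p_pass_given_no_bugs * p + p_pass_given_bugs * (1 - p)
    return p_pass_given_no_bugs * p / evidence  # Bayes' Theorem


print(posterior_no_bugs(0.2))  # a weak prior of 0.2 rises to 1/3 after passing
```

Note that passing the tests always raises our belief that the code is bug-free, but never pushes it all the way to 1 unless the prior was already 1.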
Let Z be some random variable. Associated with Z is a probability distribution function that assigns probabilities to the different outcomes Z can take. A probability distribution is a curve where the probability of an outcome is proportional to the height of the curve.
- Z can be discrete: it only assumes values from a specified list, such as populations, movie ratings, etc.
- Z can be continuous: it takes arbitrarily exact values, such as temperature, speed, etc.
- Z can be mixed: a combination of the two types above.
If Z is discrete
If Z is discrete, its distribution is called a probability mass function, which measures the probability that Z takes on the value k. Z is Poisson-distributed if:

P(Z = k) = λ^k e^(-λ) / k!,  k = 0, 1, 2, ...

- λ is called a parameter of the Poisson distribution, and it controls the distribution's shape. λ can be any positive number.
- As λ increases, more probability is added to larger values; as λ decreases, more probability is added to smaller values.
- k must be a non-negative integer.

If a random variable Z has a Poisson mass distribution, we can express this as:

Z ~ Poi(λ)

A useful property of the Poisson distribution is that its expected value is equal to its parameter, i.e.:

E[Z | λ] = λ
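These facts can be sanity-checked with `scipy.stats` (the parameter value λ = 4 is an arbitrary choice for illustration):

```python
from scipy import stats

lam = 4.0                    # assumed parameter value for illustration
z = stats.poisson(mu=lam)    # frozen Poisson distribution

print(z.pmf(3))    # P(Z = 3), the mass at k = 3
print(z.mean())    # E[Z | lambda] equals lambda itself
print(z.pmf(2.5))  # mass at a non-integer k is 0 -- k must be a non-negative integer
```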
If Z is continuous
If Z is continuous, its distribution is called a probability density function. The density function for an exponential random variable looks like:

f_Z(z | λ) = λ e^(-λz),  z ≥ 0

- z can only take non-negative values, including non-integers.
- If Z has an exponential distribution, we say Z is exponential, and we have:

Z ~ Exp(λ)

E[Z | λ] = 1/λ
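Similarly for the exponential distribution (λ = 2 is an arbitrary choice; note that `scipy.stats.expon` is parameterized by `scale = 1/λ` rather than by the rate λ directly):

```python
from scipy import stats

lam = 2.0                          # assumed rate parameter for illustration
z = stats.expon(scale=1.0 / lam)   # scipy uses scale = 1/lambda

print(z.pdf(0.5))  # density lambda * exp(-lambda * z) evaluated at z = 0.5
print(z.mean())    # E[Z | lambda] = 1/lambda
print(z.cdf(0.0))  # no probability mass below zero: z is non-negative
```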