Reading Note for Probabilistic Programming and Bayesian Methods for Hackers, Chapter 1
These are my reading notes for Probabilistic Programming and Bayesian Methods for Hackers, Chapter 1, The Philosophy of Bayesian Inference. The GitHub link is: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers .
Note: all tips taken directly from the book are marked with a sign next to them.
All code examples in this post come from the book.
The Philosophy of Bayesian Inference
Bayesian inference is simply updating your *beliefs* after considering new evidence. We update our beliefs about an outcome; rarely can we be absolutely sure unless we rule out all other alternatives.
In the Bayesian view, probability is interpreted as the *believability* of an event, i.e. how confident we are that the event occurs.
- *Frequentist*: assumes probability is the long-run frequency of events.
- *Bayesian*: assumes probability is a measure of *belief*. Note that belief is assigned *individually*, which means different observers can hold conflicting beliefs.
- `P(A)`: the *prior probability* of event `A`.
- `P(A|X)`: the probability of `A` given the evidence `X`.
For example:
- `P(A)`: the code likely has a bug in it.
- `P(A|X)`: the code passed all `X` tests; there still might be a bug, but its presence is less likely now.
Every time new evidence `X` arrives, we re-weight the prior probability to incorporate it. After this process, our guesses become less wrong, i.e. we try to be more right after each update.
A frequentist will return a number, while a Bayesian will return a probability. For the example above, if asked "My code passes all tests. Is my code bug-free?", a frequentist will answer "yes". If asked "Often my code has bugs. (prior parameter) My code passed all X tests. Is my code bug-free?", a Bayesian will answer "Yes, with probability 0.8; No, with probability 0.2".
Denote `N` as the number of instances of evidence we possess. As we gather an infinite amount of evidence, say as `N -> Infinity`, our Bayesian results (often) align with frequentist results.
- So for large `N`, statistical inference is more or less *objective* (see the simulation sketch after this list).
- For small `N`, inference is much more *unstable*: frequentist estimates have more variance and larger confidence intervals.
- Hence Bayesians introduce a *prior* and return *probabilities*; this preserves the *uncertainty* that reflects the instability of statistical inference on a small-`N` dataset.
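As a quick illustration of the large-`N` claim (my own sketch, not from the book; the biased coin and the sample sizes are arbitrary), the snippet below estimates the frequency of heads at increasing sample sizes. The estimate is erratic for small `N` and settles near the true value as `N` grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.3  # the true (unknown) probability of heads

# Frequentist estimate: the observed frequency of heads after N flips.
for n in [10, 100, 1000, 100_000]:
    flips = rng.random(n) < true_p
    print(f"N = {n:>6}: estimated p = {flips.mean():.3f}")
```

For small `N` the printed estimates swing widely around 0.3; for large `N` they stabilize, which is the sense in which inference becomes more objective.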
Bayes’ Theorem
If our code passes `X` tests, we want to update our belief to incorporate this evidence. We call this new belief the *posterior probability*. Bayes’ Theorem is used for the updating process:

P(A|X) = P(X|A) P(A) / P(X)

Bayesian inference merely uses it to connect the prior probability `P(A)` with an updated posterior probability `P(A|X)`.
Conditional Probability Knowledge (from Wikipedia)
The conditional probability of `A` given `B` is `P(A|B) = P(A and B) / P(B)`, defined when `P(B) > 0`.
The Understanding of Variables for the Coding Example
- Assume `A` is the event that the code has no bugs, and `X` is the event that the code passes the `X` tests. Assume `P(A) = p`.
- `P(A|X)`: the probability that there are no bugs, given that the code passes the `X` tests.
- `P(X|A)`: the probability that the code passes the `X` tests given that there are no bugs. This is always 1.
- `P(X)`: by the law of total probability,

  P(X) = P(X|A) P(A) + P(X|~A) P(~A) = 1 · p + P(X|~A) (1 - p)
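Putting these pieces into Bayes' Theorem gives `P(A|X) = p / (p + P(X|~A)(1 - p))`. A minimal sketch of the computation (the prior `p = 0.2` and `P(X|~A) = 0.5` are illustrative values, not dictated by the derivation above):

```python
def posterior_no_bugs(p, p_pass_given_bugs):
    # Bayes' Theorem with P(X|A) = 1:
    #   P(A|X) = P(X|A) P(A) / P(X) = p / (p + P(X|~A) (1 - p))
    return p / (p + p_pass_given_bugs * (1 - p))

# Illustrative: a 20% prior belief that the code is bug-free, and tests
# that buggy code still passes half the time.
print(posterior_no_bugs(p=0.2, p_pass_given_bugs=0.5))  # 0.333...
```

Note that passing the tests raises our belief from 0.2 to about 0.33: the evidence helps, but it does not make the code certainly bug-free.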
Probability Distributions
Let `Z` be some random variable. Associated with `Z` is a *probability distribution function* that assigns probabilities to the different outcomes `Z` can take. A probability distribution is a curve where the probability of an outcome is proportional to the height of the curve.
- `Z` can be *discrete*: it only assumes values from a specific list, such as populations, movie ratings, etc.
- `Z` can be *continuous*: it takes arbitrarily exact values, such as temperature, speed, etc.
- `Z` can be *mixed*: a combination of the two types above.
If Z is discrete
If `Z` is discrete, its distribution is called a *probability mass function*, which measures the probability that `Z` takes on the value `k`, denoted `P(Z = k)`.
We say `Z` is Poisson-distributed if:

P(Z = k) = λ^k e^(-λ) / k!,  k = 0, 1, 2, ...
- `λ` is called a parameter of the distribution, or the *intensity* of the Poisson distribution, and it controls the distribution's *shape*. Here, `λ` can be any positive number.
- If `λ` increases, probability is added to larger values; if `λ` decreases, probability is added to smaller values.
- `k` must be a non-negative integer.
If a random variable `Z` has a Poisson mass distribution, we express this as: `Z ~ Poi(λ)`.
For a Poisson distribution, its expected value is equal to its parameter, i.e. `E[Z | λ] = λ`.
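A small sketch (my own, using `scipy.stats`, which the surrounding text does not mention) that evaluates the Poisson mass function and checks that the sample mean tracks `λ`:

```python
from scipy.stats import poisson

lam = 4.0  # the intensity λ

# P(Z = k) straight from the probability mass function.
for k in range(6):
    print(f"P(Z = {k}) = {poisson.pmf(k, mu=lam):.4f}")

# E[Z | λ] = λ: the empirical mean of many draws approaches λ.
samples = poisson.rvs(mu=lam, size=100_000, random_state=0)
print("sample mean:", samples.mean())  # ~4.0
```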
If Z is continuous
If `Z` is continuous, its distribution is called a *probability density function*. The density function for an exponential random variable looks like:

f_Z(z | λ) = λ e^(-λz),  z ≥ 0

`z` can only take non-negative values, including non-integer values.
If `Z` has an exponential distribution, we say `Z` is *exponential* and we have: `Z ~ Exp(λ)` and `E[Z | λ] = 1/λ`.
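A matching sketch for the continuous case (again my own, assuming `scipy.stats`; note that SciPy parameterizes the exponential by `scale = 1/λ` rather than by `λ` directly):

```python
from scipy.stats import expon

lam = 2.0  # the rate λ; SciPy expects scale = 1/λ

# Density f(z | λ) = λ e^(-λz) evaluated at a few non-negative points.
for z in [0.0, 0.5, 1.0, 2.0]:
    print(f"f({z}) = {expon.pdf(z, scale=1 / lam):.4f}")

# E[Z | λ] = 1/λ: the empirical mean of many draws approaches 0.5.
samples = expon.rvs(scale=1 / lam, size=100_000, random_state=0)
print("sample mean:", samples.mean())  # ~0.5
```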
PyMC
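The code that belonged in this section did not survive extraction. As a stand-in, here is a minimal sketch of the kind of model Chapter 1 builds with PyMC, written against the modern PyMC API (the book's original code uses the older PyMC2 interface, so names and call signatures differ): an exponential prior on a Poisson rate `λ`, updated by observed count data. The data values below are made up for illustration.

```python
import numpy as np
import pymc as pm

# Hypothetical daily count data (e.g. text messages received per day).
data = np.array([13, 24, 8, 24, 7, 35, 14, 11, 15, 11])

with pm.Model() as model:
    # Prior belief about the Poisson intensity: λ ~ Exp(alpha).
    alpha = 1.0 / data.mean()
    lambda_ = pm.Exponential("lambda", alpha)

    # Likelihood: the observed counts are Poisson-distributed with rate λ.
    obs = pm.Poisson("obs", mu=lambda_, observed=data)

    # Draw samples from the posterior distribution of λ.
    trace = pm.sample(2000, tune=1000)
```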