Option 1: $x \in \{1,2,\ldots,K\}$.
Option 2: $x=(x_1,\ldots,x_K)^T$ with binary selection variables $x_k \in \{0,1\}$, where $x_k=1$ if the die landed on the $k$-th face and $x_k=0$ otherwise (one-hot coding, so $\sum_k x_k = 1$).
where $m_k= \sum_n x_{nk}$ is the total number of throws that landed on the $k$-th face (i.e., the number of times we threw $k$ eyes). Note that $\sum_k m_k = N$.
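As a quick illustration of the two encodings, the counts $m_k$ follow directly from the one-hot matrix. This is only a sketch with a made-up vector of throws; the names `throws`, `X`, and `m` are not part of the lecture notes.

```julia
K = 6                                        # number of faces of the die
throws = [3, 6, 6, 1, 2, 6, 4, 3]            # Option 1: each throw as a face index in 1:K
N = length(throws)

# Option 2: one-hot coding, X[n,k] = 1 if throw n landed on face k
X = [throws[n] == k for n in 1:N, k in 1:K]

m = vec(sum(X, dims=1))                      # m_k = Σ_n x_{nk}: number of throws on face k
@assert sum(m) == N                          # Σ_k m_k = N
```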
we used a beta distribution, which is conjugate to the binomial and requires us to choose prior pseudo-counts.
where $\Gamma(\cdot)$ is the Gamma function.
The Gamma function can be interpreted as a generalization of the factorial function to the real numbers ($\mathbb{R}$). If $n$ is a natural number ($1,2,3, \ldots $), then $\Gamma(n) = (n-1)!$, where $(n-1)! = (n-1)\cdot (n-2) \cdots 2 \cdot 1$.
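A quick numerical check of this relation; a minimal sketch assuming the SpecialFunctions.jl package for `gamma`:

```julia
using SpecialFunctions   # provides gamma(x)

n = 5
@show gamma(n)           # Γ(5) = 24.0
@show factorial(n - 1)   # 4!   = 24
@assert gamma(n) ≈ factorial(n - 1)
```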
As before for the Beta distribution in the coin toss experiment, you can interpret $\alpha_k-1$ as the prior number of (pseudo-)observations that the die landed on the $k$-th face.
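For instance, a Dirichlet prior with equal pseudo-counts can be constructed with Distributions.jl. This is a sketch; the choice $\alpha_k = 2$ (one prior pseudo-observation per face) is just for illustration.

```julia
using Distributions

K = 6
α = fill(2.0, K)          # α_k - 1 = 1 prior pseudo-observation per face
prior = Dirichlet(α)

mean(prior)               # prior mean: α_k / Σ_k α_k = 1/6 for each face
rand(prior)               # one sample of μ = (μ_1, ..., μ_K) from the prior
```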
where $m = (m_1,m_2,\ldots,m_K)^T$.
(You can find the mean of the Dirichlet distribution on its Wikipedia page.)
This result is simply a generalization of Laplace's rule of succession.
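In code, the predictive probability for the next throw follows directly from the counts and the prior pseudo-counts. A sketch that reuses the hypothetical `m` and `N` from the first snippet and `α` from the previous one:

```julia
# Posterior predictive p(x_{•,k} = 1 | D): generalized rule of succession
p_next = (m .+ α) ./ (N + sum(α))
@assert sum(p_next) ≈ 1.0
```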
where we used the fact that the maximum (mode) of the Dirichlet distribution $\mathrm{Dir}(\{\alpha_1,\ldots,\alpha_K\})$ is obtained at $\mu_k = (\alpha_k-1)\big/\bigl(\sum_{k'}\alpha_{k'} - K\bigr)$.
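The corresponding MAP estimate uses the mode of the posterior $\mathrm{Dir}(\alpha_1+m_1,\ldots,\alpha_K+m_K)$. Again a sketch building on the hypothetical variables above; Distributions.jl also exposes this mode via `mode`:

```julia
post = Dirichlet(α .+ m)                      # posterior is again a Dirichlet
μ_map = (α .+ m .- 1) ./ (sum(α .+ m) - K)    # (α_k + m_k - 1) / (Σ_k (α_k + m_k) - K)
@assert μ_map ≈ mode(post)
```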
Of course, we shouldn't have to go through the full Bayesian framework to get the maximum likelihood estimate. Alternatively, we can find the maximum likelihood (ML) solution directly by optimizing the (constrained) log-likelihood.
The log-likelihood for the multinomial distribution is given by
$$\begin{equation*}
\log p(D|\mu) \propto \log \prod_k \mu_k^{m_k} = \sum_k m_k \log \mu_k \,.
\end{equation*}$$
Adding a Lagrange multiplier $\lambda$ for the constraint $\sum_k \mu_k = 1$ and setting the derivative w.r.t. $\mu_k$ to zero yields the sample proportion
$$\begin{equation*}
\hat \mu_k = \frac{m_k}{\lambda}
\end{equation*}$$
where we get $\lambda$ from the constraint $$\begin{equation*} \sum_k \hat \mu_k = \sum_k \frac{m_k}{\lambda} = \frac{N}{\lambda} \overset{!}{=} 1, \end{equation*}$$ so that $\lambda = N$ and hence $\hat \mu_k = \frac{m_k}{N}$.
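As a sanity check, the ML estimate is just the vector of sample proportions. A small simulation sketch, where the "loaded die" probabilities `μ_true` and the sample size are made up for illustration:

```julia
using Distributions

μ_true = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]   # hypothetical loaded die
N = 1000
m = rand(Multinomial(N, μ_true))          # counts m_k over N simulated throws

μ_ml = m ./ N                             # ML estimate: μ̂_k = m_k / N
@show μ_ml                                # close to μ_true for large N
```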
Given $N$ IID observations $D=\{x_1,\dotsc,x_N\}$.
open("../../styles/aipstyle.html") do f
display("text/html", read(f, String))
end