May 11, 2026

Lucas's Theorem

Let’s explore a theorem that breaks massive combinatorial problems down into hand-computable pieces.

Motivation

Combination questions take the form of “how many ways can you choose $y$ items from $x$ options?” If you’ve got six flowers, all different colors, how many ways can you choose three of them for a small bouquet?

The standard syntax and factorial definition of combinations (better known as binomial coefficients) are:

\binom{n}{k} = \frac{n!}{k!(n-k)!}

While accurate, the above formula is difficult to deal with for even moderately large values. The value of $\binom{482}{176}$ is practically intractable using factorials. Most applications don’t even have a use for a number that large.
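To get a feel for the sizes involved, here is a minimal Python sketch of the factorial definition. The name `binom_factorial` is made up for illustration; `math.comb` is the standard-library equivalent. Python’s arbitrary-precision integers cope with the large example, but anything built on fixed-width arithmetic would overflow long before reaching it.

```python
from math import comb, factorial

def binom_factorial(n: int, k: int) -> int:
    """Binomial coefficient computed straight from the factorial definition."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Six flowers, choose three for the bouquet.
print(binom_factorial(6, 3))  # 20

# The large example from the text: count its decimal digits.
big = binom_factorial(482, 176)
print(len(str(big)))
print(big == comb(482, 176))  # True
```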

Taking the residue constrains the magnitude of the final result, but the computation is still prodigious:

\binom{n}{k} \equiv{} x \mod m

Assuming the modulus $m$ is prime, Lucas’s theorem gives us a useful decomposition of this problem.

Theorem Declaration

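Write $n$ and $k$ in base $p$ as $n = \sum_{i=0}^{d} n_i p^i$ and $k = \sum_{i=0}^{d} k_i p^i$, padding the shorter expansion with leading zeros. A binomial coefficient under a prime modulus $p$ is congruent to the product of the binomial coefficients formed from the corresponding base-$p$ digits. In mathematical notation: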

\binom{n}{k} \equiv{} \prod_{i=0}^{d} \binom{n_i}{k_i} \mod p

For example, take $\binom{482}{176}$ from above with $p = 5$. Convert the two values to base 5:

\begin{align*} 482 &= 3 * 5^3 + 4 * 5^2 + 1 * 5^1 + 2 * 5^0\\ 176 &= 1 * 5^3 + 2 * 5^2 + 0 * 5^1 + 1 * 5^0 \end{align*}
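The conversion can be sketched with repeated division. The helper name `base_p_digits` is hypothetical, and it returns digits least significant first, so the lists read in the opposite order from the expansions above.

```python
def base_p_digits(n: int, p: int) -> list[int]:
    """Digits of n in base p, least significant digit first."""
    digits = []
    while n > 0:
        n, d = divmod(n, p)  # peel off the lowest digit
        digits.append(d)
    return digits or [0]     # represent zero as a single 0 digit

print(base_p_digits(482, 5))  # [2, 1, 4, 3]
print(base_p_digits(176, 5))  # [1, 0, 2, 1]
```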

The coefficients of the powers of 5 are paired up, digit by digit, in the binomial coefficient product:

\begin{align*} \binom{482}{176} &\equiv{} \binom{3}{1} \binom{4}{2} \binom{1}{0} \binom{2}{1} & \mod 5\\ &\equiv{} 3*6*1*2 & \mod 5\\ &\equiv{} 36 & \mod 5\\ &\equiv{} 1 & \mod 5 \end{align*}

Considering the fact that the non-residue value is approximately $9.2 * 10^{135}$ , computing a residue so efficiently with Lucas’s theorem is quite a feat.
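The worked example translates into a few lines of Python. This is a sketch rather than a tuned implementation: `lucas_binom_mod` is a name chosen here for illustration, it assumes `p` is prime without checking, and it leans on `math.comb` returning 0 when the lower digit exceeds the upper one.

```python
from math import comb

def lucas_binom_mod(n: int, k: int, p: int) -> int:
    """C(n, k) mod p for a prime p, using Lucas's theorem digit by digit."""
    result = 1
    while n > 0 or k > 0:
        n, n_i = divmod(n, p)  # next base-p digit of n
        k, k_i = divmod(k, p)  # next base-p digit of k
        # comb(n_i, k_i) is 0 when k_i > n_i, which zeroes the whole product.
        result = (result * comb(n_i, k_i)) % p
    return result

print(lucas_binom_mod(482, 176, 5))  # 1
print(comb(482, 176) % 5)            # 1, the same residue computed the long way
```

Each digit-level coefficient has both arguments below $p$, so the work grows with the number of base-$p$ digits rather than with the size of $n$.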

Proof

We use the “binomial” characteristic of binomial coefficients as the basis of the proof.

\begin{align*} \sum_{c=0}^{r} \binom{r}{c} x^c &= (1+x)^r\\ &= \prod_{m=0}^{k} (1+x)^{r_m p^m} \end{align*}

The product simply uses the base-$p$ expansion of $r$ in the exponent: $r = \sum_{m=0}^{k} r_m p^m$ .

Lemma

To go much further we need to justify a binomial quirk. For prime $p$ :

(1+x)^{p^m} \equiv (1+x^{p^m}) \mod p

Let’s look at the binomial coefficients. The factorial definition gives that $\binom{p^m}{k} = \frac{p^m}{k} * \binom{p^m-1}{k-1}$ . Multiply $k$ over and the result is:

k * \binom{p^m}{k} = p^m * \binom{p^m-1}{k-1}

The right-hand side is divisible by $p$ at least $m$ times. For $0 < k < p^m$ , $k$ is divisible by $p$ at most $m-1$ times, so at least one factor of $p$ on the left-hand side must come from $\binom{p^m}{k}$ . Binomial coefficients are integers, so $p$ must divide $\binom{p^m}{k}$ for every $0 < k < p^m$ . The boundary cases $k = 0$ and $k = p^m$ are each $1$ by definition.

Any coefficient divisible by $p$ is congruent to $0 \mod p$ , which leaves only the first and last terms in the binomial expansion: $1$ , and $x^{p^m}$ . The aforementioned congruence follows.
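The lemma is easy to spot-check numerically. The sketch below uses a hypothetical helper, `interior_coefficients_vanish`, and tries a handful of small primes. It is a sanity check, not a proof.

```python
from math import comb

def interior_coefficients_vanish(p: int, m: int) -> bool:
    """True when C(p^m, k) is divisible by p for every 0 < k < p^m."""
    q = p ** m
    return all(comb(q, k) % p == 0 for k in range(1, q))

for p, m in [(2, 3), (3, 2), (5, 2), (7, 1)]:
    print(p, m, interior_coefficients_vanish(p, m))  # True each time

# A composite modulus breaks the lemma: C(4, 2) = 6 is not divisible by 4.
print(interior_coefficients_vanish(4, 1))  # False
```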

Returning to the main equation:

\begin{align*} \prod_{m=0}^{k} (1+x)^{r_m p^m} &= \prod_{m=0}^{k} ((1+x)^{p^m})^{r_m}\\ &\equiv \prod_{m=0}^{k} (1+x^{p^m})^{r_m} \mod p \end{align*}
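This congruence between polynomials can be checked coefficient by coefficient. The sketch below stores a polynomial as a list of coefficients (index = exponent) and reduces mod $p$ after every step; the function names are invented for this example.

```python
def poly_mul_mod(a: list[int], b: list[int], p: int) -> list[int]:
    """Multiply two coefficient lists, reducing every coefficient mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_pow_mod(base: list[int], e: int, p: int) -> list[int]:
    """Raise a polynomial to the e-th power by repeated multiplication."""
    result = [1]
    for _ in range(e):
        result = poly_mul_mod(result, base, p)
    return result

def direct_power(r: int, p: int) -> list[int]:
    """Coefficients of (1 + x)^r mod p."""
    return poly_pow_mod([1, 1], r, p)

def digit_product(r: int, p: int) -> list[int]:
    """Coefficients of the product of (1 + x^(p^m))^(r_m) over the digits of r."""
    result = [1]
    power = 1  # p^m for the current digit
    while r > 0:
        r, r_m = divmod(r, p)
        factor = [1] + [0] * (power - 1) + [1]  # 1 + x^(p^m)
        result = poly_mul_mod(result, poly_pow_mod(factor, r_m, p), p)
        power *= p
    return result

print(direct_power(482, 5) == digit_product(482, 5))  # True
```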

Expand that inner binomial out into a sum:

\begin{align*} &\prod_{m=0}^{k} \sum_{s=0}^{r_m} \binom{r_m}{s} (x^{p^m})^s\\ =&\prod_{m=0}^{k} \sum_{s=0}^{r_m} \binom{r_m}{s} x^{sp^m} \mod p \end{align*}

The distributive law allows us to move the inner sum outside, turning it into an iterated sum:

\begin{align*} &\prod_{m=0}^{k} \sum_{s=0}^{r_m} \binom{r_m}{s} x^{sp^m}\\ =& \sum_{s_0=0}^{r_0} \sum_{s_1=0}^{r_1} \dotsm \sum_{s_k=0}^{r_k} \prod_{m=0}^{k} \binom{r_m}{s_m} x^{s_mp^m} \mod p \end{align*}

Consider one tuple $(s_0, s_1, \dots, s_k)$ . The $x$ term in the inner product becomes $x^{s_0p^0} * x^{s_1p^1} * \dots * x^{s_kp^k}$ , making the combined exponent $s_0p^0 + s_1p^1 + \dots + s_kp^k$ .

In other words, since every $s_m \leq r_m < p$ , each tuple of $s_m$ values is the base-$p$ representation of its term’s exponent.

When all $s_m = 0$ , the exponent is $0$ .
When all $s_m = r_m$ , the exponent is $r$ .

Now, recall our original summation variable $c$ , with $0 \leq c \leq r$ , and write $c_m$ for its base-$p$ digits. It would be nice to replace the iterated sum with a simple $\sum_{c=0}^{r}$ , but not every number from $0$ to $r$ is necessarily represented in the iterated sum. Consider, for example, $c=p^e - 1$ where $e < k$ . Its digits $c_m$ for $m < e$ all equal $p-1$ , the maximum valid digit value, but not all $r_m$ necessarily go that high.

Nonetheless, the replacement sum can still be justified if, for each of those extra cases where $c_m > r_m$ , the resulting term is $0$ . Consider the binomial coefficient at one of those cases: $\binom{r_m}{c_m} = 0$ when the inequality is true. After all, one cannot “choose” more items than exist in the set.
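To see the vanishing terms in action, the sketch below walks every $c$ from $0$ to $r$ and counts how many digit products are nonzero. If the argument holds, that count should match the number of tuples in the iterated sum, $\prod_m (r_m + 1)$ . The helper names are hypothetical.

```python
from math import comb, prod

def digits(n: int, p: int, width: int) -> list[int]:
    """Base-p digits of n, least significant first, zero-padded to width."""
    out = []
    for _ in range(width):
        n, d = divmod(n, p)
        out.append(d)
    return out

def surviving_terms(r: int, p: int) -> tuple[int, int]:
    """Return (nonzero terms among c = 0..r, tuples in the iterated sum)."""
    width = 1
    while p ** width <= r:
        width += 1
    r_digits = digits(r, p, width)
    count = 0
    for c in range(r + 1):
        c_digits = digits(c, p, width)
        # comb(r_m, c_m) is 0 whenever c_m > r_m, which kills the whole term.
        term = prod(comb(r_m, c_m) for r_m, c_m in zip(r_digits, c_digits))
        if term != 0:
            count += 1
    return count, prod(r_m + 1 for r_m in r_digits)

print(surviving_terms(482, 5))  # (120, 120)
```

For $r = 482$ and $p = 5$ , only 120 of the 483 values of $c$ contribute, exactly the $3 * 2 * 5 * 4$ tuples allowed by the digits $2, 1, 4, 3$ .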

Let’s collapse that iterated sum using that justification:

\begin{align*} & \sum_{s_0=0}^{r_0} \sum_{s_1=0}^{r_1} \dotsm \sum_{s_k=0}^{r_k} \prod_{m=0}^{k} \binom{r_m}{s_m} x^{s_mp^m}\\ =& \sum_{c=0}^{r} \prod_{m=0}^{k} \binom{r_m}{c_m} x^{c_mp^m}\\ =& \sum_{c=0}^{r} \Bigg[\prod_{m=0}^{k} \binom{r_m}{c_m} * \prod_{m=0}^{k} x^{c_mp^m}\Bigg]\\ =& \sum_{c=0}^{r} \Bigg[\prod_{m=0}^{k} \binom{r_m}{c_m} * x^c\Bigg] \mod p \end{align*}

We now have a structure that closely parallels our original expression:

\sum_{c=0}^{r} \binom{r}{c} x^c \equiv \sum_{c=0}^{r} \prod_{m=0}^{k} \binom{r_m}{c_m} x^c \mod p

Compare each coefficient, and you arrive at the theorem statement.

$\square$

Applications

Problems where Lucas's theorem can be directly applied are tricky to invent, and rarely practical. If you have a quantity of items expressible as a binomial coefficient, and wish to find the number left over after divvying the items into prime-numbered groups, Lucas can get you there easily. (Brilliant has an example of counting oranges stacked in a pyramid structure. That’s about as practical an example as I can find.)

However, binomial coefficients in general tend to appear in many unexpected places throughout both number theory and applied math. Knowing a little trick exists when a binomial and prime modulus are involved can be the ticket you need to drastically reduce complexity.

Works Cited

Larry Riddle’s writeup of the theorem’s proof was instrumental in helping me understand it. Most of the steps above, down to the variable names, are taken from his summary: I simply fleshed out several steps that caught me off-guard when I was learning. The numeric example he works through is worth a read, in particular. I wouldn’t have understood the general proof without it!
