December 21, 2016
22 minutes
4488 words

Twelvefold Way

       A  I  S
  f   01 02 03
  fSB 04 05 06
SLf   07 08 09
SLfSB 10 11 12

We’ve all been there: you have a pile of balls and a stack of labels and you think, “How could I possibly label all of these balls?”

How could I label all of these balls?

Right, of course you haven’t. But perhaps you are an aspiring combinatorialist! Or perhaps you’re just curious.

Anyways, we’ll try to resist esoteric mathematical definitions until the end. For now, we have balls and labels to enumerate.

Labels and Balls

Let’s call the set of all balls B and the set of all labels L. We’re concerned with determining how to apply labels, elements in the set L, to balls, elements in the set B, given some imposed rules. Must each ball have at least one label? Must all labels be used at least once? Exactly once? Does the order of the labels produced matter?

Balls and labels and sets

So far we’re just counting, which sounds simple enough. However, when rules are applied, counting can become complex. Thanks to Richard Stanley and his predecessors1, we have the twelvefold way2 to guide us.

Before we explore that complexity, allow me one formality that will be key to our understanding: the conceptualization of applying labels to balls as a Mathematical function3. Specifically, think of the application of a label to a ball as a function that maps a set of balls (B) to a set of labels (L), written as f: B → L. The inverse of f, denoted f-1, reverses the mapping, such that f-1: L → B.

  • If f(1) = 2, then f maps ball 1 to label 2.

  • If f(2) = {∅} , then f does not map ball 2 to any labels.

  • If f-1(2) = 3 , then f maps ball 3 to label 2.

For our purposes, a function takes a single value and returns a single value. This is imperative. As a result, in terms of balls and labels, a ball can only be labeled once, but a label can be applied to an indefinite number of balls.

You might have just made a realization regarding invertibility: if a function maps two or more balls to the same label, then the function is not invertible because one parameter (the label) of the inverted function maps to multiple values (the balls).

Function of Balls to Labels

From here, we can consider the set, F, of all functions mapping B to L, which represents all possible mappings, or ways to label the balls. Instead of considering all functions, we can also restrict F with rules.

  • If we restrict F to only functions that map all balls to at least one label, that eliminates all functions that leave out one or more of the balls in the set B.

Formally defining sets gives us the ability to make substantiated claims about abstractions, which claims will inform our ability to count!

Exciting stuff, right? Now, regarding the Twelvefold Way.

Why twelve?

Good question. The twelvefold way describes the twelve combinations of rules we can apply to ball labelling, composed of four ways to describe equivalence and three rules for labelling. There are twelve ways to choose one from each of these two sets of rules, so voila!

Equivalence: Order matters

How can equivalence vary? Consider sets of integers, A and B, such that A = {1, 2, 3} and B = {1, 3, 2}. Are they the same set? It depends on what you mean by “the same”, which is the loose definition of an equivalence relation4.

Recall the idea of labelling balls as a function, f: B → L. Specifically, let’s use the sets B = {1, 2} and L = {A, B, C}. Consider the following set of six functions:

f1: (1, 2) → (A, A)

f2: (1, 2) → (B, B)

f3: (1, 2) → (A, B)

f4: (1, 2) → (B, A)

f5: (1, 2) → (B, C)

f6: (1, 2) → (C, B)

By our most common definition, each function represents a distinct outcome. That is, the codomains of these functions are distinguishable. There are, however, four different ways to approach the concept of distinguishability:

  1. Count the lists as distinct possibilities. This, again, is likely the most intuitive option. In the given example, out enumeration results in six distinct functions. We’ll symbolically refer to this as f.

  2. Count the lists as distinct up to a permutation of balls. If the only thing distinguishing multiple outcomes is the order of the inputs (and thus the outputs), then only count those outcomes once. In this paradigm we can convince ourselves that (A, B) and (B, A) are only different because of the order in which the inputs arrive (a permutation of balls). Pragmatically, this means that we will sort the output, then ignore duplicate results while counting. In the given example, f3 = f4 and f5 = f6, leaving only four distinct functions. Representing this pre-processing step, we will refer to this as f∘SB.

  3. Count the lists as distinct up to a permutation of labels. Pretend that you don’t already have an order of labels in mind before they come out of the function; rather, count them as they appear: (A, B) is (1, 2) because you saw A first, then B second; (B, A) is (1, 2) because you saw B first, then A second; (A, A) is (1, 1) because you saw A first; (B, B) is (1, 1) because you saw B first; Whereas equivalence relation 2 (above) counts equivalence with a pre-ordering, this concept counts equivalence with a post-ordering. In the given example, f1 = f2 and f3 = f4 = f5 = f6, leaving only two distinct functions. To represent the post-processing, we can call this option SL∘f.

  4. Count the lists as distinct up to a permutation of balls and labels. This essentially amounts to pre- and post-ordering, then counting the results. For instance, (A, B) and (B, A) would both get pre-ordered into (A, B), which would be post-ordered into (1, 2). In the given example, f1 = f2 and f3 = f4 = f5 = f6, leaving two distinct functions. Intuitively, we refer to this symbolically as SL∘f∘SB.

Depending on the equivalence rules we choose, the results of our counting can change drastically.

Labeling: Injective, Surjective, or Neither

Richard Stanley comes home to his kids, One, Two, and Three, with four balls. Like any good father, he has three labels—one for each of his children. He considers three different rules by which he could label them:

  1. No rules! Don’t label any of the balls. Label each ball with each label. It doesn’t matter!

  2. One-to-one (injective), in which each label is used at most once. It is as if he only has one of each label. Nobody gets more than one ball, but there is no guarantee of getting a ball.

  3. Onto (surjective), in which each label gets used on at least one ball, but some might be used on more than one ball. He’s got stacks of labels, which is fortunate because each child demands a ball. They might even get more than one, each!

There is also a name for both injective and surjective: bijective, in which each ball gets at exactly one label. (We don’t count it as a unique option in the twelvefold ways because, being a combination of two options, it does not add any further “ways”.) Everyone is guaranteed to get a ball that they don’t have to share!

Injective, Surjective, Bijective

You might notice that, if |L| > |B|, then bijectivity is impossible, courtesy of the Pidgeonhole Principle5. That is, when each ball has exactly one label (injective), there exists at least one label without a ball (not surjective). However, labeling a ball with one of the remaining labels will violate injectivity.

Great! Can we finally see the twelve options?


f-class Any Injective Surjective
f 01 02 03
f∘SB 04 05 06
SL∘f 07 08 09
SL∘f∘SB 10 11 12

Our analysis of each case will cover a few things: intuition about the requirements, cardinality of B and L (e.g. can we have any number of balls and labels, or do we require more of one than the other?), a formula for counting, and an example.

01 Any f

Each label can be placed on any ball with no rules. Because order matters, outputs {a, b, c} and {c, b, a} are distinguishable, so they count separately. Consider labeling each ball, of which there are B, with any of the labels, of which there are L. You never run out of any one kind of label, and you don’t have to use each label if you don’t want to.




Sets of any size will do!


Given B = {1, 2, 3} and L = {a, b}, all possibilities of f: B → L are:

{a, a, a}, {a, a, b}, {a, b, a}, {a, b, b}, {b, a, a}, {b, a, b}, {b, b, a}, {b, b, b}

For each ball, of which there are B = 3, there are L = 2 labels: 01 example

02 Injective f

Order matters, so {a, b, c} and {c, b, a} are distinguishable and count separately. Once a label has been used, we cannot use it again. Consider labeling each ball, of which there are B, with any of the labels, of which we start with L. For the first label, we have L labels to choose from, but for the second we only have L-1 options. With each label placed, the options reduce by one until all balls have been labeled, all labels have been used, or we simply decide to stop.


02full, which is commonly called a falling factorial.


Sets of any size will do!


Given L = {i, j} and B = {a, b, c, d}, all possibilities of f: L → B are:

{a, b}, {a, c}, {a, d}, {b, a}, {b, c}, {b, d}, {c, a}, {c, b}, {c, d}, {d, a}, {d, b}, {d, c}

02 example

03 Surjective f

This is best understood within the context of 09 Surjective SL∘f, which we will see later is equivalent to counting partitions of B into X sets, resulting in:


Starting from that point, we’re left to account for the extra permutations of labels, un-doing the post-ordering step of 09. The number of permutations of L is L!, so we multiply by that term to yield out formula.




Due to surjectivity, this requires |B|≥|L|.


Given B = {1, 2, 3, 4} and L = {A, B}, all possibilities of f: L → B are:

{A, A, B, B}, {A, B, B, A}, {A, B, A, B}, {A, B, B, B}, {B, A, B, B}, {B, B, A, B}, {B, B, B, A}, {B, B, A, A}, {B, A, A, B}, {B, A, B, A}, {B, A, A, A}, {A, B, A, A}, {A, A, B, A}, {A, A, A, B},

Notice that the first row of answers is identical to the entire answer set of 09. The second row simply swaps A with B, represented by the leading 2! coefficient:


04 Any f∘SB

We aren’t restricted by labeling rules, but we do recognize equivalence up to a permutation of balls (e.g. {A, A, B} = {A, B, A} != {B, B, A}). Counter-intuitively, case 01 will not help us much. However, understanding combinations6 with repetition—particularly how it differs from combinations without repetition (see case 05)–certainly will.

(Proof of the relationship between multicombinations and the binomial coefficient below is forthcoming. In the meantime, read Wikipedia’s content specifically regarding combinations with repetition)




(L+B-1) ≥ B in order to get a valid result. Otherwise, the answer is 0.


Given B = {1, 2} and L = {A, B, C, D}, all possibilities of f: L → B are:

{A, A}, {A, B}, {A, C}, {A, D}, {B, B}, {B, C}, {B, D}, {C, C}, {C, D}, {D, D}

Applying the formula yields the same result:

04 example

Notice the similarities (and key difference!) between this case and case 05, in both the structure of the example outcomes and the formula.

05 Injective f∘SB

Injectivity restricts us to use each label at most once. f∘SB means that we pre-order the balls, which will differentiate this case from case 02. In our example, the pre-ordering will group {a, c} and {c, a} into one equivalence class.

By this description, you might recognize this case: enumerating the subsets of L with B elements. In common parlance, “B choose L”, is represented by the binomial coefficient.




L ≥ B in order to get a valid result. Otherwise, the answer is 0.


Given B = {1, 2} and L = {A, B, C, D}, all possibilities of f: L → B are:

{A, B}, {A, C}, {A, D}, {B, C}, {B, D}, {C, D}

05 example

06 Surjective f∘SB

All labels must be used at least once (surjective) and functions are equivalent up to permutations of balls (e.g. {A, B, B, B} = {B, B, A, B} ≠ {A, A, B, A}). It’s plain to see that we must have more balls than labels for this to make sense. Let’s look at an example before diving into the intuition:


Given B = {1, 2, 3, 4} and L = {A, B}, all possibilities of f: L → B are:

{A, A, A, B}, {A, A, B, B}, {A, B, B, B}

We can show that this case is equivalent to counting compositions of B with L terms. To see the equivalence, grouping the like-terms into integers: think of {A, A, B, B} as {2, 2}, {A, A, A, B} as {3, 1}, and {A, B, B, B} as {1, 3}. These are precisely the three ways of describing 4 as a sum of 2 non-zero integers, which is the definition of a composition. We align with the rules of case 06 because {A, A, A, B} and {A, B, B, B} are distinguishable (unlike in SL∘f) but {A, A, A, B} is indistinguishable from {A, B, A, A} (both count as {3, 1}).

Phew! We won’t go all the way to proving why compositions of B with L terms adhere to the given formula, but you can find a succinct, non-rigorous proof in the Number of compositions section of Composition7 on Wikipedia. It’s worth proving it, yourself, too!





Applying the formula above to our example above, then reducing the binomial coefficient, we get the answer we expected:


07 Any SL∘f

Any labeling will do, but functions are equivalent up to a permutation of labels, meaning we post-order function output (e.g. {A, B, A} = {B, A, B} = {1, 2, 1}, which is not equivalent to {A, A, B} = {1, 1, 2}).

For the sake of brevity, we can leverage the results from case 09, which counts the number of partitions of B into L subsets. The major difference is that case 07 is not surjective, so we don’t need to enforce the non-emptiness of the sets. It’s fine if some sets are empty, because that’s the same as not using some labels, which is okay here.

As a result, we can simply ignore those “empty sets”, and reduce this to the sum of the number of partitions of B into x subsets for x from 0 to L:



Note: when L=B, this formula gives the expression for the Bell number, BB


Sets of any size will do. However, if L > B, then when k grows to be greater than B, those terms of the sum will produce 0.


Given B = {1, 2, 3, 4} and L = {A, B}, the options (lines corresponding, respectively, to terms in the formula’s sum) are:

none, {A, A, A, A}, {A, A, B, B}, {A, B, B, A}, {A, B, A, B}, {A, B, B, B}, {B, A, B, B}, {B, B, A, B}, {B, B, B, A}

Using the formula, we yield:

07 example

08 Injective SL∘f

An injective requirement means each label can correspond to at most one ball. The equivalence class, SL∘f, will post-order the mapping (e.g. {A, B, C} = {B, A, C} under the post-ordering {1, 2, 3)). Quite simply, this means that we’ll try to map each ball to exactly one label, where the specific label is obscured by post-ordering. If we are able to find a unique label per ball, then there is one possible mapping; otherwise, there are zero.

08 illustration


08, which is known as an Iverson bracket


In order to have one option, L ≥ B.

Example 1

Given B = {1, 2, 3, 4} and L = {A, B}, we yield:

08 example 1

Example 2

Given B = {1, 2} and L = {A, B, C, D}, we yield:

08 example 2

09 Surjective SL∘f

It’ll help us to think of this as organizing balls into non-empty sets. Surjective means we have to use all the labels (no empty sets) and SL∘f means that outcome are indistinguishable up to a post-ordering (e.g. (A, A, B) = (B, B, A) because both equal (1, 1, 2) after post-ordering).

Those familiar with partitions of sets8 will recognize this case as the number of partitions of B into L subsets, or the Stirling number of the second kind9.




Due to surjectivity, this requires |B|≥|L|.


Given B = {1, 2, 3, 4} and L = {A, B}, all possibilities of f: L → B are:

{A, A, B, B}, {A, B, B, A}, {A, B, A, B}, {A, B, B, B}, {B, A, B, B}, {B, B, A, B}, {B, B, B, A} as illustrated:

09 illustration

This yields the final answer:


10 Any SL∘f∘SB

We’ll rely on our solution to case 12, so go read that if you haven’t already!

Here, unlike in case 12, we’re not restricted to injective or surjective functions, but we do recognize equivalence up to permutations of both labels and balls. By removing the surjective restraint, relative to case 12, we’re allowing all possible functions that do not cover the entire codomain (i.e. do not apply all labels to at least one ball).

Given the knowledge covered in case 12, we know that surjective functions under equivalence SL∘f∘SB are enumerated equivalently to the number of partitions of B into L subsets. Simply enough (and much like case 07 relative to case 09) we can just sum the enumerations for k from 0 to L, adding in all the cases excluded in case 12.




Sets of any size will do. However, if L > B, then when k grows to be greater than B, those terms of the sum will produce 0.


Given B = {1, 2, 3, 4} and L = {A, B}, all possibilities of f: L → B are:

{A, A, A, A}, {A, A, B, B}, {A, B, B, B}

as illustrated by:


Our example above checks out with the formula:


11 Injective SL∘f∘SB

If you understand 08, then you already understand this case! The pre-ordering step that differentiates this case cannot add any possible outcomes, so it trivially reduces to the result of Injective SL∘f.

To see this reduction, consider the examples from 08:

Example 1

Given B = {1, 2, 3, 4} and L = {A, B}, we still cannot find a single mapping. We yield:

11 example 1

Example 2

Given B = {1, 2} and L = {A, B, C, D}, the extra pre-order equivalence class means that mapping balls (1, 2) is equivalent to mapping (2, 1). However, the post-ordering step makes both of those cases equivalent already. Nothing changes, yielding:

11 example 2

12 Surjective SL∘f∘SB

It’s useful to understand 09 Surjective SL∘f.

As in 03, 06, and 09, we need to organize balls into non-empty sets (surjective), but by pre-ordering and post-ordering, we’re left with the most general equivalence relation, which only really distinguishes the sizes of the sub-classes. This is easily expressed in

12 illustration

By enforcing both pre- and post-order, the seven cases in the example from 09 collapse into only to groups that are fundamentally distinguishable by the size of the subsets, which is made more clear by their shapes.


12, which is the number of partitions of B into L subsets.

We’re left with the final enumeration:



Due to surjectivity, this requires |B|≥|L|.


Given f: B → L, how many possible outcomes exist?

f-class Any Injective Surjective
f 01 02 03
f∘SB 04 05 06
SL∘f 07 08 09
SL∘f∘SB 10 11 12

A slightly different organization of this same information can be found from–among other places–UC Denver’s Math 45010.



A function maps elements from one set to another set. A function, f, described by f : A→B is usually said to map the set A to the set B.


A function’s domain is the set from which it maps. Given f : A→B, the domain is the set A.


similar to range A function’s codomain is the set to which it maps. Given f : A→B, the codomain is the set B.


Cardinality is the number of elements in the set, often denoted |S| for set S. |X| = m and |Y| = n. Consider assigning one cardinal number to each element.


or one-to-one An injective function maps each element of its domain uniquely to at most one element of its codomain. Some elements may not map to the codomain. No element in the codomain maps back to two elements in the domain.


or onto A surjective function maps to all elements in its codomain. Unlike injective functions, surjective functions can map elements in the domain to multiple elements in the codomain.


A bijective function is both injective and surjective. Each element in the function’s domain maps to exactly one element in the codomain.

Consider sets A = {1, 2, 3}, B = {3, 1, 2}, and C = {3, 1, 2}. A, B, and C contain the same three elements. Additionally, B and C order those elements the same. We can define equivalence relations on A and B that determine whether or not they are equal.


or equal, strictly equal Strict equality requires two sets to have the same elements in the same order. As such, AB = C. This relation is referred to as distinguishable because A is distinguishable from B and C. We can define distinguishability per direction, as well. That is, mapping X to Y, we can require equality up to a permutation of Y only. In that paradigm, f : A → B is equal, but f-1 : B → A is not.


or equal up to a permutation An equivalence relation can be called indistinguishable if order does not matter for either set. As such, A = B = C. This relation is referred to as indistinguishable because A, B, and C are indistinguishable from each other.


Any re-ordering of elements in a set is called a permutation; e.g. the permutations of the set {A, B, C} are {A, B, C}, {A, C, B}, {B, A, C}, {B, C, A}, {C, A, B}, {C, B, A}. For a set, X, of x elements, the first position has x options, the second has x-1 and so on, leading to x! total permutations of X.


A combination is a possible selection of unordered elements from a set. The fact that a combination is not ordered (e.g. {A, B, C} = {C, A, B}) differentiates it from a permutation. If a set, N, has n elements, then the number of combinations of k elements (called “n choose k”) can be calculated with the binomial coefficient:

Binomial coefficient


A partition of an integer n is a way of writing n as a sum of positive integers. For example, the partitions of 3 are 3, 2+1, 1+1+1, resulting in the number of partitions: p(3) = 3. In a partition, 3+1 counts the same as 1+3, meaning a partition is an un-ordered version of a composition.


A composition of an integer n is a way of writing n as an ordered sequence of integers. For example, the compositions of 3 are 3, 2+1, 2+1, 1+1+1. A composition is an ordered version of a partition.

The number of compositions of n is given by: Composition of n

The number of compositions of n into k parts is given by: Composition of n into k which is supported by the trivial fact that



falling factorial

Falling factorial

binomial coefficient

Within the context of this article, a binomial coefficient enumerates the number of possible combinations of k elements chosen from a set of n elements:

Binomial coefficient

In a greater context, a binomial coefficient (n choose k) is “the coefficient of the xk term in the polynomial expansion of the binomial power (1 + x)n,” according to Wikipedia23. Pascal’s triangle is entirely composed of binomial coefficients.

Stirling number of second order

A Stirling number of the second order, denoted S(n, k), counts the ways of partitioning n elements into k non-empty subsets. Other common notations include Stirling number of second order

They follow the following recurrence relation, courtesy of UC Denver 10:

S(0, 0) = 1 S(n, 0) = 0, for n ≥ 1 S(n, k) = 0, for n ≤ k S(n, k) = (k)S(n−1, k) + S(n−1, k−1), for n ≥ k ≥ 1

Iverson bracket

For denoting truth or falseness, the Iverson bracket applies a 1 to a true statement and 0 to a false statement. Symbolically, that can be written: Iverson bracket

Bell number

The Bell numbers, indexed by Bn, enumerate the partitions of a set, specifically of size n. Bell numbers can be described by the following recurrence relation: Bell number recurrence relation

or by the following summation of Stirling numbers of the second kind Bell number sum

Equivalence relation

It sounds circular, but an equivalence relation partitions a set into equivalence classes. In more lay-terms, it determines what is and is not equivalent. For instance, if order does not matter, then (A, B) is equivalent to (B, A); however, if order does matter, then (A, B) is not equivalent to (B, A).

Many branches of mathematics depend on a more broad and robust definition of an equivalence relation, which I encourage you to explore! Wikipedia4 is a good place to start.

I must note here that I owe a lot of references to Wikipedia and its many authors for things that are laborious to reference at each mention, such as formulas and specific rigorous wording. I hope to always give all the credit due to that community.

  1. Richard P. Stanley, Wikipedia

  2. Twelvefold way, Wikipedia

  3. Fuction (mathematics), Wikipedia

  4. Equivalence relation, Wikipedia

  5. Pidgeonhole principle, Wikipedia

  6. Combination, Wikipedia

  7. Composition (combinatorics), Wikipedia

  8. Partition of a set, Wikipedia

  9. Stirling number of the second kind, Wolfram Mathematica

  10. Twelvefold way, UC Denver Math 450

  11. Function (mathematics), Wikipedia

  12. Cardinality, Wikipedia

  13. Injective function, Wikipedia

  14. Surjective function, Wikipedia

  15. Domain of a function, Wikipedia

  16. Codomain, Wikipedia

  17. Equivalence relation, Wikipedia

  18. Information on Enumerative Combinatorics, Richard Stanley, MIT

  19. Falling and rising factorials, Wikipedia

  20. Partition (number theory), Wikipedia

  21. Iverson bracket, Wikipedia

  22. Bell number, Wikipedia

  23. Binomial coefficient, Wikipedia