.

Writing mostly about computers and math.

📅 

My favorite Rubik's cube

I got the idea in my head the other day to write a Rubik's cube solver. While working on that I accidentally learned some interesting math that I thought might be nice to share. Here, then, is a brief, probably over-simplified introduction to group theory and its applications to Rubik's cubes.

What is Group Theory, Anyway?

Group theory is the kind of math that for whatever reason most people never know about unless they study math at a university even though the basics are simple enough that a first grader could understand them. There are really just two building blocks underpinning the fun stuff: functions and sets.

Functions

If you've programmed a computer or taken a high school math class, you already have the right idea of what a function is. Functions are basically like little black boxes that accept some input and map it to an output. For example, \(f(x) = x^2\) takes every number and maps it to its square.

When talking about functions, we use domain to talk about the kinds of inputs a function can take and range to describe its output. The domain and range of a function that takes the natural numbers (\(\Bbb N\)) and maps them to the rationals (\(\Bbb Q\)) - maybe something like \(f(x, y) = \frac{x}{y}\) - would be written as \(f: \Bbb N \rightarrow \Bbb Q\). That is, \(f\) takes everything in \(\Bbb N\) and maps it to something in \(\Bbb Q\).

The last major notational thing to mention that's relevant to group theory is the idea of inverses. The inverse of a function is just another function that "undoes" it. To use the example from before, the inverse of \(f(x) = x^2\) would be \(f^{-1}(x) = \sqrt{x}\) (ignoring the multiple square roots of course).

Sets

Sets are also pretty easy to understand by intuition plus a little math. Set theory gets a lot fancier post-19th century, but Naive Set Theory is good enough for working with Rubik's cubes. In Naive Set Theory, a set is simply any well-defined group of objects. For example, the integers are a set, usually written \(\Bbb Z\). Similarly, the positive even numbers are a set (\(\{2, 4, 6, 8, \dots\}\)), the primes are a set (\(\{2, 3, 5, 7, \dots\}\)), and so on. For the purposes of this article, we're mostly interested in set membership. Membership is described using the \(\in\) symbol. For example, to say that \(x\) is a member of some set \(S\), we would write \(x \in S\). For the opposite, it's \(x \notin S\).

Groups

Now that we sort of understand functions and sets, it's pretty easy to make the leap to groups. A group has two parts: a set and an operation. These are usually written as \(G\) and \(*\) and the group is referred to by the tuple \((G, *)\). There are four axioms that these have to satisfy in order to be considered a group: closure, associativity, identity element, and inverse element.

Closure

\(\forall a, b \in G,\: a * b \in G\)

This means that for all members of the set \(G\), applying the operation to any pair of them gives you a result that's also a member of \(G\), i.e. the domain and range of our operator \(*\) are both \(G\).

Associativity

\(\forall a, b, c \in G,\: (a * b) * c = a * (b * c)\)

This just means that the operation we pick (\(*\)) has to be associative, that is, we can group the operations however we want without changing the result. The integers, for example, are a group under addition but not subtraction because addition is associative but subtraction is not, e.g. \(3 + (2 + 1) = (3 + 2) + 1\) but \(3 - (2 - 1) \neq (3 - 2) - 1\). That is, \((\Bbb Z, +)\) is a group but \((\Bbb Z, -)\) is not.

Identity Element

\(\exists e \in G\) such that \(\forall a \in G,\: e * a = a * e = a\)

This means that there is always a member of the set \(G\) that basically does nothing when combined with another element. To use the addition example again, the identity element for \((\Bbb Z, +)\) is 0, since \(0 + a = a + 0 = a\). For multiplication, the identity element is 1. This means that \(\Bbb N^*\), the natural numbers starting at 1, are not a group under addition since the identity element for addition, 0, is not a member of the set.

Inverse Element

\(\forall a \in G,\: \exists b \in G\) such that \(a * b = b * a = e\)

This axiom means that each element \(a\) in \(G\) must have an inverse (\(a^{-1}\)) that gives you the identity element when you \(*\) them together. In \((\Bbb Z, +)\), the inverse of each number is just its additive inverse (\(a^{-1} = -a\)) since \(a + -a = 0\). This axiom also excludes the natural numbers under addition since none of the inverses of natural numbers are themselves natural numbers.

Mechanics of Rubik's Cubes

In order to apply group theory to solving Rubik's cubes, we first need to understand how Rubik's cubes work. Since a Rubik's cube is, well, a cube, it has six faces. We call these up (U), down (D), right (R), left (L), front (F), and back (B). Each face is subdivided into nine smaller cubes called "cubies." There are 26 cubies in the Rubik's cube, \(3^3 - 1\) since there is no center cubie. Here are all of the faces labeled so you can get an idea of how they relate to each other:

You can see that there are eight corners (4 on the F face and 4 on the D face) and twelve edges (4 on U, 4 on D, and 4 in the middle). This plus the six center cubies gives us our total of 26.

Each face can be rotated about its center. A move consisting of a clockwise rotation of a face is written simply as the name of the face (e.g. R for clockwise rotation of the right face), whereas a counterclockwise rotation is written with a prime or single quote (e.g. B' for counterclockwise rotation of the back face). Since rotating a face four times gets you back to the original state, we can write all of the counterclockwise rotations as combinations of clockwise rotations, for example B' = BBB since rotating the back face clockwise three times is the same as rotating it once counterclockwise.

Note that the centers of the faces never move relative to each other; in fact, if you take apart a Rubik's cube, you'll see that the internal structure actually connects all the centers together so the faces can turn. Most standard cubes look something like this:

A disassembled Rubik's cube

So Where's the Group?

Well, we know a group consists of a set and an operation, so what does that correspond to for the Rubik's cube? One way to think about it would be to have a set of all the possible cube states and use the rotations as operations for going from state to state. This sort of works, but we run into trouble when we try to satisfy the identity axiom: what move would be the identity move? Then of course there's the fact that we're actually trying to define six different operations on our set, which is too many for a valid group.

It's more useful if instead we use the moves themselves as the set (call it \(\Bbb G\)) and then the operation can be concatenating moves together (call it \(*\)). The trick is that a move might consist of zero rotations or millions of rotations - this allows us to represent every possible cube state as the sequence of moves that gets us there from a solved cube. So, does this make a group? Let's see if it satisfies our four axioms.

Closure

Well, there are no two moves you can make that combine to give you something other than a valid move. More formally, \(\forall M_1, M_2 \in \Bbb G,\: M_1 * M_2 \in \Bbb G\).

Associativity

This one's a little tricky. First, it's important to note that this isn't the same thing as commutativity, i.e. it's OK that we don't get the same result if the moves are applied in a different order - we still have a group (a non-abelian group if you're interested). That aside, how do we show that concatenating moves is associative?

Well, what happens if we think of a move as a function that takes us from one state of the cube to another? If we call the state \(S\) then we might write a move as \(M(S)\). If we combine two moves with \(*\) then we would get something like \(M_1 * M_2 = M_2(M_1(S))\). Note that \(M_2\) is on the outside since it gets applied to the cube state after \(M_1\). So here's what associativity looks like with our function notation then: $$M_1 * (M_2 * M_3) = (M_1 * M_2) * M_3\\ M_1 * M_3(M_2(S)) = M_2(M_1(S)) * M_3\\ M_3(M_2(M_1(S))) = M_3(M_2(M_1(S)))$$ The two sides of the equation are the same, so \(*\) is associative.

Identity

If we include a null move consisting of no rotations (call it \(E\)) in our set, then this requirement is satisfied since doing nothing followed by a move (or a move followed by nothing) is the same as just doing that move. That is, \(\forall M \in \Bbb G,\: E * M = M * E = M\).

Inverse

This one's also pretty simple: for every move \(M\), we can define \(M'\) to be the steps that undo it. For example, if \(M = \{RFL'\}\) then \(M' = \{LF'R'\}\). Since \(M'\) has to consist of nothing but face rotations, it must be a member of our set. Formally, \(\forall M \in \Bbb G,\: \exists M' \in \Bbb G\) such that \(M * M' = M' * M = E\).

There you go, all the axioms are satisfied. \(\Bbb G\) is a group.

Using Group Theory to Solve the Cube

Just proving that the group of Rubik's cube moves exists is neat and all, but it doesn't seem to be terribly useful. It turns out, however, that we can use groups to make finding solutions to scrambled cubes a lot faster.

If I were writing an algorithm to solve a Rubik's cube, I would probably (in fact I did) start with some kind of iterative deepening algorithm where we take our cube and see if applying any one of the base moves solves it. If not, we try two moves, then three moves, etc. Eventually we'll find the shortest solution, but it will probably take a very long time since there are about 43 quintillion possible cube states all within 20 or so moves. I tried this and it starts to get unusably slow around 7 moves, which is not enough to solve every cube.

One of the first fast algorithms for cube solving was discovered by Morwen Thistlethwaite. The so-called "descent through nested sub-groups" algorithm works by moving the cube state into subgroups that require fewer and fewer moves until it gets to the solved state. The first group, \(G_0\), is the same as our \(\Bbb G\) group; it contains moves made of all six base moves. The next group consists only of moves made of double turns of the up and down faces, but 90 degree turns of the other faces are allowed. The next two groups work the same way, eliminating single turns of the front and back faces and finally removing single turns of the left and right faces. From there, the only subgroup left is the one that contains the solved state.

This algorithm works because the subgroups are so much smaller than the original group that it's possible to construct lookup tables that tell you how to get from one subgroup to the next. The original algorithm used a few preliminary moves to get the cube into a state that matched the lookup table for the \(G_0 \rightarrow G_1\) transition, but now there are lookup tables for that as well. You can find these lookup tables here if you're interested.

If we take advantage of some facts related to group theory, we can take a problem that's too big to solve reasonably (find a solution to an arbitrary cube) and make it trivially solvable — modern cube solving algorithms can find solutions in milliseconds. Group theory has practical applications all over the place though; modern cryptography depends on it, pretty much any error correcting algorithm uses group theory in some way, and it informs a lot of current physics and chemistry research, among other things.

The ID solver I wrote, along with a little Rust library to support it, is on my GitHub if you're interested. I'll probably add some fancier solution algorithms (like Thistlethwaite's algorithm for starters) when I have time.