Cubr


Cubr is a project I completed in three weeks at the end of my introductory computer science class at CMU. The idea is simple: you mix up a Rubik’s cube. You show the cube to your computer’s webcam. Some magic happens, and your cube appears onscreen. Then, the cube begins to solve itself, and all you have to do is follow along and you will have solved your cube!

Here is the web version (under development).

Here’s a video presentation:

Color Recognition

One of the biggest challenges in this project is color recognition. Figuring out what color a pixel is showing on your monitor is easy, because your computer has a digital model of it. But it’s a little more tricky to read the input from a webcam to figure out what color is on a shiny sticker on a plastic cube. Every room has different lighting, stickers show glare if your monitor’s brightness is too high, and so on. The best we can do is guess what color a sticker is. Of course, we can try to make our guesses really good. I gathered a fairly large set of data on the HSV and RGB values of each colored sticker in different lighting situations. Looking back now, I could have done a lot better with that data. I could have trained a machine learning model, and I could have taken into account whole-setting attributes rather than looking at each sticker individually (i.e., notice that this blue sticker is lighter than most blue stickers –> ask the model how that will most likely affect the orange stickers). Alas, I didn’t do that. Instead, I did this:

I wrote code to find the mean HSV and modal RGB from each sticker. I assumed that the dataset’s HSVRGB curve for each sticker color had a normal distribution. The “model” finds the deviation from the mean along each dimension for each sticker color, and determines a probability for each sticker color. It returns the most likely color. Sounds good, right? Well… it was correct less often than hard-coded thresholds. If I had known what an HMM was at that time, maybe it wouldn’t have been so inaccurate. But I didn’t, and it was. And I needed accurate results for my presentation… so as much as it killed my sense of robust design, I eyeballed the HSV and RGB values and found some thresholds that worked 95% of the time in 95% of lightings.

if blue / red > 2 and blue / green > 2: 
    return 'blue'
elif green / red > 2:
    return 'green'

if hue > 150 :
    return 'red'
elif hue < 20 and s < 150:
    return 'white'
elif hue < 20:
    return 'orange'
elif hue < 50:
    return 'yellow'
else:
    return 'white'

Ahh... it burns. It burns my soul. But it works.

Of course, I included a GUI to manually modify the colors on each cell of the cube. A 3D "mirror" of the cube is shown on the same screen as the webcam's view. The colors on this mirror represent what will be stored internally, so the user can notice if it doesn't match up with the physical cube. The highest level functions I could use for drawing onto the numpy.ndarray image were polygon-filling functions. This made drawing a 3D Rubik's cube a bit of a challenge. Fortunately, I already had written some 3D-Vector-polygon-to-2D-output code for interfacing with Tkinter.

A good GUI in Tkinter is possible, but...

You have to reinvent the wheel. And the spoke, and the axle. Not wood, though. Wood is already invented for you.

To create a GUI in Tkinter, I used one widget: a Canvas. In early versions of the game, I used Tkinter's Button and Label widgets, but they were unappealing and offered little flexibility. I wanted as much flexibility as possible for this project. That meant no VPython, because I would be unable to draw 2D shapes directly onto the Canvas, I would be unable to create polygons (they would need to be really thin polyhedrons), and the events to which I could bind listeners would be limited (VPython takes over clicking and dragging). If anyone knows how to get around any of that, I'd love to hear from you about it. But as it was, I was stuck using Tkinter. So, I wrote a geometry module to implement a Vector class and a Camera class. The Camera class has a flatten method that takes a 3D vector and "flattens" it to 2D coordinates based on the Camera's view. After I had this foundation, I could write all my code to reason in 3-space, and drawing onto the 2D canvas was taken care of.

As for the buttons, Tkinter does have decent event handling. You can trivially change the color or border-width that a shape displays when it is moused over:

solveButton = pane.create_oval(cx - r, cy - r, cx + r, cy + r,
                               fill='#0088ff', activefill='#00ffff', 
                               outline='#ffffff', width=1, activewidth=3)

And binding an event listener to the shape is equally trivial:

pane.tag_bind(solveButton, '', self.solve)

I was able to design the GUI in an intuitive and responsive way, for the purposes of this project anyway, using Tkinter. The main limitation is that it could get slow. Rendering 3D on a 2D canvas using CPython is not the quickest thing in the world. Since I did not do any shading or texturing for this project, the performance was fine, but I would not recommend it for graphics-intensive projects. Tkinter does not interface with the OS as well as frameworks like .NET. For this project, though, most of the interface was atomic. However, I did run into some trouble when integrating with OpenCV. To display video from the webcam, I had to hand over control to OpenCV's "NamedWindow" class and put my Tkinter window in the background. It would have been ideal if I were able to integrate the video stream into my GUI. So, Tkinter has its limitations, but for projects that do not need to surpass those limitations, it can be used effectively.

"I'm sure you all know how to solve a Rubik's cube, but I don't..."

This was my opening line for my presentation on this project. People laughed. He created a Rubik's cube solver; of course he knows how to do it! Well, I understand the steps. I understand what each algorithm does. But I really haven't memorized those algorithms. I didn't get into software design because I wanted to memorize and regurgitate algorithms. I did it because I want to design and write them. If you gave me a Rubik's cube right now, I could solve it part of the way. I could solve the first two layers and a bit of the third. But after that, I'd have to get my computer out and fire up Cubr.

I had a couple of choices when I was deciding which Rubik's cube solution algorithm I wanted to implement. One is the beginner's three-layer method. This is the one that most people know and memorize. I used to know it, and I could solve a cube in around 50 seconds using it (that's not very good). There are other algorithms that require a lot more memorization and quick pattern recognition; however, they are in the same class of solutions as the three-layer method. The other option was to write an algorithm to find an optimal solution. This would have been a lot easier to implement, because all I would have to do is walk a tree of all possible move combinations until I find a solution. Simple, right? Yes, it would be no more than 10 lines of code.

moves = ["R", "R'", "R2", "L", "L'", "L2", "U", "U'", "U2", "D", "D'", "D2", "B", "B'", "B2", "F", "F'", "F2"]
solutions = set() # Assume we have checked that it is not already solved
for sequence in itertools.product([moves]*20):
    temp_cubestate = cubestate.copy()
    for i, move in enumerate(sequence):
        temp_cubestate.make_move(move)
        if temp_cubestate.is_solved():
               solutions.add(sequence[:i+1])
               break
best_solution = min(solutions, key=lambda x: len(x))

BUT- it would take a while to find a solution. The maximum number of move sequences that would need to be tried is on the order of 18^20, which is about 1.275E25. That's an upper bound, of course, because many of those sequences could be excluded by not exploring the same node twice. But at that upper bound, assuming we have a super-fancy problem-specific logic gate that can execute 20 Rubik's cube turns on an array in one clock cycle, at 4GHz (why not, right?) that's 101,060 millennia to find the optimal solution. So my ten lines of code may not work so well..

Upon further research, I found that there are profile tables that can be used to speed up the search for an optimal solution. However, the algorithm I found was for a grad school project, and I only had three weeks, so learning how to create tables to solve NP-complete problems with a defined search space had to wait for another day.

Instead, I implemented the three-layer algorithm. It is the most popular algorithm among beginners because it is intuitive and requires relatively little memorization. There were a couple hurdles to pass. Most beginners are taught to "solve the cross" on the top layer by moving the edges pieces on one face into their correct locations. This is intuitive for a human to understand, but special care was needed to cover all the possible cases in my solutions module. I have to admit, the solutions module is a lot more supervised than "try to put that piece over there". It examines each piece of the cube, in a specified order (except the third layer, which is done all at once), and chooses the correct algorithm from a set based on its position relative to its desired position. All in all, it came to over a thousand lines of code to explicate the algorithm.

The most difficult part of writing the algorithm was translating "if the piece is directly under where it needs to be, then turn the bottom face clockwise, turn the right face counterclockwise, ..." into "if the piece's location minus the edge face's location dotted with the desired location minus the edge face's location is equal to negative one...". It was very much a spacial reasoning problem, and in order to abstract it as much as possible, that is, to generalize as many cases as possible, I found myself expressing conditions as various vector expressions that described a general situation, like crossing to vectors and dotting that with the face they share to determine whether a piece is clockwise or counterclockwise from another piece. And so on.

The good news is, with all this guidance, the algorithm works very quickly, even with the overhead of python lists (as compared with ndarrays). On average, the algorithm takes less than 100,000 millennia.