Introduction
While working on my 'Searching, generating, validating and rendering Euclidean tilings' project, I was needing to build up a list of distinct shape arrangements (!!! Fig 404 !!!). This needed a way to check an arrangement against a list of previously seen arrangements.
The complexity and interesting part of this problem came because the shapes could be arranged cyclicly, and there was no defined start or end point. This meant that the same arrangement could be represented in multiple ways. I was also dealing with an infinite amount of these arrangements, and 100,000s of them every second.
Arrangements as sequences
The intuitive way to start comparing these shape arrangements is going to be representing them as some sort of sequence , so given the shape arrangement from !!! Fig 404 !!!, we can represent this using Rust's fixed length arrays (!!! Fig 404 !!!).
The fixed length arrays were chosen because as stated above the program was processing a lot of these sequences, and they are more performant than other options. The Rust book has a great section on this - "The Stack and the Heap", but the short explanation is that known spaces on the stack are already available for memory allocation, whereas for unknown sized things (like vectors) require an allocator to search the heap for available space, which makes it slower. We want to limit use of the heap and use the stack where possible.
The sequences are simply a list of numbers that represent the shapes around the center shape, in this case a dodecagon. The numbers represent the number of sides on the shape, so a 3 represents a triangle and a 4 represents a square. The start point in these sequences is irrelevant at the moment
A 0 represents an empty space, and in this case, the dodecagon has 12 sides so the sequence is of length 12. The one downside of using a fixed length array is that we'll need a simple utility function to get the actual length of the sequence for a lot of the operations we'll be doing (!!! Fig 404 !!!).
Symmetrical sequences
A property that's not indicated by the sequences shown above is what direction around the shape arrangement did we go to build up the sequence? Does it matter?
It may seem at first that all sequences when traversed in either direction would yield the same result but it's a specific property depending on the contents of the sequence. Let's take a look at some examples.
If we take the sequence
(!!! Fig 404 !!!) , we can iterate through the starting point and which ever direction we traverse we produce these same set of paths.[3, 4, 3, 12]
However if we take a very similar sequence of
(!!! Fig 404 !!!) and do the same thing, then we get different traversal results when we traverse in both directions.[3, 3, 4, 12]
We can say the first sequence of
is symmetrical and the second sequence of [3, 4, 3, 12]
is asymmetrical.[3, 3, 4, 12]
This is important (at least it is for the shape arrangements example) because arrangements that are asymmetrical sequences are still the same arrangement, just in a different direction and should not be counted as distinct arrangements.
It's fairly simple to check if a sequence is symmetrical. We can simply concatenate the sequence and check for the inverted sequence within itself (!!! Fig 404 !!!).
This
function (!!! Fig 404 !!!) is going to give us a way to check which sequences we need to run any reverse functions on, in get_symmetry_index
time.O(n)
Matches and partial matches
The next step is now to try and match up a sequence with another from a list of sequences.
Outside the context of the 'Searching, generating, validating and rendering Euclidean tilings' project, we could keep ourselves concerned only with complete matches, however for my purpose I also needed to identify partial matches so there will be a little bit of extra logic to handle this.
The way I achieved this was to iterate through the collection of sequences, as soon as there is a full match immediately return it, otherwise the first partial match we keep note of and continue to iterate through the rest of the sequences, returning the first partial match at the end (!!! Fig 404 !!!).
Our comparing logic for two sequences is going to be somewhat similar to the symmetrical check. We're going to concatenate the sequence with itself (by looping twice) and then check for the target sequence within the concatenated sequence, forwards and backwards (when asymmetrical) (!!! Fig 404 !!!).
We can use the Knuth-Morris-Pratt algorithm to do this, which is an efficient way to search for a substring within a string in
time, instead of O(n + m)
. I wont go into the details of the algorithm here as there are plenty of resources that explain it better than I could.O(n * m)
Normalizing sequences
We now have a way to compare sequences to build up a distinct collection, however depending on what sequence we start with, we could end up with a collection that looks different. It might pass our comparison logic, but when outputted to a UI for debugging purposes, it could be confusing to see the same arrangement in different forms. It'll be useful to have a way to normalize the sequences so that they are all in the same form.
The logic we pick here isn't particularly important, as long as it's consistent. I chose to normalize them by finding the smallest permutation of the sequence, whether it's forwards or backwards (another use for the symmetrical checking). By "smallest permutation" I mean, if we were to join the sequence into a single continuous number we can easily compare 2 sequences to find the smallest/largest of the 2.
Maybe this could be performed in less than
time but for my purposes it is called very infrequently so I didn't worry too much about it.O(n^2)
Let's take a look at some examples.
Summary
There's sequences of numbers that can have a cyclic property to them and this poses a problem when trying to compare them because there's no defied start and endpoint. I solved this by essentially concatenating (by looping twice) the sequence and then checking for the target sequence within the duplicated sequence. This checking behaviour can be done using a slightly modified Knuth-Morris-Pratt algorithm, to make it more efficient. I also covered the symmetrical property of the sequence and how we can use this to our advantage to find the starting index of the reverse sequence if it exists.
With that done, I can now continue with the rest of the other project.