Introduction

While working on my 'Searching, generating, validating and rendering lattice structures' project, I was needing to build up a list of distinct shape arrangements (!!! Fig 404 !!!). This needed a way to check an arrangement against a list of previously seen arrangements.

The complexity and interesting part of this problem came because the shapes could be arranged cyclicly, and there was no defined start or end point. This meant that the same arrangement could be represented in multiple ways. I was also dealing with an infinite amount of these arrangements, and 100,000s of them every second.

Fig. - Shape arrangement of a dodecagon at the center, with alternating triangles, squares and hexagons on it's edges.

Wasm loading...

Arrangements as sequences

The intuitive way to start comparing these shape arrangements is going to be representing them as some sort of sequence , so given the shape arrangement from !!! Fig 404 !!!, we can represent this using Rust's fixed length arrays (!!! Fig 404 !!!).

The fixed length arrays were chosen because as stated above the program was processing a lot of these sequences, and they are more performant than other options. The Rust book has a great section on this - "The Stack and the Heap", but the short explanation is that known spaces on the stack are already available for memory allocation, whereas for unknown sized things (like vectors) require an allocator to search the heap for available space, which makes it slower. We want to limit use of the heap and use the stack where possible.

Fig. - A couple of example sequences represented as Rust fixed length arrays.

// Regular polygon tilings are limited to 3, 4, 6, 8
// and 12 sided shapes so I'll use 12 here as an upper
// sequence limit, but this could be extended to whatever.
type Sequence = [u8; 12];

// A dodecagon sequence
let seq_1: Sequence = [3, 4, 6, 4, 3, 4, 6, 4, 3, 4, 6, 4];

// A triangle sequence
let seq_2: Sequence = [6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0];

The sequences are simply a list of numbers that represent the shapes around the center shape, in this case a dodecagon. The numbers represent the number of sides on the shape, so a 3 represents a triangle and a 4 represents a square. The start point in these sequences is irrelevant at the moment

A 0 represents an empty space, and in this case, the dodecagon has 12 sides so the sequence is of length 12. The one downside of using a fixed length array is that we'll need a simple utility function to get the actual length of the sequence for a lot of the operations we'll be doing (!!! Fig 404 !!!).

Fig. - Implementation to return the actual length of a sequence

/// Returns the length of a sequence.
///
/// Space: O(1)
/// Time:  O(n)
pub fn get_length(sequence: &Sequence) -> usize {
  let mut length = 0;

  for value in sequence {
    if *value == 0 {
      return length;
    }

    length += 1;
  }

  length
}

Symmetrical sequences

A property that's not indicated by the sequences shown above is what direction around the shape arrangement did we go to build up the sequence? Does it matter?

It may seem at first that all sequences when traversed in either direction would yield the same result but it's a specific property depending on the contents of the sequence. Let's take a look at some examples.

If we take the sequence [3, 4, 3, 12] (!!! Fig 404 !!!) , we can iterate through the starting point and which ever direction we traverse we produce these same set of paths.

Fig. - A symmetrical sequence, the same traversal is produced regardless of the direction.

Wasm loading...

// Anti/Clockwise
[3, 4, 3, 12]
[4, 3, 12, 3]
[3, 12, 3, 4]
[12, 3, 4, 3]

However if we take a very similar sequence of [3, 3, 4, 12] (!!! Fig 404 !!!) and do the same thing, then we get different traversal results when we traverse in both directions.

Fig. - An asymmetrical sequence, and it's differing traversals in either direction

Wasm loading...

// Clockwise
[3, 3, 4, 12]
[3, 4, 12, 3]
[4, 12, 3, 3]
[12, 3, 3, 4]

// Anticlockwise
[12, 4, 3, 3]
[3, 12, 4, 3]
[3, 3, 12, 4]
[4, 3, 3, 12]

We can say the first sequence of [3, 4, 3, 12] is symmetrical and the second sequence of [3, 3, 4, 12] is asymmetrical.

This is important (at least it is for the shape arrangements example) because arrangements that are asymmetrical sequences are still the same arrangement, just in a different direction and should not be counted as distinct arrangements.

It's fairly simple to check if a sequence is symmetrical. We can simply concatenate the sequence and check for the inverted sequence within itself (!!! Fig 404 !!!).

Fig. - Concatenated symmetrical and asymmetrical sequences

Wasm loading...

This get_symmetry_index function (!!! Fig 404 !!!) is going to give us a way to check which sequences we need to run any reverse functions on, in O(n) time.

Fig. - Implementation to retrieve the starting index of the reverse sequence if it exists

/// A symmetrical sequence is one that starting
/// at any point and going in one direction.
/// Whatever path has been taken, can be taken
/// in the opposite direction.
///
/// The approach we'll take here it to loop through
/// the sequence twice, and while doing so, checking
/// if there is the appearance of the sequence in
/// it's reversed form.
///
/// Space: O(1)
/// Time:  O(n)
pub fn get_symmetry_index(sequence: &Sequence) -> Option<usize> {
  let length = get_length(sequence);
  let mut c = 0;
  let mut i = None;

  for _ in 0..2 {
    for j in 0..length {
      if sequence[j] == sequence[length - 1 - c] {
        if c == length - 1 {
          return i;
        }

        if i.is_none() {
          i = Some(j);
        }

        c += 1;
      } else {
        c = 0;
      }
    }
  }

  None
}

Matches and partial matches

The next step is now to try and match up a sequence with another from a list of sequences.

Outside the context of the 'Searching, generating, validating and rendering lattice structures' project, we could keep ourselves concerned only with complete matches, however for my purpose I also needed to identify partial matches so there will be a little bit of extra logic to handle this.

The way I achieved this was to iterate through the collection of sequences, as soon as there is a full match immediately return it, otherwise the first partial match we keep note of and continue to iterate through the rest of the sequences, returning the first partial match at the end (!!! Fig 404 !!!).

Fig. - Implementation to search through the sequences

#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize)]
pub enum Match {
  Exact(u8),
  Partial(u8),
  None,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize)]
pub enum Direction {
  Forward,
  Backward,
}

/// Searches for a sequence in a list of sequences, and returns the first
/// sequence that either matches exactly or partially. Note that this function
/// assumes that the targets are sorted.
///
/// Space: O(1)
/// Time:  O(n + m) where n is the number of targets and m is the length of the
///        longest target.
pub fn get_match(sequence: &Sequence, targets: &[Sequence]) -> Match {
  let mut first_partial_match = Match::None;

  for (index, target) in targets.iter().enumerate() {
    match get_sequence_match(sequence, target, Direction::Forward) {
      Match::Exact(_) => {
        return Match::Exact(index as u8);
      }
      Match::Partial(_) => {
        if matches!(first_partial_match, Match::None) {
          first_partial_match = Match::Partial(index as u8);
        }
      }
      _ => {}
    }
  }

  first_partial_match
}

Our comparing logic for two sequences is going to be somewhat similar to the symmetrical check. We're going to concatenate the sequence with itself (by looping twice) and then check for the target sequence within the concatenated sequence, forwards and backwards (when asymmetrical) (!!! Fig 404 !!!).

We can use the Knuth-Morris-Pratt algorithm to do this, which is an efficient way to search for a substring within a string in O(n + m) time, instead of O(n * m). I wont go into the details of the algorithm here as there are plenty of resources that explain it better than I could.

Fig. - Implementation to search through the sequences

// Searches for a sequence in a list of sequences, and returns the first
// sequence that either matches exactly or partially. This function uses
// the Knuth-Morris-Pratt algorithm to search for the sequence in the target,
// wrapping around the target to match offset starting points. It will search
// in both directions if the target is not symmetrical.
//
// Space: O(n)
// Time:  O(n + m)
fn get_sequence_match(source: &Sequence, target: &Sequence, direction: Direction) -> Match {
  let source_length = get_length(source);
  let target_length = get_length(target);
  let lps = create_lps(source);

  if source_length > target_length {
    return Match::None;
  }

  let mut target_i = 0;
  let mut source_i = 0;

  while target_i < target_length * 2 {
    if source[source_i] == target[target_i % target_length] {
      target_i += 1;
      source_i += 1;
    } else if source_i != 0 {
      source_i = lps[source_i - 1] as usize;
    } else {
      target_i += 1;
    }

    if source_i == source_length {
      if source_i == target_length {
        return Match::Exact(0);
      }

      return Match::Partial(0);
    }
  }

  if direction == Direction::Forward && !is_symmetrical(target) {
    return get_sequence_match(&reverse(source), target, Direction::Backward);
  }

  Match::None
}

// Creates a KMP longest prefix suffix array
//
// Space: O(n)
// Time:  O(n)
fn create_lps(sequence: &Sequence) -> Sequence {
  let length = get_length(sequence);
  let mut lps = Sequence::default();

  let mut l = 0;
  let mut r = 1;

  // The first element is always 0, as there
  // are no prefixes or suffixes
  lps[0] = 0;

  while r < length {
    if sequence[l] == sequence[r] {
      lps[r] = (l + 1) as u8;
      l += 1;
      r += 1;
    } else if l != 0 {
      l = lps[l - 1] as usize;
    } else {
      lps[r] = 0;
      r += 1;
    }
  }

  lps
}

Normalizing sequences

We now have a way to compare sequences to build up a distinct collection, however depending on what sequence we start with, we could end up with a collection that looks different. It might pass our comparison logic, but when outputted to a UI for debugging purposes, it could be confusing to see the same arrangement in different forms. It'll be useful to have a way to normalize the sequences so that they are all in the same form.

The logic we pick here isn't particularly important, as long as it's consistent. I chose to normalize them by finding the smallest permutation of the sequence, whether it's forwards or backwards (another use for the symmetrical checking). By "smallest permutation" I mean, if we were to join the sequence into a single continuous number we can easily compare 2 sequences to find the smallest/largest of the 2.

Maybe this could be performed in less than O(n^2) time but for my purposes it is called very infrequently so I didn't worry too much about it.

Fig. - Implementation to search through the sequences

/// Returns the minimum permutation of a sequence.
///
/// Space: O(1)
/// Time: O(n^2)
pub fn get_min_permutation(sequence: &Sequence) -> Sequence {
  get_min_permutation_directional(sequence, Direction::Forward)
}

/// Returns the minimum permutation of a sequence with
/// the option to reverse the sequence.
///
/// Space: O(1)
/// Time: O(n^2)
fn get_min_permutation_directional(sequence: &Sequence, direction: Direction) -> Sequence {
  let length = get_length(sequence);
  let mut a = *sequence;

  for i in 0..length {
    let b = shift_left(sequence, i);

    if compare(&a, &b) == std::cmp::Ordering::Greater {
      a = b;
    }
  }

  if direction == Direction::Forward && !is_symmetrical(sequence) {
    let b = get_min_permutation_directional(&reverse(sequence), Direction::Backward);

    if compare(&a, &b) == std::cmp::Ordering::Greater {
      a = b;
    }
  }

  a
}

/// Shifts a sequence to the left and inserts the
/// last value at the beginning.
///
/// Space: O(1)
/// Time:  O(n)
fn shift_left(sequence: &Sequence, shift: usize) -> Sequence {
  let length = get_length(sequence);
  let mut shifted_sequence = Sequence::default();

  #[allow(clippy::needless_range_loop)]
  for i in 0..length {
    let j = (i + shift) % length;
    shifted_sequence[i] = sequence[j];
  }

  shifted_sequence
}

/// Reverses a sequence
///
/// Space: O(n)
/// Time:  O(n)
pub fn reverse(sequence: &Sequence) -> Sequence {
  let mut reversed_sequence = Sequence::default();
  let length = get_length(sequence);

  for i in 0..length {
    reversed_sequence[i] = sequence[length - 1 - i];
  }

  reversed_sequence
}

Let's take a look at some examples.

Wasm loading...

Summary

There's sequences of numbers that can have a cyclic property to them and this poses a problem when trying to compare them because there's no defied start and endpoint. I solved this by essentially concatenating (by looping twice) the sequence and then checking for the target sequence within the duplicated sequence. This checking behaviour can be done using a slightly modified Knuth-Morris-Pratt algorithm, to make it more efficient. I also covered the symmetrical property of the sequence and how we can use this to our advantage to find the starting index of the reverse sequence if it exists.

With that done, I can now continue with the rest of the other project.

Introduction

Fig. - Shape arrangement of a dodecagon at the center, with alternating triangles, squares and hexagons on it's edges.

Wasm loading...

Arrangements as sequences

Fig. - A couple of example sequences represented as Rust fixed length arrays.

// Regular polygon tilings are limited to 3, 4, 6, 8
// and 12 sided shapes so I'll use 12 here as an upper
// sequence limit, but this could be extended to whatever.
type Sequence = [u8; 12];

// A dodecagon sequence
let seq_1: Sequence = [3, 4, 6, 4, 3, 4, 6, 4, 3, 4, 6, 4];

// A triangle sequence
let seq_2: Sequence = [6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0];

Fig. - Implementation to return the actual length of a sequence

/// Returns the length of a sequence.
///
/// Space: O(1)
/// Time:  O(n)
pub fn get_length(sequence: &Sequence) -> usize {
  let mut length = 0;

  for value in sequence {
    if *value == 0 {
      return length;
    }

    length += 1;
  }

  length
}

Symmetrical sequences

A property that's not indicated by the sequences shown above is what direction around the shape arrangement did we go to build up the sequence? Does it matter?

If we take the sequence [3, 4, 3, 12] (!!! Fig 404 !!!) , we can iterate through the starting point and which ever direction we traverse we produce these same set of paths.

Fig. - A symmetrical sequence, the same traversal is produced regardless of the direction.

Wasm loading...

// Anti/Clockwise
[3, 4, 3, 12]
[4, 3, 12, 3]
[3, 12, 3, 4]
[12, 3, 4, 3]

However if we take a very similar sequence of [3, 3, 4, 12] (!!! Fig 404 !!!) and do the same thing, then we get different traversal results when we traverse in both directions.

Fig. - An asymmetrical sequence, and it's differing traversals in either direction

Wasm loading...

// Clockwise
[3, 3, 4, 12]
[3, 4, 12, 3]
[4, 12, 3, 3]
[12, 3, 3, 4]

// Anticlockwise
[12, 4, 3, 3]
[3, 12, 4, 3]
[3, 3, 12, 4]
[4, 3, 3, 12]

We can say the first sequence of [3, 4, 3, 12] is symmetrical and the second sequence of [3, 3, 4, 12] is asymmetrical.

It's fairly simple to check if a sequence is symmetrical. We can simply concatenate the sequence and check for the inverted sequence within itself (!!! Fig 404 !!!).

Fig. - Concatenated symmetrical and asymmetrical sequences

Wasm loading...

This get_symmetry_index function (!!! Fig 404 !!!) is going to give us a way to check which sequences we need to run any reverse functions on, in O(n) time.

Fig. - Implementation to retrieve the starting index of the reverse sequence if it exists

/// A symmetrical sequence is one that starting
/// at any point and going in one direction.
/// Whatever path has been taken, can be taken
/// in the opposite direction.
///
/// The approach we'll take here it to loop through
/// the sequence twice, and while doing so, checking
/// if there is the appearance of the sequence in
/// it's reversed form.
///
/// Space: O(1)
/// Time:  O(n)
pub fn get_symmetry_index(sequence: &Sequence) -> Option<usize> {
  let length = get_length(sequence);
  let mut c = 0;
  let mut i = None;

  for _ in 0..2 {
    for j in 0..length {
      if sequence[j] == sequence[length - 1 - c] {
        if c == length - 1 {
          return i;
        }

        if i.is_none() {
          i = Some(j);
        }

        c += 1;
      } else {
        c = 0;
      }
    }
  }

  None
}

Matches and partial matches

The next step is now to try and match up a sequence with another from a list of sequences.

Fig. - Implementation to search through the sequences

#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize)]
pub enum Match {
  Exact(u8),
  Partial(u8),
  None,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize)]
pub enum Direction {
  Forward,
  Backward,
}

/// Searches for a sequence in a list of sequences, and returns the first
/// sequence that either matches exactly or partially. Note that this function
/// assumes that the targets are sorted.
///
/// Space: O(1)
/// Time:  O(n + m) where n is the number of targets and m is the length of the
///        longest target.
pub fn get_match(sequence: &Sequence, targets: &[Sequence]) -> Match {
  let mut first_partial_match = Match::None;

  for (index, target) in targets.iter().enumerate() {
    match get_sequence_match(sequence, target, Direction::Forward) {
      Match::Exact(_) => {
        return Match::Exact(index as u8);
      }
      Match::Partial(_) => {
        if matches!(first_partial_match, Match::None) {
          first_partial_match = Match::Partial(index as u8);
        }
      }
      _ => {}
    }
  }

  first_partial_match
}

Fig. - Implementation to search through the sequences

// Searches for a sequence in a list of sequences, and returns the first
// sequence that either matches exactly or partially. This function uses
// the Knuth-Morris-Pratt algorithm to search for the sequence in the target,
// wrapping around the target to match offset starting points. It will search
// in both directions if the target is not symmetrical.
//
// Space: O(n)
// Time:  O(n + m)
fn get_sequence_match(source: &Sequence, target: &Sequence, direction: Direction) -> Match {
  let source_length = get_length(source);
  let target_length = get_length(target);
  let lps = create_lps(source);

  if source_length > target_length {
    return Match::None;
  }

  let mut target_i = 0;
  let mut source_i = 0;

  while target_i < target_length * 2 {
    if source[source_i] == target[target_i % target_length] {
      target_i += 1;
      source_i += 1;
    } else if source_i != 0 {
      source_i = lps[source_i - 1] as usize;
    } else {
      target_i += 1;
    }

    if source_i == source_length {
      if source_i == target_length {
        return Match::Exact(0);
      }

      return Match::Partial(0);
    }
  }

  if direction == Direction::Forward && !is_symmetrical(target) {
    return get_sequence_match(&reverse(source), target, Direction::Backward);
  }

  Match::None
}

// Creates a KMP longest prefix suffix array
//
// Space: O(n)
// Time:  O(n)
fn create_lps(sequence: &Sequence) -> Sequence {
  let length = get_length(sequence);
  let mut lps = Sequence::default();

  let mut l = 0;
  let mut r = 1;

  // The first element is always 0, as there
  // are no prefixes or suffixes
  lps[0] = 0;

  while r < length {
    if sequence[l] == sequence[r] {
      lps[r] = (l + 1) as u8;
      l += 1;
      r += 1;
    } else if l != 0 {
      l = lps[l - 1] as usize;
    } else {
      lps[r] = 0;
      r += 1;
    }
  }

  lps
}

Normalizing sequences

Maybe this could be performed in less than O(n^2) time but for my purposes it is called very infrequently so I didn't worry too much about it.

Fig. - Implementation to search through the sequences

/// Returns the minimum permutation of a sequence.
///
/// Space: O(1)
/// Time: O(n^2)
pub fn get_min_permutation(sequence: &Sequence) -> Sequence {
  get_min_permutation_directional(sequence, Direction::Forward)
}

/// Returns the minimum permutation of a sequence with
/// the option to reverse the sequence.
///
/// Space: O(1)
/// Time: O(n^2)
fn get_min_permutation_directional(sequence: &Sequence, direction: Direction) -> Sequence {
  let length = get_length(sequence);
  let mut a = *sequence;

  for i in 0..length {
    let b = shift_left(sequence, i);

    if compare(&a, &b) == std::cmp::Ordering::Greater {
      a = b;
    }
  }

  if direction == Direction::Forward && !is_symmetrical(sequence) {
    let b = get_min_permutation_directional(&reverse(sequence), Direction::Backward);

    if compare(&a, &b) == std::cmp::Ordering::Greater {
      a = b;
    }
  }

  a
}

/// Shifts a sequence to the left and inserts the
/// last value at the beginning.
///
/// Space: O(1)
/// Time:  O(n)
fn shift_left(sequence: &Sequence, shift: usize) -> Sequence {
  let length = get_length(sequence);
  let mut shifted_sequence = Sequence::default();

  #[allow(clippy::needless_range_loop)]
  for i in 0..length {
    let j = (i + shift) % length;
    shifted_sequence[i] = sequence[j];
  }

  shifted_sequence
}

/// Reverses a sequence
///
/// Space: O(n)
/// Time:  O(n)
pub fn reverse(sequence: &Sequence) -> Sequence {
  let mut reversed_sequence = Sequence::default();
  let length = get_length(sequence);

  for i in 0..length {
    reversed_sequence[i] = sequence[length - 1 - i];
  }

  reversed_sequence
}

Let's take a look at some examples.

Wasm loading...

Summary

With that done, I can now continue with the rest of the other project.