Adjective Noun

Pick and Choose (Part 2)

2018-03-17

So I'm learning a bunch about combinatorics, improving my knowledge of Raku Iterators, and getting better at NQP. Of those 3 things, writing NQP is probably the least interesting one to write about, so I'm not sure I'll talk much about it, and I covered a little on Iterators already in part 1½.

Before deciding to write a fast, native, combinatorics module, I did a lot of playing around to find ways to do produce of these sequences as a simple (often one-liner) solution... even if they weren't terribly fast. For someone who never did more than high school maths, I ran up against ideas that might be obvious to the more mathematically inclined... but it's these realisations that were the most interesting to me, so that's mainly what I'm going to talk about.

The next combinatorial algorithm I wanted to implement was a type of permutation known as "k-permutations of n", or P(n, k). Although, looking up information on this algorithm, I learned that it's not strictly a permutation in the true sense of the word, so it has been known by some other names... such as "variations" or "n choose k".

Similar to a normal permutations, this algorithm is about the different ways you can arrange n items. The catch is, you can only arrange up to k of those items. I don't have a cool ice cream analogy to explain this one, but since order matters, the first one that comes to mind is a racing podium. In a given race of 8 racers, how many different possible ways could you order the participants at 1st, 2nd, and 3rd place? The answer is P(8, 3) == 336.

One of the reasons I wanted to tackle this algorithm next is because - like combinations_with_repetition - it's available in Python under the itertools module. It's generated when using the permutations function, with an additional argument (the k in P(n, k)).

>>> from itertools import permutations
>>> list(permutations([1, 2, 3, 4], 2))
[(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4),
 (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]

Again, like my previous multicombinations algorithm, I had originally worked out a one-liner that produces the same results, albeit, not in lexicographic order

> (1, 2, 3, 4).combinations(2).map(|*.permutations)
((1 2) (2 1) (1 3) (3 1) (1 4) (4 1)
 (2 3) (3 2) (2 4) (4 2) (3 4) (4 3))

If you can get over the odd order, this looks pretty efficient because I am not throwing away any produced elements. However, there is quite a lot of overhead associated with creating so many short lived permutations Iterators. Of course, if you just need a short solution with no dependencies, this will suffice... but we can do a little better.

I realised that with sequences of k elements, where k ≥ n - 1, I could just generate the permutations and trim the sub-lists.

> (^3).combinations(2).map(|*.permutations).sort
((0 1) (0 2) (1 0) (1 2) (2 0) (2 1))
> (^3).permutations».[^2]
((0 1) (0 2) (1 0) (1 2) (2 0) (2 1))

... and then I realised that when k == n - 2, I could generate the permutations, skip every second element, and trim the sub-lists

> (^4).combinations(2).map(|*.permutations).sort
((0 1) (0 2) (0 3) (1 0) (1 2) (1 3) (2 0) (2 1) (2 3) (3 0) (3 1) (3 2))
> (^4).permutations[0, 2 ... *]».[^2]
((0 1) (0 2) (0 3) (1 0) (1 2) (1 3) (2 0) (2 1) (2 3) (3 0) (3 1) (3 2))

Some of you are smarter than me and know where this is heading... but I had to run a few more examples until I realised that I could always generate permutations, and the number of elements to skip is (n - k)! ie. the factorial of n - k.

sub postfix:<!> ($n) { [×] 1 .. $n }  # Factorial function

sub variations(@n, $k) {
    @n.permutations[ 0, * + (@n - $k)! ... * ].map(*[^$k])
}

You might think that since I'm throwing away data, it's inefficient... and you'd be right! However, it's got less overhead than the first one-liner; it's only making one call to permutations, and it's doing a relatively simple index into some lists. It benches faster too, and there are some simple ways to speed this up a little if you sacrifice it's brevity... but if I talk about them now, I'll run out of paper... That's how this computer contraption works, right?

So now on to the meat. How do I efficiently generate this sequence algorithmically? This algorithm was a bit harder to track down, particularly because it's known by a few names, the main one of which is confusingly close to a different algorithm. I also couldn't find a RosettaCode task that talks about this specific sequence.

Eventually I found a blog post titled "A Simple, Efficient P(n,k) Algorithm" by some chap named Alistair Israel. The description matched what I was looking for, so I got to work translating it to Raku, and here it is.

sub variations(@n is copy, $k) {

    my @i = ^@n.elems;
    my $m = @n.end;
    my $e = $k - 1;

    gather loop {

        take @n[ @i[^$k] ];

        my $j = $k;
        $j++ while $j$m && @i[$e] ≥ @i[$j];

        if $j$m {
            swap( @i[$e], @i[$j] );
        }
        else {
            @i[$k .. $m] .= reverse;

            my $i = $e - 1;
            $i-- while $i0 && @i[$i] ≥ @i[$i + 1];

            last if $i < 0;

            $j = $m;
            $j-- while $j > $i && @i[$i] ≥ @i[$j];

            swap( @i[$i], @i[$j] );
            @i[$i + 1 .. $m] .= reverse;
        }
    }
}

Don't run it yet! There is no swap function in Raku. Of course, I can simply swap natively...

> my @x = 0..5
[0 1 2 3 4 5]
> (@x[2], @x[4]) = (@x[4], @x[2]); @x
[0 1 4 3 2 5]

... but I have to repeat myself too much, and since swap is a fairly common task in permutation algorithms, it's best to factor it out. You could do that with a plain ol' function

sub swap($a is rw, $b is rw) {
    ($a, $b) = $b, $a;
}

Or if you feeling extra adventurous, you could experiment with the current macro system.

use experimental :macros;

macro swap($a, $b) {
    quasi {
        ({{{$a}}}, {{{$b}}}) = {{{$b}}}, {{{$a}}}
    }
}

I say current to re-enforce that macros are an experimental feature they may change in future releases.

Whichever one you choose, this variations function - in pure Raku - benches faster than the previous one-line for shorter sequences like P(4, 2), but starts to lose on longer sequences like P(6, 5), but that should be rectified by converting this algorithm to NQP. Which it was... and is! I have pushed the NQP based variations to Github if you wanna give it a spin.

There still some improvements to make. It occurs to me that calculating the count-only each time is silly. I should calculate it once during the Iterator creation, and store it for later use.

Lastly... My guess is that if people are playing with combinatorics, they're probably playing with factorials as well... So the module now also has two additional exports: a factorial() function, and factorial-op which exports the postfix ! factorial operator. Factorials are interesting in themselves and I have been looking into faster implementations of that, so that will most likely make it's way into later commits.