Adjective Noun

Tag combinatorics

Pick and Choose (Part N)

2020-06-26 22:51, Tags: combinatorics

Hey, you remember this series, right? No? Oh that's probably because the last article was years ago. If you're interested, here's the previous articles...

Ahh, those heady days of 2018. I had big dreams. I had hoped my momentum for working on this module would eventually pick back up. I realised along the way that I'm sitting on some half-decent functions that no one can really use because I haven't published them. That changes today.

Today, I have published Math::Combinatorics to CPAN. It features the following functions

  • multicombinations (aka combinations with replacement)
  • variations (aka k-permutations of n)
  • partitions
  • derangements

Of those, only derangements is not written in NQP. My skill with NQP is that of a amateur, so I may not have written the most efficient code. However the implementations written in NQP should at least be noticeably faster than most pure-Raku functions implementing the same algorithms.

I held off on publishing this module for so long because I wanted to polish it, provide more functions, and implement faster .skip on things like permutations, where the n-th permutation in a sequence can be determined algorithmically.

In hindsight, it was silly not to publish sooner. It doesn't prevent me from working on those things, and encourages others to get involved too.

Sometimes in life, you have to pick and choose your battles.

Pick and Choose (Part 2)

2018-03-17 01:04, Tags: combinatorics

So I'm learning a bunch about combinatorics, improving my knowledge of Raku Iterators, and getting better at NQP. Of those 3 things, writing NQP is probably the least interesting one to write about, so I'm not sure I'll talk much about it, and I covered a little on Iterators already in part 1½.

Before deciding to write a fast, native, combinatorics module, I did a lot of playing around to find ways to do produce of these sequences as a simple (often one-liner) solution... even if they weren't terribly fast. For someone who never did more than high school maths, I ran up against ideas that might be obvious to the more mathematically inclined... but it's these realisations that were the most interesting to me, so that's mainly what I'm going to talk about.

The next combinatorial algorithm I wanted to implement was a type of permutation known as "k-permutations of n", or P(n, k). Although, looking up information on this algorithm, I learned that it's not strictly a permutation in the true sense of the word, so it has been known by some other names... such as "variations" or "n choose k".

Similar to a normal permutations, this algorithm is about the different ways you can arrange n items. The catch is, you can only arrange up to k of those items. I don't have a cool ice cream analogy to explain this one, but since order matters, the first one that comes to mind is a racing podium. In a given race of 8 racers, how many different possible ways could you order the participants at 1st, 2nd, and 3rd place? The answer is P(8, 3) == 336.

One of the reasons I wanted to tackle this algorithm next is because - like combinations_with_repetition - it's available in Python under the itertools module. It's generated when using the permutations function, with an additional argument (the k in P(n, k)).

>>> from itertools import permutations
>>> list(permutations([1, 2, 3, 4], 2))
[(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4),
 (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]

Again, like my previous multicombinations algorithm, I had originally worked out a one-liner that produces the same results, albeit, not in lexicographic order

> (1, 2, 3, 4).combinations(2).map(|*.permutations)
((1 2) (2 1) (1 3) (3 1) (1 4) (4 1)
 (2 3) (3 2) (2 4) (4 2) (3 4) (4 3))

If you can get over the odd order, this looks pretty efficient because I am not throwing away any produced elements. However, there is quite a lot of overhead associated with creating so many short lived permutations Iterators. Of course, if you just need a short solution with no dependencies, this will suffice... but we can do a little better.

I realised that with sequences of k elements, where k ≥ n - 1, I could just generate the permutations and trim the sub-lists.

> (^3).combinations(2).map(|*.permutations).sort
((0 1) (0 2) (1 0) (1 2) (2 0) (2 1))
> (^3).permutations».[^2]
((0 1) (0 2) (1 0) (1 2) (2 0) (2 1))

... and then I realised that when k == n - 2, I could generate the permutations, skip every second element, and trim the sub-lists

> (^4).combinations(2).map(|*.permutations).sort
((0 1) (0 2) (0 3) (1 0) (1 2) (1 3) (2 0) (2 1) (2 3) (3 0) (3 1) (3 2))
> (^4).permutations[0, 2 ... *]».[^2]
((0 1) (0 2) (0 3) (1 0) (1 2) (1 3) (2 0) (2 1) (2 3) (3 0) (3 1) (3 2))

Some of you are smarter than me and know where this is heading... but I had to run a few more examples until I realised that I could always generate permutations, and the number of elements to skip is (n - k)! ie. the factorial of n - k.

sub postfix:<!> ($n) { [×] 1 .. $n }  # Factorial function

sub variations(@n, $k) {
    @n.permutations[ 0, * + (@n - $k)! ... * ].map(*[^$k])
}

You might think that since I'm throwing away data, it's inefficient... and you'd be right! However, it's got less overhead than the first one-liner; it's only making one call to permutations, and it's doing a relatively simple index into some lists. It benches faster too, and there are some simple ways to speed this up a little if you sacrifice it's brevity... but if I talk about them now, I'll run out of paper... That's how this computer contraption works, right?

So now on to the meat. How do I efficiently generate this sequence algorithmically? This algorithm was a bit harder to track down, particularly because it's known by a few names, the main one of which is confusingly close to a different algorithm. I also couldn't find a RosettaCode task that talks about this specific sequence.

Eventually I found a blog post titled "A Simple, Efficient P(n,k) Algorithm" by some chap named Alistair Israel. The description matched what I was looking for, so I got to work translating it to Raku, and here it is.

sub variations(@n is copy, $k) {

    my @i = ^@n.elems;
    my $m = @n.end;
    my $e = $k - 1;

    gather loop {

        take @n[ @i[^$k] ];

        my $j = $k;
        $j++ while $j$m && @i[$e] ≥ @i[$j];

        if $j$m {
            swap( @i[$e], @i[$j] );
        }
        else {
            @i[$k .. $m] .= reverse;

            my $i = $e - 1;
            $i-- while $i0 && @i[$i] ≥ @i[$i + 1];

            last if $i < 0;

            $j = $m;
            $j-- while $j > $i && @i[$i] ≥ @i[$j];

            swap( @i[$i], @i[$j] );
            @i[$i + 1 .. $m] .= reverse;
        }
    }
}

Don't run it yet! There is no swap function in Raku. Of course, I can simply swap natively...

> my @x = 0..5
[0 1 2 3 4 5]
> (@x[2], @x[4]) = (@x[4], @x[2]); @x
[0 1 4 3 2 5]

... but I have to repeat myself too much, and since swap is a fairly common task in permutation algorithms, it's best to factor it out. You could do that with a plain ol' function

sub swap($a is rw, $b is rw) {
    ($a, $b) = $b, $a;
}

Or if you feeling extra adventurous, you could experiment with the current macro system.

use experimental :macros;

macro swap($a, $b) {
    quasi {
        ({{{$a}}}, {{{$b}}}) = {{{$b}}}, {{{$a}}}
    }
}

I say current to re-enforce that macros are an experimental feature they may change in future releases.

Whichever one you choose, this variations function - in pure Raku - benches faster than the previous one-line for shorter sequences like P(4, 2), but starts to lose on longer sequences like P(6, 5), but that should be rectified by converting this algorithm to NQP. Which it was... and is! I have pushed the NQP based variations to Github if you wanna give it a spin.

There still some improvements to make. It occurs to me that calculating the count-only each time is silly. I should calculate it once during the Iterator creation, and store it for later use.

Lastly... My guess is that if people are playing with combinatorics, they're probably playing with factorials as well... So the module now also has two additional exports: a factorial() function, and factorial-op which exports the postfix ! factorial operator. Factorials are interesting in themselves and I have been looking into faster implementations of that, so that will most likely make it's way into later commits.

Pick and Choose (Part 1½)

2018-03-17 01:00, Tags: combinatorics

Part 1½? What is this? Yes, I know I said I would tackle some permutations thingy, but I got side-tracked thinking about Iterators... so I wanted to make a quick post about those thoughts while they're still bubbling. If you don't care for such jibber jabber, you can head straight on over to Part 2 right now!

So what is an Iterator? Briefly... It is a Role that can be applied to a Class (typically a Sequence, but not necessarily) that adds additional behaviors to that Class. In my previous post, I demonstrated how the power of Iterators allowed me to quickly count the elements of a sequence without actually producing them. This is because Iterators can implement a count-only method that gets called when you do something like call .elems on it. By the way, I'm not going to go over Iterators in a lot of detail. For that, you should head over Zoffix Znet's blog Perl 6 Party, and check out... well, all the articles... but particularly the series on Sequences and Iterators: Seqs, Drugs, and Rock'n'Roll.

The only thing you need to tell an Iterator how to do is produce values, but there are other things you can tell an Iterator how to do. I've already mentioned count-only (which I should be able to implement for all combinatorial sequences), but the other special Iterator method that I'd like to see work it's way into my module is efficient skip-at-least implementations. You see, some combinatorial algorithms have efficient means to generate an element in the middle of the sequence. This ability could then be incorporated into the Iterator to allow skipping of elements in (possibly very large) sequences.

While talking about this in the Reddit comments, I threw together a quick gist to demonstrate an example of a permutations function that is capable of producing enormous sequences (that you'd never be able to iterate through in a lifetime) which allows huge numbers of sequences to be skipped. Here's a brief preview, but check out the gist for the full code.

my @l = 'A'..'Z';

say permute(@l).elems;
# OUTPUT: 403291461126605635584000000

say permute(@l).skip(268860974084403757046816342)[0];
# OUTPUT: [R I J Z Y X W V U T S Q P O N K E H B L M D F C A G]

say "Completed in { now - INIT now } seconds";
# OUTPUT: Completed in 0.0531508 seconds

Now, I obviously have a preference for algorithms that are fast, and produce results lexicographically. Typically these algorithms are iterative in nature, in that each element in the sequence is generated by modifying the previous element... which is usually what makes them so fast. I don't want to sacrifice too much speed to get efficient skipping, but it's something I've been thinking about. For now, though, I think I just need to focus on getting working implementations of as many combinatorial algorithms as I can. This is the part where I exclaim pull requests are welcome!

Anyways, in those same Reddit comments, I also mentioned a few issues with the built in permutations routine. One that always kinda irked me was that the permutations() function, and the .permutations method produce different results. The method accepts a List-y argument, produces a sequence of permutations of that List. The subroutine accepts an Int (or coerces it's argument to an Int) - let's call it n - and produces a sequence of permutations of the list 0 up to n.

> (<A B C>).permutations
((A B C) (A C B) (B A C) (B C A) (C A B) (C B A))
> permutations(<A B C>)
((0 1 2) (0 2 1) (1 0 2) (1 2 0) (2 0 1) (2 1 0))

I think it's a little silly, but I've made peace with it. I suspect the function was made that way to allow maths lovers to get the permutations of n just by typing permutations(n). That got me thinking about different things that could happen when giving my functions an Int instead of a list.

As you'd expect, when given a list, combinatorial functions in this module will produce the combinatorial sequence of that given list. What if, however, when called with an Int in place of the list, it would instead provide the number of elements in that sequence! Here's some imaginary - but totally do-able - example code

> permutations(3)
6
> permutations([0, 1, 2])
((0 1 2) (0 2 1) (1 0 2) (1 2 0) (2 0 1) (2 1 0))

Is that a stupid idea? Let me know... If you shame me enough you might convince me to abandon the idea altogether... but it's seems fairly sane to me.

Lastly, in the same Reddit comments mentioned above, Zoffix linked to a handy helper function that is used in the Rakudo core for testing iterators to ensure that the important Iterator methods are working as expected. I'll certainly be adding that into my test files.

Now on to the real Part 2.

Pick and Choose (Part 1)

2018-03-05 11:30, Tags: combinatorics

My recent obsession has been around combinatorics. For those of you who may be unfamiliar, combinatorics is a branch of mathematics closely related to graph theory. If I had to explain it in a short sentence, I'd probably say it's about the different ways in which a set of elements can be enumerated or constructed. That's a gross generalisation, but it will do for now.

There are a whole host of combinatoric algorithms, and Raku has 2 of them in the core language: permutations and combinations. There's good reason why just those 2... They are among the most common, and most useful, but that's not to say the other's aren't useful, and when I found myself needing one of those other algorithms, it led me on my aforementioned obsession.

The first one I want to talk about is "combinations with repetitions". This algorithm could be described as... At a given ice cream shop, how many different ways can I order 2 scoops. Order of choices doesn't matter, so 'Vanilla and Chocolate' is the same as 'Chocolate and Vanilla'

As a general rule, when order doesn't matter, you're talking combinations. When order matters, you're talking permutations

Now, there exists a way to do this in Raku on RosettaCode, but I want to state that I did come up with a solution by myself first based on a something I read in the Python documentation, and it also helped me later realise that - upon seeing it - the RosettaCode snippet was incorrect.

So back to Python for a minute... It has a combinations_with_replacement function in the itertools core module. Lets see what it looks like.

>>> from itertools import *
>>> list(combinations_with_replacement('ABCD', 2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'),
 ('B', 'C'), ('B', 'D'), ('C', 'C'), ('C', 'D'), ('D', 'D')]

In the itertools documentation for this function, it mentions that the result can be "expressed as a subsequence of product() after filtering entries where the elements are not in sorted order". In Raku, using the cross (Cartesian product) meta-operator ([X]), I came up with this nifty one-liner.

> sub cwr(@l, $k) { ([X] ^@l xx $k).unique(:as(~*.sort)).map({ @l[|$_] }) }
> cwr(<A B C D>, 2)
((A A) (A B) (A C) (A D) (B B) (B C) (B D) (C C) (C D) (D D))

I started by creating $k copies of my list indices, then create a Cartesian product of those lists, keeping unique ones (based on the stringified sorted order). I then use those indices to get the elements from the original list.

For the couple of benchmarks I ran (on admittedly small datasets), doing .unique(:as(~*.sort)) was slightly faster than doing something like .grep({ [≤] $_ }). In a pinch, this little snippet will do the trick, but it's also quite clear that I'm generating a bunch of data that I just throw away, so it can never be truly efficient.

Now take a look at the Raku snippet on RosettaCode for comparison. At the time of writing, it looked like this.

[X](@S xx $k).unique(as => *.sort.cache, with => &[eqv])

It certainly looks similar enough, and initially when I tried it out it seemed to work... However I quickly realised it had a flaw.

> sub ros(@S, $k) { [X](@S xx $k).unique(as => *.sort.cache, with => &[eqv]) }
> ros([0,1,2,3], 2)
((0 0) (0 1) (0 2) (0 3) (1 1) (1 2) (1 3) (2 2) (2 3) (3 3))
> ros([1,1,1,1], 2)
((1 1))
> cwr([1,1,1,1], 2)
((1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1))

And here's Python just for good measure

>>> list(combinations_with_replacement([1,1,1,1], 2))
[(1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]

Now I suppose you could argue that it's a combination, so order doesn't matter, but to push my ice cream analogy... Say your ice cream shop only has one flavour, but it has four buckets of that flavour. This algorithm is concerned with the different ways you can take two scoops in terms of which buckets you scoop from, so this RosettaCode snippet is slightly broken.

There's also a recursive version on RosettaCode, which I've included below.

proto combs_with_rep(UInt, @) {*}
multi combs_with_rep(0,  @) { () }
multi combs_with_rep(1,  @a) { map { $_, }, @a }
multi combs_with_rep($,  []) { () }
multi combs_with_rep($n, [$head, *@tail]) {
    |combs_with_rep($n - 1, ($head, |@tail)).map({ $head, |@_ }),
    |combs_with_rep($n, @tail);
}

say combs_with_rep(2, [1, 1, 1, 1]);

# OUTPUT: ((1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1))

Apart from the minor difference of taking the list as the second argument, this function performs correctly, but it's slower than my one-liner (at least in the few benchmarks I ran).

I committed to finding a faster and more efficient algorithm. Most of the other snippets on RosettaCode were recursive functions. I knew that iterative code was generally more performant than recursive, so I kept looking for a iterative solution. I noticed the C++ version, and converted it to Raku. It was faster, but eventually I came upon another algorithm which - when converted to RAku - benched even faster.

I'm sure those of you of the more Computer Science persuasion could have told me where to look, but several sites referenced Donald Knuth's The Art of Computer Programming books. Specifically, "Fascicle 2: Generating All Tuples and Permutations" and "Fascicle 3: Generating All Combinations and Partitions". I had a look and it seems the books don't straight-up give you some code, but rather more-or-less describe an algorithm. I suspect most the algorithms in use for this sequence are interpretations of the algorithm described.

So far, the fastest algorithm I found (as far as pure Raku benchmarks are concerned) is the following

sub cwr(@list, int $k) {
    gather {
        my @idx = 0 xx $k;
        take @list[@idx];
        my int $e = @list.end;
        loop {
            if @idx[$k - 1] < $e {
                @idx[$k - 1]++;
            }
            else {
                loop (my int $j = $k - 2; $j0; $j--) {
                    last if @idx[$j] != $e;
                }
                last if $j < 0;
                @idx[$j]++;
                loop ($j += 1; $j < $k; $j++) {
                    @idx[$j] = @idx[$j - 1];
                }
            }
            take @list[@idx];
        }
    }
}

This algorithm does not take into account what should happen when $k ≤ 0 or @list is empty, but those can be added fairly trivially. Upon gazing at this code, your first thought might be "Egads man! Why are you using c-style loops", and the reason should be obvious. I benched it and it was faster than using a Range.

UPDATE - Jun-2020: This may no longer be the case, as the extraordinary lizmat has made several optimisations to Ranges in Raku since this article was published

So far, this is the fastest algorithm I benched in pure Raku, but can it go faster? It can if we move beyond pure Raku, and into the world of NQP. NQP is the sub-language that forms the building blocks of the Raku language. It's more difficult to write, but you'll find that most expensive operations in the Raku core are written in NQP (including the existing permutations and combinations built-ins).

Writing these algorithms in NQP was a challenge for me. I hadn't written NQP before, so I mainly copied what I'd seen in the Rakudo code base, and referred to the list of NQP Opcodes page when necessary. The reward for my efforts was functions that ran much faster. I converted the few different algorithms I found to NQP, but the the above one was also (marginally) the fastest in NQP.

This post is already quite long enough, so I don't want to dump a whole page of NQP code here, but while my mind still has a hankering for combinatorics, I figure I might tackle a few more algorithms and make a module out of it. I'm gonna keep it off the ecosystem until it's a bit more fleshed out, but if you're interested in combinatorics, and/or a deft hand with NQP, pull requests are welcome.

Lastly, I would remiss to mention that Perl has a Algorithm::Combinatorics module, which has just about any combinatoric algorithm you could need written in fast XS, and it can be used just fine in Raku via Inline::Perl5.

> use Algorithm::Combinatorics:from<Perl5> 'combinations_with_repetition'
> combinations_with_repetition(<A B C D>, 2)
[[A A] [A B] [A C] [A D] [B B] [B C] [B D] [C C] [C D] [D D]]

Once imported, it's combinations_with_repetition function is at least twice as fast as my NQP algorithm. Which is to say, if you have a C compiler installed, and have Perl built with the right flags to support Inline::Perl5, you can install that module and use it today.

For the rest of you who need/want a fast native combinatorics library, I hope to implement as many of those algorithms as I can in NQP to make a Raku equivalent of Algorithm::Combinatorics. NQP still won't top C for performance, but Raku will allow very nice functionality, such as lazy evaluation, and Iterator shortcuts like count-only (which I've already implemented).

use Math::Combinatorics 'multicombinations';
use Algorithm::Combinatorics:from<Perl5> 'combinations_with_repetition';

sub time-it($desc, &func) {
    say "$desc: {func()} (%s seconds)".sprintf: now - ENTER now;
}

time-it 'Raku', { multicombinations(^16, 10).elems }
time-it 'Perl', { combinations_with_repetition(^16, 10).elems }

#`[ OUTPUT:
Raku: 3268760 (0.0043160 seconds)
Perl: 3268760 (5.1210621 seconds)
]

For algorithms that can find the "nth" iteration, then the skip methods can also be implemented for fast indexing into the sequence.

I'm not sure about some of the names, though. For example, combinations-with-replacement is quite a mouthful. I've seen it referred to as multicombinations in some circles - so that's what I'm using - but I'm not entirely sure if it means the same thing. If you're familiar with combinatorics, let me know if that name makes sense.

I've purposely labeled this article "Part 1" to force gently remind myself to keep working on this stuff. I'll probably be tackling some permutation of the permutations algorithm next.

To be continued...