Adjective Noun

Page 2

Pick and Choose (Part 1½)

2018-03-17 01:00, Tags: combinatorics

Part 1½? What is this? Yes, I know I said I would tackle some permutations thingy, but I got side-tracked thinking about Iterators... so I wanted to make a quick post about those thoughts while they're still bubbling. If you don't care for such jibber jabber, you can head straight on over to Part 2 right now!

So what is an Iterator? Briefly... It is a Role that can be applied to a Class (typically a Sequence, but not necessarily) that adds additional behaviors to that Class. In my previous post, I demonstrated how the power of Iterators allowed me to quickly count the elements of a sequence without actually producing them. This is because Iterators can implement a count-only method that gets called when you do something like call .elems on it. By the way, I'm not going to go over Iterators in a lot of detail. For that, you should head over Zoffix Znet's blog Perl 6 Party, and check out... well, all the articles... but particularly the series on Sequences and Iterators: Seqs, Drugs, and Rock'n'Roll.

The only thing you need to tell an Iterator how to do is produce values, but there are other things you can tell an Iterator how to do. I've already mentioned count-only (which I should be able to implement for all combinatorial sequences), but the other special Iterator method that I'd like to see work it's way into my module is efficient skip-at-least implementations. You see, some combinatorial algorithms have efficient means to generate an element in the middle of the sequence. This ability could then be incorporated into the Iterator to allow skipping of elements in (possibly very large) sequences.

While talking about this in the Reddit comments, I threw together a quick gist to demonstrate an example of a permutations function that is capable of producing enormous sequences (that you'd never be able to iterate through in a lifetime) which allows huge numbers of sequences to be skipped. Here's a brief preview, but check out the gist for the full code.

my @l = 'A'..'Z';

say permute(@l).elems;
# OUTPUT: 403291461126605635584000000

say permute(@l).skip(268860974084403757046816342)[0];
# OUTPUT: [R I J Z Y X W V U T S Q P O N K E H B L M D F C A G]

say "Completed in { now - INIT now } seconds";
# OUTPUT: Completed in 0.0531508 seconds

Now, I obviously have a preference for algorithms that are fast, and produce results lexicographically. Typically these algorithms are iterative in nature, in that each element in the sequence is generated by modifying the previous element... which is usually what makes them so fast. I don't want to sacrifice too much speed to get efficient skipping, but it's something I've been thinking about. For now, though, I think I just need to focus on getting working implementations of as many combinatorial algorithms as I can. This is the part where I exclaim pull requests are welcome!

Anyways, in those same Reddit comments, I also mentioned a few issues with the built in permutations routine. One that always kinda irked me was that the permutations() function, and the .permutations method produce different results. The method accepts a List-y argument, produces a sequence of permutations of that List. The subroutine accepts an Int (or coerces it's argument to an Int) - let's call it n - and produces a sequence of permutations of the list 0 up to n.

> (<A B C>).permutations
((A B C) (A C B) (B A C) (B C A) (C A B) (C B A))
> permutations(<A B C>)
((0 1 2) (0 2 1) (1 0 2) (1 2 0) (2 0 1) (2 1 0))

I think it's a little silly, but I've made peace with it. I suspect the function was made that way to allow maths lovers to get the permutations of n just by typing permutations(n). That got me thinking about different things that could happen when giving my functions an Int instead of a list.

As you'd expect, when given a list, combinatorial functions in this module will produce the combinatorial sequence of that given list. What if, however, when called with an Int in place of the list, it would instead provide the number of elements in that sequence! Here's some imaginary - but totally do-able - example code

> permutations(3)
6
> permutations([0, 1, 2])
((0 1 2) (0 2 1) (1 0 2) (1 2 0) (2 0 1) (2 1 0))

Is that a stupid idea? Let me know... If you shame me enough you might convince me to abandon the idea altogether... but it's seems fairly sane to me.

Lastly, in the same Reddit comments mentioned above, Zoffix linked to a handy helper function that is used in the Rakudo core for testing iterators to ensure that the important Iterator methods are working as expected. I'll certainly be adding that into my test files.

Now on to the real Part 2.

Pick and Choose (Part 1)

2018-03-05 11:30, Tags: combinatorics

My recent obsession has been around combinatorics. For those of you who may be unfamiliar, combinatorics is a branch of mathematics closely related to graph theory. If I had to explain it in a short sentence, I'd probably say it's about the different ways in which a set of elements can be enumerated or constructed. That's a gross generalisation, but it will do for now.

There are a whole host of combinatoric algorithms, and Raku has 2 of them in the core language: permutations and combinations. There's good reason why just those 2... They are among the most common, and most useful, but that's not to say the other's aren't useful, and when I found myself needing one of those other algorithms, it led me on my aforementioned obsession.

The first one I want to talk about is "combinations with repetitions". This algorithm could be described as... At a given ice cream shop, how many different ways can I order 2 scoops. Order of choices doesn't matter, so 'Vanilla and Chocolate' is the same as 'Chocolate and Vanilla'

As a general rule, when order doesn't matter, you're talking combinations. When order matters, you're talking permutations

Now, there exists a way to do this in Raku on RosettaCode, but I want to state that I did come up with a solution by myself first based on a something I read in the Python documentation, and it also helped me later realise that - upon seeing it - the RosettaCode snippet was incorrect.

So back to Python for a minute... It has a combinations_with_replacement function in the itertools core module. Lets see what it looks like.

>>> from itertools import *
>>> list(combinations_with_replacement('ABCD', 2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'),
 ('B', 'C'), ('B', 'D'), ('C', 'C'), ('C', 'D'), ('D', 'D')]

In the itertools documentation for this function, it mentions that the result can be "expressed as a subsequence of product() after filtering entries where the elements are not in sorted order". In Raku, using the cross (Cartesian product) meta-operator ([X]), I came up with this nifty one-liner.

> sub cwr(@l, $k) { ([X] ^@l xx $k).unique(:as(~*.sort)).map({ @l[|$_] }) }
> cwr(<A B C D>, 2)
((A A) (A B) (A C) (A D) (B B) (B C) (B D) (C C) (C D) (D D))

I started by creating $k copies of my list indices, then create a Cartesian product of those lists, keeping unique ones (based on the stringified sorted order). I then use those indices to get the elements from the original list.

For the couple of benchmarks I ran (on admittedly small datasets), doing .unique(:as(~*.sort)) was slightly faster than doing something like .grep({ [≤] $_ }). In a pinch, this little snippet will do the trick, but it's also quite clear that I'm generating a bunch of data that I just throw away, so it can never be truly efficient.

Now take a look at the Raku snippet on RosettaCode for comparison. At the time of writing, it looked like this.

[X](@S xx $k).unique(as => *.sort.cache, with => &[eqv])

It certainly looks similar enough, and initially when I tried it out it seemed to work... However I quickly realised it had a flaw.

> sub ros(@S, $k) { [X](@S xx $k).unique(as => *.sort.cache, with => &[eqv]) }
> ros([0,1,2,3], 2)
((0 0) (0 1) (0 2) (0 3) (1 1) (1 2) (1 3) (2 2) (2 3) (3 3))
> ros([1,1,1,1], 2)
((1 1))
> cwr([1,1,1,1], 2)
((1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1))

And here's Python just for good measure

>>> list(combinations_with_replacement([1,1,1,1], 2))
[(1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]

Now I suppose you could argue that it's a combination, so order doesn't matter, but to push my ice cream analogy... Say your ice cream shop only has one flavour, but it has four buckets of that flavour. This algorithm is concerned with the different ways you can take two scoops in terms of which buckets you scoop from, so this RosettaCode snippet is slightly broken.

There's also a recursive version on RosettaCode, which I've included below.

proto combs_with_rep(UInt, @) {*}
multi combs_with_rep(0,  @) { () }
multi combs_with_rep(1,  @a) { map { $_, }, @a }
multi combs_with_rep($,  []) { () }
multi combs_with_rep($n, [$head, *@tail]) {
    |combs_with_rep($n - 1, ($head, |@tail)).map({ $head, |@_ }),
    |combs_with_rep($n, @tail);
}

say combs_with_rep(2, [1, 1, 1, 1]);

# OUTPUT: ((1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1) (1 1))

Apart from the minor difference of taking the list as the second argument, this function performs correctly, but it's slower than my one-liner (at least in the few benchmarks I ran).

I committed to finding a faster and more efficient algorithm. Most of the other snippets on RosettaCode were recursive functions. I knew that iterative code was generally more performant than recursive, so I kept looking for a iterative solution. I noticed the C++ version, and converted it to Raku. It was faster, but eventually I came upon another algorithm which - when converted to RAku - benched even faster.

I'm sure those of you of the more Computer Science persuasion could have told me where to look, but several sites referenced Donald Knuth's The Art of Computer Programming books. Specifically, "Fascicle 2: Generating All Tuples and Permutations" and "Fascicle 3: Generating All Combinations and Partitions". I had a look and it seems the books don't straight-up give you some code, but rather more-or-less describe an algorithm. I suspect most the algorithms in use for this sequence are interpretations of the algorithm described.

So far, the fastest algorithm I found (as far as pure Raku benchmarks are concerned) is the following

sub cwr(@list, int $k) {
    gather {
        my @idx = 0 xx $k;
        take @list[@idx];
        my int $e = @list.end;
        loop {
            if @idx[$k - 1] < $e {
                @idx[$k - 1]++;
            }
            else {
                loop (my int $j = $k - 2; $j0; $j--) {
                    last if @idx[$j] != $e;
                }
                last if $j < 0;
                @idx[$j]++;
                loop ($j += 1; $j < $k; $j++) {
                    @idx[$j] = @idx[$j - 1];
                }
            }
            take @list[@idx];
        }
    }
}

This algorithm does not take into account what should happen when $k ≤ 0 or @list is empty, but those can be added fairly trivially. Upon gazing at this code, your first thought might be "Egads man! Why are you using c-style loops", and the reason should be obvious. I benched it and it was faster than using a Range.

UPDATE - Jun-2020: This may no longer be the case, as the extraordinary lizmat has made several optimisations to Ranges in Raku since this article was published

So far, this is the fastest algorithm I benched in pure Raku, but can it go faster? It can if we move beyond pure Raku, and into the world of NQP. NQP is the sub-language that forms the building blocks of the Raku language. It's more difficult to write, but you'll find that most expensive operations in the Raku core are written in NQP (including the existing permutations and combinations built-ins).

Writing these algorithms in NQP was a challenge for me. I hadn't written NQP before, so I mainly copied what I'd seen in the Rakudo code base, and referred to the list of NQP Opcodes page when necessary. The reward for my efforts was functions that ran much faster. I converted the few different algorithms I found to NQP, but the the above one was also (marginally) the fastest in NQP.

This post is already quite long enough, so I don't want to dump a whole page of NQP code here, but while my mind still has a hankering for combinatorics, I figure I might tackle a few more algorithms and make a module out of it. I'm gonna keep it off the ecosystem until it's a bit more fleshed out, but if you're interested in combinatorics, and/or a deft hand with NQP, pull requests are welcome.

Lastly, I would remiss to mention that Perl has a Algorithm::Combinatorics module, which has just about any combinatoric algorithm you could need written in fast XS, and it can be used just fine in Raku via Inline::Perl5.

> use Algorithm::Combinatorics:from<Perl5> 'combinations_with_repetition'
> combinations_with_repetition(<A B C D>, 2)
[[A A] [A B] [A C] [A D] [B B] [B C] [B D] [C C] [C D] [D D]]

Once imported, it's combinations_with_repetition function is at least twice as fast as my NQP algorithm. Which is to say, if you have a C compiler installed, and have Perl built with the right flags to support Inline::Perl5, you can install that module and use it today.

For the rest of you who need/want a fast native combinatorics library, I hope to implement as many of those algorithms as I can in NQP to make a Raku equivalent of Algorithm::Combinatorics. NQP still won't top C for performance, but Raku will allow very nice functionality, such as lazy evaluation, and Iterator shortcuts like count-only (which I've already implemented).

use Math::Combinatorics 'multicombinations';
use Algorithm::Combinatorics:from<Perl5> 'combinations_with_repetition';

sub time-it($desc, &func) {
    say "$desc: {func()} (%s seconds)".sprintf: now - ENTER now;
}

time-it 'Raku', { multicombinations(^16, 10).elems }
time-it 'Perl', { combinations_with_repetition(^16, 10).elems }

#`[ OUTPUT:
Raku: 3268760 (0.0043160 seconds)
Perl: 3268760 (5.1210621 seconds)
]

For algorithms that can find the "nth" iteration, then the skip methods can also be implemented for fast indexing into the sequence.

I'm not sure about some of the names, though. For example, combinations-with-replacement is quite a mouthful. I've seen it referred to as multicombinations in some circles - so that's what I'm using - but I'm not entirely sure if it means the same thing. If you're familiar with combinatorics, let me know if that name makes sense.

I've purposely labeled this article "Part 1" to force gently remind myself to keep working on this stuff. I'll probably be tackling some permutation of the permutations algorithm next.

To be continued...

Everyone Loves Porgs

2018-02-17 08:35, Tags: raku roles

It's been a while. I have several post ideas in various stages of completion, and it's hard to prioritise that over life sometimes... So I figure I need to start posting shorter ideas and things I've been playing with, lest this turn into one of those blogs that never updates. So here we go.

Classes are really easy to define in Raku. They're so easy that I find myself using them to encapsulate small Hash-like things, where I also want maybe one or two methods

class Contact {
    has $.name;
    has $.phone;
    has $.bday;
    method age {
        (Date.new($.bday), *.later(:1year) ...^ * > Date.today).end
    }
}

Yes, that's an inefficient way to calculate age... Like a lot of things in life, that method gets slower the older you are.

Anyways, now I have defined a simple little class for holding some data together, but to actually instantiate one I have to bust out some named arguments.

my @contacts;
@contacts.push: Contact.new(:name<John>, :phone<555-1111>, :bday<1940-10-09>);

Who's got time for all those characters? Sometimes I just want to build them with positional args, but that means writing a custom multi method new to handle those cases... but I'm just throwing together a quick & dirty class, is it really worth my time to build a custom constructor?

So I started playing around, and created a Role which lets me build my class with Positional arguments... or an Array.. or List... and hey, I threw in a Hash for free!

@contacts.push: Contact.new('James', '555-1112', '1942-06-18');

@contacts.push: Contact.new(< George 555-1113 1943-02-25 >)

my %hash = name => 'Richard', phone => '555-1114', bday => '1940-07-07';
@contacts.push: Contact.new(%hash);

I used the introspection method .^attributes to get a list of attributes. I'm only interested in local attributes (not inherited ones), though you certainly could change that, or even control it via a Parametrized Role. I'm also only interested in attributes that have an accessor (ie. public attributes).

role Porgs {
    multi method new(*@args where *.elems) {
        self.bless: |%(
            self.^attributes(:local)
                .grep(*.has_accessor)
                .map(*.name.substr: 2)
            Z=> @args)
    }
    multi method new(List $args) {
        self.new: |$args
    }
    multi method new(%args) {
        self.bless: |%args
    }
}

class Contact does Porgs { ... }

I called the role Porgs, which is a contraction of "Positional Args", but also shares the name of a creature from Star Wars. The Porgs role allows you to write classes which are small and cute, much like the creature. Also, everyone loves Porgs.

So that's all for today. I 'm not planning on publishing this to the ecosystem or anything, so feel free to steel this idea, improve upon it, rename it and publish it yourself to the ecosystem if you so desire. Also, I'm not sure if the fact that self.^attributes returns the attributes in the order you declare them is an is an implementation detail... so perhaps that might change?

It's A Wrap

2017-11-27 13:43, Tags: raku perl python functional

In my last post, I briefly touched on the concept of wrapping functions. I also learned that they are similar to decorators in Python. Apart from one time I used the @property decorator in a Python class to make some attributes read-only, I didn't really know what they were. I just figured it was some weird Python syntax. I've since learned a little be more and played around with them in Python, Perl, and Raku.

A decorator is a function that takes another function as it's argument, and typically does something "around" that function, which is why it's also referred to "wrapping" a function. A decorator can't change what the wrapped function does internally, but it can can run code before or after calling that function, or not call it at all.

I may use the words 'wrapper' and 'decorator' interchangeably, by which I mean 'a function that wraps another function'

There are some quintessential applications for decorators; the main ones being caching, logging, and timing of functions. As a reference point, here is a timing decorator in Python 3.6.

import time

def timed(func):
    name = func.__name__
    def wrapped(*args):
        start = time.time()
        res = func(*args)
        print(f"Run time for function '{name}' was {time.time() - start:f}")
        return res
    return wrapped

@timed
def costly(n):
    time.sleep(n);
    return 'Have a string'

x = costly(3)
# OUTPUT: Run time for function 'costly' was 3.02231

print(x)
# OUTPUT: Have a string

In the above example, I grab the name of the function, then create the wrapper function. My wrapper kicks off a timer, then runs the original (decorated) function and assigns the result to a variable res. I then stop the time, print out the stats then return the result.

So without further ado, or much explanation, here's a Raku sub trait that achieves the same result.

multi sub trait_mod:<is>(Routine $func, :$timed) {
    $func.wrap({
        my $start = now;
        my $res = callsame;
        note "Run time for function '{$func.name}' was {now - $start}";
        $res;
    })
}

sub costly($n) is timed {
    sleep($n);
    return 'Have a string';
}

my $x = costly(3);
# OUTPUT: Run time for function 'costly' was 3.0030732

say $x;
# OUTPUT: Have a string

Most of this should be fairly obvious, except maybe callsame, which I covered in my last post... but if you need a refresher, it tells the dispatcher to call the same function that was just called. Also, note the note function, which is exactly like say except that it outputs to STDERR.

Traits wrap a function at (some time around) compile time, but sometimes you might want to wrap a function at runtime, or rather... You might want to decide whether you want to wrap a function at runtime; which functions you want wrapped with what; and when.

Take debugging for example. It would be trivial to create a trait that reports to STDERR when a function has been called, and with what arguments... but adding and removing a trait everytime you want to debug - especially on multiple functions - can get a little unwieldy.

Typically when you debug with print statements (we all do it!) you might manage your programs DEBUG mode via a global variable. At runtime you can inspect the variable and wrap your desired functions accordingly.

constant DEBUG = True;

sub foo($n) {
    return $n × $n;
}

&foo.wrap(&debug) if DEBUG;

my $x = foo(42);

sub debug(|c) {
    my &func = nextcallee;
    my $res = func(|c);
    note "Calling '{&func.name}' with args {c.perl} returned: {$res.perl}";
    $res;
}

# STDERR: Calling 'foo' with args \(42) returned: 1764

The .wrap() method actually returns something called a WrapHandle, which is handy if you want to be able to unwrap your function at any point. It also means you can decide which wrappers get removed.

Perhaps you have a logging wrapper, something that performs a similar role as the debug wrapper, but instead punts the information to your logger of choice, or maybe just a text file. You want to disable the debugger at some point, but keep logging.

my $wh-logger = &foo.wrap(&logger);

my $wh-debug = &foo.wrap(&debug) if DEBUG;

my $x = foo(42);

# Success threshold, debugging is no longer required
&foo.unwrap($wh-debug) if DEBUG;

# Calls to 'foo' still hit the logger
my $y = foo(19);

The beauty of wrappers is your wrapped functions don't have to know they are being wrapped. They can concern themselves with their core purpose. Additionally they only need to be wrapped once, instead of, for example, manually calling your logger function all over the place.

So these decorator things are nice, but I still use Perl quite a lot, and I wanted to know if there was a way to wrap functions in Perl with the same syntactic niceness that trait's provide. What I eventually landed on was attributes, and Attribute::Handlers.

Like trait mods (and Python decorators), attributes are added at the point of your function declarations. Attribute::Handles just makes working with them a little easier. Here's the example from up top, implemented with Perl.

use v5.26;
use warnings; no warnings 'redefine';
use experimental 'signatures';
use Time::HR 'gethrtime';
use Attribute::Handlers;

sub Timed( $pkg, $sym, $code, @ ) :ATTR {
    my $func = substr( ${$sym}, length($pkg) + 3 );
    *$sym = sub (@args) {
        my $start = gethrtime();
        my $res   = $code->(@args);
        my $time  = ( gethrtime() - $start ) / 1_000_000_000;
        say {*STDERR} "Run time for function '$func' was $time";
        return $res;
    }
}

sub costly($n) :Timed {
    sleep($n);
    return 'Have a string';
}

my $x = costly(3);
# STDERR: Run time for function 'costly' was 3.001124

say $x;
# OUTPUT: Have a string

A few caveats to note about Perl... This is classed as redefining a symbol that already exists, and Perl will give a warning that the wrapped function has been redefined, so I disabled that warning. It will also give a warning if you give your attribute an all-lowercase name, as lowercase attributes are reserved for possible future use. Also, as far as I found, the only way to import attributes from a module is to declare them into the UNIVERSAL namespace, for example: UNIVERSAL::Timed, which technically means you don't even need to export them from your module, so... Yay, I guess.

One final note. It's curious to me that I'm talking about "wrapping" and "decorating" this close to December, when those words typically mean something else entirely. Happy holidays!

Reddit comments

Later Earlier