Adjective Noun

Using matplotlib in Perl 6 (part 4)

2017-03-08

This is Part 4 in a series. You can start at the Intro here.

The last post of this series was more about replicating Numpy's linspace function in Perl 6 than it was about testing the limits of this Matplotlib wrapper. In this part I am going to hit one of those limits and encounter a minor shortcoming of the wrapper as it currently exists.

If you're following along, I had decided to take a shot at one of the histograms and jumped into the first one named histogram_demo_features. I took a glance at the Python code and froze...

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

Straight off the bat I'm in trouble. Cue dramatic music. The Matplotlib wrapper I'd been using was really a wrapper for matplotlib's sub-module pyplot. My wrapper gave me no access to other sub-modules like mlab. I was going to have to modify the wrapper.

In Python, importing a package doesn't necessarily give me access to its sub-packages. For example, if I import the matplotlib base package, I can't use matplotlib.pyplot or matplotlib.mlab... but! If I import only matplotlib.pyplot, I can now also use matplotlib.mlab. Maybe (probably) I just don't understand Python packaging.
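My best guess at what is going on, sketched in Python: importing a package only runs its __init__.py, and a sub-module becomes an attribute of the package once something (anything) imports it. If pyplot itself imports mlab, which is an assumption on my part about matplotlib's internals, that would explain the behaviour above. The names demo_pkg, sub_a and sub_b are made up for illustration; sub_a stands in for pyplot and sub_b for mlab.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Write a throwaway package where sub_a imports sub_b (hypothetical names)."""
    pkg = Path(root) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")            # imports nothing
    (pkg / "sub_b.py").write_text("VALUE = 42\n")
    (pkg / "sub_a.py").write_text("from demo_pkg import sub_b\n")


def submodule_visibility():
    """Return which sub-modules are visible before and after importing sub_a."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            # Only the package is loaded, so neither sub-module is an attribute yet.
            before = (hasattr(pkg, "sub_a"), hasattr(pkg, "sub_b"))
            importlib.import_module("demo_pkg.sub_a")
            # Importing sub_a also pulled in sub_b, and both got bound on the package.
            after = (hasattr(pkg, "sub_a"), hasattr(pkg, "sub_b"))
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.startswith("demo_pkg")]:
                del sys.modules[name]
    return before, after


before, after = submodule_visibility()
print(before, after)
```

So the sub-module I never asked for shows up as a side effect of the one I did ask for.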

Stefan Seifert, the guy who created the Inline::Python module, is a clever guy. Me, I'm not so clever, so I just kinda hacked away until things worked, and this is what I landed on. I imported matplotlib.pyplot at the package level (so the import runs when the wrapper module is used) and defined pyplot and mlab as their own classes.

use Inline::Python;
my $py = Inline::Python.new();
$py.run('import matplotlib.pyplot');

class Matplotlib::Mlab {
    method FALLBACK($name, |c) {
        $py.call('matplotlib.mlab', $name, |c);
    }
}

class Matplotlib::Plot {
    method FALLBACK($name, |c) {
        $py.call('matplotlib.pyplot', $name, |c);
    }
}

class Matplotlib {
    method FALLBACK($name, |c) {
        $py.call('matplotlib', $name, |c);
    }
}

Oh, I also dropped the py from pyplot in my module because... reasons. I've also got a class for the top-level matplotlib package. I'm not sure if there are methods in there I will need to call, but it doesn't hurt to be prepared.
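If Python is more your thing, FALLBACK plays roughly the role that __getattr__ does over there: it only gets called when a normal method lookup fails, which makes it handy for forwarding calls to something else. Here is a minimal sketch of the same forwarding idea. ModuleProxy is a made-up name, and this is an analogy only, not how Inline::Python works internally.

```python
import importlib


class ModuleProxy:
    """Forward unknown attribute lookups to a named Python module."""

    def __init__(self, module_name):
        self._module = importlib.import_module(module_name)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so every unknown name is handed to the wrapped module.
        return getattr(self._module, name)


# The stdlib math module stands in for matplotlib.pyplot here.
math_proxy = ModuleProxy("math")
root = math_proxy.sqrt(9)
print(root)
```

One class per sub-module, each forwarding to a different target, is the same shape as the three Perl 6 classes above.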

I don't even know if I jumped this hurdle, or just kinda kicked it over and stumbled ahead. Let me know if you have a saner way I could have done this. In any case, what mattered to me most at the time was that it worked and I could move on to playing with plots. To use this fancy new wrapper in all my previous examples, all I need to do is change the class instantiation from this

my $plt = Matplotlib.new;

... to this

my $plt = Matplotlib::Plot.new;

Which actually maps closer to the Python code, anyways. With that out of the way I can move on to the next few lines of code.

np.random.seed(0)

# example data
mu = 100  # mean of distribution
sigma = 15  # standard deviation of distribution
x = mu + sigma * np.random.randn(437)

I can guess what random.seed does. Pseudo-random number generators (or PRNGs) use an algorithm to compute a random number; this is the "pseudo" part of pseudo-random. Provided you start the algorithm at the same number (the seed) each time, the sequence of results is always the same. How the seed is obtained normally (and how the random numbers are generated) differs between operating systems and programming languages. The seed function in Perl is called srand, so that part's easy.
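A quick way to convince yourself of the seed thing, using Python's standard random module rather than numpy (the draw helper is just a name I made up for this sketch):

```python
import random


def draw(seed, count=5):
    """Return `count` pseudo-random floats from a generator started at `seed`."""
    rng = random.Random(seed)   # a private generator, so global state is untouched
    return [rng.random() for _ in range(count)]


first = draw(0)
second = draw(0)
other = draw(1)

# Same seed, same sequence; a different seed gives a different sequence.
print(first == second, first == other)
```

The same seed will not give the same numbers in a different language or library, only within the same generator.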

Then we come to randn. A quick search led me to this StackOverflow post where I learned that it creates a "normal distribution." That link jumps to one of the replies, which is from an actual statistician! This helpful human explains that a normal distribution is "a distribution where the values are more likely to occur near the mean value". So, think bell curve.

I'm not a stats guy. Heck, I'm not even a maths guy... So I headed to RosettaCode to grab a normal distribution function in Perl 6. I modified it slightly (hopefully without breaking it) so that it behaves like a very simple clone of numpy.random.randn and, like numpy, stuck it in its own sub-module of the Numpl module I created in Part 3.

class Numpl::Random {
    method randn($n) {
        sqrt(-2 × log(rand)) × cos(τ × rand) xx $n;
    }
}
class Numpl {
    # linspace stuff ...

    method random {
        Numpl::Random.new();
    }
}

Which means I could now do this

my $np = Numpl.new;
my $x = $np.random.randn(437);
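As far as I can tell, the formula inside randn is the Box-Muller transform, which turns two uniform random numbers into one normally distributed number. Here is the same idea sketched in plain Python with only the standard library. The function and parameter names are mine, and this is not how numpy implements randn.

```python
import math
import random


def randn(n, rng=None):
    """Return n samples from a standard normal distribution (Box-Muller sketch)."""
    rng = rng or random.Random()
    samples = []
    for _ in range(n):
        u1 = 1.0 - rng.random()   # shift [0, 1) to (0, 1] so log() never sees 0
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        samples.append(radius * math.cos(2.0 * math.pi * u2))
    return samples


# Scale and shift the samples the same way the Python demo does.
mu, sigma = 100, 15
samples = [mu + sigma * z for z in randn(437, random.Random(0))]
print(len(samples))
```

Passing in a seeded generator keeps the output repeatable, which is all the seed call in the original demo is for.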

The next few lines are pretty straightforward, so moving now to mlab.normpdf. OK, so the comment there tells me that this thing adds a line of "best fit", but what the heck does it have to do with PDF? Being curious, I did a search and found out it stands for 'Probability Density Function'. With my curiosity quenched, I converted the rest of the code to Perl without much fanfare.

use Numpl;
use Matplotlib;

my $np   = Numpl.new;
my $plt  = Matplotlib::Plot.new;
my $mlab = Matplotlib::Mlab.new;

srand(0);

# example data
my $mu = 100;    # mean of distribution
my $sigma = 15;  # standard deviation of distribution
my $x = $np.random.randn(437).map(* × $sigma + $mu);

my $num_bins = 50;

my ($fig, $ax) = $plt.subplots();

# the histogram of the data
my ($n, $bins, $patches) = $ax.hist($x, $num_bins, :normed(1));

# add a 'best fit' line
my $y = $bins.map(-> $value {
    $mlab.normpdf($value, $mu, $sigma)
});
$ax.plot($bins, $y, '--');
$ax.set_xlabel('Smarts');
$ax.set_ylabel('Probability density');
$ax.set_title('Histogram of IQ: $\mu=100$, $\sigma=15$');

# Tweak spacing to prevent clipping of ylabel
$fig.tight_layout();
$plt.show();
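For the curious, the curve that normpdf evaluates for the "best fit" line is the probability density function of a normal distribution with mean mu and standard deviation sigma. A minimal pure-Python sketch of that formula, with normal_pdf being my own name for it:

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The curve peaks at the mean and is symmetric around it.
peak = normal_pdf(100, 100, 15)
left = normal_pdf(85, 100, 15)
right = normal_pdf(115, 100, 15)
print(peak > left, math.isclose(left, right))
```

That small peak value is why the histogram has to be normalised before the line and the bars share a sensible y-axis.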

So, um, yeah... Not much to say here that hasn't been covered. I'm using a map again, once on the results of randn and once over the bins to call normpdf. The rest is pretty standard translation stuff, and here's the result.

Even though I am seeding the PRNG, Perl will generate random numbers differently than Python, so this doesn't look exactly like the one in the gallery. You can remove the srand to get a different graph each time. The colours, however, are a little... academic. I think next I'll try applying one of the style sheets to a graph.

To be continued...