Perl Weekly Challenge Week 015

Benchmarking my Vigenère cipher code

Perl Weekly Challenge - 015

Published 4th July 2019 by Steven Wilson.

My final solutions for week 15 can be found here.

Challenge 2

Problem Statement

Write a script to implement Vigenère cipher. The script should be able encode and decode. Checkout wiki page for more information.

What's my motivation?

The motivation for this blog was that I wasn't happy with my initial encode/decode subroutines for challenge 2 and I wanted to explore better solutions. I also wanted to try benchmarking in Perl which I had no experience of.

I realise the encode/decode subroutines are very similar, they only really have 1 line of code different if you ignore the variable names. I could use some abstraction here but I'm not going to. In the following I'm going to concentrate on the encode subroutine.

Improvements

My initial encode version (see benchmark code below) I had a call to substr inside a grep block inside a for loop. The value returned was the same for each call in the grep block. I wasn't happy with this and moved it before, storing the value in a variable to be used inside the grep block. This gave me the second version of the code.

In the third version, I tried using first_index from the module List::MoreUtils instead of grep. The optimisation here is that the loop through @alphabet, looking for the position of a character, is stopped at the first match. Whereas in a grep all of the items in @alphabet are compared.

Finally, in the process of writing this blog post I remembered an exercise from the Learning Perl book which used ord and chr to find the position of a letter in the alphabet. This method forms my forth version.

Benchmarking

I'm going to use the CPAN module Benchmark::Forking which I can use in the same way as the Benchmark module.

The main difference between the two, as perldoc Benchmark::Forking states, is it runs "each piece of code to be timed in a separate forked process" and "this can make benchmark comparisons more accurate, because the separate test cases are mostly isolated from side-effects caused by the others".

From it I will use the method cmpthese which will "print results of timethese as a comparison chart". It has the signature cmpthese ( COUNT, CODEHASHREF, [ STYLE ] ) where COUNT is the minimum number of CPU seconds to run each process, it can be 0 (default 3 seconds) or a negative number representing a minimum number of seconds to run code, I'll use -10. CODEHASHREF contains a reference to a hash where values are subroutines containing the code to be run and STYLE I don't define therefore the default 'auto' is used.

cmpthese($count, {
    'Name1' => sub { ...code1... },
    'Name2' => sub { ...code2... },
});
                

Benchmark code

#!/usr/bin/env perl

use strict;
use warnings;
use List::MoreUtils qw/ first_index /;;
use Benchmark::Forking qw/ cmpthese /;

my @alphabet = ("A".."Z");

cmpthese(-10, {
    initial_encode => sub{
        my ( $plaintext, $keyword ) = ("MMEZQTBJDGNQKYPURWZMWOIUOC", "LEMON");
        my $cyphertext;
        for my $i (0..((length $plaintext) - 1)){
            (my $mi) = grep { $alphabet[$_] eq substr($plaintext, $i, 1) } (0..@alphabet-1);
            (my $ki) = grep { $alphabet[$_] eq substr($keyword, ($i % length $keyword), 1)} (0..@alphabet-1);
            $cyphertext .= $alphabet[($mi + $ki) % 26];
        }
        return $cyphertext;
    },

    substr_out_grep => sub{
        my ( $plaintext, $keyword ) = ("MMEZQTBJDGNQKYPURWZMWOIUOC", "LEMON");
        my $cyphertext;
        for my $i (0..((length $plaintext) - 1)){
            my $plaintext_chr = substr($plaintext, $i, 1);
            my $keyword_chr   = substr($keyword, ($i % length $keyword), 1);
            (my $mi) = grep { $alphabet[$_] eq $plaintext_chr } (0..@alphabet-1);
            (my $ki) = grep { $alphabet[$_] eq $keyword_chr } (0..@alphabet-1);
            $cyphertext .= $alphabet[($mi + $ki) % 26];
        }
        return $cyphertext;
    },

    use_first_index => sub{
        my ( $plaintext, $keyword ) = ("MMEZQTBJDGNQKYPURWZMWOIUOC", "LEMON");
        my $cyphertext;
        for my $i (0..((length $plaintext) - 1)){
            my $plaintext_chr = substr($plaintext, $i, 1);
            my $keyword_chr   = substr($keyword, ($i % length $keyword), 1);
            (my $mi) = first_index { $alphabet[$_] eq $plaintext_chr } (0..@alphabet-1);
            (my $ki) = first_index { $alphabet[$_] eq $keyword_chr } (0..@alphabet-1);
            $cyphertext .= $alphabet[($mi + $ki) % 26];
        }
        return $cyphertext;
    },

    use_ord_chr => sub{
        my ( $plaintext, $keyword ) = ("MMEZQTBJDGNQKYPURWZMWOIUOC", "LEMON");
        my $cyphertext;
        for my $i (0..((length $plaintext) - 1)){
            my $mi = ord( substr( $plaintext, $i, 1 ) ) - 65;
            my $ki = ord( substr( $keyword, ($i % length $keyword), 1)) - 65;
            $cyphertext .= chr((($mi + $ki) % 26) + 65);
        }
        return $cyphertext;
    },
});

Interpreting results

Comparison chart of benchmark test results.

The chart is ordered with slowest at the top down through to the fastest at the bottom. Rate is the number of code runs per second where more is better. The percentage speed difference between each pair of tests is show in the table.

From the results the initial code is the slowest with 4304 code runs per second. Storing the result of substr in a variable I get a 78% speed gain, which is not insignificant.

Using first_index instead of grep is actually slower by 2%. I ran the test several times to make sure it wasn't a fluke and got similar results. I can't explain this.

The fastest code is where I use ord and chr functions to encode the text. It is significantly faster too, by more than a factor of 10 to the improved grep code and 1798% faster than the initial code!

A link to the final version of my code.

FIN