Kian-Meng Ang Weekly Review: Challenge - 032

Sunday, Nov 10, 2019| Tags: Perl


Continues from previous week.

Feel free to submit a merge request or open a ticket if you find any issues with this post. We highly appreciate and welcome your feedback.

For a quick overview, go through the original tasks and recap of the weekly challenge.

Additional feedback to our Perl Weekly Challenge Twitter account is much appreciated.



Task #1



CPAN modules used: Const::Fast, Data::Dumper, English, Getopt::Long, Getopt::Std, List::Util, Modern::Perl, Term::Size::Perl, Text::CSV_XS, feature, open, strict, warnings

A rather common task (there is even a FAQ entry, link via E. Choroba) to demonstrate Perl’s text manipulation capabilities. The approach used by most participants can be broken down into several steps: read each line from standard input, clean it up, count the frequency of each word, sort by frequency, and lastly show the result in both plain and CSV output. The solutions by Adam Russell and Dave Cross demonstrated this succinctly with no dependencies on CPAN modules or special features. And yes, start by reading these solutions before you proceed to the others.
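Those steps can be sketched as a minimal, dependency-free script. The subroutine names below (count_words, as_report) are our own illustration, not taken from any particular submission:

```perl
use strict;
use warnings;

# Count word frequency from a list of input lines.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;                  # clean up: strip trailing newline
        next unless length $line;     # skip empty lines
        $count{$line}++;              # tally each word
    }
    return \%count;
}

# Format the counts, sorted by ascending frequency (ties by name).
sub as_report {
    my ($count) = @_;
    return map { "$_: $count->{$_}\n" }
           sort { $count->{$a} <=> $count->{$b} || $a cmp $b }
           keys %$count;
}

my $count = count_words( "apple\n", "banana\n", "apple\n" );
print as_report($count);
```

In a real script the sample lines would come from `<>`, and the CSV output would go through Text::CSV_XS rather than hand-rolled quoting.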

Besides that, there were one-liner solutions that quickly show an approach to this task, as shown below.

In Bash by E. Choroba. Yes, a Unix pipeline, demonstrating the core Unix philosophy of writing programs that each do one thing well and work together through text streams.

$ cat "$@" | sort | uniq -c | sort -n

In Perl by Burkhard Nickels.

$ perl -lne '$sum{$_}++; END { foreach my $i ( sort { $sum{$b} <=> $sum{$a} } keys %sum ) { print "$i\t$sum{$i}"; } }' example.txt

Unfortunately, we don’t live in a perfect world. There is always the issue of garbage in, garbage out (GIGO) in text processing. Hence, we noticed participants used different defensive measures to handle possible edge cases. And it showed that, to a certain extent, when numerous edge cases must be handled, a Bash script may not be as convenient or suitable a choice as a Perl script.

For example, Anton Fedotov (UTF-8 support in both input and output), Duane Powell (removal of non-alphabetic characters and support for multiple words per line), Dave Jacoby (accepting input from a file or standard input (stdin)), Laurent Rosenfeld (skipping empty lines), Prajith P (another way to skip empty lines without a regex, plus support for multiple source files), E. Choroba (alternative UTF-8 support), and Athanasius (checking input across different operating systems).
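Several of those defensive measures can be combined into one pass over the input. The sketch below is our own composite of the ideas above, not any single participant’s code:

```perl
use strict;
use warnings;
use open qw( :std :encoding(UTF-8) );    # UTF-8 on stdin/stdout/stderr

# Split a raw line into cleaned words: skips blank lines, handles
# multiple words per line, and strips stray punctuation.
sub clean_words {
    my ($line) = @_;
    chomp $line;
    return () unless $line =~ /\S/;      # skip empty or blank lines
    my @words;
    for my $word ( split /\s+/, $line ) {
        $word =~ s/[^\w'-]//g;           # drop most non-word characters
        push @words, $word if length $word;
    }
    return @words;
}

my %count;
my @input = ( "Hello, world!\n", "\n", "hello again\n" );   # stand-in for <>
$count{$_}++ for map { clean_words($_) } @input;
print "$_: $count{$_}\n" for sort keys %count;
```

In practice the `@input` array would be replaced by a `while ( my $line = <> )` loop, which reads the files named in @ARGV or falls back to stdin.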

If you’re new to Perl or prefer a more explicit style, look into the solution by Jaldhar H. Vyas, which uses the English CPAN module to avoid the ugly punctuation variables. Ovid wrote a blog post on this subject as well. For comparison, we can see the differences in those submissions which used punctuation variables like $! or $ERRNO (Anton Fedotov), $~ or $FORMAT_NAME (Fabrizio Poggi), <> or ARGV (Rage311, Roger Bell_West, Andrezgz, E. Choroba, Ryan Thompson, Duncan C. White), $| or $OUTPUT_AUTOFLUSH (Athanasius), and $^O or $OSNAME (Athanasius).
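To illustrate what the English module buys you, here is a small standalone snippet (ours, not from any submission) showing the same variables under both names:

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # readable aliases for punctuation variables

# $^O and $OSNAME name the same variable
print "punctuation variable: $^O\n";
print "English alias:        $OSNAME\n";

# likewise $! has the readable aliases $ERRNO and $OS_ERROR
open my $fh, '<', 'no-such-file'
    or print "open failed: $OS_ERROR\n";
```

The `-no_match_vars` import skips the aliases for the regex match variables, which on older perls carried a performance penalty.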

Another good comparison of explicit vs. implicit coding style was in reading a file line by line and counting the word frequency.

By Dave Cross, in a style quite common among the participants.

my %count;

while (<>) {
    chomp;
    $count{$_}++;
}

By Andrezgz, in short and dense code.

my %entries;
chomp, $entries{$_}++ while (<>); # count instances

By Steven Wilson, which was more explicit.

my %word_count;
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open < $file: $!";
    while ( !eof $fh ) {
        my $word = readline $fh;
        chomp $word;
        $word_count{$word} += 1;
    }
}

For a submission with heavy use of CPAN modules that was still short and simple, the solution by E. Choroba was a good showcase.

Lastly, once you’ve familiarized yourself with all the submitted solutions for this task, read the answer by Burkhard Nickels, who submitted a comprehensive solution with numerous features.



Task #2



CPAN modules used: Const::Fast, Data::Dumper, Function::Parameters, Getopt::Long, Getopt::Std, JSON::PP, List::Util, Math::Round, Modern::Perl, Moo, Term::ReadKey, Term::Size::Perl, Types::Standard, autodie, constant, experimental, feature, namespace::clean, strict, utf8, warnings

Another rather straightforward task, where we need to sort a list of items by name or value and draw a histogram in the console. Due to this task’s similarity to the previous one, some participants decided to reuse and extend their code from Task #1.

For implementing different sorting strategies, we’ve observed different approaches.

Duane Powell used a simple if-else to sort the output.

# sort and print lines
if ($by_label) {
    # sort by key
    generate_bar($_[0],$_,$format) foreach (sort {lc($a) cmp lc($b)}           (keys %{$_[0]}));
} else {
    # sort by value
    generate_bar($_[0],$_,$format) foreach (sort {$_[0]->{$a} <=> $_[0]->{$b}} (keys %{$_[0]}));
}

Lars Balker, Javier Luque and Giuseppe Terlizzi used a similar approach but with anonymous sorting subroutines.

In Lars Balker’s solution, the if-else has been replaced with a ternary operator selecting between anonymous subroutines.

my $sort = $order_by_label
    ? sub { $a cmp $b }
    : sub { $data->{$b} <=> $data->{$a} || $a cmp $b };
for my $word (sort $sort keys %$data) {
    my $pct = $data->{$word} / $sum;
    printf "%*s | %s\n", $maxlength, $word,
        "#" x int($hashes * $pct);
}

Javier Luque’s version has the flexibility to add more sorting fields.

# Sorting function - just 2 for now
$sort_func = sub { $data->{$::b} <=> $data->{$::a} }
    if ($params->{order_by} eq 'size');
$sort_func = sub { fc($::a) cmp fc($::b) }
    if ($params->{order_by} eq 'name');

# Print the chart
for my $key (sort $sort_func keys %$data) {
    printf("%10s | %s \n",
        $key, '#' x int(scalar($data->{$key}) * 4));
}

A more advanced, object-oriented approach was seen in E. Choroba’s solution. Instead of an if-else statement, a hash was used as a lookup table to delegate to different sorting strategy subroutines.

method generate (SortBy $sort_by = 'keys') {
    my $data = $self->data;
    my $max = max(values %$data);

    my $sort = {labels      => \&_by_key,
                values      => \&_by_value,
                labels_desc => \&_by_key_desc,
                values_desc => \&_by_value_desc}->{$sort_by};

    for my $key (sort { $self->$sort } keys %$data) {
        printf '%' . $self->_max_length . "s%s%s\n",
            $key,
            $self->separator,
            '#' x ($self->_bar_width / $max * $data->{$key});
    }
}

We also noticed that participants put extra effort into scaling the histogram to handle different ranges of values (Nazareno Delucca, Lars Balker, Adam Russell, Athanasius, Laurent Rosenfeld, Prajith P, Lars Thegler, Ulrich Rieke, Jaldhar H Vyas, Roger Bell_West, Ryan Thompson, Duncan C. White, Andrezgz, Giuseppe Terlizzi, Colin Crain, and Joelle Maslak), floating-point numbers (Ulrich Rieke, Colin Crain, and Joelle Maslak), and the terminal width (Burkhard Nickels, Roger Bell_West, Ryan Thompson, and Joelle Maslak).
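The common scaling idea is to normalize each count against the maximum value so the longest bar fits the available width. A bare-bones sketch of our own (the width of 40 columns stands in for a detected terminal width, e.g. via Term::Size::Perl):

```perl
use strict;
use warnings;
use List::Util qw( max );

# Scale each value so the longest bar fills $width columns.
sub histogram {
    my ( $data, $width ) = @_;
    my $max = max( values %$data );
    my @out;
    for my $key ( sort { $data->{$b} <=> $data->{$a} } keys %$data ) {
        my $bar = '#' x int( $width * $data->{$key} / $max );
        push @out, sprintf "%-8s | %s\n", $key, $bar;
    }
    return @out;
}

my %data = ( apple => 8, banana => 4, cherry => 1 );
print histogram( \%data, 40 );
```

Dividing by the maximum rather than the sum keeps small counts visible even when one value dominates, which is why most participants scaled against the largest bar.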



SEE ALSO



(1) Perl Weekly Challenge 032 by Adam Russell

(2) The Raku Instance Bar by Arne Sommer

(3) Perl Weekly Challenge #32, Number of occurences and bar chart by Burkhard Nickels

(4) Perl Weekly Challenge 032: Frequency Table & ASCII Bar Chart by E. Choroba

(5) Perl Weekly Challenge: Week 32 by Jaldhar H. Vyas

(6) Perl Weekly Challenge - 32 by Javier Luque

(7) Perl Weekly Challenge 32: Word Histogram and ASCII Bar Chart by Laurent Rosenfeld

(8) RogerBW’s Blog: Perl Weekly Challenge 32 by Roger Bell_West


SO WHAT DO YOU THINK?

If you have any suggestions or ideas then please do share with us.

Contact me