Continues from previous week.
Feel free to submit a merge request or open a ticket if you find any issues with this post. We highly appreciate and welcome your feedback.
Read the excellent blog post by Arne Sommer on his investigation to find the shortest solution in both Perl 5 and Perl 6.
Quite a few participants (including the reviewer) were taken aback by this task. Upon re-reading the task and reviewing the submitted solutions, perhaps this was one of those tasks where the solution depends solely on the interpretation of the participant and, along the way, lets us all learn something about the Perl interpreter itself (see Arne Sommer’s post). Nevertheless, we will look at the different ways participants solved this task.
First off is the empty file solution, first submitted by Joelle Maslak and followed by E. Choroba, Simon Proctor, and Ruben Westerberg. Joelle Maslak compared how a few programming languages handle an empty file, and Perl 5 had the fastest startup time without throwing any errors at “doing nothing”.
Some participants (Dave Cross and Roger Bell West) hold the alternative opinion that an empty file is not really a Perl script. They are partially correct: according to the file command, an empty file is just an empty file.
$ file challenge-024/joelle-maslak/perl5/ch-1.pl
challenge-024/joelle-maslak/perl5/ch-1.pl: empty
Hence, once a shebang is added as an interpreter directive, the file command recognizes the file as a Perl executable script.
$ file challenge-024/roger-bell-west/perl5/ch-1.pl
challenge-024/roger-bell-west/perl5/ch-1.pl: Perl script text executable
Does that mean that a text file without a Perl shebang interpreter directive is not a Perl script?
$ perl challenge-024/andrezgz/perl5/ch-1.pl
This script is the smallest in terms of size that on execution doesn't throw any error, doesn't do anything special and explains what it does
$ file challenge-024/andrezgz/perl5/ch-1.pl
challenge-024/andrezgz/perl5/ch-1.pl: ASCII text
For those who use a code linter in their development environment, perlcritic will raise some concerns even at its most gentle setting. Although this is not part of the task’s requirements, it’s good to know how perlcritic evaluates a basic Perl script.
$ perlcritic --gentle challenge-024/lubos-kolouch/perl5/ch-1.pl
challenge-024/lubos-kolouch/perl5/ch-1.pl: [TestingAndDebugging::RequireUseStrict] Code before strictures are enabled at line 1, column 1 (Severity 5).
Using strictures is probably the single most effective way to improve the quality of your code. This policy requires that the `'use strict'' statement must come before any other statements except `package', `require', and other `use' statements. Thus, all the code in the entire package will be affected.
There are special exemptions for Moose, Moose::Role, and Moose::Util::TypeConstraints because they enforces strictness; e.g. `'use Moose'' is treated as equivalent to `'use strict''.
The maximum number of violations per document for this policy defaults to 1.
In short, an empty file is probably the shortest answer to this task and the one that comes closest to fulfilling most of the requirements.
If you haven’t done this task and want to learn more about implementing full-text search using an Inverted Index, start with the submission by Laurent Rosenfeld. The solution is concise yet comprehensive enough to demonstrate a working implementation of an Inverted Index. Next, move on to a similar solution with test cases by Lubos Kolouch. If the regex that extracts the words used to build the index confuses you, read the submission by Andrezgz, which has well-written comments on the regex. If you still cannot grok how an Inverted Index works, look at the output of the solution by Yet Ebreo, which visualizes how the index works, as shown below.
perl .\ch-2.pl "i sing eat and love" .\file1.txt .\file2.txt .\file3.txt .\file4.txt .\file5.txt
+-------+--------------------------------+
| Words | File(s)                        |
+-------+--------------------------------+
| and   | file1.txt file2.txt            |
| eat   | file4.txt                      |
| i     | file1.txt file2.txt file4.txt  |
| love  | file2.txt file5.txt            |
| sing  | (N/A)                          |
+-------+--------------------------------+
By going through all four submissions, you’re now equipped with a good fundamental overview of how an Inverted Index is implemented.
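For readers who prefer to see the idea in code first, here is a minimal sketch of an Inverted Index. It is not taken from any participant's submission: the function names build_index and lookup, the sample file names, and the choice to treat every run of \w characters as a word are all assumptions made for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build an inverted index of the form: word => { document name => 1 }.
# $docs is a hash reference mapping a (hypothetical) document name to its text.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $name (keys %$docs) {
        # Lowercase the text and treat every run of word characters as a word.
        for my $word (lc($docs->{$name}) =~ /\w+/g) {
            $index{$word}{$name} = 1;
        }
    }
    return \%index;
}

# Return the sorted list of documents that contain the given word.
sub lookup {
    my ($index, $word) = @_;
    my @names = sort keys %{ $index->{lc $word} // {} };
    return @names;
}

# Sample data, invented for this sketch.
my $index = build_index({
    'file1.txt' => 'Perl makes easy things easy',
    'file2.txt' => 'and hard things possible',
});

for my $word (qw(things easy python)) {
    my @names = lookup($index, $word);
    printf "%-7s %s\n", $word, @names ? "@names" : '(N/A)';
}
```

The nested hash keeps each document at most once per word, so a lookup is a single hash access followed by a sort of the matching document names.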
The four solutions mentioned above serve as good working prototypes to get things started. You must be wondering, can we improve or extend on these solutions?
Yes, and quite a few participants did.
Interestingly, two of the participants (E. Choroba and Joelle Maslak) used their own CPAN modules in their solutions. This is the first time we’ve seen this, and it caught us by surprise! First, the solution by E. Choroba used the Syntax::Construct CPAN module as an alternative way to manage the feature pragma. Second, the solution by Joelle Maslak used the File::ByLine CPAN module to process a single file in parallel.
How about storage? Both Duane Powell and E. Choroba used the Storable module to persist the index, while Guillermo Ramos was the only participant who used a DB to store the index.
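As a rough illustration of the persistence idea, the sketch below saves an index with Storable and loads it back. It does not reproduce either participant's code; the sample index, the temporary directory, and the file name index.sto are assumptions made for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# A tiny hand-written index (word => { document => 1 }), invented for this sketch.
my %index = (
    perl  => { 'file1.txt' => 1, 'file2.txt' => 1 },
    index => { 'file2.txt' => 1 },
);

# Write to a temporary directory so the example cleans up after itself.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/index.sto";    # hypothetical file name

store(\%index, $path);          # serialize the whole structure to disk
my $loaded = retrieve($path);   # read it back as a hash reference

print join(' ', sort keys %{ $loaded->{perl} }), "\n";    # file1.txt file2.txt
```

Storable also provides nstore, which writes in network byte order and may be the better choice if the index file has to be shared between machines.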
How about counting the word frequency for each document? Randy Lauen was the only participant who implemented this approach.
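One possible way to add word frequency, sketched under our own assumptions rather than copied from Randy Lauen's submission, is to store a count instead of a flag for each word and document pair. The names build_counting_index and ranked, as well as the sample texts, are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build an index of the form: word => { document name => number of occurrences }.
sub build_counting_index {
    my ($docs) = @_;
    my %index;
    for my $name (keys %$docs) {
        # Increment the counter once for every occurrence of each word.
        $index{$_}{$name}++ for lc($docs->{$name}) =~ /\w+/g;
    }
    return \%index;
}

# Return [document, count] pairs for a word, most frequent first;
# ties are broken by document name.
sub ranked {
    my ($index, $word) = @_;
    my $hits  = $index->{lc $word} // {};
    my @names = sort { $hits->{$b} <=> $hits->{$a} or $a cmp $b } keys %$hits;
    return map { [ $_, $hits->{$_} ] } @names;
}

# Sample data, invented for this sketch.
my $counts = build_counting_index({
    'file1.txt' => 'love love love',
    'file2.txt' => 'I love Perl',
});

printf "%s: %d\n", @$_ for ranked($counts, 'love');
```

With counts in place, the same lookup can order the results by how often the word occurs, which is a simple first step towards ranking search results.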
Now, for some other miscellaneous things we noticed while reviewing these solutions.
If you don’t like to use glob to filter and get the list of files, you can look into File::Find::Rule, as seen in the submission by Duane Powell. For formatting the output, Adam Russell demonstrated that you can use Perl formats (a rather old-school way) to achieve that.
(1) Small Inversions with Perl 6 by Arne Sommer. Recommended read of the week.
(2) Inverted Index Formatting by Adam Russell
(3) Perl Weekly Challenge # 24: Smallest Script and Inverted Index by Laurent Rosenfeld
(4) Perl Weekly Challenge 24 by Jaldhar H. Vyas
(5) RogerBW’s Blog: Perl Weekly Challenge 24 by Roger Bell West
(6) Perl Weekly Challenge W024 - Smallest Script, Inverted Index by Yet Ebreo
(7) Perl Weekly Challenge 024: Inverted Index and Shortest Oneliner by E. Choroba
(8) Perl is Good for Nothing by Joelle Maslak