Code-a-holic: March 2008

Tuesday, March 11, 2008

[Perl] How to do it better

There are some really helpful people in the Perl community. I advertised the original posting on comp.lang.perl.misc and received some very useful responses from John W. Krahn and Michele Dondi, as below.

With respect to the rules, John wrote:

# Rule 1. Remove all 's' 
sub rule1 {
   ( my $arg = shift ) =~ tr/s//d;
   return $arg;
}

# Rule 2. Sort the characters of the word into alphabetic order 
sub rule2 {
   return join '', sort split //, shift;
}

John> Why [were] you converting the '|' character and the 'e' character to 'e'?

# Rule 3. Convert all vowels to 'e'
sub rule3 {
     ( my $arg = shift ) =~ tr/aiou/e/;
     return $arg;
}
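
To see what John is pointing at, here is a quick check (my own sketch, not part of his reply). Inside a bracketed character class the '|' is just another literal character, so the original pattern rewrites it along with the vowels. The test string 'a|b' is only an illustration.

```perl
use strict;
use warnings;

# Original pattern: inside [...] the '|' is a literal, not alternation,
# so it gets converted to 'e' as well.
( my $old = 'a|b' ) =~ s/[a|e|i|o|u]/e/g;

# John's version: tr/// maps a, i, o and u to 'e' and leaves '|' alone.
( my $new = 'a|b' ) =~ tr/aiou/e/;

print "$old $new\n";    # eeb e|b
```

Leaving 'e' out of the tr/// list changes nothing in the result, since 'e' would only be mapped to itself.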

John> The . character class matches a lot more than just letters, or did you really mean "replace any first character except newline with 'n'".

# Rule 4. Replace the first letter with 'n' 
sub rule4 {
   ( my $arg = shift ) =~ s/\A[[:alpha:]]/n/;
   return $arg;
}

John> The . character class matches a lot more than just letters.

# Rule 5. Drop the last letter 
sub rule5 {
   ( my $arg = shift ) =~ s/[[:alpha:]]\z//;
   return $arg;
}
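
A small comparison of the two styles for rules 4 and 5 (again my own sketch, with made-up inputs '1st' and 'no1' that contain a digit) shows the difference John describes: the dot happily matches the digit, while [[:alpha:]] leaves the word untouched.

```perl
use strict;
use warnings;

# Rule 4: replace the first letter with 'n'
( my $dot4   = '1st' ) =~ s/^./n/;                # the digit is replaced
( my $alpha4 = '1st' ) =~ s/\A[[:alpha:]]/n/;     # no leading letter, no change

# Rule 5: drop the last letter
( my $dot5   = 'no1' ) =~ s/.$//;                 # the digit is dropped
( my $alpha5 = 'no1' ) =~ s/[[:alpha:]]\z//;      # no trailing letter, no change

print "$dot4 $alpha4 $dot5 $alpha5\n";    # nst 1st no no1
```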

# Rule 6. Replace letter pairs with 'ow'
sub rule6 {
   ( my $arg = shift ) =~ s/([[:lower:]])\1/ow/g;
   return $arg;
}
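
One thing worth checking, as far as I can tell, is that this version of rule 6 is not quite identical to a letter-by-letter loop. The loop makes one pass per letter, so an 'ow' produced by an early pass can form a new pair that a later pass picks up, whereas the backreference makes a single pass and never rescans its own output. The names rule6_loop and rule6_backref below are mine, chosen just for this comparison.

```perl
use strict;
use warnings;

# Loop approach: one substitution per letter, applied in sequence,
# so text produced by an earlier pass can pair up in a later pass.
sub rule6_loop {
    my $arg = shift;
    for my $letter ( 'a' .. 'z' ) {
        $arg =~ s/$letter$letter/ow/g;
    }
    return $arg;
}

# Backreference approach: a single left-to-right pass.
sub rule6_backref {
    ( my $arg = shift ) =~ s/([[:lower:]])\1/ow/g;
    return $arg;
}

printf "%s %s\n", rule6_loop('aaw'),  rule6_backref('aaw');     # oow oww
printf "%s %s\n", rule6_loop('nnat'), rule6_backref('nnat');    # owat owat
```

For the words in this puzzle the two agree, so the difference only matters for inputs like 'aaw', where the new 'w' lands next to an existing one.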

Then, with respect to the string eval() of each rule, John said, "Ouch! Use a dispatch table instead of string eval()."

my %rule = (
     1 => sub {
         ( my $arg = shift ) =~ tr/s//d;
         return $arg;
         },
     2 => sub {
         return join '', sort split //, shift;
         },
     3 => sub {
         ( my $arg = shift ) =~ tr/aiou/e/;
         return $arg;
         },
# more rules here
     );

# and later on, use rules
     $res = $rule{ $j }( $arg ); 
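
A minimal runnable sketch of the lookup, assuming only the first two rules are registered. The helper apply_rule is a hypothetical name of mine; it guards against asking for a rule that is not in the table, which a bare $rule{ $j }( $arg ) would not survive.

```perl
use strict;
use warnings;

my %rule = (
    1 => sub { ( my $arg = shift ) =~ tr/s//d; return $arg },
    2 => sub { return join '', sort split //, shift },
);

# Hypothetical helper: apply rule $j, or hand the word back unchanged
# if no such rule is registered.
sub apply_rule {
    my ( $j, $arg ) = @_;
    return exists $rule{$j} ? $rule{$j}->($arg) : $arg;
}

print apply_rule( 1, 'first' ), "\n";    # firt
print apply_rule( 2, 'gnat' ),  "\n";    # agnt
print apply_rule( 9, 'gnat' ),  "\n";    # gnat (no rule 9)
```

Unlike the string eval(), nothing in the word itself is ever interpreted as Perl code, so a word containing a quote or a '$' cannot break the call.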

At this point Michele Dondi chipped in, quoting John's table:

> my %rule = (
>   1 => sub {
>     ( my $arg = shift ) =~ tr/s//d;
>     return $arg;
>     },
>   2 => sub {
>     return join '', sort split //, shift;
>     },

Michele> Since the keys are numbers, an array may be appropriate.
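
Here is one way the array idea might look; this is my own sketch, not code from the thread. Slot 0 is deliberately left empty so that rule N still lives at index N, and the start word 'gnats' is just a sample.

```perl
use strict;
use warnings;

# Rules held in an array; slot 0 is unused so that rule N is $rule[N].
my @rule = (
    undef,
    sub { ( my $arg = shift ) =~ tr/s//d;    return $arg },
    sub { return join '', sort split //, shift },
    sub { ( my $arg = shift ) =~ tr/aiou/e/; return $arg },
);

# Apply every rule in turn to the same start word.
my @results = map { $rule[$_]->('gnats') } 1 .. $#rule;
print "@results\n";    # gnat agnst gnets
```

With an array, the loop bound comes from $#rule rather than a hard-coded 6, so adding a rule needs no change to the loop.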

So there you have it: better rules, anonymous subroutines, and hashes. Powerful stuff.

I wonder how Tcl and the other languages would have done it. Or Protium, for that matter. Any takers?

© Copyright Bruce M. Axtens, 2008

Sunday, March 02, 2008

[Perl] How not to do it?

The header for this website says, "Some useful stuff and some examples of how not to do it." This may fall into the latter category.

One of my kids had been playing a computer-based game which used a variety of word puzzles. The question he posed to me was to take a word, apply a small set of rules to it, as many times as necessary, and come up with another word. He would supply the rules and both words, and I would supply the sequence of rule applications which would effect the conversion.

(I'm no guru when it comes to Perl, so if you see something that could be expressed in a more efficient manner, please let me know.)

These are the rules:

1. Remove all 's'

sub rule1 {
  my $arg = shift;
  $arg =~ s/s//g;
  return $arg;
}

2. Sort the characters of the word into alphabetic order

sub rule2 {
  my $arg = shift;
  my @arr = split( //, $arg );
  @arr = sort @arr ;
  $arg = join( '', @arr );
  return $arg;
}

3. Convert all vowels to 'e'

sub rule3 {
    my $arg = shift;
    $arg =~ s/[a|e|i|o|u]/e/g;
    return $arg;
}

4. Replace the first letter with 'n'

sub rule4 {
  my $arg = shift;
  $arg =~ s/^./n/;
  return $arg;
}

5. Drop the last letter

sub rule5 {
  my $arg = shift;
  $arg =~ s/.$//;
  return $arg;
}

6. Replace letter pairs with 'ow'

sub rule6 {
  my $arg = shift;
  my @arr = split( //, "abcdefghijklmnopqrstuvwxyz" );
  for my $letter ( @arr ) {
    $arg =~ s/$letter$letter/ow/g;
  }
  return $arg;
}

My son then said that the start word was 'first' (or 'ant') and the stop word was 'now'. After some fiddling, resulting in the code below, I said, "With these rules you can't get from 'first' to 'now'. Not even from 'ant' to 'now'. But from 'gnat', yes."

So here's the code, for what it's worth. It was assumed that this would be run as a command line tool, so I load up the start word (stored in $root) and a recursion management flag (stored in $managed). Recursion management is defaulted to true. The recursion level is marked with $level and there's a hash, called %deadends, to keep track of "solutions" that shouldn't be pushed any further as they have already been proved not to get any closer to the solution.

my $root = shift;
my $managed = shift;
$managed = 1 unless defined($managed);
my $level = 0;
my %deadends;

Everything else happens in the apply function, which tries every possible combination of rules in pursuit of the target word. After getting the word to check from @_ with shift, it declares a couple of variables and starts a for loop that steps through the rules.

sub apply {
  my $arg = shift;
  my $res;
  my $reason;
  for ( my $j = 1; $j <= 6; $j++ ) {

Each rule is applied to the value passed in as $arg, and the result is stored in $res. $reason is then cleared, ready for the tests on $res.

    $res = eval( "rule$j(\"$arg\")" );
    $reason = "";

If $res is the same as $arg, $reason is "equal". If the length of $res is less than 3, $reason is set to "too short". If $managed is 1, and $res is already in the %deadends hash, $reason is set to "deadend", and if $res is equal to "now" (the goal, as it happens) then $reason is set to "found".

    if ($res eq $arg) {
      $reason = "equal";
    } elsif ( length( $res ) < 3) {
      $reason = "too short";
    } elsif ($deadends{$res}) {
      $reason = "deadend" unless $managed eq 0;
    } elsif ( $res eq "now" ) {
      $reason = "found";
    }

If $reason is not empty and not "equal" then print a newline, as many spaces as there are levels of recursion, the rule that got us here, the incoming word and the result of the rule application. If $reason is "deadend" then print an exclamation mark to show that a deadend has been reached, otherwise print a full stop.

    if ( $reason ne "" ) {
      if ( $reason ne "equal" ) {
        print "\n";
        print " " x $level;
        print "(R$j) $arg => $res";
        if ( $reason eq "deadend" ) {
          print "!";
        } else {
          print ".";
        }
      }

If we've actually reached "now", indicate that with an asterisk. (We could exit the script at this point, but I left it to show all the possible paths to the stop word.)

      if ( $res eq "now" ) {
        print "*";
      }

If managing recursion, store the value of $res in the deadends hash.

      $deadends{$res}=1 unless $managed eq 0;

Otherwise $reason is empty, meaning none of the tests fired and the word is worth exploring further. Print a newline, as many spaces as there are recursion levels, the rule, $arg and $res. Then increase $level, call apply recursively with the contents of $res, and decrease $level again when it returns.

    } else {
      print "\n";
      print " " x $level;
      print "(R$j) $arg => $res,";
      $level++;
      apply( $res );
      $level--;
    }
  }
}

Here's the first call to apply, with a newline displayed once processing returns from the call.

apply( $root );
print "\n";
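
The order of the tests inside apply matters: the dead-end check comes before the goal check, which is why a log line can end in "!*" once 'now' has already been reached by another route. A minimal sketch of that chain, pulled out into a helper I have called classify (a hypothetical name, not part of the script above):

```perl
use strict;
use warnings;

my %deadends;
my $managed = 1;

# Hypothetical helper mirroring the if/elsif chain in apply().
# Returns the reason string, or '' if the word should be explored further.
sub classify {
    my ( $arg, $res ) = @_;
    return 'equal'     if $res eq $arg;
    return 'too short' if length($res) < 3;
    if ( $deadends{$res} ) { return $managed ? 'deadend' : '' }
    return 'found'     if $res eq 'now';
    return '';
}

print classify( 'own', 'now' ), "\n";    # found
$deadends{now} = 1;                      # 'now' has been seen once
print classify( 'oow', 'now' ), "\n";    # deadend
```

Because apply prints the asterisk by testing $res directly rather than $reason, the second case still gets its "*" in the log.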

Keeping track of dead-ends proved useful. Without it, the 'first' to 'now' attempt generated a 406K file (redirecting the output). With it, I got a 5K file. Similarly, 'ant' to 'now' was 1.8K without, and 457 bytes with. When it came to starting with 'gnat', a managed conversion generated an 8K file. Without management the laptop slowed to a crawl. After about five minutes I got an "Out of memory!" on stderr, so I killed the perl process, which left a 902-megabyte file.

This is the result of a managed attempt with 'ant' as the start word:

(R3) ant => ent,
 (R4) ent => nnt,
  (R5) nnt => nn.
  (R6) nnt => owt,
   (R2) owt => otw,
    (R3) otw => etw,
     (R4) etw => ntw,
      (R5) ntw => nt.
     (R5) etw => et.
    (R4) otw => ntw!
    (R5) otw => ot.
   (R3) owt => ewt,
    (R2) ewt => etw!
    (R4) ewt => nwt,
     (R2) nwt => ntw!
     (R5) nwt => nw.
    (R5) ewt => ew.
   (R4) owt => nwt!
   (R5) owt => ow.
 (R5) ent => en.
(R4) ant => nnt!
(R5) ant => an.

Sadly, no asterisks. Next, the log of a managed run starting with 'gnat'. Success came quite quickly: start with 'gnat' and apply rules 2, 3, 4, 2, 4, 5, 6 and 2. An even shorter path appears further down: 3, 4, 2, 5, 6, and 4. The shortest appears to be 4, 2, 5, 6, 4 -- nnat, annt, ann, aow, now.

(R2) gnat => agnt,
 (R3) agnt => egnt,
  (R4) egnt => ngnt,
   (R2) ngnt => gnnt,
    (R4) gnnt => nnnt,
     (R5) nnnt => nnn,
      (R5) nnn => nn.
      (R6) nnn => own,
       (R2) own => now.*
       (R3) own => ewn,
        (R2) ewn => enw,
         (R4) enw => nnw,
          (R5) nnw => nn.
          (R6) nnw => oow,
           (R3) oow => eew,
            (R4) eew => new,
             (R2) new => enw!
             (R5) new => ne.
            (R5) eew => ee.
            (R6) eew => oow!
           (R4) oow => now!*
           (R5) oow => oo.
         (R5) enw => en.
        (R4) ewn => nwn,
         (R2) nwn => nnw!
         (R5) nwn => nw.
        (R5) ewn => ew.
       (R4) own => nwn!
       (R5) own => ow.
     (R6) nnnt => ownt,
      (R2) ownt => notw,
       (R3) notw => netw,
        (R2) netw => entw,
         (R4) entw => nntw,
          (R5) nntw => nnt,
           (R5) nnt => nn.
           (R6) nnt => owt,
            (R2) owt => otw,
             (R3) otw => etw,
              (R4) etw => ntw,
               (R5) ntw => nt.
              (R5) etw => et.
             (R4) otw => ntw!
             (R5) otw => ot.
            (R3) owt => ewt,
             (R2) ewt => etw!
             (R4) ewt => nwt,
              (R2) nwt => ntw!
              (R5) nwt => nw.
             (R5) ewt => ew.
            (R4) owt => nwt!
            (R5) owt => ow.
          (R6) nntw => owtw,
           (R2) owtw => otww,
            (R3) otww => etww,
             (R4) etww => ntww,
              (R5) ntww => ntw!
              (R6) ntww => ntow,
               (R2) ntow => notw!
               (R3) ntow => ntew,
                (R2) ntew => entw!
                (R5) ntew => nte,
                 (R2) nte => ent,
                  (R4) ent => nnt!
                  (R5) ent => en.
                 (R5) nte => nt.
               (R5) ntow => nto,
                (R2) nto => not,
                 (R3) not => net,
                  (R2) net => ent!
                  (R5) net => ne.
                 (R5) not => no.
                (R3) nto => nte!
                (R5) nto => nt.
             (R5) etww => etw!
             (R6) etww => etow,
              (R2) etow => eotw,
               (R3) eotw => eetw,
                (R4) eetw => netw!
                (R5) eetw => eet,
                 (R4) eet => net!
                 (R5) eet => ee.
                 (R6) eet => owt!
                (R6) eetw => owtw!
               (R4) eotw => notw!
               (R5) eotw => eot,
                (R3) eot => eet!
                (R4) eot => not!
                (R5) eot => eo.
              (R3) etow => etew,
               (R2) etew => eetw!
               (R4) etew => ntew!
               (R5) etew => ete,
                (R2) ete => eet!
                (R4) ete => nte!
                (R5) ete => et.
              (R4) etow => ntow!
              (R5) etow => eto,
               (R2) eto => eot!
               (R3) eto => ete!
               (R4) eto => nto!
               (R5) eto => et.
            (R4) otww => ntww!
            (R5) otww => otw!
            (R6) otww => otow,
             (R2) otow => ootw,
              (R3) ootw => eetw!
              (R4) ootw => notw!
              (R5) ootw => oot,
               (R3) oot => eet!
               (R4) oot => not!
               (R5) oot => oo.
               (R6) oot => owt!
              (R6) ootw => owtw!
             (R3) otow => etew!
             (R4) otow => ntow!
             (R5) otow => oto,
              (R2) oto => oot!
              (R3) oto => ete!
              (R4) oto => nto!
              (R5) oto => ot.
           (R3) owtw => ewtw,
            (R2) ewtw => etww!
            (R4) ewtw => nwtw,
             (R2) nwtw => ntww!
             (R5) nwtw => nwt!
            (R5) ewtw => ewt!
           (R4) owtw => nwtw!
           (R5) owtw => owt!
         (R5) entw => ent!
        (R5) netw => net!
       (R5) notw => not!
      (R3) ownt => ewnt,
       (R2) ewnt => entw!
       (R4) ewnt => nwnt,
        (R2) nwnt => nntw!
        (R5) nwnt => nwn!
       (R5) ewnt => ewn!
      (R4) ownt => nwnt!
      (R5) ownt => own!
    (R5) gnnt => gnn,
     (R4) gnn => nnn!
     (R5) gnn => gn.
     (R6) gnn => gow,
      (R3) gow => gew,
       (R2) gew => egw,
        (R4) egw => ngw,
         (R2) ngw => gnw,
          (R4) gnw => nnw!
          (R5) gnw => gn.
         (R5) ngw => ng.
        (R5) egw => eg.
       (R4) gew => new!
       (R5) gew => ge.
      (R4) gow => now!*
      (R5) gow => go.
    (R6) gnnt => gowt,
     (R2) gowt => gotw,
      (R3) gotw => getw,
       (R2) getw => egtw,
        (R4) egtw => ngtw,
         (R2) ngtw => gntw,
          (R4) gntw => nntw!
          (R5) gntw => gnt,
           (R4) gnt => nnt!
           (R5) gnt => gn.
         (R5) ngtw => ngt,
          (R2) ngt => gnt!
          (R5) ngt => ng.
        (R5) egtw => egt,
         (R4) egt => ngt!
         (R5) egt => eg.
       (R4) getw => netw!
       (R5) getw => get,
        (R2) get => egt!
        (R4) get => net!
        (R5) get => ge.
      (R4) gotw => notw!
      (R5) gotw => got,
       (R3) got => get!
       (R4) got => not!
       (R5) got => go.
     (R3) gowt => gewt,
      (R2) gewt => egtw!
      (R4) gewt => newt,
       (R2) newt => entw!
       (R5) newt => new!
      (R5) gewt => gew!
     (R4) gowt => nowt,
      (R2) nowt => notw!
      (R3) nowt => newt!
      (R5) nowt => now!*
     (R5) gowt => gow!
   (R5) ngnt => ngn,
    (R2) ngn => gnn!
    (R5) ngn => ng.
  (R5) egnt => egn,
   (R4) egn => ngn!
   (R5) egn => eg.
 (R4) agnt => ngnt!
 (R5) agnt => agn,
  (R3) agn => egn!
  (R4) agn => ngn!
  (R5) agn => ag.
(R3) gnat => gnet,
 (R2) gnet => egnt!
 (R4) gnet => nnet,
  (R2) nnet => ennt,
   (R4) ennt => nnnt!
   (R5) ennt => enn,
    (R4) enn => nnn!
    (R5) enn => en.
    (R6) enn => eow,
     (R3) eow => eew!
     (R4) eow => now!*
     (R5) eow => eo.
   (R6) ennt => eowt,
    (R2) eowt => eotw!
    (R3) eowt => eewt,
     (R2) eewt => eetw!
     (R4) eewt => newt!
     (R5) eewt => eew!
     (R6) eewt => oowt,
      (R2) oowt => ootw!
      (R3) oowt => eewt!
      (R4) oowt => nowt!
      (R5) oowt => oow!
    (R4) eowt => nowt!
    (R5) eowt => eow!
  (R5) nnet => nne,
   (R2) nne => enn!
   (R5) nne => nn.
   (R6) nne => owe,
    (R2) owe => eow!
    (R3) owe => ewe,
     (R2) ewe => eew!
     (R4) ewe => nwe,
      (R2) nwe => enw!
      (R5) nwe => nw.
     (R5) ewe => ew.
    (R4) owe => nwe!
    (R5) owe => ow.
  (R6) nnet => owet,
   (R2) owet => eotw!
   (R3) owet => ewet,
    (R2) ewet => eetw!
    (R4) ewet => nwet,
     (R2) nwet => entw!
     (R5) nwet => nwe!
    (R5) ewet => ewe!
   (R4) owet => nwet!
   (R5) owet => owe!
 (R5) gnet => gne,
  (R2) gne => egn!
  (R4) gne => nne!
  (R5) gne => gn.
(R4) gnat => nnat,
 (R2) nnat => annt,
  (R3) annt => ennt!
  (R4) annt => nnnt!
  (R5) annt => ann,
   (R3) ann => enn!
   (R4) ann => nnn!
   (R5) ann => an.
   (R6) ann => aow,
    (R3) aow => eew!
    (R4) aow => now!*
    (R5) aow => ao.
  (R6) annt => aowt,
   (R2) aowt => aotw,
    (R3) aotw => eetw!
    (R4) aotw => notw!
    (R5) aotw => aot,
     (R3) aot => eet!
     (R4) aot => not!
     (R5) aot => ao.
   (R3) aowt => eewt!
   (R4) aowt => nowt!
   (R5) aowt => aow!
 (R3) nnat => nnet!
 (R5) nnat => nna,
  (R2) nna => ann!
  (R3) nna => nne!
  (R5) nna => nn.
  (R6) nna => owa,
   (R2) owa => aow!
   (R3) owa => ewe!
   (R4) owa => nwa,
    (R2) nwa => anw,
     (R3) anw => enw!
     (R4) anw => nnw!
     (R5) anw => an.
    (R3) nwa => nwe!
    (R5) nwa => nw.
   (R5) owa => ow.
 (R6) nnat => owat,
  (R2) owat => aotw!
  (R3) owat => ewet!
  (R4) owat => nwat,
   (R2) nwat => antw,
    (R3) antw => entw!
    (R4) antw => nntw!
    (R5) antw => ant,
     (R3) ant => ent!
     (R4) ant => nnt!
     (R5) ant => an.
   (R3) nwat => nwet!
   (R5) nwat => nwa!
  (R5) owat => owa!
(R5) gnat => gna,
 (R2) gna => agn!
 (R3) gna => gne!
 (R4) gna => nna!
 (R5) gna => gn.

Writing this makes me wonder if I should have found some way to jump from a successful traversal back to 'gnat' rather than applying the rules to instances of 'now' in search of an extended path to 'now'. I leave that as an exercise to the reader, and if you work out how to do it, please let me know.
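
One possible direction, offered as a sketch rather than a finished answer, is to swap the depth-first recursion for a breadth-first search. It visits words in order of how many rules it took to reach them, so the first time it meets the stop word it has a shortest path and can return straight away. The rules here are John W. Krahn's versions held in an array; the names @rules and shortest_path are my own, and words shorter than three letters are skipped, as in the script above.

```perl
use strict;
use warnings;

# Rules as anonymous subs; slot 0 is unused so that rule N is $rules[N].
my @rules = (
    undef,
    sub { ( my $w = shift ) =~ tr/s//d;                return $w },
    sub { return join '', sort split //, shift },
    sub { ( my $w = shift ) =~ tr/aiou/e/;             return $w },
    sub { ( my $w = shift ) =~ s/\A[[:alpha:]]/n/;     return $w },
    sub { ( my $w = shift ) =~ s/[[:alpha:]]\z//;      return $w },
    sub { ( my $w = shift ) =~ s/([[:lower:]])\1/ow/g; return $w },
);

# Breadth-first search: returns a shortest list of rule numbers that
# turns $start into $goal, or an empty list if there is no such path.
sub shortest_path {
    my ( $start, $goal ) = @_;
    my %seen  = ( $start => 1 );
    my @queue = ( [ $start, [] ] );
    while ( my $item = shift @queue ) {
        my ( $word, $path ) = @$item;
        for my $n ( 1 .. $#rules ) {
            my $next = $rules[$n]->($word);
            # Skip words already reached and words that are too short.
            next if $seen{$next}++ or length($next) < 3;
            my @new_path = ( @$path, $n );
            return @new_path if $next eq $goal;
            push @queue, [ $next, \@new_path ];
        }
    }
    return;
}

my @path = shortest_path( 'gnat', 'now' );
print @path ? "rules: @path\n" : "no path\n";
```

For 'gnat' this should report a path of five rules, which agrees with the shortest route spotted in the log, and for 'ant' it should report no path. The %seen hash plays the same role as %deadends, but every word is marked the moment it is first generated.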

© Copyright Bruce M. Axtens, 2008