Monday, January 07, 2008

Fun with WWW::Mechanize

WWW::Mechanize is a very handy module for automating web-page tasks. The following Perl script downloads pages of the Perl Cookbook into a directory. Because it skips pages it has already saved, you can run it from a scheduler and all the pages will be collected in a few days.

use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 makes failed requests show up through success()
# instead of terminating the script.
my $mech      = WWW::Mechanize->new( autocheck => 0 );
my $outputDir = 'E:\work\documents\PerlCookBook';
my @urls      = (
    'http://www.perl.com/cookbook/perlckbk2/solution.csp?day=1',
    'http://www.perl.com/cookbook/perlckbk2/solution.csp?day=2',
    'http://www.perl.com/cookbook/perlckbk2/solution.csp?day=3',
    'http://www.perl.com/cookbook/perlckbk2/solution.csp?day=4',
    'http://www.perl.com/cookbook/perlckbk2/solution.csp?day=5',
);

foreach my $url (@urls) {
    GetPage($url);
}

# Fetch one page and save it under a name derived from its <h2> title.
sub GetPage {
    my ($url) = @_;
    $mech->get($url);
    if ( $mech->success() ) {
        my $content = $mech->content();

        # Non-greedy match so that only the first heading is captured.
        if ( $content =~ m{<h2 class="head1">(.*?)</h2>} ) {
            my $title = $1;

            #$title =~ s/\s+//g;
            my $filePath = $outputDir . "\\PCB_" . $title . ".html";
            if ( !-e $filePath ) {
                open( my $fh, '>', $filePath )
                    or die "Can't create file $filePath: $!\n";
                print {$fh} $content;
                close $fh;
                print "File was created successfully\n";
            }
            else {
                print "File already exists.\n";
            }
        }
    }
    else {
        print "Request was not successful\n";
    }
}
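
The script above lists every URL by hand and puts the page title straight into the file name. Below is a minimal sketch of two small helpers that could tidy this up. The names build_urls and title_to_filename are hypothetical and not part of WWW::Mechanize; the "PCB_" prefix and the day=N query string are taken from the script above. It assumes the title may contain spaces or characters that Windows does not allow in file names.

```perl
use strict;
use warnings;

# Hypothetical helper: build the day=N URLs instead of listing them by hand.
sub build_urls {
    my ( $base, $first, $last ) = @_;
    return map { "$base?day=$_" } $first .. $last;
}

# Hypothetical helper: turn a page title into a file name that is safe
# on both Windows and Unix.
sub title_to_filename {
    my ($title) = @_;
    $title =~ s/<[^>]+>//g;              # drop any HTML tags inside the heading
    $title =~ s/[\\\/:*?"<>|\s]+/_/g;    # forbidden characters and whitespace runs become "_"
    $title =~ s/^_+|_+$//g;              # trim underscores left at either end
    return "PCB_$title.html";
}

my @urls = build_urls( 'http://www.perl.com/cookbook/perlckbk2/solution.csp', 1, 5 );
print "$_\n" for @urls;

# prints "PCB_Reading_a_File_Line_by_Line.html"
print title_to_filename('  Reading a File: Line by Line?  '), "\n";
```

With helpers like these, GetPage could build the path as $outputDir . "\\" . title_to_filename($title), and extending the download to more days would only mean changing the last argument of build_urls.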
