This is a Perl script I wrote quite some time ago and had always meant to make available for others to use. For a while I was downloading all the Penny Arcade comics and saving them in a folder for later viewing. It soon became very difficult to find a particular comic when I went back to look for it, because the filenames contained only dates, so I started saving the files with the title of the comic as well. This all seemed like way too much work, especially since I wanted to go back and get the comics from the past couple of years that I had been reading.
I was using Perl for several other applications and wanted to try out some spidering with LWP (libwww-perl), so I came up with this script. The idea is to download an archive of all the Penny Arcade comics. To get a listing of comics, I parsed their calendar.php page, passing the year in the URL. This page displays a calendar with links to all their comics throughout the year, and the title attribute of each anchor tag holds the comic title. Excellent! We have the date of the comic and the title right off the bat. I quickly built functionality to generate the image URLs from the dates and save the files with the date and title. This was great: the script would do all the work of downloading the files, finding the titles, and organizing and naming everything for me.
I quickly ran into a few problems, as the site never seemed to follow a strict file naming convention. Since I didn’t really want to spend a whole lot of time on this, I manually wrote in some fixes for the files I knew were missing. As a last note, I added some extra command line parameters to control the functionality a bit along the way; run the script with -h or -? to get a listing. If you want to run the Perl version of this script you will need the LWP libraries, which should come with most versions of Perl, as well as Tie::File and Getopt::Long.
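As a rough illustration of the parsing and naming steps, here is a minimal sketch. It is not the original script: the sample HTML, the href layout, the .jpg extension, the "date - title" file name format, and the helper names extract_comics and target_path are all assumptions made up for the example, and it works on an inline string instead of fetching anything with LWP.

```perl
use strict;
use warnings;

# Hypothetical markup: the real calendar page's HTML is not shown in this post,
# so these two anchors are invented purely for illustration.
my $html = <<'HTML';
<a href="/comic/2005/03/14" title="An Example Title">14</a>
<a href="/comic/2005/03/16" title="Another: Example?">16</a>
HTML

# Collect year, date and title from every anchor whose href ends in
# YYYY/MM/DD and which carries a title attribute (assumed attribute order).
sub extract_comics {
    my ($page) = @_;
    my @comics;
    while ($page =~ m#<a\s+href="[^"]*?(\d{4})/(\d{2})/(\d{2})"\s+title="([^"]*)"#g) {
        push @comics, { year => $1, date => "$1$2$3", title => $4 };
    }
    return @comics;
}

# Build "year/date - title.jpg", replacing characters that are not safe
# in file names on common filesystems. The extension is an assumption.
sub target_path {
    my ($comic) = @_;
    (my $safe = $comic->{title}) =~ s{[\\/:*?"<>|]}{_}g;
    return "$comic->{year}/$comic->{date} - $safe.jpg";
}

print target_path($_), "\n" for extract_comics($html);
```

A regular expression is enough for a sketch like this, but it is brittle against markup changes; a proper HTML parser would be the sturdier choice for a real spider.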
The end result is a script that parses the Penny Arcade site for a list of all comics and downloads them into folders by year, naming the files by date and title. I’m sure there are a few more exception comics that I missed, but all in all, for my first spidering program, I’m pretty happy with it. As I get more and more into OO Perl I’ve been considering rewriting this script; if there’s interest, perhaps I’ll do so.
Perl Script
Compiled Script
Special note: the compiled script was created with a fairly generic Perl compiler. It takes a while to run the first time you use it, because it actually extracts the Perl file and modules and runs them through an interpreter.





How do I use it?
When I ran it, it made a date.txt file that was empty, and after a while, it quit.
Am I supposed to put something in the date.txt file?
Thanks
Bob
October 20th, 2007
Wow, I’m pretty confident that their site has changed a bunch since this was written (almost 2 1/2 years ago), so it likely does not work anymore. With some quick research into the current site, though, you could probably fix it.
Rich Waters
October 20th, 2007