HOWTO: Batch Download a Book in PDF Pages from NetLibrary

NetLibrary is an online book resource that universities and other subscribers pay to supply them with virtual copies of books. These books are available online, and can be searched, downloaded, and saved. The catch is that NetLibrary’s interface limits you to viewing one page at a time (in the horribly slow Acrobat Reader). Given how unresponsive Acrobat makes many computers, this can make printing out a long book take hours.

Therefore, I took the effort to figure out how to batch download a book from NetLibrary, saving me valuable time.

My solution uses a combination of Firefox and Perl, but other solutions are of course available.

After I loaded up the first true page of the book in the NetLibrary interface, I gave the frame with the PDF its own window, then used Firefox’s Tools | Page Info | Media properties dialog box to determine the URL of the embedded PDF file. It turns out it’s a call to a program named nlReader.dll, which takes a book identification number and a file name (containing the page number) as arguments:

http://0-www.netlibrary.com.library.unl.edu/nlreader/nlReader.dll?BookID=BOOKIDGOESHERE&FileName=FILENAMEGOESHERE

Obviously, the library.unl.edu part requires my university proxy. For normal pages, the filename was in the format of Page_1.pdf, Page_2.pdf, etc. So I wrote a Perl script to create hyperlinks to pages 1 to 499, saved the output to HTML, used the DownloadThemAll! Firefox extension to get them, and…
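The link-generating step can be sketched roughly as follows. This is not my original Perl script, just a minimal Python equivalent. The function names `page_url` and `link_page` are made up for illustration, and the book ID is whatever you copy out of Page Info.

```python
from html import escape
from urllib.parse import urlencode

# Base URL from the post; the proxy prefix is specific to one university.
BASE_URL = "http://0-www.netlibrary.com.library.unl.edu/nlreader/nlReader.dll"


def page_url(book_id, page):
    """Build the nlReader.dll URL for one page (hypothetical helper)."""
    query = urlencode({"BookID": book_id, "FileName": f"Page_{page}.pdf"})
    return f"{BASE_URL}?{query}"


def link_page(book_id, first=1, last=499):
    """Return an HTML document with one hyperlink per page, first..last."""
    links = [
        # escape() turns the "&" in the query string into "&amp;" for HTML.
        f'<a href="{escape(page_url(book_id, n))}">Page_{n}.pdf</a><br>'
        for n in range(first, last + 1)
    ]
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>\n"
```

Writing the result of `link_page("BOOKIDGOESHERE")` to a file such as links.html and opening it in Firefox gives DownloadThemAll! a page of links to grab.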

Then Acrobat crashed trying to print out those hundreds of PDFs. Boo! Fortunately, Perl came to my rescue… I used ppm to install the module PDF::Reuse, then wrote a script to append all those PDFs into one. The final product is about 500 pages and 70 megs, but quite easy to store, print out, etc.
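For anyone without Perl handy, the append step might look something like this in Python. This is a sketch under assumptions, not the script I actually ran: it relies on the third-party pypdf package being installed, and the helper names are invented here. The one detail worth copying is the numeric sort, since a plain alphabetical sort would put Page_10.pdf before Page_2.pdf.

```python
import re
from pathlib import Path


def page_number(path):
    """Extract N from a name like Page_N.pdf; return -1 if it doesn't match."""
    match = re.fullmatch(r"Page_(\d+)\.pdf", Path(path).name)
    return int(match.group(1)) if match else -1


def ordered_pages(paths):
    """Keep only Page_N.pdf files and sort numerically, so 2 precedes 10."""
    pages = [p for p in paths if page_number(p) >= 0]
    return sorted(pages, key=page_number)


def merge_pages(folder, output):
    """Append every downloaded page in `folder` into a single PDF."""
    from pypdf import PdfWriter  # third-party package, assumed installed

    writer = PdfWriter()
    for page in ordered_pages(Path(folder).glob("Page_*.pdf")):
        writer.append(str(page))
    with open(output, "wb") as handle:
        writer.write(handle)
```

Calling `merge_pages("downloads", "book.pdf")` (both names are placeholders) would then produce the single file.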

Thanks, NetLibrary!

Dozier Spam Bot Attacks tdaxp?

Two strange messages (I’ve left them intact, except for the hyperlink) have appeared in the comments for my posts, “Dozier Internet Law harms client’s reputation” and “Did Dozier Internet Law Misrepresent a Federal Judge?”

The first comment reads:

Here is the Dozier Internet Law Blog:

[url redacted by tdaxp]

Frankly, it seems pretty insightful.

and the second is:

I don’t know who is right. It looks like it might be Dozier:

[url redacted by tdaxp]

At first blush, these are merely spam messages. The IPs of the two comments (left with the same nick and email account) are quite different: the 128.241.*.* range resolves to NTT America (a “global IP solutions company”), while the range of 207.195.240.0 to 207.195.255.255 resolves to Global Tac, LLC. Global Tac has been implicated in spam messages before. It appears that Global Tac hides behind 150 different IP addresses to conduct its spam campaigns, so the discrepancy between the IP addresses is smaller than it appears.
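For anyone who wants to check their own comment logs against that second range, here is a small sketch using Python's standard ipaddress module. The helper name `in_range` and the sample addresses are hypothetical, not taken from the actual comments.

```python
import ipaddress

# The range quoted above, as first and last address.
FIRST = ipaddress.ip_address("207.195.240.0")
LAST = ipaddress.ip_address("207.195.255.255")

# Collapse the range into the CIDR block(s) that cover it exactly.
BLOCKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_range(address):
    """Return True if `address` falls inside the quoted range."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in BLOCKS)


# Hypothetical commenter addresses, for illustration only.
print(BLOCKS)
print(in_range("207.195.250.17"))
print(in_range("128.241.0.1"))
```

The range collapses to a single /20 block of 4,096 addresses, so one membership test per comment is enough.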

Dozier Internet Law is no stranger to spam as a means of advertising – they’ve long generated spam websites with nonsensical information. Still, escalating this to include spam comments on private blogs comes dangerously close to trespass and hacking.