HOWTO: Batch Download a Book in PDF Pages from NetLibrary

NetLibrary is an online book resource that universities and individuals pay for access to virtual copies of books. These books are available online and can be searched, downloaded, and saved. The catch is that NetLibrary’s interface limits you to viewing one page at a time (in the horribly slow Acrobat Reader). Given how unresponsive Acrobat makes many computers, printing out a long book this way can take hours.

So I made the effort to figure out how to batch download a book from NetLibrary, saving myself valuable time.

My solution uses a combination of Firefox and Perl, but other solutions are of course available.

After loading the first true page of the book in the NetLibrary interface, I opened the frame containing the PDF in its own window and used Firefox’s Tools | Page Info | Media dialog box to find the URL of the embedded PDF file. It turns out to be a call to a program named nlReader.dll, which takes a book identification number and a page number as arguments.

Obviously, part of that URL depends on my university proxy. For normal pages, the filenames followed the pattern Page_1.pdf, Page_2.pdf, etc. So I wrote a Perl script to create hyperlinks to pages 1 to 499, saved the output as HTML, used the DownloadThemAll! Firefox extension to fetch them, and…
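The original Perl script is not reproduced in this post. As a rough illustration of the same idea, here is a minimal sketch in Python that writes an HTML page of links for a download manager to pick up. The host name, the query parameter names (BookID, FileName), and the book ID 12345 are all hypothetical placeholders; substitute the real URL reported by Page Info.

```python
from html import escape

# Hypothetical URL template: the real host, path, and query parameter names
# are not given in the post, so replace this with the URL from Page Info.
URL_TEMPLATE = (
    "https://proxy.example.edu/nlReader.dll"
    "?BookID={book_id}&FileName=Page_{page}.pdf"
)


def build_link_page(book_id, first_page, last_page):
    """Return an HTML document with one hyperlink per page PDF."""
    links = []
    for page in range(first_page, last_page + 1):
        url = URL_TEMPLATE.format(book_id=book_id, page=page)
        # escape() turns "&" into "&amp;" so the href is valid HTML.
        links.append(
            '<a href="{0}">Page_{1}.pdf</a><br>'.format(escape(url, quote=True), page)
        )
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>\n"


if __name__ == "__main__":
    # Book ID 12345 is a placeholder; pages 1 to 499 match the range above.
    with open("links.html", "w", encoding="utf-8") as handle:
        handle.write(build_link_page(12345, 1, 499))
```

Opening the resulting links.html in the browser gives the download extension a single page from which to queue every link.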

Then Acrobat crashed trying to print out those hundreds of PDFs. Boo! Fortunately, Perl came to my rescue… I used ppm to install the module PDF::Reuse, then wrote a script to append all those PDFs into one. The final product is about 500 pages and 70 MB, but quite easy to store, print out, etc.
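The merge script is not shown here either. As a stand-in for the Perl module mentioned above, the sketch below uses Python with the third-party pypdf package (an assumption, not what was actually used). The folder name "downloads" and the output name "book.pdf" are placeholders. The one detail worth getting right is ordering: the files must be sorted by page number, since a plain alphabetical sort would put Page_10.pdf before Page_2.pdf.

```python
import re
from pathlib import Path


def page_number(path):
    """Extract the integer page number from a name like Page_12.pdf."""
    match = re.search(r"Page_(\d+)\.pdf$", Path(path).name)
    if match is None:
        raise ValueError("unexpected file name: {}".format(path))
    return int(match.group(1))


def in_page_order(paths):
    """Sort numerically so Page_10.pdf follows Page_9.pdf, not Page_1.pdf."""
    return sorted(paths, key=page_number)


def merge_pages(folder, output):
    """Append every Page_N.pdf in folder to a single output PDF."""
    # Imported here so the helpers above work without the package installed.
    from pypdf import PdfWriter  # third-party: pip install pypdf

    writer = PdfWriter()
    for path in in_page_order(Path(folder).glob("Page_*.pdf")):
        writer.append(str(path))
    writer.write(str(output))
    writer.close()


if __name__ == "__main__":
    # "downloads" and "book.pdf" are placeholder names.
    merge_pages("downloads", "book.pdf")
```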

Thanks, NetLibrary!

12 thoughts on “HOWTO: Batch Download a Book in PDF Pages from NetLibrary”

  1. Sounds like a mighty fine pe(a)rl ye found, matey. You have a promising career as a pirate ahead of you (so long as you can sail the seven seas aboard a .edu proxy).

    How many titles are available? Just scholarly stuff, or mainstream books too?

  2. As I'm merely automating intended use that I pay for, I certainly hope it isn't piracy!

    (Now if only I could write a pearl front-end for Blackboard!)

    Re: their collection, NetLibrary provides access to 2,740 e-books at Nebraska. The newest titles are:

    * Competencies in Organizational E-learning: Concepts and Tools
    by Sicilia, Miguel-Angel

    * Mail and Internet Surveys: The Tailored Design Method
    2nd ed., 2007 Update With New Internet, Visual, and Mixed-mode Guide.
    by Dillman, Don A.

    * Outsourcing Management Information Systems
    by Schniederjans, Marc J.; Schniederjans, Ashlyn M.; Schniederjans, Dara

    * Agile Virtual Enterprises: Implementation and Management Support
    by Cunha, Maria Manuela.; Putnik, Goran

    * The AMA Handbook of Project Management, 2nd ed.
    by Dinsmore, Paul C.; Cabanis-Brewin, Jeannette.

    And the oldest are:

    * Sequoyah
    by Foreman, Grant.
    Publication: Norman, Okla University of Oklahoma Press, 1938.

    * The Branch Will Not Break: Poems
    by Wright, James Arlington.
    Publication: Middletown, Conn. Wesleyan University Press, 1963.

    * History of Nineteenth-century Russian Literature. Vol. 2, The Age of Realism
    by Chyzhevskyi, Dmytro.
    Publication: Nashville, Tenn. Vanderbilt University Press, 1

    * Early Modern English
    Language Library
    by Barber, Charles Laurence.
    Publication: Edinburgh Edinburgh University Press, 1976.

    The Love Library has Tom's dissertation, btw, but sadly not an e-version of it…


  3. Well, you wouldn't have to worry about a hard copy if you put it on a PDA or dedicated e-reading device for more convenient viewing.

  4. About time you typed something useful 🙂 How long has it been since you wrote any code? The Rails Student Boredom Generator?

    See you next Saturday.

  5. I tried this with an ebook I’m interested in, but the pages aren’t linked as individual files. Each page in NetLibrary is formatted as text with links to the individual images. Using your method would simply download all the images and none of the text. Any ideas for how to get around this?

  6. No idea. I haven’t needed this method in 2008. Off the top of my head, if your script is downloading the pages, you can save out the text and then download the images separately.

  7. Ironically, I would like to use this Perl script to learn more about Perl. You wouldn’t happen to have a copy of the script uploaded anywhere?

  8. I did something similar using Automator, but I found that NetLibrary automatically logs me out every 100 pages or so, so I have to clear my cookies and continue. Did you have a problem with that?
    (yes I realise this article is old)

  9. Mork,

    I don’t recall that problem at the time; maybe they changed their procedure in response to this post?

    Either way, it’s crummy that books we pay for (directly or indirectly) are not accessible in a non-DRM way.
