The Robservatory

Robservations on everything…

 

Count pages in all PDFs within a folder structure

Please see this newer post, with a new script that provides subtotals by subfolder, which is what I really wanted when I wrote this one.

Recently I’ve been trying to go paperless (well, mostly paperless) via a Fujitsu ScanSnap iX500. (I’ll have more to say about the scanner in a future post.)

One way to go paperless is to just go from now forward—start scanning stuff and don’t worry about history. I decided that I’d go the other route, and work through our old paper files: some would be scanned and kept, much would just be recycled. The process went really quickly, compared to what I had expected. It helps that the Fujitsu is a wicked-fast document scanner!

But I was curious about how much I was scanning, in terms of total PDF pages—not files, but counting the pages in the files. Spotlight to the rescue; the field kMDItemNumberOfPages returns the number of pages in a document, and it seemed accurate in testing via mdls:

$ mdls /path/to/somefile.pdf | grep kMDItemNumberOfPages
kMDItemNumberOfPages = 4

So I set out to write a script to traverse my “Scans” folder, and return the total number of PDF pages.

This script is very simple—it’s got a basic error check to make sure there’s a value for kMDItemNumberOfPages, but other than that, it just spits out one line per file, showing the number of pages per file, and then a grand total at the end.

Here’s the script:
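What follows is a minimal sketch reconstructed from the description in this post, not the original listing. It assumes macOS with `mdls` available and file names without embedded newlines. The function name `count_pdf_pages` and the variable `scanDir` are made up for illustration; `myFiles` follows the naming the post mentions.

```shell
#!/bin/sh
# Sketch only: sum the Spotlight page counts of every PDF under a folder (macOS).
# Hypothetical names: count_pdf_pages, scanDir. Assumes no newlines in file names.

myFiles="*.pdf"    # change this pattern to look at other file types

count_pdf_pages() {
    scanDir="$1"
    find "$scanDir" -type f -iname "$myFiles" -print | {
        total=0
        while IFS= read -r f; do
            # -raw prints only the attribute's value for this file
            pages=$(mdls -name kMDItemNumberOfPages -raw "$f" 2>/dev/null)
            # Basic error check: skip anything that is not a plain number
            case "$pages" in
                ''|*[!0-9]*) continue ;;
            esac
            echo "$pages  $f"    # comment out this line to see only the grand total
            total=$((total + pages))
        done
        echo "Total pages: $total"
    }
}

# Default to the current folder when no path is given
count_pdf_pages "${1:-.}"
```

Saved under a name such as `pdfpages` (hypothetical), it could be run as `pdfpages ~/Documents/Scans`.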

Copy and paste into a new shell script, save it, and make it executable (chmod 755 scriptname). Then, assuming you’ve saved it somewhere on your path, just execute it, and you’ll get a list of every file’s PDF page count, along with a grand total.

If you’d rather see just the grand total, comment out the indicated lines. Note that it’s written specifically to look at PDFs, as shown in the myFiles line. Change that to look at other file types.

A really fancy version of this script would provide totals by directory. But such fanciness is beyond my shell scripting skills, so mine doesn’t do it.
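For the curious, per-folder subtotals could be sketched roughly like this. It is an illustration only, not the script from the newer post. It assumes macOS with `mdls` available and paths that contain no tabs or newlines; `subtotals_by_folder` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: PDF page subtotals per folder, plus a grand total (macOS).
# Hypothetical name: subtotals_by_folder. Assumes no tabs or newlines in paths.

subtotals_by_folder() {
    find "$1" -type f -iname '*.pdf' -print |
    while IFS= read -r f; do
        pages=$(mdls -name kMDItemNumberOfPages -raw "$f" 2>/dev/null)
        # Skip files for which Spotlight has no numeric page count
        case "$pages" in
            ''|*[!0-9]*) continue ;;
        esac
        # Emit "pages<TAB>containing folder" for awk to aggregate
        printf '%s\t%s\n' "$pages" "$(dirname "$f")"
    done |
    awk -F '\t' '
        { sum[$2] += $1; total += $1 }
        END {
            # Folder order is whatever awk chooses; it is not sorted
            for (d in sum) printf "%6d  %s\n", sum[d], d
            printf "%6d  TOTAL\n", total
        }'
}

# Default to the current folder when no path is given
subtotals_by_folder "${1:-.}"
```

Each subtotal covers only the PDFs directly inside that folder; nested folders get their own lines.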

Oh, and so far? 5,109 pages scanned…and counting, thanks to the script.

4 Comments

    1. It does – it comes with a version of FineReader OCR, and it’s really good. Fast, too, as it will often be done with OCR only a second or two after I finish a stack of documents.

      -rob.

  1. Cool script! Where’d you ferret out the “kMDItemNumberOfPages” field? Now you know exactly how many reams to buy if you go back to paper!

    1. I thought this must be something that the system tracked, so I used mdls in Terminal, which shows all the Spotlight data for a given file. Browsing through that, I spotted kMDItemNumberOfPages, then checked it against a couple files to make sure it was right. And it was :).

      -rob.


The Robservatory © 2017