Last week, I wrote a script that ran through a folder structure and output the page count of every PDF in all folders and sub-folders, and also spit out a grand total.
While this worked well, what I really wanted was a script that just totaled PDF pages by sub-folder, without seeing all the file-by-file detail. After trying to retrofit the first script, I realized that was a waste of time, and started over from scratch.
The resulting script works just as I'd like it to, traversing a folder structure and showing PDF page counts by folder:
$ countpdfbydir
  47: ./_Legal
   2: ./_Medical-Dental
  15: ./_Medical-Dental/Kids
  11: ./_Medical-Dental/Marian
   2: ./_Medical-Dental/Rob
  35: ./_Personal Documents/Kids
  87: ./_Personal Documents/Marian
  28: ./_Personal Documents/Rob
  10: ./_Personal Documents/Rob/Golf
  12: ./_Personal Documents/Rob/Travel
-------------------------------------------------------------------
 249: Total PDF Pages
It took a few revisions, but I like this version; it even does some simplistic padding to keep the figures lined up in the output.
Here's what I came up with:
#!/bin/bash

# Split find's output on newlines only, so paths with spaces stay intact
saveIFS=$IFS
IFS=$(echo -en "\n\b")

baseDir=$(pwd)
myDirs=($(find . -mindepth 0 -maxdepth 999 -type d))
myDirCount=${#myDirs[*]}
grandtotalPages=0

i=0
while [ $i -lt $myDirCount ]; do
  cd "${myDirs[$i]}"
  myFiles=($(find . -maxdepth 1 -name "*.pdf"))
  myFileCount=${#myFiles[*]}
  subtotalPages=0

  # We have PDFs in this dir, so loop through and count pages
  if [ $myFileCount -ne 0 ]; then
    j=0
    while [ $j -lt $myFileCount ]; do
      pageCount=$(mdls "${myFiles[j]}" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
      size=${#pageCount}
      if [ $size -eq 0 ]
        then
          # This PDF is missing a page count, so we skip it
          # echo ${myFiles[j]} : \*\* Skipped - no page count \*\*
          echo ""
        else
          # Increment a subtotal by directory and a running grand total
          subtotalPages=$(($subtotalPages + $pageCount))
          grandtotalPages=$(($grandtotalPages + $pageCount))
      fi
      j=$(( $j + 1 ))
    done

    # Pad the results for nice alignment of page counts
    digitCount=${#subtotalPages}
    case $digitCount in
      1) padding="   ";;
      2) padding="  ";;
      3) padding=" ";;
      4) padding="";;
      *) padding="";;   # five or more digits: no padding
    esac
    echo "$padding$subtotalPages: ${myDirs[i]}"
  fi

  i=$(( $i + 1 ))
  cd "$baseDir"
done

# Pad the results for nice alignment of grand total
digitCount=${#grandtotalPages}
case $digitCount in
  1) padding="   ";;
  2) padding="  ";;
  3) padding=" ";;
  4) padding="";;
  *) padding="";;   # five or more digits: no padding
esac

echo "-------------------------------------------------------------------"
echo "$padding$grandtotalPages: Total PDF Pages"

IFS=$saveIFS
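The padding logic in the script could also be handled by printf, which right-aligns a number in a fixed-width column without a case statement. This is only a minimal sketch of that alternative, not what the script above does; format_count is a hypothetical helper name, and the default width of 4 is an assumption chosen to match the four padding cases.

```shell
#!/bin/bash
# format_count is a hypothetical helper (not part of the original script).
# It right-aligns a page count in a column and prints the label after it.
format_count() {
  local count=$1 label=$2 width=${3:-4}   # width of 4 is an assumption
  printf '%*d: %s\n' "$width" "$count" "$label"
}

format_count 47 "./_Legal"
format_count 2 "./_Medical-Dental"
format_count 5306 "Total PDF Pages"
```

A count wider than the column is simply printed in full, so nothing is truncated once a total grows past four digits.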
I feared this would be incredibly slow, but it took only about 40 seconds to traverse a folder structure holding roughly a gigabyte of PDFs: about 1,500 files spread across 160 subfolders, totalling 5,306 PDF pages.
Once I had this version working, I repurposed the original script to output file-level PDF page counts only for the current directory, so I can use that one when I want the details:
$ cd Home\ Stuff
$ pdfcountbyfile
   2: 2015-03-27 - Lowes.pdf
   4: 2015-07-14 - Home Depot.pdf
   1: 2015-09-03 - Home Depot.pdf
-----------------------------------------------------------------
   7: Total PDF pages in this folder
In case you want it, here's the modified script that generates the file-level PDF page counts:
#!/bin/bash

# Split find's output on newlines only, so file names with spaces stay intact
saveIFS=$IFS
IFS=$(echo -en "\n\b")

myFiles=($(find . -maxdepth 1 -name "*.pdf"))
myFileCount=${#myFiles[*]}
totalPages=0

i=0
while [ $i -lt $myFileCount ]
do
  # Drop the leading "./" that find adds to each name
  prettyName=$(echo "${myFiles[i]}" | cut -c 3-999)
  pageCount=$(mdls "${myFiles[i]}" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
  size=${#pageCount}
  if [ $size -eq 0 ]
    then
      echo "$prettyName : ** Skipped - no page count **"
    else
      # Pad the results for nice alignment of page counts
      digitCount=${#pageCount}
      case $digitCount in
        1) padding="   ";;
        2) padding="  ";;
        3) padding=" ";;
        4) padding="";;
        *) padding="";;   # five or more digits: no padding
      esac
      echo "$padding$pageCount: $prettyName"
      totalPages=$(($totalPages + $pageCount))
  fi
  i=$(( $i + 1 ))
done

# Pad the results for nice alignment of grand total
digitCount=${#totalPages}
case $digitCount in
  1) padding="   ";;
  2) padding="  ";;
  3) padding=" ";;
  4) padding="";;
  *) padding="";;   # five or more digits: no padding
esac

echo "-----------------------------------------------------------------"
echo "$padding$totalPages: Total PDF pages in this folder"

IFS=$saveIFS
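Both scripts pull the page count out of mdls with a grep-and-awk pipeline. The sketch below shows the same extraction as a single awk step that also rejects non-numeric values. It runs against simulated mdls-style text so it can be tried without a real PDF; page_count_from_mdls is a hypothetical helper name, and the sample attribute values are made up for illustration.

```shell
#!/bin/bash
# page_count_from_mdls is a hypothetical helper (not in the original scripts).
# It reads mdls-style "key = value" lines on stdin and prints the value of
# kMDItemNumberOfPages, or nothing if the value is missing or not a number.
page_count_from_mdls() {
  awk -F' *= *' '$1 == "kMDItemNumberOfPages" && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Simulated mdls output; the values here are invented for the example.
sample='kMDItemKind = "PDF document"
kMDItemNumberOfPages = 12
kMDItemFSSize = 48213'

printf '%s\n' "$sample" | page_count_from_mdls
```

On a Mac, the equivalent call would presumably be mdls "$file" | page_count_from_mdls, with an empty result meaning the file should be skipped.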
These are clearly not need-every-day scripts, but I like the information they provide (because I'm a data geek), and they were fun for my shell-scripting-challenged brain to figure out. I'm 99.9% positive the efficiency could be improved by a factor of 100, but this works well enough for my needs.
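As one possible direction for tightening things up, the per-folder totals could be computed in a single pass: collect one "pages, path" line per PDF, then let awk group them by directory. This is a sketch under stated assumptions, not a measured improvement, since the mdls call per file likely dominates the run time either way. sum_by_dir is a hypothetical helper, and the file names in the sample input are invented.

```shell
#!/bin/bash
# sum_by_dir is a hypothetical helper (not part of the original scripts).
# It reads "pages<TAB>path" lines on stdin and prints a right-aligned
# subtotal per directory, in first-seen order, followed by a grand total.
sum_by_dir() {
  awk -F'\t' '
    {
      dir = $2
      sub("/[^/]*$", "", dir)            # strip the file name, keep the folder
      if (!(dir in pages)) order[++n] = dir
      pages[dir] += $1
      total += $1
    }
    END {
      for (k = 1; k <= n; k++) printf "%4d: %s\n", pages[order[k]], order[k]
      printf "%4d: Total PDF Pages\n", total
    }'
}

# Invented sample input standing in for real per-file page counts.
printf '%s\t%s\n' \
  3 "./_Legal/a.pdf" \
  4 "./_Legal/b.pdf" \
  2 "./_Medical-Dental/Rob/c.pdf" | sum_by_dir
```

Feeding it real data would mean generating those input lines from find and mdls, which is left out here because it depends on macOS.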
This is a great little script and very useful. Thanks for your work on this!
This sounds like exactly what I need. But I am not that technical. How do I make this script work?
A full-blown shell script primer is beyond my scope here, but there are lots of tutorials out there. In a nutshell, you need to:
1) Copy the script.
2) Paste it into a new document in a plain-text editor, such as BBEdit, or TextEdit in plain text mode.
3) Save the file.
4) Make it executable in Terminal with chmod 755 scriptname
5) Run the script—but this requires either saving it somewhere on your path, or referencing the full path to the file each time. And this is where things get complicated, so the tutorials would be useful.
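The steps above can be sketched as a short Terminal session. This version uses a throwaway folder so it touches nothing in your home directory; the script name countpdf-demo and its one-line body are stand-ins, not the real script.

```shell
#!/bin/bash
# Sketch of the install steps. The folder comes from mktemp, and
# "countpdf-demo" is a made-up name used only for this example.
bindir=$(mktemp -d)

# Steps 1-3: save the script text as a plain text file.
cat > "$bindir/countpdf-demo" <<'EOF'
#!/bin/bash
echo "script ran"
EOF

# Step 4: make it executable.
chmod 755 "$bindir/countpdf-demo"

# Step 5: put its folder on the PATH for this session, then run it by name.
PATH="$bindir:$PATH"
countpdf-demo
```

In everyday use the folder would more likely be something permanent, and the PATH line would go in your shell's startup file so it survives new Terminal windows.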
regards;
-rob.
Hi Rob, when I run the script line by line it works, but when I make it into a countpdf.sh file it says:
F : ** Skipped - no page count **
.0_BE_02May2017_cl : ** Skipped - no page count **
: ** Skipped - no page count **
it : ** Skipped - no page count **
01_ : ** Skipped - no page count **
: ** Skipped - no page count **
...pdf : ** Skipped - no page count **
6x : ** Skipped - no page count **
: ** Skipped - no page count **
: ** Skipped - no page count **
d : ** Skipped - no page count **
pi : ** Skipped - no page count **
: ** Skipped - no page count **
.0_BE_02May2017_cl : ** Skipped - no page count **
: ** Skipped - no page count **
it : ** Skipped - no page count **
02_ : ** Skipped - no page count **
: ** Skipped - no page count **
...pdf : ** Skipped - no page count **
6x : ** Skipped - no page count **
: ** Skipped - no page count **
: ** Skipped - no page count **
d : ** Skipped - no page count **
pi : ** Skipped - no page count **
: ** Skipped - no page count **
.0_BE_02May2017_cl : ** Skipped - no page count **
: ** Skipped - no page count **
it : ** Skipped - no page count **
03_ : ** Skipped - no page count **
: ** Skipped - no page count **
...pdf : ** Skipped - no page count **
Do you know what I'm doing wrong?
Hope so :)
many thanks Patrick