Total PDF pages in subfolders across folder structure

Last week, I wrote a script that ran through a folder structure and output the page count of every PDF in all folders and sub-folders, and also spit out a grand total.

While this worked well, what I really wanted was a script that just totaled PDF pages by sub-folder, without seeing all the file-by-file detail. After trying to retrofit the first script, I realized that was a waste of time, and started over from scratch.

The resulting script works just as I'd like it to, traversing a folder structure and showing PDF page counts by folder:

$ countpdfbydir
   47: ./_Legal
    2: ./_Medical-Dental
   15: ./_Medical-Dental/Kids
   11: ./_Medical-Dental/Marian
    2: ./_Medical-Dental/Rob
   35: ./_Personal Documents/Kids
   87: ./_Personal Documents/Marian
   28: ./_Personal Documents/Rob
   10: ./_Personal Documents/Rob/Golf
   12: ./_Personal Documents/Rob/Travel
-------------------------------------------------------------------
  249: Total PDF Pages

It took a few revisions, but I like this version; it even does some simplistic padding to keep the figures lined up in the output.
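For comparison, the same kind of alignment can be had from printf's field width. This is only a minimal sketch, not what the script below uses; the helper name format_count is made up for illustration, and the 5-character column is an assumption chosen to match the sample output above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# right-align a count in a 5-character column, then print the label.
format_count() {
    printf '%5d: %s\n' "$1" "$2"
}

format_count 47 "./_Legal"
format_count 2 "./_Medical-Dental"
format_count 249 "Total PDF Pages"
```

A count wider than five digits simply pushes the label to the right; nothing is truncated.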

Here's what I came up with:

#!/bin/bash
# Total PDF pages per directory, from the current directory down.
# macOS only: page counts come from Spotlight metadata via mdls.

# Split find's output on newlines only, so names containing spaces stay intact.
# ($'\n' avoids depending on "echo -en", which not every shell's echo supports.)
saveIFS=$IFS
IFS=$'\n'

baseDir=$(pwd)
myDirs=($(find . -type d))
myDirCount=${#myDirs[@]}

grandtotalPages=0

i=0
while [ "$i" -lt "$myDirCount" ]; do
	# Skip any directory we cannot enter
	cd "${myDirs[i]}" || { i=$(( i + 1 )); continue; }

	myFiles=($(find . -maxdepth 1 -name "*.pdf"))
	myFileCount=${#myFiles[@]}
	subtotalPages=0

	# We have PDFs in this dir, so loop through and count pages
	if [ "$myFileCount" -ne 0 ]; then
		j=0
		while [ "$j" -lt "$myFileCount" ]; do
			# Pull the value out of the "kMDItemNumberOfPages = N" line
			pageCount=$(mdls "${myFiles[j]}" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
			if [ -z "$pageCount" ]; then
				# This PDF is missing a page count, so we skip it silently
				# echo "${myFiles[j]} : ** Skipped - no page count **"
				:
			else
				# Increment a subtotal by directory and a running grand total
				subtotalPages=$(( subtotalPages + pageCount ))
				grandtotalPages=$(( grandtotalPages + pageCount ))
			fi
			j=$(( j + 1 ))
		done

		# Pad the results for nice alignment of page counts
		case ${#subtotalPages} in
			1) padding="    ";;
			2) padding="   ";;
			3) padding="  ";;
			4) padding=" ";;
			*) padding="";;
		esac

		echo "$padding$subtotalPages: ${myDirs[i]}"
	fi

	i=$(( i + 1 ))
	cd "$baseDir"
done

# Pad the results for nice alignment of grand total
case ${#grandtotalPages} in
	1) padding="    ";;
	2) padding="   ";;
	3) padding="  ";;
	4) padding=" ";;
	*) padding="";;
esac

echo "-------------------------------------------------------------------"
echo "$padding$grandtotalPages: Total PDF Pages"

IFS=$saveIFS

I feared this would be incredibly slow, but it took only about 40 seconds to traverse a folder structure holding roughly a gigabyte of PDFs: about 1,500 files spread across 160 subfolders, totalling 5,306 PDF pages.

Once I had this version working, I repurposed the original script to output file-level PDF page counts only for the current directory, so I can use that one when I want the details:

$ cd Home\ Stuff
$ pdfcountbyfile
    2: 2015-03-27 - Lowes.pdf
    4: 2015-07-14 - Home Depot.pdf
    1: 2015-09-03 - Home Depot.pdf
-----------------------------------------------------------------
    7: Total PDF pages in this folder

In case you want it, here's the modified script that generates the file-level PDF page counts:

#!/bin/bash
# Page count for each PDF in the current directory, plus a total.
# macOS only: page counts come from Spotlight metadata via mdls.

# Split find's output on newlines only, so names containing spaces stay intact.
# ($'\n' avoids depending on "echo -en", which not every shell's echo supports.)
saveIFS=$IFS
IFS=$'\n'

myFiles=($(find . -maxdepth 1 -name "*.pdf"))
myFileCount=${#myFiles[@]}
totalPages=0
i=0

while [ "$i" -lt "$myFileCount" ]
do
	# Drop the leading "./" that find puts in front of each name
	prettyName=${myFiles[i]#./}
	# Pull the value out of the "kMDItemNumberOfPages = N" line
	pageCount=$(mdls "${myFiles[i]}" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
	if [ -z "$pageCount" ]
	then
		echo "$prettyName : ** Skipped - no page count **"
	else
		# Pad the results for nice alignment of page counts
		case ${#pageCount} in
			1) padding="    ";;
			2) padding="   ";;
			3) padding="  ";;
			4) padding=" ";;
			*) padding="";;
		esac
		echo "$padding$pageCount: $prettyName"

		totalPages=$(( totalPages + pageCount ))
	fi

	i=$(( i + 1 ))
done

# Pad the results for nice alignment of grand total
case ${#totalPages} in
	1) padding="    ";;
	2) padding="   ";;
	3) padding="  ";;
	4) padding=" ";;
	*) padding="";;
esac

echo "-----------------------------------------------------------------"
echo "$padding$totalPages: Total PDF pages in this folder"

IFS=$saveIFS

These are clearly not need-every-day scripts, but I like the information they provide (because I'm a data geek), and they were fun for my shell-scripting-challenged brain to figure out. I'm 99.9% positive the efficiency could be improved by a factor of 100, but this works well enough for my needs.
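On the efficiency point, one possible direction is a single find pass that feeds awk, with no per-directory cd or repeated find calls. The sketch below is untested against a real PDF library, and the function names list_pdf_pages and sum_by_dir are hypothetical. It reuses the same mdls pipeline as the scripts above, so mdls still runs once per file and any speed-up is an assumption rather than a measurement. It also omits the dashed separator line and assumes file names contain no tabs or newlines.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not part of the original scripts.

# Emit one "pages<TAB>path" line per PDF under the current directory.
# macOS only, since it calls mdls. PDFs without a page count are skipped.
list_pdf_pages() {
    find . -name "*.pdf" -print0 |
    while IFS= read -r -d '' pdf; do
        pages=$(mdls "$pdf" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
        if [ -n "$pages" ]; then
            printf '%s\t%s\n' "$pages" "$pdf"
        fi
    done
}

# Read "pages<TAB>path" lines on stdin; print a subtotal per directory
# (sorted by path) followed by a grand total.
sum_by_dir() {
    awk -F'\t' '
        {
            dir = $2
            sub("/[^/]*$", "", dir)      # drop the file name, keep its directory
            pages[dir] += $1
        }
        END { for (d in pages) print d "\t" pages[d] }' |
    LC_ALL=C sort |
    awk -F'\t' '
        { printf "%5d: %s\n", $2, $1; total += $2 }
        END { printf "%5d: Total PDF Pages\n", total }'
}

# Intended use on a Mac, from the top of the folder structure:
#   list_pdf_pages | sum_by_dir
```

Splitting the work this way also means sum_by_dir can be tried on hand-typed "pages<TAB>path" lines without touching any real PDFs.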

4 thoughts on “Total PDF pages in subfolders across folder structure”

A Corbett Mar 6 '16 at 2:52 pm
This is a great little script and very useful. Thanks for your work on this!
Alex Aug 29 '17 at 1:24 pm
This sounds like exactly what I need. But I am not that technical. How do I make this script work?
1. Rob Griffiths Aug 29 '17 at 1:32 pm
  A full-blown shell script primer is beyond my scope here, but there are lots of tutorials out there. In a nutshell, you need to:
  1) Copy the script.
  2) Paste it into a new document in a plain-text editor, like BBEdit or TextEdit in plain text mode.
  3) Save the file
  4) Make it executable in Terminal with chmod 755 scriptname
  5) Run the script—but this requires either saving it somewhere on your path, or referencing the full path to the file each time. And this is where things get complicated, so the tutorials would be useful.
  regards;
  -rob.
Patrick Oct 18 '17 at 3:05 am
Hi Rob, when I run the script line by line it works, but when I make it into a countpdf.sh file it says:
F : ** Skipped - no page count **
.0_BE_02May2017_cl : ** Skipped - no page count **
: ** Skipped - no page count **
it : ** Skipped - no page count **
01_ : ** Skipped - no page count **
: ** Skipped - no page count **
...pdf : ** Skipped - no page count **
6x : ** Skipped - no page count **
: ** Skipped - no page count **
: ** Skipped - no page count **
d : ** Skipped - no page count **
pi : ** Skipped - no page count **
: ** Skipped - no page count **
.0_BE_02May2017_cl : ** Skipped - no page count **
: ** Skipped - no page count **
it : ** Skipped - no page count **
02_ : ** Skipped - no page count **
: ** Skipped - no page count **
...pdf : ** Skipped - no page count **
6x : ** Skipped - no page count **
: ** Skipped - no page count **
: ** Skipped - no page count **
d : ** Skipped - no page count **
pi : ** Skipped - no page count **
: ** Skipped - no page count **
.0_BE_02May2017_cl : ** Skipped - no page count **
: ** Skipped - no page count **
it : ** Skipped - no page count **
03_ : ** Skipped - no page count **
: ** Skipped - no page count **
...pdf : ** Skipped - no page count **
Do you know what I'm doing wrong?
Hope so :)
many thanks Patrick
