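There are two ways to rename PDF files after the title stored in their metadata: a Python script that uses pyPdf, and a bash script that uses pdftk.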
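#!/usr/bin/python
# pyPdf available at http://pybrary.net/pyPdf/
from pyPdf import PdfFileReader
import os

for fileName in os.listdir('.'):
    # Only process files with a .pdf extension
    if not fileName.lower().endswith(".pdf"):
        continue
    try:
        # Read the title from the metadata and close the file before renaming it
        with open(fileName, "rb") as pdfFile:
            title = PdfFileReader(pdfFile).getDocumentInfo().title
        # Leave files without a title untouched
        if not title:
            continue
        print(title)
        os.rename(fileName, title + ".pdf")
    except Exception as e:
        # Report the failure instead of silently swallowing it
        print("Skipping %s: %s" % (fileName, e))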
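A metadata title can contain characters that are not valid in a file name (a `/`, for example), and two PDFs can share the same title, in which case the second rename would clash with the first. The sketch below shows one possible way to guard against both before calling `os.rename`. The helper name `safe_pdf_name` and its `taken` parameter are hypothetical and not part of the original script; the choice of which characters to replace is also an assumption.

```python
import re

def safe_pdf_name(title, taken):
    """Turn a metadata title into a .pdf file name that is not in `taken`.

    Hypothetical helper: `taken` is any collection of names already in use.
    """
    # Replace path separators and control characters with spaces (assumed policy)
    cleaned = re.sub(r'[/\\\x00-\x1f]', ' ', title)
    # Collapse runs of whitespace and trim both ends
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()
    if not cleaned:
        return None
    candidate = cleaned + ".pdf"
    counter = 2
    # Append a counter while the name is already in use
    while candidate in taken:
        candidate = "%s (%d).pdf" % (cleaned, counter)
        counter += 1
    return candidate
```

In the loop above, this could be called as `safe_pdf_name(title, os.listdir('.'))`, skipping the file when it returns `None`.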
#!/bin/bash
for file in ./*
do
    # Extract the value of the Title key from the PDF metadata
    a=$(pdftk "$file" dump_data_utf8 | grep -A 1 'InfoKey: Title' | sed '/InfoKey: Title/d' | sed 's/InfoValue: //')
    if [ "$a" != "" ]
    then
        b="$a.pdf"
        echo "$b"
        mv "$file" "$b"
    fi
done
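The script below works from the text of the PDF rather than its metadata. It finds the first line containing the word Chapter, joins it with the line that follows (typically the chapter heading), and uses the result to rename the file.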
#!/bin/bash
for file in ./*
do
    # Take the first line containing 'Chapter' plus the line after it,
    # strip leading whitespace, join the lines, and collapse repeated spaces
    a=$(pdftotext -layout "$file" - | grep 'Chapter' -A1 -m1 | sed -e 's/^[ \t]*//' | tr '\n' ' ' | sed -e "s/[[:space:]]\+/ /g" | sed 's/\s*$//g')
    if [ "$a" != "" ]
    then
        b="$a.pdf"
        echo "$file ==> $a --> $b"
        mv "$file" "$b"
    fi
done
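To make the grep/sed/tr pipeline easier to follow, here is a rough Python sketch of the same text processing, applied to a string that has already been extracted (for instance, the output of pdftotext). The function name `chapter_name` is hypothetical; this only illustrates the logic and is not a replacement for the script.

```python
import re

def chapter_name(text):
    """Return the first line containing 'Chapter' joined with the line after it.

    Hypothetical helper mirroring the shell pipeline; returns '' if no match.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if 'Chapter' in line:
            # Like grep -A1 -m1: the first match plus one following line
            picked = lines[i:i + 2]
            # Like the sed/tr steps: join, collapse whitespace, trim the ends
            return re.sub(r'\s+', ' ', ' '.join(picked)).strip()
    return ''
```

An empty result corresponds to the `[ "$a" != "" ]` check in the script, where the file is left untouched.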
Joel G. Mathew, known in tech circles by the pseudonym Droidzone, is an open-source and programming enthusiast.
His favorite pastimes are grappling with GNU compilers, discovering newer Linux secrets, writing scripts, hacking ROMs and programs (nothing illegal), reading, blogging, and testing out the latest gadgets.
When away from the tech world, Dr. Joel G. Mathew is a practising ENT surgeon, busy with surgeries and clinical practice.