A-PDF Preview and Rename

You’re often faced with a dilemma: you get a set of files from somewhere, and the filenames don’t match the contents. They may be chapters from a book, where it would be good if each filename matched the chapter name. We’ve previously covered A-PDF Rename, a program that can use the PDF Title metadata to batch rename a set of files. But for complex tasks, such as when the Title metadata has not been set for some files, A-PDF Rename fails. I’d been using a Linux script to work around this for months. Here’s my script to batch rename such files (it may need to be tweaked for your files):

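#!/bin/bash
# Directory to process: the first argument, or the current directory if none is given.
if [ -z "$1" ]
then
    direc="$PWD"
else
    echo "$1"
    direc="$1"
fi
cd "$direc" || exit 1
for file in *.pdf
do
  chap=""
  name=""
  # Second line of the grep output (the line after the first 'CHAPTER' match on page 1), trimmed.
  chap=$(pdf2txt -p 1 "$file" 2>/dev/null | grep 'CHAPTER' -A3 | sed '2q;d' | sed -e 's/^[ \t]*//' -e 's/[ \t]*$//')
  if [ -z "$chap" ]
  then
    # No 'CHAPTER' line found: fall back to the third line of page 1.
    name=$(pdf2txt -p 1 "$file" 2>/dev/null | sed '3q;d' | sed -e 's/^[ \t]*//' -e 's/[ \t]*$//' -e 's/ \+/ /g' -e 's/://g' -e 's/\*//g')
    newname="$name.pdf"
  else
    # Fourth line of the grep output, with extra spaces, colons and asterisks removed.
    name=$(pdf2txt -p 1 "$file" 2>/dev/null | grep 'CHAPTER' -A3 | sed '4q;d' | sed -e 's/^[ \t]*//' -e 's/[ \t]*$//' -e 's/ \+/ /g' -e 's/://g' -e 's/\*//g')
    newname="$chap-$name.pdf"
  fi
  echo "$file => $newname"
  # Nothing is renamed here; the rename commands are only written to renfile.bat.
  echo "move \"$file\" \"$newname\"" >> renfile.bat
done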
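
The clean-up stage of that pipeline (trim, squeeze runs of spaces, drop colons and asterisks) can be tried on its own, without any PDFs. Here is a minimal sketch; the function name clean_title is made up for illustration and is not part of the script above, and it uses [[:space:]] rather than the GNU-only \t and \+ forms.

```shell
#!/bin/sh
# clean_title: hypothetical helper that mirrors the sed clean-up used above.
# Takes one line of text as its first argument and prints a tidied title.
clean_title() {
    printf '%s\n' "$1" |
        sed -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]]*$//' \
            -e 's/  */ /g' \
            -e 's/[:*]//g'
}

clean_title '  Intro:  to   *Bash*  '
```

Run on the sample string, this prints Intro to Bash. Trying a few real first-page lines through it before a batch run is a cheap way to see whether the pattern needs tweaking for your files.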

But if you’re a Windows user, don’t despair. Enter A-PDF Preview and Rename, a program that lets you draw a box around the text you want to use as the file name, recognizes that text by OCR, and can reuse the same location to rename multiple files at once. Get it from here.

The steps are as follows:
Step 1: Drag the files into the A-PDF Preview and Rename app:

Step 2: Click on a file and choose the “Select text and OCR” option. Drag a box with the mouse around the text you want to select.

Step 3: Click on the “OCR” option. You will be shown a preview of the new filename. If it is correct, choose the “Batch OCR” option to rename all files. Note that the program is not 100% accurate, and you may still need to correct some filenames manually. Still, it works well for a lot of files.


You are reading this post on Joel G Mathew’s tech blog. Joel's personal blog is the Eyrie, hosted here.

Rename all pdf files in a directory with their titles

There are three ways to do this:
Method 1:

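#!/usr/bin/python
# Python 2 script; pyPdf available at http://pybrary.net/pyPdf/
from pyPdf import PdfFileReader
import os

for fileName in os.listdir('.'):
        # Skip anything that is not a PDF
        if not fileName.lower().endswith(".pdf"): continue
        try:
                input1 = PdfFileReader(open(fileName, "rb"))
                title = input1.getDocumentInfo().title
                print title
                # Rename only when the Title metadata is actually set
                if title:
                        os.rename(fileName, title + ".pdf")
        except Exception as e:
                # Report unreadable files instead of failing silently
                print "Skipping %s: %s" % (fileName, e)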

Method 2:

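#!/bin/bash
for file in ./*
do
    # Read the Title field from the PDF metadata
    a=$(pdftk "$file" dump_data_utf8 | grep -A 1 'InfoKey: Title' | sed '/InfoKey: Title/d' | sed 's/InfoValue: //')
    if [ -n "$a" ]
    then
        b="$a.pdf"
        echo "$b"
        # Quote both names so that files with spaces are handled correctly
        mv "$file" "$b"
    fi
done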
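
The grep/sed chain in Method 2 relies on pdftk printing an InfoValue line directly after the matching InfoKey line. Here is a minimal sketch of the same extraction as a single awk call, run against made-up sample text so it can be tried without pdftk installed. The function name title_from_dump and the sample metadata are assumptions for illustration.

```shell
#!/bin/sh
# title_from_dump: hypothetical helper; reads pdftk dump_data text on stdin
# and prints the value on the line that follows "InfoKey: Title".
title_from_dump() {
    awk '/^InfoKey: Title$/ {
             if ((getline line) > 0) {
                 sub(/^InfoValue: /, "", line)
                 print line
             }
             exit
         }'
}

# Sample text in the shape pdftk prints (made up for this example).
printf 'InfoBegin\nInfoKey: Title\nInfoValue: My Book\nInfoBegin\nInfoKey: Author\nInfoValue: Someone\n' |
    title_from_dump
```

With a real file it would be used as pdftk "$file" dump_data_utf8 | title_from_dump. It prints nothing when there is no Title key, which is the case the if test in the script guards against.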

Method 3:
This reads the first heading that comes after the word Chapter, and uses it to rename the files.

#!/bin/bash
for file in ./*
do
    # First line containing 'Chapter' plus the line after it, joined into one trimmed line
    a=$(pdftotext -layout "$file" - | grep 'Chapter' -A1 -m1 | sed -e 's/^[ \t]*//' | tr '\n' ' ' | sed -e "s/[[:space:]]\+/ /g" | sed 's/\s*$//g' )
    if [ -n "$a" ]
    then
        b="$a.pdf"
        echo "$file ==> $a --> $b"
        mv "$file" "$b"
    fi
done

How to monitor CPU usage stats per process with munin

This post walks you through installing a plugin that monitors CPU usage per process on your munin nodes/clients.
Note that the edits are to be done on the node(s).

Original script page: http://www.ajohnstone.com/achives/monitoring-processes-with-ps/

Plugin code:

Create /usr/share/munin/plugins/proc_cpu:

emacs /usr/share/munin/plugins/proc_cpu

with the following contents:

#!/bin/sh
#
# (c) 2010, Andrew Johnstone andrew @ajohnstone.com
# Based on the 'proc_mem' plugin, written by Rodrigo Sieiro rsieiro @gmail.com
#
# Configure it by using the processes env var, i.e.:
#
# [proc_cpu]
# env.processes        php mysqld apache2
#

. $MUNIN_LIBDIR/plugins/plugin.sh

if [ "$1" = "autoconf" ]; then
        echo yes
        exit 0
fi

processes=${processes:="php mysqld apache2"}

if [ "$1" = "config" ]; then

        NCPU=$(egrep '^cpu[0-9]+ ' /proc/stat | wc -l)
        PERCENT=$(($NCPU * 100))
        if [ "$scaleto100" = "yes" ]; then
                graphlimit=100
        else
                graphlimit=$PERCENT
        fi
        SYSWARNING=`expr $PERCENT '*' 30 / 100`
        SYSCRITICAL=`expr $PERCENT '*' 50 / 100`
        USRWARNING=`expr $PERCENT '*' 80 / 100`

        echo 'graph_title CPU usage by process'
        echo "graph_args --base 1000 -r --lower-limit 0 --upper-limit $graphlimit"
        echo 'graph_vlabel %'
        echo 'graph_category processes'
        echo 'graph_info This graph shows the cpu usage of several processes'

        for proc in $processes; do
                  echo "$proc.label $proc"
        done
        exit 0
fi

TMPFILE=`mktemp -t top.XXXXXXXXXX` && {

  top -b -n1 > $TMPFILE

  for proc in $processes; do
    value=$(cat $TMPFILE | grep $proc | awk 'BEGIN { SUM = 0 } { SUM += $9} END { print SUM }')
    echo "$proc.value $value"
  done
  rm -f $TMPFILE
}
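
The value the plugin reports for each process is simply the sum of the ninth column (%CPU) over every line of top's batch output that contains the process name. Here is a minimal sketch of that step on its own, fed with made-up lines so it can be tried without munin. The function name sum_cpu and the sample rows are assumptions for illustration.

```shell
#!/bin/sh
# sum_cpu: hypothetical helper; sums the 9th column (%CPU in top's batch
# output) of every stdin line that contains the given process name.
sum_cpu() {
    grep -- "$1" | awk 'BEGIN { sum = 0 } { sum += $9 } END { print sum }'
}

# Made-up lines in the column layout of "top -b -n1".
sample='  101 www-data 20 0 250000 12000 8000 S  1.5 0.3 0:01.20 apache2
  102 www-data 20 0 250000 12000 8000 S  2.5 0.3 0:01.10 apache2
  200 mysql    20 0 900000 90000 9000 S 10.0 2.2 1:10.00 mysqld'

printf '%s\n' "$sample" | sum_cpu apache2
printf '%s\n' "$sample" | sum_cpu mysqld
```

Because grep matches substrings, a name such as php also counts lines for php-fpm. That matches what the plugin does, but it is worth keeping in mind when choosing the names in env.processes.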

Now make the plugin executable and link it into /etc/munin/plugins:

chmod 755 /usr/share/munin/plugins/proc_cpu
ln -s /usr/share/munin/plugins/proc_cpu /etc/munin/plugins/proc_cpu

Edit: /etc/munin/plugin-conf.d/munin-node

Add the following to the end of that file:

[proc_cpu]
user munin
group munin

Restart the service:

service munin-node restart

The data will appear under:
Overview :: MuninMonitor :: MuninMonitor :: proc cpu

Tips:
The plugins are put in /usr/share/munin/plugins/pluginname. They need to be chmodded to 755. There should be a link to the plugin from /etc/munin/plugins/pluginname. Once installed, restart munin-node with:

service munin-node restart

Please also check this: http://munin-monitoring.org/browser/munin-contrib/plugins/system/cpu-usage-by-process

Also try this:

cd /usr/share/munin/plugins
wget https://redmine.koumbit.net/projects/munin-contrib/repository/revisions/256709738d6a15b80715d91de4b7af55f1e3905e/raw/plugins/processes/multicpu
chmod 755 multicpu
ln -s /usr/share/munin/plugins/multicpu /etc/munin/plugins/multicpu
service munin-node restart

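New script:
To keep the list of monitored processes up to date automatically, edit /etc/munin/plugin-conf.d/munin-node
and add an env.processes line to the [proc_cpu] section, so that it reads:
[proc_cpu]
user munin
group munin
env.processes php mysqld apache2

Now add the script, saved as /root/bash-advanced-scripts/topprocesses: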

#!/bin/bash
#Get last 5 min cpu load (second field of /proc/loadavg)
load=$(awk '{print $2}' /proc/loadavg)
echo "CPU load: $load"
if (( $(echo "$load > 0.8" | bc -l) ))
then
	#CPU load is high (>0.8)
	#Gets list of top 10 processes by cpu usage and joins them into one space-separated line
	proclist=$(ps -Ao fname,pcpu --no-headers | sort -nrk 2,2 | awk '!x[$1]++' | awk '{print $1}' | head -n 10 | sed ':a;N;$!ba;s/\n/ /g')
	echo "Top processes: $proclist"
	#Replace the whole env.processes line with the new list
	sed -i "/env.processes/c\env.processes        $proclist" /etc/munin/plugin-conf.d/munin-node
	#Now restart
	service munin-node restart
else
	echo "CPU load is ok"
fi
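
The two moving parts of this script, the threshold test and the rewrite of the env.processes line, can be tried safely on a throwaway file before touching the real munin-node config. Here is a minimal sketch. The function names load_is_high and set_processes are made up for illustration, the comparison uses awk in place of bc, and the rewrite uses a plain s command rather than sed's c command.

```shell
#!/bin/sh
# load_is_high: hypothetical helper; succeeds when load ($1) exceeds threshold ($2).
load_is_high() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# set_processes: hypothetical helper; rewrites the env.processes line of a
# munin plugin config file ($1) with a new space-separated list ($2),
# keeping the previous version as FILE.bak.
set_processes() {
    sed -i.bak "s|^env\.processes.*|env.processes $2|" "$1"
}

# Try it on a throwaway copy instead of the real munin-node file.
conf=$(mktemp)
printf '[proc_cpu]\nenv.processes php mysqld apache2\n' > "$conf"

if load_is_high 0.95 0.8; then
    set_processes "$conf" "nginx php-fpm mysqld"
fi
cat "$conf"
rm -f "$conf" "$conf.bak"
```

The sample load of 0.95 is above the 0.8 threshold, so the env.processes line is replaced and the section header is left alone.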

Add the cron job:

*/5 * * * * /bin/bash /root/bash-advanced-scripts/topprocesses

Enable .htaccess on apache2

In the file /etc/apache2/sites-enabled/000-default.conf, look for the line:

AllowOverride None

and change it to:

AllowOverride All

wherever it occurs.

Alternatively, a single line of sed can achieve this:

sed -i 's/AllowOverride None/AllowOverride All/g' /etc/apache2/sites-enabled/000-default.conf

Restart Apache.
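
If you would rather see what the substitution does before running it on the live config, here is a minimal sketch that applies it to a throwaway file and keeps a backup. The function name enable_override and the sample file contents are made up for illustration.

```shell
#!/bin/sh
# enable_override: hypothetical helper; switches every "AllowOverride None"
# in the given file to "AllowOverride All", keeping the original as FILE.bak.
enable_override() {
    sed -i.bak 's/AllowOverride None/AllowOverride All/g' "$1"
}

# Throwaway file standing in for 000-default.conf (contents are made up).
conf=$(mktemp)
printf '<Directory /var/www/>\n    AllowOverride None\n</Directory>\n' > "$conf"

enable_override "$conf"
grep 'AllowOverride' "$conf"
rm -f "$conf" "$conf.bak"
```

The -i.bak form leaves the untouched original next to the edited file, which makes it easy to roll back. On systems that ship apache2ctl, running apache2ctl configtest before the restart is a reasonable extra check.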


sed delete the nth line

Say you want to delete the nth line in a file.

You do:

sed -i '4d' filename

which deletes the 4th line.

I wanted to find the version of tar, and use it in a script.

# tar --version
tar (GNU tar) 1.23
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

The first script I wrote was this:

# tar --version | sed '2,$d' | awk '{print $4}'
1.23

Then I modified it slightly, so that it picks the last field instead of a fixed one:

# tar --version | sed '2,$d' | awk '{print $NF}'
1.23

That was overkill. I could just have done:

# tar --version | head -1 | awk '{print $NF}'
1.23
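
For use inside a script, the same result can be had from a single awk call that handles both the "first line only" and the "last field" parts. Here is a minimal sketch; the function name version_from_banner is made up for illustration, and the sample text is the tar output shown above.

```shell
#!/bin/sh
# version_from_banner: hypothetical helper; prints the last field of the
# first line of whatever "--version" text arrives on stdin.
version_from_banner() {
    awk 'NR == 1 { print $NF; exit }'
}

printf 'tar (GNU tar) 1.23\nCopyright (C) 2010 Free Software Foundation, Inc.\n' |
    version_from_banner
```

In a script it would be used as tar_version=$(tar --version | version_from_banner).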

 


Fix “Host key verification failed, offending host key is” error

If you have a VPS/server and reinstall it, you may see a message similar to the one below when connecting to it:

scp .ssh/ [email protected]:~/
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
43:ab:e2:c8:66:c3:c3:b7:b3:49:6d:01:57:4b:cd:39.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending key in /root/.ssh/known_hosts:4
RSA host key for 192.145.45.167 has changed and you have requested strict checking.
Host key verification failed.
lost connection

 

To fix this, you need to delete the offending host key from .ssh/known_hosts.

You can do this with sed:

sed -i '4d' .ssh/known_hosts

The "-i" option means an "in-place" modification of the file. The "4d" means delete line number 4 from the file; 4 is the line number reported in the "Offending key in /root/.ssh/known_hosts:4" message above.

So after doing it, try again:
[[email protected]] ~ #sed -i '4d' .ssh/known_hosts
[[email protected]] ~ #scp -r .ssh/ [email protected]:~/
The authenticity of host '192.145.45.167 (192.145.45.167)' can't be established.
RSA key fingerprint is 43:ab:e2:c8:66:c3:c3:b7:b3:49:6d:01:57:4b:cd:39.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.145.45.167' (RSA) to the list of known hosts.
[email protected]'s password:
authorized_keys~                              100%  399     0.4KB/s   00:00
authorized_keys                               100%  801     0.8KB/s   00:00
id_rsa                                        100% 1679     1.6KB/s   00:00
known_hosts                                   100% 1768     1.7KB/s   00:00
id_rsa.pub                                    100%  391     0.4KB/s   00:00

 


Parse text files with assignments, and get the value – Bash

Suppose you have a text file containing the following:

strings=hello world

and you want to use the value of strings, which is “hello world”, in your script.

You can parse it with the following:

grep '^strings=' test | sed 's/strings=//g'

So the generic format is:

grep '^variablename=' filename | sed 's/variablename=//g'

The grep searches for a string beginning with ‘strings=’, in the file ‘test’, and outputs the following:

$grep '^strings=' test
strings=hello world

Next we have to remove the assignment part up to and including the ‘=’ sign; this is done by piping the output of grep to sed.

The sed command substitutes the string “strings=” with an empty string. The g flag makes the substitution global, although here the pattern occurs only once per line. sed then outputs just the value that was assigned.

Practical example:

I wanted to use the main User directory on the server, to make my backups more generic. I know that the file /etc/imscp/imscp.conf contains the following line:

USER_HOME_DIR = /var/www/virtual

So what I want is to get /var/www/virtual from this file, removing the assignment and spaces.

The solution is:

grep -i 'USER_HOME_DIR' /etc/imscp/imscp.conf | sed 's/USER_HOME_DIR[ ]*=[ ]*//'

which outputs:

/var/www/virtual

without the initial spaces. A space within square brackets matches exactly one space; the * after it makes the expression match any number of spaces, including none.

However, if you want to keep the variable name and the file name in shell variables, the sed expression must be double quoted so that the variable is expanded:

apache_home_var="USER_HOME_DIR"
apache_conf_file=/etc/imscp/imscp.conf
grep -i "$apache_home_var" "$apache_conf_file" | sed "s/${apache_home_var}[ ]*=[ ]*//"
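
The grep and the sed can be folded into one sed call that only prints lines where the substitution happened. Here is a minimal sketch, assuming the key contains no regular-expression special characters or slashes. The function name conf_value and the sample file are made up for illustration.

```shell
#!/bin/sh
# conf_value: hypothetical helper; prints the value assigned to KEY ($1) in
# FILE ($2), accepting any amount of blank space around the "=" sign.
conf_value() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*//p" "$2"
}

# Throwaway file holding both styles of assignment used in this post.
conf=$(mktemp)
printf 'strings=hello world\nUSER_HOME_DIR = /var/www/virtual\n' > "$conf"

conf_value strings "$conf"
conf_value USER_HOME_DIR "$conf"
rm -f "$conf"
```

In a backup script the result would be captured with something like home_dir=$(conf_value USER_HOME_DIR /etc/imscp/imscp.conf).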

 


Expand variables in a sed substitution on Bash

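sed -i.bak 's/string_to_replace/'$variablename'/g' filename

The sed expression is single quoted as usual, but the quotes are closed just before the variable and reopened just after it. The variable therefore sits outside the single quotes, where Bash expands it; nothing inside single quotes is ever expanded. Because the variable itself is left unquoted here, a value containing spaces will break the command, so double quoting it ('s/string_to_replace/'"$variablename"'/g') is safer.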
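
A simpler alternative is to put the whole expression in double quotes. Here is a minimal sketch that works on stdin rather than a file; the sample value is made up, and it contains slashes on purpose to show why a different delimiter is sometimes needed.

```shell
#!/bin/sh
# Value to insert; it contains slashes, so "/" cannot be the s delimiter.
variablename=/var/www/virtual

# Double quotes let the shell expand $variablename before sed runs;
# "|" is used as the delimiter because the value contains "/".
printf 'root=string_to_replace\n' | sed "s|string_to_replace|$variablename|g"
```

This prints root=/var/www/virtual. The value must not contain the delimiter that was chosen, and an & in the value would be read by sed as "the matched text".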


A simple example of SED

The basic command for replacing occurrences of a particular string in a file with another string:

Let’s call our source file file1

sed 's/old string/new string/' file1

Example
sed 's,CONFIG_TUN=m,#CONFIG_TUN=m,' lconfig

Here a comma is used as the delimiter instead of a slash; sed accepts any character after the s.

This will output the replaced text to the screen. If you want to change the file itself, redirect the output to a new file and then move that file back to the original name.

sed 's/old string/new string/' file1 > file2 && mv file2 file1

And to test if it was successful:
grep -i 'new string' file1
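
Here is a minimal sketch of the CONFIG_TUN example as a small function, following the same "write a new file, then move it back" pattern. The function name comment_out and the sample file are made up for illustration. It assumes the setting contains no commas or regular-expression special characters.

```shell
#!/bin/sh
# comment_out: hypothetical helper; prefixes with "#" every line of FILE ($2)
# that consists of exactly SETTING ($1), then replaces FILE with the result.
comment_out() {
    sed "s,^$1\$,#&," "$2" > "$2.new" && mv "$2.new" "$2"
}

# Throwaway file standing in for a kernel config (contents are made up).
conf=$(mktemp)
printf 'CONFIG_TUN=m\nCONFIG_TAP=y\n' > "$conf"

comment_out 'CONFIG_TUN=m' "$conf"
cat "$conf"
rm -f "$conf"
```

Anchoring the pattern with ^ and $ means a second run leaves the already commented line alone instead of adding another #.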
