Basic Unix distributions don’t come with a built-in utility for converting PDF documents to text. Thanks to xpdf, though, we can use the pdftotext command for this kind of task. Below are the steps, from installing pdftotext on your machine to using it to convert a PDF to text.
- Download the source code from ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02.tar.gz (wget will do)
- Untar and uncompress the archive: tar -xzf xpdf-3.02.tar.gz
- Go into the directory xpdf-3.02/
- Run ./configure (installs to the standard path)
- make (you will need gcc)
- make install (root privileges needed)
- At this point you should have successfully installed the xpdf utilities
- Now try converting a PDF document to text: pdftotext foo.pdf (with no output name given, this writes foo.txt)
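If you'd rather script it, here is a rough sketch of the steps above as a shell script. The version number and URL are the ones from the list. The function names install_xpdf and convert_all are my own hypothetical helpers, not part of xpdf, and I'm assuming sudo is available for the install step (otherwise run that line as root). Nothing runs until you call the functions yourself.

```shell
#!/bin/sh
# Sketch only: version and URL are taken from the steps above.
XPDF_VERSION="3.02"
XPDF_ARCHIVE="xpdf-${XPDF_VERSION}.tar.gz"
XPDF_URL="ftp://ftp.foolabs.com/pub/xpdf/${XPDF_ARCHIVE}"

# Hypothetical helper: download, build and install xpdf.
# The body runs in a subshell, so the cd does not affect the caller.
install_xpdf() (
    wget "$XPDF_URL" &&
    tar -xzf "$XPDF_ARCHIVE" &&
    cd "xpdf-${XPDF_VERSION}" &&
    ./configure &&
    make &&
    sudo make install   # assumption: sudo is set up; otherwise run as root
)

# Hypothetical helper: convert every .pdf in a directory (default: current one).
# pdftotext writes NAME.txt next to NAME.pdf when no output file is given.
convert_all() {
    for pdf in "${1:-.}"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip if the glob matched nothing
        pdftotext "$pdf"
    done
}

# Example use (uncomment to run):
# install_xpdf && convert_all .
```

The && chain means a failed download or build stops the install instead of carrying on to the next step.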
Hopefully that's helpful for someone.
Cheers