To be or not to be

August 21, 2008

How to convert PDF to Text in Unix? – Use xpdf’s pdftotext

Filed under: How To, Shell Script | tdas @ 7:01 pm

Most Unix distributions don’t come with a built-in utility for converting PDF documents to plain text. Thanks to xpdf, though, we can use the pdftotext command for this kind of task. Below I have listed the steps, from installing pdftotext on your machine to using it to convert a PDF document to text.

  1. Download the source code from ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02.tar.gz (wget will do)
  2. Uncompress and untar the archive: tar -xzf xpdf-3.02.tar.gz
  3. Go into the directory: cd xpdf-3.02/
  4. Run ./configure (installs into the standard path)
  5. Run make (you will need gcc)
  6. Run make install (root privileges needed)
  7. At this point you should have successfully installed the xpdf utilities :)
  8. Now try converting a PDF document to text: pdftotext foo.pdf (this writes the text to foo.txt)
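If you have a whole folder of PDFs, a small loop saves typing the command once per file. This is a minimal sketch assuming a POSIX shell and pdftotext on your PATH. The function name convert_all_pdfs and the PDFTOTEXT override variable are hypothetical additions of mine, not part of xpdf; the variable only exists to make it easy to swap in a different converter.

```shell
#!/bin/sh
# Convert every PDF in a directory to a .txt file next to it.
# PDFTOTEXT is a hypothetical override hook, not an xpdf feature;
# it defaults to the real pdftotext binary.
PDFTOTEXT=${PDFTOTEXT:-pdftotext}

convert_all_pdfs() {
    dir=${1:-.}                       # default: current directory
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue     # no match: the glob stays literal, skip it
        txt=${pdf%.pdf}.txt           # foo.pdf -> foo.txt
        if "$PDFTOTEXT" "$pdf" "$txt"; then
            echo "converted: $pdf -> $txt"
        else
            echo "failed: $pdf" >&2
        fi
    done
}
```

Usage: convert_all_pdfs ~/papers converts each ~/papers/*.pdf into a matching .txt file. Passing the output name explicitly is equivalent to pdftotext's default naming, but makes the mapping obvious.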

Hopefully that’s helpful to someone :)

Cheers

May 26, 2008

How to use the OR operator in Grep?

Filed under: How To, Shell Script | tdas @ 12:44 pm

Ever wondered how to search for multiple patterns with a single grep command? A problem I often ran into with grep was a search like "find all lines that start with a $ or with a #": I was stuck.

Well, it turns out that using the OR operator in grep is trivial.

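grep '^\$\|^#' foo.dat (this returns all lines that start with a $ or with a #; the $ must be escaped as \$, because an unescaped $ is the end-of-line anchor)

Note: do NOT forget the backslash \ before the |. In a basic regular expression, \| is a GNU grep extension meaning OR; with grep -E (extended regular expressions) you write a plain | instead.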
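Here is a small self-contained sketch comparing three equivalent ways to write the OR. The sample lines in foo.dat are made up, and the first form assumes GNU grep, since \| in a basic regex is a GNU extension. Each command prints the same two lines: "$HOME is set" and "# a comment".

```shell
#!/bin/sh
# Work in a scratch directory; foo.dat holds made-up sample lines
cd "$(mktemp -d)" || exit 1
printf '%s\n' '$HOME is set' '# a comment' 'a plain line' '' > foo.dat

# Basic regex: \| means OR (GNU extension), \$ is a literal dollar sign
grep '^\$\|^#' foo.dat

# Extended regex (-E): | needs no backslash
grep -E '^\$|^#' foo.dat

# Portable alternative: one -e option per pattern
grep -e '^\$' -e '^#' foo.dat
```

With the unescaped pattern ^$\|^#, the $ acts as an end-of-line anchor, so grep matches the empty line instead of the $HOME line. That pattern combined with -v is a common way to hide blank lines and comments.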

Hope that will help someone :)

May 21, 2008

How to tar and untar files in UNIX?

Filed under: How To, Shell Script | tdas @ 3:14 am

A few years ago, during my undergraduate degree, I was asked to compress my assignment using tar and submit it. I was so scared of the whole tar + compress + Unix thing that I ended up NOT submitting the assignment :O. When I look back now, I feel so stupid. Anyway, now that I know a little bit more about tar and untar, I’d like to share what I know and hopefully keep someone else from NOT submitting an assignment :P

Basically, tar can be used to group multiple files/directories into one single file (an archive), and to extract the individual files back out of an archive created by tar.

* To group multiple files: tar -cvf foo.tar a.dat b.dat c.dat (this groups a.dat, b.dat and c.dat into one file, foo.tar)
c = create a new archive
v = verbose (list each file as it is processed; nothing important :P)
f = use the archive file named by the next argument (here foo.tar)

That’s all you need to know to tar (group) a bunch of files/directories.

* To tar files and gzip them: tar -czf foo.tar.gz *.dat (this creates a gzip-compressed tar file named foo.tar.gz containing all files with a .dat suffix in the current directory; z = compress with gzip)

* To untar (extract) files from a tar archive: tar -xvf foo.tar (x = extract; this restores the three separate files a.dat, b.dat and c.dat)

* To untar (extract) a gzipped tar archive: tar -xzf foo.tar.gz

* To untar a bzip2-compressed (.bz2) tar archive: tar -xjf foo.tar.bz2 (j = decompress with bzip2)
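The sketch below runs a complete round trip in a scratch directory with made-up file contents: create a gzipped archive, list it with t (a flag not covered above), extract it into a separate directory with -C, and check that the copies match. It assumes a tar with gzip support; GNU tar and the BSD tar shipped with macOS both qualify.

```shell
#!/bin/sh
# Scratch directory and made-up files, so nothing of yours is touched
cd "$(mktemp -d)" || exit 1
printf 'alpha\n' > a.dat
printf 'beta\n'  > b.dat
printf 'gamma\n' > c.dat

# c = create, z = gzip, f = archive name follows
tar -czf foo.tar.gz a.dat b.dat c.dat

# t = list the archive's contents without extracting anything
tar -tzf foo.tar.gz

# x = extract; -C switches into the target directory first
mkdir restored
tar -xzf foo.tar.gz -C restored

# The extracted copies should be byte-for-byte identical to the originals
for f in a.dat b.dat c.dat; do
    cmp -s "$f" "restored/$f" && echo "$f OK"
done
```

Listing with t before extracting is a cheap way to see what an archive will unpack, and where, before it writes anything.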

May 13, 2008

How To Flush/Clear Squid Cache

Filed under: How To, Shell Script | tdas @ 7:29 pm

Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Unlike traditional caching software, Squid handles all requests in a single, non-blocking, I/O-driven process.

Sometimes we need to clear the contents of the cache and restart the program. Clearing the Squid cache is dead simple:

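Go to the directory where the Squid init script resides (e.g. /etc/init.d/) and run:
./squid flush

You need root (su) privileges to perform this operation. The actions an init script accepts vary between distributions, so check your script if it does not recognise flush.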

Reference:
Squid Cache

February 17, 2008

How To Extract lines from a file in Unix?

Filed under: How To, Shell Script | tdas @ 8:27 pm

To extract a range of lines from a text file in Unix, we can combine head and tail.

head -n 20 file.dat | tail -n 11 # this gives us lines 10 to 20 of file.dat (the last 11 of the first 20 lines)

Another elegant and easy way to extract a range of lines from a text file in Unix is sed.

sed -n '10,20p' file.dat > output.dat # this also extracts lines 10 to 20 of file.dat, into output.dat
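To make the range reusable, the sketch below wraps both approaches in shell functions. The names extract_lines and extract_lines_ht are hypothetical, and the ;${end}q part is an optional addition: it makes sed stop reading as soon as the last wanted line has been printed, which helps on large files.

```shell
#!/bin/sh
# extract_lines START END FILE: print lines START..END (inclusive).
# Both function names here are hypothetical helpers.
extract_lines() {
    start=$1 end=$2 file=$3
    # p prints the range; q stops reading once line END has been printed
    sed -n "${start},${end}p;${end}q" "$file"
}

# Same result with head/tail: the first END lines, then the last END-START+1 of those
extract_lines_ht() {
    head -n "$2" "$3" | tail -n "$(( $2 - $1 + 1 ))"
}

# Made-up sample file: the numbers 1 to 100, one per line
cd "$(mktemp -d)" || exit 1
seq 1 100 > file.dat
extract_lines 10 20 file.dat > output.dat
```

Here output.dat ends up holding the 11 lines 10 through 20.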

How To Convert Lower Case to Upper Case and Vice Versa in Unix?

Filed under: How To, Shell Script | tdas @ 6:32 pm

Converting lower case characters to upper case and vice versa is a fairly common task. In Unix this is easy to do with the tr command, which reads standard input and writes standard output.

To convert a file containing lower case characters to upper case:

tr '[:lower:]' '[:upper:]' < foo.dat # note: this changes everything to upper case

To convert a file containing upper case characters to lower case:

tr '[:upper:]' '[:lower:]' < foo.dat # note: this changes everything to lower case
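A runnable sketch with made-up sample text. Because tr only writes to standard output, the results are redirected into new files. The last command shows one way (my assumption, not from the original post) to convert a file in place by going through a temporary file.

```shell
#!/bin/sh
# Scratch directory with a made-up foo.dat
cd "$(mktemp -d)" || exit 1
printf 'Hello, World 42\n' > foo.dat

tr '[:lower:]' '[:upper:]' < foo.dat > foo_upper.dat   # HELLO, WORLD 42
tr '[:upper:]' '[:lower:]' < foo.dat > foo_lower.dat   # hello, world 42

# Do NOT write "< foo.dat > foo.dat": the shell truncates foo.dat before
# tr reads it. To convert in place, go through a temporary file instead:
tr '[:lower:]' '[:upper:]' < foo.dat > foo.tmp && mv foo.tmp foo.dat
```

Digits and punctuation pass through unchanged; only letters are mapped.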

February 11, 2008

Unix Sort

Filed under: Shell Script | tdas @ 3:20 am

The Unix sort command is one of the most useful and powerful commands I have ever used. Below I have listed some cool things you can do with it:

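Sort and write the result back to the same file (-o): sort -o foo.dat foo.dat (unlike sort foo.dat > foo.dat, which empties foo.dat before sort reads it, -o is safe to use with the input file)

Sort and keep only unique values (-u): sort -u -o foo.dat foo.dat

Sort numerically (-n): sort -n -o foo.dat foo.dat

Sort numerically in reverse order (-r): sort -n -r -o foo.dat foo.dat

Union of two files: sort file1 file2 | uniq > file3

Intersection of two files: sort file1 file2 | uniq -d > file3 (this assumes neither file contains duplicate lines of its own; otherwise run sort -u on each file first)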
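The sketch below uses small made-up files to show why -n matters and why the intersection recipe needs duplicate-free inputs (apple appears twice in file1 but never in file2). comm -12, which prints the lines common to two sorted files, is a swapped-in alternative that the post itself does not mention.

```shell
#!/bin/sh
# Scratch directory with made-up input files
cd "$(mktemp -d)" || exit 1

# -n compares numbers, not text
printf '%s\n' 10 9 100 | sort        # text order: 10, 100, 9
printf '%s\n' 10 9 100 | sort -n     # numeric order: 9, 10, 100
printf '%s\n' 10 9 100 | sort -n -r  # numeric, reversed: 100, 10, 9

printf '%s\n' apple apple banana cherry > file1   # apple repeats within file1
printf '%s\n' banana cherry date > file2

# Union: every distinct line from either file
sort -u file1 file2 > union.txt

# Intersection: de-duplicate each file first, otherwise "apple" would be
# reported as common just because it repeats inside file1
sort -u file1 > f1.sorted
sort -u file2 > f2.sorted
sort f1.sorted f2.sorted | uniq -d > intersection.txt

# comm -12 prints only the lines present in both sorted files
comm -12 f1.sorted f2.sorted
```

Here union.txt holds apple, banana, cherry and date, while both intersection.txt and the comm output hold banana and cherry.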

February 3, 2008

Speed up Grep

Filed under: Shell Script | tdas @ 5:02 pm

GNU grep is very slow in the UTF-8 locale; it can be orders of magnitude faster in the C locale. To check your current locale, type locale at the shell prompt. The output looks something like this:

    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

In the above example, my locale is en_US.UTF-8. If you are grepping very large files, you can greatly improve the speed by changing the locale to C. In bash, you would type: export LC_ALL=C

Then type locale again; the output should now look something like this:

    LANG=en_US.UTF-8
    LC_CTYPE="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_COLLATE="C"
    LC_MONETARY="C"
    LC_MESSAGES="C"
    LC_PAPER="C"
    LC_NAME="C"
    LC_ADDRESS="C"
    LC_TELEPHONE="C"
    LC_MEASUREMENT="C"
    LC_IDENTIFICATION="C"
    LC_ALL=C

Future versions of grep are expected to address this issue. Until then, use the C locale with grep. If you frequently use grep to search large text files, add the export line to your .bash_profile.
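A minimal sketch of a narrower alternative, assuming a POSIX shell. Prefixing a single command with LC_ALL=C changes the locale for that command only. This avoids the side effects of exporting LC_ALL=C for the whole session, which also changes things like sort order for every other program. The helper name cgrep and the generated sample file are hypothetical. To compare speed on your own data, run both grep lines under bash's time keyword; the size of the difference depends on your grep version.

```shell
#!/bin/sh
# cgrep (hypothetical name): grep with the C locale for this one call only.
# The LC_ALL=C prefix applies to the grep process, not to the calling shell.
cgrep() {
    LC_ALL=C grep "$@"
}

# Made-up sample data: "line 1" ... "line 200000"
cd "$(mktemp -d)" || exit 1
seq 1 200000 | sed 's/^/line /' > big.txt

grep -c 'line 1999' big.txt     # current locale; prints 111
cgrep -c 'line 1999' big.txt    # C locale; same count, prints 111
```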

Grep – The Magic Unix Command

Filed under: Shell Script | tdas @ 4:50 pm

I have been using the grep command for a while now, and I have been in awe of it ever since. Assuming readers have some basic knowledge of Unix commands and of grep in general, I would like to mention a couple of really handy grep features.

Printing only the match: Imagine you want to extract a specific pattern from a text file, but you do not want the rest of the matching line. Then use grep -o, which prints only the matched part of each line. For example, to extract the domain name from the URL http://www.cs.dal.ca/abc/report.html?report=34A, use the following command: echo 'http://www.cs.dal.ca/abc/report.html?report=34A' | grep -o 'www\.[^/]*' which returns www.cs.dal.ca.
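A self-contained version of the example above. The second command is a hypothetical variant, not from the original post, for URLs whose host does not start with www.: it matches from :// up to the next slash, then strips those first three characters with cut.

```shell
#!/bin/sh
url='http://www.cs.dal.ca/abc/report.html?report=34A'

# Only the matched text is printed, not the whole line
printf '%s\n' "$url" | grep -o 'www\.[^/]*'              # www.cs.dal.ca

# Variant: match "://host", then drop the leading "://" (characters 1 to 3)
printf '%s\n' "$url" | grep -oE '://[^/]+' | cut -c4-    # www.cs.dal.ca
```

Escaping the dot as \. matters: an unescaped . matches any character.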

Looking at adjacent lines: Often, when searching a text file with grep, we also want to see the lines around each match. grep supports this with the -A (after), -B (before) and -C (context on both sides) options.

grep -A 5 '^abc' file.dat # returns every line starting with abc in file.dat, plus the 5 lines after it

grep -B 5 '^abc' file.dat # returns every line starting with abc in file.dat, plus the 5 lines before it

grep -C 5 '^abc' file.dat # returns every line starting with abc in file.dat, plus the 5 lines before and after it
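A quick way to see exactly what -A, -B and -C return is to run them on a file of numbered lines. The sketch below generates one with seq (the sample data is made up). The last command also shows the -- separator that GNU grep prints between groups of context lines that do not touch.

```shell
#!/bin/sh
# Made-up sample: file.dat holds the numbers 1 to 20, one per line
cd "$(mktemp -d)" || exit 1
seq 1 20 > file.dat

grep -A 2 '^5$' file.dat    # lines 5, 6, 7 (match + 2 after)
grep -B 2 '^5$' file.dat    # lines 3, 4, 5 (2 before + match)
grep -C 1 '^5$' file.dat    # lines 4, 5, 6 (1 on each side)

# Two separate matches: non-adjacent groups are separated by a "--" line
grep -A 1 -e '^3$' -e '^10$' file.dat    # 3, 4, --, 10, 11
```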

Hopefully, these tricks will be of help to someone :)
