To be or not to be

February 3, 2008

Speed up Grep

Filed under: Shell Script, by tdas @ 5:02 pm

GNU grep can be very slow in a UTF-8 locale, and is often orders of magnitude faster in the C locale. To check your current
locale, type the following at the shell prompt: locale

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

In the example above, my locale is en_US.UTF-8. If you are
grepping very large files, you can greatly speed up the search by changing
the locale to C. In bash, type: export LC_ALL=C

Then type locale again; the output should look something like this:

LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
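
If you would rather not change the locale for the whole shell session, the variable can also be set for a single command. The snippet below is only a sketch: the sample file and the search pattern are made up for illustration, and it assumes mktemp is available on your system. Keep in mind that in the C locale grep works on bytes, so case-insensitive matching of non-ASCII text may behave differently than in UTF-8.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf 'Hello World\nhello again\nGoodbye\n' > "$sample"

# Putting the assignment in front of the command sets LC_ALL for this
# one grep process only; the shell's own locale is left untouched.
matches=$(LC_ALL=C grep -ic 'hello' "$sample")
echo "matching lines: $matches"   # prints "matching lines: 2"

# Clean up the temporary file.
rm -f "$sample"
```

Running locale afterwards should show the same values as before, because nothing was exported.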

Future versions of grep are planned to address this issue. Until then,
use the C locale with grep. If you frequently use grep to search large text files, add the export line above to your .bash_profile.
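
Exporting LC_ALL=C in .bash_profile changes the locale for every program started from that shell, not only grep (sort order and message language are affected too). A narrower alternative, sketched here, is a small wrapper function. The name cgrep is my own invention for this example, not a standard command.

```shell
# Hypothetical helper for ~/.bash_profile: run grep in the C locale
# without touching the locale of the rest of the session.
cgrep() {
    # "$@" forwards all options, patterns and file names unchanged.
    LC_ALL=C grep "$@"
}
```

It takes the same arguments as grep. To see whether the change matters on your system, compare time grep -i pattern bigfile with time cgrep -i pattern bigfile, where pattern and bigfile stand in for your own search string and file.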

4 Comments

  1. Thanks. Works great

    Comment by user — July 1, 2009 @ 6:24 am

  2. Helps a lot.

    Comment by Arne v.Irmer — October 14, 2010 @ 9:11 am

  3. Thank you! I’d been wondering for months why grep was running so slowly. Changing the locale to C really speeds it up, especially for case insensitive (grep -i) searches.

    An example on my code tree:

    en_US.UTF-8: 30.749s
    C: 0.469s

    Comment by Woot — October 30, 2010 @ 4:43 am

  4. Also, I just noticed that this has been fixed in grep 2.7, released Sep 20, 2010. Release announcement: http://savannah.gnu.org/forum/forum.php?forum_id=6521

    Comment by Woot — October 30, 2010 @ 5:16 am

