GNU grep is very slow in the UTF-8 locale. It is orders of magnitude faster in the C locale. To check your current
locale, type the following at the shell prompt: locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
In the above example, my locale is en_US.UTF-8. If you are
grep’ing very large files, you can greatly improve the speed by changing
the locale to C. In bash, you would type: export LC_ALL=C
Then type locale again. The output should look something like this:
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
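If you would rather not change the locale for the whole session, you can also set LC_ALL for a single command by prefixing it. The following is only a minimal sketch: the sample file and the search pattern are made up for illustration, and your real files and patterns will differ.

```shell
# Create a small sample file (hypothetical data, for illustration only).
sample=$(mktemp)
printf 'Hello World\nfoo bar\nHELLO again\n' > "$sample"

# Prefixing the command with LC_ALL=C sets the locale for this one
# grep invocation only; the rest of the shell session is unaffected.
# -i ignores case, -c prints the number of matching lines.
matches=$(LC_ALL=C grep -ic 'hello' "$sample")
echo "$matches"

# Clean up the temporary file.
rm -f "$sample"
```

To see what difference it makes on your own data, run the same search with and without the LC_ALL=C prefix under the shell's time keyword.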
A future version of grep is planned to address this issue. Until then,
use the C locale with grep. If you frequently use grep to search large text files, you should add export LC_ALL=C to your .bash_profile.
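One possible alternative to exporting LC_ALL=C globally is a small wrapper function in .bash_profile, so that only grep runs in the C locale. This is a sketch under assumptions: the function name cgrep is hypothetical (not a standard command), and it assumes grep is on your PATH.

```shell
# Hypothetical helper for ~/.bash_profile: run grep in the C locale
# without changing the locale of the rest of the session.
cgrep() {
    # The LC_ALL=C prefix applies to this grep invocation only.
    # "$@" passes all options and file names through unchanged.
    LC_ALL=C grep "$@"
}

# Example use: count lines matching "abc", ignoring case.
count=$(printf 'abc\nABC\nxyz\n' | cgrep -ic 'abc')
echo "$count"
```

After reloading your profile, you would call cgrep exactly as you would call grep.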
Thanks. Works great
Comment by user — July 1, 2009 @ 6:24 am
Helps a lot.
Comment by Arne v.Irmer — October 14, 2010 @ 9:11 am
Thank you! I’d been wondering for months why grep was running so slowly. Changing the locale to C really speeds it up, especially for case insensitive (grep -i) searches.
An example on my code tree:
en_US.UTF-8: 30.749s
C: 0.469s
Comment by Woot — October 30, 2010 @ 4:43 am
Also, I just noticed that this has been fixed in grep 2.7, released Sep 20, 2010. Release announcement: http://savannah.gnu.org/forum/forum.php?forum_id=6521
Comment by Woot — October 30, 2010 @ 5:16 am