GNU grep is very slow in a UTF-8 locale and can be orders of magnitude faster in the C locale. To check your current
locale, type the following at the shell prompt: locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
In the above example, my locale is en_US.UTF-8. If you are
grepping very large files, you can greatly improve the speed by changing
the locale to C. In bash, you would type: export LC_ALL=C
Then type locale again; the display should look something like this:
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
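If you would rather not change the locale for the whole session, bash also lets you set LC_ALL for a single command by putting the assignment in front of it. Here is a minimal sketch; the sample file and the pattern are made up for illustration, so substitute your own large file.

```shell
# Create a small throwaway sample file so the example is self-contained.
# (In practice this would be your own large file.)
sample_file="$(mktemp)"
printf 'alpha\nbeta\nalpha beta\n' > "$sample_file"

# The LC_ALL=C prefix applies to this one grep invocation only;
# the locale of the shell itself is left unchanged.
match_count="$(LC_ALL=C grep -c 'alpha' "$sample_file")"
echo "$match_count"

# Clean up the sample file.
rm -f "$sample_file"
```

To see whether it makes a difference on your system, put time in front of both forms (time grep ... and time LC_ALL=C grep ...) on the same large file. Keep in mind that in the C locale grep works on bytes rather than multibyte characters, so patterns containing non-ASCII characters may match differently.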
Future versions of grep are planned to address this issue. Until then,
use the C locale with grep. If you frequently use grep to search large text files, you should add the export LC_ALL=C line to your .bash_profile.
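As a rough sketch, the .bash_profile entry could look like either of the two options below. The helper name cgrep is my own invention, not something grep or bash provides; it is only there for cases where you want the rest of your session to stay in the UTF-8 locale.

```shell
# Option 1: switch every program started from this shell to the C locale.
# Uncomment to use.
# export LC_ALL=C

# Option 2 (hypothetical helper; the name cgrep is an assumption):
# only grep runs in the C locale, everything else keeps its locale.
# "command" bypasses any alias or function that is also named grep.
cgrep() {
    LC_ALL=C command grep "$@"
}
```

After reloading the profile (for example with: source ~/.bash_profile), cgrep takes the same arguments as grep, e.g. cgrep -c pattern bigfile.txt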