Wednesday, October 20, 2010

Lack of a string-trimming function in Javascript…

I love using JavaScript in my web applications to reduce network traffic by minimizing server requests. Today, I noticed a strange and frustrating gap in the JavaScript API. There is no method I can rely on to remove leading and trailing whitespace, even though most other scripting languages provide one. ECMAScript 5 does define "String.prototype.trim()", but older engines such as the one in Internet Explorer 8 do not implement it. How could the designers and vendors leave out such a simple "trim()" function for so long?
As a follower of the “Bhagavad-Gita”, I realized that pointing out a mistake without making any effort to provide a solution is against “Dharma”. Since I could not rely on a native trim method, I decided to write my own and add it to "String.prototype" so that it is available on every string. I tried to write code that runs efficiently, and out of curiosity I wrote several versions using different approaches. Doing so taught me a few interesting facts that most of us ignore, and as a result we end up writing lengthy, crude code that performs poorly.
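The functions below are written as standalone "trim(input_string)" functions. Attaching one to "String.prototype", as described above, could look like the following sketch. It uses a hypothetical helper name, "trimFallback", and reuses the two-regex approach from the first method. It installs the fallback only when the engine lacks a native "trim()" (ECMAScript 5 engines already have one), so a built-in method is never overwritten.

```javascript
// Hypothetical helper name: trimFallback. It uses the two-regex approach
// (leading run, then trailing run) from the first method in this post.
function trimFallback() {
  // Inside a prototype method, `this` may be a boxed String object;
  // String(this) converts it back to a primitive string.
  return String(this).replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

// Feature detection: only install the fallback where the engine has no
// native trim (for example, Internet Explorer 8).
if (typeof String.prototype.trim !== 'function') {
  String.prototype.trim = trimFallback;
}

// Usage: every string now has .trim(), whether native or fallback.
var author = '   Omkar  ';
console.log('[' + author.trim() + ']'); // prints "[Omkar]"
```

Assigning with "=" keeps the sketch working in pre-ECMAScript 5 engines, which are exactly the ones that need the fallback.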

  • Using only regular expressions:
    Here we use pure regular expressions to perform trim(). You might wonder why there is a conditional statement in the middle. The regular expression in the “if” branch works best for short strings, whereas the pair in the “else” branch is faster on long strings, where efficiency matters most. The speed comes largely from optimizations inside JavaScript regex engines that these two separate regexes trigger: specifically, the pre-check for a required character and the start-of-string anchor optimization, possibly among others.
    function trim(input_string)
    {
        if (input_string.length < 25)
        {
            /*
               Short input: I treat a string as short if it has fewer
               than 25 characters. The threshold of 25 was chosen arbitrarily.
            */
            return input_string.replace(/^\s*([\S\s]*?)\s*$/, '$1');
        }
        else
        {
            /*
               Long input: 25 or more characters. Strip the leading run,
               then the trailing run, with two separate regexes.
            */
            return input_string.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
        }
    }
  • No regular expressions

    I wondered how an implementation that used no regular expressions would perform. Here's what I tried. With long strings it outperformed the other approaches, even the first one described above. That doesn't mean regular expressions are poor performers. A regex such as /\s\s*$/ is still matched by scanning forward from the start of the string, so it has no direct way to jump to the last character. This approach instead uses a second loop that starts directly at the last character and works backwards until it finds a non-whitespace character.


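    function trim(input_string){
        // Characters treated as whitespace. The list must begin with an
        // ordinary space (' '); otherwise normal spaces are never trimmed.
        var whitespace = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';
        // Forward scan: drop everything before the first non-whitespace character.
        for (var i = 0; i < input_string.length; i++)
        {
            if (whitespace.indexOf(input_string.charAt(i)) === -1)
            {
                input_string = input_string.substring(i);
                break;
            }
        }
        // Backward scan: start at the last character and drop trailing whitespace.
        for (i = input_string.length - 1; i >= 0; i--)
        {
            if (whitespace.indexOf(input_string.charAt(i)) === -1)
            {
                input_string = input_string.substring(0, i + 1);
                break;
            }
        }
        // If neither loop found a non-whitespace character, the input was
        // empty or all whitespace, so return an empty string.
        return whitespace.indexOf(input_string.charAt(0)) === -1 ? input_string : '';
    }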
  • Mixed approach

    Here, I experimented with a mixed implementation that combines the best of both worlds: regular expressions are efficient at trimming leading whitespace, and the second loop of the previous method is efficient at removing trailing whitespace. A further benefit of this approach is that it uses the least code while giving the best performance of the three.
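    function trim(input_string)
    {
        // Leading whitespace: a single anchored regex strips it efficiently.
        input_string = input_string.replace(/^\s+/, '');
        // Trailing whitespace: walk backwards from the last character
        // until the first non-whitespace character is found.
        for (var i = input_string.length - 1; i >= 0; i--)
        {
            if (/\S/.test(input_string.charAt(i)))
            {
                input_string = input_string.substring(0, i + 1);
                break;
            }
        }
        return input_string;
    }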
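All three methods share the name "trim", so comparing them side by side requires renaming them. The following sketch uses hypothetical names ("trimTwoRegex", "trimNoRegex", "trimMixed", plus helpers "checkAgreement" and "timeIt"). It first checks that the three produce identical output, and that they also match the native "trim()" where the engine has one. It then times each one on a long string. Two assumptions are worth stating. First, the whitespace list in the no-regex copy begins with an ordinary space. Second, that list does not cover exactly the same characters as "\s" (it includes U+200B, which "\s" does not match), so the sample inputs stick to common whitespace. Timings measured with "Date" are coarse and vary by browser and machine, so treat them as indicative only.

```javascript
// Method 1: two regexes, switching on the arbitrary 25-character threshold.
function trimTwoRegex(str) {
  if (str.length < 25) {
    return str.replace(/^\s*([\S\s]*?)\s*$/, '$1');
  }
  return str.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

// Method 2: no regexes; scan forward, then backward, against a whitespace list.
var WHITESPACE = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005\u2006' +
  '\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';
function trimNoRegex(str) {
  for (var i = 0; i < str.length; i++) {
    if (WHITESPACE.indexOf(str.charAt(i)) === -1) {
      str = str.substring(i);
      break;
    }
  }
  for (i = str.length - 1; i >= 0; i--) {
    if (WHITESPACE.indexOf(str.charAt(i)) === -1) {
      str = str.substring(0, i + 1);
      break;
    }
  }
  return WHITESPACE.indexOf(str.charAt(0)) === -1 ? str : '';
}

// Method 3: regex for the leading run, backward loop for the trailing run.
function trimMixed(str) {
  str = str.replace(/^\s+/, '');
  for (var i = str.length - 1; i >= 0; i--) {
    if (/\S/.test(str.charAt(i))) {
      return str.substring(0, i + 1);
    }
  }
  return str; // empty after stripping the leading run
}

// Sample inputs, including one long enough to take Method 1's "else" branch.
var longText = new Array(500).join('lorem ipsum ');
var samples = ['', '   ', 'abc', '  abc  ', '\t\n abc def \r\n', '\xa0abc\xa0',
  '  ' + longText + '  '];

// Correctness first: timings mean nothing if the outputs differ.
function checkAgreement() {
  for (var s = 0; s < samples.length; s++) {
    var expected = trimTwoRegex(samples[s]);
    if (trimNoRegex(samples[s]) !== expected ||
        trimMixed(samples[s]) !== expected ||
        (typeof ''.trim === 'function' && samples[s].trim() !== expected)) {
      return false;
    }
  }
  return true;
}

// Rough timing: call one implementation repeatedly and return elapsed milliseconds.
function timeIt(fn, input, iterations) {
  var start = new Date().getTime();
  for (var k = 0; k < iterations; k++) {
    fn(input);
  }
  return new Date().getTime() - start;
}

console.log('implementations agree: ' + checkAgreement());
var impls = { trimTwoRegex: trimTwoRegex, trimNoRegex: trimNoRegex, trimMixed: trimMixed };
for (var implName in impls) {
  console.log(implName + ': ' + timeIt(impls[implName], samples[samples.length - 1], 500) + ' ms');
}
```

In a modern engine the first line printed should be "implementations agree: true". Older engines whose "\s" does not match the non-breaking space (U+00A0) can report a disagreement on that sample.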
Please let me know if you find any mistakes.
K Chandrasekhar Omkar,
kcomkar@gmail.com