I was looking into a problem today that came down to choosing between a greedy and a non-greedy regex rule to get the result I wanted.
Say we have this text that we want to filter:
<a href="http://www.davidtan.org">Home</a>
<a href="http://www.davidtan.org/index.php">Index PHP Home</a>
<a href="http://www.davidtan.org/index.php">Index HTML Home</a>
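and we use one of these two rules (the @ characters are just the pattern delimiters, as used with PHP's preg functions):
Regex Rule 1 (Greedy): @<a [^>]+>(.*)</a>@
Regex Rule 2 (Non-greedy): @<a [^>]+>(.*?)</a>@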
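Using the greedy rule, there will be only one match: it starts at the first <a ...> tag and ends at the last </a>, because the rule matches everything up to the last </a> it can see. The * is greedy; therefore, the .* portion of the regex will match as much as it can while still allowing the remainder of the regex to match. Note that this assumes the text sits on a single line or that . is allowed to match newlines (for example, with the s modifier). By default . does not match a newline, so with one link per line .* stops at the end of each line and every line matches on its own.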
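To see the greedy behaviour in practice, here is a minimal sketch using Python's re module (picked only for illustration; the rules above are written with @ delimiters, which Python does not use). The sample text is retyped with straight quotes and one well-formed link per line, and the variable names are my own assumptions.

```python
import re

# Sample links, one per line (retyped with straight quotes for this sketch).
text = (
    '<a href="http://www.davidtan.org">Home</a>\n'
    '<a href="http://www.davidtan.org/index.php">Index PHP Home</a>\n'
    '<a href="http://www.davidtan.org/index.php">Index HTML Home</a>'
)

# Rule 1 (greedy) without the @ delimiters.
# re.DOTALL lets "." match newlines, so ".*" can run across all three lines.
greedy_dotall = re.findall(r'<a [^>]+>(.*)</a>', text, re.DOTALL)

# Same rule without re.DOTALL: "." stops at each newline,
# so ".*" can never reach past the end of the current line.
greedy_per_line = re.findall(r'<a [^>]+>(.*)</a>', text)

print(len(greedy_dotall))    # 1
print(len(greedy_per_line))  # 3
```

With re.DOTALL the single captured group runs from "Home" on the first line all the way to "Index HTML Home" on the last, swallowing the tags in between. Without it, the greedy rule still finds three separate matches because each link is on its own line.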
The non-greedy rule, however, returns three matches, as there are three pairs of <a> and </a>. The .*? portion attempts to match the least amount of data while still allowing the remainder of the regex to match, so each match ends at the first </a> that follows its opening tag.
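The non-greedy rule can be tried the same way. This is again only a sketch in Python's re module, with the sample text retyped using straight quotes and with variable names that are my own assumptions. re.DOTALL is switched on so that . may cross newlines, which shows that it is the ? after * that keeps each match short, not the line breaks.

```python
import re

# Sample links, one per line (retyped with straight quotes for this sketch).
text = (
    '<a href="http://www.davidtan.org">Home</a>\n'
    '<a href="http://www.davidtan.org/index.php">Index PHP Home</a>\n'
    '<a href="http://www.davidtan.org/index.php">Index HTML Home</a>'
)

# Rule 2 (non-greedy) without the @ delimiters.
# ".*?" grows one character at a time and stops at the first "</a>" it reaches.
lazy_matches = re.findall(r'<a [^>]+>(.*?)</a>', text, re.DOTALL)

for link_text in lazy_matches:
    print(link_text)
# Home
# Index PHP Home
# Index HTML Home
```

Each captured group is the text of exactly one link, which is usually what you want when pulling anchors out of a page.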
Regular expressions (regex) are not hard. We just need more practice.