Sunday, July 19, 2009

I will never understand regular expressions

Not at this rate, anyway. Consider the following code snippet (tested in Firefox 3.5). You can probably see it's an attempt to find empty (self-closing) HTML tags and capture their attributes. It ... does not work.

var pattern = /\<([a-zA-Z0-9_\:]+)((?: [a-zA-Z0-9_\:]+=".*?")*?)\/>/;
var match = pattern.exec('<span class="noread nodeleted"><input class="edit" size="40" ui:id="set_title" value="awake"/></span>');
inonit.debug.debug("Match: " + match[2]);


The output is:
class="noread nodeleted"><input class="edit" size="40" ui:id="set_title" value="awake"

Why?!? Shouldn't the non-greedy quantifier prevent us from capturing everything between the first and last quotation marks?

I messed with this for far too long, until I stumbled across a solution:

var pattern = /\<([a-zA-Z0-9_\:]+)((?: [a-zA-Z0-9_\:]+="[^"]*?")*?)\/>/;
var match = pattern.exec('<span class="noread nodeleted"><input class="edit" size="40" ui:id="set_title" value="awake"/></span>');
inonit.debug.debug("Match: " + match[2]);


The output for this?
class="edit" size="40" ui:id="set_title" value="awake"

Which is, of course, what I'm looking for. But why do I have to specify that I'm not matching the double quote character?
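
A likely explanation, offered as a sketch rather than a definitive answer: a non-greedy quantifier only prefers the shortest match. It is still allowed to grow whenever the rest of the pattern would otherwise fail, and the engine always reports the leftmost place where the whole pattern can succeed. The snippet below reruns both patterns and prints where each match starts. It uses console.log in place of inonit.debug.debug, and the variable names are made up for illustration.

```javascript
var input = '<span class="noread nodeleted"><input class="edit" size="40" ui:id="set_title" value="awake"/></span>';

// The same two patterns as above: ".*?" versus "[^"]*?" inside the quotes.
var lazyDot = /\<([a-zA-Z0-9_\:]+)((?: [a-zA-Z0-9_\:]+=".*?")*?)\/>/;
var noQuote = /\<([a-zA-Z0-9_\:]+)((?: [a-zA-Z0-9_\:]+="[^"]*?")*?)\/>/;

var first = lazyDot.exec(input);
var second = noQuote.exec(input);

// With ".*?" the match begins at the very first "<", so the tag name is "span".
// The lazy dot is stretched across quotes until "/>" can finally match.
console.log(first.index, first[1]);

// With "[^"]*?" the attempt at "<span" fails outright (a ">" follows the
// closing quote), so the engine moves on and matches at "<input".
console.log(second.index, second[1]);
```

If this reading is right, the first pattern is not ignoring the non-greedy quantifier. It is matching a "tag" named span whose single attribute value runs all the way to the last quotation mark, because that is the only way a match starting at the first "<" can end in "/>".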

Maybe I'm just dumb and I'm missing something about non-greedy quantifiers. But then why does this work?

var pattern = /"(.*?)"/;
var match = pattern.exec('"Hello" "World"');
inonit.debug.debug("Match: " + match[1]);


Output:
Hello, of course.

Makes no sense to me.
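
One possible way to make sense of it: the simple pattern works because nothing has to match after the closing quote, so the lazy dot never has a reason to grow. The sketch below is a hypothetical variation (the trailing "!" and the variable names are not from the original test) that requires something after the closing quote, which seems to reproduce the surprising behavior in miniature.

```javascript
// Same idea as the "Hello" "World" test, but the pattern now demands a "!"
// right after the closing quote.
var text = '"Hello" "World"!';

var lazy = /"(.*?)"!/.exec(text);
var strict = /"([^"]*?)"!/.exec(text);

// The lazy dot starts at the first quote and is stretched across the inner
// quotes until the "!" can match.
console.log(lazy[1]);   // Hello" "World

// Excluding the quote character makes the attempt at the first quote fail,
// so the engine restarts further right.
console.log(strict[1]); // World
```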
