These quantifiers are greedy - that is your pattern will try to match as much text as possible. Sometimes it presents a problem. Let's consider a typical example - define a pattern to match delimited text, i.e. text enclosed in quotes, brackets, etc. Since we don't know what kind of text is inside the quotes we'll use
/".*"/
But this pattern will match everything between the first " and the last " in the following line:
this file is normally "$VIM/.gvimrc". You can check this with ":version".
This problem can be resolved by using non-greedy quantifiers:
Quantifier
Description
\{-}
matches 0 or more of the preceding atom, as few as possible
\{-n,m}
matches 1 or more of the preceding characters...
\{-n,}
matches at lease or more of the preceding characters...
\{-,m}
matches 1 or more of the preceding characters...
where n and m are positive integers (>0)
Let's use \{-} in place of * in our pattern. So, now ".\{-}" will match the first quoted text:
this file is normally "$VIM/gvimrc". You can check this with ":version".
.\{-} pattern is not without surprises. Look what will happen to the following text after we apply:
:s:.\{-}:_:g
Before:
n and m are decimal numbers between
After:
_n_ _a_n_d_ _m_ _a_r_e_ _d_e_c_i_m_a_l_ _n_u_m_b_e_r_s_ _b_e_t_w_e_e_n_
"As few as possible" applied here means zero character replacements. However match does occur between characters! To explain this behavior I quote Bram himself:
Matching zero characters is still a match. Thus it will replace zero characters with a "_". And then go on to the next position, where it will match again.
It's true that using "\{-}" is mostly useless. It works this way to be consistent with "*", which also matches zero characters. There are more useless ones: "x\{-1,}" always matches one x. You could just use "x". More useful is something like "x\{70}". The others are just consistent behavior: ..., "x\{-3,}", "x\{-2,}", "x\{-1,}.
Thursday, April 16, 2009
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment