Notes on Apache URL rewriting and regular expressions
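What is mod_rewrite?
mod_rewrite is an Apache module that lets the server modify the URL of an incoming request. The incoming URL is compared against a series of rules. Each rule contains a regular expression that detects a particular pattern. If that pattern is found in the URL and the rule's preset conditions are met, the matched pattern is replaced with a preset string or triggers a preset action.
This process continues until no unprocessed rules remain or until it is explicitly stopped.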
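This can be summarized in three points:
1. There is a set of rules, and they are processed in order.
2. If a rule matches, the conditions attached to that rule are checked as well.
3. If every check comes back as "go", a substitution or another action is carried out.
Benefits of mod_rewrite
Some of the benefits are fairly obvious, while others are less so: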
mod_rewrite is very commonly used to turn ugly, hard-to-understand URLs into so-called "friendly" or "clean" URLs.
In addition, the rewritten URLs are search-engine friendly.
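The rule-processing loop described above (ordered rules, a condition check per rule, then a substitution) can be sketched in Python. This is only an illustration of the idea, not how Apache implements it: the rule tuples, the `rewrite` function, and the URLs are all hypothetical, and Python's `re` dialect differs slightly from the PCRE engine Apache uses.

```python
import re

# Hypothetical rule set: (pattern, substitution, condition, stop_after_match).
# None of these names come from Apache; they only mimic the three-step idea.
RULES = [
    (r"^/products/(\d+)/?$", r"/index.php?page=product&id=\1",
     lambda url: True, True),
    (r"^/about/?$", r"/index.php?page=about",
     lambda url: True, True),
]


def rewrite(url, rules=RULES):
    """Apply the rules in order and return the (possibly rewritten) URL."""
    for pattern, substitution, condition, stop_after_match in rules:
        # Step 1: does the rule's pattern match the URL?
        if not re.search(pattern, url):
            continue
        # Step 2: are the conditions attached to this rule satisfied?
        if not condition(url):
            continue
        # Step 3: everything is "go", so perform the substitution.
        url = re.sub(pattern, substitution, url)
        if stop_after_match:
            break  # processing was explicitly stopped
    return url


clean_to_internal = rewrite("/products/42")
untouched = rewrite("/contact")
```

Here the clean URL `/products/42` is mapped to an internal script URL, while `/contact` matches no rule and passes through unchanged.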
Regular expression tokens:
\s{2,}: two or more whitespace characters
\1: backreference to the first capture group
\\: matches a literal '\'
\b: word boundary position, for example next to whitespace or at the start or end of the string
\B: not a word boundary position
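A short Python sketch of the whitespace quantifier, a backward reference, and the two boundary tokens. The sample strings are made up for illustration, and `\1` is used as the backreference syntax, which is what Python's `re` module accepts.

```python
import re

text = "the  the cat   sat"

# \s{2,} collapses every run of two or more whitespace characters.
collapsed = re.sub(r"\s{2,}", " ", text)

# \1 refers back to whatever the first group captured (a doubled word here).
doubled = re.search(r"\b(\w+) \1\b", collapsed)

# \b requires a word boundary; \B requires that there is none.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
inside_word = re.findall(r"\Bcat", "cat concat cats")
```

`whole_word` only picks up the standalone "cat", while `inside_word` only picks up the "cat" inside "concat".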
(?=ABC): positive lookahead. Matches a group after the main expression without including it in the result.
(?!ABC): negative lookahead. Specifies a group that cannot match after the main expression (i.e. if it matches, the result is discarded).
(?<=ABC): positive lookbehind. Matches a group before the main expression without including it in the result.
(?<!ABC): negative lookbehind. Specifies a group that cannot match before the main expression (i.e. if it matches, the result is discarded).
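The four lookaround forms can be compared side by side in Python. The sample string is invented for the example; note that Python's `re` requires lookbehind patterns to have a fixed width, which may not hold for every regex engine.

```python
import re

price_text = "cost: $30, weight: 30kg, tax: $5"

# Positive lookbehind: digits that come right after a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", price_text)

# Positive lookahead: digits that are immediately followed by "kg".
before_kg = re.findall(r"\d+(?=kg)", price_text)

# Negative lookbehind: digits not preceded by "$" (or by another digit,
# so the engine cannot start in the middle of a number).
not_after_dollar = re.findall(r"(?<![$\d])\d+", price_text)

# Negative lookahead: digits not followed by "kg" (or by another digit,
# so the engine cannot stop in the middle of a number).
not_before_kg = re.findall(r"\d+(?!\d|kg)", price_text)
```

In every case the lookaround text itself ("$" or "kg") is checked but never included in the returned match.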
*?: matches zero or more of the preceding token. This is a lazy match, and will match as few characters as possible before satisfying the next token.
+?: matches one or more of the preceding token. This is a lazy match, and will match as few characters as possible before satisfying the next token.
{5}: matches exactly 5 of the preceding token.
{2,5}: matches 2 to 5 of the preceding token. Greedy match.
{2,5}?: matches 2 to 5 of the preceding token. Lazy match.
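The difference between greedy and lazy quantifiers is easiest to see on a concrete input. A minimal Python sketch, with made-up sample strings:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy .* runs to the last </b>; lazy .*? stops at the first one.
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# +? takes as few characters as it can while still matching at least one.
lazy_plus = re.match(r"a+?", "aaa").group()

# {2,5} takes up to five digits; {2,5}? is satisfied with two.
greedy_range = re.match(r"\d{2,5}", "1234567").group()
lazy_range = re.match(r"\d{2,5}?", "1234567").group()

# {5} needs exactly five repetitions.
exactly_five = re.fullmatch(r"\d{5}", "12345") is not None
too_short = re.fullmatch(r"\d{5}", "1234") is not None
```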
(ABC): groups multiple tokens together and creates a capture group. This allows you to apply quantifiers to the full group.
(?:ABC): groups multiple tokens without creating a capture group.
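To see what "creating a capture group" means in practice, the following Python sketch applies a quantifier to a capturing and to a non-capturing group. The input strings are arbitrary examples.

```python
import re

# A capturing group: the quantifier applies to the whole group "ab",
# and the group's last repetition is stored as group 1.
capturing = re.match(r"(ab)+c", "ababc")
captured_groups = capturing.groups()

# A non-capturing group: same grouping for the quantifier, nothing stored,
# so the only capture group left is the explicit (c).
non_capturing = re.match(r"(?:ab)+(c)", "ababc")
remaining_groups = non_capturing.groups()
```

Both patterns match the same text; they differ only in which groups are available afterwards for backreferences or substitutions.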
$$: escaped $ symbol
$`: inserts the portion of the string that precedes the match
$&: inserts the matched substring
$': inserts the portion of the string that follows the match
$1: inserts the result of the first capture group
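The `$`-style tokens above are the replacement syntax used by JavaScript's `String.prototype.replace` (mod_rewrite likewise uses `$1` for RewriteRule backreferences). Python's `re.sub` spells them differently, so the sketch below shows rough equivalents rather than the tokens themselves. The helper `show_context` is a hypothetical name introduced only for this example.

```python
import re

# Equivalent of $1: \1 (or \g<1>) inserts the first capture group.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

# Equivalent of $&: \g<0> inserts the whole matched substring.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# Equivalent of $$: a literal $ needs no escaping in a Python replacement.
with_dollar = re.sub(r"(\d+)", r"$\1", "costs 5")


def show_context(match):
    """Hypothetical helper emulating $` and $' via the match object."""
    before = match.string[:match.start()]  # what $` would insert
    after = match.string[match.end():]     # what $' would insert
    return f"<{before}|{match.group(0)}|{after}>"


context = re.sub(r"-", show_context, "ab-cd")
```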
m: multiline
i: ignore case
s: dotall; when it is off, . matches any character except line breaks
g: search globally
?: zero or one
\: escape character
\. \\ \+ \* \? \^ \$ \[ \] \( \) \{ \} \/ \' \#: escaped literal characters
[ABC]: any single character in the set ABC
+: one or more
*: zero or more
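The flags and the basic quantifiers listed above can be tried out in Python. This is a sketch under the assumption that Python's `re.M`, `re.I`, and `re.S` correspond to the m, i, and dotall flags; Python has no "g" flag, so `re.search` (first match) versus `re.findall` (all matches) stands in for it.

```python
import re

text = "Apache\napache MOD"

# Without flags: ^ matches only at the very start and case must agree.
no_flags = re.findall(r"^apache", text)
# With ignore-case and multiline: ^ matches at each line start.
with_flags = re.findall(r"^apache", text, re.I | re.M)

# The dot skips line breaks unless dotall is switched on.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.S)

# "Global" search: first match only versus every match.
first_only = re.search(r"\d", "a1b2").group()
all_matches = re.findall(r"\d", "a1b2")

# ? makes the preceding token optional.
optional_u = [w for w in ("color", "colour", "colouur")
              if re.fullmatch(r"colou?r", w)]

# + needs at least one character from the set, * accepts none at all.
one_or_more = re.fullmatch(r"[ABC]+", "") is not None
zero_or_more = re.fullmatch(r"[ABC]*", "") is not None

# A backslash turns the + quantifier into a literal plus sign.
escaped_plus = re.fullmatch(r"1\+1=2", "1+1=2") is not None
unescaped_plus = re.fullmatch(r"1+1=2", "1+1=2") is not None
```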
|: or. Matches the full expression before or after the '|', e.g. (https?|ftp)://
^: matches the beginning of the string
$: matches the end of the string
$1: refers to the first captured group
$2: refers to the second captured group
?: inside parentheses: do not capture the group, e.g. ^.+(?:jpg|png|gif)$
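The two example patterns from the list, `(https?|ftp)://` and `^.+(?:jpg|png|gif)$`, can be checked in Python as follows. The URLs and file names are placeholders, and Python writes `$1` and `$2` as `\1` and `\2` in the replacement string.

```python
import re

# Alternation inside a capturing group, anchored at the start with ^.
scheme = re.compile(r"^(https?|ftp)://")
https_scheme = scheme.match("https://example.com").group(1)
ftp_scheme = scheme.match("ftp://example.com").group(1)
no_scheme = scheme.match("mailto:someone@example.com")

# Non-capturing alternation anchored at both ends with ^ and $.
image = re.compile(r"^.+(?:jpg|png|gif)$")
is_image = image.match("photo.png") is not None
not_image = image.match("photo.pdf") is not None
image_groups = image.match("photo.png").groups()

# Referring to the first and second captured groups in a replacement.
swapped = re.sub(r"^(\w+)-(\w+)$", r"\2-\1", "left-right")
```

Because the image pattern uses `(?:...)`, `image_groups` is empty: the match succeeds but nothing is captured.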
[^ABC]: any single character not in the set
[a-z]: any single character in the range a-z
[^b-e]: any single character that is not in the range b-e
[0-9]: any single digit
[\w'-]: any word character, single quote, or -
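A quick Python sketch of the character sets and ranges above, run against arbitrary sample strings:

```python
import re

# Negated set: everything except a, b, or c.
not_abc = re.findall(r"[^abc]", "abcdef")

# Range: runs of lowercase letters only (uppercase H and W are skipped).
lowercase_runs = re.findall(r"[a-z]+", "Hello World 42")

# Negated range: characters outside b-e.
outside_b_to_e = re.findall(r"[^b-e]", "abcdef")

# Digits.
digits = re.findall(r"[0-9]+", "Hello World 42")

# Word characters plus apostrophe and hyphen, useful for English words.
words = re.findall(r"[\w'-]+", "don't over-think it!")
```

Placing the hyphen last in `[\w'-]` keeps it a literal character instead of a range operator.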
\t \r \n: tab, carriage return, newline
\xFF: specifies a character by its hexadecimal index
\xA9: the copyright symbol (©)
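The control-character and hexadecimal escapes behave the same way inside a Python pattern. A small sketch with made-up input:

```python
import re

# \t, \r, and \n inside a set: split on any run of tabs or line endings.
fields = re.split(r"[\t\r\n]+", "a\tb\r\nc")

# \xA9 in a pattern matches the character with hexadecimal code A9.
copyright_match = re.search(r"\xA9", "Copyright © Example")
matched_char = copyright_match.group()

# The same escape in an ordinary string literal yields the same character.
same_character = "\xA9" == "©"
```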