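Regular expressions are a very powerful tool for processing text. If you work on Linux and regularly use tools such as grep, sed, perl, emacs, or vi, you already know what regex can do and how much it can speed up your work.
Many languages support regular expressions, for example Java (java.util.regex) and Perl, and many scripting languages owe much of their usefulness to regex.
boost::regex now gives C++ library support for regular expressions as well. It is also slated for the next C++ standard and is already part of TR1.
One simple example is worth a thousand words.
How do you split a string into words?
Like this? :( (Sorry for my ugly code.)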
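#include <cstring>
#include <iostream>
#include <string>

// Hand-written splitter: print each run of non-space characters on its own line.
void split_string()
{
    const char* str = "davinci is a very good boy";
    const std::size_t len = std::strlen(str);
    std::size_t beg = 0;                  // start of the current word
    for (std::size_t i = 0; i <= len; ++i)
    {
        // A word ends at a space or at the terminating '\0'.
        if (str[i] == ' ' || str[i] == '\0')
        {
            if (beg < i)                  // skip empty runs between repeated spaces
            {
                std::string word(str + beg, str + i);
                std::cout << word << std::endl;
            }
            beg = i + 1;                  // the next word starts after the separator
        }
    }
}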
But you can use boost::regex_split() instead:
template <class OutputIterator, class charT, class Traits1, class Alloc1>
std::size_t regex_split(OutputIterator out,
                        std::basic_string<charT, Traits1, Alloc1>& s);
It is overloaded, of course, so other forms exist as well.
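#include <boost/lambda/lambda.hpp>
#include <boost/regex.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
#include <string>

int main()
{
    using boost::lambda::_1;
    std::list<std::string> ls;
    std::string str = "davinci is a very good boy";
    // With no expression given, regex_split splits on whitespace.
    // Note that it removes the split-off words from str.
    boost::regex_split(std::back_inserter(ls), str);
    std::for_each(ls.begin(), ls.end(), std::cout << _1 << "\n");
    return 0;
}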
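Since regex_split sits on the deprecated list, here is a minimal sketch of the same split done with a token iterator. It uses std::regex from C++11, the standardized descendant of Boost.Regex, so that it builds without linking Boost; that swap is an assumption of this sketch, and boost::sregex_token_iterator should be usable in the same way. The names split_words and check_split_words are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on runs of whitespace.
std::vector<std::string> split_words(const std::string& text)
{
    static const std::regex separator("\\s+");
    std::vector<std::string> words;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator last;
    for (; it != last; ++it)
    {
        if (it->length() > 0)   // skip the empty token produced by leading whitespace
            words.push_back(it->str());
    }
    return words;
}

// Small self-check using the sentence from the article.
void check_split_words()
{
    const std::vector<std::string> words = split_words("davinci is a very good boy");
    assert(words.size() == 6);
    assert(words.front() == "davinci");
    assert(words.back() == "boy");
}
```

Unlike regex_split, this version leaves the input string untouched and returns the words instead of writing them to an output iterator.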
//-----------------------------------------------
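On regex rules:
Each tool's regular expression dialect differs slightly from the others.
These characters are special; every other character matches itself:
.[{()\*+?|^$
* repeats the previous atom zero or more times
+ repeats the previous atom one or more times
? repeats the previous atom zero or one time
A ? placed after a quantifier makes it non-greedy: it matches as little as possible.
{n,m}? matches the previous atom between n and m times, while consuming as little input as possible.
| is alternation: ab(c|d) matches abc or abd
[] matches any one of the listed characters: [abc] matches a, or b, or c
[a-z] matches any one character from a to z
[0-9] matches any one digit, as everybody knows
. matches any single character
To specify an exact repeat count:
a{n} repeats a exactly n times
a{n,} repeats a n or more times
a{n,m} repeats a between n and m times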
A '^' character shall match the start of a line.
A '$' character shall match the end of a line.
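Example:
^a{2,3}cb$ matches a line that starts with a and ends with b, where a is repeated 2 to 3 times.
That includes:
aacb
aaacb
^(a*).*\1$ uses \1 as a back-reference to whatever the first group matched; it matches aaabbaaa, for example.
But the following is an error, because * has nothing to repeat:
a(*)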
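The rules above can be checked in a few lines. This is only a sketch: it uses std::regex (ECMAScript syntax) rather than boost::regex so that it compiles without Boost, and the helper names matches_whole, first_group, is_valid_pattern and check_rules are invented for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: text captured by group 1, or "" if there is no match.
std::string first_group(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch what;
    return std::regex_match(text, what, re) ? what[1].str() : std::string();
}

// Hypothetical helper: false if the pattern cannot be compiled.
bool is_valid_pattern(const std::string& pattern)
{
    try
    {
        std::regex re(pattern);
        return true;
    }
    catch (const std::regex_error&)
    {
        return false;
    }
}

void check_rules()
{
    assert(matches_whole("^a{2,3}cb$", "aacb"));
    assert(matches_whole("^a{2,3}cb$", "aaacb"));
    assert(!matches_whole("^a{2,3}cb$", "acb"));      // only one a
    assert(!matches_whole("^a{2,3}cb$", "aaaacb"));   // four a's
    assert(first_group("^(a*).*\\1$", "aaabbaaa") == "aaa");
    assert(!is_valid_pattern("a(*)"));                // * has nothing to repeat
}
```

Note the doubled backslash in "\\1": the C++ string literal needs it so that the regex engine sees \1.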
example:
apple@40years:~$ cat regex
anjutaProjects
a.out
book
Desktop
happy
misc
myProject
Project
regex
regex_replace.cpp
study
test
tmp
apple@40years:~$ grep ^a < regex //every line that starts with a
anjutaProjects
a.out
apple@40years:~$ grep t$ < regex //lines that end with t
a.out
myProject
Project
test
apple@40years:~$ grep [aj]o* < regex //lines containing a or j; o may follow any number of times, or not at all
anjutaProjects
a.out
happy
myProject
Project
regex_replace.cpp
apple@40years:~$
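The same three filters can be reproduced from C++ with a search over each line. This is a rough sketch, not what grep does internally: it uses std::regex with ECMAScript syntax (for these three patterns it behaves like grep's basic syntax), and grep_lines and check_grep_lines are made-up names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep the lines in which `pattern` matches somewhere.
std::vector<std::string> grep_lines(const std::vector<std::string>& lines,
                                    const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> hits;
    for (const std::string& line : lines)
    {
        if (std::regex_search(line, re))   // a match anywhere in the line is enough
            hits.push_back(line);
    }
    return hits;
}

// Self-check against the file listing shown in the transcript.
void check_grep_lines()
{
    const std::vector<std::string> lines = {
        "anjutaProjects", "a.out", "book", "Desktop", "happy", "misc",
        "myProject", "Project", "regex", "regex_replace.cpp", "study",
        "test", "tmp"};
    assert(grep_lines(lines, "^a").size() == 2);       // anjutaProjects, a.out
    assert(grep_lines(lines, "t$").size() == 4);       // a.out, myProject, Project, test
    assert(grep_lines(lines, "[aj]o*").size() == 6);   // every line containing a or j
}
```

regex_search is the right call here because grep reports a line as soon as any part of it matches; regex_match would require the whole line to match.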
//---------------------------------
Yes, regex_split is simpler and more readable, but what it does is nothing remarkable, and it is even deprecated.
Now one more example:
Suppose you want to find the /* smth */ comments in a line, and more. How would you do it?
#include <boost/regex.hpp>
#include <cassert>
#include <iostream>
#include <map>
#include <string>

int main()
{
    using std::string;
    using std::endl;
    using std::cout;

    string re = "/\\*\\w+\\*/";   // find /* ... */ comments
    boost::regex e(re);
    string s = "int a = 33,int b,/*333*/ c ; /*efg*/;int k,/*sadfa*/";
    boost::match_results<std::string::const_iterator> result;
    int i = 0;
    while (boost::regex_search(s, result, e) && i++ < 4)
    {
        cout << *result.begin() << "\n" << *(result.end() - 1) << "\n";
        cout << "suffix=" << result.suffix() << endl;
        s = result.suffix().str();   // continue searching the rest of the string
    }
    if (i == 0)   // reconstructed guard: reached only when nothing matched
    {
        cout << "error" << endl;
    }
    /*****************************************/
    cout << "------------------------------------" << endl;
    // find the quoted string "..."
    typedef std::map<std::string, int, std::less<std::string> > map_type;
    std::string insert_beg = "<font>";
    std::string insert_end = "</font>";
    std::string str = "abc \"davinci\"23abca3 abcd ";
    e = boost::regex("(\"\\w*\")\\w*(a\\d)");   // \w* allows any number of word characters in between
    // e = boost::regex("\"\\w*\"");
    std::string::size_type start = 0;           // offset at which the next search begins
    std::string::const_iterator beg = str.begin();
    std::string::const_iterator end = str.end();
    boost::match_results<std::string::const_iterator> what;
    map_type m;
    while (boost::regex_search(beg, end, what, e))
    {
        cout << "what[0] = " << what[0] << endl;
        cout << "what[1] = " << what[1] << endl;   // sub_match 1: ("\w*")
        cout << "what[2] = " << what[2] << endl;   // sub_match 2: (a\d)
        // Save offsets and text first: insert() invalidates every iterator
        // into str, including the ones stored in what.
        std::string::size_type pos_beg = start + (what[1].first - beg);
        std::string::size_type pos_end = start + (what[1].second - beg);
        const std::string quoted = what[1].str();
        str.insert(pos_end, insert_end);
        str.insert(pos_beg, insert_beg);
        m[quoted] = static_cast<int>(pos_beg);
        // update region: resume right after the inserted closing tag
        start = pos_end + insert_beg.size() + insert_end.size();
        assert(start <= str.size());
        beg = str.begin() + start;
        end = str.end();
        cout << str << endl;
    }
    return 0;
}
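The first loop above restarts the search on a copy of the suffix each time. A regex iterator walks all matches without copying; here is a minimal sketch of that idea. It is written against std::regex so it builds without Boost (boost::sregex_iterator should work the same way), it swaps \w+ for the non-greedy .*? so that comments containing spaces are found too, and find_comments and check_find_comments are invented names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every /* ... */ comment on one line of code.
std::vector<std::string> find_comments(const std::string& code)
{
    // .*? is non-greedy, so each match stops at the first closing "*/".
    static const std::regex comment("/\\*.*?\\*/");
    std::vector<std::string> found;
    const std::sregex_iterator last;
    for (std::sregex_iterator it(code.begin(), code.end(), comment); it != last; ++it)
    {
        found.push_back(it->str());   // it->str() is the whole match
    }
    return found;
}

// Self-check using the sample line from the article.
void check_find_comments()
{
    const std::vector<std::string> found =
        find_comments("int a = 33,int b,/*333*/ c ; /*efg*/;int k,/*sadfa*/");
    assert(found.size() == 3);
    assert(found[0] == "/*333*/");
    assert(found[1] == "/*efg*/");
    assert(found[2] == "/*sadfa*/");
}
```

With a greedy .* the single match would run from the first /* to the last */ on the line, which is why the non-greedy form matters here.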
//----------------------
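match_results is a collection of sub_match objects, and sub_match derives publicly from std::pair<BidiIt, BidiIt>.
boost::match_results<std::string::const_iterator> what;
cout<<what[0]<<endl; // prints the text of the first sub_match, which is the whole match
what[0] is a sub_match, that is, a pair, and operator<< is overloaded for it.
So what[0] is not a std::string.
what[0].str() returns the string.
what[0].first and what[0].second are bidirectional iterators; std::string(what[0].first, what[0].second) builds the matched text.
what[1].first and what[1].second delimit the text matched by that sub-expression.
what[0] is the entire matched string
what[1] is the string matched by the first sub-expression
what[2] is the string matched by the second sub-expression
what[n] is the string matched by the n-th sub-expression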
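To make the sub_match notes concrete, here is a small sketch that pulls the pieces out of one match. It assumes std::smatch, which is std::match_results<std::string::const_iterator> and mirrors boost::smatch; the MatchParts struct and the function names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct MatchParts
{
    std::string whole;           // what[0]: the entire match
    std::string quoted;          // what[1]: first sub-expression
    std::string tail;            // what[2]: second sub-expression
    std::size_t quoted_offset;   // where what[1] starts in the input
};

// Hypothetical helper: returns false if the pattern does not occur in `text`.
bool extract_parts(const std::string& text, MatchParts& out)
{
    static const std::regex re("(\"\\w*\")\\w*(a\\d)");
    std::smatch what;
    if (!std::regex_search(text, what, re))
        return false;
    out.whole = what[0].str();                                  // str() copies the text
    out.quoted = std::string(what[1].first, what[1].second);    // same thing via the iterator pair
    out.tail = what[2].str();
    out.quoted_offset = static_cast<std::size_t>(what.position(1));
    return true;
}

void check_extract_parts()
{
    MatchParts parts;
    assert(extract_parts("abc \"davinci\"23abca3 abcd ", parts));
    assert(parts.whole == "\"davinci\"23abca3");
    assert(parts.quoted == "\"davinci\"");
    assert(parts.tail == "a3");
    assert(parts.quoted_offset == 4);
}
```

Copying the text out with str() before modifying the input is the safe order, because the iterators inside a sub_match point into the original string.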
//---------------------------------------
So, an overview:
The three most important algorithms all start with the word regex:
boost::regex_search()
boost::regex_match()
boost::regex_replace()
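How the three differ is easiest to see side by side. The sketch below assumes std::regex_match, std::regex_search and std::regex_replace, which are the standardized counterparts of the Boost functions listed above; all four function names in it are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// regex_match: the pattern must cover the whole string.
bool is_all_digits(const std::string& s)
{
    return std::regex_match(s, std::regex("\\d+"));
}

// regex_search: the pattern may match anywhere inside the string.
bool contains_digits(const std::string& s)
{
    return std::regex_search(s, std::regex("\\d+"));
}

// regex_replace: rewrite every match; $1 refers to the first group.
std::string highlight_quoted(const std::string& s)
{
    return std::regex_replace(s, std::regex("(\"\\w*\")"), "<font>$1</font>");
}

void check_three_algorithms()
{
    assert(is_all_digits("12345"));
    assert(!is_all_digits("abc123"));
    assert(contains_digits("abc123"));
    assert(!contains_digits("abc"));
    assert(highlight_quoted("abc \"davinci\"23abca3 abcd ")
           == "abc <font>\"davinci\"</font>23abca3 abcd ");
}
```

The last function does in one call what the hand-written insert loop earlier does step by step, without any iterator bookkeeping.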
Types
    syntax_option_type
    error_type
    match_flag_type
    class regex_error
    class regex_traits
    class template basic_regex
    class template sub_match
    class template match_results
Algorithms
    regex_match
    regex_search
    regex_replace
Iterators
    regex_iterator
    regex_token_iterator
Typedefs
    regex [ = basic_regex<char> ]
    wregex [ = basic_regex<wchar_t> ]
    cmatch [ = match_results<const char*> ]
    wcmatch [ = match_results<const wchar_t*> ]
    smatch [ = match_results<std::string::const_iterator> ]
    wsmatch [ = match_results<std::wstring::const_iterator> ]
    cregex_iterator [ = regex_iterator<const char*> ]
    wcregex_iterator [ = regex_iterator<const wchar_t*> ]
    sregex_iterator [ = regex_iterator<std::string::const_iterator> ]
    wsregex_iterator [ = regex_iterator<std::wstring::const_iterator> ]
    cregex_token_iterator [ = regex_token_iterator<const char*> ]
    wcregex_token_iterator [ = regex_token_iterator<const wchar_t*> ]
    sregex_token_iterator [ = regex_token_iterator<std::string::const_iterator> ]
    wsregex_token_iterator [ = regex_token_iterator<std::wstring::const_iterator> ]
Deprecated interfaces
    POSIX API Compatibility Functions
    class regbase
    class template reg_expression
    Algorithm regex_grep
    Algorithm regex_format
    Algorithm regex_merge
    Algorithm regex_split
    class RegEx
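For more information, visit
http://www.boost.org/libs/regex/doc/index.html
There are many wonderful examples there.
Chinese-language sites cover it too; please google them.
I hope you come to love boost::regex.
BTW, when you compile the source code, you must link against the Boost.Regex library (-lboost_regex). Some newer linkers only resolve the library if -lboost_regex comes after the source file on the command line.
Like this: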
apple@40years:~/test/boost$ cd ..
apple@40years:~/test$ g++ -g -Wall -lboost_regex split_string.cpp
apple@40years:~/test$ ./a.out
davinci
is
a
very
good
boy
apple@40years:~/test$
That's all, thanks.