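Boost 1.33.0 is about to be released. Its regex library has been rewritten and adds:
*Unicode support
*support for the ATL/MFC CString
***********
I couldn't wait, so I downloaded a copy to take a look.
Downloading the source:
=========
Boost: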
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/boost login
cvs -z9 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/boost co -P boost
ICU (the Unicode support in the Boost 1.33.0 regex library is built on ICU, IBM's Unicode library):
http://www.ibm.com/software/globalization/icu/
Building the source:
=============
The build environment is VC7.1 with the C++ standard library that ships with it. Change to BOOST_ROOT\libs\regex\build and run:
bjam -sICU_PATH=d:\icu32 -sTOOLS=vc-7_1 stage
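Unicode support test:
================
I looked at the ICU DLLs: the three DLLs that Boost.Regex links against dynamically add up to about 10 MB. That put me off, so I skipped this test.
ATL/MFC support:
===============
In VC7.1, create a new Win32 console project and add the following code: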
/*
*
* Copyright (c) 2004
* John Maddock
*
* Use, modification and distribution are subject to the
* Boost Software License, Version 1.0. (See accompanying file
* LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE mfc_example.cpp
* VERSION see <boost/version.hpp>
* DESCRIPTION: examples of using Boost.Regex with MFC and ATL string types.
*/
#define TEST_MFC
#ifdef TEST_MFC
#include <boost/regex/mfc.hpp>
#include <cstringt.h>
#include <atlstr.h>
#include <assert.h>
#include <stdexcept> // std::runtime_error
#include <tchar.h>
#include <iostream>
#ifdef _UNICODE
#define cout wcout
#endif
//
// Find out if *password* meets our password requirements,
// as defined by the regular expression *requirements*.
//
bool is_valid_password(const CString& password, const CString& requirements)
{
return boost::regex_match(password, boost::make_regex(requirements));
}
//
// Extract filename part of a path from a CString and return the result
// as another CString:
//
CString get_filename(const CString& path)
{
boost::tregex r(__T("(?:\\A|.*\\\\)([^\\\\]+)"));
boost::tmatch what;
if(boost::regex_match(path, what, r))
{
// extract $1 as a CString:
return CString(what[1].first, what.length(1));
}
else
{
throw std::runtime_error("Invalid pathname");
}
}
CString extract_postcode(const CString& address)
{
// searches through address for a UK postcode and returns the result,
// the expression used is by Phil A. on www.regxlib.com:
boost::tregex r(__T("^(([A-Z]{1,2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z]))\\s?([0-9][A-Z]{2})$"));
boost::tmatch what;
if(boost::regex_search(address, what, r))
{
// extract $0 as a CString:
return CString(what[0].first, what.length());
}
else
{
throw std::runtime_error("No postcode found");
}
}
void enumerate_links(const CString& html)
{
// enumerate and print all the <a> links in some HTML text,
// the expression used is by Andew Lee on www.regxlib.com:
boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));
boost::tregex_iterator i(boost::make_regex_iterator(html, r)), j;
while(i != j)
{
std::cout << (*i)[1] << std::endl;
++i;
}
}
void enumerate_links2(const CString& html)
{
// enumerate and print all the <a> links in some HTML text,
// the expression used is by Andew Lee on www.regxlib.com:
boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));
boost::tregex_token_iterator i(boost::make_regex_token_iterator(html, r, 1)), j;
while(i != j)
{
std::cout << *i << std::endl;
++i;
}
}
//
// Take a credit card number as a string of digits,
// and reformat it as a human readable string with "-"
// separating each group of four digits:
//
const boost::tregex e(__T("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z"));
const CString human_format = __T("$1-$2-$3-$4");
CString human_readable_card_number(const CString& s)
{
return boost::regex_replace(s, e, human_format);
}
int main()
{
// password checks using regex_match:
CString pwd = "abcDEF---";
CString pwd_check = "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}";
bool b = is_valid_password(pwd, pwd_check);
assert(b);
pwd = "abcD-";
b = is_valid_password(pwd, pwd_check);
assert(!b);
// filename extraction with regex_match:
CString file = "abc.hpp";
file = get_filename(file);
assert(file == "abc.hpp");
file = "c:\\a\\b\\c\\d.h";
file = get_filename(file);
assert(file == "d.h");
// postcode extraction with regex_search:
CString address = "Joe Bloke, 001 Somestreet, Somewhere,\nPL2 8AB";
CString postcode = extract_postcode(address);
assert(postcode == "PL2 8AB"); // compare (==) against the postcode actually present in address
// html link extraction with regex_iterator:
CString text = "<dt><a href=\"syntax_perl.html\">Perl Regular Expressions</a></dt><dt><a href=\"syntax_extended.html\">POSIX-Extended Regular Expressions</a></dt><dt><a href=\"syntax_basic.html\">POSIX-Basic Regular Expressions</a></dt>";
enumerate_links(text);
enumerate_links2(text);
CString credit_card_number = "1234567887654321";
credit_card_number = human_readable_card_number(credit_card_number);
assert(credit_card_number == "1234-5678-8765-4321");
return 0;
}
#else
#include <iostream>
int main()
{
std::cout << "<NOTE>MFC support not enabled, feature unavailable</NOTE>";
return 0;
}
#endif
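If you don't have an ATL/MFC toolchain at hand, here is a rough portable sketch of two of the helpers above. It is not the Boost sample: it swaps in std::regex and std::string (C++11), which are an assumption on my part and not part of the original code. The default std::regex grammar has no \A or \z, so the card pattern is anchored with ^ and $ instead. The function names are reused from the sample for readability only.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::tregex, std::string for CString.

// True if the whole password matches the requirements expression.
bool is_valid_password(const std::string& password,
                       const std::string& requirements)
{
    return std::regex_match(password, std::regex(requirements));
}

// Reformat a 15 or 16 digit card number as groups separated by "-".
// Input that does not match the pattern is returned unchanged.
std::string human_readable_card_number(const std::string& s)
{
    static const std::regex e(
        "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
    return std::regex_replace(s, e, "$1-$2-$3-$4");
}
```

Calling human_readable_card_number("1234567887654321") should give back "1234-5678-8765-4321", the same result the Boost sample asserts.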
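Setting up the build environment:
=============
*The include path contains $(BOOST_ROOT);%(ICU_PATH)\include, both placed after the VC7.1 include directories.
Setting the project properties:
============
*Use the Unicode character set
*Use /Zc:wchar_t (note: when Boost is built with VC7.1 using the default settings, wchar_t is treated as a native built-in type, so if you want Unicode rather than MBCS, build your project with this option too)
*Use the multithreaded debug DLL runtime, /MDd (if you don't know what this means, don't use anything else)
*Define the macro BOOST_REGEX_DYN_LINK (regex is linked statically by default; define this macro if you want dynamic linking)
Compiling and linking went through "smoothly".
The compiler command line was: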
/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "BOOST_REGEX_DYN_LINK" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MDd /Zc:wchar_t /Yu"stdafx.h" /Fp"Debug/capture.pch" /Fo"Debug/" /Fd"Debug/vc70.pdb" /W3 /nologo /c /Wp64 /ZI /TP
The linker command line was:
/OUT:"Debug/capture.exe" /INCREMENTAL /NOLOGO /DEBUG /PDB:"Debug/capture.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
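To double-check that the project settings listed above really reached the compiler, a small sketch like the following can help. The function name describe_build is hypothetical. The macros it tests are the ones I understand MSVC to predefine for these switches (_NATIVE_WCHAR_T_DEFINED for /Zc:wchar_t, _DLL for /MD or /MDd, _DEBUG for the debug runtimes), plus _UNICODE and BOOST_REGEX_DYN_LINK from the command line. On other compilers most of them will simply show as not defined.

```cpp
#include <string>

// Append "NAME: defined" or "NAME: not defined" to the report.
static void add_line(std::string& report, const char* name, bool defined)
{
    report += name;
    report += defined ? ": defined\n" : ": not defined\n";
}

// Hypothetical helper: report which of the build settings discussed above
// are visible to the preprocessor in this translation unit.
std::string describe_build()
{
    std::string report;
#ifdef _UNICODE
    add_line(report, "_UNICODE", true);
#else
    add_line(report, "_UNICODE", false);
#endif
#ifdef _NATIVE_WCHAR_T_DEFINED   // MSVC: /Zc:wchar_t is on
    add_line(report, "_NATIVE_WCHAR_T_DEFINED", true);
#else
    add_line(report, "_NATIVE_WCHAR_T_DEFINED", false);
#endif
#ifdef _DLL                      // MSVC: /MD or /MDd
    add_line(report, "_DLL", true);
#else
    add_line(report, "_DLL", false);
#endif
#ifdef _DEBUG                    // MSVC: debug runtime such as /MDd
    add_line(report, "_DEBUG", true);
#else
    add_line(report, "_DEBUG", false);
#endif
#ifdef BOOST_REGEX_DYN_LINK      // set with /D on the command line above
    add_line(report, "BOOST_REGEX_DYN_LINK", true);
#else
    add_line(report, "BOOST_REGEX_DYN_LINK", false);
#endif
    return report;
}
```

Printing describe_build() at the start of main gives one line per macro, which makes a mismatched setting easy to spot before chasing linker errors.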
BOOST 1.33.0 regex changelog
=====================
Boost 1.33.0.
*Completely rewritten expression parsing code, and traits class support; now conforms to the standardization proposal.
*Added support for (?imsx-imsx) constructs.
*Added support for lookbehind expressions (?<=positive-lookbehind) and (?<!negative-lookbehind).
*Added support for conditional expressions (?(assertion)true-expression|false-expression).
*Added MFC/ATL string wrappers.
*Added Unicode support; based on ICU.
*Changed newline support to recognise \f as a line separator (all character types), and \x85 as a line separator for wide characters / Unicode only.
Boost 1.32.1.
*Fixed bug in partial matches of bounded repeats of '.'.
Boost 1.31.0.
*Completely rewritten pattern matching code - it is now up to 10 times faster than before.
*Reorganized documentation.
*Deprecated all interfaces that are not part of the regular expression standardization proposal.
*Added regex_iterator and regex_token_iterator.
*Added support for Perl style independent sub-expressions.
*Added non-member operators to the sub_match class, so that you can compare sub_match's with strings, or add them to a string to produce a new string.
*Added experimental support for extended capture information.
*Changed the match flags so that they are a distinct type (not an integer), if you try to pass the match flags as an integer rather than match_flag_type to the regex algorithms then you will now get a compiler error.
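Several of the 1.31.0 items (regex_iterator, regex_token_iterator, comparing a sub_match with a string, match flags as a distinct type) can be tried out in a few lines. The sketch below is an assumption-heavy stand-in: it uses the std::regex interface from C++11, not Boost.Regex itself, and the function names and the simplified href pattern are mine, not from the Boost sources.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect capture group 1 of every match using a token iterator.
// The pattern is a simplified, hypothetical href matcher.
std::vector<std::string> quoted_hrefs(const std::string& html)
{
    static const std::regex r("href=[\"']([^\"']+)[\"']");
    std::vector<std::string> out;
    std::sregex_token_iterator i(html.begin(), html.end(), r, 1), j;
    for (; i != j; ++i)
        out.push_back(i->str());
    return out;
}

// Count how many whole words in text equal word. Each match is visited
// with a regex_iterator, and the sub_match is compared directly with a
// std::string, with no explicit conversion.
int count_word(const std::string& text, const std::string& word)
{
    static const std::regex r("\\w+");
    // Match flags have their own type; they are not a plain integer.
    const std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    int n = 0;
    for (std::sregex_iterator i(text.begin(), text.end(), r, flags), j;
         i != j; ++i)
    {
        if ((*i)[0] == word)
            ++n;
    }
    return n;
}
```

The token iterator is the shorter choice when only one capture group is needed. The plain iterator gives access to the full match results for each hit.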