SAS 9 New Experience: Using Perl Regular Expressions in the DATA Step

王朝perl · Author unknown · 2006-01-09

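Starting with version 9, SAS supports Perl (Perl 5.6.1) regular expressions, which makes data validation considerably simpler and more reliable.

Before regular expressions (RE) were available, strings could only be manipulated with functions such as index, substr, and tranwrd. Those functions are inflexible and comparatively inefficient when the text to be processed varies from record to record.

SAS 9 therefore introduced RE to make it easy to validate, replace, and extract strings.

A regexp is made up of ordinary characters and special characters called metacharacters; each metacharacter stands for a particular matching rule. For details, see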

http://www.perldoc.com/perl5.6.1/pod/perlre.html
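
As a minimal sketch of how metacharacters work in practice (the variable names `text` and `pos` and the sample records are made up for illustration and are not from the original article), the step below compiles a pattern in which `\d` matches one digit and `{n}` repeats the preceding item n times, then reports where the pattern first matches:

```sas
data _null_;
   retain re;
   length text $ 12;
   /* Compile the pattern once, on the first iteration only.        */
   /* \d matches a single digit; {3} and {4} are repeat counts.     */
   if _N_ = 1 then re = prxparse('/\d{3}-\d{4}/');
   input text;
   /* prxmatch returns the position of the first match, or 0 if none */
   pos = prxmatch(re, text);
   put text= pos=;
datalines;
319-1677
call319-1677
3191677
;
```

With these three records, `pos` should come out as 1, 5, and 0 respectively: the pattern may match anywhere in the string, and the last record has no hyphen, so it does not match at all.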

Several usage examples follow:

1. Validating phone numbers in customer data

data _null_;
   retain re;
   length first last home business $ 16;
   if _N_ = 1 then do;
      /* Phone pattern 1: (XXX) XXX-XXXX */
      paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";

      /* Phone pattern 2: XXX-XXX-XXXX */
      dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";

      /* Combine the two patterns with the | metacharacter */
      regexp = "/(" || paren || ")|(" || dash || ")/";

      /* Compile the regexp; prxparse returns missing if it is invalid */
      re = prxparse(regexp);
      if missing(re) then do;
         putlog "ERROR: Invalid regexp " regexp;
         stop;
      end;
   end;

   input first last home business;

   /* Run the match; prxmatch returns 0 when there is no match */
   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;
   if ^prxmatch(re, business) then
      putlog "NOTE: Invalid business phone number for " first last business;
datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-899-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent . 919-782-3199
Ruby Archuleta . .
Takei Ito 7042982145 .
Tom Joad 209/963/2764 2099-66-8474
;

The output is as follows:

NOTE: Invalid home phone number for Palinor Kent
NOTE: Invalid home phone number for Ruby Archuleta
NOTE: Invalid business phone number for Ruby Archuleta
NOTE: Invalid home phone number for Takei Ito 7042982145
NOTE: Invalid business phone number for Takei Ito
NOTE: Invalid home phone number for Tom Joad 209/963/2764
NOTE: Invalid business phone number for Tom Joad 2099-66-8474
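
One thing worth noting about the patterns in this example is that they are not anchored, so a value that merely contains a valid phone number somewhere inside it also passes. The following is a hedged sketch, not part of the SAS sample, of a stricter check for the XXX-XXX-XXXX form only: `^` anchors the start, and `\s*$` allows for the trailing blanks that pad a fixed-length character variable. The variable name `phone` and the two test records are assumptions made for illustration.

```sas
data _null_;
   retain re;
   length phone $ 16;
   /* ^ and $ force the whole value to match; \s* absorbs the       */
   /* trailing blanks that pad the $16 variable.                    */
   if _N_ = 1 then
      re = prxparse('/^[2-9]\d\d-[2-9]\d\d-\d{4}\s*$/');
   input phone;
   if prxmatch(re, phone) then put "OK:  " phone;
   else put "BAD: " phone;
datalines;
919-782-3199
919-782-31990
;
```

The second record has one digit too many. The unanchored pattern from the example would accept it because its first twelve characters match, whereas the anchored version should reject it.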

2. Replacing text: replace < with &lt; and > with &gt;

data _null_;
   retain lt_re gt_re;
   if _N_ = 1 then do;
      /* Substitution patterns take the form s/regexp/replacement text/ */
      lt_re = prxparse('s/</&lt;/');
      gt_re = prxparse('s/>/&gt;/');
      if missing(lt_re) or missing(gt_re) then do;
         putlog "ERROR: Invalid regexp.";
         stop;
      end;
   end;

   input;

   /* Apply the substitutions; -1 means replace every occurrence */
   call prxchange(lt_re, -1, _infile_);
   call prxchange(gt_re, -1, _infile_);
   put _infile_;
datalines4;
The bracketing construct ( ... ) creates capture buffers.
To refer to the digit'th buffer use \<digit> within the match.
Outside the match use "$" instead of "\". (The \<digit>
notation works in certain circumstances outside the match.
See the warning below about \1 vs $1 for details.) Referring
back to another part of the match is called a backreference.
;;;;

The output is as follows:

The bracketing construct ( ... ) creates capture buffers.
To refer to the digit'th buffer use \&lt;digit&gt; within the match.
Outside the match use "$" instead of "\". (The \&lt;digit&gt;
notation works in certain circumstances outside the match.
See the warning below about \1 vs $1 for details.) Referring
back to another part of the match is called a backreference.
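
The same substitution can also be written with the PRXCHANGE function, which returns the changed string instead of modifying its argument in place. This is only a sketch under the assumption that the function form is available in your release (it was added after the initial SAS 9 CALL routines); the variable names `text` and `escaped` are illustrative.

```sas
data _null_;
   length text escaped $ 80;
   text = 'use \<digit> within the match';
   /* Assumption: the PRXCHANGE function form exists in this release. */
   /* Single quotes keep the macro processor from treating &lt and    */
   /* &gt as macro variable references.                               */
   escaped = prxchange('s/</&lt;/', -1, text);
   escaped = prxchange('s/>/&gt;/', -1, escaped);
   put escaped=;
run;
```

The function form is convenient when the original value must be kept, since `text` is left untouched and only `escaped` receives the result.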

3. Extracting text (the area code) from a customer's business phone number

data _null_;
   retain re areacode_re;
   length first last home business $ 16;
   length areacode $ 3;
   if _N_ = 1 then do;
      /* (XXX) XXX-XXXX, with the area code in a capture buffer */
      paren = "\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d";

      /* XXX-XXX-XXXX, with the area code in a capture buffer */
      dash = "([2-9]\d\d)-[2-9]\d\d-\d\d\d\d";

      /* Combine two phone patterns into one with a | */
      regexp = "/(" || paren || ")|(" || dash || ")/";
      re = prxparse(regexp);
      if missing(re) then do;
         putlog "ERROR: Invalid regexp " regexp;
         stop;
      end;

      /* Area codes to look for */
      areacode_re = prxparse("/828|336|704|910|919|252/");
      if missing(areacode_re) then do;
         putlog "ERROR: Invalid area code regexp";
         stop;
      end;
   end;

   input first last home business;

   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;

   if prxmatch(re, business) then do;
      /* prxparen returns the number of the last capture buffer that matched */
      which_format = prxparen(re);

      /* Get the position and length of that buffer, then extract the text */
      call prxposn(re, which_format, pos, len);
      areacode = substr(business, pos, len);

      /* If the extracted area code is in the list, print the record */
      if prxmatch(areacode_re, areacode) then
         put "In North Carolina: " first last business;
   end;
   else
      putlog "NOTE: Invalid business phone number for " first last business;
datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-899-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent 704-782-4673 704-782-3199
Ruby Archuleta 905-384-2839 905-328-3892
Takei Ito 704-298-2145 704-298-4738
Tom Joad 515-372-4829 515-389-2838
;

The output is as follows:

In North Carolina: Jerome Johnson (919)846-2198
In North Carolina: Palinor Kent 704-782-3199
In North Carolina: Takei Ito 704-298-4738
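
The extraction step in this example pairs `call prxposn` with `substr`. As a rough alternative, and assuming the PRXPOSN function form is available in your release, the captured text can be returned in one call. The sketch below is not from the SAS sample: it uses a simplified pattern with only two capture buffers (buffer 1 for the parenthesized form, buffer 2 for the dashed form), and the two input records are taken from the data above.

```sas
data _null_;
   retain re;
   length business $ 16 areacode $ 3;
   /* Buffer 1 captures the area code of (XXX) XXX-XXXX,            */
   /* buffer 2 captures the area code of XXX-XXX-XXXX.              */
   if _N_ = 1 then
      re = prxparse('/\(([2-9]\d\d)\) ?[2-9]\d\d-\d{4}|([2-9]\d\d)-[2-9]\d\d-\d{4}/');
   input business;
   if prxmatch(re, business) then do;
      /* Assumption: PRXPOSN function form, which returns the text  */
      /* of the requested capture buffer directly.                  */
      areacode = prxposn(re, prxparen(re), business);
      put business= areacode=;
   end;
datalines;
(919)846-2198
704-782-3199
;
```

For these two records `areacode` should be 919 and 704. The source string has to be passed to the function again because the function form extracts the text itself rather than returning a position and length.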

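The source code above comes from the SAS website; I have only added a few comments to help first-time readers follow it. For details, please refer to the SAS website.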
