Scanning Text With java.util.Scanner

Wangchao java/jsp · Author unknown · 2006-01-09

J2SE 5.0 adds classes and methods that make everyday tasks easier to perform. In this tip you will see how the newly added java.util.Scanner class makes it easier to read and parse strings and primitive types using regular expressions.

Before the J2SE 5.0 release, you probably wrote code such as the following TextReader class to read text from a file:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.File;

public class TextReader {

    private static void readFile(String fileName) {
        try {
            File file = new File(fileName);
            FileReader reader = new FileReader(file);
            BufferedReader in = new BufferedReader(reader);
            String string;
            while ((string = in.readLine()) != null) {
                System.out.println(string);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java TextReader " + "file location");
            System.exit(0);
        }
        readFile(args[0]);
    }
}

The basic approach in classes like this is to create a File object that corresponds to the actual file on the hard drive. The class then creates a FileReader associated with the file and wraps it in a BufferedReader. It then uses the BufferedReader to read the file one line at a time.

To view the TextReader class in action, you need to create a document for the class to read and parse. To create the document, save the following two lines of text in a file named TextSample.txt in the same directory as TextReader:

Here is a small text file that you will

use to test java.util.scanner.

Compile TextReader. Then run it by entering the following:

java TextReader TextSample.txt

You should see the original file echoed back to you in standard output.

You can simplify the code in TextReader by using java.util.Scanner, a class that parses primitive types and strings:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class TextScanner {

    private static void readFile(String fileName) {
        try {
            File file = new File(fileName);
            Scanner scanner = new Scanner(file);
            // With the default delimiter, each call to next() returns one
            // whitespace-separated token.
            while (scanner.hasNext()) {
                System.out.println(scanner.next());
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java TextScanner " + "file location");
            System.exit(0);
        }
        readFile(args[0]);
    }
}

Compile TextScanner. Then run it as follows:

java TextScanner TextSample.txt

You should get the following output:

Here
is
a
small
text
file
that
you
will
use
to
test
java.util.scanner.

TextScanner creates a Scanner object from the File. The Scanner breaks the contents of the File into tokens using a delimiter pattern. By default, the delimiter pattern matches whitespace. TextScanner then calls the hasNext() method in Scanner. This method returns true if another token exists in the Scanner's input, which is the case until the Scanner reaches the end of the file. The next() method returns a String that represents the next token. So until it reaches the end of the file, TextScanner prints each String returned by next() on a separate line.
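The hasNext()/next() loop described above can also be tried without a file. The following is a minimal sketch, not part of the original tip: the class name TokenDemo and its tokens() helper are hypothetical, and the Scanner is built from a String instead of a File.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, used only for illustration.
class TokenDemo {

    // Splits the text on the default delimiter (runs of whitespace).
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            result.add(scanner.next());
        }
        scanner.close();
        return result;
    }

    public static void main(String[] args) {
        // Spaces, repeated spaces, and newlines all separate tokens.
        System.out.println(tokens("Here is a small\ntext   file"));
        // prints [Here, is, a, small, text, file]
    }
}
```

Because the default delimiter matches any run of whitespace, line breaks in the input are lost, which is why TextScanner prints one word per line.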

You can change the delimiter that is used to tokenize the input through the useDelimiter() method of Scanner. You can pass either a String or a java.util.regex.Pattern to the method; a String argument is itself compiled as a regular expression. See the JavaDocs page for Pattern for information on what patterns are appropriate. For example, you can read the input one line at a time by using the line separator as the delimiter. Here is the revised readFile() method for TextScanner, which uses the platform's line separator (the value of the line.separator system property) as the delimiter:

private static void readFile(String fileName) {
    try {
        Scanner scanner = new Scanner(new File(fileName));
        scanner.useDelimiter(System.getProperty("line.separator"));
        while (scanner.hasNext())
            System.out.println(scanner.next());
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

Note that there are other options for detecting the end of a line. The platform's line separator only works when the file uses the same line endings as the machine that reads it. You could instead accept lines that end with either a newline character or a carriage return followed by a newline character, using the regular expression "\r\n|\n". The JavaDocs for java.util.regex.Pattern list other possible line terminators, so a more complete check would use the expression "\r\n|[\r\n\u2028\u2029\u0085]". You can also use the hasNextLine() and nextLine() methods of the Scanner class. In any case, with the revised TextScanner, the output should match the contents and layout of TextSample.txt. In other words, you should see the following:

Here is a small text file that you will

use to test java.util.scanner.
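The two alternatives mentioned above, a line-terminator pattern and the hasNextLine()/nextLine() pair, can be compared side by side. This is a sketch under stated assumptions: LineDemo and its method names are hypothetical, the input is a String rather than a file, and the pattern is the one quoted above, written with doubled backslashes so that the regular expression engine interprets the escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class LineDemo {

    // "\r\n" comes first so that a Windows line ending counts as one
    // delimiter rather than two.
    static final Pattern LINE_END =
            Pattern.compile("\\r\\n|[\\r\\n\\u2028\\u2029\\u0085]");

    // Reads lines by passing a Pattern to useDelimiter().
    static List<String> byDelimiter(String text) {
        List<String> lines = new ArrayList<String>();
        Scanner scanner = new Scanner(text).useDelimiter(LINE_END);
        while (scanner.hasNext()) {
            lines.add(scanner.next());
        }
        scanner.close();
        return lines;
    }

    // Reads lines with hasNextLine()/nextLine(); no delimiter is needed.
    static List<String> byNextLine(String text) {
        List<String> lines = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine());
        }
        scanner.close();
        return lines;
    }

    public static void main(String[] args) {
        // Mixed line endings on purpose: "\r\n" after the first line, "\n" after the second.
        String sample = "Here is a small text file that you will\r\n"
                + "use to test java.util.scanner.\n";
        System.out.println(byDelimiter(sample));
        System.out.println(byNextLine(sample));
    }
}
```

For this sample both methods return the same two lines, regardless of the platform on which the code runs.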

A simple change of the pattern in the delimiter used by the Scanner gives you a great deal of power and flexibility. For example, if you specify the following delimiter:

scanner.useDelimiter("\\z");

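it reads in the entire file at once, because "\\z" matches only at the end of the input. This is similar to the trick suggested by Pat Niemeyer in his java.net blog. You can read in the entire contents of a web page without creating several intermediate objects. The code for the following class, WebPageScanner, reads in the current contents of the java.net homepage. Note that it uses "\\Z" rather than "\\z"; according to the Pattern documentation, "\\Z" matches the end of the input but for the final line terminator, if any.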

import java.net.URL;
import java.net.URLConnection;
import java.io.IOException;
import java.util.Scanner;

public class WebPageScanner {

    public static void main(String[] args) {
        try {
            URLConnection connection = new URL("http://java.net").openConnection();
            String text = new Scanner(connection.getInputStream()).useDelimiter("\\Z").next();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
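The same whole-input technique can be exercised without a network connection. In the sketch below, the class WholeInputDemo and its readAll() method are hypothetical names, and a ByteArrayInputStream stands in for the stream returned by the URLConnection. The hasNext() check is an addition: it guards against empty input, for which next() would throw a NoSuchElementException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical helper class, used only for illustration.
class WholeInputDemo {

    // Returns everything that can be read from the stream as one String,
    // decoded with the platform's default charset.
    static String readAll(InputStream in) {
        // "\\z" matches only at the end of the input, so the first token
        // is the entire content.
        Scanner scanner = new Scanner(in).useDelimiter("\\z");
        try {
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            scanner.close();
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("first line\nsecond line\n".getBytes());
        System.out.print(readAll(in));
    }
}
```

Closing the Scanner also closes the underlying stream, so a caller that needs the stream afterwards would have to manage it differently.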

You can handle more than Strings with the Scanner class. You can also use Scanner to parse data that consists of primitives. To illustrate this, save the following three lines in a file named Employee.data (in the same directory as TextSample.txt):

Joe, 38, true

Kay, 27, true

Lou, 33, false

You could still treat this as one large String and perform the conversions after parsing the String. Instead, you can parse this file in two steps. This is illustrated in the following class, DataScanner:

import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;

public class DataScanner {

    private static void readFile(String fileName) {
        try {
            Scanner scanner = new Scanner(new File(fileName));
            scanner.useDelimiter(System.getProperty("line.separator"));
            while (scanner.hasNext()) {
                parseLine(scanner.next());
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    private static void parseLine(String line) {
        Scanner lineScanner = new Scanner(line);
        // A comma with optional whitespace on either side separates the fields.
        lineScanner.useDelimiter("\\s*,\\s*");
        String name = lineScanner.next();
        int age = lineScanner.nextInt();
        boolean isCertified = lineScanner.nextBoolean();
        System.out.println("It is " + isCertified + " that " + name + ", age " + age + ", is certified.");
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DataScanner " + "file location");
            System.exit(0);
        }
        readFile(args[0]);
    }
}

The outer Scanner object in DataScanner reads a file, one line at a time. The readFile() method passes each line to parseLine(), which creates a second scanner. The second scanner parses the comma-delimited data and discards the whitespace on either side of each comma. There are variants of the hasNext() and next() methods that enable you to test whether the next token is of a specified type and to attempt to treat the next token as an instance of that type. For example, nextBoolean() attempts to treat the next token as a boolean and tries to match it, ignoring case, to either the String "true" or the String "false". If the match cannot be made, a java.util.InputMismatchException is thrown. The parseLine() method of DataScanner shows how each line is parsed into a String, an int, and a boolean.

Compile DataScanner. Then run it as follows:

java DataScanner Employee.data

You should get the following output:

It is true that Joe, age 38, is certified.

It is true that Kay, age 27, is certified.

It is false that Lou, age 33, is certified.
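DataScanner assumes that every line is well formed. The typed hasNext variants described above make it possible to check each token before reading it, so that bad input does not raise an InputMismatchException. The following is a sketch only: SafeLineParser, its parse() method, and the choice of returning null for a malformed line are assumptions made for illustration.

```java
import java.util.Scanner;

// Hypothetical helper class, used only for illustration.
class SafeLineParser {

    // Returns "name / age / certified", or null when the line does not
    // have the shape: name, int, boolean.
    static String parse(String line) {
        Scanner lineScanner = new Scanner(line);
        lineScanner.useDelimiter("\\s*,\\s*");
        try {
            if (!lineScanner.hasNext()) {
                return null;
            }
            String name = lineScanner.next();
            // hasNextInt() inspects the next token without consuming it.
            if (!lineScanner.hasNextInt()) {
                return null;
            }
            int age = lineScanner.nextInt();
            if (!lineScanner.hasNextBoolean()) {
                return null;
            }
            boolean isCertified = lineScanner.nextBoolean();
            return name + " / " + age + " / " + isCertified;
        } finally {
            lineScanner.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Kay, 27, true"));   // prints Kay / 27 / true
        System.out.println(parse("Lou, abc, false")); // prints null
    }
}
```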

You might be tempted to use just the comma as a delimiter. In other words, you might try this:

lineScanner.useDelimiter(",");

This results in an InputMismatchException. With a bare comma as the delimiter, the space that follows each comma stays in the next token, so the tokens for the first line are "Joe", " 38", and " true". The call to nextInt() fails first, because " 38" with its leading space is not a valid int; the token " true" would not match either "true" or "false" for the same reason. As is the case with all applications of regular expressions, the underlying power requires that you take extra care in constructing your patterns.
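The effect of the two delimiters can be checked directly. In this sketch, DelimiterPitfall and its parses() method are hypothetical names; the method reports whether a line can be read as a String, an int, and a boolean with the given delimiter.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

// Hypothetical helper class, used only for illustration.
class DelimiterPitfall {

    // Returns true if the line parses as name, int, boolean with the given
    // delimiter, and false if a token does not match the requested type.
    static boolean parses(String line, String delimiter) {
        Scanner lineScanner = new Scanner(line).useDelimiter(delimiter);
        try {
            lineScanner.next();
            lineScanner.nextInt();
            lineScanner.nextBoolean();
            return true;
        } catch (InputMismatchException e) {
            return false;
        } finally {
            lineScanner.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("Joe, 38, true", ","));          // prints false
        System.out.println(parses("Joe, 38, true", "\\s*,\\s*")); // prints true
        System.out.println(parses("Joe,38,true", ","));            // prints true
    }
}
```

The bare comma works only when the data contains no spaces around the commas, which is why the pattern with optional whitespace is the safer choice for Employee.data.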

For more information on Scanner, see the formal documentation.

 
 
 