New I/O Functionality for Java™ 2 Standard Edition 1.4
John Zukowski
October, 2001
Back in January 2000, while many people were arguing about whether the year 2000 was the last or first year of the century, life began for JSR 51 as an approved Java Specification Request (JSR). The name of that JSR is New I/O APIs for the Java Platform. Many people think of the new capabilities as just offering non-blocking I/O operations. However, the features introduced into the Java™ 2 Platform, Standard Edition (J2SE™), version 1.4 Beta, go well beyond that. While the API certainly offers support for scalable I/O operations for both sockets and files, you'll also find a regular expression package for pattern matching, encoders and decoders for character set conversions, and improved file system support like file locking and memory mapping. All four of these areas are covered in this article.
Note: The Java Native Interface (JNI) changes made to support the New I/O operations will not be covered. For information on these changes, see the Resources section at the end of this article.
Buffers
Starting from the simplest and building up to the most complex, the first improvement to mention is the set of Buffer classes found in the java.nio package. These buffers provide a mechanism to store a set of primitive data elements in an in-memory container. Basically, imagine wrapping a combined DataInputStream/DataOutputStream around a fixed-size byte array and then only being able to read and write one data type, like char, int, or double. There are seven such buffers available:
ByteBuffer
CharBuffer
DoubleBuffer
FloatBuffer
IntBuffer
LongBuffer
ShortBuffer
The ByteBuffer actually supports reading and writing the other six types, but each of the others is limited to its own type. To demonstrate the use of a buffer, the following snippet converts a String to a CharBuffer and reads one character at a time. You convert the String to a CharBuffer with the wrap method, then get each character with the get method. (A buffer created by wrapping a String is read-only.)
CharBuffer buff = CharBuffer.wrap(args[0]);
for (int i=0, n=buff.length(); i<n; i++) {
  System.out.println(buff.get());
}
When using buffers, it is important to realize there are different sizing and positioning values to worry about. The length method is not part of the general Buffer API; CharBuffer has it because it implements the CharSequence interface. It reports the remaining length: if the position is not at the beginning, length returns not the size of the buffer but the number of characters remaining between the current position and the limit. In other words, the above loop can also be written as follows.
CharBuffer buff = CharBuffer.wrap(args[0]);
// length() shrinks by one after each get()
while (buff.length() > 0) {
  System.out.println(buff.get());
}
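To see that length really tracks the position, the following sketch (the class and method names are made up for illustration) records the value of length() before each call to get and checks it against remaining(), the general Buffer method that reports the number of elements between the position and the limit.

```java
import java.nio.CharBuffer;

class LengthDemo {
    // Returns the value of length() observed before each get(), joined by spaces.
    static String lengthsWhileReading(String text) {
        CharBuffer buff = CharBuffer.wrap(text);
        StringBuilder seen = new StringBuilder();
        while (buff.length() > 0) {
            // For a CharBuffer, CharSequence.length() equals Buffer.remaining()
            if (buff.length() != buff.remaining()) {
                throw new IllegalStateException("length() and remaining() differ");
            }
            seen.append(buff.length()).append(' ');
            buff.get(); // advances the position by one, so length() drops by one
        }
        return seen.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(lengthsWhileReading("abc")); // prints 3 2 1
    }
}
```

Calling lengthsWhileReading("abc") returns "3 2 1": each get moves the position forward, and length shrinks with it.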
Getting back to the different sizing and positioning values, the four values are known as mark, position, limit, and capacity:
mark -- position saved by the mark method and restored by the reset method; >= 0 and <= position
position -- index of the next element to read or write; <= limit
limit -- index of the first element that should not be read or written; <= capacity
capacity -- number of elements the buffer can hold; fixed when the buffer is created
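Taken together, the four values always satisfy 0 <= mark <= position <= limit <= capacity. A short sketch with a hypothetical MarkResetDemo class shows how put, flip, mark, and reset move them:

```java
import java.nio.CharBuffer;

class MarkResetDemo {
    // Walks a small buffer through put, flip, mark, and reset,
    // then reports the resulting position, limit, capacity, and next char.
    static String describe() {
        CharBuffer buff = CharBuffer.allocate(8); // capacity 8, limit 8, position 0
        buff.put("abcd");                         // position moves to 4
        buff.flip();                              // limit = 4, position = 0
        buff.get();                               // reads 'a', position = 1
        buff.mark();                              // mark = 1
        buff.get();                               // reads 'b', position = 2
        buff.get();                               // reads 'c', position = 3
        buff.reset();                             // position returns to the mark, 1
        return "position=" + buff.position()
            + " limit=" + buff.limit()
            + " capacity=" + buff.capacity()
            + " next=" + buff.get();              // reads 'b' again
    }

    public static void main(String[] args) {
        System.out.println(describe()); // position=1 limit=4 capacity=8 next=b
    }
}
```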
The position is an important piece of information to keep in mind when reading from and writing to a buffer, because every put and get advances it. If you want to read what you just wrote, you must first move the position back to where the data starts; otherwise, you'll read past what you wrote and get whatever happens to be stored there. This is where the flip method comes in handy: it sets the limit to the current position and moves the position to zero. You can also rewind a buffer, which keeps the current limit and moves the position back to zero. For example, removing the flip call from the following snippet makes get return the character after the 'a', which in a newly allocated buffer is the null character ('\u0000'), not a space.
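CharBuffer buff = CharBuffer.allocate(10); // a wrapped String would be read-only
buff.put('a');  // position is now 1
buff.flip();    // limit = 1, position = 0
buff.get();     // returns 'a'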
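The difference flip makes is easy to check. In this sketch (class and method names are illustrative), readAfterPut reads back a freshly written character with and without flipping first, and readTwice uses rewind to read the same data a second time. A newly allocated buffer is zero-filled, so reading without flip returns the null character.

```java
import java.nio.CharBuffer;

class FlipDemo {
    // Writes 'a' and reads one char back, optionally flipping first.
    static char readAfterPut(boolean flipFirst) {
        CharBuffer buff = CharBuffer.allocate(4); // contents start out as zero
        buff.put('a');                            // position = 1, limit = 4
        if (flipFirst) {
            buff.flip();                          // limit = 1, position = 0
        }
        return buff.get();                        // reads at the current position
    }

    // Reads the same two chars twice, rewinding in between.
    static String readTwice() {
        CharBuffer buff = CharBuffer.allocate(4);
        buff.put("hi");
        buff.flip();                              // limit = 2, position = 0
        StringBuilder out = new StringBuilder();
        while (buff.hasRemaining()) {
            out.append(buff.get());
        }
        buff.rewind();                            // position = 0, limit stays 2
        while (buff.hasRemaining()) {
            out.append(buff.get());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) readAfterPut(false)); // 0, the null character
        System.out.println(readAfterPut(true));        // a
        System.out.println(readTwice());               // hihi
    }
}
```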
The wrap mechanism shown above creates a non-direct buffer. Non-direct buffers can also be created and sized with the allocate method, which backs the buffer with a newly created array. At a somewhat higher creation cost, you can instead create a direct buffer, a contiguous block of memory outside the garbage-collected heap, with the allocateDirect method of ByteBuffer. For a direct buffer, the virtual machine makes a best effort to perform the system's native I/O operations directly on its contents, avoiding a copy through an intermediate buffer.
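A ByteBuffer can hold mixed primitive types through methods such as putInt, putDouble, and putChar, and the same code works whether the buffer is direct or not. The sketch below (a hypothetical ByteBufferDemo class) writes three values, flips, reads them back in the same order, and reports whether each buffer is direct. Note that allocateDirect is defined only on ByteBuffer.

```java
import java.nio.ByteBuffer;

class ByteBufferDemo {
    // Writes an int, a double, and a char, then reads them back in order.
    static String roundTrip(ByteBuffer buf) {
        buf.putInt(42);      // 4 bytes
        buf.putDouble(1.5);  // 8 bytes
        buf.putChar('x');    // 2 bytes
        buf.flip();          // read back from the start
        return buf.getInt() + " " + buf.getDouble() + " " + buf.getChar();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);         // non-direct, array-backed
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // direct
        System.out.println(roundTrip(heap) + " direct=" + heap.isDirect());     // 42 1.5 x direct=false
        System.out.println(roundTrip(direct) + " direct=" + direct.isDirect()); // 42 1.5 x direct=true
    }
}
```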
Mapped Files
There is one specialized form of direct ByteBuffer known as a MappedByteBuffer. This class represents a buffer of bytes mapped to a file. To map a file to a MappedByteBuffer, you first must get the channel for the file. A channel represents a connection to an entity, such as a pipe, socket, or file, that can perform I/O operations. In the case of a FileChannel, you can get one from a FileInputStream, FileOutputStream, or RandomAccessFile through the getChannel method. Once you have the channel, you map it to a buffer with map, specifying the mode and the portion of the file you want to map. The mapping can be read-only (FileChannel.MapMode.READ_ONLY), copy-on-write (PRIVATE), or read-write (READ_WRITE); the 1.4 beta named these MAP_RO, MAP_COW, and MAP_RW.
Here's the basic process for creating a read-only MappedByteBuffer from a file:
String filename = ...;
FileInputStream input = new
  FileInputStream(filename);
FileChannel channel = input.getChannel();
int fileLength = (int)channel.size();
// READ_ONLY was spelled FileChannel.MAP_RO in the 1.4 beta
MappedByteBuffer buffer =
  channel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);
You'll find the channel-related classes in the java.nio.channels package.
Once the MappedByteBuffer has been created, you can access it like any other ByteBuffer. In this particular case, though, it is read-only, so any attempt to put something into it throws a ReadOnlyBufferException. If you need to treat the bytes as characters, you must decode the ByteBuffer into a CharBuffer using a character set. The character set is represented by the Charset class, and you decode the file contents with a CharsetDecoder obtained from it. A CharsetEncoder goes in the other direction, from characters to bytes.
// ISO-8859-1 is ISO Latin Alphabet #1
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharBuffer charBuffer = decoder.decode(buffer);
These classes are found in the java.nio.charset package.
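Putting the mapping and decoding steps together, the following self-contained sketch writes a small temporary file, maps it read-only, confirms that a put is rejected, and decodes the bytes as ISO-8859-1. The class and method names, the temporary file, and the sample text are assumptions for illustration; the code uses the released constant FileChannel.MapMode.READ_ONLY in place of the beta's MAP_RO.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

class MapAndDecode {
    // Writes text to a temporary file, maps it read-only, and decodes it.
    static String roundTrip(String text) {
        try {
            File file = File.createTempFile("nio-map", ".txt");
            file.deleteOnExit();
            FileOutputStream out = new FileOutputStream(file);
            out.write(text.getBytes("ISO-8859-1"));
            out.close();

            FileInputStream input = new FileInputStream(file);
            try {
                FileChannel channel = input.getChannel();
                MappedByteBuffer buffer = channel.map(
                    FileChannel.MapMode.READ_ONLY, 0, channel.size());
                try {
                    buffer.put(0, (byte) 'X');  // not allowed on a read-only mapping
                    throw new IllegalStateException("mapping should be read-only");
                } catch (ReadOnlyBufferException expected) {
                    // this is what a put on a READ_ONLY mapping throws
                }
                CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
                CharBuffer charBuffer = decoder.decode(buffer);
                return charBuffer.toString();
            } finally {
                input.close();  // also closes the channel; the mapping stays valid
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello, mapped file"));
    }
}
```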
Regular Expressions
Once you've decoded the mapped file into a CharBuffer, you can do pattern matching on the file contents. Think of running grep or wc on the file to do regular expression matching or word counting, respectively. That's where the java.util.regex package, with its Pattern and Matcher classes, comes into play.
The Pattern class provides a whole slew of constructs for matching regular expressions. Basically, you provide the pattern as a String. See the class documentation for full details of the patterns. Here are some samples to get you started:
Line pattern, any number of characters followed by an optional carriage return and a line feed: .*\r?\n, or, with the MULTILINE option described below, .*$
Series of digits: [0-9]* or \d*
A control character: \p{Cntrl}
An upper or lowercase US-ASCII letter, followed by white space, followed by punctuation: [\p{Lower}\p{Upper}]\s\p{Punct}
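When these patterns are written as Java string literals, every backslash must be doubled. The following sketch (a hypothetical PatternSamples class) checks three of the samples with Pattern.matches, which succeeds only when the whole input matches:

```java
import java.util.regex.Pattern;

class PatternSamples {
    // Series of digits (also matches the empty string, because of *)
    static boolean isDigits(String s) {
        return Pattern.matches("\\d*", s);
    }

    // A single control character
    static boolean isControl(String s) {
        return Pattern.matches("\\p{Cntrl}", s);
    }

    // A US-ASCII letter, then white space, then punctuation
    static boolean isLetterSpacePunct(String s) {
        return Pattern.matches("[\\p{Lower}\\p{Upper}]\\s\\p{Punct}", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2001"));          // true
        System.out.println(isDigits("20a1"));          // false
        System.out.println(isControl("\t"));           // true
        System.out.println(isLetterSpacePunct("a !")); // true
        System.out.println(isLetterSpacePunct("1 !")); // false
    }
}
```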
Once you have the pattern string, you pass it to the static compile method of the Pattern class. By default, the end-of-line anchor ($) matches only at the end of the input (or just before a final line terminator), so .*$ would find only the last line of the file. To have $ match at the end of every line, you must use the MULTILINE compile option. Other options cover tasks such as case-insensitive matching (CASE_INSENSITIVE) and Unicode-aware case folding (UNICODE_CASE), among a few others. So, for the line pattern above, the code would look like this:
Pattern linePattern = Pattern.compile(".*$",
  Pattern.MULTILINE);
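The effect of MULTILINE shows up clearly on a three-line string. In this sketch (names are illustrative), nonEmptyMatches collects every non-empty match of a pattern; the empty matches that .*$ can also produce at line ends are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Collects the non-empty matches of pattern in input, in order.
    static List<String> nonEmptyMatches(Pattern pattern, CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            if (matcher.group().length() > 0) { // .*$ can also match "" at a line end
                result.add(matcher.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "first\nsecond\nthird";
        // Without MULTILINE, $ matches only at the end of the input
        System.out.println(nonEmptyMatches(Pattern.compile(".*$"), text));
        // prints [third]
        // With MULTILINE, $ also matches before each line terminator
        System.out.println(nonEmptyMatches(Pattern.compile(".*$", Pattern.MULTILINE), text));
        // prints [first, second, third]
    }
}
```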
When it is time to match, you call the pattern's matcher method to get a Matcher back. With the Matcher, you can check whether the whole input matches (matches), or find each matching piece in turn (find) and retrieve it (group). The Pattern itself can also break input apart: split returns the pieces found between matches of the pattern. For instance, the following is a framework for reading a line at a time and getting the words out of each line.
Matcher lineMatcher = linePattern.matcher(aString);
// \p{Space} and \p{Punct} are the released names for
// the beta's {space} and {punct} classes
Pattern wordBreakPattern =
  Pattern.compile("[\\p{Space}\\p{Punct}]");
// Loop through the lines
while (lineMatcher.find()) {
  CharSequence line = lineMatcher.group();
  String words[] = wordBreakPattern.split(line);
  // For each word
  for (int i=0, n=words.length; i<n; i++) {
    // Lines with just break characters return an empty string
    if (words[i].length() > 0) {
      System.out.println(":" + words[i] + ":");
    }
  }
}
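The same framework can be wrapped in a small, self-contained method and run on an in-memory string; the class name, method name, and sample text here are assumptions. It collects the words instead of printing them and uses the released \p{Space} and \p{Punct} syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineWords {
    // Returns the words of text, line by line, skipping empty pieces.
    static List<String> words(CharSequence text) {
        Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);
        Pattern wordBreakPattern = Pattern.compile("[\\p{Space}\\p{Punct}]");
        List<String> result = new ArrayList<>();
        Matcher lineMatcher = linePattern.matcher(text);
        while (lineMatcher.find()) {
            String[] pieces = wordBreakPattern.split(lineMatcher.group());
            for (String piece : pieces) {
                if (piece.length() > 0) { // adjacent break characters yield ""
                    result.add(piece);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, NIO!\nnew I/O")); // [Hello, NIO, new, I, O]
    }
}
```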
There is also a shortcut for matching, boolean b = Pattern.matches(".*\r?\n", aString), but it compiles the pattern on every call, so it is inefficient when you need to check for matches repeatedly; in that case, compile the Pattern once and reuse it.
To combine all the previously mentioned skills, the following example counts how many times each word occurs in the file named on the command line:
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.*;
import java.util.regex.*;
public class WordCount {
  public static void main(String args[]) throws
      Exception {
    String filename = args[0];
    // Map File from filename to byte buffer
    FileInputStream input = new
      FileInputStream(filename);
    FileChannel channel = input.getChannel();
    int fileLength = (int)channel.size();
    // READ_ONLY was spelled FileChannel.MAP_RO in the 1.4 beta
    MappedByteBuffer buffer =
      channel.map(FileChannel.MapMode.READ_ONLY, 0,
        fileLength);
    // Convert to character buffer
    Charset charset = Charset.forName("ISO-8859-1");
    CharsetDecoder decoder = charset.newDecoder();
    CharBuffer charBuffer = decoder.decode(buffer);
    // Done with the file
    input.close();
    // Create line pattern
    Pattern linePattern = Pattern.compile(".*$",
      Pattern.MULTILINE);
    // Create word pattern (the beta spelled these
    // classes {space} and {punct})
    Pattern wordBreakPattern =
      Pattern.compile("[\\p{Space}\\p{Punct}]");
    // Match line pattern to buffer
    Matcher lineMatcher =
      linePattern.matcher(charBuffer);
    Map map = new TreeMap();
    Integer ONE = new Integer(1);
    // For each line
    while (lineMatcher.find()) {
      // Get line
      CharSequence line = lineMatcher.group();
      // Get array of words on line
      String words[] = wordBreakPattern.split(line);
      // For each word
      for (int i=0, n=words.length; i<n; i++) {
        if (words[i].length() > 0) {
          Integer frequency =
            (Integer)map.get(words[i]);
          if (frequency == null) {
            frequency = ONE;
          } else {
            int value = frequency.intValue();
            frequency = new Integer(value + 1);
          }
          map.put(words[i], frequency);
        }
      }
    }
    System.out.println(map);
  }
}
For additional information about the regular expression library, see the Regular Expressions and the Java Programming Language article referenced in the Resources.
Socket Channels
Moving on from file channels takes us to channels for reading from and writing to socket connections. These channels can be used in a blocking or non-blocking fashion. In blocking mode, they simply take the place of the Socket and ServerSocket classes: connect (for a client) or accept (for a server) waits until the operation completes. Non-blocking mode has no equivalent in the original java.net socket classes.
The new classes to deal with for basic socket reading and writing are the InetSocketAddress class in the java.net package to specify where to connect to, and the SocketChannel class in the java.nio.channels package to do the actual reading and writing operations.
Specifying the destination with InetSocketAddress is very similar to creating a Socket. All you have to do is provide the host and port:
String host = ...;
InetSocketAddress socketAddress = new
  InetSocketAddress(host, 80);
Once you have the InetSocketAddress, that's where life changes. Instead of reading from the socket's input stream and writing to the output stream, you need to open a SocketChannel and connect it to the InetSocketAddress:
SocketChannel channel = SocketChannel.open();
channel.connect(socketAddress);
Once connected, you can read from or write to the channel with ByteBuffer objects. For instance, to send an HTTP request, you can wrap a String in a CharBuffer and encode it into a ByteBuffer with a CharsetEncoder:
Charset charset = Charset.forName("ISO-8859-1");
CharsetEncoder encoder = charset.newEncoder();
// HTTP lines end with CR LF; a blank line ends the request
String request = "GET / HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
channel.write(encoder.encode(CharBuffer.wrap(request)));
You can then read the response from the channel. Since the response to this HTTP request will be text, you'll need to convert it into characters through a CharsetDecoder. By allocating the ByteBuffer and CharBuffer once, before the loop, you can keep reusing them and avoid creating garbage on every read:
CharsetDecoder decoder = charset.newDecoder();
ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
CharBuffer charBuffer = CharBuffer.allocate(1024);
while ((channel.read(buffer)) != -1) {
  buffer.flip();
  decoder.decode(buffer, charBuffer, false);
  charBuffer.flip();
  // print, not println: a read can end mid-line
  System.out.print(charBuffer);
  buffer.clear();
  charBuffer.clear();
}
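The read, flip, decode, and clear cycle does not depend on sockets, so it can be tried against any ReadableByteChannel. This sketch (names and the deliberately tiny buffer size are assumptions) wraps a byte array with Channels.newChannel and runs the same loop, collecting the text instead of printing it. Clearing the byte buffer after each pass is safe here because ISO-8859-1 maps every byte to exactly one character; with a multi-byte charset such as UTF-8, a character can be split across two reads, and calling compact instead of clear would keep the undecoded bytes for the next pass.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

class DecodeLoop {
    // Reads the channel to end-of-stream, decoding ISO-8859-1 bytes as they arrive.
    static String readAll(ReadableByteChannel channel, int bufferSize) throws IOException {
        CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        CharBuffer charBuffer = CharBuffer.allocate(bufferSize); // one char per byte
        StringBuilder text = new StringBuilder();
        while (channel.read(buffer) != -1) {
            buffer.flip();                             // switch the bytes to reading
            decoder.decode(buffer, charBuffer, false); // false: more input may follow
            charBuffer.flip();                         // switch the chars to reading
            text.append(charBuffer);
            buffer.clear();                            // safe: no partial characters left
            charBuffer.clear();
        }
        return text.toString();
    }

    // Convenience wrapper: decodes the given text through an in-memory channel.
    static String demo(String message, int bufferSize) {
        try {
            byte[] bytes = message.getBytes("ISO-8859-1");
            ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(bytes));
            return readAll(channel, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A 4-byte buffer forces several passes through the loop
        System.out.println(demo("HTTP/1.0 200 OK", 4));
    }
}
```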
The following program connects all these pieces to read the main page of a Web site through an HTTP request. Feel free to save the output to a file to compare the results to viewing the page with a browser.
import java.io.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
public class ReadURL {
  public static void main(String args[]) {
    String host = args[0];
    SocketChannel channel = null;
    try {
      // Setup
      InetSocketAddress socketAddress =
        new InetSocketAddress(host, 80);
      Charset charset =
        Charset.forName("ISO-8859-1");
      CharsetDecoder decoder =
        charset.newDecoder();
      CharsetEncoder encoder =
        charset.newEncoder();
      // Allocate buffers
      ByteBuffer buffer =
        ByteBuffer.allocateDirect(1024);
      CharBuffer charBuffer =
        CharBuffer.allocate(1024);
      // Connect
      channel = SocketChannel.open();
      channel.connect(socketAddress);
      // Send request; an HTTP/1.0 server closes the
      // connection after responding, ending the read loop
      String request = "GET / HTTP/1.0\r\nHost: " +
        host + "\r\n\r\n";
      channel.write(encoder.encode(CharBuffer.wrap(request)));
      // Read response
      while ((channel.read(buffer)) != -1) {
        buffer.flip();
        // Decode buffer
        decoder.decode(buffer, charBuffer, false);
        // Display
        charBuffer.flip();
        System.out.print(charBuffer);
        buffer.clear();
        charBuffer.clear();
      }
    } catch (UnresolvedAddressException e) {
      // Thrown by connect when the host name cannot be resolved
      System.err.println(e);
    } catch (IOException e) {
      System.err.println(e);
    } finally {
      if (channel != null) {
        try {
          channel.close();
        } catch (IOException ignored) {
        }
      }
    }
  }
}
Non-Blocking Reads
Now comes the interesting part, and what most people are looking for in the new I/O packages: how do you make the channel connection non-blocking? The basic step is to call the configureBlocking method on the opened SocketChannel and pass in a value of false. After that, the connect method returns immediately instead of waiting for the connection to be established; it returns true only if the connection could be made right away, as can happen with a local connection.
String host = ...;
InetSocketAddress socketAddress =
  new InetSocketAddress(host, 80);
SocketChannel channel = SocketChannel.open();
channel.configureBlocking(false);
channel.connect(socketAddress);
Once you have a non-blocking channel, you then have to figure out how to actually work with the channel. The SocketChannel is an example of a SelectableChannel. These selectable channels work with a Selector. Basically, you register the channel with the Selector, tell the Selector what events you are interested in, and it notifies you when something interesting happens.
To get a Selector instance, just call the static open method of the class:
Selector selector = Selector.open();
Registering with the Selector is done through the register method of the channel. The events are specified by fields of the SelectionKey class. In the case of the SocketChannel class, the available operations are OP_CONNECT, OP_READ, and OP_WRITE. So, if you were interested in read and connection operations, you would register as follows:
channel.register(selector,
  SelectionKey.OP_CONNECT | SelectionKey.OP_READ);
At this point, you have to wait on the selector to tell you when events of interest happen on registered channels. The select method of the Selector blocks until at least one registered channel is ready (or the selector is woken up). Because of that, you can put a while (selector.select() > 0) loop in its own thread and go off and do your own thing while the I/O events are being processed. The value select returns is the number of keys whose readiness was updated; beyond using it as a loop condition, you don't really need it, since the selected keys themselves tell you what happened.
Once something interesting happens, you have to figure out what happened and respond accordingly. For the channel registered here, you expressed interest in both the OP_CONNECT and OP_READ operations, so you know it can only be one of those events. So, you get the Set of ready keys through the selectedKeys method and iterate over it. Each element in the Set is a SelectionKey, and you can check whether it isConnectable or isReadable for the two states of interest.
Here's the basic framework of the loop so far:
while (selector.select(500) > 0) {
  // Get set of ready objects
  Set readyKeys = selector.selectedKeys();
  Iterator readyItor = readyKeys.iterator();
  // Walk through set
  while (readyItor.hasNext()) {
    // Get key from set
    SelectionKey key =
      (SelectionKey)readyItor.next();
    // Remove current entry
    readyItor.remove();
    // Get channel
    SocketChannel keyChannel =
      (SocketChannel)key.channel();
    if (key.isConnectable()) {
    } else if (key.isReadable()) {
    }
  }
}
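The remove method call requires a little explanation. The selector adds keys to the selected-key set but never removes them; a key stays there until you remove it yourself. If you don't remove the key you are processing, it will still be in the set after the next select call, and you'll process it again even if nothing new happened on its channel. The select call here is also given a 500 millisecond timeout, so it doesn't wait forever when there is nothing to do; when the timeout expires with no channel ready, select returns zero and the loop ends. Finally, each key gives you back its channel through the channel method; you'll need that channel for each operation.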
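The need for remove can be demonstrated without a network. This sketch (class and method names are hypothetical) uses a Pipe, whose source channel is selectable, writes one byte so the source becomes readable, and then selects twice without removing the key. The key is still in the selected-key set after the second select, so a loop that never removed keys would see it again; because select counts only keys whose readiness changed, the second call may well return zero even though the key is still selected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class SelectedKeysDemo {
    // Returns {first select count, second select count, selected-key set size}.
    static int[] selectTwiceWithoutRemoving() {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open()) {
                pipe.source().configureBlocking(false);  // required before register
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // makes the source readable

                int first = selector.select(1000);  // key becomes selected
                int second = selector.selectNow();  // nothing removed, nothing read
                int size = selector.selectedKeys().size();
                if (!selector.selectedKeys().contains(key)) {
                    throw new IllegalStateException("key should still be selected");
                }
                return new int[] {first, second, size};
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = selectTwiceWithoutRemoving();
        System.out.println("first=" + r[0] + " second=" + r[1] + " selected=" + r[2]);
    }
}
```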
For the sample program here, you're doing the equivalent of reading from an HTTP connection, so upon connection you need to send the initial HTTP request: once you know the connection is made, you send a GET request for the root of the site. When the selector reports that the channel is connectable, the connection may not have finished yet. So you should always check whether the connection is pending through isConnectionPending and call finishConnect if it is. After that, it makes sense to change the key's interest set to just OP_READ with the interestOps method, because there is nothing left to connect. Once connected, you can write to the channel, but you must use a ByteBuffer, not the more familiar I/O streams.
Here's what the connection code looks like:
// OUTSIDE WHILE LOOP
Charset charset =
  Charset.forName("ISO-8859-1");
CharsetEncoder encoder = charset.newEncoder();
// INSIDE if (key.isConnectable())
// Finish connection
if (keyChannel.isConnectionPending()) {
  keyChannel.finishConnect();
}
// From now on, only reads are of interest
key.interestOps(SelectionKey.OP_READ);
// Send request
String request = "GET / HTTP/1.0\r\nHost: " +
  host + "\r\n\r\n";
keyChannel.write
  (encoder.encode(CharBuffer.wrap(request)));
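The connection step can be tried without reaching out to a real Web server. The sketch below (class and method names are hypothetical) opens a listening ServerSocketChannel on the loopback address with an ephemeral port, connects to it in non-blocking mode, and finishes the connection when the selector reports the key as connectable. The operating system completes the TCP handshake for a listening socket even before accept is called, so the server side needs no further code here. The sketch relies on ServerSocketChannel.bind and getLocalAddress, which were added in releases after 1.4.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class ConnectDemo {
    // Returns true once a non-blocking connect to a local listener has completed.
    static boolean connectNonBlocking() {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            // Local stand-in for the Web server; port 0 picks any free port
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            channel.configureBlocking(false);
            if (channel.connect(server.getLocalAddress())) {
                return true;  // a local connection can complete immediately
            }
            SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(5000) == 0) {
                return false;  // timed out
            }
            selector.selectedKeys().clear();  // the single key is handled directly below
            if (key.isConnectable() && channel.isConnectionPending()) {
                channel.finishConnect();
                // Nothing left to connect; only reads would matter from here on
                key.interestOps(SelectionKey.OP_READ);
            }
            return channel.isConnected();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected=" + connectNonBlocking());
    }
}
```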
Reading from a socket channel works just like reading from a file channel, with two differences. A read from a socket is more likely to return less data than the buffer can hold; that's not a big deal, because you just process what is ready. And when the server closes the connection, read returns -1, at which point you should close the channel.
// OUTSIDE WHILE LOOP
CharsetDecoder decoder = charset.newDecoder();
ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
CharBuffer charBuffer = CharBuffer.allocate(1024);
// INSIDE if (key.isReadable())
// Read what's ready in response; -1 means the
// server has closed the connection
if (keyChannel.read(buffer) == -1) {
  keyChannel.close();
} else {
  buffer.flip();
  // Decode buffer
  decoder.decode(buffer, charBuffer, false);
  // Display
  charBuffer.flip();
  System.out.print(charBuffer);
  // Clear for next pass
  buffer.clear();
  charBuffer.clear();
}
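The read-what's-ready pattern, including the -1 that signals the other side has closed, can be exercised with a Pipe standing in for the socket. In this sketch (the names, the 4-byte buffer, and the use of a Pipe are assumptions for illustration), the sink plays the server by writing a short message and closing; the selector loop reads whatever is ready, decodes it, and closes the source when read returns -1.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class ReadUntilEof {
    // Sends a short message through a pipe, then reads it back with a selector until end-of-stream.
    static String relay(String message) {
        try {
            Pipe pipe = Pipe.open();
            Pipe.SourceChannel source = pipe.source();
            Pipe.SinkChannel sink = pipe.sink();
            try (Selector selector = Selector.open()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);
                // The sink plays the server: write everything, then close
                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.ISO_8859_1)));
                sink.close();

                CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder();
                ByteBuffer buffer = ByteBuffer.allocate(4);  // small, so several reads are needed
                CharBuffer charBuffer = CharBuffer.allocate(4);
                StringBuilder text = new StringBuilder();
                while (source.isOpen() && selector.select(1000) > 0) {
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (!key.isReadable()) {
                            continue;
                        }
                        if (source.read(buffer) == -1) {
                            source.close();  // other side closed; this also cancels the key
                        } else {
                            buffer.flip();
                            decoder.decode(buffer, charBuffer, false);
                            charBuffer.flip();
                            text.append(charBuffer);
                            buffer.clear();
                            charBuffer.clear();
                        }
                    }
                }
                return text.toString();
            } finally {
                sink.close();    // closing an already closed channel is harmless
                source.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(relay("Hello, NIO"));
    }
}
```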
Add in the necessary exception handling code and you have your socket reader. Be sure to close the channel in the finally clause to make sure its resources are released, even if there is an exception. Here's the complete client code:
import java.io.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.*;
public class NonBlockingReadURL {
  static Selector selector;
  public static void main(String args[]) {
    String host = args[0];
    SocketChannel channel = null;
    try {
      // Setup
      InetSocketAddress socketAddress =
        new InetSocketAddress(host, 80);
      Charset charset =
        Charset.forName("ISO-8859-1");
      CharsetDecoder decoder =
        charset.newDecoder();
      CharsetEncoder encoder =
        charset.newEncoder();
      // Allocate buffers
      ByteBuffer buffer =
        ByteBuffer.allocateDirect(1024);
      CharBuffer charBuffer =
        CharBuffer.allocate(1024);
      // Connect
      channel = SocketChannel.open();
      channel.configureBlocking(false);
      channel.connect(socketAddress);
      // Open Selector
      selector = Selector.open();
      // Register interest in connection and read events
      channel.register(selector,
        SelectionKey.OP_CONNECT |
        SelectionKey.OP_READ);
      // Wait for something of interest to happen
      while (selector.select(500) > 0) {
        // Get set of ready objects
        Set readyKeys = selector.selectedKeys();
        Iterator readyItor = readyKeys.iterator();
        // Walk through set
        while (readyItor.hasNext()) {
          // Get key from set
          SelectionKey key =
            (SelectionKey)readyItor.next();
          // Remove current entry
          readyItor.remove();
          // Get channel
          SocketChannel keyChannel =
            (SocketChannel)key.channel();
          if (key.isConnectable()) {
            // Finish connection
            if (keyChannel.isConnectionPending()) {
              keyChannel.finishConnect();
            }
            // From now on, only reads are of interest
            key.interestOps(SelectionKey.OP_READ);
            // Send request
            String request =
              "GET / HTTP/1.0\r\nHost: " +
              host + "\r\n\r\n";
            keyChannel.write(encoder.encode(
              CharBuffer.wrap(request)));
          } else if (key.isReadable()) {
            // Read what's ready in response; -1 means
            // the server has closed the connection
            if (keyChannel.read(buffer) == -1) {
              // Closing the channel cancels its key, so
              // the next select times out and the loop ends
              keyChannel.close();
            } else {
              buffer.flip();
              // Decode buffer
              decoder.decode(buffer,
                charBuffer, false);
              // Display
              charBuffer.flip();
              System.out.print(charBuffer);
              // Clear for next pass
              buffer.clear();
              charBuffer.clear();
            }
          } else {
            System.err.println("Ooops");
          }
        }
      }
    } catch (UnresolvedAddressException e) {
      // Thrown by connect when the host name cannot be resolved
      System.err.println(e);
    } catch (IOException e) {
      System.err.println(e);
    } finally {
      if (channel != null) {
        try {
          channel.close();
        } catch (IOException ignored) {
        }
      }
    }
    System.out.println();
  }
}
Non-Blocking Servers
The final piece is having a Web server use the NIO package. With the new I/O capabilities, you can create a Web server that does not require one thread per connection. You can certainly still pool threads for long processing tasks, but a single thread can select and wait for something to do, rather than having every thread wait separately on its own connection.
The basic setup of the server binds a ServerSocketChannel to an InetSocketAddress by calling bind on the channel's underlying ServerSocket, which you get from the socket method:
ServerSocketChannel channel =
  ServerSocketChannel.open();
channel.configureBlocking(false);
InetSocketAddress isa =
  new InetSocketAddress(port);
channel.socket().bind(isa);
Everything else is the same as for the client, except that this time you register interest in OP_ACCEPT and check isAcceptable when the selector notifies you of an event. It is that simple.
The following code example shows just how simple this is. It is your basic single-threaded server, sending back a canned text message for each request. Just use telnet to connect to port 9999 and see the response.
import java.io.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.*;
public class Server {
  private static int port = 9999;
  public static void main(String args[])
      throws Exception {
    Selector selector = Selector.open();
    ServerSocketChannel channel =
      ServerSocketChannel.open();
    channel.configureBlocking(false);
    InetSocketAddress isa =
      new InetSocketAddress(port);
    channel.socket().bind(isa);
    // Register interest in incoming connections
    channel.register(selector,
      SelectionKey.OP_ACCEPT);
    // Wait for something of interest to happen
    while (selector.select() > 0) {
      // Get set of ready objects
      Set readyKeys = selector.selectedKeys();
      Iterator readyItor = readyKeys.iterator();
      // Walk through set
      while (readyItor.hasNext()) {
        // Get key from set
        SelectionKey key =
          (SelectionKey)readyItor.next();
        // Remove current entry
        readyItor.remove();
        if (key.isAcceptable()) {
          // Get channel
          ServerSocketChannel keyChannel =
            (ServerSocketChannel)key.channel();
          // Accept request; in the released 1.4 API, accept
          // returns a SocketChannel, or null if none is pending
          SocketChannel client = keyChannel.accept();
          if (client == null) {
            continue;
          }
          // The accepted channel starts out in blocking
          // mode, so its socket's streams can be used
          Socket socket = client.socket();
          // Return canned message
          PrintWriter out = new PrintWriter(
            socket.getOutputStream(), true);
          out.println("Hello, NIO");
          out.close();
        } else {
          System.err.println("Ooops");
        }
      }
    }
    // Never ends
  }
}
After accepting the connection, you could also make the accepted SocketChannel non-blocking and register it with the selector, so that the same loop handles reading each request. This framework just provides the basics of using the NIO classes within a Web server. For additional information about creating a multi-threaded server, see the JavaWorld article referenced in the Resources section.
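A minimal sketch of registering accepted channels follows. The class and method names, the loopback address with an ephemeral port, and the in-process client standing in for telnet are all assumptions for illustration, and the code uses the released behavior in which accept returns a SocketChannel (or null when no connection is pending). The accepted channel must be switched to non-blocking mode before it can be registered; the loop then reads the client's message until read returns -1. It also relies on ServerSocketChannel.bind, getLocalAddress, and SocketChannel.shutdownOutput, which arrived in releases after 1.4.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class AcceptAndRegister {
    // Accepts one connection, registers it for reads, and returns what the client sent.
    static String receiveOne(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            if (server.accept() != null) {  // nothing pending yet, so this returns null
                throw new IllegalStateException("unexpected connection");
            }

            // Blocking client standing in for telnet: send a message, then signal end of input
            client.connect(server.getLocalAddress());
            client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.ISO_8859_1)));
            client.shutdownOutput();

            StringBuilder received = new StringBuilder();
            ByteBuffer buffer = ByteBuffer.allocate(64);
            boolean done = false;
            while (!done && selector.select(5000) > 0) {
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel accepted = ((ServerSocketChannel) key.channel()).accept();
                        if (accepted != null) {
                            accepted.configureBlocking(false);  // required before register
                            accepted.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel accepted = (SocketChannel) key.channel();
                        buffer.clear();
                        if (accepted.read(buffer) == -1) {  // client finished sending
                            accepted.close();
                            done = true;
                        } else {
                            buffer.flip();
                            received.append(StandardCharsets.ISO_8859_1.decode(buffer));
                        }
                    }
                }
            }
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(receiveOne("Hello, NIO"));
    }
}
```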
Conclusion
The New I/O features introduced in the J2SE version 1.4 Beta release provide exciting new ways to improve the performance of your programs. By taking advantage of the new capabilities, your programs can be not only faster but also much more scalable, because you no longer need one thread per connection. This is especially important on the server side, where it greatly increases the number of simultaneous connections a server can support.
Note: If you look at the list of capabilities in JSR 51, you'll notice there is mention of scanning and formatting support, similar to C's printf. This feature didn't make the 1.4 beta release and will be saved for a later version.