41bb726c

Day 19

Streams and I/O

by Charles L. Perkins and Laura Lemay


CONTENTS

The package java.io, part of the standard Java class library, provides a large number of classes designed for handling input and output to files, network connections, and other sources. These I/O classes are known as streams, and provide functionality for reading and writing data in various ways. You got a glimpse of these classes on Day 14, "Windows, Networking, and Other Tidbits," when we opened a network connection to a file and read the contents into an applet.

Today you'll explore Java's input and output classes:

You'll also learn about two stream interfaces that make the reading and writing of typed streams much easier (as well as about several utility classes used to access the file system).

What Are Streams?

A stream is a path of communication between the source of some information and its destination. This information can come from a file, the computer's memory, or even from the Internet. In fact, the source and destination of a stream are completely arbitrary producers and consumers of bytes, respectively-you don't need to know about the source of the information when reading from a stream, and you don't need to know about the final destination when writing to one.

A stream is a path of communication between a source of information and its destination. For example, an input stream allows you to read data from a source, and an output stream allows you to write data to a destination.

General-purpose methods that can read from any source accept a stream argument to specify that source; general-purpose methods for writing accept a stream to specify the destination. Arbitrary processors of data commonly have two stream arguments. They read from the first, process the data, and write the results to the second. These processors have no idea of either the source or the destination of the data they are processing. Sources and destinations can vary widely: from two memory buffers on the same local computer, to the ELF (extremely low frequency) transmissions to and from a submarine at sea, to the real-time data streams of a NASA probe in deep space.

By decoupling the consuming, processing, or producing of data from the sources and destinations of that data, you can mix and match any combination of them at will as you write your program. In the future, when new, previously nonexistent forms of source or destination (or consumer, processor, or producer) appear, they can be used within the same framework, with no changes to your classes. In addition, new stream abstractions, supporting higher levels of interpretation "on top of" the bytes, can be written completely independently of the underlying transport mechanisms for the bytes themselves.

The java.io Package

All the classes you will learn about today are part of the package java.io. To use any of these classes in your own programs, you will need to import each individual class or to import the entire java.io package, like this:

import java.io.InputStream;
import java.io.FilteredInputStream;
import java.io.FileOutputStream;

import java.io.*;

All the methods you will explore today are declared to throw IOExceptions. This new subclass of Exception conceptually embodies all the possible I/O errors that might occur while using streams. Several subclasses of it define a few, more specific exceptions that can be thrown as well. For now, it is enough to know that you must either catch an IOException, or be in a method that can "pass it along," to be a well-behaved user of streams.

The foundations of this stream framework in the Java class hierarchy are the two abstract classes, InputStream and OutputStream. Inheriting from these classes is a virtual cornucopia of categorized subclasses, demonstrating the wide range of streams in the system, but also demonstrating an extremely well-designed hierarchy of relationships between these streams-one well worth learning from. Let's begin with the parents, InputStream and OutputStream, and then work our way down this bushy tree.

Input Streams

Input streams are streams that allow you to read data from a source. These include the root abstract class InputStream, filtered streams, buffered streams, and streams that read from files, strings, and byte arrays.

The Abstract Class InputStream

InputStream is an abstract class that defines the fundamental ways in which a destination (consumer) reads a stream of bytes from some source. The identity of the source, and the manner of the creation and transport of the bytes, is irrelevant. When using an input stream, you are the destination of those bytes, and that's all you need to know.

Note
All input streams descend from InputStream. All share in common the few methods described in this section. Thus, the streams used in these examples can be any of the more complex input streams described in the next few sections.

read()

The most important method to the consumer of an input stream is the one that reads bytes from the source. This method, read(), comes in many flavors, and each is demonstrated in an example in today's lesson.

Each of these read() methods is defined to "block" (wait) until all the input requested becomes available. Don't worry about this limitation; because of multithreading, you can do as many other things as you like while this one thread is waiting for input. In fact, it is a common idiom to assign a thread to each stream of input (and for each stream of output) that is solely responsible for reading from it (or writing to it). These input threads might then "hand off" the information to other threads for processing. This naturally overlaps the I/O time of your program with its compute time.

Here's the first form of read():

InputStream  s      = getAnInputStreamFromSomewhere();
byte[]       buffer = new byte[1024];   // any size will do

if (s.read(buffer) != buffer.length)
    System.out.println("I got less than I expected.");

Note
Here and throughout the rest of today's lesson, assume that either an import java.io.* appears before all the examples or that you mentally prefix all references to java.io classes with the prefix java.io.

This form of read() attempts to fill the entire buffer given. If it cannot (usually due to reaching the end of the input stream), it returns the actual number of bytes that were read into the buffer. After that, any further calls to read() return -1, indicating that you are at the end of the stream. Note that the if statement still works even in this case, because -1 != 1024 (this corresponds to an input stream with no bytes in it at all).

Note
Don't forget that, unlike in C, the -1 case in Java is not used to indicate an error. Any I/O errors throw instances of IOException (which you're not catching yet). You learned on Day 17, "Exceptions," that all uses of distinguished values can be replaced by the use of exceptions, and so they should. The -1 in the last example is a bit of a historical anachronism. You'll soon see a better approach to indicating the end of the stream using the class DataInputStream.

You can also read into a "slice" of your buffer by specifying the offset into the buffer, and the length desired, as arguments to read():

s.read(buffer, 100, 300);

This example tries to fill in bytes 100 through 399 and behaves otherwise exactly the same as the previous read() method.

Finally, you can read in bytes one at a time:

InputStream  s = getAnInputStreamFromSomewhere();
byte         b;
int          byteOrMinus1;

while ((byteOrMinus1 = s.read()) != -1) {
     b = (byte) byteOrMinus1;
     . . .    // process the byte b
}
. . .    // reached end of stream

Note
Because of the nature of integer promotion in Java in general, and because in this case the read() method returns an int, using the byte type in your code may be a little frustrating. You'll find yourself constantly having to explicitly cast the result of arithmetic expressions, or of int return values, back to your size. Because read() really should be returning a byte in this case, we feel justified in declaring and using it as such (despite the pain)-it makes the size of the data being read clearer. In cases where you feel that the range of a variable is naturally limited to a byte (or a short) rather than an int, please take the time to declare it that way and pay the small price necessary to gain the added clarity. By the way, a lot of the Java class library code simply stores the result of read() in an int.

skip()

What if you want to skip over some of the bytes in a stream, or start reading a stream from other than its beginning? A method similar to read() does the trick:

if (s.skip(1024) != 1024)
    System.out.println("I skipped less than I expected.");

This example skips over the next 1024 bytes in the input stream. However, the implementation of skip() in InputStream may skip fewer bytes than the given argument, and so it returns a long integer representing the number of bytes it actually skipped. In this example, therefore, a message is printed if the actual number of bytes skipped is less than 1024.

Note
The API documentation for skip() in the InputStream class says that skip() behaves this way for "a variety of reasons." Subclasses of InputStream should override this default implementation of skip() if they want to handle skipping more properly.

available()

If for some reason you would like to know how many bytes are in the stream right now, you can ask the following:

if (s.available() < 1024)
    System.out.println("Too little is available right now.");

This tells you the number of bytes that you can read without blocking. Because of the abstract nature of the source of these bytes, streams may or may not be able to tell you a reasonable answer to this question. For example, some streams always return 0. Unless you use specific subclasses of InputStream that you know provide a reasonable answer to this question, it's not a good idea to rely on this method. Remember that multithreading eliminates many of the problems associated with blocking while waiting for a stream to fill again. Thus, one of the strongest rationales for the use of available() goes away.

mark() and reset()

Some streams support the notion of marking a position in the stream and then later resetting the stream to that position to reread the bytes there. Clearly, the stream would have to "remember" all those bytes, so there is a limitation on how far apart in a stream the mark and its subsequent reset can occur. There's also a method that asks whether the stream supports the notion of marking at all. Here's an example:

InputStream  s = getAnInputStreamFromSomewhere();

if (s.markSupported()) {    // does s support the notion?
    . . .        // read the stream for a while
    s.mark(1024);
    . . .        // read less than 1024 more bytes
    s.reset();
    . . .        // we can now re-read those bytes
} else {
    . . .                   // no, perform some alternative
}

When marking a stream, you specify the maximum number of bytes you intend to allow to pass before resetting it. This allows the stream to limit the size of its byte "memory." If this number of bytes goes by and you have not yet used reset(), the mark becomes invalid, and attempting to use reset() will throw an exception.

Marking and resetting a stream is most valuable when you are attempting to identify the type of the stream (or the next part of the stream), but to do so, you must consume a significant piece of it in the process. Often, this is because you have several black-box parsers that you can hand the stream to, but they will consume some (unknown to you) number of bytes before making up their mind about whether the stream is of their type. Set a large size for the limit in mark(), and let each parser run until it either throws an error or completes a successful parse. If an error is thrown, use reset() and try the next parser.

close()

Because you don't know what resources an open stream represents, nor how to deal with them properly when you're finished reading the stream, you should (usually) explicitly close down a stream so that it can release these resources. Of course, garbage collection and a finalization method can do this for you, but what if you need to reopen that stream or those resources before they have been freed by this asynchronous process? At best, this is annoying or confusing; at worst, it introduces an unexpected, obscure, and difficult-to-track-down bug. Because you're interacting with the outside world of external resources, it's safer to be explicit about when you're finished using them:

InputStream  s = alwaysMakesANewInputStream();

try {
    . . .     // use s to your heart's content
} finally {
    s.close();
}

Get used to this idiom (using finally); it's a useful way to be sure something (such as closing the stream) always gets done. Of course, you're assuming that the stream is always successfully created. If this is not always the case, and null is sometimes returned instead, here's the correct way to be safe:

InputStream  s = tryToMakeANewInputStream();

if (s != null) {
    try {
        . . .
    } finally {
        s.close();
    }
}

ByteArrayInputStream

The "inverse" of some of the previous examples would be to create an input stream from an array of bytes. This is exactly what ByteArrayInputStream does:

byte[]  buffer = new byte[1024];

fillWithUsefulData(buffer);

InputStream  s = new ByteArrayInputStream(buffer);

Readers of the new stream s see a stream 1024 bytes long, containing the bytes in the array buffer. Just as read() has a form that takes an offset and a length, so does this class's constructor:

InputStream  s = new ByteArrayInputStream(buffer, 100, 300);

Here the stream is 300 bytes long and consists of bytes 100-399 from the array buffer.

Note
Finally, you've seen your first examples of the creation of a stream. These new streams are attached to the simplest of all possible sources of data: an array of bytes in the memory of the local computer.

ByteArrayInputStreams simply implement the standard set of methods that all input streams do. Here, however, the available() method has a particularly simple job-it returns 1024 and 300, respectively, for the two instances of ByteArrayInputStream you created previously, because it knows exactly how many bytes are available. Finally, calling reset() on a ByteArrayInputStream resets it to the beginning of the stream (buffer), no matter where the mark is set.

FileInputStream

One of the most common uses of streams, and historically the earliest, is to attach them to files in the file system. Here, for example, is the creation of such an input stream on a UNIX system:

InputStream  s = new FileInputStream("/some/path/and/fileName");

Warning
Applets attempting to open, read, or write streams based on files in the file system will usually cause security exceptions to be thrown from the browser. If you're developing applets, you won't be able to depend on files at all, and you'll have to use your server to hold shared information. (Standalone Java programs have none of these problems, of course.)

You also can create the stream from a previously opened file descriptor (an instance of the FileDescriptor class). Usually, you get file descriptors using the getFD() method on FileInputStream or FileOutputStream classes, so, for example, you could use the same file descriptor to open a file for reading and then reopen it for writing:

FileDescriptor       fd = someFileStream.getFD();
InputStream  s  = new FileInputStream(fd);

In either case, because it's based on an actual (finite length) file, the input stream created can implement available() precisely and can skip like a champ (just as ByteArrayInputStream can, by the way). In addition, FileInputStream knows a few more tricks:

FileInputStream  aFIS = new FileInputStream("aFileName");

FileDescriptor  myFD = aFIS.getFD(); // get a file descriptor

 aFIS.finalize();   // will call close() when automatically called by GC

Tip
To call these new methods, you must declare the stream variable aFIS to be of type FileInputStream, because plain InputStreams don't know about them.

The first is obvious: getFD() returns the file descriptor of the file on which the stream is based. The second, though, is an interesting shortcut that allows you to create FileInputStreams without worrying about closing them later. FileInputStream's implementation of finalize(), a protected method, closes the stream. Unlike in the contrived call in comments, you almost never can nor should call a finalize() method directly. The garbage collector calls it after noticing that the stream is no longer in use, but before actually destroying the stream. Thus, you can go merrily along using the stream, never closing it, and all will be well. The system takes care of closing it (eventually).

You can get away with this because streams based on files tie up very few resources, and these resources cannot be accidentally reused before garbage collection (these were the things worried about in the previous discussion of finalization and close()). Of course, if you were also writing to the file, you would have to be more careful. (Reopening the file too soon after writing might make it appear in an inconsistent state because the finalize()-and thus the close()-might not have happened yet.) Just because you don't have to close the stream doesn't mean you might not want to do so anyway. For clarity, or if you don't know precisely what type of an InputStream you were handed, you might choose to call close() yourself.

FilterInputStream

This "abstract" class simply provides a "pass-through" for all the standard methods of InputStream. (It's "abstract," in quotes, because it's not technically an abstract class; you can create instances of it. In most cases, however, you'll use one of the more useful subclasses of FilterInputStream instead of FilterInputStream itself.) FilterInputStream holds inside itself another stream, by definition one further "down" the chain of filters, to which it forwards all method calls. It implements nothing new but allows itself to be nested:

InputStream        s  = getAnInputStreamFromSomewhere();
FilterInputStream  s1 = new FilterInputStream(s);
FilterInputStream  s2 = new FilterInputStream(s1);
FilterInputStream  s3 = new FilterInputStream(s2);

... s3.read() ...

Whenever a read is performed on the filtered stream s3, it passes along the request to s2, then s2 does the same to s1, and finally s is asked to provide the bytes. Subclasses of FilterInputStream will, of course, do some nontrivial processing of the bytes as they flow past. The rather verbose form of "chaining" in the previous example can be made more elegant:

s3 = new FilterInputStream(new FilterInputStream(new FilterInputStream(s)));

You should use this idiom in your code whenever you can. It clearly expresses the nesting of chained filters, and can easily be parsed and "read aloud" by starting at the innermost stream s and reading outward-each filter stream applying to the one within-until you reach the outermost stream s3.

Now let's examine each of the subclasses of FilterInputStream in turn.

BufferedInputStream

This is one of the most valuable of all streams. It implements the full complement of InputStream's methods, but it does so by using a buffered array of bytes that acts as a cache for future reading. This decouples the rate and the size of the "chunks" you're reading from the more regular, larger block sizes in which streams are most efficiently read (from, for example, peripheral devices, files in the file system, or the network). It also allows smart streams to read ahead when they expect that you will want more data soon.

Because the buffering of BufferedInputStream is so valuable, and it's also the only class to handle mark() and reset() properly, you might wish that every input stream could somehow share its valuable capabilities. Normally, because those stream classes do not implement them, you would be out of luck. Fortunately, you already saw a way that filter streams can wrap themselves "around" other streams. Suppose that you would like a buffered FileInputStream that can handle marking and resetting correctly. Et voilà:

InputStream  s = new BufferedInputStream(new FileInputStream("foo"));

You have a buffered input stream based on the file foo that can use mark() and reset().

Now you can begin to see the power of nesting streams. Any capability provided by a filter input stream (or output stream, as you'll see soon) can be used by any other basic stream via nesting. Of course, any combination of these capabilities, and in any order, can be as easily accomplished by nesting the filter streams themselves.

DataInputStream

All the methods that instances of this class understand are defined in a separate interface, which both DataInputStream and RandomAccessFile (another class in java.io) implement. This interface is general-purpose enough that you might want to use it yourself in the classes you create. It is called DataInput.

The DataInput Interface

When you begin using streams to any degree, you'll quickly discover that byte streams are not a really helpful format into which to force all data. In particular, the primitive types of the Java language embody a rather nice way of looking at data, but with the streams you've been defining thus far in this book, you could not read data of these types. The DataInput interface specifies a higher-level set of methods that, when used for both reading and writing, can support a more complex, typed stream of data. Here are the methods this interface defines:

void  readFully(byte[]  buffer)                           throws IOException;
void  readFully(byte[]  buffer, int  offset, int  length) throws IOException;
int   skipBytes(int n)                                    throws IOException;

boolean  readBoolean()       throws IOException;
byte     readByte()          throws IOException;
int      readUnsignedByte()  throws IOException;
short    readShort()         throws IOException;
int      readUnsignedShort() throws IOException;
char     readChar()          throws IOException;
int      readInt()           throws IOException;
long     readLong()          throws IOException;
float    readFloat()         throws IOException;
double   readDouble()        throws IOException;

String   readLine()          throws IOException;
String   readUTF()           throws IOException;

The first three methods are simply new names for skip() and the two forms of read() you've seen previously. Each of the next 10 methods reads in a primitive type or its unsigned counterpart (useful for using every bit efficiently in a binary stream). These latter methods must return an integer of a wider size than you might think; because integers are signed in Java, the unsigned value does not fit in anything smaller. The final two methods read a newline ('\r', '\n', or "\r\n") terminated string of characters from the stream-the first in ASCII, and the second in Unicode.

Now that you know what the interface that DataInputStream implements looks like, let's see it in action:

DataInputStream  s = new DataInputStream(myRecordInputStream());

long  size = s.readLong();    // the number of items in the stream

while (size-- > 0) {
    if (s.readBoolean()) {    // should I process this item?
        int     anInteger     = s.readInt();
        int     magicBitFlags = s.readUnsignedShort();
        double  aDouble       = s.readDouble();

        if ((magicBitFlags & 0100000) != 0) {
            . . .    // high bit set, do something special
        }
        . . .    // process anInteger and aDouble
    }
}

Because the class implements an interface for all its methods, you can also use the following interface:

DataInput  d = new DataInputStream(new FileInputStream("anything"));
String     line;

while ((line = d.readLine()) != null) {
    . . .     // process the line
}

EOFException

One final point about most of DataInputStream's methods: When the end of the stream is reached, the methods throw an EOFException. This is tremendously useful and, in fact, allows you to rewrite all the kludgey uses of -1 you saw earlier today in a much nicer fashion:

DataInputStream  s = new DataInputStream(getAnInputStreamFromSomewhere());

try {
    while (true) {
        byte  b = (byte) s.readByte();
        . . .    // process the byte b
    }
} catch (EOFException e) {
    . . .    // reached end of stream
} finally {
  s.close();
}

This works just as well for all but the last two of the read methods of DataInputStream.

Warning
skipBytes() does nothing at all on end of stream, readLine() returns null, and readUTF() might throw a UTFDataFormatException, if it notices the problem at all.

LineNumberInputStream

In an editor or a debugger, line numbering is crucial. To add this valuable capability to your programs, use the filter stream LineNumberInputStream, which keeps track of line numbers as its stream "flows through" it. It's even smart enough to remember a line number and later restore it, during a mark() and reset(). You might use this class as follows:

LineNumberInputStream  aLNIS;
aLNIS = new LineNumberInputStream(new FileInputStream("source"));

DataInputStream  s = new DataInputStream(aLNIS);
String           line;

while ((line = s.readLine()) != null) {
    . . .    // process the line
    System.out.println("Did line number: " + aLNIS.getLineNumber());
}

Here, two filter streams are nested around the FileInputStream actually providing the data-the first to read lines one at a time and the second to keep track of the line numbers of these lines as they go by. You must explicitly name the intermediate filter stream, aLNIS, because if you did not, you couldn't call getLineNumber() later. Note that if you invert the order of the nested streams, reading from DataInputStream does not cause LineNumberInputStream to "see" the lines.

You must put any filter streams acting as "monitors" in the middle of the chain and "pull" the data from the outermost filter stream so that the data will pass through each of the monitors in turn. In the same way, buffering should occur as far inside the chain as possible, because the buffered stream won't be able to do its job properly unless most of the streams that need buffering come after it in the flow. For example, here's a silly order:

new BufferedInputStream(new LineNumberInputStream(
            _new DataInputStream(new FileInputStream("foo"));

and here's a much better order:

new DataInputStream(new LineNumberInputStream(
            _new BufferedInputStream(new FileInputStream("foo"));

LineNumberInputStreams can also be told setl1ineNumber(), for those few times when you know more than they do.

PushbackInputStream

The filter stream class PushbackInputStream is commonly used in parsers, to "push back" a single character in the input (after reading it) while trying to determine what to do next-a simplified version of the mark() and reset() utility you learned about earlier. Its only addition to the standard set of InputStream methods is unread(), which, as you might guess, pretends that it never read the byte passed in as its argument, and then gives that byte back as the return value of the next read().

Listing 19.1 shows a simple implementation of readLine() using this class:


Listing 19.1. A simple line reader.
 1:import java.io;
 2:
 3:public class  SimpleLineReader {
 4:    private FilterInputStream  s;
 5:
 6:    public  SimpleLineReader(InputStream  anIS) {
 7:        s = new DataInputStream(anIS);
 8:    }
 9:
10:    . . .    // other read() methods using stream s
11:
12:    public String  readLine() throws IOException {
13:        char[]  buffer = new char[100];
14:        int     offset = 0;
15:        byte    thisByte;
16:
17:        try {
18:loop:        while (offset < buffer.length) {
19:                switch (thisByte = (byte) s.read()) {
20:                    case '\n':
21:                        break loop;
22:                    case '\r':
23:                        byte  nextByte = (byte) s.read();
24:
25:                        if (nextByte != '\n') {
26:                            if (!(s instanceof PushbackInputStream)) {
27:                                s = new PushbackInputStream(s);
28:                            }
29:                            ((PushbackInputStream) s).unread(nextByte);
30:                        }
31:                        break loop;
32:                    default:
33:                        buffer[offset++] = (char) thisByte;
34:                        break;
35:                }
36:            }
37:        } catch (EOFException e) {
38:            if (offset == 0)
39:                return null;
40:        }
41:          return String.copyValueOf(buffer, 0, offset);
42:     }
43:}

This example demonstrates numerous things. For the purpose of this example, the readLine() method is restricted to reading the first 100 characters of the line. In this respect, it demonstrates how not to write a general-purpose line processor (you should be able to read a line of any size). This example does, however, show you how to break out of an outer loop (using the loop label in line 18 and the break statements in lines 21 and 31), and how to produce a String from an array of characters (in this case, from a "slice" of the array of characters). This example also includes standard uses of InputStream's read() for reading bytes one at a time, and of determining the end of the stream by enclosing it in a DataInputStream and catching EOFException.

One of the more unusual aspects of the example is the way PushbackInputStream is used. To be sure that '\n' is ignored following '\r', you have to "look ahead" one character; but if it is not a '\n', you must push back that character. Look at the lines 26 through 29 as if you didn't know much about the stream s. The general technique used is instructive. First, you see whether s is already an instance of some kind of PushbackInputStream. If so, you can simply use it. If not, you enclose the current stream (whatever it is) inside a new PushbackInputStream and use this new stream. Now, let's jump back into the context of the example.

Line 29 following that if statement in line 26 wants to call the method unread(). The problem is that s has a compile-time type of FilterInputStream, and thus doesn't understand that method. The previous three lines (26) have guaranteed, however, that the runtime type of the stream in s is PushbackInputStream, so you can safely cast it to that type and then safely call unread().

Note
This example was done in an unusual way for demonstration purposes. You could have simply declared a PushbackInputStream variable and always enclosed the DataInputStream in it. (Conversely, SimpleLineReader's constructor could have checked whether its argument was already of the right class, the way PushbackInputStream did, before creating a new DataInputStream.) The interesting thing about this approach of wrapping a class only when needed is that it works for any InputStream that you hand it, and it does additional work only if it needs to. Both of these are good general design principles.

All the subclasses of FilterInputStream have now been described. It's time to return to the direct subclasses of InputStream.

PipedInputStream

This class, along with its brother class PipedOutputStream, are covered later today (they need to be understood and demonstrated together). For now, all you need to know is that together they create a simple, two-way communication conduit between threads.

SequenceInputStream

Suppose you have two separate streams and you would like to make a composite stream that consists of one stream followed by the other (like appending two Strings together). This is exactly what SequenceInputStream was created for:

InputStream  s1 = new FileInputStream("theFirstPart");
InputStream  s2 = new FileInputStream("theRest");

InputStream  s  = new SequenceInputStream(s1, s2);

... s.read() ...   // reads from each stream in turn

You could have "faked" this example by reading each file in turn-but what if you had to hand the composite stream s to some other method that was expecting only a single InputStream? Here's an example (using s) that line-numbers the two previous files with a common numbering scheme:

LineNumberInputStream  aLNIS = new LineNumberInputStream(s);

... aLNIS.getLineNumber() ...

Note
Stringing together streams this way is especially useful when the streams are of unknown length and origin and were just handed to you by someone else.

What if you want to string together more than two streams? You could try the following:

Vector  v = new Vector();
. . .   // set up all the streams and add each to the Vector
InputStream  s1 = new SequenceInputStream(v.elementAt(0), v.elementAt(1));
InputStream  s2 = new SequenceInputStream(s1, v.elementAt(2));
InputStream  s3 = new SequenceInputStream(s2, v.elementAt(3));
. . .

Note
A Vector is a growable array of objects that can be filled, referenced (with elementAt()), and enumerated.

However, it's much easier to use a different constructor that SequenceInputStream provides:

InputStream  s  = new SequenceInputStream(v.elements());

This constructor takes one argument-an object of type Enumeration (in this example, we got that object using Vector's elements() method). The resulting SequenceInputStream object contains all the streams you want to combine and returns a single stream that reads through the data of each in turn.

StringBufferInputStream

StringBufferInputStream is exactly like ByteArrayInputStream, but instead of being based on a byte array, it's based on an array of characters (a String):

String       buffer = "Now is the time for all good men to come...";
InputStream  s      = new StringBufferInputStream(buffer);

All comments that were made about ByteArrayInputStream apply here as well.

Note
StringBufferInputStream is a bit of a misnomer because this input stream is actually based on a String. It should really be called StringInputStream.

Output Streams

An output stream is the reverse of an input stream; whereas with an input stream you read data from the stream, with output streams you write data to the stream. Most of the InputStream subclasses you've already seen have their equivalent OutputStream brother classes. If an InputStream performs a certain operation, the brother OutputStream performs the inverse operation. You'll see more of what this means soon.

The Abstract Class OutputStream

OutputStream is the abstract class that defines the fundamental ways in which a source (producer) writes a stream of bytes to some destination. The identity of the destination, and the manner of the transport and storage of the bytes, is irrelevant. When using an output stream, you are the source of those bytes, and that's all you need to know.

write()

The most important method to the producer of an output stream is the one that writes bytes to the destination. This method, write(), comes in many flavors, each demonstrated in the following examples:

Note
Every one of these write() methods is defined to block until all the output requested has been written. You don't need to worry about this limitation-see the note under InputStream's read() method if you don't remember why.

OutputStream  s      = getAnOutputStreamFromSomewhere();
byte[]        buffer = new byte[1024];    // any size will do

fillInData(buffer);    // the data we want to output
s.write(buffer);

You also can write a "slice" of your buffer by specifying the offset into the buffer, and the length desired, as arguments to write():

s.write(buffer, 100, 300);

This example writes out bytes 100 through 399 and behaves otherwise exactly the same as the previous write() method.

Finally, you can write out bytes one at a time:

while (thereAreMoreBytesToOutput()) {
    byte  b = getNextByteForOutput();

    s.write(b);
}

flush()

Because you don't know what an output stream is connected to, you might be required to "flush" your output through some buffered cache to get it to be written (in a timely manner, or at all). OutputStream's version of this method does nothing, but it is expected that subclasses that require flushing (for example, BufferedOutputStream and PrintStream) will override this version to do something nontrivial.

close()

Just like for an InputStream, you should (usually) explicitly close down an OutputStream so that it can release any resources it may have reserved on your behalf. (All the same notes and examples from InputStream's close() method apply here, with the prefix In replaced everywhere by Out.)

All output streams descend from the abstract class OutputStream. All share the previous few methods in common.

ByteArrayOutputStream

The inverse of ByteArrayInputStream, which creates an input stream from an array of bytes, is ByteArrayOutputStream, which directs an output stream into an array of bytes:

OutputStream  s = new ByteArrayOutputStream();

s.write(123);
. . .

The size of the (internal) byte array grows as needed to store a stream of any length. You can provide an initial capacity as an aid to the class, if you like:

OutputStream  s = new ByteArrayOutputStream(1024 * 1024);  // 1 Megabyte

Note
You've just seen your first examples of the creation of an output stream. These new streams were attached to the simplest of all possible destinations of data, an array of bytes in the memory of the local computer.

Once the ByteArrayOutputStream object, stored in the variable s, has been "filled," it can be output to another output stream:

OutputStream           anotherOutputStream = getTheOtherOutputStream();
ByteArrayOutputStream  s = new ByteArrayOutputStream();

fillWithUsefulData(s);
s.writeTo(anotherOutputStream);

It also can be extracted as a byte array or converted to a String:

byte[]  buffer              = s.toByteArray();
String  bufferString        = s.toString();
String  bufferUnicodeString = s.toString(upperByteValue);

Note
The last method allows you to "fake" Unicode (16-bit) characters by filling in their lower bytes with ASCII and then specifying a common upper byte (usually 0) to create a Unicode String result.

ByteArrayOutputStreams have two utility methods: One simply returns the current number of bytes stored in the internal byte array, and the other resets the array so that the stream can be rewritten from the beginning:

int  sizeOfMyByteArray = s.size();

s.reset();     // s.size() would now return 0
s.write(123);
. . .

FileOutputStream

One of the most common uses of streams is to attach them to files in the file system. Here, for example, is the creation of such an output stream on a UNIX system:

OutputStream  s = new FileOutputStream("/some/path/and/fileName");

Warning
Applets attempting to open, read, or write streams based on files in the file system will cause security violations. See the note under FileInputStream for more details.

As with FileInputStream, you also can create the stream from a previously opened file descriptor:

FileDescriptor           fd = someFileStream.getFD();
OutputStream  s  = new FileOutputStream(fd);

FileOutputStream is the inverse of FileInputStream, and it knows the same tricks:

FileOutputStream  aFOS = new FileOutputStream("aFileName");

FileDescriptor  myFD = aFOS.getFD(); // get a file descriptor

aFOS.finalize();  // will call close() when automatically called by GC

Note
To call the new methods, you must declare the stream variable aFOS to be of type FileOutputStream, because plain OutputStreams don't know about them.

The first is obvious: getFD() simply returns the file descriptor for the file on which the stream is based. The second, commented, contrived call to finalize() is there to remind you that you may not have to worry about closing this type of stream-it is done for you automatically.

FilterOutputStream

This abstract class simply provides a "pass-through" for all the standard methods of OutputStream. It holds inside itself another stream, by definition one further "down" the chain of filters, to which it forwards all method calls. It implements nothing new but allows itself to be nested:

OutputStream        s  = getAnOutputStreamFromSomewhere();
FilterOutputStream  s1 = new FilterOutputStream(s);
FilterOutputStream  s2 = new FilterOutputStream(s1);
FilterOutputStream  s3 = new FilterOutputStream(s2);

... s3.write(123) ...

Whenever a write is performed on the filtered stream s3, it passes along the request to s2. Then s2 does the same to s1, and finally s is asked to output the bytes. Subclasses of FilterOutputStream, of course, do some nontrivial processing of the bytes as they flow past. This chain can be tightly nested-see its brother class, FilterInputStream, for more.

Now let's examine each of the subclasses of FilterOutputStream in turn.

BufferedOutputStream

BufferedOutputStream is one of the most valuable of all streams. All it does is implement the full complement of OutputStream's methods, but it does so by using a buffered array of bytes that acts as a cache for writing. This decouples the rate and the size of the "chunks" you're writing from the more regular, larger block sizes in which streams are most efficiently written (to peripheral devices, files in the file system, or the network, for example).

BufferedOutputStream is one of two classes in the Java library to implement flush(), which pushes the bytes you've written through the buffer and out the other side. Because buffering is so valuable, you might wish that every output stream could somehow be buffered. Fortunately, you can surround any output stream in such a way as to achieve just that:

OutputStream  s = new BufferedOutputStream(new FileOutputStream("foo"));

You now have a buffered output stream based on the file foo that can be flushed.

Just as for filter input streams, any capability provided by a filter output stream can be used by any other basic stream via nesting, and any combination of these capabilities, in any order, can be as easily accomplished by nesting the filter streams themselves.

DataOutputStream

All the methods that instances of this class understand are defined in a separate interface, which both DataOutputStream and RandomAccessFile implement. This interface is general-purpose enough that you might want to use it yourself in the classes you create. It is called DataOutput.

The DataOutput Interface

In cooperation with its brother inverse interface, DataInput, DataOutput provides a higher-level, typed-stream approach to the reading and writing of data. Rather than dealing with bytes, this interface deals with writing the primitive types of the Java language directly:

void  write(int i)                                    throws IOException;
void  write(byte[]  buffer)                           throws IOException;
void  write(byte[]  buffer, int  offset, int  length) throws IOException;

void  writeBoolean(boolean b) throws IOException;
void  writeByte(int i)        throws IOException;
void  writeShort(int i)       throws IOException;
void  writeChar(int i)        throws IOException;
void  writeInt(int i)         throws IOException;
void  writeLong(long l)       throws IOException;
void  writeFloat(float f)     throws IOException;
void  writeDouble(double d)   throws IOException;

void  writeBytes(String s) throws IOException;
void  writeChars(String s) throws IOException;
void  writeUTF(String s)   throws IOException;

Most of these methods have counterparts in the interface DataInput.

The first three methods mirror the three forms of write() you saw previously. Each of the next eight methods writes out a primitive type. The final three methods write out a string of bytes or characters to the stream: the first one as 8-bit bytes; the second, as 16-bit Unicode characters; and the last, as a special Unicode stream (readable by DataInput's readUTF()).

Note
The unsigned read methods in DataInput have no counterparts here. You can write out the data they need via DataOutput's signed methods because they accept int arguments and also because they write out the correct number of bits for the unsigned integer of a given size as a side effect of writing out the signed integer of that same size. It is the method that reads this integer that must interpret the sign bit correctly; the writer's job is easy.

Now that you know what the interface that DataOutputStream implements looks like, let's see it in action:

DataOutputStream  s    = new DataOutputStream(myRecordOutputStream());
long              size = getNumberOfItemsInNumericStream();

s.writeLong(size);

for (int  i = 0;  i < size;  ++i) {
    if (shouldProcessNumber(i)) {
        s.writeBoolean(true);     // should process this item
        s.writeInt(theIntegerForItemNumber(i));
        s.writeShort(theMagicBitFlagsForItemNumber(i));
        s.writeDouble(theDoubleForItemNumber(i));
    } else
        s.writeBoolean(false);
}

This is the exact inverse of the example that was given for DataInput. Together, they form a pair that can communicate a particular array of structured primitive types across any stream (or "transport layer"). Use this pair as a jumping-off point whenever you need to do something similar.

In addition to the preceding interface, the class itself implements one (self-explanatory) utility method:

int  theNumberOfBytesWrittenSoFar = s.size();
Processing a File

One of the most common idioms in file I/O is to open a file, read and process it line-by-line, and output it again to another file. Here's a prototypical example of how that would be done in Java:

DataInput   aDI = new DataInputStream(new FileInputStream("source"));
DataOutput  aDO = new DataOutputStream(new FileOutputStream("dest"));
String      line;

while ((line = aDI.readLine()) != null) {
    StringBuffer  modifiedLine = new StringBuffer(line);

    . . .      // process modifiedLine in place
    aDO.writeBytes(modifiedLine.toString());
}
aDI.close();
aDO.close();

If you want to process it byte-by-byte, use this:

try {
    while (true) {
        byte  b = (byte) aDI.readByte();

        . . .      // process b in place
        aDO.writeByte(b);
    }
} finally {
    aDI.close();
    aDO.close();
}

Here's a cute two-liner that just copies the file:

try { while (true) aDO.writeByte(aDI.readByte()); }
finally { aDI.close(); aDO.close(); }

Warning
Many of the examples in today's lesson (as well as the last two) are assumed to appear inside a method that has IOException in its throws clause, so they don't have to worry about catching those exceptions and handling them more reasonably. Your code should be a little less cavalier.

PrintStream

You may not realize it, but you're already intimately familiar with the use of two methods of the PrintStream class. That's because whenever you use these method calls:

System.out.print(. . .)
System.out.println(. . .)

you are actually using a PrintStream instance located in System's class variable out to perform the output. System.err is also a PrintStream, and System.in is an InputStream.

Note
On UNIX systems, these three streams will be attached to standard output, standard error, and standard input, respectively.

PrintStream is uniquely an output stream class (it has no brother class). Because it is usually attached to a screen output device of some kind, it provides an implementation of flush(). It also provides the familiar close() and write() methods, as well as a plethora of choices for outputting the primitive types and Strings of Java:

public void  write(int b);
public void  write(byte[]  buffer, int  offset, int  length);
public void  flush();
public void  close();

public void  print(Object o);
public void  print(String s);
public void  print(char[]  buffer);
public void  print(char c);
public void  print(int i);
public void  print(long l);
public void  print(float f);
public void  print(double d);
public void  print(boolean b);

public void  println(Object o);
public void  println(String s);
public void  println(char[]  buffer);
public void  println(char c);
public void  println(int i);
public void  println(long l);
public void  println(float f);
public void  println(double d);
public void  println(boolean b);

public void  println();   // output a blank line

PrintStream can also be wrapped around any output stream, just like a filter class:

PrintStream  s = new PrintStream(new FileOutputStream("foo"));

s.println("Here's the first line of text in the file foo.");

If you provide a second argument to the constructor for PrintStream, that second argument is a boolean that specifies whether the stream should auto-flush. If true, a flush() is sent after each newline character is written.

Here's a simple sample program that operates like the UNIX command cat, taking the standard input, line-by-line, and outputting it to the standard output:

import java.io.*;   // the one time in the chapter we'll say this

public class  Cat {
    public static void  main(String argv[]) {
        DataInput  d = new DataInputStream(System.in);
        String     line;

     try {  while ((line = d.readLine()) != null)
            System.out.println(line);
        } catch (IOException  ignored) { }
    }
}

PipedOutputStream

Along with PipedInputStream, this pair of classes supports a UNIX-pipe-like connection between two threads, implementing all the careful synchronization that allows this sort of "shared queue" to operate safely. Use the following to set up the connection:

PipedInputStream   sIn  = PipedInputStream();
PipedOutputStream  sOut = PipedOutputStream(sIn);

One thread writes to sOut; the other reads from sIn. By setting up two such pairs, the threads can communicate safely in both directions.

Related Classes

The other classes and interfaces in java.io supplement the streams to provide a complete I/O system. Three of them are described here.

The File class abstracts files in a platform-independent way. Given a filename, it can respond to queries about the type, status, and properties of a file or directory in the file system.

A RandomAccessFile is created given a file, a filename, or a file descriptor. It combines in one class implementations of the DataInput and DataOutput interfaces, both tuned for "random access" to a file in the file system. In addition to these interfaces, RandomAccessFile provides certain traditional UNIX-like facilities, such as seeking to a random point in the file.

Finally, the StreamTokenizer class takes an input stream and produces a sequence of tokens. By overriding its various methods in your own subclasses, you can create powerful lexical parsers.

You can learn more about any and all of these other classes from the full (online) API descriptions in your Java release.

Object Serialization (Java 1.1)

A topic to streams, and one that will be available in the core Java library with Java 1.1, is object serialization. Serialization is the ability to write a Java object to a stream such as a file or a network connection, and then read it and reconstruct that object on the other side. Object serialization is crucial for the ability to save Java objects to a file (what's called object persistence), or to be able to accomplish network-based applications that make use of Remote Method Invocation (RMI)-a capability you'll learn more of on Day 27, "The Standard Extension APIs."

At the heart of object serialization are two streams classes: ObjectInputStream, which inherits from DataInputStream, and ObjectOutputStream, which inherits from DataOutputStream. Both of these classes will be part of the java.io package and will be used much in the same way as the standard input and output streams are. In addition, two interfaces, ObjectOutput and ObjectInput, which inherit from DataInput and DataOutput, respectively, will provide abstract behavior for reading and writing objects.

To use the ObjectInputStream and ObjectOutputStream classes, you create new instances much in the same way you do ordinary streams, and then use the readObject() and writeObject() methods to read and write objects to and from those streams.

ObjectOutputStream's writeObject() method, which takes a single object argument, serializes that object as well as any object it has references to. Other objects written to the same stream are serialized as well, with references to already-serialized objects kept track of and circular references preserved.

ObjectInputStream's readObject() method takes no arguments and reads an object from the stream (you'll need to cast that object to an object of the appropriate class). Objects are read from the stream in the same order in which they are written.

Here's a simple example from the object serialization specification that writes a date to a file (actually, it writes a string label, "Today", and then a Date object):

FileOutputStream f = new FileOutputStream("tmp");
ObjectOutput  s  =  new  ObjectOutputStream(f);
s.writeObject("Today");
s.writeObject(new Date());
s.flush();

To deserialize the object (read it back in again), use this code:

FileInputStream in = new FileInputStream("tmp");
ObjectInputStream s = new ObjectInputStream(in);
String today = (String)s.readObject();
Date date = (Date)s.readObject();

One other feature of object serialization to note is the transient modifier. Used in instance variable declarations as other modifiers are, the transient modifier means that the value of that object should not be stored when the object is serialized-that its value is temporary or will need to be re-created from scratch once the object is reconstructed. Use transient variables for environment-specific information (such as file handles that may be different from one side of the serialization to the other) or for values that can be easily recalculated to save space in the final serialized object.

To declare a transient variable, use the transient modifier the way you do other modifiers such as public, private, or abstract:

public transient int transientValue = 4;

At the time of this writing, object serialization is available as an additional package for Java 1.0.2 as part of the RMI package. You can find out more about it, including full specifications and downloadable software, from the Java RMI Web site at http://chatsubo.javasoft.com/current/.

Summary

Today you have learned about the general idea of streams and have met input streams based on byte arrays, files, pipes, sequences of other streams, and string buffers, as well as input filters for buffering, typed data, line numbering, and pushing-back characters.

You have also met the analogous brother output streams for byte arrays, files, and pipes, output filters for buffering and typed data, and the unique output filter used for printing.

Along the way, you have become familiar with the fundamental methods all streams understand (such as read() and write()), as well as the unique methods many streams add to this repertoire. You have learned about catching IOExceptions-especially the most useful of them, EOFException.

Finally, the twice-useful DataInput and DataOutput interfaces formed the heart of RandomAccessFile, one of the several utility classes that round out Java's I/O facilities.

Java streams provide a powerful base on which you can build multithreaded, streaming interfaces of the most complex kinds, and the programs (such as HotJava) to interpret them. The higher-level Internet protocols and services of the future that your applets can build on this base are really limited only by your imagination.

Q&A

Q:
In an early read() example, you did something with the variable byteOrMinus1 that seemed a little clumsy. Isn't there a better way? If not, why recommend the cast later?
A:
Yes, there is something a little odd about those statements. You might be tempted to try something like this instead:

while ((b = (byte) s.read()) != -1) {
    . . .    // process the byte b
}

The problem with this shortcut occurs if read() returns the value 0xFF (0377). Because of the way values are cast, it will appear to be identical to the integer value -1 that indicates end of stream. Only saving that value in a separate integer variable, and then casting it later, will accomplish the desired result. The cast to byte is recommended in the note for slightly different reasons than this, however-storing integer values in correctly sized variables is always good style (and besides, read() really should be returning something of byte size here and throwing an exception for end of stream).

Q:
What input streams in java.io actually implement mark(), reset(), and markSupported()?
A:
InputStream itself does-and in their default implementations, markSupported() returns false, mark() does nothing, and reset() throws an exception. The only input stream in the current release that correctly supports marking is BufferedInputStream, which overrides these defaults. LineNumberInputStream actually implements mark() and reset(), but in the current release, it doesn't answer markSupported() correctly, so it looks as if it does not.
Q:
Why is available() useful, if it sometimes gives the wrong answer?
A:
First, for many streams, it gives the right answer. Second, for some network streams, its implementation might be sending a special query to discover some information you couldn't get any other way (for example, the size of a file being transferred by ftp). If you are displaying a "progress bar" for network or file transfers, for example, available() will often give you the total size of the transfer, and when it does not-usually by returning 0-it will be obvious to you (and your users).
Q:
What's a good example of the use of the DataInput/DataOutput pair of interfaces?
A:
One common use of such a pair is when objects want to "pickle" themselves for storage or movement over a network. Each object implements read and write methods using these interfaces, effectively converting itself to a stream that can later be reconstituted "on the other end" into a copy of the original object.