MICROSOFT EXCEL FILE FORMAT
Microsoft Excel is a popular spreadsheet. It uses a file format called BIFF (Binary
File Format). There are many types of BIFF records. Each has a 4 byte header. The
first two bytes are an opcode that specifies the record type. The second two bytes
specify record length. Header values are stored in byte-reversed form (less significant
byte first). The rest of the record is the data itself (Figure 2-1).
Figure 2-1. BIFF record header.
| Record Header | Record Body
Byte Number | 0 1 2 3 | 0 1 ...
-----------------------------------
Record Contents | XX | XX | XX | XX | XX | XX | ...
-----------------------------------
| opcode | length | data
Each X represents a hexadecimal digit
Two X's form a byte. The least significant (low) byte of the opcode is byte 0 and the
most significant (high) byte is byte 1. Similarly, the low byte of the record length
field is byte 2 and the high byte is byte 3.
BOF (Beginning of File)
The first record in every spreadsheet is always of the BOF type (Figure 2-2).
Figure 2-2. BOF record.
| Record Header | Record Body |
Byte | 0 1 2 3 | 0 1 2 3 |
-----------------------------------------
Contents | 09 | 00 | 04 | 00 | 02 | 00 | 10 | 00 |
-----------------------------------------
| opcode | length | version | file |
| | | number | type |
The first two bytes, arranged with the low byte first, show that the opcode for BOF is
09h. The second two bytes indicate that the record body is 4 bytes long. The first two
bytes of the body are the version number (2 for the initial version of Excel). The last
two bytes are the file type. Type 10h is a worksheet file.
Relating Spreadsheet Cells to Record Data Bytes
A spreadsheet appears on a screen or printout as a matrix of rectangular cells. Each
column is identified by a letter at its top, and each row is identified by a number.
Thus cell A1 is in the first column and the first row. Cell C240 is in the third column
and the 240th row. This scheme identifies cells in a way easily understood by people.
However, it is not particularly convenient for computers, as they do not handle letters
efficiently. They are best at dealing with binary numbers. Thus, Excel stores cell
identifiers as binary numbers, that people can read as hexadecimal. The first number in
the system is 0 rather than 1.
Figure 2-3, which shows the form of an INTEGER record, illustrates the storage of column
and row information.