Write Your Own Operating System Tutorial
Lesson 1: The Boot Sector
In this lesson we’ll learn about the contents of the boot sector so that we can learn to write our own boot program.
When the computer boots from a floppy, BIOS (Basic Input/Output System) reads the disk and loads the first sector into memory at address 0000:7C00. This first sector is called the DOS Boot Record (DBR). BIOS jumps to the address 0x7C00 and begins executing instructions there. It is these instructions (the “boot loader”) that will load the operating system (OS) into memory and begin the OS’s boot process.
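An address such as 0000:7C00 is written in segment:offset form. As a minimal sketch (the function name phys_addr is my own, not part of any BIOS or DOS interface), real-mode hardware turns such a pair into a physical address by multiplying the segment by 16 and adding the offset:

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
   phys_addr is a hypothetical helper name used only for illustration. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With this, phys_addr(0x0000, 0x7C00) and phys_addr(0x07C0, 0x0000) both give 0x7C00, which shows that more than one segment:offset pair can name the same byte.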
The first thing to do is to take a look inside the Boot Record. The DOS utility DEBUG is a widely available tool that can be used to view the contents of memory and disks. We’ll use DEBUG to look at a floppy disk’s Boot Record.
At a DOS (or Windows) command prompt type debug. This will leave you with just a hyphen as a prompt. If you enter letter ‘d’ as a command and press Enter, it will show you a portion of the contents of RAM. Typing the question mark as a command will give you a list of all the available commands in DEBUG. (Be very careful when using the DEBUG utility. This utility can be used to overwrite data on any disk drive, possibly causing loss of data.)
Place a freshly formatted disk in the A: drive. To load the Boot Record off your floppy disk, type the following command.
-l 0 0 0 1
(The first character is the letter ‘l’, not the number ‘1’.) This command loads sectors off a disk into a portion of RAM. The 4 numbers after the ‘l’ represent, in order, the beginning address where you want the data loaded, the drive number (0 for the first floppy drive), the first sector on the disk to load, and how many sectors to load. Typing this command will load the first sector of the floppy into memory starting at address 0.
Now that we have the Boot Record loaded into memory, we want to view its contents. Type the following command.
-d 0
What you see are 8 lines that represent the first 128 (0x80 in hex) bytes in the floppy’s Boot Record. The results (for my floppy disk) are the following.
0AF6:0000 EB 3C 90 4D 53 44 4F 53-35 2E 30 00 02 01 01 00 .<.MSDOS5.0.....
0AF6:0010 02 E0 00 40 0B F0 09 00-12 00 02 00 00 00 00 00 ...@............
0AF6:0020 00 00 00 00 00 00 29 F6-63 30 88 4E 4F 20 4E 41 ......).c0.NO NA
0AF6:0030 4D 45 20 20 20 20 46 41-54 31 32 20 20 20 33 C9 ME FAT12 3.
0AF6:0040 8E D1 BC F0 7B 8E D9 B8-00 20 8E C0 FC BD 00 7C ....{.... .....|
0AF6:0050 38 4E 24 7D 24 8B C1 99-E8 3C 01 72 1C 83 EB 3A 8N$}$....<.r...:
0AF6:0060 66 A1 1C 7C 26 66 3B 07-26 8A 57 FC 75 06 80 CA f..|&f;.&.W.u...
0AF6:0070 02 88 56 02 80 C3 10 73-EB 33 C9 8A 46 10 98 F7 ..V....s.3..F...
At first glance, this doesn’t tell me much. I can see that it looks like this is a MS-DOS 5.0 disk with no name and a FAT12 file system. The numbers in the far left column show the memory addresses in RAM. The hexadecimal numbers in the middle show all the bytes in this portion of memory, and the column on the right shows the ASCII characters that the hex bytes represent (a period is shown if the byte does not translate to any visible character). Some of the bytes you see in this portion of the Boot Record are parts of instructions in the boot loader, and some of them hold information about the disk such as the number of bytes per sector, the number of sectors per track, etc…
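To make the disk information a little more concrete, here is a sketch in C that decodes some of those bytes. It assumes the standard FAT12 field offsets (0x0B for bytes per sector, 0x18 for sectors per track, and so on); the struct and function names are my own, and sample_boot_bytes is simply the first 32 bytes of the dump above.

```c
#include <stdint.h>

/* First 32 bytes of the Boot Record, copied from the DEBUG dump above. */
const uint8_t sample_boot_bytes[32] = {
    0xEB, 0x3C, 0x90, 0x4D, 0x53, 0x44, 0x4F, 0x53,
    0x35, 0x2E, 0x30, 0x00, 0x02, 0x01, 0x01, 0x00,
    0x02, 0xE0, 0x00, 0x40, 0x0B, 0xF0, 0x09, 0x00,
    0x12, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* A few disk parameters (field names are illustrative). */
struct disk_info {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  fat_count;
    uint16_t root_entries;
    uint16_t total_sectors;
    uint16_t sectors_per_fat;
    uint16_t sectors_per_track;
    uint16_t head_count;
};

/* Multi-byte values are stored least significant byte first. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

struct disk_info parse_disk_info(const uint8_t *sector)
{
    struct disk_info d;
    d.bytes_per_sector    = read_le16(sector + 0x0B);
    d.sectors_per_cluster = sector[0x0D];
    d.reserved_sectors    = read_le16(sector + 0x0E);
    d.fat_count           = sector[0x10];
    d.root_entries        = read_le16(sector + 0x11);
    d.total_sectors       = read_le16(sector + 0x13);
    d.sectors_per_fat     = read_le16(sector + 0x16);
    d.sectors_per_track   = read_le16(sector + 0x18);
    d.head_count          = read_le16(sector + 0x1A);
    return d;
}
```

For my floppy this gives 512 bytes per sector, 18 sectors per track, 2 heads, and 2880 sectors in total, which is what you would expect from a 1.44 MB disk.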
Now it’s time to take a glance at the code for the boot loader. Type the following command.
-u 0
This performs an “unassemble” operation. This shows us the same bytes as before (starting with address 0), but this time DEBUG shows us the Intel instructions that these bytes represent. The results for my floppy are the following.
0AF6:0000 EB3C JMP 003E
0AF6:0002 90 NOP
0AF6:0003 4D DEC BP
0AF6:0004 53 PUSH BX
0AF6:0005 44 INC SP
0AF6:0006 4F DEC DI
0AF6:0007 53 PUSH BX
0AF6:0008 352E30 XOR AX,302E
0AF6:000B 0002 ADD [BP+SI],AL
0AF6:000D 0101 ADD [BX+DI],AX
0AF6:000F 0002 ADD [BP+SI],AL
0AF6:0011 E000 LOOPNZ 0013
0AF6:0013 40 INC AX
0AF6:0014 0BF0 OR SI,AX
0AF6:0016 0900 OR [BX+SI],AX
0AF6:0018 1200 ADC AL,[BX+SI]
0AF6:001A 0200 ADD AL,[BX+SI]
0AF6:001C 0000 ADD [BX+SI],AL
0AF6:001E 0000 ADD [BX+SI],AL
The first instruction says to jump to address 0x3E. The bytes after this are the data about the disk I mentioned before and do not really correspond to instructions, but DEBUG does its duty and tries to interpret them as such.
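If you are wondering how the bytes EB 3C turn into "JMP 003E", the sketch below shows the arithmetic. EB is the opcode for a short jump, and the second byte is a signed displacement counted from the end of the two-byte instruction. The function name is my own invention.

```c
#include <stdint.h>

/* Target of a short jump (opcode EB followed by a signed 8-bit displacement).
   The displacement is relative to the address of the next instruction,
   which is insn_addr + 2 because the jump itself is two bytes long. */
uint16_t short_jmp_target(uint16_t insn_addr, uint8_t rel8)
{
    return (uint16_t)(insn_addr + 2 + (int8_t)rel8);
}
```

Here short_jmp_target(0x0000, 0x3C) is 0x003E. A displacement of 0xFE is -2, so a short jump with that displacement lands back on itself.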
The first instruction jumps over this data to the boot program code that follows starting at address 0x3E. Let’s look at the instructions there. Type
-u 3E
Here you can see the beginning of the code that will load the DOS (or Windows) operating system. This code (for MS-DOS) looks on the disk for the files IO.SYS and MSDOS.SYS. These files contain the code for the operating system. The boot loader code will load these files into memory and begin executing them. If the files are not found on the disk, then the boot loader will display the famous error message.
Invalid system disk
Disk I/O error
Replace the disk, and then press any key
This message can be seen if you look towards the end of the DOS Boot Record. You can see this on my floppy below.
-d 180
0AFC:0180 18 01 27 0D 0A 49 6E 76-61 6C 69 64 20 73 79 73 ..'..Invalid sys
0AFC:0190 74 65 6D 20 64 69 73 6B-FF 0D 0A 44 69 73 6B 20 tem disk...Disk
0AFC:01A0 49 2F 4F 20 65 72 72 6F-72 FF 0D 0A 52 65 70 6C I/O error...Repl
0AFC:01B0 61 63 65 20 74 68 65 20-64 69 73 6B 2C 20 61 6E ace the disk, an
0AFC:01C0 64 20 74 68 65 6E 20 70-72 65 73 73 20 61 6E 79 d then press any
0AFC:01D0 20 6B 65 79 0D 0A 00 00-49 4F 20 20 20 20 20 20 key....IO
0AFC:01E0 53 59 53 4D 53 44 4F 53-20 20 20 53 59 53 7F 01 SYSMSDOS SYS..
0AFC:01F0 00 41 BB 00 07 60 66 6A-00 E9 3B FF 00 00 55 AA .A...`fj..;...U.
This shows the very end of the Boot Record. The Boot Record is exactly one sector (512 bytes) on the disk. If it is loaded into memory starting with address 0, then the last byte will be in address 0x1FF. If you look at the last two bytes of the Boot Record (0x1FE and 0x1FF), you will notice that they are 0x55 and 0xAA. The last two bytes of the Boot Record must be set to these values or else BIOS will not load the sector and begin executing it.
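The signature check is simple enough to sketch in a few lines of C. This is only an illustration of the test described above (the function name is hypothetical), not the code any particular BIOS uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns true if the last two bytes of the sector are 0x55, 0xAA. */
bool has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[SECTOR_SIZE - 2] == 0x55 &&
           sector[SECTOR_SIZE - 1] == 0xAA;
}
```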
So, to recap, the DOS Boot Record starts with an instruction to jump over the data that follows it. The two-byte jump is followed by a NOP at 0x02, and the disk data runs from 0x03 through 0x3D. The boot code resumes at 0x3E and goes all the way to 0x1FD, which is followed by the two signature bytes, 0x55 and 0xAA. In the next lesson we will use this knowledge to start making our own boot program.
Lesson 2: Making Our First Bootable Disk
In this lesson, we will learn how to create a boot program on a floppy disk. We will start by modifying the Microsoft DOS Boot Record.
For our purposes, we want to replace the boot loader code without changing the other data in the boot sector. If we change the data to something invalid, then DOS and Windows will not recognize the disk as being valid. Windows will give an error saying the disk is not formatted. This will cause you to be unable to access any of the files on the disk. However, we can change the boot program code all we want and, as long as we don’t mess with the other data, DOS and Windows will be able to read and write the files on the disk just fine.
We will leave the first instruction (jmp 0x3E) alone, because we need to jump over the Boot Record data. Thus we can begin modifying the code at 0x3E. Run the DOS DEBUG program and load the first sector of a formatted floppy disk into memory at address 0. Then type the command
-u 3E
to view the instructions there. Now, we will begin modifying the code. Type the command
-a 3E
to begin assembling instructions. The prompt changes from a hyphen to the address at the location that we gave. Type the following instruction and press enter.
jmp 3E
The instruction is assembled to machine code and placed into memory, and the following prompt is the next available memory after the instruction you just entered. Press Enter once more to exit the assembly mode. The whole procedure on my computer looked like this.
-a 3E
0AFC:003E jmp 3E
0AFC:0040
-
The segment address (0x0AFC, in my case) can (and probably will) be different on your computer, or even between different sessions of DEBUG. Now view the instruction you just entered by giving the unassemble command.
-u 3E
As you can see, the first instruction is now our jump instruction. This will create an infinite loop. If we quit DEBUG now, no changes will be saved, but we can now write our modified boot sector back to the disk (overwriting the previous one) by typing this command.
-w 0 0 0 1
This "write" command uses the same syntax as the "load" command. This writes the data found at memory address 0 to disk 0, starting with sector 0 and writing 1 sector. Be very careful when using the write command. This command can be used to overwrite sectors on any drive, and cause loss of data.
You can now boot with this floppy. When you boot, BIOS will load the first sector off the disk into memory and begin execution at the beginning of the sector. This will be the jump to 0x3E instruction. The instruction there is one to jump to 0x3E, so this will continue forever. Try it. Boot up a computer with this disk. Nothing appears to happen. The computer will just sit there and do nothing. But your new “operating system” is running.
Okay, I know what you’re saying: you want to see some sign that the code you wrote is actually running and that you haven’t done something to mess up your computer. In order to do this, we are going to make function calls to BIOS (at least at first). As of the time of this writing, you can find a short list of BIOS function calls at http://users.win.be/W0005997/GI/biosref.html. A longer list of software interrupts can be found at http://burak1.virtualave.net/Interrup.txt, but keep in mind that some of those interrupts are BIOS calls, while others are MS-DOS calls which cannot be used since, of course, MS-DOS is not running. You would have to implement those functions yourself before using them.
We are going to use interrupt 0x10, function 0x0E to write a character to the screen. The registers must be set as follows.
AH = 0x0E
AL = ASCII code of the character to be printed
BL = color/style of character
Now, repeat the instructions in this lesson, only instead of entering the jump instruction as we did before, this time enter the following instructions.
-a 3E
0AF6:003E mov ah, 0e
0AF6:0040 mov al, 48
0AF6:0042 mov bl, 07
0AF6:0044 int 10
0AF6:0046 jmp 46
0AF6:0048
-
First we set AH to 0x0E (the function number), AL to 0x48 (ASCII for the letter ‘H’), BL to 7 (color code for white-on-black), and then we call interrupt 0x10, which handles the video controller. The last instruction creates an infinite loop like before, so things stop there. Save the modified boot sector to a disk (-w 0 0 0 1) and try booting with the disk. This time you should see the character ‘H’ printed on the screen before the system hangs.
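For the curious, the five instructions entered at 0x3E assemble to just ten bytes. The sketch below lists the encodings I would expect (you can confirm them on your own machine with -d 3E) and patches them into a 512-byte sector buffer. The array and function names are my own.

```c
#include <stdint.h>
#include <string.h>

/* Machine code for the five instructions entered at offset 0x3E. */
static const uint8_t print_h_code[] = {
    0xB4, 0x0E,   /* 003E: mov ah, 0e */
    0xB0, 0x48,   /* 0040: mov al, 48 */
    0xB3, 0x07,   /* 0042: mov bl, 07 */
    0xCD, 0x10,   /* 0044: int 10     */
    0xEB, 0xFE    /* 0046: jmp 46     */
};

/* Overwrite the boot code starting at 0x3E, leaving the disk data alone. */
void patch_boot_code(uint8_t sector[512])
{
    memcpy(sector + 0x3E, print_h_code, sizeof print_h_code);
}
```

The ten bytes occupy 0x3E through 0x47, which is why DEBUG shows 0048 as the next free address.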
Play around with this for a while. You can repeat the code for printing a character multiple times to print a phrase, or you can try out other software interrupts. When you are done, continue on to the next lesson where we will learn to use a full-blown assembler to write our programs rather than DEBUG.
Lesson 3: NASM
In this lesson we will learn to use an assembler to write our programs. In previous lessons we have assembled them using DEBUG. After playing around with this for a while, you will quickly see that it would be a pain to use DEBUG to create a program of more than a handful of instructions (even harder to modify). We need a simpler way. We will start by using the “Netwide Assembler” (NASM). Go to the official web page at http://www.octium.net/nasm/ and download a copy of the assembler.
Now we’ll use this assembler to create the same “operating system” that we did at the end of Lesson 2. Download the boot program h.asm and take a look at it. The first instruction should be somewhat familiar by now. This is the instruction to jump over the Boot Record data. In this case, it’s a jump to the label begin. After the jump instruction is 20 bytes of data. This is the data that I read off my floppy disk using the DEBUG program. These values should work fine. If you want, try replacing the data with the data from your own disk. Most of it should be the same.
(NOTE: Keep in mind that numbers made up of more than 1 byte will look “byte swapped” when viewed in DEBUG because on the Intel architecture, the least significant byte is stored at the lowest memory address and vice versa. The bytes will look backwards.)
The code starting at the label begin should look similar to the code we wrote for Lesson 2. It simply prints the letter ‘H’ to the screen and loops forever. At the bottom of this file you will see first a check to make sure the code all fits within 512 bytes (the size of one sector), then the line beginning with the word “times” adds zeros to the end of the file to pad the executable to 510 bytes. Finally, the two-byte signature 0x55, 0xAA is added to the end of the file. Assemble the file at the command prompt with the following command.
nasmw h.asm -o h.bin
This assembles the assembly file to a pure binary executable h.bin. Check the size of the binary file. If we have done things right, it should be exactly 512 bytes. This is exactly the size needed to fit in the boot sector of the floppy.
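If you want to picture what the padding and signature lines at the bottom of the source file accomplish, here is a rough C equivalent. It is only a sketch of the idea (the function name is hypothetical), not a replacement for the assembler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a 512-byte boot sector image from raw code bytes:
   code first, zeros up to byte 509, then the signature 0x55, 0xAA.
   Returns 0 on success, or -1 if the code does not fit in 510 bytes. */
int build_boot_sector(const uint8_t *code, size_t code_len, uint8_t out[512])
{
    if (code_len > 510)
        return -1;
    memset(out, 0, 512);
    memcpy(out, code, code_len);
    out[510] = 0x55;
    out[511] = 0xAA;
    return 0;
}
```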
Now we need to copy this file onto our floppy. With the floppy in the drive, run DEBUG, and type the following commands.
debug
-n h.bin
-l 0
This loads our file into memory starting at address 0. Use the dump (d) and unassemble (u) commands if you wish to confirm that our file has assembled and been loaded correctly. You will be able to see the few instructions that we have written. Notice that the file has been correctly padded with zeros up until bytes 0x1FE and 0x1FF at the end of the sector. Also note that DEBUG fills the CX register with the number of bytes loaded from the file. (Display the contents of the registers with the r command.)
With the floppy disk in the drive, write the file to the disk with the usual command.
-w 0 0 0 1
Reboot the computer with the floppy and see the program in action. Try some more things with the source code, now. For example, maybe modify the code to print more characters. When you are ready, proceed to the next lesson where we will create a “Hello, World” operating system.
Lesson 4: Hello, World
Now is the time you’ve all been waiting for. Finally we get to the classic “first” program. Every decent programming book has a “Hello, World” program, and now we know enough to make a “Hello, World” operating system. If you have done some experimenting on your own and have already done this, then you may want to skip this lesson. We will create a function to print a string and use it to display our message.
It will get tedious to print one character at a time to the screen, so we’ll create a function to print a zero-terminated string to the screen. This is just a simple loop that prints all the characters in a string one at a time.
; ---------------------------------------------
; Print a null-terminated string on the screen
; ---------------------------------------------
putstr:
lodsb ; AL = [DS:SI]
or al, al ; Set zero flag if al=0
jz putstrd ; jump to putstrd if zero flag is set
mov ah, 0x0e ; video function 0Eh (print char)
mov bx, 0x0007 ; color
int 0x10
jmp putstr
putstrd:
retn
Now, a little on how to use this function. First you have to load the address of the first character of the string into the register SI. Then simply call this subroutine putstr.
You can create a string like this in your assembly file.
msg db 'Hello, World!', 0
The zero on the end terminates the string. Then you can print the string to the screen using the following instructions.
mov si, msg ; Load address of message
call putstr ; Print the message
There is just one more thing that needs to be set up before this will work. The address msg, loaded into the register SI, is actually an offset from the beginning of the segment that is pointed to by the register DS. So, before you can use the address msg, you must set up the current data segment. For now, we will use flat addressing from the bottom of physical RAM. To set the data segment to start from the bottom, set the DS register to zero. The following two instructions will do this.
xor ax, ax ; Zero out ax
mov ds, ax ; Set data segment to base of RAM
Try putting all of these parts together using the file h.asm from Lesson 3 as a starting point. Then, using the same method described in Lesson 3, assemble your file, copy it to your floppy disk and boot with it. Have fun. If you get stuck, you can look at my solution, helowrld.asm, but it’s no fair peeking until you’ve tried!
Once you have finished, proceed to the next lesson where we will learn how to make our operating system interactive.
Lesson 5: Let’s Make It Interactive
All of this printing stuff to the screen is fun, but no operating system would be any good at all if it did not provide any interactivity. Let’s make it read input from the keyboard. Again we will be using calls to a function in BIOS to read the keyboard.
We are going to be using function 0, interrupt 0x16. This is done easily with the following two instructions.
xor ah, ah ; we want function zero
int 0x16 ; wait for a keypress
This function causes the computer to pause and does not return until a key is pressed. This can be used in a “Press any key to continue” situation, or also if you want to get input from the user. The scan code of the key pressed will be returned in register AH, and the ASCII code is returned in AL.
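Since AH and AL are just the high and low halves of AX, the result can be pictured as a single 16-bit value. The small sketch below (names are my own) splits such a value into its two parts.

```c
#include <stdint.h>

/* The two values returned by interrupt 0x16, function 0. */
struct keypress {
    uint8_t scan_code;  /* returned in AH */
    uint8_t ascii;      /* returned in AL */
};

/* Split a 16-bit AX value into its AH and AL halves. */
struct keypress decode_key(uint16_t ax)
{
    struct keypress k;
    k.scan_code = (uint8_t)(ax >> 8);
    k.ascii     = (uint8_t)(ax & 0xFF);
    return k;
}
```

For example, an AX value of 0x1E61 would mean scan code 0x1E with ASCII code 0x61, a lowercase ‘a’.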
Your assignment for this lesson is to write a simple boot program that demonstrates a bit of interactivity. Perhaps it could print a message each time a key is pressed. Or maybe allow the user to type at the keyboard and echo each character to the screen as it is typed.
If you get stuck, here is an example of my own. But it's no fair peeking until you’ve tried by yourself!
In the next lesson we will learn how to make our operating system larger than the single sector of the Boot Record.
Lesson 6: Boot Loader
Everything we’ve done so far has been placed entirely inside the boot sector. We can’t make our operating system very big at all if it is to fit in one sector. We need a way of expanding. We will do this by making a boot program that simply loads an executable file off the disk and begins executing it. This is called a boot loader. This file loaded off the disk can be as big as we want, since it will not be constrained to one sector.
This is more difficult than anything else we’ve done so far. It might be a good idea, now, to locate a reference on the FAT file system (or the file system of your choice, but I will be assuming the use of the FAT system). I will give a brief overview of the boot loading process.
A floppy disk contains, in this order, the DOS Boot Record (the first sector we have been working with), the File Allocation Table (FAT), the Root Directory, and then the data contained in the files on the disk. (A hard disk is more complicated. It has a Master Boot Record and multiple partitions.) Suppose we write an operating system, compile/assemble it to a file named LOADER.BIN, and place it on the disk. The boot loader will load it as follows.
The DOS Boot Record (DBR) is read to determine the size of the DBR, FAT, and Root Directory. The location of each on the disk is then determined.
The Root Directory is read into memory.
The Root Directory is searched for the file name LOADER.BIN. If found, we can look in the directory entry to find out which is the file’s first cluster (file allocation unit). If not found, we give an error message.
The File Allocation Table is read off the disk into memory.
Starting with the file’s first cluster, we use the FAT to locate all the clusters belonging to the file. We read them all off the disk into memory at a specific location.
We jump to that location to begin execution of the operating system.
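The first step above is mostly arithmetic. As a sketch, assuming the usual FAT layout (reserved sectors, then the FATs, then the Root Directory with 32-byte entries, then the data area whose first cluster is numbered 2), the locations can be computed like this. The struct and function names are my own.

```c
#include <stdint.h>

/* Sector numbers of each region on the disk (names are illustrative). */
struct fat_layout {
    uint32_t fat_start;     /* first sector of the first FAT */
    uint32_t root_start;    /* first sector of the Root Directory */
    uint32_t root_sectors;  /* sectors occupied by the Root Directory */
    uint32_t data_start;    /* first sector of the file data area */
};

struct fat_layout compute_layout(uint16_t bytes_per_sector,
                                 uint16_t reserved_sectors,
                                 uint8_t fat_count,
                                 uint16_t sectors_per_fat,
                                 uint16_t root_entries)
{
    struct fat_layout l;
    l.fat_start  = reserved_sectors;
    l.root_start = l.fat_start + (uint32_t)fat_count * sectors_per_fat;
    /* Each directory entry is 32 bytes; round up to whole sectors. */
    l.root_sectors = ((uint32_t)root_entries * 32 + bytes_per_sector - 1)
                     / bytes_per_sector;
    l.data_start = l.root_start + l.root_sectors;
    return l;
}

/* Cluster numbering starts at 2, so cluster 2 is the first data sector. */
uint32_t cluster_to_sector(const struct fat_layout *l, uint16_t cluster,
                           uint8_t sectors_per_cluster)
{
    return l->data_start + (uint32_t)(cluster - 2) * sectors_per_cluster;
}
```

Using the values from my floppy in Lesson 1 (512 bytes per sector, 1 reserved sector, 2 FATs of 9 sectors each, 224 root entries), the FAT starts at sector 1, the Root Directory at sector 19, and the file data at sector 33.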
All of the reading from the disk will be done using calls to BIOS. If you feel adventurous, use a reference of BIOS functions to learn how to read sectors from the disk and try writing your own boot loader. Otherwise, I have provided a slightly modified version of John S. Fine’s FAT12 bootstrap loader. If you can find a copy of his utility “partcopy,” then use his compiling and installing instructions (and let me know where to find it). Otherwise, copy the boot loader to the floppy disk using the same method we have used in the previous lessons.
There are many user-adjustable settings in John Fine’s bootstrap loader. His loader assumes the use of a FAT12 file system (the system that is used on floppy disks). For another system, you will need to use a different loader. Things you can adjust are the locations where the operating system and various FAT data structures will be loaded into memory. You can also adjust the filename (of the operating system) that the loader loads.
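The "12" in FAT12 means that each entry in the File Allocation Table is 12 bits wide, so two entries share three bytes. The sketch below shows how I understand the lookup of the next cluster in a chain. It is not taken from John Fine's loader, and the function names are hypothetical.

```c
#include <stdint.h>

/* Look up the FAT12 entry for a cluster. The entry holds the number of
   the next cluster in the file, or an end-of-chain marker. */
uint16_t fat12_next(const uint8_t *fat, uint16_t cluster)
{
    uint32_t offset = (uint32_t)cluster + cluster / 2;  /* cluster * 1.5 */
    uint16_t pair = (uint16_t)(fat[offset] | ((uint16_t)fat[offset + 1] << 8));

    /* Odd clusters use the upper 12 bits, even clusters the lower 12. */
    return (cluster & 1) ? (uint16_t)(pair >> 4) : (uint16_t)(pair & 0x0FFF);
}

/* Values 0xFF8 through 0xFFF mark the last cluster of a file. */
int fat12_is_end(uint16_t entry)
{
    return entry >= 0xFF8;
}
```

A loader would start with the first cluster from the directory entry, read that cluster, and call fat12_next repeatedly until fat12_is_end reports the end of the chain.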
By default, the loader loads a file named LOADER.BIN in the root directory (if one exists) into memory starting at address 0x1000:0000. (This is adjustable by the %define IMAGE_SEG). Thus you can compile/assemble an operating system and copy it to the floppy disk as a file named LOADER.BIN.
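One detail worth knowing when searching the Root Directory: a FAT directory entry does not store the dot. The name is kept as 11 upper-case characters, 8 for the name and 3 for the extension, padded with spaces (you can see "IO      SYS" stored this way in the dump at the end of Lesson 1). Here is a sketch of the conversion, with a function name of my own choosing.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert a name such as "LOADER.BIN" to the 11-character, space-padded
   form used in FAT directory entries. The output is not zero-terminated. */
void to_dir_name(const char *name, char out[11])
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t i;

    memset(out, ' ', 11);
    for (i = 0; i < base_len && i < 8; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    if (dot != NULL) {
        for (i = 0; i < 3 && dot[1 + i] != '\0'; i++)
            out[8 + i] = (char)toupper((unsigned char)dot[1 + i]);
    }
}
```

A loader can then compare these 11 bytes against the first 11 bytes of each 32-byte directory entry.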
As an example, we will take the Hello, World operating system from Lesson 4 and run it with our boot loader. We cannot use the exact same source file from Lesson 4, however. We need to make a few changes. First, we need to take into account that this file will now be loaded into a different location in memory (0x1000:0000 instead of 0000:7C00), and, secondly, we can get rid of the DOS Boot Record data.
We can start the code by setting up the data and stack segments and the stack pointer. We will do this as shown below. The current code segment is in the CS register, and the static data assembled into the executable is here, so we will use this as the data segment as well. For now, we will use this as the stack segment, but we will likely want to change this in the future.
mov ax, cs ; Get the current segment
mov ds, ax ; The data is in this segment
cli ; disable interrupts while changing stack
mov ss, ax ; We'll use this segment for the stack too
mov sp, 0xfffe ; Start the stack at the top of the segment
sti ; Reenable interrupts
Finally, we can get rid of the lines at the bottom of the source code that add the boot sector signature and perform the check to make sure the file is exactly 1 sector long. All of the other code should look familiar. The resulting file can be downloaded here: lesson6.asm.
Assemble this file and copy it to your disk using the following commands.
nasmw lesson6.asm -o lesson6.bin
copy lesson6.bin a:\LOADER.BIN
Then, assuming you have already installed the boot loader, you can go ahead and boot with the disk. Once you have this working, feel free to modify any other programs you have written in previous lessons, so that you can try loading them with this boot loader. Most of the following lessons will assume that you will be using this boot loader (or other boot loader of your choice) to load your operating system file(s).
Now we can make our operating system larger than a single sector.
Lesson 7: Start Saying Goodbye To BIOS
Now that we have a boot loader that will load our operating system, and thus can make our operating system larger than one sector, we can now begin to add some complexity to our system. One of the first things to do is to loosen our tie to BIOS. So far we have been using BIOS functions for all of our input and output. BIOS hides all of the input and output from us with its interface, so we don’t know exactly how it goes about performing its functions. BIOS can often be slower than handling I/O by ourselves, and in doing it ourselves, we can know exactly what is going on, thus giving us more power, control, and flexibility with the design of our operating system. Of course, my main reason for learning to perform the I/O myself is simply to see how it is done. If anyone has any arguments for or against the use of BIOS for I/O, let me know. Perhaps we could discuss it further.
The area we will begin with is that of text output to the screen. So far we have been using BIOS interrupt 0x10, function 0x0E. We will begin performing text output ourselves. Before we can even start to do this, we need to know a few things. First of all, video memory is mapped to main memory addresses starting at address 0xB0000 (flat address from base of the physical address space). The region of memory holding the text content of the screen starts at 0xB0000 for monochrome displays and 0xB8000 for color displays. Try the latter address first, and if you can’t get it working, perhaps try the former. I will proceed assuming the use of a color display.
The first byte (0xB8000) holds the ASCII code for the upper-leftmost character on the screen. The next byte (0xB8001) holds the color/style code for that character. These bytes are followed by two more bytes that hold the ASCII code and color/style code for the next character to the right. The video memory continues alternating ASCII code and style/color code to the end of the first row of text. The next bytes after this represent the first character on the second row, and so on for the rest of the screen.
So, all that is necessary to output text to the screen is to write ASCII codes into this region of memory. (This is referred to as memory-mapped I/O.) However, you will now need to keep track of the location of the cursor.
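As a sketch of the layout just described, the functions below compute the byte offset of a character cell and store a character and its color/style code there. I am assuming an 80-column text mode, and the names are my own. The function takes the base of video memory as a parameter so it can be tried on an ordinary buffer; in your operating system you would pass a pointer to 0xB8000.

```c
#include <stdint.h>

#define TEXT_COLS 80   /* assumption: 80-column text mode */

/* Byte offset of the cell at (row, col); each cell takes two bytes. */
uint32_t cell_offset(unsigned row, unsigned col)
{
    return (uint32_t)(row * TEXT_COLS + col) * 2;
}

/* Store a character and its color/style byte at (row, col). */
void put_cell(uint8_t *vram, unsigned row, unsigned col, char ch, uint8_t attr)
{
    uint32_t off = cell_offset(row, col);
    vram[off]     = (uint8_t)ch;   /* ASCII code */
    vram[off + 1] = attr;          /* color/style code */
}
```

With 80 columns, each row takes 160 bytes, so the first character of the second row is at offset 160.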
Speaking of the cursor, you can write characters to anywhere on the screen (anywhere in video memory). But it will look odd to the user if they are typing on the keyboard and characters are appearing one place on the screen and the little blinking cursor is elsewhere on the screen. The video controller chip (6845 on the IBM PC) takes care of drawing the blinking cursor on the screen; we just need to tell it where to move the cursor.
The 6845 video controller is connected to I/O ports 0x3B4-0x3B5 for a mono display and 0x3D4-0x3D5 for a color display. As far as I can tell, the 6845 has various registers, and (assuming a color display) port 0x3D4 is used to indicate which register we would like to write to, and then we write the data to port 0x3D5. Registers 14-15 tell the 6845 where to draw the blinky cursor. The following is pseudo-code for moving the cursor.
outbyte(0x3D4, 14); // write to register 14 first
outbyte(0x3D5, (cursorpos>>8) & 0xFF); // output high byte
outbyte(0x3D4, 15); // again to register 15
outbyte(0x3D5, cursorpos & 0xFF); // low byte in this register
The cursor position (in cursorpos) is the character number, starting with 0 and numbering all the characters in the order that they are arranged in video memory. (The offset in video memory for a given cursor position is cursorpos*2 for the ASCII code and (cursorpos*2)+1 for the color/style code.)
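Here is a small sketch of the arithmetic behind the cursor position. It computes cursorpos from a row and column (assuming 80 columns) and splits it into the two bytes that go to registers 14 and 15. It does not touch any I/O ports, and the names are hypothetical.

```c
#include <stdint.h>

#define TEXT_COLS 80   /* assumption: 80-column text mode */

struct cursor_regs {
    uint8_t high;   /* value for 6845 register 14 */
    uint8_t low;    /* value for 6845 register 15 */
};

/* Compute the register values that place the cursor at (row, col). */
struct cursor_regs cursor_reg_values(unsigned row, unsigned col)
{
    unsigned cursorpos = row * TEXT_COLS + col;
    struct cursor_regs r;
    r.high = (uint8_t)((cursorpos >> 8) & 0xFF);
    r.low  = (uint8_t)(cursorpos & 0xFF);
    return r;
}
```

For the bottom-right corner of an 80x25 screen, cursorpos is 1999 (0x07CF), so register 14 gets 0x07 and register 15 gets 0xCF.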
Using this region of video memory to output characters and the I/O ports to tell the video controller where to draw the cursor, it is now your job to write a text driver for your operating system. Create a set of functions that you can call to output characters, strings, numbers, pointers, etc to the screen without using BIOS (this means no software interrupts). Make sure that your text driver handles scrolling the text on the screen upward before going off the bottom. Write a function to clear the screen.
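As a hint for the scrolling part, one approach is to copy every row up by one and then blank the last row. The sketch below assumes an 80x25 screen and works on whatever buffer you hand it, so you can experiment with it in an ordinary program before pointing it at video memory. It is one possible way to do it, not the only one.

```c
#include <stdint.h>
#include <string.h>

#define TEXT_COLS 80                /* assumption: 80x25 text mode */
#define TEXT_ROWS 25
#define ROW_BYTES (TEXT_COLS * 2)   /* two bytes per character cell */

/* Scroll the screen contents up by one row and clear the bottom row. */
void scroll_up(uint8_t *vram, uint8_t attr)
{
    uint8_t *last = vram + (TEXT_ROWS - 1) * ROW_BYTES;
    int i;

    /* Rows 1..24 move to rows 0..23; the regions overlap, so use memmove. */
    memmove(vram, vram + ROW_BYTES, (TEXT_ROWS - 1) * ROW_BYTES);

    /* Fill the bottom row with spaces in the given color/style. */
    for (i = 0; i < TEXT_COLS; i++) {
        last[2 * i]     = ' ';
        last[2 * i + 1] = attr;
    }
}
```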
Try allowing the user to type characters with the keyboard (you can still use BIOS for keyboard input for now), and echo each character to the screen as it is typed. If you have any troubles writing your text driver, let me know and perhaps I can give you some hints.
As far as the color/style byte is concerned, you can simply use 07 (white on black) for most purposes, but for those of you who are curious, I will explain the different color/style settings. The color/style of a character is one byte. Those 8 bits are used as follows.
Bits 3-0: Foreground color
Bit:    3          2    1      0
Color:  Intensity  Red  Green  Blue
These four bits can be used in any and all 16 combinations. If bit 3 is 1, it indicates full intensity, 0 indicates half intensity. For example, 3 would be cyan (blue + green) while 11 would be bright cyan (intensity+blue+green).
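Bits 6-4: Background color (bit 6 = red, bit 5 = green, bit 4 = blue)
Bit 7: Blinking text
Reverse video is simply a light background with a dark foreground, for example 0x70 (black on white).
For example, a code of 0x2C (00101100 in binary) would be bright red text on a green background.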