Interfacing DJGPP with Assembly-Language Procedures
Written by Matthew Mastracci
Last Revised: 10/11/1997
---------------------------------------------------
Table of Contents
=---------------------------------------------------
Section
=---------------------------------------------------------------------------
1.0 Introduction
1.1 A few words
1.2 Where to get NASM
2.0 Assembly language with DJGPP
2.1 Inline assembly (AT&T style)
2.2 When and when not to use NASM
3.0 DJGPP and NASM
3.1 Introduction to NASM
3.2 Using NASM with makefiles
3.3 Using NASM with RHIDE
3.4 Getting used to NASM
3.5 A note on memory references
3.6 Returning 64-bit values
3.7 Name-mangling
3.8 Internal data structures
3.9 Exportable data structures
3.10 A note about labels
3.11 Accessing external symbols
4.0 Advanced NASM topics
4.1 Accessing real-mode interrupts
4.2 Direct memory access (to protected-mode memory)
4.3 Direct memory access (to real-mode memory)
4.4 malloc() and NASM
4.5 Hooking interrupts
4.6 _CRT0_FLAG_LOCK_MEMORY
4.7 Real-mode callback functions
4.8 Doubleword-aligned accesses
5.0 Contacting the author
5.1 Closing comments
5.2 Getting DJGPPASM.DOC
=---------------------------------------------------
1.0 - Introduction
=---------------------------------------------------
1.1 - A few words
=---------------------------------------------------
Most programmers have no trouble dealing with assembly language in
real mode. Your code is in one segment, your data is in another and you
have complete, unhindered access to the entire system. Things like
"general protection faults", "segment violations" and "page faults" seemed
like things only Windows programmers had to deal with.
In protected mode, however, things are different. Instead of
segments, we have to use selectors. You aren't allowed to write to any
memory location you please and the absolute addresses you took for granted
in real mode don't work in quite the same way. The purpose of this
tutorial is to ease the transition from real mode to protected mode and to
help the reader grasp the fundamental concepts behind well-behaved
assembly language.
1.2 - Where to get NASM
=---------------------------------------------------
The easiest place to get NASM from is Simtel. There is a large list of
Simtel mirrors in the DJGPP FAQ. It's in the "/pub/simtelnet/msdos/asmutl"
directory, with the name NASM???.ZIP (where ??? is the latest version
number).
=---------------------------------------------------
2.0 - Assembly language with DJGPP
=---------------------------------------------------
2.1 - Inline assembly (AT&T style)
=---------------------------------------------------
DJGPP allows you to code full-blown inline assembly with one catch: you
have to use a syntactically different set of opcodes referred to as
"AT&T-syntax." On top of that, Gas, DJGPP's assembler, is only used to
getting code straight from GCC, which means that it has very limited
syntax-checking and may not do exactly what you think you told it to.
To begin, let's look at the format of an inline-assembler statement in
a function:
unsigned short int AddFour(unsigned short int x)
{
unsigned short int y;
__asm__ __volatile__(
"movw %w1, %%ax\n"
"addw $4, %%ax\n"
"movw %%ax, %w0\n"
: "=r" (y)
: "r" (x)
: "ax"
);
return y;
}
This function (as you may have guessed), adds four to a given
parameter and returns the value. You may, however, wonder about the
strange arrangement of registers and variables for the opcodes. Compared to
Intel-syntax asm, they're backwards! In essence, Intel-syntax opcodes
"act" from right to left, ie: mov ax, bx loads ax with the value of bx; add
ax, 4 adds 4 to ax. AT&T opcodes, however, act from left to right. Here
are some examples:
Intel            AT&T
mov bx, ax       movw %%ax, %%bx
add ax, 4        addw $4, %%ax
Basically, the format of an inline assembly statement is:
__asm__ [__volatile__](
"opcodes" :
output-vars :
input-vars :
modified-regs
);
- "__asm__" instructs the compiler to treat the parameters of the
statement as pure assembly and to pass them to the assembler (Gas) as
written;
- "__volatile__" is an optional statement which instructs the compiler not
to move any of your opcodes around from where you place them or combine
them as it pleases. This is probably a good idea, as Gas is designed to
take pure GCC output and may take some unwanted liberties with the code;
- output-vars is a list of constraints indicating which variables will be
modified by the routine: use the format "=r" (x), where "r" is the
constraint (it asks for any general-purpose register) and x is the output
variable to be modified;
- input-vars is a list of constraints indicating which variables will be
used by the routine: use the format "r" (x), where "r" is again the
constraint and x is the input variable to be referenced in the routine;
- modified-regs is a list of hard registers which are "clobbered" by the
function.
Inside the actual assembly statements, you'll notice that you don't
actually say "movw x, %%ax" or something like that. That's because the
compiler "renames" them, in the order you "mention" them. That means that
the first variable defined as an output variable will be "%w0", and, if it
exists, the next output variable will be "%w1"; if it doesn't exist, the
first input variable will be "%w1" instead. The "w" selects the 16-bit
(word) name of the register; a plain "%0" uses the operand's natural size.
In essence, the numbering starts at zero with output-vars, counts up
through the end of output-vars and then continues through input-vars,
incrementing by one each time.
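The numbering rule can be seen in a small sketch with two outputs and two
inputs. The function name sum_diff and its parameters are hypothetical and
do not come from the original text. It uses 32-bit operands (plain %0 to %3
rather than the %w form) and early-clobber "=&r" outputs, because the
outputs are written before the last input has been read. The plain C
fallback branch is an assumption, added only so the file still builds on
non-x86 compilers.

```c
/*
 * Hypothetical example: outputs are numbered first (%0 = s, %1 = d),
 * then the inputs continue the count (%2 = a, %3 = b).
 */
void sum_diff(unsigned int a, unsigned int b,
              unsigned int *sum, unsigned int *diff)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int s, d;

    __asm__ __volatile__(
        "movl %2, %0\n\t"        /* s = a  */
        "addl %3, %0\n\t"        /* s += b */
        "movl %2, %1\n\t"        /* d = a  */
        "subl %3, %1\n\t"        /* d -= b */
        : "=&r" (s), "=&r" (d)   /* outputs: %0, %1 */
        : "r" (a), "r" (b)       /* inputs:  %2, %3 */
        : "cc"                   /* add/sub change the flags */
    );
    *sum = s;
    *diff = d;
#else
    /* Assumed fallback for other targets: same result in plain C. */
    *sum = a + b;
    *diff = a - b;
#endif
}
```

Calling sum_diff(10, 4, &s, &d) should leave 14 in s and 6 in d.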
Inline assembly is convenient in DJGPP and quite fast, but its
complexity in programming and general unreliability make it difficult to
use for long programming tasks. In the next section, we will explore an
alternative, Intel-syntax-based method that works incredibly well with
DJGPP.
2.2 - When and when not to use NASM
=---------------------------------------------------
There are a few points to consider when choosing whether NASM will be
the best assembly-language compiler for you to use or even whether a
combination of the two methods would be better.
1. Using the __asm__ directive means that you can inline your assembly
language function, which you can't do with NASM. On top of that, you can
tell the compiler which registers you clobber and it can work around that.
2. GAS was only designed to take input from the compiler, not from the
programmer. There are a few things you need to watch out for, as its
error-checking is fairly minimal.
3. NASM follows the TASM/MASM format for assembly in most cases. If you're
used to these assemblers, the adjustment period is much shorter, whereas
with AT&T syntax, you have to practice a little more to get used to it.
4. NASM is a solid compiler that won't optimize behind your back. Each
instruction you enter is compiled exactly as you enter it. Also, NASM
supports MMX, which (as far as I can tell) isn't supported by GAS.
My personal preference is this: use NASM for your major assembly
routines (ie: the sprite-drawing routines in a graphic library or callback
functions) and use the __asm__ directive for minor functions that you want
to inline at any point in the future. If you're planning on creating a .S
file, you can probably save a lot of headache by just creating it as a .ASM
file and compiling it with NASM instead of GAS.
=---------------------------------------------------
3.0 - DJGPP and NASM
=---------------------------------------------------
3.1 - Introduction to NASM
=---------------------------------------------------
NASM, the Netwide Assembler, is a freeware assembler package. It is a
fully-functional assembler with most of the capabilities of commercial
products such as MASM or TASM, but (in the spirit of DJGPP and other
freeware compilers) costs you nothing. Obtain the latest version
of the package, create a %DJDIR%\NASM directory and unZIP the archive
inside the directory with the -d option. This will create the proper
directories for the program. Finally, either copy the NASM executable to
your %DJDIR%\BIN directory or add the directory to your path.
3.2 - Using NASM with makefiles
=---------------------------------------------------
If you use GNU's make to handle your projects, setting up NASM as a
compiler is fairly easy. For each file, the dependencies/method line
should look like this:
filename.o : filename.asm ; nasm -f coff filename.asm
You could also create a rule for making .o files from .asm files, to
save time:
%.o : %.asm ; nasm -f coff $<
3.3 - Using NASM with RHIDE
=---------------------------------------------------
If you want to include a NASM-compiled file in your project, follow
these steps for each .asm file:
1. Open the project window;
2. Select the .asm file you want to compile with NASM;
3. Hit Ctrl-O for the file's local options;
4. Select "User" for compiler;
5. Enter "nasm -f coff $(SOURCE_FILE)" in the "Compiler" text area; and
6. Set the error-checking to "built-in C-compiler".
This should work for all versions of RHIDE, starting at version 1.2.
In future versions, the author may have implemented a new system for using
external compilers. Check the program updates for more information.
3.4 - Getting used to NASM
=---------------------------------------------------
To begin, let's create a sample assembly language program:
nasmtest.asm:
[BITS 32]
[GLOBAL _AddFour__FUi]
[SECTION .text]
; ---------------------------------------------------------------------------
; Prototype: unsigned int AddFour(unsigned int x);
; Returns: x + 4
; ---------------------------------------------------------------------------
x_AddFour equ 8
_AddFour__FUi:
push ebp
mov ebp, esp
mov eax, [ebp + x_AddFour]
add eax, 4
mov esp, ebp
pop ebp
ret
Let's also create a C++ program to test it:
nasmtest.cc:
#include <stdio.h>
extern unsigned int AddFour(unsigned int);
int main(void)
{
printf("AddFour(4) = %u\n", AddFour(4));
return 0;
}
To make things easier, create a makefile like so:
nasmtest.mak:
nasmtest.exe : nasmtest.cc nasmtest.o ; gcc nasmtest.cc -o nasmtest.exe nasmtest.o -v -Wall
nasmtest.o : nasmtest.asm ; nasm -f coff nasmtest.asm
Type "make -f nasmtest.mak" to create the executable. When you
run it, it should print "AddFour(4) = 8".
Let's go through the assembler source:
[BITS 32]
This line instructs NASM that you want it to prefix 16-bit code and
default to 32-bit code, not the other way around. You must have this in
any functions designed to run in protected mode. If you don't have it (in
a version of NASM prior to 0.91) you'll probably generate a page fault (try
it), because all of your memory references will have the top word truncated
(ie: ebp = 0x47282567, chopped to 0x2567).
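The truncation described above can be modelled in a few lines of C. This
is only an illustration of the arithmetic, not NASM output, and the
function name is hypothetical.

```c
#include <stdint.h>

/* Models a 32-bit address of which only the low 16 bits survive. */
uint32_t truncate_to_16_bits(uint32_t address)
{
    return address & 0xFFFFu;
}
```

With the value from the text, truncate_to_16_bits(0x47282567) gives 0x2567,
which almost certainly points somewhere other than your stack frame.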
[GLOBAL _AddFour__FUi]
This line tells NASM that the label is to be exported and will be
accessible as an external symbol to other modules. There are other things
that can be placed here, but we will learn about those later.
[SECTION .text]
This defines the .text segment, which is the segment placed first in
the assembled file. It contains any executable code (exportable or not)
for the module. There are other types of segments which we will learn
about later.
; ---------------------------------------------------------------------------
; Prototype: unsigned int AddFour(unsigned int x);
; Returns: x + 4
; ---------------------------------------------------------------------------
These lines give the prototype of the function for your C++ function,
for easy reference. They aren't necessary, but help you with remembering
what your function takes as parameters and returns.
x_AddFour equ 8
As NASM has no direct support for parameter structures (as of yet), we
can define our function's parameters with this method. These are used for
referencing parameters placed on the stack by the calling function. Each
parameter will take a minimum of four bytes (padded with zeros), which
helps you keep your code 32-bit.
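As a rough sketch of how these "equ" offsets come about, the following C
function computes the [ebp + N] offset of each parameter. It assumes the
standard "push ebp / mov ebp, esp" prologue used in this tutorial, so
[ebp + 0] holds the saved ebp and [ebp + 4] the return address. The name
param_offset and its arguments are hypothetical.

```c
#include <stddef.h>

/*
 * Returns the [ebp + N] offset of parameter `index`, given the size in
 * bytes of each parameter in declaration order. The first parameter
 * starts at 8, and every parameter is rounded up to a multiple of four
 * bytes on the stack.
 */
size_t param_offset(const size_t *sizes, size_t index)
{
    size_t offset = 8;
    size_t i;

    for (i = 0; i < index; i++)
        offset += (sizes[i] + 3) & ~(size_t)3;
    return offset;
}
```

For a function taking (unsigned int, unsigned int, unsigned char), the
sizes { 4, 4, 1 } yield offsets 8, 12 and 16.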
_AddFour__FUi:
This label declaration defines a function called "AddFour." There is,
however, a suffix which represents the parameters of the function. This
suffix is unique to C++ (the label would be "_AddFour" in C) to account for
external function overloading. Each suffix begins with "__F" to say that
it is, in fact, a function. Following this, there are characters
representing the data types (some of them):
Character Data Type
c char
d double
f float
i int
s short
v void
x long long int
In addition, there are prefixes you can add to the characters to
change them:
Prefix Meaning
U unsigned (ie: Uc is 'unsigned char')
P pointer (ie: Pi is 'int *')
R reference (ie: RUc is 'unsigned char &')
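To make the scheme concrete, here is a small C helper that assembles a
label from a function name and an already-encoded parameter list. It only
covers the simple "_Name__F<codes>" pattern described in this section;
the helper name mangle_name is hypothetical, and other compiler versions
may mangle names differently.

```c
#include <stdio.h>
#include <string.h>

/*
 * Builds "_Name__F<codes>" into out. `codes` is the encoded parameter
 * list, e.g. "Ui" for (unsigned int) or "v" for (void).
 * Returns 0 on success, -1 if the buffer is too small.
 */
int mangle_name(char *out, size_t out_size,
                const char *name, const char *codes)
{
    /* leading '_' + name + "__F" + codes + terminating NUL */
    size_t needed = 1 + strlen(name) + 3 + strlen(codes) + 1;

    if (needed > out_size)
        return -1;
    sprintf(out, "_%s__F%s", name, codes);
    return 0;
}
```

For example, mangle_name(buf, sizeof buf, "AddFour", "Ui") fills buf with
"_AddFour__FUi", the label used in nasmtest.asm.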
push ebp
mov ebp, esp
These lines are analogous to the "push bp/mov bp, sp" opcodes in
real-mode. They are used to preserve the 32-bit values of the caller's
stack frame.
mov eax, [ebp + x_AddFour]
add eax, 4
We load eax with the 32-bit unsigned int from the stack and add four
to it. Note that integer and pointer return values are returned in eax
(extended into edx for 64-bit values).
mov esp, ebp
pop ebp
ret
This simply restores the caller's stack frame and returns to the
caller.
3.5 - A note on memory references
=---------------------------------------------------
In NASM, memory references work a little differently than they do in
most other assemblers. To specify the address of a variable (a symbol), you
don't use the "offset" function. Instead, you just specify the name of the
symbol as so:
mov esi, output_variable ; loads ESI with the *address* of output_variable
mov output_variable, esi ; illegal! trying to load into immediate value
If you want to access the contents of a memory location, put the
variable name in square brackets:
mov esi, [output_variable] ; loads ESI with the *value* of output_variable
mov [output_variable], eax ; loads output_variable with the contents of eax
That's all there is. If you're ever confused, just imagine that the
compiler is just replacing each of the symbols with its absolute constant
value (which it is, as a matter of fact). That way (if output_variable is
at offset 2342), this makes no sense:
mov 2342, esi
But these do make sense:
mov esi, 2342
mov [2342], eax
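If you think in C, a loose analogy may help: the bare symbol name behaves
like "&variable" and the bracketed form behaves like "variable". The
sketch below is plain C, not NASM, and the three function names are
hypothetical.

```c
/* C analogy only: &output_variable plays the role of the bare symbol,
 * output_variable the role of [output_variable]. */
unsigned int output_variable = 0;

/* like: mov esi, output_variable   (the address) */
unsigned int *address_of_output(void)
{
    return &output_variable;
}

/* like: mov esi, [output_variable] (the contents) */
unsigned int value_of_output(void)
{
    return output_variable;
}

/* like: mov [output_variable], eax (store to memory) */
void store_output(unsigned int value)
{
    output_variable = value;
}
```

There is no C equivalent of the illegal "mov output_variable, esi", just
as "&output_variable = value;" does not compile.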
3.6 - Returning 64-bit values
=---------------------------------------------------
One of the great things about DJGPP is its built-in support for 64-bit
integers (signed and unsigned). How can we use these in NASM? Simple: we
just return the low doubleword in eax, as we would a regular int, and
return the high doubleword in edx. Consider:
[BITS 32]
[GLOBAL _BigNum__FUiUi]
[SECTION .text]
; ---------------------------------------------------------------------------
; Prototype: unsigned long long int BigNum(unsigned int a, unsigned int b);
; Returns: unsigned 64-bit integer, with a as the high doubleword and b as
; the low doubleword
; ---------------------------------------------------------------------------
a_BigNum equ 8
b_BigNum equ 12
_BigNum__FUiUi:
push ebp
mov ebp, esp
mov edx, [ebp + a_BigNum] ; a becomes the high doubleword (edx)
mov eax, [ebp + b_BigNum] ; b becomes the low doubleword (eax)
mov esp, ebp
pop ebp
ret
Let's create a quick C++ program to test this:
#include <stdio.h>
extern unsigned long long int BigNum(unsigned int a, unsigned int b);
int main(void)
{
unsigned int a = 0x11111111, b = 0x22222222;
printf("The 64-bit integer made from 0x%0x and 0x%0x is 0x%016Lx (%Lu)", a,
b, BigNum(a, b), BigNum(a, b));
return 0;
}
The range of the 64-bit 'unsigned long long int' is from 0 to
18,446,744,073,709,551,615. That's 18 quintillion, or 18x10^18!
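For comparison, the edx:eax pairing can be written out in C. This is a
model of the result only, with a hypothetical name; edx supplies the upper
32 bits and eax the lower 32 bits.

```c
#include <stdint.h>

/* `high` is what the assembly version leaves in edx,
 * `low` is what it leaves in eax. */
uint64_t big_num_model(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

With high = 0x11111111 and low = 0x22222222 the result is
0x1111111122222222.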
3.7 - Name-mangling
=---------------------------------------------------
As mentioned before, C++ functions are "mangled" to account for how
it allows the programmer to overload the parameters of a function. If you
don't want your function names to have annoying trailers like "__Fv" or
"__FUiUicP11__dpmi_regs", there's another way to do it. If you load up
one of DJGPP's supplied include files, you'll notice this near the start:
#ifdef __cplusplus
extern "C" {
#endif
int foo(int bar);
void fx(void);
...
#ifdef __cplusplus
}
#endif
The extern "C" declaration tells the compiler that the function was
originally compiled with a regular C-compiler, not a fancy C++-compiler
that mangles your function names. This way, you can write your assembly
language functions with just a preceding underscore (like _foo or _bar).
Before you get mad at C++ for mangling your simple names, take a
look at what C++ does to your complicated class names!
3.8 - Internal data structures
=---------------------------------------------------
Many of the functions you will write with the assembler will require
various variables that won't be seen outside the scope of the module. To
do this, let's create a new section below the function AddFour. Put this
at the end of nasmtest.asm:
[SECTION .data]
AddConstant dd 00000004h
Now modify AddFour like so:
push ebp
mov ebp, esp
mov eax, [ebp + x_AddFour]
add eax, [AddConstant]
mov esp, ebp
pop ebp
ret
The code now references data inside the data segment (everything in
.data is placed in the data segment referenced by the selector in ds).
With NASM, you must place the variable's name in square brackets to
indicate a reference to a memory location. If you forget the brackets, the
compiler will treat it as the actual offset of the variable, often leading
to unwanted side-effects.
3.9 - Exportable data structures
=---------------------------------------------------
What if we want a function to make a variable accessible to a module
it gets linked with? Let's add a version string to the module for the
calling program to read. Add this in the .data section, under AddConstant:
_VersionString db "NASMTEST.ASM - Version 0.0a", 00h
Now, at the top of the file, under the first [GLOBAL ...] declaration,
add:
[GLOBAL _VersionString]
This label is now exportable as a variable in the calling program.
Add the following line to nasmtest.cc, under the declaration of AddFour:
extern char VersionString[];
And under the first printf() statement, add:
printf("VersionString = '%s'\n", VersionString);
Compile this, and you should see the exported string from the assembly
module. Note that the variable can be accessed internally as well.
Consider:
; ---------------------------------------------------------------------------
; Prototype: void NewVersion(char c);
; Returns: nothing
; ---------------------------------------------------------------------------
c_NewVersion equ 8
_NewVersion__Fc:
push ebp
mov ebp, esp
mov al, [ebp + c_NewVersion] ; mov eax, [ebp + c_NewVersion]
; would be just as valid, as the
; stack is padded with zeros
mov byte [_VersionString + 26], al ; 26 is the offset of the 'a'
; in VersionString from position
mov esp, ebp ; zero
pop ebp
ret
Add the following two lines as well:
NewVersion('g');
printf("VersionString = '%s'\n", VersionString);
Don't forget to export the function with [GLOBAL ...] and declare the
external function in your C++ source.
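Rather than counting characters by hand, the offset of the version letter
can be double-checked with a short C helper. The function name is
hypothetical; it relies on the version letter being the last character
before the terminating zero.

```c
#include <string.h>

/* Returns the offset of the last character before the terminating NUL,
 * which is where the version letter sits in the tutorial's string. */
size_t last_char_offset(const char *s)
{
    size_t len = strlen(s);

    return len ? len - 1 : 0;
}
```

For "NASMTEST.ASM - Version 0.0a" this returns 26, matching the constant
used in NewVersion.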
What if we wanted to offer a complicated structure? Add the following
declaration to nasmtest.cc:
extern struct {
unsigned int x, y;
char * VersionStringPointer;
} DataValues;
Now add the following in the .data section:
_DataValues:
dd 00000001 ; unsigned int x
dd 00000002 ; unsigned int y
dd _VersionString ; char * VersionStringPointer
; Loaded with offset of version string,
; effectively a pointer
And export it with:
[GLOBAL _DataValues]
Now you can check that the structure works:
printf("x = %u, y = %u\n", DataValues.x, DataValues.y);
// This line uses a char * instead of a char array like before
printf("VersionString = '%s'\n", DataValues.VersionStringPointer);
Other variable types may be passed back and forth in this method. Just
remember to ensure that the structures on both sides are exactly the same.
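One way to guard against a mismatch is to check the C layout against the
offsets implied by the dd entries. The following is a minimal sketch; the
struct tag and function name are hypothetical, and the total size is
compared against the pointer size so the check also holds on hosts where
pointers are wider than the four bytes DJGPP uses.

```c
#include <stddef.h>

/* Mirror of the _DataValues layout from the assembly module. */
struct DataValuesLayout {
    unsigned int x;                 /* first dd, offset 0  */
    unsigned int y;                 /* second dd, offset 4 */
    char *VersionStringPointer;     /* third dd, offset 8  */
};

/* Returns 1 when the C layout matches the assembly layout, else 0. */
int data_values_layout_matches(void)
{
    return offsetof(struct DataValuesLayout, x) == 0
        && offsetof(struct DataValuesLayout, y) == 4
        && offsetof(struct DataValuesLayout, VersionStringPointer) == 8
        && sizeof(struct DataValuesLayout) == 8 + sizeof(char *);
}
```

If the compiler inserted padding or the field order changed, the function
would return 0 and the two sides would no longer agree.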
3.10 - A note about labels
=---------------------------------------------------
In essence, there are three types of labels within NASM:
1) Exportable references: labels which are declared in a [GLOBAL ...]
statement at the top of a file, accessible by other modules linked with
this one. These labels reference functions, structures and variables, and
begin with an underscore, ie:
_InitMode13h__Fv: ; Exportable function
; function code
_DataTable: ; Exportable structure
db 00h
db 00h
_XCoord dw 0000h ; Exportable variable
2) Internal references: labels referencing functions, structures and
variables which are private to the module are written without a prefix
character:
SaveRegs: ; Internal function
; function code
DescriptorTable: ; Internal structure
dw 0000h, 0000h
RAMPointer: dd 00000000h ; Internal variable
3) Internal jump targets: labels which will be the target of conditional
and unconditional jumps are written with a period as a prefix:
.LoopStart:
loop .LoopStart
jmp .DoneProc
.DoneProc:
Adhering to these guidelines will ensure your code works properly with
DJGPP and most debuggers.
3.11 - Accessing external symbols
=---------------------------------------------------
Your assembly language module can access public symbols from other
modules it gets linked with as well. Create two strings in the .data
section like so:
; Note that 0ah is used instead of \n, as \n is not interpreted by
; printf, but processed by the compiler instead.
PrintTemplate1 db "VersionString = '%s'", 0ah, 00h
PrintTemplate2 db "unsigned int Exported = 0x%08x", 0ah, 00h
Now create an assembler function to access some external symbols:
; ---------------------------------------------------------------------------
; Prototype: void PrintStrings(void);
; Returns: nothing
; ---------------------------------------------------------------------------
_PrintStrings__Fv:
push ebp
mov ebp, esp
; Remember that C pushes parameters from right to left, not
; left to right like Pascal!
; Note that we push the pointer for a string...
push dword _VersionString
push dword PrintTemplate1
call _printf
; ... but push the actual value of most everything else
push dword [_Exported]
push dword PrintTemplate2
call _printf
mov esp, ebp
pop ebp
ret
We'll have to tell NASM that we have a few external symbols, so it
doesn't give us any errors. Put these by the [GLOBAL ...] definitions:
[GLOBAL _PrintStrings__Fv]
[EXTERN _printf]
[EXTERN _Exported]
Now remove any code previously contained in main() and add the following
lines:
extern void PrintStrings(void);
unsigned int Exported;
int main(void)
{
Exported = 0xdeadbeef;
PrintStrings();
return 0;
}
=---------------------------------------------------
4.0 - Advanced NASM topics
=---------------------------------------------------
4.1 - Accessing real-mode interrupts
=---------------------------------------------------
When you write your C++ code, you usually don't worry about having to
call a real-mode interrupt. Usually, you just define a register structure
and call __dpmi_int or one of the various other wrappers provided by DJGPP.
In assembly language, however, you don't have it quite so easy.
The easiest way is to call the DPMI server's function 0x0300 (through
int 31h), which simulates a real-mode interrupt. Create a new file with
the following:
inttest.asm:
[BITS 32]
[SECTION .text]
SaveRegs:
mov [SaveEAX], eax
mov [SaveEBX], ebx
mov [SaveECX], ecx
mov [SaveEDX], edx
mov [SaveESI], esi
mov [SaveEDI], edi
mov [SaveEBP], ebp
pushf
pop eax
mov [SaveFlags], ax
ret
RestoreRegs:
pushf
pop eax
mov ax, [SaveFlags]
push eax
popf
mov eax, [SaveEAX]
mov ebx, [SaveEBX]
mov ecx, [SaveECX]
mov edx, [SaveEDX]
mov esi, [SaveESI]
mov edi, [SaveEDI]
mov ebp, [SaveEBP]
ret
[SECTION .data]
; Register set structure for int 31h, function 0300h
RegSet:
SaveEDI dd 00000000
SaveESI dd 00000000
SaveEBP dd 00000000
dd 00000000 ; Note: reserved
SaveEBX dd 00000000
SaveEDX dd 00000000
SaveECX dd 00000000
SaveEAX dd 00000000
SaveFlags dw 0000
SaveES dw 0000
SaveDS dw 0000
SaveFS dw 0000
SaveGS dw 0000
SaveIP dw 0000
SaveCS dw 0000
SaveSP dw 0000
SaveSS dw 0000
SaveRegs and RestoreRegs can now be called from your program to fill
and examine the values stored in the structure RegSet. Now you can simply
wrap an interrupt call like:
int 10h
like so:
; Pass our registers to the interrupt
call SaveRegs
; Function 0300h: Simulate real-mode interrupt
mov ax, 0300h
; Interrupt to simulate: int 10h
mov bx, 0010h
; Copy zero words from our stack (you probably won't need otherwise)
mov cx, 0000h
; Load es with ds, so es:edi points to RegSet
push ds
pop es
mov edi, RegSet
; Do it
int 31h
; Get the return values back
call RestoreRegs
As you can see, there is a lot of overhead involved in calling a
real-mode interrupt from protected-mode. The message? If you don't have
to, don't. Use these sparingly, in parts of your code that aren't time
critical.
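If you want to inspect RegSet from the C++ side, a struct with the same
field order can be declared. The sketch below is a hypothetical mirror of
the listing above (DJGPP's own wrappers such as __dpmi_int use their own
register structure, which is not reproduced here). The function only
checks that the field offsets agree with the dd/dw layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of RegSet; field order follows the listing. */
struct RealModeRegs {
    uint32_t edi, esi, ebp, reserved, ebx, edx, ecx, eax;
    uint16_t flags, es, ds, fs, gs, ip, cs, sp, ss;
};

/* Returns 1 if the offsets agree with the eight dd and nine dw entries. */
int reg_set_layout_matches(void)
{
    return offsetof(struct RealModeRegs, edi)   == 0
        && offsetof(struct RealModeRegs, eax)   == 28
        && offsetof(struct RealModeRegs, flags) == 32
        && offsetof(struct RealModeRegs, ss)    == 48;
}
```

Note that the assembly structure is 50 bytes long, while a C compiler will
usually pad the struct to a multiple of four, so compare offsets rather
than sizeof.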
Let's try this with a real-life example. Add these two export
declarations:
[GLOBAL _InitMode13h__Fv]
[GLOBAL _RestoreTextMode__Fv]
Next, add their respective functions:
; ---------------------------------------------------------------------------
; Prototype: void InitMode13h(void);
; Returns: nothing
; ---------------------------------------------------------------------------
_InitMode13h__Fv:
push ebp
mov ebp, esp
; ah = 00h, Sets video mode with int 10h to mode al
mov ax, 0013h
call SaveRegs
; Simulate real-mode interrupt
mov ax, 0300h
; Of int 10h
mov bx, 0010h
; Copy nothing from the stack
mov cx, 0000h
; Load es with ds, so es:edi points to RegSet
push ds
pop es
mov edi, RegSet
; Do it
int 31h
; Load results back into registers
call RestoreRegs
mov esp, ebp
pop ebp
ret
; ---------------------------------------------------------------------------
; Prototype: void RestoreTextMode(void);
; Returns: nothing
; ---------------------------------------------------------------------------
_RestoreTextMode__Fv:
push ebp
mov ebp, esp
; ah = 00h, Sets video mode with int 10h to mode al
mov ax, 0003h
call SaveRegs
; Simulate real-mode interrupt
mov ax, 0300h
; Of int 10h
mov bx, 0010h
; Copy nothing from the stack
mov cx, 0000h
; Load es with ds, so es:edi points to RegSet
push ds
pop es
mov edi, RegSet
; Do it
int 31h
; Load results back into registers
call RestoreRegs
mov esp, ebp
pop ebp
ret
And now create the test program:
inttest.cc:
#include <conio.h>
#include <go32.h>
#include <sys/farptr.h>
extern void InitMode13h(void);
extern void RestoreTextMode(void);
int main(void)
{
int x, y;
InitMode13h();
_farsetsel(_dos_ds);
for (y = 0; y < 100; y++) {
for (x = 0; x < 100; x++) {
_farnspokeb(0xa0000 + y * 320 + x, ((x * y) >> 2) % 256);
}
}
getch();
RestoreTextMode();
return 0;
}
This should produce a 100x100 square in the top left of your screen,
containing a neat-looking pattern. In a nutshell, that's how you can use
interrupts to perform tasks that are complicated, but aren't called enough
times to warrant optimization. You may have noticed that the routine isn't
quite instant, however (if you compiled the program with optimizations off).
In the next section, we will explore ways to increase the speed of direct
memory accesses of all kinds.
4.2 - Direct memory access (to protected-mode memory)
=---------------------------------------------------
As we've seen earlier, data segment pointers can be accessed simply by
using square brackets (ie: mov ax, [DataSegmentWord]). In fact, this
applies to all standard pointers, including those obtained from malloc().
They are all 32-bit near pointers because, hey, in protected-mode,
everything is "near."
This simplifies buffer-to-buffer copying a little. You can copy a
large amount of memory between two external malloc()'d pointers (the count
in ecx is a full 32 bits, so the practical limit is the size of the
buffers) by simply writing:
cld ; Forward copy, inc esi/edi
mov esi, [_PointerFrom] ; ds:esi is the source
mov edi, [_PointerTo] ; es:edi is the dest, assumes es = ds
mov ecx, Count ; Count is the actual number of bytes
rep movsd ; divided by 4 (ie: double-words)
Most of the time, you will find es equal to ds while executing your
code. Don't rely on this, however, unless you are absolutely sure. It's a
good idea to set es to ds at the start of the code, to ensure you don't end
up writing to another selector and causing a GPF or worse. If your code
can't afford to waste cycles (and is called repeatedly), consider setting
es to ds in another function and calling it once before the other calls.
Be sure that you aren't calling anything that may change the value of es
under your nose, though.
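The relationship between the byte count and the value loaded into ecx can
be sketched in C. This is a model of the copy, not a replacement for it;
the function names are hypothetical. One detail it makes visible is that
"rep movsd" on its own leaves any one to three trailing bytes uncopied
when the size is not a multiple of four.

```c
#include <stddef.h>
#include <string.h>

/* Number of doublewords to load into ecx for a given byte count. */
size_t dword_count(size_t byte_count)
{
    return byte_count / 4;
}

/*
 * C model of the copy: the first memcpy stands in for `rep movsd`,
 * the second picks up the leftover bytes that rep movsd would miss.
 */
void copy_dwords(void *to, const void *from, size_t byte_count)
{
    size_t dwords = dword_count(byte_count);
    size_t tail   = byte_count % 4;

    memcpy(to, from, dwords * 4);
    memcpy((unsigned char *)to + dwords * 4,
           (const unsigned char *)from + dwords * 4, tail);
}
```

For a full 320x200 screen of 64000 bytes, dword_count returns 16000 and
there is no tail.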
4.3 - Direct memory access (to real-mode memory)
=---------------------------------------------------
But what if you want to access something from real-mode linear memory?
Here's where it gets a little tricky. You can't simply pull a selector
for your particular segment from thin air and access it that way. DJGPP
provides a convenient way to get a selector for something like this,
however:
int __dpmi_segment_to_descriptor(int _segment);
You simply pass it the real-mode segment you want to reach and it
returns a selector (or -1 on failure) you can use to access it. That's all
it takes.
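As a reminder of how a real-mode segment relates to the linear address
used earlier with _farnspokeb, here is the arithmetic as a small C
function. The function name is hypothetical.

```c
#include <stdint.h>

/* Real-mode address arithmetic: linear = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Segment 0xa000 with offset 0 gives linear address 0xa0000, the start of
video memory in mode 13h.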
Let's add a function to inttest.asm to do this for us:
; ---------------------------------------------------------------------------
; Prototype: int InitGraphics(void);
; Returns: -1 on error, else zero
; ---------------------------------------------------------------------------
_InitGraphics__Fv:
push ebp
mov ebp, esp
push dword 0a000h ; Push segment for function
call ___dpmi_segment_to_descriptor ; Note the triple underscore
cmp eax, -1 ; Check for error
jne .SelectorOkay ; No error, selector is valid
mov word [VideoRAM], 0ffffh ; Error, make selector invalid
jmp .Done ; End function, will return -1
.SelectorOkay:
mov [VideoRAM], ax ; Load selector from ax
xor eax, eax ; Clear eax to return zero
.Done:
mov esp, ebp
pop ebp
ret
We'll also have to add a variable to the .data section:
VideoRAM dw 0000h
And add some declarations at the beginning:
[GLOBAL _InitGraphics__Fv]
[EXTERN ___dpmi_segment_to_descriptor]
Now that we have a selector that references segment 0xa000, we can
create a PutPixel routine on its way to being much faster than before:
; ---------------------------------------------------------------------------
; Prototype: void PutPixel(unsigned int x, unsigned int y,
; unsigned char Color);
; Returns: nothing
; ---------------------------------------------------------------------------
x_PutPixel equ 8
y_PutPixel equ 12
Color_PutPixel equ 16
_PutPixel__FUiUiUc:
push ebp
mov ebp, esp
push ebx ; ebx must be preserved for the C caller
mov eax, [ebp + y_PutPixel]
mov ebx, eax
shl eax, 6 ; Note: y shl 6 + y shl 8 = 320 * y
shl ebx, 8
add ebx, eax
add ebx, [ebp + x_PutPixel]
mov fs, [VideoRAM]
mov al, [ebp + Color_PutPixel]
mov [fs:ebx], al
pop ebx
mov esp, ebp
pop ebp
ret
Now our main procedure becomes:
int main(void)
{
int x, y;
InitGraphics();
InitMode13h();
for (y = 0; y < 100; y++) {
for (x = 0; x < 100; x++) {
PutPixel(x, y, ((x * y) >> 2) % 256);
}
}
getch();
RestoreTextMode();
return 0;
}
You can remove the #include<...> statements at the start of the file,
with the exception of conio.h. Once you've added the extern declarations,
it should compile and run like before.
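The shift trick in PutPixel and the test pattern drawn by main() can be
checked in plain C. The two function names below are hypothetical; the
arithmetic is the same as in the listings above.

```c
/* Offset of pixel (x, y) in mode 13h, computed the way PutPixel does:
 * (y << 6) + (y << 8) is y * 64 + y * 256 = y * 320. */
unsigned int pixel_offset(unsigned int x, unsigned int y)
{
    return (y << 6) + (y << 8) + x;
}

/* Color used by the test pattern in main(). */
unsigned char pattern_color(unsigned int x, unsigned int y)
{
    return (unsigned char)(((x * y) >> 2) % 256);
}
```

The last pixel on the screen, (319, 199), lands at offset 63999, one less
than the 64000 bytes of a 320x200 screen.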
One important aspect of __dpmi_segment_to_descriptor is that the
number of selectors available to it is finite. Thus, you should only call
the InitGraphics() function once (along with any others that use this
function), at the beginning of the program. In any case, calling it more
than once is redundant and will put a drain on resources and speed.
4.4 - malloc() and NASM
=---------------------------------------------------
One of the most important functions for managing allocated memory is
malloc(). It allows you to point any pointer at a contiguous block of
memory and access it like a structure, array, variable or even as a
function.
Let's look at an example:
alloctst.cc:
#include <stdio.h>
#include <stdlib.h>
extern void FillString(char * StringToFill);
int main(void)
{
char * TestString;
TestString = (char *)malloc(20);
FillString(TestString);
printf("TestString = '%s'\n", TestString);
free(TestString);
return 0;
}
alloctst.asm:
[BITS 32]
[GLOBAL _FillString__FPc]
[SECTION .text]
; ---------------------------------------------------------------------------
; Prototype: void FillString(char * StringToFill);
; Returns: nothing
; ---------------------------------------------------------------------------
StringToFill_FillString equ 8
_FillString__FPc:
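push ebp
mov ebp, esp
push esi ; esi and edi must be preserved for the C caller
push edi
cld ; Forward copy, inc esi/edi
mov edi, [ebp + StringToFill_FillString]
mov esi, Filler
mov ecx, 20 ; 19 characters plus the terminating zero
rep movsb
pop edi
pop esi
mov esp, ebp
pop ebp
ret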
[SECTION .data]
Filler db "<- Buffer filled ->", 00h
As you may have guessed, the program copies the filler (without any
sort of bounds checking) to the passed string and then prints it. In this
way, malloc() can be used for many different applications in your program.
In graphical applications, you may find these buffers useful for storing
sprites, sound and tile data. With a little bit of effort, you can create
dynamically-loaded drivers that you can load and unload into your program
as needed. This could be useful for creating a standard set of procedures
for sound output with a number of drivers that can be loaded for each
different sound card.
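The assembly routine trusts the caller to pass a buffer of at least 20
bytes. As a rough C sketch of the same copy with a length check added
(FillStringBounded and its size parameter are assumptions made for this
illustration, not part of alloctst.asm):

```c
#include <stddef.h>
#include <string.h>

/* Same text as Filler in alloctst.asm: 19 characters plus the zero. */
static const char Filler[] = "<- Buffer filled ->";

/* Hypothetical bounds-checked variant of FillString.
   Copies as much of Filler as fits and always zero-terminates.
   Returns the number of characters copied (excluding the zero). */
size_t FillStringBounded(char *dest, size_t size)
{
    size_t len = sizeof(Filler) - 1;

    if (dest == NULL || size == 0)
        return 0;
    if (len > size - 1)
        len = size - 1;         /* leave room for the terminator */
    memcpy(dest, Filler, len);
    dest[len] = '\0';
    return len;
}
```

With a 20-byte buffer the whole filler is copied; with a smaller buffer the
text is cut short instead of running past the end of the allocation.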
4.5 - Hooking interrupts
=---------------------------------------------------
Interrupt-hooking is an integral part of many different types of
programs, from timer-synchronization in games to serial port monitoring in
communication programs. Let's try a quick example. Create a makefile and
enter the following program:
hookint.asm:
[BITS 32]
[GLOBAL _TickHandler__Fv]
[GLOBAL _TickHandler__Size]
[EXTERN _Timer1]
[EXTERN _Timer2]
[EXTERN _Flag1]
[EXTERN _Flag2]
[EXTERN _TickCount]
[SECTION .text]
_TickHandler__Fv:
    inc dword [_TickCount]
    dec dword [_Timer1]
    dec dword [_Timer2]
    cmp dword [_Timer1], 0
    jz .SetFlag1
    jmp .TestFlag2
.SetFlag1:
    mov dword [_Flag1], 1
.TestFlag2:
    cmp dword [_Timer2], 0
    jz .SetFlag2
    jmp .Done
.SetFlag2:
    mov dword [_Flag2], 1
.Done:
    ret
; Calculate size of function by subtracting offsets
_TickHandler__Size dd $-_TickHandler__Fv
[SECTION .data]
hookint.cc:
// Adapted from the DJGPP test program "libc\go32\timer.c"
#include <stdio.h>
#include <pc.h>
#include <dpmi.h>
#include <go32.h>
#define LOCK_VARIABLE(x) _go32_dpmi_lock_data((void *)&(x), (long)sizeof(x))
extern void TickHandler(void);
extern int TickHandler__Size;
// volatile, because the interrupt handler changes these behind main()'s back
volatile int Timer1 = 1, Timer2 = 1, Flag1, Flag2, TickCount = 0;
int main()
{
    _go32_dpmi_seginfo OldHandler, NewHandler;
    printf("Grabbing timer interrupt...\n");
    _go32_dpmi_get_protected_mode_interrupt_vector(8, &OldHandler);
    _go32_dpmi_lock_code(TickHandler, (long)TickHandler__Size);
    LOCK_VARIABLE(Timer1);
    LOCK_VARIABLE(Timer2);
    LOCK_VARIABLE(Flag1);
    LOCK_VARIABLE(Flag2);
    LOCK_VARIABLE(TickCount);
    NewHandler.pm_offset = (int)TickHandler;
    NewHandler.pm_selector = _go32_my_cs();
    _go32_dpmi_chain_protected_mode_interrupt_vector(8, &NewHandler);
    while (!kbhit())
    {
        if (Flag1)
        {
            printf("Timer 1 expired: %i\n", TickCount);
            Flag1 = 0;
            Timer1 = 5;
        }
        if (Flag2)
        {
            printf("Timer 2 expired: %i\n", TickCount);
            Flag2 = 0;
            Timer2 = 7;
        }
    }
    getkey();
    printf("Releasing timer interrupt...\n");
    _go32_dpmi_set_protected_mode_interrupt_vector(8, &OldHandler);
    return 0;
}
What does this program do? Let's examine it:
_go32_dpmi_get_protected_mode_interrupt_vector(8, &OldHandler);
This line reads the selector and offset of the previous handler for
INT 8 (the timer interrupt) into the structure OldHandler for use later.
_go32_dpmi_lock_code(TickHandler, (long)TickHandler__Size);
LOCK_VARIABLE(Timer1);
LOCK_VARIABLE(Timer2);
LOCK_VARIABLE(Flag1);
LOCK_VARIABLE(Flag2);
LOCK_VARIABLE(TickCount);
Here, we ensure that the handler and the variables it touches don't get
paged out from under our noses and cause a page fault. As the function call
for locking a variable is fairly tedious to type repeatedly, we define a
macro to save some time. The size of the locked code is calculated at
assembly time in hookint.asm and exported through the TickHandler__Size
variable for us to use.
NewHandler.pm_offset = (int)TickHandler;
NewHandler.pm_selector = _go32_my_cs();
_go32_dpmi_chain_protected_mode_interrupt_vector(8, &NewHandler);
We then fill in the selector/offset pair for our new handler and add it
to the interrupt chain for INT 8. Note that _go32_my_cs() returns the
selector for the program's code. The chaining function builds a wrapper
around our function so that it runs every time the interrupt fires, after
which control passes to the previous handler. Because of this wrapper, you
don't need to terminate your routine with iret as you normally would; a
plain ret is enough. You cannot, however, call library functions such as
printf, fopen or fread from your interrupt routine, as most of these are
non-reentrant.
_go32_dpmi_set_protected_mode_interrupt_vector(8, &OldHandler);
Once we have finished with it, we can return the handler to its
previous state by passing the old handler's address we saved before.
Our protected-mode interrupt handler is quite simple. It increments
the variable TickCount for every timer tick and decrements Timer1 and
Timer2. If either of the two timers is equal to zero, it sets the flag
associated with the timer. The flags are latched, meaning that you must
zero them after you have dealt with them. This also means that if you
aren't able to catch one or more timer expiries (because the system is
busy), you'll only have to service it once.
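To make the latching behaviour concrete, here is a plain C model of what
the handler does on each tick. It is only a simulation for reasoning about
the logic: TickHandlerModel and CountTimer1Expiries are made-up names, and
nothing here hooks a real interrupt.

```c
/* Same globals as hookint.cc. */
volatile int Timer1 = 1, Timer2 = 1, Flag1 = 0, Flag2 = 0, TickCount = 0;

/* Hypothetical C model of _TickHandler__Fv: one call equals one tick. */
void TickHandlerModel(void)
{
    TickCount++;
    Timer1--;
    Timer2--;
    if (Timer1 == 0)
        Flag1 = 1;      /* latched: stays set until main() clears it */
    if (Timer2 == 0)
        Flag2 = 1;
}

/* Hypothetical helper: run n ticks while servicing the flags the way the
   loop in main() does, and return how often timer 1 expired. */
int CountTimer1Expiries(int n)
{
    int i, expiries = 0;

    for (i = 0; i < n; i++) {
        TickHandlerModel();
        if (Flag1) { expiries++; Flag1 = 0; Timer1 = 5; }
        if (Flag2) { Flag2 = 0; Timer2 = 7; }
    }
    return expiries;
}
```

With the reload values from hookint.cc, timer 1 expires on the first tick
and then on every fifth tick after that, so eleven ticks produce three
expiries.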
Interrupt handlers can be quite complicated if necessary. The only
restriction is that you can't call any functions that aren't re-entrant.
This includes most of the system functions and any of yours that fall under
the same category. In most cases, you should stay within your function as
much as possible to prevent possible conflicts.
4.6 - _CRT0_FLAG_LOCK_MEMORY
=---------------------------------------------------
It is possible to lock all of your program's memory using one of the
CRT0 startup flags, _CRT0_FLAG_LOCK_MEMORY. This flag effectively disables
virtual memory (disk swapping). It's a great feature if you want to write
a program that shouldn't be swapped to disk at any time (perhaps a game).
To use it, you simply add the following lines to your program:
#include <crt0.h>
int _crt0_startup_flags = _CRT0_FLAG_LOCK_MEMORY;
This will ensure that your code, data and allocated memory will all
be locked, without the need to use _go32_dpmi_lock_data(). The amount of
memory available to your program will decrease, however, since nothing can
be swapped out to disk anymore.
4.7 - Real-mode callback functions
=---------------------------------------------------
Interrupt handlers are great for handling interrupts, but what if
you want a real-mode program to call one of your functions when a certain
event occurs? If you were in real-mode too, you could just get the segment
and offset of your function and pass it on to the other program and be done
with it. In protected mode, there's one more step: a wrapper. Just like
before, there's a great library function that does all the tough stuff for
you:
_go32_dpmi_allocate_real_mode_callback_retf()
How do we use this? It's easy. Let's look at another example:
rmcbtest.cc:
#include <go32.h>
#include <dos.h>
#include <dpmi.h>
#include <conio.h>
#include <stdio.h>
#include <crt0.h>
int _crt0_startup_flags = _CRT0_FLAG_LOCK_MEMORY;
/* To make sure the name doesn't get mangled */
extern "C" void MouseCallback(_go32_dpmi_registers * r);
extern volatile int MouseButtons;
extern volatile int MouseX;
extern volatile int MouseY;
_go32_dpmi_registers regs;
int main(void)
{
    clrscr();
    _go32_dpmi_seginfo info;
    /* Set up the handler */
    info.pm_offset = (int)MouseCallback;
    _go32_dpmi_allocate_real_mode_callback_retf(&info, &regs);
    __dpmi_regs r;
    /* Set the horizontal range valid from 0 to 1000 */
    r.x.ax = 0x07;
    r.x.cx = 0;
    r.x.dx = 1000;
    __dpmi_int(0x33, &r);
    /* Set the vertical range valid from 0 to 1000 */
    r.x.ax = 0x08;
    r.x.cx = 0;
    r.x.dx = 1000;
    __dpmi_int(0x33, &r);
    /* Install the real-mode callback routine */
    r.x.ax = 0x0c;
    r.x.cx = 0x1f; /* 0x1f traps movement and LMB/RMB presses and releases */
    r.x.dx = info.rm_offset;
    r.x.es = info.rm_segment;
    __dpmi_int(0x33, &r);
    while (!MouseButtons) {
        printf("(%i, %i)\n", MouseX, MouseY);
        delay(250);
    }
    /* Clean up handler */
    r.x.ax = 0x0c;
    r.x.cx = 0;
    r.x.dx = 0;
    r.x.es = 0;
    __dpmi_int(0x33, &r);
    _go32_dpmi_free_real_mode_callback(&info);
    printf("Mouse button pressed.\n");
    return 0;
}
rmcbtest.asm:
[BITS 32]
[GLOBAL _MouseCallback]
[GLOBAL _MouseX]
[GLOBAL _MouseY]
[GLOBAL _MouseButtons]
[SECTION .text]
; ---------------------------------------------------------------------------
; Prototype: void MouseCallback(__dpmi_regs * r)
; Returns: nothing
; ---------------------------------------------------------------------------
Pointer_MouseCallback equ 8
_MouseCallback:
    push ebp
    mov ebp, esp
    mov esi, [ebp + Pointer_MouseCallback]
    xor eax, eax
    mov ax, [esi + 16]          ; offset of bx in __dpmi_regs
    mov [_MouseButtons], eax
    mov ax, [esi + 24]          ; offset of cx in __dpmi_regs
    mov [_MouseX], eax
    mov ax, [esi + 20]          ; offset of dx in __dpmi_regs
    mov [_MouseY], eax
    mov esp, ebp
    pop ebp
    ret
[SECTION .data]
_MouseX         dd 00000000h
_MouseY         dd 00000000h
_MouseButtons   dd 00000000h
The code is fairly straightforward and similar to the code we
created for handling real-mode interrupts. To set up the callback, we
set the pm_offset member of the info struct and pass it to the routine
allocation function (you know, the one with the excessively long name).
The function creates a wrapper and passes the wrapper's entry points
back in the rm_segment and rm_offset in the info struct.
You might have noticed that the allocation function also requires a
global register structure (regs in our example). This is important: you
can't give it any other type of variable, because the wrapper keeps using
it for as long as the callback is installed. The wrapper fills in this
structure and passes a pointer to it to your function. If you write your
function carefully, you can save yourself some trouble by declaring the
struct inside your asm module and reading it there directly. Be careful
with this, though: it could change at any time and break your programs.
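The value 0x1f passed in CX when installing the callback in rmcbtest.cc is
a bit mask of mouse events. A small sketch of how it is built up, following
the usual INT 33h function 0Ch bit assignments (the constant and function
names are invented here for readability):

```c
/* Event bits for INT 33h, AX = 000Ch (names are hypothetical). */
enum {
    MOUSE_EVENT_MOVE       = 0x01,  /* pointer moved         */
    MOUSE_EVENT_LEFT_DOWN  = 0x02,  /* left button pressed   */
    MOUSE_EVENT_LEFT_UP    = 0x04,  /* left button released  */
    MOUSE_EVENT_RIGHT_DOWN = 0x08,  /* right button pressed  */
    MOUSE_EVENT_RIGHT_UP   = 0x10   /* right button released */
};

/* The mask used in rmcbtest.cc: every event listed above. */
unsigned MouseEventMaskAll(void)
{
    return MOUSE_EVENT_MOVE
         | MOUSE_EVENT_LEFT_DOWN  | MOUSE_EVENT_LEFT_UP
         | MOUSE_EVENT_RIGHT_DOWN | MOUSE_EVENT_RIGHT_UP;
}
```

To be called back on movement only, you would pass MOUSE_EVENT_MOVE (0x01)
in CX instead.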
To access the register struct, just load esi from [ebp + 8] and then
use [esi + n] to access the registers. You'll need to count the offset
manually from DPMI.H, but don't worry too much, there aren't very many
members.
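Rather than counting by hand, you can let the compiler do it with
offsetof. The sketch below uses a stand-in struct that copies only the
leading doubleword registers, in the order the asm code above relies on
(ebx at 16, edx at 20, ecx at 24). RegsLayout and the REGS_OFS_ names are
hypothetical; in a real DJGPP program you would apply offsetof to the
structure from DPMI.H instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the start of the DPMI register structure (assumption:
   same order as the DPMI real-mode call structure: EDI, ESI, EBP,
   reserved, EBX, EDX, ECX, EAX). */
struct RegsLayout {
    uint32_t edi, esi, ebp, res;
    uint32_t ebx, edx, ecx, eax;
};

/* Offsets to use as [esi + n] in the asm callback. */
enum {
    REGS_OFS_BX = offsetof(struct RegsLayout, ebx),
    REGS_OFS_DX = offsetof(struct RegsLayout, edx),
    REGS_OFS_CX = offsetof(struct RegsLayout, ecx)
};
```

Because the x86 is little-endian, the 16-bit bx, dx and cx sit at the same
offsets as the low halves of ebx, edx and ecx.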
4.8 - Doubleword-aligned accesses
=---------------------------------------------------
In some cases, on 386 machines and up, accessing memory aligned to a
word or double-word boundary is faster than an unaligned access. On
higher-end machines, double-word boundaries offer the greatest benefit. In
NASM, we can easily tell the assembler to align an entire program segment
to a double-word boundary:
[segment .text ALIGN=4]
How can we tell the assembler that we want to align data or functions
to a double-word boundary? It's actually quite simple. Using the assembler
variables "$$" (the start address of the current segment), "$" (the address
of the current opcode) and the "times" directive, we can create a statement
like so:
times ($$ - $) & 3 nop      ; Align the next instruction/data to
                            ; a double-word boundary, assuming
                            ; segment is aligned to double-word
It seems fairly lengthy, and indeed, it is. If you have NASM version
0.94 or later, however, the macro facility comes in handy:
%define align times ($$ - $) & 3 nop
Now if you want to align any of your data/procedures, just use the
align keyword as if it were a directive. Make sure that you put it before
the label, or else you'll end up jumping into the nop's and slowing down
your program:
align
_PutPixel__FUiUiUi:
To see this in action, load up the INTTEST.ASM file and add the %define
line listed above to the top of the file and our new "align" keyword before
each of the functions. Also add the keyword before the definition of the
VideoRAM variable:
align
VideoRAM dw 0000h
Add "ALIGN=4" to each of the segment definitions as well:
[segment .text ALIGN=4]
[segment .data ALIGN=4]
Compile the program and then load it into a debugger (FSDB works well
for this). Trace through the program up to the InitGraphics() function call
and step into the function. Go back a few bytes and notice how there are a
few nop's before the actual function. Also look at the starting address of
the function. It'll end in either 0, 4, 8, or C, meaning it's double-word
aligned. If you want, remove the "align" macros from the file and
recompile. Notice how the functions aren't aligned anymore. Neat, huh?
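The expression ($$ - $) & 3 may look odd at first. $$ - $ is the negated
offset into the segment, and masking that with 3 yields the distance to the
next multiple of four. A short C sketch of the same arithmetic (PadToDword
is a name made up for this illustration):

```c
#include <stdint.h>

/* Hypothetical helper: number of nop bytes that
   "times ($$ - $) & 3 nop" emits when the current offset into the
   segment is `offset`. */
uint32_t PadToDword(uint32_t offset)
{
    return (0u - offset) & 3u;   /* same as ($$ - $) & 3 */
}
```

At offset 0x1235, for example, three nop bytes are emitted, so the next
label lands on 0x1238. At an offset that is already a multiple of four,
nothing is emitted.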
=---------------------------------------------------
5.0 - Contacting the author
=---------------------------------------------------
5.1 - Closing comments
=---------------------------------------------------
This document is still in an unfinished state, so there may be some
errors (glaring or otherwise), omissions or misinformation. If you happen
to stumble across any of these (even typos), feel free to send me email to:
mmastrac@acs.ucalgary.ca
5.2 - Getting DJGPPASM.DOC
=---------------------------------------------------
The latest public version of this document is always available at:
http://www.ucalgary.ca/~mmastrac/djgppasm.doc
The examples created in the document are available in a separate
zip-file at:
http://www.ucalgary.ca/~mmastrac/djnasmex.zip
This manual compiled using MC v1.05 by Matthew Mastracci