Making a DOS program with TASM

The tutorials here are done using Borland's TASM, TLINK and optionally Turbo Debugger. These programs can be found as a suite. If you can get it free, then good. If not, you might have to buy it.

Turning our source code into an executable is a two-step process.
1) Assemble the source code (text) into an object file (.obj)
2) Link the object file(s) into the executable (.exe)

We'll use Turbo Assembler for step 1. The syntax is: TASM [switches] [src file]
Use the /zi switch to include debug information

We'll use Turbo Linker for step 2. The syntax is: TLINK [switches] [object file(s)], [output file]
Use the /v switch to include debug information

Debug information is used by Turbo Debugger. It allows us to see the source code as we watch the CPU instructions get processed (we can also watch registers and data in RAM change).
Turbo Debugger syntax is:
TD [options] [program name] [program's arguments]


Technicalities of source code.
Assembly code is probably the simplest, most straightforward code there is. However, like all source code, the code itself is a communication to the assembler. You have already seen Instruction syntax, which is the way to code CPU instructions (opcodes). Now you'll see Directives, which are instructions to the assembler. Directives help you form your program.

1) Segments and Memory Model
2) Labels
3) Defining Variables
4) Defining Procedures
5) Comments

IMPORTANT: When reading/writing code, pay attention to every punctuation mark, because Periods, Colons, Semicolons and Single/Double Quotes have specific meaning to the assembler.


1) Segments and Memory Models
.model selects a memory model. Simplest explanation: different models offer different sizes for total code, data and/or stack. A model does so by allowing single or multiple segments for either data or code. See Memory Models.
.stack marks start of stack segment. Can be used to declare size of stack.
.data marks the start of the data segment. All variables belong there. The data segment should be pointed to by the Data Segment register (DS).
.code marks the start of the code segment. All code belongs there. Code segment is pointed to by the Code Segment register (CS).

The above directives are needed in almost every program.


2) Labels
Labels are markers in source code. They mark their location. The label is used as a pointer to the location it marks. That pointer is an offset value from the Label's segment.
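
As a small sketch of a label in use (the name 'again' and the count of 5 are only for illustration), the fragment below marks a location and then jumps back to it:

    mov cx, 5           ; loop counter
again:                  ; label: marks the location of the next instruction
    dec cx              ; subtract 1 from CX (the zero flag is set when CX reaches 0)
    jnz again           ; if CX is not zero, jump back to the location 'again' marks

The assembler works out the jump target from the label, so you never have to count bytes yourself.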


3) Defining Variables
To define a variable, we use a Data Allocation Directive. By defining a variable, we instruct the assembler to set aside space for that variable (allocate) and to place an initial value there (optional). A variable is given a name, a size, and an initial value. When the assembler processes the source code and encounters a variable, it replaces the variable's name with the variable's offset from its segment. So a variable's name is really a LABEL, and a LABEL is our visual handle for a numerical POINTER. Labels are great, because they keep us from having to manually count the offset values of our variables and procedures.

  DB - define Byte --- size = 1 byte
  DW - define Word --- size = 2 bytes
  DD - define Doubleword --- size = 4 bytes
  DQ - define Quadword --- size = 8 bytes
  DT - define Tenbyte --- size = 10 bytes

These directives can define multiple variables at a time and initialize each variable with a value. That value can be written as a number, a character in quotes, or as an expression.

  Byte_Var DB 23          ; defines byte with a value of 23

  Word_Var DW 1023          ; defines word with a value of 1023

  String_var DB "hello there"          ; defines 11 bytes in series. Each BYTE has the ASCII value for its respective character in the string being defined.

  Char_var DB 'A'          ; defines byte to the ascii value of 'A' - 41h

  Char_var DB 'A'+10          ; Here's an expression. Defines byte to the ascii value of 'A'+10, which equals ascii 'K' - 4Bh

  Word_var DW -1024          ; defines a negative word. See Signed Numbers.

  String_var DB "hello there",0          ; defines 12 bytes in series. The last byte is the Numerical value of Zero. This is a Null Terminated String which is sometimes used.

  String_var DB "hello there",0Dh,0Ah,'$'          ; defines 14 bytes in series. The byte value 0Dh (13d) is CarriageReturn. 0Ah (10d) is LineFeed. Last Byte is the ASCII char '$'. This is a DollarSign-Terminated string used by DOS.
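
Here is a short sketch of the "multiple values" form mentioned above, plus two related forms that TASM also accepts: '?' for no initial value, and DUP for repetition. The variable names are made up for illustration.

  Byte_List DB 1, 2, 3, 4          ; defines 4 bytes in series with the values 1, 2, 3 and 4

  Word_List DW 100h, 200h          ; defines 2 words (4 bytes) in series

  Unset_Var DW ?          ; reserves one word without giving it an initial value

  Buffer_Var DB 16 DUP (0)          ; defines 16 bytes in series, each with a value of 0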



4) Defining Procedures
Beneath the .CODE directive, we write our code. We must end the source file with an END directive. The END directive names the label where execution begins, which is usually the name of the first procedure defined under .code. A procedure's purpose is to organize code. Procedures are good because you can branch program execution to them by using their name, like: Call Main, or Jmp Main. A procedure's name is just like any other label (like a variable's). It's just an Offset Address holder.
The following is what your code segment should look like.

.code
main proc
    ...
    ...
    ...
main endp
end main


Here's a code segment with another procedure defined


.code
main proc
    ...
    call check_disk
    ...
main endp

check_disk proc
    ...
    ...
    ret
check_disk endp
end main


Below is the simplest form your code can take. It's just code. The label 'Start:' is only there so you have a label to give to the END directive.


.code
start:
    ...
    ...
    ...

end start



5) Comments
Comments require a Semicolon. They are allowed on a line-by-line basis.
Example:

mov ax, 34       ; moves 34 into the AX register

Anything after the semicolon (on that line) is ignored by the assembler.



The following is a skeleton of common ASM code. Copy and save it as blank.asm. Use it as a template to speed up writing new code.


;Project Name:
;Author:
;Original Date:
;Date Modified:
;Purpose:
;Description:

.model small
.stack 100h
.data

.code

main proc

mov ax, @data
mov ds, ax

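;code goes here

mov ax, 4C00h       ;DOS function 4Ch: end the program with return code 0
int 21h             ;without this, execution would run past the end of main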

main endp
end main
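

As a rough sketch of the whole process, here is the template filled in to print one line and exit. The file name hello.asm and the variable name msg are my own choices, and the two DOS services used (INT 21h function 09h to print a '$'-terminated string, function 4Ch to end the program) are assumptions brought in from outside this section.

;Project Name: hello.asm
;Purpose: print a line of text and return to DOS
;
;To build and debug (from the DOS prompt):
;   TASM /zi hello.asm
;   TLINK /v hello.obj
;   TD hello.exe

.model small
.stack 100h
.data

msg db "hello there",0Dh,0Ah,'$'    ;DollarSign-Terminated string

.code

main proc

mov ax, @data           ;point DS at our data segment
mov ds, ax

mov dx, offset msg      ;DS:DX = address of the string
mov ah, 09h             ;DOS function 09h: print string up to the '$'
int 21h

mov ax, 4C00h           ;DOS function 4Ch: end program, return code 0
int 21h

main endp
end main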



As you develop your code and write newer versions, it's good to keep all the older code somewhere; so if someone plagiarizes or steals your code, you can prove it is yours by showing how YOU wrote the code from its Starting Stage into its final form.




Memory Models
You may or may not have read about Segments, as in SEGMENT:OFFSET addressing. It is the way memory is addressed in Real mode. Your program will have its Code and Stack Segment registers set by DOS, but the data segment must be set by your code (your program). If you have more than a segment's worth of code (over 64KB), then you'll have to perform a FAR CALL or FAR JMP when you need to transfer execution to code not in the current Code Segment. Or perhaps you'll have more than one segment's worth of data; in such a case your DS (Data Segment) or ES (Extra Segment) must be made to point to the segment which contains the variable you wish to access.
So, the programmer needs a way to get the different segment addresses into the proper segment registers. This is not hard to do: the assembler keeps track of your code and data and where they belong in relation to the segments you've defined. Part of the Assembly Language syntax includes ways to define segments, and ways to move Segment addresses AND Offset addresses of code and data into registers.

Consequently, the programmer must know when he/she will have more than a segment's worth of code/data, and must decide how to organize that code/data under different segments. Keep in mind, one segment is 64 Kilobytes. I think that's a lot!
It'll be a long time until you (the beginning assembly programmer) need more than 64KB for anything.
Here's a situation:

You've declared a variable and you want to get its value into a register. The following can be done in TASM:

.data
var1 db 80h

.code

mov ax, @data
mov ds, ax
mov si, offset var1
mov al, ds:[si]         ;al = 80h

In the sample code above, we moved the OFFSET address of var1 into the SI register.
You may ask, "what's an offset?" An offset is the distance of one thing from another. In a variable's case, that distance is measured in bytes.
var1 is offset from the .data marker, which marks the start of that data segment.
How does the assembler know that var1 is addressable as an offset from .data?
Because var1 was declared under the .data marker, and that act tells the assembler to include var1 in that segment. If you overfill that segment with data (over 64KB), then the assembler will notice that the data in that segment has passed a segment's capacity and will probably issue an error. You will then need to make another data segment.

.code begins the code segment, ending the .data segment. Any label declared beneath .code will have its OFFSET measured from the .code marker.

You can also move the code and stack segment values into a register by using @code and @stack.

If there is more than one data segment, the other data segment might have a specific name related to it. In any case, here's a way to get a variable's segment value into a register regardless of which segment it's in:

mov ax, seg var1
mov ds, ax
mov si, offset var1
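
If you'd rather leave DS alone, the same idea works with ES. This is only a sketch, assuming var1 is the 'var1 db 80h' declared earlier:

mov ax, seg var1
mov es, ax
mov si, offset var1
mov al, es:[si]         ;al = 80h, read through ES instead of DS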



Now, here are the different Memory Models available with TASM.
.model Tiny - Code, Data and Stack share a single segment, so together they must fit within 64KB. This is used for COM programs.
.model Small - Code and Data each get one segment. That's a limit of 64KB of Code, 64KB of Data.
.model Medium - Only one segment for Data. Multiple segments for Code. 64KB limit on Data. Code is limited only by available memory.
.model Compact - Only one segment for Code. Multiple segments for Data. 64KB limit on Code. Data is limited only by available memory.
.model Large - Multiple segments for both Code and Data. Both are limited only by available memory.
.model Huge - Same as Large, but individual variables can be larger than 64KB.
.model Flat - No segments. 32-bit pointers are used for both Code and Data. This is used in Protected Mode.
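

To make the Tiny model a little more concrete, here is a minimal sketch of a COM program. The names start and msg and the file name are hypothetical, and the DOS services (INT 21h functions 09h and 4Ch) are assumptions from outside this section. Since everything lives in one segment, there is no need to load DS; DOS starts a COM program with DS already equal to CS.

;To build (TLINK's /t switch asks for a .COM file):
;   TASM tinyhi.asm
;   TLINK /t tinyhi.obj

.model tiny
.code
org 100h                ;COM programs are loaded at offset 100h

start:
    mov dx, offset msg  ;DS:DX = address of the string
    mov ah, 09h         ;DOS function 09h: print string up to the '$'
    int 21h
    mov ax, 4C00h       ;DOS function 4Ch: end program, return code 0
    int 21h

msg db "hello there",0Dh,0Ah,'$'    ;data sits in the same segment as the code

end start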





Now here's asm code, TASM style, with comments showing the values of the operations. If you don't understand the following, please come back and read it again after you've done a few tutorials.

.data

wordvar_1 dw 132
wordvar_2 dw 476
wordvar_3 dw 796

.code

mov ax, @data
mov ds, ax

mov ax, offset wordvar_1         ;moves wordvar_1's offset address into AX
mov ax, wordvar_1         ;moves the word stored in memory, whose offset address is represented by the label 'wordvar_1', into AX

mov ax, wordvar_1        ;AX = 132
mov ax, OFFSET wordvar_1        ;AX = 0
mov ax, ds:[0]        ;AX = 132

mov ax, wordvar_2        ;AX = 476
mov ax, OFFSET wordvar_2        ;AX = 2
mov ax, ds:[2]        ;AX = 476

mov ax, wordvar_3        ;AX = 796
mov ax, OFFSET wordvar_3        ;AX = 4
mov ax, ds:[4]        ;AX = 796
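
Because the three words sit one after another, they can also be reached from a single base address. A small sketch, assuming the same .data layout as above (BX is just my choice of register):

mov bx, offset wordvar_1        ;BX = 0
mov ax, [bx]        ;AX = 132
mov ax, [bx+2]        ;AX = 476
mov ax, [bx+4]        ;AX = 796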