; Fire - by Clayten Hamacher aka White Night
;
; (c) Copyright 1998 by Clayten Hamacher - Released under the terms of the GNU
; General Public License.
;
; Version 3.1
;
; Based loosely (very) on 'fire.pas' and other descriptions of demo effects.
;
; In the parts of the code that are executed frequently (seed, spread,
; and display) the 'natural' order of the instructions has been shifted
; to reduce stalls where the next instruction depends on the result of the
; last. This makes the code harder to read, but provides approximately a
; 10% speed increase over the unoptimized versions.

; Version History
;
; Version 0.1 - Planning stages, code segments in profiler. No resemblance
; to finished project.
;
; Version 1.0 - C prototype. This is where it started to look like the final
; version. Used the ease of programming at this stage to test various
; algorithms. Settled on random-walk seeding, seedline smoothing, pixel
; quadrupling vertically, and smoothing. Set the cellular automaton size at
; 320x50 and quadrupled that in the display routine for 320x200.
;
; Version 2.0 - ASM. This is close to the final version. Small changes from
; the C versions because I wrote a pseudo random number generator that is
; significantly different (and MUCH faster) than the one in stdlib.h
; This changed the graphics, but not by a lot. This version was approximately
; 60% faster than the C version. Some of this is because the code was more
; efficient, some because the off-screen algorithms changed in minor ways.
;
; Version 3.0 - Final version. The only changes were to remove a few checks
; for conditions that could never happen (short of someone playing with a
; debugger). This is the version where I implemented Pentium optimization.
; Instructions that ran badly on the Pentium were split into their component
; parts (e.g. loop -> dec, jnz). Instruction order was also changed to
; improve pipelining. This improved speed by about 10%.
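;
; As a sketch of what that split looks like (written for illustration, not
; copied from an earlier version), the palette loop further down could use
; LOOP:
;
;       PutPal: lodsb
;               out     dx, al
;               loop    PutPal          ; CX = CX - 1, branch if CX <> 0
;
; The form used in this file replaces LOOP with its component parts:
;
;       PutPal: lodsb
;               out     dx, al
;               dec     cx              ; unlike LOOP, DEC changes the flags
;               jnz     PutPal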
;
; Version 3.1 - Final version + Source code mod. No executable changes, but
; things have been cleaned up and comments expanded from shorthand to
; English.
;
; Tests indicate that this code runs at about 1 fps per MHz on a
; Pentium-class processor. MMX is expected to increase this to probably
; 1.3 fps per MHz, because more of the data will fit into the level 1
; data cache.
;
; - Notes
;
; Future changes - Perhaps modifying the program to only use 1-2k of data
; outside of the graphics page. This would be achieved by executing the
; copying and spreading of the data from top to bottom in order to not clobber
; it.
;
; This would be done only if it needed to be. The delays from the extra
; code to break the screen from a contiguous lump, into two parts would
; probably outweigh any benefits from a small memory footprint. Also, using
; the graphics page for reading and writing would slow the program down
; considerably.
;
; Link to this from MSVC++ using a screensaver stub. It would be neat to
; have this as a screensaver. I don't know if I'll try to scale the output
; to the screensize, or simply change it to 320x200 and change back when
; done.
;
; This code will probably appear in my first program implementing
; translucency. It's a good little animation that scales well to meet
; whatever CPU requirements I need. The smoothing is also easy to do, uses
; only integer math, and does NOT need to be done in powers of 2. (ie the
; display can be scaled from 320x50 to 320x150 if desired, or 760x150.)
; This lets it be used to fill any window without much of a slowdown.
; The drawing code takes about 20% of the CPU at 320x200. Theoretically
; it could scale to 1024x768, from a 256x128 cellular automata, and
; still display with a reasonable framerate on a p100. If only there existed
; a graphics card that could display 1024x768x24b at 60fps without killing
; the rest of the system. Perhaps AGP would help here.
;
; I was considering writing the palette generation code in ASM, and trying
; to get the demo below 1k. But it is under 2k as it stands, and making it
; smaller than 1k would be optimization just for the sake of optimization.
; Now, if there was a good prize I might win... There is also the benefit
; of being able to link a palette made with my palette creation code, in
; which I get to drag gradients around and see the palette in realtime.
; Much nicer. It is also easier to make this work with other projects where
; they need to share a palette.
;
;
; In an interesting way, this program is as fast as it is only because I
; 'wasted' years playing with Conway's Life, and other CA programs. Trying
; to get Life to run at a decent speed on a 6502 at 1mhz taught me a lot
; of lessons about coding small, and fast, and what tradeoffs should be made.
; The cellular automata aspects are also very similar, and the body of this
; code looks a lot like the Life program I first wrote on the A2+ at school
; in the early eighties when I was in like grade 4 or something. I think
; this background influences my current liking of RISC CPUs.
;
; It is interesting to note that this is the first program I have ever
; written for x86 ASM. I guess 'Mastering Turbo Assembler 2E' is a good book.
;


         IDEAL
; DOSSEG
         MODEL TINY
; .STACK 200h
         P386
         ASSUME CS:@CODE, DS:@CODE
;         LOCALS

;--------------------------------------;
;EQUATES

VGA EQU 0a000h

;--------------------------------------;
         CODESEG       

         ORG 100h

START:  mov     ax, cs
        mov     ds, ax          ; set DS to point to the local variables
        mov     es, ax          ; use ES as well for lod/sto pairs

; Setup - Disable keyboard interrupts, set video mode, and set palette

        mov     dx, 21h
        in      al, dx
        or      al, 2 
        out     dx, al          ;disable keyboard interrupt
                     
        mov     ax,13h
        int     10h             ;set video mode 13h
                     
        cld          
        mov     dx, 3c8h        ; VGA DAC write-index port
        xor     ax, ax          ; clear ax
        out     dx, al          ; start at colour 0
        inc     dl              ; dx = 3c9h, the palette data port
        mov     si, OFFSET Colors       ; prepare to read colors
        mov     cx, 768         ; 256 colors, and 3 bytes each

PutPal:
        lodsb          ; Load a byte
        out   dx,al      ; and write it
        dec   cx
        jnz   PutPal 

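; Init - Set initial values, and clear arrays
        mov   di, OFFSET SLine
        mov   eax, 80808080h
        mov   cx, 80          ; 320 / 4 - start the seedline at the midpoint
        rep   stosd

        mov   di, OFFSET Fire
        xor   eax, eax
        mov   cx, 4320        ; 320 x 54 / 4 - Fire only; 4400 would also
        rep   stosd           ; zero SLine, which directly follows Fire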

Main:
;Seed - Change values in the seedline
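;
; A worked example of the generator in Seed1, assuming the initial values
; in the data segment (Seed = 12345, Mult = 9421). Each pass computes
;
;       Seed = (Seed * Mult + 1) AND 0ffffh
;
; so the first pass gives
;
;       12345 * 9421 + 1 = 116302246 = 06eea1a6h  ->  Seed = 0a1a6h (41382)
;
; The direction of the random walk comes from the parity of (al XOR ah):
; 0a6h XOR 0a1h = 07h has three bits set (odd parity), so JPE is not taken
; and the first step adds 50 to bx.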
        mov cx, 320
        xor bx, bx
        mov si, OFFSET SLine
        dec si
Seed1:
        mov     eax, [Seed]     ; Start the random number generator
        mov     edx, [Mult]
        mul     dx
        inc     ax
        and     eax, 0000ffffh
        mov     [Seed], eax     ; replace random seed
        xor     al, ah
        jpe     SHORT Seed2
        add     bx, 50
        jmp     SHORT Seed3
Seed2:
        sub     bx, 50
Seed3:
        inc     si
        xor     ax, ax
        mov     al, [si]
        add     ax, bx

        test    ax, 8000h
        jne     SHORT Seed4
        cmp     ax, 100h        ; see if ax has gone above 0ffh
        jb      SHORT Seed5
        mov     ax, 0ffh        ; if ax is 100h or above, clamp to 0ffh
        jmp     SHORT Seed5   
Seed4:
        xor     ax, ax          ; if ax is below zero, set to 0
Seed5:
        cmp     al, 80h         ; we've clamped it within range of al
        jb      SHORT Seed6     ; test if it's over 128, if not jump to Seed6
        sub     al, 03          ; it's over, so subtract 3
        jmp     SHORT Seed7
Seed6:
        add     al, 03          ; add three to bring it towards midpoint
Seed7:

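        xor     edx, edx
        mov     ah, 0h
        mov     dl, [si - 1]    ; left neighbour
        add     ax, ax          ; ax = 2 * current cell
        add     ax, dx
        mov     dl, [si + 1]    ; right neighbour
        add     ax, dx          ; ax = left + 2 * current + right

        cmp     bx, 7fffh       ; pull the random-walk offset back toward 0:
        jnb     SHORT Seed8     ; 7fffh or above (negative as signed), add
        cmp     bx, 10
        jb      SHORT Seed8     ; small and positive, add
        sub     bx, 45          ; otherwise subtract
        jmp     SHORT Seed9
Seed8:
        add     bx, 45
Seed9:
        shr     ax, 02          ; divide the weighted sum by four
        dec     cx
        mov     [si], al        ; MOV leaves the flags from DEC intact
        jnz     Seed1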


; Spread - Move flame upwards
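;
; A sketch of what the inner loop computes for each cell (the names are
; only for this note, they are not labels in the code):
;
;       new = ((cell + below_left + below + below_right - 5) SHR 2) - 1
;       if new is 100h or more (it wrapped below zero), new = 0
;
; For example, cell = 100 with 120, 130 and 110 in the row below:
;
;       100 + 120 + 130 + 110 = 460
;       (460 - 5) SHR 2 = 113,  113 - 1 = 112
;
; Any sum of 12 or less ends up as 0.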

        mov     si, OFFSET Fire
        add     si, 1h

;; Used to display just the seedline values to test seeding algo
;         mov   ax, 320
;         add   si, ax
;;

        mov     dx, 54          ; calculate 54 rows
y_loop:
        mov     cx, 318         ; and 318 pixels in each row
x_loop:
        xor     ax, ax
        xor     bx, bx
        mov     al, [si]

;; Comment this block out when uncommenting above block
        mov     bl, [si + 319]  ; NextRow - 1
        add     ax, bx
        mov     bl, [si + 320]  ; NextRow
        add     ax, bx
        mov     bl, [si+321]    ; NextRow + 1
        add     ax, bx
        sub     ax, 5           ; dampen
        shr     ax, 2           ; divide by four
        dec     ax              ; dampen more
        cmp     ax, 0100h       ; if it's bigger than this, it wrapped under zero
        jb      SHORT Write     ; if not, write the value
        xor     ax, ax          ; if so, fix it first
;;



Write:
        mov     [si], al
        inc     si
        dec     cx
        jnz     x_loop

        add     si, 2           ; skip to the start of the next row
        add     di, 2           ; (di is not read by the spread loop)

        dec     dx
        jnz     SHORT y_loop    ; if not at zero yet, do it again

; Display - Copy the flame onto the screen

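; Wait for retrace      ; Hurts performance on slow systems, not needed on
;                       ; anything less than a p200 - should test for it
         mov   dx,03DAh         ; VGA input status register 1
l1:
         in    al,dx
         and   al,08h           ; bit 3 is set during vertical retrace
         jnz   SHORT l1         ; if already in a retrace, wait for it to end
l2:
         in    al,dx
         and   al,08h
         jz    SHORT l2         ; then wait for the next retrace to start
; End of WRT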

        push    es

        mov     ax, VGA
        mov     es, ax
        mov     si, OFFSET Fire

        mov     di, 1
        inc     si

;; Start Full Display Routine
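; A worked example of the vertical stretch, with values picked only for
; illustration. Each source cell becomes four screen rows that step from
; the current row toward the row below it:
;
;       step = (NextRow SHR 2) - (CurrentRow SHR 2)
;
; With CurrentRow = 200 and NextRow = 100:
;
;       step = 25 - 50 = -25
;       rows written: 200, 175, 150, 125
;
; and the next block of four rows then starts at 100.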
        mov     dx, 50          ; 50 rows
Display1:
        mov     cx, 318         ; 318 pixels in each row
Display2:
        mov     al, [si]
        inc     si
        mov     bl, [si + 319]  ; NextRow
        shr     bl, 2
        mov     bh, al          ; CurrentRow
        shr     bh, 2
        sub     bl, bh
        mov     [es:di], al
        inc     di
        add     al, bl
        mov     [es:di + 319], al
        add     al, bl
        mov     [es:(di + 639)], al
        add     al, bl
        mov     [es:di + 959], al
        dec     cx
        jnz     Display2

        add     si, 2           ; skip to the start of the next row
        add     di, 962         ; skip four rows down: (320 * 4) - 318

        dec     dx
        jnz     SHORT Display1  ; if not done all the rows
;; End Full Display Routine

;; Non-Smoothed/Stretched display routine ; faster, for testing stuff
;        mov   ax, VGA
;        mov   es, ax
;        mov   si, OFFSET Fire
;        mov   di, 0
;        mov   cx, 4400            ; Quick-Display
;        rep   movsd
;; End alternate display routine

        pop     es              ; restore ES to the same value as DS and CS

; Frame Counter
;
;        mov     eax, [Ticks]
;        inc     eax
;        mov     [Ticks], eax
;
;        clc

;; exit code is long because it also contains the busy-loop for pausing

Exit:
        in      al,60h          ; Read a key
        cmp     al, 01h         ; is it ESC
        je      Shutdown        ; If so, shutdown
        jmp     Main
;        cmp     al, 39h         ; if not, is it <space> (pause)
;        jne     Main            ; if not, keep drawing
;Exit2:   
;        in      al, 60h         ; if it is, loop until Key-break - read a key
;        cmp     al, 01h         ; is this esc?
;        je      Shutdown        ; if so, shutdown
;        cmp     al, 0B9h        ; is it keybreak for <space>
;        jne     Exit2           ; if not, keep looping
;Exit3:
;        in      al, 60h         ; if it was, then wait for the next <space> -readkey
;        cmp     al, 01h         ; is it esc? 
;        je      Shutdown        ; if so, quit
;        cmp     al, 039h        ; is it space?
;        jne     Exit3           ; if not, keep looping
;Exit4:
;        in      al, 60h         ; it was, now wait for keybreak - read a key
;        cmp     al, 0B9h        ; is it keybreak <space>?
;        jne     Exit4           ; if not, continue to loop
;        jmp     Main            ; if it is, quit pausing, and start drawing again
;
Shutdown:
        mov     dx,21h
        in      al,dx
        and     al,0fdh
        out     dx,al           ;Enable keyboard interrupt
 
        mov     ax,3
        int     10h             ;Set standard video mode

; Print the number of frames - useful for timing
;        mov     ebx, [cs:Ticks] ; Load the number.
;        mov     ax, 0200h       ; setup for int 21h function 02h (print dl)
;        mov     cx, 8           ; a dword is 8 hex digits
;Print:
;        mov     edx, ebx
;        and     edx, 0f0000000h ; mask out all but high nybble
;        rol     edx, 4h         ; move data to low nybble
;
;        cmp     dl, 0Ah
;        jge     Text
;        add     dl, '0'         ; make it printable
;        jmp     Digit
;Text:
;        sub     dl, 0Ah         ; this is one more step than needed, but
;        add     dl, 'A'         ; makes things more obvious
;Digit:
;        int     21h             ; print it
;        shl     ebx, 4h
;        dec     cx              ; count digits; testing ebx for zero here
;        jnz     Print           ; would drop any trailing zero digits
;        mov     dl, 10          ; \n
;        int     21h             ; print a line feed
;        mov     dl, 13          ; \r
;        int     21h             ; print a carriage return
; Exiting
        mov     ax,4c00h
        int     21h             ;Exit

         DATASEG

;--------------------------------------;
;Initialized Data
;
Seed  dd 12345
Mult  dd 9421
Ticks dd 0h

Colors: include "colors.inc"
;--------------------------------------;

;--------------------------------------;
;Uninitialized Data
Fire  db 17280 DUP (?)   ; 320 x (50 + 4) to hold screen and offscreen
SLine db   320 DUP (?)   ; Seed-line - must directly follow Fire, the spread
                         ; loop reads it as the row below the last Fire row
;--------------------------------------;
END START

