From: cs94169@assn013.cs.ualberta.ca (David Bond)
Newsgroups: rec.games.programmer
Subject: VESA SVGA - line code and info
Date: 6 Feb 1995 18:39:08 GMT

Hello everyone!

This is a mini-tutorial, and code, relating to VESA SVGA programming.  The
code includes a line procedure which is based upon Bresenham's algorithm.  It
is not blazingly fast, but hopefully it'll work on all SVGA cards with VESA
support, and it is pretty compact - no special cases for slopes.

Many people try to begin programming in SVGA modes straight from mode 13h, or
Xmode variants.  They quickly encounter the problem of only 64K of vid mem
being accessible - falls a little short of the 300K required for 640x480x8bit!
The special address space A000h - AFFFh must be mapped to different parts of
the video memory to make use of it.  This can be done via VESA functions.  If
you don't yet have 'vesasp12.txt' (PCGPE contains this document) then I
suggest you get it from x2ftp.oulu.fi /pub/msdos/programming/specs/vesasp12.
This document details the VESA BIOS extensions used to get info on video
modes, set video modes, pan across a larger virtual screen, and set the CPU
window (A000-AFFF) to map to different places in video mem.

Even though VESA provides a common interface for SVGA cards, there are still
some specifics that have to be dealt with.  The 'granularity' of the window
is the smallest amount by which it can be moved.  A 64KB granularity with
1MB video memory means the CPU window can be mapped to one of 16 'chunks' in
this memory.  A 4KB granularity has more potential mappings - the window is
still 64KB in size, but it can be positioned on any 4K boundary in video.  I
know granularities of 4K, 16K, 32K, and 64K exist.  Some cards are switchable
(actually the only chipset I'm familiar with that has this option is Cirrus
Logic - defaults to 4K, can be set to 16K.  I think this is necessary for
accessing >1MB).

I see two ways to manage this discrepancy.  Code can assume 64K granularity
always, and the 'bank-switching' routines make sure the window is moved by this
amount (4K gran would require inc/dec by 16).  The other way is to deal with
each granularity differently - this is how the line code provided below
operates.
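
The first approach can be sketched in C as follows.  This is only an
illustration, not part of the posted code: the function names are made up
here, and the granularity is assumed to be one of the values listed above
(4, 16, 32 or 64 KB).

```c
#include <assert.h>
#include <stdint.h>

/* Granules per 64KB bank (the posted code keeps this value in disp64k). */
uint16_t granules_per_64k(uint16_t gran_kb)
{
    assert(gran_kb != 0 && 64 % gran_kb == 0);   /* 4, 16, 32 or 64 */
    return (uint16_t)(64 / gran_kb);
}

/* Window position, in granularity units, for a 64KB-aligned bank number.
   This is the value that would be handed to the VESA window function. */
uint16_t bank_to_winpos(uint16_t bank, uint16_t gran_kb)
{
    return (uint16_t)(bank * granules_per_64k(gran_kb));
}
```

With 4K granularity bank 3 becomes window position 48, while with 64K
granularity it stays 3, so code written for 64K banks never sees the
difference.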

Finer granularity can speed up rendering.  Line drawing will be used to
illustrate.  The linear start address is calculated.  The low 16 bits of this
address are mappable to the 64K window.  The high order bits can be used to
locate the position of the window.  With a 64K granularity, the high word is
our window location, and the low word is the displacement into the window.
With 4K gran, the low 12 bits are the offset, and remaining high bits are the
window location.  If a line begins near the end of a 64K aligned chunk (linear
position 123840, say), and continues down a short distance, it'll cross a 64K
boundary.  With 64K gran, the window will have to be moved.  Using a 4K
granularity, the initial offset into a window can be kept below 4096.  So,
lines that aren't too long can always be kept within the starting window.
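
The address split described above, as a small C sketch.  The struct and
function names are hypothetical, and the granularity is assumed to be a
power of two given in bytes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t winpos;   /* window position, in granularity units */
    uint16_t offset;   /* byte offset from the start of the window */
} WinAddr;

/* Split a linear video address so the window starts at the granule
   that contains it. */
WinAddr split_linear(uint32_t linear, uint32_t gran_bytes)
{
    WinAddr a;
    assert(gran_bytes != 0 && (gran_bytes & (gran_bytes - 1u)) == 0);
    a.winpos = (uint16_t)(linear / gran_bytes);
    a.offset = (uint16_t)(linear & (gran_bytes - 1u));
    return a;
}
```

For linear position 123840, 64K granularity gives window 1 with offset
58304, leaving 7232 bytes (about 11 scan-lines of 640) before the window
ends.  With 4K granularity the result is window 30 with offset 960, leaving
roughly 100 scan-lines.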

Another advantage that fine granularity provides is easier alignment with the
edge of the screen.  With a horizontal resolution of 640, 32 lines take up
20KB, which is divisible by 4KB.  If all windowing is then limited to be
aligned on these 20KB bounds, one will never have to worry about overflowing
past the end of the window while drawing across a scan-line.  The windows
are positioned so that the 'bottom' of the window is on these 20K bounds.
Inner rendering loops that move across a horizontal line don't bother with
checking for a 'page-cross'.  The outer loop checks for overflow when it moves
down to the next scan-line.  A 64K granularity doesn't align until 512
vertical lines (320KB), which means the inner loop must check for 
page-crossings within the scan-line.
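
The 32-line and 512-line figures both come from the least common multiple
of the scan-line length and the granularity.  A C sketch of that arithmetic
(function names are invented for this example):

```c
#include <assert.h>
#include <stdint.h>

uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of scan-lines after which a granule boundary again coincides
   with the left edge of the screen: lcm(pitch, gran) / pitch. */
uint32_t lines_per_alignment(uint32_t pitch, uint32_t gran_bytes)
{
    assert(pitch != 0 && gran_bytes != 0);
    return gran_bytes / gcd_u32(pitch, gran_bytes);
}
```

lines_per_alignment(640, 4096) is 32 and lines_per_alignment(640, 65536)
is 512, matching the 20KB and 320KB figures above.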

Note: one way to create easy alignment with the edge is to change the length
of a scanline to a power of 2 (say 1024).  This wastes video memory, but
it can be well worth it.  Check vesasp12.txt for setting this.
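
A C sketch of why a power-of-2 scan-line helps, assuming a length of 1024
bytes (the names are hypothetical).  Each 64K chunk then holds exactly 64
whole scan-lines, so no scan-line straddles a 64K boundary.

```c
#include <stdint.h>

enum { PITCH_SHIFT = 10 };   /* assumed: 1024-byte scan-lines */

/* Linear address with a 1024-byte pitch: a shift replaces the multiply. */
uint32_t linear_addr_pow2(uint32_t x, uint32_t y)
{
    return (y << PITCH_SHIFT) + x;
}

/* 64KB chunk that a pixel falls in. */
uint16_t bank_of(uint32_t x, uint32_t y)
{
    return (uint16_t)(linear_addr_pow2(x, y) >> 16);
}
```

The cost is memory: 480 lines of 1024 bytes take 480KB instead of 300KB.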

The line procedure, below, does take advantage of positioning the top of the
window as close to the top of the line as it can.  Thus window moving for
mid-length lines is reduced for cards that have smaller granularities.  It
does not take advantage of alignment with the screen edge.  The code is made
to be fairly 'straight-forward', nothing fancy is done - it's just simple,
flexible, small, and I hope easy to understand.  One easy optimization to add
is to check if the endpoints lie in different window addresses - if not, a
routine without a page-cross check can be called; otherwise the standard
routine is called.
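
One conservative way to write that check, sketched in C.  It is not in the
posted code; it assumes a 640-byte scan-line, endpoints already sorted top
to bottom, and the window placed at the granule holding the start pixel
(as Lin does).  The function name is made up.

```c
#include <assert.h>
#include <stdint.h>

enum { PITCH = 640 };   /* bytes per scan-line in mode 101h */

/* Nonzero if the whole line is reachable through one 64KB window placed
   at the granule that holds the start pixel.  Expects y1 <= y2. */
int line_fits_window(uint32_t x1, uint32_t y1, uint32_t x2, uint32_t y2,
                     uint32_t gran_bytes)
{
    uint32_t xmin  = x1 < x2 ? x1 : x2;
    uint32_t xmax  = x1 < x2 ? x2 : x1;
    uint32_t start = y1 * PITCH + x1;
    uint32_t base  = start & ~(gran_bytes - 1u);  /* window start address */
    uint32_t low   = y1 * PITCH + xmin;   /* lower bound on addresses used */
    uint32_t high  = y2 * PITCH + xmax;   /* upper bound on addresses used */

    assert(y1 <= y2);
    assert(gran_bytes != 0 && (gran_bytes & (gran_bytes - 1u)) == 0);
    return low >= base && high - base < 0x10000u;
}
```

A short line from (320,193) to (330,210) fails this test with 64K
granularity but passes with 4K, which is the benefit of fine granularity
described earlier.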

Careful eyes may notice that lines are always rendered from top to bottom, but
I have a macro to move the CPU window UP!  A situation where this is needed:
A window begins 384 pixels across on a scan-line.  A line is started just two
pixels into the window (at 385).  The endpoint is on the far left of the screen
(0), and 5 pixels down from start.  The line is going to begin with a string of
pixels straight to the left - passing BACKWARD through the window boundary.
This occurrence requires the 'PgUp' macro.  If alignment is done with the
screen edge, this isn't necessary.
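
What the backward crossing does to the window position and offset can be
modelled in C.  This is a sketch with hypothetical names, not a translation
of the whole routine.  For a concrete case, take row 25 of a 640-wide mode:
25*640 + 384 = 16384, which lies on a 4K boundary (window position 4 with
4K granularity).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t winpos;   /* window position in granularity units */
    uint16_t di;       /* offset into the 64KB window */
} Cursor;

/* Linear video address the cursor currently points at. */
uint32_t cursor_linear(const Cursor *c, uint16_t granmask)
{
    return (uint32_t)c->winpos * ((uint32_t)granmask + 1u) + c->di;
}

/* One pixel to the left: models 'sub di,1', then PgUp on a borrow. */
void step_left(Cursor *c, uint16_t granmask)
{
    uint16_t old = c->di;
    c->di = (uint16_t)(c->di - 1u);        /* wraps to 0FFFFh from 0 */
    if (old == 0) {                        /* borrow: fell off the window */
        assert(c->winpos > 0);
        c->winpos -= 1;                    /* move window up one granule */
        c->di = (uint16_t)(c->di + granmask + 1u);   /* now == granmask */
    }
}
```

Starting at window 4, offset 1 (linear 16385), two steps left end at
window 3, offset 0FFFh, which is linear 16383 as expected.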


This code is provided for learning purposes, and may be used in any fashion
desired - it's free!  If the code doesn't work for you, please let me know.  I
haven't had opportunity to test it on other systems.  It didn't get a rigorous
test on mine either - paging is untested.  Conversion to other resolutions is
pretty simple.  The linear address calculation is all that has to be modified
(I think!?) - 'bx' may be too small at higher resolutions - use 'ebx'.
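
The address calculation in Lin relies on 640*y = 128*y + 4*(128*y).  The
C sketch below (names invented here) shows the general form next to that
shortcut, and models what a 16-bit register keeps after a shift, which is
where 'bx' runs out of room.

```c
#include <stdint.h>

/* General form: works for any pitch (bytes per scan-line). */
uint32_t linear_addr(uint32_t x, uint32_t y, uint32_t pitch)
{
    return y * pitch + x;
}

/* The 640-wide shortcut used by Lin. */
uint32_t linear_addr_640(uint32_t x, uint32_t y)
{
    uint32_t t = y << 7;           /* shl bx,7 */
    return (x + t) + (t << 2);     /* add ax,bx / lea edi,[eax][ebx*4] */
}

/* What a 16-bit register holds after 'shl reg,n' (high bits are lost). */
uint16_t shl16(uint16_t v, unsigned n)
{
    return (uint16_t)((uint32_t)v << n);
}
```

At 640x480 the largest intermediate is 479*128 + 639 = 61951, which fits
in 16 bits.  With a 1024-byte scan-line, a 16-bit shift by 10 already
overflows at row 64.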

This can be assembled with:	tasm /m2 /ml <filename>
				tlink /3 <filename>
Or pieces can be extracted, and interfaced to whatever you wish,
however you wish.

-Anthony Tavener 'Daoloth of MetaSentience'
-cs94169@cs.ualberta.ca (Temporary - friend's account)

---CODE BEGIN---
.486
code	segment para public use16
	assume	cs:code

PgDown		macro
	push	bx
	push	dx
	xor	bx,bx
	mov	dx,cs:winpos
	add	dx,cs:disp64k
	mov	cs:winpos,dx
	call	cs:winfunc
	pop	dx
	pop	bx
		endm

PgUp		macro
	push	bx
	push	dx
	xor	bx,bx
	mov	dx,cs:winpos
	sub	dx,1
	mov	cs:winpos,dx
	call	cs:winfunc
	add	di,cs:granmask
	inc	di
	pop	dx
	pop	bx
		endm

	mov	ax,seg stk	;\
	mov	ss,ax		;.set up program stack
	mov	sp,200h		;/

	call	GetVESA		;init variables related to VESA support

	mov	ax,4f02h	;\
	mov	bx,0101h	;.VESA mode 101h (640x480x8bit)
	int	10h		;/

	mov	ax,0a000h
	mov	ds,ax

	mov	eax,10h		;\
	mov	ebx,13h
	mov	ecx,20bh	;test Lin procedure
	mov	edx,1a1h
	mov	ebp,21h
	call	Lin		;/

	mov	ax,4c00h
	int	21h

GetVESA		proc
;This is just a hack to get the window-function address for a direct call,
;and to initialize variables based upon the window granularity.
	mov	ax,4f01h		;\
	mov	cx,0101h
	lea	di,buff			;.use VESA mode info call to..
	push	cs			;.get card stats for mode 101h
	pop	es
	int	10h			;/
	add	di,4
	mov	ax,word ptr es:[di]	;get window granularity (in KB)
	shl	ax,0ah
	dec	ax
	mov	cs:granmask,ax		; = granularity - 1 (in Bytes)
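	not	ax			;mask of the bits above the granule offset
	clc
GVL1:	inc	cs:bitshift		;\ (inc leaves CF alone)
	rcl	ax,1			;.just a way to get vars I need :)
	jc	GVL1			;/
	add	cs:bitshift,0fh		;bitshift = 32 - log2(granularity)
	inc	ax
	mov	cs:disp64k,ax		;granules per 64K (16 for 4K gran)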
	add	di,8
	mov	eax,dword ptr es:[di]	;get address of window control
	mov	cs:winfunc,eax
	ret
buff		label	byte
		db	100h dup (?)
		endp

Lin		proc
;Codesegment: Lin
;Inputs: eax: x1, ebx: y1, cx: x2, dx: y2, bp: color
;Destroys: ax, bx, cx, edx, si, edi
;Global: winfunc(dd),winpos(dw),page(dw),granmask(dw),disp64k(dw),bitshift(db)
;Assumes: eax, ebx have clear high words; ds = 0a000h (set by the caller)

	cmp	dx,bx			;\
	ja	LinS1			;.sort vertices
	xchg	ax,cx
	xchg	bx,dx			;/

LinS1:	sub	cx,ax			;\
	ja	LinS2			;.calculate deltax and
	neg	cx			;.modify core loop based on sign
	xor	cs:xinc1[1],28h		;/

LinS2:	sub	dx,bx			;deltay
	neg	dx
	dec	dx

	shl	bx,7			;\
	add	ax,bx			;.calc linear start address
	lea	edi,[eax][ebx*4]	;/

	mov	si,dx			;\
	xor	bx,bx
	mov	ax,cs:page	;\
	shl	ax,2		;.pageOffset=page*5*disp64K
	add	ax,cs:page
	mul	cs:disp64k	;/
	push	cx			;.initialize CPU window
	mov	cl,cs:bitshift		;.to top of line
	shld	edx,edi,cl
	pop	cx
	add	dx,ax
	and	di,cs:granmask
	mov	cs:winpos,dx
	call	cs:winfunc
	mov	dx,si			;/

	mov	ax,bp
	mov	bx,dx

;ax:color, bx:err-accumulator, cx:deltaX, dx:vertical count,
;di:location in CPU window, si:deltaY, bp:color

LinL1:	mov	[di],al			;\
	add	bx,cx
	jns	LinS3
LinE1:	add	di,280h
	jc	LinR2			;.core routine to
	inc	dx			;.render line
	jnz	LinL1
	jmp	LinOut
LinL2:	mov	[di],al		;\
xinc1		label	byte
LinS3:	add	di,1		;.this deals with
	jc	LinR1		;.horizontal pixel runs
LinE2:	add	bx,si
	jns	LinL2		;/
	jmp	LinE1			;/

LinR1:	js	LinS7			;\
	PgDown				;.move page down 64k..
	mov	ax,bp
	jmp	LinE2
LinS7:	PgUp				;.or up by 'granularity'
	mov	ax,bp
	jmp	LinE2			;/

LinR2:	PgDown				;\
	mov	ax,bp			;.move page down 64k
	inc	dx
	jnz	LinL1			;/

LinOut:	mov	cs:xinc1[1],0c7h
	ret
		endp

winfunc		dd	?	;fullpointer to VESA setwindow function
winpos		dw	?	;temp storage of CPU window position
granmask	dw	?	;masks address within window granularity
disp64k		dw	?	;number of 'granules' in 64k
page		dw	0	;video page (0,1,2 for 1MB video)
bitshift	db	0	;used to extract high order address bits..
				;\ for setting CPU window
		ends

stk	segment para stack use16 'STACK'
		dw	100h dup (?)
		ends
		end
---CODE END---
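
For readers who find the rotate loop in GetVESA hard to follow, here is a
C sketch of the three values it derives from the granularity.  The struct
and function names are hypothetical, and the granularity is assumed to be
a power of two between 1KB and 64KB.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t granmask;   /* granularity - 1, in bytes */
    uint16_t disp64k;    /* granules per 64KB */
    uint8_t  bitshift;   /* count used by 'shld edx,edi,cl' */
} WinVars;

WinVars derive_win_vars(uint16_t gran_kb)
{
    WinVars v;
    uint32_t gran_bytes = (uint32_t)gran_kb << 10;
    unsigned log2g = 0;

    assert(gran_kb >= 1 && gran_kb <= 64 && (gran_kb & (gran_kb - 1)) == 0);
    while ((1u << log2g) < gran_bytes)
        log2g++;
    v.granmask = (uint16_t)(gran_bytes - 1u);
    v.disp64k  = (uint16_t)(0x10000u / gran_bytes);
    v.bitshift = (uint8_t)(32u - log2g);
    return v;
}

/* The low word of edx after 'shld edx,edi,bitshift': the window position. */
uint16_t winpos_from_linear(uint32_t linear, uint8_t bitshift)
{
    return (uint16_t)(linear >> (32u - bitshift));
}
```

64K granularity gives granmask 0FFFFh, disp64k 1 and bitshift 16; 4K
granularity gives 0FFFh, 16 and 20.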