POC V1: Firmware Development

Having eradicated the hardware bug described on the previous page, I resumed firmware development.  The first step, of course, had already been taken, and that was to get the I/O primitives (the very low-level code that talks to the DUART) out of the way.  "Bare metal" development, as I was doing, is tricky due to the difficulty of debugging when things fail to work as expected.  At this stage of development, there is no machine language monitor with which to peer into memory or examine code, so all debugging is accomplished by inference alone.  The machine goes completely belly-up or does something bizarre and you try to figure it out based, more often than not, on what didn't happen rather than what did.

As I was developing the bare metal code, I would occasionally have to use my trusty B&K logic probe to see what was going on.  Monitoring the IRQ circuit was a frequent activity, since the watchdog timer (WDT) was supposed to be generating interrupts at 10 millisecond intervals, the "jiffy IRQ" that serves as a sort of heartbeat.  If there were no IRQs then it was a sure bet something had scribbled all over the WDT's registers and crashed the machine.  If the jiffy IRQ was present but no input or output was working, the probe would show evenly spaced pulsing but no other activity, since a lack of I/O would fail to produce the corresponding IRQs.  In extreme cases, I'd have to hook up the 'scope and start poking around to verify that the microprocessor (MPU) wasn't on vacation.  This kind of debugging involves a form of inductive reasoning that is difficult to explain, but utterly essential if any progress is to be made.

Shortly after starting on firmware in earnest, I made the decision to change the TIA-232 I/O back to polled instead of interrupt-driven.  Although such a move would slow down the system, it would make it easier to fine-tune the functions that read or write bytes to the buffers.  Also, at this stage in the game, I was still operating the MPU in 65C02 emulation mode.  I figured I would get the more difficult stuff done first before tackling 16 bit loads and stores, along with the other advanced instructions.  That, and the interrupt service routine (ISR) would be less complex because it wouldn't have to worry about register sizes and other details that would come with operating the MPU in native mode.

Development Environment

Before I get too far along, let me describe my software development setup.  It has been many years since I did professional 6502 assembly language development, and I never did program anything powered by the 65C816.  The result was that I was not as well-equipped to write 65C816 assembly language as I needed to be.  Part of the problem is that good 65C816 native assemblers that don't cost a left lung are largely non-existent, and the free ones that are available often exhibit the preconceived notions of their authors on how an assembler ought to work (notions that are frequently incorrect).  Rather than writing an assembler that meets the Western Design Center (WDC) syntax standards, the freeware crowd often writes something that ranges from the mildly annoying to something narrowly focused on a particular machine (e.g., the Super Nintendo Entertainment System, which is 65C816-based), or worse yet, creates an assembler that is full of bugs.

In the world of professionally developed assemblers, WDC sells a Microsoft Windows-only product called ProSDK that includes an ANSI C compiler and macro assembler, as well as tools to generate code libraries (an essential feature in large-scale C development).  The package is available at Western Design's download site and is reasonably priced, with the only caveat being that the installation is keyed to the MAC address of your PC's network interface adapter.  This characteristic effectively precludes reinstallation on another machine, which means you will run into trouble trying to move your software to a new computer when the old one blows up.

When I started on the POC project, ProSDK was costly, and the frugality in me balked at paying some 400 US dollars for software that I thought, at the time, would probably never be used to generate any significant cash flow.  There are some other assemblers that are just as pricey and, in some cases, cumbersome to use.  You may think I'm being unnecessarily picky about this, but as anyone who has done a lot of assembly language programming will attest, the wrong development environment will make life unpleasant at best.  So I decided to take a different approach.

Some years back, Michal Kowalski wrote a Windows-based 65(c)02 simulator, complete with an excellent (although slightly non-standard) macro assembler.  As assembler macro support goes, this package is better than average, which prompted me to figure out how to make the assembler understand 65C816-specific mnemonics and addressing modes.  I accomplished this by writing an extensive set of macros to synthesize the '816's native mode instructions, as well as furnish two missing 65C02 instructions (STP and WAI) that the simulator's 65C02 mode, which is the Rockwell version of the 'C02, doesn't implement.  I should note that I assemble programs with the simulator's 65C02 mode enabled because that mode includes all instructions shared by the 'C02 and '816, other than the aforementioned STP and WAI.  As luck would have it, the assembler internally represents all numeric values as 32 bit integers, so it is possible to work with the 24 bit addressing that is used by some '816 instructions.  The assembler also understands C-like notation for shifting, masking and performing Boolean logic, which is handy for isolating the least and most significant words of a 32 bit value.  All of this proved to be useful in writing '816 instruction macros and will be useful for future software development, such as when I get to developing a filesystem to work with POC, since filesystems typically use 32 or 64 bit arithmetic.

For the purposes of burning EPROMs, I save the assembler's output to a binary file with no header, as such a file can be directly loaded by the EPROM burner's software without jumping through any hoops.  In cases where I want to transfer the assembled program to POC for testing, I generate a Motorola S-record format object file, which is then transmitted to POC via a serial connection.  This latter scheme makes the development and debugging process relatively painless.  It was how I tested the SCSI subsystem drivers that I will discuss on a later page.

However, my Frankenstein approach is not an entirely satisfactory arrangement.  Some of the assembler's syntax is non-standard.  For example, it uses @ to denote binary (bit-wise) operands rather than % as defined by the MOS Technology standard, so I often create syntax errors by reflexively typing % where @ is required (@ usually denotes octal values in MOS-compliant assemblers).  The S-record output format is incomplete, as there is no S0 header record.  Finally, my '816 instruction macros had to be assigned odd names to avoid colliding with genuine 65C02 instructions, while retaining a modicum of similarity to the "real" mnemonic.  The result is that, for example, LDA (1,S),Y in '816 assembly language ends up being coded as LDASI 1, where the SI tacked on to LDA indicates stack indirect indexed addressing.  Similar oddities are in place to allow the coding of 16 bit immediate mode instructions, such as LDA #$1234, which ends up being LDAW $1234, the W signifying a "word" or 16 bit load.  Yet another weird one is JSX <addr>, which stands in for the '816's very useful JSR (<addr>,X) instruction, which behaves much like the ON ... GOSUB construct of BASIC.
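
To make concrete what such macros have to emit, here is a minimal Python sketch of the byte sequences behind the examples just mentioned, plus a 24 bit JSL.  It is not the actual macro set (that is on the Downloads page).  The function names are invented for illustration and the JSX target address is arbitrary.  The opcodes are the standard 65C816 values.

```python
# Sketch of the bytes an '816 instruction macro must emit.
# Function names are hypothetical; opcodes are standard 65C816 values.

def lo(value):
    """Least significant byte (bits 0-7) of a value."""
    return value & 0xFF

def hi(value):
    """Second byte (bits 8-15) of a value."""
    return (value >> 8) & 0xFF

def bank(value):
    """Third byte (bits 16-23) of a value, i.e., the bank of a 24 bit address."""
    return (value >> 16) & 0xFF

def ldaw(operand):
    """LDA #$nnnn (16 bit immediate): opcode $A9, operand low byte first."""
    return bytes([0xA9, lo(operand), hi(operand)])

def ldasi(offset):
    """LDA (offset,S),Y (stack relative indirect indexed): opcode $B3."""
    return bytes([0xB3, lo(offset)])

def jsx(addr):
    """JSR (addr,X) (absolute indexed indirect): opcode $FC."""
    return bytes([0xFC, lo(addr), hi(addr)])

def jsl(addr):
    """JSL with a 24 bit address: opcode $22, then low, high and bank bytes."""
    return bytes([0x22, lo(addr), hi(addr), bank(addr)])

if __name__ == "__main__":
    for name, code in (("LDAW $1234", ldaw(0x1234)),
                       ("LDASI 1", ldasi(1)),
                       ("JSX $E000", jsx(0xE000)),
                       ("JSL $8F210E", jsl(0x8F210E))):
        print(f"{name:12} {code.hex(' ').upper()}")
```

The masking and shifting in lo(), hi() and bank() is the same kind of C-like expression the Kowalski assembler accepts, which is what makes macros of this sort practical.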

Complicating matters is the fact that the assembler, of course, knows nothing about the '816's register widths, which opens the door to various coding mistakes, such as attempting to load a 16 bit value into a register previously set to eight bits.  Nevertheless, with some fine-tuning and refinement, I was able to engineer the entire POC V1 BIOS ROM in the Kowalski assembler, and was even able to make extensive use of the '816's stack addressing features, thanks to the assembler's ability to support re-definable symbols via the .SET pseudo-op, a useful function in defining stack frames.  The entire set of macros and logical operators, as well as a link to the Kowalski simulator, may be found on the Downloads page.

Like WDC's ProSDK, the Kowalski simulator is a Windows-only program and is heavily dependent on Microsoft Foundation Classes (MFC) to function, which is somewhat unfortunate.  I really don't like Windows very much.  It's a bug-ridden mess with numerous security issues that, through long professional experience, I have grown to distrust (I see many instances of Windows' shortcomings as I go about my daily client support activities).  Windows is quite fragile, especially when compared to the UNIX and Linux environments in which I spend much of my working time.  For that reason, the actual file storage is on a UNIX server running Samba, which makes the server appear to Windows clients as a genuine Windows server.  So at least I don't have to worry about a Windows-induced anomaly completely wiping out all my work.  However, as no computer system is infallible, everything gets backed up to magnetic tape each night.

Although not directly related to firmware development, storing Kowalski output files on a UNIX machine produces added benefits that matter for later software development.  Unlike Windows, UNIX provides a number of tools that facilitate the transfer of files between different operating system types.  Also, useful tools for examining file internals are readily available.  od is particularly useful for this purpose, and it's how I determined that the Kowalski S-record output was missing the S0 header record.  As mentioned earlier, the design of POC accounts for the UNIX file transferring capabilities by including an auxiliary serial port that can be linked to the server.  A simple shell script reads the S-record object file, prepends an optional S0 record from command line arguments, and sends the file to POC a record at a time, using a protocol I devised to keep the two systems in sync during the transfer process.  POC, in turn, error-checks the input and if it is satisfactory, converts it to binary format and pokes it into memory for later execution.  Trying to do all this with Windows is an incredibly convoluted process.  As most seasoned computer professionals know, in Windows the easy stuff is easy and geeky stuff like sending object code to another system through a serial data link is fraught with difficulty.
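
For readers unfamiliar with S-records, the two halves of that exchange can be sketched in a few lines of Python.  This is neither the shell script nor the POC loader, and the handshaking protocol is not modeled.  It only illustrates building an S0 header, checking a record's byte count and checksum, and poking the data into a bytearray that stands in for RAM.  All function names are hypothetical; the record layout is the standard Motorola one.

```python
# Sketch of S-record handling: build an S0 header, verify and decode records.
# Helper names are hypothetical; the record layout is standard Motorola.

# Number of address bytes used by each record type.
ADDR_BYTES = {"0": 2, "1": 2, "2": 3, "3": 4, "5": 2, "7": 4, "8": 3, "9": 2}

def checksum(body):
    """One's complement of the low byte of the sum of count, address and data."""
    return ~sum(body) & 0xFF

def make_s0(text):
    """Build an S0 header record carrying text; its address field is 0000."""
    data = text.encode("ascii")
    # The count byte covers the address (2), the data and the checksum (1).
    body = bytes([len(data) + 3, 0x00, 0x00]) + data
    return "S0" + (body + bytes([checksum(body)])).hex().upper()

def parse_record(line):
    """Return (type, address, data); raise ValueError if the record is bad."""
    line = line.strip()
    if len(line) < 4 or line[0] != "S" or line[1] not in ADDR_BYTES:
        raise ValueError("not an S-record")
    raw = bytes.fromhex(line[2:])            # raises ValueError on bad hex
    if raw[0] != len(raw) - 1:
        raise ValueError("byte count mismatch")
    if checksum(raw[:-1]) != raw[-1]:
        raise ValueError("checksum error")
    n = ADDR_BYTES[line[1]]
    if len(raw) < n + 2:
        raise ValueError("record too short")
    address = int.from_bytes(raw[1:1 + n], "big")
    return line[1], address, raw[1 + n:-1]

def load(lines, memory):
    """Poke the data records (S1, S2, S3) into memory; ignore other types."""
    for line in lines:
        rtype, address, data = parse_record(line)
        if rtype in ("1", "2", "3"):
            memory[address:address + len(data)] = data

if __name__ == "__main__":
    ram = bytearray(0x10000)
    load([make_s0("HDR"), "S1082000C230A93412F6", "S9030000FC"], ram)
    print(ram[0x2000:0x2005].hex(" ").upper())
```

The S1 record in the demonstration carries the five bytes of REP #$30 and LDA #$1234 for loading at $2000.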

Since firmware has to be encoded into an EPROM, I also have an EPROM burner connected to the Windows machine on which I do the development.  I started out with one of those cheap TOPS burners that can be acquired on eBay for about 50 USD.  However, I soon discovered that the software that runs it is bug-infested, poorly documented by someone for whom English isn't even a second language, and the device data library has many errors and omissions.  So I ended up replacing the burner with a more costly but far better unit.  It's definitely a case study of getting your money's worth and no more.

Development Stages

I'm not going to bore you with an endless stream of chatter about how this and that was developed in the firmware.  The source code and an assembler listing for the latest version are available on the Downloads page for your reading pleasure (however, be prepared to do a lot of reading, as the source code runs to nearly 12,000 lines).  The main source file, pocrom.65s, lists all the .INCLUDE files used to build the code, which also serves as a table of contents for anyone interested in studying the various modules.  Near the beginning of pocrom.65s, as well as at the beginning of the listing file, is a fairly detailed revision table that describes every significant change in the code.  From that you can glean an idea about how development progressed.

Key development points were:

Version  Origin Date  Features of Note
0.1      2010/08/01   A) Coded to run entirely in the 65C816's native mode.
                      B) Fully functioning machine language monitor.
0.2      2011/07/06   A) Converted TIA-232 I/O from polled to interrupt-driven.
                      B) More stringent RAM testing, especially stack and zero page.
                      C) Many small but important refinements, including a substantial code shrink.
0.4      2011/10/01   A) Added SCSI subsystem functions (coincided with construction of a SCSI host adapter).
                      B) Modified reset to enumerate the SCSI bus & construct a device table in RAM.
0.5      2012/04/28   A) Enabled the BIOS to load a boot block from a SCSI disk.
                      B) Added an alarm function analogous to alarm() in Linux or UNIX.
0.7      2012/06/20   A) Completely redesigned the part of the monitor that translates 65C816 mnemonics to opcodes & back.
0.8      2012/08/11   A) SCSI driver primitive rewritten to use IRQs and the controller's DMA handshake functions.
1.0      2012/09/07   A) First "production" version.

There are several firmware design features on which I will elaborate.

BIOS

The BIOS proper consists of TIA-232 I/O primitives, SCSI subsystem driver API, SCSI bus I/O primitives, and interrupt service routines (ISR).  The hardware IRQ ISR is the most complex, as all I/O is interrupt-driven.  The IRQ ISR also processes the 32 bit uptime counter, a 16 bit programmable delay feature and an alarm feature, all of which are slaved to the 100 Hz jiffy IRQ generated by the watchdog timer.  All useful BIOS functions are accessible through a jump table that starts at $00E000.  Also part of the BIOS is a set of vectors located at $000100 that allow a program to "wedge" into the various interrupt handlers and modify or replace that part of the ISR.  A SCSI primitive indirect vector is located here, as well as two user-definable vectors at $00010C and $00010E.
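
As a rough illustration of how counters can be slaved to a 100 Hz jiffy IRQ, here is a small Python model.  It is a sketch only: the class and attribute names are invented, and the choices of seconds as the uptime unit and jiffies as the delay unit are assumptions, not details taken from the BIOS source.  The alarm feature is not modeled.

```python
# Sketch of timekeeping slaved to a 100 Hz jiffy IRQ.
# All names, and the units chosen for uptime (seconds) and delay (jiffies),
# are assumptions made for illustration, not details taken from the BIOS.

JIFFIES_PER_SECOND = 100

class JiffyClock:
    def __init__(self):
        self.jiffy = 0     # counts 0..99 within the current second
        self.uptime = 0    # 32 bit counter, assumed here to count seconds
        self.delay = 0     # 16 bit down-counter, assumed to count jiffies

    def set_delay(self, jiffies):
        """Arm the delay counter; the value is truncated to 16 bits."""
        self.delay = jiffies & 0xFFFF

    def irq(self):
        """Work done once per jiffy interrupt."""
        if self.delay:
            self.delay -= 1                  # stops at zero when it expires
        self.jiffy += 1
        if self.jiffy == JIFFIES_PER_SECOND:
            self.jiffy = 0
            self.uptime = (self.uptime + 1) & 0xFFFFFFFF   # wrap at 32 bits

if __name__ == "__main__":
    clock = JiffyClock()
    clock.set_delay(250)
    for _ in range(300):                     # simulate three seconds of IRQs
        clock.irq()
    print(clock.uptime, clock.delay)
```

A foreground program using such a scheme would arm the delay and then poll it until it reaches zero, leaving all of the counting to the interrupt handler.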

POST

POST is an acronym for Power-On-Self-Test, which is something that virtually all modern computers execute at start-up.  POC's POST is modeled on contemporary practice and includes the following major steps:
  • First-stage memory testing.  Immediately after power-on or reset, interrupts are disabled and the MPU is switched to native mode operation.  A destructive test of page zero RAM is performed, followed by a similar test on page one RAM, as that is the default stack used by the MPU following a hard reset.  If a memory fault is discovered the MPU is halted, as processing cannot continue without a functioning page zero and stack area.  As no console I/O has been established it is not possible to inform the user of the problem.
  • First-stage hardware initialization.  The real-time clock is initialized so the watchdog timer will generate jiffy IRQs, interrupts are enabled and the DUART is initialized to establish TIA-232 communication.  TIA-232 buffer indices are initialized, thus making the TIA-232 subsystem ready for bi-directional I/O.
  • POST screen display.  An initialization sequence is sent to the console terminal, which clears the screen.  A banner display follows, producing the first visible sign of system activity.
  • Second-stage memory testing.  All RAM from $000200 to $00CFFF is non-destructively tested, the progress of which is displayed on the console as an ascending "bytes found" count.  If a defective memory location is discovered, testing is discontinued, an attempt is made to display the faulty location's address and the system is halted.
  • Second-stage hardware initialization.  The BIOS checks for the presence of the SCSI host adapter and if found, initializes it and then executes a SCSI bus hardware reset.  A delay period follows to allow all SCSI devices to recover from the reset.
  • SCSI enumeration.  The SCSI bus is probed to detect the presence of devices and an enumeration table is built in RAM at $000110.  During enumeration some information about each discovered device is displayed on the console.
  • Initial system load (ISL).  If a bootable device is detected at SCSI ID $00 an attempt is made to load and start an operating system.  A successful load will result in control being given to the operating system.  As is customary practice, the ISL starts with the BIOS reading physical block $00 on the disk, examining it for signs that a valid boot block is present and if found, executing the boot code.  It is then up to the boot code to continue ISL or abort if it can't.
  • Monitor console.  If ISL is unsuccessful control will be given to the machine language monitor.  This is also the default action if the SCSI host adapter is not present, fails to respond to initialization or generates an error during initialization.
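
The difference between the destructive first-stage test and the non-destructive second-stage test is simply whether each location's original contents are saved and restored.  The following Python sketch shows the non-destructive form.  The test patterns, the function name and the simulated faulty RAM are assumptions made for illustration; the patterns actually used by POC's POST are in the source code on the Downloads page.

```python
# Sketch of a non-destructive RAM test over a range of addresses.
# The patterns and names are assumptions; RAM is modeled as any object
# that supports indexed reads and writes.

PATTERNS = (0x55, 0xAA)    # alternating bits, then their complement

def check_ram(ram, start, end):
    """Test ram[start..end] inclusive.  Return None if all locations pass,
    else the address of the first faulty location."""
    for addr in range(start, end + 1):
        saved = ram[addr]                # preserve the current contents
        for pattern in PATTERNS:
            ram[addr] = pattern
            if ram[addr] != pattern:
                return addr              # caller reports the address and halts
        ram[addr] = saved                # put the original byte back
    return None

class StuckBitRam:
    """Simulated RAM in which one location has bit 0 stuck at zero."""
    def __init__(self, size, bad_addr):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr

    def __getitem__(self, addr):
        return self.cells[addr]

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value &= 0xFE                # bit 0 never sets
        self.cells[addr] = value

if __name__ == "__main__":
    good = bytearray(0x10000)
    print(check_ram(good, 0x0200, 0xCFFF))
    bad = StuckBitRam(0x10000, 0x1234)
    print(hex(check_ram(bad, 0x0200, 0xCFFF)))
```

Dropping the save and restore steps turns this into the destructive form used on page zero and the stack, where nothing of value exists yet.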

Machine Language Monitor

The machine language monitor is the default operating environment if ISL fails or no SCSI hardware is present.  The monitor includes functions to examine and change memory and MPU registers, assemble, disassemble, load and execute code, and execute some basic low-level SCSI commands.  The assembler and disassembler support the entire W65C816S instruction set and all addressing modes.  Implemented monitor commands include:
  • A – Assemble code.
  • C – Compare memory areas.
  • D – Disassemble code.
  • F – Fill memory range with byte value.
  • G – Run code—stops on a BRK or COP instruction.
  • H – Search memory for pattern, accepts multiple byte values or ASCII strings.
  • J – Run code as a subroutine—stops on an RTS instruction, as well as on BRK and COP.
  • L – Load Motorola S-record code through the auxiliary TIA-232 port.
  • M – Display memory contents.
  • R – Display MPU registers.
  • T – Copy memory range.
  • Z – Clear the console screen.
  • > – Modify memory, accepts multiple byte values or ASCII strings.
  • ; – Modify MPU registers.
  • ! – SCSI extensions.
  • Radix conversion (described below).
Monitor commands are not case-sensitive.

Central to the monitor are the assembler and disassembler, which are essential to debugging and patching code.  The assembler is WDC syntax-compliant and is started by entering a (assemble code) followed by the assembly address and then a 65C816 instruction.  For example:
a 2000 rep #%00110000
Upon assembly of the statement, the monitor will immediately replace the typed input with a disassembly of the instruction and prompt for the next instruction.  In the case of the above, the user would see the following upon pressing [CR] (the Return or Enter key):
A 002000  C2 30        REP #$30
.A 002002
The assembler is able to assemble a 16-bit immediate-mode instruction when syntactically correct.  Continuing the example:
A 002000  C2 30        REP #$30
.A 002002 lda #1234
would result in:
A 002000  C2 30        REP #$30
A 002002  A9 34 12     LDA #$1234
.A 002005
As the 65C816 can address 16 MB of memory, an instruction with a 24 bit address is permissible.  For example, here is a "long" (interbank) call to a subroutine at $8F210E:
A 002000  C2 30        REP #$30
A 002002  A9 34 12     LDA #$1234
.A 002005 jsl 8f210e
resulting in:
A 002000  C2 30        REP #$30
A 002002  A9 34 12     LDA #$1234
A 002005  22 0E 21 8F  JSL $8F210E
.A 002009
Assembly is terminated by pressing [CR] without typing any input at the next prompt.

Disassembly of the above code would be displayed as follows:
.d 2000 2005
. 002000  C2 30        REP #$30
. 002002  A9 34 12     LDA #$1234
. 002005  22 0E 21 8F  JSL $8F210E
The period (.) is both the monitor's prompt and the code disassembly prefix.

You may have noticed that, except for the first entered instruction, operands were entered as hexadecimal numbers without telling the assembler that they were to be interpreted as such.  Hexadecimal is the default number base unless a radix symbol is prepended to the operand.  Radices are as follows:
% – Binary
@ – Octal
+ – Decimal
$ – Hexadecimal
In the case of the REP instruction, bit-wise notation is especially convenient, since the instruction clears selected MPU status register bits.  So the #%00110000 operand is more convenient and mnemonic than #$30.

In general, all numeric values may be entered in any radix.  Also, the radix conversion feature allows one to enter a numeric value prefixed with a radix symbol as a monitor command and get a conversion to all radices.  For example, entering $7fff at the monitor prompt will result in the following display:
.$7fff
   $7FFF
   +32767
   @00077777
   %111111111111111
The supported numeric range is $00000000 to $FFFFFFFF, that is, 32 bits.
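
The radix rules are simple enough to model in a few lines of Python.  This is a sketch of the behavior described above, not the monitor's code: the function names are hypothetical, and the output is not padded to the monitor's field widths (note the leading zeros in the octal line above).

```python
# Sketch of radix-prefixed number parsing and conversion.
# Function names are hypothetical, and output is not padded to the
# monitor's field widths.

RADIX = {"%": 2, "@": 8, "+": 10, "$": 16}
DIGITS = "0123456789ABCDEF"

def parse_number(text):
    """Parse a number with an optional radix prefix; hex is the default."""
    base = RADIX.get(text[:1])
    digits = text[1:] if base else text
    base = base or 16
    if not digits or any(c not in DIGITS[:base] for c in digits.upper()):
        raise ValueError(f"bad number: {text!r}")
    value = int(digits, base)
    if value > 0xFFFFFFFF:
        raise ValueError("value exceeds 32 bits")
    return value

def convert(text):
    """Return the value in hexadecimal, decimal, octal and binary."""
    value = parse_number(text)
    return [f"${value:X}", f"+{value}", f"@{value:o}", f"%{value:b}"]

if __name__ == "__main__":
    for line in convert("$7fff"):
        print("  ", line)
```

Because hexadecimal is the default, parse_number("1234") and parse_number("$1234") return the same value, which is why the operand in lda #1234 above assembled as #$1234.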

The SCSI extensions permit the issuing of low-level commands to logged SCSI devices or to the SCSI subsystem itself:
  • B – Recalibrate device (disk) or rewind medium (tape).
  • F – Format or erase medium.
  • I – Reset and re-enumerate SCSI subsystem, used to recover from errors.
  • R – Read data from device.
  • S – Check device status (request sense).
  • W – Write data to device.
A SCSI command is always prefixed with ! so the monitor can tell that it is to access the SCSI subsystem.  For example:
!r <ID> <LUN> <LBA> <N> <ADDR>
will read N blocks or bytes of data (the SCSI driver figures out if it's blocks or bytes by examining the enumerated device type) from logical unit LUN of SCSI device ID, starting at logical block address LBA, and deposit the data into RAM starting at address ADDR.  Without the !, the monitor would think that the MPU registers are to be displayed, but would flag an error because the R (register display) command takes no parameters.  Once the transfer has completed, monitor commands can be used to examine block images, etc.

I should hasten to add that this is a primitive and unforgiving interface, as the monitor performs no error checking (beyond basic syntactical checks) or other hand-holding—it is assumed that you are not a naive user.  Doing something stupid can, and usually will, crash the system or overwrite something important (for example, the RTC registers, which has happened a few times due to clumsy typing), resulting in various and sundry weird events.  In particular, there is no Are you sure (Y/N)? question when the !f <ID> <LUN> command is used on a disk or tape—formatting is a hardware-level operation on the medium that blows away all data.  The monitor will indicate an error only if the SCSI subsystem returns one, such as trying to access an LBA that is outside of the target device's addressable range.

As the '816 has more registers than its eight bit cousins, the register dump command offers more information.  Here's an example:
.r
  PB  PC   NVmxDIZC  .C   .X   .Y   SP   DP  DB IRQV
; 00 C09B  00110010 3100 0057 00C1 CDFF 0000 1C E180
Reading from left to right, the fields are: the 8-bit program bank (PB), the 16-bit program counter (PC; combining PB with PC produces the 24-bit effective execution address, $00C09B in this case), the 8-bit status register (displayed in convenient bit-wise format), the 16-bit accumulator (.C, which is really two registers, .A and .B), the 16-bit X index register (.X), the 16-bit Y index register (.Y), the 16-bit stack pointer (SP), the 16-bit direct (zero) page starting address (DP), the 8-bit data bank (DB) and the 16-bit IRQ service routine vector address (IRQV), the last of which is read from the IRQ handler's indirect jump vector at $000106.

As the m and x status register bits are set in the above example, the accumulator and index registers will be set to eight bits the next time a g (run code) or j (run subroutine) command is issued to the monitor.  .C continues to display a 16 bit value because the most significant byte (MSB, $31 in this case) is retained in the hidden B-accumulator.  Switching the index registers to 8 bits forces their MSBs to $00, something that a programmer cannot afford to forget.
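
The relationships in the register dump can be spelled out with a short Python sketch.  The function names are hypothetical.  The bit order follows the NVmxDIZC heading, and rep() models what the REP instruction does to the status register, assuming native mode.

```python
# Sketch: interpret the register dump's status bits and program address.
# Names are hypothetical; bit order follows the NVmxDIZC heading.

FLAGS = "NVmxDIZC"

def decode_status(bits):
    """Map an 8 character bit string such as '00110010' to {flag: bool}."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected eight binary digits")
    return {flag: bit == "1" for flag, bit in zip(FLAGS, bits)}

def register_widths(status):
    """Width in bits of the accumulator and index registers (native mode)."""
    return {"accumulator": 8 if status["m"] else 16,
            "index": 8 if status["x"] else 16}

def execution_address(pb, pc):
    """Combine the 8 bit program bank with the 16 bit program counter."""
    return ((pb & 0xFF) << 16) | (pc & 0xFFFF)

def rep(status_byte, mask):
    """Effect of REP #mask: clear the status bits selected by mask."""
    return status_byte & ~mask & 0xFF

if __name__ == "__main__":
    status = decode_status("00110010")
    print(register_widths(status))
    print(f"${execution_address(0x00, 0xC09B):06X}")
    print(f"%{rep(0b00110010, 0b00110000):08b}")
```

Applying REP #%00110000 to the status value in the example clears m and x and leaves only the Z flag set, which is how the registers get switched back to 16 bits.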

The registers may be changed by issuing the ; (change registers) command to the monitor, followed by the appropriate values.  For example:
;04 21C3 %00000000 1234 5678 9abc
would result in:
  PB  PC   NVmxDIZC  .C   .X   .Y   SP   DP  DB IRQV
; 04 21C3  00000000 1234 5678 9ABC CDFF 0000 1C E180
Note that values have to be entered in the correct order and that omitted values (i.e., SP, DP and DB in the above example) leave the corresponding registers unchanged.  The entered values are actually written to "shadow registers" in RAM, not the MPU hardware registers.  If the g or j command is issued, the MPU registers will be loaded from the shadow registers and execution will commence at $0421C3.
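
The positional behavior of the ; command can be modeled as follows.  This Python sketch is an illustration only.  The dictionary standing in for the shadow registers, the function names and the range checks are assumptions, and for brevity only the $ and % radix prefixes are handled, with hexadecimal as the default.

```python
# Sketch of the ; command: positional values update shadow registers.
# The data structure, names and range checks are assumptions; only the
# $ and % prefixes are handled here, with hexadecimal as the default.

ORDER = ("PB", "PC", "SR", "C", "X", "Y", "SP", "DP", "DB")
WIDTH = {"PB": 8, "PC": 16, "SR": 8, "C": 16, "X": 16,
         "Y": 16, "SP": 16, "DP": 16, "DB": 8}

def parse(text):
    """Convert one field to an integer."""
    if text.startswith("%"):
        return int(text[1:], 2)
    if text.startswith("$"):
        text = text[1:]
    return int(text, 16)

def change_registers(shadow, command):
    """Return a copy of shadow updated from a command such as ';04 21C3'."""
    if not command.startswith(";"):
        raise ValueError("not a change-registers command")
    fields = command[1:].split()
    if len(fields) > len(ORDER):
        raise ValueError("too many values")
    updated = dict(shadow)
    for name, field in zip(ORDER, fields):   # omitted trailing values are skipped
        value = parse(field)
        if value < 0 or value >> WIDTH[name]:
            raise ValueError(f"{name} value out of range")
        updated[name] = value
    return updated

if __name__ == "__main__":
    shadow = {"PB": 0x00, "PC": 0xC09B, "SR": 0b00110010, "C": 0x3100,
              "X": 0x0057, "Y": 0x00C1, "SP": 0xCDFF, "DP": 0x0000, "DB": 0x1C}
    new = change_registers(shadow, ";04 21C3 %00000000 1234 5678 9abc")
    print(" ".join(f"{name}={new[name]:X}" for name in ORDER))
```

Since the values are matched to registers strictly by position, there is no way to change, say, DB without also supplying everything to its left.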

x86?  We don't got no x86.  We don't need no stinking x86!

Copyright ©1996—2025 by BCS Technology Limited.  All rights reserved.
Unauthorized copying or reproduction of website content is prohibited.