The Emulator really is alive!

Behold, the very first bytes sent over the 9P protocol to the operator's console via the emulator.

It only took 1.6KB of firmware binary and 32KB of emulator software.
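For the curious, those first bytes were almost certainly a version handshake, since every 9P conversation opens with one. A minimal sketch of what that message looks like on the wire (field layout per the 9P2000 spec; the msize value here is just an arbitrary illustration):

```python
import struct

def tversion(msize: int, version: bytes = b"9P2000") -> bytes:
    """Build a 9P Tversion message: size[4] type[1] tag[2] msize[4] version[s]."""
    TVERSION = 100           # message type number from the 9P2000 spec
    NOTAG = 0xFFFF           # Tversion always carries the NOTAG tag
    body = struct.pack("<BHI", TVERSION, NOTAG, msize)
    body += struct.pack("<H", len(version)) + version   # strings are length-prefixed
    return struct.pack("<I", 4 + len(body)) + body      # the size field counts itself

msg = tversion(8192)
print(msg.hex())   # 13000000 64 ffff 00200000 0600 395032303030 (little-endian fields)
```

Everything is little-endian, and every message is self-describing via that leading size field, which makes the framing pleasant to implement on small machines.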

@vertigo You're using 9P for the console protocol, then?

@elb Yup; and also for mass storage. I eventually plan on putting a mouse on that channel as well.

The longer-term goal is to reify the ForthBox processor core into an iCE40LP8K FPGA, which in turn plugs into the RC2014 backplane bus.

I already have a plug-in card design which exposes the FPGA as a target-only device on the bus. Since it cannot perform DMA into the Z80's memory, I have to use a FIFO to communicate between the Z80 and the ForthBox. Hence the QIA core; the "Q" stands for Queue. :) (The QIA, in turn, is also a proper subset of another core I have plans to develop, a revision to my Serial Interface Adapter, or SIA. So if you see me mention SIA in my toot history or on my website, that's what it means. In theory[1], I should be able to replace the QIA core with an SIA core and, without any changes to the software, successfully pass 9P traffic over a set of wires to another device like an Arduino or RPi Nano.)

Having that simple channel, the next step is to pass messages across it. I tried designing my own protocols, and even looked into using Commodore's IEEE-488-based protocols with their 8-bit computers. But, for various technical reasons, I couldn't produce a clean mapping. The only mapping that seemed to work well turned out to be 9P.

I've been wanting to use 9P in my previous Kestrel Computer Projects for some time, but never got far enough along. This is the first project that has progressed far enough to warrant getting real with its implementation.

Not only that, but I even considered the possibility of using 9P for access to RAM as well. Byte for byte, 9P turns out to be about as efficient as RapidIO, give or take, assuming all the FIDs you'd ever use on that channel have already been established. So, using 9P on the motherboard of a computer seems like it would actually be practical.
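That efficiency claim is easy to sanity-check from the 9P2000 message grammar. Assuming the fid is already established, as stated above, a read costs only the fixed Tread/Rread framing (a rough back-of-the-envelope sketch, not measured figures):

```python
# Fixed field sizes from the 9P2000 message grammar:
TREAD_LEN = 4 + 1 + 2 + 4 + 8 + 4   # size type tag fid offset count = 23 bytes
RREAD_HDR = 4 + 1 + 2 + 4           # size type tag count            = 11 bytes

def read_overhead(payload: int) -> float:
    """Fraction of channel bytes spent on 9P framing for one read of `payload` bytes."""
    total = TREAD_LEN + RREAD_HDR + payload
    return (TREAD_LEN + RREAD_HDR) / total

for n in (64, 512, 4096):
    print(f"{n:5d}-byte read: {read_overhead(n):.1%} framing overhead")
```

With 4KB transfers the framing is well under 1% of the bytes moved, which is why per-byte it lands in the same ballpark as packetized interconnects like RapidIO.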

@elb Forgot to add the footnote:

[1] Theory and reality are only theoretically related.

@vertigo I love this so much!

I'm building a TMS 9900-based homebrew, and disk attachment is something I frequently come back to as problematic. I had been planning on IDE or something similar, but now I'm wondering...

@elb I started out with SD cards with my Kestrel-2 (which later evolved into the Kestrel-2DX; I seem to have lost the website for the Kestrel-2 design). It taught me many things:

  • The SD protocol is trivial to operate once the card is up; but, it's not trivial to bring up in the first place.

  • SDHC/SDXC extensions are not backward compatible with SD card bring-up procedures. Even after poring over the SanDisk-authored specifications for these cards, I remain, to this day, incapable of properly initializing an SD[HX]C card.

  • SD cards can be quite buggy themselves. After implementing write functionality for the first time, I was baffled to learn I could only write one sector before the card seized up and refused to respond. I literally had to re-initialize the card after every sector write to maintain functionality. (Despite this, it didn't really impact perceived performance that much.)

  • Occasionally, SD cards would (I think???) attempt to wear-level, especially since I did not use the FAT filesystem at the time (and am unlikely to do so in the future). This can lead to stop-the-world, garbage-collection-like delays in I/O to or from these cards. You'll be sitting there waiting minutes for a few kilobytes of data to save, something that should normally take a fraction of a second to complete.
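To give a flavor of that bring-up fussiness: even in SPI mode, the first commands of the initialization handshake must carry a correct CRC7, which trips up many first implementations. A sketch of the command framing (per the SD simplified physical-layer spec; this covers only the framing, not the full CMD0/CMD8/ACMD41 dance):

```python
def crc7(data: bytes) -> int:
    """CRC-7 (polynomial x^7 + x^3 + 1) used to seal SD command frames."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x12) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc >> 1

def sd_cmd(index: int, arg: int) -> bytes:
    """Frame an SD command: start bits + index, 32-bit argument, CRC7 + stop bit."""
    frame = bytes([0x40 | index]) + arg.to_bytes(4, "big")
    return frame + bytes([(crc7(frame) << 1) | 1])

# The two commands whose CRCs must be right even before CRC checking is optional:
print(sd_cmd(0, 0).hex())        # CMD0 GO_IDLE_STATE -> 400000000095
print(sd_cmd(8, 0x1AA).hex())    # CMD8 SEND_IF_COND  -> 48000001aa87
```

CMD8 is exactly the SDHC/SDXC fork in the road mentioned above: 2.0+ cards answer it, older cards reject it, and the ACMD41 arguments differ depending on which happened.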

Because of this, I learned that I wanted to divorce myself from hardware-specific details. I wanted an I/O channel that I could reuse for many different types of storage without having to worry about device specifics. This led me to recall Commodore's old IEEE-488-based I/O channels, and from there, to using 9P.

The obvious disadvantage, of course, is that now you need an intelligent peripheral to talk these protocols. An RPi Nano will be markedly more powerful than the machine it's attached to (at least in my case!), but it's probably one of the simplest/cheapest ways of attaining my goal. ;)

@eris ForthBox is hard for me to define, if only because I hadn't put much thought into it until now. ;)

I basically dissolved my Kestrel Computer Project after Raptor Engineering released a small computer named Kestrel. I didn't want to deal with the obvious legal and market confusion that'd inevitably arise if my Kestrel project ever became more popular.

But, I also didn't want to just stop making homebrew computers either. So, I decided to go back to my roots, and revise the Kestrel-2 design into something newer.

I started out by redesigning its stack architecture CPU into something that could evolve into a 32-bit or 64-bit processor without much fuss. A simple recompile should be sufficient to turn 16-bit code into something that runs on a 32/64-bit processor. (And, if written portably, vice versa.)

The big reason I jumped on the RISC-V bandwagon so many years ago is because I couldn't figure out how to build a MISC CPU that had good code density. I've learned a few tricks since then, and am now applying what I've learned over the years in the design of my new processor.

ForthBox is still (currently) 16-bit, like its Kestrel-2 predecessor. Right now, its code density seems to compare well with a 6502, which is plenty good enough.

The specific design target for actual hardware is an iCE40LP8K FPGA, which will plug into the backplane bus of an RC2014. There, the Z80 will function in a manner not unlike a Commodore floppy disk drive, as well as providing basic keyboard access. Whereas Commodore used the IEEE-488 protocol over their link, I am using 9P.

Over time, I want to upgrade the ForthBox piece by piece, eventually culminating in a desktop computer rivaling a Commodore-Amiga or Atari ST (the same goal I had for the Kestrel-3), and quite likely re-introducing a RISC-V processor along the way.

But, for now, I'm starting off simple.

@vertigo this sounds really cool! one question: is your MISC compact like the traditional opcode packing method, or something more like the j1? :O please do tell me about these other tricks for code density, too!

@eris The J1 follows a "horizontal" instruction encoding, so more closely resembles a Novix NC4000.

The MISC concept uses a "vertical" instruction encoding (say, only 5 bits per instruction), but the instructions are packed like a VLIW word. Except, unlike a VLIW processor, they're still executed sequentially. This lets a MISC processor fetch 4 to 6 instructions in one memory cycle, then spend the next 4 to 6 cycles executing them. In this way, you can (asymptotically) come close to instruction throughputs drawn from an instruction cache, but without the complexity of actually including a proper cache.
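As an illustration of that packing, using four 4-bit slots in a 16-bit word as on the S16X4 (the slot ordering and opcode values here are made up for the example, not the S16X4's real encoding):

```python
def pack(ops):
    """Pack four 4-bit opcode slots into one 16-bit instruction word."""
    assert len(ops) == 4 and all(0 <= op < 16 for op in ops)
    word = 0
    for slot, op in enumerate(ops):
        word |= op << (12 - 4 * slot)   # highest nibble executes first (assumed order)
    return word

def unpack(word):
    """Sequentially decode the four opcode slots from a fetched word."""
    return [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]

word = pack([0x1, 0xA, 0x3, 0x0])
print(hex(word))                 # one 16-bit fetch carries four instructions
assert unpack(word) == [0x1, 0xA, 0x3, 0x0]
```

One fetch, four sequential dispatches: that is the whole trick, and it is why the fetch bandwidth amortizes so nicely without a cache.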

My S16X4 processor (and its follow-up, the S16X4A) packed four instructions into a 16-bit word. And it was pretty darn zippy for such a simple CPU design. Problem is, when you expand it to 32 bits, you quickly realize that about half the memory used for numeric constants (which occur frequently with these types of processors) just stores 0-bits. That's a pretty bad waste of space. After comparing it with RISC-V, I realized that a 32-bit MISC would be about as dense as a 48-bit RISC. Ouch!

My current stack processor design uses a more CISC-y approach, where opcodes are single bytes. The only multi-byte instructions that exist are those which push values onto the stack. It will have instructions to push (signed) 8-bit and 16-bit integers onto the stack. There are also instructions which push PC+n (for both 8-bit and 16-bit versions of n) onto the stack. Given that you can push PC+n onto the stack, you can then fetch or store 8, 16, 32, or 64-bit values. This adds some overhead for wider values; however, these occur so infrequently that you won't even notice the performance blip. It'll appear as noise. On the other hand, since these wider values are frequently constants and can be re-used, after something like 3 instances of a large constant, it actually saves memory.

By going this route, I can place immediate values in-line with the instructions that consume them, and that saves a whole bunch.
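A sketch of what such a byte-coded decode loop looks like, with immediates read in-line right after the opcode that consumes them (the opcode numbers and mnemonics here are hypothetical, not ForthBox's actual encoding):

```python
# Hypothetical opcode numbers -- the post doesn't give an encoding table.
LIT8, LIT16, ADD, HALT = 0x01, 0x02, 0x10, 0x00

def run(code: bytes):
    """Byte-coded decode loop: only the push opcodes carry inline immediates."""
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == LIT8:    # push a sign-extended 8-bit literal from the instruction stream
            stack.append(int.from_bytes(code[pc:pc+1], "little", signed=True)); pc += 1
        elif op == LIT16:  # push a sign-extended 16-bit literal
            stack.append(int.from_bytes(code[pc:pc+2], "little", signed=True)); pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == HALT:
            return stack

# 2 + 300: a one-byte literal, a two-byte literal, add, halt -- 7 bytes of code total.
print(run(bytes([LIT8, 2, LIT16, 0x2C, 0x01, ADD, HALT])))   # -> [302]
```

Small constants cost one extra byte, larger ones two, so code density tracks the actual size of the values the program uses rather than the machine's word width.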

I do still intend on fetching 4 or 8 bytes at a time when fetching instructions; that technique is still valuable. Given a 32-bit instruction bus, for example, and assuming the memory system can keep up, that allows me to execute whatever those four bytes are in just 5 cycles (9 cycles for 8 bytes on a 64-bit bus). So, amortized, we still get performance close to 1.25 cycles per instruction on a 32-bit bus, 1.125 cycles for a 64-bit bus.
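The amortization arithmetic, for reference: one fetch cycle, plus one execute cycle per byte-wide instruction fetched.

```python
def amortized_cpi(bus_bytes: int) -> float:
    """One fetch cycle buys `bus_bytes` single-byte instructions, each taking a cycle."""
    return (1 + bus_bytes) / bus_bytes

print(amortized_cpi(4))   # 32-bit bus: 5 cycles / 4 instructions = 1.25
print(amortized_cpi(8))   # 64-bit bus: 9 cycles / 8 instructions = 1.125
```

This assumes every fetched byte is a single-byte opcode; inline immediates reduce the instruction count per fetch but cost no extra fetch cycles, so the figure is a ceiling on the steady-state rate.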

It'll still be slower than a decent RISC architecture; but, put up against a single-issue, in-order RISC processor, it should be able to hold its own.
