Dippy

dippy (adjective): silly and eccentric or scatterbrained

The idea for Dippy originated on hackaday.io, but this is now the main site.

Overview

A really wacky computer built using DIP switches for ROM, a few shift registers, no RAM, running a Forth-like instruction set and having a minimal transistor count.

Q: Why DIP switches for ROM? A: So that you can see all of the instructions. There will be one LED per bank of DIP switches so that you can see which one is being read, and the instruction bus will also have LEDs on it so that you can see that the values on the switches have made it to the processor.

Q: Why shift registers? A: To keep the component count down - registers/RAM can easily dominate the component count. Ideally the shift registers would be physical, needing components only to read or write; more on this below. A secondary consideration is that shift registers imply a bit serial ALU, which simplifies the ALU a lot.

Q: Why no RAM? A: The idea is that the shift registers have enough storage for really basic programs. External RAM violates the completeness of the solution and it wouldn't be visible.

Q: Why Forth-like instruction set? A: Because it is compact, and I've got to squeeze in as much as possible here.

Q: Why minimal transistor count? A: Partly because of the challenge, partly because it is easier for others to replicate and build on this work, and largely because I'm terrible at soldering, so unless it's minimal it may never get finished.

Rough specs

10 bit instructions in a 9 bit read-only address space built from 512 10-way DIP switches (e.g. eBay).

16 shift registers, holding at least 16 bits each, hopefully 32 bits. These form an ascending Forth data stack and a descending Forth address stack (even though addresses are only 9 bits). Two 4 bit registers act as the data and address stack pointers. At four components per memory cell (it may be five), 16 registers at 16 bits would be 1024 components (512 transistors). At 32 bits per register that would be 2048 components (1024 transistors).
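The component totals above are simple products, and a small sketch makes it easy to try other register sizes. The function name and parameters are made up for illustration, and the figure of two transistors per cell is an assumption inferred from the totals quoted (512 transistors for 256 cells).

```python
def component_count(registers, bits, per_cell=4, transistors_per_cell=2):
    """Return (components, transistors) for a bank of discrete shift registers.

    per_cell: components per memory cell (4 in the text, possibly 5).
    transistors_per_cell: assumed to be 2, inferred from the quoted totals.
    """
    cells = registers * bits
    return cells * per_cell, cells * transistors_per_cell


print(component_count(16, 16))              # 16 bit registers
print(component_count(16, 32))              # 32 bit registers
print(component_count(16, 32, per_cell=5))  # pessimistic cell
```

Going from four to five components per cell adds 512 components at 32 bits, which is why the cell design matters so much.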

DIP switch ROM

The goal is a reasonably inexpensive program storage where it is clear how it operates.

A simple (linear) design would take the 9 bit address bus and invert each line to give 18 signals. Any and all addresses can now be decoded using 9 diodes and one transistor per address. Each diode is wired to either the corresponding address line or its inverse, so that current flows in all cases except the required address. This then feeds a transistor whose output is high when the required address is present. An LED can be used as the load so that there is a visual indication of the address used. The output feeds all the top pins of the DIP switches via diodes, with all lower pins connected to the instruction bus. There's a huge number of DIP switches and an enormous number of diodes…
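To check that the linear decode really selects exactly one bank, here is a minimal logic-level sketch. It models each diode as "conducts if its line is high" and ignores all analog behaviour; the function names are hypothetical, and the diode totals assume one diode per decode tap and one per switch.

```python
ADDRESS_BITS = 9
INSTRUCTION_BITS = 10
ROWS = 1 << ADDRESS_BITS  # 512 banks of DIP switches


def row_selected(row, address):
    """True when no diode of this row's decoder conducts, i.e. address == row."""
    for bit in range(ADDRESS_BITS):
        line = (address >> bit) & 1   # true address line
        inverse = 1 - line            # its inverted copy
        # Tap whichever line is low only when this address bit matches the row.
        tapped = inverse if (row >> bit) & 1 else line
        if tapped:                    # current flows, so the row stays off
            return False
    return True


def selected_rows(address):
    return [row for row in range(ROWS) if row_selected(row, address)]


DECODE_DIODES = ROWS * ADDRESS_BITS   # 9 per address
BUS_DIODES = ROWS * INSTRUCTION_BITS  # one per switch onto the instruction bus

print(selected_rows(0x155), DECODE_DIODES, BUS_DIODES)
```

Under these assumptions the decode alone needs thousands of diodes, with a similar number again between the switches and the instruction bus.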

An improved design would first get rid of all of the diodes to the instruction bus, then work on better decoding.

Shift Registers

I've no good answer for this yet. Ideally you'd be able to see all the digits and it would have a variable clock rate, from one cycle every few seconds (so that you can see exactly how everything works) up to hundreds of kilohertz (the max switching speed of the transistors). Anyway, here are some options:

Acoustic Options

Let's start off assuming that we can use acoustics to store data:

I'd really like the registers to be 32 bits as, unusually, the component count is independent of the register size, i.e. the register bits come for free (okay, that's assuming no acoustic dissipation, and even then the instruction clock does run proportionally slower). In order to transport the finished machine, and to hang it on the wall, the maximum dimension has to be 2m. At 2m tube length, that's just over 6cm per bit. Let's assume that's six wavelengths, so one wavelength is about 0.01m, and with the speed of sound at 340m/s that makes 34kHz or thereabouts. This seems possible; it's in the range of acoustics I know a little bit about.

I originally started off with the idea of using 40kHz ultrasonic transducers. However, after playing about a bit I find that they have less than 5kHz bandwidth. In general any piezo transducer has a resonant frequency and so limited bandwidth (ref). Whilst it's great to filter out the low frequency background noise, I need more bandwidth to fit the bits into a 2m tube.

Recently I've found that small electret microphones can have a 30kHz frequency response. The idea of building a 32 bit computer is very appealing, so this is where my effort is going at the moment. potential speakers

The clock rate is dependent on the tube length, so at 2m length that's only 170 instructions a second! It's almost a shame it's not slower, then we'd be able to see the computer working. I'll probably hack in hardware NOPs to make everything run at about 1Hz and be visible.
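The tube figures in this section can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers quoted in the text (340m/s, 2m, 32 bits, six wavelengths per bit); the variable names are mine. Without rounding the wavelength to 0.01m, the carrier comes out slightly under the 34kHz quoted.

```python
SPEED_OF_SOUND = 340.0     # m/s, the figure used in the text
TUBE_LENGTH = 2.0          # m, maximum dimension of the machine
BITS_PER_REGISTER = 32
WAVELENGTHS_PER_BIT = 6    # assumption taken from the text

bit_length = TUBE_LENGTH / BITS_PER_REGISTER             # metres of tube per bit
wavelength = bit_length / WAVELENGTHS_PER_BIT            # metres
carrier_hz = SPEED_OF_SOUND / wavelength                 # acoustic carrier
circulations_per_second = SPEED_OF_SOUND / TUBE_LENGTH   # one instruction each
bit_rate = circulations_per_second * BITS_PER_REGISTER   # bits per second per tube

print(f"bit length  {bit_length * 100:.2f} cm")
print(f"wavelength  {wavelength * 1000:.1f} mm")
print(f"carrier     {carrier_hz / 1000:.1f} kHz")
print(f"clock       {circulations_per_second:.0f} instructions/s")
print(f"bit rate    {bit_rate:.0f} bits/s per tube")
```

Halving the tube length doubles the instruction rate but also doubles the carrier frequency needed for the same number of bits, which is where the microphone bandwidth becomes the limit.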

I'd like the computer to run for 10 hours without error. Assuming 32 bits per register and 16 registers, that's no errors in 32 * 16 * 170 * 10 * 60 * 60 = 3133440000 bit passes (170 * 10 * 60 * 60 = 6120000 trips round the loop, each carrying 512 bits). So we need a bit error rate better than about 1 in 10^9, which will be quite tough given that we are competing with environmental noise.

Other analog options

TVs used to be analog. One PAL scan line is 64 μs (https://www.youtube.com/watch?v=bsk4WWtRx6M, https://www.youtube.com/watch?v=-qerYLM-eEg) and that's 720 'pixels' - lots of bits! I haven't found a source for PAL delay lines yet, but if I do then that would be great.

The physics behind an analog TV delay line is interesting, 1.3 μs delay from 2,816 turns of enamelled copper wire between two conducting tubes - http://www.hawestv.com/mtv_color/delayline.htm. Not easy to construct and just not enough delay to store enough bits.

A cheap eBay laser pen claims to have a 10 mile range, that's 53 μs, about the same time delay as the PAL line. I can't imagine getting 16 of these working with line of sight, or mirrors parallel enough to get multiple reflections, but someone else may know how to (see free-space optical communication). It seems that 24-core optical fibre comes in at about $240 per km, so if the rest was built in TTL it may work (much faster than acoustic).

The start of the ionosphere is 75km up, so if the bounce was clean (which it won't be) then shortwave radio would give a 0.5ms delay. It would have to be quite broadband, but spread spectrum >50MHz is licence free. The real problem is the bounce has no chance of being clean. We can detect lasers on the moon…
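The delays quoted for the optical and radio ideas follow from distance divided by the speed of light. The sketch below reproduces them and adds a hypothetical helper, bits_in_flight, to show how many bits such a delay could hold at a given signalling rate; the 1 Mbit/s rate is purely an illustrative assumption, not a design figure.

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s
METRES_PER_MILE = 1609.344

laser_delay = 10 * METRES_PER_MILE / SPEED_OF_LIGHT   # one way, 10 miles
ionosphere_delay = 2 * 75_000 / SPEED_OF_LIGHT        # up 75 km and back


def bits_in_flight(delay_s, bit_rate_hz):
    """Whole bits stored in a delay line of the given length (hypothetical helper)."""
    return int(delay_s * bit_rate_hz)


print(f"laser       {laser_delay * 1e6:.1f} us")
print(f"ionosphere  {ionosphere_delay * 1e3:.2f} ms")
print(bits_in_flight(laser_delay, 1_000_000), "bits at an assumed 1 Mbit/s")
```

Even these long paths hold few bits unless the signalling rate is high, which is why they only make sense with fast electronics at each end.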

Discrete physical options

Not well thought out, but a rotating metal drum with magnets in one of two positions might work. The idea is that it would be possible to read and write the position using electromagnets.

Discrete electronic options

I have a two-transistor memory cell which can drive an LED. It should be possible to store the output on a capacitor and so chain these. The idea is that the capacitor stores the previous output and all the read select lines are pulsed at once, so moving a bit pattern one step down. This may well require an additional resistor so that the capacitors don't change state whilst the memory cells are updating.
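The two-phase idea (capacitors hold the old outputs, then every cell is clocked at once) can be sketched at the logic level. This ignores all the analog detail, and the function names are hypothetical.

```python
def shift_step(cells, bit_in):
    """Advance the chain one place; returns (new_cells, bit_out)."""
    # Phase 1: each capacitor holds the output of the cell upstream of it.
    held = [bit_in] + cells[:-1]
    # Phase 2: all read select lines pulse together, so every cell loads
    # from its capacitor and never sees a neighbour that has already changed.
    bit_out = cells[-1]
    return held, bit_out


def recirculate(cells, steps):
    """Feed the last cell back into the first, as a register loop would."""
    for _ in range(steps):
        cells, _ = shift_step(cells, cells[-1])
    return cells


pattern = [1, 0, 1, 1, 0, 0, 1, 0]
print(recirculate(pattern, 1))
print(recirculate(pattern, len(pattern)) == pattern)
```

Building the new state from a held copy is the software equivalent of the capacitor: without it, a bit could ripple through several cells in one pulse.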

Instruction set

b10  b9  b8-b6      b5-b1
0    0   JSR to 9 bit address ending in 0
0    1   LOAD - 8 bit immediate load
1    0   condition  branch relative: -16 to +15
1    1   condition  basic instruction (5 bit)

If top two bits are clear, then JSR to the remaining address (even addresses only). Thus this is subroutine linked Forth but without the overhead of the JSR instruction.

If b10 is clear and b9 is set, load the lower 8 bits as an immediate (possibly sign extended - TBD).

Everything else carries a 3 bit condition (1, lt0, le0, eq0, ne0, ge0, gt0, 0). Half of this space is for relative branching with a 5 bit offset (-16 to +15). The remainder is the basic instruction space:

  • IN input to top of stack
  • OUT output top of stack
  • RET return from JSR
  • NOT bitwise invert top of data stack
  • INC increment the value at the top of the data stack
  • DEC decrement the value at the top of the data stack
  • DROP decrement data pointer
  • SWAP swap top two items on data stack
  • AND/OR/XOR/ADD/SUB operate on top two items and leave one
  • D2R/R2D/R stack manipulation
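As a check on the bit layout described above (JSR to even addresses, 8 bit LOAD), here is a rough decoder sketch. Several details are my assumptions rather than anything fixed by the design: the condition codes are numbered in the order listed, branch offsets are two's complement, the JSR field is shifted left by one to form the address, and the immediate is left unsigned since sign extension is still TBD.

```python
CONDITIONS = ["1", "lt0", "le0", "eq0", "ne0", "ge0", "gt0", "0"]  # assumed numbering


def decode(word):
    """Split a 10 bit instruction word into a tuple describing it."""
    if not 0 <= word < (1 << 10):
        raise ValueError("instructions are 10 bits")
    top = word >> 8              # b10 and b9
    low8 = word & 0xFF           # b8..b1
    if top == 0b00:
        return ("JSR", low8 << 1)          # even addresses only
    if top == 0b01:
        return ("LOAD", low8)              # sign extension TBD
    condition = CONDITIONS[(word >> 5) & 0b111]   # b8..b6
    low5 = word & 0b11111                         # b5..b1
    if top == 0b10:
        offset = low5 - 32 if low5 & 0b10000 else low5   # -16 to +15
        return ("B", condition, offset)
    return ("OP", condition, low5)         # basic instruction number


print(decode(0b00_00000011))
print(decode(0b01_11111111))
print(decode(0b10_011_11111))
print(decode(0b11_000_00010))
```

Writing the decode out like this shows that every 10 bit pattern lands in exactly one of the four groups, with the top two bits doing all the sorting.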

If I can keep the basic instructions to 16 then I can rejig the instruction space so that JSR doesn't have to end in zero. But if I use DIP switches to decode the instructions then it would be nice to allow others to add instructions just by setting these switches. Here is the layout with full 9 bit JSR addressing:

b10  b9  b8-b6      b5  b4-b1
0    JSR to full 9 bit address
1    0   8 bit immediate load
1    1   condition  0   branch relative -8 to +7
1    1   condition  1   basic instruction (4 bit)

Microcode

I find a C-like notation very convenient:

  • P is the program counter, P++ means increment the program counter
  • D is the data stack pointer, D++ increments it (D-- decrements)
  • R is the return stack pointer, R-- decrements it (for pushing a value, as it's a descending stack)
  • *P/*D/*R is the value of the data at the program counter/data stack pointer/return stack pointer
  • Z is a flag indicating that the result of the last ALU instruction was zero
  • N is a flag indicating that the result of the last ALU instruction was negative

INSTRUCTION  Register movement
JSR          *R-- = P ; P = I
RET          P = *R++
LOAD(X)      *D++ = X
B(OFFSET)    IF condition THEN P += OFFSET
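The register movements in the table can be tried out with a tiny state model. This is a sketch under several assumptions of mine, not the real microcode: both stacks share the 16 registers, the data stack starts at register 0 and the return stack at register 15, the pointers wrap at 4 bits, P already points at the next instruction when JSR runs, and RET is modelled as a pre-increment so that it reads back the slot JSR wrote (a loose reading of P = *R++).

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    P: int = 0     # program counter (9 bits)
    D: int = 0     # data stack pointer, ascending (assumed to start at 0)
    R: int = 15    # return stack pointer, descending (assumed to start at 15)
    regs: list = field(default_factory=lambda: [0] * 16)  # the 16 shift registers

    def jsr(self, target):
        """JSR: *R-- = P ; P = I"""
        self.regs[self.R] = self.P
        self.R = (self.R - 1) & 0xF
        self.P = target & 0x1FF

    def ret(self):
        """RET: step R back up, then reload P from the slot JSR wrote."""
        self.R = (self.R + 1) & 0xF
        self.P = self.regs[self.R]

    def load(self, x):
        """LOAD(X): *D++ = X"""
        self.regs[self.D] = x
        self.D = (self.D + 1) & 0xF

    def branch(self, offset, condition=True):
        """B(OFFSET): IF condition THEN P += OFFSET"""
        if condition:
            self.P = (self.P + offset) & 0x1FF


m = Machine(P=5)
m.jsr(0x40)
m.load(7)
m.ret()
print(m.P, m.D, m.R, m.regs[0])
```

Because both stacks live in the same 16 registers, nothing in this model stops them colliding; a deep call chain plus a full data stack would overwrite each other.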

Make believe code

I've not yet written an emulator, or even fixed the instruction set, so none of this is final. Nevertheless, it's useful to write some code to see what is missing.

Flash some lights

LOAD(0) :loop INC DUP OUT B(:loop) 

Learning: DUP is a very common instruction and it may well be worth having a DUP-OUT as well as a DUP instruction. On the other hand, DUP OUT RET is only 3 or 4 words. Is 4x slower and 4x the memory worth it? Maybe it depends on what the microcode decode looks like and how much instruction space there is. AND/OR/XOR/ADD/SUB/D2R are all candidates for an extra DUP or two (e.g. DUP2ADD which is a non-destructive ADD).

Add two numbers

IN IN ADD OUT

Learning: Input has to be buffered, that is the processor should stop if input is not yet available. Perhaps input is done with a 0/1 toggle switch and an add to buffer. Once it's full then the processor can continue. Another add-to-buffer switch which adds 8 copies may well be useful as a 32 bit input is probably all 1s or all 0s in the top bits.

It would be nice to have more than one register as output. Maybe an RPi will feed the input and store all output?

Multiplication

There are only a few registers and no carry bit, so this is just 32bit by 32bit giving a 32bit result. With no LSR it's hard to peel off the low bits and stop when the result is zero, which is a shame as most invocations won't be full width.

Simple first pass with LSR - return stack stores accumulator, works best with last arg +ve (can test and switch):

def MUL
LOAD(0) D2R # set accumulator to zero
:loop
DUP LOAD(1) AND BZ(:skip) # test low bit and skip hard work if not set
DUPDUPADD R2D ADD D2R
:skip
D2R DUP ADD R2D # double the first argument
LSR # halve the second argument
BNZ(:loop) # loop if not zero
DROP DROP # get rid of both arguments
R2D # retrieve result
RET # and exit happy

Where:

def DUPDUPADD OVER OVER ADD RET  # this would be much better with a non-consuming ADD
def OVER SWAP DUP D2R SWAP R2D RET   
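For reference, the shift-and-add scheme that MUL is aiming for can be written out in Python. This is a sketch of the algorithm only, not an emulation of the stack code: acc plays the part of the accumulator kept on the return stack, and results are truncated to 32 bits because there is no carry.

```python
MASK32 = 0xFFFFFFFF


def mul32(a, b):
    """32 bit by 32 bit multiply giving the low 32 bits of the product."""
    a &= MASK32
    b &= MASK32
    acc = 0
    while b:
        if b & 1:                     # test low bit, skip the add if clear
            acc = (acc + a) & MASK32
        a = (a << 1) & MASK32         # double the first argument (DUP ADD)
        b >>= 1                       # halve the second argument (LSR)
    return acc


print(mul32(6, 7))
print(hex(mul32(3, -1)))   # a negative second argument takes all 32 passes
```

The loop runs once per significant bit of the second argument, which is why a small positive value there finishes quickly and a negative one always costs 32 passes.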

Learning: Really need both LSR and non-destructive ALU operations. What is a good naming convention for ALU operations that implicitly encodes the data stack changes?

Division

This is a major challenge as there is no LSR instruction (as this is very hard on a serial ALU).

dippy.txt · Last modified: 2019/07/23 07:14 by admin