RISC-V is too RISCy for Me!
Which CPU's assembly language should I learn next?
For some time I've been hankering to do some assembly language programming again. And that brings up the question: which microprocessor architecture do I want to dig into and really get to know? Back in the day I was quite fluent in Z80 assembly/machine language. I loved writing in assembly. Doing so, I felt like all the chains were gone and I was completely free to do whatever I wanted... granted, at the expense of a lot more writing! ;-) To be a really good programmer, it helps to know what's going on in the CPU under all those layers of languages, OS and libraries.
The wishing road.
Due to IBM steam-rolling the PC market, I was forced from the Z80 into the 8088/8086 world. It was horrible! =-O As always seems to happen with progress in tech fields, I lost many things. I did get a 1MB address space, but the 16 bit segmenting made it feel like I was still in 64K land. The worst part of all was the actual, measured 10x reduction in speed. And that wasn't even accounting for the clock speed differences. With the Z80s running at around 1.75MHz and the IBM PC's 8088 running at 4.77MHz, the PC had roughly a 2.7x clock advantage (4.77 / 1.75), so that 10x measured slowdown works out to nearly a 30x reduction in the work done per clock cycle.
And this is before you suffer the losses of the painfully slow BIOS and M$ DOS calls. I wasn't the only one astounded by this glacial slowness. A Lockheed computer engineer friend of mine once got together with me to try to determine where all those clock cycles disappeared to. We didn't have the time to really trace it down. Today I'm pretty sure I know where a significant portion of them went. But it's immaterial now. :))
In the midst of this forced transition to an architecture I hated, I was blessed with a short period working on a TI 9900 based system. It had a fascinating machine code that I fell in love with. And it was FAST! This got me wondering what my buddies at Zilog were doing in the 16bit realm. I found they had the Z8000, which was a similar architecture to the 9900. I really wanted one! Unfortunately, I have never had the pleasure.
I followed the Intel architecture for a while. But like most things that keep getting propped up in the name of progress, a lot of stuff kept getting glommed on top of the original architecture. Basically I became even more disillusioned. Later Intel announced the Itanium, with a really bizarre architecture and its 3-instruction opcode bundles. Not something I could get excited about. AMD finally said "enough is enough" and added the long awaited 64bit extensions to the 32bit extensions to the 16bit extensions to the 8bit architecture. I really long for a less cluttered, cleaner architecture.
IBM, Apple and Motorola got together and came up with the PowerPC (PPC) architecture. It looked like it had promise. But Macs have always been priced way above what they're worth, and industry momentum was pushing the IBM, M$, Intel path. Not long after that, Be announced the BeBox, a dual-PPC PC. I drooled as I read the write-up in Nuts & Volts (June '96), and the idea of a fresh, non-M$ OS made it even more exciting. Unfortunately, at the time, I could never scrounge up the cash for such a frivolous purchase. And then it died.
The PPC always occupied a corner of my mind as a possible CPU I could fall in love with. Shortly after I upgraded my home PC to an AMD K6 @ 266MHz, I was gifted a cast-off blue iMac. It had a PPC chip @ 233MHz. So I thought, "Let's see if this chip is all hype, or if it can compete with the Intel strain." Now, my K6 benched faster than the Pentiums at the same clock, and I had carefully selected the components for optimal performance. It was pretty darn fast running Linux. Well, I found Linux for the iMac too! So I thought, "Finally! An apples-to-apples comparison!" With all the same software, the iMac kept up with my K6 and was even a tad faster at times! Cool! Remember, the iMac was clocked 33MHz slower while doing the same things just as fast. IMO that answered the question: the PPC was better than x86!
So I had planned to put some time into learning the PPC... but time never permitted before Apple closed the book on their PPC family. At that point the CPU basically became unobtainium, and pointless to pursue.
The current crop.
Then ARM arrived on the scene in various small devices. And I began to wonder: is this a CPU I can fall in love with? I made a few attempts to get to know its assembly language, but between its long heritage, its collection of baggage and its horrible documentation, I always ran out of time before I could make any headway in finding the useful information.
Then RISC-V came on the scene. I thought, "Wow! Can't get any fresher than that!" And it's open source! So it can be extended and enhanced. Maybe it will gain momentum and become the nirvana of modern computing. Well... no, at least not for me. Apparently ignorance reigns supreme in its design.
I started keeping an eye out for an affordable RISC-V device, and I finally saw something that fit the bill and tripped my trigger on multiple levels. Without going off on a rant against iOS and Android, the ClockworkPi DevTerm looked like a portable handheld I could find really useful. At the time they offered both RISC-V and ARM compute modules to power it. I decided to get the 64bit RISC-V module, figuring that if I didn't like it, I could get an ARM brain for it for around $40 more. No biggie!
NOTE: Unfortunately, as of this writing, they have dropped all of the ARM modules except for the Raspberry Pi CM4 adapter. I'll have to get me one of those and the respective Pi module.
I'm not usually one to groan about slow clock rates or lackluster performance. After all, I started at 1.75MHz, and I was impressed with what all those millions of clock cycles could accomplish. But this RISC-V design is astounding in how little it gets done with 1GHz (a BILLION clock cycles per second). I spent quite a bit of time testing it and documenting those tests on the ClockworkPi forum. I couldn't pin down where the slowness was coming from, other than the architecture itself. The short of it is that my C.H.I.P., an Allwinner R8 sporting an old generation 32bit single-core ARM running at 1GHz, is a speed demon (2x faster) compared to the Allwinner D1's 64bit RISC-V single core at 1GHz. And the R8 is considered slow by today's standards.
I could probably live with that, since software I write myself seldom needs a whole lot of CPU cycles, and I bought this mostly as a portable tool I could code on and print from. But the point of this exercise was: could I love the RISC-V architecture? So I dug up some docs on its architecture and machine instructions and set about studying it. The RISC-V docs are almost as bad as the ARM docs when you're trying to piece the bits together and familiarize yourself with the ISA. But a number of people have condensed the docs into something someone starting with the ISA can actually use.
I didn't get far. I liked the depth of the register file. But I immediately tossed the docs when I read that there are no CPU flags. Huh?!?! The author of the doc I was reading decided this point was important enough to quote directly from the official architecture spec. I'm not going to burn the time to dig up the exact quote, but it boiled down to the designers thinking the only purpose of these "wasteful" flags was conditional branching. IMO this is an alarming level of ignorance for a group trying to set an international computing standard. While a large percentage of flag use does boil down to temporary tests and branching (like: compare, jump if not zero), there is a whole other set of situations where these flags come into play. Obviously they never spent time in the 8bit realm, where it was common to chain 8bit additions through the carry flag to perform 16bit or larger math. There are also times when you want to keep the results and conditionally branch on them. All of these common situations require 2 to 4 extra instructions, extra registers and/or RAM locations to do the same work. IF the architecture were sooo efficient that all of these happened in a single clock... it might compete... maybe. But my results show it's really slow.
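To make the carry-chaining point concrete, here's the classic 8bit idiom, sketched from memory (the labels num1 and num2 and the 4 byte width are just for illustration). The carry flag silently threads each 8bit add into the next, so a 32bit add is nothing more than a 4-pass loop:

    ; Z80: 32bit add, (num1) = (num1) + (num2), least significant byte first
        LD   B,4          ; four bytes to process
        LD   HL,num1      ; HL -> first operand (and the result)
        LD   DE,num2      ; DE -> second operand
        OR   A            ; clear the carry flag
    addloop:
        LD   A,(DE)       ; fetch a byte of num2
        ADC  A,(HL)       ; add the matching byte of num1, plus last carry
        LD   (HL),A       ; store the result byte
        INC  HL           ; 16bit INCs don't disturb the carry
        INC  DE
        DJNZ addloop      ; ...and neither does DJNZ

On RISC-V there is no carry flag to chain, so each word of a multi-word add has to recompute the carry with an unsigned compare (sltu). A 128bit add on RV64 looks something like this (again a sketch; the register assignments are arbitrary):

    # RV64: (a1:a0) = (a1:a0) + (a3:a2), a 128bit add from 64bit halves
        add  t0, a0, a2    # add the low halves
        sltu t2, t0, a0    # carry = 1 if the sum wrapped below a0
        add  t1, a1, a3    # add the high halves
        add  t1, t1, t2    # fold the carry back in
        mv   a0, t0
        mv   a1, t1

Every extra word in the chain costs at least one more compare and a spare register just to hold the carry, which is exactly the overhead I'm complaining about.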
Where's the love?
Well... I imagine the RISC-V stuff is good for someone. I'm also sure there will be those who find beauty in it and defend it vigorously. But not me. The proof is in the pudding, and it's really slow doing the same things my other computers do, even after adjusting for clock rate.
As it happens, I was browsing my October '91 issue of Circuit Cellar Ink this past week. Much to my amusement, I saw an article about the "Acorn ARM", written by Tom Cantrell. It comes off as a CISC vs. RISC debate until you really get into it. I found it fascinating, since I'm heading back to the ARM camp for another look. It really is the only other architecture readily available.
From what he documented of the instruction architecture, I was immediately impressed. Apparently he wasn't. :-D He was coming from the perspective of embedded devices and pointed out that, in '91, the ARM required substantial supporting hardware. I wonder if he's taken a look at the field now, with its plethora of ARM based microcontrollers and SOCs. It's almost exactly what he ordered! ;-)
Back in the Z80/6502 days, I took a look at the 6502. Frankly, I was impressed with the software I saw running on the Atari game consoles and the Apple ][ computers. They felt unusually fast. I really had nothing to compare it with, but it always piqued my curiosity. Then I took a brief look at the 6502's instruction set and said, "No way!" I was not going to strap myself to just three 8bit registers when the Z80 gave me eight that could be ganged into four 16bit registers, plus three other 16bit registers. And frankly, there was nothing I saw those 6502s doing that made me say, "Wow! That's faster than the Z80 could do it." While I did have people watch me do things with my Z80s and say, "Wow! The [IBM] PCs can't do that, that fast." "Yeah, I know."
Tom's article has whetted my appetite. I think it's an architecture I can like, if not "love". But since it's the only other widely available option, that hardly matters. It will be better than the RISC-V and x86 paths, even if it's encumbered with its own legacy baggage.
I'm not a CISC guy or a RISC guy. I think a good CPU falls somewhere in between, assuming there is an in-between. I suppose what I'm looking for is a CRISC architecture: a Complexity Reduced Instruction Set CPU. Start with a lean and consistent core set of operations that work well with each other. Then add some (not many) instructions to accelerate common operations like multiply, divide, memory move and memory search... but only if the hardware backing an instruction can do the job faster than the equivalent machine language routine, and the hardware isn't a burden. Don't get carried away with a constant parade of additions just so you can keep shouting *NEW*. And don't cling to some ideology that isn't effective in the real world and only serves to handicap your product. The Z80's block instructions are a taste of exactly the kind of accelerator I mean, as sketched below.
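Here's LDIR collapsing a whole copy loop into one instruction (a from-memory sketch; src, dst and count are placeholders):

    ; Z80: copy BC bytes from (HL) to (DE) with one accelerator instruction
        LD   HL,src       ; source address
        LD   DE,dst       ; destination address
        LD   BC,count     ; byte count
        LDIR              ; copy, bump HL and DE, loop until BC = 0

    ; versus the same job spelled out in core instructions:
    copyloop:
        LD   A,(HL)       ; fetch a byte
        LD   (DE),A       ; store it
        INC  HL
        INC  DE
        DEC  BC           ; 16bit DEC sets no flags...
        LD   A,B
        OR   C            ; ...so test BC for zero by hand
        JR   NZ,copyloop

That's the bargain I want from a CRISC design: keep the core lean, and only spend silicon where one instruction genuinely beats the hand-rolled loop.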
Whatever your preference, Happy Computing!