I really hope my text here is helpful and that it answers at least some of your questions. Please feel free to ask for clarifications if I am not helping.

I often say that the Forth programming language has expressive power approximately equal to that of the C programming language. In some ways, it's more expressive (a superior ability to factor code); in others, it's less expressive (it lacks built-in concepts like "struct" and "array"). But, on the whole, everything that can be expressed in C can be expressed in Forth in a similar amount of code.

However, when I say this, I literally only mean the C programming language itself, and not any of its standard libraries. A lot of the functionality one thinks about when they think of programs written in C comes not from the language per se, but from its standard libraries. Here's an example of what I mean: compare the strlen function in C versus in Forth:

size_t strlen(const char *p) {
    const char *q = p;
    while (*q)
        q++;
    return q - p;
}

: strlen ( addr -- n )
DUP BEGIN DUP C@ WHILE 1+ REPEAT SWAP - ;

Given that we're working with a systems programming language in both cases, the rest of the differences between C and Forth must boil down to a combination of the standard libraries offered in each, and the philosophy which guides both.

If you'll indulge a slightly different way of thinking about programs, command line programs can be thought of as subroutines which you call from a different programming language (specifically, the shell). They have a starting point (in C, the function named main), they return results (characters printed to stdout or stderr, plus the result code), etc. How this is done is different between operating systems; Windows is very different from OS/2, which is different still from Unix, which is different again from AmigaOS, etc. That C can work well across all these environments is, in large part, thanks to the standard libraries that C frequently ships with.

For example, main takes two parameters: argc, which provides a count of the arguments provided on the command-line, and argv, which is an array of pointers to strings, each holding a single parameter as typed on the command line. But, something has to provide that. In the case of Unix, the kernel actually provides this information. In the case of Windows and AmigaOS, the C startup code parses a flat string into these things to provide a compatible interface.
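To make that startup-code step concrete, here is a minimal sketch in C of turning a flat command string into an argc/argv pair. This is my own illustration, not code from any real C runtime: split_args is a hypothetical name, it splits on whitespace only, and it ignores the quoting and escaping rules that real startup code has to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split `line` in place on whitespace.
   Stores up to max_args pointers into argv and returns how many
   were stored (the "argc"). Quotes and escapes are NOT handled. */
int split_args(char *line, char **argv, int max_args) {
    int argc = 0;
    char *p = line;

    assert(line != NULL && argv != NULL);

    while (*p != '\0' && argc < max_args) {
        /* Skip whitespace in front of the next argument. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        argv[argc++] = p;       /* the argument starts here */

        /* Find the end of the argument. */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';        /* terminate it, then move on */
    }
    return argc;
}
```

The string is modified in place, so the argv entries simply point into the original buffer; no dynamic memory is needed.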

As it happens, you can also do the same thing in Forth; however, Forth doesn't have this facility built-in. It's something that you either need to code yourself, or you need to rely on an interpreter-specific package for this purpose. For example, GForth (for Linux) offers the next-arg, arg, shift-arg, and even argc and argv words for this purpose (see gforth.org/manual/OS-command-l). PygmyForth for MS-DOS, however, uses a completely different mechanism.

As you might expect, a Commodore 64 won't provide any such mechanism, since it has no CLI of the kind that Unix or Windows provides. That raises the question: is there an alternative way to accomplish the same goals as a CLI on a computer which doesn't offer one? In other words, working entirely within Forth, with no external dependencies, can we get a similar experience without writing an entire clone of a typical shell interface?

There are several approaches.

The first is to create a kind of domain-specific language that is interpreted from a string input. So, in Forth, you could specify something like:

S" file1 file2 -p" COPY

where COPY is the Forth word which processes these arguments and, perhaps, copies file1 to file2 preserving permissions settings. Making a lexer that can split arguments apart for this kind of input can be done in about 60 lines of commented Forth code. (The link is to a lexer that I use in a compiler project; but, the same sort of steps are needed in both applications.) These can be shared across all your utilities; but, each utility will need more code to actually make sense of what the lexer is reading out.
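For comparison, this is roughly what the "make sense of what the lexer is reading out" step looks like when written in C. It is only a sketch under my own assumptions: the copy_args structure and parse_copy_args function are hypothetical names, strtok stands in for the lexer, and -p is the only option understood.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parsed form of a COPY request. */
struct copy_args {
    const char *from;   /* first non-option token */
    const char *to;     /* second non-option token */
    int preserve;       /* nonzero when -p was given */
};

/* Tokenize `line` in place (strtok writes NULs into it) and fill
   in `out`. Returns 0 on success, or -1 on an unknown option or
   the wrong number of file names. */
int parse_copy_args(char *line, struct copy_args *out) {
    char *tok;

    assert(line != NULL && out != NULL);
    out->from = NULL;
    out->to = NULL;
    out->preserve = 0;

    for (tok = strtok(line, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        if (tok[0] == '-') {
            if (strcmp(tok, "-p") != 0)
                return -1;          /* unknown option */
            out->preserve = 1;
        } else if (out->from == NULL) {
            out->from = tok;
        } else if (out->to == NULL) {
            out->to = tok;
        } else {
            return -1;              /* too many file names */
        }
    }
    return (out->from != NULL && out->to != NULL) ? 0 : -1;
}
```

The tokenizing loop is the part that could be shared between utilities; the if/else chain is the per-utility part.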

To accept input directly off the Forth command line, you'll want to first parse to the end of the line to grab a string, and then submit that to a word which already works with arbitrary strings. This is basically how the \ word works to implement single-line comments: it parses to the end of the line, then just ignores what it read.

Finally, you could "forward parse" characters right from the Forth input buffer directly. I prefer people avoid doing this because it is harder to test and can actually be somewhat less portable; but, it is a definite possibility. The Forth : (colon compiler) works this way. The >IN variable is used to keep track of where you are in the current input buffer, so you'll be using this variable frequently to index into the input buffer.

Another approach, and one which is much more idiomatically Forth, is to use a Builder pattern (or "fluent interface" as it's also known in some communities). In this case, you don't write "a" Forth word which represents "a" single program. Instead, you have multiple words which work together to produce the same effect. It's more verbose; but, it is also a whole lot less work too. For example, to copy file1 to file2 using this approach, you might write:

COPY S" file1" FROM S" file2" TO GO

In this scenario, COPY doesn't do anything more than reset the internal state of "the program" (for example, clearing the input parameter buffers and resetting options to their default values). FROM and TO set parameter buffers to their user-specified values. Finally, GO is the word which actually performs the copy. In its most advanced form, COPY can change vocabularies, so that FROM and TO are understood in the context of a file copy operation, while GO can restore the previous vocabulary. This would let you run multiple commands in sequence without needing to do anything special. Of course, this would only work on Forth systems that support vocabularies. Here's what I think it'd look like:

COPY S" source.asm" FROM S" source.bak" TO GO
ASSEMBLE S" source.asm" FROM S" source.obj" TO S" stdlib.lib" LIBRARY GO
LINK S" source.obj" FROM S" stdlib.obj" FROM S" output.exe" TO GO

There are definitely options available for processing command-line arguments. None of them are as convenient to use as a dedicated shell language, which is what Bash is, but you can get pretty close!

Now, this might raise some issues concerning memory management. Not all Forth systems support dynamic memory management. In C, you have malloc() and free(); more precisely, in C's standard library, you have malloc() and free(). C itself, like Forth, is completely unaware of dynamically managed memory!

Modern, ANSI-compliant Forth systems offer two words for managing memory dynamically: ALLOCATE and FREE. For Forth environments that pre-date ANSI or which do not support that wordset, you can actually do a lot with just ALLOT. HERE provides a pointer to the next free byte in the dictionary, and ALLOT is used to advance HERE. It's literally what functional programming language implementors call a "bump allocator."
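If you think more easily in C, the HERE/ALLOT pair can be sketched like this. It is an analogy of my own, not how any particular Forth implements its dictionary: the arena array, its 4096-byte size, and the here/allot function names are all assumptions, and unlike the Forth word, this allot reports failure instead of running off the end of memory.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096

unsigned char arena[ARENA_SIZE];    /* stands in for dictionary space */
size_t here_offset = 0;             /* plays the role of HERE */

/* Like HERE: the address of the next free byte. */
unsigned char *here(void) {
    assert(here_offset <= ARENA_SIZE);
    return arena + here_offset;
}

/* Like ALLOT: move HERE by n bytes; a negative n hands space back.
   Returns 0 on success, -1 if the request would leave the arena. */
int allot(long n) {
    long next = (long)here_offset + n;

    if (next < 0 || next > ARENA_SIZE)
        return -1;
    here_offset = (size_t)next;
    return 0;
}
```

Allocation is one addition, and "freeing" is the same addition with a negative argument, which only works if you release things in the reverse order you allocated them.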

1/2

So, one of the things you might be able to do is copy strings into temporary buffers at HERE on an as-needed basis. Here's a sample implementation for a hypothetical COPY operation:

VARIABLE 'here0
VARIABLE 'from
VARIABLE 'to

: COPY
HERE 'here0 ! ( set 'here0 = current HERE value )
'from OFF 'to OFF ( reset our parameter settings )
\ ...
;

: placeString ( caddr u -- )
( Copies string to HERE, then advances HERE to allocate it. )
DUP C, DUP >R HERE SWAP MOVE R> ALLOT ;

: FROM ( caddr u -- )
HERE 'from ! placeString ;

: TO ( caddr u -- )
HERE 'to ! placeString ;

: GO ( -- )
\ ... do the copy stuff here ...

\ OK, the copy has been performed. It's time to
\ "free" all the memory we used. Calculate
\ 'here0 - HERE which will be a negative number.
\ Thus, ALLOT counter-"bumps" HERE, thus freeing
\ the space we used above.
'here0 @ HERE - ALLOT ;

To retrieve the contents of a string, you can do something like 'from @ COUNT to recover both the character address and its length on the stack. If your dialect of Forth doesn't have the COUNT word, it is easy to make yourself as well:

: COUNT ( a -- caddr u )
DUP C@ SWAP 1+ ;

There are some things which most Forth implementations lack in relation to other operating systems like Unix. The biggest omission you'll likely encounter is a lack of redirectable I/O; more precisely, a lack of easy-to-use I/O redirection. Most Forth systems allow you to revector EMIT, TYPE, KEY, etc. to support other I/O devices. However, again, this is the sort of thing you need to write yourself. There's nothing standardized, and there's nothing at all even approximating pipes between processes.
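In C terms, revectoring is just swapping a function pointer. Here's a rough sketch of the idea; every name in it (emit, emit_to_stdout, emit_to_buffer, type_string) is made up for illustration, and a real Forth would do this with whatever vectoring mechanism that system offers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*emit_fn)(char c);

char capture_buf[64];       /* where redirected output lands */
size_t capture_len = 0;

/* Default behaviour: send the character to the console. */
void emit_to_stdout(char c) {
    putchar(c);
}

/* Alternative target: collect characters in memory instead.
   Characters beyond the buffer's capacity are silently dropped. */
void emit_to_buffer(char c) {
    if (capture_len < sizeof capture_buf - 1) {
        capture_buf[capture_len++] = c;
        capture_buf[capture_len] = '\0';
    }
}

/* The vector itself; plays the role of a revectorable EMIT. */
emit_fn emit = emit_to_stdout;

/* A TYPE-like routine: it never knows where its output goes. */
void type_string(const char *s, size_t n) {
    size_t i;

    assert(s != NULL);
    for (i = 0; i < n; i++)
        emit(s[i]);
}
```

Assigning emit = emit_to_buffer redirects everything built on top of it, and assigning emit_to_stdout back restores the console. That's the whole trick, but notice that you have to build it yourself.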

A lot of this is largely due to the over-arching philosophy differences between Unix and Forth. If I may over-generalize significantly, Unix and its progeny were initially designed to allow AT&T employees to support text processing applications in a multi-user environment on time-sharing computers. Since multiple users were operating a single computer, protection barriers between concurrently running processes were a requirement. These people were intimately familiar with IBM mainframe operating systems of the day, and were looking for something easier to use, while keeping all the good bits they liked. You can think of Unix as a kind of "MVS, the Good Parts."

Forth, on the other hand, was intended to support the fabrication of carpets originally, and only later the data acquisition and control of NRAO radio telescopes. The computer systems used were intended to be single-user and extremely cheap, as factory floors and science departments at universities are almost always cash-strapped. (Although the latter application did support multiple online users at once, which was a feat of engineering for the day.) The system resources were so small that even Unix 1st Edition wouldn't fit even if Chuck Moore had cared enough to try. And, once installed, the environment would be a long-lasting environment; the idea of changing applications frequently wasn't a concern except during development. Even then, with all applications being full-screen, the idea of multitasking in the way we think of it today didn't apply. Unlike Unix, there are no object files; programs always loaded from source directly into binary representation. Thus, source was always available, and that meant no need for configuration files (nor the parsers needed to process them), and indirectly, no need for parsing parameters from command-lines. You just hacked the program directly to do your bidding and things got done. In fact, there wasn't even a filesystem at all; early Forth systems just used raw block numbers on disk drives. You were responsible for allocating disk space manually. Forth evolved as a response to growing complexity on IBM minicomputers, which meant people were already familiar with hacking "job control language" files and manually allocating "data sets" anyway, so this was all perfectly natural. By analogy, you can think of Forth as "*IBSYS, the Absolutely Essential Parts and Nothing More."

Where Unix was designed to support consolidation of multiple users on a single machine, Forth advocated even then lots of small, inexpensive, single-purpose environments which communicated with each other using message passing techniques. Unix philosophy strove for long up-times by controlling what resources users could access. Forth philosophy strove for long up-times through viciously cutting complexity to the bone and, ironically, "failing fast." Users were given full access to the machine, and crashes were learning experiences. It is interesting to note that both approaches work quite well, and are in common use today. Linux machines can have stellar up-times if maintained well, and a lot of utility companies run optical networking switches that use Forth as their operating system and run for years at a time.

These philosophical differences continue to permeate the two environments to this day.

2/2

I goofed on the definition of COUNT. It should read:

: COUNT ( a -- caddr u )
DUP 1+ SWAP C@ ;
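For anyone who wants the counted-string layout spelled out in C: one length byte, followed by the characters, with no terminating NUL. This is just an illustrative sketch; place_counted and count are names I'm making up here, and the 255-character limit comes from using a single byte for the length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store src[0..len) at dst as a counted string: a length byte
   followed by the text. Returns the number of bytes written. */
size_t place_counted(unsigned char *dst, const char *src, size_t len) {
    assert(dst != NULL && src != NULL);
    assert(len <= 255);             /* the length must fit in one byte */
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
    return len + 1;
}

/* Like COUNT: given a counted string, hand back the address of
   its first character and store its length in *len. */
const char *count(const unsigned char *counted, size_t *len) {
    assert(counted != NULL && len != NULL);
    *len = counted[0];
    return (const char *)(counted + 1);
}
```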

Apologies if this message seems out of context. It was intended to be a response to a private message and somehow it ended up going public. Sigh. :(

@vertigo thanks for the write-up! I learned something. and there's a chance I'll write some Forth for the first time since the project has a minimal Forth bootstrapped from the tiny hex seed, just nothing built with it yet :blob_raccoon_peek:

@theruran Meanwhile, I'm happy to report that my "actors" and "message queue" system for my System Forth implementation is coming along nicely, even if I am two months late with it.

Test-driven development enabled by Shoehorn and my ForthBox emulator is a very pleasant experience.

@theruran It has proven useful. :)

Here's the current body of code for my "actors": git.sr.ht/~vertigo/forthbox/tr

Here's the startup code/primitive library it depends upon: git.sr.ht/~vertigo/forthbox/tr

The current chunk of code compiles to about 2KB of binary, about 1.5KB of which is unit test code. So, just a smidge over 512 bytes for the actor logic.

@theruran Oh, I forgot to include a link to the design docs for the actor concept. I think you've read this before, but not sure.

git.sr.ht/~vertigo/forthbox/tr

@vertigo I don't see that they have developed structs/records like GForth provides, although they list "Stores and handles structured items (strings, queues, lists, stacks) on stack and return stack. No memory required." :thounking:

@theruran Structures are relatively easy to add into Forth; for that matter, a single-dispatch object system can be done in only a handful of lines of Forth.

@vertigo @theruran thank-you both for this conversation. I’m back on my forth learning kick and it’s inspiring to see other people talking about applying it to interesting problems.

@requiem @theruran Happy to help where I can. (Slowly hides his Rust Programming Language book from view.)

@vertigo @requiem I've been thinking about cutting out C entirely from the bootstrap chain to an Ada compiler. that would be a significant improvement, I think!

@theruran @requiem For my ForthBox system, off in the distant future, where dreams survive on cotton-candy and lollipops, I was planning on writing a minimal C compiler in Forth, for use right inside the Forth environment.

Just spit-balling here, but imagine something like:

: K>F ( n -- n ) ( convert kelvin to degrees F )
273 - 180 100 */ 32 + ;

: typez ( a -- ) ( Print a nul-terminated string )
BEGIN DUP C@ DUP
WHILE EMIT 1+
REPEAT
2DROP ;

C-CODE
___ImportFromForth___ {
    int to_F/"K>F"(int);
    void puts/"typez"(char *);
    void puti/"."(int);
}

/* Convert degrees Fahrenheit to degrees Celsius. Multiply before
   dividing so integer division doesn't throw away the result. */
int
convert(int degF) {
    return ((degF - 32) * 100) / 180;
}

void
print_f_deg(void) {
    int temp;

    puts("Water boils at ");
    temp = to_F(373);
    puti(temp);
    puts(" degrees Fahrenheit.\n");
}

___ExportToForth___ {
    int convert/"degF->degC"(int);
    void print_f_deg/"water-boils"(void);
}
END-C-CODE

." Water boils at "
212 degF->degC 273 + .
." Kelvin." CR

water-boils

I strongly suspect, though, that this will have to wait until I have the RISC-V version of the ForthBox running. I don't think the 65816 version will let me have a big enough dictionary space to support both Forth and C compilation at the same time.

@vertigo This is a nice property that well-integrated, well-curated stacks share. A language is a sweet spot in the tension between power (can say a lot) and convenience (things you want to say most often are short). When the set of things you want to frequently say grows too large, create a new language. When programs in a language are too slow, bump down to a lower language.

@vertigo As you point out, Forth doesn't need a shell because there's no concurrency and so no process or file abstraction to streamline access to. Since it has no shell, commandline args are moot.

So much has been lost as we frame our thinking in terms of single languages rather than stacks of languages. The goal of a HLL is not to avoid having to learn a LLL. "The best tool for the job" is not "the tool I know best as I do the job".

@akkartik

"The best tool for the job" is not "the tool I know best as I do the job".

Brilliantly expressed! I'd put that on a T-shirt if I could.

@vertigo If this is any consolation, it was a great read!

@vertigo a great post. I never wrote Forth professionally, but IMO 'Starting Forth' is one of the great books on software development.

@dougfort Indeed, especially when coupled with Thinking Forth.

However, considering the prevalence of Unix-y operating systems today, figuring out how to use Forth in the context of most system software today can be frustrating. Starting Forth shows its age here.

@vertigo you might want to use CW in the future to be polite. This took up most of my phone screen for several swipes

@dualhammers like I already said elsewhere, this was an accident. It was originally intended to be a private message.

@vertigo okay. I didn't see the elsewhere, only that message got RT'd into my feed

@dualhammers Thanks! I just wish I was paying more attention when I hit the Toot button.

hackers.town

A bunch of technomancers in the fediverse. This arcology is for all who wash up upon its digital shore.