…to bss rather than text section which avoids the need to call mprotect(), rename things
… be wrapped with PROGRAM / END, also removes automatic bye token that was generated by END
…time.seedsource, so that we can run textual forth code without the tests or the banner
… writes to stderr, fix self-hosted tokenizer termination issue (was debugged with eemit)
e3b1a9b to
7115f49
Compare
More hacking... what I set out to do was to make the seedForth tokenizer self-hosting, so that after bootstrap you would not need gForth to develop applications. So my idea was to make the tokenizer work in gForth as it does now (for bootstrapping) and also work in the interactive version of seedForth (for application development). It turned out to be quite difficult, but ultimately it works.
So the actual changes to `seedForth-tokenizer.fs` to make it run under seedForth were not that huge, mainly a matter of accounting for seedForth's case sensitivity and restricted syntax for hex and character literals and various things like that, as well as minor differences in the words available (`parse-name` instead of `<name>` etc). But the larger difficulty was in making a `seedForth` or `seedForthInteractive` program run cleanly as a filter. I had to modify the runtime library and I/O system a lot.
There was also another issue to deal with which concerns the wrapping of the `*.seed` and `*.seedsource` files. Originally the input was wrapped in `PROGRAM` / `END` and the output was wrapped with an automatic `bye` token added at the end. I have removed the need for all of this wrapping, at the cost of its being slightly more awkward to invoke the `gForth` version of the tokenizer. Since this is only done from the `Makefile` during bootstrap, that's not a big deal. It's only just occurred to me now that the unusual extension `*.seedsource` was probably due to the wrapping, so maybe we can rename them to `*.forth` now?
Here is a detailed summary of all the changes I have made to support the self-hosting tokenizer:
- […] `./seedForth-tokenizer` script, which operates as a filter and takes a `*.seedsource` file on `stdin` and outputs the corresponding `*.seed` file on `stdout`. It works similarly to `./seed` by concatenating the various input files into `./seedForth`.
- […] `cat` into every compiled preForth/seedForth application, so it will either process `stdin` if there are no command line arguments, or else open and read each file specified on the command line in sequence, where `-` is `stdin`. The effect of this change is that when running `./seed` you no longer need to press Enter after typing `bye` to make `seedForth` quit. The extra keystroke was needed to force the front-end `cat` invocation to try to send something to `seedForth`, at which point it would realize the pipe was broken and quit. With `seedForth` managing its own input, you can quit cleanly.
- […] `key?` word use a `poll()` rather than `ioctl()` system call, since we now expect that `stdin` might come from a file.
- […] `eemit` word throughout the system which is the same as `emit` but writes to `stderr`. I use this for debugging.
- […] `key` and `emit` to higher token numbers, to make it easier to detect the `EOT` character which used to correspond to the `key` token. Implement a new `eot` token at 4 which is similar to the `bye` token. The reason for this change is that the `bye` token was overloaded to act as `[`, i.e. it would restart the interpreter after compiling the `;` token and during certain control flow constructs. This meant you couldn't compile a `bye` token into a program. Moving the original usage of `bye` onto the new `eot` token means `bye` is no longer special and can be compiled normally, while the changes to the existing system stay minimal. As a bonus, if input runs out during a `:`-definition, the resulting EOT will be interpreted as `[` and send us back to interpretive state, where a further EOT is considered invalid and quits the interpreter too.
- […] `seedForthInteractive.seedsource` into a new `seedForthRuntime.seedsource` and from `hi.forth` into a new `runtime.forth`. The original files `seedForthInteractive.seedsource` and `hi.forth` still exist and contain all the tests as well as less essential words like `sqr` which you can grab if you actually need them. To use the system as it was previously, you have to tokenize `seedForthRuntime.seedsource` + `seedForthInteractive.seedsource` into `seedForthInteractive.seed` and then pass it `runtime.forth` + `hi.forth`; the `Makefile` and `./seed` script have been updated appropriately. But when running the self-hosted tokenizer, it uses a different `*.seed` file which is basically generated by tokenizing `seedForthRuntime.seedsource` + a call to `boot`, and it uses the `runtime.forth` without the `hi.forth` part.
- […] `tib` from 80 to 255 characters, also fix a bug in `accept` which allowed it to write one character beyond the `tib`. Note that some source lines in the original system were longer than 80 characters, and I think they may have been silently truncated and the incomplete code not noticed. The extra character seemed to cause a crash on my Z80 port, which alerted me to the issue. There isn't really a good way to flag too-long lines to the user, but I have at least made it not echo any extra characters.
- […] `accept`, `refill` and `restart` words to detect EOT and return something or quit. This is needed to prevent the self-hosted tokenizer from hanging after it tokenizes all the input. I'm not entirely happy with the solutions I came up with here, and I think possibly the entire concept of using EOT as a marker for the end of input might be flawed. Could we make `key` throw an exception instead? I'm not a very experienced Forth programmer so I don't really know how this would be done conventionally. But at any rate, you can now exit from `./seed` by typing Ctrl-D (no Enter) or `bye`, and I find the first more comfortable. Just be aware that a partial last line is not supported, either in a `*.seedsource` file or a `./seed` session; it will say "not found".
- […] `echo` and `input-echo` be off. That's because after loading a seedForth runtime from a `*.seed` file, you will always want to load further runtime as textual Forth source. So it's cleaner to let the second runtime enable echo. This makes the output of `./seed` cleaner as well. But primarily it's needed to avoid junk getting into the tokenized `*.seed` files.
- […] `DO`/`?DO`/`LOOP`, as the experimental `?DO` that was commented didn't have a correct companion `LOOP`.
Some of the more detailed changes might not be well explained in the above summary, or might be objectionable for whatever reason, so please feel free to check with me. Also, keep in mind that this changeset is "on top of" the previous changeset that I PR'ed the other day, so GitHub will show both changesets. It's annoying the way GitHub does this, and it does not recalculate the changeset after you merge the first PR. But you can force it to, by changing the base branch name and then changing it back.
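For anyone reviewing the input-handling items in the summary above, here is a rough Python sketch of the two ideas: `cat`-style input selection (no arguments means `stdin`, `-` means `stdin`, anything else is opened as a file) and a `key?`-style readiness check based on `poll()`. This is only an illustration of the behaviour described, not the code in this PR, which lives in the seedForth runtime. The names `open_inputs`, `read_all` and `key_available` are made up for the example, and `select.poll` is assumed to be available (it is not on Windows).

```python
import io
import os
import select
import sys


def open_inputs(args, stdin=None):
    """Yield binary input streams cat-style (hypothetical helper).

    No arguments: yield stdin only. Otherwise yield each named file in
    order, treating "-" as stdin.
    """
    if stdin is None:
        stdin = sys.stdin.buffer
    if not args:
        yield stdin
        return
    for name in args:
        if name == "-":
            yield stdin
        else:
            with open(name, "rb") as f:
                yield f


def read_all(args, stdin=None):
    """Concatenate all inputs selected by open_inputs into one bytes object."""
    return b"".join(stream.read() for stream in open_inputs(args, stdin))


def key_available(fd):
    """Return True if a read on fd would not block (hypothetical key? analogue).

    Uses poll() with a zero timeout, so it works for pipes and terminals,
    and also for regular files, which poll() always reports as readable.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(0))
```

With this shape, `read_all([])` reads only `stdin`, while `read_all(["a.seedsource", "-"])` reads the file first and then `stdin`, which mirrors how the summary describes the built-in `cat` behaviour.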
I had a really good time doing this, even though it involved a lot of head-scratching and dealing with strange crashes and errors and unexpected behaviour. As I mentioned I'm not an experienced Forth programmer, but I've become more conversant with it.
Note: There is a minor bug in this PR, that I had directly invoked `gforth` in `Makefile` instead of `$(HOSTFORTH)`. It is fixed in #12 so I have not fixed it here. If you do want the fixed version of this PR see the branch `self_hosting_tokenizer1` in my github account. I wouldn't recommend using that branch though, because it will cause conflicts later when merging #12 and others.