1 Introduction
What follows is a hodgepodge tutorial designed for my younger brother, George. He had experience with Scheme, Java, and some C. I decided to help him make and build entire free software projects by implementing a rough sort of LISPy list data structure in C.
He was unfamiliar with pointers and with how nested lists are implemented in Scheme. What follows is taken from a series of e-mails from me to him.
2 Pointers
RAM is like a set of numbered mailboxes. It's a big long string of boxes, each with an identification number. If your mailbox is #1 and mine is #2, and you put a note to me ("hello, world.") in #2, that's like a standard variable assignment:
|
But if you put a note in your mailbox that says "See mailbox #2", that's a pointer. In C, it would look like:
|
So, if you look in my mailbox, you'll get the message:
|
But if you look in your mailbox, you'll just get a meaningless number ("mailbox #2"), which is a confusing message unless you realize that you're looking for the location of the real message. So we "dereference" the pointer. That is, we follow it to another mailbox.
|
That is my one-page guide to pointers. If there's something you don't understand, please read it again. If there's still something, mail me before going any further. It's the sort of thing that becomes second nature very quickly, but you just need to suss it out in your head first.
3 Cons Cells
Okay, so the basic Lisp data structure is the cons cell:
|
It contains two pointers (the locations of other variables and structures): AR and DR. These are vestigial names from the IBM 704, the machine that LISP was first developed on. CAR and CDR are named for the operations that fetched the "Contents of the Address part of Register" and the "Contents of the Decrement part of Register".
For a cons cell to be an atom, we'll set the AR to zero (an invalid memory address). The empty list is a cons cell with both AR and DR set to zero.
The only type of atom we have is a string. Strings in C are easy, because they're just a pointer to an array that has 0 as the final element.
So, the atom "hello, world." looks like this:
|
So let's see how this would look if it were mailboxes (I made the space character a _ for clarity):
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | h | e | l | l | o | , | _ | w | o | r | l | d | . | \0 |
In fact, the letters are really just numbers themselves (see the ASCII table for which numbers), but let's look at it:
The cons cell is a two-box data structure. The ar is address 1, and the dr is address 2. What are the contents of the dr? The number 3. Well, since this is a pointer, it means that there's something interesting at address 3. In our case, since strings are the only valid things to point to if the ar is 0, we know that it'll be an array of ASCII characters that ends with the number 0 (the character '\0').
So let's say we cons it onto the empty list (), which is the cons cell [0|0]. We make ("hello, world.") (one atom, since the quotes surround the spaces).
|
So let's see how this would look if it were mailboxes:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | h | e | l | l | o | , | _ | w | o | r | l | d | . | \0 | 0 | 0 | 1 | 17 |
So, the atom cons cell is still in mailboxes 1 & 2, the string still lives in 3-16. But now we have the nil list cons cell in 17,18, and the head of our list is the cell that lives at 19 & 20.
So, if we take the car of the list head, we see that it contains 1, which is the address of an atom cell, whose data is at location 3. Going to location 3, we find our string. If we take the cdr of the list head, we see that it points to location 17, which contains a nil list cell.
So, if we want to cons another atom onto this list, we just make a list cell whose ar points to an atom cell whose dr points to another string. We then set the dr of our new list cell to "19".
4 Simple Predicates
So in C, 0 is false and nonzero is true (even though the Unix authors decided that 0 is success and nonzero numbers indicate which kind of failure).
We should do the same thing, for now. In many lisps, the nil list is false and anything else is true, but we can leave most of the control structures alone for now.
So the functions we want are:
These are the function prototypes, used to let the compiler know ahead of time that functions are going to be used. This allows you to have functions listed in any order, so that if an earlier one calls a later one, the compiler won't complain about not knowing what it is. The prototypes only list the function name and the data types of the parameters and return value.
So, given the criteria listed in a previous e-mail (an atom has a NULL ar, the nil list has two NULL pointers, and a list has two non-NULL pointers), can you finish the following functions?
|
5 Autoconf and Automake
So, here's the rundown for the way autoconf and automake work.
Autoconf originally required that you write hideously long Makefile.in files which contained all of these nice things that would be translated by autoconf. It was a lot like sticking your tongue in an electric fan.
So bear in mind that automake was a bit of an afterthought, and as such it contains many magic variables and silly naming conventions that you just have to respect.
So, here's the list of files you need, relative to the root of your project tree:
Now, I can show you how to do the configure.in later. The key thing is that your top-level Makefile.am contains:
|
...and that your configure.in contains:
|
So the top-level Makefile.am's SUBDIRS says "take the Makefile.ams in all these dirs and turn them into Makefile.ins", and then the AC_OUTPUT says "make these files from the corresponding .in files". You can actually do some nice auto-substitution things by making foo.c.in files which are generalized to insert data that configure finds. That's a little outside the scope of this document, though.
So your src/Makefile.am will be the big important one that lists the targets. Have a look at rfk's src/Makefile.am (which contains a lot of useless crap):
|
This makes some special setup variables, because /usr/games isn't in automake's vocabulary of automatic variables. Had it been destined for /usr/bin, I would have only had to say bin_PROGRAMS without saying "bindir=...".
So you specify the binaries, and then make a variable with binaryname_SOURCES. Then, if any libs are needed, you specify them with binaryname_LDADD. So an example src/Makefile.am would be:
|
But here we want to MAKE a library, not just use one. Well, there's a magical variable for THAT, too:
|
Note that the dot was changed to an underscore.
So, try putting the two library lines in src/Makefile.am and make a dummy doc/ and doc/Makefile.am (leave it blank). Make the top-level Makefile.am with the SUBDIRS line, and put the following in your configure.in:
|
The first two lines initialize automake and autoconf, providing a file to check for, the name of the project, and the version number. The next line just says that we want autoconf to make a config.h header file for us so that we can #include it for the platform-specific settings. Then we tell it that we're making a C program.
Then we check for the C compiler, libtool (the program that manages all that lib_LIBRARIES stuff for automake), and install (which just installs files with given permissions).
Then comes the output line, as explained.
Then we have to actually write some code in lithp.c!
6 CVS
So you've briefly used CVS for robotfindskitten, and basically only as a clumsy way of downloading the latest gargargar. You've made some changes and checked them in, but have you ever browsed through the CVSweb interface on sourceforge?
That ish is pretty damn groovy. This stuff keeps track of changes you've made, and allows you to roll your own combination of revisions onto a file, like "oh, show me what this file would look like if frank hadn't MADE THAT STUPID CHANGE on October 13!". Crazybad!
So it almost sounds like it's the sort of ish that's so hyper-complicated that only greybeards and their PDP-11s will ever be able to configure properly, right? Well, it's obscure, but for basic use it's quite straightforward.
So you can set up a networked cvs repository right here on zork if you like. You can reserve it for your own use, or we can set up a group of zork users who have access. And it's really only one or two commands to do so.
So watch me as I make a CVS repository for nwall, and then check it out from various places:
|
That's really the magic of setting up a CVS repository right there. All it does is just make a CVSROOT directory with some magic files in it. The only trick is that you need to specify an absolute path to the directory. Nothing tricky yet. Now, let's look at nwall.
|
Blah. Lots of symlinks and tarballs and stuff. I'll need to clean up some of the cruft first. I want to make sure I get only the important stuff in the repository, without any binaries or extraneous symlinks.
|
Okay, now it's time to pull it into the directory with cvs import. This is where we need to be careful. What we're doing is importing the current directory and everything under it into the tree. We can specify a name for the tree, and two "tags", which are just vestigial identifiers that aren't important right now.
Since our cvsroot is on the local machine (as opposed to being a remote pserver or directory for us to ssh to), we specify the repository with "-d :local:/home/nick/cvs/".
|
...and up comes the familiar $EDITOR for our log message. I type in a simple "Imported nwall.", save, and...
|
So I think the L means that the file was a symlink, though it'll be converted to an actual file in the repository. The I seems to be an ignore (it doesn't want my vim backup files, which is good), and the N seems to be a New file.
Now, let's have a look here:
|
So it made the directory nwall (the third-to-last word in our import command), and put all these ,v files in there, each representing a file in the actual distribution. Nothing too obscure, here, really. Most of the CVS bookkeeping was put into the CVSROOT files. These ,v files are actually old-school RCS files (if you remember those at all).
So now let's see if I can check it out:
|
Well, that sure was easy. But what about networking? Surely I'll want to check this stuff out from other machines! No problem:
|
...and through the magic of SSH, it is done!
Since CVS is really little more than a magic directory full of RCS revision-log files, we can do standard Unix group permissions things to create repositories for a group of developers. This means that we could make my cvs repository writable by the emgnulation group, say, and then both you and I would be able to check in and out changes (yes, even checking out! You must have write access to the CVS logs in order to check out files.)
So, here's what you should do:
Make a /home/zen/cvs dir, and do "cvs -d /home/zen/cvs init" on zork. On the georgebox, go into the root of your lithp tree (which is probably pretty lightweight at the moment) and do
|
Remember that you MUST be positioned INSIDE the root of the tree for this to work! Don't worry about screwing things up, since you can always remove /home/zen/cvs and try again. You'll know you've got it right when your /home/zen/cvs/ tree looks like my /home/nick/cvs/ tree (though with fewer files).
Now cd into your root directory and move your old lithp directory someplace out of the way, like libithp-old, and do:
|
(This assumes that your CVS_RSH var is still set to ssh. It's best to make sure that's true by putting it in your .bash_profile. You can type "echo $CVS_RSH" to see what it's set to at the command line.)
Now you'll have a directory named libithp that contains your working copy of the cvs repository. Each time you create a new file or directory, you can use "cvs add" to put it in the repository. Also, since all the dirs in the working copy now have subdirs named "CVS/", you no longer need to do the "-d :ext:zen@zorkblahblahblah" any more. All you need do is edit your files and do "cvs ci" from time to time (or "cvs up" if you've checked in changes from someplace else).
I also highly recommend that you put the following in ~/.cvsrc on both the georgebox and zork:
|
The diff -u means that the patches generated by a cvs diff will be in the nicer "unified diff" format. The update -Pd means that you'll actually get new directories when they are made in the repository (saves a LOT of grief swearing over files that didn't get updated and thus broke builds). The cvs -z3 means that it will use gzip compression (level 3) for all transactions (useful on that loaded linfield connection, no doubt).