4 Tutorial

This chapter presents the Knit unit language by taking you through a series of example programs. Complete source code for each of the examples is included in the Knit software distribution, within the examples subdirectory, so you can try out Knit as you read this chapter. The examples used in this chapter include:

Note that this tutorial does not (yet!) cover all of the features in the Knit unit language. For a full and formal treatment of the Knit language, please read the Report on the Language Knit: A Component Definition and Linking Language, which is also part of the Knit software distribution.

4.1 Unit Basics: The hello Example

Our first example has two goals: first, to introduce a few basic Knit concepts such as units and bundles, and second, to take you through the steps of compiling a program with Knit. So, to get started, we will show how to create the standard “Hello, World!” program with Knit.

4.1.1 The Unit Model of Software Components

In Knit, a program is made up of software components called units. A unit is a “logical” wrapper around code, and by “logical” we mean two things:

In other words, a Knit unit is a kind of description of the code that it “wraps.” This description is used when your program is compiled, not when your program is executed. Therefore, a unit describes the things that Knit must know about a piece of code in order for that code to be combined and linked with other units to form a complete system. These things include:

4.1.2 A Unit File

The following unit definition shows how the above-described features of a unit are expressed in Knit. The next few sections will discuss the parts of this unit definition in more detail.

  unit Hello = {
      imports [ io: {printf} ];
      exports [ main: {main} ];
      depends { main needs io; };
      files { "hello.c" };
  }

This definition comes from the file examples/hello/hello.unit. As you can see, Knit unit definitions are written in a textual specification or “programming” language. These definitions are grouped and stored in unit files (with names ending with ‘.unit’) for processing by the Knit tools. As described previously, unit files are separate from the files that contain your program’s code.

If you look at the hello.unit file yourself, you will see that Knit supports C++-style comments in unit files. A comment either starts with “//” and runs to the end of the line, or starts with “/*” and ends with “*/”. (A comment that begins with “/*#” and ends with “#*/” is a doc comment, as described in Section 3.2.)

  // `Hello' is the (atomic) unit that describes our program. It imports some
  // I/O services (function `printf') and exports `main'.

The comment says that Hello is the unit definition for the entire “Hello, World!” program. (The unit is “atomic” because it is implemented by a set of files, rather than by a set of other units.) Let us take a closer look at the parts of this definition.

4.1.3 Imports and Exports

A unit has a set of imports and a set of exports. (Both the imports and exports parts of a unit definition are required, even if one or both are empty.)

Because a unit may import many items and export many items, Knit allows you to divide your imports and exports into groups of related items, called bundles. An imports or exports specification contains a list of bundle definitions, which looks something like a list of variable declarations:

imports [ bundle-name1: bundle-type1 [, bundle-name2: bundle-type2, ...] ];
exports [ bundle-name1: bundle-type1 [, bundle-name2: bundle-type2, ...] ];

The name of a bundle (a symbol) appears to the left of the colon, and the type of the bundle appears to the right. Multiple bundle declarations are separated by commas. In our Hello unit, the type of each bundle is given as a list of names enclosed in braces, indicating the names of the objects being imported or exported from the unit:

imports [ io: {printf} ];
exports [ main: {main} ];

Hello has a single import bundle named io: this bundle has a single member printf. (If there were multiple members in the bundle, one would put commas between the member names.) Similarly, the exported bundle is main and contains a single member, also called main. It is not a problem to reuse names in this way, because the names of bundles and the names of bundle members are kept in separate namespaces. This is similar to the handling of variable names and struct member names in C: the names are in separate namespaces, and so will never conflict.
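The C analogy can be made concrete. In the following fragment (an illustration only, not part of the Knit distribution), the struct tag io, the variable io, the struct member printf, and the C library function printf all coexist without conflict, because C keeps tags, ordinary identifiers, and the members of each struct in separate namespaces. The function name say is hypothetical.

```c
#include <stdio.h>

/* The tag `io' and the member `printf' live in their own namespaces,
   so they do not clash with the variable `io' or the function `printf'. */
struct io {
    int (*printf)(const char *format, ...);   /* member named `printf' */
};

/* A variable named `io' of type `struct io'; the initializer `printf'
   refers to the C library function. */
struct io io = { printf };

/* Call through the member.  Parenthesizing `io.printf' keeps a
   function-like `printf' macro, if the C library defines one, from
   being expanded here. */
int say(const char *text)
{
    return (io.printf)("%s", text);
}
```

Knit bundle names and bundle member names are separated in the same way, which is why Hello can export a bundle named main whose only member is also named main.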

The names of bundles may be used in subsequent parts of the unit description. For instance, we may use them in the depends part of the unit, which we describe next.

4.1.4 Dependencies

The depends part of a unit definition states the dependency relations that exist between the imports and exports of a unit. Knit needs this information in order to schedule the initialization and finalization of units. For instance, the Hello unit says the following:

depends { main needs io; };

The declaration “main needs io” says that the functions in the main bundle make use of the functions from the io bundle. Thus, any initializers that are associated with the io bundle must be run before any functions from the main bundle can be called.

While we do not use Knit’s scheduling features in our current example, it is always a good idea to describe the dependencies that exist in a unit: this information may be needed in other programs that incorporate your units. For this reason, Knit requires that all atomic unit definitions contain a depends clause.

We will describe dependencies in greater detail when we deal with initialization and finalization later in this tutorial (Section 4.3). Now, however, we describe the final part of our Hello unit.

4.1.5 Files

The implementation of the Hello unit comes from the file hello.c, as described in the unit definition:

files { "hello.c" };

If our program were more complex, we could list more than one source file in the files part of our unit definition. (Multiple file names would be separated by commas.) Knit needs the names of the implementation files in order to produce the knit_generated.mk file, which will contain the make rules for compiling our unit.

So, to sum up, our Hello unit definition says that the code in hello.c implements a function called main. The main function calls printf (as stated in the depends clause), and printf is imported from outside the unit. Now that we understand what the definition says, we are ready to process the unit file with Knit in order to create the “Hello, World!” program.

4.1.6 Compiling the hello Program

Assuming that you followed the Knit configuration instructions in Section 2.3, you should have an examples/hello directory in your Knit build tree. That directory should contain a GNUmakefile that you can use to run Knit and compile the hello program. (Note that the GNUmakefile reads a separate file called GNUmakerules, which is located in the Knit source tree, not in your build tree.)

Go into the examples/hello directory of your Knit build tree and type “make” or “make all”. Assuming that you are starting from a blank slate (i.e., the program has not been built already), the build will start by running a command like this:

  ../../bin/knit \
    UNIT_PATH=.../examples/hello ... \
    hello.unit Hello

The command line syntax of the knit compiler is described in Section 3.1.2. In brief, the above command tells knit to process the hello.unit file and create the set of output files that are needed in order to compile an instance of the Hello unit. When knit finishes, it will have produced three files:

After knit has completed, the make process will continue: make will read the newly created knit_generated.mk and proceed to compile the knit_inits.c file.

gcc -c knit_inits.c -g -O2 -Wall -Wshadow

Note that knit_inits.c is not part of the Hello unit. Rather, it is part of the “runtime environment” for the unit. Next, make compiles the source files of the Hello unit itself, with commands like these:

  gcc -g -O2 ... -o Hello_hello.c.raw.o -c .../examples/hello/hello.c
  cp Hello_hello.c.raw.o Hello_hello.c.o
  ../../bin/rename_dot_o_files 'Hello_' rename_Hello Hello_hello.c.o
  ar csq foo0.a Hello_hello.c.o

The actual make process issues a few additional commands, which we have omitted to improve readability. The idea is to compile each source file, run the rename_dot_o_files tool to transform the object files as needed, and then combine all of the objects into an archive.

Finally, make will link the compiled unit, the knit_inits.o file, and the standard C runtime library to create the hello program:

gcc -o hello --begin-group knit_inits.o foo0.a --end-group

Two aspects of this step deserve comment. First, remember that the Hello unit is defined to import a printf function and export a main function. In our case, since Hello is our top-level unit, these symbols will be imported from and exported to the “environment” of our unit code, i.e., the standard C library and any other object files that we link into our program. If we had not imported printf, our program would not link. Symbols from the environment are not implicitly imported into a unit: rather, they must be explicitly imported. Similarly, if we had not exported main, our program would not link, because the C library would not have access to the main function defined in our unit.

Second, the link command line lists the objects and libraries for our program between --begin-group and --end-group. As previously described in Section 3.1.3, this simply helps to avoid problems that the linker might have in resolving symbols, and eliminates the need for us to carefully order the files on the link command line. There is no “deep magic” here; it is simply convenient practice.

Before moving on to the next example, you should check that everything actually worked:

[10] examples/hello> ./hello
Hello, world!

4.2 Using Multiple Units: The msg Example

Our second example is similar to the “Hello, World!” program that we just built, except that our new program is built by combining two separate units. We will call our new program “msg”. Let us start by looking at the source code for the main program, which is in the file examples/msg/main.c in the Knit source tree. The interesting part of the file is this:

  const char *message();

  int main(int argc, char** argv)
  {
      printf("%s", message());
      return 0;
  }

This is nearly identical to the main function of the “Hello, World!” program, except that now the string to be printed is returned by an external function called message. Since this is a Knit tutorial, we of course want to get the message function from another unit. That other unit will export the function, and our main program unit will import it.

4.2.1 Bundletypes

It is not a problem to define multiple units: simply put multiple unit definitions in your unit file. But if the units are designed to be linked together, how can we best ensure that all of the various import and export bundles have the appropriate types? In the hello example (Section 4.1.3), you learned that import and export bundles can be written like this:

imports [ bundle-name : { member1, member2, ... } ];

In other words, you list all of the members of the bundle between braces. This style is tedious, however, if you want to use the same kind of bundle in more than one place. That is usually the case, because most bundles are exported from one unit and imported into another! So, to help you avoid errors and verbosity in your unit definitions, Knit allows you to define bundle types by name. For example, in the unit file for our current program (examples/msg/msg.unit), you will see this:

  // Define our ``bundletypes.''  A bundle is like an ``interface'': a set of
  // functions that describe the imports or exports of a unit.
  //
  // Our bundletypes are exceedingly simple, since each contains only a single
  // member.  In general, a bundletype contains several members and describes a
  // group of related functions.
  //
  bundletype IO_T = { printf }
  bundletype Msg_T = { message }
  bundletype Main_T = { main }

These definitions define three bundletypes in the obvious way. We can now use the bundletypes to define the unit that will contain our main program code, i.e., the code in main.c:

  // `Main' is the unit that encapsulates our `main' function.  If you look at
  // the code in `main.c', you will see that `main' calls `printf' and `message'.
  // We import those functions in two bundles (`io' and `msg').
  //
  unit Main = {
      imports [ io: IO_T,
                msg: Msg_T ];
      exports [ main: Main_T ];
      depends { main needs (io+msg); };
      files { "main.c" };
  }

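This unit definition should look familiar, since it is very much like the definition of the Hello unit from the previous example (in Section 4.1.2). The main differences are that the bundle types are given by name (IO_T, Msg_T, and Main_T) instead of as explicit member lists; that Main imports two bundles, io and msg; and that the depends clause uses (io+msg) to state that main needs both of the imported bundles.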

4.2.2 Renaming

Now we turn our attention to the second unit in our example: namely, the unit that will provide the definition of the message function. This unit will be implemented by the code in the examples/msg/messages.c file. If you look at that file, you will see that it defines three functions, each of which returns a string:

  const char *not_worth_knowing() { return "..."; }
  const char *rarely_fits() { return "..."; }
  const char *change_the_spec() { return "..."; }

Unfortunately, although all of these functions have the same C type as the message function we need, none of them is actually called message! Programmers who combine code from different sources often encounter this kind of problem. The usual C solution is to write miniature wrapper functions or to use C preprocessor magic to establish the wanted connections between functions. These solutions are often tedious, however, and they break down at large scale.
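For comparison, the “miniature wrapper” approach for this example might look like the sketch below. It is not code from the Knit distribution. The string bodies are placeholders because the real messages are elided above. Notice that the wrapper hard-codes the choice of message, so picking a different one means editing and recompiling C code.

```c
/* The three functions from messages.c, with placeholder strings. */
const char *not_worth_knowing(void) { return "(message 1)\n"; }
const char *rarely_fits(void)       { return "(message 2)\n"; }
const char *change_the_spec(void)   { return "(message 3)\n"; }

/* Hand-written glue: give one of the functions the name that main.c
   expects.  Choosing another message means changing this body. */
const char *message(void)
{
    return not_worth_knowing();
}
```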

With Knit, however, we can do better. Without changing the C code, we can define a unit that exports every one of these functions as a different instance of a message function. We will later decide which of these instances will be “the” message function that is imported into our Main unit.

To export each of our three functions as an instance of message, we define a unit that exports three bundles, each of type Msg_T. Then, we use rename declarations to say which C functions correspond to which bundle members, like so:

  unit Messages = {
      imports [];
      exports [ msg_1: Msg_T,
                msg_2: Msg_T,
                msg_3: Msg_T ];
      depends { exports needs imports; };
      files { "messages.c" };
      rename {
          msg_1.message to not_worth_knowing;
          msg_2.message to rarely_fits;
          msg_3.message to change_the_spec;
      };
  }

All three exported bundles are of type Msg_T, so each one exports a message function. By default, Knit automatically associates each imported or exported bundle member with a C function of the same name. But since that default rule cannot work here, we must make explicit pairings. Each rename declaration in our unit has the form:

bundle-name.member to c-function;

These declarations have the “obvious” effects. The not_worth_knowing function is exported as the function referenced by the message member of the msg_1 bundle. The other two functions are referenced via the message members of msg_2 and msg_3. Later in this tutorial you will learn how to rename several functions at once, but for now, we proceed with the current example.

4.2.3 Compound Units

At this point we have two units, Main and Messages. Each of these units is atomic, meaning that each is implemented by one or more source files. What remains in this example is to connect our units together, via a compound unit, to form a single unit that we can use as the top-level for our program.

A compound unit is very similar to the units we have seen so far: for instance, a compound unit has imports and exports. However, instead of a files section, a compound unit has a link section. The link section states the units that make up the compound unit, and further, defines how these “internal” units are connected to each other and to the imports and exports of the compound unit itself.

For our current program, we need a compound unit that connects an instance of our Main unit with an instance of our Messages unit. If you look in the msg.unit file, you will find the following definition of the compound unit we need:

  unit Msg = {
      imports [ io: IO_T ];
      exports [ main: Main_T ];
      link {
          [main] <- Main <- [io, msg_1];
          [msg_1, msg_2, msg_3] <- Messages <- [];
      };
  }

The imports and exports should look familiar. As explained in Section 4.1.6, since Msg is going to be the top-level unit for our program, we must explicitly import the services that we need from the environment (in this case, the functions listed in the IO_T bundletype) and explicitly export the functions that the runtime needs to invoke (i.e., our main function).

The link section of a compound unit describes how the unit is implemented in terms of a network of other units. Each statement in the link section above is of the form:

[ export 1, export 2, ... ] <- unit <- [ import 1, import 2, ... ]

Each line causes an instance of the named unit to be created. Let us take a closer look at the first line in the link part of our Msg unit definition. That line says that a Main unit instance will be created as part of the (compound) Msg unit. At the start of that line, “[main]” is a list of symbols: these give names to the bundles that are exported by our Main unit. Bundles are named in the order they are listed in the exports list of the unit being instantiated. (Of course, our Main unit has only one exported bundle.) At the end of the line, the list “[io, msg_1]” gives names to the imported bundles. Again, the bundles are named in the order they are listed in the Main unit definition.

Connections between units are indicated when the same name is used in two or more places in the compound unit. Looking again at the first line within the link part of Msg, we see that the bundle exported from Main has the same name as the bundle exported from the Msg unit itself. This indicates that the export of Main is connected to the export of Msg: in other words, Msg exports the functions from its internal Main unit. Similarly, the io bundle that is imported into Main is connected to the io bundle that is imported by Msg.

So finally we see which of the functions in our Messages unit becomes “the” message function to be called in our program. The second line of our link specification gives names to the three bundles that will be exported from an instance of our Messages unit. One of these bundles (msg_1) is specified as an import to our Main unit. Thus, the “wiring” in our compound unit tells us that the function that implements msg_1.message will be the one actually called in our program. (The bundles msg_2 and msg_3 are not connected to any other units, nor are they exported from the compound unit. This is not a problem: the bundles are simply unused. Also note that the Messages unit requires no imports, and so its import list is empty.)

Now it should be clear that you can easily change the “wiring” of the program, without changing the C source code. If you simply replace the msg_1 import to Main with either msg_2 or msg_3, you effectively change the message string that will be output by the complete program. This kind of flexibility is critical when building programs from components: in Knit, the linking specifications are separate from the component implementations.
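One way to picture this wiring is to model each bundle as a C struct of function pointers. The sketch below is only an analogy. Knit does not generate code like this. It makes the connections when the program is built, by processing the units' object files as described in Section 4.1.6. In the sketch, Msg_T becomes a struct with a single message member, Messages “exports” three such structs, and the compound unit's choice of msg_1 becomes the choice of which struct Main's code receives. The names main_body and run_msg, and the message strings, are hypothetical.

```c
/* Analogy only: a bundle of type Msg_T as a struct with one function
   pointer per member. */
typedef struct {
    const char *(*message)(void);
} Msg_T;

/* The functions from messages.c (placeholder strings). */
static const char *not_worth_knowing(void) { return "(message 1)"; }
static const char *rarely_fits(void)       { return "(message 2)"; }
static const char *change_the_spec(void)   { return "(message 3)"; }

/* The three exported bundles of Messages; these initializers play the
   role of the rename declarations. */
const Msg_T msg_1 = { .message = not_worth_knowing };
const Msg_T msg_2 = { .message = rarely_fits };
const Msg_T msg_3 = { .message = change_the_spec };

/* Main's code sees only the bundle it is handed, not where it came from. */
const char *main_body(const Msg_T *msg)
{
    return msg->message();
}

/* The "link" of Msg: connect Main's msg import to msg_1.  Passing &msg_2
   or &msg_3 instead rewires the program without touching main_body. */
const char *run_msg(void)
{
    return main_body(&msg_1);
}
```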

4.2.4 Compiling the msg Program

Finally, you are ready to compile the msg program. Go to the examples/msg directory of your Knit build tree and type “make” or “make all”. The make process will go through the steps described previously in Section 4.1.6, and the result will be a program called msg. Run it:

  [11] examples/msg> ./msg
  A language that doesn't affect the way you think about programming is not
  worth knowing.

If you re-examine the msg.unit file, you should be able to see why the program prints the message shown above, and not some other message. At this point, you might want to experiment by editing msg.unit to change the message output by your program. After a change to the unit file, a simple “make” should be all that is required to re-Knit and recompile your program. (Do not change the name of the Msg unit, however! If you change that name, you will have to edit the GNUmakefile in your build directory to match.)

4.3 Knitting Tricks: The calc Example

Now that you have mastered Knit basics, it is time to see how Knit can help in the development of a nontrivial C program. In this example, we will use Knit to define, build, and analyze a four-function expression evaluator — in other words, a calculator. The basic program will read expressions from the user, evaluate them, and print out the results:

  [12] examples/calc> ./calc
  1+2
  read    : 1 + 2
  eval    : 3

To make things a little more interesting, we will Knit together a special version of the program that monitors calls to malloc and free for two different datatypes in the program. The enhanced program will report its allocation statistics for each input expression, like so:

  [13] examples/calc> ./calc
  1+2
  read    : 1 + 2
  read    : (allocs/frees)  4/ 4 tokens,  3/ 0 exprs
  eval    : 3
  eval    : (allocs/frees)  0/ 0 tokens,  3/ 2 exprs
  cleanup : (allocs/frees)  0/ 0 tokens,  0/ 4 exprs
  total   : (allocs/frees)  4/ 4 tokens,  6/ 6 exprs

As shown in the transcript, in the “read” phase of the program, four token objects were allocated, four tokens were freed, three exprs were allocated, and zero exprs were freed. Similar statistics were reported for the “eval” and “cleanup” phases. Finally, the “total” line shows the sums of the counts from the three phases. In the example shown, for both tokens and exprs, the number of allocs is equal to the number of frees. This is good evidence that there were no memory leaks.¹
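The bookkeeping behind this report is simple arithmetic. Each phase records a pair of counts for each type. The total is the element-wise sum of the phases, and a type shows balanced counts when its total allocs equal its total frees. The sketch below uses hypothetical names that do not come from the calc sources. It reproduces the transcript's totals: for exprs, 3 + 3 + 0 = 6 allocs and 0 + 2 + 4 = 6 frees.

```c
/* Hypothetical per-phase allocation counts for the two datatypes. */
typedef struct {
    int token_allocs, token_frees;
    int expr_allocs,  expr_frees;
} alloc_stats;

/* Element-wise sum of the counts for n phases. */
alloc_stats stats_total(const alloc_stats *phases, int n)
{
    alloc_stats t = {0, 0, 0, 0};
    for (int i = 0; i < n; i++) {
        t.token_allocs += phases[i].token_allocs;
        t.token_frees  += phases[i].token_frees;
        t.expr_allocs  += phases[i].expr_allocs;
        t.expr_frees   += phases[i].expr_frees;
    }
    return t;
}

/* Balanced totals are evidence (not proof) that nothing leaked. */
int stats_balanced(alloc_stats t)
{
    return t.token_allocs == t.token_frees && t.expr_allocs == t.expr_frees;
}
```

For the transcript above, the phases are {4, 4, 3, 0}, {0, 0, 3, 2}, and {0, 0, 0, 4}, and they sum to {4, 4, 6, 6}.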

The C code for our calculator — approximately 1000 lines — is located in the examples/calc directory of the Knit source tree. The calc.unit file organizes the calculator as a small number of atomic units — one for each major component — and links them together using compound units. If you have worked through the previous examples in this chapter, you should already understand most of the contents of the calc.unit file. Therefore, in the sections below, we describe only the Knit language features that were not used in the hello or msg programs.

4.3.1 Initializers and Finalizers

The first new Knit language feature in our example is the use of initializers and finalizers. In Knit, a component can specify one or more functions that must be called to initialize the component — more precisely, to initialize one or more of the component’s exports. Similarly, a finalizer is a function that must be called in order to shut down some of the component’s exports.

Why does Knit treat initializers and finalizers in a special way? Why not simply list initializers and finalizers among a unit’s imports and exports? It is for the same reasons that languages like C++ have special notions of constructors and destructors:

The syntax for specifying initializers and finalizers is illustrated by the Input unit in our calc example:

  unit Input = {
      imports [ alloc : Alloc_T,
                io : IO_T ];
      exports [ input : Input_T ];

      initializer init_input for exports;
      finalizer fini_input for exports;
      depends {
          // { init_input } is syntax for ``the set containing `init_input'.''
          { init_input } needs io;
          { fini_input } needs io;
          exports needs imports;
          //
          // As described previously, if we wished, we could replace the above
          // three lines with a single (overgeneral) statement that all of our
          // exports, initializers, and finalizers depend on all of our imports:
          //
          // (exports + inits + finis) needs imports;
      };
      files { "input.c" };
  }

In the above definition, the C function init_input is specified to be an initializer for all of the unit’s exports (as indicated by the keyword exports). Similarly, the C function fini_input is the finalizer for all of the exports. In general, one can provide a specific set of bundles when defining an initializer or finalizer, but it is usually sufficient to say simply that the function is an initializer or finalizer for all exports. Moreover, it is often a good idea to overgeneralize in this way. If you later tweak the C code and add a new unit export, for example, you do not have to remember to specify that your initializer or finalizer also applies to the new export. Finally, note that initializers and finalizers do not generally need to be exported: Knit invokes them specially. (The only reason to export an initializer or finalizer would be if you wanted Knit to invoke the function automatically and also wanted to invoke it explicitly yourself. This would be rather odd.)

So how are initializers and finalizers used? When the knit compiler is run, it creates a file called knit_inits.c that contains two function definitions. The first function, knit_init, contains a list of calls to the initialization functions for the unit instances within your program.² The second function, knit_fini, contains calls to the finalizers in your program.

The knit_init function must be called before your program proper, i.e., before any of the exports from your program’s top-level unit are called. In the current example, this is accomplished with some “runtime magic” in the init.c file. Pay special attention: the main function of the calculator program does not invoke the initializers! Instead, the Knit runtime support in init.c ensures that the knit_init function is run before main is called. Similarly, the code in init.c ensures that knit_fini is called once the top-level exports (in this example, the main function) will no longer be called, i.e., after main has returned or after exit has been called.
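To make this concrete, here is a hand-written sketch of what such a file could contain for a program with two initialized units. It is not actual Knit output. The functions init_io and fini_io are hypothetical, standing for a unit that exports the io bundle. The functions init_input and fini_input are the Input unit's functions from above, reduced here to stubs that record the order in which they run. Because Input declares { init_input } needs io and { fini_input } needs io, the io unit is initialized first and finalized last. The function knit_runtime_start is also hypothetical. It shows one portable way to arrange for knit_fini to run after main returns or exit is called, which is to register it with the standard atexit function. The real init.c may work differently.

```c
#include <stdlib.h>
#include <string.h>

/* A record of the calls, so that the ordering can be inspected. */
static char call_log[64];

static void log_call(const char *name)
{
    strncat(call_log, name, sizeof call_log - strlen(call_log) - 1);
}

/* Stub initializers and finalizers (hypothetical bodies). */
static void init_io(void)    { log_call("init_io;"); }
static void init_input(void) { log_call("init_input;"); }
static void fini_input(void) { log_call("fini_input;"); }
static void fini_io(void)    { log_call("fini_io;"); }

/* Dependencies first: init_input needs io. */
void knit_init(void)
{
    init_io();
    init_input();
}

/* Reverse order: fini_input still needs io while it runs. */
void knit_fini(void)
{
    fini_input();
    fini_io();
}

/* Hypothetical start-up hook: run the initializers now, and have the
   finalizers run at normal program exit.  Returns atexit's result. */
int knit_runtime_start(void)
{
    knit_init();
    return atexit(knit_fini);
}

const char *knit_call_log(void) { return call_log; }
```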

4.3.2 More About Dependencies

The order of the calls in the Knit-generated knit_init and knit_fini functions is based on the dependency information found in your unit definitions. For instance, if one initializer needs to call functions that are imported from a second unit, then the second unit must be initialized before the first. Accurate (or, at least, conservative) dependency information in all units is a must in order for Knit to find correct initialization and finalization schedules. This is why dependency information is required even for atomic units that do not themselves have initializers and finalizers, as was previously described in Section 4.1.4.
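Finding such a schedule amounts to a topological sort of the “must be initialized before” relation. The sketch below is a generic in-degree-based (Kahn-style) topological sort over a small adjacency matrix. It illustrates the technique and is not Knit's implementation. The function name init_schedule and the unit numbering are hypothetical. If some units can never be scheduled, the relation contains a cycle, which is the situation discussed later in this section.

```c
#define MAX_UNITS 8

/* before[a][b] != 0 means unit a must be initialized before unit b,
   e.g. because b's initializer calls functions imported from a.
   Writes an initialization order into `order' and returns how many
   units were placed; a result smaller than n means the remaining
   units lie on (or depend on) a cycle. */
int init_schedule(int n, int before[MAX_UNITS][MAX_UNITS],
                  int order[MAX_UNITS])
{
    int indegree[MAX_UNITS] = {0};  /* unplaced predecessors per unit */
    int placed[MAX_UNITS] = {0};
    int count = 0;

    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            if (before[a][b])
                indegree[b]++;

    /* Repeatedly place any unit whose predecessors are all placed. */
    for (int progress = 1; progress; ) {
        progress = 0;
        for (int u = 0; u < n; u++) {
            if (!placed[u] && indegree[u] == 0) {
                placed[u] = 1;
                order[count++] = u;
                for (int b = 0; b < n; b++)
                    if (before[u][b])
                        indegree[b]--;
                progress = 1;
            }
        }
    }
    return count;
}
```

In the simplest case, a finalization schedule is this order reversed, so that each unit is shut down before the units it depends on.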

If you look again at the depends section of our Input unit, you will notice some new syntax for describing dependencies:

      depends {
          // { init_input } is syntax for ``the set containing `init_input'.''
          { init_input } needs io;
          { fini_input } needs io;
          exports needs imports;
          //
          // As described previously, if we wished, we could replace the above
          // three lines with a single (overgeneral) statement that all of our
          // exports, initializers, and finalizers depend on all of our imports:
          //
          // (exports + inits + finis) needs imports;
      };

The first piece of new syntax is for “object sets” as illustrated in the first two statements. To specify that the init_input and fini_input functions each call functions from the imported io bundle, we create object sets by putting the function names in braces as shown. Note that we could have put both functions in a single set. Also note that we must use the object set syntax here, because our initializer and finalizer functions are not part of any named (imported or exported) bundle.

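The second piece of new syntax is illustrated by the third statement. Instead of naming specific bundles, a dependency statement can refer to certain predefined groups: imports stands for all of the unit's imported bundles, and exports for all of its exported bundles. Likewise, as the comments above show, inits and finis stand for the unit's initializers and finalizers.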

Further, Knit allows the unit writer to combine object sets using “+” for set union and “-” for set difference, as shown in the comments above. As the comments describe, we could replace all of the dependency statements in the Input unit with the single statement:

(exports + inits + finis) needs imports;

which conservatively approximates (overgeneralizes) all of the actual dependencies in the unit. When writing your own units, it is often good to start with the above statement. But be careful! If dependency information is too conservative, Knit may find an initialization cycle: a cycle of units in which each unit requires the previous unit to be initialized before it can initialize itself. This can happen if there is in fact a true dependency cycle, or if your units’ dependency specifications are too general (so that they introduce false dependency cycles). In the latter case, you will need to make your dependency specifications more accurate, so that Knit can find workable initialization and finalization sequences. Fortunately, both true and false dependency cycles are rare in most programs.
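The set operations can be modeled with bitmasks, which also shows why the one-line statement is conservative. Each precise statement has its left side inside (exports + inits + finis) and its right side inside imports, so the single statement implies all three of them and possibly more. In this sketch, which illustrates the idea and does not show how Knit represents sets, each bit stands for one object of the Input unit. Knit's “+” and “-” become bitwise OR and AND-NOT. The bit encoding and the function names are hypothetical.

```c
/* One bit per object of the Input unit (hypothetical encoding). */
enum {
    IO         = 1 << 0,   /* imported bundle io    */
    ALLOC      = 1 << 1,   /* imported bundle alloc */
    INPUT      = 1 << 2,   /* exported bundle input */
    INIT_INPUT = 1 << 3,   /* initializer           */
    FINI_INPUT = 1 << 4    /* finalizer             */
};

/* The predefined groups, for this unit. */
static const unsigned imports = IO | ALLOC;
static const unsigned exports = INPUT;
static const unsigned inits   = INIT_INPUT;
static const unsigned finis   = FINI_INPUT;

/* Knit's "+" (union) and "-" (difference) on object sets. */
unsigned set_union(unsigned a, unsigned b) { return a | b; }
unsigned set_minus(unsigned a, unsigned b) { return a & ~b; }

/* Nonzero if every member of `small' is also a member of `big'. */
int set_subset(unsigned small, unsigned big) { return (small & ~big) == 0; }
```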

Before moving on to further discussion of renaming, it would be useful for you to read through the definition of the Alloc unit in our calculator unit file. The (rather long) comments in the depends section in particular clarify the relationship between dependencies and initializers. (In case you do not have the file handy right now, the lesson is this: it is extremely unusual for a bundle to depend on its initializer.)

4.3.3 More About Renaming

If you just read through the Alloc unit definition as suggested above, you may have noticed some new syntax for renaming:

      rename {
          // We need to associate `counted_alloc.malloc' with the C function
          // `counted_malloc', and likewise for `counted_alloc.free'.  To make
          // these associations, we could use two separate renaming declarations:
          //
          // counted_alloc.malloc to counted_malloc;
          // counted_alloc.free to counted_free;
          //
          // But we can do the same job by saying that the C function names are
          // derived by adding a prefix to the names of the bundle members:
          //
          counted_alloc with prefix counted_;
      };

When we previously discussed renaming in Section 4.2.2, we learned how to make associations one-by-one. To make certain common cases easier, however, Knit provides special syntax for renaming when the names of C functions can be manufactured by adding a prefix or suffix to the names of the members of a bundle. The syntax of these special cases is:

  rename {
    bundle-name with prefix identifier ;
    bundle-name with suffix identifier ;
  };

Of course, this convenient syntax is useful only for prefix or suffix transformations. You cannot apply both a prefix and a suffix. For situations requiring more than a simple prefix or suffix addition, you must use Knit’s one-by-one syntax.
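The name manufacturing itself is plain string concatenation. For example, the prefix counted_ applied to the member malloc gives counted_malloc. The sketch below uses a hypothetical helper, derive_c_name, to compute the C name that a with prefix or with suffix declaration associates with a bundle member. It does not show Knit's internals.

```c
#include <stdio.h>

/* Build the C function name for a bundle member under a `with prefix'
   (is_prefix != 0) or `with suffix' (is_prefix == 0) rename.
   Returns 0 on success, or -1 if `out' is too small to hold the name. */
int derive_c_name(char *out, size_t size, const char *member,
                  const char *affix, int is_prefix)
{
    int n = is_prefix ? snprintf(out, size, "%s%s", affix, member)
                      : snprintf(out, size, "%s%s", member, affix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```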

4.3.4 Wrappers and Transparent Interposition

For the allocation-monitored version of the calc program, we want to count the numbers of expr and token objects that are dynamically allocated and freed. Further, we want to count these events separately for each type. Let us consider exprs first. If you look at the code in the expr.c file, you will find the alloc_expr function, which handles all dynamic allocations of exprs:

  static expr
  alloc_expr(void)
  {
      return ((expr) malloc(sizeof(expr_struct)));
  }

There is an analogous free_expr function for handling dynamic releases; free_expr invokes the standard free function to actually release the memory for a given expr object.

Counting the number of dynamic expr allocations and frees, therefore, amounts to counting the number of times that alloc_expr calls malloc and the number of times that free_expr calls free.³ More precisely, we must count the number of times that malloc returns a non-null result and the number of times that free is called with a non-null argument. To do this, we need to interpose on or wrap calls to these functions, so that we can insert our instrumentation.
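The counting rule just stated fits into two small wrappers. The sketch below borrows the names counted_malloc and counted_free from the Alloc unit's rename declarations in Section 4.3.3. The function bodies, the counter variables, and the accessor functions are an illustration only and are not the code in the calc sources.

```c
#include <stdlib.h>

/* Illustrative counters (hypothetical names). */
static unsigned long n_allocs, n_frees;

/* Count only allocations that actually succeeded. */
void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        n_allocs++;
    return p;
}

/* free(NULL) releases nothing, so it is not counted. */
void counted_free(void *p)
{
    if (p != NULL)
        n_frees++;
    free(p);
}

unsigned long counted_allocs(void) { return n_allocs; }
unsigned long counted_frees(void)  { return n_frees; }
```

Note that this single pair of wrappers keeps one set of counters. It therefore cannot by itself keep separate counts for exprs and tokens.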

Given this scenario, most C programmers would either (1) edit the alloc_expr and free_expr functions to insert the needed instrumentation, or (2) define C macros to “magically” replace the calls to malloc and free with calls to other functions. Each of these solutions has its problems, however. The first technique requires the programmer to edit the code. Likely, the programmer will complicate the code with #ifdefs so that the instrumentation can be conditionally incorporated into the program. While doing this once or twice might not be a problem, doing it many times turns the code into an #ifdef jungle!

The second approach, instrumentation via macro magic, has a related problem. It is easy enough to define malloc and free as macros that call other functions, say, counted_malloc and counted_free. To use these macros, the programmer would probably need to change only a few #include lines in the source, so while source changes are still required, they are minimal. A new problem arises, however, when we recall that we want to monitor both expr and token allocation, and that we want separate counts for each type! Now we cannot use our simple macros to insert instrumentation into both expr.c and token.c, because we need slightly different instrumentation for each file.

A possible solution would be to make more complicated macros, e.g., macros that expand differently based on other macros. But you do not want to do that. You want to use Knit, which can solve your problems in an elegant and principled way.

In the calculator program at hand, the code in expr.c is encapsulated by the Expr unit, which is defined in the calc.unit file. That unit says that the functions malloc and free are imported into the unit as elements of a bundle called alloc, as shown in this excerpt:

  bundletype Alloc_T = { free, malloc }

  unit Expr = {
      imports [ alloc : Alloc_T,

You, the Knit user, decide where these functions come from. They do not have to come from the standard C library: they can come from any unit that exports a bundle of type Alloc_T. Moreover, the choice is transparent to the code in the Expr unit: that code does not know or care where the allocation functions come from. Thus, we can transparently replace the standard allocation functions with monitored versions of those functions, provided we have a unit that implements the counted versions we want.

Fortunately, we have such a unit: Alloc. The Alloc unit provides versions of the allocation functions that count the number of times that they allocate or free memory. Functions to get and reset the allocation counts are provided in a separate exported bundle, as shown in the excerpt below:

  unit Alloc = {
      imports [ alloc : Alloc_T ];
      exports [ counted_alloc : Alloc_T,
         counts : AllocCounts_T ];

Note that Alloc both imports and exports bundles of type Alloc_T. This is a common Knit idiom for a unit that wraps another unit: the “wrapper” unit modifies or otherwise interposes on access to the inner “wrapped” unit. In this case, the Alloc unit imports definitions of malloc and free, and then exports its own versions of these functions. (A rename declaration, which we discussed previously in Section 4.3.3, is required in order to export the unit’s own definitions as bundle elements called malloc and free.)

4.3.5 Multiple Instantiation

As described in the previous section, the code in expr.c is encapsulated by the Expr unit, and the Expr unit imports the functions malloc and free from the outside, i.e., some other unit. Now, notice that the code in token.c is encapsulated in the Token unit, and that like Expr, the Token unit imports the allocation functions from another unit:

  unit Token = {
      imports [ alloc : Alloc_T,

The key insight here is that, although both Expr and Token import allocation functions, they do not have to import these functions from the same unit. Instead, every unit instance can import these functions from a separate unit instance. The Expr unit can get allocation from one unit instance, and the Token unit can get different allocation functions from a different unit instance. In this way, we can effectively instrument our Expr and Token units separately, without changing the C source code of either unit.

The obvious solution, then, is for us to put two separate instances of our Alloc unit into our final calculator program: one to track the behavior of expression objects and the other to track the behavior of tokens. This approach gives us separate counts for expr and token objects, which is what we want, but not everything we want. Remember that in addition to tracking allocation for the two types separately, we also want to track allocation behavior both for each “phase” of the interpreter and for each input expression as a whole (i.e., the “total” counts in the transcript below).

  [14] examples/calc> ./calc
  1+2
  read    : 1 + 2
  read    : (allocs/frees)  4/ 4 tokens,  3/ 0 exprs
  eval    : 3
  eval    : (allocs/frees)  0/ 0 tokens,  3/ 2 exprs
  cleanup : (allocs/frees)  0/ 0 tokens,  0/ 4 exprs
  total   : (allocs/frees)  4/ 4 tokens,  6/ 6 exprs

In other words, what we really need are two sets of numbers for each type: one set of numbers that we clear between phases of the interpreter, and a second set that we clear only between expressions. The Alloc unit in our unit file defines a unit that exports allocation functions and one set of allocation counters. Cleverly, we can use this unit to create a unit that exports allocation functions and two sets of counters, simply by composing two instances of Alloc as shown below:

  unit Alloc_2 = {
      imports [ alloc : Alloc_T ];
      exports [ counted_alloc : Alloc_T,
         counts_1 : AllocCounts_T,
         counts_2 : AllocCounts_T ];

      link {
   // [export, export, export, ...]
   //   <- Unit
   //   <- [import, import, import, ...];

   // The exported `counted_alloc' and `counts_1' bundles, from one
   // instance of an `Alloc' unit.
   [counted_alloc, counts_1]
       <- Alloc
       <- [counted_alloc_internal];
   // The internal `counted_alloc_internal' bundle and the exported
   // `counts_2' bundle, from a separate instance of our `Alloc' unit.
   [counted_alloc_internal, counts_2]
       <- Alloc
       <- [alloc];
      };
  }

Alloc_2 defines a unit like Alloc, but with two separate sets of counters. Each counter set is accessed through a bundle of functions: the first set through the elements of counts_1, and the second set through the elements of counts_2. The Alloc_2 unit is implemented as a compound unit (Section 4.2.3) that connects two instances of Alloc in the “obvious” way. One instance imports the alloc bundle given to Alloc_2 and exports its own allocation functions as the internal counted_alloc_internal bundle, along with counts_2. Those allocation functions are given as imports to the other instance of Alloc, whose exported allocation functions are then exported from Alloc_2 itself as counted_alloc, along with counts_1. Because both counts bundles are exported from Alloc_2, the result is the two-count-set unit that we need for our calculator program.

When a unit is instantiated more than once, each instance of the unit is independent. This is true whether the unit is instantiated multiple times within a single containing unit (as shown above) or multiple times as parts of different containing units. In either case, each instance of a unit has its own imports, its own exports, and therefore, its own copy of its code and data (e.g., static variables declared in the unit’s C files). In terms of “objects,” you might think of each unit instance as a separate object instance, with its own relationships to other units. In terms of linking, you might think in terms of the object code being linked multiple times into the final program, although tailored for each individual copy.

A careful reading of the Alloc_2 unit definition provides additional detail about the behavior of the two counter sets. The two sets of counters are correlated: each call to one of the exported allocation functions passes through both instances of Alloc, so a counter in each set is incremented. But by looking closely at the wiring within Alloc_2, one can also see that the two sets of counters are independent: neither depends on the values of the other, and resetting one counter set leaves the other unaffected. You can see this because neither Alloc instance imports the other’s counts bundle, and therefore neither instance could possibly invoke the get_alloc_counts or reset_alloc_counts functions on the other.

4.3.6 Summary: The Calc and Calc_Counted Units

So, finally, we have what we need in order to build an instrumented version of the calculator program! The calc.unit file contains top-level units for both the “plain” and instrumented versions of the program: these units are called Calc and Calc_Counted, respectively. Let us briefly summarize the important parts of Calc_Counted:

The complete definitions of the Calc and Calc_Counted units are located at the end of the calc.unit file. By now, everything in these unit definitions should be clear to you — except for the flatten directives, which we will describe below in Section 4.3.8.

4.3.7 Compiling the calc Program

A simple “make” or “make all” in the examples/calc directory of your Knit build tree will run knit to produce the instrumented version of the calculator program. The make process will go through the steps described previously in Section 4.1.6, and the result will be a program called calc. In the output from make, you may notice that knit prints out the schedule for the program’s initializers and finalizers. You can then run the program and enter expressions:

  [15] examples/calc> ./calc
  1+2
  read    : 1 + 2
  read    : (allocs/frees)  4/ 4 tokens,  3/ 0 exprs
  eval    : 3
  eval    : (allocs/frees)  0/ 0 tokens,  3/ 2 exprs
  cleanup : (allocs/frees)  0/ 0 tokens,  0/ 4 exprs
  total   : (allocs/frees)  4/ 4 tokens,  6/ 6 exprs

As we described in the introduction to this example (Section 4.3), the statistics reported by calc give us confidence that the program is behaving correctly, without memory leaks. It turns out, however, that most nontrivial programs have bugs:

  (1
  read    : scanner error: unexpected end of input
  read    : (allocs/frees)  3/ 3 tokens,  2/ 0 exprs
  eval    : scanner error: unexpected end of input
  eval    : (allocs/frees)  0/ 0 tokens,  1/ 0 exprs
  cleanup : (allocs/frees)  0/ 0 tokens,  0/ 2 exprs
  total   : (allocs/frees)  3/ 3 tokens,  3/ 2 exprs

Although the program correctly handled the erroneous input, it apparently leaked an expr object in the process: the “total” line indicates that calc allocated three exprs but freed only two. The author of the calc example found this bug quickly because Knit allowed him to easily insert instrumentation code into the program. This bug has been left in the C code in case you wish to examine it — see the function parse_term in the file parse.c.

At this point, you may want to experiment with the calc program. Here are some suggested exercises:

4.3.8 Optimizing the Code via “Flattening”

Flattening is an optimization technique in which the knit compiler “weaves” all of the C source files that make up a unit into a single (“flat”) C file. The source code is manipulated to create the proper internal unit connections, of course, but is also manipulated so as to inline functions, remove dead (unused) functions, and hopefully improve the C compiler’s ability to further optimize the code. The code transformations are heuristic, but work well for many cases.

Flattening is controlled by directives in your unit definitions. To say that the implementation of a unit should be flattened, insert the flatten directive into the unit’s definition. (This directive is already specified in the Calc and Calc_Counted units of this example.) Flattening will be “recursively” applied to the entire body of the unit — including the bodies of units within compound units — except for units whose definitions include a noflatten directive. Through flatten and noflatten directives, you can specify which parts of your units are flattened and which are not. Finally, note that flattening directives are honored only when the optimization is enabled on the knit command line via the -f option. By default, flattening is not performed.

To apply flattening to the calc example, first “make veryclean” in order to remove any files previously created by Knit. Then, do:

  make KNIT_FLAGS=-f

This will rebuild the program with flattening enabled. Because flatten is specified for the top-level program unit, the entire program will be flattened. The result is an optimized calc program — although you might not notice much speed improvement in this simple example!

Flattening can be “fine-tuned” by specifying an inlining budget. To specify a budget, add a value for KNIT_BUDGET to the list of flags for knit:

  make veryclean
  make KNIT_FLAGS='-f KNIT_BUDGET=1000'

As described previously in Section 3.1.2, KNIT_BUDGET is, very roughly, the total number of static RISC instructions that should be spent on or saved by inlining. Positive values represent spending (i.e., increased code size), while negative values represent saving (reduced code size). To see a difference in the program size of calc, remember to strip the binary of debugging information. Since the calc program is relatively small, you should expect any changes in program size to be small as well. Also note that the budget measures are very approximate: a larger budget may actually result in a smaller final program.

4.4 Other Knit Features

Although you have reached the (current) end of the tutorial, there are many features of Knit that we have not yet discussed. These things include:

For information about these language features, refer to the Report on the Language Knit: A Component Definition and Linking Language, which is contained in the doc/report directory of the Knit distribution. Happy Knitting!
