3 Using Knit

Knit operates on unit files: files that describe programs and program parts in terms of their software components. Each component is called a unit. Unit descriptions are written in a textual specification or “programming” language and stored in unit files (files whose names end with ‘.unit’), which may be processed by Knit.

This chapter describes the Knit tools that operate on unit files, that produce unit files, or that may otherwise be useful to Knit users. Chapter 4 is a tutorial on the unit description language, i.e., the contents of unit files.

3.1 The knit Compiler

The Knit unit file compiler is simply called knit. The command line syntax is as follows:

knit [option ...] [var=value ...] unit-file unit-name

where unit-file is the name of the unit file to be read, and unit-name is the name of the “topmost” unit that should be processed, i.e., the unit that describes the complete program or library that is to be compiled. (The other command line arguments will be described in a moment.)

3.1.1 Output Files

Assuming that compilation is successful, knit produces a set of output files:

knit_generated.mk: knit produces a set of make rules for building the archive (‘.a’) files that will contain the compiled code for the unit named on the knit command line. The knit_generated.mk file is not a complete Makefile; rather, it is designed to be included from another Makefile.
makefile: If the MAKEFILE=template argument was specified on the command line (as described below), knit will use the specified template to create a makefile in the current directory. (Presumably, makefile will include the knit_generated.mk file.) Currently, the template file is simply copied; no transformations are made on the file contents.
rename_*: These files are inputs to the rename_dot_o_files tool (Section 3.4) and describe how symbols in generated object (‘.o’) files should be renamed as part of implementing the unit. The rename_* files are referenced by the rules in knit_generated.mk; you do not need to deal with these files directly.
knit_inits.c: This file contains the implementations of the generated knit_init and knit_fini C functions. These functions will be called to run your units’ initializers and finalizers, respectively. (Initializers and finalizers are described in Section 4.3.1.)
*_anon_*.c: If your unit definitions contain literal C code, knit will move this “anonymous” code into C source files so that it may be compiled. These files may also be made if you use Knit’s “flattening” optimization (described in Section 4.3.8).

knit may also leave certain temporary files behind, with names like TMP, *xxx, and *yyy. You may safely ignore these files.

3.1.2 Command Line Options

The command line options to knit are as follows:

-X: Do not create any output files. This option is useful when you simply want to check the correctness of your unit specifications.
-f: Perform the code “flattening” optimization described in Section 4.3.8. In brief, this option tells Knit to collect and weave all of your C code into one file, so that it may be better optimized by your C compiler. Knit does not perform this optimization by default.
-c: Check the constraints that are specified in the unit specifications. Constraints and constraint-checking are described in the Report on the Language Knit: A Component Definition and Linking Language (in the doc/report directory of the Knit distribution). By default, Knit does not enforce constraints.

In addition to these options, knit looks for command line arguments of the form var=value. The compiler processes these arguments in three ways.

First, all of the variable definitions are copied into the output knit_generated.mk file. This provides a convenient way for you to specify certain make variables at the time Knit is run. Moreover, it makes it possible for you to put references to these variables at certain points in the unit file itself — for example, in the specifications of C file names. These references will be expanded when your C code is actually compiled by make.

Second, var=value arguments to knit are put into the environment of any subprocesses created by the compiler. This is principally useful when knit’s code flattening optimization is enabled; with flattening, knit uses commands like the following to preprocess your unit code:

env var=value ... \
sh -c 'gcc -P -E $KNIT_CPPFLAGS ... source-file.c'

Because the variables are put into the environment, one can use variable references within directory specifications in a unit file. In fact, it is very good practice to use variables in directory specifications, because this makes your unit files less dependent on the exact organization of your source files.
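The environment mechanism can be tried without running Knit at all. The following is a minimal sketch in which echo stands in for the preprocessor command; the variable name SRC_ROOT and the directory layout are hypothetical and chosen only for illustration.

```shell
# Stand-in for the subprocess that knit starts: env places the var=value
# setting in the environment, and the inner shell expands the reference.
# SRC_ROOT and the path calc/main.c are hypothetical names.
expand_in_subprocess() {
  env "$1" sh -c 'echo "$SRC_ROOT/calc/main.c"'
}

expand_in_subprocess SRC_ROOT=/home/me/project
```

Because the reference is expanded only inside the subprocess, the same unit file could be reused with a different source tree simply by changing the value given on the command line.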

Third and finally, certain variables have special meaning to knit itself. These variables are:

UNIT_PATH=dirs (Required.): Specify the search path for input unit files. The dirs path is a colon-separated list of directory names. Note that knit requires that you provide a value for UNIT_PATH on the command line.
MAKEFILE=file: If specified, knit will use the given file as a template for creating a makefile in the current directory. If MAKEFILE is not specified on the command line, knit will not create a Makefile for you. (In any case, knit will create the knit_generated.mk rule file.)
KNIT_TOOLS=dir: Tell knit to find its auxiliary programs (e.g., knit_c_parser and knit_smartmv) in the specified directory. This option is primarily used when invoking a non-installed set of Knit tools, such as the programs within a Knit build tree. If KNIT_TOOLS is not specified on the command line, knit will look for its helper programs along the user’s usual program search path.
KNIT_CPPFLAGS=cppflags: When flattening is enabled, KNIT_CPPFLAGS contains any extra flags that knit should pass to the C preprocessor. The default is to pass no extra arguments. When flattening is not enabled, this variable has no effect.
KNIT_BUDGET=number: When flattening is enabled, KNIT_BUDGET provides control over the amount of inlining that should be performed. Very roughly, the value of KNIT_BUDGET is the total number of static RISC instructions that should be spent on or saved by inlining. Positive values of number represent spending (i.e., increased code size) while negative values represent saving (reduced code size). Code size can be reduced by removing dead functions, by inlining functions that are only used once (which eliminates instructions to push arguments, call the function, and return), and by inlining trivial functions whose bodies require fewer instructions than a function call. Any such savings count toward the overall budget; that is, they increase the number of inlined instructions that knit may spend elsewhere. If unspecified, the default budget is 0.
In practice, the value of KNIT_BUDGET is only very loosely correlated with the size of the final binary. This is because Knit operates on the C source code, and therefore has only indirect control over the optimizations that may (or may not) be performed by the C compiler. Beyond simple inlining and dead function elimination, Knit does not try to predict the effect of other optimizations that the C compiler may provide.

Note that knit processes variable settings from the knit command line only. In particular, knit does not look for settings of environment variables. (You do not want your complete environment copied into the knit_generated.mk file, do you?)
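To make the pieces above concrete, here is a hedged sketch of how a knit command line might be assembled. It only prints the command (a dry run) so that it can be tried without Knit installed. The directory names, the KNIT_BUDGET value, and the unit name Calc are assumptions made for illustration; only the option and variable names come from the descriptions above.

```shell
# Join directory names into a colon-separated UNIT_PATH value.
join_unit_path() {
  old_ifs=$IFS
  IFS=:
  printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
  IFS=$old_ifs
}

# Print the knit command line instead of running it (dry run).
# Directories, budget, unit file, and unit name are hypothetical.
knit_command() {
  unit_path=$(join_unit_path units ../shared/units)
  printf '%s\n' "knit -f UNIT_PATH=$unit_path KNIT_BUDGET=200 calc.unit Calc"
}

knit_command
```

Replacing -f with -X in such a command would check the unit specifications without writing any output files.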

3.1.3 The knit_generated.mk File

Most of the recipe for building your program or library is contained in the knit_generated.mk file, which is produced for you by knit. As described previously, this file contains the make rules for (1) compiling the necessary source files into object files, (2) manipulating the object files as required to make the proper cross-unit connections, and (3) combining the resultant files into one or more archive files. This is as far as the knit_generated.mk rules go, however. More is needed in order to finish the job of making a complete, final program. Since the rules for “finishing the job” are not known to Knit, Knit is designed to make it easy for you to write your own Makefile containing the necessary rules. The idea is that your Makefile will include the knit_generated.mk rules file, and then provide the higher-level rules for the final assembly of your program. Typical rules for final assembly might look like this:

  $(PROGRAM): knit_inits.o $(KNIT_LIBS) ...
   $(CC) -o $@ --begin-group $^ --end-group

  knit_inits.o: knit_inits.c
   $(CC) -c $< $(CFLAGS)

There are four important things to notice about the above rules. First, the complete program is made by linking together the compiled knit_inits.o file and all of the libraries that contain your program’s unit code. If your program requires non-Knitted objects in addition to the Knit-generated libraries, these would also be listed in the rule. Second, the value of KNIT_LIBS is set in the knit_generated.mk file. That file defines a variable KNIT_OBJS as well, and any other variables that were specified on the knit compiler command line, as described previously in Section 3.1.2. Third, the set of program objects is given to the C compiler as a group, between the --begin-group and --end-group options. This idiom — which is specific to gcc, unfortunately — eliminates potential problems that the linker might have in resolving symbols. Finally, the Knit-generated knit_inits.c file is not in a unit, and therefore must be compiled and linked into your program explicitly. Think of the code in knit_inits.c as part of the “runtime environment” for your Knitted code.

The example programs that come with Knit (located within the examples subdirectory of the software distribution) each have a complete Makefile that you can easily copy and adapt for your own work.

3.2 The knitdoc Documentation Generator

WARNING: The knitdoc program does not currently work if you build Knit with Hugs. knitdoc requires some Haskell libraries that are provided with ghc, but not with Hugs.

The knitdoc program produces HTML-format documentation from a unit file. The command line syntax is:

knitdoc [var=value ...] dest-dir unit-files

The command line arguments are:

var=value: Variable settings, as described previously for knit in Section 3.1.2. Like knit, knitdoc requires that the UNIT_PATH be specified on the command line.
dest-dir: The name of the directory into which the HTML output files will go. Note that this directory must already exist.
unit-files: The names of the unit files to be processed. Any files that are included by unit-files will be processed as well.

The output of knitdoc is a set of HTML files describing the units and bundletypes that are defined in the unit files. (Other kinds of top-level definitions are not yet translated.) The “root” of the documentation is found in the generated index.html file.
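Since knitdoc requires that dest-dir already exist, a small wrapper can create the directory first. The sketch below is one possible way to do this; it prints the knitdoc command rather than running it, and the UNIT_PATH value and file names are hypothetical.

```shell
# Ensure the destination directory exists, then print the knitdoc command
# line (dry run). UNIT_PATH=. and the file names are hypothetical.
prepare_knitdoc() {
  dest_dir=$1
  shift
  mkdir -p "$dest_dir" || return 1
  printf '%s\n' "knitdoc UNIT_PATH=. $dest_dir $*"
}

# Use a scratch directory for the demonstration.
demo_dir=$(mktemp -d)
prepare_knitdoc "$demo_dir/html" calc.unit
```

After a real run, the generated index.html inside the destination directory would be the place to start browsing.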

Although the generated HTML is determined almost entirely by the unit and bundletype declarations themselves, knitdoc supports documentation comments (also called “doc comments”) that are similar to those found in Java. In a unit file, a doc comment begins with the three-character sequence “/*#” and ends with the sequence “#*/”. Every character between these delimiters is part of the comment; leading asterisks and whitespace are not discarded as they would be in a Java doc comment.

  /*# This comment describes the `Part' unit... #*/
  unit Part = {
    ...
  }

A doc comment that precedes a unit or bundletype definition will be copied verbatim into the generated HTML page for the definition. Therefore, the body of a doc comment should be written as valid HTML.¹ Documentation comments must precede the unit or bundletype definition; they cannot be used to document parts of a definition. Also, note that knitdoc does not currently support Java-style tagged paragraphs within doc comments (e.g., paragraphs marked with tags like @see or @author).

3.3 The mk_unit Template Generator

The knit compiler and knitdoc documentation generator both work on unit files, and ultimately, a unit file must be written by a person who understands the purpose and structure of the unit-encapsulated C code. To ease the task of writing a unit file, however, the Knit tool suite includes mk_unit, a small script that can aid the programmer by producing much of the unit file “boilerplate.”

mk_unit reads a set of object (‘.o’) files, analyzes the imported and exported symbols, heuristically groups related symbols into bundles, and finally outputs (to stdout) the boilerplate for a unit that can encapsulate the analyzed objects. The command line syntax of mk_unit is:

mk_unit [-n name] object-files [-- other-object-files [-- genbundle-args]]

where the options and arguments are as follows:

-n name

Use name as the name of the generated unit description. Without this option, mk_unit gives the generated unit a dummy name.

object-files

The names of the object files to be processed. The mk_unit script creates one unit definition that describes the collection of objects, not one unit description for each object. If you want each object as a separate unit, simply run mk_unit separately on each.

-- other-object-files

The names of object files that may import symbols from the unit being generated. When dividing an existing program or library into multiple units, it is useful for mk_unit to know which functions and variables are actually used across unit boundaries. By knowing this, mk_unit can make better unit definitions, ones in which the exports are driven by actual cross-unit connections.

Thus, mk_unit needs information about the “environment” for the unit being generated, and this is given by other-object-files. These files are used to separate the set of exported symbols into those that must be exported from the unit being generated and those that one may choose not to export from the unit.

If no information about the “unit environment” is available, one can simply specify an empty set of other-object-files.

-- genbundle-args

If a second -- appears on the command line, all remaining arguments are passed through to the knitGenBundles program. This program is invoked by mk_unit to sort the imported and exported symbols into related groups — what Knit calls bundles. The arguments that may usefully appear after -- are the following:

UNIT_PATH=dirs (Required.): Search path for unit files, as described in Section 3.1.2.
var=value: Other bindings as described in Section 3.1.2.
unit-file: The file from which to read bundletypes.

The mk_unit script uses the bundletype definitions in the given unit-file to organize the import and export symbols of the unit definition being created. By providing the set of bundletypes being used in your project, you can greatly improve the quality of the unit definitions generated by mk_unit.

Note that if a second -- option is not given to mk_unit, or if there are no genbundle-args on the command line, then mk_unit will not invoke knitGenBundles to group symbols. Instead, mk_unit will produce a unit definition that has a single import bundle and one or two export bundles. (There may be two export bundles if a non-empty set of other-object-files was specified.)
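As noted for the object-files argument, producing one unit per object means running mk_unit once per object file. A loop such as the following sketch could generate those commands. It only prints them (a dry run); the object file repl.o and the convention of capitalizing the file name to form the unit name are assumptions, not requirements of mk_unit.

```shell
# Derive a unit name from an object file name: strip the directory and the
# '.o' suffix, then upper-case the first letter. This naming convention is
# an assumption made for this sketch.
unit_name_for() {
  base=$(basename "$1" .o)
  first=$(printf '%s' "$base" | cut -c1 | tr '[:lower:]' '[:upper:]')
  rest=$(printf '%s' "$base" | cut -c2-)
  printf '%s%s\n' "$first" "$rest"
}

# Print one mk_unit command per object file; each command redirects the
# generated boilerplate from stdout into its own unit file.
for obj in main.o repl.o; do
  name=$(unit_name_for "$obj")
  printf '%s\n' "mk_unit -n $name $obj > $name.unit"
done
```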

The output of mk_unit is a unit definition of the following form:

  unit name = {
    imports[ ... ];
    exports[ ... ];
    depends{ exports + inits + finis needs imports; };
    files{ object-files } with flags {};
  }

The following transcript shows how mk_unit could be used to generate a unit definition for one of the example programs that come with Knit. In the Knit distribution, the file examples/calc/main.c contains the main function for a calculator-like program called calc. (See Section 4.3.) Since calc is a Knit example, the file examples/calc/calc.unit already contains a unit definition for the code in main.c. Nevertheless, we can use mk_unit to generate a new unit definition for the code. We might do this in order to check the hand-written unit, for example.

  cd examples/calc
  make main.o
  mk_unit -n Main main.o -- -- calc.unit

mk_unit processes main.o, reads the bundletype definitions from the calc.unit file, and finally outputs the following unit definition:

  unit Main = {
    imports[ Repl_T : Repl_T, /* {repl} */
           ];
    exports[ Main_T : Main_T, /* {main} */
           ];
    depends{ exports + inits + finis needs imports };
    files{
      "main.o",
    } with flags {};
  }

Repl_T and Main_T are the names of bundletypes defined in the calc.unit file. If you compare the above output to the actual definition of Main in calc.unit, you will see that the mk_unit-generated definition and the actual definition are nearly identical. mk_unit did not just copy the Main definition from calc.unit, though — it analyzed main.o and produced its own unit definition!

While a mk_unit-generated unit definition will be “complete,” it will almost certainly need some hand-tweaking in order to be most useful. For instance, you may want to:

change the organization of the imports and exports,
reclassify an exported common symbol as an imported common symbol,
specify linking constraints,
specify initializers and finalizers,
provide finer-grain dependency information,
change the files list to refer to the source C files, or
specify C preprocessor flags for the files.

Most of these Knit language features are described in Chapter 4.

Do not be concerned that you will need to edit the generated unit file. The purpose of mk_unit is to “get you off the ground,” not to create the final unit definitions for your project. The mk_unit script is something that you are expected to run once for each set of objects in your project, and then never again.

Finally, note that because mk_unit is written in Perl, you can easily modify the script to suit the needs of specific projects.

3.4 The rename_dot_o_files Object Editor

The final Knit tool described in this chapter is rename_dot_o_files, the object file editor that is invoked by Knit-generated Makefiles. While you would never need to invoke rename_dot_o_files by hand in the normal course of using Knit, you might find rename_dot_o_files to be of use in other projects. So, for hackers and the curious, we describe the program here.

The basic purpose of rename_dot_o_files is to change the names of non-local symbols (i.e., imports and exports) that appear within an object file. Specific symbol renamings are described in a renaming file. Symbols not listed in the renaming file are renamed by applying a prefix, which is specified on the command line. The command line syntax of rename_dot_o_files is:

rename_dot_o_files prefix rename-file object-file

where the arguments are as follows:

prefix: The prefix that should be applied by default to every non-local symbol in the object file. However, if an explicit renaming pattern is given for a symbol in the renaming file, then the prefix is not applied to that symbol.
rename-file: The name of the renaming file. This file contains a set of renaming specifications, each on a separate line, and each of the form “from=to” where from and to are symbols. In the object file being edited, every occurrence of the (non-local) symbol from will be replaced with the symbol to.
rename_dot_o_files is somewhat fussy about the format of this file, so the file should not contain any unnecessary white space.
object-file: The name of the object file to be edited. Note that the object file is edited “in place”: rename_dot_o_files mutates the given file instead of producing a new object file.

You might look at the files produced by knit, as described in Section 3.1.1, to get a better feel for how rename_dot_o_files can be used.
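If you write a renaming file by hand, a quick format check may save some trouble, given how strict the tool is about white space. The sketch below is a conservative check written for this chapter, not part of Knit: it accepts only lines of the form from=to with no white space and exactly one ‘=’. The symbol names and the Main__ prefix are hypothetical.

```shell
# Succeed only if the file is readable and every line has the form from=to,
# with no white space and exactly one '='.
check_rename_file() {
  [ -r "$1" ] && ! grep -qvE '^[^=[:space:]]+=[^=[:space:]]+$' "$1"
}

# Write a small renaming file; the symbol names are hypothetical.
rename_file=$(mktemp)
printf '%s\n' 'repl=Main__repl' 'main=Main__main' > "$rename_file"

if check_rename_file "$rename_file"; then
  echo "rename file looks well-formed"
fi

# The file could then be applied to an object, which is edited in place:
#   rename_dot_o_files Main__ "$rename_file" main.o
```

Because the object file is edited in place, it may be prudent to keep a copy of the original object before experimenting.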
