A Brief Introduction to TclRAL

What is a Relation?

TclRAL defines two new data types, Tuple and Relation. These are Tcl data types in much the same way that lists, arrays and dictionaries are. The library then provides a set of operators for these new types. A Relation is fundamentally a set. As a set, a Relation does not contain any duplicated elements. There is also no implied ordering of any of the components of a Relation. Formally, a Relation consists of a heading and a body. The heading is a set of attribute names and attribute data types. The body is a set of tuples, each of which supplies a value for every attribute in the heading.

Creating Relvars

TclRAL makes a clear distinction between a Relation value and a Relation variable. Since Relations are integrated with the native object system of Tcl, they can be held in ordinary Tcl variables. TclRAL also defines a variable space for Relation variables. In practice this allows the operations that modify variables in place to be well isolated from those that only read the Relation value and perform some computation.

Normally, you define the organization of the data in your program in a set of Relation variables. Let's say that we are interested in keeping track of the computers in an office environment. We might wish to keep track of a great many aspects of computers and offices, but to make things simple, let's assume that we will only keep track of a few basic facts. In this case, we are interested in things like who made the computer, and where it is located. So for computers, let's assume that in our company we have a capital equipment number that uniquely identifies all capital equipment and that for computers we are only interested in the make and model of the computer. So we would create a Relation variable to hold all the information on computers.

	::ral::relvar create Computer {EquipNo string Make string Model string} EquipNo
This creates a Relation variable named Computer that consists of three attributes:
  1. EquipNo
  2. Make
  3. Model
Each of the attributes is a string and EquipNo is the identifier. This means that every Tuple of the Computer relation must have a unique value of the EquipNo attribute.
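The effect of the identifier can be seen with a small experiment. The following is only a sketch, assuming a fresh interpreter in which the package loads as "ral" and the command syntax is the one used in this introduction. The second insert reuses an EquipNo value, so it is expected to be rejected.

```tcl
package require ral
namespace import ::ral::*

relvar create Computer {EquipNo string Make string Model string} EquipNo
relvar insert Computer {EquipNo CE-00357 Make Dell Model {Optimax 4387}}

# A second tuple with the same EquipNo would violate the identifier,
# so this insert is expected to raise an error that we catch here.
set failed [catch {
    relvar insert Computer {EquipNo CE-00357 Make HP Model {Inveron G58}}
} msg]

puts "second insert rejected: $failed"
puts "tuples stored: [relation cardinality $Computer]"
```

The exact wording of the error message is not relied upon here; only the fact that the insert fails and the relvar keeps its single tuple.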

Since we are also interested in where computers are located, we need some way to talk about that. In our simple world let's assume that every office has a number and that only one person occupies an office. So we can characterize offices as:

	::ral::relvar create Office {OfficeNo string Occupant string} OfficeNo

Finally, we want to allow for the fact that some offices contain more than one computer. For example, someone may have both a desktop and a laptop machine. So we need some Relation to keep track of the correlation between Computers and Offices.

	::ral::relvar create ComputerLocation {EquipNo string OfficeNo string} {EquipNo OfficeNo}
Note that the identifier consists of two attributes, EquipNo and OfficeNo. This is perfectly normal and rather common. It is also possible to have multiple sets of attributes that each form an identifier. So the argument to relvar create is the relation heading that defines the type of relation values that can be stored in the relvar and the set of identifiers of the relvar. An identifier consists of one or more attribute names in a list where no identifier is a subset of another identifier of the relvar.
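As an illustration of multiple identifiers, consider a hypothetical Employee relvar (not part of the office example) in which both the employee number and the badge number are unique on their own. This sketch assumes that each identifier is passed to relvar create as a separate trailing argument; check the relvar manual page for your version before relying on it.

```tcl
package require ral
namespace import ::ral::*

# Hypothetical relvar with two single-attribute identifiers:
# EmpNo alone is unique, and Badge alone is unique.
relvar create Employee {EmpNo string Badge string Name string} EmpNo Badge

relvar insert Employee {EmpNo E1 Badge B100 Name {Mike Newby}}

# Different EmpNo but the same Badge: expected to violate the second identifier.
set failed [catch {
    relvar insert Employee {EmpNo E2 Badge B100 Name {Jane Oldie}}
} msg]
puts "duplicate badge rejected: $failed"
```

Contrast this with ComputerLocation above, where a single identifier is made up of two attributes and only the combination has to be unique.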

Populating Relvars

Now that we have defined the structure of what we are interested in, we now have to actually enter some data values. This is sometimes the really hard, tedious part. This is also the part where often large user interface programs are written. For our purposes, the data set will be small and we will just specify the computers and offices that we are interested in.

	::ral::relvar insert Computer\
		{EquipNo CE-00357 Make Dell Model {Optimax 4387}}\
		{Make HP Model {Inveron G58} EquipNo CE-10457}\
		{EquipNo CE-00987 Model {LapusMaximus A10} Make Sony}\
		{EquipNo CE-00784 Make Homebrew Model {Assembled from spare parts}}
	::ral::relvar insert Office\
		{OfficeNo A01 Occupant {Mike Newby}}\
		{OfficeNo A09 Occupant {Jane Oldie}}\
		{OfficeNo A12 Occupant {John Intern}}
	::ral::relvar insert ComputerLocation\
		{EquipNo CE-00357 OfficeNo A12}\
		{EquipNo CE-10457 OfficeNo A09}\
		{EquipNo CE-00987 OfficeNo A09}\
		{EquipNo CE-00784 OfficeNo A01}

A few things are worth noticing here. First, you may insert as many tuples in one insert command as you wish. The body of the Relation is specified as a set of attribute / value pairs in much the same way as the array set command accepts its input or in the same fashion as a dictionary is specified. Also note that the ordering of the attributes is not significant. When the tuple is added to the body of the Relation, the right value will end up with the right attribute regardless of the order specified on the insert command. Remember that there is no implied left to right ordering in a Relation and therefore no specific order in which attributes must be specified. This holds true for all the TclRAL operators that deal with sets of attributes and their values. It is also the case that the three Relation variables, Computer, Office and ComputerLocation, are modified in place by the insert command. All of the subcommands of the relvar command operate on the relation variable directly and hence require a relation variable name as an argument.

Viewing the Data

Now that we have some data, how do we look at it? The relformat command is to Relations what the parray command is to arrays, except that rather than output directly to the standard output, relformat returns a string which you then have to put to a channel. This is a bit more flexible in practice than what parray does.

	% puts [::ral::relformat $Computer Computers]
	|EquipNo |Make    |Model                     |
	|string  |string  |string                    |
	|CE-00357|Dell    |Optimax 4387              |
	|CE-10457|HP      |Inveron G58               |
	|CE-00987|Sony    |LapusMaximus A10          |
	|CE-00784|Homebrew|Assembled from spare parts|
Notice that we obtained the value of the Computer relvar by ordinary Tcl variable substitution. Whenever a relvar is created, a corresponding ordinary Tcl variable is created by the same name in order to be able to obtain the relvar value using convenient Tcl syntax.
	% puts $Computer
	{EquipNo string Make string Model string} {{EquipNo CE-00357 Make Dell Model {Optimax 4387}} {EquipNo CE-10457 Make HP Model {Inveron G58}} {EquipNo CE-00987 Make Sony Model {LapusMaximus A10}} {EquipNo CE-00784 Make Homebrew Model {Assembled from spare parts}}}
Here we finally see the string representation of a Relation. Like all Tcl objects, Relations have a string representation. For Relations, that representation is a list of two elements and it completely describes a relation value. It turns out that it is not terribly easy for a human to read. But when given to the relformat command, the relation value can be seen in its more familiar tabular form.

This is also a good time to warn you about manipulating the string representation of a Relation. Avoid the temptation to use list or string commands on relation values. On that path lies madness! The whole purpose of TclRAL is to provide all the operators needed to manipulate the data type. Directly manipulating the string representation will, at times, yield strange or unexpected results. Don't do it, because it's just not necessary.

Asking Questions About Relations

Now that we have some data, we will be curious about asking some questions about it. To make things a bit more brief, let's import the TclRAL commands into our namespace.

	% namespace import ::ral::*
First, some easy questions. How many computers do we have?
	% relation cardinality $Computer
	4
The cardinality of a relation is the number of tuples contained in its body.

Let's say that we want to know who occupies an office. That information is contained in the Office relation, but we are only interested in the Occupant attribute. The project operator allows us to make a new relation value by picking attributes from another relation value.

	set occ [relation project $Office Occupant]
	puts [relformat $occ Occupants]
	|Occupant   |
	|string     |
	|Mike Newby |
	|Jane Oldie |
	|John Intern|

The project operator has a cousin named eliminate. With project you specify which attributes you wish to keep. With eliminate you specify which attributes you wish to discard. Clearly both are not absolutely necessary, but it makes things a lot more convenient to be able to think in either the positive sense (project, keep these attributes) or the negative sense (eliminate, get rid of these attributes).
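To make the symmetry concrete, here is a small sketch, assuming a fresh interpreter and the command syntax used in this introduction. It computes the occupants once by keeping Occupant and once by discarding OfficeNo, then prints both as sorted lists so they can be compared by eye.

```tcl
package require ral
namespace import ::ral::*

relvar create Office {OfficeNo string Occupant string} OfficeNo
relvar insert Office \
    {OfficeNo A01 Occupant {Mike Newby}} \
    {OfficeNo A09 Occupant {Jane Oldie}} \
    {OfficeNo A12 Occupant {John Intern}}

# Positive sense: name the attribute to keep.
set kept [relation project $Office Occupant]
# Negative sense: name the attribute to throw away.
set dropped [relation eliminate $Office OfficeNo]

# Relations are unordered, so sort the extracted lists before comparing.
set keptList [lsort [relation list $kept]]
set droppedList [lsort [relation list $dropped]]
puts $keptList
puts $droppedList
```

Sorting is applied only to the ordinary Tcl lists returned by relation list, never to the relation values themselves.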

Lists are a very important data structure in Tcl. So TclRAL provides an operator to move data out of relations and into a list.

	% set occList [relation list $occ]
	{Mike Newby} {Jane Oldie} {John Intern}
In some cases, the returned list will be a proper set, such as when the relation involved has only one attribute (as in this example). The number of attributes in a relation is called its degree (there is a command named degree that will return the degree of a relation). For ordinary attributes, list has the possibility of returning duplicated values if that is what is contained in the relation.
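Degree and cardinality are easy to confuse, so a short sketch may help. It assumes a fresh interpreter and that degree is a subcommand of relation, in the same way that cardinality is.

```tcl
package require ral
namespace import ::ral::*

relvar create Office {OfficeNo string Occupant string} OfficeNo
relvar insert Office \
    {OfficeNo A01 Occupant {Mike Newby}} \
    {OfficeNo A09 Occupant {Jane Oldie}} \
    {OfficeNo A12 Occupant {John Intern}}

set occ [relation project $Office Occupant]

# Degree counts attributes (columns); cardinality counts tuples (rows).
puts "Office: degree [relation degree $Office],\
      cardinality [relation cardinality $Office]"
puts "occ:    degree [relation degree $occ],\
      cardinality [relation cardinality $occ]"
```

Projecting onto Occupant lowers the degree from two to one. The cardinality stays at three here only because every occupant name happens to be distinct.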

Suppose now we want to know who uses which computer. To obtain that information we need to operate on the values held in several relation variables. In this case we will use the join and project operations.

	set cl [relation join $Computer $ComputerLocation]
	set clo [relation join $cl $Office]
	set uses [relation project $clo Occupant Make Model]
	puts [relformat $uses "Computer Users"]
	|Occupant   |Make    |Model                     |
	|string     |string  |string                    |
	|John Intern|Dell    |Optimax 4387              |
	|Jane Oldie |HP      |Inveron G58               |
	|Jane Oldie |Sony    |LapusMaximus A10          |
	|Mike Newby |Homebrew|Assembled from spare parts|
	Computer Users

Let's take each of these statements in turn and see what's going on in detail.

	set cl [relation join $Computer $ComputerLocation]
	puts [relformat $cl "Computer/ComputerLocation join"]
	|EquipNo |Make    |Model                     |OfficeNo|
	|string  |string  |string                    |string  |
	|CE-00357|Dell    |Optimax 4387              |A12     |
	|CE-10457|HP      |Inveron G58               |A09     |
	|CE-00987|Sony    |LapusMaximus A10          |A09     |
	|CE-00784|Homebrew|Assembled from spare parts|A01     |
	Computer/ComputerLocation join
The join operation combines relations in a very special way. Left to its own devices, join creates a new relation that is a combination of the relations that are its arguments. It combines the tuples from the bodies of its arguments such that tuples which have the same values for all attributes that are named the same become a tuple in the result. In this case, we joined the relation value contained in Computer to the relation value contained in ComputerLocation. Both of these relations have an attribute named EquipNo. So every tuple in Computer which has a value of EquipNo that matches a tuple in ComputerLocation is included in the result. The result has attributes that consist of all the attributes in the first relation plus all the attributes in the second relation. However, note that we don't include a second EquipNo attribute from the ComputerLocation relation. That wouldn't really tell us anything new. The net effect then is to tack on an OfficeNo attribute that corresponds to where the computer is located. Because the matching is for attributes whose values are equal and because the redundant attributes from the second relation are eliminated, this type of join operation is called the natural join (it turns out that you can dream up all types of joins, but natural joins are the most common ones).

This particular example is simple enough not to really exercise all the nuances of join. It so happens that every computer is located in some office, i.e. for every tuple in Computer there is a corresponding match in ComputerLocation. If that were not the case, then the "extras" in both relations would not appear in the result. Another variation is what to do when the two relation values you wish to join do not have any commonly named attributes. There are two solutions to this: rename attributes with the relation rename command so that the names match, or give join the -using option to state which attributes are to be matched. See the relation manual page for the details of both.
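The disappearance of unmatched tuples can be shown with a cut-down data set. The sketch below assumes a fresh interpreter and uses only two computers, one of which deliberately has no ComputerLocation tuple.

```tcl
package require ral
namespace import ::ral::*

relvar create Computer {EquipNo string Make string Model string} EquipNo
relvar create ComputerLocation {EquipNo string OfficeNo string} {EquipNo OfficeNo}

relvar insert Computer \
    {EquipNo CE-00357 Make Dell Model {Optimax 4387}} \
    {EquipNo CE-10457 Make HP Model {Inveron G58}}
# Only the Dell is given a location; the HP has no matching tuple.
relvar insert ComputerLocation {EquipNo CE-00357 OfficeNo A12}

set cl [relation join $Computer $ComputerLocation]

# The natural join keeps only tuples whose EquipNo appears in both operands.
puts [relation cardinality $cl]
puts [relation list [relation project $cl EquipNo]]
```

Only CE-00357 survives the join. If the unmatched computers matter to you, they have to be found by a separate query against Computer.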

Now we join the previous result with the relation value contained in the Office relvar. We do this because we are interested in the names of the people who use the computers and not the office where the computer is located. Fortunately, the relation value contained in the Office relvar tells us who sits in each office.

	set clo [relation join $cl $Office]
	puts [relformat $clo "Computer/ComputerLocation/Office join"]
	|EquipNo |Make    |Model                     |OfficeNo|Occupant   |
	|string  |string  |string                    |string  |string     |
	|CE-00357|Dell    |Optimax 4387              |A12     |John Intern|
	|CE-10457|HP      |Inveron G58               |A09     |Jane Oldie |
	|CE-00987|Sony    |LapusMaximus A10          |A09     |Jane Oldie |
	|CE-00784|Homebrew|Assembled from spare parts|A01     |Mike Newby |
	Computer/ComputerLocation/Office join

Finally, we are only really interested in the computer attributes and the people attributes. So we need a way to get just a subset of the attributes, namely Occupant, Make and Model. So we use the project operator as described above to select only those attributes of interest.

	set uses [relation project $clo Occupant Make Model]
	puts [relformat $uses "Computer Users"]
This gives us the result shown above as "Computer Users".

Nesting Operations

So far we have been putting the intermediate results into ordinary Tcl variables (such as "cl" and "clo"). You might be wondering what is happening to all these variables. The answer is the same thing that happens to any Tcl variable. When it goes out of scope and is no longer referenced it gets cleaned up. There is no need to keep any special track of intermediate results and destroy them later. Because relations operate like any Tcl value, we could have chosen to code the above examples as nested operations without the intermediate storage. That way works just fine. Choose whichever way makes the code clearest. Also note that the relation values contained in a relation variable do not go away without specifically being destroyed. Relation variables serve to always provide a reference to the contained relation value (they serve other purposes too).
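For comparison, here is the "Computer Users" query written as a single nested expression. This is a sketch on a reduced data set, assuming a fresh interpreter; the commands are the same join and project used above.

```tcl
package require ral
namespace import ::ral::*

relvar create Computer {EquipNo string Make string Model string} EquipNo
relvar create Office {OfficeNo string Occupant string} OfficeNo
relvar create ComputerLocation {EquipNo string OfficeNo string} {EquipNo OfficeNo}

relvar insert Computer \
    {EquipNo CE-00357 Make Dell Model {Optimax 4387}} \
    {EquipNo CE-10457 Make HP Model {Inveron G58}}
relvar insert Office \
    {OfficeNo A09 Occupant {Jane Oldie}} \
    {OfficeNo A12 Occupant {John Intern}}
relvar insert ComputerLocation \
    {EquipNo CE-00357 OfficeNo A12} \
    {EquipNo CE-10457 OfficeNo A09}

# Same computation as the cl / clo / uses sequence, with no
# intermediate variables: the inner joins feed the outer project.
set uses [relation project \
    [relation join [relation join $Computer $ComputerLocation] $Office] \
    Occupant Make Model]

puts [relformat $uses "Computer Users"]
```

The nested form is more compact, while the step by step form is easier to inspect with relformat when a result is not what you expected.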

Selecting Tuples from a Relation

We've seen that the project command allows us to select particular attributes from a relation. Now we will look at selecting particular tuples from a relation. The operator that picks tuples out of the body of a relation is called restrict. Let's say we are interested in the computers used by Jane Oldie. We can use our "uses" relation that we just calculated to create a new relation that only contains those tuples where Jane Oldie is the Occupant.

	% set janes [relation restrict $uses u {[tuple extract $u Occupant] eq "Jane Oldie"}]
	% puts [relformat $janes "Jane's Computers"]
	|Occupant  |Make  |Model           |
	|string    |string|string          |
	|Jane Oldie|HP    |Inveron G58     |
	|Jane Oldie|Sony  |LapusMaximus A10|
	Jane's Computers
The restrict command takes three arguments, a relation value, a variable name and an expression. Each tuple in the relation is successively assigned to the given variable and if the expression evaluates to true, then the tuple is included in the result. In the example above, we used the relation value stored in the "uses" variable. Each tuple in "uses" (there are four) is then successively assigned to the variable "u". The expression, {[tuple extract $u Occupant] eq "Jane Oldie"}, is then evaluated. If that expression returns true (non-zero), then the tuple is included in the result. The expression will be evaluated once for each tuple in the relation and is evaluated in the context of the caller. This means that the expression can be of arbitrary complexity and involve variables other than the tuple variable ("u" in this example). In the example, the expression compares the value of the Occupant attribute with the string constant "Jane Oldie" to determine if there is a match. The Occupant attribute is extracted from the "u" tuple using the tuple extract subcommand.
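Because the expression is evaluated in the caller's context, it can refer to ordinary variables as well as to the tuple variable. The sketch below assumes a fresh interpreter and restricts the Office relvar using a name held in a variable called who, which is an illustrative name only.

```tcl
package require ral
namespace import ::ral::*

relvar create Office {OfficeNo string Occupant string} OfficeNo
relvar insert Office \
    {OfficeNo A01 Occupant {Mike Newby}} \
    {OfficeNo A09 Occupant {Jane Oldie}} \
    {OfficeNo A12 Occupant {John Intern}}

# The name to look for lives in an ordinary Tcl variable.
set who "Jane Oldie"

# $o is the tuple variable; $who is resolved in the caller's scope
# each time the expression is evaluated.
set found [relation restrict $Office o {[tuple extract $o Occupant] eq $who}]

puts [relformat $found "Office of $who"]
```

The braces around the expression matter: they defer substitution of both $o and $who until restrict evaluates the expression for each tuple.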

Examining a Relation Tuple by Tuple

Normally there is little reason to iterate on a relation in the usual way that is common when dealing with Tcl lists or arrays. Indeed, part of the power of the relation data type is that operations are done a set at a time. However, there are times when we would like to examine the tuples in a relation one at a time and perhaps in a certain order. As we have noted, there is no inherent order in a relation, but sometimes we would like to see things sorted. To accommodate this, TclRAL provides a relation foreach command that accesses the tuples in a relation one at a time and can provide that access in a specified order.

	relation foreach c $Computer -ascending EquipNo {
		puts [tuple get [relation tuple $c]]
	}
	EquipNo CE-00357 Make Dell Model {Optimax 4387}
	EquipNo CE-00784 Make Homebrew Model {Assembled from spare parts}
	EquipNo CE-00987 Make Sony Model {LapusMaximus A10}
	EquipNo CE-10457 Make HP Model {Inveron G58}
The relation foreach command was clearly patterned after the Tcl core command foreach. Each tuple in the relation value is converted into a relation value of cardinality one and that singular relation value is assigned to the given variable ("c" in this example) and the script is executed. If there is no sorting specification, then the order of iterating through the relation is arbitrary and defined by the implementation. Otherwise, a sorting order and set of attributes may be given and in this case the order of visiting the tuples in the relation is according to the sorting order. The relation tuple command converts a relation value of cardinality one into its corresponding tuple and the tuple get command returns the name / value pairs of a tuple value (patterned after array get).
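The opposite ordering works the same way. The following sketch assumes a fresh interpreter and that -descending is accepted in the same position as -ascending. It collects the Make of each computer into an ordinary list, visiting the tuples from the highest EquipNo to the lowest.

```tcl
package require ral
namespace import ::ral::*

relvar create Computer {EquipNo string Make string Model string} EquipNo
relvar insert Computer \
    {EquipNo CE-00357 Make Dell Model {Optimax 4387}} \
    {EquipNo CE-10457 Make HP Model {Inveron G58}} \
    {EquipNo CE-00987 Make Sony Model {LapusMaximus A10}} \
    {EquipNo CE-00784 Make Homebrew Model {Assembled from spare parts}}

set makes {}
relation foreach c $Computer -descending EquipNo {
    # $c is a relation of cardinality one; convert it to a tuple
    # before pulling out a single attribute value.
    lappend makes [tuple extract [relation tuple $c] Make]
}
puts $makes
```

The resulting list has a meaningful order because it is a Tcl list built by the loop; the Computer relation itself remains unordered.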

Poor Man's Persistence

In TclRAL, relations are much like any other Tcl value: there is no persistence provided by the language itself. There are many schemes used in Tcl programs to provide persistence of data across program invocations. TclRAL provides three means to store and reconstruct a set of relvars.

  1. Serialize the relvars
  2. Store the relvars in a Metakit database
  3. Dump a Tcl script that will reconstruct the relvars
Perhaps the simplest way to handle small files of data is simply to serialize the relvars and place the result in a file. There is even a command to handle it all.
	serializeToFile mydata.ral
In the case of our example, the "mydata.ral" file would contain all the values of the relvars plus all the information required to reconstruct the relvars and repopulate them. When the program is run again, then the state may be restored.
	deserializeFromFile mydata.ral
Clearly, you would not want to store huge quantities of data in this fashion. However, for small to modest sized data sets it is a simple and effective technique. Larger quantities of data can be handled by using TclRAL's ability to store data in a Metakit database.
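A program that uses serialization typically has to decide at startup whether there is saved state to restore. The following is a minimal sketch of that pattern, assuming a fresh interpreter, the single-argument command forms shown above, and that the Office variable is available again after deserializing. The dataFile variable is an illustrative name.

```tcl
package require ral
namespace import ::ral::*

set dataFile mydata.ral

if {[file exists $dataFile]} {
    # A previous run left saved state: rebuild the relvars from it.
    deserializeFromFile $dataFile
} else {
    # First run: define the relvars and give them their initial contents.
    relvar create Office {OfficeNo string Occupant string} OfficeNo
    relvar insert Office {OfficeNo A01 Occupant {Mike Newby}}
}

puts [relformat $Office Offices]

# Save the current state of the relvars for the next run.
serializeToFile $dataFile
```

Restoring into an interpreter that already has relvars of the same names is likely to fail, which is why the sketch does the restore before anything is created.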

Where To Go From Here

The purpose of this introduction is to help get new users of TclRAL over the blank screen syndrome. The examples here are simple, but I hope they have encouraged you to try some more complicated programs on your own. There is much more to TclRAL than has been presented here. From here I would suggest writing more programs on your own and reading the manual pages. The manual pages are not very good tutorials because they are trying to be a precise reference. However, they do have some examples in them, particularly on the more complicated operations. Readings about the Relational Model of Data from very many sources are available (I happen to like Date, but there are many authors in this area). Relational algebraic thinking encourages a style of programming that one might call relation oriented programming. It's not new. Such a programming style is based on a semantic data model, i.e. a structure of relvars in which the significant entities of your program can be captured as data. My own experience is that the hard thinking goes into designing the proper collection and inter-relationships of the relvars. The programming tends then to be a pleasure because the set at a time operators take much of the tedium out of the implementation. With good data design, the process of formulating the needed operations is much easier.