[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
"Computational processes are abstract beings that inhabit computers." [Abelson/Sussman 1996, p.1]
At its core, any programming language faces the programmer with three distinct abstract notions [Abelson/Sussman 1996]: the entities which are to be manipulated, usually referred to as "data", the computational "processes" which carry out those manipulations, and the description of these processes by finite collections of computational rules, commonly called "programs".
As in other functional programming languages, all computations performed by Q scripts are expression evaluations: The user specifies an expression to be evaluated, the interpreter computes the value of that expression and prints it as the result of the computation.
Having described the "data" manipulated by Q scripts, expressions, we now turn to the computational processes carried out on those expressions, expression evaluations, and their description by collections of computational rules, equations.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In the Q language, there is in fact no exact equivalent of a "function definition" in conventional languages. Rather, definitions take the form of equations which describe how a given expression, the left-hand side of the equation, can be transformed into another one, the corresponding right-hand side. Note the difference between this interpretation and common mathematical usage in which an equation is usually something to be "solved". Here we are actually dealing with "rewriting rules", i.e., the equations are oriented from left to right. The syntax of equations is captured by the following grammar rules:
definition : expression lqualifiers '=' expression qualifiers ';' {lqualifiers '=' expression qualifiers ';'} lqualifiers : [qualifier {qualifier} ':'] qualifiers : {qualifier} qualifier : condition | where-clause condition : 'if' expression | 'otherwise' where-clause : 'where' expression '=' expression {',' expression '=' expression} |
Variables occurring on the left-hand side of an equation are taken to be
universally quantified. They play the role of formal parameters in
conventional programming languages, which are "bound" to their
corresponding values when the equation is applied to an actual
expression value. For instance, suppose we wish to define a function
sqr
which squares its argument. We know that to square something
we simply multiply it with itself. The corresponding equation is:
sqr X = X*X; |
Here the left-hand side is simply the application of the function symbol
sqr
to an argument value X
which stands for any actual
value we might apply the sqr
function to.
An equation may also contain a collection of conditions and "local"
variable definitions (these two kinds of supplementary elements are also
called qualifiers in Q parlance), and multiple right-hand sides
(which are interpreted as a collection of equations for the same
left-hand side). An empty condition may be denoted with the keyword
otherwise
, which is a convenient means to indicate the "default
case" of a definition. For instance, the following equations define the
factorial function:
fac N = N*fac (N-1) if N>0; = 1 otherwise; |
The otherwise
keyword is nothing but syntactic sugar; it can always
be omitted. However, it tends to improve the readability of definitions like
the one above.
Qualifiers can also be written on the left-hand side of an equation, like so:
fac N if N>0: = N*fac (N-1); otherwise: = 1; |
The two forms are more or less equivalent, and can be mixed, but left-hand side qualifiers may also be shared by multiple equations for the same left-hand side. For instance, here is a definition of Ackerman's function:
ack M N if M>0: = ack (M-1) (ack M (N-1)) if N>0; = ack (M-1) 1 if N=0; if M=0: = N+1; |
Note that the condition M>0
on the left-hand side of the first
equation applies to both the first and the second equation. In general,
the scope of a left-hand qualifier extends up to (but excluding) the
next equation with either another left-hand side qualifier or a new
left-hand side expression.
In difference to function definitions in conventional programming languages, the left-hand side of an equation may also contain constant and structured argument patterns which are to be matched against the actual parameters of a function. For instance, the following equations, which involve constant values on the left-hand side, define the Fibonacci function:
fib 0 = 0; fib 1 = 1; fib N = fib (N-1) + fib (N-2) otherwise; |
As an example for structured argument patterns, operations to extract the head (first element, the "car" in Lisp terminology) and tail (list of remaining elements, the "cdr") of a list are defined in the standard library as follows:
hd [X|Xs] = X; tl [X|Xs] = Xs; |
Note that in the above example we actually only need one of the involved component values. In such a case it is possible to employ the anonymous variable `_' as a placeholder for any component value we do not care for. For instance:
hd [X|_] = X; tl [_|Xs] = Xs; |
Multiple instances of the anonymous variable are matched independently from each other, therefore an equation like
foo _ _ = 0; |
will be matched for any combination of arguments. Also note that values matched by the anonymous variable on the left-hand side of an equation are not accessible on the right-hand side. (The anonymous variable may appear on the right-hand side of an equation, but there it is always treated as a free variable, see Free Variables.)
As another example, here is the definition of a function which adds up the members of a list:
add [] = 0; add [X|Xs] = X+add Xs; |
The above definition is the Q equivalent for what Lisp programmers call "cdr'ing down a list". In general you will find that pattern-matching lets you express many functions operating on recursive, list- or tree-like data structures in a very concise and elegant way. For instance, we can define a binary tree insertion operation simply as follows:
const nil, bin X T1 T2; insert X nil = bin X nil nil; insert X (bin Y T1 T2) = bin Y (insert X T1) T2 if X<Y; = bin Y T1 (insert X T2) otherwise; |
Here the nil
and bin
symbols act as constructors
which are used to build a recursive binary tree data structure. To these
ends, the bin
symbol can be applied to a value X
(which is
the value to be stored at that node in the tree) and the left and right
subtrees T1
and T2
. This works exactly like a function
which is applied to its arguments, but here the bin
symbol does
not actually compute anything but simply stands for itself.
In the above example, we actually advertised the fact that nil
and bin
are just constructors by declaring them with
const
, as described in Declarations. While this declaration
is not strictly necessary, it helps to enforce that these symbols cannot
accidentally be "redefined" by means of an equation:
nil = ...; // compiler will print an error message bin X T1 T2 = ...; // here, too bin X = ...; // here, too (prefix of a constant) |
The same applies to the built-in constant symbols true
and
false
which are predeclared as const
, and also to other
types of built-in constants, i.e., numbers, strings, files, lists,
streams and tuples. The general rule is that no constant value is
allowed as the left-hand side of an equation. A singleton variable is
also forbidden. Thus the left-hand side must either be a
non-const
function symbol or application. Prefixes of constant
applications, such as bin X
in the example above, also count as
constants and are thus forbidden on the left-hand side. On the other
hand, both constants and variables may occur as the head of the
left-hand side of an equation if they are followed by additional
arguments. Thus equations like the following are all valid:
0 X = ...; bin X T1 T2 Y = ...; F X Y = ...; |
Note that since expressions involving operators are nothing but
syntactic sugar for applications, they can occur on the left-hand side
of an equation as well (with the exception of the '
operator
which is a constructor symbol, see Special Forms). For instance,
here is a collection of algebraic simplification rules which cause any
expression involving only +
and *
to be converted into a
"sum of products" form:
// distributivity laws: (X+Y)*Z = X*Y+X*Z; X*(Y+Z) = X*Y+X*Z; // associativity laws: X+(Y+Z) = (X+Y)+Z; X*(Y*Z) = (X*Y)*Z; |
A word of caution: if the =
operator occurs at the toplevel of
the left- or right-hand side of an equation, it must be
parenthesized. This is necessary to prevent the =
operator to be
confused with the =
symbol used to separate both sides of an
equation. (The same also applies to variable definitions.) E.g., you
should write something like
(X=X) = true; |
instead of
X=X = true; // this is a syntax error |
Although almost any expression can form the left-hand side of an equation, you should be aware that an equation will only be applicable if the pattern you specify can actually occur in the course of an expression evaluation. In particular, since the Q interpreter evaluates expressions in a leftmost-innermost fashion, you should make sure that argument patterns match normal form expressions (cf. Normal Forms and Reduction Strategy), unless the function to be defined is a special form (cf. Special Forms). Otherwise the equation may be useless, albeit syntactically correct. The compiler cannot verify these conditions, so they are at your own responsibility.
Using equations we can also easily define "higher order" functions
which take functions as arguments or return them as results. This is
made possible by the fact that function application is an explicit
operation. For instance, as a generalization of the add
function
above, the standard library provides an operation which applies
any "binary" function F
to a list, starting from a given
initial value A
:
foldr F A [] = A; foldr F A [X|Xs] = F X (foldr F A Xs); |
The foldr
("fold-right") function takes as its arguments a function
F
, an initial value A
, and a list [X1,...,Xn]
, and
computes the value
F X1 (F X2 (... (F Xn A)...)). |
(The name of this function comes from the fact that the recursive
applications of F
are parenthesized to the right.) Now we can easily
define the add
function in terms of foldr
as follows:
add = foldr (+) 0; |
There is another version of the fold operation, which accumulates results from left to right rather than from right to left:
foldl F A [] = A; foldl F A [X|Xs] = foldl F (F A X) Xs; |
Note that although both functions work in linear time (with respect to
the size of the input list), the foldl
function actually is more
efficient than foldr
in terms of stack space requirements since
its definition is "tail-recursive" (cf. Tail Recursion).
As another example, here is the definition of the standard library
function map
which maps a given function F
to each member
of a list Xs
:
map F [] = []; map F [X|Xs] = [F X|map F Xs]; |
Another example of a (built-in) higher-order function is lambda
,
which returns another function (cf. Lambda Abstractions). In fact, there is nothing that keeps you from defining a
function like lambda
yourself in Q. As a proof of concept, here
is a toy implementation of the lambda calculus, based on the
combinatorial calculus, as described in [Henson 1987]. (This definition
will only work with simple kinds of expressions. The built-in
lambda
function provides a much more comprehensive implementation
which also properly handles the obscure cases. But our definition in
terms of the combinatorial calculus could be extended to handle those
cases, too.)
Our little lambda
function is declared as a special form
which prevents its arguments from being evaluated. This will be
explained in Special Forms.
special lambda X Y; lambda X X = _I; lambda X (Y Z) = _S (lambda X Y) (lambda X Z); lambda X Y = _K Y otherwise; /* combinator rules */ _I X = X; // identity function _K X Y = X; // constant function _S X Y Z = X Z (Y Z); // argument dispatching |
For instance:
==> lambda X (2*X) _S (_S (_K (*)) (_K 2)) _I ==> _ 4 8 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
If a variable has multiple occurrences on the left-hand side of a rule,
each occurrence must be matched to the same value. In term rewriting
theory, such rules are called "non-linear" ("non-left-linear", to be
more precise). The following definition implements an operation
uniq
which removes adjacent duplicates from a list:
uniq [] = []; uniq [X,X|Xs] = uniq [X|Xs]; uniq [X|Xs] = [X|uniq Xs] otherwise; |
It is important to notice the difference between the syntactical notion of equality which plays a role in definitions like the one above, and the "semantic" equality relation defined by the `=' operator (cf. Relational Operators). Non-linear equations and equations involving constants, such as
uniq [X,X|Xs] = uniq [X|Xs]; fib 0 = 0; |
call for the verification of syntactic identities. The Q language implements these checks by means of a special relational operator `=='. Thus the above examples could also be written as:
uniq [X,Y|Xs] = uniq [X|Xs] if X==Y; fib X = 0 if X==0; |
Two expressions are considered syntactically equal (`==') only if
they print out the same in the interpreter. In contrast, the meaning of
the normal equality operation `=' depends on built-in rules and
equations specified by the programmer. For instance, two different
instances of the expression 0
always denote the same
object. However, since 0
is an integer and 0.0
a floating
point number, these two expressions are syntactically
different:
==> 0==0.0 false |
Nevertheless, the =
operator allows to compare integers and
floating point numbers. Indeed it asserts that 0
and 0.0
are "equal":
==> 0=0.0 true |
Note that, in contrast to all the other built-in relational operators, the syntactic equality operator `==' is actually implemented as a special form (cf. Special Forms) which compares its operands as literal expressions without evaluating them:
==> 0==0+0 false |
This is necessary to ensure proper operation when a nonlinear equation needs to compare two unevaluated special arguments for syntactic identity, but makes the operation somewhat awkward to use directly in programs. As a remedy, you can easily define a non-special syntactic equality operation yourself as follows:
eq X X = true; eq _ _ = false otherwise; |
In fact, this definition is already provided in the standard library
script stdlib.q
. The standard library also defines the logical
negations `!=' and neq
of `==' and eq
, see
The Standard Library, for details.
You should also note that the strict separation of syntactic and
semantic equality is actually quite useful. First of all, it allows the
=
operation to be treated in a manner consistent with other
comparison operators such as <
and >
. Secondly, it allows
you to tailor the definition of =
to your application. For
instance, an application might call for a set data structure, and you
would like to be able to test whether two expressions of a certain kind
represent the same set. Now there are fairly sophisticated data
structures for representing sets efficiently, but almost none of them
allow you to decide whether two objects represent the same set simply on
the basis of syntactic identity. Instead, you will have to provide an
explicit =
operation which tests whether two set objects contain
the same elements.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
On the right-hand side of an equation (as well as in the condition part
and the right-hand sides of local definitions), we distinguish between
bound variables, i.e., unqualified variable symbols which are
introduced in the left-hand side of an equation (or the left-hand side
of a local definition, see Local Variables), and free or
global variables which are not bound by any of the
left-hand side patterns. Free variables can also be declared explicitly
using a var
declaration (cf. Declarations). They are
effectively treated as (parameterless) function symbols, except that
they may be assigned a value by means of a variable
definition. Variable definitions have the following syntax:
definition : 'def' expression '=' expression {',' expression '=' expression} ';' | 'undef' identifier {',' identifier} ';' |
For instance, in the following equation,
foo X = C*X; |
the variable C
occurs free on the right-hand side and we have:
==> foo 23 C*23 |
We can assign a value to the variable as follows (this can be done either in the script or interactively in the interpreter):
def C = 2; |
After this definition we have:
==> foo 23 46 |
It is worth noting here that, for convenience, the global variable environment uses dynamic binding, i.e., you can change the value of a free variable at any time interactively in the interpreter:
==> def C = 3; foo 23 69 |
This also makes it possible to have recursive definitions like the following:
==> var fac = \N.if N>0 then N*fac (N-1) else 1 ==> fac \X1 . if X1>0 then X1*fac (X1-1) else 1 ==> map fac [1..10] [1,2,6,24,120,720,5040,40320,362880,3628800] |
(Note that we have to use an explicit variable declaration here, since `fac' is an uncapitalized identifier and hence would otherwise be taken to denote a function symbol. We also made use of a variable definition with initializer which declares and defines the variable in a single step. See Declarations.)
In contrast, bound variables (which are bound either by the left-hand side of the equation, cf. Equations, or in a local variable definition, cf. Local Variables) always use lexical binding, i.e., their values are always completely determined by the (static) context of the equation to which they belong. This is important because it guarantees referential transparency, i.e., the values of the left-hand side and local variables in an equation will only depend on the expression to which the equation is applied, as long as the definition does not involve any operations with side-effects, such as I/O. The values of the free variables may change, but only between different expression evaluations, as there is no function which would allow you to change the value of a global variable during an ongoing evaluation. This is in contrast to other functional programming languages such as Lisp which allow you to modify the value assigned to a variable in a function definition.
Free variables are useful if you have a script which repeatedly refers to the same value whose calculation may be costly or involve functions with side-effects. By assigning such a value to a variable, you can refer to it without having to recompute the value each time it is required, as would be the case with a corresponding function definition.
The right-hand side of a variable definition may be any Q expression, as
described in Expressions. The expression is evaluated once,
at the time the script is loaded, and is then matched against the
left-hand side of the definition, binding variables to the corresponding
component values of the right-hand side value. In the special case that
the left-hand side is a single variable, it is simply bound to the
right-hand side value. Whenever a defined variable occurs "free" on
the right-hand side of an equation or variable definition, it will then
be replaced by the value of the assigned expression. For instance, the
following definition assigns the value exp 1.0 = 2.71828…
to the free variable e
:
var e = exp 1.0; |
Variable definitions are carried out in the order in which they occur in
a script, so that a definition may refer to the values of variables
assigned in previous definitions. It is also possible to override
assignments made in previous definitions, and to undefine a
variable. Multiple definitions can be performed with a single def
statement. For instance:
def N = 99, K = N, N = 2*K+1, M = N; undef N, K; |
As already mentioned, the value assigned to a variable may also be
changed with def
and undef
statements in the interpreter,
see Command Language.
Note that a variable can also be declared const
, which can be
used to introduce constant values, as in:
public var const c = 299792458; // the speed of light, in m/sec |
In this case the variable can only be assigned to once, afterwards attempts to assign a new value to the same variable will produce an error message:
==> def c = 300000000 ! Cannot redefine const variable >>> def c = 300000000 ^ |
Free variable definitions can also involve a pattern on the left-hand side which is to be matched against the supplied value. E.g.,
def [X|Xs] = [1,2,3]; |
assigns 1
to the variable X
and the remainder of the list,
[2,3]
, to the variable Xs
. Pattern matching is performed
in the same manner as for the left-hand side of an equation. However, an
error is reported if the match fails (as would be the case, e.g., if we
tried to assign the empty list to [X|Xs]
).
A number of built-in free variables are already defined by the interpreter when it starts up, and there is also a free anonymous variable `_' which is used to record the result of the most recent expression evaluation performed in the interactive evaluation loop of the interpreter. See Command Language, for details.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As already mentioned, an equation may also contain one or more auxiliary
variable definitions a.k.a. where
clauses, which bind
additional variables to their corresponding values. In difference to the
global definitions of "free" variables discussed in Free Variables, the definitions in where
clauses are "local" to the
rule in which they occur, meaning that the values of these variables are
only available in the rule they are defined in. As in the case of free
variable definitions, the left-hand side of each where
clause can
be an arbitrary expression pattern which is to be matched against the
value computed from the corresponding right-hand side. The variables
occuring on the left-hand side of the definition will then be bound to
their corresponding values, thereby becoming "bound" variables in the
context of the rule, in addition to the variables occuring on the
left-hand side of the equation. Local definitions also act as additional
conditions; the rule will only be applicable if all left-hand sides in
where
clauses match their corresponding right-hand side values
(this is guaranteed, of course, if each left-hand side is a single
variable).
Local definitions are useful if the right-hand side of a rule refers repeatedly to the same (maybe complicated) subexpression. You can avoid the recalculation of such a value be assigning it to a variable, and referring to the variable in the right-hand side. For instance:
foo X = bar Y Z where Y = baz X, Z = qux Y; |
As already mentioned, pattern-matching definitions are permitted as well (and the anonymous variable also works as usual). A failing match causes the entire rule to fail. For instance,
foo Z = bar X Z where [X|_] = Z; |
works like
foo [X|Xs] = bar X [X|Xs]; |
but avoids repeating the list value on the right.
Like conditions, where
clauses can also be written on the
left-hand side of an equation. This allows you to share the local
variables between different equations for the same left-hand side. For
instance:
foo Z where [X|_] = Z: = bar X Z if X>0; = bar (-X) Z otherwise; |
The variable bindings in a single where
clause are performed in
the given order, and each definition may refer to all left-hand side
variables of the rule, as well as all variables introduced in earlier
definitions. Each use of a variable refers to its most recent
definition. It is also possible to have multiple where
clauses,
as in:
foo X = bar Y where Y = baz Z where Z = qux U where U = quux X; |
Note that here the definitions will be processed from right to
left. Such constructs are convenient if there is a series of values to
be computed in which one value depends on the next one. Using multiple
where
clauses, you start off with the definitions immediately
required by the right-hand side of the rule, then add another
where
clause for the variables introduced in the first clause,
etc. The above definition is actually equivalent to the following:
foo X = bar Y where U = quux X, Z = qux U, Y = baz Z; |
It is also possible to mix where
clauses with conditions, as in:
foo X = bar Y if qux Y where Y = baz Z if quux Z where Z = bazz X; |
Again, the different qualifiers are actually considered from right to
left; the rule will be aborted as soon as the first qualifier fails
(i.e., a condition evaluates to false
or the left-hand side of a
definition does not match the corresponding right-hand side), so that
no time is wasted with definitions which are not needed anyway.
Of course, left-hand side qualifiers work in the same fashion:
foo X if qux Y where Y = baz Z if quux Z where Z = bazz X: = bar Y; |
You can also mix left-hand and right-hand qualifiers, in which case the left-hand qualifiers will be processed first:
foo X if quux Z where Z = bazz X: = bar Y if qux Y where Y = baz Z; |
Variables bound by the left-hand side or a where
clause can be
rebound freely. E.g.:
foo X = bar (X+1) where X = qux X X if X>0 where X = baz X; |
Each occurrence of the variable in the right-hand side of the equation or a local definition then refers to its most recent value. Note that this doesn't mean that local variables are mutable; it just means that there are multiple nested scopes of symbols. More precisely, each local definition introduces a nested scope binding the variables in the left-hand side of the definition. It goes without saying that repeated rebinding of the same variable can be confusing, but sometimes it is appropriate and at other times it might just be more convenient than having to invent a bunch of additional variable names for no other purpose than disambiguation.
Also note that if a variable is bound by the left-hand side or a local
definition of a rule, then a global variable of the same name will be
"shadowed" within the rule. However, in this case you can still access
the value of the shadowed binding with a qualified identifier. For
instance, with the following definitions, foo 2
reduces to
bar 2*99
:
/* this is the script foo.q */ def FOO = 99; foo X = FOO*foo::FOO where FOO = bar X; |
If the free variable belongs to the current module, as in the example above, then it can also be escaped using an inline variable declaration (cf. Constants and Variables), like so:
def FOO = 99; foo X = FOO*var FOO where FOO = bar X; |
Inline variable declarations can also be used to explicitly declare free
variable symbols on the fly. If the symbol specified with var
has
not yet been declared in the current module, it will be declared as a
private free variable symbol.
Finally, Haskell programmers should note that Q's where
clauses
are really just variable definitions, and not (as in Haskell) a
kind of "local equations" which would permit you to define "local
functions". (There are technical reasons for this which we won't
discuss here.) So you shouldn't try to emulate Haskell's local function
definitions using Q's where
clauses. For instance, while the
following definition is perfectly legal Q code, it does not have the
intended effect of defining a local factorial function:
// this doesn't work as intended foo X = map fac [1..10] where fac N = if N>0 then N*fac (N-1) else 1; |
Instead, the where
clause is just an (unsuccessful) attempt to
match the right-hand side expression, `if N>0 then N*fac (N-1) else
1', against the left-hand side expression `fac N' and bind the
variable N
accordingly.
It is actually possible to have a kind of "local functions" in Q as well, but this involves creating a lambda abstraction and assigning it to a local variable. However, this is usually not very convenient and a "global" function definition using equations is almost always preferable. Also note that, because of Q's scoping rules in local definitions, you cannot really have recursive local definitions anyway. For instance, the following definition will not work as intended either (even though the analogous global definition of the factorial works just fine, because of the dynamic binding of global variables):
// doesn't work either foo X = map FAC [1..10] where FAC = \N.if N>0 then N*FAC (N-1) else 1; |
Only non-recursive "local functions" can be defined this way. (Well, using the "fixed point combinator" it is in fact possible to define the factorial and actually any recursive function in a form that can be assigned to a local variable, but we won't go into that here.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language allows you to restrict the set of patterns which can be
matched by a variable on the left-hand side of an equation or variable
definition. This is done by augmenting any left-hand side occurrence of
the variable with a type guard, using the notation:
variable:type
. For instance, to declare a type
BinTree
which comprises the constructors nil
and bin
X T1 T2
, the following declaration might be used
(cf. Declarations):
public type BinTree = const nil, bin X T1 T2; |
This declaration introduces two function symbols nil
and
bin
, just as if they were declared by:
public const nil, bin X T1 T2; |
However, it also assigns these symbols to the type symbol BinTree
.
The type symbol can then be used as a guard on variables to ensure
that a variable is only matched to objects of the given type. E.g., the
following equation will only apply if the argument to the foo
function has the form nil
or bin X T1 T2
for some
subexpressions X
, T1
and T2
:
foo T:BinTree = ...; |
This makes it possible to avoid explicit argument patterns like
foo nil = ...; foo (bin X T1 T2) = ...; |
in cases in which the component values are not actually required. Also, it
allows you to define operations on a given type without actually referring
to the constructor symbols of the type, and thereby makes it possible to
avoid dependencies on implementation details. You can even "hide" the
constructor symbols of a type by declaring them private
, which makes
the symbols accessible only to the script which declares the type they
belong to. For instance:
public type BinTree = private const nil, bin X T1 T2; |
The Q language allows you to construct a hierarchy of types by making one
type a subtype of another. This is done by specifying a supertype following
the type identifier in a type declaration. For instance, to make
BinTree
a subtype of the type SearchTree
, BinTree
would
be declared as follows:
type BinTree : SearchTree = const nil, bin X T1 T2; |
Now a variable of type SearchTree
will not only match
SearchTree
's, but also any object of its subtype BinTree
. In
fact, we might have declared SearchTree
as an "abstract" type which
does not provide any constructors at all:
type SearchTree; |
Abstract supertypes like SearchTree
are useful for factoring out
generic operations which are shared by different subtypes; see Types,
for examples, and for a more detailed discussion of the type guard concept.
The Q language has a number of predefined type symbols which can be used
to distinguish the corresponding types of objects built into the
language: Bool
, Int
, Float
, Real
(which is
the supertype of both Int
and Float
), Num
(the
supertype of Real
), String
, Char
(the subtype of
String
denoting the single-character strings), File
,
List
, Stream
, Tuple
and Function
. (Besides
these, there are also two types Exception
and SysException
for dealing with exception values, cf. Exception Handling.) For
instance, the standard library defines an operation isstr
, which
determines whether its argument is a string, as follows:
isstr _:String = true; isstr _ = false otherwise; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The process in which equations are applied in the Q interpreter to evaluate a given expression is fairly straightforward. In fact, it corresponds to the way in which we manipulate algebraic formulae, and we all learned how to do that in high school.
We say that an equation L=R
is applicable to a given expression
X
, if we can substitute actual values for the variables occurring in
L
so that we obtain X
. This is also expressed as "L
matches X
" or "X
is an instance of L
". In this case,
we can replace X
by Y
, where Y
is the expression
obtained from R
by replacing the variables occurring in L
by
their corresponding values. We shall call such a replacement step a
reduction, and write it down as X ⇒ Y
.
For instance, given the rule
sqr X = X*X; |
we have that
sqr 2 ⇒ 2*2 |
since the expression sqr 2
matches the left-hand side sqr X
,
with 2
being substituted for the variable X
. This works with
compound expressions as well, even if they involve variables themselves.
For instance:
sqr (X+1) ⇒ (X+1)*(X+1). |
Here, the compound expression (X+1)
has been substituted for the
variable X
.
Equations are usually applied in the context of a larger expression to be evaluated. For instance, we have that
sqr 2 + 2 ⇒ 2*2+2 |
by the application of the above definition of sqr
to the subexpression
sqr 2
of sqr 2 + 2
.
Before we proceed, a remark on the role of the built-in operations of the
Q language is in order. Primitive operations, such as the built-in +
and *
operators described in Built-In Operators, do not require
any equations specified by the programmer. Instead, they may be thought of
as being predefined by a large set of built-in equations. Each application
of a built-in rule also counts as a single reduction. For instance, we have
that
2*2 ⇒ 4 |
by the built-in rule which handles the multiplication of integers.
Built-in rules take priority over any equations specified by the programmer. However, it is possible to extend the definition of a primitive operation with equations supplied by the programmer. A corresponding example has already been given in Equations. By refining built-in definitions accordingly, you can also treat "exceptions" in built-in operations; see Runtime Errors, for an example.
In order to complete the evaluation of a given expression, we have to
repeatedly perform single reductions until no further equations (built-in
or user-defined) apply. We then say that the resulting expression is in
normal form, and interpret it as the value of the original
expression. For instance, the following reduction sequence leads to the
normal form (i.e., value) 6
for the target expression
sqr 2 + 2
(we still assume the definition of sqr
introduced
above):
sqr 2 + 2 ⇒ 2*2+2 ⇒ 4+2 ⇒ 6. |
Normal forms do not necessarily exist. Just as in conventional programming languages, the evaluation may go into an endless loop and never return an answer. Even if a normal form exists, it need not be unique - it may depend on which equations are used in the reduction process and in what order. The problem is that at any point in the reduction process there may be more than one "reducible" subexpression, and even if there is only one such expression (termed a redex), there may be more than one applicable equation.
To illustrate this kind of problem, consider the definition of the sqr
function from above, and the target expression sqr (1+1)
. This
expression admits three distinct reduction sequences:
sqr (1+1) ⇒ sqr 2 ⇒ 2*2 ⇒ 4 sqr (1+1) ⇒ (1+1)*(1+1) ⇒ 2*(1+1) ⇒ 2*2 ⇒ 4 sqr (1+1) ⇒ (1+1)*(1+1) ⇒ (1+1)*2 ⇒ 2*2 ⇒ 4. |
The second and third reduction sequence appear to be almost the same, but
the first reduction sequence makes clear that it matters whether we reduce
the subexpression (1+1)
before applying the definition of sqr
or not. In the present example, the order in which reductions are performed
only affects the number of required reduction steps, but it is easy to
construct other systems of equations in which both the termination of a
normal form evaluation and the computed normal forms actually depend on the
evaluation order.
Non-uniqueness of normal forms does not necessarily mean that there must be two or more rules being applicable to a given expression. Ambiguities can also arise when an equation "overlaps" with other equations or with itself, as in the following example:
foo (foo X) = bar X; |
This system consisting of a single equation is terminating, but it has
two distinct normal forms for the expression foo (foo (foo X))
,
namely foo (bar X)
and bar (foo X)
.
Indeed, the termination of rewrite systems and the uniqueness of normal forms are important issues in the theory of term rewrite systems [Dershowitz/Jouannaud 1990]. The Q language simply avoids these problems by imposing a reduction strategy which makes the reduction process deterministic. The reduction strategy must specify unambiguously for each reduction step the redex that is to be reduced and the equation which is to be applied. For this purpose the Q interpreter applies the following rules:
The leftmost-innermost strategy is common in many programming languages, and
it also corresponds to the "natural" way in which people tend to carry
out calculations manually. It means that at any point in the reduction
process it is always the leftmost-innermost redex which is to be reduced.
In particular, this implies that in an application of the form
F X
first F
is evaluated, then X
, before the
value of F
is applied to the value of X
. Thus the first
reduction sequence from above,
sqr (1+1) ⇒ sqr 2 ⇒ 2*2 ⇒ 4 |
would be the one actually carried out in the interpreter in the course
of the evaluation of sqr (1+1)
.
The leftmost-innermost strategy resolves all ambiguities in the choice
of redexes during a normal form evaluation. The second disambiguating
rule allows to decide which equation to apply if more than one equation
is applicable to a given redex. As indicated, the default is to apply
equations in the order in which they appear in the source script. This
allows you to have equations with overlapping left-hand sides which
naturally arise, e.g., in definitions involving "special case rules"
such as the definition of the fib
function in Equations:
fib 0 = 0; fib 1 = 1; fib N = fib (N-1) + fib (N-2) otherwise; |
The rule for applying equations is extended to the built-in equations of
primitive operations by assuming - as already mentioned - that the
built-in rules come "before" any equations specified by the
programmer, and hence always take priority. Also note that external
functions, which are declared with the extern
modifier and
implemented in a C module, are treated in an analogous manner. In
general, the interpreter will first try a built-in rule, then an extern
rule, then the equations supplied in the script. (The treatment of
extern functions is described in more detail in C Language Interface.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Conditional rules are applied in the same manner as simple equations,
only we also have to check the conditions of the rule. When the
left-hand side of a conditional rule matches the target expression, we
first evaluate the condition (with left-hand side variables being
replaced as usual). This expression must evaluate to a truth
value, otherwise we cannot decide whether the rule is applicable or
not. (The Q interpreter generates a runtime error if the result is not a
truth value.) The rule is applicable only when the condition evaluates
to true
.
For instance, recall the definition of the factorial function:
fac N = N*fac (N-1) if N>0; = 1 otherwise; |
The first rule is applicable only if N>0
evaluates to
true
; if it evaluates to false
, the second rule
applies. Furthermore, if N
happens to be incomparable with
0
, then a runtime error will occur.
Here's how this definition is applied to evaluate the expression
fac 3
:
fac 3 ⇒ 3*fac (3-1) ⇒ 3*fac 2 ⇒ 3*(2*fac (2-1)) ⇒ 3*(2*fac 1) ⇒ 3*(2*(1*fac (1-1))) ⇒ 3*(2*(1*fac 0)) ⇒ 3*(2*(1*1)) ⇒ 3*(2*1) ⇒ 3*2 ⇒ 6. |
Local variable definitions (cf. Local Variables) are treated in a similar fashion. We first evaluate the right-hand side of the definition and match it against the corresponding left-hand side. If the match fails, the rule is skipped. Otherwise, we bind variables accordingly and proceed with the next qualifier or the right-hand side of the rule.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
We already discussed that the Q interpreter normally considers equations in the order in which they appear in a script. Hence, if all your equations go into a single script, you simply write down overlapping equations in the order in which they should be tried by the interpreter. However, if equations for the same expression pattern are scattered out over different modules, then the rule order also depends on the order in which the different modules are processed by the compiler. In the current implementation, the rules of each module are processed when the first import or include declaration for the module is encountered during a preorder traversal of the module graph starting at the main script, but you should not rely on this.
Hence it is usually a bad idea to have overlapping equations scattered out over different source files. However, in some situations this is hard to avoid. For instance, you might decide to put all "default rules" for certain expression patterns in a separate module. But if you simply import this module at the beginning of your main script then the default rules might override all the other equations.
To work around this, the Q language allows you to explicitly attach priorities to equations by means of a priority declaration, which has the following format:
declaration : '@' ['+'|'-'] unsigned-number |
This specifies the given number (which must be an integer, optionally
signed with `+' or `-') as the priority level of the
following equations. In the current implementation, priority levels must
be 32 bit integers, i.e., in the range
-0x80000000..0x7fffffff
. Rules are ordered according to their
priorities, highest priority first. Rules with the same priority are
considered in the order in which they appear in the script.
The priority level set with a priority declaration applies only to the
script in which the declaration is located. The default priority level,
which is in effect up to the first @
declaration, is 0.
Note that even with priority declarations the built-in operations still take priority over all user-defined rules, i.e., those operations effectively have infinite priority. (The same applies to external functions, see C Language Interface.)
As a simple (and contrived) example for the use of priority declarations, consider the following script:
@-1 foo X = -1; // "default" rule @0 foo X:Real = 0; // "standard" rule @+1 foo X:Int = 1; // "special case" rule |
The second equation is at the default priority level, 0. The first equation, although it comes before the second one, becomes a "fallback" rule by assigning it a low priority, -1. The last equation is put on a high priority level, +1, and hence overrides both preceding equations. The net effect is that the equations are actually ordered in reverse textual order. You can verify this in the interpreter:
==> foo 77 1 ==> foo 77.0 0 ==> foo () -1 |
Of course, this example is somewhat silly, because in a single script we can easily arrange equations in the correct textual order. However, if for some reason we had to put the three rules into three different modules, then using the given priority declaration ensures that the equations will always be applied in the correct order.
Priority levels should be used sparingly, if at all. Using low priorities to factor out a module with default rules can occasionally be quite useful, but overriding rules with high priorities is considered harmful and should be avoided whenever possible.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The present and the following section contain material for the technically interested reader which, while not being strictly necessary to work successfully with the Q programming language, helps to understand the operation of the Q interpreter and to write better Q scripts. You might wish to skim the following at first reading, and later return to it when you have become more proficient with the Q language and require some further background knowledge.
While the "rewriting model" of expression evaluation introduced in
Normal Forms and Reduction Strategy, provides a nice conceptual model
for the Q programmer, it is too inefficient to be a useful implementation
model for the Q interpreter. In particular, it would be rather inefficient to
actually construct the full expressions which occur as intermediate results
in the course of a reduction sequence, and repeatedly scan these expressions
for the leftmost-innermost redex. Instead, we can evaluate expressions by a
simple recursive procedure as follows. The input to the algorithm is an
expression X
to be evaluated.
X
is an application, list or tuple, evaluate the parts of X
recursively, from left to right. Proceed with step 2.
X
, invoke it and
return the corresponding value.
L=R
which is
applicable to X
. If there is no such rule, X
is in normal form
and is returned as the value of the expression. Otherwise we recursively
evaluate R
(with variables instantiated to their corresponding values)
and return it as the value of the original expression.
(The above description of course leaves out many important details. For
instance, the interpreter will also have to take care of def
'ed
symbols, and we will have to organize the search for an applicable equation.
However, here we only want to give the general idea, and not a complete
implementation of the interpreter.)
The above procedure can actually be implemented non-recursively if we
keep track of the rules which are currently "active", together with the
corresponding variable bindings. This information can easily be maintained
on a stack. We illustrate this with a simple example. Recall the definition
of the add
function from Equations:
add [] = 0; add [X|Xs] = X+add Xs; |
The evaluation of add [1,2,3]
then proceeds as follows:
add [1,2,3] ⇒ 1+add [2,3]: add [2,3] ⇒ 2+add [3]: add [3] ⇒ 3+add []: add [] ⇒ 0 3+0 ⇒ 3 add [3] ⇒ 3 2+3 ⇒ 5 add [2,3] ⇒ 5 1+5 ⇒ 6 add [1,2,3] ⇒ 6 |
Each new level of indentation indicates that we "suspend" the current rule (push it on the stack) and activate a new rule in order to evaluate some part of the right-hand side of the suspended rule. When all parts of the right-hand side have been evaluated, we return to the suspended rule (pop it from the stack) and actually perform the reduction. More precisely, we replace the left-hand side of the suspended rule by the result obtained by evaluating the right-hand side, which is already in normal form. (We will of course have to keep track of the results of intermediate reductions, which can be done on a second "evaluation" stack. When the evaluation of the right-hand side is finished, the corresponding result will be on top of the evaluation stack.)
If desired, we can easily reconstruct the "context" of a rule by following the chain of stacked rules starting from the current rule, and proceeding towards the bottom of the stack. For instance, when we perform the reduction
add [] ⇒ 0 |
in the above example, the context of this rule is described by the following stacked rules:
add [3] ⇒ 3+add [] add [2,3] ⇒ 2+add [3] add [1,2,3] ⇒ 1+add [2,3] |
Thus, when add []
gets reduced to 0
, we have actually completed
the following initial part of the reduction sequence:
add [1,2,3] ⇒ 1+add [2,3] ⇒ 1+(2+add [3]) ⇒ 1+(2+(3+add [])) ⇒ 1+(2+(3+0)) |
(Note that we never actually constructed intermediate results like the
expression 1+(2+(3+add []))
. Rather, these expressions are represented
implicitly by the contents of the stack.)
We can also apply the above procedure to qualified rules accordingly.
When the interpreter comes to consider a conditional rule, it pushes it
onto the stack as usual. However, it then first starts to evaluate the
qualifying conditions and where
clauses of the rule. When the
value of a condition has been computed, we can check whether it
holds. If the computed value is neither true
nor false
,
the interpreter aborts the evaluation with a runtime error. If it is
true
, it proceeds by evaluating other qualifiers or the
right-hand side. If it is false
, however, it gives up on the
current rule (pops it from the stack), and considers the next applicable
equation. A similar approach is used to handle local variable
definitions.
For instance, consider once more the definition of the factorial function:
fac N = N*fac (N-1) if N>0; = 1 otherwise; |
The computation of fac 3
then proceeds as follows (the ∗
symbol indicates that the condition N>0
has evaluated to false
and hence the corresponding rule is abandoned):
fac 3 ⇒ 3*fac (3-1): 3>0 ⇒ true 3-1 ⇒ 2 fac 2 ⇒ 2*fac (2-1): 2>0 ⇒ true 2-1 ⇒ 1 fac 1 ⇒ 1*fac (1-1): 1>0 ⇒ true 1-1 ⇒ 0 fac 0 ⇒ 0*fac (0-1): 0>0 ⇒ false ∗ fac 0 ⇒ 1 1*1 ⇒ 1 fac 1 ⇒ 1 2*1 ⇒ 2 fac 2 ⇒ 2 3*2 ⇒ 6 fac 3 ⇒ 6 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The above equations for the add
and fac
functions are examples
for recursive definitions. The computational processes generated from
such definitions are characterized by chains of suspended rules in which
the same rules are activated over and over again. For instance, an
evaluation of fac N
with N>0
requires N
recursive
activations of the rule:
fac N = N*fac (N-1) if N>0; |
Hence, if N
becomes very large, the recursive definition of fac
is in danger of running out of stack space. In the following, we show how to
avoid this defect by employing the technique of "tail-recursive
programming".
It is a well-known fact that the factorial function also has an "iterative"
implementation which can be executed in constant memory space. The idea is
to maintain a "running product" P
and a "counter" I
which
counts down from N
to 1
. The iterative algorithm can be written
down in a conventional programming language like Pascal as follows:
P := 1; I := N; while I>0 do begin P := P*I; I := I-1; end; |
While the Q language does not provide any special looping constructs
(and it also lacks an assignment operation), there is an alternative
definition of fac
which takes up the idea of running products and
counters and implements the factorial function in an iterative fashion:
fac N = fac2 1 N; fac2 P I = fac2 (P*I) (I-1) if I>0; = P otherwise; |
Here, the "state variables" P
and I
are implemented as
arguments of an "auxiliary" function fac2
which is invoked from
fac
. Again, this is a recursive definition; it requires N
recursive applications of the rule
fac2 P I = fac2 (P*I) (I-1) if I>0; |
when fac N
is computed. However, in difference to our previous
definition of fac
, the recursive rule is always applied "on top"
of the target expression. Such rules are called tail-recursive.
(The name "tail recursion" comes from the fact that the recursive
application of fac2
is the last operation considered during the
leftmost-innermost evaluation of the right-hand side.)
For instance, the evaluation of fac 3
now proceeds as follows
(abbreviating reductions by built-in rules for `-' and `*'):
fac 3 ⇒ fac2 1 3 ⇒ fac2 (1*3) (3-1) ⇒ fac2 3 2 ⇒ fac2 (3*2) (2-1) ⇒ fac2 6 1 ⇒ fac2 (6*1) (1-1) ⇒ fac2 6 0 ⇒ 6 |
Tail-recursive definitions can be employed for implementing all kinds of functions which can be computed on a machine with a fixed set of "registers" and no auxiliary memory. For instance, here is a tail-recursive implementation of the Fibonacci function:
fib N = fib2 1 0 N; fib2 A B N = fib2 (A+B) A (N-1) if N>0; = B otherwise; |
(This definition also has the desirable side effect that it cuts down the exponential running time of the "naive" definition given in Equations, to linear.)
The Q interpreter employs a clever optimization trick, commonly known as tail recursion optimization (see e.g., [Steele/Sussman 1975]), to actually execute tail-recursive definitions within constant stack space. Hence no special looping constructs are required for implementing iterative algorithms efficiently in the Q language.
Assume that in our stack model of expression evaluation we are working
on a reduction X ⇒ Y
, and that we already recursively
evaluated all parts of the right-hand side Y
, but not Y
itself. Furthermore, suppose that in order to evaluate Y
we will
have to apply the rule Y ⇒ Z
next. Then, instead of
keeping the rule X ⇒ Y
suspended and evaluating Y
using rule Y ⇒ Z
recursively, we can also immediately
perform the reduction X ⇒ Y
and replace this rule with
Y ⇒ Z
on the stack. Thus, the new rule Y ⇒
Z
will not require any additional stack space at all, but simply reuses
the existing "activation record" for the X ⇒ Y
rule. In
other words, instead of invoking the Y ⇒ Z
rule as a
"subroutine", we effectively perform a kind of "goto" to the new
rule. We also refer to this process as performing the tail
reduction X ⇒ Y
. The evaluation now proceeds as if we
had been working on the rule Y ⇒ Z
in the first place.
For instance, with the new definition of fac
the evaluation of
fac 3
would be carried out using only a single level of suspended
rules (again, the ∗ symbol signals abortion of a rule with failing
qualifier):
fac 3 ⇒ fac2 1 3: fac2 1 3 ⇒ fac2 (1*3) (3-1): 3>0 ⇒ true 1*3 ⇒ 3 3-1 ⇒ 2 fac2 3 2 ⇒ fac2 (3*2) (2-1): 2>0 ⇒ true 3*2 ⇒ 6 2-1 ⇒ 1 fac2 6 1 ⇒ fac2 (6*1) (1-1): 1>0 ⇒ true 6*1 ⇒ 6 1-1 ⇒ 0 fac2 6 0 ⇒ fac2 (6*0) (0-1): 0>0 ⇒ false ∗ fac2 6 0 ⇒ 6 |
Besides the tail recursion optimization technique discussed above, the Q
interpreter also automatically optimizes toplevel sequences
(i.e., applications of the ||
operator which are not inside a
nested subexpression) on the right-hand side of equations, so that basic
imperative-style looping constructs can be executed in a tail-recursive
fashion as well. Consider, for instance, the standard library function
do
which applies a function to each member of a list, like the
map
function discussed in Equations, but instead of
collecting the results in an output list simply throws away the
intermediate values and returns ()
. This is useful, of course,
only if the function is evaluated solely for its side-effects, e.g., if
we want to print out all elements of a list. The definition of do
is rather straightforward:
do F [] = (); do F [X|Xs] = F X || do F Xs; |
Now this definition is not tail-recursive in the strict sense
alluded to above, because the last application on the right-hand side of
the second rule actually involves (||)
and not the do
function. (Recall that an infix expression like X||Y
is nothing
but syntact sugar for the function application (||) X Y
.)
However, as a remedy, the Q interpreter actually implements toplevel
sequences on the right-hand side of a rule by direct stack manipulation.
That is, the first argument of a sequence is thrown away as soon as it
has been computed. By these means, the interpreter, after having
computed F X
, effectively carries out the reduction do F
[X|Xs] ⇒ do F Xs
on which it can perform tail recursion
optimization as usual. Thus the above definition actually works in
constant stack space, as one might reasonably expect.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are a few conditions which may force the Q interpreter to give up on a reduction sequence and abort the evaluation with a runtime error message. These conditions are listed below.
Ctl-C
). In this case it is possible to resume the evaluation in
the debugger (cf. Debugging), provided that debugging has been
enabled with the break on
command (see Command Language).
break
is on
,
the debugger is invoked to inform the user of the equation which gave
rise to the error condition.
All of the above error conditions can also be caught and handled by the running script in any desired manner, see Exception Handling.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Albert Gräf on February, 23 2008 using texi2html 1.76.