[Top] [Contents] [Index] [ ? ]

The Q Programming Language

The Q Programming Language, Version 7.11, 23 February 2008.

Copyright (c) 1992-2008 by Albert Gräf.

This manual has been published in the series Musikinformatik und Medientechnik, Bereich Musikinformatik, Musikwissenschaftliches Institut, Johannes Gutenberg-Universitaet Mainz

55099 Mainz
Germany
ISSN 0941-0309

The Q programming system is distributed under the terms of the GNU General Public License. See the software license accompanying the distribution for details.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the author.

This document describes the Q programming language and system, version 7.11, 23 February 2008. Written by Albert Gräf, University of Mainz.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

1. Introduction

Q stands for "equational", so Q, in a nutshell, is a programming language which lets you "program by equations". You specify a system of equations which the interpreter uses as "rewrite rules" to reduce expressions to "normal form". This allows you to formulate your programs in a high-level, concise and declarative style. Q's development started out in the 1990s, motivated by the author's research on term pattern matching [Gräf 1991] and inspired by the pioneering work on equational programming by Michael O'Donnell and others [O'Donnell 1985], as an attempt to show that term rewriting would provide a feasible basis for a practical programming language. I think that this goal has been achieved, and that the present interpreter is efficient and robust enough for practical purposes. Q has been ported to a variety of operating systems, such as BeOS, FreeBSD, Linux, MacOS X, Solaris and Windows. Porting to other modern UNIX/POSIX environments should be a piece of cake. Thus Q can be used on most modern computer systems, and Q scripts written on one system will usually run on all supported platforms.

The Q language supports a rich variety of built-in types, like arbitrary precision integers, floating point numbers (double precision 64 bit), truth values, strings, tuples, lists, streams (a "lazy" list data structure with call-by-need evaluation), lambda functions and files. It also provides primitives for exception handling and multithreaded execution. Q scripts can be broken down into "modules", each with their separate namespace, and Q gives you full control over which symbols (function, variable and type names) are exported by each module. This makes it easy to organize large scripts in a modular fashion. Q also allows you to interface to "external" modules written in the C programming language, which provides a means to access functions in C and C++ libraries and employ C's higher processing speed for time-critical tasks. Conversely, Q scripts can also be executed from C and C++, which allows Q to be used as an embedded language or term rewriting engine in C/C++ applications.

As a practical programming language, Q comes with "batteries included". The standard library, which is mostly written in Q itself, provides rational and complex numbers, a lot of useful tuple, list and stream processing functions (including comprehensions), some common container data structures (dictionaries, sets, etc.), and an interface to the PostScript language. It also includes an extensive system interface which offers services such as binary and C-style formatted I/O, BSD socket I/O, process management, POSIX threads, regular expression matching and internationalization features. Additional extension modules provide interfaces to a number of other third party libraries, which turns Q into a practical tool for a variety of application areas.

In difference to other functional languages, Q is entirely based on the notions of rewrite rules, reductions and irreducible expressions (also known as normal forms) pertaining to the term rewriting calculus. A Q "program", called a script, consists of equations which are treated as rewrite rules and are used to reduce expressions to normal form. The normal form of an expression denotes its value, which can itself be a compound expression. Q has no rigid distinction between "constructor" and "defined" function symbols and it also allows you to evaluate expressions containing "free" variables. Basically, both sides of an equation may involve arbitrary expressions. Therefore Q can also be used as a tool for symbolic expression evaluation.

On the surface, Q looks very much like contemporary functional languages such as Miranda or Haskell. In fact, the syntax of the language has largely been inspired by the first edition of Introduction to Functional Programming by Richard Bird and Philip Wadler. However, Q is an interpreted language with dynamic typing and eager (leftmost-innermost) evaluation, which is more in line with classical functional languages such as Lisp. For the sake of efficiency, Q scripts are first translated into "bytecode" (an intermediate binary format) which is executed on a virtual stack machine. The interpreter automatically optimizes tail recursion, such that "iterative" algorithms can be realized in constant stack space. Besides the built-in I/O operations, the language is free of side-effects; in particular, Q itself does not have mutable variables (although the standard library provides such imperative features for those who need them). Q also provides two novel and (IMHO) interesting features: a notion of special forms which allows to handle both macro-like functions and lazy evaluation in a uniform setting without having to give up the basic eager evaluation strategy; and a notion of type guards which provides a means to cope with hierarchies of abstract data types (similar to the notion of classes with single inheritance in object-oriented languages) in the context of a term rewriting language.

Using Q is supposed to be fairly simple: you throw together some equations, start the interpreter and then type in the expressions you wish to evaluate. All this can be done with a few keystrokes, if you use the Emacs Q mode supplied with the package. A graphical user interface for Windows is also available. Of course, you can also run the Q programming utilities from the command line if you prefer that.

The manual is organized as follows. In Getting Started, we start out with a brief and informal introduction to the main features of the language, and a glimpse of how Q scripts look like. Lexical Matters, describes the lexical part of the Q language. In Scripts and Modules, we discuss how declarations, definitions and equations are put together to form a script, and how to break down larger scripts into a collection of smaller modules which can be managed separately. Declarations, discusses how certain entities like types, variables and function symbols can be declared in a Q script. Expressions, treats the syntax of Q expressions, and describes the built-in operators provided by the Q language. Equations and Expression Evaluation, is about equations and variable definitions, and how they are used in the evaluation process. Types, and Special Forms, describe the facilities provided by the Q language for dealing with abstract data types and deferred evaluation. Built-In Functions, and The Standard Library, discuss the built-in and standard library functions of the Q language. Clib, describes Q's "system module" which provides access to some important functions from the C library. The appendix gives additional information about some aspects of the language and its implementation. Q Language Grammar, contains a summary of the Q language syntax in BNF. Using Q, provides a description of the programming tools included in the Q programming system, C Language Interface, describes Q's interface to the C programming language, and Debugging, is a brief introduction to the symbolic debugger built into the Q interpreter. Finally, Running Scripts in Emacs, discusses how Q scripts can be edited and run using the Emacs editor.

Acknowledgements

This project has become much further reaching and took a lot longer than I anticipated when I started out working on it, and I wouldn't have been able to keep it going without the patience and support of my beloved wife Evelyn and children Sebastian, Janosch, Yannic and Miriam.

Next I have to acknowledge the support and help provided by colleagues at Mainz and Bremen (Germany) and at INRIA Lorraine (France) during the initial phases of this project. In 1992 Dieter Hofbauer invited me to the term rewriting group at INRIA Lorraine to discuss the design of the (then still nascent) Q with his colleagues and friends. During the following years, I also collaborated with Frank Drewes at Bremen (now at Umea, Sweden) who was involved in the initiation of this project, and Klaus Barthelmann at Mainz who provided many comments which helped to improve the early versions of this manual. Frank and Klaus tested various early beta versions of the Q interpreter, and contributed sample scripts as well as substantial parts of the standard Q library.

I'd also like to express my gratitude to the growing community of users of the Q language all over the world, in particular the members of the Q mailing list, for the lively discussions which helped to improve the language and its implementation, and for the contributed Q modules and applications. Special thanks are due to John Cowan for proofreading the manual, for his valuable suggestions concerning design and implementation issues and for his Q-Chicken interface, and to Rob Hubbard for his comprehensive rational number and polynomial modules which are now part of the distribution.

Last but not least, thanks are also due to the developers of ML and Haskell. While these people have not been involved with Q in any way, their work, which has changed the landscape of functional programming, has been a constant source of inspiration for me.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2. Getting Started

A new programming language is best conveyed through some examples, and therefore it is common practice to start a language description with an introductory chapter which treats the most essential language features in a fairly informal manner. This is also the purpose of the present chapter. We first show how some of the "standard features" of the Q programming language can be employed for using the Q interpreter effectively as a sophisticated kind of "desktop calculator". The remainder of this section then adresses the question of how you can extend the Q environment by providing your own definitions in Q scripts, and describes the evaluation process and the treatment of runtime errors in the interpreter.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1 Using the Interpreter

To begin with, let us show how the Q interpreter is invoked to evaluate some expressions with the built-in functions provided by the Q language. Having installed the Q programming system, you can simply invoke the interpreter from your shell by typing q. The interpreter will then start up, display its sign-on, and leaves you at its prompt:

 
==>

This indicates that the interpreter is waiting for you to type an expression. Let's start with a simple one:

 
==> 23
23

Sure enough, that's correct. Whenever you type an expression, the interpreter just prints its value. Any integer is a legal value in the Q language, and the common arithmetic operations work on these values as expected. Q integers are "bignums", i.e., their size is only limited by available memory:

 
==> 16753418726345 * 991726534256718265234
16614809890429729930396098173389730

"Real" numbers (more exactly, their approximation using 64 bit double precision floating point numbers) are provided as well. Let's try these:

 
==> sqrt (16.3805*5)/.05
181.0

The sqrt identifier denotes a built-in function which computes the square root of its argument. (A summary of the built-in functions can be found in Built-In Functions.)

What happens if you mistype an expression?

 
==> sqrt (16.3805*5)/,05
! Syntax error
>>> sqrt (16.3805*5)/,05
                     ^

As you can see, the interpreter not only lets you know that you typed something wrong, but also indicates the position of the error. It is quite easy to go back now and correct the error, using the interpreter's "command history". With the up and down arrow keys you can cycle through the expressions you have typed before, edit them, and resubmit a line by hitting the carriage return key. Also note that what you typed is stored in a "history file" when you exit the interpreter, which is reloaded next time the interpreter is invoked. A number of other useful keyboard commands are provided, see Readline: (readline)Command Line Editing section `Command Line Editing' in The GNU Readline Library. In particular, you can have the command line editor "complete" function symbols with the <TAB> key. E.g., if you type in sq and press the <TAB> key, you will get the sqrt function.

Before we proceed, a few remarks about the syntax of function applications are in order. The first thing you should note is that in Q, like in most other modern functional languages, function application is simply denoted by juxtaposition:

 
==> sqrt 2
1.4142135623731

Multiple arguments are specified in the same fashion:

 
==> max 5 7
7

Parentheses can be used for grouping expressions as usual. In particular, if a function application appears as an argument in another function application then it must be parenthesized as follows:

 
==> sqrt (sqrt 2)
1.18920711500272

Another important point is that operators are in fact just functions in disguise. You can turn any operator into a prefix function by enclosing it in parentheses. Thus (+) denotes the function which adds its two arguments, and X+1 can also be written as (+) X 1; in fact, the former expression is nothing but "syntactic sugar" for the latter, see Built-In Operators. You can easily verify this in the interpreter:

 
==> (+) X 1
X+1

You can also have partial function applications like (*) 2 which denotes a function which doubles its argument. Moreover, Q supports so-called operator sections which allow you to specify a binary operator with only either its left or right operand. For instance, (1/) denotes the reciprocal and (+1) the "increment by 1" function:

 
==> (+1) 5
6

==> (1/) 3
0.333333333333333

The interpreter maintains a global variable environment, in which you can store arbitrary expression values. This provides a convenient means to define abbreviations for frequently-used expressions and for storing intermediate results. Variable definitions are done using the def command. For instance:

 
==> def X = 16.3805*5

==> X
81.9025

As indicated, variable symbols usually start with a capital letter; an initial lowercase letter indicates a function symbol. This convention is valid throughout the Q language. However, it is possible to explicitly declare an uncapitalized symbol as a variable, using the var command, and then assign values to it, like so:

 
==> var f

==> def f = sqrt

You can also both declare a variable and initialize its value with a single var command:

 
==> var f = sqrt

==> f X/.05
181.0

Multiple variable declarations and definitions can be entered on a single line, using commas to separate different definitions in a def or var command, and you can separate multiple expressions and commands on the same line with semicolons:

 
==> def X = 16.3805*5, f = sqrt; X; f X/.05
81.9025
181.0

You can also bind several variables at once by using an expression pattern as the left-hand side of a variable definition:

 
==> def (f, X) = (sqrt, 16.3805*5); f X/.05
181.0

(Such pattern-matching definitions only work with def; the left-hand side of a var declaration must always be a simple identifier.)

Another useful feature is the built-in "anonymous" variable `_', which is always set to the value of the most recent expression value printed by the interpreter:

 
==> _
181.0

==> 2*_
362.0

Sometimes you would also like the interpreter to "forget" about a definition. This can be done by means of an undef statement:

 
==> undef X; X
X

Besides def, undef and var, the interpreter provides a number of other special commands. The most important command for beginners certainly is the help command, which displays the online manual using the GNU info reader. You can also run this command with a keyword to be searched in the info file. For instance, to find out about all special commands provided by the interpreter, type the following:

 
==> help commands

(Type q when you are done reading the info file.)

Other useful commands are who which prints a list of the user-defined variables, and whos which describes the attributes of a symbol:

 
==> who
f

==> whos f sqrt

f               user-defined variable symbol
                = sqrt

sqrt            builtin function symbol
                sqrt X1

You can save user-defined variables and reload them using the save and load commands:

 
==> save
saving .q_vars

==> def f = 0; f
0

==> load
loading .q_vars

==> f
sqrt

Some statistics about the most recent expression evaluation can be printed with the stats command:

 
==> sum [1..123456]
7620753696

==> stats
0.52 secs, 246916 reductions, 246918 cells

The stats command displays the (cpu) time needed to complete an evaluation, the number of "reductions" (a.k.a. basic evaluation steps) performed by the interpreter, and the number of expression "cells" needed during the course of the computation. This information is often useful for profiling purposes.

Other commands allow you to edit and run a script directly from the interpreter, and to inspect and set various internal parameters of the interpreter; see Command Language.

Of course, the Q interpreter can also carry out computations on non-numeric data. In fact, Q provides a fairly rich collection of built-in data types, and makes it easy to define your own. For instance, strings are denoted by character sequences enclosed in double quotes. Strings can be concatenated with the `++' operator and the length of a string can be determined with `#'. Moreover, the character at a given position can be retrieved with the (zero-based) indexing operator `!':

 
==> "abc"++"xyz"; #"abc"; "abc"!1
"abcxyz"
3
"b"

The same operations also apply to lists, which are written as sequences of values enclosed in brackets:

 
==> [a,b,c]++[x,y,z]; #[a,b,c]; [a,b,c]!1
[a,b,c,x,y,z]
3
b

As in Prolog, lists are actually represented as right-recursive constructs of the form `[X|Xs]' where X denotes the head element of the list and Xs its tail, i.e., the list of the remaining elements. This representation allows a list to be traversed and manipulated in an efficient manner.

 
==> def [X|Xs] = [a,b,c]; X; Xs
a
[b,c]

==> [0|Xs]
[0,b,c]

Another useful list operation is "enumeration", which works with all so-called "enumeration types" (cf. Enumeration Types), as well as characters and numbers. Enumerations are constructed with the enum function, or using the familiar [X..Y] notation which is in fact only "syntactic sugar" for enum:

 
==> enum "a" "k"
["a","b","c","d","e","f","g","h","i","j","k"]

==> ["a".."k"]
["a","b","c","d","e","f","g","h","i","j","k"]

==> [0..9]
[0,1,2,3,4,5,6,7,8,9]

==> [0.1,0.2..1.0]
[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]

Tuples are sequences of expressions enclosed in parentheses, which are Q's equivalent of "vectors" or "arrays" in other languages. Like lists, tuples can be concatenated, indexed and enumerated, but they use an internal representation optimized for efficient storage and constant-time indexing:

 
==> (a,b,c)++(x,y,z); #(a,b,c); (a,b,c)!1
(a,b,c,x,y,z)
3
b

==> (0..9)
(0,1,2,3,4,5,6,7,8,9)

Q also offers built-in operations to convert between lists and tuples:

 
==> tuple [a,b,c]
(a,b,c)

==> list _
[a,b,c]

Q provides yet another list-like data structure, namely "lazy" lists a.k.a. streams, which are written using curly braces instead of brackets. In difference to lists, streams only produce their elements "on demand", when they are needed in the course of a computation. This makes it possible to represent large and even infinite sequences of values in an efficient manner. You can see this somewhat unusual behaviour if you try streams interactively in the interpreter. For instance, let's try to concatenate two streams:

 
==> {a,b,c}++{x,y,z}
{a|{b,c}++{x,y,z}}

You probably noticed that the tail of the resulting stream, {b,c}++{x,y,z}, has not been evaluated yet. But that is all right, because it will be evaluated as soon as we need it:

 
==> _!4
y

This lazy way of doing things becomes especially important when we work with infinite streams. We will have more to say about this in the following section.

It is quite instructive to keep on playing a bit like this, to get a feeling of what the Q interpreter can do. You might also wish to consult Built-In Functions, which discusses the built-in functions, and Using Q, which shows how the interpreter is invoked and what additional capabilities it offers. To exit the interpreter when you are finished, invoke the built-in quit function:

 
==> quit

This function does not return a value, but immediately returns you to the operating system's command shell.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.2 Using the Standard Library

The standard library is a collection of useful functions and data structures which are not provided as built-ins, but are implemented by scripts which are mostly written in Q itself. Most of these scripts are included in the "prelude script" prelude.q, which is always loaded by the interpreter, so that the standard library functions are always available. See The Standard Library, for an overview of these operations. You can check which standard library scripts a.k.a. "modules" are actually loaded with the interpreter's modules command. With the standard prelude loaded, this command shows the following:

 
==> modules 
assert          clib*           complex         cond
error           math            prelude         rational
sort            stdlib          stream          string
tuple           typec

As you can see, there are quite a few modules already loaded, and you can use the functions provided by these modules just like any of the built-in operations. The standard library provides a lot of additional functions operating on numbers, strings and lists. For instance, you can take the sum of a list of (integer or floating point) numbers simply as follows:

 
==> sum [1..5]
15

In fact, the library provides a rather general operation, foldl, which iterates a binary function over a list, starting from a given initial value. Using the foldl function, the above example can also be written as follows:

 
==> foldl (+) 0 [1..5]
15

(There also is a foldr function which works analogously, but combines list members from right to left rather than from left to right.)

Generalizing the notion of simple enumerations of numbers like [1..5], the standard library also provides the general-purpose list-generating function while. For instance, we can generate the list of all powers of 2 in the range [1..1000] as follows:

 
==> while (<=1000) (2*) 1
[1,2,4,8,16,32,64,128,256,512]

(Recall from the previous section that (2*) is the doubling function. And (<=1000) is a predicate which checks its argument to be less than or equal to 1000.)

If we want to generate a list of a given size, we can use iter instead. So, for instance, we might compute the value of a finite geometric series as follows:

 
==> sum (iter 4 (/3) 1)
1.48148148148148

The map function allows you to apply a function to every member of a list. For instance, the following expression doubles each value in the list [1..5]:

 
==> map (2*) [1..5]
[2,4,6,8,10]

Lists can also be filtered with a given predicate:

 
==> filter (>=3) [1..5]
[3,4,5]

The scanl operation allows you to compute all the sums of initial segments of a list (or accumulate any other binary operation over a list):

 
==> scanl (+) 0 [1..5]
[0,1,3,6,10,15]

(Like foldl, scanl also has a sibling called scanr which collects results from right to left rather than left to right, starting at the end of the list.)

You can partition a list into an initial part with a given length and the remaining part of the list with take and drop:

 
==> take 3 [1..5]; drop 3 [1..5]
[1,2,3]
[4,5]

The takewhile and dropwhile functions have a similar effect, but partition their input list according to a given predicate:

 
==> takewhile (<=3) [1..5]; dropwhile (<=3) [1..5]
[1,2,3]
[4,5]

Another useful list operation is zip which collects pairs of corresponding elements in two input lists:

 
==> zip [1..5] ["a".."e"]
[(1,"a"),(2,"b"),(3,"c"),(4,"d"),(5,"e")]

The effect of zip can be undone with unzip which returns a pair of lists:

 
==> unzip _
([1,2,3,4,5],["a","b","c","d","e"])

The zipwith function is a generalized version of zip which combines corresponding members from two lists using a given binary function:

 
==> zipwith (*) [1..5] [1..5]
[1,4,9,16,25]

All the operations described above - and many others - are provided by the stdlib.q script. It is instructive to take a look at this script and see how the operations are defined.

In addition, the standard library provides yet another list generating function listof which implements so-called "list comprehensions". These allow you to describe list values in much the same way as sets are commonly specified in mathematics. For instance, you can generate a list of pairs (I,J) with 1<=J<I<=5 as follows:

 
==> listof (I,J) (I in [1..5], J in [1..I-1])
[(2,1),(3,1),(3,2),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4)]

The language provides syntactic sugar for list comprehensions so that the above example can also be written as follows:

 
==> [(I,J) : I in [1..5], J in [1..I-1]]
[(2,1),(3,1),(3,2),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4)]

The same kind of construct also works with tuples:

 
==> ((I,J) : I in [1..5], J in [1..I-1])
((2,1),(3,1),(3,2),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4))

We also have a random number generator, which is implemented by the built-in random function. Here is how we can generate a list of 5 pseudo random 32 bit integers:

 
==> [random : I in [1..5]]
[1960913167,1769592841,3443410988,2545648850,536988551]

To get random floating point values in the range [0,1] instead, we simply divide the results of random by 0xffffffff:

 
==> map (/0xffffffff) _
[0.456560674928259,0.41201544027124,0.801731596887515,0.592705060400233,
0.125027389993199]

Lists can be sorted using quicksort (this one comes from sort.q):

 
==> qsort (<) _
[0.125027389993199,0.41201544027124,0.456560674928259,0.592705060400233,
0.801731596887515]

As already mentioned, the Q language also provides a "lazy" list data structure called "streams". Streams are like lists but can actually be infinite because their elements are only produced "on demand". The standard library includes a module stream.q which implements a lot of useful stream operations. Most list operations carry over to streams accordingly. For instance, if we want to create a geometric series like the one generated with iter above, but we do not know how many elements will be needed in advance, we can employ the stream generation function iterate:

 
==> iterate (/3) 1
{1|iterate (/3) ((/3) 1)}

The {|} stream constructor works in the same way as the [|] list constructor, but is "special" in that it does not evaluate its arguments, i.e., the head and the tail of the stream. (Otherwise the call to iterate would never terminate.)

To get all members of the series we can apply the scanl function:

 
==> def S = scanl (+) 0 _; S
{0|scanl (+) (0+1) (iterate (/3) ((/3) 1))}

Now we can extract any number of initial values of the series using the take operation, and convert the resulting stream to an ordinary list with the list function. For instance, if we are interested in the first five values of the series, we proceed as follows:

 
==> list (take 5 S)
[0,1,1.33333333333333,1.44444444444444,1.48148148148148]

Note that the stream S is really infinite; if we want, we can extract any value in the series:

 
==> S!9999
1.5

Let's see how many iterations are actually required to reach the limit 1.5 with an error of at most 1e-15:

 
==> #takewhile (>1e-15) (map (1.5-) S)
32

This means that the sum of the first 31 series terms is needed to get an accuracy of 15 digits (which is the best that we can hope for with 64 bit floating point values). We can readily verify this using iter:

 
==> sum (iter 31 (/3) 1)
1.5

The standard library also implements stream enumerations and comprehensions which work like the corresponding list and tuple constructs but are evaluated in a lazy fashion, and the interpreter provides syntactic sugar for these. In particular, the notation {X..} or {X,Y..} can be used to specify an infinite arithmetic sequence. For instance, here is another way to define the infinite geometric series from above:

 
==> def S = scanl (+) 0 {1/3^N : N in {0..}}

==> list (take 5 S)
[0,1.0,1.33333333333333,1.44444444444444,1.48148148148148]

It is worth noting here that streams are just one simple example of a lazy data structure, which happen to play an important role in many useful algorithms so that it makes sense to provide them as a builtin. The Q language makes it rather easy to define your own lazy data structures using so-called special forms. In fact, special forms provide a uniform means to define both lazy data constructors and macro-like functions which take their arguments unevaluated, employing "call by name" parameter passing. This will be discussed in detail in Special Forms.

Special forms are an integrated part of Q which adds a great amount of expressive power to the language. In particular, they allow specialized constructs such as a "short-circuit" conditional expressions to be defined in the standard library, just like "ordinary" functions, rather than having to provide them as builtins. One example is the conditional expression if X then Y else Z which is nothing but syntactic sugar for the standard library function ifelse:

 
==> ifelse (5>0) "positive" "negative"
"positive"

==> if 5>0 then "positive" else "negative"
"positive"

Another special form which is useful in many situations is lambda (as of Q 7.1, this one is actually provided as a builtin), which allows you to create little "anonymous" functions on the fly. The customary notation \X.Y is provided as a shorthand for lambda X Y. These constructs are also called lambda abstractions, or simply lambdas. For instance, the following interpreter command defines the well-known factorial function using a lambda abstraction and assigns the resulting function object to the variable fac:

 
==> var fac = \N.if N>0 then N*fac (N-1) else 1

==> fac
\X1 . if X1>0 then X1*fac (X1-1) else 1

==> map fac [1..10]
[1,2,6,24,120,720,5040,40320,362880,3628800]

Lambdas taking multiple arguments can be created like this:

 
==> \X Y.(1-X)*Y
\X1 . \X2 . (1-X1)*X2

==> _ 0.9 0.5
0.05

The lambda arguments can also be patterns to be matched against the actual parameters. For instance:

 
==> \(X,Y).(1-X)*Y
\(X1,X2) . (1-X1)*X2

==> _ (0.9,0.5)
0.05

Besides string and list processing functions and general utility functions of the kind sketched out above, the standard library also includes a collection of efficient "container" data structures which are useful in many applications. For instance, so-called hashed dictionaries (also known as "hashes" or "associative arrays") are implemented by the hdict.q script. The following expressions show how to create a dictionary object from a list, and how to access this dictionary using the index (!), keys and vals operations. (Note that here we have to load the hdict.q module explicitly, as it is not automatically included via the prelude. We could also import the stdtypes.q module instead, to get hold of all the container types.)

 
==> import hdict

==> def D = hdict [(foo,99),(bar,-1),(gnu,foo)]; D!gnu
foo

==> keys D
[foo,bar,gnu]

==> vals D
[99,-1,foo]

This completes our little tour through the standard library. To find out more, please refer to The Standard Library, or take a look at the scripts themselves.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.3 Writing a Script

Now that we have taken the first big hurdle and are confident that we can actually get the Q interpreter to evaluate us some expressions, let us try to define a new function for use in the interpreter. That is, we are going to write our first own Q "program" or "script".

In the Q language, there are actually two different ways to create new functions. As we have already seen in the previous sections, little "anonymous" functions are often computed on the fly by deriving them from existing functions using partial applications or lambda abstractions. The other way, which is often more convenient when the functions to be defined get more complicated, is the definition of named functions by the means of "equations", which works in a way similar to algebraic definitions in mathematics. We will discuss how to do this in the following.

Having exited the Q interpreter, invoke your favourite text editor and enter the following simple script which implements the factorial function:

 
fac N                   = N*fac(N-1) if N>0;
                        = 1 otherwise;

(Note that in order to define functions equationally, you really have to write a script file. For technical reasons there currently is no way to enter an equational definition directly in the interpreter.)

Save the script in the file fac.q and then restart the interpreter as follows (here and in the following `$' denotes the command shell prompt):

 
$ q fac.q

You can also edit the script from within the interpreter (using vi or the editor named by the EDITOR environment variable) and then restart the interpreter with the new script, as follows:

 
==> edit fac.q

==> run fac.q

In any case, now the interpreter should know about the definition of fac and we can use it like any of the built-in operations:

 
==> map fac [1..10]
[1,2,6,24,120,720,5040,40320,362880,3628800]

For instance, what is the number of 30-element subsets of a 100-element set?

 
==> fac 100 div (fac 30*fac 70)
29372339821610944823963760

As another example, let us write down Newton's algorithm for computing roots of a function. Type in the following script and save it to the file newton.q:

 
newton DX DY F          = until (satis DY F) (improve DX F);
satis DY F X            = abs (F X) < DY;
improve DX F X          = X - F X / derive DX F X;
derive DX F X           = (F (X+DX) - F X) / DX;

Restart the interpreter with the newton.q script and try the following. Note that the target function here is \Y.Y^3-X which becomes zero when Y equals the cube root of X.

 
==> var eps = .0001, cubrt = \X.newton eps eps (\Y.Y^3-X) X

==> cubrt 8
2.00000000344216

Well, this is not that precise, but we can do better:

 
==> def eps = 1e-12

==> cubrt 8
2.0

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.4 Definitions

As mentioned in the previous section, in the Q language function definitions take the form of equations which specify how a given expression pattern, the left-hand side of the equation, is transformed into a new expression, the right-hand side of the equation. In this section we elaborate on this some more to give you an idea about how these equational definitions work. Our explanations will necessarily be a bit sketchy at this point, but the concepts we briefly touch on in this section will be explained in much more detail later.

The left-hand side of an equation may introduce "variables" (denoted by capitalized identifiers, as in Prolog) which play the role of formal parameters in conventional programming languages. For instance, the following equation defines a function sqr which squares its (numeric) argument:

 
sqr X                   = X*X;

An equation may also include a condition part, as in the definition of the factorial function from the previous section:

 
fac N                   = N*fac(N-1) if N>0;
                        = 1 otherwise;

As indicated above, several equations for the same left-hand side can be factored to the left, omitting repetition of the left-hand side. Furthermore, the otherwise keyword may be used to denote the "default" alternative.

The left-hand side of an equation may actually be an arbitrary compound expression pattern to be matched in the expression to be evaluated. For instance, as we have already seen, the expressions [] and [X|Xs] denote, respectively, the empty list and a list starting with head element X and continuing with a list Xs of remaining elements, just as in Prolog. We can use these patterns to define a function add which adds up the members of a list as follows:

 
add []                  = 0;
add [X|Xs]              = X+add Xs;

Thus no explicit operations for "extracting" the members from a compound data object such as a list are required; the components are simply retrieved by "pattern matching". This works in variable definitions as well. For instance, in the interpreter you can bind variables to the head element and the tail of a list using a single definition as follows:

 
==> def [X|Xs] = [1,2,3]; X; Xs
1
[2,3]

Pattern matching also works if the arguments take the form of function applications. For instance, the following definition implements an insertion operation on binary trees:

 
insert nil Y            = bin Y nil nil;
insert (bin X T1 T2) Y  = bin Y T1 T2 if X=Y;
                        = bin X (insert T1 Y) T2 if X>Y;
                        = bin X T1 (insert T2 Y) if X<Y;

Note the symbols nil and bin which act as "constructors" for the binary tree data structure in this example. (Q actually has no rigid distinction between "function applications" and "data constructors". This is unnecessary since function applications can be "values" in their own right, as discussed below.)

The Q interpreter treats equations as "rewrite rules" which are used to reduce expressions to "normal forms". Evaluation normally uses the standard "leftmost-innermost" rule, also known as "applicative order" or "call by value" evaluation. Using this strategy, expressions are evaluated from left to right, innermost expressions first; thus the arguments of a function application (and also the function object itself) are evaluated before the function is applied to its arguments.

An expression is in normal form if it cannot be evaluated any further by applying matching equations (including some predefined rules which are used to implement the built-in operations, see Equations and Expression Evaluation). A normal form expression simply stands for itself, and constitutes a "value" in the Q language. The built-in objects of the Q language (such as numbers and strings) are always in normal form, but also compound objects like lists (given that their elements are in normal form) and any other symbol or function application which does not match a built-in or user-defined equation.

This means that Q is essentially an "exception-free" language. That is, an "error condition" such as "wrong argument type" does not raise an exception by default, but simply causes the offending expression to be returned "as is". For instance:

 
==> hd []
hd []

==> sin (X+1)
sin (X+1)

You may be disconcerted by the fact that expressions like hd [] and sin (X+1) actually denote legal values in the Q language. However, this is one of Q's key features as a term rewriting programming language, and it adds a great amount of flexibility to the language. Most importantly, it allows expressions, even expressions involving variables, to be evaluated in a symbolic fashion. For instance, you might add rules for algebraic identities and symbolic differentiation to your script and have the interpreter simplify expressions according to these rules. On the other hand, the Q language also allows you to refine the definition of any built-in or user-defined operation and thus it is a simple matter to add an "error rule" like the following to your script when needed:

 
hd []                   = error "hd: empty list";

(The error function is defined in the standard library script error.q, see Diagnostics and Error Messages.) In order to implement more elaborate error handling, you can also use Q's built-in throw and catch functions to raise and respond to exceptions on certain error conditions. This is discussed in Exception Handling.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.5 Runtime Errors

There are a few error conditions which may cause the Q interpreter to abort the evaluation with a runtime error message. A runtime error arises when the interpreter runs out of memory or stack space, when the user aborts an evaluation with the interrupt key (usually Ctl-C), and when the condition part of an equation does not evaluate to a truth value (true or false). (All these error conditions can also be handled by the executing script, see Exception Handling.)

You can use the break on command to tell the interpreter that it should fire up the debugger when one of the latter two conditions arises. After Ctl-C, you can then resume evaluation in the debugger, see Debugging. This is not possible if the interpreter stumbles across an invalid condition part; in this case the debugger is invoked merely to report the equation which caused the error, and to inspect the call chain of the offending rule. Hitting the carriage return key will return you to the interpreter's evaluation loop.

For instance, reload the fac.q script from Writing a Script:

 
fac N                   = N*fac(N-1) if N>0;
                        = 1 otherwise;

Now enable the debugger with the break command and type some "garbage" like fac fac:

 
==> break on; fac fac
! Error in conditional
  0>  fac.q, line 1: fac fac  ==>  fac*fac (fac-1) if fac>0
(type ? for help)
:

What happened? Well, the debugger informs us that the error occurred when the interpreter tried to apply the first equation,

 
fac N                   = N*fac(N-1) if N>0;

to the expression fac fac. In fact this is no big surprise since we cannot expect the interpreter to know how to compare a symbol, fac, with a number, 0, which it tried when evaluating the condition part of the above equation. So let us return to the interpreter's prompt by hitting the carriage return key:

 
: <CR>

==>

The interpreter now waits for us to type the next expression. (For a summary of debugger commands please refer to Debugging. You can also type ? or help while in the debugger to obtain a list of available commands.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3. Lexical Matters

The vocabulary of the Q programming language consists of notations for identifiers, operators, integers, floating point numbers, character strings, comments, a few reserved words which may not be used as identifiers, and some special punctuation symbols which are used as delimiters.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1 Whitespace and Comments

Whitespace (blanks, tabs, newlines, form feeds) serves as a delimiter between adjacent symbols, but is otherwise ignored. Comments are treated like whitespace:

 
/* This is a comment ... */

Note that these comments cannot be nested. C++-style line-oriented comments are also supported:

 
// C++-style comment ...

Furthermore, lines beginning with the #! symbol denote a special type of comment which may be processed by the operating system's command shell and the Q programming tools. On UNIX systems, this (odd) feature allows you to execute Q scripts directly from the shell (by specifying the q program as a command language processor) and to include compiler and interpreter command line options in a script (see Running Scripts from the Shell).


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.2 Identifiers and Reserved Words

Identifiers are denoted by the usual sequences of letters (including `_') and digits, beginning with a letter. Upper- and lowercase is distinct. In the Q language, identifiers are used to denote type, function and variable symbols. Type identifiers may start with either an uppercase or lowercase letter (the convention, however, is to use an initial uppercase letter). The case of the first letter matters, however, for function and variable symbols. As in Prolog, a capitalized identifier (such as X, Xmax and XMAX) indicates a variable symbol, whereas identifiers starting with a lowercase letter denote function symbols (unless they are declared as "free" variables, see below). Unlike in Prolog, the underscore `_' counts as a lowercase letter, hence _MAX is a function symbol, not a variable. (The idea behind this is that it allows you to get a function symbol which appears to start with an uppercase letter by stropping it with an initial `_'.) However, as an exception to the general rule, the identifier `_' does denote a variable symbol, the so-called anonymous variable. The same rules also apply to symbols created interactively in the interpreter.

Variables actually come in two flavours: bound and free variables, i.e., variables which also occur on the left-hand side of an equation, and variables which only occur on the right-hand side and/or in the condition part of an equation. Identifiers may also be declared as free variables; see Declarations. In this case, they may also start with a lowercase letter.

Type, function and free variable identifiers may also be qualified with a module identifier prefix (cf. Scripts and Modules), to specifically denote a symbol of the given module. Such a qualified identifier takes the form module-id::identifier; no whitespace or comments are allowed between the module name, `::' symbol and the function or variable identifier.

Formally, the syntax of identifiers is described by the following grammatical rules:

 
identifier              : unqualified-identifier
                        | qualified-identifier
qualified-identifier    : module-identifier '::'
                          unqualified-identifier
unqualified-identifier  : variable-identifier
                        | function-identifier
                        | type-identifier
module-identifier       : letter {letter|digitsym}
type-identifier         : letter {letter|digitsym}
variable-identifier     : uppercase-letter {letter|digitsym}
                        | '_'
function-identifier     : lowercase-letter {letter|digitsym}
letter                  : uppercase-letter|lowercase-letter
uppercase-letter        : 'A'|...|'Z'|uppercase unicode letter
lowercase-letter        : 'a'|...|'z'|'_'|lowercase unicode letter
digitsym                : '0'|...|'9'|unicode digit

(Please refer to Q Language Grammar, for a description of the BNF grammar notation used throughout this document.)

The following symbols are reserved words of the Q language and may not be used as identifiers:

 
as        const     def       else      extern    from      if
import    include   otherwise private   public    special   then
type      undef     var       virtual   where

In addition, the following identifiers are predeclared as operator symbols (see Operator Symbols) and cannot be used as normal identifiers either:

 
and       div       mod       not       or

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.3 Operator Symbols

Operator symbols may either take the form of function identifiers or they may be sequences of punctuation characters (excluding `_' which serves as a letter in the Q language). Like function and free variable symbols, they may be qualified with a module identifier prefix:

 
op                      : unary-op|binary-op
unary-op                : opsym
binary-op               : opsym|'and' 'then'|'or' 'else'
opsym                   : unqualified-opsym
                        | qualified-opsym
qualified-opsym         : module-identifier '::'
                          unqualified-opsym
unqualified-opsym       : function-identifier
                        | punctsym {punctsym}
punctsym                : unicode punctuation character

In either case, operator symbols must be declared explicitly before they can be used, cf. Declarations. The declaration determines the precedence and "fixity" of the operator (i.e., whether it acts as a unary prefix or a binary infix operator; see User-Defined Operators, for more information on this).

As already mentioned, the following identifiers are predefined as built-in operators:

 
and       div       mod       not       or

As indicated by the syntax rules, the logical connectives `and' and `or' can also be combined with the keywords `then' and `else' to form the "short-circuit" connectives `and then' and `or else'.

The punctuation symbols predefined as built-in operators are the following:

 
` ' ~ & . || < > = <= >= <> == ++ + - * / ^ ! # $

Most of these can actually be redeclared by the programmer for his own purposes. However, there are a few symbols, called "soft delimiters", which play a special role in the syntax of the Q language (some of these are also used as operator symbols):

 
~ . .. : | = == - \ @

The soft delimiters may occur inside user-defined operators, but if they are used as separate lexemes then they are treated like reserved words and thus they may not be declared as operator symbols (except for the purpose of making aliases, cf. Declarations). The same applies to the reserved keywords (see Identifiers and Reserved Words).

Moreover, the following special symbols serve as "hard delimiters" in the Q language which always separate lexemes and thus may not occur inside operator symbols at all:

 
" , ; :: ( ) [ ] { }

The same applies to whitespace and other non-printable characters, as well as the comment delimiters (`//' and `/*' as well as initial `#!' on a script line).

Symbols consisting of punctuation are generally parsed using the "longest possible lexeme" a.k.a. "maximal munch" rule. Here, the "longest possible lexeme" refers to the longest prefix of the input such that the sequence of punctuation characters either forms a valid (i.e., declared) operator symbol, or one of the reserved and special delimiter symbols listed above. Thus, e.g., `..#' will usually be parsed as `.. #', i.e., a reserved `..' symbol followed by a `#' operator. This holds unless the entire sequence `..#' has already been declared as an operator in the scope where it is used.

The only exception to the "maximal munch" rule occurs inside declarations where a new operator symbol is being introduced. In this case the symbol extends up to the next hard delimiter symbol (usually `)' or `;') or whitespace/non-printable character in the input. For instance, the following Q code snippet declares a new binary operator symbol `+~%' (please refer to Declarations, for an explanation of the declaration syntax):

 
public (+~%) X Y;

Another special case arises with conglomerates of three or more consecutive `:' symbols. In this case the initial `::' is always treated as the designation of a qualified (operator) symbol. (In the unlikely case that you ever need to specify a "guarded variable" of the form X : ::Type with a qualified type in the built-in namespace, the space between the `:' token and the following `::' qualification designator is mandatory.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.4 Numbers

Signed decimal numeric constants are denoted by sequences of decimal digits (only the standard digits 0..9 in the ASCII character set may be used here) and may contain a decimal point and/or a scaling factor. Integers can also be denoted in octal or hexadecimal, using the same syntax as in C:

 
number                  : ['-'] unsigned-number
unsigned-number         : '0' octdigitseq
                        | '0x' hexdigitseq
                        | '0X' hexdigitseq
                        | digitseq ['.' [digitseq]] [scalefact]
                        | [digitseq] '.' digitseq [scalefact]
digitseq                : digit {digit}
octdigitseq             : octdigit {octdigit}
hexdigitseq             : hexdigit {hexdigit}
scalefact               : 'E' ['-'] digitseq
                        | 'e' ['-'] digitseq
digit                   : '0'|...|'9'
octdigit                : '0'|...|'7'
hexdigit                : '0'|...|'9'|'a'|...|'f'|'A'|...|'F'

Simple digit sequences without decimal point and scaling factor are treated as integers; if the sequence starts with `0' or `0x'/`0X' then it denotes an integer in octal or hexadecimal base, respectively. Other numbers denote (decimal) floating point values. If a decimal point is present, it must be preceded or followed by at least one digit. Both the scaling factor and the number itself may be prefixed with a minus sign. (Syntactically, the minus sign in front of a number is interpreted as unary minus, cf. Expressions. However, if unary minus occurs in front of a number, it is interpreted as a part of the number and is taken to denote a negative value. See the remarks concerning unary minus in Expressions.) Some examples:

 
0  -187326  0.0  -.05  3.1415e3 -1E-10 0177 0xaf -0XFFFF

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.5 Strings

String constants are written as character sequences enclosed in double quotes:

 
string                  : '"' {char} '"'
char                    : any character but newline and "

To include newlines, double quotes and other (non-printable) characters in a string, the following escape sequences may be used:

 
\n                      newline
\r                      carriage return
\t                      tab
\b                      backspace
\f                      form feed
\"                      double quote
\\                      backslash

Furthermore, a character may also be denoted in the form \N, where N is the character number in decimal, hexadecimal or octal (using the same syntax as for unsigned integer values). Note that the character number may consist of an arbitrary number of digits; the resulting value will be taken modulo the size of the local character set, which is 256 in the case of ASCII-only systems and 0x110000 if the interpreter has been built with Unicode support (see Unicode Support). Optionally, the character number may also be enclosed in parentheses; this makes it possible to specify a string in which a character escape is immediately followed by another character which happens to be a valid digit, as in "\(123)4".

As of version 7.11 and later, the interpreter also supports symbolic character escapes of the form `\&name;', where name is any of the XML single character entity names specified in the "XML Entity definitions for Characters", see http://www.w3.org/TR/xml-entity-names/. Note that, at the time of this writing, this is still a W3C working draft, so the supported entity names may be subject to change until the final specification comes out; the currently supported entities are described in the draft from 14 December 2007, see http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/. Also note that multi-character entities are not supported in this implementation.

A string may also be continued across lines by putting the \ character immediately before the end of the line, which causes the following newline character to be ignored.

As of Q 7.0, it is a syntax error if a character escape is not recognized as either a numeric escape or one of the special escapes listed above.

Some examples:

 
""                      empty string
"A"                     single character string
"\27"                   ASCII escape character (ASCII 27)
"\033"                  same in octal
"\0x1b"                 same in hexadecimal
"\(0x1b)c"              ASCII escape followed by a literal `c' character
"Gr\&auml;f"            German umlaut, using XML entity escape
"a string"              multiple character string
"a \"quoted\" string"   include double quotes
"a line\n"              include newline
"a very \               continue across line end
long line\n"

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.6 Unicode Support

Since version 7.0 the Q interpreter has full Unicode support. This means that identifiers may actually contain arbitrary alphanumeric letter and digit symbols from the entire Unicode character set. (All non-uppercase alphabetical letters are considered as lowercase letters which may begin a function identifier.) Likewise, operator symbols may consist of arbitrary punctuation characters in the Unicode character set. (We refrain from giving any concrete examples here, to keep this document 7-bit clean.)

Moreover, all string values are internally represented using the UTF-8 encoding, which allows you to represent all characters in the Unicode character set, while retaining backward compatibility with 7 bit ASCII. String literals in the source script will be translated from the system encoding to UTF-8 automatically and transparently, and the \N notation may specify any valid Unicode point. For this to work, the interpreter must have been built with Unicode support enabled (which is the default).


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

4. Scripts and Modules

The basic compilation unit in the Q language is the script, which is simply a (possibly empty) sequence of declarations and definitions in a single source file:

 
script                  : {declaration|definition}

In order to make a given set of definitions available for use in the interpreter, you can just collect the relevant declarations, variable definitions and equations in a script which you submit to the interpreter for execution. The interpreter in turn invokes the compiler to translate the script into a bytecode file which is loaded by the interpreter (see Using Q). A script may also be empty, in which case it does not provide any definitions at all.

If you put all your definitions into a single script, that's all there is to it. However, you will often want to break down a larger script into a collection of smaller units which can be managed separately and which are linked together by the compiler into a single bytecode file. To these ends the Q language allows you to put your definitions into several separate script files, which are also called modules. To gain access to the function, variable and type symbols provided by another module, you must then use an import or include declaration, using the following syntax:

 
declaration             : unqualified-import
                        | qualified-import

unqualified-import      : 'import' module-spec {',' module-spec} ';'
                        | 'include' module-spec {',' module-spec} ';'

qualified-import        : 'from' module-spec 'import' [symbol-specs] ';'
                        | 'from' module-spec 'include' [symbol-specs] ';'

module-spec             : module-name ['as' unqualified-identifier]
module-name             : unqualified-identifier
                        | string

symbol-specs            : symbol-spec {',' symbol-spec}

symbol-spec             : unqualified-identifier ['as' unqualified-identifier]
                        | unqualified-opsym ['as' unqualified-opsym]

As of version 7.8, Q supports both unqualified and qualified import clauses. The former allows you to quickly import an entire collection of modules, while the latter gives you precise control over which symbols are to be imported from which modules. We describe each of these in turn.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

4.1 Unqualified Imports

Easiest first, let as take a look at unqualified imports. The following declaration imports two modules foo and bar:

 
import foo, bar;

If you put such declarations into your "main script" and then run the script with the interpreter, the imported modules will be loaded as well.

To determine which functions, variables and types are actually available to be imported into another script, the Q language allows you to declare symbols either as "public" or "private", see Declarations. Outside a module, only the public symbols are accessible when the module is imported in another module.

Normally import is not "transitive", i.e., importing a module foo does not automatically give you access to the symbols imported by foo, but only to the public symbols declared in foo itself. To work around this, instead of import you can also use an include declaration which causes all imports and includes of the included module to be "reexported". For instance, let us consider the case that module foo includes module bar:

 
include bar;

Then by importing module foo you also gain access to the public symbols of module bar (and, recursively, to the modules included by bar, etc.). This provides a means to group different modules together in one larger "umbrella" module. For instance, the standard prelude script (cf. The Standard Library) simply includes most of the other standard library modules.

In the Q language, each module has its own separate namespace. This means that two different modules, say foo1 and foo2, may both provide their own public function symbol named foo. If both foo1 and foo2 are imported in the same script, you can distinguish between the two by using a qualified identifier, using the module name as a prefix, e.g., foo1::foo or foo2::foo. (Writing simply foo in this case will produce an error message because the reference is ambiguous.)

Import and include declarations can occur anywhere in a script (and will be active from this point on up to the end of the script file), but it is common (and recommended) practice to put them near the beginning of the script. As in most other programming languages, module dependencies must always be acyclic. If the compiler finds a cyclic chain of import and include declarations, such as a module importing itself or a module foo importing a module bar which in turn imports foo again, it produces an error message.

Another caveat: When using unqualified import clauses, it is much too easy to accidentally "reuse" an imported symbol because of a missing local symbol declaration. For instance, if you import a module foo which happens to export a symbol bar, and then you later define a function bar in your script, your definition will use the imported bar symbol. This is perfectly legal in Q, and may be intended, but if it's not then you have to explicitly declare a new local bar symbol after the import clause, cf. Declarations. If you forget this, you will errorneously redefine the existing bar function instead. One way to avoid this is to always use qualified imports, as described below. Another possibility is to run the interpreter with the --pedantic option, in which case it will warn you about symbols from unqualified imports which are referred to with unqualified identifiers, see Running Compiler and Interpreter, for details.

As indicated in the syntax rules, the names of the modules to be imported can either be specified using an unqualified identifier, or a string denoting a full or relative pathname, e.g.:

 
import "/home/ag/q/stuff.q";

In this case, you can also omit the `.q' suffix, it will be supplied automatically when necessary. Moreover, the module identifier is automatically the basename of the script filename (i.e., stuff in the above example); therefore, it is a good idea to choose basenames which are legal Q identifiers so that you can use them in qualified symbols.

If no absolute path is specified, the interpreter locates script files using its "search path", which usually contains the current directory and other system-dependent or user-defined locations where "library" scripts are kept, see Using Q, for details.

The `as' keyword can be used to explicitly specify an unambiguous module name. This is useful, in particular, if the basename of the module is not a valid identifier, and for the purpose of resolving name clashes. For instance:

 
import "my/stdlib.q" as mystdlib;

Simply importing "my/stdlib.q" under its own name in this case would produce an error message, because the name of the module collides with the standard library module of the same name, and the compiler enforces that all module names are unique.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

4.2 Qualified Imports

A qualified import takes the form:

 
from foo import foo, BAR;

This specifically imports the symbols foo and BAR from module foo, and nothing else. What this actually does is to "redeclare" the imported symbols in the namespace of the importing module, as explained in Cross-Checking of Declarations and Aliases, so that they look like private symbols of the importing module. Thus you can refer to them, e.g., simply as foo or with the qualified name gnats::foo, where gnats is the name of the importing module. Note that foo::foo will not work in this case, because a qualified import does not bring the module itself into scope. (However, as pointed out below you can use both a qualified and an unqualified import in concert to make both gnats::foo and foo::foo work.)

A note on terminology: In Q parlance, the terms qualified import and unqualified import apply to the import clauses themselves; a qualified import clause is called "qualified" simply because it restricts the set of imported symbols. Unqualified and qualified identifiers can be used with both styles of import clauses alike.

As with unqualified import, there is a second kind of qualified import clause which also reexports the imported symbols, by making them public members of the importing module:

 
from foo include foo, BAR;

For convenience, you can also write:

 
from module import;

or

 
from module include;

without listing any symbols to just import or include all symbols of the given module qualified. (It goes without saying that this should be used with care.) This is roughly equivalent to `import module' or `include module', respectively, but the imported symbols are made available as members of the namespace of the client, not the imported module.

If this sounds a bit confusing, here is an example which should clarify the differences between unqualified and qualified imports. Let's assume the following three modules A, B and C:

 
/* A: */
import B;

/* B: */
include C;

/* C: */
public foo;

Using this setup, the symbol foo is available as either just `foo' or as `C::foo' in all of A, B and C. Now suppose we change module B to use a qualified `include' instead:

 
/* B: */
from C include foo; /* or just: from C include; */

C's namespace hasn't changed, of course, but in both A and B the symbol foo is now available as `foo' or `B::foo', but not as `C::foo' anymore. However, you can also combine unqualified and qualified imports, like this:

 
/* B: */
include C;
from C include foo;

Now the symbol foo is available as either `foo', `B::foo' or `C::foo' in B and C (and the compiler will recognize it as the same symbol no matter which notation you use).

Finally, note that since symbols imported using a qualified clause become members of the importing namespace, they must not collide with other symbols declared either locally or in another qualified import clause. Thus, if you have two modules foo1 and foo2 which both export their own (local) symbol foo, the following will provoke an error message from the compiler because of the clash between the two different foo symbols:

 
from foo1 import foo;
from foo2 import foo;

In such cases you must either use unqualified import, or employ an `as' clause to rename the symbols while importing them. E.g.:

 
from foo1 import foo as foo1;
from foo2 import foo as foo2;

Note that this is only a problem if the imported symbols are really distinct. If the symbols are just different "aliases" of the same symbol defined elsewhere, then you can just import them under the same name into the same scope without any problems.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

4.3 Implicit Imports and the Prelude

Actually, there is yet another kind of import, namely the implicit import of the prelude.q script at the beginning of each script, which in turn includes most standard library modules (see The Standard Library) and makes them readily available in your program, without having to use an explicit import declaration. When looking up an unqualified symbol in a given script, the compiler first searches for a symbol defined in that script, then for a symbol in an imported or included module, then for a symbol defined by the prelude and its includes, and finally for a built-in symbol.

You can instruct the compiler to inhibit the default import of the prelude, by using the --no-prelude option, see Using Q. Moreover, you can also override the standard prelude with your own, if it occurs on the search path before the standard prelude, see Setting up your Environment.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

4.4 The Global Namespace

The namespace available in the interpreter is always that of the main script, i.e., the script given on the command line. This namespace also includes the built-in namespace which contains all the built-in symbols, as well as the implicit imports, i.e., the prelude and all its includes (unless the --no-prelude option was used). The built-in namespace can be accessed explicitly by using an empty module qualifier, as in ::sin. Qualified prelude symbols use the corresponding module name as the qualifier (e.g., stdlib::cat), just as with explicit imports.

The interpreter also allows you to dynamically import additional modules in the global scope using the import command, see Command Language. Moreover, to facilitate testing and debugging, in the interpreter it is possible to gain access to all public and private symbols of the program (also in modules not directly imported in the main script) using qualified identifiers.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5. Declarations

The Q language allows you to declare function and (free) variable symbols explicitly, by means of the syntactic constructs discussed in this chapter. Symbol declarations are mostly optional; if you introduce a new symbol without declaring it, the compiler declares it for you. However, you will sometimes wish to ensure that a new symbol is created to override a symbol of the same name from an imported module, or you want to attach special attributes to a symbol, and then you have to use an explicit declaration. Explicit declarations are also needed to introduce new operator and type symbols. Moreover, while implicit declarations are often convenient when you want to get something done quickly, for larger projects you might prefer to play it safe and ensure that each function symbol actually has a proper explicit declaration. In such cases you can run the compiler with the --pedantic option (see Running Compiler and Interpreter) to warn you about undeclared occurrences of an identifier.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5.1 Declaration Syntax

Syntactically, symbol declarations take the following form:

 
declaration             : prefix headers ';'
                        | [scope] 'type' unqualified-identifier
                          [':' identifier] ['=' sections] ';'
                        | [scope] 'extern' 'type' unqualified-identifier
                          [':' identifier] ['=' sections] ';'
                        | [scope] 'type' qualified-identifier
                          ['as' unqualified-identifier] ';'
                        | [scope] 'type' unqualified-identifier
                          '==' identifier ';'

prefix                  : scope
                        | [scope] modifier {modifier}

scope                   : 'private'|'public'

modifier                : 'const'|'special'|'extern'|'var'|'virtual'

headers                 : header {',' header}

header                  : unqualified-identifier '=' expression
                        | unqualified-identifier
                          {['~'] variable-identifier}
                        | qualified-identifier
                          {['~'] variable-identifier}
                          ['as' unqualified-identifier]
                        | '(' unqualified-opsym ')'
                          {['~'] variable-identifier}
                          ['@' precedence]
                        | '(' qualified-opsym ')'
                          {['~'] variable-identifier}
                          ['@' precedence]
                          ['as' unqualified-opsym]

precedence              : unsigned-number
                        | '(' operator ')'

sections                : section {'|' section}

section                 : [prefix] headers

For instance, the following are all valid symbol declarations:

 
public foo;
private extern bar X;
public special lambda X Y;
special cond::ifelse ~P X Y as myifelse;

public (--) Xs Ys;
private (::++) Xs Ys as concat;

const red, green, blue;
var FOO, BAR;
var const e = exp 1;

type Day = const sun, mon, tue, wed, thu, fri, sat;
public type BinTree = virtual bintree Xs | private const nil, bin X T1 T2;

The keywords private and public specify the scope of a symbol. Public symbols are accessible outside a script, and can be imported by other modules (see Scripts and Modules). If the keywords private and public are omitted, the scope defaults to private.

The special keyword serves to declare special forms, which are described in Special Forms. Function symbols declared with const introduce "constant" or "constructor" symbols; the compiler enforces that expressions created with such symbols are not redefined in an equation (cf. Equations). The built-in constants true, false, [] and () are already predeclared as const. The virtual keyword serves to declare "virtual constructors" which can be used to define concrete representations (so-called "views") of abstract data types, see Views. Non-const function symbols can also be declared as extern, meaning that the corresponding function is actually implemented by a corresponding "external" module, see C Language Interface.

The var keyword allows you to declare "free" variable symbols, which can be assigned a value by means of a variable definition, see Free Variables. If you use such a declaration, the variable symbol may also start with a lowercase letter (if it has not already been declared as a function symbol); note that without such a declaration, an identifier starting with a lowercase letter will implicitly be declared as a function symbol. (The idea behind this is that you can make a variable look like a function symbol, if the variable is actually supposed to be used for function values.)

Variable symbols can also be declared as const; in this case the variable can only be assigned to once and always refers to the same value once it has been defined. The built-in variables INPUT, OUTPUT, ERROR and ARGS are predeclared as const, see Command Language. A variable declaration may also contain an "initializer" of the form = X, where X is an expression. This has the effect of declaring and defining the variable in a single step.

As indicated, the scope-modifier prefix of a declaration is followed by a comma-separated list of headers. The first identifier in each header states the symbol to be declared. In function symbol declarations the function identifier may be followed by a list of variable identifiers which are used to denote the arguments of the declared function symbol. The variables are effectively treated as comments; only their number (called the arity of a function symbol) is recorded to check the consistency of different declarations of the same symbol, and to specify the number of "special" arguments in a special declaration. In special declarations variables may also be prefixed with `~' to declare them as "non-special" arguments; see Special Forms.

The first declaration of an unqualified identifier or operator symbol in a module always introduces a new symbol in the module's namespace. By default (if no explicit declaration precedes the first use of a new, unqualified symbol), the compiler automatically declares it as a private nullary function or variable symbol, depending on the case of the initial letter of the identifier (as already mentioned in Lexical Matters, capitalized identifiers are interpreted as variable symbols, others as function symbols).


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5.2 Operator Declarations

As of Q 6.2, it is also possible to declare new prefix and infix operator symbols. Such declarations are never optional; operator symbols must always be declared before they are used. Syntactically, they take exactly the same form as a function symbol declaration, but the operator symbol is given inside parentheses. As discussed in Lexical Matters, the operator symbol may either be a function identifier or a sequence of punctuation characters. Examples:

 
public (myop) X Y;
public (--) X Y;

You can also specify a precedence level, like so:

 
public (myop) X Y @ 2;   // use explicit precedence level ...
public (myop) X Y @ (<); // ... or precedence level of given operator

If no precedence is given, it defaults to 2, which is the precedence of the relational operators. This is described in more detail in User-Defined Operators.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5.3 Cross-Checking of Declarations and Aliases

The Q language allows multiple declarations of the same symbol, and the compiler will verify the consistency of all declarations of a given symbol. That is, if a symbol is declared with different scope (private or public), attributes (like var, const or special) or different number of arguments, or if non-special arguments of a special form are declared differently, then the compiler will issue an error message.

Note, however, that the compiler will never cross-check the number of parameters in a function symbol declaration with the actual number of arguments to which the function is applied when it is used in an equation or a variable definition. This is because in Q it is perfectly legal to have a function symbol applied to a varying number of arguments for the purpose of constructing partial applications. Also note that if a local function symbol is first used without an explicit declaration then it will be implicitly declared to have a zero argument count. If the symbol is then later declared with a nonzero number of arguments or other non-default attributes then the compiler will print an error message.

As indicated by the syntactic rules, it is also possible to redeclare a qualified symbol. This requires that the target module either is the current module or has already been imported in the current scope using an unqualified import clause, and causes the compiler to both cross-check the declaration with the declaration in the imported module and redeclare the symbol within the current namespace.

Such a redeclaration serves several different purposes. First and foremost, it allows you to ensure that an imported symbol was actually declared with the given attributes. Second, it lets you resolve name clashes by redeclaring the symbol in the current scope where it overrides unqualified imports of other modules; if you want, you can also import the symbol under a new name using an `as' clause (the new symbol is then also referred to as an alias of the original one). In these two cases the symbol will normally be redeclared as a private symbol. Third, by redeclaring a symbol as public, you cause the symbol to be reexported by the current module. Examples:

 
import gnats;
private gnats::foo X Y; // cross-check declaration of gnats::foo
public gnats::foo X Y as bar; // reexport gnats::foo as bar

Of course, the same also works with operator symbols. E.g., here is how you declare a new operator symbol concat which behaves exactly like the built-in `++' operator:

 
public (::++) X Y as concat;

There is yet another usage of an imported symbol redeclaration, namely the extern redeclaration. This is only possible with function symbols and if the redeclared symbol is not already declared extern by another module. It instructs the compiler that this module provides an external definition of the symbol which will override equational definitions in other modules, see C Language Interface, for details.

Note that, as of Q 7.8, most of the uses of qualified symbol redeclarations are now covered in a more convenient fashion by qualified imports, see Qualified Imports. Thus qualified symbol redeclarations will now mostly be used for cross-checking declarations and for external overrides.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5.4 Type Declarations

A type declaration introduces a type identifier, an optional supertype for the type, and an optional list of "constructor" symbols for the type. It associates the function symbols in the list with the given type. The symbol list is a |-delimited list of individual sections. Each section takes the form of an ordinary symbol declaration, consisting of scope/modifier prefix and a comma-delimited list of headers. The prefix is optional; by default, a symbol has the same scope as the type it belongs to. To change scope and attributes of a symbol on the list, you use an appropriate scope/modifier prefix. For instance, you can declare a public BinTree type with private constructors nil and bin as follows:

 
public type BinTree = private const nil, bin X T1 T2;

Each constructor symbol may only be declared once; otherwise the compiler will issue an error message. Hence the compiler enforces that a constructor symbol only belongs to a single type, and that the number of arguments of that symbol is determined uniquely.

A constructor symbol may also be declared as virtual, which indicates that it may be used in concrete representations of abstract data types, so-called "views". Such symbols are usually non-const symbols implementing a public construction function which delivers values of the type they belong to, see Views, for details. For instance, to add a virtual constructor bintree to the above BinTree type, you might declare the type as follows:

 
public type BinTree = virtual bintree Xs | private const nil, bin X T1 T2;

Type identifiers may begin with either an upper- or lowercase letter (the convention, however, is to use capitalized identifiers). There may only be a single declaration for each type. As of Q 7.8, type identifiers must also be distinct from function or variable symbols declared in the same namespace, as both type and function/variable identifiers may now occur in the same context (namely, a qualified import clause).

Types are used on the left-hand side of equations to restrict the set of expressions which can be matched by a variable; see Type Guards, for details. The Q language has a number of predefined type symbols which distinguish the corresponding types of objects built into the language: Bool, Int, Float, Real (which is the supertype of both Int and Float), Num (the supertype of Real), String, Char (the subtype of String denoting the single-character strings), File, List, Stream, Tuple and Function. (Besides this, there are also two built-in types for exceptions, see Exception Handling.)

Like function symbols, types imported from other modules can also be redeclared (possibly under a new name), and reexported, e.g.:

 
public type array::Array as MyArray;

In this case, no constructor symbols or supertype are specified. (As of Q 7.8, it is also possible to use a qualified import clause to achieve this, see Qualified Imports. Also note that if a type is imported using a qualified import clause, then its constructor symbols will not be imported automatically; you will have to explicitly list them in the import clause if you need them.)

As of Q 7.1, there is an alternative, more customary syntax for introducing type aliases which looks as follows:

 
type Integer == Int;

Note that here the alias to be defined is specified first, and the existing type to be aliased can be specified without a module qualifier. This works exactly like an ordinary alias declaration, but resembles more closely the syntax commonly used for defining type synonyms in other languages like Pascal and Haskell.

Types can also be declared as extern, to indicate that they are realized in a corresponding C module, as described in C Language Interface. External types may have a supertype, but only virtual constructors. For instance:

 
extern type Bar = virtual bar X;

In difference to function symbols, an existing type imported from another module cannot be redeclared as extern. Therefore an external definition must always be given by the module which originally declares the type.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6. Expressions

The notion of expressions is an intrinsic part of most, if not all, programming languages. However, in most programming languages expressions are merely treated as descriptions of computations to be performed in order to obtain the value of an expression, which usually belongs to some basic type provided by the language. In contrast, in the Q language expressions are objects in their own right which, as we shall see, are manipulated according to computational rules specified by the programmer.

As usual, expressions are built from certain kinds of basic objects. In the Q language, these are integers, floating point numbers, strings, function and variable symbols. These objects are combined into larger, compound expressions by means of applications, tuples, lists and streams. For convenience, the Q language also has prefix, infix and mixfix operator symbols for the most common operations; these are actually treated as "syntactic sugar" for applications. Syntactic sugar is also provided for list, tuple and stream enumerations and comprehensions. The syntax of all these entities is described by the following grammatical rules:

 
expression              : identifier
                        | 'var' unqualified-identifier
                        | variable-identifier ':' identifier
                        | number
                        | string
                        | expression expression
                        | unary-op expression
                        | expression binary-op expression
                        | 'if' expression 'then' expression
                          ['else' expression]
                        | '\' expression { expression } '.' expression
                        | '(' [element-list|enumeration|comprehension] ')'
                        | '[' [element-list|enumeration|comprehension] ']'
                        | '{' [element-list|enumeration|comprehension] '}'
                        | '(' op ')'
                        | '(' expression binary-op ')'
                        | '(' binary-op expression ')'

element-list            : expression-list [',' | ';' | '|' expression]

enumeration             : expression-list '..' [expression]

comprehension           : expression ':' expression-list

expression-list         : expression {',' expression}
                        | expression {';' expression}

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.1 Constants and Variables

The nonterminals identifier, op, unary-op, binary-op, number and string occuring in the syntax rules above refer to the lexical entities introduced in Lexical Matters. Note that the Q language actually distinguishes two different types of identifiers (function and variable identifiers), two different kinds of operator symbols (unary and binary) and two different types of numeric quantities (integers and floating point numbers). Integers are implemented as "bignums" using the GNU multiprecision package (GMP); thus the size of an integer value is only limited by available memory. Floating point values are implemented using 64 bit (i.e., double precision) floating point numbers; on most machines, these should provide nonzero absolute values ranging from 1.7E-308 to 1.7E308 and a precision of 15 decimal digits.

String constants are character sequences internally represented as character arrays permitting constant-time access to the individual characters of a string. There is no a priori length restriction for string constants. In the current implementation, the null character \0 is reserved as a string terminator and must not be used as an ordinary string character.

Furthermore, the Q language provides five other built-in constants which denote the truth values (true and false), and the empty list, stream and tuple ([], {} and (), to be discussed in Lists, Streams and Tuples).

A variable symbol can be "bound" or "free", depending on the context in which it occurs. We say that a variable is bound in an equation if it occurs on the left-hand side of the equation. Otherwise, the variable is a free variable. In this case, the variable may denote a value (introduced with def), or simply stands for itself. In any case, a variable is a legal atomic expression which may stand wherever a constant is allowed.

On the left-hand side of equations, it is possible to declare that a variable is of a given type by using the notation: variable:type. This requires that you have previously declared type as a type identifier, see Declarations. When a type identifier is specified with a variable, the variable will only match values belonging to the given type, cf. Type Guards.

On the right-hand side of equations, an unqualified identifier can also be prefixed with the keyword var to indicate that the identifier is to be treated as a free variable symbol. Such inline variable declarations can be used to escape free variables (if the same identifier is also introduced as a bound variable on the left-hand side of the equation) and to declare free variable symbols on the fly, see Local Variables, for details.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.2 Applications

Application is probably the most important single construct in the Q language. It allows you to apply one object (the "function") to another one (the "argument"). This construct is used to denote "function calls" such as sqrt 2 as well as "constructor terms" such as bin X T1 T2 which encode tree-like data structures. Also, as we will see in Built-In Operators, operators like +, -, etc. are merely syntactic sugar for applications.

As in other contemporary functional languages, application is a binary operation written simply as juxtaposition: X Y denotes the application of X to Y, both of which may be arbitrary expressions themselves. Application associates to the left; the construct X Y1 … Yn is equivalent to (…((X Y1) Y2) …) Yn, and denotes the application of X to n arguments Y1, …, Yn. This style of writing function applications is commonly referred to as currying, after the American logician H.B. Curry. We will have to say more about this shortly.

Here are some valid examples of applications:

 
sqrt 2
sqrt (Y+1)
foo X (bar (Y-1)) Z

Note that since application is left-associative, nested applications in arguments must be set off with parentheses. For instance, foo (bar X) applies foo to bar X, whereas foo bar X applies foo to two arguments bar and X.

Since currying is so ubiquitous in functional programming, you should be well familiar with it, so let us briefly explain what it is, and what it is good for.

Functions of multiple arguments can generally be defined in two different ways:

The Q language supports both kinds of notations. Choosing between the two is more than just a matter of taste. Besides saving parentheses, curried functions provide a cheap way to derive new functions through "partial applications" of a multi-argument function. For instance, given the curried definition of max from above, max 0 can be used to denote the function computing the "nonnegative part" of its argument (which is the argument itself if it is nonnegative, and zero otherwise). This does not work with the uncurried definition since that definition requires us to specify both arguments of max in one chunk; instead, we would have to start defining the derived function from scratch.

Uncurried definitions also have their merits. In particular, they allow us to define variadic functions, i.e., functions which can handle a varying number of arguments. For instance, the following definition enables us to apply the max function to both pairs and triples:

 
max (X,Y)               = X if X>=Y;
                        = Y otherwise;
max (X,Y,Z)             = max (X,max (Y,Z))

In fact, in the Q language it is possible to define generic functions which apply to any number of arguments specified as a tuple. For instance, the following version of max handles tuples of any size at least 2:

 
max (X,Y)               = X if X>=Y;
                        = Y otherwise;
max (X,Y|Zs)            = max (X,max (Y|Zs));

(In the above example, the notation (…|Zs) is used to denote a tuple with "tail" Zs. This is explained in the following section.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.3 Lists, Streams and Tuples

The Q language provides three closely related general constructs for representing sequences of objects: lists, streams and tuples.

We start out with a discussion of the list data structure. The constant [] denotes the empty list. In general, a list consisting of elements X1, X2, …, Xn is denoted [X1,X2,…,Xn]. For instance, [a,b,c] consists of three elements (symbols) a, b and c. In contrast to statically typed languages like Haskell and ML, list members may have different types; e.g., [a,10,"a string"] is a perfectly well-formed list consisting of a symbol, a number and a string. It is also possible to have nested lists, as in [a,[b,c]] which consists of two elements, the symbol a and the list [b,c].

You can also have a trailing comma in a list, as in [a,b,c,]. The trailing comma is ignored, so the above is exactly the same as [a,b,c]. This makes it possible to format a list as follows, which lets you add new items at the end more easily:

 
def ITEMS = [
              "This is the first item",
              "This is the second item",
              …
              "This is the last item",
            ];

As in Prolog, lists are actually represented in a right-recursive fashion using a binary constructor [|] which takes as its arguments the head element of the list and the list of the remaining elements, called the tail of the list. Thus, e.g., [a,b,c] is simply a shorthand notation for [a|[b,c]], which denotes a list with head a and tail [b,c] (which is in turn a shorthand for [b|[c]], etc.). You can also mix both styles of notation; for instance, [a,b|[c,d]] is another way to represent the 4-element list [a,b,c,d].

Note that [a|[b,c]] is different from [a,[b,c]]: the former denotes a three-element list, while the latter is a two-element list whose second member happens to be a list itself. Also note that the [|] constructor can in fact be applied to any pair of values (the second value does not have to be a list); e.g., [a|b] is a perfectly well-formed expression (although the built-in length, indexing and concatenation operations described in String, List and Tuple Operators, will fail on such values).

Q also provides so-called enumerations to construct lists from a range of values. These usually take the form [X1,X2,…,Xn..Y] which is just syntactic sugar for the application enum [X1,X2,…,Xn] Y involving the built-in enum function, see Enumeration Types, for details. Syntactic sugar is also provided for list comprehensions, which are discussed in Conditionals and Comprehensions. Analogous constructs are also provided for streams and tuples.

Streams look exactly like lists except that they are written with curly braces. The difference between lists and streams is that the head and tail of a stream are not evaluated until their values are actually required during the course of a computation. For instance, you will find that if you enter an expression like {1+1,2+2,3+3} in the interpreter, then the value of that expression will be just the expression itself, because the embedded applications of the `+' operator are not evaluated:

 
==> {1+1,2+2,3+3}
{1+1,2+2,3+3}

In contrast, if you enter the same sequence as a list, its elements will be evaluated immediately:

 
==> [1+1,2+2,3+3]
[2,4,6]

A stream element will be evaluated automatically when it is accessed, e.g., using the index operator:

 
==> {1+1,2+2,3+3}!1
4

This is also known as lazy or non-strict evaluation: the evaluation of the stream elements is deferred until they are actually needed. In fact, the entire tail of a stream may be deferred, which makes it possible to create infinite sequences whose members are produced "on the fly" while traversing the sequence.

Streams are implemented in terms of so-called "special forms" which will be introduced in Special Forms. We therefore postpone a closer description of this data structure until Special Constructors and Streams. Also note that the Q language itself only defines the stream constructors; most other stream functions are actually implemented by the standard library script stream.q, see Streams.

Tuples also work in much the same fashion as lists: The constant () denotes the empty tuple, and a tuple consisting of elements X1, X2, …, Xn is written as (X1,X2,…,Xn), which is equivalent to (X1|(X2|…|(Xn|()))), where the notation (X|Xs) denotes a tuple consisting of a first element X and the tuple Xs of the remaining elements. A trailing comma may follow the last tuple element, so that (a,b,c,) is the same as (a,b,c). The trailing comma is optional in all cases except one: As in the Python programming language, 1-tuples are indicated with a trailing comma, to distinguish them from ordinary parenthesized expressions. Hence (a,) denotes the 1-tuple with the single member a, while (a) just denotes a itself.

Note that in contrast to languages like ML and Haskell, tuples are not actually needed to represent sequences of values with different types, as Q's list data structure already allows you to do that. Also, Q's tuples are a lot more flexible than in ML and Haskell, as they can be handled mostly like list values. The big difference between lists and tuples in the Q language is that, while lists are always represented as recursive data objects using a binary constructor symbol (just the way that they are written), tuples are actually implemented as "vectors" which are stored as contiguous sequences in memory. (This only works for "well-formed" tuples. If the tail Xs of a tuple (X|Xs) is not a tuple, then this value can only be represented using an explicit application of the tuple constructor.) Therefore tuples normally use up much less memory than lists of the same size, and they also allow constant-time access to their members. The size of a tuple can be determined in constant time as well. In contrast, the same operations, when applied to a list, require time proportional to the size of the list. On the other hand, lists are more efficient when accessing the tail of a list and when a new element is prepended to a list, which can both be done in constant time. Here a tuple needs time proportional to its size, since the member sequence of the original tuple must be copied when accessing its tail or when prepending an element.

These tradeoffs should be carefully considered when deciding whether to implement a given sequence as a list or a tuple. Tuples are usually the best choice for implementing fixed sequences requiring fast random access to its individual members, whereas lists provide the most efficient representation if the elements are traversed and manipulated sequentially. If necessary, you can easily convert between these two representations using the built-in functions list and tuple, see Conversion Functions.

Lists, streams and tuples are frequently combined to create nested structures like a tuple of lists and/or streams, a list (or stream) of lists, or a list (or stream, or tuple) of tuples. The latter construct (sequences of tuples) is so common that Q provides special support for it in the form of the following "grouping" syntax. The notation [X1, X2, …; Y1, Y2, …; …] is a shorthand for [(X1, X2, …), (Y1, Y2, …), …]. Note that there must be at least one element in each group (but the number of elements in different groups may vary), and there must be at least one group (a single group is denoted with a trailing `;'). Thus, for instance, [A,B;C,D;X,Y,Z] is the same as [(A,B),(C,D),(X,Y,Z)], and [X,Y,Z;] is the same as [(X,Y,Z)]. The same notation also applies to streams and tuples.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4 Built-In Operators

Besides the forms of expressions already discussed above, the Q language also provides a collection of prefix, infix and mixfix operator symbols for common arithmetic, logical, relational and other operations. A complete list of these symbols is given below, in order of decreasing precedence:

' ` ~ &

quotation operators (unary)

.

function composition

^ !

exponentiation and subscript

- # not

prefix operators (unary)

* / div mod and and-then

multiplication operators

++ + - or or-else

addition operators

< > = <= >= <> ==

relational operators

$

infix application operator

if-then-else

conditional expressions

||

sequence operator

\X.Y

lambda abstractions

Most of these symbols have their usual meaning; a closer description follows below. All binary operators are left-associative, with the exception of ^, ! and $ which associate to the right, and the relational operators which are nonassociative. Application takes precedence over all these operations except the quotation operators; hence sqrt X^3 denotes (sqrt X)^3 and not sqrt (X^3). The quotation operators have the highest possible precedence, and hence bind stronger than even applications. Parentheses are used to group expressions and overwrite default precedences and associativity as usual. C programmers will also note that the logical operators have the same "wrong" precedence as in Pascal. Thus you should make sure that you always parenthesize relational expressions when combining them with logical connectives.

You should also note that unary minus must be parenthesized when appearing in an argument of a function application. For instance, although foo X -Y is a perfectly well-formed expression, it denotes (foo X) - Y and not foo X (-Y) as you might have intended by the spacing which is however ignored by the Q compiler. As already noted in Lexical Matters, unary minus in front of a number is special; it causes values such as -2 to be interpreted as negative numbers rather than denoting an explicit application of the unary minus operator (an explicit application of unary minus can be denoted using the built-in minus symbol; see below). The rules for parenthesizing negative numbers are the same as those for unary minus; you have to write foo X (-2) instead of foo X -2 (which denotes (foo X) - 2).

In the Q language, expressions involving prefix and infix operators are merely syntactic sugar for applications. By enclosing such an operator in parentheses, you can turn it into an ordinary prefix function. For instance, X+Y is exactly the same as (+) X Y, and not X actually denotes the application (not) X. The built-in operators simply provide a more convenient way for applying these operations to their arguments, which resembles common mathematical notation. Moreover, you can also turn a binary operator into a unary function by specifying either the left or the right operand. E.g., (1/) denotes the reciprocal and (*2) the doubling function. Such constructs are commonly called operator sections. Inside a section, the usual precedence and associativity rules apply. For instance, the X+3 subterm in (*(X+3)) must be parenthesized because * has a higher precedence than +, and hence the partial expression (*X+3) is not well-formed.

The - operator plays a somewhat awkward role in the syntax, since it is used to denote both unary and binary minus. Q adopts the convention that the notation (-) always denotes binary minus; unary minus may be denoted by the built-in minus function. Thus the expression minus X applies unary minus to X. Note that this construct always denotes an explicit application of the unary minus operation. For instance, minus 5 denotes the application of unary minus to the integer 5, while -5 is a negative integer. Also note that the construct (-X) is not an operator section, but a parenthesized expression involving unary minus. The easiest way to construct an operator section which subtracts a given value from its argument is to formulate the function using the addition operator as in (+(-X)).

Another special case are the mixfix operators if-then-else (conditional expression) and \X.Y (lambda abstraction). These are just syntactic sugar for the standard library functions ifelse/when and the built-in function lambda, so no special syntax is needed to turn them into prefix functions.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.1 Quotation Operators

The ' (quote), ` (backquote), ~ (tilde) and & (ampersand) operators are used to deal with so-called special forms. The quote operator quotes an expression as a literal; it is a constructor symbol and hence becomes part of the quoted expression. The backquote and tilde operators are used to "splice" and "force" subterms in an expression. The ampersand operator causes "special arguments" to be memoized. We postpone a discussion of these operations until Special Forms.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.2 Function Composition

The `.' operator denotes function composition: (F.G) X = F (G X). For instance, to double a value and then add 1 to it, you can use an expression like ((+1).(*2)) X. Note the parentheses around the composition - as application binds stronger than composition, F.G X is interpreted as F.(G X) rather than (F.G) X. If a composition involves applications with integer arguments, extra whitespace may be needed to distinguish between the `.' operator and the decimal point; e.g., F 5 . G denotes a composition of the functions F 5 and G, whereas F 5.G is an application of the function F to two arguments, the floating point value 5. and the function G. Also note that the composition operator is associative, since (F.G.H) X = ((F.G).H) X = F (G (H X)) = (F.(G.H)) X.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.3 Arithmetic Operators

The operators +, -, *, / denote the sum, the difference, the product and the quotient of two numeric values, respectively. As already noted, - is also used as unary minus. The operators div and mod denote integer quotient and modulo, respectively; they only apply to integers. The ^ operator denotes exponentiation, as defined by X^Y = exp (ln X*Y); it always returns a floating point value. (If X<0 then X^Y is defined only if Y is an integer. 0^0 is left undefined as well, so if you would like to have that 0^0 = 1 then you must add corresponding equations yourself. Also note that the complex.q standard library module extends the built-in definition of the exponentiation operator to handle the case that X<0 with general exponent; see Complex Numbers.)

The argument and corresponding result types of these operations are summarized in the following table (Int denotes integer, Float floating point, and Real integer or floating point values):

+ - *

Int Int -> Int
Int Float -> Float
Float Int -> Float
Float Float -> Float

/ ^

Real Real -> Float

div mod

Int Int -> Int

- (unary)

Int -> Int
Float -> Float

Note that, as of Q 7.2, all floating point operations properly deal with IEEE floating point infinities and NaN ("not a number") values. Consequently, division by zero now yields one of the special values inf, -inf or nan, depending on the value of the first operand. (Of course, this will only work as described on systems with a proper IEEE floating point implementation. The interpreter makes no attempt to emulate IEEE floating point values on systems which do not provide them natively.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.4 Relational Operators

The operators <, >, =, <=, >=, <> are binary predicates meaning "less", "greater", "equal", "less or equal", "greater or equal" and "not equal", respectively. The built-in definition of these operations only applies to numbers, strings and truth values. All relational operators return truth values (true, false) as results. Strings are compared lexicographically, on the basis of the character encoding (which, as of Q 7.0, usually is Unicode). Truth values are ordered by false < true.

If you would like to compare other types of values than the basic objects mentioned above, normally you will have to provide suitable definitions yourself. For instance, you might wish to extend the equality operation to other built-in and user-defined data structures such as lists, trees, etc., by "overloading" the = operator accordingly. The following equations implement an equality predicate on lists (the parentheses on the left-hand side are necessary to prevent the equality operation to be interpreted as the equal sign separating left-hand and right-hand side of an equation):

 
([] = [])               = true;
([] = [Y|Ys])           = false;
([X|Xs] = [])           = false;
([X|Xs] = [Y|Ys])       = (X=Y) and then (Xs=Ys);

Rules for other comparison operators (<>, <=, etc.) could be added in the same fashion. Actually, the standard library provides corresponding definitions; see The Standard Library.

A special case arises with types consisting of only nullary constructor symbols declared with const, so-called enumeration types, see Types. The values of such a type can always be compared with the relational operators, using the order in which they are listed in the type declaration. (A special case of this is the order of the built-in truth values.) For instance, assume that the Day type is declared as follows:

 
type Day = const sun, mon, tue, wed, thu, fri, sat;

Then the listed constants will be ordered as sun < mon < … < sat.

Besides the equality predicate, the Q language also provides a notion of "syntactic" equality, implemented by the `==' operator, which applies to all kinds of expressions; see Non-Linear Equations, for details.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.5 Logical and Bit Operators

The logical operations not, and, or denote logical negation, conjunction and disjunction, respectively. These operators take truth values as their arguments. They are defined in a straightforward manner:

 
not true                = false;
not false               = true;

true and true           = true;
true and false          = false;
false and true          = false;
false and false         = false;

true or true            = true;
true or false           = true;
false or true           = true;
false or false          = false;

Like most other programming languages, Q also has logical connectives for the short-circuit evaluation of logical expressions, which are denoted by the operators and then and or else. These operations are actually implemented as "special forms" which evaluate their arguments from left to right only as far as required to determine the value of the expression (cf. Special Forms). They are defined by the following built-in equations:

 
true and then X         = X;
false and then X        = false;

false or else X         = X;
true or else X          = true;

The first operand is always evaluated. Depending on its value, the second operand may not be evaluated at all. For instance, if X evaluates to false, then X and then Y immediately reduces to false, without ever having to evaluate the second argument Y. On the other hand, if X evaluates to true, then Y is evaluated and returned as the value of the expression. The or else operation works analogously.

One reason for using short-circuit evaluation is efficiency: prevent unnecessary computations when an initial part of a logical expression already determines the value of the entire expression. Furthermore, short-circuit evaluation is sometimes essential to check a condition before evaluating an expression depending on the validity of this condition. For instance:

 
(X <> 0) and then (foo (1/X) > 0)

You should also note that, according to the above definitions, X and then Y and X or else Y are always reduced if X is a truth value, even if the second operand Y does not evaluate to a truth value. This may look a bit weird at first, but it is in fact the most reasonable way to implement short-circuit evaluation in the Q language, since Q uses dynamic typing, and hence the type of an expression is only known after it has been evaluated. In fact, this "feature" can be quite useful at times. For instance, you can also use and then to write down a simple kind of conditional expression like

 
check X and then bar X

where bar X is the value you wish to compute, but only if the condition check X holds.

The Q interpreter also uses the operators not, and and or to denote bitwise logical operations on integers. Thus, not X denotes the one's complement of an integer X, and X and Y and X or Y the bitwise logical conjunction and disjunction of integers X and Y, respectively. These operations behave as if negative integers were represented in two's complement (although GMP actually uses a sign-magnitude representation). This means that for each integer X, -X = not X + 1, or, equivalently, not X = -X-1. For instance:

 
==> 17 and not 13
16

==> 17 or not 13
-13

==> not _
12

These results become clear when considering that 17 has bits 0 (=1) and 4 (=16) on (and all other bits off), while not 13 has bits 0 (=1), 2 (=4) and 3 (=8) off (and all other bits on). Thus, writing these numbers as unsigned bit sequences with most significant bits first, where ...1 denotes an infinite sequence of leading 1's, we have:

 
17 and not 13 =
10001 and not 1101 = 10001 and ...10010 = 10000
= 16

17 or not 13 =
10001 or not 1101 = 10001 or ...10010 = ...10011 (= not 1100 = not 12)
= -13

Together with the built-in bit shift operations, the bitwise logical operations can also be used to implement "bitsets", i.e., sets of nonnegative integers represented by bit patterns. See Arithmetic Functions, for details.

In case you are wondering about "exclusive or:" While this operator is not provided as a builtin, you can easily define it yourself as follows:

 
public (xor) X Y @ 3;
X xor Y                 = (X or Y) and not (X and Y);

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.6 String, List and Tuple Operators

The ++ operator denotes concatenation, # is the length or size operator, and the subscript operator ! is used for indexing. These operations work consistently on strings, lists and tuples. For instance:

 
"abc"++"xy"             ⇒ "abcxy"
[a,b,c]++[x,y]          ⇒ [a,b,c,x,y]
(a,b,c)++(x,y)          ⇒ (a,b,c,x,y)

#"abc"                  ⇒ 3
#[a,b,c]                ⇒ 3
#(a,b,c)                ⇒ 3

"abc"!1                 ⇒ "b"
[a,b,c]!1               ⇒ b
(a,b,c)!1               ⇒ b

Indexing with the subscript operator starts at zero, so that X!0 and X!(#X-1) denote the first and last member of a string, list or tuple, respectively. Also note that since ! is right-associative, double subscripts may have to be parenthesized. For instance, compare (X!I)!J and X!I!J=X!(I!J).

All list and tuple operators check that their first argument is a well-formed list or tuple value. However, the second argument of the ++ operator may in fact be any value. List concatenation just replaces the empty sublist marking the end of its first list argument with the second argument, as if it was defined by the following equations:

 
[]++Ys                  = Ys;
[X|Xs]++Ys              = [X|Xs++Ys];

Hence we have that, e.g., []++1 ⇒ 1 and [1,2]++3 ⇒ [1,2|3], which may look odd, but is consistent with the above equations. Tuple concatenation works in an analogous manner.

All operators discussed in this section are also available for streams (cf. Lists, Streams and Tuples). However, the corresponding definitions are not built into the language but are provided in the standard library script stream.q, see Streams.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.7 Application and Sequence Operators

The application operator `$' and the sequence operator `||' have the lowest precedence among the prefix and infix operators. The $ operator is right-associative and binds stronger than the || operator, which is left-associative.

The infix application operator provides an alternative way to denote function application, i.e., F $ X ⇒ F X. This operator comes in handy when you want to pass around function application as a function parameter. Since this operator has very low precedence and is right-associative, it also provides a convenient way to write down "cascading function applications," such as foo $ bar $ X+Y = foo (bar (X+Y)), which can save a lot of parentheses and make an expression more readable.

The sequence operator lets you evaluate sequences of expressions in a given order. The value of the sequence is given by the rightmost expression. That is,

 
X || Y                  ⇒ Y.

The sole purpose of this construct is to allow operations with side-effects (such as I/O functions) to be carried out sequentially. A typical example is

 
writes "Input: " || reads

which prints a prompt on the terminal and then reads in a string. (The built-in functions writes and reads are described in Built-In Functions.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.8 Conditional Expressions and Lambdas

As of version 7.1, Q also provides special syntax for two other kinds of constructs which are commonly encountered in functional programs: conditional expressions and lambda abstractions. A conditional expression takes the form if X then Y else Z where X, Y and Z are arbitrary expressions. It returns the value of Y if the value of X is true, and the value of Z if the value of X is false. The `else' part can also be omitted, in which case () is returned if the value of X is false. Like the built-in logical connectives and then and or else, conditional expressions are special forms which are evaluated in short-circuit mode; thus, if X evaluates to false, then Y will never be evaluated.

Lambda abstractions can be denoted using the customary notation \X.Y where X is an expression denoting a pattern to be matched against the argument of the lambda function, and Y is another expression, the lambda body which is to be evaluated when the lambda function is applied to an argument. Note that if the argument pattern X is not an atomic expression (i.e., not a basic expression, list, stream or tuple) then it must be parenthesized; the lambda body Y never has to be parenthesized, however, because of the low precedence of the lambda construct. Multi-argument lambdas can be written using the notation \X1 X2 … . Y, which is just a shorthand for \X1 . \X2 . … . Y. Lambdas are special forms; neither the parameter patterns nor the body are evaluated, rather they are "compiled" (at runtime) to a special kind of function object which can be applied to the actual arguments in an efficient manner, see Lambda Abstractions, for details.

Like the other operators, both if-then-else and the \X.Y construct are merely syntactic sugar for function applications. The if-then-else operator is actually implemented by the standard library function ifelse (or the when function if the `else' part is omitted), see Conditionals and Comprehensions. The \X.Y construct translates to an application of the built-in lambda function, see Lambda Abstractions.

Note that the \X.Y construct has the lowest possible precedence, even lower than the sequencing operator `||', so it always has to be parenthesized unless it forms a toplevel expression or the body of another lambda construct. The if-then-else operator binds stronger than `||', but weaker than `$', and "dangling else" parts are assumed to belong to the most recent if-then. This makes imperative-style code like the following behave as expected:

 
test = \X.
       writes "The value is " ||
       if X > 0 then
         writes "positive.\n"
       else if X < 0 then
         writes "negative.\n"
       else
         writes "zero.\n";

Also note that if-then-else needs to be parenthesized unless it occurs as a toplevel or inside a lambda, sequence or another conditional expression:

 
test = \X. writes $ "The value is " ++
       (if X > 0 then
          "positive"
        else if X < 0 then
          "negative"
        else
          "zero") ++ ".\n";

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.5 User-Defined Operators

As of version 6.2, the Q language also lets you you define your own prefix and infix operator symbols, using a declaration like the following (see also Declarations):

 
public (myop) X Y;

This effectively turns myop into a keyword which can be used as an infix operator: X myop Y. User-defined operators work exactly like the built-in ones, in that you can turn them into a prefix function by enclosing them in parentheses, as in (myop), and that you can build sections from them which leave the left or right operand unspecified, such as (X myop) and (myop X).

By default, the new operator has the same precedence and associativity as the relational operators (which are at precedence level 2, and are non-associative). You can also explicitly specify a precedence level, like so:

 
public (myop) X Y @ 2;

Alternatively, you may denote the precedence level by listing an existing operator symbol. The following is equivalent to the above declaration:

 
public (myop) X Y @ (<);

As of Q 7.7, the operator symbol doesn't have to be an identifier, it can also be a sequence of punctuation characters as discussed in Operator Symbols. For instance:

 
public (===) X Y @ (=);

As a real-world example, here is the definition of an xor (exclusive-or) operator from Logical and Bit Operators. It uses the third precedence level, which is occupied by the addition operators (which also includes the built-in or operator):

 
public (xor) X Y @ 3;
X xor Y                 = (X or Y) and not (X and Y);

In general, a quick glance at the operator table in the preceding subsection reveals that valid precedence level numbers for the prefix and infix operators go from 0 (lowest precedence, binary sequence operator) to 9 (highest precedence, unary quotation operators), with 8 denoting the precedence of function application. (Note that there are no numbered precedence levels for the if-then-else and \X.Y constructs because there is no way to declare such mixfix operators in Q.) The complete list of built-in prefix and infix operators, cast as a sequence of valid Q declarations, is shown below.

 
public (::||) X Y @0, (::$) X Y @1;
public (::<) X Y @2, (::<=) X Y @2, (::>) X Y @2, (::>=) X Y @2,
  (::=) X Y @2, (::<>) X Y @2;
public special (::==) X Y @2;
public (::++) X Y @3, (::+) X Y @3, (::-) X Y @3,
  (::or) X Y @3;
public special (::or else) ~X Y @3;
public (::*) X Y @4, (::/) X Y @4, (::div) X Y @4, (::mod) X Y @4,
  (::and) X Y @4;
public special (::and then) ~X Y @4;
public (::#) X @5, (::not) X @5;
public (::^) X Y @6, (::!) X Y @6;
public (::.) X Y @7;
public special const (::') X @9;
public (::`) X @9, (::~) X @9, (::&) X @9;

Note that precedence level 8 may not be used for user-defined operators. Other than that, all precedence levels are permitted, but the fixity (prefix/infix), the number of operands (unary/binary) and (unlike in ML and Haskell) also the associativity of operators (non-/left-/right-associative) are all determined by Q's system of built-in operators, as described in Built-In Operators.

Caveats

While user-defined operators are often useful for extending Q's built-in operator system and to implement "domain-specific languages", the excessive use of obscure operator symbols, especially the use of multi-character punctuation looking like "line noise", may make your programs hard to read and is considered bad style.

As explained in Operator Symbols, some built-in operator symbols, such as `-', `=' and `.' are actually part of Q's syntax and need special treatment. These symbols may therefore not be redeclared as user-defined operators. The same applies to special symbols like `|', `..', `\', `@' and the keywords of the language. In general, you should also refrain from redeclaring other built-in operators since this will most likely be confusing for other programmers reading your code. Most of the time that you want to define, say, a `+' operation on a given data structure, extending the built-in `+' operator should work just fine; there's usually no need to declare a new `+' operator for that purpose.

You should also be aware that declaring new operator symbols actually amounts to changing the syntax of the Q language in small and sometimes subtle ways, both at compile- and runtime. The amount of havoc you can wreak on the syntax is somewhat limited by the fact that Q does not allow you to create your own precedence levels; all the available levels and the corresponding fixities of operators are prescribed by Q's expression syntax. Nevertheless, designing your own operator systems takes careful planning. In particular, since operator symbols are parsed using the maximal munch rule (see Operator Symbols), it is a bad idea to declare binary operator symbols like `--' or `<-' which end in a valid prefix of a unary operator (`-' in this case). Such ambiguities can break existing code and produce bugs which are hard to find. When in doubt, just use symbols from the extended Unicode character set; there are plenty of nice mathematical and other punctuation symbols in Unicode which are very useful for programming purposes.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7. Equations and Expression Evaluation

"Computational processes are abstract beings that inhabit computers." [Abelson/Sussman 1996, p.1]

At its core, any programming language faces the programmer with three distinct abstract notions [Abelson/Sussman 1996]: the entities which are to be manipulated, usually referred to as "data", the computational "processes" which carry out those manipulations, and the description of these processes by finite collections of computational rules, commonly called "programs".

As in other functional programming languages, all computations performed by Q scripts are expression evaluations: The user specifies an expression to be evaluated, the interpreter computes the value of that expression and prints it as the result of the computation.

Having described the "data" manipulated by Q scripts, expressions, we now turn to the computational processes carried out on those expressions, expression evaluations, and their description by collections of computational rules, equations.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.1 Equations

In the Q language, there is in fact no exact equivalent of a "function definition" in conventional languages. Rather, definitions take the form of equations which describe how a given expression, the left-hand side of the equation, can be transformed into another one, the corresponding right-hand side. Note the difference between this interpretation and common mathematical usage in which an equation is usually something to be "solved". Here we are actually dealing with "rewriting rules", i.e., the equations are oriented from left to right. The syntax of equations is captured by the following grammar rules:

 
definition              : expression
                          lqualifiers '=' expression qualifiers ';'
                          {lqualifiers '=' expression qualifiers ';'}
lqualifiers             : [qualifier {qualifier} ':']
qualifiers              : {qualifier}
qualifier               : condition
                        | where-clause
condition               : 'if' expression
                        | 'otherwise'
where-clause            : 'where' expression '=' expression
                          {',' expression '=' expression}

Variables occurring on the left-hand side of an equation are taken to be universally quantified. They play the role of formal parameters in conventional programming languages, which are "bound" to their corresponding values when the equation is applied to an actual expression value. For instance, suppose we wish to define a function sqr which squares its argument. We know that to square something we simply multiply it with itself. The corresponding equation is:

 
sqr X                   = X*X;

Here the left-hand side is simply the application of the function symbol sqr to an argument value X which stands for any actual value we might apply the sqr function to.

An equation may also contain a collection of conditions and "local" variable definitions (these two kinds of supplementary elements are also called qualifiers in Q parlance), and multiple right-hand sides (which are interpreted as a collection of equations for the same left-hand side). An empty condition may be denoted with the keyword otherwise, which is a convenient means to indicate the "default case" of a definition. For instance, the following equations define the factorial function:

 
fac N                   = N*fac (N-1) if N>0;
                        = 1 otherwise;

The otherwise keyword is nothing but syntactic sugar; it can always be omitted. However, it tends to improve the readability of definitions like the one above.

Qualifiers can also be written on the left-hand side of an equation, like so:

 
fac N if N>0:           = N*fac (N-1);
      otherwise:        = 1;

The two forms are more or less equivalent, and can be mixed, but left-hand side qualifiers may also be shared by multiple equations for the same left-hand side. For instance, here is a definition of Ackerman's function:

 
ack M N if M>0:         = ack (M-1) (ack M (N-1)) if N>0;
                        = ack (M-1) 1 if N=0;
        if M=0:         = N+1;

Note that the condition M>0 on the left-hand side of the first equation applies to both the first and the second equation. In general, the scope of a left-hand qualifier extends up to (but excluding) the next equation with either another left-hand side qualifier or a new left-hand side expression.

In difference to function definitions in conventional programming languages, the left-hand side of an equation may also contain constant and structured argument patterns which are to be matched against the actual parameters of a function. For instance, the following equations, which involve constant values on the left-hand side, define the Fibonacci function:

 
fib 0                   = 0;
fib 1                   = 1;
fib N                   = fib (N-1) + fib (N-2) otherwise;

As an example for structured argument patterns, operations to extract the head (first element, the "car" in Lisp terminology) and tail (list of remaining elements, the "cdr") of a list are defined in the standard library as follows:

 
hd [X|Xs]               = X;
tl [X|Xs]               = Xs;

Note that in the above example we actually only need one of the involved component values. In such a case it is possible to employ the anonymous variable `_' as a placeholder for any component value we do not care for. For instance:

 
hd [X|_]                = X;
tl [_|Xs]               = Xs;

Multiple instances of the anonymous variable are matched independently from each other, therefore an equation like

 
foo _ _                 = 0;

will be matched for any combination of arguments. Also note that values matched by the anonymous variable on the left-hand side of an equation are not accessible on the right-hand side. (The anonymous variable may appear on the right-hand side of an equation, but there it is always treated as a free variable, see Free Variables.)

As another example, here is the definition of a function which adds up the members of a list:

 
add []                  = 0;
add [X|Xs]              = X+add Xs;

The above definition is the Q equivalent for what Lisp programmers call "cdr'ing down a list". In general you will find that pattern-matching lets you express many functions operating on recursive, list- or tree-like data structures in a very concise and elegant way. For instance, we can define a binary tree insertion operation simply as follows:

 
const nil, bin X T1 T2;
insert X nil            = bin X nil nil;
insert X (bin Y T1 T2)  = bin Y (insert X T1) T2  if X<Y;
                        = bin Y T1 (insert X T2)  otherwise;

Here the nil and bin symbols act as constructors which are used to build a recursive binary tree data structure. To these ends, the bin symbol can be applied to a value X (which is the value to be stored at that node in the tree) and the left and right subtrees T1 and T2. This works exactly like a function which is applied to its arguments, but here the bin symbol does not actually compute anything but simply stands for itself.

In the above example, we actually advertised the fact that nil and bin are just constructors by declaring them with const, as described in Declarations. While this declaration is not strictly necessary, it helps to enforce that these symbols cannot accidentally be "redefined" by means of an equation:

 
nil                 = ...; // compiler will print an error message
bin X T1 T2         = ...; // here, too
bin X               = ...; // here, too (prefix of a constant)

The same applies to the built-in constant symbols true and false which are predeclared as const, and also to other types of built-in constants, i.e., numbers, strings, files, lists, streams and tuples. The general rule is that no constant value is allowed as the left-hand side of an equation. A singleton variable is also forbidden. Thus the left-hand side must either be a non-const function symbol or application. Prefixes of constant applications, such as bin X in the example above, also count as constants and are thus forbidden on the left-hand side. On the other hand, both constants and variables may occur as the head of the left-hand side of an equation if they are followed by additional arguments. Thus equations like the following are all valid:

 
0 X                 = ...;
bin X T1 T2 Y       = ...;
F X Y               = ...;

Note that since expressions involving operators are nothing but syntactic sugar for applications, they can occur on the left-hand side of an equation as well (with the exception of the ' operator which is a constructor symbol, see Special Forms). For instance, here is a collection of algebraic simplification rules which cause any expression involving only + and * to be converted into a "sum of products" form:

 
// distributivity laws:
(X+Y)*Z                 = X*Y+X*Z;
X*(Y+Z)                 = X*Y+X*Z;
// associativity laws:
X+(Y+Z)                 = (X+Y)+Z;
X*(Y*Z)                 = (X*Y)*Z;

A word of caution: if the = operator occurs at the toplevel of the left- or right-hand side of an equation, it must be parenthesized. This is necessary to prevent the = operator to be confused with the = symbol used to separate both sides of an equation. (The same also applies to variable definitions.) E.g., you should write something like

 
(X=X)                   = true;

instead of

 
X=X                     = true; // this is a syntax error

Although almost any expression can form the left-hand side of an equation, you should be aware that an equation will only be applicable if the pattern you specify can actually occur in the course of an expression evaluation. In particular, since the Q interpreter evaluates expressions in a leftmost-innermost fashion, you should make sure that argument patterns match normal form expressions (cf. Normal Forms and Reduction Strategy), unless the function to be defined is a special form (cf. Special Forms). Otherwise the equation may be useless, albeit syntactically correct. The compiler cannot verify these conditions, so they are at your own responsibility.

Using equations we can also easily define "higher order" functions which take functions as arguments or return them as results. This is made possible by the fact that function application is an explicit operation. For instance, as a generalization of the add function above, the standard library provides an operation which applies any "binary" function F to a list, starting from a given initial value A:

 
foldr F A []            = A;
foldr F A [X|Xs]        = F X (foldr F A Xs);

The foldr ("fold-right") function takes as its arguments a function F, an initial value A, and a list [X1,...,Xn], and computes the value

 
F X1 (F X2 (... (F Xn A)...)).

(The name of this function comes from the fact that the recursive applications of F are parenthesized to the right.) Now we can easily define the add function in terms of foldr as follows:

 
add                     = foldr (+) 0;

There is another version of the fold operation, which accumulates results from left to right rather than from right to left:

 
foldl F A []            = A;
foldl F A [X|Xs]        = foldl F (F A X) Xs;

Note that although both functions work in linear time (with respect to the size of the input list), the foldl function actually is more efficient than foldr in terms of stack space requirements since its definition is "tail-recursive" (cf. Tail Recursion).

As another example, here is the definition of the standard library function map which maps a given function F to each member of a list Xs:

 
map F []                = [];
map F [X|Xs]            = [F X|map F Xs];

Another example of a (built-in) higher-order function is lambda, which returns another function (cf. Lambda Abstractions). In fact, there is nothing that keeps you from defining a function like lambda yourself in Q. As a proof of concept, here is a toy implementation of the lambda calculus, based on the combinatorial calculus, as described in [Henson 1987]. (This definition will only work with simple kinds of expressions. The built-in lambda function provides a much more comprehensive implementation which also properly handles the obscure cases. But our definition in terms of the combinatorial calculus could be extended to handle those cases, too.)

Our little lambda function is declared as a special form which prevents its arguments from being evaluated. This will be explained in Special Forms.

 
special lambda X Y;

lambda X X              = _I;
lambda X (Y Z)          = _S (lambda X Y) (lambda X Z);
lambda X Y              = _K Y otherwise;

/* combinator rules */

_I X                    = X;            // identity function
_K X Y                  = X;            // constant function
_S X Y Z                = X Z (Y Z);    // argument dispatching

For instance:

 
==> lambda X (2*X)
_S (_S (_K (*)) (_K 2)) _I

==> _ 4
8

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.2 Non-Linear Equations

If a variable has multiple occurrences on the left-hand side of a rule, each occurrence must be matched to the same value. In term rewriting theory, such rules are called "non-linear" ("non-left-linear", to be more precise). The following definition implements an operation uniq which removes adjacent duplicates from a list:

 
uniq []                 = [];
uniq [X,X|Xs]           = uniq [X|Xs];
uniq [X|Xs]             = [X|uniq Xs] otherwise;

It is important to notice the difference between the syntactical notion of equality which plays a role in definitions like the one above, and the "semantic" equality relation defined by the `=' operator (cf. Relational Operators). Non-linear equations and equations involving constants, such as

 
uniq [X,X|Xs]           = uniq [X|Xs];
fib 0                   = 0;

call for the verification of syntactic identities. The Q language implements these checks by means of a special relational operator `=='. Thus the above examples could also be written as:

 
uniq [X,Y|Xs]           = uniq [X|Xs] if X==Y;
fib X                   = 0 if X==0;

Two expressions are considered syntactically equal (`==') only if they print out the same in the interpreter. In contrast, the meaning of the normal equality operation `=' depends on built-in rules and equations specified by the programmer. For instance, two different instances of the expression 0 always denote the same object. However, since 0 is an integer and 0.0 a floating point number, these two expressions are syntactically different:

 
==> 0==0.0
false

Nevertheless, the = operator allows to compare integers and floating point numbers. Indeed it asserts that 0 and 0.0 are "equal":

 
==> 0=0.0
true

Note that, in contrast to all the other built-in relational operators, the syntactic equality operator `==' is actually implemented as a special form (cf. Special Forms) which compares its operands as literal expressions without evaluating them:

 
==> 0==0+0
false

This is necessary to ensure proper operation when a nonlinear equation needs to compare two unevaluated special arguments for syntactic identity, but makes the operation somewhat awkward to use directly in programs. As a remedy, you can easily define a non-special syntactic equality operation yourself as follows:

 
eq X X                  = true;
eq _ _                  = false otherwise;

In fact, this definition is already provided in the standard library script stdlib.q. The standard library also defines the logical negations `!=' and neq of `==' and eq, see The Standard Library, for details.

You should also note that the strict separation of syntactic and semantic equality is actually quite useful. First of all, it allows the = operation to be treated in a manner consistent with other comparison operators such as < and >. Secondly, it allows you to tailor the definition of = to your application. For instance, an application might call for a set data structure, and you would like to be able to test whether two expressions of a certain kind represent the same set. Now there are fairly sophisticated data structures for representing sets efficiently, but almost none of them allow you to decide whether two objects represent the same set simply on the basis of syntactic identity. Instead, you will have to provide an explicit = operation which tests whether two set objects contain the same elements.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.3 Free Variables

On the right-hand side of an equation (as well as in the condition part and the right-hand sides of local definitions), we distinguish between bound variables, i.e., unqualified variable symbols which are introduced in the left-hand side of an equation (or the left-hand side of a local definition, see Local Variables), and free or global variables which are not bound by any of the left-hand side patterns. Free variables can also be declared explicitly using a var declaration (cf. Declarations). They are effectively treated as (parameterless) function symbols, except that they may be assigned a value by means of a variable definition. Variable definitions have the following syntax:

 
definition              : 'def' expression '=' expression
                          {',' expression '=' expression} ';'
                        | 'undef' identifier {',' identifier} ';'

For instance, in the following equation,

 
foo X                   = C*X;

the variable C occurs free on the right-hand side and we have:

 
==> foo 23
C*23

We can assign a value to the variable as follows (this can be done either in the script or interactively in the interpreter):

 
def C = 2;

After this definition we have:

 
==> foo 23
46

It is worth noting here that, for convenience, the global variable environment uses dynamic binding, i.e., you can change the value of a free variable at any time interactively in the interpreter:

 
==> def C = 3; foo 23
69

This also makes it possible to have recursive definitions like the following:

 
==> var fac = \N.if N>0 then N*fac (N-1) else 1

==> fac
\X1 . if X1>0 then X1*fac (X1-1) else 1

==> map fac [1..10]
[1,2,6,24,120,720,5040,40320,362880,3628800]

(Note that we have to use an explicit variable declaration here, since `fac' is an uncapitalized identifier and hence would otherwise be taken to denote a function symbol. We also made use of a variable definition with initializer which declares and defines the variable in a single step. See Declarations.)

In contrast, bound variables (which are bound either by the left-hand side of the equation, cf. Equations, or in a local variable definition, cf. Local Variables) always use lexical binding, i.e., their values are always completely determined by the (static) context of the equation to which they belong. This is important because it guarantees referential transparency, i.e., the values of the left-hand side and local variables in an equation will only depend on the expression to which the equation is applied, as long as the definition does not involve any operations with side-effects, such as I/O. The values of the free variables may change, but only between different expression evaluations, as there is no function which would allow you to change the value of a global variable during an ongoing evaluation. This is in contrast to other functional programming languages such as Lisp which allow you to modify the value assigned to a variable in a function definition.

Free variables are useful if you have a script which repeatedly refers to the same value whose calculation may be costly or involve functions with side-effects. By assigning such a value to a variable, you can refer to it without having to recompute the value each time it is required, as would be the case with a corresponding function definition.

The right-hand side of a variable definition may be any Q expression, as described in Expressions. The expression is evaluated once, at the time the script is loaded, and is then matched against the left-hand side of the definition, binding variables to the corresponding component values of the right-hand side value. In the special case that the left-hand side is a single variable, it is simply bound to the right-hand side value. Whenever a defined variable occurs "free" on the right-hand side of an equation or variable definition, it will then be replaced by the value of the assigned expression. For instance, the following definition assigns the value exp 1.0 = 2.71828… to the free variable e:

 
var e = exp 1.0;

Variable definitions are carried out in the order in which they occur in a script, so that a definition may refer to the values of variables assigned in previous definitions. It is also possible to override assignments made in previous definitions, and to undefine a variable. Multiple definitions can be performed with a single def statement. For instance:

 
def N = 99, K = N, N = 2*K+1, M = N;
undef N, K;

As already mentioned, the value assigned to a variable may also be changed with def and undef statements in the interpreter, see Command Language.

Note that a variable can also be declared const, which can be used to introduce constant values, as in:

 
public var const c = 299792458; // the speed of light, in m/sec

In this case the variable can only be assigned to once, afterwards attempts to assign a new value to the same variable will produce an error message:

 
==> def c = 300000000
! Cannot redefine const variable
>>> def c = 300000000
                     ^

Free variable definitions can also involve a pattern on the left-hand side which is to be matched against the supplied value. E.g.,

 
def [X|Xs] = [1,2,3];

assigns 1 to the variable X and the remainder of the list, [2,3], to the variable Xs. Pattern matching is performed in the same manner as for the left-hand side of an equation. However, an error is reported if the match fails (as would be the case, e.g., if we tried to assign the empty list to [X|Xs]).

A number of built-in free variables are already defined by the interpreter when it starts up, and there is also a free anonymous variable `_' which is used to record the result of the most recent expression evaluation performed in the interactive evaluation loop of the interpreter. See Command Language, for details.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.4 Local Variables

As already mentioned, an equation may also contain one or more auxiliary variable definitions a.k.a. where clauses, which bind additional variables to their corresponding values. In difference to the global definitions of "free" variables discussed in Free Variables, the definitions in where clauses are "local" to the rule in which they occur, meaning that the values of these variables are only available in the rule they are defined in. As in the case of free variable definitions, the left-hand side of each where clause can be an arbitrary expression pattern which is to be matched against the value computed from the corresponding right-hand side. The variables occuring on the left-hand side of the definition will then be bound to their corresponding values, thereby becoming "bound" variables in the context of the rule, in addition to the variables occuring on the left-hand side of the equation. Local definitions also act as additional conditions; the rule will only be applicable if all left-hand sides in where clauses match their corresponding right-hand side values (this is guaranteed, of course, if each left-hand side is a single variable).

Local definitions are useful if the right-hand side of a rule refers repeatedly to the same (maybe complicated) subexpression. You can avoid the recalculation of such a value be assigning it to a variable, and referring to the variable in the right-hand side. For instance:

 
foo X                   = bar Y Z where Y = baz X, Z = qux Y;

As already mentioned, pattern-matching definitions are permitted as well (and the anonymous variable also works as usual). A failing match causes the entire rule to fail. For instance,

 
foo Z                   = bar X Z where [X|_] = Z;

works like

 
foo [X|Xs]              = bar X [X|Xs];

but avoids repeating the list value on the right.

Like conditions, where clauses can also be written on the left-hand side of an equation. This allows you to share the local variables between different equations for the same left-hand side. For instance:

 
foo Z where [X|_] = Z:  = bar X Z if X>0;
                        = bar (-X) Z otherwise;

The variable bindings in a single where clause are performed in the given order, and each definition may refer to all left-hand side variables of the rule, as well as all variables introduced in earlier definitions. Each use of a variable refers to its most recent definition. It is also possible to have multiple where clauses, as in:

 
foo X = bar Y where Y = baz Z where Z = qux U where U = quux X;

Note that here the definitions will be processed from right to left. Such constructs are convenient if there is a series of values to be computed in which one value depends on the next one. Using multiple where clauses, you start off with the definitions immediately required by the right-hand side of the rule, then add another where clause for the variables introduced in the first clause, etc. The above definition is actually equivalent to the following:

 
foo X = bar Y where U = quux X, Z = qux U, Y = baz Z;

It is also possible to mix where clauses with conditions, as in:

 
foo X = bar Y if qux Y where Y = baz Z if quux Z where Z = bazz X;

Again, the different qualifiers are actually considered from right to left; the rule will be aborted as soon as the first qualifier fails (i.e., a condition evaluates to false or the left-hand side of a definition does not match the corresponding right-hand side), so that no time is wasted with definitions which are not needed anyway.

Of course, left-hand side qualifiers work in the same fashion:

 
foo X if qux Y where Y = baz Z if quux Z where Z = bazz X: = bar Y;

You can also mix left-hand and right-hand qualifiers, in which case the left-hand qualifiers will be processed first:

 
foo X if quux Z where Z = bazz X: = bar Y if qux Y where Y = baz Z;

Variables bound by the left-hand side or a where clause can be rebound freely. E.g.:

 
foo X                   = bar (X+1) where X = qux X X
                            if X>0 where X = baz X;

Each occurrence of the variable in the right-hand side of the equation or a local definition then refers to its most recent value. Note that this doesn't mean that local variables are mutable; it just means that there are multiple nested scopes of symbols. More precisely, each local definition introduces a nested scope binding the variables in the left-hand side of the definition. It goes without saying that repeated rebinding of the same variable can be confusing, but sometimes it is appropriate and at other times it might just be more convenient than having to invent a bunch of additional variable names for no other purpose than disambiguation.

Also note that if a variable is bound by the left-hand side or a local definition of a rule, then a global variable of the same name will be "shadowed" within the rule. However, in this case you can still access the value of the shadowed binding with a qualified identifier. For instance, with the following definitions, foo 2 reduces to bar 2*99:

 
/* this is the script foo.q */
def FOO                 = 99;
foo X                   = FOO*foo::FOO where FOO = bar X;

If the free variable belongs to the current module, as in the example above, then it can also be escaped using an inline variable declaration (cf. Constants and Variables), like so:

 
def FOO                 = 99;
foo X                   = FOO*var FOO where FOO = bar X;

Inline variable declarations can also be used to explicitly declare free variable symbols on the fly. If the symbol specified with var has not yet been declared in the current module, it will be declared as a private free variable symbol.

Finally, Haskell programmers should note that Q's where clauses are really just variable definitions, and not (as in Haskell) a kind of "local equations" which would permit you to define "local functions". (There are technical reasons for this which we won't discuss here.) So you shouldn't try to emulate Haskell's local function definitions using Q's where clauses. For instance, while the following definition is perfectly legal Q code, it does not have the intended effect of defining a local factorial function:

 
// this doesn't work as intended
foo X = map fac [1..10] where fac N = if N>0 then N*fac (N-1) else 1;

Instead, the where clause is just an (unsuccessful) attempt to match the right-hand side expression, `if N>0 then N*fac (N-1) else 1', against the left-hand side expression `fac N' and bind the variable N accordingly.

It is actually possible to have a kind of "local functions" in Q as well, but this involves creating a lambda abstraction and assigning it to a local variable. However, this is usually not very convenient and a "global" function definition using equations is almost always preferable. Also note that, because of Q's scoping rules in local definitions, you cannot really have recursive local definitions anyway. For instance, the following definition will not work as intended either (even though the analogous global definition of the factorial works just fine, because of the dynamic binding of global variables):

 
// doesn't work either
foo X = map FAC [1..10] where FAC = \N.if N>0 then N*FAC (N-1) else 1;

Only non-recursive "local functions" can be defined this way. (Well, using the "fixed point combinator" it is in fact possible to define the factorial and actually any recursive function in a form that can be assigned to a local variable, but we won't go into that here.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.5 Type Guards

The Q language allows you to restrict the set of patterns which can be matched by a variable on the left-hand side of an equation or variable definition. This is done by augmenting any left-hand side occurrence of the variable with a type guard, using the notation: variable:type. For instance, to declare a type BinTree which comprises the constructors nil and bin X T1 T2, the following declaration might be used (cf. Declarations):

 
public type BinTree = const nil, bin X T1 T2;

This declaration introduces two function symbols nil and bin, just as if they were declared by:

 
public const nil, bin X T1 T2;

However, it also assigns these symbols to the type symbol BinTree. The type symbol can then be used as a guard on variables to ensure that a variable is only matched to objects of the given type. E.g., the following equation will only apply if the argument to the foo function has the form nil or bin X T1 T2 for some subexpressions X, T1 and T2:

 
foo T:BinTree           = ...;

This makes it possible to avoid explicit argument patterns like

 
foo nil                 = ...;
foo (bin X T1 T2)       = ...;

in cases in which the component values are not actually required. Also, it allows you to define operations on a given type without actually referring to the constructor symbols of the type, and thereby makes it possible to avoid dependencies on implementation details. You can even "hide" the constructor symbols of a type by declaring them private, which makes the symbols accessible only to the script which declares the type they belong to. For instance:

 
public type BinTree = private const nil, bin X T1 T2;

The Q language allows you to construct a hierarchy of types by making one type a subtype of another. This is done by specifying a supertype following the type identifier in a type declaration. For instance, to make BinTree a subtype of the type SearchTree, BinTree would be declared as follows:

 
type BinTree : SearchTree = const nil, bin X T1 T2;

Now a variable of type SearchTree will not only match SearchTree's, but also any object of its subtype BinTree. In fact, we might have declared SearchTree as an "abstract" type which does not provide any constructors at all:

 
type SearchTree;

Abstract supertypes like SearchTree are useful for factoring out generic operations which are shared by different subtypes; see Types, for examples, and for a more detailed discussion of the type guard concept.

The Q language has a number of predefined type symbols which can be used to distinguish the corresponding types of objects built into the language: Bool, Int, Float, Real (which is the supertype of both Int and Float), Num (the supertype of Real), String, Char (the subtype of String denoting the single-character strings), File, List, Stream, Tuple and Function. (Besides these, there are also two types Exception and SysException for dealing with exception values, cf. Exception Handling.) For instance, the standard library defines an operation isstr, which determines whether its argument is a string, as follows:

 
isstr _:String          = true;
isstr _                 = false otherwise;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.6 Normal Forms and Reduction Strategy

The process in which equations are applied in the Q interpreter to evaluate a given expression is fairly straightforward. In fact, it corresponds to the way in which we manipulate algebraic formulae, and we all learned how to do that in high school.

We say that an equation L=R is applicable to a given expression X, if we can substitute actual values for the variables occurring in L so that we obtain X. This is also expressed as "L matches X" or "X is an instance of L". In this case, we can replace X by Y, where Y is the expression obtained from R by replacing the variables occurring in L by their corresponding values. We shall call such a replacement step a reduction, and write it down as X ⇒ Y.

For instance, given the rule

 
sqr X                   = X*X;

we have that

 
sqr 2                   ⇒ 2*2

since the expression sqr 2 matches the left-hand side sqr X, with 2 being substituted for the variable X. This works with compound expressions as well, even if they involve variables themselves. For instance:

 
sqr (X+1)               ⇒ (X+1)*(X+1).

Here, the compound expression (X+1) has been substituted for the variable X.

Equations are usually applied in the context of a larger expression to be evaluated. For instance, we have that

 
sqr 2 + 2               ⇒ 2*2+2

by the application of the above definition of sqr to the subexpression sqr 2 of sqr 2 + 2.

Before we proceed, a remark on the role of the built-in operations of the Q language is in order. Primitive operations, such as the built-in + and * operators described in Built-In Operators, do not require any equations specified by the programmer. Instead, they may be thought of as being predefined by a large set of built-in equations. Each application of a built-in rule also counts as a single reduction. For instance, we have that

 
2*2                     ⇒ 4

by the built-in rule which handles the multiplication of integers.

Built-in rules take priority over any equations specified by the programmer. However, it is possible to extend the definition of a primitive operation with equations supplied by the programmer. A corresponding example has already been given in Equations. By refining built-in definitions accordingly, you can also treat "exceptions" in built-in operations; see Runtime Errors, for an example.

In order to complete the evaluation of a given expression, we have to repeatedly perform single reductions until no further equations (built-in or user-defined) apply. We then say that the resulting expression is in normal form, and interpret it as the value of the original expression. For instance, the following reduction sequence leads to the normal form (i.e., value) 6 for the target expression sqr 2 + 2 (we still assume the definition of sqr introduced above):

 
sqr 2 + 2  ⇒  2*2+2  ⇒  4+2  ⇒  6.

Normal forms do not necessarily exist. Just as in conventional programming languages, the evaluation may go into an endless loop and never return an answer. Even if a normal form exists, it need not be unique - it may depend on which equations are used in the reduction process and in what order. The problem is that at any point in the reduction process there may be more than one "reducible" subexpression, and even if there is only one such expression (termed a redex), there may be more than one applicable equation.

To illustrate this kind of problem, consider the definition of the sqr function from above, and the target expression sqr (1+1). This expression admits three distinct reduction sequences:

 
sqr (1+1)  ⇒  sqr 2  ⇒  2*2  ⇒  4
sqr (1+1)  ⇒  (1+1)*(1+1)  ⇒  2*(1+1)  ⇒  2*2  ⇒ 4
sqr (1+1)  ⇒  (1+1)*(1+1)  ⇒  (1+1)*2  ⇒  2*2  ⇒ 4.

The second and third reduction sequence appear to be almost the same, but the first reduction sequence makes clear that it matters whether we reduce the subexpression (1+1) before applying the definition of sqr or not. In the present example, the order in which reductions are performed only affects the number of required reduction steps, but it is easy to construct other systems of equations in which both the termination of a normal form evaluation and the computed normal forms actually depend on the evaluation order.

Non-uniqueness of normal forms does not necessarily mean that there must be two or more rules being applicable to a given expression. Ambiguities can also arise when an equation "overlaps" with other equations or with itself, as in the following example:

 
foo (foo X)             = bar X;

This system consisting of a single equation is terminating, but it has two distinct normal forms for the expression foo (foo (foo X)), namely foo (bar X) and bar (foo X).

Indeed, the termination of rewrite systems and the uniqueness of normal forms are important issues in the theory of term rewrite systems [Dershowitz/Jouannaud 1990]. The Q language simply avoids these problems by imposing a reduction strategy which makes the reduction process deterministic. The reduction strategy must specify unambiguously for each reduction step the redex that is to be reduced and the equation which is to be applied. For this purpose the Q interpreter applies the following rules:

The leftmost-innermost strategy is common in many programming languages, and it also corresponds to the "natural" way in which people tend to carry out calculations manually. It means that at any point in the reduction process it is always the leftmost-innermost redex which is to be reduced. In particular, this implies that in an application of the form F X first F is evaluated, then X, before the value of F is applied to the value of X. Thus the first reduction sequence from above,

 
sqr (1+1)  ⇒ sqr 2  ⇒  2*2  ⇒  4

would be the one actually carried out in the interpreter in the course of the evaluation of sqr (1+1).

The leftmost-innermost strategy resolves all ambiguities in the choice of redexes during a normal form evaluation. The second disambiguating rule allows to decide which equation to apply if more than one equation is applicable to a given redex. As indicated, the default is to apply equations in the order in which they appear in the source script. This allows you to have equations with overlapping left-hand sides which naturally arise, e.g., in definitions involving "special case rules" such as the definition of the fib function in Equations:

 
fib 0                   = 0;
fib 1                   = 1;
fib N                   = fib (N-1) + fib (N-2) otherwise;

The rule for applying equations is extended to the built-in equations of primitive operations by assuming - as already mentioned - that the built-in rules come "before" any equations specified by the programmer, and hence always take priority. Also note that external functions, which are declared with the extern modifier and implemented in a C module, are treated in an analogous manner. In general, the interpreter will first try a built-in rule, then an extern rule, then the equations supplied in the script. (The treatment of extern functions is described in more detail in C Language Interface.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.7 Conditional Rules

Conditional rules are applied in the same manner as simple equations, only we also have to check the conditions of the rule. When the left-hand side of a conditional rule matches the target expression, we first evaluate the condition (with left-hand side variables being replaced as usual). This expression must evaluate to a truth value, otherwise we cannot decide whether the rule is applicable or not. (The Q interpreter generates a runtime error if the result is not a truth value.) The rule is applicable only when the condition evaluates to true.

For instance, recall the definition of the factorial function:

 
fac N                   = N*fac (N-1) if N>0;
                        = 1 otherwise;

The first rule is applicable only if N>0 evaluates to true; if it evaluates to false, the second rule applies. Furthermore, if N happens to be incomparable with 0, then a runtime error will occur.

Here's how this definition is applied to evaluate the expression fac 3:

 
fac 3                   ⇒ 3*fac (3-1)         ⇒ 3*fac 2
                        ⇒ 3*(2*fac (2-1))     ⇒ 3*(2*fac 1)
                        ⇒ 3*(2*(1*fac (1-1))) ⇒ 3*(2*(1*fac 0))
                        ⇒ 3*(2*(1*1))         ⇒ 3*(2*1)
                        ⇒ 3*2                 ⇒ 6.

Local variable definitions (cf. Local Variables) are treated in a similar fashion. We first evaluate the right-hand side of the definition and match it against the corresponding left-hand side. If the match fails, the rule is skipped. Otherwise, we bind variables accordingly and proceed with the next qualifier or the right-hand side of the rule.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.8 Rule Priorities

We already discussed that the Q interpreter normally considers equations in the order in which they appear in a script. Hence, if all your equations go into a single script, you simply write down overlapping equations in the order in which they should be tried by the interpreter. However, if equations for the same expression pattern are scattered out over different modules, then the rule order also depends on the order in which the different modules are processed by the compiler. In the current implementation, the rules of each module are processed when the first import or include declaration for the module is encountered during a preorder traversal of the module graph starting at the main script, but you should not rely on this.

Hence it is usually a bad idea to have overlapping equations scattered out over different source files. However, in some situations this is hard to avoid. For instance, you might decide to put all "default rules" for certain expression patterns in a separate module. But if you simply import this module at the beginning of your main script then the default rules might override all the other equations.

To work around this, the Q language allows you to explicitly attach priorities to equations by means of a priority declaration, which has the following format:

 
declaration             : '@' ['+'|'-'] unsigned-number

This specifies the given number (which must be an integer, optionally signed with `+' or `-') as the priority level of the following equations. In the current implementation, priority levels must be 32 bit integers, i.e., in the range -0x80000000..0x7fffffff. Rules are ordered according to their priorities, highest priority first. Rules with the same priority are considered in the order in which they appear in the script.

The priority level set with a priority declaration applies only to the script in which the declaration is located. The default priority level, which is in effect up to the first @ declaration, is 0.

Note that even with priority declarations the built-in operations still take priority over all user-defined rules, i.e., those operations effectively have infinite priority. (The same applies to external functions, see C Language Interface.)

As a simple (and contrived) example for the use of priority declarations, consider the following script:

 
@-1
foo X                   = -1; // "default" rule

@0
foo X:Real              =  0; // "standard" rule

@+1
foo X:Int               =  1; // "special case" rule

The second equation is at the default priority level, 0. The first equation, although it comes before the second one, becomes a "fallback" rule by assigning it a low priority, -1. The last equation is put on a high priority level, +1, and hence overrides both preceding equations. The net effect is that the equations are actually ordered in reverse textual order. You can verify this in the interpreter:

 
==> foo 77
1

==> foo 77.0
0

==> foo ()
-1

Of course, this example is somewhat silly, because in a single script we can easily arrange equations in the correct textual order. However, if for some reason we had to put the three rules into three different modules, then using the given priority declaration ensures that the equations will always be applied in the correct order.

Priority levels should be used sparingly, if at all. Using low priorities to factor out a module with default rules can occasionally be quite useful, but overriding rules with high priorities is considered harmful and should be avoided whenever possible.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.9 Performing Reductions on a Stack

The present and the following section contain material for the technically interested reader which, while not being strictly necessary to work successfully with the Q programming language, helps to understand the operation of the Q interpreter and to write better Q scripts. You might wish to skim the following at first reading, and later return to it when you have become more proficient with the Q language and require some further background knowledge.

While the "rewriting model" of expression evaluation introduced in Normal Forms and Reduction Strategy, provides a nice conceptual model for the Q programmer, it is too inefficient to be a useful implementation model for the Q interpreter. In particular, it would be rather inefficient to actually construct the full expressions which occur as intermediate results in the course of a reduction sequence, and repeatedly scan these expressions for the leftmost-innermost redex. Instead, we can evaluate expressions by a simple recursive procedure as follows. The input to the algorithm is an expression X to be evaluated.

  1. If X is an application, list or tuple, evaluate the parts of X recursively, from left to right. Proceed with step 2.
  2. If a built-in (or extern) rule is applicable to X, invoke it and return the corresponding value.
  3. Otherwise, determine the first equation L=R which is applicable to X. If there is no such rule, X is in normal form and is returned as the value of the expression. Otherwise we recursively evaluate R (with variables instantiated to their corresponding values) and return it as the value of the original expression.

(The above description of course leaves out many important details. For instance, the interpreter will also have to take care of def'ed symbols, and we will have to organize the search for an applicable equation. However, here we only want to give the general idea, and not a complete implementation of the interpreter.)

The above procedure can actually be implemented non-recursively if we keep track of the rules which are currently "active", together with the corresponding variable bindings. This information can easily be maintained on a stack. We illustrate this with a simple example. Recall the definition of the add function from Equations:

 
add []                  = 0;
add [X|Xs]              = X+add Xs;

The evaluation of add [1,2,3] then proceeds as follows:

 
add [1,2,3] ⇒ 1+add [2,3]:
        add [2,3] ⇒ 2+add [3]:
                add [3] ⇒ 3+add []:
                        add [] ⇒ 0
                        3+0 ⇒ 3
                add [3] ⇒ 3
                2+3 ⇒ 5
        add [2,3] ⇒ 5
        1+5 ⇒ 6
add [1,2,3] ⇒ 6

Each new level of indentation indicates that we "suspend" the current rule (push it on the stack) and activate a new rule in order to evaluate some part of the right-hand side of the suspended rule. When all parts of the right-hand side have been evaluated, we return to the suspended rule (pop it from the stack) and actually perform the reduction. More precisely, we replace the left-hand side of the suspended rule by the result obtained by evaluating the right-hand side, which is already in normal form. (We will of course have to keep track of the results of intermediate reductions, which can be done on a second "evaluation" stack. When the evaluation of the right-hand side is finished, the corresponding result will be on top of the evaluation stack.)

If desired, we can easily reconstruct the "context" of a rule by following the chain of stacked rules starting from the current rule, and proceeding towards the bottom of the stack. For instance, when we perform the reduction

 
add [] ⇒ 0

in the above example, the context of this rule is described by the following stacked rules:

 
add [3] ⇒ 3+add []
add [2,3] ⇒ 2+add [3]
add [1,2,3] ⇒ 1+add [2,3]

Thus, when add [] gets reduced to 0, we have actually completed the following initial part of the reduction sequence:

 
add [1,2,3] ⇒ 1+add [2,3] ⇒ 1+(2+add [3]) ⇒ 1+(2+(3+add []))
            ⇒ 1+(2+(3+0))

(Note that we never actually constructed intermediate results like the expression 1+(2+(3+add [])). Rather, these expressions are represented implicitly by the contents of the stack.)

We can also apply the above procedure to qualified rules accordingly. When the interpreter comes to consider a conditional rule, it pushes it onto the stack as usual. However, it then first starts to evaluate the qualifying conditions and where clauses of the rule. When the value of a condition has been computed, we can check whether it holds. If the computed value is neither true nor false, the interpreter aborts the evaluation with a runtime error. If it is true, it proceeds by evaluating other qualifiers or the right-hand side. If it is false, however, it gives up on the current rule (pops it from the stack), and considers the next applicable equation. A similar approach is used to handle local variable definitions.

For instance, consider once more the definition of the factorial function:

 
fac N                   = N*fac (N-1) if N>0;
                        = 1 otherwise;

The computation of fac 3 then proceeds as follows (the ∗ symbol indicates that the condition N>0 has evaluated to false and hence the corresponding rule is abandoned):

 
fac 3 ⇒ 3*fac (3-1):
        3>0 ⇒ true
        3-1 ⇒ 2
        fac 2 ⇒ 2*fac (2-1):
                2>0 ⇒ true
                2-1 ⇒ 1
                fac 1 ⇒ 1*fac (1-1):
                        1>0 ⇒ true
                        1-1 ⇒ 0
                        fac 0 ⇒ 0*fac (0-1):
                                0>0 ⇒ false   ∗
                        fac 0 ⇒ 1
                        1*1 ⇒ 1
                fac 1 ⇒ 1
                2*1 ⇒ 2
        fac 2 ⇒ 2
        3*2 ⇒ 6
fac 3 ⇒ 6

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.10 Tail Recursion

The above equations for the add and fac functions are examples for recursive definitions. The computational processes generated from such definitions are characterized by chains of suspended rules in which the same rules are activated over and over again. For instance, an evaluation of fac N with N>0 requires N recursive activations of the rule:

 
fac N                   = N*fac (N-1) if N>0;

Hence, if N becomes very large, the recursive definition of fac is in danger of running out of stack space. In the following, we show how to avoid this defect by employing the technique of "tail-recursive programming".

It is a well-known fact that the factorial function also has an "iterative" implementation which can be executed in constant memory space. The idea is to maintain a "running product" P and a "counter" I which counts down from N to 1. The iterative algorithm can be written down in a conventional programming language like Pascal as follows:

 
P := 1; I := N;
while I>0 do begin
  P := P*I;
  I := I-1;
end;

While the Q language does not provide any special looping constructs (and it also lacks an assignment operation), there is an alternative definition of fac which takes up the idea of running products and counters and implements the factorial function in an iterative fashion:

 
fac N                   = fac2 1 N;
fac2 P I                = fac2 (P*I) (I-1) if I>0;
                        = P otherwise;

Here, the "state variables" P and I are implemented as arguments of an "auxiliary" function fac2 which is invoked from fac. Again, this is a recursive definition; it requires N recursive applications of the rule

 
fac2 P I                = fac2 (P*I) (I-1) if I>0;

when fac N is computed. However, in difference to our previous definition of fac, the recursive rule is always applied "on top" of the target expression. Such rules are called tail-recursive. (The name "tail recursion" comes from the fact that the recursive application of fac2 is the last operation considered during the leftmost-innermost evaluation of the right-hand side.) For instance, the evaluation of fac 3 now proceeds as follows (abbreviating reductions by built-in rules for `-' and `*'):

 
fac 3 ⇒ fac2 1 3 ⇒ fac2 (1*3) (3-1)
      ⇒ fac2 3 2 ⇒ fac2 (3*2) (2-1)
      ⇒ fac2 6 1 ⇒ fac2 (6*1) (1-1)
      ⇒ fac2 6 0 ⇒ 6

Tail-recursive definitions can be employed for implementing all kinds of functions which can be computed on a machine with a fixed set of "registers" and no auxiliary memory. For instance, here is a tail-recursive implementation of the Fibonacci function:

 
fib N                   = fib2 1 0 N;
fib2 A B N              = fib2 (A+B) A (N-1) if N>0;
                        = B otherwise;

(This definition also has the desirable side effect that it cuts down the exponential running time of the "naive" definition given in Equations, to linear.)

The Q interpreter employs a clever optimization trick, commonly known as tail recursion optimization (see e.g., [Steele/Sussman 1975]), to actually execute tail-recursive definitions within constant stack space. Hence no special looping constructs are required for implementing iterative algorithms efficiently in the Q language.

Assume that in our stack model of expression evaluation we are working on a reduction X ⇒ Y, and that we already recursively evaluated all parts of the right-hand side Y, but not Y itself. Furthermore, suppose that in order to evaluate Y we will have to apply the rule Y ⇒ Z next. Then, instead of keeping the rule X ⇒ Y suspended and evaluating Y using rule Y ⇒ Z recursively, we can also immediately perform the reduction X ⇒ Y and replace this rule with Y ⇒ Z on the stack. Thus, the new rule Y ⇒ Z will not require any additional stack space at all, but simply reuses the existing "activation record" for the X ⇒ Y rule. In other words, instead of invoking the Y ⇒ Z rule as a "subroutine", we effectively perform a kind of "goto" to the new rule. We also refer to this process as performing the tail reduction X ⇒ Y. The evaluation now proceeds as if we had been working on the rule Y ⇒ Z in the first place.

For instance, with the new definition of fac the evaluation of fac 3 would be carried out using only a single level of suspended rules (again, the ∗ symbol signals abortion of a rule with failing qualifier):

 
fac 3 ⇒ fac2 1 3:
fac2 1 3 ⇒ fac2 (1*3) (3-1):
        3>0 ⇒ true
        1*3 ⇒ 3
        3-1 ⇒ 2
fac2 3 2 ⇒ fac2 (3*2) (2-1):
        2>0 ⇒ true
        3*2 ⇒ 6
        2-1 ⇒ 1
fac2 6 1 ⇒ fac2 (6*1) (1-1):
        1>0 ⇒ true
        6*1 ⇒ 6
        1-1 ⇒ 0
fac2 6 0 ⇒ fac2 (6*0) (0-1):
        0>0 ⇒ false                  ∗
fac2 6 0 ⇒ 6

Besides the tail recursion optimization technique discussed above, the Q interpreter also automatically optimizes toplevel sequences (i.e., applications of the || operator which are not inside a nested subexpression) on the right-hand side of equations, so that basic imperative-style looping constructs can be executed in a tail-recursive fashion as well. Consider, for instance, the standard library function do which applies a function to each member of a list, like the map function discussed in Equations, but instead of collecting the results in an output list simply throws away the intermediate values and returns (). This is useful, of course, only if the function is evaluated solely for its side-effects, e.g., if we want to print out all elements of a list. The definition of do is rather straightforward:

 
do F []                 = ();
do F [X|Xs]             = F X || do F Xs;

Now this definition is not tail-recursive in the strict sense alluded to above, because the last application on the right-hand side of the second rule actually involves (||) and not the do function. (Recall that an infix expression like X||Y is nothing but syntact sugar for the function application (||) X Y.)

However, as a remedy, the Q interpreter actually implements toplevel sequences on the right-hand side of a rule by direct stack manipulation. That is, the first argument of a sequence is thrown away as soon as it has been computed. By these means, the interpreter, after having computed F X, effectively carries out the reduction do F [X|Xs] ⇒ do F Xs on which it can perform tail recursion optimization as usual. Thus the above definition actually works in constant stack space, as one might reasonably expect.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.11 Error Handling

There are a few conditions which may force the Q interpreter to give up on a reduction sequence and abort the evaluation with a runtime error message. These conditions are listed below.

All of the above error conditions can also be caught and handled by the running script in any desired manner, see Exception Handling.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8. Types

Q uses dynamic typing like, e.g., Lisp or Smalltalk, as opposed to statically typed languages such as Pascal, Eiffel or Haskell. In languages with static typing, a variable or function parameter can only hold a value which matches its prescribed type (which can be a "polymorphic" type in languages like Eiffel and Haskell, but still the type of the value is restricted). In dynamically typed languages, however, the actual type of a variable or function parameter value is not known in advance. Consequently, in Q it is only possible to distinguish different types of objects - such as search trees, queues, arrays and the like - by selecting an appropriate set of constructor symbols for each type of object. This chapter discusses Q's notion of type guards which allows you to make the assignment of constructors to a type explicit, and to use this information in order to restrict the applicability of equations to objects of a given type.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.1 Using Type Guards

As in any programming language, the careful design of application-dependent data structures is one of the main concerns when developing Q scripts which go beyond simple numeric, string and list processing. As a typical example for a non-list data structure which plays a prominent role in many Q scripts, let us consider binary search trees, which are a convenient means to implement sets, bags, dictionaries and the like. We will use this data structure as a running example throughout this chapter.

A typical choice of constructors for binary trees is the following:

 
public const nil, bin X T1 T2;

To make explicit the fact that nil and bin belong to the binary tree type, instead of the above declaration we use a type declaration like the following, which, besides declaring the constructor symbols, also introduces the type symbol BinTree:

 
public type BinTree = const nil, bin X T1 T2;

This declaration tells the interpreter that each expression of the form nil or bin X T1 T2 should be considered as a member of the BinTree type. This is also known as an algebraic type; the type declaration is essentially a signature of constructor symbols which are used to create the members of the type. The type symbol can then be used as a guard on variables on the left-hand side of equations and variable definitions to ensure that a variable is only matched to objects of the given type, see Type Guards. E.g., the following rule employs such a type guard in order to restrict the argument of the foo function to BinTree objects:

 
foo T:BinTree           = ...;

This makes it possible to avoid explicit argument patterns like

 
foo nil                 = ...;
foo (bin X T1 T2)       = ...;

in cases in which the component values are not actually required. This can simplify matters a lot, in particular if multiple arguments have to be matched to a given type. Also, it is more efficient than checking the type of an object in the qualifier part of a rule by using a user-defined predicate, since the interpreter can use the type information right away in the pattern matching process.

Another important reason for preferring type guards over explicit argument patterns is the issue of "information hiding". With the former definition of the foo function above we do not make any explicit reference to the constructor patterns making up the BinTree data type. This makes it possible to treat BinTree as an abstract data type (ADT) which hides the underlying implementation details (in particular the constructors), while still being able to verify that the proper kind of object is supplied as an argument. Any access to objects of the ADT will then be implemented by referring to the appropriate operations supplied by the type. In fact, we can make the constructors private symbols which are only accessible to the script which implements the BinTree type:

 
public type BinTree = private const nil, bin X T1 T2;

As a concrete example, let us assume the standard search tree operations insert T X and delete T X, which insert an element X into a tree T, and remove it from the tree, respectively. These operations can be implemented as follows (see [Bird/Wadler 1988]):

 
public insert T X, delete T X;
private join T1 T2, init T, last T;

insert nil Y            = bin Y nil nil;
insert (bin X T1 T2) Y  = bin X (insert T1 Y) T2 if X>Y;
                        = bin X T1 (insert T2 Y) if X<Y;
                        = bin Y T1 T2 if X=Y;

delete nil Y            = nil;
delete (bin X T1 T2) Y  = bin X (delete T1 Y) T2 if X>Y;
                        = bin X T1 (delete T2 Y) if X<Y;
                        = join T1 T2 if X=Y;
                        
join nil T2             = T2;
join T1 T2              = bin (last T1) (init T1) T2 otherwise;

init (bin X T1 nil)     = T1;
init (bin X T1 T2)      = bin X T1 (init T2) otherwise;

last (bin X T1 nil)     = X;
last (bin X T1 T2)      = last T2 otherwise;

(Note that the last and init operations return the last element of a binary tree, and a binary tree with the last element removed, respectively. The join, init and last functions are for internal use only, and can hence be implemented as private functions.)

Furthermore, we assume the following function bintree which constructs a binary tree from a list, and the function members which returns the list of elements stored in a tree (in ascending order):

 
public bintree Xs;
bintree Xs:List         = foldl insert nil Xs;

public members T;
members nil             = [];
members (bin X T1 T2)   = members T1 ++ [X|members T2];

(The definition of bintree employs the standard foldl operation, see The Standard Library.) We can use the interface operations of the BinTree ADT in order to implement the functions union and diff which add and remove all members of a tree to/from another tree:

 
public union T1 T2; // add elements of T2 to T1
union T1:BinTree T2:BinTree
                        = foldl insert T1 (members T2);

public diff T1 T2; // remove elements of T2 from T1
diff T1:BinTree T2:BinTree
                        = foldl delete T1 (members T2);

Observe that no explicit reference is made to the BinTree constructors; we only employ the public "interface" functions insert, delete and members of the BinTree ADT. This allows us to change the realization of the data structure without having to rewrite the definitions of union and diff. Still, the definitions of union and diff are "safe"; the BinTree type guard ensures that the arguments passed to union and diff are indeed BinTree objects capable of carrying out the requested operations.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.2 Built-In Types

Type guards are also the only way for verifying that the actual value of a variable belongs to one of the built-in types of integers, floating point numbers, strings, files and lambda functions, since there is no way for writing out all "constructors" for these kinds of objects - there are infinitely many (at least in theory). For this purpose, the type symbols Int, Float, String, File and Function are predefined. There also is a type named Char which denotes the single-character strings. Technically, Char is a subtype of the String type; more about that in Sub- and Supertypes. Moreover, Char is also treated as an enumeration type, see Enumeration Types, below.

Beginning with version 7.2, Q also offers a more elaborate "numeric tower" which makes it easier to integrate user-defined number types. There is a type Real which is a subtype of the Num type and the supertype of both Int and Float. Moreover, the standard library defines the complex number type Complex which is a subtype of Num, as well as the rational number type Rational which is a subtype of Real, see The Standard Library.

The built-in List type matches all list expressions of the form [] or [X|Xs]. This type is used to restrict the applicability of an equation to list arguments. For instance, the following equations define a function reverse which reverses a list:

 
reverse Xs:List         = foldl push [] Xs;
push Xs:List X          = [X|Xs];

The Stream and Tuple types are completely analogous: they match streams and tuples of arbitrary sizes, i.e., expressions of the form {} and {X|Xs} or () and (X|Xs), respectively.

The predefined Bool type is a means to refer to objects which are truth values. Its built-in definition is as follows:

 
public type Bool = const false, true;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.3 Enumeration Types

Types like the built-in Bool type, which only consist of nullary const symbols, are also called enumeration types. You can easily define such types yourself, e.g.:

 
type Day = const sun, mon, tue, wed, thu, fri, sat;

The Q language provides special support for enumeration types (including the built-in Bool type, and also the Char type), by means of the following operations:

Note that while there is no direct operation for converting ordinal numbers back to the corresponding members of a given type, you can easily accomplish this using arithmetic:

 
==> ord thu
4

==> sun+4
thu

Enumeration type arithmetic also provides a quick way to check whether two enumeration type members belong to the same type:

 
==> isint (tue-thu)
true

Here we employed the standard library function isint to determine whether the ordinal difference of the two values is actually an integer. This works because the `-' operator will fail if the two constants belong to different enumeration types.

Using "enumerations" (which are described in detail below), you can also list all enumeration type members starting at a given value as follows:

 
==> [sun..]
[sun,mon,tue,wed,thu,fri,sat]

Let us now take a more in-depth look at enumerations. The central operation to create an enumeration is the built-in enum function, which lists all members of an enumeration type in a given range:

 
==> enum mon fri
[mon,tue,wed,thu,fri]

The enum_from function works like enum, but only takes a single argument and lists all members of the type starting at the given value:

 
==> enum_from mon
[mon,tue,wed,thu,fri,sat]

Note that the default "step size" is 1. An arbitrary step size can be indicated by invoking enum or enum_from with two initial members in the first argument as follows:

 
==> enum [sun,tue] sat
[sun,tue,thu,sat]

==> enum [sat,fri] sun
[sat,fri,thu,wed,tue,mon,sun]

==> enum_from [sun,tue]
[sun,tue,thu,sat]

==> enum_from [sat,fri]
[sat,fri,thu,wed,tue,mon,sun]

The notation [X1,X2,…,Xn..Y] is provided as syntactic sugar for enum [X1,X2,…,Xn] Y, so you can also write the following:

 
==> [mon..fri]
[mon,tue,wed,thu,fri]

==> [sun,tue..sat]
[sun,tue,thu,sat]

==> [sat,fri..sun]
[sat,fri,thu,wed,tue,mon,sun]

Likewise, [X1,X2,…,Xn..] is the same as enum_from [X1,X2,…,Xn]:

 
==> [mon..]
[mon,tue,wed,thu,fri,sat]

==> [sun,tue..]
[sun,tue,thu,sat]

==> [sat,fri..]
[sat,fri,thu,wed,tue,mon,sun]

We have already mentioned that the built-in Char type is also an enumeration type, which consists of all the characters in the Unicode character set. Thus arithmetic on characters works as expected:

 
==> "a"+5
"f"

==> "z"-2
"x"

==> "8"-"0"
8

The enum operation also applies to characters accordingly:

 
==> enum "a" "k"
["a","b","c","d","e","f","g","h","i","j","k"]

==> ["a".."k"]
["a","b","c","d","e","f","g","h","i","j","k"]

==> ["k","j".."a"]
["k","j","i","h","g","f","e","d","c","b","a"]

The enum_from operation works with Char, too, but this is usually rather useless because the Unicode character set is fairly large and thus it is rather inefficient to construct a complete list of all characters starting at a given value. The right way to work with such huge character enumerations is to take advantage of lazy evaluation and use a stream enumeration instead, see below.

The standard prelude (see The Standard Library) extends succ, pred and enum (but not enum_from, for obvious reasons) to integers, which effectively turns the integers into an (albeit infinite) enumeration type:

 
==> succ 0
1

==> pred 0
-1

==> [0..9]
[0,1,2,3,4,5,6,7,8,9]

==> [0,2..9]
[0,2,4,6,8]

==> [9,8..0]
[9,8,7,6,5,4,3,2,1,0]

The standard prelude also implements enum (but not succ and pred) on floating point values:

 
==> [0.1,0.2..1.0]
[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]

Note that it is also possible to invoke enum or enum_from with more than two members in the first argument. That is, you can construct enumerations with more than two initial members. Neither the interpreter nor the library provides any definitions for these, so it is up to you to introduce your own interpretations of such constructs when you need them. For instance:

 
[X1,X2,X3..Y] = while (<=Y) (F*) X1 if (F>1) and then (Z3=X3)
                  where F:Int = X2 div X1, Z3:Int = X2*F;

This definition employs the standard library function while to construct an increasing geometric integer series if it detects a constant integer ratio between the initial three members:

 
==> [2,4,8..128]
[2,4,8,16,32,64,128]

Tuple enumerations work in exactly the same fashion as list enumerations. The corresponding operations are called tupleenum and tupleenum_from. The same kind of syntactic sugar is provided, the only difference being that ordinary parentheses are used instead of brackets. Example:

 
==> (9,8..0)
(9,8,7,6,5,4,3,2,1,0)

The standard prelude also provides enumerations for lazy lists a.k.a. streams, cf. Streams. The streamenum and streamenum_from functions work like enum and enum_from, respectively, but construct streams instead of lists. In the case of streamenum_from, the produced stream may be infinite. The Q language provides the same kind of syntactic sugar for stream enumerations as for lists, the only difference is that curly braces are used instead of brackets. For instance, {1..9} denotes a stream consisting of a finite arithmetic sequence, while {0..} is the infinite stream of all nonnegative integers.

Because of lazy evaluation, streamenum_from can often be used to work around the inefficiencies of enum_from when you have to deal with huge enumerations of characters. For instance, here is how we can use a stream enumeration to count the number of all alphabetic Unicode characters:

 
==> #filter isalpha {chr 1..}
90342

This will take a little while, but works o.k. because we never actually construct the complete list of all Unicode characters, as would be the case when doing the same with a list enumeration.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.4 Sub- and Supertypes

The Q programming language also provides a subtype concept similar to the notion of single inheritance in object-oriented programming languages such as Smalltalk. For instance, we can modify our declaration of the BinTree type (cf. Using Type Guards) in order to make BinTree a subtype of the supertype SearchTree as follows:

 
public type BinTree : SearchTree = private const nil, bin X T1 T2;

Quite often, supertypes are abstract types which do not provide their own set of constructor symbols, but are simply used to factor out common operations shared among several "concrete" types. For instance, the SearchTree type might have been declared simply as follows:

 
public type SearchTree;

Now variables of type SearchTree will also match objects of its subtype BinTree, as well as of any other subtype of SearchTree. We can turn the union and diff functions from Using Type Guards, into operations on the SearchTree type as follows:

 
union T1:SearchTree T2:SearchTree
                        = foldl insert T1 (members T2);

diff T1:SearchTree T2:SearchTree
                        = foldl delete T1 (members T2);

As the next step, we might introduce another type AVLTree providing the same interface operations insert, delete and members as the BinTree type, but implementing these operations in terms of AVL trees rather than simple binary trees. (AVL trees are a variant of binary search trees in which the trees are kept balanced, and thus logarithmic insertion and deletion times can be guaranteed.) If we make AVLTree another subtype of SearchTree, then the union and diff operations can be applied to AVLTree objects just as well as to BinTree's. In fact, the operations will even work if we mix argument types, e.g., provide a BinTree as the first argument of union and an AVLTree as the second! By these means, it is possible to define polymorphic operations which are applicable to several different types sharing the same (subset of) interface functions.

For the sake of concreteness, here is an implementation of the AVLTree type. The shl, shr, rol and ror functions implement the common AVL tree rotation operations which are used to keep the tree balanced; see [Bird/Wadler 1988] for details.

 
/* H denotes the height of a nonempty AVL tree */

public type AVLTree : SearchTree = private const anil, abin H X T1 T2;

public avltree Xs;

private mknode X T1 T2;
private join T1 T2, init T, last T, height T, slope T, rebal T;
private rol T, ror T, shl T, shr T;

avltree Xs:List         = foldl insert anil Xs;

members anil            = [];
members (abin H X T1 T2)
                        = members T1 ++ [X|members T2];

insert anil Y           = abin 1 Y anil anil;
insert (abin H X T1 T2) Y
                        = rebal (mknode X (insert T1 Y)) T2 if X>Y;
                        = rebal (mknode X T1 (insert T2 Y)) if X<Y;
                        = abin H Y T1 T2 if X=Y;

delete anil Y           = anil;
delete (abin H X T1 T2) Y
                        = rebal (mknode X (delete T1 Y) T2) if X>Y;
                        = rebal (mknode X T1 (delete T2 Y)) if X<Y;
                        = join T1 T2 if X=Y;

join anil T2            = T2;
join T1 T2              = rebal (mknode (last T1) (init T1) T2) otherwise;

init (abin H X T1 anil) = T1;
init (abin H X T1 T2)   = rebal (mknode X T1 (init T2)) otherwise;

last (abin H X T1 anil) = X;
last (abin H X T1 T2)   = last T2 otherwise;

/* mknode constructs a tree node, computing the height value */

mknode X T1 T2          = abin (max (height T1) (height T2) +1) X T1 T2;

/* height and slope compute the height and slope (difference between
   heights of the left and the right subtree), respectively */

height anil             = 0;
height (abin H T1 T2)   = H;

slope anil              = 0;
slope (abin H X T1 T2)  = height T1 - height T2;

/* rebal rebalances after single insertions and deletions */

rebal T                 = shl T if slope T = -2;
                        = shr T if slope T = 2;
                        = T otherwise;

/* rotation operations */

rol (abin H1 X1 T1 (abin H2 X2 T2 T3))
                        = mknode X2 (mknode X1 T1 T2) T3;
                        
ror (abin H1 X1 (abin H2 X2 T1 T2) T3)
                        = mknode X2 T1 (mknode X1 T2 T3);

shl (abin H X T1 T2)    = rol (mknode X T1 (ror T2)) if slope T2 = 1;
                        = rol (abin H X T1 T2) otherwise;

shr (abin H X T1 T2)    = ror (mknode X T1 (ror T2)) if slope T2 = -1;
                        = ror (abin H X T1 T2) otherwise;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.5 Views

As of version 7.7, Q provides a simple but effective implementation of views, a device originally proposed by Wadler [Wadler 1987] which provides a means to equip abstract data types with concrete representations in terms of "virtual constructors", which can be used for pattern matching just like "real" constructors are used when dealing with concrete algebraic data types. In the Q language, views are also used for deciding syntactic equality of external types (see Non-Linear Equations) and, last but not least, for pretty-printing purposes.

The first ingredient of a view is a definition for the built-in special form view which returns a representation of a data object, usually in terms of a public "construction" function for objects of the type. For instance, given the BinTree and AVLTree types from the previous section, we might provide the following rules:

 
view T:BinTree          = '(bintree Xs) where Xs = members T;
view T:AVLTree          = '(avltree Xs) where Xs = members T;

Here we used the construction functions bintree and avltree to build a representation of binary and AVL trees in terms of their member lists. After adding these rules to the program in Sub- and Supertypes, the representation defined by these rules will be used whenever an object of these types is printed in the interpreter (as well as by the built-in str and write functions). For instance:

 
==> def T1 = avltree [17,5,26,5], T2 = bintree [8,17]

==> union T1 T2; diff T1 T2
avltree [5,5,8,17,17,26]
avltree [5,5,26]

Two things are worth noting here. First, the view function always has to return a quoted expression. A description of the quote constructor `'' can be found in Special Forms; essentially, it just protects its argument from being evaluated. The need for this becomes apparent when considering that views are just expressions which, when evaluated, would construct a value of the type for which the view is being defined; thus the expression returned by view has to be treated as a literal, and the `'' operator takes care of that. Second, if you take a look at the example above, you will notice that we defined the views in such a manner that they would actually reconstruct the original values if you would evaluate them from the command line. It goes without saying that this behaviour is highly recommended, although the interpreter does not enforce this condition.

The second ingredient of a view definition is the declaration of one or more virtual constructors for the type. (In fact this is only needed if you also want to be able to use the view for pattern matching.) In our example, it suffices to turn the construction functions used in the view, bintree and avltree, into virtual constructors of the BinTree and AVLTree types, respectively. To these ends, we remove the public declarations of bintree and avltree and modify the type declarations as follows:

 
public type BinTree : SearchTree = virtual bintree Xs
                                 | private const nil, bin X T1 T2;
public type AVLTree : SearchTree = virtual avltree Xs
                                 | private const anil, abin H X T1 T2;

This lets us use the virtual constructors in pattern-matching definitions just as if they were real constructors. The views of these objects will be constructed on the fly, as they are required during pattern matching. Note that since the virtual constructors, as before, are implemented as public functions, this will even work outside the module where the data types are defined. E.g.:

 
==> def T1 = avltree [17,5,26,5], T2 = bintree [8,17]

==> diff T1 T2
avltree [5,5,26]

==> def avltree [X,Y,Z] = diff T1 T2; (X,Y,Z)
(5,5,26)

Of course, we can also use these "virtual" patterns in equations:

 
test (avltree [X|_])    = X>0;

And they also work in lambdas:

 
==> (\(avltree [X|_]).X) (union T1 T2)
5

It is also possible to define views in a recursive fashion so that, e.g., the tree members are made available one at a time. For instance, we might want to create a list-like view which delivers the tree members in ascending order. Taking the BinTree data type as an example, this can be achieved as follows:

 
public type BinTree : SearchTree = virtual empty, cons X T
                                 | private const nil, bin X T1 T2;

empty                   = nil;
cons X T:BinTree        = insert T X;

private first T;

view nil                = 'empty;
view T:BinTree          = '(cons X T)
                            where X = first T, T = delete T X;

first (bin X nil _)     = X;
first (bin X T1 T2)     = first T1 otherwise;

public bintree Xs;

// ...

Note that the first function returns the smallest member in the tree. The view is defined in terms of two virtual constructors empty (which returns the empty tree) and cons (which takes a value and a tree as parameters and inserts the value into the tree). With these definitions we now have:

 
==> bintree [5,1,9,3]
cons 1 (cons 3 (cons 5 (cons 9 empty)))

==> def (cons X (cons Y _)) = _; (X,Y)
(1,3)

The main advantage of this approach is that, at least for the purpose of pattern matching, the view only needs to be constructed as far as required to extract the desired number of tree elements. In contrast, employing bintree as the virtual constructor, as in our first example above, is much more inefficient since the entire list of members will be computed whenever the view is needed in pattern matching.

Another important point worth mentioning here is that a virtual constructor does not really belong to the type, even though it is declared as one of its members. More precisely, only "proper" objects which were constructed with the real constructors of the type will ever match the corresponding type guard. Nevertheless, it is possible to have definitions like the following, which first try a couple of explicit virtual constructor patterns and then fall back to a guarded variable:

 
type Foo = virtual foo X, goo X | const bar X, baz X;

view (bar X)            = '(foo X);
view (baz X)            = '(goo X);

foo X                   = bar X;
goo X                   = baz X;

test (foo X)            = X;
test X:Goo              = X otherwise;

Note that in this case, if the default alternative is taken (i.e., the last equation for test), the constructed view is returned (after being evaluated) rather than the original expression. Of course this will lead to unexpected results if the view, when evaluated, does not reconstruct the original expression, hence our recommendation that Y ⇒ X if view X ⇒ 'Y.

Another word of caution: Definitions mixing patterns with virtual and nonvirtual constructors of a given type will not work in the current implementation. To avoid backtracking during pattern matching, a view will always be constructed for each parameter which is matched against at least one virtual constructor; in such a case the nonvirtual patterns are effectively disabled. The compiler will warn you about such situations when it is invoked with the -w option (see Using Q). If you get this "overlap between virtual and nonvirtual constructors" warning it means that you should rewrite your definition so that it uses either real or virtual constructors, but not both at the same time. That should be easy enough to do; usually you will use the real constructors inside the module which defines the data type, and virtual constructors otherwise.

Finally, note that views can also be used in exactly the same fashion with external types. The only difference here is that an external type does not have any real constructors and thus the view must be created using the operations provided for the type. For instance, consider an external type Bar which represents pairs of integer values constructed with an external function bar (cf. Writing a Module):

 
public extern type Bar = virtual extern bar I J;

Assume that we have another extern function bartuple which returns the contents of a Bar object as a pair of two integer values:

 
public extern bartuple B;

Now a suitable view for Bar can be defined as follows:

 
view B:Bar              = '(bar I J) where (I,J) = bartuple B;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9. Special Forms

As discussed in Equations and Expression Evaluation, the Q interpreter evaluates expressions in applicative, i.e., leftmost-innermost, order. This means that the arguments of a function are usually evaluated before the function is applied to them, which is also known as call by value, strict or eager evaluation. Occasionally, it is useful or even essential to defer the evaluation of arguments until they are actually required in the course of a computation. For this purpose, the Q language lets you introduce so-called special forms which receive their arguments unevaluated (i.e., using call by name or non-strict evaluation). This chapter discusses how these constructs are defined and used.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.1 Basic Concepts

Consider the following definition from the standard library which implements a simple kind of conditional expression (see also Conditionals and Comprehensions):

 
ifelse P X Y            = X if P;
                        = Y otherwise;

The ifelse function takes as its first argument a truth value which is used to determine whether the second or third argument is to be returned as the value of the conditional expression. Although the above definition is perfectly correct, using applicative order evaluation with this definition is clearly inappropriate since all arguments must already have been evaluated before the ifelse function gets applied to them. Since either the second or third argument is simply thrown away, the effort involved in the evaluation of this argument is wasted. As an extreme case, e.g., Y might not have a terminating evaluation at all, in which case the evaluation of ifelse P X Y would not terminate either even though Y is actually not required if P happens to evaluate to true.

Instead, we would like to have ifelse evaluate its arguments only as they are required. In the Q language, this can be done by simply declaring ifelse as a special form. All we have to do is to precede the above equations with the following declaration:

 
public special ifelse P X Y;

The syntax of such declarations has already been introduced in Declarations. The special keyword must be followed by a function identifier and a sequence of variable symbols (the variable symbols only serve to specify the number of arguments, and are otherwise treated as comments). If the argument list is omitted, the function symbol is actually treated as an ordinary (non-special) symbol. Otherwise the given arguments are declared as special (a.k.a. call by name) arguments, i.e., arguments which are treated as literals and hence are not evaluated when the given function is applied to them. Special arguments become so-called deferred values (also known as thunks or suspensions in the literature) which will be left unevaluated as long as they are "protected" by a special form. But as soon as the value of a thunk is referred to in a "non-special" context, the deferred value will automatically be evaluated by the interpreter.

Thus the above declaration of the ifelse function tells the Q interpreter that this function expects three special arguments P, X and Y which should be left unevaluated until their values are actually required in the qualifier or the right-hand side of an equation in ifelse's definition. Consequently, when the interpreter comes to consider the first rule,

 
ifelse P X Y            = X if P;

it will first evaluate P to obtain the value of the condition part of this rule. If P evaluates to true, X will be evaluated and returned as the value of the left-hand side expression. Otherwise (if P evaluates to false), X will be left unevaluated, and the interpreter considers the next rule,

 
ifelse P X Y            = Y otherwise;

which causes it to reduce ifelse P X Y to the value of Y.

(Note that in cases like the above example, where a special argument forms the right-hand side of a rule, the interpreter performs tail call elimination as usual (cf. Tail Recursion). That is, it will actually evaluate the special argument after performing a tail reduction of the rule. This is important for simple recursive functions involving special forms like ifelse which can be executed in constant stack space. This also applies, in particular, to the built-in special operators (and then) and (or else), see Built-In Special Forms.)

It is important to note here that special forms are indeed a runtime feature in the Q language. This means that special forms are not only recognized at compile time, but also when they are passed as arguments or returned as function results in the course of a computation (which usually cannot be predicted at compile time). For instance, if we define foo as follows:

 
special bar X;
foo                     = bar;

then foo (1+1) evaluates to bar (1+1) and not to bar 2, although foo itself is not declared to be a special form. This also works if you pass bar as a functional parameter:

 
special bar X;
foo F X                 = F (X+1);

Given the above definition, foo bar 1 will evaluate to bar (1+1).

On the other hand, you must take care that arguments to special functions which are passed on by other functions are not evaluated too early. For instance, if the apply function is defined as follows:

 
apply F X               = F X;

and is then invoked as apply bar (1+1) then (1+1) will be evaluated even though bar is a special form. The reason is that the argument (1+1) is evaluated before apply is applied to it, since we have not declared apply as a special form as well. As a general rule, you should make any function a special form that passes its arguments to another special form, unless you explicitly wish to evaluate these arguments.

Up to now, we have only considered "pure" special forms, which receive all their arguments unevaluated. Many functions, however, require a mixture of special and non-special arguments. The Q language provides two different methods for dealing with such situations. First, you can force the evaluation of a special argument at runtime using the `~' (tilde or "force") operator:

 
special foo P X;
foo P X                 = ifelse ~P (bar X) X;

Here, the condition P is evaluated before it is passed on to the ifelse special form (which is as defined above). This method is useful if we only occasionally have to evaluate a parameter of a special form. (Note that you can also force expressions outside a special form; in this case `~' acts as the identity operation, i.e., the argument expression (which is already in normal form) is simply returned as is.)

If we always want to evaluate a given parameter of a special function, we can also declare the corresponding argument as non-special which effectively turns the argument into an ordinary call by value parameter. This is done by preceding the argument with the `~' symbol in the declaration of the function symbol. For instance, in the above definition of the ifelse function the first argument will be evaluated anyway. Hence we may as well make it a non-special argument. The corresponding declaration reads as follows:

 
public special ifelse ~P X Y;

With this declaration (which is precisely how ifelse is actually declared in the standard library), the ifelse function always receives its first argument evaluated, while the other two arguments are special. This works exactly like an explicit application of the `~' operator, but releases you from the burden of having to force evaluation of the argument in every application of the ifelse function. It is also slightly more efficient since the argument does not have to be constructed as a literal expression, but can be evaluated right away before the ifelse function is applied to it.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.2 Special Constructors and Streams

Special forms prevent their arguments from being evaluated until they are used in a context which is not "protected" by another special form. This also works if the special form happens to be a constructor. Special constructors thus allow you to define algebraic data structures which act as containers for deferred expressions. By these means Q enables you to create "lazy" data structures which work pretty much like their counterparts in non-strict functional languages like Haskell. A prime example for this are streams, the lazy list data structure, which was briefly introduced in Lists, Streams and Tuples. As of Q 7.1, this data type is now predefined as follows:

 
public type Stream = special const nil_stream, cons_stream X Xs;

The constant symbol nil_stream denotes an empty stream, while the special constructor symbol cons_stream is used to represent a nonempty stream with head X and tail Xs. (Please note that, as of Q 7.1, these symbols are now builtins and thus cannot be imported from the stream.q module any more.) As of Q 7.1, the language also provides some syntactic sugar for these constructs to make them look more list-like. In particular, {} is the same as nil_stream and {X|Xs} the same as cons_stream X Xs. More generally, the notation {X1,X2,…,Xn|Y} can be used as a shorthand for cons_stream X1 (cons_stream X2 … (cons_stream Xn Y)…). Likewise, {X1,X2,…,Xn} is the same as cons_stream X1 (cons_stream X2 … (cons_stream Xn nil_stream)…).

Just like lists, a stream consists of a head element, X, and a stream of remaining elements, the tail Xs. The standard library module stream.q extends most list operations to streams, see Streams. For instance, the list operations hd and tl are easily translated to streams as follows:

 
hd {X|_}                = X;
tl {_|Xs}               = Xs;

Lisp programmers should note that this implementation does not require any explicit "delay" and "force" operations; all necessary evaluations are performed implicitly by the Q interpreter. As a simple example, consider the following definition for the stream of all integers starting from N:

 
ints N                  = {N|ints (N+1)};

Obviously, this definition would not work with eager evaluation, because the recursive invocation of ints would send the interpreter into an infinite recursion. However, since streams are implemented as a special form, the evaluation of the nested ints term is deferred, and the tails of the stream are only produced as we access them, using "call by need" evaluation:

 
==> ints 1
{1|ints (1+1)}

==> tl _
{2|ints (2+1)}

==> tl _
{3|ints (3+1)}

Now this is all good and fine, but it also raises a question, namely how are we going to write definitions which match against subterms in a deferred value if the needed parts have not been evaluated yet? For instance, suppose that we want to extract the next three elements from the above stream. We would actually like to write something like the following:

 
==> _
{3|ints (3+1)}

==> def {X,Y,Z|_} = _

==> (X,Y,Z)
(3,4,5)

In fact this works as expected, even though {3|ints (3+1)} does not literally match the pattern {X,Y,Z|_}, because the Q interpreter takes into account special data constructors during pattern matching and will automagically evaluate deferred subterms as needed.

This also works with non-special arguments of a function in equations. E.g., consider the following definition of a deinterleave operation which takes adjacent stream elements and turns them into pairs:

 
deinterleave {}         = {};
deinterleave {X,Y|Xs}   = {(X,Y)|deinterleave Xs};

Note again that the pattern {X,Y|Xs} on the left-hand side of the second equation does not match a stream value like ints 1 ⇒ {1|ints (1+1)} literally, but it does match after the embedded deferred subterm ints (1+1) has been reduced to {2|ints (2+1)}. The interpreter performs the recursive evaluation automatically, and hence we have:

 
==> deinterleave (ints 1)
{(1,2)|deinterleave (ints (2+1))}

==> tl _
{(3,4)|deinterleave (ints (4+1))}

This "call by need" evaluation of argument values which is performed automatically during pattern matching, is also known as call by pattern. In the Q language, call by pattern evaluations are always done in variable definitions, and also when matching against the left-hand sides of equations, but only for non-special toplevel arguments.

In contrast, special toplevel arguments in an equation are always passed unevaluated, "as is"; that's their purpose after all. Thus, e.g., if foo is a special form then a pattern like foo {X,Y,Z|_} will only match if the provided stream argument literally has three elements in front; no implicit evaluation of stream tails to obtain the required number of stream elements will be performed in this case.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.3 Memoization and Lazy Evaluation

Another issue arising when dealing with special forms is the problem of repeated evaluations of the same deferred expression. Note that normally a special argument is evaluated each time its value is required on the right-hand side of a definition. Consider, for instance:

 
special foo X;
foo X                   = bar X X;

Assuming that bar is non-special, X actually gets evaluated twice, once for each occurrence on the right-hand side of the equation. This may be undesirable for reasons of efficiency. One method to cope with such a situation is to introduce a local variable binding, e.g.:

 
special foo X;
foo X                   = bar Y Y where Y = X;

However, in some cases this may be inconvenient or downright impossible. In particular, the deferred value might actually be stored inside a special data element such as a stream (cf. Special Constructors and Streams). For instance, consider the case of a stream map foo {1..} where foo is a function on integers. Then, whenever we access an element of this stream, the corresponding expression foo I will have to be recomputed, which might be costly, depending on the function foo at hand.

To avoid such inefficiencies, the Q language provides the `&' (ampersand or "memoize") operator. When applied to a special argument (such as a stream member) it causes the value to be memoized. This means that the value will only be computed the first time it is needed. Each subsequent access will then simply return the already computed value.

In our example we can memoize both the heads and the tails of the stream, by applying `&' recursively to all heads and tails. The standard library contains the function lazy which does just this. All we have to do is to apply this function to the existing stream to have all elements memoized:

 
==> var foo = \X.writes $ "foo: "++str X++"\n" || X

==> def S = lazy (map foo {1..})

==> list (take 2 S)
foo: 1
foo: 2
[1,2]

==> list (take 4 S)
foo: 3
foo: 4
[1,2,3,4]

Note that when computing the first four stream members with list (take 4 S), only elements number 3 and 4 were computed, since the first two elements had already been computed with list (take 2 S) before.

Functional programming theorists also denote the above style of computation, which combines call by name with the memoization of already computed thunks, as call by need or simply as lazy evaluation.

Lazy evaluation paves the way for some important and powerful programming techniques. In particular, streams are a useful device to implement different kinds of backtracking and dynamic programming algorithms in a uniform setting. For instance, here is a recursive definition of the stream of all Fibonacci numbers:

 
fibs = {0,1|zipwith (+) fibs (tl fibs)};

Unfortunately, this implementation is rather slow because of the multiple recursive invocations of fibs. But this can be fixed by memoizing the stream with lazy and assigning it to a variable:

 
var fibs = lazy {0,1|zipwith (+) fibs (tl fibs)};

By these means we ensure that the stream is built in-place, by "chasing its own tail". This works because Q's global variable environment uses dynamic binding, and the (unevaluated) instance of fibs on the right-hand side of the definition will not be evaluated before the variable has already been defined. Thus the stream effectively includes circular references to itself.

The reason that special arguments are not memoized automatically at all times is that some arguments may involve functions with side-effects, in which case memoization is often undesirable. For instance, we can compute a list of five random values quite conveniently using a list comprehension as follows:

 
==> [random : I in [1..5]]
[1809930567,1267130897,1713466763,574472268,2355190805]

This works because the random subterm is actually a special parameter to the standard library function listof implementing list comprehensions (see Conditionals and Comprehensions), so it will be recomputed as each list element is produced. However, if we memoize the random subterm then the function will only be evaluated once to give the value for all list members:

 
==> [&random : I in [1..5]]
[3194909515,3194909515,3194909515,3194909515,3194909515]

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.4 Built-In Special Forms

Besides streams, the Q language has a number of other built-in operations which are actually implemented as special forms. This is necessary, in particular, in the case of the syntactic equality operator `==' (see Non-Linear Equations), as well as the logical connectives and then and or else which are evaluated in "short-circuit mode", using the following built-in equations (see Logical and Bit Operators):

 
true and then X         = X;
false and then X        = false;

false or else X         = X;
true or else X          = true;

The first argument of these operations is non-special, but the second one is a special argument which is only evaluated if it is needed. For instance, if the first argument of and then is false, false is returned - there is no need to take a look at the second argument. The or else operation works analogously. These rules allow the Q interpreter to perform the following reductions immediately, without ever having to evaluate the second argument X:

 
false and then X        ⇒ false
true or else X          ⇒ true

On the other hand, if the first argument of and then is true (or the first argument of or else is false), then the value of the second argument is returned:

 
true and then X         ⇒ X
false or else X         ⇒ X

In this case the interpreter also performs tail call optimization, i.e., the reduction to X is actually carried out before X is evaluated. This implies that tail recursions involving and then and or else are executed in constant stack space, as one might reasonably expect. (The same applies to user-defined special forms like ifelse, see Basic Concepts, which are defined in a similar fashion.) For instance, consider the following example from the standard library which defines a function all that checks whether all members of a list satisfy a given predicate P:

 
all P []                = true;
all P [X|Xs]            = P X and then all P Xs;

Note that if P X in the second rule evaluates to true, then the right-hand side immediately reduces to the tail-recursive call all P Xs, and hence an application of all P to a list of arbitrary size only needs constant stack space.

Other important built-in special forms are the lambda function, which is described in Lambda Abstractions, and the view construction function view, see Views.

Last but not least, there are three other builtins in the Q language which may also act as special forms depending on the arguments with which they are invoked. The function composition operator `.' automagically adapts to special forms in its left and/or right operand. More precisely, if the first argument of a function F is special, then the second operand of a composition section of the form (F.) = (.) F is special as well; in this case F.G will always leave G unevaluated. Furthermore, the argument of a composition F.G is special if F or G has a special first argument. This makes the `.' operator work as expected in situations where the composed functions are special forms. The infix application operator `$' (cf. Application and Sequence Operators) and the built-in flip function (which flips the arguments of a binary function and is used to implement operator sections with a missing left operand, see Miscellaneous Functions) adjust to their first (function) argument in a similar fashion.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.5 The Quote Operator

Another built-in special form is the predefined quote operator `'', which acts as a constructor that protects its single argument expression from being evaluated. This construct is useful if you wish to treat certain expressions (which may or may not be in normal form) as literal objects rather than having them evaluated. For instance:

 
==> '(1+1)
'(1+1)

Lisp programmers should note that the quote operator is in fact a constructor, and not an operation which returns its single argument unevaluated. This is essential since in the Q language an expression only remains unevaluated as long as it is protected by a special form - an expression which occurs outside the context of a special form is always in normal form.

Another fact that deserves mentioning is that `'' is just an "ordinary" special form, and does not involve any special "magic" on the side of the interpreter. In particular, `'' does not inhibit variable replacements in rules like the following:

 
special foo X;
foo X                   = '(X+1);

Given the above definition, foo Y ⇒ '(Y+1), i.e., the X variable on the right-hand side is not treated as a literal. However, as the very same example shows, you can employ `'' to quote a free variable, and thereby defer its evaluation:

 
==> def Y = 99; foo Y
'(Y+1)

==> foo ~Y
'(99+1)

The force operator `~' works in quoted expressions as usual:

 
==> '(1+~(2+3))
'(1+5)

Moreover, there is another, similar operation, the ``' (backquote or "splice") operator, which forces evaluation of its argument like the `~' operator, but also "unquotes" the result. For instance, using the same foo function as above, we have:

 
==> '(`(foo Y)/2)
'((Y+1)/2)

Like the force operator, the splice operator also works outside a special form, in which case the unquoted expression is evaluated as usual. Moreover, if the evaluated argument is an unquoted expression, it is returned as is; in this case, ``' does exactly the same as the `~' operator.

The splice operator is the fundamental operation when constructing quoted expressions from smaller quoted pieces, by "splicing" the pieces into a "template" expression. Lisp programmers will be familiar with this technique. However, there are some notable differences between Q's `~'/``' and Lisp's `,'/`@,' constructs:

Also note that, in contrast to the quote operator, the force and splice operations do need special support from the interpreter. They are the only operations (besides the memoization operator, which is treated in a similar fashion) which are evaluated while a special form is being processed.

When used in concert, the quotation operators `'', ``' and `~' become powerful tools for the symbolic manipulation of literal expressions. They allow you to define functions which analyze and transform an expression before the modified expression is finally evaluated. As a simple (and somewhat contrived) example, let us write a function which takes an arbitrary quoted expression and replaces each instance of a foo X subterm by a corresponding bar X expression. (More meaningful examples can be found in the standard library; in particular, have a look at the cond.q script to see how the listof function is implemented.)

 
foo X                   = X-1;
bar X                   = X+1;

special foobar X;
foobar '(foo X)         = '(bar `(foobar 'X));
foobar '(X Y)           = '(`(foobar 'X) `(foobar 'Y));
foobar '(X|Y)           = '(`(foobar 'X)|`(foobar 'Y));
foobar '[X|Y]           = '[`(foobar 'X)|`(foobar 'Y)];
foobar X                = X otherwise;

To see foobar in action on a lambda abstraction, try the following:

 
==> var f = '(\X Y.X+foo (Y+foo (X*Y))), g = foobar ~f

==> f; g
'(\X . \Y . X+foo (Y+foo (X*Y)))
'(\X . \Y . X+bar (Y+bar (X*Y)))

==> `f 12 14; `g 12 14
192
196

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

9.6 A Bit of Reflection

Let us briefly comment on Q's special forms versus "real" lazy evaluation in functional languages like Haskell. It should be clear by now that Q allows you to keep a special argument from being evaluated as long as you like, maybe "to eternity". That is, the interpreter actually computes so-called "weak" normal forms in which special arguments are left unevaluated, even if these arguments are reducible. This is similar to the way deferred evaluation is handled in traditional functional languages featuring a basic eager evaluation strategy. In contrast, a pure lazy functional language interpreter will eventually reduce each expression to a "true" normal form to which no definition is applicable.

Both approaches have their pros and cons. The nice thing about normal order evaluation (with automatic memoization) in lazy functional languages is that it "just works" and is guaranteed to be optimal (in the sense that the program never computes a value that is not needed to produce the program's result). However, one downside of this method is that it becomes hard to deal with operations involving side-effects. Moreover, normal order evaluation is in fact only truly lazy for the lambda and combinatorial calculi used by these systems, which are very special kinds of rewriting system. In general term rewriting, however, there is no single "optimal" evaluation strategy; in fact, the problem to devise a terminating, let alone an optimal reduction strategy for a given rewriting system, even for a single input expression, amounts to solving the infamous halting problem which is undecidable.

Although the Q interpreter uses applicative order evaluation by default, special forms effectively give you full control over the evaluation strategy. As we have seen, this makes it possible to implement lazy data structures which behave pretty much like their counterparts in pure lazy languages. But special forms actually go way beyond this, as they also provide a device to program "meta" (or "macro"-like) functions which operate on the descriptions of other functions (or any other kind of literal expressions). These facilities in turn make it possible to write programs performing "reflective" computations in which a program inspects some part of itself (which might be given, say, as a lambda abstraction), and modifies this part before actually executing it.

It goes without saying that this is an essential requisite in advanced applications like artificial intelligence. While Q is not a fully reflective language (the equational rules making up a Q program are no first-class citizens of the language, neither are the modules a program consists of), having symbolic expressions as first-class citizens of the language provides enough reflectiveness so that you can do most stuff that you'd normally use Lisp for. Pure lazy languages like Haskell fall short in this respect, because they cannot deal with function expressions on a symbolic level without introducing an extra level of interpretation.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10. Built-In Functions

Besides the built-in operators already discussed in Expressions, the Q language also provides a collection of predefined functions which cover a variety of elementary operations. The built-in functions can be divided into eight major groups, arithmetic and numeric functions, string, list and tuple functions, conversion functions, I/O functions, lambda abstractions, exception handling functions, and miscellaneous functions. We will describe each of these in turn.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.1 Arithmetic Functions

The Q language provides the following functions for simple arithmetic on enumeration types (cf. Types):

enum Xs Y, enum_from Xs

enumerate a range of enumeration type values

succ X

successor function

pred X

predecessor function

As already discussed in Enumeration Types, the enum and enum_from functions construct a list from a range of enumeration type values, and the succ and pred functions produce the successor and predecessor of a given value in an enumeration type. Note that while the built-in definitions of these operations only apply to "real" enumeration types (including the built-in Char type), the standard prelude extends enum, succ and pred to integers (and, in the case of enum, floating point values), cf. The Standard Library.

Two additional functions are provided for integer arithmetic, shl and shr, which are used to shift the bits of an integer value a given number of bit positions to the left and right, respectively:

shl N COUNT

shift N COUNT bits to the left

shr N COUNT

shift N COUNT bits to the right

If the COUNT argument is negative, then the opposite shift by -COUNT bits is performed, i.e., shl X COUNT = shr X (-COUNT).

These operations provide the fastest possible way to multiply an integer with, and divide it by, a power of two. More precisely, if X is an arbitrary and N a nonnegative integer, then shl X N and shr X N are the same as X*pow 2 N and X div pow 2 N, respectively, where pow 2 N denotes the Nth power of 2, as an integer. Note that in contrast to bit shifts on fixed size machine integers, there is no "loss" of most significant bits, because Q integers have arbitrary precision; hence the results obtained with bit shifting are always "arithmetically correct".

The bitwise logical operations provide a means to implement sets of nonnegative integers in a fairly efficient manner. As usual, such "bitsets" are obtained by turning on exactly those bits whose positions are the members of the set. Thus, 0 encodes the empty set, and shl 1 I the set consisting of the single member I. You then employ or as set union, and as set intersection, and not as set complement. Set difference can be implemented by taking the intersection with the complement set.

For convenience, you might wish to define yourself a bit function as follows:

 
bit                     = shl 1;

Then you can test for membership using an expression like X and bit I which returns nonzero (namely bit I itself) iff I is in the set. You can also build a set from its members by or'ing the corresponding bit values, e.g.: bit 1 or bit 4 or bit 39.

Because Q integers have arbitrary precision, bitsets do not have an a priory size restriction in Q (in fact, they can even be infinite, as negative integers are used to encode the complements of the finite bitsets). However, you should note that a finite bitset (a.k.a. a nonnegative integer) needs space proprotional to the set's largest member, and hence this method will be practical only if the potential set members are within a reasonable range.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.2 Numeric Functions

exp X

exponential function

ln X

natural logarithm

sqrt X

square root

sin X

sine

cos X

cosine

atan X

arcus tangent

atan2 Y X

arcus tangent of Y/X

random

random number

seed N

initialize random number generator

This group includes the usual trigonometric and exponential functions, logarithms and square root. All these functions take both integer and floating point arguments. They return a floating point number. The atan2 function is like atan, but takes two arguments and computes the arcus tangent of their ratio. In difference to atan, it takes into account the signs of both arguments to determine the quadrant of the result.

Besides this, there is a parameterless function random which returns a 32 bit pseudo random integer in the range 0..2^32-1. To obtain a random 32 bit floating point value in the range [0,1], you simply divide random by 0xffffffff. The current implementation uses the "Mersenne Twister", a fast uniform random number generator with a period of 2^19937-1, written by Makoto Matsumoto and Takuji Nishimura. The generator is initialized automatically with a seed taken from the system clock at the time the interpreter starts up. The seed function allows to initialize the generator explicitly with a given nonnegative integer seed; this is useful if it is necessary to reproduce random sequences. Note that only the least significant 31 bits of the given seed value are actually used; these 31 bits are padded with a constant 1 bit, since the generator works best with odd seed values.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.3 String, List and Tuple Functions

This group provides some additional operations on sequences (strings, lists and tuples).

sub Xs I J

extract the subsequence consisting of members I thru J of a string, list or tuple Xs

substr S K L

return substring of length L at position K in S

pos S1 S

position of substring S1 in S

The sub function returns a subsequence (also called a "slice") of a string, list or tuple, given by the zero-based index values I and J. For instance:

 
==> sub "abcde" 2 3
"cd"

==> sub [a,b,c,d,e] 2 3
[c,d]

==> sub (a,b,c,d,e) 2 3
(c,d)

(The standard library provides an analogous definition of sub on streams, see Streams.)

The sub function handles all combinations of index arguments in a reasonable manner. Thus, if J < I or I >= #Xs then an empty sequence is returned, I = 0 is assumed if I < 0, and the remainder of the sequence is returned if J >= #Xs.

Besides sub, there are two additional functions operating exclusively on strings. The substr function returns a substring of a string with a given length at a given position. This function is provided for backward compatibility; note that substr S K L is equivalent to sub S K (K+L-1). The pos function returns the position of a substring in a string (-1 if there is no occurrence of the given substring). For instance:

 
==> pos "cd" "abcde"
2

==> substr "abcde" 2 2
"cd"

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.4 Conversion Functions

This group consists of functions for converting between different kinds of objects:

trunc X

truncate floating point to integer value, or return integer value unchanged

round X

round floating point to nearest integer value, or return integer value unchanged

float X

convert integer to floating point number, or return floating point value unchanged

int X

compute the integer part of floating point number, or convert integer value to floating point

frac X

compute the fraction part of floating point number, or return 0.0 for an integer value

hash X

32 bit hash code of an expression

ord X

ordinal number of enumeration type member (or character)

chr N

character with given ordinal number

list Xs

convert a tuple to a list

tuple Xs

convert a list to a tuple

str X

convert expression to string

strq X

convert quoted expression to string

val S

convert string to expression

valq S

convert string to quoted expression

Note that, as of Q 7.1, the trunc, round, float, int and frac functions work consistently on all numbers, i.e., both integers and floating point values. (With older versions, float would only work on integers and the other functions only on floating point values.)

The ord and chr functions convert between characters and their ordinal numbers in the (Unicode) character set. The ord function can also be used to compute the ordinal number of a member of an arbitrary enumeration type, see Enumeration Types.

The hash function returns a nonnegative 32 bit hash code for an arbitrary expression. The only guarantee is that syntactically equal objects are mapped to the same hash code. This function is useful, e.g., for implementing hashed dictionaries which map arbitrary keys to corresponding values, see Hashed Dictionaries for an application.

The str function converts its argument expression to the corresponding print representation conforming to the Q expression syntax, which is returned as a string; the argument is evaluated as usual. The strq function is similar, but converts a quoted expression (cf. The Quote Operator). For instance:

 
str (1+1)               ⇒ "2"
strq '(1+1)             ⇒ "1+1"

The val/valq functions are the counterpart of str and strq; they convert a string containing the print representation of an expression back to an expression. In the case of val, the expression is evaluated, while valq returns the parsed expression as it is, protected with the quote operator:

 
val "1+1"               ⇒ 2
valq "1+1"              ⇒ '(1+1)

These functions require that the contents of their string arguments conform to the Q expression syntax. In case of a syntax error, the built-in rules fail.

All expression conversion routines use the global scope, i.e., the namespace of the main script, for the purpose of converting identifiers, just like the interpreter itself. Thus these functions work consistently in a given main script no matter which module they are invoked from. However, this also means that the precise results may depend on which main script is currently loaded. Note that this only affects the parsing and unparsing of identifiers. For instance, if a function symbol foo in module bar is visible in the main scope then it will be unparsed simply as "foo", otherwise the qualified form "bar::foo" is used.

Also note that, as of version 7.2, Q now provides a hook into the built-in expression pretty-printer so that you can define custom views for any user-defined, built-in or external data type. This is described in Views.

A word of caution: Having a "universal" string representation of Q expressions is very useful to facilitate interaction with the user and for the purpose of "marshalling" expressions so that they can be stored in a file, but you should note that the expression conversion routines are so powerful that they can easily be abused for rather questionable purposes. For instance, the val function gives you a "backdoor" to access any function or variable symbol in a script, even private symbols in other modules. The only way around this would have been to severely restrict the functionality of the conversion routines, which would have made them much less useful. Thus it is in your responsibility to use these functions in an "orderly" manner.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.5 I/O Functions

This group provides functions for handling input/output from/to the terminal or a text file. These functions implement operations with side-effects; the side-effects consist in modifying the terminal display or a file. All I/O functions also take care, automatically and transparently, of the necessary translations of strings between the system and the internal UTF-8 encoding.

fopen NAME MODE

open file NAME in mode MODE

popen CMD MODE

open pipe CMD in mode MODE

fclose F

close a file

read, fread F

read an expression from the terminal or a file

readq, freadq F

read a quoted expression from the terminal or a file

readc, freadc F

read a character from the terminal or a file

reads, freads F

read a string from the terminal or a file

write X, fwrite F X

write an expression to the terminal or a file

writeq X, fwriteq F X

write a quoted expression to the terminal or a file

writec C, fwritec F C

write a character to the terminal or a file

writes S, fwrites F S

write a string to the terminal or a file

eof, feof F

check for end-of-file on the terminal or a file

flush, fflush F

flush output buffer on the terminal or a file


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.5.1 Terminal I/O

Input from the terminal (i.e., the standard input device) is done through the parameterless functions read, readq, readc and reads which read and return, respectively, an expression, a quoted expression, a single character, and a string. Terminal input is line-buffered which means that you must type an entire input line before anything is returned by these functions.

The reads function obtains one line of input, strips off the trailing newline character, and returns the result as a string. For instance (here and in the following, <CR> means that you hit the carriage return key to terminate the input line, rather than actually typing the string "<CR>"):

 
==> reads
one line of text<CR>
"one line of text"

The readc function allows you to read input character by character; at the end of a line the newline character "\n" is returned:

 
==> readc
<CR>
"\n"

The read and readq functions read one line from the input, as with reads, and convert the resulting string to an expression, as with val and valq, respectively. For instance:

 
==> read
1+1<CR>
2

==> readq
1+1<CR>
'(1+1)

The corresponding functions for producing output on the terminal (the standard output device) are named write, writeq, writec and writes. They print an expression, a quoted expression, a character and a string, respectively. These functions take one argument, the object to be printed, and return the empty tuple (). For instance:

 
==> writec "x"
x()

==> writes "one line of text\n"
one line of text
()

==> write (1+1)
2()

==> writeq '(1+1)
1+1()

(Output operations are invoked solely for their side-effects. However, any Q expression must have a value, and since there is no other meaningful value to return, the write functions return (). In the above examples, this value is output by the interpreter after the printed text, which explains, e.g., the x() in response to writec "x".)

It is common practice to combine these operations by means of the || operator in order to implement dialogs such as the following prompt/input interaction:

 
==> writes "Input: " || reads
Input: one line of text<CR>
"one line of text"

You will also often encounter several output operations for interpolating data into text fragments:

 
==> writes "The result is " || writeq '(1+1) || writes ".\n"
The result is 1+1.
()

The eof function allows you to check for end-of-file on the terminal. Actually, this causes another line of input to be read from the terminal if no input is currently available, to see whether the end of the input has been reached. Most operating systems allow you to type a special character to indicate end-of-file, such as the Ctl-D character on the UNIX system:

 
==> eof
<Ctl-D>
true

The flush function writes any pending output to the terminal. It is rarely necessary to call this function explicitly; see also the discussion of the fflush function in File I/O.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.5.2 File I/O

File input and output is implemented by the functions fread, freadq, freadc, freads, fwrite, fwriteq, fwritec and fwrites. There also is an feof function which allows to check whether the end of a file has been reached, and an fflush function which flushes the output buffer of a file. These operations are analogous to their terminal equivalents, but take an additional first argument, a file object. A file object is a special kind of elementary object in the Q language which is returned by the built-in fopen function.

The fopen function takes two string arguments, the name of the file to be opened (which is the name under which the file is known by the operating system), and the mode in which the file is to be opened. The mode string "r" means to open an existing file for reading, "w" to open a new file for writing (existing files are truncated to zero size), and "a" to create a new or append to an existing file. You can also add the letter "b" at the end of the mode string to indicate that the file should be opened as a binary file. This only has the effect to suppress the LF/CR-LF conversion on MS-DOS/Windows systems, which is essential if you read/write binary data from/to the file. On UNIX systems this flag is ignored.

For instance, a function checking whether a file exists and is accessible for reading can be implemented as follows:

 
exists NAME             = isfile (fopen NAME "r");

(A suitable definition of the isfile function is provided by the standard library, cf. Type-Checking Predicates.) Files are closed automatically when the corresponding file objects are no longer accessible in a computation. E.g., with the above definition of the exists function, if you invoke this function as exists "myfile", then the file object returned by fopen "myfile" "r" (assuming that the file actually exists) will become inaccessible as soon as it has been processed by the isfile function, at which point it gets closed. Similarly, if you assign a file to a variable in the interpreter,

 
==> def F = fopen "myfile" "r"

the file will be closed as soon as you undefine the variable:

 
==> undef F

Occasionally, it might be necessary to close a file explicitly, e.g., a file object might still be accessible, but you want to close it before you do some other processing. In this case you can invoke the fclose function on the file:

 
==> def F = fopen "myfile" "r"; fclose F
()

After closing a file, the file object still exists, but all further I/O operations on it (including fclose itself) will fail:

 
==> freads F; fclose F
freads <<File>>
fclose <<File>>

(The notation <<File>> represents a file object, since file objects have no printable representation. This syntax is also used by the expression output and unparsing functions, write, str etc. Of course, such objects cannot be reparsed using read, val, etc.)

The standard input and output devices used by the terminal I/O functions (the readx and writex functions, see Terminal I/O) are actually special instances of the file I/O functions, which read from standard input and write to standard output. The standard input and output devices are implemented by the interpreter as predefined file objects which are assigned to the INPUT and OUTPUT variables, see Command Language. Thus, for instance, reads and writes X are equivalent to freads INPUT and fwrites OUTPUT X, respectively.

As an example for the use of file I/O, here's a tail-recursive function which copies the contents of one file opened in read mode to another file opened in write or append mode:

 
fcopy F G               = () if feof F;
                        = fwritec G (freadc F) || fcopy F G otherwise;

Note that file objects are indeed modified by input and output operations; at least, the file pointer is moved accordingly. Otherwise the above definitions would not work. This also becomes apparent when manipulating files interactively:

 
==> def F = fopen "xyz" "r"

==> freads F
"first line in file xyz"

==> freads F
"second line in file xyz"

…

The fflush function is used to write any buffered output to a file. This operation is only needed when the target file must be updated immediately. For instance, the target file may actually be a pipe and it may be necessary to get an immediate response from the program which reads the output at the other end of the pipe (see Pipes). Then you can use the fflush function as follows:

 
==> fwrites F S
()

==> fflush F // force S to be written to F immediately
()

Note that if the Q interpreter runs interactively, then it automatically flushes the standard output files whenever an evaluation is finished, see Running Compiler and Interpreter. And, of course, buffered output is always flushed when a file is closed. In case you have to flush standard output explicitly, use flush or, equivalently, fflush OUTPUT. (This should only be necessary if the standard output stream has been redirected to a disk file or a pipe.)

Also note that the interpreter does not provide direct support for reading and writing binary files. However, such functionality is provided by the clib standard library module (see Clib).


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.5.3 Pipes

The popen function is used to create a file object connected to a pipe. The file objects constructed with popen allow you to pipe data into an operating system command, and to read the output of such a command from a file. Like fopen, popen takes two string arguments. The first parameter denotes the command to be executed, and the second parameter specifies the mode in which the pipe is to be opened. The mode argument must be either "r" (read from the output of the given command) or "w" (write to the input of the command), and causes a file object open for reading or writing to be returned, respectively. The "b" flag may also be used to open the pipe as a binary file, see File I/O.

Input pipes are typically employed for retrieving information from the host operating system. For instance, on UNIX systems we can use the ls command to obtain a list of filenames matching a given wildcard specification. A corresponding function ls which returns such a list can be implemented as follows:

 
ls S:String             = freadls (popen ("ls "++S) "r");
freadls F:File          = [] if feof F;
                        = [freads F|freadls F] otherwise;

As an example for an output pipe, the following function more pipes a file through the more program which displays the file page by page:

 
more F:File             = fcopy F (popen "more" "w");

(The definition of fcopy is as in File I/O.)

Just like ordinary files, pipes are closed automatically when the corresponding file object is no longer accessible, or explicitly by an invocation of the fclose function. Furthermore, the interpreter waits for the command started with popen to finish when closing the pipe.

Output pipes are a convenient means to implement specialized output devices in the Q language. For instance, the standard library script graphics.q writes its graphics output to a file GRAPHICS which is often defined as a pipe to a PostScript previewer like, e.g., Ghostscript:

 
def GRAPHICS = popen "gs -q -" "w";

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.6 Lambda Abstractions

As of Q 7.1, the special form lambda is now a built-in function which returns a "compiled" function object represented by the new built-in Function type. This offers better performance than the previous implementation in the standard library which was based on the combinatorial calculus. Moreover, the \X.Y syntax can now be used as a convenient shorthand for an application lambda X Y, see Conditional Expressions and Lambdas.

Both arguments of lambda are special. The first argument X denotes an expression to be matched against the actual parameter of the function when it gets applied. The second argument Y is the body of the lambda abstraction. When a lambda function is applied to an argument Z, Z is first matched against the pattern X and the (free) variables in X are bound to the corresponding values in Z. Pattern matching is performed in the same manner as for the left-hand side of an equation; in particular, non-linear patterns (which contain a variable more than once) and the anonymous variable work as expected, as do "call by pattern" (cf. Special Constructors and Streams) and matching against "views" (cf. Views). If the match fails then the application of the lambda function fails, too. Otherwise the body of the lambda is evaluated, with the variables of the pattern replaced with their corresponding values, and the resulting normal form is returned as the value of the lambda application.

As in previous releases, the lambda function needs to know which other special forms perform variable bindings so that these constructs can be expanded recursively. To these ends, such "lambda-like" functions should be declared as members of a type derived from the predefined Lambda type, and the predefined lambdax special form should be overloaded such that, when applied to an expression involving the lambda-like function, it returns a quoted expansion of the construct in terms of lambda. Examples for this can be found in the cond.q standard library module.

Examples

The following simple lambda abstraction can be used to denote a function which squares its argument by multiplying it with itself. (In the following we use the customary \X.Y notation throughout, so this example is written as \X.X*X. But note that this is just syntactic sugar for an application of the lambda function, lambda X (X*X) in this case.)

 
==> \X.X*X
\X1 . X1*X1

==> _ 5
25

In this example the lambda pattern is just a single variable, which always matches. It is also possible to have pattern-matching lambda abstractions, see below. Also note that lambda objects have a printable representation in Q, although the original variable names are lost when a lambda is printed (more about this in the subsection "Lambdas as First-Class Objects" below).

Multi-argument lambda functions can be created using nested lambdas:

 
==> \X.\Y.(1-X)*Y
\X1 . \X2 . (1-X1)*X2

==> _ 0.9 0.5
0.05

For convenience, the notation \X1 X2 … . Y is provided as a shorthand for \X1 . \X2 … . Y:

 
==> \X Y.(1-X)*Y
\X1 . \X2 . (1-X1)*X2

==> _ 0.9 0.5
0.05

Also note that in a nested lambda expression each variable is bound by the innermost lambda in which it occurs:

 
==> \X.(\X.X*X) (X+1)
\X1 . (\X2 . X2*X2) (X1+1)

==> _ 2
9

Moreover, the lambda variables are always bound statically, as determined by the lexical context. For instance, consider:

 
==> def F = \X.\Y.(1-X)*Y, G = \Y.(1-X)*Y, H = \X.~G

==> F; H
\X1 . \X2 . (1-X1)*X2
\X1 . \X2 . (1-X)*X2

==> F 0.9 0.5; H 0.9 0.5
0.05
(1-X)*0.5

Note that in the function G the variable X occurs free, and hence the same holds for H. On the other hand, in the function F the variable X is bound by the outermost lambda.

The lambda argument can also be a pattern to be matched against the actual parameter when the function is applied:

 
==> \(X,Y).(1-X)*Y
\(X1,X2) . (1-X1)*X2

==> _ (0.9,0.5)
0.05

==> \[X|Xs].Xs
\[X1|X2] . X2

==> _ [1,2,3]
[2,3]

If a match fails then the application of the lambda function fails, too:

 
==> (\[X|Xs].Xs) []
(\[X1|X2] . X2) []

The anonymous variable and nonlinear patterns also work as expected:

 
==> \[_|Xs].Xs
\[X1|X2] . X2

==> _ [1,2,3]
[2,3]

==> \[X,X|Xs].Xs
\[X1,X1|X2] . X2

==> _ [1,1,2]
[2]

==> (\[X,X|Xs].Xs) [1,2,3]
(\[X1,X1|X2] . X2) [1,2,3]

==> (\[_,_|Xs].Xs) [1,2,3]
[3]

But note that in a multi-argument lambda, the different argument patterns will be matched independently from each other. This is because a lambda abstraction like \X X.X*X is equivalent to \X.\X.X*X, so the X variable in the body is actually bound to the second lambda argument (which belongs to the innermost lambda) and the first argument is never used:

 
==> \X X.X*X
\X1 . \X2 . X2*X2

==> _ 1 2
4

This is to be seen in contrast to an equational definition like `foo X X = X*X;' in which the entire left-hand side participates in the matching process.

Lambdas can also be used to substitute variables in special forms like the quote operator:

 
==> (\X.'(X+1)) (2*3)
'(6+1)

But note that while the pattern and body of lambda are special arguments, the actual parameter expected by the lambda function is non-special. Thus the value 2*3 in the example above is evaluated before it is substituted for the X variable. If you want to prevent lambda arguments from being evaluated, you have to protect them with a special form (e.g., a quote), and extract the value for use in the lambda body using an appropriate pattern:

 
==> (\'X.'(X+1)) '(2*3)
'(2*3+1)

Lambdas as First-Class Objects

While Q's lambda abstractions look superficially like those in languages like Haskell and ML, there is a big difference: Lambdas are just normal expressions in the Q language and can thus can be inspected and manipulated just like any other expression. In other words, lambda is just an ordinary function in Q (albeit a special form) which is implemented on top of the term rewriting machinery that Q is based on. This has some important consequences which are discussed in the following, including some incompatibilities to mainstream FP languages that Haskell and ML programmers should know about.

For efficiency reasons, the value returned by lambda is actually a "compiled" form of the given lambda abstraction, which is represented as an object of the built-in Function type. These objects are computed at runtime and contain a complete description of the original lambda abstraction, except that the original names of variables bound in the lambda pattern are lost in the compilation process; this is why you will see generic variable names like X1, X2, etc. in the printed representation.

We also call these objects lambda functions, to distinguish them from lambda abstractions, i.e., the lambda expressions from which they are computed. These objects have an associated "view", with lambda acting as a "virtual constructor" of the Function type (cf. Views), so that lambda objects can be compared for syntactic equality, printed, and dissected by pattern matching. Therefore the difference between lambda abstractions and the corresponding function objects is mostly transparent to the programmer (except for the variable names); for most practical purposes you should be able to work with lambdas just as if they were actually represented the way that they are written.

For instance, in Q you can create a lambda function and then extract its pattern and body simply as follows:

 
==> var fac = \N.if N>0 then N*fac (N-1) else 1

==> def \X.Y = fac; X; Y
X1
if X1>0 then X1*fac (X1-1) else 1

This makes it easy to manipulate lambdas in various ways before actually applying and thereby evaluating them, which offers great opportunities for "metaprogramming". For instance, a simple kind of `let' expressions can easily be defined in terms of lambda as follows:

 
special let C X;
let (A=B) X = (\A.X) B;

More examples of this kind can be found in the standard library. For instance, the standard library defines `case' expressions as well as list and stream comprehensions in terms of lambda.

However, this power comes at a price. Since lambdas are first-class objects, the variables "bound" in lambdas are just ordinary expressions, too. In fact, for the Q interpreter the variables inside a lambda pattern are nothing special at all; on the level of equational definitions they are just free variables! Conversely, from the point of view of lambdas, the variables bound in an equation are "meta variables" whose values are substituted into both the pattern and the body of a lambda abstraction. In fact, we used that to good effect in the above definition of `let'.

This leads to a common pitfall, namely that you might accidentally try to use a variable symbol bound by the left-hand side of an equation (or inside a where clause) as a lambda variable, with predictable but unintended results. For instance, consider the following definition:

 
foo X = (\X . 2*X+1) (X+1);

In Haskell, an equational definition like the one above (i.e., something like foo x = (\x->2*x+1) (x+1)) is essentially considered equivalent to a corresponding lambda abstraction (\x -> (\x->2*x+1) (x+1) in this case), and hence foo 99 will evaluate to 2*(99+1)+1 = 201.

Not so in Q. Here, the left-hand side X acts as a meta variable whose value is substituted into the lambda abstraction, yielding the result (\99 . 2*99+1) (99+1), which probably was not intended here. To get the correct behaviour in this example you can either replace the lambda variable with a symbol which is not bound by the equation, or use an inline variable declaration to escape the lambda variable, like so:

 
foo X = (\var X.2*var X+1) (X+1);

Of course you can also just define foo itself as a lambda function:

 
foo = \X . (\X.2*X+1) (X+1);

While this is a little awkward and might take some time to get used to, there is no way around this if lambda abstractions are implemented as first-class citizens as they are in the Q language. So some discipline is needed to avoid the pitfall sketched out above.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.7 Exception Handling

As already mentioned, the Q language only knows a few "hard" runtime error conditions a.k.a. exceptions such as stack or memory overflow. However, "soft" exceptions can also be generated and handled in any desired manner. The following functions are used to deal with all kinds of exceptions during expression evaluation:

halt

halt evaluation

quit

exit the Q interpreter

break

invoke the debugger

fail

abort the current rule

_FAIL_

abort the current reduction

catch F X

handle an exception

throw X

raise an exception

trap ACT SIG

trap signals

The halt function never returns a result, but raises an exception which normally causes the evaluation process to be aborted immediately. This last resort is commonly used in case of a fatal error condition. The quit function is like halt, but also causes exit from the interpreter and returns you to the operating system shell; this operation is often used interactively to terminate a session with the interpreter. (Note that in a multithreaded script, see POSIX Threads, quit only terminates the program when it is invoked from the main thread. In other threads it just acts like halt.)

If the break flag is on (see Debugging), the break function interrupts an evaluation and invokes the symbolic debugger built into the Q interpreter, just as if the user typed Ctl-C. This operation allows you to set breakpoints in a script. For instance,

 
foo X           = break || bar X;

causes the debugger to be invoked as soon as the rule is executed. If the break flag is off then this operation has no effect. In any case, the break function returns ().

The fail function is used to abort the current rule, just like a failing qualifier. The difference is that fail can be used anywhere in the right-hand side or qualifier of a rule, and causes the rule to be exited immediately. This allows you to handle complicated error conditions which occur while a rule is already being executed, and also provides an efficient method for implementing backtracking algorithms. For instance, here is a quick solution for the famous "N queens" problem:

 
queens N        = search N 1 1 [];

search N I J P  = write P || writes "\n" if I>N;
                = search N (I+1) 1 (P++[(I,J)]) || fail if safe (I,J) P;
                = search N I (J+1) P if J<N;
                = () otherwise;

safe (I1,J1) P  = not any (check (I1,J1)) P;

check (I1,J1) (I2,J2)
                = (I1=I2) or else (J1=J2) or else
                  (I1+J1=I2+J2) or else (I1-J1=I2-J2);

This algorithm prints out all valid placements (i.e., lists of (row,column) pairs) of N queens on an N times N board. Note the use of fail in the second equation for the search function, which causes the rule to fail after the current placement has been tried, after which evaluation proceeds with the third rule which places the queen into the next column. This turns the definition into a backtracking algorithm. The safe function verifies that a placement (I1,J1) does not put the queen in check with any of the other queens which have already been placed.

The _FAIL_ function works in a fashion analogous to fail, but makes an entire reduction fail (not just a single equation). This is similar to throwing an exception (see below) in that the evaluation of the current redex is aborted immediately, but does not actually raise any exception; instead, it effectively forces the redex to be a normal form, regardless of how many subsequent equations might still be applicable. This operation is useful if you want to check for certain error conditions before trying any subsequent equations. For instance:

 
foo X:Num       = _FAIL_ if X<=0;
                = bar X if X>1;
                = bar (1/X) otherwise;

The catch function allows you to handle both "hard exceptions" which are raised when the interpreter encounters one of the runtime error conditions discussed in Error Handling, and "soft exceptions" which are raised by the functions halt and quit already explained above, and the throw function which is discussed below. Moreover, the interpreter also handles certain signals sent to it, e.g., via the kill(2) system call, by raising appropriate exceptions. In the current implementation, by default only the signals SIGINT ("break"), SIGTERM ("terminate") and SIGHUP ("hangup") are handled; the latter two are both treated as termination requests, like a call to the quit function. Signal handlers can also be installed by the running script using the trap function discussed below.

Both arguments of catch are special. The catch function first evaluates its second argument, the target expression and, if all is well, returns the value of that expression. Otherwise, if any exception was raised during the evaluation of the target expression, it evaluates the first argument (the exception handler), applies it to the value of the exception, and returns the result computed by the exception handler.

In the case of a runtime error condition and the exceptions raised with the halt and quit functions, the exception value is of the form syserr N where N is one of the integer error codes listed below:

Let's try this by forcing a stack overflow condition:

 
==> iter 100000000 (+) 1 
! Stack overflow
>>> iter 100000000 (+) 1
                        ^

==> catch exception (iter 100000000 (+) 1)
exception (syserr 5)

The syserr constructor belongs to the built-in SysException type, which in turn is a subtype of Exception. These types may be thought of as being predefined as follows:

 
public type Exception;
public type SysException : Exception = const syserr N;

User-defined exceptions can be generated with throw. The throw function raises an exception whose value is given by its single (non-special) argument. For instance:

 
hd []           = throw '(hd []);
exception X     = writes "Exception: " || write X || writes "\n";

With these definitions we have:

 
==> catch exception (hd [1..3])
1

==> catch exception (hd [1..0])
Exception: '(hd [])
()

==> hd [1..0]
! Exception
'(hd [])
>>> hd [1..0]
             ^

Note that if an exception raised with throw is not handled with a corresponding catch, then the interpreter aborts the evaluation and prints an error message (and it will also invoke the debugger to report the rule which generated the exception, if break is on).

The catch and throw functions can also be used to implement non-local value returns. For instance, here is a variation of the N queens function which returns the first solution as soon as it has been found during backtracking:

 
queens1 N       = catch id (search1 N 1 1 []);

search1 N I J P = throw P if I>N;
                = search1 N (I+1) 1 (P++[(I,J)]) || fail if safe (I,J) P;
                = search1 N I (J+1) P if J<N;
                = () otherwise;

Here, the "exception handler" is just the identity function id, as defined by the standard library, see Standard Functions.

Another neat programming trick is the following which makes a rule fail if an exception occurs during evaluation of the right-hand side:

 
foo X           = catch fail (bar X);

This works because the handler argument, which is special, is evaluated only if an exception is actually encountered during evaluation of the target expression. Otherwise, foo X will just return the value of bar X. The _FAIL_ function can be used in a similar manner.

Lisp and C++ programmers will probably miss an additional parameter which restricts the kind of exceptions handled with catch. You can implement this easily yourself by checking the exception value in your handler function, and "throw on" an unknown exception to the next enclosing catch:

 
type MyException : Exception = const empty_list;

hd []           = throw empty_list;

myexception X:MyException
                = writes "myexception: " || write X ||
                  writes "\n" || halt;
myexception X   = throw X otherwise;

Note how we implemented the empty_list exception using our own Exception subtype MyException. This method of structuring error exceptions is recommended because it makes it easier to spot the source of an exception. It also enables us to use a type guard (cf. Types) in order to check for specific exception types, rather than having to discriminate over different exception patterns.

Finally, the trap function allows you to have exceptions be generated when a signal arrives. The trap function takes two arguments, ACT and SIG, denoting the action to be performed and the number of the signal to be trapped. It returns the previous action associated with the signal. ACT must be an integer value. If it is positive then the signal raises an exception of the form syserr (-SIG); if it is negative then the signal is completely ignored; and if it is zero then the interpreter reverts to the default handling of the signal. You can also use trap to redefine the default handling of the break and termination signals. To allow signals to be caught reliably, further signals are "blocked" (i.e., they are queued for later delivery) while catch evaluates an exception handler. For instance, here is a little script which sets up some signal handlers and keeps on printing the trapped signals until it either gets terminated with kill -9 or the timed wait with the sleep function times out. (The sleep function is described below. The symbolic signal constants and the getpid and printf function are provided by the clib module, see Clib.)

 
test            = do (trap 1) [SIGINT,SIGTSTP,SIGTERM,SIGHUP,SIGQUIT] ||
                  printf "I'm running as pid %d, try to kill me!\n" getpid ||
                  loop;

loop            = flush || catch sig (sleep 100);

sig (syserr K)  = printf "Hey, I got signal %d.\n" (-K) || loop;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

10.8 Miscellaneous Functions

This group consists of some special operations which do not fit into any of the other groups treated above.

version, sysinfo

determine version/system information

which S

determine absolute pathname of the executing script or of another file on Q library path

time

get the system time

sleep X

pause for some time

isspecial X

predicate checking whether argument is a special form

isconst X

predicate checking whether argument is a constant

isfun X, isvar X

predicates checking whether argument is a function or a variable symbol, respectively

isdef X

predicate checking whether the variable X has been assigned a value

flip F

flip arguments of a binary function

The version and sysinfo functions are useful when writing code which depends on a particular version of the Q interpreter or a machine/operating system type. They return a string describing the Q interpreter version and the host system on which it is installed, respectively.

The which function performs a search for the given filename on the Q library path. If the file is found then the absolute pathname is returned; otherwise the function fails. The function can also be invoked with the argument (), in which case the full pathname of the executing main script is returned. This function is most useful if a script has to locate additional data files which are stored along with it, see Running Scripts from the Shell.

Two functions are provided for timing purposes. The time function returns, as a floating point value, the time in seconds since the "epoch" (00:00:00 UTC, January 1, 1970). The sleep function suspends evaluation for the given time (integer or floating point value, again in seconds). Be warned that the actual resolution of these functions depends on the timing routines available on your system, and even if a decent resolution is provided, the accuracy of results will depend on various factors such as system load.

The isspecial, isconst, isfun, isvar and isdef predicates are all special forms checking whether their single argument has a certain property. The isspecial function allows you to determine whether its argument is a special form which will receive its next argument unevaluated. The isconst operation checks whether its argument is a constant, i.e., a constant belonging to any of the built-in types, a variable or function symbol declared with const, or an application whose head element is such a constant. The isfun and isvar predicates are used to check whether an expression is a function symbol (built-in or defined by the loaded script, can also be a constant symbol), or a free variable symbol, respectively. The isdef function can be used to check whether its argument (a variable) has been assigned a value using def.

The flip function exchanges arguments of a (binary) function, as if it was defined by the following equation:

 
flip F X Y              = F Y X;

This function is used internally to implement operator sections with missing left argument. E.g., (+1) is nothing but syntactic sugar for flip (+) 1. You can apply flip to other, user-defined functions as well; thus, flip foo X denotes the function which maps Y to foo Y X.

In order to handle special forms like, e.g., (or else) correctly, flip automagically adjusts to the argument pattern of its first argument. That is, if the function given as the first argument (which is always evaluated) has a special first (or second) argument, then flip's second (or first) argument will be special as well.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11. The Standard Library

This chapter gives an overview of the data types and operations provided by the standard library modules supplied with the Q programming system. Most of these scripts are already "preloaded" when you run the interpreter, because they are included in the "prelude" script prelude.q. The exact collection of library scripts may depend on your local setup, but a basic installation of the Q core package should at least provide the modules listed below. In this list, (*) indicates "external" modules which are (mostly) implemented in C, whereas (+) denotes modules which have to be imported explicitly, as they are not included in the prelude.

assert.q

print diagnostics

clib.q

basic C interface (*)

complex.q

complex numbers

cond.q

conditional expressions and tuple/list/stream comprehensions

error.q

print error messages

getopt.q

GNU getopt compatible option parser (+)

graphics.q

PostScript graphics interface (+)

math.q

mathematical functions

prelude.q

standard prelude

rational.q

rational numbers

reftypes.q

tuples, lists and streams of references (+)

sort.q

algorithms to sort a list

stddecl.q

shared declarations of the container data structures, cf. stdtypes.q (+)

stdlib.q

a collection of common standard functions

stdtypes.q

a collection of efficient container data structures (+)

stream.q

a lazy list data structure

string.q

additional string functions

system.q

POSIX system interface (*) (+)

tuple.q

additional tuple functions

typec.q

type-checking predicates

Here is a quick rundown of the most important modules: The prelude.q script implements the standard prelude, which loads the entire collection (except the modules marked with (+) in the list above, which must be imported explicitly). It adds some convenient operations for comparing lists, streams and tuples, the syntactic inequality operator `!=', as well as successor and predecessor functions on integers (succ and pred) and default implementations of various functions for subtypes derived from the Real type. It also contains definitions for some cases of list, tuple and stream enumerations which are not provided as builtins.

Specifically, the standard prelude overloads the =, <>, <, >, <= and >= operators with rules for deciding equality and inequality of lists, streams and tuples, and rules for ordering lists and streams lexicographically. That is, tuples, lists and streams can be compared with = and <>, which is done by checking that the operands are of the same size and recursively comparing all members of the operands. Lists and streams can also be compared lexicographically by recursively comparing the list or stream members; the first unequal members decide the comparison. This works just like the lexicographic comparison of strings. Thus, e.g., [1,2] is considered to be less than both [1,3] and [1,2,1], but more than [1,1,3] or [0,5].

Moreover, the prelude defines the syntactic inequality operator `!=' as the logical negation of the built-in `==' operator, cf. Non-Linear Equations. Like `==', `!=' is implemented as a special form.

Last but not least, the standard prelude also provides default definitions for arithmetic, relational, numeric and conversion functions on Real, so that a minimal implementation of a user-defined type derived from Real (or any of its subtypes) just needs to define the float function (which is needed to coerce values of the type to Float) and then add definitions for those operations which need special treatment. The default definitions in prelude.q are at the lowest priority level so that they can always be overridden if needed.

The stdlib.q script has those operations which will probably be used most frequently by the average programmer; it contains a lot of additional list processing functions and other useful stuff mostly adopted from [Bird/Wadler 1988]. This module is also included by many other library scripts. Historically, it was the first script written for the standard library.

The stream.q script extends most list operations defined in stdlib.q to streams, i.e., lazy lists. It also provides a number of other useful stream operations.

Similarly, the string.q and tuple.q scripts provide additional operations on strings and tuples, and also extend some list operations to these data types.

The stdtypes.q script simply includes all modules which implement common container data structures; currently these are array.q, bag.q, dict.q, hdict.q, heap.q and set.q, see Standard Types. Some declarations shared by these modules are in the stddecls.q script.

The complex.q and rational.q scripts implement complex and rational number types along with the usual operations on these types. These have been designed to integrate seamlessly into Q's system of built-in number types so that expressions involving arithmetic operations with any combination of integer, floating point, complex and rational arguments work as expected.

The clib.q and system.q modules give access to various useful system functions from the C library, including C-style formatted I/O, binary file I/O, process and thread management, internationalization support and regular expression routines. The clib.q module also provides mutable expression cells ("references"), as well as a "byte string" data type which lets you represent arbitrary C data, and it overrides various important standard library functions with much faster C versions. The operations of these modules are mostly written in C, using the C language interface discussed in C Language Interface. These modules are described in their own chapter, see Clib.

As of Q 7.8, the reftypes.q module provides additional convenience operations to work with mutable expression sequences implemented as tuples, lists or streams of references. These are also described in the "Clib" chapter, see Expression References.

The Q programming system comes bundled with some additional modules for interfacing to various third-party libraries and software packages. At the time of this writing, these are:

Preliminary documentation for these modules can be found in the etc subdirectory of your Q installation directory (usually in /usr/share/q or /usr/local/share/q). Other useful add-on modules are available as separate source packages. At the time of this writing this comprises additional GUI libraries (including a complete interface to Trolltech's very nice "Qt" toolkit), a bunch of graphics and multimedia modules (including interfaces to OpenGL, OpenAL, the Xine video library, and Grame's MidiShare and Faust) and more. Ready-made plugins for using the Q interpreter from the "Chicken" Scheme compiler, the Apache webserver and Miller Puckette's graphical computer music and multimedia environment "PureData" are also available. Please check out the Q website at http://q-lang.sourceforge.net for more information about these. (The Q module for Chicken, which was written by John Cowan, is available separately from the Chicken website, see http://www.call-with-current-continuation.org.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.1 Standard Functions

The stdlib.q script provides frequently used list operations and other stuff mostly adopted from [Bird/Wadler 1988].

abs X

absolute value of X

all P Xs

verify that each element of the list Xs satisfies the predicate P

any P Xs

verify that the list Xs contains an element satisfying predicate P

append Xs X

append a value X to the list Xs

cat Xs

concatenate a list of lists

cons X Xs

prepend an element to a list

cst X

constant-valued function: cst X Y ⇒ X

curry F

curry a function: turn a function operating on pairs into a function with two arguments

curry3 F

curry with three arguments

do F Xs

apply a function F to each member of a list Xs, return ()

dowith F Xs Ys

take two lists and apply a binary function to corresponding elements, return ()

dowith3 F Xs Ys Zs

dowith with three lists

drop N Xs

remove the first N elements from the list Xs

dropwhile P Xs

remove elements from the beginning of Xs while the predicate P is satisfied

eq X Y

syntactic equality (cf. Non-Linear Equations)

filter P Xs

filter a list with a predicate

foldl F A Xs

fold-left

foldl1 F Xs

fold-left over nonempty lists

foldr F A Xs

fold-right

foldr1 F Xs

fold-right over nonempty lists

hd Xs

return the head element of a list

hds Xs

return the list of all head elements in a list of lists

id

the identity function: id X ⇒ X

init Xs

return list Xs without its last element

iter N F A

generate the list of the first N values A, F A, F (F A), …

last Xs

return the last element of a list

map F Xs

apply function F to each member of a list

max X Y

maximum of two values

min X Y

minimum of two values

mklist X N

create a list of N X's

neg P

negate a predicate

neq X Y

syntactic inequality

null Xs

check whether a list is empty ([])

nums N M

generate a list of numbers in a given range

numsby K N M

generate a list of numbers with a given step size

pop Xs

remove the head element from a list

prd Xs

product of a list of numbers

push Xs X

prepend an element to a list (cons with arguments reversed)

reverse Xs

reverse a list

scanl F A Xs

apply foldl to every initial part of a list

scanl1 F Xs

apply foldl1 to every nonempty initial part of a list

scanr F A Xs

apply foldr to every final part of a list

scanr1 F Xs

apply foldr1 to every nonempty final part of a list

sgn X

sign of a number

sum Xs

sum of a list of numbers

take N Xs

select the first N elements from the list Xs

takewhile P Xs

select elements from the beginning of Xs while the predicate P is satisfied

tl Xs

remove the head element from a list

tls Xs

return a list of lists with all head elements removed

top Xs

return the head element from a list

transpose Xs

transpose a list of lists

uncurry F

uncurry a function: turn a function with two arguments into a function operating on pairs

uncurry3 F

uncurry with triples

until P F X

repeat applying F to X until P is satisfied

unzip Xs

transform a list of pairs into a pair of lists

unzip3 Xs

unzip with triples

while P F A

list repeated applications of F to A while P is satisfied

zip Xs Ys

take two lists and return a list of corresponding pairs

zip3 Xs Ys Zs

zip with three lists

zipwith F Xs Ys

take two lists and map a binary function to corresponding elements

zipwith3 F Xs Ys Zs

zipwith with three lists

Note that some of these functions are actually overridden with much faster C versions in the clib.q module; see C Replacements for Common Standard Library Functions.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.2 Tuple Functions

The tuple.q script provides some additional functions operating on tuples:

fst Xs

return first element of a tuple

mktuple X N

create a tuple of N X's

pair X Y

construct a pair

snd Xs

return second element of a tuple

trd Xs

return third element of a tuple

triple X Y Z

construct a triple

tuplecat Xs

concatenate a list of tuples

Moreover, the following list functions from stdlib.q are overloaded to also work on tuples: append, cat, cons, do, dowith, dowith3, map, null, pop, push, reverse, top. Most of these return tuples when applied to such arguments, except null (which returns a truth value), the do/dowith functions (which return ()) and cat (which always returns a list; use tuplecat if you want to concatenate a collection of tuples instead).

Also note that the reverse and tuplecat operations are actually overridden with much faster C versions in the clib.q module; see C Replacements for Common Standard Library Functions.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.3 String Functions

The string.q script provides a collection of additional string functions. Currently the following operations are implemented (most of these operations are actually overridden with much more efficient C implementations in the clib.q module; see C Replacements for Common Standard Library Functions):

chars S

return the list of individual characters in S

join DELIM Xs

concatenate a list of strings, interpolating the given DELIM string between each pair of consecutive strings in the list

mkstr S N

create a string consisting of N copies of the given string S

split DELIM S

split a string into a list of substrings delimited by characters in the given DELIM string

strcat Xs

concatenate a list of strings

Moreover, as of Q 7.8 this module overloads all list operations in stdlib.q so that they work on strings as expected. This lets you use strings mostly as if they were just another kind of list value. For instance:

 
==> take 3 "abcdef"
"abc"

==> dropwhile (<="c") "abcdef"
"def"

==> tl "abcdef"
"bcdef"

==> zip "abcdef" "ABCDEF"
[("a","A"),("b","B"),("c","C"),("d","D"),("e","E"),("f","F")]

==> map (+1) "HAL"
"IBM"

==> map ord "HAL"
[72,65,76]

==> [C-1 : C in "IBM"]
["H","A","L"]

(As the last example shows, strings can also serve as sources in the binding clauses of list comprehensions; cf. Conditionals and Comprehensions.)

Here's a somewhat more practical example which illustrates the use of various list operations on strings to build a ROT13 translation table:

 
import dict;

def CHARS = strcat ["a".."z"], RCHARS = drop 13 CHARS ++ take 13 CHARS,
  ROT13 = dict $ zip (CHARS++toupper CHARS) (RCHARS++toupper RCHARS);

rot13 C:Char    = C where C:Char = ROT13!C;
                = C otherwise;
rot13 S:String  = map rot13 S;

(This employs the dict function to create a dictionary mapping lower- and uppercase letters to their equivalents in the ROT13 encoding; see Dictionaries.) Example:

 
==> CHARS; RCHARS
"abcdefghijklmnopqrstuvwxyz"
"nopqrstuvwxyzabcdefghijklm"

==> rot13 "This is an encoded string."
"Guvf vf na rapbqrq fgevat."

==> rot13 _
"This is an encoded string."

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.4 Option Parsing

This module provides one function, getopt, which takes two arguments: OPTS, a list of option descriptions in the format described below, and ARGS, a list of strings containing the command line parameters to be parsed for options. The result is a pair (OPTS,ARGS) where OPTS is a list of pairs of options and their arguments (if any; missing arguments are returned as ()), and ARGS is the list of remaining (non-option) arguments. Options are parsed using the rules of GNU getopt(1). If an invalid option is encountered (unrecognized option, missing or extra argument, etc.), getopt throws the offending option string as an exception.

The OPTS argument of getopt is a list of triples (LONG,SHORT,FLAG), where LONG denotes the long option, SHORT the equivalent short option, and FLAG is one of the symbolic integer values NOARG, OPTARG and REQARG which specifies whether the option has no argument, an optional argument or a required argument, respectively. In the returned option-value list, all options will be represented using their long option equivalents.

Also note that both the long and short option values in the OPTS argument may actually be of any type, so unneeded options may be replaced with corresponding dummy values. You only have to make sure that the values given for the long options allow you to identify which option was actually specified.

Finally, please note that, as already pointed out, this script does not belong to the prelude, and hence has to be imported explicitly if you want to use the getopt operation in your program.

Example (see C-Style Formatted I/O, for a description of printf and fprintf):

 
import getopt;

def OPTS        = [("--help", "-h", NOARG),
                   ("--foo",  "-f", REQARG),
                   ("--bar",  "-b", OPTARG)];

invalid_option PROG OPT
                = fprintf ERROR "%s: invalid option %s\n" (PROG,OPT) ||
                  halt;

test [PROG|ARGS]
                = printf "OPTS = %s\nARGS = %s\n" (str OPTS, str ARGS)
                    where (OPTS,ARGS) = catch (invalid_option PROG) $
                      getopt OPTS ARGS;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.5 Type-Checking Predicates

The typec.q script contains a collection of predicates which check whether an expression is of a given type. The following functions perform a purely syntactic check; they are all equivalent to the corresponding type guards:

isbool X

check for truth values

ischar X

check for single character strings

iscomplex X

check for complex numbers (cf. Complex Numbers)

isexcept X

check for Exception values (cf. Exception Handling)

isfile X

check for file objects

isfloat X

check for floating point numbers

isfunction X

check for lambda functions

isint X

check for integers

islist X

check for lists

isnum X

check for numbers

isrational X

check for rational numbers (cf. Rational Numbers)

isreal X

check for real numbers

isstr X

check for strings

issym X

check for function and variable symbols (special form)

istuple X

check for tuples

Note that the syntactic number predicates isint, isfloat etc. only check for a given representation. The typec.q module also provides various predicates which classify numbers according to the kind of abstract mathematical object they represent, regardless of the concrete representation:

iscompval X

check for complex values

isintval X

check for integer values

isratval X

check for rational values

isrealval X

check for real values

These predicates work mostly like their Scheme counterparts. E.g., 3 will be classified not only as an integer value, but also as a rational, real and a complex value. A complex number with zero imaginary part may also be classified as an integer, rational or real, depending on its real part. A floating point number with zero fractional part and a rational number with a denominator of 1 are both also classified as an integer value.

Moreover, the following predicates allow you to check whether a number is exact or inexact (inexact numbers generally involve floating point values), and whether a number represents an IEEE infinity or NaN ("not a number") value:

isexact X

check for exact numbers

isinexact X

check for inexact numbers

isinf X

check for infinite floating point values

isnan X

check for NaN floating point values

Finally, two other special-purpose predicates provided in typec.q are:

isenum X

check for enumeration type members

issym X

check for function and variable symbols (special form)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.6 Sorting Algorithms

The sort.q script provides mergesort and quicksort algorithms for sorting a list using a given order predicate:

msort P Xs

mergesort algorithm

qsort P Xs

quicksort algorithm

The mergesort algorithm is more involved than quicksort, but may run much faster if input lists are large enough and are already partially sorted. Both algorithms take an order predicate as their first argument, which makes it possible to sort lists using different criteria. The order predicate must be a function accepting two arguments, and must return true iff the first argument is strictly less than the second. For instance, to sort a list in ascending order, you could say:

 
==> qsort (<) [1,5,3,2,4]
[1,2,3,4,5]

By reversing the order predicate, the list is sorted in descending order:

 
==> qsort (>) [1,5,3,2,4]
[5,4,3,2,1]

Custom order predicates also allow you to sort a list according to different sort keys. For instance, the following example shows how to sort a list of pairs using the first component of each pair as the sort key:

 
==> def L = [(1,2),(5,1),(3,3),(1,1),(4,5)]

==> var le1 = \X Y.fst X < fst Y

==> qsort le1 L
[(1,2),(1,1),(3,3),(4,5),(5,1)]

Both algorithms provided by this module are "stable", i.e., they preserve the relative order of list elements with "equal" sort keys. This is important when successive sorts are used to order elements according to different criteria. For instance:

 
==> var le2 = \X Y.snd X < snd Y

==> qsort le2 L
[(5,1),(1,1),(1,2),(3,3),(4,5)]

==> qsort le1 _
[(1,1),(1,2),(3,3),(4,5),(5,1)]

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.7 Standard Types

Please note that, as of Q 7.8 this module is no longer included in the prelude, so you have to import it explicitly if your script needs one of the data types described here. It is also possible to just import the individual data type modules (array.q, bag.q, etc.) if you do not need the entire collection.

The stdtypes.q script implements a collection of efficient container data structures, which currently comprises arrays, heaps (priority queues), ordered sets and bags, and (ordered as well as hashed) dictionaries. The different data types are actually implemented in the scripts array.q, bag.q, dict.q, hdict.q, heap.q and set.q. Many oper