| [Top] | [Contents] | [Index] | [ ? ] |
The Q Programming Language, Version 7.11, 23 February 2008.
Copyright (c) 1992-2008 by Albert Gräf.
This manual has been published in the series Musikinformatik und Medientechnik, Bereich Musikinformatik, Musikwissenschaftliches Institut, Johannes Gutenberg-Universitaet Mainz
55099 Mainz
Germany
ISSN 0941-0309
The Q programming system is distributed under the terms of the GNU General Public License. See the software license accompanying the distribution for details.
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the author.
This document describes the Q programming language and system, version 7.11, 23 February 2008. Written by Albert Gräf, University of Mainz.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Q stands for "equational", so Q, in a nutshell, is a programming language which lets you "program by equations". You specify a system of equations which the interpreter uses as "rewrite rules" to reduce expressions to "normal form". This allows you to formulate your programs in a high-level, concise and declarative style. Q's development started out in the 1990s, motivated by the author's research on term pattern matching [Gräf 1991] and inspired by the pioneering work on equational programming by Michael O'Donnell and others [O'Donnell 1985], as an attempt to show that term rewriting would provide a feasible basis for a practical programming language. I think that this goal has been achieved, and that the present interpreter is efficient and robust enough for practical purposes. Q has been ported to a variety of operating systems, such as BeOS, FreeBSD, Linux, MacOS X, Solaris and Windows. Porting to other modern UNIX/POSIX environments should be a piece of cake. Thus Q can be used on most modern computer systems, and Q scripts written on one system will usually run on all supported platforms.
The Q language supports a rich variety of built-in types, like arbitrary precision integers, floating point numbers (double precision 64 bit), truth values, strings, tuples, lists, streams (a "lazy" list data structure with call-by-need evaluation), lambda functions and files. It also provides primitives for exception handling and multithreaded execution. Q scripts can be broken down into "modules", each with their separate namespace, and Q gives you full control over which symbols (function, variable and type names) are exported by each module. This makes it easy to organize large scripts in a modular fashion. Q also allows you to interface to "external" modules written in the C programming language, which provides a means to access functions in C and C++ libraries and employ C's higher processing speed for time-critical tasks. Conversely, Q scripts can also be executed from C and C++, which allows Q to be used as an embedded language or term rewriting engine in C/C++ applications.
As a practical programming language, Q comes with "batteries included". The standard library, which is mostly written in Q itself, provides rational and complex numbers, a lot of useful tuple, list and stream processing functions (including comprehensions), some common container data structures (dictionaries, sets, etc.), and an interface to the PostScript language. It also includes an extensive system interface which offers services such as binary and C-style formatted I/O, BSD socket I/O, process management, POSIX threads, regular expression matching and internationalization features. Additional extension modules provide interfaces to a number of other third party libraries, which turns Q into a practical tool for a variety of application areas.
In difference to other functional languages, Q is entirely based on the notions of rewrite rules, reductions and irreducible expressions (also known as normal forms) pertaining to the term rewriting calculus. A Q "program", called a script, consists of equations which are treated as rewrite rules and are used to reduce expressions to normal form. The normal form of an expression denotes its value, which can itself be a compound expression. Q has no rigid distinction between "constructor" and "defined" function symbols and it also allows you to evaluate expressions containing "free" variables. Basically, both sides of an equation may involve arbitrary expressions. Therefore Q can also be used as a tool for symbolic expression evaluation.
On the surface, Q looks very much like contemporary functional languages such as Miranda or Haskell. In fact, the syntax of the language has largely been inspired by the first edition of Introduction to Functional Programming by Richard Bird and Philip Wadler. However, Q is an interpreted language with dynamic typing and eager (leftmost-innermost) evaluation, which is more in line with classical functional languages such as Lisp. For the sake of efficiency, Q scripts are first translated into "bytecode" (an intermediate binary format) which is executed on a virtual stack machine. The interpreter automatically optimizes tail recursion, such that "iterative" algorithms can be realized in constant stack space. Besides the built-in I/O operations, the language is free of side-effects; in particular, Q itself does not have mutable variables (although the standard library provides such imperative features for those who need them). Q also provides two novel and (IMHO) interesting features: a notion of special forms which allows to handle both macro-like functions and lazy evaluation in a uniform setting without having to give up the basic eager evaluation strategy; and a notion of type guards which provides a means to cope with hierarchies of abstract data types (similar to the notion of classes with single inheritance in object-oriented languages) in the context of a term rewriting language.
Using Q is supposed to be fairly simple: you throw together some equations, start the interpreter and then type in the expressions you wish to evaluate. All this can be done with a few keystrokes, if you use the Emacs Q mode supplied with the package. A graphical user interface for Windows is also available. Of course, you can also run the Q programming utilities from the command line if you prefer that.
The manual is organized as follows. In Getting Started, we start out with a brief and informal introduction to the main features of the language, and a glimpse of how Q scripts look like. Lexical Matters, describes the lexical part of the Q language. In Scripts and Modules, we discuss how declarations, definitions and equations are put together to form a script, and how to break down larger scripts into a collection of smaller modules which can be managed separately. Declarations, discusses how certain entities like types, variables and function symbols can be declared in a Q script. Expressions, treats the syntax of Q expressions, and describes the built-in operators provided by the Q language. Equations and Expression Evaluation, is about equations and variable definitions, and how they are used in the evaluation process. Types, and Special Forms, describe the facilities provided by the Q language for dealing with abstract data types and deferred evaluation. Built-In Functions, and The Standard Library, discuss the built-in and standard library functions of the Q language. Clib, describes Q's "system module" which provides access to some important functions from the C library. The appendix gives additional information about some aspects of the language and its implementation. Q Language Grammar, contains a summary of the Q language syntax in BNF. Using Q, provides a description of the programming tools included in the Q programming system, C Language Interface, describes Q's interface to the C programming language, and Debugging, is a brief introduction to the symbolic debugger built into the Q interpreter. Finally, Running Scripts in Emacs, discusses how Q scripts can be edited and run using the Emacs editor.
Acknowledgements
This project has become much further reaching and took a lot longer than I anticipated when I started out working on it, and I wouldn't have been able to keep it going without the patience and support of my beloved wife Evelyn and children Sebastian, Janosch, Yannic and Miriam.
Next I have to acknowledge the support and help provided by colleagues at Mainz and Bremen (Germany) and at INRIA Lorraine (France) during the initial phases of this project. In 1992 Dieter Hofbauer invited me to the term rewriting group at INRIA Lorraine to discuss the design of the (then still nascent) Q with his colleagues and friends. During the following years, I also collaborated with Frank Drewes at Bremen (now at Umea, Sweden) who was involved in the initiation of this project, and Klaus Barthelmann at Mainz who provided many comments which helped to improve the early versions of this manual. Frank and Klaus tested various early beta versions of the Q interpreter, and contributed sample scripts as well as substantial parts of the standard Q library.
I'd also like to express my gratitude to the growing community of users of the Q language all over the world, in particular the members of the Q mailing list, for the lively discussions which helped to improve the language and its implementation, and for the contributed Q modules and applications. Special thanks are due to John Cowan for proofreading the manual, for his valuable suggestions concerning design and implementation issues and for his Q-Chicken interface, and to Rob Hubbard for his comprehensive rational number and polynomial modules which are now part of the distribution.
Last but not least, thanks are also due to the developers of ML and Haskell. While these people have not been involved with Q in any way, their work, which has changed the landscape of functional programming, has been a constant source of inspiration for me.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A new programming language is best conveyed through some examples, and therefore it is common practice to start a language description with an introductory chapter which treats the most essential language features in a fairly informal manner. This is also the purpose of the present chapter. We first show how some of the "standard features" of the Q programming language can be employed for using the Q interpreter effectively as a sophisticated kind of "desktop calculator". The remainder of this section then adresses the question of how you can extend the Q environment by providing your own definitions in Q scripts, and describes the evaluation process and the treatment of runtime errors in the interpreter.
| 2.1 Using the Interpreter | ||
| 2.2 Using the Standard Library | ||
| 2.3 Writing a Script | ||
| 2.4 Definitions | ||
| 2.5 Runtime Errors |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
To begin with, let us show how the Q interpreter is invoked to evaluate
some expressions with the built-in functions provided by the Q
language. Having installed the Q programming system, you can simply
invoke the interpreter from your shell by typing q. The
interpreter will then start up, display its sign-on, and leaves you at
its prompt:
==> |
This indicates that the interpreter is waiting for you to type an expression. Let's start with a simple one:
==> 23 23 |
Sure enough, that's correct. Whenever you type an expression, the interpreter just prints its value. Any integer is a legal value in the Q language, and the common arithmetic operations work on these values as expected. Q integers are "bignums", i.e., their size is only limited by available memory:
==> 16753418726345 * 991726534256718265234 16614809890429729930396098173389730 |
"Real" numbers (more exactly, their approximation using 64 bit double precision floating point numbers) are provided as well. Let's try these:
==> sqrt (16.3805*5)/.05 181.0 |
The sqrt identifier denotes a built-in function which computes
the square root of its argument. (A summary of the built-in functions
can be found in Built-In Functions.)
What happens if you mistype an expression?
==> sqrt (16.3805*5)/,05
! Syntax error
>>> sqrt (16.3805*5)/,05
^
|
As you can see, the interpreter not only lets you know that you typed
something wrong, but also indicates the position of the error. It is
quite easy to go back now and correct the error, using the interpreter's
"command history". With the up and down arrow keys you can cycle
through the expressions you have typed before, edit them, and resubmit a
line by hitting the carriage return key. Also note that what you typed
is stored in a "history file" when you exit the interpreter, which is
reloaded next time the interpreter is invoked. A number of other useful
keyboard commands are provided, see Readline: (readline)Command Line Editing section `Command Line Editing' in The GNU Readline Library. In
particular, you can have the command line editor "complete" function
symbols with the <TAB> key. E.g., if you type in sq and
press the <TAB> key, you will get the sqrt function.
Before we proceed, a few remarks about the syntax of function applications are in order. The first thing you should note is that in Q, like in most other modern functional languages, function application is simply denoted by juxtaposition:
==> sqrt 2 1.4142135623731 |
Multiple arguments are specified in the same fashion:
==> max 5 7 7 |
Parentheses can be used for grouping expressions as usual. In particular, if a function application appears as an argument in another function application then it must be parenthesized as follows:
==> sqrt (sqrt 2) 1.18920711500272 |
Another important point is that operators are in fact just functions in
disguise. You can turn any operator into a prefix function by enclosing
it in parentheses. Thus (+) denotes the function which adds its
two arguments, and X+1 can also be written as (+) X 1; in
fact, the former expression is nothing but "syntactic sugar" for the
latter, see Built-In Operators. You can easily verify this in the
interpreter:
==> (+) X 1 X+1 |
You can also have partial function applications like (*) 2 which
denotes a function which doubles its argument. Moreover, Q supports
so-called operator sections which allow you to specify a binary
operator with only either its left or right operand. For instance,
(1/) denotes the reciprocal and (+1) the "increment by
1" function:
==> (+1) 5 6 ==> (1/) 3 0.333333333333333 |
The interpreter maintains a global variable environment, in which you
can store arbitrary expression values. This provides a convenient means
to define abbreviations for frequently-used expressions and for storing
intermediate results. Variable definitions are done using the def
command. For instance:
==> def X = 16.3805*5 ==> X 81.9025 |
As indicated, variable symbols usually start with a capital letter; an
initial lowercase letter indicates a function symbol. This convention is
valid throughout the Q language. However, it is possible to explicitly
declare an uncapitalized symbol as a variable, using the var
command, and then assign values to it, like so:
==> var f ==> def f = sqrt |
You can also both declare a variable and initialize its value with a
single var command:
==> var f = sqrt ==> f X/.05 181.0 |
Multiple variable declarations and definitions can be entered on a
single line, using commas to separate different definitions in a
def or var command, and you can separate multiple
expressions and commands on the same line with semicolons:
==> def X = 16.3805*5, f = sqrt; X; f X/.05 81.9025 181.0 |
You can also bind several variables at once by using an expression pattern as the left-hand side of a variable definition:
==> def (f, X) = (sqrt, 16.3805*5); f X/.05 181.0 |
(Such pattern-matching definitions only work with def; the
left-hand side of a var declaration must always be a simple
identifier.)
Another useful feature is the built-in "anonymous" variable `_', which is always set to the value of the most recent expression value printed by the interpreter:
==> _ 181.0 ==> 2*_ 362.0 |
Sometimes you would also like the interpreter to "forget" about a
definition. This can be done by means of an undef statement:
==> undef X; X X |
Besides def, undef and var, the interpreter
provides a number of other special commands. The most important command
for beginners certainly is the help command, which displays the
online manual using the GNU info reader. You can also run this command
with a keyword to be searched in the info file. For instance, to find
out about all special commands provided by the interpreter, type the
following:
==> help commands |
(Type q when you are done reading the info file.)
Other useful commands are who which prints a list of the
user-defined variables, and whos which describes the attributes
of a symbol:
==> who
f
==> whos f sqrt
f user-defined variable symbol
= sqrt
sqrt builtin function symbol
sqrt X1
|
You can save user-defined variables and reload them using the
save and load commands:
==> save saving .q_vars ==> def f = 0; f 0 ==> load loading .q_vars ==> f sqrt |
Some statistics about the most recent expression evaluation can be
printed with the stats command:
==> sum [1..123456] 7620753696 ==> stats 0.52 secs, 246916 reductions, 246918 cells |
The stats command displays the (cpu) time needed to complete an
evaluation, the number of "reductions" (a.k.a. basic evaluation steps)
performed by the interpreter, and the number of expression "cells"
needed during the course of the computation. This information is often
useful for profiling purposes.
Other commands allow you to edit and run a script directly from the interpreter, and to inspect and set various internal parameters of the interpreter; see Command Language.
Of course, the Q interpreter can also carry out computations on non-numeric data. In fact, Q provides a fairly rich collection of built-in data types, and makes it easy to define your own. For instance, strings are denoted by character sequences enclosed in double quotes. Strings can be concatenated with the `++' operator and the length of a string can be determined with `#'. Moreover, the character at a given position can be retrieved with the (zero-based) indexing operator `!':
==> "abc"++"xyz"; #"abc"; "abc"!1 "abcxyz" 3 "b" |
The same operations also apply to lists, which are written as sequences of values enclosed in brackets:
==> [a,b,c]++[x,y,z]; #[a,b,c]; [a,b,c]!1 [a,b,c,x,y,z] 3 b |
As in Prolog, lists are actually represented as right-recursive
constructs of the form `[X|Xs]' where X denotes the head
element of the list and Xs its tail, i.e., the list of the
remaining elements. This representation allows a list to be traversed
and manipulated in an efficient manner.
==> def [X|Xs] = [a,b,c]; X; Xs a [b,c] ==> [0|Xs] [0,b,c] |
Another useful list operation is "enumeration", which works with all
so-called "enumeration types" (cf. Enumeration Types), as well
as characters and numbers. Enumerations are constructed with the
enum function, or using the familiar [X..Y] notation which
is in fact only "syntactic sugar" for enum:
==> enum "a" "k" ["a","b","c","d","e","f","g","h","i","j","k"] ==> ["a".."k"] ["a","b","c","d","e","f","g","h","i","j","k"] ==> [0..9] [0,1,2,3,4,5,6,7,8,9] ==> [0.1,0.2..1.0] [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0] |
Tuples are sequences of expressions enclosed in parentheses, which are Q's equivalent of "vectors" or "arrays" in other languages. Like lists, tuples can be concatenated, indexed and enumerated, but they use an internal representation optimized for efficient storage and constant-time indexing:
==> (a,b,c)++(x,y,z); #(a,b,c); (a,b,c)!1 (a,b,c,x,y,z) 3 b ==> (0..9) (0,1,2,3,4,5,6,7,8,9) |
Q also offers built-in operations to convert between lists and tuples:
==> tuple [a,b,c] (a,b,c) ==> list _ [a,b,c] |
Q provides yet another list-like data structure, namely "lazy" lists a.k.a. streams, which are written using curly braces instead of brackets. In difference to lists, streams only produce their elements "on demand", when they are needed in the course of a computation. This makes it possible to represent large and even infinite sequences of values in an efficient manner. You can see this somewhat unusual behaviour if you try streams interactively in the interpreter. For instance, let's try to concatenate two streams:
==> {a,b,c}++{x,y,z}
{a|{b,c}++{x,y,z}}
|
You probably noticed that the tail of the resulting stream,
{b,c}++{x,y,z}, has not been evaluated yet. But that is all
right, because it will be evaluated as soon as we need it:
==> _!4 y |
This lazy way of doing things becomes especially important when we work with infinite streams. We will have more to say about this in the following section.
It is quite instructive to keep on playing a bit like this, to get a feeling
of what the Q interpreter can do. You might also wish to consult
Built-In Functions, which discusses the built-in functions, and
Using Q, which shows how the interpreter is invoked and what additional
capabilities it offers. To exit the interpreter when you are finished,
invoke the built-in quit function:
==> quit |
This function does not return a value, but immediately returns you to the operating system's command shell.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The standard library is a collection of useful functions and data
structures which are not provided as built-ins, but are
implemented by scripts which are mostly written in Q itself. Most of
these scripts are included in the "prelude script" prelude.q,
which is always loaded by the interpreter, so that the standard library
functions are always available. See The Standard Library, for an
overview of these operations. You can check which standard library
scripts a.k.a. "modules" are actually loaded with the interpreter's
modules command. With the standard prelude loaded, this command
shows the following:
==> modules assert clib* complex cond error math prelude rational sort stdlib stream string tuple typec |
As you can see, there are quite a few modules already loaded, and you can use the functions provided by these modules just like any of the built-in operations. The standard library provides a lot of additional functions operating on numbers, strings and lists. For instance, you can take the sum of a list of (integer or floating point) numbers simply as follows:
==> sum [1..5] 15 |
In fact, the library provides a rather general operation, foldl,
which iterates a binary function over a list, starting from a given initial
value. Using the foldl function, the above example can also be
written as follows:
==> foldl (+) 0 [1..5] 15 |
(There also is a foldr function which works analogously, but
combines list members from right to left rather than from left to
right.)
Generalizing the notion of simple enumerations of numbers like
[1..5], the standard library also provides the general-purpose
list-generating function while. For instance, we can generate the
list of all powers of 2 in the range [1..1000] as follows:
==> while (<=1000) (2*) 1 [1,2,4,8,16,32,64,128,256,512] |
(Recall from the previous section that (2*) is the doubling
function. And (<=1000) is a predicate which checks its argument
to be less than or equal to 1000.)
If we want to generate a list of a given size, we can use iter
instead. So, for instance, we might compute the value of a finite
geometric series as follows:
==> sum (iter 4 (/3) 1) 1.48148148148148 |
The map function allows you to apply a function to every member of a
list. For instance, the following expression doubles each value in the list
[1..5]:
==> map (2*) [1..5] [2,4,6,8,10] |
Lists can also be filtered with a given predicate:
==> filter (>=3) [1..5] [3,4,5] |
The scanl operation allows you to compute all the sums of initial
segments of a list (or accumulate any other binary operation over a
list):
==> scanl (+) 0 [1..5] [0,1,3,6,10,15] |
(Like foldl, scanl also has a sibling called scanr
which collects results from right to left rather than left to right,
starting at the end of the list.)
You can partition a list into an initial part with a given length and
the remaining part of the list with take and drop:
==> take 3 [1..5]; drop 3 [1..5] [1,2,3] [4,5] |
The takewhile and dropwhile functions have a similar
effect, but partition their input list according to a given predicate:
==> takewhile (<=3) [1..5]; dropwhile (<=3) [1..5] [1,2,3] [4,5] |
Another useful list operation is zip which collects pairs of
corresponding elements in two input lists:
==> zip [1..5] ["a".."e"] [(1,"a"),(2,"b"),(3,"c"),(4,"d"),(5,"e")] |
The effect of zip can be undone with unzip which returns a
pair of lists:
==> unzip _ ([1,2,3,4,5],["a","b","c","d","e"]) |
The zipwith function is a generalized version of zip which
combines corresponding members from two lists using a given binary
function:
==> zipwith (*) [1..5] [1..5] [1,4,9,16,25] |
All the operations described above - and many others - are provided by
the stdlib.q script. It is instructive to take a look at this
script and see how the operations are defined.
In addition, the standard library provides yet another list generating
function listof which implements so-called "list
comprehensions". These allow you to describe list values in much the
same way as sets are commonly specified in mathematics. For instance,
you can generate a list of pairs (I,J) with 1<=J<I<=5 as
follows:
==> listof (I,J) (I in [1..5], J in [1..I-1]) [(2,1),(3,1),(3,2),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4)] |
The language provides syntactic sugar for list comprehensions so that the above example can also be written as follows:
==> [(I,J) : I in [1..5], J in [1..I-1]] [(2,1),(3,1),(3,2),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4)] |
The same kind of construct also works with tuples:
==> ((I,J) : I in [1..5], J in [1..I-1]) ((2,1),(3,1),(3,2),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4)) |
We also have a random number generator, which is implemented by the
built-in random function. Here is how we can generate a list of 5
pseudo random 32 bit integers:
==> [random : I in [1..5]] [1960913167,1769592841,3443410988,2545648850,536988551] |
To get random floating point values in the range [0,1] instead, we
simply divide the results of random by 0xffffffff:
==> map (/0xffffffff) _ [0.456560674928259,0.41201544027124,0.801731596887515,0.592705060400233, 0.125027389993199] |
Lists can be sorted using quicksort (this one comes from sort.q):
==> qsort (<) _ [0.125027389993199,0.41201544027124,0.456560674928259,0.592705060400233, 0.801731596887515] |
As already mentioned, the Q language also provides a "lazy" list data
structure called "streams". Streams are like lists but can actually be
infinite because their elements are only produced "on demand". The
standard library includes a module stream.q which implements a
lot of useful stream operations. Most list operations carry over to
streams accordingly. For instance, if we want to create a geometric
series like the one generated with iter above, but we do not know
how many elements will be needed in advance, we can employ the stream
generation function iterate:
==> iterate (/3) 1
{1|iterate (/3) ((/3) 1)}
|
The {|} stream constructor works in the same way as the
[|] list constructor, but is "special" in that it does
not evaluate its arguments, i.e., the head and the tail of the
stream. (Otherwise the call to iterate would never terminate.)
To get all members of the series we can apply the scanl function:
==> def S = scanl (+) 0 _; S
{0|scanl (+) (0+1) (iterate (/3) ((/3) 1))}
|
Now we can extract any number of initial values of the series using the
take operation, and convert the resulting stream to an ordinary
list with the list function. For instance, if we are interested
in the first five values of the series, we proceed as follows:
==> list (take 5 S) [0,1,1.33333333333333,1.44444444444444,1.48148148148148] |
Note that the stream S is really infinite; if we want, we can
extract any value in the series:
==> S!9999 1.5 |
Let's see how many iterations are actually required to reach the limit
1.5 with an error of at most 1e-15:
==> #takewhile (>1e-15) (map (1.5-) S) 32 |
This means that the sum of the first 31 series terms is needed to get an
accuracy of 15 digits (which is the best that we can hope for with 64
bit floating point values). We can readily verify this using
iter:
==> sum (iter 31 (/3) 1) 1.5 |
The standard library also implements stream enumerations and
comprehensions which work like the corresponding list and tuple
constructs but are evaluated in a lazy fashion, and the interpreter
provides syntactic sugar for these. In particular, the notation
{X..} or {X,Y..} can be used to specify an infinite
arithmetic sequence. For instance, here is another way to define the
infinite geometric series from above:
==> def S = scanl (+) 0 {1/3^N : N in {0..}}
==> list (take 5 S)
[0,1.0,1.33333333333333,1.44444444444444,1.48148148148148]
|
It is worth noting here that streams are just one simple example of a lazy data structure, which happen to play an important role in many useful algorithms so that it makes sense to provide them as a builtin. The Q language makes it rather easy to define your own lazy data structures using so-called special forms. In fact, special forms provide a uniform means to define both lazy data constructors and macro-like functions which take their arguments unevaluated, employing "call by name" parameter passing. This will be discussed in detail in Special Forms.
Special forms are an integrated part of Q which adds a great amount of
expressive power to the language. In particular, they allow specialized
constructs such as a "short-circuit" conditional expressions to be
defined in the standard library, just like "ordinary" functions,
rather than having to provide them as builtins. One example is the
conditional expression if X then Y else Z which is nothing but
syntactic sugar for the standard library function ifelse:
==> ifelse (5>0) "positive" "negative" "positive" ==> if 5>0 then "positive" else "negative" "positive" |
Another special form which is useful in many situations is lambda
(as of Q 7.1, this one is actually provided as a builtin), which allows
you to create little "anonymous" functions on the fly. The customary
notation \X.Y is provided as a shorthand for lambda X
Y. These constructs are also called lambda abstractions, or
simply lambdas. For instance, the following interpreter command
defines the well-known factorial function using a lambda abstraction and
assigns the resulting function object to the variable fac:
==> var fac = \N.if N>0 then N*fac (N-1) else 1 ==> fac \X1 . if X1>0 then X1*fac (X1-1) else 1 ==> map fac [1..10] [1,2,6,24,120,720,5040,40320,362880,3628800] |
Lambdas taking multiple arguments can be created like this:
==> \X Y.(1-X)*Y \X1 . \X2 . (1-X1)*X2 ==> _ 0.9 0.5 0.05 |
The lambda arguments can also be patterns to be matched against the actual parameters. For instance:
==> \(X,Y).(1-X)*Y \(X1,X2) . (1-X1)*X2 ==> _ (0.9,0.5) 0.05 |
Besides string and list processing functions and general utility
functions of the kind sketched out above, the standard library also
includes a collection of efficient "container" data structures which
are useful in many applications. For instance, so-called hashed
dictionaries (also known as "hashes" or "associative arrays") are
implemented by the hdict.q script. The following expressions show
how to create a dictionary object from a list, and how to access this
dictionary using the index (!), keys and vals
operations. (Note that here we have to load the hdict.q module
explicitly, as it is not automatically included via the prelude. We
could also import the stdtypes.q module instead, to get hold of
all the container types.)
==> import hdict ==> def D = hdict [(foo,99),(bar,-1),(gnu,foo)]; D!gnu foo ==> keys D [foo,bar,gnu] ==> vals D [99,-1,foo] |
This completes our little tour through the standard library. To find out more, please refer to The Standard Library, or take a look at the scripts themselves.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Now that we have taken the first big hurdle and are confident that we can actually get the Q interpreter to evaluate us some expressions, let us try to define a new function for use in the interpreter. That is, we are going to write our first own Q "program" or "script".
In the Q language, there are actually two different ways to create new functions. As we have already seen in the previous sections, little "anonymous" functions are often computed on the fly by deriving them from existing functions using partial applications or lambda abstractions. The other way, which is often more convenient when the functions to be defined get more complicated, is the definition of named functions by the means of "equations", which works in a way similar to algebraic definitions in mathematics. We will discuss how to do this in the following.
Having exited the Q interpreter, invoke your favourite text editor and enter the following simple script which implements the factorial function:
fac N = N*fac(N-1) if N>0;
= 1 otherwise;
|
(Note that in order to define functions equationally, you really have to write a script file. For technical reasons there currently is no way to enter an equational definition directly in the interpreter.)
Save the script in the file fac.q and then restart the
interpreter as follows (here and in the following `$' denotes the
command shell prompt):
$ q fac.q |
You can also edit the script from within the interpreter (using vi or
the editor named by the EDITOR environment variable) and then restart
the interpreter with the new script, as follows:
==> edit fac.q ==> run fac.q |
In any case, now the interpreter should know about the definition of
fac and we can use it like any of the built-in operations:
==> map fac [1..10] [1,2,6,24,120,720,5040,40320,362880,3628800] |
For instance, what is the number of 30-element subsets of a 100-element set?
==> fac 100 div (fac 30*fac 70) 29372339821610944823963760 |
As another example, let us write down Newton's algorithm for computing
roots of a function. Type in the following script and save it to the
file newton.q:
newton DX DY F = until (satis DY F) (improve DX F); satis DY F X = abs (F X) < DY; improve DX F X = X - F X / derive DX F X; derive DX F X = (F (X+DX) - F X) / DX; |
Restart the interpreter with the newton.q script and try the
following. Note that the target function here is \Y.Y^3-X which
becomes zero when Y equals the cube root of X.
==> var eps = .0001, cubrt = \X.newton eps eps (\Y.Y^3-X) X ==> cubrt 8 2.00000000344216 |
Well, this is not that precise, but we can do better:
==> def eps = 1e-12 ==> cubrt 8 2.0 |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As mentioned in the previous section, in the Q language function definitions take the form of equations which specify how a given expression pattern, the left-hand side of the equation, is transformed into a new expression, the right-hand side of the equation. In this section we elaborate on this some more to give you an idea about how these equational definitions work. Our explanations will necessarily be a bit sketchy at this point, but the concepts we briefly touch on in this section will be explained in much more detail later.
The left-hand side of an equation may introduce "variables" (denoted
by capitalized identifiers, as in Prolog) which play the role of formal
parameters in conventional programming languages. For instance, the
following equation defines a function sqr which squares its
(numeric) argument:
sqr X = X*X; |
An equation may also include a condition part, as in the definition of the factorial function from the previous section:
fac N = N*fac(N-1) if N>0;
= 1 otherwise;
|
As indicated above, several equations for the same left-hand side can be
factored to the left, omitting repetition of the left-hand side.
Furthermore, the otherwise keyword may be used to denote the
"default" alternative.
The left-hand side of an equation may actually be an arbitrary compound
expression pattern to be matched in the expression to be evaluated. For
instance, as we have already seen, the expressions [] and
[X|Xs] denote, respectively, the empty list and a list starting
with head element X and continuing with a list Xs of
remaining elements, just as in Prolog. We can use these patterns to
define a function add which adds up the members of a list as
follows:
add [] = 0; add [X|Xs] = X+add Xs; |
Thus no explicit operations for "extracting" the members from a compound data object such as a list are required; the components are simply retrieved by "pattern matching". This works in variable definitions as well. For instance, in the interpreter you can bind variables to the head element and the tail of a list using a single definition as follows:
==> def [X|Xs] = [1,2,3]; X; Xs 1 [2,3] |
Pattern matching also works if the arguments take the form of function applications. For instance, the following definition implements an insertion operation on binary trees:
insert nil Y = bin Y nil nil;
insert (bin X T1 T2) Y = bin Y T1 T2 if X=Y;
= bin X (insert T1 Y) T2 if X>Y;
= bin X T1 (insert T2 Y) if X<Y;
|
Note the symbols nil and bin which act as "constructors"
for the binary tree data structure in this example. (Q actually has no
rigid distinction between "function applications" and "data
constructors". This is unnecessary since function applications can be
"values" in their own right, as discussed below.)
The Q interpreter treats equations as "rewrite rules" which are used to reduce expressions to "normal forms". Evaluation normally uses the standard "leftmost-innermost" rule, also known as "applicative order" or "call by value" evaluation. Using this strategy, expressions are evaluated from left to right, innermost expressions first; thus the arguments of a function application (and also the function object itself) are evaluated before the function is applied to its arguments.
An expression is in normal form if it cannot be evaluated any further by applying matching equations (including some predefined rules which are used to implement the built-in operations, see Equations and Expression Evaluation). A normal form expression simply stands for itself, and constitutes a "value" in the Q language. The built-in objects of the Q language (such as numbers and strings) are always in normal form, but also compound objects like lists (given that their elements are in normal form) and any other symbol or function application which does not match a built-in or user-defined equation.
This means that Q is essentially an "exception-free" language. That is, an "error condition" such as "wrong argument type" does not raise an exception by default, but simply causes the offending expression to be returned "as is". For instance:
==> hd [] hd [] ==> sin (X+1) sin (X+1) |
You may be disconcerted by the fact that expressions like hd []
and sin (X+1) actually denote legal values in the Q
language. However, this is one of Q's key features as a term rewriting
programming language, and it adds a great amount of flexibility to the
language. Most importantly, it allows expressions, even expressions
involving variables, to be evaluated in a symbolic fashion. For
instance, you might add rules for algebraic identities and symbolic
differentiation to your script and have the interpreter simplify
expressions according to these rules. On the other hand, the Q language
also allows you to refine the definition of any built-in or user-defined
operation and thus it is a simple matter to add an "error rule" like
the following to your script when needed:
hd [] = error "hd: empty list"; |
(The error function is defined in the standard library script
error.q, see Diagnostics and Error Messages.) In order to
implement more elaborate error handling, you can also use Q's built-in
throw and catch functions to raise and respond to
exceptions on certain error conditions. This is discussed in
Exception Handling.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are a few error conditions which may cause the Q interpreter to
abort the evaluation with a runtime error message. A runtime error
arises when the interpreter runs out of memory or stack space, when the
user aborts an evaluation with the interrupt key (usually Ctl-C),
and when the condition part of an equation does not evaluate to a truth
value (true or false). (All these error conditions can
also be handled by the executing script, see Exception Handling.)
You can use the break on command to tell the interpreter that it
should fire up the debugger when one of the latter two conditions
arises. After Ctl-C, you can then resume evaluation in the
debugger, see Debugging. This is not possible if the interpreter
stumbles across an invalid condition part; in this case the debugger is
invoked merely to report the equation which caused the error, and to
inspect the call chain of the offending rule. Hitting the carriage
return key will return you to the interpreter's evaluation loop.
For instance, reload the fac.q script from Writing a Script:
fac N = N*fac(N-1) if N>0;
= 1 otherwise;
|
Now enable the debugger with the break command and type some
"garbage" like fac fac:
==> break on; fac fac ! Error in conditional 0> fac.q, line 1: fac fac ==> fac*fac (fac-1) if fac>0 (type ? for help) : |
What happened? Well, the debugger informs us that the error occurred when the interpreter tried to apply the first equation,
fac N = N*fac(N-1) if N>0; |
to the expression fac fac. In fact this is no big surprise since
we cannot expect the interpreter to know how to compare a symbol,
fac, with a number, 0, which it tried when evaluating the
condition part of the above equation. So let us return to the
interpreter's prompt by hitting the carriage return key:
: <CR> ==> |
The interpreter now waits for us to type the next expression. (For a summary
of debugger commands please refer to Debugging. You can also
type ? or help while in the debugger to obtain a list of
available commands.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The vocabulary of the Q programming language consists of notations for identifiers, operators, integers, floating point numbers, character strings, comments, a few reserved words which may not be used as identifiers, and some special punctuation symbols which are used as delimiters.
| 3.1 Whitespace and Comments | ||
| 3.2 Identifiers and Reserved Words | ||
| 3.3 Operator Symbols | ||
| 3.4 Numbers | ||
| 3.5 Strings | ||
| 3.6 Unicode Support |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Whitespace (blanks, tabs, newlines, form feeds) serves as a delimiter between adjacent symbols, but is otherwise ignored. Comments are treated like whitespace:
/* This is a comment ... */ |
Note that these comments cannot be nested. C++-style line-oriented comments are also supported:
// C++-style comment ... |
Furthermore, lines beginning with the #! symbol denote a special
type of comment which may be processed by the operating system's command
shell and the Q programming tools. On UNIX systems, this (odd) feature
allows you to execute Q scripts directly from the shell (by specifying
the q program as a command language processor) and to include
compiler and interpreter command line options in a script (see
Running Scripts from the Shell).
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Identifiers are denoted by the usual sequences of letters (including
`_') and digits, beginning with a letter. Upper- and lowercase is
distinct. In the Q language, identifiers are used to denote type,
function and variable symbols. Type identifiers may start
with either an uppercase or lowercase letter (the convention, however,
is to use an initial uppercase letter). The case of the first letter
matters, however, for function and variable symbols. As in Prolog, a
capitalized identifier (such as X, Xmax and XMAX)
indicates a variable symbol, whereas identifiers starting with a
lowercase letter denote function symbols (unless they are declared as
"free" variables, see below). Unlike in Prolog, the underscore
`_' counts as a lowercase letter, hence _MAX is a
function symbol, not a variable. (The idea behind this is that it allows
you to get a function symbol which appears to start with an uppercase
letter by stropping it with an initial `_'.) However, as an
exception to the general rule, the identifier `_' does
denote a variable symbol, the so-called anonymous variable. The
same rules also apply to symbols created interactively in the
interpreter.
Variables actually come in two flavours: bound and free variables, i.e., variables which also occur on the left-hand side of an equation, and variables which only occur on the right-hand side and/or in the condition part of an equation. Identifiers may also be declared as free variables; see Declarations. In this case, they may also start with a lowercase letter.
Type, function and free variable identifiers may also be qualified
with a module identifier prefix (cf. Scripts and Modules),
to specifically denote a symbol of the given module. Such a qualified
identifier takes the form module-id::identifier; no
whitespace or comments are allowed between the module name, `::'
symbol and the function or variable identifier.
Formally, the syntax of identifiers is described by the following grammatical rules:
identifier : unqualified-identifier
| qualified-identifier
qualified-identifier : module-identifier '::'
unqualified-identifier
unqualified-identifier : variable-identifier
| function-identifier
| type-identifier
module-identifier : letter {letter|digitsym}
type-identifier : letter {letter|digitsym}
variable-identifier : uppercase-letter {letter|digitsym}
| '_'
function-identifier : lowercase-letter {letter|digitsym}
letter : uppercase-letter|lowercase-letter
uppercase-letter : 'A'|...|'Z'|uppercase unicode letter
lowercase-letter : 'a'|...|'z'|'_'|lowercase unicode letter
digitsym : '0'|...|'9'|unicode digit
|
(Please refer to Q Language Grammar, for a description of the BNF grammar notation used throughout this document.)
The following symbols are reserved words of the Q language and may not be used as identifiers:
as const def else extern from if import include otherwise private public special then type undef var virtual where |
In addition, the following identifiers are predeclared as operator symbols (see Operator Symbols) and cannot be used as normal identifiers either:
and div mod not or |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Operator symbols may either take the form of function identifiers or they may be sequences of punctuation characters (excluding `_' which serves as a letter in the Q language). Like function and free variable symbols, they may be qualified with a module identifier prefix:
op : unary-op|binary-op
unary-op : opsym
binary-op : opsym|'and' 'then'|'or' 'else'
opsym : unqualified-opsym
| qualified-opsym
qualified-opsym : module-identifier '::'
unqualified-opsym
unqualified-opsym : function-identifier
| punctsym {punctsym}
punctsym : unicode punctuation character
|
In either case, operator symbols must be declared explicitly before they can be used, cf. Declarations. The declaration determines the precedence and "fixity" of the operator (i.e., whether it acts as a unary prefix or a binary infix operator; see User-Defined Operators, for more information on this).
As already mentioned, the following identifiers are predefined as built-in operators:
and div mod not or |
As indicated by the syntax rules, the logical connectives `and' and `or' can also be combined with the keywords `then' and `else' to form the "short-circuit" connectives `and then' and `or else'.
The punctuation symbols predefined as built-in operators are the following:
` ' ~ & . || < > = <= >= <> == ++ + - * / ^ ! # $ |
Most of these can actually be redeclared by the programmer for his own purposes. However, there are a few symbols, called "soft delimiters", which play a special role in the syntax of the Q language (some of these are also used as operator symbols):
~ . .. : | = == - \ @ |
The soft delimiters may occur inside user-defined operators, but if they are used as separate lexemes then they are treated like reserved words and thus they may not be declared as operator symbols (except for the purpose of making aliases, cf. Declarations). The same applies to the reserved keywords (see Identifiers and Reserved Words).
Moreover, the following special symbols serve as "hard delimiters" in the Q language which always separate lexemes and thus may not occur inside operator symbols at all:
" , ; :: ( ) [ ] { }
|
The same applies to whitespace and other non-printable characters, as well as the comment delimiters (`//' and `/*' as well as initial `#!' on a script line).
Symbols consisting of punctuation are generally parsed using the "longest possible lexeme" a.k.a. "maximal munch" rule. Here, the "longest possible lexeme" refers to the longest prefix of the input such that the sequence of punctuation characters either forms a valid (i.e., declared) operator symbol, or one of the reserved and special delimiter symbols listed above. Thus, e.g., `..#' will usually be parsed as `.. #', i.e., a reserved `..' symbol followed by a `#' operator. This holds unless the entire sequence `..#' has already been declared as an operator in the scope where it is used.
The only exception to the "maximal munch" rule occurs inside declarations where a new operator symbol is being introduced. In this case the symbol extends up to the next hard delimiter symbol (usually `)' or `;') or whitespace/non-printable character in the input. For instance, the following Q code snippet declares a new binary operator symbol `+~%' (please refer to Declarations, for an explanation of the declaration syntax):
public (+~%) X Y; |
Another special case arises with conglomerates of three or more
consecutive `:' symbols. In this case the initial `::' is
always treated as the designation of a qualified (operator) symbol. (In
the unlikely case that you ever need to specify a "guarded variable"
of the form X : ::Type with a qualified type in the built-in
namespace, the space between the `:' token and the following
`::' qualification designator is mandatory.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Signed decimal numeric constants are denoted by sequences of decimal digits (only the standard digits 0..9 in the ASCII character set may be used here) and may contain a decimal point and/or a scaling factor. Integers can also be denoted in octal or hexadecimal, using the same syntax as in C:
number : ['-'] unsigned-number
unsigned-number : '0' octdigitseq
| '0x' hexdigitseq
| '0X' hexdigitseq
| digitseq ['.' [digitseq]] [scalefact]
| [digitseq] '.' digitseq [scalefact]
digitseq : digit {digit}
octdigitseq : octdigit {octdigit}
hexdigitseq : hexdigit {hexdigit}
scalefact : 'E' ['-'] digitseq
| 'e' ['-'] digitseq
digit : '0'|...|'9'
octdigit : '0'|...|'7'
hexdigit : '0'|...|'9'|'a'|...|'f'|'A'|...|'F'
|
Simple digit sequences without decimal point and scaling factor are treated as integers; if the sequence starts with `0' or `0x'/`0X' then it denotes an integer in octal or hexadecimal base, respectively. Other numbers denote (decimal) floating point values. If a decimal point is present, it must be preceded or followed by at least one digit. Both the scaling factor and the number itself may be prefixed with a minus sign. (Syntactically, the minus sign in front of a number is interpreted as unary minus, cf. Expressions. However, if unary minus occurs in front of a number, it is interpreted as a part of the number and is taken to denote a negative value. See the remarks concerning unary minus in Expressions.) Some examples:
0 -187326 0.0 -.05 3.1415e3 -1E-10 0177 0xaf -0XFFFF |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
String constants are written as character sequences enclosed in double quotes:
string : '"' {char} '"'
char : any character but newline and |
To include newlines, double quotes and other (non-printable) characters in a string, the following escape sequences may be used:
\n newline \r carriage return \t tab \b backspace \f form feed \" double quote \\ backslash |
Furthermore, a character may also be denoted in the form \N,
where N is the character number in decimal, hexadecimal or octal
(using the same syntax as for unsigned integer values). Note that the
character number may consist of an arbitrary number of digits; the
resulting value will be taken modulo the size of the local character
set, which is 256 in the case of ASCII-only systems and 0x110000 if the
interpreter has been built with Unicode support (see Unicode Support). Optionally, the character number may also be enclosed in
parentheses; this makes it possible to specify a string in which a
character escape is immediately followed by another character which
happens to be a valid digit, as in "\(123)4".
As of version 7.11 and later, the interpreter also supports symbolic character escapes of the form `\&name;', where name is any of the XML single character entity names specified in the "XML Entity definitions for Characters", see http://www.w3.org/TR/xml-entity-names/. Note that, at the time of this writing, this is still a W3C working draft, so the supported entity names may be subject to change until the final specification comes out; the currently supported entities are described in the draft from 14 December 2007, see http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/. Also note that multi-character entities are not supported in this implementation.
A string may also be continued across lines by putting the \
character immediately before the end of the line, which causes the
following newline character to be ignored.
As of Q 7.0, it is a syntax error if a character escape is not recognized as either a numeric escape or one of the special escapes listed above.
Some examples:
"" empty string "A" single character string "\27" ASCII escape character (ASCII 27) "\033" same in octal "\0x1b" same in hexadecimal "\(0x1b)c" ASCII escape followed by a literal `c' character "Gr\äf" German umlaut, using XML entity escape "a string" multiple character string "a \"quoted\" string" include double quotes "a line\n" include newline "a very \ continue across line end long line\n" |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Since version 7.0 the Q interpreter has full Unicode support. This means that identifiers may actually contain arbitrary alphanumeric letter and digit symbols from the entire Unicode character set. (All non-uppercase alphabetical letters are considered as lowercase letters which may begin a function identifier.) Likewise, operator symbols may consist of arbitrary punctuation characters in the Unicode character set. (We refrain from giving any concrete examples here, to keep this document 7-bit clean.)
Moreover, all string values are internally represented using the UTF-8
encoding, which allows you to represent all characters in the Unicode
character set, while retaining backward compatibility with 7 bit
ASCII. String literals in the source script will be translated from the
system encoding to UTF-8 automatically and transparently, and the
\N notation may specify any valid Unicode point. For this to
work, the interpreter must have been built with Unicode support enabled
(which is the default).
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The basic compilation unit in the Q language is the script, which is simply a (possibly empty) sequence of declarations and definitions in a single source file:
script : {declaration|definition}
|
In order to make a given set of definitions available for use in the interpreter, you can just collect the relevant declarations, variable definitions and equations in a script which you submit to the interpreter for execution. The interpreter in turn invokes the compiler to translate the script into a bytecode file which is loaded by the interpreter (see Using Q). A script may also be empty, in which case it does not provide any definitions at all.
If you put all your definitions into a single script, that's all there is to it. However, you will often want to break down a larger script into a collection of smaller units which can be managed separately and which are linked together by the compiler into a single bytecode file. To these ends the Q language allows you to put your definitions into several separate script files, which are also called modules. To gain access to the function, variable and type symbols provided by another module, you must then use an import or include declaration, using the following syntax:
declaration : unqualified-import
| qualified-import
unqualified-import : 'import' module-spec {',' module-spec} ';'
| 'include' module-spec {',' module-spec} ';'
qualified-import : 'from' module-spec 'import' [symbol-specs] ';'
| 'from' module-spec 'include' [symbol-specs] ';'
module-spec : module-name ['as' unqualified-identifier]
module-name : unqualified-identifier
| string
symbol-specs : symbol-spec {',' symbol-spec}
symbol-spec : unqualified-identifier ['as' unqualified-identifier]
| unqualified-opsym ['as' unqualified-opsym]
|
As of version 7.8, Q supports both unqualified and qualified import clauses. The former allows you to quickly import an entire collection of modules, while the latter gives you precise control over which symbols are to be imported from which modules. We describe each of these in turn.
| 4.1 Unqualified Imports | ||
| 4.2 Qualified Imports | ||
| 4.3 Implicit Imports and the Prelude | ||
| 4.4 The Global Namespace |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Easiest first, let as take a look at unqualified imports. The following
declaration imports two modules foo and bar:
import foo, bar; |
If you put such declarations into your "main script" and then run the script with the interpreter, the imported modules will be loaded as well.
To determine which functions, variables and types are actually available to be imported into another script, the Q language allows you to declare symbols either as "public" or "private", see Declarations. Outside a module, only the public symbols are accessible when the module is imported in another module.
Normally import is not "transitive", i.e., importing a module
foo does not automatically give you access to the symbols
imported by foo, but only to the public symbols declared in
foo itself. To work around this, instead of import you
can also use an include declaration which causes all imports and
includes of the included module to be "reexported". For instance, let
us consider the case that module foo includes module bar:
include bar; |
Then by importing module foo you also gain access to the public
symbols of module bar (and, recursively, to the modules included
by bar, etc.). This provides a means to group different modules
together in one larger "umbrella" module. For instance, the standard
prelude script (cf. The Standard Library) simply includes most of
the other standard library modules.
In the Q language, each module has its own separate namespace. This
means that two different modules, say foo1 and foo2, may
both provide their own public function symbol named foo. If both
foo1 and foo2 are imported in the same script, you can
distinguish between the two by using a qualified identifier, using
the module name as a prefix, e.g., foo1::foo or
foo2::foo. (Writing simply foo in this case will produce
an error message because the reference is ambiguous.)
Import and include declarations can occur anywhere in a script (and will
be active from this point on up to the end of the script file), but it
is common (and recommended) practice to put them near the beginning of
the script. As in most other programming languages, module dependencies
must always be acyclic. If the compiler finds a cyclic chain of
import and include declarations, such as a module
importing itself or a module foo importing a module bar
which in turn imports foo again, it produces an error message.
Another caveat: When using unqualified import clauses, it is much too
easy to accidentally "reuse" an imported symbol because of a missing
local symbol declaration. For instance, if you import a module
foo which happens to export a symbol bar, and then you
later define a function bar in your script, your definition will
use the imported bar symbol. This is perfectly legal in Q, and
may be intended, but if it's not then you have to explicitly declare a
new local bar symbol after the import clause,
cf. Declarations. If you forget this, you will errorneously
redefine the existing bar function instead. One way to avoid this
is to always use qualified imports, as described below. Another
possibility is to run the interpreter with the --pedantic option,
in which case it will warn you about symbols from unqualified imports
which are referred to with unqualified identifiers, see Running Compiler and Interpreter, for details.
As indicated in the syntax rules, the names of the modules to be imported can either be specified using an unqualified identifier, or a string denoting a full or relative pathname, e.g.:
import "/home/ag/q/stuff.q"; |
In this case, you can also omit the `.q' suffix, it will be
supplied automatically when necessary. Moreover, the module identifier
is automatically the basename of the script filename (i.e., stuff
in the above example); therefore, it is a good idea to choose basenames
which are legal Q identifiers so that you can use them in qualified
symbols.
If no absolute path is specified, the interpreter locates script files using its "search path", which usually contains the current directory and other system-dependent or user-defined locations where "library" scripts are kept, see Using Q, for details.
The `as' keyword can be used to explicitly specify an unambiguous module name. This is useful, in particular, if the basename of the module is not a valid identifier, and for the purpose of resolving name clashes. For instance:
import "my/stdlib.q" as mystdlib; |
Simply importing "my/stdlib.q" under its own name in this case
would produce an error message, because the name of the module collides
with the standard library module of the same name, and the compiler
enforces that all module names are unique.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A qualified import takes the form:
from foo import foo, BAR; |
This specifically imports the symbols foo and BAR from
module foo, and nothing else. What this actually does is to
"redeclare" the imported symbols in the namespace of the importing
module, as explained in Cross-Checking of Declarations and Aliases, so that they look like private symbols of the importing
module. Thus you can refer to them, e.g., simply as foo or with
the qualified name gnats::foo, where gnats is the name of
the importing module. Note that foo::foo will not
work in this case, because a qualified import does not bring the
module itself into scope. (However, as pointed out below you can use
both a qualified and an unqualified import in concert to make
both gnats::foo and foo::foo work.)
A note on terminology: In Q parlance, the terms qualified import and unqualified import apply to the import clauses themselves; a qualified import clause is called "qualified" simply because it restricts the set of imported symbols. Unqualified and qualified identifiers can be used with both styles of import clauses alike.
As with unqualified import, there is a second kind of qualified import clause which also reexports the imported symbols, by making them public members of the importing module:
from foo include foo, BAR; |
For convenience, you can also write:
from module import; |
or
from module include; |
without listing any symbols to just import or include all symbols of the given module qualified. (It goes without saying that this should be used with care.) This is roughly equivalent to `import module' or `include module', respectively, but the imported symbols are made available as members of the namespace of the client, not the imported module.
If this sounds a bit confusing, here is an example which should clarify
the differences between unqualified and qualified imports. Let's assume
the following three modules A, B and C:
/* A: */ import B; /* B: */ include C; /* C: */ public foo; |
Using this setup, the symbol foo is available as either just
`foo' or as `C::foo' in all of A, B and
C. Now suppose we change module B to use a qualified
`include' instead:
/* B: */ from C include foo; /* or just: from C include; */ |
C's namespace hasn't changed, of course, but in both A and
B the symbol foo is now available as `foo' or
`B::foo', but not as `C::foo' anymore. However, you can
also combine unqualified and qualified imports, like this:
/* B: */ include C; from C include foo; |
Now the symbol foo is available as either `foo',
`B::foo' or `C::foo' in B and C (and the
compiler will recognize it as the same symbol no matter which notation
you use).
Finally, note that since symbols imported using a qualified clause
become members of the importing namespace, they must not collide with
other symbols declared either locally or in another qualified import
clause. Thus, if you have two modules foo1 and foo2 which
both export their own (local) symbol foo, the following will
provoke an error message from the compiler because of the clash between
the two different foo symbols:
from foo1 import foo; from foo2 import foo; |
In such cases you must either use unqualified import, or employ an `as' clause to rename the symbols while importing them. E.g.:
from foo1 import foo as foo1; from foo2 import foo as foo2; |
Note that this is only a problem if the imported symbols are really distinct. If the symbols are just different "aliases" of the same symbol defined elsewhere, then you can just import them under the same name into the same scope without any problems.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Actually, there is yet another kind of import, namely the implicit
import of the prelude.q script at the beginning of each script,
which in turn includes most standard library modules (see The Standard Library) and makes them readily available in your program,
without having to use an explicit import declaration. When looking up an
unqualified symbol in a given script, the compiler first searches for a
symbol defined in that script, then for a symbol in an imported or
included module, then for a symbol defined by the prelude and its
includes, and finally for a built-in symbol.
You can instruct the compiler to inhibit the default import of the
prelude, by using the --no-prelude option, see Using Q. Moreover, you can also override the standard prelude with your own,
if it occurs on the search path before the standard prelude, see
Setting up your Environment.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The namespace available in the interpreter is always that of the main
script, i.e., the script given on the command line. This namespace also
includes the built-in namespace which contains all the built-in
symbols, as well as the implicit imports, i.e., the prelude and all its
includes (unless the --no-prelude option was used). The built-in
namespace can be accessed explicitly by using an empty module qualifier,
as in ::sin. Qualified prelude symbols use the corresponding
module name as the qualifier (e.g., stdlib::cat), just as with
explicit imports.
The interpreter also allows you to dynamically import additional modules
in the global scope using the import command, see Command Language. Moreover, to facilitate testing and debugging, in the
interpreter it is possible to gain access to all public and
private symbols of the program (also in modules not directly imported in
the main script) using qualified identifiers.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language allows you to declare function and (free) variable
symbols explicitly, by means of the syntactic constructs discussed in
this chapter. Symbol declarations are mostly optional; if you introduce
a new symbol without declaring it, the compiler declares it for
you. However, you will sometimes wish to ensure that a new symbol is
created to override a symbol of the same name from an imported module,
or you want to attach special attributes to a symbol, and then you have
to use an explicit declaration. Explicit declarations are also needed to
introduce new operator and type symbols. Moreover, while implicit
declarations are often convenient when you want to get something done
quickly, for larger projects you might prefer to play it safe and ensure
that each function symbol actually has a proper explicit declaration. In
such cases you can run the compiler with the --pedantic option
(see Running Compiler and Interpreter) to warn you about
undeclared occurrences of an identifier.
| 5.1 Declaration Syntax | ||
| 5.2 Operator Declarations | ||
| 5.3 Cross-Checking of Declarations and Aliases | ||
| 5.4 Type Declarations |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Syntactically, symbol declarations take the following form:
declaration : prefix headers ';'
| [scope] 'type' unqualified-identifier
[':' identifier] ['=' sections] ';'
| [scope] 'extern' 'type' unqualified-identifier
[':' identifier] ['=' sections] ';'
| [scope] 'type' qualified-identifier
['as' unqualified-identifier] ';'
| [scope] 'type' unqualified-identifier
'==' identifier ';'
prefix : scope
| [scope] modifier {modifier}
scope : 'private'|'public'
modifier : 'const'|'special'|'extern'|'var'|'virtual'
headers : header {',' header}
header : unqualified-identifier '=' expression
| unqualified-identifier
{['~'] variable-identifier}
| qualified-identifier
{['~'] variable-identifier}
['as' unqualified-identifier]
| '(' unqualified-opsym ')'
{['~'] variable-identifier}
['@' precedence]
| '(' qualified-opsym ')'
{['~'] variable-identifier}
['@' precedence]
['as' unqualified-opsym]
precedence : unsigned-number
| '(' operator ')'
sections : section {'|' section}
section : [prefix] headers
|
For instance, the following are all valid symbol declarations:
public foo; private extern bar X; public special lambda X Y; special cond::ifelse ~P X Y as myifelse; public (--) Xs Ys; private (::++) Xs Ys as concat; const red, green, blue; var FOO, BAR; var const e = exp 1; type Day = const sun, mon, tue, wed, thu, fri, sat; public type BinTree = virtual bintree Xs | private const nil, bin X T1 T2; |
The keywords private and public specify the scope of
a symbol. Public symbols are accessible outside a script, and can
be imported by other modules (see Scripts and Modules). If the
keywords private and public are omitted, the scope
defaults to private.
The special keyword serves to declare special forms, which
are described in Special Forms. Function symbols declared with
const introduce "constant" or "constructor" symbols; the
compiler enforces that expressions created with such symbols are not
redefined in an equation (cf. Equations). The built-in constants
true, false, [] and () are already
predeclared as const. The virtual keyword serves to
declare "virtual constructors" which can be used to define concrete
representations (so-called "views") of abstract data types, see
Views. Non-const function symbols can also be declared as
extern, meaning that the corresponding function is actually
implemented by a corresponding "external" module, see C Language Interface.
The var keyword allows you to declare "free" variable symbols,
which can be assigned a value by means of a variable definition,
see Free Variables. If you use such a declaration, the variable
symbol may also start with a lowercase letter (if it has not already
been declared as a function symbol); note that without such a
declaration, an identifier starting with a lowercase letter will
implicitly be declared as a function symbol. (The idea behind this is
that you can make a variable look like a function symbol, if the
variable is actually supposed to be used for function values.)
Variable symbols can also be declared as const; in this case the
variable can only be assigned to once and always refers to the
same value once it has been defined. The built-in variables
INPUT, OUTPUT, ERROR and ARGS are
predeclared as const, see Command Language. A variable
declaration may also contain an "initializer" of the form = X,
where X is an expression. This has the effect of declaring and
defining the variable in a single step.
As indicated, the scope-modifier prefix of a declaration is followed by
a comma-separated list of headers. The first identifier in each
header states the symbol to be declared. In function symbol declarations
the function identifier may be followed by a list of variable
identifiers which are used to denote the arguments of the declared
function symbol. The variables are effectively treated as comments; only
their number (called the arity of a function symbol) is recorded
to check the consistency of different declarations of the same symbol,
and to specify the number of "special" arguments in a special
declaration. In special declarations variables may also be
prefixed with `~' to declare them as "non-special" arguments; see
Special Forms.
The first declaration of an unqualified identifier or operator symbol in
a module always introduces a new symbol in the module's namespace. By
default (if no explicit declaration precedes the first use of a new,
unqualified symbol), the compiler automatically declares it as a
private nullary function or variable symbol, depending on the
case of the initial letter of the identifier (as already mentioned in
Lexical Matters, capitalized identifiers are interpreted as
variable symbols, others as function symbols).
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As of Q 6.2, it is also possible to declare new prefix and infix operator symbols. Such declarations are never optional; operator symbols must always be declared before they are used. Syntactically, they take exactly the same form as a function symbol declaration, but the operator symbol is given inside parentheses. As discussed in Lexical Matters, the operator symbol may either be a function identifier or a sequence of punctuation characters. Examples:
public (myop) X Y; public (--) X Y; |
You can also specify a precedence level, like so:
public (myop) X Y @ 2; // use explicit precedence level ... public (myop) X Y @ (<); // ... or precedence level of given operator |
If no precedence is given, it defaults to 2, which is the precedence of the relational operators. This is described in more detail in User-Defined Operators.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language allows multiple declarations of the same symbol, and the
compiler will verify the consistency of all declarations of a given
symbol. That is, if a symbol is declared with different scope
(private or public), attributes (like var,
const or special) or different number of arguments, or if
non-special arguments of a special form are declared differently, then
the compiler will issue an error message.
Note, however, that the compiler will never cross-check the number of parameters in a function symbol declaration with the actual number of arguments to which the function is applied when it is used in an equation or a variable definition. This is because in Q it is perfectly legal to have a function symbol applied to a varying number of arguments for the purpose of constructing partial applications. Also note that if a local function symbol is first used without an explicit declaration then it will be implicitly declared to have a zero argument count. If the symbol is then later declared with a nonzero number of arguments or other non-default attributes then the compiler will print an error message.
As indicated by the syntactic rules, it is also possible to redeclare a qualified symbol. This requires that the target module either is the current module or has already been imported in the current scope using an unqualified import clause, and causes the compiler to both cross-check the declaration with the declaration in the imported module and redeclare the symbol within the current namespace.
Such a redeclaration serves several different purposes. First and
foremost, it allows you to ensure that an imported symbol was actually
declared with the given attributes. Second, it lets you resolve name
clashes by redeclaring the symbol in the current scope where it
overrides unqualified imports of other modules; if you want, you can
also import the symbol under a new name using an `as' clause (the
new symbol is then also referred to as an alias of the original
one). In these two cases the symbol will normally be redeclared as a
private symbol. Third, by redeclaring a symbol as public, you
cause the symbol to be reexported by the current module. Examples:
import gnats; private gnats::foo X Y; // cross-check declaration of gnats::foo public gnats::foo X Y as bar; // reexport gnats::foo as bar |
Of course, the same also works with operator symbols. E.g., here is how
you declare a new operator symbol concat which behaves exactly
like the built-in `++' operator:
public (::++) X Y as concat; |
There is yet another usage of an imported symbol redeclaration, namely the
extern redeclaration. This is only possible with function symbols
and if the redeclared symbol is not already declared extern by
another module. It instructs the compiler that this module provides an
external definition of the symbol which will override equational
definitions in other modules, see C Language Interface, for
details.
Note that, as of Q 7.8, most of the uses of qualified symbol redeclarations are now covered in a more convenient fashion by qualified imports, see Qualified Imports. Thus qualified symbol redeclarations will now mostly be used for cross-checking declarations and for external overrides.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A type declaration introduces a type identifier, an optional
supertype for the type, and an optional list of "constructor" symbols
for the type. It associates the function symbols in the list with the
given type. The symbol list is a |-delimited list of individual
sections. Each section takes the form of an ordinary symbol
declaration, consisting of scope/modifier prefix and a comma-delimited
list of headers. The prefix is optional; by default, a symbol has the
same scope as the type it belongs to. To change scope and attributes of
a symbol on the list, you use an appropriate scope/modifier prefix. For
instance, you can declare a public BinTree type with private
constructors nil and bin as follows:
public type BinTree = private const nil, bin X T1 T2; |
Each constructor symbol may only be declared once; otherwise the compiler will issue an error message. Hence the compiler enforces that a constructor symbol only belongs to a single type, and that the number of arguments of that symbol is determined uniquely.
A constructor symbol may also be declared as virtual, which
indicates that it may be used in concrete representations of abstract
data types, so-called "views". Such symbols are usually
non-const symbols implementing a public construction function
which delivers values of the type they belong to, see Views, for
details. For instance, to add a virtual constructor bintree to
the above BinTree type, you might declare the type as follows:
public type BinTree = virtual bintree Xs | private const nil, bin X T1 T2; |
Type identifiers may begin with either an upper- or lowercase letter (the convention, however, is to use capitalized identifiers). There may only be a single declaration for each type. As of Q 7.8, type identifiers must also be distinct from function or variable symbols declared in the same namespace, as both type and function/variable identifiers may now occur in the same context (namely, a qualified import clause).
Types are used on the left-hand side of equations to restrict the set of
expressions which can be matched by a variable; see Type Guards,
for details. The Q language has a number of predefined type symbols
which distinguish the corresponding types of objects built into the
language: Bool, Int, Float, Real (which is
the supertype of both Int and Float), Num (the
supertype of Real), String, Char (the subtype of
String denoting the single-character strings), File,
List, Stream, Tuple and Function. (Besides
this, there are also two built-in types for exceptions, see
Exception Handling.)
Like function symbols, types imported from other modules can also be redeclared (possibly under a new name), and reexported, e.g.:
public type array::Array as MyArray; |
In this case, no constructor symbols or supertype are specified. (As of Q 7.8, it is also possible to use a qualified import clause to achieve this, see Qualified Imports. Also note that if a type is imported using a qualified import clause, then its constructor symbols will not be imported automatically; you will have to explicitly list them in the import clause if you need them.)
As of Q 7.1, there is an alternative, more customary syntax for introducing type aliases which looks as follows:
type Integer == Int; |
Note that here the alias to be defined is specified first, and the existing type to be aliased can be specified without a module qualifier. This works exactly like an ordinary alias declaration, but resembles more closely the syntax commonly used for defining type synonyms in other languages like Pascal and Haskell.
Types can also be declared as extern, to indicate that they are
realized in a corresponding C module, as described in C Language Interface. External types may have a supertype, but only virtual
constructors. For instance:
extern type Bar = virtual bar X; |
In difference to function symbols, an existing type imported from
another module cannot be redeclared as extern. Therefore an
external definition must always be given by the module which originally
declares the type.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The notion of expressions is an intrinsic part of most, if not all, programming languages. However, in most programming languages expressions are merely treated as descriptions of computations to be performed in order to obtain the value of an expression, which usually belongs to some basic type provided by the language. In contrast, in the Q language expressions are objects in their own right which, as we shall see, are manipulated according to computational rules specified by the programmer.
As usual, expressions are built from certain kinds of basic objects. In the Q language, these are integers, floating point numbers, strings, function and variable symbols. These objects are combined into larger, compound expressions by means of applications, tuples, lists and streams. For convenience, the Q language also has prefix, infix and mixfix operator symbols for the most common operations; these are actually treated as "syntactic sugar" for applications. Syntactic sugar is also provided for list, tuple and stream enumerations and comprehensions. The syntax of all these entities is described by the following grammatical rules:
expression : identifier
| 'var' unqualified-identifier
| variable-identifier ':' identifier
| number
| string
| expression expression
| unary-op expression
| expression binary-op expression
| 'if' expression 'then' expression
['else' expression]
| '\' expression { expression } '.' expression
| '(' [element-list|enumeration|comprehension] ')'
| '[' [element-list|enumeration|comprehension] ']'
| '{' [element-list|enumeration|comprehension] '}'
| '(' op ')'
| '(' expression binary-op ')'
| '(' binary-op expression ')'
element-list : expression-list [',' | ';' | '|' expression]
enumeration : expression-list '..' [expression]
comprehension : expression ':' expression-list
expression-list : expression {',' expression}
| expression {';' expression}
|
| 6.1 Constants and Variables | ||
| 6.2 Applications | ||
| 6.3 Lists, Streams and Tuples | ||
| 6.4 Built-In Operators | ||
| 6.5 User-Defined Operators |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The nonterminals identifier, op, unary-op,
binary-op, number and string occuring in the syntax
rules above refer to the lexical entities introduced in Lexical Matters. Note that the Q language actually distinguishes two different
types of identifiers (function and variable identifiers), two different
kinds of operator symbols (unary and binary) and two different types of
numeric quantities (integers and floating point numbers). Integers are
implemented as "bignums" using the GNU multiprecision package (GMP);
thus the size of an integer value is only limited by available
memory. Floating point values are implemented using 64 bit (i.e., double
precision) floating point numbers; on most machines, these should
provide nonzero absolute values ranging from 1.7E-308 to 1.7E308 and a
precision of 15 decimal digits.
String constants are character sequences internally represented as character
arrays permitting constant-time access to the individual characters of a
string. There is no a priori length restriction for string constants. In the
current implementation, the null character \0 is reserved as a string
terminator and must not be used as an ordinary string character.
Furthermore, the Q language provides five other built-in constants which
denote the truth values (true and false), and the empty
list, stream and tuple ([], {} and (), to be
discussed in Lists, Streams and Tuples).
A variable symbol can be "bound" or "free", depending on the context
in which it occurs. We say that a variable is bound in an equation
if it occurs on the left-hand side of the equation. Otherwise, the
variable is a free variable. In this case, the variable may denote
a value (introduced with def), or simply stands for itself. In
any case, a variable is a legal atomic expression which may stand
wherever a constant is allowed.
On the left-hand side of equations, it is possible to declare
that a variable is of a given type by using the notation:
variable:type. This requires that you have previously
declared type as a type identifier, see Declarations. When a
type identifier is specified with a variable, the variable will only
match values belonging to the given type, cf. Type Guards.
On the right-hand side of equations, an unqualified identifier
can also be prefixed with the keyword var to indicate that the
identifier is to be treated as a free variable symbol. Such inline
variable declarations can be used to escape free variables (if the same
identifier is also introduced as a bound variable on the left-hand side
of the equation) and to declare free variable symbols on the fly, see
Local Variables, for details.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Application is probably the most important single construct in the Q
language. It allows you to apply one object (the "function") to
another one (the "argument"). This construct is used to denote
"function calls" such as sqrt 2 as well as "constructor
terms" such as bin X T1 T2 which encode tree-like data
structures. Also, as we will see in Built-In Operators, operators
like +, -, etc. are merely syntactic sugar for
applications.
As in other contemporary functional languages, application is a binary
operation written simply as juxtaposition: X Y denotes the
application of X to Y, both of which may be arbitrary
expressions themselves. Application associates to the left; the
construct X Y1 … Yn is equivalent to (…((X Y1)
Y2) …) Yn, and denotes the application of X to n arguments
Y1, …, Yn. This style of writing function
applications is commonly referred to as currying, after the
American logician H.B. Curry. We will have to say more about this
shortly.
Here are some valid examples of applications:
sqrt 2 sqrt (Y+1) foo X (bar (Y-1)) Z |
Note that since application is left-associative, nested applications in
arguments must be set off with parentheses. For instance, foo (bar X)
applies foo to bar X, whereas foo bar X applies
foo to two arguments bar and X.
Since currying is so ubiquitous in functional programming, you should be well familiar with it, so let us briefly explain what it is, and what it is good for.
Functions of multiple arguments can generally be defined in two different ways:
max, which computes the maximum of two values, as follows:
max (X,Y) = X if X>Y;
= Y otherwise;
|
max function:
max X Y = X if X>Y;
= Y otherwise;
|
Here max X Y is to be read as (max X) Y, meaning that by
applying max to the first argument X, we obtain another
function max X, which, when applied to the second argument
Y, yields the maximum of X and Y. (Incidentally,
this curried form is also the way that the max function is
actually defined in the standard library. In fact, most built-in and
standard library functions use the curried form as it is more flexible.)
The Q language supports both kinds of notations. Choosing between the
two is more than just a matter of taste. Besides saving parentheses,
curried functions provide a cheap way to derive new functions through
"partial applications" of a multi-argument function. For instance,
given the curried definition of max from above, max 0 can
be used to denote the function computing the "nonnegative part" of its
argument (which is the argument itself if it is nonnegative, and zero
otherwise). This does not work with the uncurried definition since that
definition requires us to specify both arguments of max in one
chunk; instead, we would have to start defining the derived function
from scratch.
Uncurried definitions also have their merits. In particular, they allow
us to define variadic functions, i.e., functions which can handle
a varying number of arguments. For instance, the following definition
enables us to apply the max function to both pairs and triples:
max (X,Y) = X if X>=Y;
= Y otherwise;
max (X,Y,Z) = max (X,max (Y,Z))
|
In fact, in the Q language it is possible to define generic functions
which apply to any number of arguments specified as a tuple. For
instance, the following version of max handles tuples of any size
at least 2:
max (X,Y) = X if X>=Y;
= Y otherwise;
max (X,Y|Zs) = max (X,max (Y|Zs));
|
(In the above example, the notation (…|Zs) is used to
denote a tuple with "tail" Zs. This is explained in the
following section.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language provides three closely related general constructs for representing sequences of objects: lists, streams and tuples.
We start out with a discussion of the list data structure. The
constant [] denotes the empty list. In general, a list consisting
of elements X1, X2, …, Xn is denoted
[X1,X2,…,Xn]. For instance, [a,b,c] consists of
three elements (symbols) a, b and c. In contrast to
statically typed languages like Haskell and ML, list members may have
different types; e.g., [a,10,"a string"] is a perfectly
well-formed list consisting of a symbol, a number and a string. It is
also possible to have nested lists, as in [a,[b,c]] which
consists of two elements, the symbol a and the list [b,c].
You can also have a trailing comma in a list, as in [a,b,c,].
The trailing comma is ignored, so the above is exactly the same as
[a,b,c]. This makes it possible to format a list as follows,
which lets you add new items at the end more easily:
def ITEMS = [
"This is the first item",
"This is the second item",
…
"This is the last item",
];
|
As in Prolog, lists are actually represented in a right-recursive
fashion using a binary constructor [|] which takes as its
arguments the head element of the list and the list of the
remaining elements, called the tail of the list. Thus, e.g.,
[a,b,c] is simply a shorthand notation for [a|[b,c]],
which denotes a list with head a and tail [b,c] (which is
in turn a shorthand for [b|[c]], etc.). You can also mix both
styles of notation; for instance, [a,b|[c,d]] is another way to
represent the 4-element list [a,b,c,d].
Note that [a|[b,c]] is different from [a,[b,c]]: the
former denotes a three-element list, while the latter is a two-element
list whose second member happens to be a list itself. Also note that the
[|] constructor can in fact be applied to any pair of values (the
second value does not have to be a list); e.g., [a|b] is a
perfectly well-formed expression (although the built-in length, indexing
and concatenation operations described in String, List and Tuple Operators, will fail on such values).
Q also provides so-called enumerations to construct lists from a
range of values. These usually take the form
[X1,X2,…,Xn..Y] which is just syntactic sugar for the
application enum [X1,X2,…,Xn] Y involving the built-in
enum function, see Enumeration Types, for
details. Syntactic sugar is also provided for list comprehensions,
which are discussed in Conditionals and Comprehensions. Analogous
constructs are also provided for streams and tuples.
Streams look exactly like lists except that they are written with
curly braces. The difference between lists and streams is that the head
and tail of a stream are not evaluated until their values are actually
required during the course of a computation. For instance, you will find
that if you enter an expression like {1+1,2+2,3+3} in the
interpreter, then the value of that expression will be just the
expression itself, because the embedded applications of the `+'
operator are not evaluated:
==> {1+1,2+2,3+3}
{1+1,2+2,3+3}
|
In contrast, if you enter the same sequence as a list, its elements will be evaluated immediately:
==> [1+1,2+2,3+3] [2,4,6] |
A stream element will be evaluated automatically when it is accessed, e.g., using the index operator:
==> {1+1,2+2,3+3}!1
4
|
This is also known as lazy or non-strict evaluation: the evaluation of the stream elements is deferred until they are actually needed. In fact, the entire tail of a stream may be deferred, which makes it possible to create infinite sequences whose members are produced "on the fly" while traversing the sequence.
Streams are implemented in terms of so-called "special forms" which
will be introduced in Special Forms. We therefore postpone a
closer description of this data structure until Special Constructors and Streams. Also note that the Q language itself only
defines the stream constructors; most other stream functions are
actually implemented by the standard library script stream.q, see
Streams.
Tuples also work in much the same fashion as lists: The constant
() denotes the empty tuple, and a tuple consisting of elements
X1, X2, …, Xn is written as
(X1,X2,…,Xn), which is equivalent to
(X1|(X2|…|(Xn|()))), where the notation (X|Xs)
denotes a tuple consisting of a first element X and the tuple
Xs of the remaining elements. A trailing comma may follow the
last tuple element, so that (a,b,c,) is the same as
(a,b,c). The trailing comma is optional in all cases except one:
As in the Python programming language, 1-tuples are indicated with a
trailing comma, to distinguish them from ordinary parenthesized
expressions. Hence (a,) denotes the 1-tuple with the single
member a, while (a) just denotes a itself.
Note that in contrast to languages like ML and Haskell, tuples are not
actually needed to represent sequences of values with different types,
as Q's list data structure already allows you to do that. Also, Q's
tuples are a lot more flexible than in ML and Haskell, as they can be
handled mostly like list values. The big difference between lists and
tuples in the Q language is that, while lists are always represented as
recursive data objects using a binary constructor symbol (just the way
that they are written), tuples are actually implemented as "vectors"
which are stored as contiguous sequences in memory. (This only works for
"well-formed" tuples. If the tail Xs of a tuple (X|Xs)
is not a tuple, then this value can only be represented using an
explicit application of the tuple constructor.) Therefore tuples
normally use up much less memory than lists of the same size, and they
also allow constant-time access to their members. The size of a tuple
can be determined in constant time as well. In contrast, the same
operations, when applied to a list, require time proportional to the
size of the list. On the other hand, lists are more efficient when
accessing the tail of a list and when a new element is prepended to a
list, which can both be done in constant time. Here a tuple needs time
proportional to its size, since the member sequence of the original
tuple must be copied when accessing its tail or when prepending an
element.
These tradeoffs should be carefully considered when deciding whether to
implement a given sequence as a list or a tuple. Tuples are usually the
best choice for implementing fixed sequences requiring fast random
access to its individual members, whereas lists provide the most
efficient representation if the elements are traversed and manipulated
sequentially. If necessary, you can easily convert between these two
representations using the built-in functions list and
tuple, see Conversion Functions.
Lists, streams and tuples are frequently combined to create nested
structures like a tuple of lists and/or streams, a list (or stream) of
lists, or a list (or stream, or tuple) of tuples. The latter construct
(sequences of tuples) is so common that Q provides special support for
it in the form of the following "grouping" syntax. The notation
[X1, X2, …; Y1, Y2, …; …] is a shorthand for
[(X1, X2, …), (Y1, Y2, …), …]. Note that there
must be at least one element in each group (but the number of elements
in different groups may vary), and there must be at least one group (a
single group is denoted with a trailing `;'). Thus, for instance,
[A,B;C,D;X,Y,Z] is the same as [(A,B),(C,D),(X,Y,Z)], and
[X,Y,Z;] is the same as [(X,Y,Z)]. The same notation also
applies to streams and tuples.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Besides the forms of expressions already discussed above, the Q language also provides a collection of prefix, infix and mixfix operator symbols for common arithmetic, logical, relational and other operations. A complete list of these symbols is given below, in order of decreasing precedence:
' ` ~ "ation operators (unary)
.function composition
^ !exponentiation and subscript
- # notprefix operators (unary)
* / div mod and and-thenmultiplication operators
++ + - or or-elseaddition operators
< > = <= >= <> ==relational operators
$infix application operator
if-then-elseconditional expressions
||sequence operator
\X.Ylambda abstractions
Most of these symbols have their usual meaning; a closer description
follows below. All binary operators are left-associative, with the
exception of ^, ! and $ which associate to the
right, and the relational operators which are
nonassociative. Application takes precedence over all these operations
except the quotation operators; hence sqrt X^3 denotes
(sqrt X)^3 and not sqrt (X^3). The quotation
operators have the highest possible precedence, and hence bind stronger
than even applications. Parentheses are used to group expressions and
overwrite default precedences and associativity as usual. C programmers
will also note that the logical operators have the same "wrong"
precedence as in Pascal. Thus you should make sure that you always
parenthesize relational expressions when combining them with logical
connectives.
You should also note that unary minus must be parenthesized when
appearing in an argument of a function application. For instance,
although foo X -Y is a perfectly well-formed expression, it
denotes (foo X) - Y and not foo X (-Y) as you might
have intended by the spacing which is however ignored by the Q
compiler. As already noted in Lexical Matters, unary minus in
front of a number is special; it causes values such as -2 to be
interpreted as negative numbers rather than denoting an explicit
application of the unary minus operator (an explicit application of
unary minus can be denoted using the built-in minus symbol; see
below). The rules for parenthesizing negative numbers are the same as
those for unary minus; you have to write foo X (-2) instead of
foo X -2 (which denotes (foo X) - 2).
In the Q language, expressions involving prefix and infix operators are
merely syntactic sugar for applications. By enclosing such an operator in
parentheses, you can turn it into an ordinary prefix function. For
instance, X+Y is exactly the same as (+) X Y, and
not X actually denotes the application (not) X. The
built-in operators simply provide a more convenient way for applying
these operations to their arguments, which resembles common mathematical
notation. Moreover, you can also turn a binary operator into a unary
function by specifying either the left or the right operand. E.g.,
(1/) denotes the reciprocal and (*2) the doubling
function. Such constructs are commonly called operator
sections. Inside a section, the usual precedence and associativity
rules apply. For instance, the X+3 subterm in (*(X+3))
must be parenthesized because * has a higher precedence
than +, and hence the partial expression (*X+3) is not
well-formed.
The - operator plays a somewhat awkward role in the syntax, since
it is used to denote both unary and binary minus. Q adopts the
convention that the notation (-) always denotes binary
minus; unary minus may be denoted by the built-in minus
function. Thus the expression minus X applies unary minus to
X. Note that this construct always denotes an explicit
application of the unary minus operation. For instance, minus 5
denotes the application of unary minus to the integer 5, while
-5 is a negative integer. Also note that the construct
(-X) is not an operator section, but a parenthesized
expression involving unary minus. The easiest way to construct an
operator section which subtracts a given value from its argument is to
formulate the function using the addition operator as in (+(-X)).
Another special case are the mixfix operators if-then-else
(conditional expression) and \X.Y (lambda abstraction). These are
just syntactic sugar for the standard library functions
ifelse/when and the built-in function lambda, so no
special syntax is needed to turn them into prefix functions.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The ' (quote), ` (backquote), ~ (tilde) and
& (ampersand) operators are used to deal with so-called
special forms. The quote operator quotes an expression as a
literal; it is a constructor symbol and hence becomes part of the quoted
expression. The backquote and tilde operators are used to "splice" and
"force" subterms in an expression. The ampersand operator causes
"special arguments" to be memoized. We postpone a discussion of these
operations until Special Forms.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The `.' operator denotes function composition: (F.G) X = F (G
X). For instance, to double a value and then add 1 to it, you can use
an expression like ((+1).(*2)) X. Note the parentheses around the
composition - as application binds stronger than composition, F.G
X is interpreted as F.(G X) rather than (F.G) X. If a
composition involves applications with integer arguments, extra
whitespace may be needed to distinguish between the `.' operator
and the decimal point; e.g., F 5 . G denotes a composition of the
functions F 5 and G, whereas F 5.G is an
application of the function F to two arguments, the floating
point value 5. and the function G. Also note that the
composition operator is associative, since (F.G.H) X = ((F.G).H) X
= F (G (H X)) = (F.(G.H)) X.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The operators +, -, *, / denote the sum, the
difference, the product and the quotient of two numeric values,
respectively. As already noted, - is also used as unary
minus. The operators div and mod denote integer quotient
and modulo, respectively; they only apply to integers. The ^
operator denotes exponentiation, as defined by X^Y = exp (ln
X*Y); it always returns a floating point value. (If X<0 then
X^Y is defined only if Y is an integer. 0^0 is left
undefined as well, so if you would like to have that 0^0 = 1 then
you must add corresponding equations yourself. Also note that the
complex.q standard library module extends the built-in definition
of the exponentiation operator to handle the case that X<0 with
general exponent; see Complex Numbers.)
The argument and corresponding result types of these operations are
summarized in the following table (Int denotes integer,
Float floating point, and Real integer or floating point
values):
+ - *Int Int -> Int
Int Float -> Float
Float Int -> Float
Float Float -> Float
/ ^Real Real -> Float
div modInt Int -> Int
- (unary)Int -> Int
Float -> Float
Note that, as of Q 7.2, all floating point operations properly deal with
IEEE floating point infinities and NaN ("not a number")
values. Consequently, division by zero now yields one of the special
values inf, -inf or nan, depending on the value of
the first operand. (Of course, this will only work as described on
systems with a proper IEEE floating point implementation. The
interpreter makes no attempt to emulate IEEE floating point values on
systems which do not provide them natively.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The operators <, >, =, <=, >=,
<> are binary predicates meaning "less", "greater",
"equal", "less or equal", "greater or equal" and "not equal",
respectively. The built-in definition of these operations only applies
to numbers, strings and truth values. All relational operators return
truth values (true, false) as results. Strings are
compared lexicographically, on the basis of the character encoding
(which, as of Q 7.0, usually is Unicode). Truth values are ordered by
false < true.
If you would like to compare other types of values than the basic
objects mentioned above, normally you will have to provide suitable
definitions yourself. For instance, you might wish to extend the
equality operation to other built-in and user-defined data structures
such as lists, trees, etc., by "overloading" the = operator
accordingly. The following equations implement an equality predicate on
lists (the parentheses on the left-hand side are necessary to prevent
the equality operation to be interpreted as the equal sign separating
left-hand and right-hand side of an equation):
([] = []) = true; ([] = [Y|Ys]) = false; ([X|Xs] = []) = false; ([X|Xs] = [Y|Ys]) = (X=Y) and then (Xs=Ys); |
Rules for other comparison operators (<>, <=, etc.) could
be added in the same fashion. Actually, the standard library provides
corresponding definitions; see The Standard Library.
A special case arises with types consisting of only nullary constructor
symbols declared with const, so-called enumeration types,
see Types. The values of such a type can always be compared with
the relational operators, using the order in which they are listed in
the type declaration. (A special case of this is the order of the
built-in truth values.) For instance, assume that the Day type is
declared as follows:
type Day = const sun, mon, tue, wed, thu, fri, sat; |
Then the listed constants will be ordered as sun < mon < … <
sat.
Besides the equality predicate, the Q language also provides a notion of "syntactic" equality, implemented by the `==' operator, which applies to all kinds of expressions; see Non-Linear Equations, for details.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The logical operations not, and, or denote logical
negation, conjunction and disjunction, respectively. These operators
take truth values as their arguments. They are defined in a
straightforward manner:
not true = false; not false = true; true and true = true; true and false = false; false and true = false; false and false = false; true or true = true; true or false = true; false or true = true; false or false = false; |
Like most other programming languages, Q also has logical connectives for
the short-circuit evaluation of logical expressions, which are
denoted by the operators and then and or else. These
operations are actually implemented as "special forms" which evaluate
their arguments from left to right only as far as required to determine
the value of the expression (cf. Special Forms). They are defined
by the following built-in equations:
true and then X = X; false and then X = false; false or else X = X; true or else X = true; |
The first operand is always evaluated. Depending on its value, the
second operand may not be evaluated at all. For instance, if X
evaluates to false, then X and then Y immediately
reduces to false, without ever having to evaluate the second
argument Y. On the other hand, if X evaluates to
true, then Y is evaluated and returned as the value of the
expression. The or else operation works analogously.
One reason for using short-circuit evaluation is efficiency: prevent unnecessary computations when an initial part of a logical expression already determines the value of the entire expression. Furthermore, short-circuit evaluation is sometimes essential to check a condition before evaluating an expression depending on the validity of this condition. For instance:
(X <> 0) and then (foo (1/X) > 0) |
You should also note that, according to the above definitions, X
and then Y and X or else Y are always reduced if X
is a truth value, even if the second operand Y does not
evaluate to a truth value. This may look a bit weird at first, but it is
in fact the most reasonable way to implement short-circuit evaluation in
the Q language, since Q uses dynamic typing, and hence the type of an
expression is only known after it has been evaluated. In fact,
this "feature" can be quite useful at times. For instance, you can
also use and then to write down a simple kind of conditional
expression like
check X and then bar X |
where bar X is the value you wish to compute, but only if the
condition check X holds.
The Q interpreter also uses the operators not, and and
or to denote bitwise logical operations on integers. Thus,
not X denotes the one's complement of an integer X, and
X and Y and X or Y the bitwise logical conjunction and
disjunction of integers X and Y, respectively. These
operations behave as if negative integers were represented in two's
complement (although GMP actually uses a sign-magnitude
representation). This means that for each integer X, -X =
not X + 1, or, equivalently, not X = -X-1. For instance:
==> 17 and not 13 16 ==> 17 or not 13 -13 ==> not _ 12 |
These results become clear when considering that 17 has bits 0
(=1) and 4 (=16) on (and all other bits off), while not 13 has
bits 0 (=1), 2 (=4) and 3 (=8) off (and all other bits on). Thus,
writing these numbers as unsigned bit sequences with most significant
bits first, where ...1 denotes an infinite sequence of leading
1's, we have:
17 and not 13 = 10001 and not 1101 = 10001 and ...10010 = 10000 = 16 17 or not 13 = 10001 or not 1101 = 10001 or ...10010 = ...10011 (= not 1100 = not 12) = -13 |
Together with the built-in bit shift operations, the bitwise logical operations can also be used to implement "bitsets", i.e., sets of nonnegative integers represented by bit patterns. See Arithmetic Functions, for details.
In case you are wondering about "exclusive or:" While this operator is not provided as a builtin, you can easily define it yourself as follows:
public (xor) X Y @ 3; X xor Y = (X or Y) and not (X and Y); |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The ++ operator denotes concatenation, # is the length or
size operator, and the subscript operator ! is used for
indexing. These operations work consistently on strings, lists and
tuples. For instance:
"abc"++"xy" ⇒ "abcxy" [a,b,c]++[x,y] ⇒ [a,b,c,x,y] (a,b,c)++(x,y) ⇒ (a,b,c,x,y) #"abc" ⇒ 3 #[a,b,c] ⇒ 3 #(a,b,c) ⇒ 3 "abc"!1 ⇒ "b" [a,b,c]!1 ⇒ b (a,b,c)!1 ⇒ b |
Indexing with the subscript operator starts at zero, so that X!0
and X!(#X-1) denote the first and last member of a string, list
or tuple, respectively. Also note that since ! is
right-associative, double subscripts may have to be parenthesized. For
instance, compare (X!I)!J and X!I!J=X!(I!J).
All list and tuple operators check that their first argument is a
well-formed list or tuple value. However, the second argument of the
++ operator may in fact be any value. List concatenation just
replaces the empty sublist marking the end of its first list argument
with the second argument, as if it was defined by the following
equations:
[]++Ys = Ys; [X|Xs]++Ys = [X|Xs++Ys]; |
Hence we have that, e.g., []++1 ⇒ 1 and [1,2]++3
⇒ [1,2|3], which may look odd, but is consistent with the above
equations. Tuple concatenation works in an analogous manner.
All operators discussed in this section are also available for streams
(cf. Lists, Streams and Tuples). However, the corresponding
definitions are not built into the language but are provided in the
standard library script stream.q, see Streams.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The application operator `$' and the sequence operator `||'
have the lowest precedence among the prefix and infix operators. The
$ operator is right-associative and binds stronger than the
|| operator, which is left-associative.
The infix application operator provides an alternative way to denote
function application, i.e., F $ X ⇒ F X. This operator
comes in handy when you want to pass around function application as a
function parameter. Since this operator has very low precedence and is
right-associative, it also provides a convenient way to write down
"cascading function applications," such as foo $ bar $ X+Y = foo
(bar (X+Y)), which can save a lot of parentheses and make an expression
more readable.
The sequence operator lets you evaluate sequences of expressions in a given order. The value of the sequence is given by the rightmost expression. That is,
X || Y ⇒ Y. |
The sole purpose of this construct is to allow operations with side-effects (such as I/O functions) to be carried out sequentially. A typical example is
writes "Input: " || reads |
which prints a prompt on the terminal and then reads in a string. (The
built-in functions writes and reads are described in
Built-In Functions.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As of version 7.1, Q also provides special syntax for two other kinds of
constructs which are commonly encountered in functional programs:
conditional expressions and lambda abstractions. A conditional
expression takes the form if X then Y else Z where X,
Y and Z are arbitrary expressions. It returns the value of
Y if the value of X is true, and the value of
Z if the value of X is false. The `else' part
can also be omitted, in which case () is returned if the value of
X is false. Like the built-in logical connectives
and then and or else, conditional expressions are special
forms which are evaluated in short-circuit mode; thus, if X
evaluates to false, then Y will never be evaluated.
Lambda abstractions can be denoted using the customary notation
\X.Y where X is an expression denoting a pattern to be
matched against the argument of the lambda function, and Y is
another expression, the lambda body which is to be evaluated when the
lambda function is applied to an argument. Note that if the argument
pattern X is not an atomic expression (i.e., not a basic
expression, list, stream or tuple) then it must be parenthesized; the
lambda body Y never has to be parenthesized, however, because of
the low precedence of the lambda construct. Multi-argument lambdas can
be written using the notation \X1 X2 … . Y, which is just a
shorthand for \X1 . \X2 . … . Y. Lambdas are special forms;
neither the parameter patterns nor the body are evaluated, rather they
are "compiled" (at runtime) to a special kind of function object which
can be applied to the actual arguments in an efficient manner, see
Lambda Abstractions, for details.
Like the other operators, both if-then-else and the \X.Y
construct are merely syntactic sugar for function applications. The
if-then-else operator is actually implemented by the standard
library function ifelse (or the when function if the
`else' part is omitted), see Conditionals and Comprehensions. The \X.Y construct translates to an application
of the built-in lambda function, see Lambda Abstractions.
Note that the \X.Y construct has the lowest possible precedence,
even lower than the sequencing operator `||', so it always has to
be parenthesized unless it forms a toplevel expression or the body of
another lambda construct. The if-then-else operator binds
stronger than `||', but weaker than `$', and "dangling
else" parts are assumed to belong to the most recent
if-then. This makes imperative-style code like the following
behave as expected:
test = \X.
writes "The value is " ||
if X > 0 then
writes "positive.\n"
else if X < 0 then
writes "negative.\n"
else
writes "zero.\n";
|
Also note that if-then-else needs to be parenthesized unless it
occurs as a toplevel or inside a lambda, sequence or another conditional
expression:
test = \X. writes $ "The value is " ++
(if X > 0 then
"positive"
else if X < 0 then
"negative"
else
"zero") ++ ".\n";
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As of version 6.2, the Q language also lets you you define your own prefix and infix operator symbols, using a declaration like the following (see also Declarations):
public (myop) X Y; |
This effectively turns myop into a keyword which can be used as
an infix operator: X myop Y. User-defined operators work exactly
like the built-in ones, in that you can turn them into a prefix function
by enclosing them in parentheses, as in (myop), and that you can
build sections from them which leave the left or right operand
unspecified, such as (X myop) and (myop X).
By default, the new operator has the same precedence and associativity as the relational operators (which are at precedence level 2, and are non-associative). You can also explicitly specify a precedence level, like so:
public (myop) X Y @ 2; |
Alternatively, you may denote the precedence level by listing an existing operator symbol. The following is equivalent to the above declaration:
public (myop) X Y @ (<); |
As of Q 7.7, the operator symbol doesn't have to be an identifier, it can also be a sequence of punctuation characters as discussed in Operator Symbols. For instance:
public (===) X Y @ (=); |
As a real-world example, here is the definition of an xor
(exclusive-or) operator from Logical and Bit Operators. It uses
the third precedence level, which is occupied by the addition operators
(which also includes the built-in or operator):
public (xor) X Y @ 3; X xor Y = (X or Y) and not (X and Y); |
In general, a quick glance at the operator table in the preceding
subsection reveals that valid precedence level numbers for the prefix
and infix operators go from 0 (lowest precedence, binary sequence
operator) to 9 (highest precedence, unary quotation operators), with 8
denoting the precedence of function application. (Note that there are no
numbered precedence levels for the if-then-else and \X.Y
constructs because there is no way to declare such mixfix operators in
Q.) The complete list of built-in prefix and infix operators, cast as a
sequence of valid Q declarations, is shown below.
public (::||) X Y @0, (::$) X Y @1; public (::<) X Y @2, (::<=) X Y @2, (::>) X Y @2, (::>=) X Y @2, (::=) X Y @2, (::<>) X Y @2; public special (::==) X Y @2; public (::++) X Y @3, (::+) X Y @3, (::-) X Y @3, (::or) X Y @3; public special (::or else) ~X Y @3; public (::*) X Y @4, (::/) X Y @4, (::div) X Y @4, (::mod) X Y @4, (::and) X Y @4; public special (::and then) ~X Y @4; public (::#) X @5, (::not) X @5; public (::^) X Y @6, (::!) X Y @6; public (::.) X Y @7; public special const (::') X @9; public (::`) X @9, (::~) X @9, (::&) X @9; |
Note that precedence level 8 may not be used for user-defined operators. Other than that, all precedence levels are permitted, but the fixity (prefix/infix), the number of operands (unary/binary) and (unlike in ML and Haskell) also the associativity of operators (non-/left-/right-associative) are all determined by Q's system of built-in operators, as described in Built-In Operators.
While user-defined operators are often useful for extending Q's built-in operator system and to implement "domain-specific languages", the excessive use of obscure operator symbols, especially the use of multi-character punctuation looking like "line noise", may make your programs hard to read and is considered bad style.
As explained in Operator Symbols, some built-in operator symbols, such as `-', `=' and `.' are actually part of Q's syntax and need special treatment. These symbols may therefore not be redeclared as user-defined operators. The same applies to special symbols like `|', `..', `\', `@' and the keywords of the language. In general, you should also refrain from redeclaring other built-in operators since this will most likely be confusing for other programmers reading your code. Most of the time that you want to define, say, a `+' operation on a given data structure, extending the built-in `+' operator should work just fine; there's usually no need to declare a new `+' operator for that purpose.
You should also be aware that declaring new operator symbols actually amounts to changing the syntax of the Q language in small and sometimes subtle ways, both at compile- and runtime. The amount of havoc you can wreak on the syntax is somewhat limited by the fact that Q does not allow you to create your own precedence levels; all the available levels and the corresponding fixities of operators are prescribed by Q's expression syntax. Nevertheless, designing your own operator systems takes careful planning. In particular, since operator symbols are parsed using the maximal munch rule (see Operator Symbols), it is a bad idea to declare binary operator symbols like `--' or `<-' which end in a valid prefix of a unary operator (`-' in this case). Such ambiguities can break existing code and produce bugs which are hard to find. When in doubt, just use symbols from the extended Unicode character set; there are plenty of nice mathematical and other punctuation symbols in Unicode which are very useful for programming purposes.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
"Computational processes are abstract beings that inhabit computers." [Abelson/Sussman 1996, p.1]
At its core, any programming language faces the programmer with three distinct abstract notions [Abelson/Sussman 1996]: the entities which are to be manipulated, usually referred to as "data", the computational "processes" which carry out those manipulations, and the description of these processes by finite collections of computational rules, commonly called "programs".
As in other functional programming languages, all computations performed by Q scripts are expression evaluations: The user specifies an expression to be evaluated, the interpreter computes the value of that expression and prints it as the result of the computation.
Having described the "data" manipulated by Q scripts, expressions, we now turn to the computational processes carried out on those expressions, expression evaluations, and their description by collections of computational rules, equations.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In the Q language, there is in fact no exact equivalent of a "function definition" in conventional languages. Rather, definitions take the form of equations which describe how a given expression, the left-hand side of the equation, can be transformed into another one, the corresponding right-hand side. Note the difference between this interpretation and common mathematical usage in which an equation is usually something to be "solved". Here we are actually dealing with "rewriting rules", i.e., the equations are oriented from left to right. The syntax of equations is captured by the following grammar rules:
definition : expression
lqualifiers '=' expression qualifiers ';'
{lqualifiers '=' expression qualifiers ';'}
lqualifiers : [qualifier {qualifier} ':']
qualifiers : {qualifier}
qualifier : condition
| where-clause
condition : 'if' expression
| 'otherwise'
where-clause : 'where' expression '=' expression
{',' expression '=' expression}
|
Variables occurring on the left-hand side of an equation are taken to be
universally quantified. They play the role of formal parameters in
conventional programming languages, which are "bound" to their
corresponding values when the equation is applied to an actual
expression value. For instance, suppose we wish to define a function
sqr which squares its argument. We know that to square something
we simply multiply it with itself. The corresponding equation is:
sqr X = X*X; |
Here the left-hand side is simply the application of the function symbol
sqr to an argument value X which stands for any actual
value we might apply the sqr function to.
An equation may also contain a collection of conditions and "local"
variable definitions (these two kinds of supplementary elements are also
called qualifiers in Q parlance), and multiple right-hand sides
(which are interpreted as a collection of equations for the same
left-hand side). An empty condition may be denoted with the keyword
otherwise, which is a convenient means to indicate the "default
case" of a definition. For instance, the following equations define the
factorial function:
fac N = N*fac (N-1) if N>0;
= 1 otherwise;
|
The otherwise keyword is nothing but syntactic sugar; it can always
be omitted. However, it tends to improve the readability of definitions like
the one above.
Qualifiers can also be written on the left-hand side of an equation, like so:
fac N if N>0: = N*fac (N-1);
otherwise: = 1;
|
The two forms are more or less equivalent, and can be mixed, but left-hand side qualifiers may also be shared by multiple equations for the same left-hand side. For instance, here is a definition of Ackerman's function:
ack M N if M>0: = ack (M-1) (ack M (N-1)) if N>0;
= ack (M-1) 1 if N=0;
if M=0: = N+1;
|
Note that the condition M>0 on the left-hand side of the first
equation applies to both the first and the second equation. In general,
the scope of a left-hand qualifier extends up to (but excluding) the
next equation with either another left-hand side qualifier or a new
left-hand side expression.
In difference to function definitions in conventional programming languages, the left-hand side of an equation may also contain constant and structured argument patterns which are to be matched against the actual parameters of a function. For instance, the following equations, which involve constant values on the left-hand side, define the Fibonacci function:
fib 0 = 0; fib 1 = 1; fib N = fib (N-1) + fib (N-2) otherwise; |
As an example for structured argument patterns, operations to extract the head (first element, the "car" in Lisp terminology) and tail (list of remaining elements, the "cdr") of a list are defined in the standard library as follows:
hd [X|Xs] = X; tl [X|Xs] = Xs; |
Note that in the above example we actually only need one of the involved component values. In such a case it is possible to employ the anonymous variable `_' as a placeholder for any component value we do not care for. For instance:
hd [X|_] = X; tl [_|Xs] = Xs; |
Multiple instances of the anonymous variable are matched independently from each other, therefore an equation like
foo _ _ = 0; |
will be matched for any combination of arguments. Also note that values matched by the anonymous variable on the left-hand side of an equation are not accessible on the right-hand side. (The anonymous variable may appear on the right-hand side of an equation, but there it is always treated as a free variable, see Free Variables.)
As another example, here is the definition of a function which adds up the members of a list:
add [] = 0; add [X|Xs] = X+add Xs; |
The above definition is the Q equivalent for what Lisp programmers call "cdr'ing down a list". In general you will find that pattern-matching lets you express many functions operating on recursive, list- or tree-like data structures in a very concise and elegant way. For instance, we can define a binary tree insertion operation simply as follows:
const nil, bin X T1 T2;
insert X nil = bin X nil nil;
insert X (bin Y T1 T2) = bin Y (insert X T1) T2 if X<Y;
= bin Y T1 (insert X T2) otherwise;
|
Here the nil and bin symbols act as constructors
which are used to build a recursive binary tree data structure. To these
ends, the bin symbol can be applied to a value X (which is
the value to be stored at that node in the tree) and the left and right
subtrees T1 and T2. This works exactly like a function
which is applied to its arguments, but here the bin symbol does
not actually compute anything but simply stands for itself.
In the above example, we actually advertised the fact that nil
and bin are just constructors by declaring them with
const, as described in Declarations. While this declaration
is not strictly necessary, it helps to enforce that these symbols cannot
accidentally be "redefined" by means of an equation:
nil = ...; // compiler will print an error message bin X T1 T2 = ...; // here, too bin X = ...; // here, too (prefix of a constant) |
The same applies to the built-in constant symbols true and
false which are predeclared as const, and also to other
types of built-in constants, i.e., numbers, strings, files, lists,
streams and tuples. The general rule is that no constant value is
allowed as the left-hand side of an equation. A singleton variable is
also forbidden. Thus the left-hand side must either be a
non-const function symbol or application. Prefixes of constant
applications, such as bin X in the example above, also count as
constants and are thus forbidden on the left-hand side. On the other
hand, both constants and variables may occur as the head of the
left-hand side of an equation if they are followed by additional
arguments. Thus equations like the following are all valid:
0 X = ...; bin X T1 T2 Y = ...; F X Y = ...; |
Note that since expressions involving operators are nothing but
syntactic sugar for applications, they can occur on the left-hand side
of an equation as well (with the exception of the ' operator
which is a constructor symbol, see Special Forms). For instance,
here is a collection of algebraic simplification rules which cause any
expression involving only + and * to be converted into a
"sum of products" form:
// distributivity laws: (X+Y)*Z = X*Y+X*Z; X*(Y+Z) = X*Y+X*Z; // associativity laws: X+(Y+Z) = (X+Y)+Z; X*(Y*Z) = (X*Y)*Z; |
A word of caution: if the = operator occurs at the toplevel of
the left- or right-hand side of an equation, it must be
parenthesized. This is necessary to prevent the = operator to be
confused with the = symbol used to separate both sides of an
equation. (The same also applies to variable definitions.) E.g., you
should write something like
(X=X) = true; |
instead of
X=X = true; // this is a syntax error |
Although almost any expression can form the left-hand side of an equation, you should be aware that an equation will only be applicable if the pattern you specify can actually occur in the course of an expression evaluation. In particular, since the Q interpreter evaluates expressions in a leftmost-innermost fashion, you should make sure that argument patterns match normal form expressions (cf. Normal Forms and Reduction Strategy), unless the function to be defined is a special form (cf. Special Forms). Otherwise the equation may be useless, albeit syntactically correct. The compiler cannot verify these conditions, so they are at your own responsibility.
Using equations we can also easily define "higher order" functions
which take functions as arguments or return them as results. This is
made possible by the fact that function application is an explicit
operation. For instance, as a generalization of the add function
above, the standard library provides an operation which applies
any "binary" function F to a list, starting from a given
initial value A:
foldr F A [] = A; foldr F A [X|Xs] = F X (foldr F A Xs); |
The foldr ("fold-right") function takes as its arguments a function
F, an initial value A, and a list [X1,...,Xn], and
computes the value
F X1 (F X2 (... (F Xn A)...)). |
(The name of this function comes from the fact that the recursive
applications of F are parenthesized to the right.) Now we can easily
define the add function in terms of foldr as follows:
add = foldr (+) 0; |
There is another version of the fold operation, which accumulates results from left to right rather than from right to left:
foldl F A [] = A; foldl F A [X|Xs] = foldl F (F A X) Xs; |
Note that although both functions work in linear time (with respect to
the size of the input list), the foldl function actually is more
efficient than foldr in terms of stack space requirements since
its definition is "tail-recursive" (cf. Tail Recursion).
As another example, here is the definition of the standard library
function map which maps a given function F to each member
of a list Xs:
map F [] = []; map F [X|Xs] = [F X|map F Xs]; |
Another example of a (built-in) higher-order function is lambda,
which returns another function (cf. Lambda Abstractions). In fact, there is nothing that keeps you from defining a
function like lambda yourself in Q. As a proof of concept, here
is a toy implementation of the lambda calculus, based on the
combinatorial calculus, as described in [Henson 1987]. (This definition
will only work with simple kinds of expressions. The built-in
lambda function provides a much more comprehensive implementation
which also properly handles the obscure cases. But our definition in
terms of the combinatorial calculus could be extended to handle those
cases, too.)
Our little lambda function is declared as a special form
which prevents its arguments from being evaluated. This will be
explained in Special Forms.
special lambda X Y; lambda X X = _I; lambda X (Y Z) = _S (lambda X Y) (lambda X Z); lambda X Y = _K Y otherwise; /* combinator rules */ _I X = X; // identity function _K X Y = X; // constant function _S X Y Z = X Z (Y Z); // argument dispatching |
For instance:
==> lambda X (2*X) _S (_S (_K (*)) (_K 2)) _I ==> _ 4 8 |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
If a variable has multiple occurrences on the left-hand side of a rule,
each occurrence must be matched to the same value. In term rewriting
theory, such rules are called "non-linear" ("non-left-linear", to be
more precise). The following definition implements an operation
uniq which removes adjacent duplicates from a list:
uniq [] = []; uniq [X,X|Xs] = uniq [X|Xs]; uniq [X|Xs] = [X|uniq Xs] otherwise; |
It is important to notice the difference between the syntactical notion of equality which plays a role in definitions like the one above, and the "semantic" equality relation defined by the `=' operator (cf. Relational Operators). Non-linear equations and equations involving constants, such as
uniq [X,X|Xs] = uniq [X|Xs]; fib 0 = 0; |
call for the verification of syntactic identities. The Q language implements these checks by means of a special relational operator `=='. Thus the above examples could also be written as:
uniq [X,Y|Xs] = uniq [X|Xs] if X==Y; fib X = 0 if X==0; |
Two expressions are considered syntactically equal (`==') only if
they print out the same in the interpreter. In contrast, the meaning of
the normal equality operation `=' depends on built-in rules and
equations specified by the programmer. For instance, two different
instances of the expression 0 always denote the same
object. However, since 0 is an integer and 0.0 a floating
point number, these two expressions are syntactically
different:
==> 0==0.0 false |
Nevertheless, the = operator allows to compare integers and
floating point numbers. Indeed it asserts that 0 and 0.0
are "equal":
==> 0=0.0 true |
Note that, in contrast to all the other built-in relational operators, the syntactic equality operator `==' is actually implemented as a special form (cf. Special Forms) which compares its operands as literal expressions without evaluating them:
==> 0==0+0 false |
This is necessary to ensure proper operation when a nonlinear equation needs to compare two unevaluated special arguments for syntactic identity, but makes the operation somewhat awkward to use directly in programs. As a remedy, you can easily define a non-special syntactic equality operation yourself as follows:
eq X X = true; eq _ _ = false otherwise; |
In fact, this definition is already provided in the standard library
script stdlib.q. The standard library also defines the logical
negations `!=' and neq of `==' and eq, see
The Standard Library, for details.
You should also note that the strict separation of syntactic and
semantic equality is actually quite useful. First of all, it allows the
= operation to be treated in a manner consistent with other
comparison operators such as < and >. Secondly, it allows
you to tailor the definition of = to your application. For
instance, an application might call for a set data structure, and you
would like to be able to test whether two expressions of a certain kind
represent the same set. Now there are fairly sophisticated data
structures for representing sets efficiently, but almost none of them
allow you to decide whether two objects represent the same set simply on
the basis of syntactic identity. Instead, you will have to provide an
explicit = operation which tests whether two set objects contain
the same elements.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
On the right-hand side of an equation (as well as in the condition part
and the right-hand sides of local definitions), we distinguish between
bound variables, i.e., unqualified variable symbols which are
introduced in the left-hand side of an equation (or the left-hand side
of a local definition, see Local Variables), and free or
global variables which are not bound by any of the
left-hand side patterns. Free variables can also be declared explicitly
using a var declaration (cf. Declarations). They are
effectively treated as (parameterless) function symbols, except that
they may be assigned a value by means of a variable
definition. Variable definitions have the following syntax:
definition : 'def' expression '=' expression
{',' expression '=' expression} ';'
| 'undef' identifier {',' identifier} ';'
|
For instance, in the following equation,
foo X = C*X; |
the variable C occurs free on the right-hand side and we have:
==> foo 23 C*23 |
We can assign a value to the variable as follows (this can be done either in the script or interactively in the interpreter):
def C = 2; |
After this definition we have:
==> foo 23 46 |
It is worth noting here that, for convenience, the global variable environment uses dynamic binding, i.e., you can change the value of a free variable at any time interactively in the interpreter:
==> def C = 3; foo 23 69 |
This also makes it possible to have recursive definitions like the following:
==> var fac = \N.if N>0 then N*fac (N-1) else 1 ==> fac \X1 . if X1>0 then X1*fac (X1-1) else 1 ==> map fac [1..10] [1,2,6,24,120,720,5040,40320,362880,3628800] |
(Note that we have to use an explicit variable declaration here, since `fac' is an uncapitalized identifier and hence would otherwise be taken to denote a function symbol. We also made use of a variable definition with initializer which declares and defines the variable in a single step. See Declarations.)
In contrast, bound variables (which are bound either by the left-hand side of the equation, cf. Equations, or in a local variable definition, cf. Local Variables) always use lexical binding, i.e., their values are always completely determined by the (static) context of the equation to which they belong. This is important because it guarantees referential transparency, i.e., the values of the left-hand side and local variables in an equation will only depend on the expression to which the equation is applied, as long as the definition does not involve any operations with side-effects, such as I/O. The values of the free variables may change, but only between different expression evaluations, as there is no function which would allow you to change the value of a global variable during an ongoing evaluation. This is in contrast to other functional programming languages such as Lisp which allow you to modify the value assigned to a variable in a function definition.
Free variables are useful if you have a script which repeatedly refers to the same value whose calculation may be costly or involve functions with side-effects. By assigning such a value to a variable, you can refer to it without having to recompute the value each time it is required, as would be the case with a corresponding function definition.
The right-hand side of a variable definition may be any Q expression, as
described in Expressions. The expression is evaluated once,
at the time the script is loaded, and is then matched against the
left-hand side of the definition, binding variables to the corresponding
component values of the right-hand side value. In the special case that
the left-hand side is a single variable, it is simply bound to the
right-hand side value. Whenever a defined variable occurs "free" on
the right-hand side of an equation or variable definition, it will then
be replaced by the value of the assigned expression. For instance, the
following definition assigns the value exp 1.0 = 2.71828…
to the free variable e:
var e = exp 1.0; |
Variable definitions are carried out in the order in which they occur in
a script, so that a definition may refer to the values of variables
assigned in previous definitions. It is also possible to override
assignments made in previous definitions, and to undefine a
variable. Multiple definitions can be performed with a single def
statement. For instance:
def N = 99, K = N, N = 2*K+1, M = N; undef N, K; |
As already mentioned, the value assigned to a variable may also be
changed with def and undef statements in the interpreter,
see Command Language.
Note that a variable can also be declared const, which can be
used to introduce constant values, as in:
public var const c = 299792458; // the speed of light, in m/sec |
In this case the variable can only be assigned to once, afterwards attempts to assign a new value to the same variable will produce an error message:
==> def c = 300000000
! Cannot redefine const variable
>>> def c = 300000000
^
|
Free variable definitions can also involve a pattern on the left-hand side which is to be matched against the supplied value. E.g.,
def [X|Xs] = [1,2,3]; |
assigns 1 to the variable X and the remainder of the list,
[2,3], to the variable Xs. Pattern matching is performed
in the same manner as for the left-hand side of an equation. However, an
error is reported if the match fails (as would be the case, e.g., if we
tried to assign the empty list to [X|Xs]).
A number of built-in free variables are already defined by the interpreter when it starts up, and there is also a free anonymous variable `_' which is used to record the result of the most recent expression evaluation performed in the interactive evaluation loop of the interpreter. See Command Language, for details.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As already mentioned, an equation may also contain one or more auxiliary
variable definitions a.k.a. where clauses, which bind
additional variables to their corresponding values. In difference to the
global definitions of "free" variables discussed in Free Variables, the definitions in where clauses are "local" to the
rule in which they occur, meaning that the values of these variables are
only available in the rule they are defined in. As in the case of free
variable definitions, the left-hand side of each where clause can
be an arbitrary expression pattern which is to be matched against the
value computed from the corresponding right-hand side. The variables
occuring on the left-hand side of the definition will then be bound to
their corresponding values, thereby becoming "bound" variables in the
context of the rule, in addition to the variables occuring on the
left-hand side of the equation. Local definitions also act as additional
conditions; the rule will only be applicable if all left-hand sides in
where clauses match their corresponding right-hand side values
(this is guaranteed, of course, if each left-hand side is a single
variable).
Local definitions are useful if the right-hand side of a rule refers repeatedly to the same (maybe complicated) subexpression. You can avoid the recalculation of such a value be assigning it to a variable, and referring to the variable in the right-hand side. For instance:
foo X = bar Y Z where Y = baz X, Z = qux Y; |
As already mentioned, pattern-matching definitions are permitted as well (and the anonymous variable also works as usual). A failing match causes the entire rule to fail. For instance,
foo Z = bar X Z where [X|_] = Z; |
works like
foo [X|Xs] = bar X [X|Xs]; |
but avoids repeating the list value on the right.
Like conditions, where clauses can also be written on the
left-hand side of an equation. This allows you to share the local
variables between different equations for the same left-hand side. For
instance:
foo Z where [X|_] = Z: = bar X Z if X>0;
= bar (-X) Z otherwise;
|
The variable bindings in a single where clause are performed in
the given order, and each definition may refer to all left-hand side
variables of the rule, as well as all variables introduced in earlier
definitions. Each use of a variable refers to its most recent
definition. It is also possible to have multiple where clauses,
as in:
foo X = bar Y where Y = baz Z where Z = qux U where U = quux X; |
Note that here the definitions will be processed from right to
left. Such constructs are convenient if there is a series of values to
be computed in which one value depends on the next one. Using multiple
where clauses, you start off with the definitions immediately
required by the right-hand side of the rule, then add another
where clause for the variables introduced in the first clause,
etc. The above definition is actually equivalent to the following:
foo X = bar Y where U = quux X, Z = qux U, Y = baz Z; |
It is also possible to mix where clauses with conditions, as in:
foo X = bar Y if qux Y where Y = baz Z if quux Z where Z = bazz X; |
Again, the different qualifiers are actually considered from right to
left; the rule will be aborted as soon as the first qualifier fails
(i.e., a condition evaluates to false or the left-hand side of a
definition does not match the corresponding right-hand side), so that
no time is wasted with definitions which are not needed anyway.
Of course, left-hand side qualifiers work in the same fashion:
foo X if qux Y where Y = baz Z if quux Z where Z = bazz X: = bar Y; |
You can also mix left-hand and right-hand qualifiers, in which case the left-hand qualifiers will be processed first:
foo X if quux Z where Z = bazz X: = bar Y if qux Y where Y = baz Z; |
Variables bound by the left-hand side or a where clause can be
rebound freely. E.g.:
foo X = bar (X+1) where X = qux X X
if X>0 where X = baz X;
|
Each occurrence of the variable in the right-hand side of the equation or a local definition then refers to its most recent value. Note that this doesn't mean that local variables are mutable; it just means that there are multiple nested scopes of symbols. More precisely, each local definition introduces a nested scope binding the variables in the left-hand side of the definition. It goes without saying that repeated rebinding of the same variable can be confusing, but sometimes it is appropriate and at other times it might just be more convenient than having to invent a bunch of additional variable names for no other purpose than disambiguation.
Also note that if a variable is bound by the left-hand side or a local
definition of a rule, then a global variable of the same name will be
"shadowed" within the rule. However, in this case you can still access
the value of the shadowed binding with a qualified identifier. For
instance, with the following definitions, foo 2 reduces to
bar 2*99:
/* this is the script foo.q */ def FOO = 99; foo X = FOO*foo::FOO where FOO = bar X; |
If the free variable belongs to the current module, as in the example above, then it can also be escaped using an inline variable declaration (cf. Constants and Variables), like so:
def FOO = 99; foo X = FOO*var FOO where FOO = bar X; |
Inline variable declarations can also be used to explicitly declare free
variable symbols on the fly. If the symbol specified with var has
not yet been declared in the current module, it will be declared as a
private free variable symbol.
Finally, Haskell programmers should note that Q's where clauses
are really just variable definitions, and not (as in Haskell) a
kind of "local equations" which would permit you to define "local
functions". (There are technical reasons for this which we won't
discuss here.) So you shouldn't try to emulate Haskell's local function
definitions using Q's where clauses. For instance, while the
following definition is perfectly legal Q code, it does not have the
intended effect of defining a local factorial function:
// this doesn't work as intended foo X = map fac [1..10] where fac N = if N>0 then N*fac (N-1) else 1; |
Instead, the where clause is just an (unsuccessful) attempt to
match the right-hand side expression, `if N>0 then N*fac (N-1) else
1', against the left-hand side expression `fac N' and bind the
variable N accordingly.
It is actually possible to have a kind of "local functions" in Q as well, but this involves creating a lambda abstraction and assigning it to a local variable. However, this is usually not very convenient and a "global" function definition using equations is almost always preferable. Also note that, because of Q's scoping rules in local definitions, you cannot really have recursive local definitions anyway. For instance, the following definition will not work as intended either (even though the analogous global definition of the factorial works just fine, because of the dynamic binding of global variables):
// doesn't work either foo X = map FAC [1..10] where FAC = \N.if N>0 then N*FAC (N-1) else 1; |
Only non-recursive "local functions" can be defined this way. (Well, using the "fixed point combinator" it is in fact possible to define the factorial and actually any recursive function in a form that can be assigned to a local variable, but we won't go into that here.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language allows you to restrict the set of patterns which can be
matched by a variable on the left-hand side of an equation or variable
definition. This is done by augmenting any left-hand side occurrence of
the variable with a type guard, using the notation:
variable:type. For instance, to declare a type
BinTree which comprises the constructors nil and bin
X T1 T2, the following declaration might be used
(cf. Declarations):
public type BinTree = const nil, bin X T1 T2; |
This declaration introduces two function symbols nil and
bin, just as if they were declared by:
public const nil, bin X T1 T2; |
However, it also assigns these symbols to the type symbol BinTree.
The type symbol can then be used as a guard on variables to ensure
that a variable is only matched to objects of the given type. E.g., the
following equation will only apply if the argument to the foo
function has the form nil or bin X T1 T2 for some
subexpressions X, T1 and T2:
foo T:BinTree = ...; |
This makes it possible to avoid explicit argument patterns like
foo nil = ...; foo (bin X T1 T2) = ...; |
in cases in which the component values are not actually required. Also, it
allows you to define operations on a given type without actually referring
to the constructor symbols of the type, and thereby makes it possible to
avoid dependencies on implementation details. You can even "hide" the
constructor symbols of a type by declaring them private, which makes
the symbols accessible only to the script which declares the type they
belong to. For instance:
public type BinTree = private const nil, bin X T1 T2; |
The Q language allows you to construct a hierarchy of types by making one
type a subtype of another. This is done by specifying a supertype following
the type identifier in a type declaration. For instance, to make
BinTree a subtype of the type SearchTree, BinTree would
be declared as follows:
type BinTree : SearchTree = const nil, bin X T1 T2; |
Now a variable of type SearchTree will not only match
SearchTree's, but also any object of its subtype BinTree. In
fact, we might have declared SearchTree as an "abstract" type which
does not provide any constructors at all:
type SearchTree; |
Abstract supertypes like SearchTree are useful for factoring out
generic operations which are shared by different subtypes; see Types,
for examples, and for a more detailed discussion of the type guard concept.
The Q language has a number of predefined type symbols which can be used
to distinguish the corresponding types of objects built into the
language: Bool, Int, Float, Real (which is
the supertype of both Int and Float), Num (the
supertype of Real), String, Char (the subtype of
String denoting the single-character strings), File,
List, Stream, Tuple and Function. (Besides
these, there are also two types Exception and SysException
for dealing with exception values, cf. Exception Handling.) For
instance, the standard library defines an operation isstr, which
determines whether its argument is a string, as follows:
isstr _:String = true; isstr _ = false otherwise; |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The process in which equations are applied in the Q interpreter to evaluate a given expression is fairly straightforward. In fact, it corresponds to the way in which we manipulate algebraic formulae, and we all learned how to do that in high school.
We say that an equation L=R is applicable to a given expression
X, if we can substitute actual values for the variables occurring in
L so that we obtain X. This is also expressed as "L
matches X" or "X is an instance of L". In this case,
we can replace X by Y, where Y is the expression
obtained from R by replacing the variables occurring in L by
their corresponding values. We shall call such a replacement step a
reduction, and write it down as X ⇒ Y.
For instance, given the rule
sqr X = X*X; |
we have that
sqr 2 ⇒ 2*2 |
since the expression sqr 2 matches the left-hand side sqr X,
with 2 being substituted for the variable X. This works with
compound expressions as well, even if they involve variables themselves.
For instance:
sqr (X+1) ⇒ (X+1)*(X+1). |
Here, the compound expression (X+1) has been substituted for the
variable X.
Equations are usually applied in the context of a larger expression to be evaluated. For instance, we have that
sqr 2 + 2 ⇒ 2*2+2 |
by the application of the above definition of sqr to the subexpression
sqr 2 of sqr 2 + 2.
Before we proceed, a remark on the role of the built-in operations of the
Q language is in order. Primitive operations, such as the built-in +
and * operators described in Built-In Operators, do not require
any equations specified by the programmer. Instead, they may be thought of
as being predefined by a large set of built-in equations. Each application
of a built-in rule also counts as a single reduction. For instance, we have
that
2*2 ⇒ 4 |
by the built-in rule which handles the multiplication of integers.
Built-in rules take priority over any equations specified by the programmer. However, it is possible to extend the definition of a primitive operation with equations supplied by the programmer. A corresponding example has already been given in Equations. By refining built-in definitions accordingly, you can also treat "exceptions" in built-in operations; see Runtime Errors, for an example.
In order to complete the evaluation of a given expression, we have to
repeatedly perform single reductions until no further equations (built-in
or user-defined) apply. We then say that the resulting expression is in
normal form, and interpret it as the value of the original
expression. For instance, the following reduction sequence leads to the
normal form (i.e., value) 6 for the target expression
sqr 2 + 2 (we still assume the definition of sqr introduced
above):
sqr 2 + 2 ⇒ 2*2+2 ⇒ 4+2 ⇒ 6. |
Normal forms do not necessarily exist. Just as in conventional programming languages, the evaluation may go into an endless loop and never return an answer. Even if a normal form exists, it need not be unique - it may depend on which equations are used in the reduction process and in what order. The problem is that at any point in the reduction process there may be more than one "reducible" subexpression, and even if there is only one such expression (termed a redex), there may be more than one applicable equation.
To illustrate this kind of problem, consider the definition of the sqr
function from above, and the target expression sqr (1+1). This
expression admits three distinct reduction sequences:
sqr (1+1) ⇒ sqr 2 ⇒ 2*2 ⇒ 4 sqr (1+1) ⇒ (1+1)*(1+1) ⇒ 2*(1+1) ⇒ 2*2 ⇒ 4 sqr (1+1) ⇒ (1+1)*(1+1) ⇒ (1+1)*2 ⇒ 2*2 ⇒ 4. |
The second and third reduction sequence appear to be almost the same, but
the first reduction sequence makes clear that it matters whether we reduce
the subexpression (1+1) before applying the definition of sqr
or not. In the present example, the order in which reductions are performed
only affects the number of required reduction steps, but it is easy to
construct other systems of equations in which both the termination of a
normal form evaluation and the computed normal forms actually depend on the
evaluation order.
Non-uniqueness of normal forms does not necessarily mean that there must be two or more rules being applicable to a given expression. Ambiguities can also arise when an equation "overlaps" with other equations or with itself, as in the following example:
foo (foo X) = bar X; |
This system consisting of a single equation is terminating, but it has
two distinct normal forms for the expression foo (foo (foo X)),
namely foo (bar X) and bar (foo X).
Indeed, the termination of rewrite systems and the uniqueness of normal forms are important issues in the theory of term rewrite systems [Dershowitz/Jouannaud 1990]. The Q language simply avoids these problems by imposing a reduction strategy which makes the reduction process deterministic. The reduction strategy must specify unambiguously for each reduction step the redex that is to be reduced and the equation which is to be applied. For this purpose the Q interpreter applies the following rules:
The leftmost-innermost strategy is common in many programming languages, and
it also corresponds to the "natural" way in which people tend to carry
out calculations manually. It means that at any point in the reduction
process it is always the leftmost-innermost redex which is to be reduced.
In particular, this implies that in an application of the form
F X first F is evaluated, then X, before the
value of F is applied to the value of X. Thus the first
reduction sequence from above,
sqr (1+1) ⇒ sqr 2 ⇒ 2*2 ⇒ 4 |
would be the one actually carried out in the interpreter in the course
of the evaluation of sqr (1+1).
The leftmost-innermost strategy resolves all ambiguities in the choice
of redexes during a normal form evaluation. The second disambiguating
rule allows to decide which equation to apply if more than one equation
is applicable to a given redex. As indicated, the default is to apply
equations in the order in which they appear in the source script. This
allows you to have equations with overlapping left-hand sides which
naturally arise, e.g., in definitions involving "special case rules"
such as the definition of the fib function in Equations:
fib 0 = 0; fib 1 = 1; fib N = fib (N-1) + fib (N-2) otherwise; |
The rule for applying equations is extended to the built-in equations of
primitive operations by assuming - as already mentioned - that the
built-in rules come "before" any equations specified by the
programmer, and hence always take priority. Also note that external
functions, which are declared with the extern modifier and
implemented in a C module, are treated in an analogous manner. In
general, the interpreter will first try a built-in rule, then an extern
rule, then the equations supplied in the script. (The treatment of
extern functions is described in more detail in C Language Interface.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Conditional rules are applied in the same manner as simple equations,
only we also have to check the conditions of the rule. When the
left-hand side of a conditional rule matches the target expression, we
first evaluate the condition (with left-hand side variables being
replaced as usual). This expression must evaluate to a truth
value, otherwise we cannot decide whether the rule is applicable or
not. (The Q interpreter generates a runtime error if the result is not a
truth value.) The rule is applicable only when the condition evaluates
to true.
For instance, recall the definition of the factorial function:
fac N = N*fac (N-1) if N>0;
= 1 otherwise;
|
The first rule is applicable only if N>0 evaluates to
true; if it evaluates to false, the second rule
applies. Furthermore, if N happens to be incomparable with
0, then a runtime error will occur.
Here's how this definition is applied to evaluate the expression
fac 3:
fac 3 ⇒ 3*fac (3-1) ⇒ 3*fac 2
⇒ 3*(2*fac (2-1)) ⇒ 3*(2*fac 1)
⇒ 3*(2*(1*fac (1-1))) ⇒ 3*(2*(1*fac 0))
⇒ 3*(2*(1*1)) ⇒ 3*(2*1)
⇒ 3*2 ⇒ 6.
|
Local variable definitions (cf. Local Variables) are treated in a similar fashion. We first evaluate the right-hand side of the definition and match it against the corresponding left-hand side. If the match fails, the rule is skipped. Otherwise, we bind variables accordingly and proceed with the next qualifier or the right-hand side of the rule.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
We already discussed that the Q interpreter normally considers equations in the order in which they appear in a script. Hence, if all your equations go into a single script, you simply write down overlapping equations in the order in which they should be tried by the interpreter. However, if equations for the same expression pattern are scattered out over different modules, then the rule order also depends on the order in which the different modules are processed by the compiler. In the current implementation, the rules of each module are processed when the first import or include declaration for the module is encountered during a preorder traversal of the module graph starting at the main script, but you should not rely on this.
Hence it is usually a bad idea to have overlapping equations scattered out over different source files. However, in some situations this is hard to avoid. For instance, you might decide to put all "default rules" for certain expression patterns in a separate module. But if you simply import this module at the beginning of your main script then the default rules might override all the other equations.
To work around this, the Q language allows you to explicitly attach priorities to equations by means of a priority declaration, which has the following format:
declaration : '@' ['+'|'-'] unsigned-number |
This specifies the given number (which must be an integer, optionally
signed with `+' or `-') as the priority level of the
following equations. In the current implementation, priority levels must
be 32 bit integers, i.e., in the range
-0x80000000..0x7fffffff. Rules are ordered according to their
priorities, highest priority first. Rules with the same priority are
considered in the order in which they appear in the script.
The priority level set with a priority declaration applies only to the
script in which the declaration is located. The default priority level,
which is in effect up to the first @ declaration, is 0.
Note that even with priority declarations the built-in operations still take priority over all user-defined rules, i.e., those operations effectively have infinite priority. (The same applies to external functions, see C Language Interface.)
As a simple (and contrived) example for the use of priority declarations, consider the following script:
@-1 foo X = -1; // "default" rule @0 foo X:Real = 0; // "standard" rule @+1 foo X:Int = 1; // "special case" rule |
The second equation is at the default priority level, 0. The first equation, although it comes before the second one, becomes a "fallback" rule by assigning it a low priority, -1. The last equation is put on a high priority level, +1, and hence overrides both preceding equations. The net effect is that the equations are actually ordered in reverse textual order. You can verify this in the interpreter:
==> foo 77 1 ==> foo 77.0 0 ==> foo () -1 |
Of course, this example is somewhat silly, because in a single script we can easily arrange equations in the correct textual order. However, if for some reason we had to put the three rules into three different modules, then using the given priority declaration ensures that the equations will always be applied in the correct order.
Priority levels should be used sparingly, if at all. Using low priorities to factor out a module with default rules can occasionally be quite useful, but overriding rules with high priorities is considered harmful and should be avoided whenever possible.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The present and the following section contain material for the technically interested reader which, while not being strictly necessary to work successfully with the Q programming language, helps to understand the operation of the Q interpreter and to write better Q scripts. You might wish to skim the following at first reading, and later return to it when you have become more proficient with the Q language and require some further background knowledge.
While the "rewriting model" of expression evaluation introduced in
Normal Forms and Reduction Strategy, provides a nice conceptual model
for the Q programmer, it is too inefficient to be a useful implementation
model for the Q interpreter. In particular, it would be rather inefficient to
actually construct the full expressions which occur as intermediate results
in the course of a reduction sequence, and repeatedly scan these expressions
for the leftmost-innermost redex. Instead, we can evaluate expressions by a
simple recursive procedure as follows. The input to the algorithm is an
expression X to be evaluated.
X is an application, list or tuple, evaluate the parts of X
recursively, from left to right. Proceed with step 2.
X, invoke it and
return the corresponding value.
L=R which is
applicable to X. If there is no such rule, X is in normal form
and is returned as the value of the expression. Otherwise we recursively
evaluate R (with variables instantiated to their corresponding values)
and return it as the value of the original expression.
(The above description of course leaves out many important details. For
instance, the interpreter will also have to take care of def'ed
symbols, and we will have to organize the search for an applicable equation.
However, here we only want to give the general idea, and not a complete
implementation of the interpreter.)
The above procedure can actually be implemented non-recursively if we
keep track of the rules which are currently "active", together with the
corresponding variable bindings. This information can easily be maintained
on a stack. We illustrate this with a simple example. Recall the definition
of the add function from Equations:
add [] = 0; add [X|Xs] = X+add Xs; |
The evaluation of add [1,2,3] then proceeds as follows:
add [1,2,3] ⇒ 1+add [2,3]:
add [2,3] ⇒ 2+add [3]:
add [3] ⇒ 3+add []:
add [] ⇒ 0
3+0 ⇒ 3
add [3] ⇒ 3
2+3 ⇒ 5
add [2,3] ⇒ 5
1+5 ⇒ 6
add [1,2,3] ⇒ 6
|
Each new level of indentation indicates that we "suspend" the current rule (push it on the stack) and activate a new rule in order to evaluate some part of the right-hand side of the suspended rule. When all parts of the right-hand side have been evaluated, we return to the suspended rule (pop it from the stack) and actually perform the reduction. More precisely, we replace the left-hand side of the suspended rule by the result obtained by evaluating the right-hand side, which is already in normal form. (We will of course have to keep track of the results of intermediate reductions, which can be done on a second "evaluation" stack. When the evaluation of the right-hand side is finished, the corresponding result will be on top of the evaluation stack.)
If desired, we can easily reconstruct the "context" of a rule by following the chain of stacked rules starting from the current rule, and proceeding towards the bottom of the stack. For instance, when we perform the reduction
add [] ⇒ 0 |
in the above example, the context of this rule is described by the following stacked rules:
add [3] ⇒ 3+add [] add [2,3] ⇒ 2+add [3] add [1,2,3] ⇒ 1+add [2,3] |
Thus, when add [] gets reduced to 0, we have actually completed
the following initial part of the reduction sequence:
add [1,2,3] ⇒ 1+add [2,3] ⇒ 1+(2+add [3]) ⇒ 1+(2+(3+add []))
⇒ 1+(2+(3+0))
|
(Note that we never actually constructed intermediate results like the
expression 1+(2+(3+add [])). Rather, these expressions are represented
implicitly by the contents of the stack.)
We can also apply the above procedure to qualified rules accordingly.
When the interpreter comes to consider a conditional rule, it pushes it
onto the stack as usual. However, it then first starts to evaluate the
qualifying conditions and where clauses of the rule. When the
value of a condition has been computed, we can check whether it
holds. If the computed value is neither true nor false,
the interpreter aborts the evaluation with a runtime error. If it is
true, it proceeds by evaluating other qualifiers or the
right-hand side. If it is false, however, it gives up on the
current rule (pops it from the stack), and considers the next applicable
equation. A similar approach is used to handle local variable
definitions.
For instance, consider once more the definition of the factorial function:
fac N = N*fac (N-1) if N>0;
= 1 otherwise;
|
The computation of fac 3 then proceeds as follows (the ∗
symbol indicates that the condition N>0 has evaluated to false
and hence the corresponding rule is abandoned):
fac 3 ⇒ 3*fac (3-1):
3>0 ⇒ true
3-1 ⇒ 2
fac 2 ⇒ 2*fac (2-1):
2>0 ⇒ true
2-1 ⇒ 1
fac 1 ⇒ 1*fac (1-1):
1>0 ⇒ true
1-1 ⇒ 0
fac 0 ⇒ 0*fac (0-1):
0>0 ⇒ false ∗
fac 0 ⇒ 1
1*1 ⇒ 1
fac 1 ⇒ 1
2*1 ⇒ 2
fac 2 ⇒ 2
3*2 ⇒ 6
fac 3 ⇒ 6
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The above equations for the add and fac functions are examples
for recursive definitions. The computational processes generated from
such definitions are characterized by chains of suspended rules in which
the same rules are activated over and over again. For instance, an
evaluation of fac N with N>0 requires N recursive
activations of the rule:
fac N = N*fac (N-1) if N>0; |
Hence, if N becomes very large, the recursive definition of fac
is in danger of running out of stack space. In the following, we show how to
avoid this defect by employing the technique of "tail-recursive
programming".
It is a well-known fact that the factorial function also has an "iterative"
implementation which can be executed in constant memory space. The idea is
to maintain a "running product" P and a "counter" I which
counts down from N to 1. The iterative algorithm can be written
down in a conventional programming language like Pascal as follows:
P := 1; I := N; while I>0 do begin P := P*I; I := I-1; end; |
While the Q language does not provide any special looping constructs
(and it also lacks an assignment operation), there is an alternative
definition of fac which takes up the idea of running products and
counters and implements the factorial function in an iterative fashion:
fac N = fac2 1 N;
fac2 P I = fac2 (P*I) (I-1) if I>0;
= P otherwise;
|
Here, the "state variables" P and I are implemented as
arguments of an "auxiliary" function fac2 which is invoked from
fac. Again, this is a recursive definition; it requires N
recursive applications of the rule
fac2 P I = fac2 (P*I) (I-1) if I>0; |
when fac N is computed. However, in difference to our previous
definition of fac, the recursive rule is always applied "on top"
of the target expression. Such rules are called tail-recursive.
(The name "tail recursion" comes from the fact that the recursive
application of fac2 is the last operation considered during the
leftmost-innermost evaluation of the right-hand side.)
For instance, the evaluation of fac 3 now proceeds as follows
(abbreviating reductions by built-in rules for `-' and `*'):
fac 3 ⇒ fac2 1 3 ⇒ fac2 (1*3) (3-1)
⇒ fac2 3 2 ⇒ fac2 (3*2) (2-1)
⇒ fac2 6 1 ⇒ fac2 (6*1) (1-1)
⇒ fac2 6 0 ⇒ 6
|
Tail-recursive definitions can be employed for implementing all kinds of functions which can be computed on a machine with a fixed set of "registers" and no auxiliary memory. For instance, here is a tail-recursive implementation of the Fibonacci function:
fib N = fib2 1 0 N;
fib2 A B N = fib2 (A+B) A (N-1) if N>0;
= B otherwise;
|
(This definition also has the desirable side effect that it cuts down the exponential running time of the "naive" definition given in Equations, to linear.)
The Q interpreter employs a clever optimization trick, commonly known as tail recursion optimization (see e.g., [Steele/Sussman 1975]), to actually execute tail-recursive definitions within constant stack space. Hence no special looping constructs are required for implementing iterative algorithms efficiently in the Q language.
Assume that in our stack model of expression evaluation we are working
on a reduction X ⇒ Y, and that we already recursively
evaluated all parts of the right-hand side Y, but not Y
itself. Furthermore, suppose that in order to evaluate Y we will
have to apply the rule Y ⇒ Z next. Then, instead of
keeping the rule X ⇒ Y suspended and evaluating Y
using rule Y ⇒ Z recursively, we can also immediately
perform the reduction X ⇒ Y and replace this rule with
Y ⇒ Z on the stack. Thus, the new rule Y ⇒
Z will not require any additional stack space at all, but simply reuses
the existing "activation record" for the X ⇒ Y rule. In
other words, instead of invoking the Y ⇒ Z rule as a
"subroutine", we effectively perform a kind of "goto" to the new
rule. We also refer to this process as performing the tail
reduction X ⇒ Y. The evaluation now proceeds as if we
had been working on the rule Y ⇒ Z in the first place.
For instance, with the new definition of fac the evaluation of
fac 3 would be carried out using only a single level of suspended
rules (again, the ∗ symbol signals abortion of a rule with failing
qualifier):
fac 3 ⇒ fac2 1 3:
fac2 1 3 ⇒ fac2 (1*3) (3-1):
3>0 ⇒ true
1*3 ⇒ 3
3-1 ⇒ 2
fac2 3 2 ⇒ fac2 (3*2) (2-1):
2>0 ⇒ true
3*2 ⇒ 6
2-1 ⇒ 1
fac2 6 1 ⇒ fac2 (6*1) (1-1):
1>0 ⇒ true
6*1 ⇒ 6
1-1 ⇒ 0
fac2 6 0 ⇒ fac2 (6*0) (0-1):
0>0 ⇒ false ∗
fac2 6 0 ⇒ 6
|
Besides the tail recursion optimization technique discussed above, the Q
interpreter also automatically optimizes toplevel sequences
(i.e., applications of the || operator which are not inside a
nested subexpression) on the right-hand side of equations, so that basic
imperative-style looping constructs can be executed in a tail-recursive
fashion as well. Consider, for instance, the standard library function
do which applies a function to each member of a list, like the
map function discussed in Equations, but instead of
collecting the results in an output list simply throws away the
intermediate values and returns (). This is useful, of course,
only if the function is evaluated solely for its side-effects, e.g., if
we want to print out all elements of a list. The definition of do
is rather straightforward:
do F [] = (); do F [X|Xs] = F X || do F Xs; |
Now this definition is not tail-recursive in the strict sense
alluded to above, because the last application on the right-hand side of
the second rule actually involves (||) and not the do
function. (Recall that an infix expression like X||Y is nothing
but syntact sugar for the function application (||) X Y.)
However, as a remedy, the Q interpreter actually implements toplevel
sequences on the right-hand side of a rule by direct stack manipulation.
That is, the first argument of a sequence is thrown away as soon as it
has been computed. By these means, the interpreter, after having
computed F X, effectively carries out the reduction do F
[X|Xs] ⇒ do F Xs on which it can perform tail recursion
optimization as usual. Thus the above definition actually works in
constant stack space, as one might reasonably expect.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are a few conditions which may force the Q interpreter to give up on a reduction sequence and abort the evaluation with a runtime error message. These conditions are listed below.
Ctl-C). In this case it is possible to resume the evaluation in
the debugger (cf. Debugging), provided that debugging has been
enabled with the break on command (see Command Language).
break is on,
the debugger is invoked to inform the user of the equation which gave
rise to the error condition.
All of the above error conditions can also be caught and handled by the running script in any desired manner, see Exception Handling.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Q uses dynamic typing like, e.g., Lisp or Smalltalk, as opposed to statically typed languages such as Pascal, Eiffel or Haskell. In languages with static typing, a variable or function parameter can only hold a value which matches its prescribed type (which can be a "polymorphic" type in languages like Eiffel and Haskell, but still the type of the value is restricted). In dynamically typed languages, however, the actual type of a variable or function parameter value is not known in advance. Consequently, in Q it is only possible to distinguish different types of objects - such as search trees, queues, arrays and the like - by selecting an appropriate set of constructor symbols for each type of object. This chapter discusses Q's notion of type guards which allows you to make the assignment of constructors to a type explicit, and to use this information in order to restrict the applicability of equations to objects of a given type.
| 8.1 Using Type Guards | ||
| 8.2 Built-In Types | ||
| 8.3 Enumeration Types | ||
| 8.4 Sub- and Supertypes | ||
| 8.5 Views |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As in any programming language, the careful design of application-dependent data structures is one of the main concerns when developing Q scripts which go beyond simple numeric, string and list processing. As a typical example for a non-list data structure which plays a prominent role in many Q scripts, let us consider binary search trees, which are a convenient means to implement sets, bags, dictionaries and the like. We will use this data structure as a running example throughout this chapter.
A typical choice of constructors for binary trees is the following:
public const nil, bin X T1 T2; |
To make explicit the fact that nil and bin belong to the
binary tree type, instead of the above declaration we use a type
declaration like the following, which, besides declaring the constructor
symbols, also introduces the type symbol BinTree:
public type BinTree = const nil, bin X T1 T2; |
This declaration tells the interpreter that each expression of the form
nil or bin X T1 T2 should be considered as a member of the
BinTree type. This is also known as an algebraic type; the
type declaration is essentially a signature of constructor symbols which
are used to create the members of the type. The type symbol can then be
used as a guard on variables on the left-hand side of equations
and variable definitions to ensure that a variable is only matched to
objects of the given type, see Type Guards. E.g., the following
rule employs such a type guard in order to restrict the argument of the
foo function to BinTree objects:
foo T:BinTree = ...; |
This makes it possible to avoid explicit argument patterns like
foo nil = ...; foo (bin X T1 T2) = ...; |
in cases in which the component values are not actually required. This can simplify matters a lot, in particular if multiple arguments have to be matched to a given type. Also, it is more efficient than checking the type of an object in the qualifier part of a rule by using a user-defined predicate, since the interpreter can use the type information right away in the pattern matching process.
Another important reason for preferring type guards over explicit argument
patterns is the issue of "information hiding". With the former definition
of the foo function above we do not make any explicit reference to
the constructor patterns making up the BinTree data type. This makes
it possible to treat BinTree as an abstract data type
(ADT) which hides the underlying implementation details (in particular
the constructors), while still being able to verify that the proper kind of
object is supplied as an argument. Any access to objects of the ADT will
then be implemented by referring to the appropriate operations supplied by
the type. In fact, we can make the constructors private symbols which are
only accessible to the script which implements the BinTree type:
public type BinTree = private const nil, bin X T1 T2; |
As a concrete example, let us assume the standard search tree operations
insert T X and delete T X, which insert an element X
into a tree T, and remove it from the tree, respectively. These
operations can be implemented as follows (see [Bird/Wadler 1988]):
public insert T X, delete T X;
private join T1 T2, init T, last T;
insert nil Y = bin Y nil nil;
insert (bin X T1 T2) Y = bin X (insert T1 Y) T2 if X>Y;
= bin X T1 (insert T2 Y) if X<Y;
= bin Y T1 T2 if X=Y;
delete nil Y = nil;
delete (bin X T1 T2) Y = bin X (delete T1 Y) T2 if X>Y;
= bin X T1 (delete T2 Y) if X<Y;
= join T1 T2 if X=Y;
join nil T2 = T2;
join T1 T2 = bin (last T1) (init T1) T2 otherwise;
init (bin X T1 nil) = T1;
init (bin X T1 T2) = bin X T1 (init T2) otherwise;
last (bin X T1 nil) = X;
last (bin X T1 T2) = last T2 otherwise;
|
(Note that the last and init operations return the last
element of a binary tree, and a binary tree with the last element removed,
respectively. The join, init and last functions are for
internal use only, and can hence be implemented as private functions.)
Furthermore, we assume the following function bintree which
constructs a binary tree from a list, and the function members which
returns the list of elements stored in a tree (in ascending order):
public bintree Xs; bintree Xs:List = foldl insert nil Xs; public members T; members nil = []; members (bin X T1 T2) = members T1 ++ [X|members T2]; |
(The definition of bintree employs the standard foldl
operation, see The Standard Library.) We can use the interface
operations of the BinTree ADT in order to implement the functions
union and diff which add and remove all members of a tree
to/from another tree:
public union T1 T2; // add elements of T2 to T1
union T1:BinTree T2:BinTree
= foldl insert T1 (members T2);
public diff T1 T2; // remove elements of T2 from T1
diff T1:BinTree T2:BinTree
= foldl delete T1 (members T2);
|
Observe that no explicit reference is made to the BinTree
constructors; we only employ the public "interface" functions
insert, delete and members of the BinTree ADT.
This allows us to change the realization of the data structure without
having to rewrite the definitions of union and diff. Still,
the definitions of union and diff are "safe"; the
BinTree type guard ensures that the arguments passed to union
and diff are indeed BinTree objects capable of carrying out
the requested operations.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Type guards are also the only way for verifying that the actual value of
a variable belongs to one of the built-in types of integers, floating
point numbers, strings, files and lambda functions, since there is no
way for writing out all "constructors" for these kinds of objects -
there are infinitely many (at least in theory). For this purpose, the
type symbols Int, Float, String, File and
Function are predefined. There also is a type named Char
which denotes the single-character strings. Technically, Char is
a subtype of the String type; more about that in Sub- and Supertypes. Moreover, Char is also treated as an
enumeration type, see Enumeration Types, below.
Beginning with version 7.2, Q also offers a more elaborate "numeric
tower" which makes it easier to integrate user-defined number
types. There is a type Real which is a subtype of the Num
type and the supertype of both Int and Float. Moreover,
the standard library defines the complex number type Complex
which is a subtype of Num, as well as the rational number type
Rational which is a subtype of Real, see The Standard Library.
The built-in List type matches all list expressions of the form
[] or [X|Xs]. This type is used to restrict the applicability
of an equation to list arguments. For instance, the following equations
define a function reverse which reverses a list:
reverse Xs:List = foldl push [] Xs; push Xs:List X = [X|Xs]; |
The Stream and Tuple types are completely analogous: they
match streams and tuples of arbitrary sizes, i.e., expressions of the
form {} and {X|Xs} or () and (X|Xs),
respectively.
The predefined Bool type is a means to refer to objects which are
truth values. Its built-in definition is as follows:
public type Bool = const false, true; |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Types like the built-in Bool type, which only consist of nullary
const symbols, are also called enumeration types. You can
easily define such types yourself, e.g.:
type Day = const sun, mon, tue, wed, thu, fri, sat; |
The Q language provides special support for enumeration types (including
the built-in Bool type, and also the Char type), by means
of the following operations:
Day type, we have that sun <
mon ⇒ true.
succ and pred functions (see Arithmetic Functions) produce the successor and the predecessor of an enumeration
type member. E.g., succ sun ⇒ mon.
ord function (see Conversion Functions) computes the
ordinal number of an enumeration type member. E.g., ord sun
⇒ 0.
X is an enumeration
type member then X+N and X-N yields the Nth
successor and predecessor of X, respectively. E.g., mon+3
⇒ thu and fri-5 ⇒ sun. This also works with
negative numbers: fri+(-3) ⇒ tue. Moreover, if X
and Y are members of the same enumeration type then X-Y
yields the difference ord X-ord Y of the corresponding ordinal
numbers. Thus, e.g., fri-tue ⇒ 3.
enum and enum_from functions can be used to construct a
list consisting of a range of enumeration type values. Some convenient
shorthand notations like [X..Y] are provided as well, see below
for details.
Note that while there is no direct operation for converting ordinal numbers back to the corresponding members of a given type, you can easily accomplish this using arithmetic:
==> ord thu 4 ==> sun+4 thu |
Enumeration type arithmetic also provides a quick way to check whether two enumeration type members belong to the same type:
==> isint (tue-thu) true |
Here we employed the standard library function isint to determine
whether the ordinal difference of the two values is actually an
integer. This works because the `-' operator will fail if the two
constants belong to different enumeration types.
Using "enumerations" (which are described in detail below), you can also list all enumeration type members starting at a given value as follows:
==> [sun..] [sun,mon,tue,wed,thu,fri,sat] |
Let us now take a more in-depth look at enumerations. The central
operation to create an enumeration is the built-in enum function,
which lists all members of an enumeration type in a given range:
==> enum mon fri [mon,tue,wed,thu,fri] |
The enum_from function works like enum, but only takes a
single argument and lists all members of the type starting at the
given value:
==> enum_from mon [mon,tue,wed,thu,fri,sat] |
Note that the default "step size" is 1. An arbitrary step size can be
indicated by invoking enum or enum_from with two
initial members in the first argument as follows:
==> enum [sun,tue] sat [sun,tue,thu,sat] ==> enum [sat,fri] sun [sat,fri,thu,wed,tue,mon,sun] ==> enum_from [sun,tue] [sun,tue,thu,sat] ==> enum_from [sat,fri] [sat,fri,thu,wed,tue,mon,sun] |
The notation [X1,X2,…,Xn..Y] is provided as syntactic sugar
for enum [X1,X2,…,Xn] Y, so you can also write the
following:
==> [mon..fri] [mon,tue,wed,thu,fri] ==> [sun,tue..sat] [sun,tue,thu,sat] ==> [sat,fri..sun] [sat,fri,thu,wed,tue,mon,sun] |
Likewise, [X1,X2,…,Xn..] is the same as enum_from
[X1,X2,…,Xn]:
==> [mon..] [mon,tue,wed,thu,fri,sat] ==> [sun,tue..] [sun,tue,thu,sat] ==> [sat,fri..] [sat,fri,thu,wed,tue,mon,sun] |
We have already mentioned that the built-in Char type is also an
enumeration type, which consists of all the characters in the Unicode
character set. Thus arithmetic on characters works as expected:
==> "a"+5 "f" ==> "z"-2 "x" ==> "8"-"0" 8 |
The enum operation also applies to characters accordingly:
==> enum "a" "k" ["a","b","c","d","e","f","g","h","i","j","k"] ==> ["a".."k"] ["a","b","c","d","e","f","g","h","i","j","k"] ==> ["k","j".."a"] ["k","j","i","h","g","f","e","d","c","b","a"] |
The enum_from operation works with Char, too, but this is
usually rather useless because the Unicode character set is fairly large
and thus it is rather inefficient to construct a complete list of all
characters starting at a given value. The right way to work with such
huge character enumerations is to take advantage of lazy evaluation and
use a stream enumeration instead, see below.
The standard prelude (see The Standard Library) extends
succ, pred and enum (but not enum_from, for
obvious reasons) to integers, which effectively turns the integers into
an (albeit infinite) enumeration type:
==> succ 0 1 ==> pred 0 -1 ==> [0..9] [0,1,2,3,4,5,6,7,8,9] ==> [0,2..9] [0,2,4,6,8] ==> [9,8..0] [9,8,7,6,5,4,3,2,1,0] |
The standard prelude also implements enum (but not succ
and pred) on floating point values:
==> [0.1,0.2..1.0] [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0] |
Note that it is also possible to invoke enum or enum_from
with more than two members in the first argument. That is, you can
construct enumerations with more than two initial members. Neither the
interpreter nor the library provides any definitions for these, so it is
up to you to introduce your own interpretations of such constructs when
you need them. For instance:
[X1,X2,X3..Y] = while (<=Y) (F*) X1 if (F>1) and then (Z3=X3)
where F:Int = X2 div X1, Z3:Int = X2*F;
|
This definition employs the standard library function while to
construct an increasing geometric integer series if it detects a
constant integer ratio between the initial three members:
==> [2,4,8..128] [2,4,8,16,32,64,128] |
Tuple enumerations work in exactly the same fashion as list
enumerations. The corresponding operations are called tupleenum
and tupleenum_from. The same kind of syntactic sugar is provided,
the only difference being that ordinary parentheses are used instead of
brackets. Example:
==> (9,8..0) (9,8,7,6,5,4,3,2,1,0) |
The standard prelude also provides enumerations for lazy lists
a.k.a. streams, cf. Streams. The streamenum and
streamenum_from functions work like enum and
enum_from, respectively, but construct streams instead of
lists. In the case of streamenum_from, the produced stream may be
infinite. The Q language provides the same kind of syntactic sugar for
stream enumerations as for lists, the only difference is that curly
braces are used instead of brackets. For instance, {1..9}
denotes a stream consisting of a finite arithmetic sequence, while
{0..} is the infinite stream of all nonnegative
integers.
Because of lazy evaluation, streamenum_from can often be used to
work around the inefficiencies of enum_from when you have to deal
with huge enumerations of characters. For instance, here is how we can
use a stream enumeration to count the number of all alphabetic Unicode
characters:
==> #filter isalpha {chr 1..}
90342
|
This will take a little while, but works o.k. because we never actually construct the complete list of all Unicode characters, as would be the case when doing the same with a list enumeration.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q programming language also provides a subtype concept similar to the
notion of single inheritance in object-oriented programming languages such
as Smalltalk. For instance, we can modify our declaration of the
BinTree type (cf. Using Type Guards) in order to make
BinTree a subtype of the supertype SearchTree as follows:
public type BinTree : SearchTree = private const nil, bin X T1 T2; |
Quite often, supertypes are abstract types which do not provide
their own set of constructor symbols, but are simply used to factor out
common operations shared among several "concrete" types. For instance,
the SearchTree type might have been declared simply as follows:
public type SearchTree; |
Now variables of type SearchTree will also match objects of its
subtype BinTree, as well as of any other subtype of
SearchTree. We can turn the union and diff functions
from Using Type Guards, into operations on the SearchTree type
as follows:
union T1:SearchTree T2:SearchTree
= foldl insert T1 (members T2);
diff T1:SearchTree T2:SearchTree
= foldl delete T1 (members T2);
|
As the next step, we might introduce another type AVLTree providing
the same interface operations insert, delete and
members as the BinTree type, but implementing these operations
in terms of AVL trees rather than simple binary trees. (AVL trees are a
variant of binary search trees in which the trees are kept balanced, and
thus logarithmic insertion and deletion times can be guaranteed.) If we make
AVLTree another subtype of SearchTree, then the union
and diff operations can be applied to AVLTree objects just as
well as to BinTree's. In fact, the operations will even work if we
mix argument types, e.g., provide a BinTree as the first argument of
union and an AVLTree as the second! By these means, it is
possible to define polymorphic operations which are applicable to several
different types sharing the same (subset of) interface functions.
For the sake of concreteness, here is an implementation of the
AVLTree type. The shl, shr, rol and
ror functions implement the common AVL tree rotation operations
which are used to keep the tree balanced; see [Bird/Wadler 1988] for
details.
/* H denotes the height of a nonempty AVL tree */
public type AVLTree : SearchTree = private const anil, abin H X T1 T2;
public avltree Xs;
private mknode X T1 T2;
private join T1 T2, init T, last T, height T, slope T, rebal T;
private rol T, ror T, shl T, shr T;
avltree Xs:List = foldl insert anil Xs;
members anil = [];
members (abin H X T1 T2)
= members T1 ++ [X|members T2];
insert anil Y = abin 1 Y anil anil;
insert (abin H X T1 T2) Y
= rebal (mknode X (insert T1 Y)) T2 if X>Y;
= rebal (mknode X T1 (insert T2 Y)) if X<Y;
= abin H Y T1 T2 if X=Y;
delete anil Y = anil;
delete (abin H X T1 T2) Y
= rebal (mknode X (delete T1 Y) T2) if X>Y;
= rebal (mknode X T1 (delete T2 Y)) if X<Y;
= join T1 T2 if X=Y;
join anil T2 = T2;
join T1 T2 = rebal (mknode (last T1) (init T1) T2) otherwise;
init (abin H X T1 anil) = T1;
init (abin H X T1 T2) = rebal (mknode X T1 (init T2)) otherwise;
last (abin H X T1 anil) = X;
last (abin H X T1 T2) = last T2 otherwise;
/* mknode constructs a tree node, computing the height value */
mknode X T1 T2 = abin (max (height T1) (height T2) +1) X T1 T2;
/* height and slope compute the height and slope (difference between
heights of the left and the right subtree), respectively */
height anil = 0;
height (abin H T1 T2) = H;
slope anil = 0;
slope (abin H X T1 T2) = height T1 - height T2;
/* rebal rebalances after single insertions and deletions */
rebal T = shl T if slope T = -2;
= shr T if slope T = 2;
= T otherwise;
/* rotation operations */
rol (abin H1 X1 T1 (abin H2 X2 T2 T3))
= mknode X2 (mknode X1 T1 T2) T3;
ror (abin H1 X1 (abin H2 X2 T1 T2) T3)
= mknode X2 T1 (mknode X1 T2 T3);
shl (abin H X T1 T2) = rol (mknode X T1 (ror T2)) if slope T2 = 1;
= rol (abin H X T1 T2) otherwise;
shr (abin H X T1 T2) = ror (mknode X T1 (ror T2)) if slope T2 = -1;
= ror (abin H X T1 T2) otherwise;
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As of version 7.7, Q provides a simple but effective implementation of views, a device originally proposed by Wadler [Wadler 1987] which provides a means to equip abstract data types with concrete representations in terms of "virtual constructors", which can be used for pattern matching just like "real" constructors are used when dealing with concrete algebraic data types. In the Q language, views are also used for deciding syntactic equality of external types (see Non-Linear Equations) and, last but not least, for pretty-printing purposes.
The first ingredient of a view is a definition for the built-in special
form view which returns a representation of a data object,
usually in terms of a public "construction" function for objects of
the type. For instance, given the BinTree and AVLTree
types from the previous section, we might provide the following rules:
view T:BinTree = '(bintree Xs) where Xs = members T; view T:AVLTree = '(avltree Xs) where Xs = members T; |
Here we used the construction functions bintree and
avltree to build a representation of binary and AVL trees in
terms of their member lists. After adding these rules to the program in
Sub- and Supertypes, the representation defined by these rules
will be used whenever an object of these types is printed in the
interpreter (as well as by the built-in str and write
functions). For instance:
==> def T1 = avltree [17,5,26,5], T2 = bintree [8,17] ==> union T1 T2; diff T1 T2 avltree [5,5,8,17,17,26] avltree [5,5,26] |
Two things are worth noting here. First, the view function always
has to return a quoted expression. A description of the quote
constructor `'' can be found in Special Forms; essentially,
it just protects its argument from being evaluated. The need for this
becomes apparent when considering that views are just expressions which,
when evaluated, would construct a value of the type for which the view
is being defined; thus the expression returned by view has to be
treated as a literal, and the `'' operator takes care of
that. Second, if you take a look at the example above, you will notice
that we defined the views in such a manner that they would actually
reconstruct the original values if you would evaluate them from the
command line. It goes without saying that this behaviour is highly
recommended, although the interpreter does not enforce this condition.
The second ingredient of a view definition is the declaration of one or
more virtual constructors for the type. (In fact this is only needed if
you also want to be able to use the view for pattern matching.) In our
example, it suffices to turn the construction functions used in the
view, bintree and avltree, into virtual constructors of
the BinTree and AVLTree types, respectively. To these
ends, we remove the public declarations of bintree and
avltree and modify the type declarations as follows:
public type BinTree : SearchTree = virtual bintree Xs
| private const nil, bin X T1 T2;
public type AVLTree : SearchTree = virtual avltree Xs
| private const anil, abin H X T1 T2;
|
This lets us use the virtual constructors in pattern-matching definitions just as if they were real constructors. The views of these objects will be constructed on the fly, as they are required during pattern matching. Note that since the virtual constructors, as before, are implemented as public functions, this will even work outside the module where the data types are defined. E.g.:
==> def T1 = avltree [17,5,26,5], T2 = bintree [8,17] ==> diff T1 T2 avltree [5,5,26] ==> def avltree [X,Y,Z] = diff T1 T2; (X,Y,Z) (5,5,26) |
Of course, we can also use these "virtual" patterns in equations:
test (avltree [X|_]) = X>0; |
And they also work in lambdas:
==> (\(avltree [X|_]).X) (union T1 T2) 5 |
It is also possible to define views in a recursive fashion so that,
e.g., the tree members are made available one at a time. For instance,
we might want to create a list-like view which delivers the tree members
in ascending order. Taking the BinTree data type as an example,
this can be achieved as follows:
public type BinTree : SearchTree = virtual empty, cons X T
| private const nil, bin X T1 T2;
empty = nil;
cons X T:BinTree = insert T X;
private first T;
view nil = 'empty;
view T:BinTree = '(cons X T)
where X = first T, T = delete T X;
first (bin X nil _) = X;
first (bin X T1 T2) = first T1 otherwise;
public bintree Xs;
// ...
|
Note that the first function returns the smallest member in the
tree. The view is defined in terms of two virtual constructors
empty (which returns the empty tree) and cons (which takes
a value and a tree as parameters and inserts the value into the
tree). With these definitions we now have:
==> bintree [5,1,9,3] cons 1 (cons 3 (cons 5 (cons 9 empty))) ==> def (cons X (cons Y _)) = _; (X,Y) (1,3) |
The main advantage of this approach is that, at least for the purpose of
pattern matching, the view only needs to be constructed as far as
required to extract the desired number of tree elements. In contrast,
employing bintree as the virtual constructor, as in our first
example above, is much more inefficient since the entire list of members
will be computed whenever the view is needed in pattern matching.
Another important point worth mentioning here is that a virtual constructor does not really belong to the type, even though it is declared as one of its members. More precisely, only "proper" objects which were constructed with the real constructors of the type will ever match the corresponding type guard. Nevertheless, it is possible to have definitions like the following, which first try a couple of explicit virtual constructor patterns and then fall back to a guarded variable:
type Foo = virtual foo X, goo X | const bar X, baz X; view (bar X) = '(foo X); view (baz X) = '(goo X); foo X = bar X; goo X = baz X; test (foo X) = X; test X:Goo = X otherwise; |
Note that in this case, if the default alternative is taken (i.e., the
last equation for test), the constructed view is returned
(after being evaluated) rather than the original expression. Of course
this will lead to unexpected results if the view, when evaluated, does
not reconstruct the original expression, hence our recommendation that
Y ⇒ X if view X ⇒ 'Y.
Another word of caution: Definitions mixing patterns with virtual
and nonvirtual constructors of a given type will not work
in the current implementation. To avoid backtracking during pattern
matching, a view will always be constructed for each parameter
which is matched against at least one virtual constructor; in such a
case the nonvirtual patterns are effectively disabled. The compiler will
warn you about such situations when it is invoked with the -w
option (see Using Q). If you get this "overlap between virtual
and nonvirtual constructors" warning it means that you should rewrite
your definition so that it uses either real or virtual
constructors, but not both at the same time. That should be easy enough
to do; usually you will use the real constructors inside the module
which defines the data type, and virtual constructors otherwise.
Finally, note that views can also be used in exactly the same fashion
with external types. The only difference here is that an external type
does not have any real constructors and thus the view must be created
using the operations provided for the type. For instance, consider an
external type Bar which represents pairs of integer values
constructed with an external function bar (cf. Writing a Module):
public extern type Bar = virtual extern bar I J; |
Assume that we have another extern function bartuple which
returns the contents of a Bar object as a pair of two integer
values:
public extern bartuple B; |
Now a suitable view for Bar can be defined as follows:
view B:Bar = '(bar I J) where (I,J) = bartuple B; |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As discussed in Equations and Expression Evaluation, the Q interpreter evaluates expressions in applicative, i.e., leftmost-innermost, order. This means that the arguments of a function are usually evaluated before the function is applied to them, which is also known as call by value, strict or eager evaluation. Occasionally, it is useful or even essential to defer the evaluation of arguments until they are actually required in the course of a computation. For this purpose, the Q language lets you introduce so-called special forms which receive their arguments unevaluated (i.e., using call by name or non-strict evaluation). This chapter discusses how these constructs are defined and used.
| 9.1 Basic Concepts | ||
| 9.2 Special Constructors and Streams | ||
| 9.3 Memoization and Lazy Evaluation | ||
| 9.4 Built-In Special Forms | ||
| 9.5 The Quote Operator | ||
| 9.6 A Bit of Reflection |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Consider the following definition from the standard library which implements a simple kind of conditional expression (see also Conditionals and Comprehensions):
ifelse P X Y = X if P;
= Y otherwise;
|
The ifelse function takes as its first argument a truth value
which is used to determine whether the second or third argument is to be
returned as the value of the conditional expression. Although the above
definition is perfectly correct, using applicative order evaluation with
this definition is clearly inappropriate since all arguments must
already have been evaluated before the ifelse function gets
applied to them. Since either the second or third argument is simply
thrown away, the effort involved in the evaluation of this argument is
wasted. As an extreme case, e.g., Y might not have a terminating
evaluation at all, in which case the evaluation of ifelse P X Y
would not terminate either even though Y is actually not required
if P happens to evaluate to true.
Instead, we would like to have ifelse evaluate its arguments only as
they are required. In the Q language, this can be done by simply declaring
ifelse as a special form. All we have to do is to precede the
above equations with the following declaration:
public special ifelse P X Y; |
The syntax of such declarations has already been introduced in
Declarations. The special keyword must be followed by a
function identifier and a sequence of variable symbols (the variable
symbols only serve to specify the number of arguments, and are otherwise
treated as comments). If the argument list is omitted, the function
symbol is actually treated as an ordinary (non-special)
symbol. Otherwise the given arguments are declared as special
(a.k.a. call by name) arguments, i.e., arguments which are treated
as literals and hence are not evaluated when the given function is
applied to them. Special arguments become so-called deferred
values (also known as thunks or suspensions in the
literature) which will be left unevaluated as long as they are
"protected" by a special form. But as soon as the value of a thunk is
referred to in a "non-special" context, the deferred value will
automatically be evaluated by the interpreter.
Thus the above declaration of the ifelse function tells the Q
interpreter that this function expects three special arguments P,
X and Y which should be left unevaluated until their
values are actually required in the qualifier or the right-hand side of
an equation in ifelse's definition. Consequently, when the
interpreter comes to consider the first rule,
ifelse P X Y = X if P; |
it will first evaluate P to obtain the value of the condition part of
this rule. If P evaluates to true, X will be evaluated
and returned as the value of the left-hand side expression. Otherwise (if
P evaluates to false), X will be left unevaluated, and
the interpreter considers the next rule,
ifelse P X Y = Y otherwise; |
which causes it to reduce ifelse P X Y to the value of Y.
(Note that in cases like the above example, where a special argument
forms the right-hand side of a rule, the interpreter performs tail call
elimination as usual (cf. Tail Recursion). That is, it will
actually evaluate the special argument after performing a tail
reduction of the rule. This is important for simple recursive functions
involving special forms like ifelse which can be executed in
constant stack space. This also applies, in particular, to the built-in
special operators (and then) and (or else), see
Built-In Special Forms.)
It is important to note here that special forms are indeed a runtime
feature in the Q language. This means that special forms are not only
recognized at compile time, but also when they are passed as arguments
or returned as function results in the course of a computation (which
usually cannot be predicted at compile time). For instance, if we define
foo as follows:
special bar X; foo = bar; |
then foo (1+1) evaluates to bar (1+1) and not
to bar 2, although foo itself is not declared to be a
special form. This also works if you pass bar as a functional
parameter:
special bar X; foo F X = F (X+1); |
Given the above definition, foo bar 1 will evaluate to
bar (1+1).
On the other hand, you must take care that arguments to special functions
which are passed on by other functions are not evaluated too early. For
instance, if the apply function is defined as follows:
apply F X = F X; |
and is then invoked as apply bar (1+1) then (1+1) will be
evaluated even though bar is a special form. The reason is that
the argument (1+1) is evaluated before apply is
applied to it, since we have not declared apply as a special form
as well. As a general rule, you should make any function a special form
that passes its arguments to another special form, unless you explicitly
wish to evaluate these arguments.
Up to now, we have only considered "pure" special forms, which receive all their arguments unevaluated. Many functions, however, require a mixture of special and non-special arguments. The Q language provides two different methods for dealing with such situations. First, you can force the evaluation of a special argument at runtime using the `~' (tilde or "force") operator:
special foo P X; foo P X = ifelse ~P (bar X) X; |
Here, the condition P is evaluated before it is passed on
to the ifelse special form (which is as defined above). This
method is useful if we only occasionally have to evaluate a parameter of
a special form. (Note that you can also force expressions outside a
special form; in this case `~' acts as the identity operation,
i.e., the argument expression (which is already in normal form) is
simply returned as is.)
If we always want to evaluate a given parameter of a special
function, we can also declare the corresponding argument as
non-special which effectively turns the argument into an ordinary
call by value parameter. This is done by preceding the argument
with the `~' symbol in the declaration of the function symbol. For
instance, in the above definition of the ifelse function the
first argument will be evaluated anyway. Hence we may as well make it a
non-special argument. The corresponding declaration reads as follows:
public special ifelse ~P X Y; |
With this declaration (which is precisely how ifelse is actually
declared in the standard library), the ifelse function always
receives its first argument evaluated, while the other two arguments are
special. This works exactly like an explicit application of the `~'
operator, but releases you from the burden of having to force evaluation
of the argument in every application of the ifelse function. It
is also slightly more efficient since the argument does not have to be
constructed as a literal expression, but can be evaluated right away
before the ifelse function is applied to it.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Special forms prevent their arguments from being evaluated until they are used in a context which is not "protected" by another special form. This also works if the special form happens to be a constructor. Special constructors thus allow you to define algebraic data structures which act as containers for deferred expressions. By these means Q enables you to create "lazy" data structures which work pretty much like their counterparts in non-strict functional languages like Haskell. A prime example for this are streams, the lazy list data structure, which was briefly introduced in Lists, Streams and Tuples. As of Q 7.1, this data type is now predefined as follows:
public type Stream = special const nil_stream, cons_stream X Xs; |
The constant symbol nil_stream denotes an empty stream, while the
special constructor symbol cons_stream is used to represent a
nonempty stream with head X and tail Xs. (Please note
that, as of Q 7.1, these symbols are now builtins and thus cannot be
imported from the stream.q module any more.) As of Q 7.1, the
language also provides some syntactic sugar for these constructs to make
them look more list-like. In particular, {} is the same as
nil_stream and {X|Xs} the same as cons_stream X
Xs. More generally, the notation {X1,X2,…,Xn|Y} can be
used as a shorthand for cons_stream X1 (cons_stream X2 …
(cons_stream Xn Y)…). Likewise, {X1,X2,…,Xn} is
the same as cons_stream X1 (cons_stream X2 … (cons_stream Xn
nil_stream)…).
Just like lists, a stream consists of a head element, X, and a
stream of remaining elements, the tail Xs. The standard library
module stream.q extends most list operations to streams, see
Streams. For instance, the list operations hd and tl
are easily translated to streams as follows:
hd {X|_} = X;
tl {_|Xs} = Xs;
|
Lisp programmers should note that this implementation does not require
any explicit "delay" and "force" operations; all necessary
evaluations are performed implicitly by the Q interpreter. As a simple
example, consider the following definition for the stream of all
integers starting from N:
ints N = {N|ints (N+1)};
|
Obviously, this definition would not work with eager evaluation, because
the recursive invocation of ints would send the interpreter into
an infinite recursion. However, since streams are implemented as a
special form, the evaluation of the nested ints term is deferred,
and the tails of the stream are only produced as we access them, using
"call by need" evaluation:
==> ints 1
{1|ints (1+1)}
==> tl _
{2|ints (2+1)}
==> tl _
{3|ints (3+1)}
|
Now this is all good and fine, but it also raises a question, namely how are we going to write definitions which match against subterms in a deferred value if the needed parts have not been evaluated yet? For instance, suppose that we want to extract the next three elements from the above stream. We would actually like to write something like the following:
==> _
{3|ints (3+1)}
==> def {X,Y,Z|_} = _
==> (X,Y,Z)
(3,4,5)
|
In fact this works as expected, even though {3|ints (3+1)} does
not literally match the pattern {X,Y,Z|_}, because the Q
interpreter takes into account special data constructors during pattern
matching and will automagically evaluate deferred subterms as needed.
This also works with non-special arguments of a function in
equations. E.g., consider the following definition of a
deinterleave operation which takes adjacent stream elements and
turns them into pairs:
deinterleave {} = {};
deinterleave {X,Y|Xs} = {(X,Y)|deinterleave Xs};
|
Note again that the pattern {X,Y|Xs} on the left-hand side of
the second equation does not match a stream value like ints
1 ⇒ {1|ints (1+1)} literally, but it does match after
the embedded deferred subterm ints (1+1) has been reduced to
{2|ints (2+1)}. The interpreter performs the recursive
evaluation automatically, and hence we have:
==> deinterleave (ints 1)
{(1,2)|deinterleave (ints (2+1))}
==> tl _
{(3,4)|deinterleave (ints (4+1))}
|
This "call by need" evaluation of argument values which is performed automatically during pattern matching, is also known as call by pattern. In the Q language, call by pattern evaluations are always done in variable definitions, and also when matching against the left-hand sides of equations, but only for non-special toplevel arguments.
In contrast, special toplevel arguments in an equation are
always passed unevaluated, "as is"; that's their purpose after
all. Thus, e.g., if foo is a special form then a pattern like
foo {X,Y,Z|_} will only match if the provided stream
argument literally has three elements in front; no implicit evaluation
of stream tails to obtain the required number of stream elements will be
performed in this case.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Another issue arising when dealing with special forms is the problem of repeated evaluations of the same deferred expression. Note that normally a special argument is evaluated each time its value is required on the right-hand side of a definition. Consider, for instance:
special foo X; foo X = bar X X; |
Assuming that bar is non-special, X actually gets
evaluated twice, once for each occurrence on the right-hand side of the
equation. This may be undesirable for reasons of efficiency. One method
to cope with such a situation is to introduce a local variable binding,
e.g.:
special foo X; foo X = bar Y Y where Y = X; |
However, in some cases this may be inconvenient or downright
impossible. In particular, the deferred value might actually be stored
inside a special data element such as a stream (cf. Special Constructors and Streams). For instance, consider the case of a stream
map foo {1..} where foo is a function on integers. Then,
whenever we access an element of this stream, the corresponding
expression foo I will have to be recomputed, which might be
costly, depending on the function foo at hand.
To avoid such inefficiencies, the Q language provides the `&' (ampersand or "memoize") operator. When applied to a special argument (such as a stream member) it causes the value to be memoized. This means that the value will only be computed the first time it is needed. Each subsequent access will then simply return the already computed value.
In our example we can memoize both the heads and the tails of the
stream, by applying `&' recursively to all heads and tails. The
standard library contains the function lazy which does just
this. All we have to do is to apply this function to the existing stream
to have all elements memoized:
==> var foo = \X.writes $ "foo: "++str X++"\n" || X
==> def S = lazy (map foo {1..})
==> list (take 2 S)
foo: 1
foo: 2
[1,2]
==> list (take 4 S)
foo: 3
foo: 4
[1,2,3,4]
|
Note that when computing the first four stream members with list
(take 4 S), only elements number 3 and 4 were computed, since the first
two elements had already been computed with list (take 2 S)
before.
Functional programming theorists also denote the above style of computation, which combines call by name with the memoization of already computed thunks, as call by need or simply as lazy evaluation.
Lazy evaluation paves the way for some important and powerful programming techniques. In particular, streams are a useful device to implement different kinds of backtracking and dynamic programming algorithms in a uniform setting. For instance, here is a recursive definition of the stream of all Fibonacci numbers:
fibs = {0,1|zipwith (+) fibs (tl fibs)};
|
Unfortunately, this implementation is rather slow because of the
multiple recursive invocations of fibs. But this can be fixed by
memoizing the stream with lazy and assigning it to a variable:
var fibs = lazy {0,1|zipwith (+) fibs (tl fibs)};
|
By these means we ensure that the stream is built in-place, by "chasing
its own tail". This works because Q's global variable environment uses
dynamic binding, and the (unevaluated) instance of fibs on the
right-hand side of the definition will not be evaluated before the
variable has already been defined. Thus the stream effectively includes
circular references to itself.
The reason that special arguments are not memoized automatically at all times is that some arguments may involve functions with side-effects, in which case memoization is often undesirable. For instance, we can compute a list of five random values quite conveniently using a list comprehension as follows:
==> [random : I in [1..5]] [1809930567,1267130897,1713466763,574472268,2355190805] |
This works because the random subterm is actually a special
parameter to the standard library function listof implementing
list comprehensions (see Conditionals and Comprehensions), so it
will be recomputed as each list element is produced. However, if we
memoize the random subterm then the function will only be
evaluated once to give the value for all list members:
==> [&random : I in [1..5]] [3194909515,3194909515,3194909515,3194909515,3194909515] |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Besides streams, the Q language has a number of other built-in
operations which are actually implemented as special forms. This is
necessary, in particular, in the case of the syntactic equality operator
`==' (see Non-Linear Equations), as well as the logical
connectives and then and or else which are evaluated in
"short-circuit mode", using the following built-in equations (see
Logical and Bit Operators):
true and then X = X; false and then X = false; false or else X = X; true or else X = true; |
The first argument of these operations is non-special, but the second
one is a special argument which is only evaluated if it is needed. For
instance, if the first argument of and then is false,
false is returned - there is no need to take a look at the
second argument. The or else operation works analogously. These
rules allow the Q interpreter to perform the following reductions
immediately, without ever having to evaluate the second argument
X:
false and then X ⇒ false true or else X ⇒ true |
On the other hand, if the first argument of and then is
true (or the first argument of or else is false),
then the value of the second argument is returned:
true and then X ⇒ X false or else X ⇒ X |
In this case the interpreter also performs tail call optimization, i.e.,
the reduction to X is actually carried out before X
is evaluated. This implies that tail recursions involving and
then and or else are executed in constant stack space, as one
might reasonably expect. (The same applies to user-defined special forms
like ifelse, see Basic Concepts, which are defined in a
similar fashion.) For instance, consider the following example from the
standard library which defines a function all that checks whether
all members of a list satisfy a given predicate P:
all P [] = true; all P [X|Xs] = P X and then all P Xs; |
Note that if P X in the second rule evaluates to true,
then the right-hand side immediately reduces to the tail-recursive call
all P Xs, and hence an application of all P to a list of
arbitrary size only needs constant stack space.
Other important built-in special forms are the lambda function,
which is described in Lambda Abstractions, and the view
construction function view, see Views.
Last but not least, there are three other builtins in the Q language
which may also act as special forms depending on the arguments with
which they are invoked. The function composition operator `.'
automagically adapts to special forms in its left and/or right
operand. More precisely, if the first argument of a function F is
special, then the second operand of a composition section of the form
(F.) = (.) F is special as well; in this case F.G will
always leave G unevaluated. Furthermore, the argument of a
composition F.G is special if F or G has a special
first argument. This makes the `.' operator work as expected in
situations where the composed functions are special forms. The infix
application operator `$' (cf. Application and Sequence Operators) and the built-in flip function (which flips the
arguments of a binary function and is used to implement operator
sections with a missing left operand, see Miscellaneous Functions)
adjust to their first (function) argument in a similar fashion.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Another built-in special form is the predefined quote operator `'', which acts as a constructor that protects its single argument expression from being evaluated. This construct is useful if you wish to treat certain expressions (which may or may not be in normal form) as literal objects rather than having them evaluated. For instance:
==> '(1+1) '(1+1) |
Lisp programmers should note that the quote operator is in fact a constructor, and not an operation which returns its single argument unevaluated. This is essential since in the Q language an expression only remains unevaluated as long as it is protected by a special form - an expression which occurs outside the context of a special form is always in normal form.
Another fact that deserves mentioning is that `'' is just an "ordinary" special form, and does not involve any special "magic" on the side of the interpreter. In particular, `'' does not inhibit variable replacements in rules like the following:
special foo X; foo X = '(X+1); |
Given the above definition, foo Y ⇒ '(Y+1), i.e., the
X variable on the right-hand side is not treated as a
literal. However, as the very same example shows, you can employ
`'' to quote a free variable, and thereby defer its
evaluation:
==> def Y = 99; foo Y '(Y+1) ==> foo ~Y '(99+1) |
The force operator `~' works in quoted expressions as usual:
==> '(1+~(2+3)) '(1+5) |
Moreover, there is another, similar operation, the ``' (backquote
or "splice") operator, which forces evaluation of its argument like
the `~' operator, but also "unquotes" the result. For instance,
using the same foo function as above, we have:
==> '(`(foo Y)/2) '((Y+1)/2) |
Like the force operator, the splice operator also works outside a special form, in which case the unquoted expression is evaluated as usual. Moreover, if the evaluated argument is an unquoted expression, it is returned as is; in this case, ``' does exactly the same as the `~' operator.
The splice operator is the fundamental operation when constructing quoted expressions from smaller quoted pieces, by "splicing" the pieces into a "template" expression. Lisp programmers will be familiar with this technique. However, there are some notable differences between Q's `~'/``' and Lisp's `,'/`@,' constructs:
Also note that, in contrast to the quote operator, the force and splice operations do need special support from the interpreter. They are the only operations (besides the memoization operator, which is treated in a similar fashion) which are evaluated while a special form is being processed.
When used in concert, the quotation operators `'', ``' and
`~' become powerful tools for the symbolic manipulation of literal
expressions. They allow you to define functions which analyze and
transform an expression before the modified expression is finally
evaluated. As a simple (and somewhat contrived) example, let us write a
function which takes an arbitrary quoted expression and replaces each
instance of a foo X subterm by a corresponding bar X
expression. (More meaningful examples can be found in the standard
library; in particular, have a look at the cond.q script to see
how the listof function is implemented.)
foo X = X-1; bar X = X+1; special foobar X; foobar '(foo X) = '(bar `(foobar 'X)); foobar '(X Y) = '(`(foobar 'X) `(foobar 'Y)); foobar '(X|Y) = '(`(foobar 'X)|`(foobar 'Y)); foobar '[X|Y] = '[`(foobar 'X)|`(foobar 'Y)]; foobar X = X otherwise; |
To see foobar in action on a lambda abstraction, try the
following:
==> var f = '(\X Y.X+foo (Y+foo (X*Y))), g = foobar ~f ==> f; g '(\X . \Y . X+foo (Y+foo (X*Y))) '(\X . \Y . X+bar (Y+bar (X*Y))) ==> `f 12 14; `g 12 14 192 196 |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Let us briefly comment on Q's special forms versus "real" lazy evaluation in functional languages like Haskell. It should be clear by now that Q allows you to keep a special argument from being evaluated as long as you like, maybe "to eternity". That is, the interpreter actually computes so-called "weak" normal forms in which special arguments are left unevaluated, even if these arguments are reducible. This is similar to the way deferred evaluation is handled in traditional functional languages featuring a basic eager evaluation strategy. In contrast, a pure lazy functional language interpreter will eventually reduce each expression to a "true" normal form to which no definition is applicable.
Both approaches have their pros and cons. The nice thing about normal order evaluation (with automatic memoization) in lazy functional languages is that it "just works" and is guaranteed to be optimal (in the sense that the program never computes a value that is not needed to produce the program's result). However, one downside of this method is that it becomes hard to deal with operations involving side-effects. Moreover, normal order evaluation is in fact only truly lazy for the lambda and combinatorial calculi used by these systems, which are very special kinds of rewriting system. In general term rewriting, however, there is no single "optimal" evaluation strategy; in fact, the problem to devise a terminating, let alone an optimal reduction strategy for a given rewriting system, even for a single input expression, amounts to solving the infamous halting problem which is undecidable.
Although the Q interpreter uses applicative order evaluation by default, special forms effectively give you full control over the evaluation strategy. As we have seen, this makes it possible to implement lazy data structures which behave pretty much like their counterparts in pure lazy languages. But special forms actually go way beyond this, as they also provide a device to program "meta" (or "macro"-like) functions which operate on the descriptions of other functions (or any other kind of literal expressions). These facilities in turn make it possible to write programs performing "reflective" computations in which a program inspects some part of itself (which might be given, say, as a lambda abstraction), and modifies this part before actually executing it.
It goes without saying that this is an essential requisite in advanced applications like artificial intelligence. While Q is not a fully reflective language (the equational rules making up a Q program are no first-class citizens of the language, neither are the modules a program consists of), having symbolic expressions as first-class citizens of the language provides enough reflectiveness so that you can do most stuff that you'd normally use Lisp for. Pure lazy languages like Haskell fall short in this respect, because they cannot deal with function expressions on a symbolic level without introducing an extra level of interpretation.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Besides the built-in operators already discussed in Expressions, the Q language also provides a collection of predefined functions which cover a variety of elementary operations. The built-in functions can be divided into eight major groups, arithmetic and numeric functions, string, list and tuple functions, conversion functions, I/O functions, lambda abstractions, exception handling functions, and miscellaneous functions. We will describe each of these in turn.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language provides the following functions for simple arithmetic on enumeration types (cf. Types):
enum Xs Y, enum_from Xsenumerate a range of enumeration type values
succ Xsuccessor function
pred Xpredecessor function
As already discussed in Enumeration Types, the enum and
enum_from functions construct a list from a range of enumeration
type values, and the succ and pred functions produce the
successor and predecessor of a given value in an enumeration type. Note
that while the built-in definitions of these operations only apply to
"real" enumeration types (including the built-in Char type),
the standard prelude extends enum, succ and pred to
integers (and, in the case of enum, floating point values),
cf. The Standard Library.
Two additional functions are provided for integer arithmetic, shl
and shr, which are used to shift the bits of an integer value a
given number of bit positions to the left and right, respectively:
shl N COUNTshift N COUNT bits to the left
shr N COUNTshift N COUNT bits to the right
If the COUNT argument is negative, then the opposite shift by
-COUNT bits is performed, i.e., shl X COUNT = shr X
(-COUNT).
These operations provide the fastest possible way to multiply an integer
with, and divide it by, a power of two. More precisely, if X is
an arbitrary and N a nonnegative integer, then shl X N and
shr X N are the same as X*pow 2 N and X div pow 2
N, respectively, where pow 2 N denotes the Nth power of
2, as an integer. Note that in contrast to bit shifts on fixed size
machine integers, there is no "loss" of most significant bits, because
Q integers have arbitrary precision; hence the results obtained with bit
shifting are always "arithmetically correct".
The bitwise logical operations provide a means to implement sets of
nonnegative integers in a fairly efficient manner. As usual, such
"bitsets" are obtained by turning on exactly those bits whose
positions are the members of the set. Thus, 0 encodes the empty
set, and shl 1 I the set consisting of the single member
I. You then employ or as set union, and as set
intersection, and not as set complement. Set difference can be
implemented by taking the intersection with the complement set.
For convenience, you might wish to define yourself a bit function
as follows:
bit = shl 1; |
Then you can test for membership using an expression like X and
bit I which returns nonzero (namely bit I itself) iff I
is in the set. You can also build a set from its members by
or'ing the corresponding bit values, e.g.: bit 1 or
bit 4 or bit 39.
Because Q integers have arbitrary precision, bitsets do not have an a priory size restriction in Q (in fact, they can even be infinite, as negative integers are used to encode the complements of the finite bitsets). However, you should note that a finite bitset (a.k.a. a nonnegative integer) needs space proprotional to the set's largest member, and hence this method will be practical only if the potential set members are within a reasonable range.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
exp Xexponential function
ln Xnatural logarithm
sqrt Xsquare root
sin Xsine
cos Xcosine
atan Xarcus tangent
atan2 Y Xarcus tangent of Y/X
randomrandom number
seed Ninitialize random number generator
This group includes the usual trigonometric and exponential functions,
logarithms and square root. All these functions take both integer and
floating point arguments. They return a floating point number. The
atan2 function is like atan, but takes two arguments and
computes the arcus tangent of their ratio. In difference to atan,
it takes into account the signs of both arguments to determine the
quadrant of the result.
Besides this, there is a parameterless function random which
returns a 32 bit pseudo random integer in the range 0..2^32-1. To obtain
a random 32 bit floating point value in the range [0,1], you simply
divide random by 0xffffffff. The current implementation uses the
"Mersenne Twister", a fast uniform random number generator with a
period of 2^19937-1, written by Makoto Matsumoto and Takuji
Nishimura. The generator is initialized automatically with a seed taken
from the system clock at the time the interpreter starts up. The
seed function allows to initialize the generator explicitly with
a given nonnegative integer seed; this is useful if it is necessary to
reproduce random sequences. Note that only the least significant 31 bits
of the given seed value are actually used; these 31 bits are padded with
a constant 1 bit, since the generator works best with odd seed values.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This group provides some additional operations on sequences (strings, lists and tuples).
sub Xs I Jextract the subsequence consisting of members I thru J of
a string, list or tuple Xs
substr S K Lreturn substring of length L at position K in S
pos S1 Sposition of substring S1 in S
The sub function returns a subsequence (also called a "slice")
of a string, list or tuple, given by the zero-based index values
I and J. For instance:
==> sub "abcde" 2 3 "cd" ==> sub [a,b,c,d,e] 2 3 [c,d] ==> sub (a,b,c,d,e) 2 3 (c,d) |
(The standard library provides an analogous definition of sub on
streams, see Streams.)
The sub function handles all combinations of index arguments in a
reasonable manner. Thus, if J < I or I >= #Xs then an
empty sequence is returned, I = 0 is assumed if I < 0, and
the remainder of the sequence is returned if J >= #Xs.
Besides sub, there are two additional functions operating
exclusively on strings. The substr function returns a substring
of a string with a given length at a given position. This function is
provided for backward compatibility; note that substr S K L is
equivalent to sub S K (K+L-1). The pos function returns
the position of a substring in a string (-1 if there is no
occurrence of the given substring). For instance:
==> pos "cd" "abcde" 2 ==> substr "abcde" 2 2 "cd" |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This group consists of functions for converting between different kinds of objects:
trunc Xtruncate floating point to integer value, or return integer value unchanged
round Xround floating point to nearest integer value, or return integer value unchanged
float Xconvert integer to floating point number, or return floating point value unchanged
int Xcompute the integer part of floating point number, or convert integer value to floating point
frac Xcompute the fraction part of floating point number, or return 0.0
for an integer value
hash X32 bit hash code of an expression
ord Xordinal number of enumeration type member (or character)
chr Ncharacter with given ordinal number
list Xsconvert a tuple to a list
tuple Xsconvert a list to a tuple
str Xconvert expression to string
strq Xconvert quoted expression to string
val Sconvert string to expression
valq Sconvert string to quoted expression
Note that, as of Q 7.1, the trunc, round, float,
int and frac functions work consistently on all numbers,
i.e., both integers and floating point values. (With older versions,
float would only work on integers and the other functions only on
floating point values.)
The ord and chr functions convert between characters and
their ordinal numbers in the (Unicode) character set. The ord
function can also be used to compute the ordinal number of a member of
an arbitrary enumeration type, see Enumeration Types.
The hash function returns a nonnegative 32 bit hash code for an
arbitrary expression. The only guarantee is that syntactically equal
objects are mapped to the same hash code. This function is useful,
e.g., for implementing hashed dictionaries which map arbitrary keys to
corresponding values, see Hashed Dictionaries for an application.
The str function converts its argument expression to the corresponding
print representation conforming to the Q expression syntax, which is
returned as a string; the argument is evaluated as usual. The strq
function is similar, but converts a quoted expression (cf.
The Quote Operator). For instance:
str (1+1) ⇒ "2" strq '(1+1) ⇒ "1+1" |
The val/valq functions are the counterpart of str and
strq; they convert a string containing the print representation of an
expression back to an expression. In the case of val, the expression
is evaluated, while valq returns the parsed expression as it is,
protected with the quote operator:
val "1+1" ⇒ 2 valq "1+1" ⇒ '(1+1) |
These functions require that the contents of their string arguments conform to the Q expression syntax. In case of a syntax error, the built-in rules fail.
All expression conversion routines use the global scope, i.e., the
namespace of the main script, for the purpose of converting identifiers,
just like the interpreter itself. Thus these functions work consistently
in a given main script no matter which module they are invoked
from. However, this also means that the precise results may depend on
which main script is currently loaded. Note that this only affects the
parsing and unparsing of identifiers. For instance, if a function symbol
foo in module bar is visible in the main scope then it
will be unparsed simply as "foo", otherwise the qualified form
"bar::foo" is used.
Also note that, as of version 7.2, Q now provides a hook into the built-in expression pretty-printer so that you can define custom views for any user-defined, built-in or external data type. This is described in Views.
A word of caution: Having a "universal" string representation of Q
expressions is very useful to facilitate interaction with the user and
for the purpose of "marshalling" expressions so that they can be
stored in a file, but you should note that the expression conversion
routines are so powerful that they can easily be abused for rather
questionable purposes. For instance, the val function gives you a
"backdoor" to access any function or variable symbol in a
script, even private symbols in other modules. The only way around this
would have been to severely restrict the functionality of the conversion
routines, which would have made them much less useful. Thus it is in
your responsibility to use these functions in an "orderly" manner.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This group provides functions for handling input/output from/to the terminal or a text file. These functions implement operations with side-effects; the side-effects consist in modifying the terminal display or a file. All I/O functions also take care, automatically and transparently, of the necessary translations of strings between the system and the internal UTF-8 encoding.
fopen NAME MODEopen file NAME in mode MODE
popen CMD MODEopen pipe CMD in mode MODE
fclose Fclose a file
read, fread Fread an expression from the terminal or a file
readq, freadq Fread a quoted expression from the terminal or a file
readc, freadc Fread a character from the terminal or a file
reads, freads Fread a string from the terminal or a file
write X, fwrite F Xwrite an expression to the terminal or a file
writeq X, fwriteq F Xwrite a quoted expression to the terminal or a file
writec C, fwritec F Cwrite a character to the terminal or a file
writes S, fwrites F Swrite a string to the terminal or a file
eof, feof Fcheck for end-of-file on the terminal or a file
flush, fflush Fflush output buffer on the terminal or a file
| 10.5.1 Terminal I/O | ||
| 10.5.2 File I/O | ||
| 10.5.3 Pipes |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Input from the terminal (i.e., the standard input device) is done through
the parameterless functions read, readq, readc and
reads which read and return, respectively, an expression, a quoted
expression, a single character, and a string. Terminal input is line-buffered
which means that you must type an entire input line before anything is
returned by these functions.
The reads function obtains one line of input, strips off the trailing
newline character, and returns the result as a string. For instance (here and
in the following, <CR> means that you hit the carriage return key to
terminate the input line, rather than actually typing the string
"<CR>"):
==> reads one line of text<CR> "one line of text" |
The readc function allows you to read input character by character;
at the end of a line the newline character "\n" is returned:
==> readc <CR> "\n" |
The read and readq functions read one line from the input, as
with reads, and convert the resulting string to an expression, as with
val and valq, respectively. For instance:
==> read 1+1<CR> 2 ==> readq 1+1<CR> '(1+1) |
The corresponding functions for producing output on the terminal (the
standard output device) are named write, writeq,
writec and writes. They print an expression, a quoted
expression, a character and a string, respectively. These functions
take one argument, the object to be printed, and return the empty tuple
(). For instance:
==> writec "x" x() ==> writes "one line of text\n" one line of text () ==> write (1+1) 2() ==> writeq '(1+1) 1+1() |
(Output operations are invoked solely for their side-effects. However,
any Q expression must have a value, and since there is no other meaningful
value to return, the write functions return (). In the above
examples, this value is output by the interpreter after the printed text,
which explains, e.g., the x() in response to writec "x".)
It is common practice to combine these operations by means of the ||
operator in order to implement dialogs such as the following prompt/input
interaction:
==> writes "Input: " || reads Input: one line of text<CR> "one line of text" |
You will also often encounter several output operations for interpolating data into text fragments:
==> writes "The result is " || writeq '(1+1) || writes ".\n" The result is 1+1. () |
The eof function allows you to check for end-of-file on the
terminal. Actually, this causes another line of input to be read from
the terminal if no input is currently available, to see whether the end
of the input has been reached. Most operating systems allow you to type
a special character to indicate end-of-file, such as the Ctl-D
character on the UNIX system:
==> eof <Ctl-D> true |
The flush function writes any pending output to the terminal. It
is rarely necessary to call this function explicitly; see also the
discussion of the fflush function in File I/O.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
File input and output is implemented by the functions fread,
freadq, freadc, freads, fwrite,
fwriteq, fwritec and fwrites. There also is an
feof function which allows to check whether the end of a file has
been reached, and an fflush function which flushes the output
buffer of a file. These operations are analogous to their terminal
equivalents, but take an additional first argument, a file
object. A file object is a special kind of elementary object in the Q
language which is returned by the built-in fopen function.
The fopen function takes two string arguments, the name of
the file to be opened (which is the name under which the file is known
by the operating system), and the mode in which the file is to be
opened. The mode string "r" means to open an existing file for
reading, "w" to open a new file for writing (existing files are
truncated to zero size), and "a" to create a new or append to an
existing file. You can also add the letter "b" at the end of the
mode string to indicate that the file should be opened as a binary
file. This only has the effect to suppress the LF/CR-LF conversion on
MS-DOS/Windows systems, which is essential if you read/write binary data
from/to the file. On UNIX systems this flag is ignored.
For instance, a function checking whether a file exists and is accessible for reading can be implemented as follows:
exists NAME = isfile (fopen NAME "r"); |
(A suitable definition of the isfile function is provided by the
standard library, cf. Type-Checking Predicates.) Files are closed
automatically when the corresponding file objects are no longer
accessible in a computation. E.g., with the above definition of the
exists function, if you invoke this function as exists
"myfile", then the file object returned by fopen "myfile" "r"
(assuming that the file actually exists) will become inaccessible as
soon as it has been processed by the isfile function, at which
point it gets closed. Similarly, if you assign a file to a variable in
the interpreter,
==> def F = fopen "myfile" "r" |
the file will be closed as soon as you undefine the variable:
==> undef F |
Occasionally, it might be necessary to close a file explicitly, e.g., a file
object might still be accessible, but you want to close it before you do some
other processing. In this case you can invoke the fclose function on
the file:
==> def F = fopen "myfile" "r"; fclose F () |
After closing a file, the file object still exists, but all further I/O
operations on it (including fclose itself) will fail:
==> freads F; fclose F freads <<File>> fclose <<File>> |
(The notation <<File>> represents a file object, since file
objects have no printable representation. This syntax is also used by
the expression output and unparsing functions, write, str
etc. Of course, such objects cannot be reparsed using read,
val, etc.)
The standard input and output devices used by the terminal I/O functions
(the readx and writex functions, see Terminal I/O)
are actually special instances of the file I/O functions, which read
from standard input and write to standard output. The standard input and
output devices are implemented by the interpreter as predefined file
objects which are assigned to the INPUT and OUTPUT
variables, see Command Language. Thus, for instance, reads
and writes X are equivalent to freads INPUT and
fwrites OUTPUT X, respectively.
As an example for the use of file I/O, here's a tail-recursive function which copies the contents of one file opened in read mode to another file opened in write or append mode:
fcopy F G = () if feof F;
= fwritec G (freadc F) || fcopy F G otherwise;
|
Note that file objects are indeed modified by input and output operations; at least, the file pointer is moved accordingly. Otherwise the above definitions would not work. This also becomes apparent when manipulating files interactively:
==> def F = fopen "xyz" "r" ==> freads F "first line in file xyz" ==> freads F "second line in file xyz" … |
The fflush function is used to write any buffered output to a
file. This operation is only needed when the target file must be updated
immediately. For instance, the target file may actually be a pipe
and it may be necessary to get an immediate response from the program
which reads the output at the other end of the pipe (see
Pipes). Then you can use the fflush function as follows:
==> fwrites F S () ==> fflush F // force S to be written to F immediately () |
Note that if the Q interpreter runs interactively, then it automatically
flushes the standard output files whenever an evaluation is finished,
see Running Compiler and Interpreter. And, of course, buffered
output is always flushed when a file is closed. In case you have to
flush standard output explicitly, use flush or, equivalently,
fflush OUTPUT. (This should only be necessary if the standard
output stream has been redirected to a disk file or a pipe.)
Also note that the interpreter does not provide direct support for
reading and writing binary files. However, such functionality is
provided by the clib standard library module (see Clib).
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The popen function is used to create a file object connected to a
pipe. The file objects constructed with popen allow you to
pipe data into an operating system command, and to read the output of
such a command from a file. Like fopen, popen takes two
string arguments. The first parameter denotes the command to be
executed, and the second parameter specifies the mode in which the pipe
is to be opened. The mode argument must be either "r" (read from
the output of the given command) or "w" (write to the input of
the command), and causes a file object open for reading or writing to be
returned, respectively. The "b" flag may also be used to open the
pipe as a binary file, see File I/O.
Input pipes are typically employed for retrieving information from the
host operating system. For instance, on UNIX systems we can use the
ls command to obtain a list of filenames matching a given
wildcard specification. A corresponding function ls which returns
such a list can be implemented as follows:
ls S:String = freadls (popen ("ls "++S) "r");
freadls F:File = [] if feof F;
= [freads F|freadls F] otherwise;
|
As an example for an output pipe, the following function more
pipes a file through the more program which displays the file
page by page:
more F:File = fcopy F (popen "more" "w"); |
(The definition of fcopy is as in File I/O.)
Just like ordinary files, pipes are closed automatically when the
corresponding file object is no longer accessible, or explicitly by an
invocation of the fclose function. Furthermore, the interpreter waits
for the command started with popen to finish when closing the pipe.
Output pipes are a convenient means to implement specialized output
devices in the Q language. For instance, the standard library script
graphics.q writes its graphics output to a file GRAPHICS
which is often defined as a pipe to a PostScript previewer like, e.g.,
Ghostscript:
def GRAPHICS = popen "gs -q -" "w"; |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As of Q 7.1, the special form lambda is now a built-in function
which returns a "compiled" function object represented by the new
built-in Function type. This offers better performance than the
previous implementation in the standard library which was based on the
combinatorial calculus. Moreover, the \X.Y syntax can now be used
as a convenient shorthand for an application lambda X Y, see
Conditional Expressions and Lambdas.
Both arguments of lambda are special. The first argument X
denotes an expression to be matched against the actual parameter of the
function when it gets applied. The second argument Y is the body
of the lambda abstraction. When a lambda function is applied to an
argument Z, Z is first matched against the pattern
X and the (free) variables in X are bound to the
corresponding values in Z. Pattern matching is performed in the
same manner as for the left-hand side of an equation; in particular,
non-linear patterns (which contain a variable more than once) and the
anonymous variable work as expected, as do "call by pattern"
(cf. Special Constructors and Streams) and matching against
"views" (cf. Views). If the match fails then the application of
the lambda function fails, too. Otherwise the body of the lambda is
evaluated, with the variables of the pattern replaced with their
corresponding values, and the resulting normal form is returned as the
value of the lambda application.
As in previous releases, the lambda function needs to know which
other special forms perform variable bindings so that these constructs
can be expanded recursively. To these ends, such "lambda-like"
functions should be declared as members of a type derived from the
predefined Lambda type, and the predefined lambdax special
form should be overloaded such that, when applied to an expression
involving the lambda-like function, it returns a quoted expansion of the
construct in terms of lambda. Examples for this can be found in
the cond.q standard library module.
The following simple lambda abstraction can be used to denote a function
which squares its argument by multiplying it with itself. (In the
following we use the customary \X.Y notation throughout, so this
example is written as \X.X*X. But note that this is just
syntactic sugar for an application of the lambda function,
lambda X (X*X) in this case.)
==> \X.X*X \X1 . X1*X1 ==> _ 5 25 |
In this example the lambda pattern is just a single variable, which always matches. It is also possible to have pattern-matching lambda abstractions, see below. Also note that lambda objects have a printable representation in Q, although the original variable names are lost when a lambda is printed (more about this in the subsection "Lambdas as First-Class Objects" below).
Multi-argument lambda functions can be created using nested lambdas:
==> \X.\Y.(1-X)*Y \X1 . \X2 . (1-X1)*X2 ==> _ 0.9 0.5 0.05 |
For convenience, the notation \X1 X2 … . Y is provided as a
shorthand for \X1 . \X2 … . Y:
==> \X Y.(1-X)*Y \X1 . \X2 . (1-X1)*X2 ==> _ 0.9 0.5 0.05 |
Also note that in a nested lambda expression each variable is bound by the innermost lambda in which it occurs:
==> \X.(\X.X*X) (X+1) \X1 . (\X2 . X2*X2) (X1+1) ==> _ 2 9 |
Moreover, the lambda variables are always bound statically, as determined by the lexical context. For instance, consider:
==> def F = \X.\Y.(1-X)*Y, G = \Y.(1-X)*Y, H = \X.~G ==> F; H \X1 . \X2 . (1-X1)*X2 \X1 . \X2 . (1-X)*X2 ==> F 0.9 0.5; H 0.9 0.5 0.05 (1-X)*0.5 |
Note that in the function G the variable X occurs free,
and hence the same holds for H. On the other hand, in the
function F the variable X is bound by the outermost
lambda.
The lambda argument can also be a pattern to be matched against the actual parameter when the function is applied:
==> \(X,Y).(1-X)*Y \(X1,X2) . (1-X1)*X2 ==> _ (0.9,0.5) 0.05 ==> \[X|Xs].Xs \[X1|X2] . X2 ==> _ [1,2,3] [2,3] |
If a match fails then the application of the lambda function fails, too:
==> (\[X|Xs].Xs) [] (\[X1|X2] . X2) [] |
The anonymous variable and nonlinear patterns also work as expected:
==> \[_|Xs].Xs \[X1|X2] . X2 ==> _ [1,2,3] [2,3] ==> \[X,X|Xs].Xs \[X1,X1|X2] . X2 ==> _ [1,1,2] [2] ==> (\[X,X|Xs].Xs) [1,2,3] (\[X1,X1|X2] . X2) [1,2,3] ==> (\[_,_|Xs].Xs) [1,2,3] [3] |
But note that in a multi-argument lambda, the different argument
patterns will be matched independently from each other. This is because
a lambda abstraction like \X X.X*X is equivalent to
\X.\X.X*X, so the X variable in the body is actually bound
to the second lambda argument (which belongs to the innermost lambda)
and the first argument is never used:
==> \X X.X*X \X1 . \X2 . X2*X2 ==> _ 1 2 4 |
This is to be seen in contrast to an equational definition like `foo X X = X*X;' in which the entire left-hand side participates in the matching process.
Lambdas can also be used to substitute variables in special forms like the quote operator:
==> (\X.'(X+1)) (2*3) '(6+1) |
But note that while the pattern and body of lambda are special
arguments, the actual parameter expected by the lambda function is
non-special. Thus the value 2*3 in the example above is evaluated
before it is substituted for the X variable. If you want to
prevent lambda arguments from being evaluated, you have to protect them
with a special form (e.g., a quote), and extract the value for use in
the lambda body using an appropriate pattern:
==> (\'X.'(X+1)) '(2*3) '(2*3+1) |
While Q's lambda abstractions look superficially like those in languages
like Haskell and ML, there is a big difference: Lambdas are just
normal expressions in the Q language and can thus can be
inspected and manipulated just like any other expression. In other
words, lambda is just an ordinary function in Q (albeit a special
form) which is implemented on top of the term rewriting machinery that Q
is based on. This has some important consequences which are discussed in
the following, including some incompatibilities to mainstream FP
languages that Haskell and ML programmers should know about.
For efficiency reasons, the value returned by lambda is actually
a "compiled" form of the given lambda abstraction, which is
represented as an object of the built-in Function type. These
objects are computed at runtime and contain a complete description of
the original lambda abstraction, except that the original names of
variables bound in the lambda pattern are lost in the compilation
process; this is why you will see generic variable names like X1,
X2, etc. in the printed representation.
We also call these objects lambda functions, to distinguish them
from lambda abstractions, i.e., the lambda expressions from
which they are computed. These objects have an associated "view", with
lambda acting as a "virtual constructor" of the Function
type (cf. Views), so that lambda objects can be compared for
syntactic equality, printed, and dissected by pattern
matching. Therefore the difference between lambda abstractions and the
corresponding function objects is mostly transparent to the programmer
(except for the variable names); for most practical purposes you should
be able to work with lambdas just as if they were actually represented
the way that they are written.
For instance, in Q you can create a lambda function and then extract its pattern and body simply as follows:
==> var fac = \N.if N>0 then N*fac (N-1) else 1 ==> def \X.Y = fac; X; Y X1 if X1>0 then X1*fac (X1-1) else 1 |
This makes it easy to manipulate lambdas in various ways before actually
applying and thereby evaluating them, which offers great opportunities
for "metaprogramming". For instance, a simple kind of `let'
expressions can easily be defined in terms of lambda as follows:
special let C X; let (A=B) X = (\A.X) B; |
More examples of this kind can be found in the standard library. For
instance, the standard library defines `case' expressions as well
as list and stream comprehensions in terms of lambda.
However, this power comes at a price. Since lambdas are first-class objects, the variables "bound" in lambdas are just ordinary expressions, too. In fact, for the Q interpreter the variables inside a lambda pattern are nothing special at all; on the level of equational definitions they are just free variables! Conversely, from the point of view of lambdas, the variables bound in an equation are "meta variables" whose values are substituted into both the pattern and the body of a lambda abstraction. In fact, we used that to good effect in the above definition of `let'.
This leads to a common pitfall, namely that you might accidentally try
to use a variable symbol bound by the left-hand side of an equation (or
inside a where clause) as a lambda variable, with predictable but
unintended results. For instance, consider the following definition:
foo X = (\X . 2*X+1) (X+1); |
In Haskell, an equational definition like the one above (i.e., something
like foo x = (\x->2*x+1) (x+1)) is essentially considered
equivalent to a corresponding lambda abstraction (\x ->
(\x->2*x+1) (x+1) in this case), and hence foo 99 will evaluate
to 2*(99+1)+1 = 201.
Not so in Q. Here, the left-hand side X acts as a meta variable
whose value is substituted into the lambda abstraction, yielding the
result (\99 . 2*99+1) (99+1), which probably was not intended
here. To get the correct behaviour in this example you can either
replace the lambda variable with a symbol which is not bound by the
equation, or use an inline variable declaration to escape the lambda
variable, like so:
foo X = (\var X.2*var X+1) (X+1); |
Of course you can also just define foo itself as a lambda function:
foo = \X . (\X.2*X+1) (X+1); |
While this is a little awkward and might take some time to get used to, there is no way around this if lambda abstractions are implemented as first-class citizens as they are in the Q language. So some discipline is needed to avoid the pitfall sketched out above.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As already mentioned, the Q language only knows a few "hard" runtime error conditions a.k.a. exceptions such as stack or memory overflow. However, "soft" exceptions can also be generated and handled in any desired manner. The following functions are used to deal with all kinds of exceptions during expression evaluation:
halthalt evaluation
quitexit the Q interpreter
breakinvoke the debugger
failabort the current rule
_FAIL_abort the current reduction
catch F Xhandle an exception
throw Xraise an exception
trap ACT SIGtrap signals
The halt function never returns a result, but raises an exception
which normally causes the evaluation process to be aborted immediately.
This last resort is commonly used in case of a fatal error condition.
The quit function is like halt, but also causes exit from
the interpreter and returns you to the operating system shell; this
operation is often used interactively to terminate a session with the
interpreter. (Note that in a multithreaded script, see POSIX Threads, quit only terminates the program when it is invoked
from the main thread. In other threads it just acts like halt.)
If the break flag is on (see Debugging), the
break function interrupts an evaluation and invokes the symbolic
debugger built into the Q interpreter, just as if the user typed
Ctl-C. This operation allows you to set breakpoints in a
script. For instance,
foo X = break || bar X; |
causes the debugger to be invoked as soon as the rule is executed. If
the break flag is off then this operation has no
effect. In any case, the break function returns ().
The fail function is used to abort the current rule, just like a
failing qualifier. The difference is that fail can be used
anywhere in the right-hand side or qualifier of a rule, and causes the
rule to be exited immediately. This allows you to handle
complicated error conditions which occur while a rule is already being
executed, and also provides an efficient method for implementing
backtracking algorithms. For instance, here is a quick solution for the
famous "N queens" problem:
queens N = search N 1 1 [];
search N I J P = write P || writes "\n" if I>N;
= search N (I+1) 1 (P++[(I,J)]) || fail if safe (I,J) P;
= search N I (J+1) P if J<N;
= () otherwise;
safe (I1,J1) P = not any (check (I1,J1)) P;
check (I1,J1) (I2,J2)
= (I1=I2) or else (J1=J2) or else
(I1+J1=I2+J2) or else (I1-J1=I2-J2);
|
This algorithm prints out all valid placements (i.e., lists of
(row,column) pairs) of N queens on an N times N board.
Note the use of fail in the second equation for the search
function, which causes the rule to fail after the current
placement has been tried, after which evaluation proceeds with the third
rule which places the queen into the next column. This turns the
definition into a backtracking algorithm. The safe function
verifies that a placement (I1,J1) does not put the queen in check
with any of the other queens which have already been placed.
The _FAIL_ function works in a fashion analogous to fail,
but makes an entire reduction fail (not just a single
equation). This is similar to throwing an exception (see below) in that
the evaluation of the current redex is aborted immediately, but does not
actually raise any exception; instead, it effectively forces the redex
to be a normal form, regardless of how many subsequent equations might
still be applicable. This operation is useful if you want to check for
certain error conditions before trying any subsequent
equations. For instance:
foo X:Num = _FAIL_ if X<=0;
= bar X if X>1;
= bar (1/X) otherwise;
|
The catch function allows you to handle both "hard exceptions"
which are raised when the interpreter encounters one of the runtime
error conditions discussed in Error Handling, and "soft
exceptions" which are raised by the functions halt and
quit already explained above, and the throw function which
is discussed below. Moreover, the interpreter also handles certain
signals sent to it, e.g., via the kill(2) system call, by raising
appropriate exceptions. In the current implementation, by default only
the signals SIGINT ("break"), SIGTERM ("terminate")
and SIGHUP ("hangup") are handled; the latter two are both
treated as termination requests, like a call to the quit
function. Signal handlers can also be installed by the running script
using the trap function discussed below.
Both arguments of catch are special. The catch function
first evaluates its second argument, the target expression and, if
all is well, returns the value of that expression. Otherwise, if any
exception was raised during the evaluation of the target expression, it
evaluates the first argument (the exception handler), applies it
to the value of the exception, and returns the result computed by the
exception handler.
In the case of a runtime error condition and the exceptions raised with
the halt and quit functions, the exception value is
of the form syserr N where N is one of the integer error
codes listed below:
Ctl-C.
halt function, or request to halt
evaluation from the debugger.
quit
function, or request to exit the interpreter from the debugger.
Let's try this by forcing a stack overflow condition:
==> iter 100000000 (+) 1
! Stack overflow
>>> iter 100000000 (+) 1
^
==> catch exception (iter 100000000 (+) 1)
exception (syserr 5)
|
The syserr constructor belongs to the built-in
SysException type, which in turn is a subtype of
Exception. These types may be thought of as being predefined as
follows:
public type Exception; public type SysException : Exception = const syserr N; |
User-defined exceptions can be generated with throw. The
throw function raises an exception whose value is given by its
single (non-special) argument. For instance:
hd [] = throw '(hd []); exception X = writes "Exception: " || write X || writes "\n"; |
With these definitions we have:
==> catch exception (hd [1..3])
1
==> catch exception (hd [1..0])
Exception: '(hd [])
()
==> hd [1..0]
! Exception
'(hd [])
>>> hd [1..0]
^
|
Note that if an exception raised with throw is not handled with a
corresponding catch, then the interpreter aborts the evaluation
and prints an error message (and it will also invoke the debugger to
report the rule which generated the exception, if break is
on).
The catch and throw functions can also be used to
implement non-local value returns. For instance, here is a variation of
the N queens function which returns the first solution as soon as it has
been found during backtracking:
queens1 N = catch id (search1 N 1 1 []);
search1 N I J P = throw P if I>N;
= search1 N (I+1) 1 (P++[(I,J)]) || fail if safe (I,J) P;
= search1 N I (J+1) P if J<N;
= () otherwise;
|
Here, the "exception handler" is just the identity function id,
as defined by the standard library, see Standard Functions.
Another neat programming trick is the following which makes a rule fail if an exception occurs during evaluation of the right-hand side:
foo X = catch fail (bar X); |
This works because the handler argument, which is special, is evaluated
only if an exception is actually encountered during evaluation of the
target expression. Otherwise, foo X will just return the value of
bar X. The _FAIL_ function can be used in a similar
manner.
Lisp and C++ programmers will probably miss an additional parameter
which restricts the kind of exceptions handled with catch. You
can implement this easily yourself by checking the exception value in
your handler function, and "throw on" an unknown exception to the next
enclosing catch:
type MyException : Exception = const empty_list;
hd [] = throw empty_list;
myexception X:MyException
= writes "myexception: " || write X ||
writes "\n" || halt;
myexception X = throw X otherwise;
|
Note how we implemented the empty_list exception using our own
Exception subtype MyException. This method of structuring
error exceptions is recommended because it makes it easier to spot the
source of an exception. It also enables us to use a type guard
(cf. Types) in order to check for specific exception types, rather
than having to discriminate over different exception patterns.
Finally, the trap function allows you to have exceptions be
generated when a signal arrives. The trap function takes two
arguments, ACT and SIG, denoting the action to be
performed and the number of the signal to be trapped. It returns the
previous action associated with the signal. ACT must be an
integer value. If it is positive then the signal raises an exception of
the form syserr (-SIG); if it is negative then the signal is
completely ignored; and if it is zero then the interpreter reverts to
the default handling of the signal. You can also use trap to
redefine the default handling of the break and termination signals. To
allow signals to be caught reliably, further signals are "blocked"
(i.e., they are queued for later delivery) while catch evaluates
an exception handler. For instance, here is a little script which sets
up some signal handlers and keeps on printing the trapped signals until
it either gets terminated with kill -9 or the timed wait with the
sleep function times out. (The sleep function is described
below. The symbolic signal constants and the getpid and
printf function are provided by the clib module, see
Clib.)
test = do (trap 1) [SIGINT,SIGTSTP,SIGTERM,SIGHUP,SIGQUIT] ||
printf "I'm running as pid %d, try to kill me!\n" getpid ||
loop;
loop = flush || catch sig (sleep 100);
sig (syserr K) = printf "Hey, I got signal %d.\n" (-K) || loop;
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This group consists of some special operations which do not fit into any of the other groups treated above.
version, sysinfodetermine version/system information
which Sdetermine absolute pathname of the executing script or of another file on Q library path
timeget the system time
sleep Xpause for some time
isspecial Xpredicate checking whether argument is a special form
isconst Xpredicate checking whether argument is a constant
isfun X, isvar Xpredicates checking whether argument is a function or a variable symbol, respectively
isdef Xpredicate checking whether the variable X has been assigned a value
flip Fflip arguments of a binary function
The version and sysinfo functions are useful when writing
code which depends on a particular version of the Q interpreter or a
machine/operating system type. They return a string describing the Q
interpreter version and the host system on which it is installed,
respectively.
The which function performs a search for the given filename on
the Q library path. If the file is found then the absolute pathname is
returned; otherwise the function fails. The function can also be invoked
with the argument (), in which case the full pathname of the
executing main script is returned. This function is most useful if a
script has to locate additional data files which are stored along with
it, see Running Scripts from the Shell.
Two functions are provided for timing purposes. The time function
returns, as a floating point value, the time in seconds since the
"epoch" (00:00:00 UTC, January 1, 1970). The sleep function
suspends evaluation for the given time (integer or floating point value,
again in seconds). Be warned that the actual resolution of these
functions depends on the timing routines available on your system, and
even if a decent resolution is provided, the accuracy of results will
depend on various factors such as system load.
The isspecial, isconst, isfun, isvar and
isdef predicates are all special forms checking whether their
single argument has a certain property. The isspecial function
allows you to determine whether its argument is a special form which
will receive its next argument unevaluated. The isconst operation
checks whether its argument is a constant, i.e., a constant belonging to
any of the built-in types, a variable or function symbol declared with
const, or an application whose head element is such a
constant. The isfun and isvar predicates are used to check
whether an expression is a function symbol (built-in or defined by the
loaded script, can also be a constant symbol), or a free variable
symbol, respectively. The isdef function can be used to check
whether its argument (a variable) has been assigned a value using
def.
The flip function exchanges arguments of a (binary) function, as
if it was defined by the following equation:
flip F X Y = F Y X; |
This function is used internally to implement operator sections with
missing left argument. E.g., (+1) is nothing but syntactic sugar
for flip (+) 1. You can apply flip to other, user-defined
functions as well; thus, flip foo X denotes the function which
maps Y to foo Y X.
In order to handle special forms like, e.g., (or else) correctly,
flip automagically adjusts to the argument pattern of its first
argument. That is, if the function given as the first argument (which is
always evaluated) has a special first (or second) argument, then
flip's second (or first) argument will be special as well.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This chapter gives an overview of the data types and operations provided
by the standard library modules supplied with the Q programming system.
Most of these scripts are already "preloaded" when you run the
interpreter, because they are included in the "prelude" script
prelude.q. The exact collection of library scripts may depend on
your local setup, but a basic installation of the Q core package should
at least provide the modules listed below. In this list, (*) indicates
"external" modules which are (mostly) implemented in C, whereas (+)
denotes modules which have to be imported explicitly, as they are not
included in the prelude.
assert.qprint diagnostics
clib.qbasic C interface (*)
complex.qcomplex numbers
cond.qconditional expressions and tuple/list/stream comprehensions
error.qprint error messages
getopt.qGNU getopt compatible option parser (+)
graphics.qPostScript graphics interface (+)
math.qmathematical functions
prelude.qstandard prelude
rational.qrational numbers
reftypes.qtuples, lists and streams of references (+)
sort.qalgorithms to sort a list
stddecl.qshared declarations of the container data structures, cf. stdtypes.q (+)
stdlib.qa collection of common standard functions
stdtypes.qa collection of efficient container data structures (+)
stream.qa lazy list data structure
string.qadditional string functions
system.qPOSIX system interface (*) (+)
tuple.qadditional tuple functions
typec.qtype-checking predicates
Here is a quick rundown of the most important modules: The
prelude.q script implements the standard prelude, which loads the
entire collection (except the modules marked with (+) in the list above,
which must be imported explicitly). It adds some convenient operations
for comparing lists, streams and tuples, the syntactic inequality
operator `!=', as well as successor and predecessor functions on
integers (succ and pred) and default implementations of
various functions for subtypes derived from the Real type. It
also contains definitions for some cases of list, tuple and stream
enumerations which are not provided as builtins.
Specifically, the standard prelude overloads the =, <>,
<, >, <= and >= operators with rules for
deciding equality and inequality of lists, streams and tuples, and rules
for ordering lists and streams lexicographically. That is, tuples, lists
and streams can be compared with = and <>, which is done
by checking that the operands are of the same size and recursively
comparing all members of the operands. Lists and streams can also be
compared lexicographically by recursively comparing the list or
stream members; the first unequal members decide the comparison. This
works just like the lexicographic comparison of strings. Thus, e.g.,
[1,2] is considered to be less than both [1,3] and
[1,2,1], but more than [1,1,3] or [0,5].
Moreover, the prelude defines the syntactic inequality operator `!=' as the logical negation of the built-in `==' operator, cf. Non-Linear Equations. Like `==', `!=' is implemented as a special form.
Last but not least, the standard prelude also provides default
definitions for arithmetic, relational, numeric and conversion functions
on Real, so that a minimal implementation of a user-defined type
derived from Real (or any of its subtypes) just needs to define
the float function (which is needed to coerce values of the type
to Float) and then add definitions for those operations which
need special treatment. The default definitions in prelude.q are
at the lowest priority level so that they can always be overridden if
needed.
The stdlib.q script has those operations which will probably be
used most frequently by the average programmer; it contains a lot of
additional list processing functions and other useful stuff mostly
adopted from [Bird/Wadler 1988]. This module is also included by many
other library scripts. Historically, it was the first script written
for the standard library.
The stream.q script extends most list operations defined in
stdlib.q to streams, i.e., lazy lists. It also provides a number
of other useful stream operations.
Similarly, the string.q and tuple.q scripts provide
additional operations on strings and tuples, and also extend some list
operations to these data types.
The stdtypes.q script simply includes all modules which implement
common container data structures; currently these are array.q,
bag.q, dict.q, hdict.q, heap.q and
set.q, see Standard Types. Some declarations shared by
these modules are in the stddecls.q script.
The complex.q and rational.q scripts implement complex and
rational number types along with the usual operations on these
types. These have been designed to integrate seamlessly into Q's system
of built-in number types so that expressions involving arithmetic
operations with any combination of integer, floating point, complex and
rational arguments work as expected.
The clib.q and system.q modules give access to various
useful system functions from the C library, including C-style formatted
I/O, binary file I/O, process and thread management,
internationalization support and regular expression routines. The
clib.q module also provides mutable expression cells
("references"), as well as a "byte string" data type which lets you
represent arbitrary C data, and it overrides various important standard
library functions with much faster C versions. The operations of these
modules are mostly written in C, using the C language interface
discussed in C Language Interface. These modules are described in
their own chapter, see Clib.
As of Q 7.8, the reftypes.q module provides additional
convenience operations to work with mutable expression sequences
implemented as tuples, lists or streams of references. These are also
described in the "Clib" chapter, see Expression References.
The Q programming system comes bundled with some additional modules for interfacing to various third-party libraries and software packages. At the time of this writing, these are:
Preliminary documentation for these modules can be found in the
etc subdirectory of your Q installation directory (usually in
/usr/share/q or /usr/local/share/q). Other useful add-on
modules are available as separate source packages. At the time of this
writing this comprises additional GUI libraries (including a complete
interface to Trolltech's very nice "Qt" toolkit), a bunch of graphics
and multimedia modules (including interfaces to OpenGL, OpenAL, the Xine
video library, and Grame's MidiShare and Faust) and more. Ready-made
plugins for using the Q interpreter from the "Chicken" Scheme
compiler, the Apache webserver and Miller Puckette's graphical computer
music and multimedia environment "PureData" are also available. Please
check out the Q website at http://q-lang.sourceforge.net for more
information about these. (The Q module for Chicken, which was written by
John Cowan, is available separately from the Chicken website, see
http://www.call-with-current-continuation.org.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The stdlib.q script provides frequently used list operations and
other stuff mostly adopted from [Bird/Wadler 1988].
abs Xabsolute value of X
all P Xsverify that each element of the list Xs satisfies the predicate
P
any P Xsverify that the list Xs contains an element satisfying predicate
P
append Xs Xappend a value X to the list Xs
cat Xsconcatenate a list of lists
cons X Xsprepend an element to a list
cst Xconstant-valued function: cst X Y ⇒ X
curry Fcurry a function: turn a function operating on pairs into a function with two arguments
curry3 Fcurry with three arguments
do F Xsapply a function F to each member of a list Xs, return
()
dowith F Xs Ystake two lists and apply a binary function to corresponding elements,
return ()
dowith3 F Xs Ys Zsdowith with three lists
drop N Xsremove the first N elements from the list Xs
dropwhile P Xsremove elements from the beginning of Xs while the predicate P
is satisfied
eq X Ysyntactic equality (cf. Non-Linear Equations)
filter P Xsfilter a list with a predicate
foldl F A Xsfold-left
foldl1 F Xsfold-left over nonempty lists
foldr F A Xsfold-right
foldr1 F Xsfold-right over nonempty lists
hd Xsreturn the head element of a list
hds Xsreturn the list of all head elements in a list of lists
idthe identity function: id X ⇒ X
init Xsreturn list Xs without its last element
iter N F Agenerate the list of the first N values A, F A,
F (F A), …
last Xsreturn the last element of a list
map F Xsapply function F to each member of a list
max X Ymaximum of two values
min X Yminimum of two values
mklist X Ncreate a list of N X's
neg Pnegate a predicate
neq X Ysyntactic inequality
null Xscheck whether a list is empty ([])
nums N Mgenerate a list of numbers in a given range
numsby K N Mgenerate a list of numbers with a given step size
pop Xsremove the head element from a list
prd Xsproduct of a list of numbers
push Xs Xprepend an element to a list (cons with arguments reversed)
reverse Xsreverse a list
scanl F A Xsapply foldl to every initial part of a list
scanl1 F Xsapply foldl1 to every nonempty initial part of a list
scanr F A Xsapply foldr to every final part of a list
scanr1 F Xsapply foldr1 to every nonempty final part of a list
sgn Xsign of a number
sum Xssum of a list of numbers
take N Xsselect the first N elements from the list Xs
takewhile P Xsselect elements from the beginning of Xs while the predicate P
is satisfied
tl Xsremove the head element from a list
tls Xsreturn a list of lists with all head elements removed
top Xsreturn the head element from a list
transpose Xstranspose a list of lists
uncurry Funcurry a function: turn a function with two arguments into a function operating on pairs
uncurry3 Funcurry with triples
until P F Xrepeat applying F to X until P is satisfied
unzip Xstransform a list of pairs into a pair of lists
unzip3 Xsunzip with triples
while P F Alist repeated applications of F to A while P is
satisfied
zip Xs Ystake two lists and return a list of corresponding pairs
zip3 Xs Ys Zszip with three lists
zipwith F Xs Ystake two lists and map a binary function to corresponding elements
zipwith3 F Xs Ys Zszipwith with three lists
Note that some of these functions are actually overridden with much
faster C versions in the clib.q module; see C Replacements for Common Standard Library Functions.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The tuple.q script provides some additional functions operating
on tuples:
fst Xsreturn first element of a tuple
mktuple X Ncreate a tuple of N X's
pair X Yconstruct a pair
snd Xsreturn second element of a tuple
trd Xsreturn third element of a tuple
triple X Y Zconstruct a triple
tuplecat Xsconcatenate a list of tuples
Moreover, the following list functions from stdlib.q are
overloaded to also work on tuples: append, cat,
cons, do, dowith, dowith3, map,
null, pop, push, reverse, top. Most
of these return tuples when applied to such arguments, except
null (which returns a truth value), the do/dowith
functions (which return ()) and cat (which always returns
a list; use tuplecat if you want to concatenate a collection of
tuples instead).
Also note that the reverse and tuplecat operations are
actually overridden with much faster C versions in the clib.q
module; see C Replacements for Common Standard Library Functions.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The string.q script provides a collection of additional string
functions. Currently the following operations are implemented (most of
these operations are actually overridden with much more efficient C
implementations in the clib.q module; see C Replacements for Common Standard Library Functions):
chars Sreturn the list of individual characters in S
join DELIM Xsconcatenate a list of strings, interpolating the given DELIM string
between each pair of consecutive strings in the list
mkstr S Ncreate a string consisting of N copies of the given string S
split DELIM Ssplit a string into a list of substrings delimited by characters in the
given DELIM string
strcat Xsconcatenate a list of strings
Moreover, as of Q 7.8 this module overloads all list operations in
stdlib.q so that they work on strings as expected. This lets you
use strings mostly as if they were just another kind of list value. For
instance:
==> take 3 "abcdef"
"abc"
==> dropwhile (<="c") "abcdef"
"def"
==> tl "abcdef"
"bcdef"
==> zip "abcdef" "ABCDEF"
[("a","A"),("b","B"),("c","C"),("d","D"),("e","E"),("f","F")]
==> map (+1) "HAL"
"IBM"
==> map ord "HAL"
[72,65,76]
==> [C-1 : C in "IBM"]
["H","A","L"]
|
(As the last example shows, strings can also serve as sources in the binding clauses of list comprehensions; cf. Conditionals and Comprehensions.)
Here's a somewhat more practical example which illustrates the use of various list operations on strings to build a ROT13 translation table:
import dict;
def CHARS = strcat ["a".."z"], RCHARS = drop 13 CHARS ++ take 13 CHARS,
ROT13 = dict $ zip (CHARS++toupper CHARS) (RCHARS++toupper RCHARS);
rot13 C:Char = C where C:Char = ROT13!C;
= C otherwise;
rot13 S:String = map rot13 S;
|
(This employs the dict function to create a dictionary mapping
lower- and uppercase letters to their equivalents in the ROT13 encoding;
see Dictionaries.) Example:
==> CHARS; RCHARS "abcdefghijklmnopqrstuvwxyz" "nopqrstuvwxyzabcdefghijklm" ==> rot13 "This is an encoded string." "Guvf vf na rapbqrq fgevat." ==> rot13 _ "This is an encoded string." |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This module provides one function, getopt, which takes two
arguments: OPTS, a list of option descriptions in the format
described below, and ARGS, a list of strings containing the
command line parameters to be parsed for options. The result is a pair
(OPTS,ARGS) where OPTS is a list of pairs of options and
their arguments (if any; missing arguments are returned as ()),
and ARGS is the list of remaining (non-option) arguments. Options
are parsed using the rules of GNU getopt(1). If an invalid option
is encountered (unrecognized option, missing or extra argument, etc.),
getopt throws the offending option string as an exception.
The OPTS argument of getopt is a list of triples
(LONG,SHORT,FLAG), where LONG denotes the long option,
SHORT the equivalent short option, and FLAG is one of the
symbolic integer values NOARG, OPTARG and REQARG
which specifies whether the option has no argument, an optional argument
or a required argument, respectively. In the returned option-value list,
all options will be represented using their long option equivalents.
Also note that both the long and short option values in the OPTS
argument may actually be of any type, so unneeded options may be
replaced with corresponding dummy values. You only have to make sure
that the values given for the long options allow you to identify which
option was actually specified.
Finally, please note that, as already pointed out, this script does
not belong to the prelude, and hence has to be imported
explicitly if you want to use the getopt operation in your
program.
Example (see C-Style Formatted I/O, for a description of
printf and fprintf):
import getopt;
def OPTS = [("--help", "-h", NOARG),
("--foo", "-f", REQARG),
("--bar", "-b", OPTARG)];
invalid_option PROG OPT
= fprintf ERROR "%s: invalid option %s\n" (PROG,OPT) ||
halt;
test [PROG|ARGS]
= printf "OPTS = %s\nARGS = %s\n" (str OPTS, str ARGS)
where (OPTS,ARGS) = catch (invalid_option PROG) $
getopt OPTS ARGS;
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The typec.q script contains a collection of predicates which
check whether an expression is of a given type. The following functions
perform a purely syntactic check; they are all equivalent to the
corresponding type guards:
isbool Xcheck for truth values
ischar Xcheck for single character strings
iscomplex Xcheck for complex numbers (cf. Complex Numbers)
isexcept Xcheck for Exception values (cf. Exception Handling)
isfile Xcheck for file objects
isfloat Xcheck for floating point numbers
isfunction Xcheck for lambda functions
isint Xcheck for integers
islist Xcheck for lists
isnum Xcheck for numbers
isrational Xcheck for rational numbers (cf. Rational Numbers)
isreal Xcheck for real numbers
isstr Xcheck for strings
issym Xcheck for function and variable symbols (special form)
istuple Xcheck for tuples
Note that the syntactic number predicates isint, isfloat
etc. only check for a given representation. The typec.q module
also provides various predicates which classify numbers according to the
kind of abstract mathematical object they represent, regardless of the
concrete representation:
iscompval Xcheck for complex values
isintval Xcheck for integer values
isratval Xcheck for rational values
isrealval Xcheck for real values
These predicates work mostly like their Scheme counterparts. E.g.,
3 will be classified not only as an integer value, but also as a
rational, real and a complex value. A complex number with zero imaginary
part may also be classified as an integer, rational or real, depending
on its real part. A floating point number with zero fractional part and
a rational number with a denominator of 1 are both also
classified as an integer value.
Moreover, the following predicates allow you to check whether a number is exact or inexact (inexact numbers generally involve floating point values), and whether a number represents an IEEE infinity or NaN ("not a number") value:
isexact Xcheck for exact numbers
isinexact Xcheck for inexact numbers
isinf Xcheck for infinite floating point values
isnan Xcheck for NaN floating point values
Finally, two other special-purpose predicates provided in typec.q
are:
isenum Xcheck for enumeration type members
issym Xcheck for function and variable symbols (special form)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The sort.q script provides mergesort and quicksort algorithms for
sorting a list using a given order predicate:
msort P Xsmergesort algorithm
qsort P Xsquicksort algorithm
The mergesort algorithm is more involved than quicksort, but may run
much faster if input lists are large enough and are already
partially sorted. Both algorithms take an order predicate as their first
argument, which makes it possible to sort lists using different
criteria. The order predicate must be a function accepting two
arguments, and must return true iff the first argument is
strictly less than the second. For instance, to sort a list in ascending
order, you could say:
==> qsort (<) [1,5,3,2,4] [1,2,3,4,5] |
By reversing the order predicate, the list is sorted in descending order:
==> qsort (>) [1,5,3,2,4] [5,4,3,2,1] |
Custom order predicates also allow you to sort a list according to different sort keys. For instance, the following example shows how to sort a list of pairs using the first component of each pair as the sort key:
==> def L = [(1,2),(5,1),(3,3),(1,1),(4,5)] ==> var le1 = \X Y.fst X < fst Y ==> qsort le1 L [(1,2),(1,1),(3,3),(4,5),(5,1)] |
Both algorithms provided by this module are "stable", i.e., they preserve the relative order of list elements with "equal" sort keys. This is important when successive sorts are used to order elements according to different criteria. For instance:
==> var le2 = \X Y.snd X < snd Y ==> qsort le2 L [(5,1),(1,1),(1,2),(3,3),(4,5)] ==> qsort le1 _ [(1,1),(1,2),(3,3),(4,5),(5,1)] |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Please note that, as of Q 7.8 this module is no longer included in the
prelude, so you have to import it explicitly if your script needs one of
the data types described here. It is also possible to just import the
individual data type modules (array.q, bag.q, etc.) if you
do not need the entire collection.
The stdtypes.q script implements a collection of efficient
container data structures, which currently comprises arrays, heaps
(priority queues), ordered sets and bags, and (ordered as well as
hashed) dictionaries. The different data types are actually implemented
in the scripts array.q, bag.q, dict.q,
hdict.q, heap.q and set.q. Many oper