[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
is the C interface and "system" module of the Q
programming language. As of Q 7.8, this component actually consists of
two different modules, clib.q
which provides basic C data
structures and routines commonly used in most Q programs, and
system.q
which contains most of the POSIX system interface. Only
clib.q
is part of the prelude; for most system functions, you
will have to explicitly import system.q
in your programs. In the
sections below, we will always indicate whether the system.q
module is needed for the described operations.
In difference to the other standard library modules, clib
and
system
are external modules, i.e., most functions are
actually implemented in C (cf. C Language Interface). Together,
clib
and system
provide additional string operations,
extended file functions, C-style formatted I/O, low-level and binary
I/O, an interface to various system functions, POSIX thread functions,
expression references, time functions, internationalization support,
filename globbing and regular expression matching, additional integer
functions from the GMP library, and, last but not least, efficient C
replacements for some common standard library list and string processing
functions.
Even if you do not use the extra functionality provided by these
modules, you will benefit from the replacement operations (which are in
clib
and thus included in the prelude), which considerably speed
up basic list and string processing, sometimes by several orders of
magnitude.
NOTE: Not all of the following operations are implemented on all
systems. The UNIX-specific operations are marked with the symbol
`(U)' in the clib.q
and system.q
scripts. Only a
portable subset of the UNIX system interface is provided, which
encompasses the most essential operations found on many recent UNIX (and
other POSIX) systems, as described by the ANSI C and POSIX standards as
well as the Single UNIX Specification (SUS). These operations are also
available on Linux and OSX systems.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
defines an abundance of symbolic values for use with various
system functions. Most of these can be found in system.q
, but a
few constants related to the memory sizes of various basic C data types
and the fseek
and setvbuf
operations can also be found in
clib.q
.
System constants actually vary from system to system; only the most
common values are provided as global variables here. The variables are
declared const
(read-only) and are initialized at startup
time. Flag values can be combined using bitwise logical operations as
usual. A complete list of the variables can be found at the beginning of
the clib.q
and system.q
scripts. Flag values which are
unavailable on the host system will be set to zero, other undefined
values to -1. Thus undefined values will generally have no effect or
cause the corresponding operations to fail.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
These functions provide an interface to some familiar character routines
from the C library. They can all be found in clib.q
and are thus
in the standard prelude.
Character predicates: These work exactly like the corresponding C library routines, except that they work with arbitrary Unicode, not just ASCII, characters, provided that the interpreter has been built with Unicode support.
public extern islower C, isupper C, isalpha C, isdigit C, isxdigit C, isalnum C, ispunct C, isspace C, isgraph C, isprint C, iscntrl C, isascii C; |
String conversion: Convert a string to lower- or uppercase (like the corresponding C functions, but work on arbitrary Unicode strings, not just on single ASCII characters).
public extern tolower S, toupper S; |
Count the number of alphanumeric characters in a text:
==> #filter isalnum (chars "The little brown fox.\n") 17 |
Convert a string to uppercase:
==> toupper "The little brown fox.\n" "THE LITTLE BROWN FOX.\n" |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following type represents unstructured binary data implemented as C
byte vectors. This data structure is used by the low-level I/O functions
and other system functions which operate on binary data. The
ByteStr
type itself and its operations are implemented in
clib.q
and thus included in the prelude.
public extern type ByteStr; public isbytestr B; // check for byte strings |
Byte strings are like ordinary character strings, but they do not have a
printable representation, and they may include zero bytes. (Recall that a
zero byte in a character string terminates the string.) They can be used
to encode arbitrary binary data such as C vectors and structures. The
bytestr
function can be used to construct byte strings from
integers, floating point numbers, string values or lists of unsigned
byte values:
public extern bytestr X; // create a byte string |
The X
argument denotes the data to be encoded and can be either a
list of byte values (unsigned integers in the range from 0 to 255), or
an atomic data object, i.e., an integer, floating point number or string
constant. In the latter case, the argument can also have the form
(X,SIZE)
indicating the desired byte size of the object;
otherwise a reasonable default size is chosen. If the specified size
differs from the actual size of X
, the result is zero-padded or
truncated accordingly. Integer values are encoded in the host byte
order, with the least significant GMP limb first; negative integers are
represented in 2's complement. Floating point values are encoded using
double precision by default or if the byte count is sufficient (i.e., at
least 8 on most systems), and using single precision otherwise. Strings
are by default encoded in the system encoding, but you can also specify
the desired target encoding as (X,CODESET)
(or
(X,CODESET,SIZE)
if you also need to specify a byte size), where
CODESET
is a string denoting the target encoding.
Like ordinary character strings, byte strings can be concatenated,
size-measured, indexed, sliced and compared lexicographically. Moreover,
a byte string can be converted back to a (multiprecision) integer,
floating point number, string value, or a list of byte values. (When
converting back to a string you can specify the source encoding as in
bstr (B,CODESET)
, otherwise the system encoding is assumed.) For
these purposes the following operations are provided.
public extern bcat Bs; // concatenate list of byte strings public extern bsize B; // byte size of B public extern byte I B; // Ith byte of B public extern bsub B I J; // slice of B (bytes I..J) public extern bcmp M1 M2; // compare M1 and M2 public extern bint B; // convert to unsigned integer public extern bfloat B; // convert to floating point number public extern bstr B; // convert to string public bytes B; // convert to list public ::list B; // dito |
You can use the bytes
function to convert a byte string to a list
of byte values; the list
function is overloaded to provide the
same functionality. These functions are defined as follows:
bytes B:ByteStr = map (B!) [0..#B-1]; list B:ByteStr = bytes B; |
For convenience, the common string operators and the sub
function
are overloaded to work on byte strings as well. Thus #B
returns
the size of B
(the number of bytes it contains) and B!I
the I
th byte of B
. B1++B2
concatenates B1
and B2
, sub B I J
returns the slice from byte I
to
J
, and the relational operators `=', `<', `>'
etc. can be used to compare byte strings lexicographically. These
operations are all implemented in terms of the functions listed above.
As of Q 7.11, clib
supports a number of additional operations
which allow you to treat byte strings as mutable C vectors of
signed/unsigned 8/16/32 bit integers or single/double precision floating
point numbers. The following functions provide read/write access to
elements and slices of such C vectors:
public extern get_int8 B I, get_int16 B I, get_int32 B I; public extern get_uint8 B I, get_uint16 B I, get_uint32 B I; public extern get_float B I, get_double B I; public extern put_int8 B I X, put_int16 B I X, put_int32 B I X; public extern put_uint8 B I X, put_uint16 B I X, put_uint32 B I X; public extern put_float B I X, put_double B I X; |
Note that the given index argument I
is interpreted relative to
the corresponding element type. Thus, e.g., get_int32 B I
returns
the I
th 32 bit integer rather than the integer at byte offset
I
. Also note that integer arguments must fit into machine
integers, otherwise these operations will fail. Integers passed for
floating point arguments will be coerced to floating point values
automatically.
For the get_xxx
functions, the index parameter may also be a pair
(I,J)
to return a slice of the given byte string instead of a
single element (this works like sub
/bsub
, but interprets
indices relative to the element type). The put_xxx
functions also
accept a byte string instead of an element as input, and will then
overwrite the corresponding slice of the target byte string B
with the given source byte string X
. Similar to
sub
/bsub
, these variations of
get_xxx
/put_xxx
are "safe" in that they automatically
adjust the given indices to fit within the bounds of the target byte
string.
Moreover, the following convenience functions are provided to convert between byte strings and lists of integer/floating point elements.
public extern int8_list B, int16_list B, int32_list B; public extern uint8_list B, uint16_list B, uint32_list B; public extern float_list B, double_list B; public extern int8_vect Xs, int16_vect Xs, int32_vect Xs; public extern uint8_vect Xs, uint16_vect Xs, uint32_vect Xs; public extern float_vect Xs, double_vect Xs; |
Encode an integer as a byte string, take a look at its individual bytes, and convert the byte string back to an integer:
==> hex ==> def B = bytestr 0x01020304; bytes B; bint B [0x4,0x3,0x2,0x1] 0x1020304 |
(Note that this result was obtained on a little-endian system, hence the
least significant byte 0x04
comes first in the byte list.)
Negative integers are correctly encoded in 2's complement:
==> def B = bytestr (-2); bytes B; bint B [0xfe,0xff,0xff,0xff] 0xfffffffe |
To work with these binary representations you must be aware of the way
GMP represents multiprecision integers. In particular, note that the
default size of an integer is always a multiple (at least one) of GMP's
limb size which is usually 4 or 8 bytes depending on the host system's
default long
integer type. The actual limb size can be determined
as follows:
==> #bytes (bytestr 0) |
In order to get integers of arbitrary sizes, an explicit SIZE
argument may be used. For instance, here is how we encode small (1 or 2
byte) integers:
==> bytes (bytestr (0x01,1)); bytes (bytestr (0x0102,2)) [0x1] [0x2,0x1] |
The host system's byte sizes of various atomic C types can be determined
with symbolic values declared at the beginning of clib.q
, such as
SIZEOF_CHAR
, SIZEOF_SHORT
, SIZEOF_LONG
,
SIZEOF_FLOAT
and SIZEOF_DOUBLE
.
Another fact worth mentioning is that even on big-endian systems,
integers are always encoded with the "least significant limb"
first. So, for instance, given that the limb size is 4, as in the above
examples, the 2-limb integer 0x0102030405060708
consists of bytes
0x8 0x7 0x6 0x5 0x4 0x3 0x2 0x1
on a little-endian system, in
that order, whereas the byte order on a big-endian system is 0x5
0x6 0x7 0x8 0x1 0x2 0x3 0x4
.
Here is how we can quickly check the byte order of the host system:
==> hd (bytes (bytestr 1)) |
This expression returns 1 on a little-endian system and zero otherwise.
As long as an integer does not exceed the machine's word size (which usually matches the limb size), we can simply convert between big-endian and little-endian representation by reversing the byte list:
==> bytestr (reverse (bytes B)) |
Floating point values can be encoded either in double or single
precision, depending on the SIZE
argument. The default size is
double precision (usually 8 bytes).
==> bfloat (bytestr (1/3)); bfloat (bytestr (1/3,SIZEOF_FLOAT)) 0.333333333333333 0.333333343267441 |
The default size of the encoding of a character string is the byte size of the string in the target encoding (the system encoding by default). If an explicit size is given, the string is zero-padded or truncated if necessary. The following example will work with any system encoding based on 7 bit ASCII (like Latin1, UTF-8 or ASCII itself):
==> dec ==> def S1 = bytestr "ABC", S2 = bytestr ("ABC",2), S3 = bytestr ("ABC",5) ==> bytes S1; bytes S2; bytes S3 [65,66,67] [65,66] [65,66,67,0,0] ==> bstr S1; bstr S2; bstr S3 "ABC" "AB" "ABC" |
By combining elements like the ones above, and including appropriate
"tagging" information, more complex data structures can be represented
as binary data as well. For this purpose, the byte strings of the tags
and the data elements can be concatenated with bcat
or the
`++' operator. This is useful, in particular, for compact storage
of objects in files. Moreover, some system functions involve binary data
which might represent C structures and/or vectors. Such data can be
assembled from the constituent parts by simply concatenating them. For
instance, consider the following C struct:
struct { char foo[108]; short bar; int baz; }; |
A value of this type, say {"Hello, world.", 4711, 123456}
, can
then be encoded as follows:
==> bytestr ("Hello, world.",108) ++ bytestr (4711,SIZEOF_SHORT) ++ \ bytestr (123456,SIZEOF_INT) |
Similarly, a list of integers can be converted to a corresponding C vector as follows:
==> bcat (map bytestr [1..100]) |
When encoding such C structures you must also consider alignment issues. For instance, most C compilers will align non-byte data at even addresses.
In order to facilitate the handling of C vectors of integers and
floating point values, as of Q 7.11 clib
offers a number of
specialized operations which provide direct read/write access to
elements and slices of numeric vectors, and allow you to convert between
C vectors and Q lists of integer or floating point values. These
operations are all implemented directly in C and will usually be much
more efficient for manipulating numeric C vectors than the basic
byte-oriented functions. Moreover, they allow you to modify the elements
of a C vector in a direct fashion, turning byte strings into a mutable
data structure.
Different operations are provided to handle vectors of signed or unsigned 8/16/32 bit (machine) integers, as well as single (32 bit) and double precision (64 bit) floating point numbers. For instance:
==> def B = uint32_vect [100..110] ==> uint32_list B [100,101,102,103,104,105,106,107,108,109,110] ==> get_uint32 B 1 101 ==> put_uint32 B 1 0xffffffff () ==> uint32_list B [100,4294967295,102,103,104,105,106,107,108,109,110] |
Note that, because these C vectors are just normal byte strings, you can freely convert between different representations of the numeric data. E.g.:
==> take 12 $ int8_list B [100,0,0,0,-1,-1,-1,-1,102,0,0,0] |
Entire slices of byte strings can be retrieved and overwritten as
well. Note that, as with sub
, the indices are adjusted
automatically to stay within the bounds of the target vector.
==> put_uint32 B (-2) (uint32_vect [90..94]) () ==> uint32_list B [92,93,94,103,104,105,106,107,108,109,110] ==> uint32_list $ get_uint32 B (-2,3) [92,93,94,103] ==> uint32_list $ get_uint32 B (8,100) [108,109,110] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
provides the following enhanced and additional file functions:
public extern ::fopen NAME MODE, fdopen FD MODE, freopen NAME MODE F; public extern fileno F; public extern setvbuf F MODE; public extern fconv F CODESET; public extern tmpnam, tmpfile; public extern ftell F, fseek F POS WHENCE; public rewind F; public extern gets, fgets F; public extern fget F; public extern ungetc C, fungetc F C; |
These are all defined in clib.q
and thus included in the prelude.
The fopen
version of clib
handles the `+' flag in mode
strings, thus enabling you to open files for both reading and writing. The
mode "r+"
opens an existing file for both reading and writing; the
initial file contents are unchanged, and both the input and output file
pointers are positioned at the beginning of the file. The "w+"
mode
creates a new file, or truncates it to zero size if it already exists, and
positions the file pointers at the beginning of the file. The "a+"
mode
appends to an existing file (or creates a new one); the initial file pointer
is set at the beginning of the file for reading, and at the end of the file
for writing. All these modes also work in combination with the b
(binary file) flag.
The freopen
function is like fopen
, but reopens an
existing file object on another file. Just as in C programming, the main
purpose of this operation is to enable the user to redirect the standard
I/O streams associated with the interpreter process (available in the
interpreter by means of the INPUT
, OUTPUT
and ERROR
variables).
The fdopen
function opens a new file object on a given file
descriptor, given that the mode is compatible. Conversely, the
fileno
function returns the file descriptor of a file
object. (See also the functions for direct file descriptor manipulation
in Low-Level I/O.)
The setvbuf
function sets the buffering mode for a file
(IONBF
= no buffering, IOLBF
= line buffering,
IOFBF
= full buffering). This operation should be invoked right
after the file has been opened, before any I/O operations are
performed.
The fconv
function sets the encoding of a file. By default, Q's
built-in I/O operations as well as clib
's string I/O functions
assume the system encoding, and convert between this encoding and the
internal UTF-8 string representation as needed. If a text file uses an
encoding different from the system encoding, you can use the
fconv
function to set the desired encoding. CODESET
must
be a string denoting a valid encoding name for the iconv
function
(see also Internationalization, below). This affects all
subsequent text read/write operations on the file. (This operation only
works for Unicode-capable systems which have iconv
installed. Also note that this function is only available for Q 7.0 and
later.)
The tmpnam
and tmpfile
functions work just like the
corresponding C routines: tmpnam
returns a unique name for a temporary
file, and tmpfile
constructs a temporary file opened in "w+b"
mode, which will be deleted automatically when it is closed. See the
tmpnam
(3) and tmpfile
(3) manual pages for details.
The ftell
/fseek
functions are used for file
positioning. The ftell
function returns the current file
position, while fseek
function positions the file at the given
position. The rewind
function provides a convenient shorthand for
repositioning the file at the beginning. These operations work just like
the corresponding C functions. The WHENCE
argument of
fseek
determines how the POS
argument is to be
interpreted; it can be either SEEK_SET
(POS
is relative to
the beginning of the file, i.e., an absolute position), SEEK_CUR
(POS
is relative to the current position) or SEEK_END
(POS
is relative to the end of the file). In the latter two cases
POS
can also be negative.
Portability Notes:
fflush
or
fseek
before switching between reading and writing on a file opened
with the `+' flag.
ftell
and fseek
might only work reliably if
the file is opened in binary mode (b
flag).
The gets
/fgets
functions work like the C fgets
function, i.e., they read a line from standard input or the given file
including the trailing newline, if any. The fget
function
reads an entire file at once and returns it as a string. The
ungetc
/fungetc
functions push back a single character on
standard input or the given input file, like the C ungetc
function. The C library only guarantees that pushing back a single ASCII
character will work, so the result of pushing back multiple or multibyte
characters is implementation-dependent.
Moreover, the following additional aliases are provided for C aficionados:
public ::readc as getc, ::freadc F as fgetc; public ::writes S as puts, ::fwrites F S as fputs; public ::writec C as putc, ::fwritec F C as fputc; |
Open a new file for both reading and writing: ==> def F = fopen "test" "w+" Write a string to the file: ==> fwrites F "The little brown fox.\n" () Current position is behind written string (at end-of-file): ==> ftell F 22 Rewind (go to the beginning of the file): ==> rewind F () Read back the string we've written before: ==> fgets F "The little brown fox.\n" Check that we're again at end-of-file: ==> feof F true Output another string: ==> fwrites F "The second line.\n" () Position behind the first string: ==> fseek F 22 SEEK_SET () Reread the second string: ==> fgets F "The second line.\n" And here's how to read an entire text file at once: ==> def T = fget (fopen "clib.q" "r") To quickly compute a 32 bit checksum of the file: ==> sum (bytes (bytestr T)) mod 0x100000000 3937166 |
Finally, let's split the text into lines and add line numbers using
sprintf
(see C-Style Formatted I/O):
==> def L = split "\n" T ==> def L = map (sprintf "%3d: %s\n") (zip [1..#L] L) ==> do writes (take 5 L) 1: 2: /* clib.q: Q's system module */ 3: 4: /* This file is part of the Q programming system. 5: () |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
These functions provide an interface to the C printf
and
scanf
routines. They are all defined in clib.q
and thus
included in the prelude.
public extern printf FORMAT ARGS, fprintf F FORMAT ARGS, sprintf FORMAT ARGS; public extern scanf FORMAT, fscanf F FORMAT, sscanf S FORMAT; |
Arguments to the printf
routines and the results of the scanf
routines are generally encoded as tuples or single non-tuple values (if only
one item is read/written).
All the usual conversions and flags of C printf
/scanf
are
supported, except %p
(pointer conversion). The basic h
and
l
length modifiers are also understood, but not the fancy ISO C99
extensions like ll
or hh
, or l
modifiers on characters
and strings. Two further unsupported features of the printf
functions
are the %n
(number of written characters) conversion and explicit
argument indexing (m$
); thus all arguments have to be in the same order
as specified in the printf
format string. The %n
conversion
is implemented for the scanf
functions, though.
As these functions are simply wrappers for the corresponding C functions,
integer conversions are generally limited to values which fit into machine
integers. To handle integers of arbitrary sizes, you might treat them as
strings (%s
) in the format string and do the actual conversion manually
with val
or str
.
Seasoned C programmers will appreciate that the wrapper functions
provided here are safe in that they check their arguments and
prevent buffer overflows, so they should never crash your program with a
segfault. To these ends, if a %s
or %[...]
conversion
without maximum field width is used with scanf
, the field width
will effectively be limited to some (large) value chosen by the
implementation.
See the printf
(3) and scanf
(3) manual pages for a description of
the format string syntax. Some basic examples follow (<CR>
indicates
that you hit the carriage return key to terminate a line):
==> printf "%d\n" 99 99 () ==> printf "%d\n" (99) 99 () ==> printf "%s %s %d\n" ("foo","bar",99) foo bar 99 () ==> scanf "%d" 99<CR> 99 ==> scanf "%s %s %d" foo bar 99<CR> ("foo","bar",99) |
As indicated, multiple values are denoted as tuples, and the printf
function accepts both a single value or a one-tuple for a single
conversion. The scanf
function always returns a single, non-tuple value
if only a single conversion is specified. Zero items are represented using the
empty tuple. Note that you always have to supply the ARGS
argument of
printf
, thus you specify an empty tuple if there are no output
conversions:
==> printf "foo\n" () foo () |
The scanf
function also returns an empty tuple if no input items are
converted. For instance (as usual, using the *
flag with a scanf
conversion suppresses the corresponding input item):
==> scanf "%*s" foo<CR> () |
Note that while scanf
for most conversions skips an arbitrary
amount of leading whitespace, the trailing whitespace character at which
a conversion stops is not discarded by scanf
. You can
notice this if you invoke, e.g., readc
afterwards:
==> scanf "%s %d"; readc foo 99<CR> ("foo",99) "\n" |
If you really have to skip the trailing whitespace character, you can do this with a suppressed character conversion, e.g.:
==> scanf "%s %d%*c"; writes "input: "||reads foo 99<CR> ("foo",99) input: <reads function waiting for input here> |
The fprintf
/fscanf
functions work analogously, but are used when
writing or reading an arbitrary file instead of standard output or input. For
instance:
==> var msg = "You're not supposed to do that!" ==> fprintf ERROR "Error: %s\n" msg Error: You're not supposed to do that! () |
The sprintf
function returns the formatted text as a string instead of
writing it to a file:
==> sprintf "%s %s %d\n" ("foo","bar",99) "foo bar 99\n" |
Likewise, sscanf
takes its input from a string:
==> sscanf "foo bar 99\n" "%s %s %d" ("foo","bar",99) |
The %n
conversion is especially useful with sscanf
, since it
allows you to determine the number of characters which were actually consumed:
==> sscanf "foo bar 99 *** extra text here ***\n" "%s %s %d%n" ("foo","bar",99,10) |
You might then use the character count, e.g., to check whether the input format matched the entire string, or whether there remains some text to be processed.
Some remarks about the role of the length modifiers h
and
l
are in order. Just as with the C scanf
routines, you
need the l
modifier to read a double precision value; a simple
%f
will only read single precision number:
==> scanf "%f" 1e100<CR> inf ==> scanf "%lf" 1e100<CR> 1e+100 |
The printf
functions, however, always print double precision numbers,
so the l
modifier is not needed:
==> sprintf "%g" 1e100 "1e+100" |
For the integer conversions, the h
and l
modifiers denote short
(usually 2 byte) and long (usually 4 byte) integer values. If the modifier is
omitted, the default integer type is used (this usually is the same as
long
, but your mileage may vary).
As already indicated, the printf
and scanf
routines are limited
to machine integer sizes. Thus a scanf
integer conversion will always
return a short or long integer value, depending on the length modifier
used. If a printf
integer conversion is applied to a "big" integer
value, only the least significant bytes of the value are printed, as if the
printed number (represented in 2's complement if negative) had been cast to
the corresponding integer type in C. Thus the printed result will be
consistent with C printf
output under all circumstances. For instance:
==> def N = 0xffff70008000 // big number ==> printf "%hu %lu\n" (N,N) 32768 1879080960 () ==> printf "%hd %ld\n" (N,N) -32768 1879080960 () |
To correctly print a big integer value, you can convert it manually with Q's
built-in str
function, then print the value using a %s
conversion:
==> printf "%s\n" (str 1234567812345678) 1234567812345678 () |
Similarly, you can read a big integer value by converting it as a string, and then apply the val builtin.
==> val (scanf "%s") 1234567812345678<CR> 1234567812345678 |
Here you might use the %[...]
conversion to ensure that the number is
in proper format (the initial blank is needed here to skip any leading
whitespace):
==> val (scanf " %[0-9-]") -1234567812345678 -1234567812345678 |
On output, the integer and floating point conversions can all be used with either integer or floating point arguments; integers will be converted to floating point values and vice versa if necessary:
==> printf "An integer: %d\n" 99.9 An integer: 99 () ==> printf "A floating point value: %e\n" 99 A floating point value: 9.900000e+01 () |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
These functions provide the same functionality as their C
counterparts. They are all to be found in system.q
and thus you
have to explicitly import the system
module to use them.
public extern rename OLD NEW; // rename a file public extern unlink NAME; // delete a file public extern truncate NAME LEN; // truncate a file (U) public extern getcwd, chdir NAME; // get/set the working directory public extern mkdir NAME MODE; // create a new directory public extern rmdir NAME; // remove a directory public extern readdir NAME; // list the files in a directory public extern link OLD NEW; // create a hard link (U) public extern symlink OLD NEW; // create a symbolic link (U) public extern readlink NAME; // read a symbolic link public extern mkfifo NAME MODE; // create a named pipe (U) public extern access NAME MODE; // test access mode public extern chmod NAME MODE; // set the file mode public extern chown NAME MODE UID GID; // set file ownership (U) public extern lchown NAME MODE UID GID; // set link ownership (U) public extern utime NAME TIMES; // set the file times public extern umask N; // set/get file creation mask public extern stat NAME, lstat NAME; // file and link information |
The stat
/lstat
functions return a tuple consisting of the
commonly available fields of the C stat
struct, see
stat
(2). For your convenience, the following mnemonic functions
are provided for accessing the different components:
public st_dev STAT, st_ino STAT, st_mode STAT, st_nlink STAT, st_uid STAT, st_gid STAT, st_rdev STAT, st_size STAT, st_atime STAT, st_mtime STAT, st_ctime STAT; |
These all need the system
module, so you have to import it in the
interpreter to make the following examples work. E.g.:
==> import system |
With that out of the way, let's play around with some of these functions:
==> mkdir "tmp" 0777||chdir "tmp"||mkfifo "foo" 0666||\ rename "foo" "bar"||unlink "bar"||chdir ".."||rmdir "tmp" () |
(Create a tmp
subdirectory, change to it, create a new FIFO
special file, rename that file, delete it, change back to the original
directory, and remove the tmp
directory. All with a single
expression which realizes identity.)
Now for something more useful. We can retrieve the current umask while setting it to zero, and then reset it to the original value as follows:
==> def U = umask 0; oct; umask U || U; dec 022 |
List the files in the current directory:
==> readdir "." [".","..","Makefile","givertcap","clib.c","clib.q","Makefile.am", "Makefile.in","README-Clib","examples","Makefile.mingw"] |
Get the size of a file:
==> st_size (stat "README-Clib") 355 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
With the notable exception of the exit
function which is included
in clib
(and thus in the prelude), all the following functions
need an explicit import of the system
module.
The system
function returns the status code of the command if execution
was successful, and fails otherwise:
public extern system CMD; // exec command using the shell |
Clib
also provides the usual UNIX process creation and management
routines. Most of these really require a UNIX system; no attempt is made
to emulate operations like fork
on systems where they are not
implemented. Thus the only process operations which currently work under
Windows are system
, exec
, spawn
, _spawn
,
exit
and getpid
.
public extern fork; // fork a child process (U) public extern exec PROG ARGS; // execute program public extern spawn PROG ARGS; // execute program in child process public extern _spawn MODE PROG ARGS; // execute child with options public extern nice INC; // change nice value (U) public extern exit N; // exit process with given exit code public extern pause; // pause until a signal occurs (U) public extern raise SIG; // raise signal in current process public extern kill SIG PID; // send signal to given process (U) public extern getpid; // current process id public extern getppid; // parent's process id (U) public extern wait; // wait for any child process (U) public extern waitpid PID OPTIONS; // wait for given child process (U) |
All these operations are simply wrappers for the corresponding C library
routines. Note, however, that the kill
function takes the signal
to send as its first argument, which makes it easier to use
partial applications of the function, e.g., to iterate a kill operation
over a list of process numbers (as in `do (kill SIGTERM) PIDs').
The exec
function performs a path search like the C
execlp
/execvp
function; the parameters for the program are
given as a string list ARGS
, and as usual the first argument
should repeat the program file name. This function never returns unless
it fails. The spawn
and _spawn
operations are provided to
accommodate Windows' lack of fork
and wait
; these
functions work on both UNIX and Windows. The spawn
function works
like exec
, but runs the program in a new child process. It
returns the new process id (actually the process handle under
Windows). The _spawn
function is like spawn
, but accepts
an additional MODE
parameter which determines how the child is to
be executed, either P_WAIT
(wait for the child, return its exit
status), P_NOWAIT
(do not wait for the child, same as
spawn
), P_OVERLAY
(replace the current image with the new
process, same as exec
) and P_DETACH
(run the new process
in the background). (Note that the P_DETACH
option is ignored on
UNIX systems; the correct way to code a "daemon" on UNIX is shown in
the examples section below.)
On UNIX, the following routines are provided to interpret the status
code returned by the system
, _spawn
, wait
and
waitpid
functions:
public extern isactive STATUS; // process is active public extern isexited STATUS, exitstatus STATUS; // process has exited normally, get its exit code public extern issignaled STATUS, termsig STATUS; // process was terminated by signal, get the signal number public extern isstopped STATUS, stopsig STATUS; // process was stopped by signal, get the signal number |
For more information about the process functions, we refer the reader to the corresponding UNIX manual pages.
Operations to access the process environment are also implemented. The
getenv
function fails if the given variable is not set in the
environment; this lets you distinguish this error condition from a defined
variable with empty value. The setenv
function overwrites an existing
definition of the given variable:
public extern getenv NAME, setenv NAME VAL; // get/set environment variables |
On UNIX, the following operations provide access to process user and group information, as well as process groups and sessions. Not all operations may be implemented on all UNIX flavours. Please see the UNIX manual for a description of these functions.
/* User/group-related functions (U). */ public extern setuid UID, setgid GID; // set user/group id of process public extern seteuid UID, setegid GID; // set effective user/group id public extern setreuid RUID EUID, setregid RGID EGID; // set real and effective ids public extern getuid, geteuid; // get real/effective user id public extern getgid, getegid; // get real/effective group id public extern getlogin; // get real login name // get/set supplementary group ids of current process public extern getgroups, setgroups GIDS; /* Session-related routines (U). */ public extern getpgid PID, setpgid PID PGID; // get and set process group public extern getpgrp, setpgrp; // dito, for calling process public extern getsid PID; // get session id of process public extern setsid; // create a new session |
Invoke the system
function to execute a shell command:
==> import system ==> system "ls -l" |
You can also run the program directly with the spawn
function:
==> spawn "ls" ["ls","-l"] |
Get and set an environment variable:
==> getenv "HOME" "/home/ag" ==> getenv "MYVAR" // variable is undefined getenv "MYVAR" ==> setenv "MYVAR" "foo bar" () ==> getenv "MYVAR" "foo bar" |
Here are some examples demonstrating the use of named pipes and the
process functions on UNIX systems. The mkfifo
function allows the
creation of so-called "FIFO special files" a.k.a. named pipes,
which provide a simple inter-process communication facility.
For instance, create a named pipe as follows:
==> mkfifo "pipe" 0666 () |
You can then open the writeable end of the pipe:
==> def OUT = fopen "pipe" "w" |
Note that this call blocks until the input side of the pipe has been
opened. For this purpose, start another instance of the interpreter
(e.g., in another xterm
), and from there open the pipe for
reading:
==> def IN = fopen "pipe" "r" |
Both fopen
calls should now have finished, and you can write something
to the output end of the pipe in the first instance of the interpreter:
==> fwrites OUT "Hello, there!\n" |
Go to the other interpreter instance, and read back the string from there:
==> freads IN "Hello, there!" |
As usual, each end of the pipe is closed as soon as the corresponding file
object is no longer accessible. When you close the writeable end of the pipe
using, e.g., undef OUT
in the first instance of the interpreter, the
input side of the pipe will reach end-of-file, and thus feof IN
will
become true
. After closing the pipe also on the input side, you can
remove the FIFO special file with the unlink
function.
You can also use named pipes to set up a communication channel to child
processes created with fork
. For instance:
import system; def NAME = tmpnam; def PIPE = mkfifo NAME 0666; def MSG = "Hello there!\n"; test = printf "Parent writes: %s" MSG || fwrites (fopen NAME "w") MSG || writes "Parent waits for child ...\n" || printf "Parent: child has exited with code %d\n" wait if fork > 0; = printf "Child reads: %s\n" (freads (fopen NAME "r")) || writes "Child exiting ...\n" || exit 0 otherwise; ==> test Parent writes: Hello there! Parent waits for child ... Child reads: Hello there! Child exiting ... Parent: child has exited with code 0 () ==> unlink NAME () |
Another method for accomplishing this with anonymous pipes is discussed in Low-Level I/O.
On UNIX, it also possible to implement "daemons", i.e., processes which place themselves in the background and continue to run even when you log out. The following little script shows how to do this.
import system; /* Becoming a daemon is easy: Just fork, have the parent exit, and call setsid in the child to start a new session. The new process becomes a child of the init process and has no controlling terminal. Thus it keeps running even if you log out, until it gets killed or the system shuts down. */ daemon = setsid || main if fork = 0; = exit 0 otherwise; /* The main code of the daemon then closes file descriptors inherited by the parent and starts executing. In this example we just open a logfile and start logging messages in regular intervals. We also handle the condition that we are terminated by a signal. */ main = do close [0,1,2] || log F "daemon started" || do (trap 1) [SIGINT, SIGTERM, SIGHUP, SIGQUIT] || catch (sig F) (loop F) where F:File = fopen "log" "w"; = perror "daemon" || exit 1 otherwise; sig F (syserr SIG) = log F (sprintf "daemon stopped by signal %d" (-SIG)) || exit 0; loop F = sleep 5 || log F "daemon still alive" || loop F; log F MSG = fprintf F "%s at %s" (MSG, ctime time) || fflush F; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
These functions provide operations for direct manipulation of files on
the file descriptor level. They are all in the system
module. For
a closer description of the following operations we refer the reader to
the corresponding UNIX manual pages.
public extern open NAME FLAGS MODE; // create a new descriptor public extern close FD; // close a descriptor public extern dup FD, dup2 OLDFD NEWFD; // duplicate a descriptor public extern pipe; // create an unnamed pipe public extern fstat FD; // stat descriptor public extern fchdir FD; // change directory (U) public extern fchmod FD MODE; // change file mode (U) public extern fchown FD UID GID; // set file ownership (U) public extern ftruncate FD LEN; // truncate a file (U) public extern fsync FD, fdatasync FD; // sync the given file (U) |
The following operations can be used on both file descriptors and file
objects. They read and write binary data represented as byte strings
(see Byte Strings), providing an interface to the system's
read
/write
(2) and fread
/fwrite
(3)
functions. The bread
function returns a byte string of the given
size read from the given file. Note that the returned byte string may
actually be shorter than SIZE
bytes because, e.g., end of file
has been reached or not enough input was currently available on a
pipe. The bwrite
function returns the number of bytes actually
written which is usually the size of the byte string unless an error
occurs. The functions fail if an error occurred before anything was read
or written. It is the application's responsibility to check these error
conditions and handle them in an appropriate manner.
public extern bread FD SIZE; // read a byte string public extern bwrite FD DATA; // write a byte string |
For instance, the following function uses bread
and bwrite
to copy an input to an output file, using chunks of 8192 bytes at a
time:
fcopy F G = () if bwrite G (bread F 8192) < 8192; = fcopy F G otherwise; |
The file pointer of a descriptor can be positioned with lseek
. In
difference to fseek
, this function returns the new offset. To
determine the current position you can hence use an expression like
`lseek FD 0 SEEK_CUR'.
public extern lseek FD POS WHENCE; |
Some terminal-related routines are also provided:
public extern isatty FD; // is descriptor a terminal? |
The following are UNIX-specific:
public extern ttyname FD; // terminal associated with descriptor public extern ctermid; // name of controlling terminal public extern openpty, forkpty; // pseudo terminal operations |
The openpty
function returns a pair (MASTER, SLAVE)
of
file descriptors opened for both reading and writing on a "pseudo
terminal". MASTER
is to be used in the controlling process,
while SLAVE
can be used for the standard I/O streams in a child
process. The forkpty
function combines openpty
with
fork
and makes the slave device the controlling terminal of the
child process; it returns a pair (PID, MASTER)
, where PID
is zero in the child process and the process id of the child in the
parent, and MASTER
is the master end of the pseudo terminal to be
used by the parent. These functions are commonly used to implement
applications which drive other programs through a terminal emulation
interface.
On UNIX systems, clib
also provides access to the following
fcntl
operation (see Section 2 of the UNIX manual):
public extern fcntl FD CMD ARG; |
The ARG
parameter of fcntl
depends on the type of command
CMD
which is executed. The available command codes and other
relevant values are defined as global variables, as listed below. Flags
are bitwise disjunctions of the symbolic values listed below. (The
following are the values present on most systems. Specific
implementations may provide additional flags.)
public var const // fcntl command codes F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL, F_GETLK, F_SETLK, F_SETLKW, // lock types F_RDLCK, F_WRLCK, F_UNLCK, // file access modes and access mode bitmask O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE, // file descriptor flags FD_CLOEXEC, // status flags O_CREAT, O_EXCL, O_TRUNC, O_APPEND, O_NONBLOCK, O_NDELAY, O_NOCTTY, O_BINARY; |
The following types of commands are implemented:
fcntl FD F_DUPFD ARG // duplicate a file descriptor fcntl FD F_GETFD () // get file descriptor flags fcntl FD F_SETFD FLAGS // set file descriptor flags fcntl FD F_SETFD () // get status flags/access mode fcntl FD F_SETFD FLAGS // set status flags fcntl FD F_GETLK (TYPE,POS,LEN[,WHENCE]) // query file lock information fcntl FD F_SETLK (TYPE,POS,LEN[,WHENCE]) // set an advisory file lock fcntl FD F_SETLKW (TYPE,POS,LEN[,WHENCE]) // blocking variant of F_SETLK |
The first five commands serve to duplicate descriptors and to retrieve
and change the file descriptor and status flags. The remaining commands
are used for advisory file locking. A file lock is specified as a triple
(TYPE, POS, LEN)
or quadruple (TYPE, POS, LEN, WHENCE)
,
where TYPE
is the type of lock (F_RDLCK
, F_WRLCK
or
F_UNLCK
for read locks, write locks and unlocking, respectively),
POS
the position in the file, LEN
the number of bytes to
be locked (0 means up to the end of the file) and WHENCE
specifies how the POS
argument is to be interpreted. (This
parameter has the same meaning as for the fseek
and lseek
functions, see above. If WHENCE
is omitted, it defaults to
SEEK_SET
, i.e., absolute positions.) The value returned by the
F_GETLK
command is the lock description with TYPE
set to
F_UNLCK
if the given lock would be accepted, and the description
of a current lock blocking the lock request otherwise. (In the latter
case the return value is actually a quadruple, with the id of a process
currently owning a conflicting lock in the last component.)
Note that the standard I/O operations use buffered I/O by default which
might interfere with record locking. Therefore in applications requiring
individual record locking you should work with the low-level operations
(open
, bwrite
, etc.) instead.
The following select
function waits for a set of files to change
I/O status. Note that this operation is available on Windows as part of
the socket interface, but it only applies to sockets there.
public extern select FILES; |
The input is a tuple (IN, OUT, ERR, TIMEOUT)
consisting of three
lists of file descriptors and/or file objects to be watched, and an
optional integer or floating point value specifying a timeout in
seconds. The function returns as soon as either a member of IN
or
OUT
becomes ready for performing an I/O operation (without
blocking), or an error condition is signaled for a member of
ERR
. The returned value is a triple (IN, OUT, ERR)
with
all the members of the original lists which are now ready for I/O. If
the timeout is exceeded before any of the files has become ready, a
triple of three empty lists is returned. If no timeout is specified then
the function may block indefinitely.
These examples are mostly UNIX-specific, thus Windows users might wish to skip ahead.
The following definitions show how the fcntl
function can be used
to change a file's "non-blocking" flag. This is useful, e.g., if we
want to read from standard input or a pipe but do not want to be blocked
until input becomes available. Instead, having set the non-blocking
flag, input operations will fail immediately if there is no input to be
read right now.
/* set and clear the O_NONBLOCK flag of a file */ set_nonblock FD:Int = fcntl FD F_SETFL (FLAGS or O_NONBLOCK) where FLAGS = fcntl FD F_GETFL (); clr_nonblock FD:Int = fcntl FD F_SETFL (FLAGS and not O_NONBLOCK) where FLAGS = fcntl FD F_GETFL (); |
And here is how we can perform advisory locking on an entire file.
/* place an advisory read or write lock on an entire file (fail if error) */ rdlock FD:Int = () where () = fcntl FD F_SETLK (F_RDLCK, 0, 0); wrlock FD:Int = () where () = fcntl FD F_SETLK (F_WRLCK, 0, 0); /* remove the lock from the file */ unlock FD:Int = () where () = fcntl FD F_SETLK (F_UNLCK, 0, 0); /* predicates to check whether a read or write lock could be placed */ rdlockp FD:Int = (LOCK!0 = F_UNLCK) where LOCK:Tuple = fcntl FD F_GETLK (F_RDLCK, 0, 0); wrlockp FD:Int = (LOCK!0 = F_UNLCK) where LOCK:Tuple = fcntl FD F_GETLK (F_WRLCK, 0, 0); |
Note that to apply these functions to standard file objects you can use
the fileno
function (see Extended File Functions) as
follows:
==> rdlock (fileno F) |
The select
function accepts both files and file descriptors as
input. Here is a way to test whether input is currently available from a
file/descriptor:
avail F = not null (select ([F],[],[],0)!0); |
This is useful, in particular, if the file is actually a pipe. For instance:
==> def F = popen "sleep 5; echo done" "r" ==> avail F // no input to be read yet, wait ... false ==> avail F // ... input available now true ==> fget F "done\n" |
However, most of the time select
is used for multiplexing I/O
operations. For instance, the following loop processes input from a set
of files, one line at a time:
loop FILES = loop (proc FILES F) where ([F|_],_,_) = select (FILES,[],[]) if not null FILES; = () otherwise; proc FILES F = // done with this file, get rid of it filter (neq F) FILES if feof F; = // process a line writes (fgets F) || FILES; |
The pipe
and dup2
operations provide a quick way to
reassign input and output of a child process and connect it to
corresponding file objects in the parent. For instance, here's how we
can implement a popen2
function which works like the built-in
popen
routine, but allows to redirect both the input and
output side of a child process:
import system; /* Create two unnamed pipes, one for the parent to read and the child to write, the other one for the child to read and the parent to write. */ popen2 CMD = spawn2 CMD (P_IN, P_OUT) (C_IN, C_OUT) where (P_IN, C_OUT) = pipe, (C_IN, P_OUT) = pipe; /* Fork the child and redirect its standard input and output streams to the child's ends of the pipe. This is accomplished with dup2 SRC DEST which closes the file descriptor DEST and then makes DEST a copy of SRC. In the parent we use fdopen to open two new file objects for the parent's ends of the pipes. */ spawn2 CMD (P_IN, P_OUT) (C_IN, C_OUT) = close P_IN || close P_OUT || dup2 C_IN (fileno INPUT) || dup2 C_OUT (fileno OUTPUT) || exec "/bin/sh" ["/bin/sh", "-c", CMD] if fork = 0; = close C_IN || close C_OUT || (fdopen P_IN "r", fdopen P_OUT "w") otherwise; |
The popen2
function employs fork
and exec
to spawn
a child process which executes the given command using the shell, after
redirecting the child's input and output to two pipes. In the parent
process, the popen2
function returns a pair of files opened on
the other ends of the child's descriptors.
The following piece of Q code shows how to apply the popen2
function
defined above in order to pipe a string list into the UNIX sort
program
and construct the sorted list from the output:
mysort STRL = do (fprintf OUT "%s\n") STRL || fclose OUT || digest IN where (IN, OUT) = popen2 "sort"; digest IN = [] if feof IN; = [freads IN|digest IN] otherwise; |
Example:
==> mysort ["five","strings","to","be","sorted"] ["be","five","sorted","strings","to"] ==> wait // get exit code of child process (sort program) (15804,0) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following (UNIX-specific) operations from the system
module
provide an interface to the POSIX termios interface. Terminal attributes
are stored in a "termios" structure, represented as a 7-tuple
(IFLAG, OFLAG, CFLAG, LFLAG, ISPEED, OSPEED, CC)
. The control
character set CC
is represented as a list of character numbers,
indexed by the symbolic constants VEOF
etc. See termios
(3)
for further details.
public extern tcgetattr FD; // get terminal attributes public extern tcsetattr FD WHEN ATTR; // set terminal attributes public extern tcsendbreak FD DURATION; // send break public extern tcdrain FD; // wait until all output finished public extern tcflush FD QUEUE; // flush input or output queue public extern tcflow FD ACTION; // control input/output flow public extern tcgetpgrp FD; // get terminal process group public extern tcsetpgrp FD PGID; // set terminal process group /* Access components of the termios structure. */ public c_iflag ATTR, c_oflag ATTR, c_cflag ATTR, c_lflag ATTR, c_ispeed ATTR, c_ospeed ATTR, c_cc ATTR; |
This example shows how to use the termios
functions to read a
password from the terminal without echoing. This is an almost literal
translation of the C program described in Richard Stevens: Advanced
Programming in the UNIX Environment, Addison-Wesley, 1993,
cf. p. 350. The main difference is that we merely ignore the
SIGINT
and SIGTSTP
signals instead of blocking them (the
latter is not supported by Q's trap
builtin).
import system; getpass PROMPT = fwritec F "\n" || unprep F SAVE || PW where F:File = fopen ctermid "r+", SAVE = prep F, PW = fwrites F PROMPT || fflush F || freads F; /* prep F: ignore SIGINT and SIGTSTP and prepare the terminal */ prep F = tcsetattr (fileno F) TCSAFLUSH NATTR || (ATTR,TRAPS) where TRAPS = map (trap SIG_IGN) [SIGINT,SIGTSTP], ATTR = tcgetattr (fileno F), (IF,OF,CF,LF,IS,OS,CC) = ATTR, LF = LF and not (ECHO or ECHOE or ECHOK or ECHONL), NATTR = (IF,OF,CF,LF,IS,OS,CC); /* unprep F SAVE: revert to previous settings */ unprep F (ATTR,TRAPS) = tcsetattr (fileno F) TCSAFLUSH ATTR || zipwith trap TRAPS [SIGINT,SIGTSTP]; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following functions from system.q
provide a basic interface
to the GNU readline library. While readline doesn't really belong to the
POSIX interface, it is rather useful and the interpreter uses it anyway,
so it makes sense to also provide it as a part of the system interface.
public extern readline PROMPT, rl_line_buffer; public extern add_history LINE, stifle_history MAX; public extern read_history FNAME, write_history FNAME; |
These functions work like their C counterparts. The readline
function prompts with the given string and reads an input line,
providing editing and history facilities. Basic bash-like filename
completion is also provided. The rl_line_buffer
function returns
the text of the current input line, which is useful, e.g., if a custom
completion function (see below) needs to inspect surrounding context.
The add_history
routine adds a line to the history. The maximum
size of the history can be set with stifle_history
, which also
returns the size which was set previously. A negative MAX
value
makes the size of the history unbounded (which is also the
default). Last but not least, the history can be read from and written
to a file with the read_history
and write_history
functions.
For instance, the following commands read an input string, add it to the history, and finally save the history to a file:
==> import system ==> readline "input> " input> foo bar "foo bar" ==> add_history _ () ==> write_history "myhistory" () |
As another example, here is the definition of a little convenience
function which reads a line using readline
and enters it into the
history if it is nonempty:
get_line PROMPT = if not null LINE then add_history LINE || LINE where LINE:String = readline PROMPT; |
Readline's standard filename completion facility can be augmented with a custom completion function. This is achieved by simply setting the following global variable to the desired function:
public var RL_COMPLETION_FUNCTION; |
The completion function is invoked with two arguments, the string to be
completed and the position index of that string in the input line
(rl_line_buffer
, see above), and is expected to return a string
list with the possible completions. For instance, here is a simple
example of a function which checks a list of "command" words for
possible completions:
def RL_COMPLETION_FUNCTION = complete; complete S _ = filter (is_prefix S) ["bar","foo","gnats","gnu"]; is_prefix X Y = (X=sub Y 0 (#X-1)); |
The position index argument is useful if the completion depends on the position inside the input line. For instance, the following function only attempts completion at the beginning of the input line:
complete S 0 = filter (is_prefix S) ["bar","foo","gnats","gnu"]; |
Readline's default behaviour is to try a custom completion first, if it
is available, and to fall back to the standard filename completion
otherwise. The latter can be suppressed by ending the list of
completions with a ()
entry:
complete S 0 = filter (is_prefix S) ["bar","foo","gnats","gnu"] ++ [()]; complete _ _ = [()] otherwise; |
The text units for which readline attempts completion are the "words" of the input line. There is a `RL_WORD_BREAK_CHARS' variable which allows you to change readline's idea of what a word is, by setting the variable to a string containing the word delimiter characters. The default definition of this variable is as follows:
public var RL_WORD_BREAK_CHARS = " \t\n\"\\'`@$><=;|&{("; |
Redefining RL_WORD_BREAK_CHARS
affects both readline's cursor
movement commands and the behaviour of the completion routine. For
instance, to turn the comma into a word delimiter, you can change
RL_WORD_BREAK_CHARS
as follows:
def RL_WORD_BREAK_CHARS = RL_WORD_BREAK_CHARS++","; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The functions described here are all in the system
module. Error
codes from various system operations can be retrieved with the following
functions:
public extern errno, seterrno N; // get/set last error code public extern perror S, strerror N; // print error message |
The perror
function is commonly used to report error conditions
in system operations on the standard error file. For instance:
==> fopen "/etc/passw" "r" fopen "/etc/passw" "r" ==> perror "fopen" fopen: No such file or directory () |
If more elaborate formatting is required then you can use
strerror
on the errno
value to obtain the error message as
a string:
==> fprintf ERROR "fopen returned message `%s'\n" (strerror errno) fopen returned message `No such file or directory' () |
Note that errno
is only set when an error occurs in a system
call. You can use seterrno
to reset the errno
value before
a system operation to check whether there actually was an error while
executing the system call:
==> seterrno 0 () ==> fopen "/etc/passwd" "r" <<File>> ==> perror "fopen" fopen: Success () |
The remaining operations are used to obtain various information about
the system and its information databases. For instance, the uname
operation returns a 5-tuple containing information identifying the
operating system (this operation is generally only available on
UNIX-like systems):
public extern uname; /* Access components of uname result. */ public un_sysname UNAME, un_nodename UNAME, un_release UNAME, un_version UNAME, un_machine UNAME; |
The hostname of the system can be retrieved with the gethostname
function.
public extern gethostname; |
The password and group database can be accessed with the following
functions. Password entries are encoded as 7-tuples (NAME, PASSWD,
UID, GID, GECOS, DIR, SHELL)
, group entries as 4-tuples (NAME,
PASSWD, GID, MEMBERS)
. This information is only available on UNIX-like
systems.
public extern getpwuid UID, getpwnam NAME; // look up a password entry public extern getpwent; // list of all pw entries public extern getgrgid GID, getgrnam NAME; // look up group entry public extern getgrent; // list of all group entries /* Access components of password and group structures. */ public pw_name PW, pw_passwd PW, pw_uid PW, pw_gid PW, pw_gecos PW, pw_dir PW, pw_shell PW; public gr_name GR, gr_passwd GR, gr_gid GR, gr_members GR; |
Moreover, the crypt
function can be used to perform UNIX password
encryption (see crypt
(3) for details).
public extern crypt KEY SALT; // (U) |
The following functions can be used to query host information as well as
the network protocols and services available on your system. This
information is closely related to the socket interface described in
Sockets. For a closer description of these operations we refer the
reader to the corresponding manual pages. Note that the
gethostent
, getprotoent
and getservent
operations
are not available on Windows.
The host database: Host entries are of the form (NAME, ALIASES,
ADDR_TYPE, ADDR_LIST)
, where NAME
denotes the official hostname,
ALIASES
its alternative names, ADDR_TYPE
the address
family and ADDR_LIST
the list of addresses.
public extern gethostbyname HOST, gethostbyaddr ADDR; public extern gethostent; // (U) public h_name HENT, h_aliases HENT, h_addr_type HENT, h_addr_list HENT; |
Note that both hostnames and IP addresses are specified as
strings. Hostnames are symbolic names such as "localhost"
, and
can also have a domain name specified, as in "www.gnu.org"
. IPv4
addresses use the well-known "numbers-and-dots" notation, like the
loopback address "127.0.0.1"
. IPv6 addresses are usually written
as eight 16-bit hexadecimal numbers that are separated by colons; two
colons are used to abbreviate strings of consecutive zeros. For example,
the IPv6 loopback address "0:0:0:0:0:0:0:1"
can be abbreviated as
"::1"
.
The protocol database: Protocol entries are of the form
(NAME, ALIASES, PROTO)
denoting official name, aliases and number
of the protocol.
public extern getprotobyname NAME; public extern getprotobynumber PROTO; public extern getprotoent; // (U) public p_name PENT, p_aliases PENT, p_proto PENT; |
The service database: Service entries are of the form (NAME,
ALIASES, PORT, PROTO)
denoting official name, aliases, port number and
protocol number of the port. The NAME
argument of
getservbyname
can also be a pair (NAME, PROTO)
to restrict
the search to services for the given protocol (given by its
name). Likewise, the PORT
argument can also be given as
(PORT, PROTO)
.
public extern getservbyname NAME; public extern getservbyport PORT; public extern getservent; // (U) public s_name PENT, s_aliases PENT, s_port PENT, s_proto PENT; |
For instance, here is some information retrieved from a typical Linux system:
==> import system ==> uname ("Linux","obelix","2.4.19-4GB","#2 Tue Mar 4 16:03:51 CET 2003","i686") ==> gethostname "obelix" ==> gethostbyname gethostname ("obelix.local",["obelix"],2,["127.0.0.2"]) ==> gethostbyaddr "::1" ("localhost",["ipv6-localhost","ipv6-loopback"],10,["::1"]) ==> getprotobyname "tcp" ("tcp",["TCP"],6) ==> getservbyname "ftp" ("ftp",[],5376,"tcp") ==> getpwuid getuid ("ag","x",500,100,"Albert Gräf","/home/ag","/bin/bash") ==> getgrgid getgid ("users","x",100,[]) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following functions from the system
module are available on
systems providing a BSD-compatible socket layer, which, besides BSD,
includes BEOS, Linux, OSX, Windows and most recent System V
flavours. Sockets provide bidirectional communication channels on the
local machine as well as across the network. Sockets are represented by
file descriptors which can be written to and read from with the
send
and recv
functions. On most systems, socket
descriptors are just ordinary file descriptors which can also be used
with low-level I/O functions and fdopen
as usual. However, on
some systems (in particular, BEOS and Windows), socket descriptors are
"special" and all socket I/O must be performed with the special socket
operations.
At creation time, a socket is described by the following attributes (see
socket
(2) for more details):
AF_LOCAL
(a.k.a. AF_UNIX
a.k.a. AF_FILE
),
AF_INET
and AF_INET6
. Not all protocol families may be
available on all systems (e.g., the Windows socket library only supports
AF_INET
).
SOCK_STREAM
,
SOCK_DGRAM
, SOCK_SEQPACKET
, SOCK_RAW
and
SOCK_RDM
. Among these, SOCK_STREAM
, SOCK_SEQPACKET
and SOCK_RDM
are connection-based. Please note that not all
socket types are supported for all protocol families, and some socket
types may be entirely missing on non-UNIX systems.
AF_LOCAL
namespace, which refers to the local filesystem, the
protocol is always 0, the default protocol. The available protocols for
the internet namespaces can be retrieved from the protocol database (see
System Information).
Before another process can connect to a socket it must also be bound to
an address. The address format depends on the address family of the
socket. For the local namespace, the address is simply a filename on
the local filesystem. For the IPv4 namespace, it is a pair (HOST,
PORT)
where HOST
denotes the host name or IP address, specified
as a string, and PORT
is a port number. For the IPv6 namespace,
it is a quadruple (HOST, PORT, FLOWINFO, SCOPEID)
. Host names and
known port numbers can be retrieved from the host and service databases
(see System Information).
The following operations are provided to create a socket, or a pair of connected sockets, for the given address family, socket type and protocol. They return the file descriptor of the socket (or a pair of file descriptors).
public extern socket FAMILY TYPE PROTO; public extern socketpair FAMILY TYPE PROTO; // (U) |
The shutdown
function terminates data transmission on a
socket. You can stop reading, writing or both, depending on whether
HOW
is SHUT_RD
, SHUT_WR
or SHUT_RDWR
. Note
that this operation does not close the socket's file descriptor; for
this purpose closesocket
is used (see below).
public extern shutdown SOCKET HOW; |
The closesocket
function closes a socket. On most systems this is
just identical to close
(see Low-Level I/O), but, as
already noted, on some systems socket descriptors are special and you
must use this function instead.
public extern closesocket SOCKET; |
The bind
function binds a socket to an address. This is also done
automatically when the socket is first used. However, if the socket has
to be found by another process you'll have to explicitly specify an
address for it. The bind
function does just that.
public extern bind SOCKET ADDR; |
The following operations are used to start listening for and accept
connection requests on a socket. These operations are used on the server
side of a connection-based socket. The argument N
of
listen
denotes the maximum number of pending connection requests
for the server. After the call to listen, the server can accept
connections from a client with the accept
function, which returns
a pair (SOCKET, ADDR)
, where SOCKET
is a new socket
connected to the client, and ADDR
is the client's address.
public extern listen SOCKET N; public extern accept SOCKET; |
The connect
function is used to initiate a connection on a
socket. This function can be used on both connection-based and
connectionless sockets. In the former case, connect
can only be
invoked once. In the latter case, it can be invoked multiple times, and
sets the remote socket for subsequent send and receive operations.
public extern connect SOCKET ADDR; |
The following routines retrieve information about a socket. The local
address of a socket and the address of the remote socket it is connected
to can be retrieved with getsockname
and
getpeername
. Socket options, specified using a protocol level
LEVEL
and an option index OPT
, can be queried and changed
with getsockopt
and setsockopt
. The option values are
encoded as byte strings (cf. Byte Strings). For a description of
the available options see getsockopt
(2).
public extern getsockname SOCKET; public extern getpeername SOCKET; public extern getsockopt SOCKET LEVEL OPT; public extern setsockopt SOCKET LEVEL OPT VAL; |
Finally, the following specialized I/O functions are used to transmit
data over a socket. All data is encoded as byte strings. The receive
operations return the received data (which may be shorter than the
requested size, if not enough data was currently available), the send
operations the number of bytes actually written. For
recvfrom
/sendto
the data is encoded as a pair (ADDR,
DATA)
which includes the source/destination address; these operations
are typically used for connectionless sockets. The FLAGS
argument
is used to specify special transmission options (see the MSG_*
constants at the beginning of clib.q
).
public extern recv SOCKET FLAGS SIZE, send SOCKET FLAGS DATA; public extern recvfrom SOCKET FLAGS SIZE, sendto SOCKET FLAGS DATA; |
The following script demonstrates how we can implement a connectionless
server in the IPv4 namespace which repeatedly accepts a request from a
client and sends back an answer. In this example the requests are
strings denoting Q expressions; the server evaluates each expression and
sends back the result as a string. The client reads input from the user,
transmits it to the server and prints the received answer. Note that the
transmitted strings are represented as byte strings, as required by the
recvfrom
and sendto
operations. The bytestr
and
bstr
functions are used to convert between character and byte
strings, see Byte Strings.
import system; def BUFSZ = 500000; // buffer size /* the server: receive messages, evaluate them as Q expressions, and send back the results */ def SERVER = ("localhost",5001); // the server address server = server_loop FD where FD:Int = socket AF_INET SOCK_DGRAM 0, () = bind FD SERVER; = perror "server" otherwise; server_loop FD = sendto FD 0 (ADDR,eval MSG) || server_loop FD where (ADDR,MSG) = recvfrom FD 0 BUFSZ; = server_loop FD otherwise; /* evaluate an expression encoded as a byte string, catch syntax errors and exceptions, convert result back to a byte string */ eval MSG = catch exception (bytestr (str VAL)) where 'VAL = valq (bstr MSG); = bytestr ">>> SYNTAX ERROR" otherwise; exception _ = bytestr ">>> ABORTED"; /* the client: read input from user, send it to the server, print returned result */ def CLIENT = ("localhost",5002); // the client address client = client_loop FD where FD:Int = socket AF_INET SOCK_DGRAM 0, () = bind FD CLIENT; = perror "client" otherwise; client_loop FD = sendto FD 0 (SERVER,bytestr MSG) || printf "%s\n" (bstr (recv FD 0 BUFSZ)) || client_loop FD if not null MSG where MSG:String = writes "\nclient> " || flush || reads; = () otherwise; |
For instance, we can invoke the server in a secondary thread and then execute the client as follows:
==> def S = thread server ==> client client> prd [1..50] 30414093201713378043612608166064768844377641568960512000000000000 client> 1+) >>> SYNTAX ERROR client> quit >>> ABORTED client> |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
On systems where the POSIX threads library or some compatible
replacement is available (this includes Windows and most modern UNIXes),
clib
provides functions for handling multiple threads of
control. Threads, a.k.a. "light-weighted processes", allow you
to realize "multithreaded scripts" consisting of different tasks which
together perform some computation in a distributed manner. All tasks are
executed concurrently. Thus you can, e.g., perform some lengthy
calculation in a background task while you go on evaluating other
expressions in the interpreter's command loop. You can also have tasks
communicate via mutexes, conditions and semaphores.
The operations described in this section (which are all contained in
clib
and thus included in the prelude) are in close
correspondence with POSIX 1003.1b. However, some operations are named
differently, and semaphores provide the extra functionality of sending
data from one thread to another. Mutexes are also supported, mostly for
the purpose of handling critical sections involving operations with
side-effects (I/O etc.). Mutexes are not required to make
conditions work since these have their own internal mutex handling. For
more information on POSIX threads, please refer to the corresponding
section in the UNIX manual.
Please note that these functions will only work as advertised if the interpreter has been built with POSIX thread support. Moreover, in the current implementation the interpreter effectively serializes multithreaded scripts on the reduction level and thus user-level threads cannot really take advantage of multi-processor machines.
12.13.1 Thread Creation and Management | ||
12.13.2 Realtime Scheduling | ||
12.13.3 Mutexes | ||
12.13.4 Conditions | ||
12.13.5 Semaphores | ||
12.13.6 Threads and Signals | ||
12.13.7 Thread Examples |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
threads are represented using handles (objects of type
Thread
). Note that a thread is canceled automatically as soon as
its handle is garbage collected, thus you should keep the handle around
as long as the thread is needed. For convenience, thread handles are
numbered arbitrarily, starting at 0 which denotes the main thread, and
are ordered by the thread numbers. This is handy, e.g., if you want to
use thread handles as indices in a dictionary.
public extern type Thread; // thread handle type public isthread THREAD; // check for thread objects public extern thread_no THREAD; // thread number public extern this_thread; // handle of the current thread |
The basic thread operations are listed below. The thread
function
starts evaluating its special argument in a new thread, and returns its
handle. You can wait for a thread to terminate and obtain the evaluated
result with the result
function. (If there is no result, because
the thread has been canceled, or was aborted with halt
,
quit
or a runtime error, result
fails.) Note that
halt
or quit
in a thread which is not the main thread only
terminates the current thread; however, the exit
function,
cf. Process Control, always exits from the
interpreter. You can also terminate the current thread immediately and
return a given value as its result with the return
function; in
the main thread, this function is equivalent to halt
and the
return value is ignored. Moreover, all threads except the main thread
can also be canceled from any other thread using the cancel
function. Finally, the yield
function allows the interpreter to
switch threads at any given point (normally the interpreter will only
switch contexts in certain builtins and when a new rule is activated).
public extern special thread X; // start new thread public extern return X; // terminate thread with result X public extern cancel THREAD; // cancel THREAD public extern result THREAD; // wait for THREAD, return result public extern yield; // allow context switch |
Clib
threads always use deferred cancellation, hence thread
cancellation requests are usually not honored immediately, but are
deferred until the thread reaches a cancellation point where it is
safe to do so. Cancellation points occur at certain C library calls
listed in the POSIX threads documentation, when a new equation is
activated in the Q interpreter, and when yield
is called.
You can also check whether a thread is still active or has been
canceled. If neither condition holds, then the thread has already been
terminated and you can obtain its result with the result
operation.
public extern active THREAD; // check if THREAD is active public extern canceled THREAD; // check if THREAD was canceled |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Threads are always created with the default "non-realtime" scheduling policy. On some systems it is also possible to increase a thread's priority and have it scheduled in "realtime". This is useful for tasks with strict timing and responsiveness requirements, such as a function performing recording or playback in a multimedia application.
The scheduling policy and the priority of a thread can be changed with
the following function which takes two extra arguments, the policy
POL
(0 = default, 1 = realtime round-robin, 2 = realtime fifo
scheduling) and the priority PRIO
(where 0 denotes the default
priority). Note that on most systems realtime scheduling will only be
granted to privileged processes.
public extern setsched THREAD POL PRIO; // set scheduling parameters |
The current scheduling policy and priority of a thread can be retrieved
with the getsched
function:
public extern getsched THREAD; // get scheduling parameters |
The actual range of priority values is system-specific. On a typical UNIX system a non-realtime thread can only have priority 0, while realtime threads always have positive priorities. Higher priority threads always take precedence over lower priority ones. All else being equal, threads using round-robin scheduling are each given their "fair" timeslice, while fifo threads are handled on a "first come first served" basis. The latter can only be interrupted by higher priority processes (and signals) and thus should be used with utmost care. In any case you must make sure that a high-priority thread does not run unsuspended for extended periods of time, otherwise it might lock up your system. Thus a realtime thread should typically spend most of its time "in limbo" where it waits for input to arrive or a condition to be signaled.
Assuming that the priority value ranges are contiguous and contain either 0 or 1, you can determine the ranges for your system with the following little script:
test POL PRIO = setsched this_thread 0 0 || true where () = setsched this_thread POL PRIO; = false otherwise; priotest POL = (hd L,last L) if not null L where L = reverse (while (test POL) pred 0) ++ while (test POL) succ 1; |
Here is a sample result obtained on Linux:
==> map priotest [0,1,2] [(0,0),(1,99),(1,99)] |
Under Windows, things are a bit different, since Windows does not really
have POSIX-compatible thread scheduling. Instead, processes have
"priority classes" while threads have "priorities" which are
interpreted in the context of the priority class of the process they
belong to. Therefore under Windows the POL
argument of the
setsched
operation is actually interpreted as a priority class
for the entire process while the PRIO
argument specifies
the priority of the individual thread. Clib
keeps track of all
setsched
calls and always lets the process have the highest
priority class specified for a thread which is still active.
To these ends, the policy values 0, 1 and 2 are mapped to the priority
classes "normal", "high" and "realtime". (The latter should be
used sparingly, if at all, in a Q script, because it makes Windows very
unresponsive.) Moreover, the setsched
function also accepts a
policy value of -1 to denote "idle time" processes. (You'll rarely
have any use for this, unless you want to write a screensaver in Q.) For
each policy, the possible priority values have a range from -3 to 3,
where -3 denotes idle time threads, 0 is the normal priority, and 3 is
the highest priority to be used for "time-critical" threads. The
remaining priority levels provide some additional amount of control over
which thread gets the bone first.
While some "special effects" can be achieved with Windows' more exotic priority values, for typical usage a policy of 0 with zero priority should be employed for ordinary threads, a policy of 1 with some (small) positive priority value for a thread with moderate realtime requirements, and a policy of 2 if time is very critical. If you follow these conventions then your script will be able to run under Windows and most UNIX systems unchanged.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
mutexes come in three flavours: fast (no error
checking), error checking (fail if the current thread already
holds a lock on the mutex) and recursive (the same thread may lock
the mutex multiple times, and the same number of unlock operations is
required to unlock the mutex again). The supported operations are
lock
(wait for the mutex to be unlocked, then lock it and return
()
), unlock
(unlock the mutex, return ()
) and
try
(lock the mutex if it is available, fail otherwise). As of Q
7.11, on systems which support this functionality, the try
operation can also be used with an argument of the form (MUTEX,
TIME)
to denote an absolute timeout value. This works analogously to
the await
operation on conditions, see Conditions, for
details.
CAVEAT: These operations are dangerous, as you can easily create deadlocks which might lock up the interpreter as well. Note that a deadlock will also prevent the involved threads from being canceled since, in accordance with the POSIX standard, the wait for a mutex lock is not a cancellation point. Thus these operations should be used with care.
Since Q has no mutable variables and the interpreter's builtins are all thread-safe, mutexes are actually used much less in Q than in other, procedural languages. They are most useful for protecting critical sections in which a sequence of operations with side-effects (such as I/O) is to be carried out in an atomic fashion. This can be done by locking a mutex at the beginning and unlocking it at the end of the sequence. Such usage is safe, provided that no operation in the sequence may block for an extended or even indefinite period of time.
public extern type Mutex; // mutex type public ismutex MUTEX; // check for mutex objects public extern mutex; // standard (fast) mutex object public extern errorchecking_mutex; // error checking mutex object public extern recursive_mutex; // recursive mutex object public extern lock MUTEX; // lock MUTEX public extern unlock MUTEX; // unlock MUTEX public extern try MUTEX; // try MUTEX, (MUTEX,TIME) for timeout |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
conditions support the following operations: signal
(wake up one thread waiting for the condition), broadcast
(wake
up all threads waiting for the condition) and await
(suspend
the current thread until the condition is sent). Each of these
operations returns ()
. The await
operation can also be
invoked with a (COND, TIME)
tuple to denote a timed wait. If the
wait times out or is interrupted by a signal then the operation
fails. Note that in accordance with the POSIX standard the TIME
value denotes an absolute time (integer or floating point value
in seconds since the "epoch", see also the description of the built-in
time
function in Miscellaneous Functions).
public extern type Condition; // condition type public iscondition COND; // check for condition objects public extern condition; // new condition object public extern signal COND; // signal COND public extern broadcast COND; // broadcast COND public extern await COND; // wait for COND, or (COND,TIME) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Clib
semaphores are in fact semaphore queues which can be
used as a a communication channel to pass values between different
threads using a FIFO discipline. In extension to the POSIX 1003.1b
semaphore operations, clib
also provides support for
bounded semaphores which are created with a positive limit on the
number of values the semaphore queue may hold at any time. When a new
value is posted to a bounded semaphore, the current thread is suspended
until the queue has room to receive a new value, if necessary.
The supported operations are post
(enqueue a value), get
(dequeue a value and return it, as soon as one is available), try
(non-blocking version of get
which fails if no value is currently
available), get_size
or #
which both return the current
queue size, and get_bound
which returns the queue size limit of a
bounded semaphore (or zero if the semaphore is unbounded). As of Q 7.11,
on systems which support this functionality, the try
operation
can also be used with an argument of the form (SEM, TIME)
to
denote an absolute timeout value. This works analogously to the
await
operation on conditions, see Conditions, for details.
Note that even with unbounded semaphores the maximum semaphore size is
actually limited by the operating system, see your local POSIX threads
documentation for details. If this implementation-specific limit is
exceeded, the attempt to post a new value raises a syserr 9
exception.
public extern type Semaphore; // semaphore type public issemaphore SEM; // check for semaphore objects public extern semaphore; // semaphore object public extern bounded_semaphore MAX; // bounded semaphore object public extern post SEM X; // enqueue a value public extern get SEM; // dequeue a value public extern try SEM; // try SEM, (SEM,TIME) for timeout public extern get_size SEM; // get the current queue size public extern get_bound SEM; // get the max queue size (0 if none) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Some remarks about the interaction between clib
threads and the
interpreter's signal handling are in order. It is a well-known fact that
threads and signals generally do not mix very well. Therefore, in the
current implementation, all signal handling is actually performed in the
interpreter's main thread. This is where you should set up your signal
handlers using the trap
and catch
builtins, as explained
in Exception Handling. If necessary, the main thread of your
script can inform other threads about received signals using the thread
synchronization functions described above.
However, the interpreter still allows you to set up traps in secondary
threads and keeps track of the configured traps separately for each
thread. The traps in secondary threads will not have any effect until
such a thread forks (see Process Control). In this case the
forking thread will be the one and only thread in the child process and
becomes the main thread. At this point, it takes on the signal handling,
and any traps which have been set up before become active. (Note that in
order to make this work reliably, you should configure the traps
before you call fork
and protect the call to fork
with an enclosing catch
. Otherwise a signal might arrive in the
time between the fork
and the next function call, causing the new
process to exit before it had a chance of setting up its own signal
handling.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As already mentioned, threads allow you to perform a lengthy calculation as a background task. You can start such a task simply as follows:
==> def TASK = thread (sum [1..1000000]) ==> // do some other work ... ==> result TASK // get the result 500000500000 ==> stats all thread #0: 0 secs, 1 reduction, 0 cells thread #1: 4.3 secs, 2000004 reductions, 2000007 cells |
As indicated, if the stats
command is invoked with the parameter
all
then it also shows the statistics of background threads once
they are finished. To release all resources associated with a thread
(and also remove it from the stats all
list) you must undefine all
variables referencing the thread handle:
==> undef TASK |
Conditions are useful when a thread has to wait for some condition before proceeding with a computation. E.g.:
==> def COND = condition, TASK = thread (await COND || writes "Got it!\n") ==> signal COND () Got it! |
Semaphores are used when expression values have to be passed from one thread to another. For instance, let us rewrite the backtracking algorithm for the N queens problem (see Exception Handling) such that it runs as a background task which returns results via a semaphore instead of printing them directly on the terminal:
def RES = semaphore; // semaphore used to transmit results queens N = thread (search N 1 1 [] || post RES ()); search N I J P = post RES P if I>N; = search N (I+1) 1 (P++[(I,J)]) || fail if safe (I,J) P; = search N I (J+1) P if J<N; = () otherwise; |
Note that we added a post RES ()
to signal that all solutions
have been found. The safe
function is as in Exception Handling. In order to limit the size of the RES
semaphore queue,
we could also use a bounded semaphore instead, e.g.:
def RES = bounded_semaphore 10; |
We can use a second thread to process the semaphore queue and print solutions as they become available:
print = thread (loop print1 RES); print1 () = return (); // no more results print1 P = write P || writes "\n" otherwise; /* iterate a function over a semaphore */ loop F SEM = F (get SEM) || loop F SEM; |
To use this program, simply start the two background tasks as follows:
==> def QUEENS = queens 8, PRINT = print |
Both threads will begin to execute immediately, and you will see the
results scroll by. Once the threads are finished, you can check the
resources used by each thread with the stats all
command:
==> stats all thread #0: 0 secs, 2 reductions, 2 cells thread #1: 1.71 secs, 700451 reductions, 943 cells thread #2: 0 secs, 461 reductions, 3 cells |
Also note that here we used again a def
statement to assign the
thread handles to corresponding variables. It is important that you keep
these variables as long as you want the threads to survive. The
interpreter automatically cancels a thread as soon as the corresponding
thread handle is garbage collected. Hence you can stop the threads at
any time by simply undefining the variables using, e.g., the
clear
command.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
These are in clib
and thus included in the prelude. Expression
references work like pointers to expression values. The reference is
initialized with the ref
function, giving the initial value as an
argument. The referenced value can then be changed with the put
function and retrieved with get
. Also, as of Q 7.8 an
"assignment" operator `:=' is provided as syntactic sugar for
put
.
public extern type Ref; // reference type public isref REF; // check for reference objects public extern ref X; // initialize a reference object public extern put REF X; // store a new value public extern get REF; // retrieve the current value public (:=) X Y @ (=); // assignment operator |
NOTE: References can point to arbitrary values, in particular they can also point to other references. However, cyclic chains of references should be avoided since in the current implementation the interpreter cannot garbage-collect such structures.
References can be used to implement mutable data structures. This is particularly useful if a function has to maintain some internal state to perform its calculations. For instance, here is a function which memoizes past function calls in a hash table so that each requested value only has to be computed once:
def HASH = ref emptyhdict; public hashed F X; hashed F X = get HASH!(F,X) if member (get HASH) (F,X); = put HASH (update (get HASH) (F,X) Y) || Y where Y = F X otherwise; |
Using the hashed
function, a hashed version of the recursive
Fibonacci function can be implemented as follows. Since no value of the
function is computed more than once, the definition works in linear time
(up to logarithmic factors for the table updates and lookups). In fact,
even an expression like map fib [0..N]
will only require a
linear number of table manipulations and other operations.
fib = hashed hfib; hfib 0 = 0; hfib 1 = 1; hfib N = fib (N-1)+fib (N-2) if N>1; |
References also provide a general means for communicating values in a multithreaded script (see POSIX Threads). In this case they are typically used together with a condition which is signaled when the value changes:
def VAL_CHANGED = condition, VAL = ref (); change X = put VAL X || broadcast VAL_CHANGED; |
A loop like the following might then be executed by any number of other threads which have to keep track of the value in question, and perform some appropriate action when the value changes:
watch = await VAL_CHANGED || action (get VAL) || watch; |
As of Q 7.1, there is a a second, lazy kind of expression references
called sentinels. Sentinels defer the evaluation of the
referenced expression until the sentinel is garbage-collected. They are
created with the sentinel
function which takes the expression to
be evaluated later as a special argument. There are no other access
operations.
public extern type Sentinel; // sentinel type public issentinel S; // check for sentinel objects public extern special sentinel X; // create a sentinel object |
Sentinels are a means to trigger automatic cleanup of an ordinary Q data structure in the same fashion as the cleanup performed by some built-in and external data types. Example:
==> def X = sentinel (puts "cleaning up...\n") ==> undef X cleaning up... |
An important application of references are mutable sequences
which can be implemented as tuples, lists or streams of references. As
of version 7.8, Q provides an auxiliary module reftypes.q
which
implements some useful convenience functions for working with these
kinds of objects. Please note that this module is still experimental,
and has to be imported explicitly right now as it is not yet included in
the prelude.
The following operations are provided to construct a reference tuple, list or stream from a tuple, list or stream of its values, respectively.
public reftuple Xs, reflist Xs, refstream Xs; |
The following functions work like the standard library functions
mktuple
, mklist
and mkstream
, but construct a
sequence of references all initialized to the given value instead.
public mkreftuple X N, mkreflist X N, mkrefstream X; |
All of these are also provided as "2D" variations to deal with matrix-like sequences:
public reftuple2 Xs, reflist2 Xs, refstream2 Xs; public mkreftuple2 X NM, mkreflist2 X NM, mkrefstream2 X; |
Note that mkreftuple2
and mkreflist2
take a pair of
dimensions (N,M)
as the second parameter to denote the desired
number of rows N
and columns M
.
The module also provides the following pseudo type-checking functions. Please note that, for efficiency, these only look at the first component to see whether it is a reference (or reference sequence).
public isreftuple Xs, isreflist Xs, isrefstream Xs; public isreftuple2 Xs, isreflist2 Xs, isrefstream2 Xs; |
Moreover, the module overloads the get
and put
functions
so that they work with an entire sequence at once. You can apply
get
to a tuple, list or stream of references to obtain the tuple,
list or stream of its values, respectively, and you can invoke put
Xs Ys
to set the references in Xs
to the corresponding values in
the sequence Ys
. This also works for nested sequences such as a
list of lists of references.
A dereferencing index operator Xs!!I
is provided as an
abbreviation for get (Xs!I)
:
public (!!) Xs I @ (!); |
The fill
function fills references with a given value, and
putmap
implements a destructive variation of the standard library
map
function:
public fill Y X, putmap F X; |
These both work with either individual references or nested sequences of
references. Note that putmap F X
applied to a reference X
updates X
with F (get X)
, whereas fill
just fills
references with a constant value. (Thus fill X
does exactly the
same as putmap (cst X)
, but is implemented directly for
efficiency.) CAVEAT: Please note that fill
takes the fill value
as the first parameter. This allows for easier currying. For
instance, you can quickly define yourself a function to zero out a
sequence as follows:
zero = fill 0; |
The following examples show most of the operations in action, taking a reference list as an example.
==> import reftypes // needs to be imported explicitly ==> def Xs = reflist [1..10] // create a new reference list ==> get (Xs!5) // get a member 6 ==> Xs!!5 // syntactic sugar for the above 6 ==> put (Xs!5) 77 // change a member () ==> Xs!5 := 77 // syntactic sugar for the above () ==> Xs!!5 77 ==> get Xs // get all members [1,2,3,4,5,77,7,8,9,10] ==> putmap (*2) Xs // destructive map () ==> get Xs [2,4,6,8,10,154,14,16,18,20] ==> fill 0 Xs // fill with given value () ==> get Xs [0,0,0,0,0,0,0,0,0,0] ==> Xs := [1..10] // update all values at once () ==> get Xs [1,2,3,4,5,6,7,8,9,10] |
Tuples and streams work in an analogous fashion. Note that reference
streams are always fully memoized so that the reference values are
properly remembered when the stream is traversed for inspection and
updates. Infinite streams work, too, since only the finite part of the
stream which is accessed in the program ever gets evaluated. You must
take care, however, that you don't apply fill
, put
or
putmap
(which are strict operations) to the entire stream, but
only to a finite subsection of the stream. For instance:
==> def Xs = refstream {1..} ==> Xs!!5 6 ==> Xs!5 := 77 () ==> Xs!!5 77 ==> def Ys = take 10 Xs // apply putmap only to this finite subsequence ==> strict $ get Ys {1,2,3,4,5,77,7,8,9,10} ==> putmap (*2) Ys () ==> strict $ get Ys {2,4,6,8,10,154,14,16,18,20} ==> Xs!!5 154 |
Note that at this point only the first 10 members of the stream have actually been evaluated, the rest of the stream is still "thunked":
==> Xs {<<Ref>>,<<Ref>>,<<Ref>>,<<Ref>>,<<Ref>>,<<Ref>>,<<Ref>>,<<Ref>>,<<Ref>>, <<Ref>>,ref 11|lazy (map ref (iterate (+1) ((+1) 11)))} |
The reftypes.q
module also contains some magic to make the
overloaded get
, put
, fill
and putmap
functions work with any "indexed" reference container type providing a
vals
, list
or list2
function which gives access to
the container's reference components as a list. Examples of suitable
container types are the Array
, Dict
and HDict
types
discussed in Standard Types. For instance (taking the Array
type, which provides a kind of vector/matrix data structure with
logarithmic lookup and modification times, as an example):
==> import array reftypes ==> def A = array $ reflist [1..10] ==> get A [1,2,3,4,5,6,7,8,9,10] ==> A!5 := 99; get A () [1,2,3,4,5,99,7,8,9,10] ==> putmap (*2) A; get A () [2,4,6,8,10,198,14,16,18,20] ==> fill 0 A; get A () [0,0,0,0,0,0,0,0,0,0] ==> A := [0.1,0.2..1.0]; get A () [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0] |
2D containers a.k.a. matrices can be used in the same fashion:
==> def A = array2 $ reflist2 [1,2,3; 4,5,6] ==> get A [[1,2,3],[4,5,6]] ==> A!(1,1) := 55 () ==> A!!(1,1) 55 ==> get A [[1,2,3],[4,55,6]] ==> fill 0 A; get A () [[0,0,0],[0,0,0]] ==> A := [0.1,0.2,0.3; 0.4,0.5,0.6]; get A () [[0.1,0.2,0.3],[0.4,0.5,0.6]] ==> putmap (*10) A; get A () [[1.0,2.0,3.0],[4.0,5.0,6.0]] ==> A!1 := [3,2,1]; get A () [[1.0,2.0,3.0],[3,2,1]] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The functions described in this section (which need to be imported from
the system
module) are used to return information about the
active timezone and the current time, and convert time values to various
formats. Also available are functions to measure cpu time and
high-resolution timers. These functions are in close correspondence with
the date and time functions of the C library.
The calendar date and time functions use two different representations for time values:
T
in the sequel) is represented as the
number of seconds elapsed since the "epoch", 00:00:00 on January 1,
1970, UTC.
TM
) is encoded as a 9-tuple
(YEAR, MONTH, DAY, HOUR, MIN, SEC, WDAY, YDAY, ISDST)
. See the
description of the tm
struct in the ctime
(3) manual page
for more information.
Three functions (tzname
, timezone
, daylight
) are
provided to return information about the current timezone and daylight
savings settings. The gmtime
and localtime
functions
convert simple time to broken-down time using UTC or the local timezone,
respectively. The functions mktime
and asctime
are used to
convert a broken-down time value into a simple time value or the
standardized string representation used by the date
(1) program,
respectively. The ctime
function combines localtime
and
asctime
. For more flexible formatting of time values, the
strftime
function can be used, which takes as its first argument
a format string as described on the strftime
(3) manual page.
public extern tzname, timezone, daylight; public extern ctime T, gmtime T, localtime T; public extern mktime TM, asctime TM, strftime FORMAT TM; |
The components of broken-down time. can be accessed with the following functions:
public tm_year TM, tm_month TM, tm_day TM, tm_hour TM, tm_min TM, tm_sec TM, tm_wday TM, tm_yday TM, tm_isdst TM; |
On the current 32 bit systems time values must be representable as 4
byte integer values, which means that the available range usually goes
from "Fri Dec 13 20:45:52 1901"
to "Tue Jan 19 03:14:07
2038"
. (This will probably be fixed before "time ends" on current
UNIX systems in January 2038, though.)
Some examples:
==> tzname; timezone; daylight ("CET","CEST") -3600 1 ==> ctime time "Mon Mar 17 04:08:29 2003\n" ==> localtime time (103,2,17,4,8,41,1,75,0) ==> strftime "Hey it's %c" _ "Hey it's Mon Mar 17 04:08:41 2003" ==> ctime (st_mtime (stat "README-Clib")) "Wed May 1 16:59:00 2002\n" |
Two additional functions are provided for measuring cpu time. Note that
clock
and times
measure cpu time in different units, given
by the constants CLOCKS_PER_SEC
and CLK_TICK
,
respectively. The times
function returns a 5-tuple (TOTAL,
UTIME, STIME, CHILD_UTIME, CHILD_STIME)
. See times
(2) for
details.
public extern clock, times; |
For instance, you can calculate the cpu time in seconds it takes to
evaluate an expression by computing the difference between the
clock
time at the end and at the beginning, divided by
CLOCKS_PER_SEC
:
==> -(clock - (prd [1..10000] || clock)) / CLOCKS_PER_SEC 0.14 |
Note that the initial value and the resolution of the clock
timer
is system-dependent. Usually the timer is initialized at process
creation time, but that is not guaranteed. Moreover, the value of
clock
typically is a machine integer which wraps around after a
finite time interval; on 32 bit systems with CLOCKS_PER_SEC =
1000000
, as recommended by the POSIX standard, this happens
approximately every 72 minutes.
On systems where the POSIX 1003.1-2001 timer extension is available (at
the time of this writing, this comprises most recent Linux and Unix
systems, but not Windows), the system
module provides a number of
operations to deal with high-resolution timers. These will be useful for
realtime applications and other programs with strict timing
requirements.
public extern nanotime ID, nanores ID; public extern nanosleep ID T, nanosleep_until ID T; |
Note that these operations differ from the builtin time
and
sleep
functions (see Miscellaneous Functions) in that they
use integer (unsigned 64 bit) time values specified in nanoseconds. The
nanotime
function returns the current time in nanosecs, the
nanores
function the resolution of the given timer (i.e., the
smallest measurable amount of time in nanosecs). The nanosleep
function sleeps for the specified amount of nanoseconds, while the
nanosleep_until
function sleeps until the given (absolute) time
arrives. In case of any internal error condition, these functions will
set errno
accordingly.
Just like the sleep
function, the nanosleep
and
nanosleep_until
functions may wake up early if a signal is
delivered to the process. In this case errno
will be set to
EINTR
, and nanosleep
will return the remaining time until
the planned wakeup, while nanosleep_until
will simply fail. If
the sleep terminates normally, nanosleep
will return zero and
nanosleep_until
()
, respectively.
For each of the functions you have to specify an integer-valued timer id. In the current implementation, the following timers are supported:
CLOCK_REALTIME
: The system clock, measured in nanosecs since the
epoch. This will be the same time as returned by the builtin time
function (up to rounding issues with the latter).
CLOCK_MONOTONIC
: Monotonic (non-decreasing) time since some
unspecified starting point. This clock cannot be reset by the user and
is thus to be preferred in realtime applications.
CLOCK_PROCESS_CPUTIME
, CLOCK_THREAD_CPUTIME
: Highres
per-process and per-thread CPU timers. (Normally these are for timing
purposes only and might be unreliable on SMP systems, see
clock_gettime
(3) for details.)
Only CLOCK_REALTIME
is guaranteed to exist (on systems which
implement the highres timers at all, that is). If a given clock is
unavailable on the host system, the corresponding id will be ()
.
(If CLOCK_REALTIME
is ()
then the highres timers are not
supported at all.)
Moreover, if CPU timers are available, then the following
process_cpu_clockid
and thread_cpu_clockid
functions can
be used to return the clock id for the given process id (0 denotes the
current process) and the given thread, respectively:
public extern process_cpu_clockid PID, thread_cpu_clockid THREAD; |
CAVEAT: While the highres timers nominally support nanoseconds
resolution, the actual resolutions depend on your system setup and will
typically be much coarser (for the system and monotonic clocks, they are
usually in the milliseconds range on current PCs). Also, be aware that
there are system latencies which might cause calls to nanosleep
and nanosleep_until
to wake up late. You can use realtime
scheduling priorities to mitigate these effects to some extent, see
Realtime Scheduling, but some sources of latency will still
remain.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As of Q 7.0, the following functions give access to the locale and
internationalization functions provided by the C library on most modern
operating systems, which let you deal with different character sets and
language-specific conventions. If you have iconv
and
gettext
on your system, interfaces to these functions will be
provided as well. As of Q 7.8, all of these need to be imported from the
system
module.
public extern setlocale CATEGORY LOCALE; public extern localeconv; public extern nl_langinfo ITEM; |
The setlocale
function sets the current locale of the
program. This function is usually invoked near the beginning of an
application to initialize locale-dependent processing. (The Q
interpreter also initializes the default locale at startup time, so you
only have to invoke this function yourself if you want to set a locale
which is different from the user's default locale, or if you have to
change the locale during execution of the program.) The CATEGORY
parameter determines which part of the locale is to be modified, and can
be any of the LC_XXX
constants defined in the manifest constants
section at the beginning of this module. The LOCALE
string must
be a valid locale value for the given category, or ""
to specify
that the locale is to be set from the corresponding user
defaults. LOCALE
can also be ()
if you only want to query
the current locale. In any case the current locale value for the given
category is then returned as a string. For a description of the
different locale categories please refer to the setlocale
(3)
manpage.
The localeconv
function returns various numeric formatting
information about the current locale. The result is returned as a
tuple. Please refer to clib.q
and the localeconv
(3)
manpage for details about this operation. You'll rarely have to use the
information provided by this function directly, as the strfmon
function (see below) allows you to format numbers and monetary values in
a locale-dependent way.
On SUSv2-compatible Unix systems, additional information about the
current locale is available using the nl_langinfo
function. The
ITEM
parameter is one of the symbolic constants defined in the
manifest constants section at the beginning of clib.q
(see
nl_langinfo
(3) for a description of these). The result of
nl_langinfo
is always a string.
The following functions help with the formatting of various locale-dependent items:
public extern strfmon FORMAT ARGS; public extern strcoll S1 S2, strxfrm S; public extern wcswidth S; public wcwidth C; |
The strfmon
function provides locale-dependent formatting of
numbers and monetary values. See strfmon
(3) for a description of
the syntax of the FORMAT
string. ARGS
is either a
singleton value or a tuple of floating point values, as required for the
given format string. Integers in the ARGS
parameter are
automatically converted to floating point values.
The strcoll
function compares two strings in a locale-dependent
way; it returns <0, 0, >0 iff S1
is, respectively, less than,
equal to, or greater than S2
. The strxfrm
function
converts the given string into a form which can be used for locale-based
comparisons; by comparing the transformed strings you always get the
same results as by checking the result of strcoll
.
The wcswidth
function returns the actual number of columns
required to print a string, which may be different from the length of
the string if the string contains non-ASCII Unicode characters. If any
of the characters in the given string is non-printable, -1 is
returned. The wcwidth
function does the same for single
characters.
public extern type IConv; public extern iconv_open TO FROM, iconv_close IC, iconv IC B; |
The IConv
type represents conversion state objects for the
iconv
function. Objects of this type are created with
iconv_open
which takes two strings describing the desired target
and source encodings as arguments, and destroyed with the
iconv_close
function (this is also done automatically when an
IConv
object is garbage-collected).
The iconv
function does the actual conversion. It takes as
arguments the current conversion state IC
(an IConv
object) and the byte string B
to be converted, and returns the
converted byte string. The second argument can also be ()
in
which case iconv resets the conversion to the initial state and returns
the "shift" sequence necessary to return to the initial state, if any
(this will always be the empty byte string in the case of a stateless
encoding).
If anything goes wrong during the conversion, instead of a single byte
string iconv
returns a pair of byte strings (C,D)
where
C
is the successfully converted initial part of the sequence and
D
is the remaining data, starting from the position in the source
buffer B
where the error occurred. It is then the responsibility
of the application to check the errno
value in order to determine
what exactly went wrong, see iconv
(3) for details.
Please note that the supported combinations of source and target
encodings depend on the system and particular iconv
implementation. Using GNU iconv
, you can obtain the list of all
supported encodings with the `iconv --list' command.
public extern textdomain DOMAIN, bindtextdomain DOMAIN DIR; public extern gettext MSGID, dgettext DOMAIN MSGID, dcgettext DOMAIN MSGID CATEGORY; public extern ngettext MSGID1 MSGID2 N, dngettext DOMAIN MSGID1 MSGID2 N, dcngettext DOMAIN MSGID1 MSGID2 N CATEGORY; |
These are simple wrappers for the corresponding C functions from the GNU
gettext
library.
The textdomain
function selects the default domain to be used
with the gettext
/ngettext
functions. The
bindtextdomain
function determines the base directory where the
message files for the given domain are to be found. Both functions
return the current setting, i.e., the current domain for
textdomain
, and the current directory for the given domain for
bindtextdomain
. To only query the current setting without
changing it, use ()
as the DOMAIN
argument to textdomain
and the DIR
argument to bindtextdomain
.
The gettext
function retrieves a message from a message catalog
(determined with the textdomain
and bindtextdomain
functions) and returns the translated string. The message to be
translated is specified as a string MSGID
. If no translation can
be found then gettext
returns MSGID
unchanged.
The other variations of gettext
allow you to explicitly specify a
domain (usually the application name, which defaults to the value set
with the textdomain
function) and a locale category
(LC_MESSAGES
by default). The ngettext
functions are
similar, but provide a way to translate plural forms. See the
gettext
(3) and ngettext
(3) manual pages for details.
Note that the GNU gettext
family of functions uses message
catalogs in a binary format, which can be constructed from a readable
description using the msgfmt
(1) utility. Currently no special
support is provided for the automatic extraction of messages from Q
scripts using tools like xgettext
(1). This will likely be fixed
in future releases, but for the time being you will have to prepare the
msgfmt
input files manually, or use a custom script for that
purpose.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Matching filenames against patterns with shell wildcards (*
, ?
etc.), also known as filename "globbing", is implemented by the following
functions from clib.q
:
public extern fnmatch PATTERN S; // check whether S matches PATTERN public extern glob PATTERN; // list of all filenames matching PATTERN |
==> map (fnmatch "*.q") ["clib.q","clib.c"] [true,false] ==> map (fnmatch "*.[cq]") ["clib.q","clib.c"] [true,true] ==> glob "*.[qc]" ["clib.c","clib.q","factor.q","globexamp.q","regexamp.q"] ==> glob "/h*" ["/home"] |
Only a single pattern is accepted by both fnmatch
and
glob
. However, it is a simple matter to provide your own rules which
extend the clib
operations to lists of patterns:
fnmatch [] S:String = false; fnmatch [P:String|Ps] S:String = fnmatch P S or else fnmatch Ps S; glob Ps:List = rmdups (sort (<) (cat (map glob Ps))); rmdups [] = []; rmdups [X,X|Xs] = rmdups [X|Xs]; rmdups [X|Xs] = [X|rmdups Xs] otherwise; |
Note that in our extension of the definition of glob
we employ
the sort
function (see C Replacements for Common Standard Library Functions) to sort the resulting list, and then remove adjacent
duplicates (which could occur because a file may match more than one
pattern).
Now let's see these definitions in action:
==> map (fnmatch ["*.q","*.c"]) ["clib.q","clib.c"] [true,true] ==> glob ["*.q","*.c","clib.*"] ["clib.c","clib.q","clib.so","factor.q","globexamp.q","regexamp.q"] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The clib
module implements regular expression matching using the
POSIX regcomp
and regexec
functions. Regular expressions
are generally specified using the extended (egrep
-like)
syntax. Clib
divides the matching functions into two interfaces,
the high-level interface (regex
function) and the
low-level interface (regmatch
and friends). Both interfaces
are used with a third group of match state functions (reg
and friends) which provide access to information about the current
match.
For most purposes, the high-level regex
function should be all that is
needed. However, the low-level functions are provided so that you can create
your own specialized regular expression engines, should you ever need to do
so. In this case you might wish to take a look at the definition of the
regex
function in clib.q
and modify it to suit your needs.
Note that in accordance with POSIX matching semantics, all matching functions
search for an occurrence of the pattern in the target string, rather than
simply checking whether the whole string matches the given pattern. If the
latter functionality is required, you can use ^
and $
to tie the
match to the beginning and the end of the string.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following function is implemented as a special form, with the
EXPR
argument being passed unevaluated.
public special regex ~OPTS ~REGEX ~S EXPR; |
The regex
function evaluates, for each match of the given regular
expression (also called the pattern) in the given string, the
special EXPR
argument, and returns the collection of all results
as a list, in the order in which the matches were found. If no match is
found, the result list will be empty. The EXPR
argument typically
uses the match state functions (see Match State Information) to
retrieve the current match information. You can also invoke the
regdone
function (see Low-Level Interface) to escape from
the search at any time.
Matching generally proceeds from left to right, and if several different
substrings match at a given position, the longest match is preferred. The
matching process is controlled by means of the OPTS
string argument,
which may contain zero or more of the following option characters (the
corresponding flags of the POSIX regcomp
/regexec
functions are
given in parentheses, where applicable):
g
Match globally, i.e., find all non-overlapping occurrences. Otherwise only the first match is reported (if any).
G
Like g
, but report overlapping matches as well.
i
Do case-insensitive matches (REG_ICASE
).
n
Do multi-line matches. This makes ^
and $
match line beginnings
and ends (besides beginning and end of the string), and makes .
and
lists [...]
not match the newline character, unless it is
explicitly included in a list (REG_NEWLINE
).
^
Do not match ^
at the beginning of the string
(REG_NOTBOL
).
$
Do not match $
at the end of the string (REG_NOTEOL
).
Abnormal error conditions such as bad regular expression syntax are handled by
returning an expression of the form regerr MSG
where MSG
is a
string describing the error. You may give an appropriate definition of
regerr
, or check for literal regerr MSG
values, to handle such
error conditions in any way you like.
The regex
function is clib
's primary interface to perform
regular expression matching, and should cover most usual
applications. You can find a number of examples showing how to use this
function below (see Basic Examples). The regex
function
itself is implemented in Q using the low-level functions discussed in
the following section. You might wish to take a look at these if you
need to do something special which cannot be done with the regex
function (or cannot be done in an efficient manner).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following functions are implemented in C using the POSIX regex functions.
public extern regmatch OPTS REGEX S, regnext, regdone; |
The regmatch
function searches for the first (i.e., leftmost)
occurrence of the regular expression pattern in the given string, and
the parameterless regnext
function searches for the next
occurrence. Both functions normally return a truth value indicating
success or failure (but see the remarks concerning error handling
below). In case of success, you can retrieve the current match using the
match state functions (see Match State Information).
IMPORTANT: These functions have side-effects; they change the
internal match state which is accessed using the match state functions
(see Match State Information). This hidden state is provided for your
convenience, to spare you the trouble of having to pass explicit parameters
between the matching operations and the match state functions. All match state
information is maintained on an internal stack, so that you can start a new
search with the regmatch
function while another one is still in
progress. (This also implies that you can recursively invoke regex
inside the EXPR
argument of a regex
call.)
The OPTS
argument has the same meaning as for the regex
function, and error handling also works in the same fashion. That is, in case
of an abnormal error condition the regmatch
/regnext
functions
return a regerr MSG
expression instead of a truth
value. See High-Level Interface.
Note that the regnext
function will always return false
, unless
one of the g
/G
(global search) options is specified in the
preceding regmatch
call.
To terminate a (global) search which is still in progress (i.e., has not
failed yet), you can call the regdone
function, which always returns
()
. Subsequent invocations of the match state functions will then
behave just like after a failing regmatch
/regnext
call. (After a
failed match, regdone
is not needed; it will be invoked automatically.)
Some care is needed to ensure proper operation of the low-level routines with
multiple nested searches. As a general rule, each search successfully started
with regmatch
must be terminated either with a failing
regnext
call, or by an invocation of the regdone
function. In
both cases, after termination of the nested search, a subsequent
regnext
will continue where regmatch
/regnext
left off
when the nested search was started.
It is important to note that these rules also apply if the search is
non-global. The main rationale behind this behaviour is that, even if
one is only interested in the first match, one must still be able to obtain
information about any trailing unmatched text using the
regstart
/regskip
functions (see Match State Information),
and this information is only available after a match has failed (or was
aborted using regdone
). Thus the interface functions should always
behave consistently, no matter whether the g
/G
option was
specified or not. (In fact, the only difference g
/G
makes is
that, if it is omitted, then regnext
will pretend that no next match is
available even if there is one.)
If this sounds confusing to you, just stick with the high-level regex
function which takes care of all those messy details. A nested matching
example using regex
is discussed in Nested Searches.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
These functions are typically invoked after regmatch
, regnext
,
or in the EXPR
argument of regex
, to report information about
the current match.
public extern regstart, regskip, reg N, regpos N, regend N, regs; |
The reg
function returns the text of a match, regpos
its
beginning in the target string (first character position), and regend
its end (first position behind the match). These functions take an
integer argument denoting the group (a.k.a. parenthesized subexpression)
of the regular expression being matched. An argument of 0
always refers
to the whole match, N > 0
to the text matched by the N
th group
(counting opening parentheses from the left). The match reported for a given
group will always extend to the right as far as possible ("longest match"),
subject to the constraint that enclosing groups will also prefer the longest
match, and they always take priority. Thus reg 0
will extend to the
right as far as possible, reg 1
will be the longest possible match for
group 1 inside this match, etc.
If a group belongs to a repetitive pattern (patterns involving the *
,
+
operators, etc.), then it may be matched more than once. In this
case, the text last matched by the group is reported by the
reg
/regpos
/regend
functions.
If the group belongs to an optional part of the pattern (*
, ?
,
different alternatives with |
), then the group might not be matched at
all. For such unmatched groups, regpos
and regend
both return
-1
and reg
returns the empty string. These values are also
returned for all groups (including 0
) after a match has failed
or has been aborted using the regdone
function.
The regs
function returns a list with the indices of all matched groups
(except 0
).
The parts of the target string which were not matched can also be
accessed. This is done by means of the regstart
and regskip
functions, which return, respectively, the position where the last search for
a match started, and the text from this position up to the following match (or
to the end of the string if the match failed). These values can also be
accessed after a failing or aborted match (e.g., after regex
has
finished), which is useful to determine a trailing unmatched portion in the
input string.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Please refer to your UNIX manual for a description of the regular expression
syntax. (To get a start, take a look at the egrep
(1) manual page; most
of what it has to say about (extended) regular expressions should be
applicable here too. There are also some good books on the subject.)
So let's consider some elementary examples first. Here's how we can find all occurrences of "identifiers" (defined here as sequences of letters and digits, starting with a letter) in a string:
==> regex "g" "[A-Za-z][A-Za-z0-9]*" "1var foo 99 BAR $%&" (reg 0) ["var","foo","BAR"] |
The first argument to regex
is always the option string; the "g"
in our example indicates a global search, i.e., a search for all
(non-overlapping) matches of the given pattern. The second argument is the
regular expression pattern to be matched, and the third argument is the string
to be searched for matches against the pattern. The fourth argument is the
expression to be evaluated for each match; in our case, reg 0
is used
to simply retrieve the match. Actually, we could provide any appropriate
expression here, e.g., we might use sprintf
to add some fancy
formatting:
==> regex "g" "[A-Za-z][A-Za-z0-9]*" "1var foo 99 BAR $%&" \ (sprintf "MATCH: %s" (reg 0)) ["MATCH: var","MATCH: foo","MATCH: BAR"] |
What happens if the pattern contains an error, such as an unmatched bracket? Let's see:
==> regex "g" "[A-Za-z][A-Za-z0-9*" "1var foo 99 BAR $%&" (reg 0) regerr "Unmatched [ or [^" |
Reasonably enough, regex
returns an error message.
The regex
function always enumerates matches from left to right, and
only reports at most one match for each position in the input string. If more
than one substring matches at a given position, the longest match is
preferred. Thus in the above example, e.g., foo
is matched, and not the
individual characters f
and o
which by themselves also match the
given pattern.
Note that with the g
option only non-overlapping matches are reported;
if we want all occurrences (even overlapping ones) then we specify the
G
option instead:
==> regex "G" "[A-Za-z][A-Za-z0-9]*" "1var foo 99 BAR $%&" (reg 0) ["var","ar","r","foo","oo","o","BAR","AR","R"] |
And if we are only interested in the first match, we simply omit the
g
/G
option character:
==> regex "" "[A-Za-z][A-Za-z0-9]*" "1var foo 99 BAR $%&" (reg 0) ["var"] |
So far, so good. After these basics, let's reconsider our original search expression:
==> regex "g" "[A-Za-z][A-Za-z0-9]*" "1var foo 99 BAR $%&" (reg 0) ["var","foo","BAR"] |
We can simplify the pattern somewhat if we use the i
option to specify
a case-independent search:
==> regex "gi" "[a-z][a-z0-9]*" "1var foo 99 BAR $%&" (reg 0) ["var","foo","BAR"] |
Another way to write this pattern uses the predefined character classes
[:alpha:]
and [:alnum:]
; these constructs have the advantage
that they are portable, i.e., they work in different character sets and
locales.
==> regex "g" "[[:alpha:]][[:alnum:]]*" "1var foo 99 BAR $%&" \ (reg 0) ["var","foo","BAR"] |
So now we know how we can find identifiers in a string. But what if we have to
check whether a given string is an identifier? For this purpose, we
must match the string as a whole against the pattern. We can do this by
tying our pattern to the beginning and end of the string using the ^
and $
"anchors". (The g
option is not required here, since we
are only looking for a single match.)
==> regex "" "^[[:alpha:]][[:alnum:]]*$" "foo" (reg 0) ["foo"] ==> regex "" "^[[:alpha:]][[:alnum:]]*$" "1var" (reg 0) [] |
If we only want a truth value, we can simply test the size of the result list; in this case, any expression argument will do:
==> 1=#regex "" "^[[:alpha:]][[:alnum:]]*$" "foo" () true ==> 1=#regex "" "^[[:alpha:]][[:alnum:]]*$" "1var" () false |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Empty matches are permitted as well, subject to the constraint that at most one match is reported for each position (which also prevents looping). And of course an empty match will only be reported if nothing else matches. For instance:
==> regex "g" "" "foo" (regpos 0) [0,1,2,3] ==> regex "g" "o*" "foo" (regpos 0) [0,1,3] |
(The usefulness of such constructs might be questionable, but see Performing Replacements, for a reasonable example.)
As already mentioned, only non-overlapping matches are reported with the
g
option. However, occasionally it might be useful to also
determine overlapping matches, and for this purpose the G
option
is provided. For instance, suppose that we want to produce a table
showing which letters follow another given letter in a text. The first
step is to produce a list of all pairs of adjacent letters, and since we
really need all pairs here, we employ the G
option:
==> regex "G" "[[:alpha:]]{2}" "silly" (reg 0!0,reg 0!1) [("s","i"),("i","l"),("l","l"),("l","y")] |
Now it is an easy matter to collect the desired information using, e.g., a dictionary and a little bit of "list voodoo:"
==> var add = \D (X,Y).update D X (insert (D!X) Y) ==> def D = foldl add (mkdict emptyset (map fst _)) _ ==> zip (keys D) (map list (vals D)) [("i",["l"]),("l",["l","y"]),("s",["i"])] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In previous examples we saw how we can tokenize a string by matching its
constituents (see Basic Examples). Sometimes one also has to go the
other way round, i.e., split a string into arbitrary tokens separated by
"non-tokens". E.g., we might want to split a string into tokens
delimited with whitespace, which can be described by an expression like
"[ \t\n]+"
, or, in a locale-independent fashion, using
"[[:space:]]+"
. To do this with the regex
function, we
must find all text that is between matches, and any trailing text
after the last matched delimiter. The regskip
function lets us do
this as follows:
==> regex "g" "[[:space:]]+" "The little\t brown\n fox." regskip \ ++ [regskip] ["The","little","brown","fox."] |
Here the regskip
expression argument to regex
is used to access
the intervening tokens, and the final ++ [regskip]
appends the last
token.
We can use this method to define our own little regular expression tokenizing function as follows:
regsplit OPTS REGEX S = regex OPTS REGEX S regskip ++ [regskip]; |
With this definition, the above example works as follows:
==> regsplit "g" "[[:space:]]+" "The little\t brown\n fox." ["The","little","brown","fox."] |
We could also omit the g
option, in which case only the first delimiter
match is used as a splitting point:
==> regsplit "" "[[:space:]]+" "The little\t brown\n fox." ["The","little\t brown\n fox."] |
Splitting with an empty or optional pattern also works as expected:
==> regsplit "g" "" "some text" ["","s","o","m","e"," ","t","e","x","t",""] ==> regsplit "g" " ?" "some text" ["","s","o","m","e","","t","e","x","t",""] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A similar idea is also used to replace matches with new text. You can define a substitution function, which replaces each match with the result of an expression, as follows:
special regsub ~OPTS ~REGEX ~S EXPR; regsub OPTS REGEX S EXPR = strcat (regex OPTS REGEX S (regskip++EXPR)) ++ regskip; |
For instance:
==> regsub "g" "[[:alpha:]][[:alnum:]]*" "1var foo 99 BAR $%&" \ (sprintf "-*-%s-*-" (reg 0)) "1-*-var-*- -*-foo-*- 99 -*-BAR-*- $%&" |
As usual, if we only want to replace the first match, we can omit the g
option:
==> regsub "" "[[:alpha:]][[:alnum:]]*" "1var foo 99 BAR $%&" \ (sprintf "-*-%s-*-" (reg 0)) "1-*-var-*- foo 99 BAR $%&" |
And here's an example showing empty and optional patterns at work:
==> regsub "g" "" "some text" " " " s o m e t e x t " ==> regsub "g" " ?" "some text" ":" ":s:o:m:e::t:e:x:t:" |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
It is also possible to access the parts of matches, called "groups", which
are defined by parenthesized subexpressions of the regular expression. Groups
are numbered starting at 1, counting opening parentheses from the left. The
text matched by a group can be accessed using reg N
where N
is
the group number.
For instance, suppose we want to parse environment lines, such as those
returned by the shell's set
command. We assume that each line with a
definition looks like VARIABLE=VALUE
, which can be described by the
pattern "^([^=]+)=(.*)$"
, in which group 1 denotes the variable and
group 2 the value. Thus we can obtain the parts of a definition using a
regex
search like the following:
==> regex "" "^([^=]+)=(.*)$" "VARIABLE=VALUE" (reg 1,reg 2) [("VARIABLE","VALUE")] |
We can turn this into an env
function which returns the current
environment as a list of (variable,value) pairs as follows. Here, we use
the fget
function to read in the whole environment as a single
string from a pipe opened on the set
command. The environment
string is then parsed using the n
option of regex, which makes
the ^
and $
anchors match the beginning and end of each
line, respectively.
env = regex "gn" "^([^=]+)=(.*)$" envget (reg 1,reg 2); envget = fget (popen "set" "r"); |
Given these definitions, we can now get hold of the whole environment and,
e.g., turn it into a dictionary which can be accessed in a convenient fashion
using the functions from the dict.q
standard library module:
==> var envdict = dict env ==> envdict!"SHELL" "/bin/bash" |
When a pattern consists of multiple alternatives, such as
"(foo)|(bar)"
, then some of the groups might participate in a match,
while others don't. You can check whether a group matched by testing the
corresponding regpos
value. If the value is nonnegative, then the group
matched; otherwise (the value is -1
) it didn't. You can also use the
regs
function to determine the list of all groups that matched. For
instance:
==> regex "" "(foo)|(bar)" "foo" (regpos 1,regpos 2) [(0,-1)] ==> regex "" "(foo)|(bar)" "bar" regs [[2]] |
So here's a simple way to implement a lexical analyzer which distinguishes between different kinds of tokens. For example, let us tokenize identifiers and integers, skip whitespace, and return all remaining characters as literals.
def SYN = "([[:alpha:]][[:alnum:]]*)|(-?[[:digit:]]+)|([^[:space:]])"; def TOK = (none,ident,num,id); toks S = regex "g" SYN S ((TOK!(hd regs)) (reg (hd regs))); |
Note that the literal characters are mapped to the standard library id
function, the identity operation. Hence these characters are represented by
themselves, whereas other tokens are denoted by appropriate constructor terms,
ident S
and num S
in our example, which contain the actual text
of the token as arguments. Let's see this in action:
==> toks "foo -99; bar 99;" [ident "foo",num "-99",";",ident "bar",num "99",";"] |
The token sequence is now ready to be processed by a higher-level routine such as a parser, but this is another story which will be told another time …
Yet another way to employ submatches are the back references. You can
specify that a group is to be repeated by using the notation \N
, where
N
is a single digit between 1 and 9, in the regular expression
argument. For instance, regex "g" "([ab]+)\\1"
will match all
substrings which have two consecutive identical sequences of a
's and
b
's.
Back references add considerably to the matching complexity (some matches may take exponential time) and hence should be avoided. In fact, it can be shown that regular expression matching with back references is NP-complete.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Regular expression searches can also be nested, by invoking the regex
function recursively in the expression argument of another regex
call. This is useful, e.g., if regex
is used to tokenize a string, and
we want to further analyze the individual tokens. We could do this by mapping
a regex
call to the finished token list, but it may be more efficient
to perform the second search right away on each individual token. Here's an
abstract example, which finds all substrings delimited by c
's, and
counts the number of consecutive a
's and b
's in them:
==> regex "g" "[^c]+" "abbacbaaca" \ (regex "g" "a+|b+" (reg 0) (reg 0!0,#reg 0)) [[("a",1),("b",2),("a",1)],[("b",1),("a",2)],[("a",1)]] |
There's one caveat with nested searches: The current regex
implementation has the somewhat annoying but hardly avoidable limitation
that it is not possible to "return" to the match state of an
enclosing regex
call after a recursive call to
regex
. Therefore all match state information must be retrieved
before the recursive call, and must then either be processed
right away or saved in local variables for later use.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q interpreter implements basic integer arithmetic using the GMP (GNU
multiprecision) library. With the clib
module you also get some of the
more advanced GMP routines, namely integer powers and roots, and various
number-theoretic functions. (These are all included in the prelude.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
public extern pow M N, root M N, intsqrt M; |
These functions compute (exact) powers and (integer parts of) roots of
integers. The results are always integers; roots are truncated to the largest
integer below the exact root value. The pow
function computes the
N
th power of M
(N >= 0
), root
the integer part of
the N
th root of M
(M >= 0
, N > 0
), and
intsqrt
the integer part of the square root of M >= 0
.
public extern powmod K M N, invmod K M; |
The powmod
function returns the (smallest nonnegative) N
th
power of M
modulo K
, where K
must be nonzero. This
function is much more efficient than pow M N mod K
if N
is
large. The invmod
function computes the (smallest positive)
multiplicative inverse of M
modulo K
(if it exists,
i.e. if M
is nonzero and relatively prime to K
). In other
words, invmod K M
returns a number N
in the range
1..K-1
such that M*N = N*M = 1
modulo K
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
public extern isprime N; |
The isprime
function implements the Miller-Rabin algorithm for
probabilistic prime testing. If isprime N
(where N > 1
is an
integer) returns true
or false
, then N
is surely prime or
composite, respectively. If the function fails, then the number is composite
with a probability of only (1/4)^ISPRIME_REP
, where the
ISPRIME_REP
variable denotes the number of repetitions of the
probabilistic prime test.
To avoid the overhead of checking ISPRIME_REP
each time the
isprime
function is invoked, isprime
determines the value of the
variable once, at the time of its first invocation. Thus, if you set this
value, make sure to set it before you call isprime
for the first
time. Reasonable ISPRIME_REP
values vary from 5 to 10; if the variable
is not set when isprime
is first called, a default value of 5 is used,
which means that the probability of isprime
failing on a composite will
be 1/1024.
When using this function, your script should provide an appropriate default
definition which handles the case that the isprime
operation of this
module fails. There are basically three different ways to cope with a failing
isprime
call. First, you can make your program abort, e.g., using an
"error rule" like the following:
isprime N:Int = error "Prime test failed!"; |
Given that the test will probably not fail very often, this might be a viable option - you can always run your script again and hope that it finishes successfully, since the probabilistic test will choose different random parameters each time it is invoked.
Second, you can pretend that the prime test succeeded by providing the definition:
isprime N:Int = true; |
With this definition there will be a small probability that you mistake a composite for a prime, but for some algorithms this might be tolerable.
Third, you can play safe by providing your own primality test which catches those few cases when the probabilistic test fails. For instance, employing the usual trial division method:
isprime N:Int = true if N = 2; = false if N and 1 = 0; // even number = try_div 3 (intsqrt N) N otherwise; try_div L M N = true if L > M; = false if N mod L = 0; = try_div (L+2) M N otherwise; |
Of course such an exhaustive search will take a long time for big numbers, but this might be acceptable if the default rule is not invoked very often.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
public extern gcd M N, lcm M N; |
These are the usual greatest-common-divisor and least-common-multiple
functions. Both gcd
(which is only defined if at least one of the
arguments is nonzero) and lcm
(which returns zero if either
argument is zero) always return a nonnegative integer (note that older
GMP versions returned the sign of the product of N
and M
in the lcm value, but this is no longer the case).
public extern remove_factor M N; |
The remove_factor
function is used to help in factoring. It
returns a pair (K,R)
consisting of the multiplicity K
of
N
in M
(i.e., the maximum L
such that pow N
L
divides M
) and the remainder R
which has all N
factors removed, i.e., R = M div pow N K
. This function requires
that M
is nonzero and N
positive.
public extern jacobi M N; |
This function computes the Jacobi symbol M
over N
, which
can be used to find quadratic residues of an odd prime. M
can be an
arbitrary integer, but N
must be positive; the value of the function is
always 1, 0 or -1, where the value 0 indicates that gcd M N > 1
. In the
case of an odd prime N
, we have that jacobi M N = 1
iff M
is a nonzero quadratic residue modulo N
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Determine all invertible numbers modulo 9 and their corresponding inverses:
==> filter (\(X,Y).isnum Y) (zip [1..8] (map (invmod 9) [1..8])) [(1,1),(2,5),(4,7),(5,2),(7,4),(8,8)] |
Check the result:
==> map (\(X,Y).X*Y mod 9) _ [1,1,1,1,1,1] |
Find all (nonzero) quadratic residues modulo 7:
==> filter (\X.jacobi X 7 = 1) [1..6] [1,2,4] |
Decompose 858330 into prime factors (brute force):
==> def N = 858330, P = filter isprime [2..N] ==> filter (\(X,Y).Y>0) (zip P (map (fst.remove_factor N) P)) [(2,1),(3,3),(5,1),(11,1),(17,2)] |
Well, that's probably not the right way to do it. So here's a reasonable
factorization algorithm using remove_factor
. This method is a
lot faster - but still not fast enough for code breaking.
factor 0 = []; factor N:Int = factor (-N) if N<0; = [(2,K)|factor_from 3 M] where (K,M) = remove_factor N 2 if N and 1 = 0; // even number = factor_from 3 N otherwise; factor_from P N = [] if P > N; = [(P,K)|factor_from (P+2) M] where (K,M) = remove_factor N P if N mod P = 0; // P divides N (must be prime) = factor_from (P+2) N otherwise; // not a factor, try the next one |
For instance, try the following:
==> factor 807699854836875 [(3,1),(5,4),(7,2),(11,5),(13,2),(17,1),(19,1)] |
Check the result:
==> prd (map (\(X,Y).pow X Y) _) 807699854836875 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Last but not least, the clib
module also provides "built-in"
replacements for various standard library functions:
public extern stdlib::append Xs Y, stdlib::cat Xs, stdlib::mklist X N, stdlib::nums N M, stdlib::numsby K N M, stdlib::reverse Xs, tuple::tuplecat Xs; public extern string::chars S, string::join DELIM Xs, string::split DELIM Xs, string::strcat Xs; |
These functions are much more efficient, both in running time
and memory requirements, than the "standard" definitions in
stdlib.q
and string.q
, and improve the performance of
many common list and string processing tasks. In some cases, the
provided operations are even several orders of magnitude faster. In
particular, the standard definitions of the strcat
and
tuplecat
operations are very slow in comparison, because
they are implemented using repeated concatenation and hence take
quadratic running time. In contrast, all functions provided here are
guaranteed to run in linear time.
A faster sorting routine is provided as well:
public extern sort P Xs; |
This function works like the msort
and qsort
operations
provided by the sort.q
module, but is implemented using the
quicksort routine from the C library. Note that, in difference to
msort
/qsort
, the sort
function does not perform
stable sorting, so you still have to use msort
or qsort
if this feature is required.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Albert Gräf on February, 23 2008 using texi2html 1.76.