[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12. Clib

Clib is the C interface and "system" module of the Q programming language. As of Q 7.8, this component actually consists of two different modules, clib.q which provides basic C data structures and routines commonly used in most Q programs, and system.q which contains most of the POSIX system interface. Only clib.q is part of the prelude; for most system functions, you will have to explicitly import system.q in your programs. In the sections below, we will always indicate whether the system.q module is needed for the described operations.

In difference to the other standard library modules, clib and system are external modules, i.e., most functions are actually implemented in C (cf. C Language Interface). Together, clib and system provide additional string operations, extended file functions, C-style formatted I/O, low-level and binary I/O, an interface to various system functions, POSIX thread functions, expression references, time functions, internationalization support, filename globbing and regular expression matching, additional integer functions from the GMP library, and, last but not least, efficient C replacements for some common standard library list and string processing functions.

Even if you do not use the extra functionality provided by these modules, you will benefit from the replacement operations (which are in clib and thus included in the prelude), which considerably speed up basic list and string processing, sometimes by several orders of magnitude.

NOTE: Not all of the following operations are implemented on all systems. The UNIX-specific operations are marked with the symbol `(U)' in the clib.q and system.q scripts. Only a portable subset of the UNIX system interface is provided, which encompasses the most essential operations found on many recent UNIX (and other POSIX) systems, as described by the ANSI C and POSIX standards as well as the Single UNIX Specification (SUS). These operations are also available on Linux and OSX systems.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.1 Manifest Constants

Clib defines an abundance of symbolic values for use with various system functions. Most of these can be found in system.q, but a few constants related to the memory sizes of various basic C data types and the fseek and setvbuf operations can also be found in clib.q.

System constants actually vary from system to system; only the most common values are provided as global variables here. The variables are declared const (read-only) and are initialized at startup time. Flag values can be combined using bitwise logical operations as usual. A complete list of the variables can be found at the beginning of the clib.q and system.q scripts. Flag values which are unavailable on the host system will be set to zero, other undefined values to -1. Thus undefined values will generally have no effect or cause the corresponding operations to fail.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.2 Additional String Functions

These functions provide an interface to some familiar character routines from the C library. They can all be found in clib.q and are thus in the standard prelude.

Character predicates: These work exactly like the corresponding C library routines, except that they work with arbitrary Unicode, not just ASCII, characters, provided that the interpreter has been built with Unicode support.

 
public extern islower C, isupper C, isalpha C, isdigit C, isxdigit C,
  isalnum C, ispunct C, isspace C, isgraph C, isprint C, iscntrl C,
  isascii C;

String conversion: Convert a string to lower- or uppercase (like the corresponding C functions, but work on arbitrary Unicode strings, not just on single ASCII characters).

 
public extern tolower S, toupper S;

Examples

Count the number of alphanumeric characters in a text:

 
==> #filter isalnum (chars "The little brown fox.\n")
17

Convert a string to uppercase:

 
==> toupper "The little brown fox.\n"
"THE LITTLE BROWN FOX.\n"

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.3 Byte Strings

The following type represents unstructured binary data implemented as C byte vectors. This data structure is used by the low-level I/O functions and other system functions which operate on binary data. The ByteStr type itself and its operations are implemented in clib.q and thus included in the prelude.

 
public extern type ByteStr;

public isbytestr B;                     // check for byte strings

Byte strings are like ordinary character strings, but they do not have a printable representation, and they may include zero bytes. (Recall that a zero byte in a character string terminates the string.) They can be used to encode arbitrary binary data such as C vectors and structures. The bytestr function can be used to construct byte strings from integers, floating point numbers, string values or lists of unsigned byte values:

 
public extern bytestr X;                // create a byte string

The X argument denotes the data to be encoded and can be either a list of byte values (unsigned integers in the range from 0 to 255), or an atomic data object, i.e., an integer, floating point number or string constant. In the latter case, the argument can also have the form (X,SIZE) indicating the desired byte size of the object; otherwise a reasonable default size is chosen. If the specified size differs from the actual size of X, the result is zero-padded or truncated accordingly. Integer values are encoded in the host byte order, with the least significant GMP limb first; negative integers are represented in 2's complement. Floating point values are encoded using double precision by default or if the byte count is sufficient (i.e., at least 8 on most systems), and using single precision otherwise. Strings are by default encoded in the system encoding, but you can also specify the desired target encoding as (X,CODESET) (or (X,CODESET,SIZE) if you also need to specify a byte size), where CODESET is a string denoting the target encoding.

Like ordinary character strings, byte strings can be concatenated, size-measured, indexed, sliced and compared lexicographically. Moreover, a byte string can be converted back to a (multiprecision) integer, floating point number, string value, or a list of byte values. (When converting back to a string you can specify the source encoding as in bstr (B,CODESET), otherwise the system encoding is assumed.) For these purposes the following operations are provided.

 
public extern bcat Bs;                  // concatenate list of byte strings
public extern bsize B;                  // byte size of B
public extern byte I B;                 // Ith byte of B
public extern bsub B I J;               // slice of B (bytes I..J)
public extern bcmp M1 M2;               // compare M1 and M2

public extern bint B;                   // convert to unsigned integer
public extern bfloat B;                 // convert to floating point number
public extern bstr B;                   // convert to string
public bytes B;                         // convert to list
public ::list B;                        // dito

You can use the bytes function to convert a byte string to a list of byte values; the list function is overloaded to provide the same functionality. These functions are defined as follows:

 
bytes B:ByteStr                         = map (B!) [0..#B-1];
list B:ByteStr                          = bytes B;

For convenience, the common string operators and the sub function are overloaded to work on byte strings as well. Thus #B returns the size of B (the number of bytes it contains) and B!I the Ith byte of B. B1++B2 concatenates B1 and B2, sub B I J returns the slice from byte I to J, and the relational operators `=', `<', `>' etc. can be used to compare byte strings lexicographically. These operations are all implemented in terms of the functions listed above.

Byte Strings as Mutable C Vectors

As of Q 7.11, clib supports a number of additional operations which allow you to treat byte strings as mutable C vectors of signed/unsigned 8/16/32 bit integers or single/double precision floating point numbers. The following functions provide read/write access to elements and slices of such C vectors:

 
public extern get_int8 B I, get_int16 B I, get_int32 B I;
public extern get_uint8 B I, get_uint16 B I, get_uint32 B I;
public extern get_float B I, get_double B I;

public extern put_int8 B I X, put_int16 B I X, put_int32 B I X;
public extern put_uint8 B I X, put_uint16 B I X, put_uint32 B I X;
public extern put_float B I X, put_double B I X;

Note that the given index argument I is interpreted relative to the corresponding element type. Thus, e.g., get_int32 B I returns the Ith 32 bit integer rather than the integer at byte offset I. Also note that integer arguments must fit into machine integers, otherwise these operations will fail. Integers passed for floating point arguments will be coerced to floating point values automatically.

For the get_xxx functions, the index parameter may also be a pair (I,J) to return a slice of the given byte string instead of a single element (this works like sub/bsub, but interprets indices relative to the element type). The put_xxx functions also accept a byte string instead of an element as input, and will then overwrite the corresponding slice of the target byte string B with the given source byte string X. Similar to sub/bsub, these variations of get_xxx/put_xxx are "safe" in that they automatically adjust the given indices to fit within the bounds of the target byte string.

Moreover, the following convenience functions are provided to convert between byte strings and lists of integer/floating point elements.

 
public extern int8_list B, int16_list B, int32_list B;
public extern uint8_list B, uint16_list B, uint32_list B;
public extern float_list B, double_list B;

public extern int8_vect Xs, int16_vect Xs, int32_vect Xs;
public extern uint8_vect Xs, uint16_vect Xs, uint32_vect Xs;
public extern float_vect Xs, double_vect Xs;

Examples

Encode an integer as a byte string, take a look at its individual bytes, and convert the byte string back to an integer:

 
==> hex

==> def B = bytestr 0x01020304; bytes B; bint B
[0x4,0x3,0x2,0x1]
0x1020304

(Note that this result was obtained on a little-endian system, hence the least significant byte 0x04 comes first in the byte list.)

Negative integers are correctly encoded in 2's complement:

 
==> def B = bytestr (-2); bytes B; bint B
[0xfe,0xff,0xff,0xff]
0xfffffffe

To work with these binary representations you must be aware of the way GMP represents multiprecision integers. In particular, note that the default size of an integer is always a multiple (at least one) of GMP's limb size which is usually 4 or 8 bytes depending on the host system's default long integer type. The actual limb size can be determined as follows:

 
==> #bytes (bytestr 0)

In order to get integers of arbitrary sizes, an explicit SIZE argument may be used. For instance, here is how we encode small (1 or 2 byte) integers:

 
==> bytes (bytestr (0x01,1)); bytes (bytestr (0x0102,2))
[0x1]
[0x2,0x1]

The host system's byte sizes of various atomic C types can be determined with symbolic values declared at the beginning of clib.q, such as SIZEOF_CHAR, SIZEOF_SHORT, SIZEOF_LONG, SIZEOF_FLOAT and SIZEOF_DOUBLE.

Another fact worth mentioning is that even on big-endian systems, integers are always encoded with the "least significant limb" first. So, for instance, given that the limb size is 4, as in the above examples, the 2-limb integer 0x0102030405060708 consists of bytes 0x8 0x7 0x6 0x5 0x4 0x3 0x2 0x1 on a little-endian system, in that order, whereas the byte order on a big-endian system is 0x5 0x6 0x7 0x8 0x1 0x2 0x3 0x4.

Here is how we can quickly check the byte order of the host system:

 
==> hd (bytes (bytestr 1))

This expression returns 1 on a little-endian system and zero otherwise.

As long as an integer does not exceed the machine's word size (which usually matches the limb size), we can simply convert between big-endian and little-endian representation by reversing the byte list:

 
==> bytestr (reverse (bytes B))

Floating point values can be encoded either in double or single precision, depending on the SIZE argument. The default size is double precision (usually 8 bytes).

 
==> bfloat (bytestr (1/3)); bfloat (bytestr (1/3,SIZEOF_FLOAT))
0.333333333333333
0.333333343267441

The default size of the encoding of a character string is the byte size of the string in the target encoding (the system encoding by default). If an explicit size is given, the string is zero-padded or truncated if necessary. The following example will work with any system encoding based on 7 bit ASCII (like Latin1, UTF-8 or ASCII itself):

 
==> dec

==> def S1 = bytestr "ABC", S2 = bytestr ("ABC",2), S3 = bytestr ("ABC",5)

==> bytes S1; bytes S2; bytes S3
[65,66,67]
[65,66]
[65,66,67,0,0]

==> bstr S1; bstr S2; bstr S3
"ABC"
"AB"
"ABC"

By combining elements like the ones above, and including appropriate "tagging" information, more complex data structures can be represented as binary data as well. For this purpose, the byte strings of the tags and the data elements can be concatenated with bcat or the `++' operator. This is useful, in particular, for compact storage of objects in files. Moreover, some system functions involve binary data which might represent C structures and/or vectors. Such data can be assembled from the constituent parts by simply concatenating them. For instance, consider the following C struct:

 
struct { char foo[108]; short bar; int baz; };

A value of this type, say {"Hello, world.", 4711, 123456}, can then be encoded as follows:

 
==> bytestr ("Hello, world.",108) ++ bytestr (4711,SIZEOF_SHORT) ++ \
bytestr (123456,SIZEOF_INT)

Similarly, a list of integers can be converted to a corresponding C vector as follows:

 
==> bcat (map bytestr [1..100])

When encoding such C structures you must also consider alignment issues. For instance, most C compilers will align non-byte data at even addresses.

In order to facilitate the handling of C vectors of integers and floating point values, as of Q 7.11 clib offers a number of specialized operations which provide direct read/write access to elements and slices of numeric vectors, and allow you to convert between C vectors and Q lists of integer or floating point values. These operations are all implemented directly in C and will usually be much more efficient for manipulating numeric C vectors than the basic byte-oriented functions. Moreover, they allow you to modify the elements of a C vector in a direct fashion, turning byte strings into a mutable data structure.

Different operations are provided to handle vectors of signed or unsigned 8/16/32 bit (machine) integers, as well as single (32 bit) and double precision (64 bit) floating point numbers. For instance:

 
==> def B = uint32_vect [100..110]

==> uint32_list B
[100,101,102,103,104,105,106,107,108,109,110]

==> get_uint32 B 1
101

==> put_uint32 B 1 0xffffffff
()

==> uint32_list B
[100,4294967295,102,103,104,105,106,107,108,109,110]

Note that, because these C vectors are just normal byte strings, you can freely convert between different representations of the numeric data. E.g.:

 
==> take 12 $ int8_list B
[100,0,0,0,-1,-1,-1,-1,102,0,0,0]

Entire slices of byte strings can be retrieved and overwritten as well. Note that, as with sub, the indices are adjusted automatically to stay within the bounds of the target vector.

 
==> put_uint32 B (-2) (uint32_vect [90..94])
()

==> uint32_list B
[92,93,94,103,104,105,106,107,108,109,110]

==> uint32_list $ get_uint32 B (-2,3)
[92,93,94,103]

==> uint32_list $ get_uint32 B (8,100)
[108,109,110]

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.4 Extended File Functions

Clib provides the following enhanced and additional file functions:

 
public extern ::fopen NAME MODE, fdopen FD MODE, freopen NAME MODE F;
public extern fileno F;
public extern setvbuf F MODE;
public extern fconv F CODESET;
public extern tmpnam, tmpfile;
public extern ftell F, fseek F POS WHENCE;
public rewind F;
public extern gets, fgets F;
public extern fget F;
public extern ungetc C, fungetc F C;

These are all defined in clib.q and thus included in the prelude.

The fopen version of clib handles the `+' flag in mode strings, thus enabling you to open files for both reading and writing. The mode "r+" opens an existing file for both reading and writing; the initial file contents are unchanged, and both the input and output file pointers are positioned at the beginning of the file. The "w+" mode creates a new file, or truncates it to zero size if it already exists, and positions the file pointers at the beginning of the file. The "a+" mode appends to an existing file (or creates a new one); the initial file pointer is set at the beginning of the file for reading, and at the end of the file for writing. All these modes also work in combination with the b (binary file) flag.

The freopen function is like fopen, but reopens an existing file object on another file. Just as in C programming, the main purpose of this operation is to enable the user to redirect the standard I/O streams associated with the interpreter process (available in the interpreter by means of the INPUT, OUTPUT and ERROR variables).

The fdopen function opens a new file object on a given file descriptor, given that the mode is compatible. Conversely, the fileno function returns the file descriptor of a file object. (See also the functions for direct file descriptor manipulation in Low-Level I/O.)

The setvbuf function sets the buffering mode for a file (IONBF = no buffering, IOLBF = line buffering, IOFBF = full buffering). This operation should be invoked right after the file has been opened, before any I/O operations are performed.

The fconv function sets the encoding of a file. By default, Q's built-in I/O operations as well as clib's string I/O functions assume the system encoding, and convert between this encoding and the internal UTF-8 string representation as needed. If a text file uses an encoding different from the system encoding, you can use the fconv function to set the desired encoding. CODESET must be a string denoting a valid encoding name for the iconv function (see also Internationalization, below). This affects all subsequent text read/write operations on the file. (This operation only works for Unicode-capable systems which have iconv installed. Also note that this function is only available for Q 7.0 and later.)

The tmpnam and tmpfile functions work just like the corresponding C routines: tmpnam returns a unique name for a temporary file, and tmpfile constructs a temporary file opened in "w+b" mode, which will be deleted automatically when it is closed. See the tmpnam(3) and tmpfile(3) manual pages for details.

The ftell/fseek functions are used for file positioning. The ftell function returns the current file position, while fseek function positions the file at the given position. The rewind function provides a convenient shorthand for repositioning the file at the beginning. These operations work just like the corresponding C functions. The WHENCE argument of fseek determines how the POS argument is to be interpreted; it can be either SEEK_SET (POS is relative to the beginning of the file, i.e., an absolute position), SEEK_CUR (POS is relative to the current position) or SEEK_END (POS is relative to the end of the file). In the latter two cases POS can also be negative.

Portability Notes:

The gets/fgets functions work like the C fgets function, i.e., they read a line from standard input or the given file including the trailing newline, if any. The fget function reads an entire file at once and returns it as a string. The ungetc/fungetc functions push back a single character on standard input or the given input file, like the C ungetc function. The C library only guarantees that pushing back a single ASCII character will work, so the result of pushing back multiple or multibyte characters is implementation-dependent.

Moreover, the following additional aliases are provided for C aficionados:

 
public ::readc as getc, ::freadc F as fgetc;
public ::writes S as puts, ::fwrites F S as fputs;
public ::writec C as putc, ::fwritec F C as fputc;

Examples

 
Open a new file for both reading and writing:

==> def F = fopen "test" "w+"

Write a string to the file:

==> fwrites F "The little brown fox.\n"
()

Current position is behind written string (at end-of-file):

==> ftell F
22

Rewind (go to the beginning of the file):

==> rewind F
()

Read back the string we've written before:

==> fgets F
"The little brown fox.\n"

Check that we're again at end-of-file:

==> feof F
true

Output another string:

==> fwrites F "The second line.\n"
()

Position behind the first string:

==> fseek F 22 SEEK_SET
()

Reread the second string:

==> fgets F
"The second line.\n"

And here's how to read an entire text file at once:

==> def T = fget (fopen "clib.q" "r")

To quickly compute a 32 bit checksum of the file:

==> sum (bytes (bytestr T)) mod 0x100000000
3937166

Finally, let's split the text into lines and add line numbers using sprintf (see C-Style Formatted I/O):

 
==> def L = split "\n" T

==> def L = map (sprintf "%3d: %s\n") (zip [1..#L] L)

==> do writes (take 5 L)
  1:
  2: /* clib.q: Q's system module */
  3:
  4: /* This file is part of the Q programming system.
  5: 
()

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.5 C-Style Formatted I/O

These functions provide an interface to the C printf and scanf routines. They are all defined in clib.q and thus included in the prelude.

 
public extern printf FORMAT ARGS, fprintf F FORMAT ARGS,
  sprintf FORMAT ARGS;
public extern scanf FORMAT, fscanf F FORMAT, sscanf S FORMAT;

Arguments to the printf routines and the results of the scanf routines are generally encoded as tuples or single non-tuple values (if only one item is read/written).

All the usual conversions and flags of C printf/scanf are supported, except %p (pointer conversion). The basic h and l length modifiers are also understood, but not the fancy ISO C99 extensions like ll or hh, or l modifiers on characters and strings. Two further unsupported features of the printf functions are the %n (number of written characters) conversion and explicit argument indexing (m$); thus all arguments have to be in the same order as specified in the printf format string. The %n conversion is implemented for the scanf functions, though.

As these functions are simply wrappers for the corresponding C functions, integer conversions are generally limited to values which fit into machine integers. To handle integers of arbitrary sizes, you might treat them as strings (%s) in the format string and do the actual conversion manually with val or str.

Seasoned C programmers will appreciate that the wrapper functions provided here are safe in that they check their arguments and prevent buffer overflows, so they should never crash your program with a segfault. To these ends, if a %s or %[...] conversion without maximum field width is used with scanf, the field width will effectively be limited to some (large) value chosen by the implementation.

Examples

See the printf(3) and scanf(3) manual pages for a description of the format string syntax. Some basic examples follow (<CR> indicates that you hit the carriage return key to terminate a line):

 
==> printf "%d\n" 99
99
()

==> printf "%d\n" (99)
99
()

==> printf "%s %s %d\n" ("foo","bar",99)
foo bar 99
()

==> scanf "%d"
99<CR>
99

==> scanf "%s %s %d"
foo bar 99<CR>
("foo","bar",99)

As indicated, multiple values are denoted as tuples, and the printf function accepts both a single value or a one-tuple for a single conversion. The scanf function always returns a single, non-tuple value if only a single conversion is specified. Zero items are represented using the empty tuple. Note that you always have to supply the ARGS argument of printf, thus you specify an empty tuple if there are no output conversions:

 
==> printf "foo\n" ()
foo
()

The scanf function also returns an empty tuple if no input items are converted. For instance (as usual, using the * flag with a scanf conversion suppresses the corresponding input item):

 
==> scanf "%*s"
foo<CR>
()

Note that while scanf for most conversions skips an arbitrary amount of leading whitespace, the trailing whitespace character at which a conversion stops is not discarded by scanf. You can notice this if you invoke, e.g., readc afterwards:

 
==> scanf "%s %d"; readc
foo 99<CR>
("foo",99)
"\n"

If you really have to skip the trailing whitespace character, you can do this with a suppressed character conversion, e.g.:

 
==> scanf "%s %d%*c"; writes "input: "||reads
foo 99<CR>
("foo",99)
input: <reads function waiting for input here>

The fprintf/fscanf functions work analogously, but are used when writing or reading an arbitrary file instead of standard output or input. For instance:

 
==> var msg = "You're not supposed to do that!"

==> fprintf ERROR "Error: %s\n" msg
Error: You're not supposed to do that!
()

The sprintf function returns the formatted text as a string instead of writing it to a file:

 
==> sprintf "%s %s %d\n" ("foo","bar",99)
"foo bar 99\n"

Likewise, sscanf takes its input from a string:

 
==> sscanf "foo bar 99\n" "%s %s %d"
("foo","bar",99)

The %n conversion is especially useful with sscanf, since it allows you to determine the number of characters which were actually consumed:

 
==> sscanf "foo bar 99 *** extra text here ***\n" "%s %s %d%n"
("foo","bar",99,10)

You might then use the character count, e.g., to check whether the input format matched the entire string, or whether there remains some text to be processed.

Some remarks about the role of the length modifiers h and l are in order. Just as with the C scanf routines, you need the l modifier to read a double precision value; a simple %f will only read single precision number:

 
==> scanf "%f"
1e100<CR>
inf

==> scanf "%lf"
1e100<CR>
1e+100

The printf functions, however, always print double precision numbers, so the l modifier is not needed:

 
==> sprintf "%g" 1e100
"1e+100"

For the integer conversions, the h and l modifiers denote short (usually 2 byte) and long (usually 4 byte) integer values. If the modifier is omitted, the default integer type is used (this usually is the same as long, but your mileage may vary).

As already indicated, the printf and scanf routines are limited to machine integer sizes. Thus a scanf integer conversion will always return a short or long integer value, depending on the length modifier used. If a printf integer conversion is applied to a "big" integer value, only the least significant bytes of the value are printed, as if the printed number (represented in 2's complement if negative) had been cast to the corresponding integer type in C. Thus the printed result will be consistent with C printf output under all circumstances. For instance:

 
==> def N = 0xffff70008000 // big number

==> printf "%hu %lu\n" (N,N)
32768 1879080960
()

==> printf "%hd %ld\n" (N,N)
-32768 1879080960
()

To correctly print a big integer value, you can convert it manually with Q's built-in str function, then print the value using a %s conversion:

 
==> printf "%s\n" (str 1234567812345678)
1234567812345678
()

Similarly, you can read a big integer value by converting it as a string, and then apply the val builtin.

 
==> val (scanf "%s")
1234567812345678<CR>
1234567812345678

Here you might use the %[...] conversion to ensure that the number is in proper format (the initial blank is needed here to skip any leading whitespace):

 
==> val (scanf " %[0-9-]")
-1234567812345678
-1234567812345678

On output, the integer and floating point conversions can all be used with either integer or floating point arguments; integers will be converted to floating point values and vice versa if necessary:

 
==> printf "An integer: %d\n" 99.9
An integer: 99
()

==> printf "A floating point value: %e\n" 99
A floating point value: 9.900000e+01
()

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.6 File and Directory Functions

These functions provide the same functionality as their C counterparts. They are all to be found in system.q and thus you have to explicitly import the system module to use them.

 
public extern rename OLD NEW;           // rename a file
public extern unlink NAME;              // delete a file
public extern truncate NAME LEN;        // truncate a file (U)
public extern getcwd, chdir NAME;       // get/set the working directory
public extern mkdir NAME MODE;          // create a new directory
public extern rmdir NAME;               // remove a directory
public extern readdir NAME;             // list the files in a directory
public extern link OLD NEW;             // create a hard link (U)
public extern symlink OLD NEW;          // create a symbolic link (U)
public extern readlink NAME;            // read a symbolic link
public extern mkfifo NAME MODE;         // create a named pipe (U)
public extern access NAME MODE;         // test access mode
public extern chmod NAME MODE;          // set the file mode
public extern chown NAME MODE UID GID;  // set file ownership (U)
public extern lchown NAME MODE UID GID; // set link ownership (U)
public extern utime NAME TIMES;         // set the file times
public extern umask N;                  // set/get file creation mask
public extern stat NAME, lstat NAME;    // file and link information

The stat/lstat functions return a tuple consisting of the commonly available fields of the C stat struct, see stat(2). For your convenience, the following mnemonic functions are provided for accessing the different components:

 
public st_dev STAT, st_ino STAT, st_mode STAT, st_nlink STAT,
  st_uid STAT, st_gid STAT, st_rdev STAT, st_size STAT, st_atime STAT,
  st_mtime STAT, st_ctime STAT;

Examples

These all need the system module, so you have to import it in the interpreter to make the following examples work. E.g.:

 
==> import system

With that out of the way, let's play around with some of these functions:

 
==> mkdir "tmp" 0777||chdir "tmp"||mkfifo "foo" 0666||\
rename "foo" "bar"||unlink "bar"||chdir ".."||rmdir "tmp"
()

(Create a tmp subdirectory, change to it, create a new FIFO special file, rename that file, delete it, change back to the original directory, and remove the tmp directory. All with a single expression which realizes identity.)

Now for something more useful. We can retrieve the current umask while setting it to zero, and then reset it to the original value as follows:

 
==> def U = umask 0; oct; umask U || U; dec
022

List the files in the current directory:

 
==> readdir "."
[".","..","Makefile","givertcap","clib.c","clib.q","Makefile.am",
"Makefile.in","README-Clib","examples","Makefile.mingw"]

Get the size of a file:

 
==> st_size (stat "README-Clib")
355

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.7 Process Control

With the notable exception of the exit function which is included in clib (and thus in the prelude), all the following functions need an explicit import of the system module.

The system function returns the status code of the command if execution was successful, and fails otherwise:

 
public extern system CMD;               // exec command using the shell

Clib also provides the usual UNIX process creation and management routines. Most of these really require a UNIX system; no attempt is made to emulate operations like fork on systems where they are not implemented. Thus the only process operations which currently work under Windows are system, exec, spawn, _spawn, exit and getpid.

 
public extern fork;                     // fork a child process (U)
public extern exec PROG ARGS;           // execute program
public extern spawn PROG ARGS;          // execute program in child process
public extern _spawn MODE PROG ARGS;    // execute child with options
public extern nice INC;                 // change nice value (U)
public extern exit N;                   // exit process with given exit code
public extern pause;                    // pause until a signal occurs (U)
public extern raise SIG;                // raise signal in current process
public extern kill SIG PID;             // send signal to given process (U)
public extern getpid;                   // current process id
public extern getppid;                  // parent's process id (U)
public extern wait;                     // wait for any child process (U)
public extern waitpid PID OPTIONS;      // wait for given child process (U)

All these operations are simply wrappers for the corresponding C library routines. Note, however, that the kill function takes the signal to send as its first argument, which makes it easier to use partial applications of the function, e.g., to iterate a kill operation over a list of process numbers (as in `do (kill SIGTERM) PIDs').

The exec function performs a path search like the C execlp/execvp function; the parameters for the program are given as a string list ARGS, and as usual the first argument should repeat the program file name. This function never returns unless it fails. The spawn and _spawn operations are provided to accommodate Windows' lack of fork and wait; these functions work on both UNIX and Windows. The spawn function works like exec, but runs the program in a new child process. It returns the new process id (actually the process handle under Windows). The _spawn function is like spawn, but accepts an additional MODE parameter which determines how the child is to be executed, either P_WAIT (wait for the child, return its exit status), P_NOWAIT (do not wait for the child, same as spawn), P_OVERLAY (replace the current image with the new process, same as exec) and P_DETACH (run the new process in the background). (Note that the P_DETACH option is ignored on UNIX systems; the correct way to code a "daemon" on UNIX is shown in the examples section below.)

On UNIX, the following routines are provided to interpret the status code returned by the system, _spawn, wait and waitpid functions:

 
public extern isactive STATUS;
// process is active
public extern isexited STATUS, exitstatus STATUS;
// process has exited normally, get its exit code
public extern issignaled STATUS, termsig STATUS;
// process was terminated by signal, get the signal number
public extern isstopped STATUS, stopsig STATUS;
// process was stopped by signal, get the signal number

For more information about the process functions, we refer the reader to the corresponding UNIX manual pages.

Operations to access the process environment are also implemented. The getenv function fails if the given variable is not set in the environment; this lets you distinguish this error condition from a defined variable with empty value. The setenv function overwrites an existing definition of the given variable:

 
public extern getenv NAME, setenv NAME VAL;
// get/set environment variables

On UNIX, the following operations provide access to process user and group information, as well as process groups and sessions. Not all operations may be implemented on all UNIX flavours. Please see the UNIX manual for a description of these functions.

 
/* User/group-related functions (U). */

public extern setuid UID, setgid GID;   // set user/group id of process
public extern seteuid UID, setegid GID; // set effective user/group id
public extern setreuid RUID EUID, setregid RGID EGID;
                                        // set real and effective ids
public extern getuid, geteuid;          // get real/effective user id
public extern getgid, getegid;          // get real/effective group id
public extern getlogin;                 // get real login name

// get/set supplementary group ids of current process
public extern getgroups, setgroups GIDS;

/* Session-related routines (U). */

public extern getpgid PID, setpgid PID PGID;    // get and set process group
public extern getpgrp, setpgrp;                 // dito, for calling process
public extern getsid PID;                       // get session id of process
public extern setsid;                           // create a new session

Examples

Invoke the system function to execute a shell command:

 
==> import system

==> system "ls -l"

You can also run the program directly with the spawn function:

 
==> spawn "ls" ["ls","-l"]

Get and set an environment variable:

 
==> getenv "HOME"
"/home/ag"

==> getenv "MYVAR" // variable is undefined
getenv "MYVAR"

==> setenv "MYVAR" "foo bar"
()

==> getenv "MYVAR"
"foo bar"

Here are some examples demonstrating the use of named pipes and the process functions on UNIX systems. The mkfifo function allows the creation of so-called "FIFO special files" a.k.a. named pipes, which provide a simple inter-process communication facility.

For instance, create a named pipe as follows:

 
==> mkfifo "pipe" 0666
()

You can then open the writeable end of the pipe:

 
==> def OUT = fopen "pipe" "w"

Note that this call blocks until the input side of the pipe has been opened. For this purpose, start another instance of the interpreter (e.g., in another xterm), and from there open the pipe for reading:

 
==> def IN = fopen "pipe" "r"

Both fopen calls should now have finished, and you can write something to the output end of the pipe in the first instance of the interpreter:

 
==> fwrites OUT "Hello, there!\n"

Go to the other interpreter instance, and read back the string from there:

 
==> freads IN
"Hello, there!"

As usual, each end of the pipe is closed as soon as the corresponding file object is no longer accessible. When you close the writeable end of the pipe using, e.g., undef OUT in the first instance of the interpreter, the input side of the pipe will reach end-of-file, and thus feof IN will become true. After closing the pipe also on the input side, you can remove the FIFO special file with the unlink function.

You can also use named pipes to set up a communication channel to child processes created with fork. For instance:

 
import system;
def NAME = tmpnam;
def PIPE = mkfifo NAME 0666;
def MSG  = "Hello there!\n";

test     = printf "Parent writes: %s" MSG ||
           fwrites (fopen NAME "w") MSG ||
           writes "Parent waits for child ...\n" ||
           printf "Parent: child has exited with code %d\n" wait
               if fork > 0;
         = printf "Child reads: %s\n" (freads (fopen NAME "r")) ||
           writes "Child exiting ...\n" || exit 0
               otherwise;


==> test
Parent writes: Hello there!
Parent waits for child ...
Child reads: Hello there!
Child exiting ...
Parent: child has exited with code 0
()

==> unlink NAME
()

Another method for accomplishing this with anonymous pipes is discussed in Low-Level I/O.

On UNIX, it also possible to implement "daemons", i.e., processes which place themselves in the background and continue to run even when you log out. The following little script shows how to do this.

 
import system;

/* Becoming a daemon is easy: Just fork, have the parent exit, and call setsid
   in the child to start a new session. The new process becomes a child of the
   init process and has no controlling terminal. Thus it keeps running even if
   you log out, until it gets killed or the system shuts down. */

daemon          = setsid || main if fork = 0;
                = exit 0 otherwise;

/* The main code of the daemon then closes file descriptors inherited by the
   parent and starts executing. In this example we just open a logfile and
   start logging messages in regular intervals. We also handle the condition
   that we are terminated by a signal. */

main            = do close [0,1,2] || log F "daemon started" ||
                  do (trap 1) [SIGINT, SIGTERM, SIGHUP, SIGQUIT] ||
                  catch (sig F) (loop F) where F:File = fopen "log" "w";
                = perror "daemon" || exit 1 otherwise;

sig F (syserr SIG)
                = log F (sprintf "daemon stopped by signal %d" (-SIG)) ||
                  exit 0;

loop F          = sleep 5 || log F "daemon still alive" || loop F;

log F MSG       = fprintf F "%s at %s" (MSG, ctime time) || fflush F;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.8 Low-Level I/O

These functions provide operations for direct manipulation of files on the file descriptor level. They are all in the system module. For a closer description of the following operations we refer the reader to the corresponding UNIX manual pages.

 
public extern open NAME FLAGS MODE;     // create a new descriptor
public extern close FD;                 // close a descriptor
public extern dup FD, dup2 OLDFD NEWFD; // duplicate a descriptor
public extern pipe;                     // create an unnamed pipe
public extern fstat FD;                 // stat descriptor
public extern fchdir FD;                // change directory (U)
public extern fchmod FD MODE;           // change file mode (U)
public extern fchown FD UID GID;        // set file ownership (U)
public extern ftruncate FD LEN;         // truncate a file (U)
public extern fsync FD, fdatasync FD;   // sync the given file (U)

The following operations can be used on both file descriptors and file objects. They read and write binary data represented as byte strings (see Byte Strings), providing an interface to the system's read/write(2) and fread/fwrite(3) functions. The bread function returns a byte string of the given size read from the given file. Note that the returned byte string may actually be shorter than SIZE bytes because, e.g., end of file has been reached or not enough input was currently available on a pipe. The bwrite function returns the number of bytes actually written which is usually the size of the byte string unless an error occurs. The functions fail if an error occurred before anything was read or written. It is the application's responsibility to check these error conditions and handle them in an appropriate manner.

 
public extern bread FD SIZE;            // read a byte string
public extern bwrite FD DATA;           // write a byte string

For instance, the following function uses bread and bwrite to copy an input to an output file, using chunks of 8192 bytes at a time:

 
fcopy F G               = () if bwrite G (bread F 8192) < 8192;
                        = fcopy F G otherwise;

The file pointer of a descriptor can be positioned with lseek. In difference to fseek, this function returns the new offset. To determine the current position you can hence use an expression like `lseek FD 0 SEEK_CUR'.

 
public extern lseek FD POS WHENCE;

Some terminal-related routines are also provided:

 
public extern isatty FD;                // is descriptor a terminal?

The following are UNIX-specific:

 
public extern ttyname FD;               // terminal associated with descriptor
public extern ctermid;                  // name of controlling terminal
public extern openpty, forkpty;         // pseudo terminal operations

The openpty function returns a pair (MASTER, SLAVE) of file descriptors opened for both reading and writing on a "pseudo terminal". MASTER is to be used in the controlling process, while SLAVE can be used for the standard I/O streams in a child process. The forkpty function combines openpty with fork and makes the slave device the controlling terminal of the child process; it returns a pair (PID, MASTER), where PID is zero in the child process and the process id of the child in the parent, and MASTER is the master end of the pseudo terminal to be used by the parent. These functions are commonly used to implement applications which drive other programs through a terminal emulation interface.

On UNIX systems, clib also provides access to the following fcntl operation (see Section 2 of the UNIX manual):

 
public extern fcntl FD CMD ARG;

The ARG parameter of fcntl depends on the type of command CMD which is executed. The available command codes and other relevant values are defined as global variables, as listed below. Flags are bitwise disjunctions of the symbolic values listed below. (The following are the values present on most systems. Specific implementations may provide additional flags.)

 
public var const
  // fcntl command codes
  F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL,
  F_GETLK, F_SETLK, F_SETLKW,

  // lock types
  F_RDLCK, F_WRLCK, F_UNLCK,

  // file access modes and access mode bitmask
  O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE,

  // file descriptor flags
  FD_CLOEXEC,

  // status flags
  O_CREAT, O_EXCL, O_TRUNC, O_APPEND, O_NONBLOCK, O_NDELAY, O_NOCTTY,
  O_BINARY;

The following types of commands are implemented:

 
fcntl FD F_DUPFD ARG                      // duplicate a file descriptor

fcntl FD F_GETFD ()                       // get file descriptor flags
fcntl FD F_SETFD FLAGS                    // set file descriptor flags

fcntl FD F_SETFD ()                       // get status flags/access mode
fcntl FD F_SETFD FLAGS                    // set status flags

fcntl FD F_GETLK (TYPE,POS,LEN[,WHENCE])  // query file lock information
fcntl FD F_SETLK (TYPE,POS,LEN[,WHENCE])  // set an advisory file lock
fcntl FD F_SETLKW (TYPE,POS,LEN[,WHENCE]) // blocking variant of F_SETLK

The first five commands serve to duplicate descriptors and to retrieve and change the file descriptor and status flags. The remaining commands are used for advisory file locking. A file lock is specified as a triple (TYPE, POS, LEN) or quadruple (TYPE, POS, LEN, WHENCE), where TYPE is the type of lock (F_RDLCK, F_WRLCK or F_UNLCK for read locks, write locks and unlocking, respectively), POS the position in the file, LEN the number of bytes to be locked (0 means up to the end of the file) and WHENCE specifies how the POS argument is to be interpreted. (This parameter has the same meaning as for the fseek and lseek functions, see above. If WHENCE is omitted, it defaults to SEEK_SET, i.e., absolute positions.) The value returned by the F_GETLK command is the lock description with TYPE set to F_UNLCK if the given lock would be accepted, and the description of a current lock blocking the lock request otherwise. (In the latter case the return value is actually a quadruple, with the id of a process currently owning a conflicting lock in the last component.)

Note that the standard I/O operations use buffered I/O by default which might interfere with record locking. Therefore in applications requiring individual record locking you should work with the low-level operations (open, bwrite, etc.) instead.

The following select function waits for a set of files to change I/O status. Note that this operation is available on Windows as part of the socket interface, but it only applies to sockets there.

 
public extern select FILES;

The input is a tuple (IN, OUT, ERR, TIMEOUT) consisting of three lists of file descriptors and/or file objects to be watched, and an optional integer or floating point value specifying a timeout in seconds. The function returns as soon as either a member of IN or OUT becomes ready for performing an I/O operation (without blocking), or an error condition is signaled for a member of ERR. The returned value is a triple (IN, OUT, ERR) with all the members of the original lists which are now ready for I/O. If the timeout is exceeded before any of the files has become ready, a triple of three empty lists is returned. If no timeout is specified then the function may block indefinitely.

Examples

These examples are mostly UNIX-specific, thus Windows users might wish to skip ahead.

Fcntl

The following definitions show how the fcntl function can be used to change a file's "non-blocking" flag. This is useful, e.g., if we want to read from standard input or a pipe but do not want to be blocked until input becomes available. Instead, having set the non-blocking flag, input operations will fail immediately if there is no input to be read right now.

 
/* set and clear the O_NONBLOCK flag of a file */
set_nonblock FD:Int     = fcntl FD F_SETFL (FLAGS or O_NONBLOCK)
                            where FLAGS = fcntl FD F_GETFL ();
clr_nonblock FD:Int     = fcntl FD F_SETFL (FLAGS and not O_NONBLOCK)
                            where FLAGS = fcntl FD F_GETFL ();

And here is how we can perform advisory locking on an entire file.

 
/* place an advisory read or write lock on an entire file (fail if error) */

rdlock FD:Int           = () where () = fcntl FD F_SETLK (F_RDLCK, 0, 0);
wrlock FD:Int           = () where () = fcntl FD F_SETLK (F_WRLCK, 0, 0);

/* remove the lock from the file */

unlock FD:Int           = () where () = fcntl FD F_SETLK (F_UNLCK, 0, 0);

/* predicates to check whether a read or write lock could be placed */

rdlockp FD:Int          = (LOCK!0 = F_UNLCK)
                            where LOCK:Tuple =
                              fcntl FD F_GETLK (F_RDLCK, 0, 0);
wrlockp FD:Int          = (LOCK!0 = F_UNLCK)
                            where LOCK:Tuple =
                              fcntl FD F_GETLK (F_WRLCK, 0, 0);

Note that to apply these functions to standard file objects you can use the fileno function (see Extended File Functions) as follows:

 
==> rdlock (fileno F)

Select

The select function accepts both files and file descriptors as input. Here is a way to test whether input is currently available from a file/descriptor:

 
avail F                 = not null (select ([F],[],[],0)!0);

This is useful, in particular, if the file is actually a pipe. For instance:

 
==> def F = popen "sleep 5; echo done" "r"

==> avail F   // no input to be read yet, wait ...
false

==> avail F   // ... input available now
true

==> fget F
"done\n"

However, most of the time select is used for multiplexing I/O operations. For instance, the following loop processes input from a set of files, one line at a time:

 
loop FILES              = loop (proc FILES F)
                            where ([F|_],_,_) = select (FILES,[],[])
                            if not null FILES;
                        = () otherwise;

proc FILES F            = // done with this file, get rid of it
                          filter (neq F) FILES
                            if feof F;
                        = // process a line
                          writes (fgets F) || FILES;

Anonymous Pipes

The pipe and dup2 operations provide a quick way to reassign input and output of a child process and connect it to corresponding file objects in the parent. For instance, here's how we can implement a popen2 function which works like the built-in popen routine, but allows to redirect both the input and output side of a child process:

 
import system;

/* Create two unnamed pipes, one for the parent to read and the child to
   write, the other one for the child to read and the parent to write. */

popen2 CMD      = spawn2 CMD (P_IN, P_OUT) (C_IN, C_OUT)
                    where (P_IN, C_OUT) = pipe, (C_IN, P_OUT) = pipe;

/* Fork the child and redirect its standard input and output streams to the
   child's ends of the pipe. This is accomplished with dup2 SRC DEST which
   closes the file descriptor DEST and then makes DEST a copy of SRC. In the
   parent we use fdopen to open two new file objects for the parent's ends of
   the pipes. */

spawn2 CMD (P_IN, P_OUT) (C_IN, C_OUT)
                = close P_IN || close P_OUT ||
                  dup2 C_IN (fileno INPUT) || dup2 C_OUT (fileno OUTPUT) ||
                  exec "/bin/sh" ["/bin/sh", "-c", CMD] 
                    if fork = 0;
                = close C_IN || close C_OUT ||
                  (fdopen P_IN "r", fdopen P_OUT "w")
                    otherwise;

The popen2 function employs fork and exec to spawn a child process which executes the given command using the shell, after redirecting the child's input and output to two pipes. In the parent process, the popen2 function returns a pair of files opened on the other ends of the child's descriptors.

The following piece of Q code shows how to apply the popen2 function defined above in order to pipe a string list into the UNIX sort program and construct the sorted list from the output:

 
mysort STRL     = do (fprintf OUT "%s\n") STRL || fclose OUT || digest IN
                    where (IN, OUT) = popen2 "sort";

digest IN       = [] if feof IN;
                = [freads IN|digest IN] otherwise;

Example:

 
==> mysort ["five","strings","to","be","sorted"]
["be","five","sorted","strings","to"]

==> wait // get exit code of child process (sort program)
(15804,0)

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.9 Terminal Operations

The following (UNIX-specific) operations from the system module provide an interface to the POSIX termios interface. Terminal attributes are stored in a "termios" structure, represented as a 7-tuple (IFLAG, OFLAG, CFLAG, LFLAG, ISPEED, OSPEED, CC). The control character set CC is represented as a list of character numbers, indexed by the symbolic constants VEOF etc. See termios(3) for further details.

 
public extern tcgetattr FD;             // get terminal attributes
public extern tcsetattr FD WHEN ATTR;   // set terminal attributes

public extern tcsendbreak FD DURATION;  // send break
public extern tcdrain FD;               // wait until all output finished
public extern tcflush FD QUEUE;         // flush input or output queue
public extern tcflow FD ACTION;         // control input/output flow

public extern tcgetpgrp FD;             // get terminal process group
public extern tcsetpgrp FD PGID;        // set terminal process group

/* Access components of the termios structure. */

public c_iflag ATTR, c_oflag ATTR, c_cflag ATTR, c_lflag ATTR, 
  c_ispeed ATTR, c_ospeed ATTR, c_cc ATTR;

Example

This example shows how to use the termios functions to read a password from the terminal without echoing. This is an almost literal translation of the C program described in Richard Stevens: Advanced Programming in the UNIX Environment, Addison-Wesley, 1993, cf. p. 350. The main difference is that we merely ignore the SIGINT and SIGTSTP signals instead of blocking them (the latter is not supported by Q's trap builtin).

 
import system;

getpass PROMPT  = fwritec F "\n" || unprep F SAVE || PW
                    where F:File = fopen ctermid "r+", SAVE = prep F,
                      PW = fwrites F PROMPT || fflush F || freads F;

/* prep F: ignore SIGINT and SIGTSTP and prepare the terminal */

prep F          = tcsetattr (fileno F) TCSAFLUSH NATTR || (ATTR,TRAPS)
                    where TRAPS = map (trap SIG_IGN) [SIGINT,SIGTSTP],
                      ATTR = tcgetattr (fileno F),
                      (IF,OF,CF,LF,IS,OS,CC) = ATTR,
                      LF = LF and not (ECHO or ECHOE or ECHOK or ECHONL),
                      NATTR = (IF,OF,CF,LF,IS,OS,CC);

/* unprep F SAVE: revert to previous settings */

unprep F (ATTR,TRAPS)
                = tcsetattr (fileno F) TCSAFLUSH ATTR ||
                  zipwith trap TRAPS [SIGINT,SIGTSTP];

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.10 Readline Interface

The following functions from system.q provide a basic interface to the GNU readline library. While readline doesn't really belong to the POSIX interface, it is rather useful and the interpreter uses it anyway, so it makes sense to also provide it as a part of the system interface.

 
public extern readline PROMPT, rl_line_buffer;
public extern add_history LINE, stifle_history MAX;
public extern read_history FNAME, write_history FNAME;

These functions work like their C counterparts. The readline function prompts with the given string and reads an input line, providing editing and history facilities. Basic bash-like filename completion is also provided. The rl_line_buffer function returns the text of the current input line, which is useful, e.g., if a custom completion function (see below) needs to inspect surrounding context.

The add_history routine adds a line to the history. The maximum size of the history can be set with stifle_history, which also returns the size which was set previously. A negative MAX value makes the size of the history unbounded (which is also the default). Last but not least, the history can be read from and written to a file with the read_history and write_history functions.

For instance, the following commands read an input string, add it to the history, and finally save the history to a file:

 
==> import system

==> readline "input> "
input> foo bar
"foo bar"

==> add_history _
()

==> write_history "myhistory"
()

As another example, here is the definition of a little convenience function which reads a line using readline and enters it into the history if it is nonempty:

 
get_line PROMPT = if not null LINE then add_history LINE || LINE
                    where LINE:String = readline PROMPT;

Readline's standard filename completion facility can be augmented with a custom completion function. This is achieved by simply setting the following global variable to the desired function:

 
public var RL_COMPLETION_FUNCTION;

The completion function is invoked with two arguments, the string to be completed and the position index of that string in the input line (rl_line_buffer, see above), and is expected to return a string list with the possible completions. For instance, here is a simple example of a function which checks a list of "command" words for possible completions:

 
def RL_COMPLETION_FUNCTION = complete;
complete S _    = filter (is_prefix S) ["bar","foo","gnats","gnu"];
is_prefix X Y   = (X=sub Y 0 (#X-1));

The position index argument is useful if the completion depends on the position inside the input line. For instance, the following function only attempts completion at the beginning of the input line:

 
complete S 0    = filter (is_prefix S) ["bar","foo","gnats","gnu"];

Readline's default behaviour is to try a custom completion first, if it is available, and to fall back to the standard filename completion otherwise. The latter can be suppressed by ending the list of completions with a () entry:

 
complete S 0    = filter (is_prefix S) ["bar","foo","gnats","gnu"] ++ [()];
complete _ _    = [()] otherwise;

The text units for which readline attempts completion are the "words" of the input line. There is a `RL_WORD_BREAK_CHARS' variable which allows you to change readline's idea of what a word is, by setting the variable to a string containing the word delimiter characters. The default definition of this variable is as follows:

 
public var RL_WORD_BREAK_CHARS = " \t\n\"\\'`@$><=;|&{(";

Redefining RL_WORD_BREAK_CHARS affects both readline's cursor movement commands and the behaviour of the completion routine. For instance, to turn the comma into a word delimiter, you can change RL_WORD_BREAK_CHARS as follows:

 
def RL_WORD_BREAK_CHARS = RL_WORD_BREAK_CHARS++",";

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.11 System Information

The functions described here are all in the system module. Error codes from various system operations can be retrieved with the following functions:

 
public extern errno, seterrno N;        // get/set last error code
public extern perror S, strerror N;     // print error message

The perror function is commonly used to report error conditions in system operations on the standard error file. For instance:

 
==> fopen "/etc/passw" "r"
fopen "/etc/passw" "r"

==> perror "fopen"
fopen: No such file or directory
()

If more elaborate formatting is required then you can use strerror on the errno value to obtain the error message as a string:

 
==> fprintf ERROR "fopen returned message `%s'\n" (strerror errno)
fopen returned message `No such file or directory'
()

Note that errno is only set when an error occurs in a system call. You can use seterrno to reset the errno value before a system operation to check whether there actually was an error while executing the system call:

 
==> seterrno 0
()

==> fopen "/etc/passwd" "r"
<<File>>

==> perror "fopen"
fopen: Success
()

The remaining operations are used to obtain various information about the system and its information databases. For instance, the uname operation returns a 5-tuple containing information identifying the operating system (this operation is generally only available on UNIX-like systems):

 
public extern uname;

/* Access components of uname result. */

public un_sysname UNAME, un_nodename UNAME, un_release UNAME,
  un_version UNAME, un_machine UNAME;

The hostname of the system can be retrieved with the gethostname function.

 
public extern gethostname;

The password and group database can be accessed with the following functions. Password entries are encoded as 7-tuples (NAME, PASSWD, UID, GID, GECOS, DIR, SHELL), group entries as 4-tuples (NAME, PASSWD, GID, MEMBERS). This information is only available on UNIX-like systems.

 
public extern getpwuid UID, getpwnam NAME;      // look up a password entry
public extern getpwent;                         // list of all pw entries
public extern getgrgid GID, getgrnam NAME;      // look up group entry
public extern getgrent;                         // list of all group entries

/* Access components of password and group structures. */

public pw_name PW, pw_passwd PW, pw_uid PW, pw_gid PW, pw_gecos PW,
  pw_dir PW, pw_shell PW;

public gr_name GR, gr_passwd GR, gr_gid GR, gr_members GR;

Moreover, the crypt function can be used to perform UNIX password encryption (see crypt(3) for details).

 
public extern crypt KEY SALT; // (U)

The following functions can be used to query host information as well as the network protocols and services available on your system. This information is closely related to the socket interface described in Sockets. For a closer description of these operations we refer the reader to the corresponding manual pages. Note that the gethostent, getprotoent and getservent operations are not available on Windows.

The host database: Host entries are of the form (NAME, ALIASES, ADDR_TYPE, ADDR_LIST), where NAME denotes the official hostname, ALIASES its alternative names, ADDR_TYPE the address family and ADDR_LIST the list of addresses.

 
public extern gethostbyname HOST, gethostbyaddr ADDR;
public extern gethostent; // (U)

public h_name HENT, h_aliases HENT, h_addr_type HENT, h_addr_list HENT;

Note that both hostnames and IP addresses are specified as strings. Hostnames are symbolic names such as "localhost", and can also have a domain name specified, as in "www.gnu.org". IPv4 addresses use the well-known "numbers-and-dots" notation, like the loopback address "127.0.0.1". IPv6 addresses are usually written as eight 16-bit hexadecimal numbers that are separated by colons; two colons are used to abbreviate strings of consecutive zeros. For example, the IPv6 loopback address "0:0:0:0:0:0:0:1" can be abbreviated as "::1".

The protocol database: Protocol entries are of the form (NAME, ALIASES, PROTO) denoting official name, aliases and number of the protocol.

 
public extern getprotobyname NAME;
public extern getprotobynumber PROTO;
public extern getprotoent; // (U)

public p_name PENT, p_aliases PENT, p_proto PENT;

The service database: Service entries are of the form (NAME, ALIASES, PORT, PROTO) denoting official name, aliases, port number and protocol number of the port. The NAME argument of getservbyname can also be a pair (NAME, PROTO) to restrict the search to services for the given protocol (given by its name). Likewise, the PORT argument can also be given as (PORT, PROTO).

 
public extern getservbyname NAME;
public extern getservbyport PORT;
public extern getservent; // (U)

public s_name PENT, s_aliases PENT, s_port PENT, s_proto PENT;

For instance, here is some information retrieved from a typical Linux system:

 
==> import system

==> uname
("Linux","obelix","2.4.19-4GB","#2 Tue Mar 4 16:03:51 CET 2003","i686")

==> gethostname
"obelix"

==> gethostbyname gethostname
("obelix.local",["obelix"],2,["127.0.0.2"])

==> gethostbyaddr "::1"
("localhost",["ipv6-localhost","ipv6-loopback"],10,["::1"])

==> getprotobyname "tcp"
("tcp",["TCP"],6)

==> getservbyname "ftp"
("ftp",[],5376,"tcp")

==> getpwuid getuid
("ag","x",500,100,"Albert Gräf","/home/ag","/bin/bash")

==> getgrgid getgid
("users","x",100,[])

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.12 Sockets

The following functions from the system module are available on systems providing a BSD-compatible socket layer, which, besides BSD, includes BEOS, Linux, OSX, Windows and most recent System V flavours. Sockets provide bidirectional communication channels on the local machine as well as across the network. Sockets are represented by file descriptors which can be written to and read from with the send and recv functions. On most systems, socket descriptors are just ordinary file descriptors which can also be used with low-level I/O functions and fdopen as usual. However, on some systems (in particular, BEOS and Windows), socket descriptors are "special" and all socket I/O must be performed with the special socket operations.

At creation time, a socket is described by the following attributes (see socket(2) for more details):

Before another process can connect to a socket it must also be bound to an address. The address format depends on the address family of the socket. For the local namespace, the address is simply a filename on the local filesystem. For the IPv4 namespace, it is a pair (HOST, PORT) where HOST denotes the host name or IP address, specified as a string, and PORT is a port number. For the IPv6 namespace, it is a quadruple (HOST, PORT, FLOWINFO, SCOPEID). Host names and known port numbers can be retrieved from the host and service databases (see System Information).

The following operations are provided to create a socket, or a pair of connected sockets, for the given address family, socket type and protocol. They return the file descriptor of the socket (or a pair of file descriptors).

 
public extern socket FAMILY TYPE PROTO;
public extern socketpair FAMILY TYPE PROTO; // (U)

The shutdown function terminates data transmission on a socket. You can stop reading, writing or both, depending on whether HOW is SHUT_RD, SHUT_WR or SHUT_RDWR. Note that this operation does not close the socket's file descriptor; for this purpose closesocket is used (see below).

 
public extern shutdown SOCKET HOW;

The closesocket function closes a socket. On most systems this is just identical to close (see Low-Level I/O), but, as already noted, on some systems socket descriptors are special and you must use this function instead.

 
public extern closesocket SOCKET;

The bind function binds a socket to an address. This is also done automatically when the socket is first used. However, if the socket has to be found by another process you'll have to explicitly specify an address for it. The bind function does just that.

 
public extern bind SOCKET ADDR;

The following operations are used to start listening for and accept connection requests on a socket. These operations are used on the server side of a connection-based socket. The argument N of listen denotes the maximum number of pending connection requests for the server. After the call to listen, the server can accept connections from a client with the accept function, which returns a pair (SOCKET, ADDR), where SOCKET is a new socket connected to the client, and ADDR is the client's address.

 
public extern listen SOCKET N;
public extern accept SOCKET;

The connect function is used to initiate a connection on a socket. This function can be used on both connection-based and connectionless sockets. In the former case, connect can only be invoked once. In the latter case, it can be invoked multiple times, and sets the remote socket for subsequent send and receive operations.

 
public extern connect SOCKET ADDR;

The following routines retrieve information about a socket. The local address of a socket and the address of the remote socket it is connected to can be retrieved with getsockname and getpeername. Socket options, specified using a protocol level LEVEL and an option index OPT, can be queried and changed with getsockopt and setsockopt. The option values are encoded as byte strings (cf. Byte Strings). For a description of the available options see getsockopt(2).

 
public extern getsockname SOCKET;
public extern getpeername SOCKET;
public extern getsockopt SOCKET LEVEL OPT;
public extern setsockopt SOCKET LEVEL OPT VAL;

Finally, the following specialized I/O functions are used to transmit data over a socket. All data is encoded as byte strings. The receive operations return the received data (which may be shorter than the requested size, if not enough data was currently available), the send operations the number of bytes actually written. For recvfrom/sendto the data is encoded as a pair (ADDR, DATA) which includes the source/destination address; these operations are typically used for connectionless sockets. The FLAGS argument is used to specify special transmission options (see the MSG_* constants at the beginning of clib.q).

 
public extern recv SOCKET FLAGS SIZE, send SOCKET FLAGS DATA;
public extern recvfrom SOCKET FLAGS SIZE, sendto SOCKET FLAGS DATA;

Example

The following script demonstrates how we can implement a connectionless server in the IPv4 namespace which repeatedly accepts a request from a client and sends back an answer. In this example the requests are strings denoting Q expressions; the server evaluates each expression and sends back the result as a string. The client reads input from the user, transmits it to the server and prints the received answer. Note that the transmitted strings are represented as byte strings, as required by the recvfrom and sendto operations. The bytestr and bstr functions are used to convert between character and byte strings, see Byte Strings.

 
import system;

def BUFSZ = 500000; // buffer size

/* the server: receive messages, evaluate them as Q expressions, and
   send back the results */

def SERVER = ("localhost",5001); // the server address

server          = server_loop FD
                    where FD:Int = socket AF_INET SOCK_DGRAM 0,
                      () = bind FD SERVER;
                = perror "server" otherwise;

server_loop FD  = sendto FD 0 (ADDR,eval MSG) || server_loop FD
                    where (ADDR,MSG) = recvfrom FD 0 BUFSZ;
                = server_loop FD otherwise;

/* evaluate an expression encoded as a byte string, catch syntax
   errors and exceptions, convert result back to a byte string */

eval MSG        = catch exception (bytestr (str VAL))
                    where 'VAL = valq (bstr MSG);
                = bytestr ">>> SYNTAX ERROR" otherwise;
exception _     = bytestr ">>> ABORTED";

/* the client: read input from user, send it to the server, print
   returned result */

def CLIENT = ("localhost",5002); // the client address

client          = client_loop FD
                    where FD:Int = socket AF_INET SOCK_DGRAM 0,
                      () = bind FD CLIENT;
                = perror "client" otherwise;

client_loop FD  = sendto FD 0 (SERVER,bytestr MSG) ||
                  printf "%s\n" (bstr (recv FD 0 BUFSZ)) || client_loop FD
                    if not null MSG
                    where MSG:String = writes "\nclient> " || flush || reads;
                = () otherwise;

For instance, we can invoke the server in a secondary thread and then execute the client as follows:

 
==> def S = thread server

==> client

client> prd [1..50]
30414093201713378043612608166064768844377641568960512000000000000

client> 1+)
>>> SYNTAX ERROR

client> quit
>>> ABORTED

client> 

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.13 POSIX Threads

On systems where the POSIX threads library or some compatible replacement is available (this includes Windows and most modern UNIXes), clib provides functions for handling multiple threads of control. Threads, a.k.a. "light-weighted processes", allow you to realize "multithreaded scripts" consisting of different tasks which together perform some computation in a distributed manner. All tasks are executed concurrently. Thus you can, e.g., perform some lengthy calculation in a background task while you go on evaluating other expressions in the interpreter's command loop. You can also have tasks communicate via mutexes, conditions and semaphores.

The operations described in this section (which are all contained in clib and thus included in the prelude) are in close correspondence with POSIX 1003.1b. However, some operations are named differently, and semaphores provide the extra functionality of sending data from one thread to another. Mutexes are also supported, mostly for the purpose of handling critical sections involving operations with side-effects (I/O etc.). Mutexes are not required to make conditions work since these have their own internal mutex handling. For more information on POSIX threads, please refer to the corresponding section in the UNIX manual.

Please note that these functions will only work as advertised if the interpreter has been built with POSIX thread support. Moreover, in the current implementation the interpreter effectively serializes multithreaded scripts on the reduction level and thus user-level threads cannot really take advantage of multi-processor machines.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.13.1 Thread Creation and Management

Clib threads are represented using handles (objects of type Thread). Note that a thread is canceled automatically as soon as its handle is garbage collected, thus you should keep the handle around as long as the thread is needed. For convenience, thread handles are numbered arbitrarily, starting at 0 which denotes the main thread, and are ordered by the thread numbers. This is handy, e.g., if you want to use thread handles as indices in a dictionary.

 
public extern type Thread;              // thread handle type

public isthread THREAD;                 // check for thread objects

public extern thread_no THREAD;         // thread number
public extern this_thread;              // handle of the current thread

The basic thread operations are listed below. The thread function starts evaluating its special argument in a new thread, and returns its handle. You can wait for a thread to terminate and obtain the evaluated result with the result function. (If there is no result, because the thread has been canceled, or was aborted with halt, quit or a runtime error, result fails.) Note that halt or quit in a thread which is not the main thread only terminates the current thread; however, the exit function, cf. Process Control, always exits from the interpreter. You can also terminate the current thread immediately and return a given value as its result with the return function; in the main thread, this function is equivalent to halt and the return value is ignored. Moreover, all threads except the main thread can also be canceled from any other thread using the cancel function. Finally, the yield function allows the interpreter to switch threads at any given point (normally the interpreter will only switch contexts in certain builtins and when a new rule is activated).

 
public extern special thread X;         // start new thread
public extern return X;                 // terminate thread with result X
public extern cancel THREAD;            // cancel THREAD
public extern result THREAD;            // wait for THREAD, return result
public extern yield;                    // allow context switch

Clib threads always use deferred cancellation, hence thread cancellation requests are usually not honored immediately, but are deferred until the thread reaches a cancellation point where it is safe to do so. Cancellation points occur at certain C library calls listed in the POSIX threads documentation, when a new equation is activated in the Q interpreter, and when yield is called.

You can also check whether a thread is still active or has been canceled. If neither condition holds, then the thread has already been terminated and you can obtain its result with the result operation.

 
public extern active THREAD;            // check if THREAD is active
public extern canceled THREAD;          // check if THREAD was canceled

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [