Clib is the C interface and "system" module of the Q
programming language. As of Q 7.8, this component actually consists of
two different modules, clib.q which provides basic C data
structures and routines commonly used in most Q programs, and
system.q which contains most of the POSIX system interface. Only
clib.q is part of the prelude; for most system functions, you
will have to explicitly import system.q in your programs. In the
sections below, we will always indicate whether the system.q
module is needed for the described operations.
Unlike the other standard library modules, clib and
system are external modules, i.e., most functions are
actually implemented in C (cf. C Language Interface). Together,
clib and system provide additional string operations,
extended file functions, C-style formatted I/O, low-level and binary
I/O, an interface to various system functions, POSIX thread functions,
expression references, time functions, internationalization support,
filename globbing and regular expression matching, additional integer
functions from the GMP library, and, last but not least, efficient C
replacements for some common standard library list and string processing
functions.
Even if you do not use the extra functionality provided by these
modules, you will benefit from the replacement operations (which are in
clib and thus included in the prelude), which considerably speed
up basic list and string processing, sometimes by several orders of
magnitude.
NOTE: Not all of the following operations are implemented on all
systems. The UNIX-specific operations are marked with the symbol
`(U)' in the clib.q and system.q scripts. Only a
portable subset of the UNIX system interface is provided, which
encompasses the most essential operations found on many recent UNIX (and
other POSIX) systems, as described by the ANSI C and POSIX standards as
well as the Single UNIX Specification (SUS). These operations are also
available on Linux and OSX systems.
Clib defines an abundance of symbolic values for use with various
system functions. Most of these can be found in system.q, but a
few constants related to the memory sizes of various basic C data types
and the fseek and setvbuf operations can also be found in
clib.q.
System constants actually vary from system to system; only the most
common values are provided as global variables here. The variables are
declared const (read-only) and are initialized at startup
time. Flag values can be combined using bitwise logical operations as
usual. A complete list of the variables can be found at the beginning of
the clib.q and system.q scripts. Flag values which are
unavailable on the host system will be set to zero, other undefined
values to -1. Thus undefined values will generally have no effect or
cause the corresponding operations to fail.
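How flag values combine can be illustrated independently of Q. The following minimal Python sketch uses made-up flag names and values (they are not actual clib constants) to show why a flag which is set to zero on a system lacking it has no effect when it is combined with other flags.

```python
# Hypothetical flag values, for illustration only (not actual clib constants).
FLAG_READ = 0x1
FLAG_WRITE = 0x2
FLAG_FANCY = 0      # a flag the host system does not provide is set to zero

def combine(*flags):
    """Combine flag values with bitwise OR."""
    result = 0
    for flag in flags:
        result |= flag
    return result

mode = combine(FLAG_READ, FLAG_WRITE, FLAG_FANCY)
print(hex(mode))                  # the zero flag contributes nothing
print(bool(mode & FLAG_WRITE))    # test whether a given flag is set
```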
These functions provide an interface to some familiar character routines
from the C library. They can all be found in clib.q and are thus
in the standard prelude.
Character predicates: These work exactly like the corresponding C library routines, except that they work with arbitrary Unicode, not just ASCII, characters, provided that the interpreter has been built with Unicode support.
public extern islower C, isupper C, isalpha C, isdigit C, isxdigit C,
  isalnum C, ispunct C, isspace C, isgraph C, isprint C, iscntrl C,
  isascii C;
String conversion: Convert a string to lower- or uppercase (like the corresponding C functions, but work on arbitrary Unicode strings, not just on single ASCII characters).
public extern tolower S, toupper S;
Count the number of alphanumeric characters in a text:
==> #filter isalnum (chars "The little brown fox.\n")
17
Convert a string to uppercase:
==> toupper "The little brown fox.\n"
"THE LITTLE BROWN FOX.\n"
The following type represents unstructured binary data implemented as C
byte vectors. This data structure is used by the low-level I/O functions
and other system functions which operate on binary data. The
ByteStr type itself and its operations are implemented in
clib.q and thus included in the prelude.
public extern type ByteStr;
public isbytestr B;                     // check for byte strings
Byte strings are like ordinary character strings, but they do not have a
printable representation, and they may include zero bytes. (Recall that a
zero byte in a character string terminates the string.) They can be used
to encode arbitrary binary data such as C vectors and structures. The
bytestr function can be used to construct byte strings from
integers, floating point numbers, string values or lists of unsigned
byte values:
public extern bytestr X;                // create a byte string
The X argument denotes the data to be encoded and can be either a
list of byte values (unsigned integers in the range from 0 to 255), or
an atomic data object, i.e., an integer, floating point number or string
constant. In the latter case, the argument can also have the form
(X,SIZE) indicating the desired byte size of the object;
otherwise a reasonable default size is chosen. If the specified size
differs from the actual size of X, the result is zero-padded or
truncated accordingly. Integer values are encoded in the host byte
order, with the least significant GMP limb first; negative integers are
represented in 2's complement. Floating point values are encoded using
double precision by default or if the byte count is sufficient (i.e., at
least 8 on most systems), and using single precision otherwise. Strings
are by default encoded in the system encoding, but you can also specify
the desired target encoding as (X,CODESET) (or
(X,CODESET,SIZE) if you also need to specify a byte size), where
CODESET is a string denoting the target encoding.
Like ordinary character strings, byte strings can be concatenated,
size-measured, indexed, sliced and compared lexicographically. Moreover,
a byte string can be converted back to a (multiprecision) integer,
floating point number, string value, or a list of byte values. (When
converting back to a string you can specify the source encoding as in
bstr (B,CODESET), otherwise the system encoding is assumed.) For
these purposes the following operations are provided.
public extern bcat Bs;                  // concatenate list of byte strings
public extern bsize B;                  // byte size of B
public extern byte I B;                 // Ith byte of B
public extern bsub B I J;               // slice of B (bytes I..J)
public extern bcmp M1 M2;               // compare M1 and M2
public extern bint B;                   // convert to unsigned integer
public extern bfloat B;                 // convert to floating point number
public extern bstr B;                   // convert to string
public bytes B;                         // convert to list
public ::list B;                        // ditto
You can use the bytes function to convert a byte string to a list
of byte values; the list function is overloaded to provide the
same functionality. These functions are defined as follows:
bytes B:ByteStr = map (B!) [0..#B-1];
list B:ByteStr = bytes B;
For convenience, the common string operators and the sub function
are overloaded to work on byte strings as well. Thus #B returns
the size of B (the number of bytes it contains) and B!I
the Ith byte of B. B1++B2 concatenates B1
and B2, sub B I J returns the slice from byte I to
J, and the relational operators `=', `<', `>'
etc. can be used to compare byte strings lexicographically. These
operations are all implemented in terms of the functions listed above.
As of Q 7.11, clib supports a number of additional operations
which allow you to treat byte strings as mutable C vectors of
signed/unsigned 8/16/32 bit integers or single/double precision floating
point numbers. The following functions provide read/write access to
elements and slices of such C vectors:
public extern get_int8 B I, get_int16 B I, get_int32 B I;
public extern get_uint8 B I, get_uint16 B I, get_uint32 B I;
public extern get_float B I, get_double B I;
public extern put_int8 B I X, put_int16 B I X, put_int32 B I X;
public extern put_uint8 B I X, put_uint16 B I X, put_uint32 B I X;
public extern put_float B I X, put_double B I X;
Note that the given index argument I is interpreted relative to
the corresponding element type. Thus, e.g., get_int32 B I returns
the Ith 32 bit integer rather than the integer at byte offset
I. Also note that integer arguments must fit into machine
integers, otherwise these operations will fail. Integers passed for
floating point arguments will be coerced to floating point values
automatically.
For the get_xxx functions, the index parameter may also be a pair
(I,J) to return a slice of the given byte string instead of a
single element (this works like sub/bsub, but interprets
indices relative to the element type). The put_xxx functions also
accept a byte string instead of an element as input, and will then
overwrite the corresponding slice of the target byte string B
with the given source byte string X. Similar to
sub/bsub, these variations of
get_xxx/put_xxx are "safe" in that they automatically
adjust the given indices to fit within the bounds of the target byte
string.
Moreover, the following convenience functions are provided to convert between byte strings and lists of integer/floating point elements.
public extern int8_list B, int16_list B, int32_list B;
public extern uint8_list B, uint16_list B, uint32_list B;
public extern float_list B, double_list B;
public extern int8_vect Xs, int16_vect Xs, int32_vect Xs;
public extern uint8_vect Xs, uint16_vect Xs, uint32_vect Xs;
public extern float_vect Xs, double_vect Xs;
Encode an integer as a byte string, take a look at its individual bytes, and convert the byte string back to an integer:
==> hex
==> def B = bytestr 0x01020304; bytes B; bint B
[0x4,0x3,0x2,0x1]
0x1020304
(Note that this result was obtained on a little-endian system, hence the
least significant byte 0x04 comes first in the byte list.)
Negative integers are correctly encoded in 2's complement:
==> def B = bytestr (-2); bytes B; bint B
[0xfe,0xff,0xff,0xff]
0xfffffffe
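The 2's complement encoding shown above can be reproduced with plain integer arithmetic. The following Python sketch is only an illustration, not clib's implementation; the helper names encode_int and decode_unsigned are hypothetical, and a 4 byte little-endian encoding is assumed as in the example.

```python
def encode_int(value, size, byteorder="little"):
    """Encode an integer in `size` bytes, using 2's complement for negatives."""
    # Reducing modulo 2**(8*size) truncates big values and maps negative
    # values to their 2's complement bit pattern.
    return (value % (1 << (8 * size))).to_bytes(size, byteorder)

def decode_unsigned(data, byteorder="little"):
    """Interpret the bytes as an unsigned integer."""
    return int.from_bytes(data, byteorder)

b = encode_int(-2, 4)
print([hex(x) for x in b])       # individual bytes, least significant first
print(hex(decode_unsigned(b)))   # read back as an unsigned value
```

Note that reading the bytes back as an unsigned integer does not recover -2; the sign has to be restored by the caller if it is needed.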
To work with these binary representations you must be aware of the way
GMP represents multiprecision integers. In particular, note that the
default size of an integer is always a multiple (at least one) of GMP's
limb size which is usually 4 or 8 bytes depending on the host system's
default long integer type. The actual limb size can be determined
as follows:
==> #bytes (bytestr 0)
In order to get integers of arbitrary sizes, an explicit SIZE
argument may be used. For instance, here is how we encode small (1 or 2
byte) integers:
==> bytes (bytestr (0x01,1)); bytes (bytestr (0x0102,2))
[0x1]
[0x2,0x1]
The host system's byte sizes of various atomic C types can be determined
with symbolic values declared at the beginning of clib.q, such as
SIZEOF_CHAR, SIZEOF_SHORT, SIZEOF_LONG,
SIZEOF_FLOAT and SIZEOF_DOUBLE.
Another fact worth mentioning is that even on big-endian systems,
integers are always encoded with the "least significant limb"
first. So, for instance, given that the limb size is 4, as in the above
examples, the 2-limb integer 0x0102030405060708 consists of bytes
0x8 0x7 0x6 0x5 0x4 0x3 0x2 0x1 on a little-endian system, in
that order, whereas the byte order on a big-endian system is 0x5
0x6 0x7 0x8 0x1 0x2 0x3 0x4.
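This layout can be modelled in a few lines. The following Python sketch illustrates the rule described above (least significant limb first, bytes within a limb in host order); the function limb_bytes is a hypothetical helper and not part of clib or GMP.

```python
def limb_bytes(value, limb_size, byteorder):
    """Lay out a non-negative integer as limbs, least significant limb first.

    Each limb is written in the given byte order ("little" or "big").
    """
    limb_mask = (1 << (8 * limb_size)) - 1
    out = bytearray()
    while True:
        out += (value & limb_mask).to_bytes(limb_size, byteorder)
        value >>= 8 * limb_size
        if value == 0:
            break
    return bytes(out)

n = 0x0102030405060708
print(limb_bytes(n, 4, "little").hex(" "))   # byte order on a little-endian host
print(limb_bytes(n, 4, "big").hex(" "))      # byte order on a big-endian host
```

With a single limb the result is simply the integer in host byte order, which is why the first byte of the encoding of 1 tells the two byte orders apart.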
Here is how we can quickly check the byte order of the host system:
==> hd (bytes (bytestr 1))
This expression returns 1 on a little-endian system and zero otherwise.
As long as an integer does not exceed the machine's word size (which usually matches the limb size), we can simply convert between big-endian and little-endian representation by reversing the byte list:
==> bytestr (reverse (bytes B))
Floating point values can be encoded either in double or single
precision, depending on the SIZE argument. The default size is
double precision (usually 8 bytes).
==> bfloat (bytestr (1/3)); bfloat (bytestr (1/3,SIZEOF_FLOAT))
0.333333333333333
0.333333343267441
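The loss of precision in the single precision case is a property of the encoding, not of Q. As an illustration, the following Python sketch round-trips the same value through 8 and 4 byte IEEE 754 encodings using the standard struct module; the helper name roundtrip is made up for this example.

```python
import struct

def roundtrip(value, fmt):
    """Pack a float with the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

double = roundtrip(1 / 3, "d")   # 8 bytes, double precision
single = roundtrip(1 / 3, "f")   # 4 bytes, single precision

print(struct.calcsize("d"), double)
print(struct.calcsize("f"), single)
```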
The default size of the encoding of a character string is the byte size of the string in the target encoding (the system encoding by default). If an explicit size is given, the string is zero-padded or truncated if necessary. The following example will work with any system encoding based on 7 bit ASCII (like Latin1, UTF-8 or ASCII itself):
==> dec
==> def S1 = bytestr "ABC", S2 = bytestr ("ABC",2), S3 = bytestr ("ABC",5)
==> bytes S1; bytes S2; bytes S3
[65,66,67]
[65,66]
[65,66,67,0,0]
==> bstr S1; bstr S2; bstr S3
"ABC"
"AB"
"ABC"
By combining elements like the ones above, and including appropriate
"tagging" information, more complex data structures can be represented
as binary data as well. For this purpose, the byte strings of the tags
and the data elements can be concatenated with bcat or the
`++' operator. This is useful, in particular, for compact storage
of objects in files. Moreover, some system functions involve binary data
which might represent C structures and/or vectors. Such data can be
assembled from the constituent parts by simply concatenating them. For
instance, consider the following C struct:
struct {
  char foo[108];
  short bar;
  int baz;
};
A value of this type, say {"Hello, world.", 4711, 123456}, can
then be encoded as follows:
==> bytestr ("Hello, world.",108) ++ bytestr (4711,SIZEOF_SHORT) ++ \
bytestr (123456,SIZEOF_INT)
Similarly, a list of integers can be converted to a corresponding C vector as follows:
==> bcat (map bytestr [1..100])
When encoding such C structures you must also consider alignment issues. For instance, most C compilers will align non-byte data at even addresses.
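To see where a C compiler would place the members of the struct above, the layout can be inspected with Python's ctypes module, which follows the platform's C alignment rules. This is merely a sketch for checking offsets; the class name Record is made up, and the numbers printed depend on the host system.

```python
import ctypes

class Record(ctypes.Structure):
    # Mirrors: struct { char foo[108]; short bar; int baz; };
    _fields_ = [
        ("foo", ctypes.c_char * 108),
        ("bar", ctypes.c_short),
        ("baz", ctypes.c_int),
    ]

# Size of the members when simply concatenated, without any padding.
packed_size = 108 + ctypes.sizeof(ctypes.c_short) + ctypes.sizeof(ctypes.c_int)

for name in ("foo", "bar", "baz"):
    field = getattr(Record, name)
    print(name, "offset", field.offset, "size", field.size)
print("sizeof(struct):", ctypes.sizeof(Record))
print("sum of member sizes:", packed_size)
print("padding bytes:", ctypes.sizeof(Record) - packed_size)
```

On a typical system with a 2 byte short and a 4 byte int, baz starts at offset 112 rather than 110, so two zero bytes of padding would have to be inserted after the encoded bar member in order to match the C layout.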
In order to facilitate the handling of C vectors of integers and
floating point values, as of Q 7.11 clib offers a number of
specialized operations which provide direct read/write access to
elements and slices of numeric vectors, and allow you to convert between
C vectors and Q lists of integer or floating point values. These
operations are all implemented directly in C and will usually be much
more efficient for manipulating numeric C vectors than the basic
byte-oriented functions. Moreover, they allow you to modify the elements
of a C vector in a direct fashion, turning byte strings into a mutable
data structure.
Different operations are provided to handle vectors of signed or unsigned 8/16/32 bit (machine) integers, as well as single (32 bit) and double precision (64 bit) floating point numbers. For instance:
==> def B = uint32_vect [100..110]
==> uint32_list B
[100,101,102,103,104,105,106,107,108,109,110]
==> get_uint32 B 1
101
==> put_uint32 B 1 0xffffffff
()
==> uint32_list B
[100,4294967295,102,103,104,105,106,107,108,109,110]
Note that, because these C vectors are just normal byte strings, you can freely convert between different representations of the numeric data. E.g.:
==> take 12 $ int8_list B
[100,0,0,0,-1,-1,-1,-1,102,0,0,0]
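The same reinterpretation can be traced byte by byte. The following Python sketch, using the standard struct module, rebuilds the vector from the example and reads its first twelve bytes as signed 8 bit integers. It assumes a little-endian layout (forced here with the "<" prefix), as on the system the example above was run on; it is an illustration only and says nothing about how clib implements these operations.

```python
import struct

# Eleven unsigned 32 bit integers, explicitly little-endian ("<") so that the
# result does not depend on the machine running this sketch.
buf = bytearray(struct.pack("<11I", *range(100, 111)))

# Overwrite element 1; the byte offset is index * element size.
struct.pack_into("<I", buf, 1 * 4, 0xFFFFFFFF)

as_uint32 = list(struct.unpack("<11I", buf))
as_int8 = list(struct.unpack("<12b", buf[:12]))
print(as_uint32)
print(as_int8)
```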
Entire slices of byte strings can be retrieved and overwritten as
well. Note that, as with sub, the indices are adjusted
automatically to stay within the bounds of the target vector.
==> put_uint32 B (-2) (uint32_vect [90..94])
()
==> uint32_list B
[92,93,94,103,104,105,106,107,108,109,110]
==> uint32_list $ get_uint32 B (-2,3)
[92,93,94,103]
==> uint32_list $ get_uint32 B (8,100)
[108,109,110]
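The index adjustment can be modelled on ordinary lists. The following Python sketch is inferred from the examples above and is not clib's actual implementation; the function names get_slice and put_slice are hypothetical. Elements which would fall outside the target vector are simply dropped.

```python
def get_slice(xs, i, j):
    """Return elements i..j (inclusive), clamped to the valid index range."""
    lo = max(i, 0)
    hi = min(j, len(xs) - 1)
    return xs[lo:hi + 1]

def put_slice(xs, i, src):
    """Copy src into xs starting at index i, dropping whatever falls outside."""
    for k, x in enumerate(src):
        if 0 <= i + k < len(xs):
            xs[i + k] = x

# State of the vector before the slice operations in the example above.
v = [100, 0xFFFFFFFF] + list(range(102, 111))
put_slice(v, -2, list(range(90, 95)))
print(v)
print(get_slice(v, -2, 3))
print(get_slice(v, 8, 100))
```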
Clib provides the following enhanced and additional file functions:
public extern ::fopen NAME MODE, fdopen FD MODE, freopen NAME MODE F;
public extern fileno F;
public extern setvbuf F MODE;
public extern fconv F CODESET;
public extern tmpnam, tmpfile;
public extern ftell F, fseek F POS WHENCE;
public rewind F;
public extern gets, fgets F;
public extern fget F;
public extern ungetc C, fungetc F C;
These are all defined in clib.q and thus included in the prelude.
The fopen version of clib handles the `+' flag in mode
strings, thus enabling you to open files for both reading and writing. The
mode "r+" opens an existing file for both reading and writing; the
initial file contents are unchanged, and both the input and output file
pointers are positioned at the beginning of the file. The "w+" mode
creates a new file, or truncates it to zero size if it already exists, and
positions the file pointers at the beginning of the file. The "a+" mode
appends to an existing file (or creates a new one); the initial file pointer
is set at the beginning of the file for reading, and at the end of the file
for writing. All these modes also work in combination with the b
(binary file) flag.
The freopen function is like fopen, but reopens an
existing file object on another file. Just as in C programming, the main
purpose of this operation is to enable the user to redirect the standard
I/O streams associated with the interpreter process (available in the
interpreter by means of the INPUT, OUTPUT and ERROR
variables).
The fdopen function opens a new file object on a given file
descriptor, given that the mode is compatible. Conversely, the
fileno function returns the file descriptor of a file
object. (See also the functions for direct file descriptor manipulation
in Low-Level I/O.)
The setvbuf function sets the buffering mode for a file
(IONBF = no buffering, IOLBF = line buffering,
IOFBF = full buffering). This operation should be invoked right
after the file has been opened, before any I/O operations are
performed.
The fconv function sets the encoding of a file. By default, Q's
built-in I/O operations as well as clib's string I/O functions
assume the system encoding, and convert between this encoding and the
internal UTF-8 string representation as needed. If a text file uses an
encoding different from the system encoding, you can use the
fconv function to set the desired encoding. CODESET must
be a string denoting a valid encoding name for the iconv function
(see also Internationalization, below). This affects all
subsequent text read/write operations on the file. (This operation only
works for Unicode-capable systems which have iconv
installed. Also note that this function is only available for Q 7.0 and
later.)
The tmpnam and tmpfile functions work just like the
corresponding C routines: tmpnam returns a unique name for a temporary
file, and tmpfile constructs a temporary file opened in "w+b"
mode, which will be deleted automatically when it is closed. See the
tmpnam(3) and tmpfile(3) manual pages for details.
The ftell/fseek functions are used for file
positioning. The ftell function returns the current file
position, while the fseek function positions the file at the given
position. The rewind function provides a convenient shorthand for
repositioning the file at the beginning. These operations work just like
the corresponding C functions. The WHENCE argument of
fseek determines how the POS argument is to be
interpreted; it can be either SEEK_SET (POS is relative to
the beginning of the file, i.e., an absolute position), SEEK_CUR
(POS is relative to the current position) or SEEK_END
(POS is relative to the end of the file). In the latter two cases
POS can also be negative.
Portability Notes:
- Call fflush or
fseek before switching between reading and writing on a file opened
with the `+' flag.
- ftell and fseek might only work reliably if
the file is opened in binary mode (b flag).
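The three WHENCE modes behave the same way in any language built on top of the C library. As an illustration, the following Python sketch (not Q code) opens a temporary file in binary read/write mode and positions it in each of the three ways; Python's os.SEEK_SET, os.SEEK_CUR and os.SEEK_END play the role of the constants described above.

```python
import os
import tempfile

with tempfile.TemporaryFile("w+b") as f:      # binary read/write, like "w+b"
    f.write(b"The little brown fox.\n")
    end = f.tell()                            # position after the written text

    f.seek(0, os.SEEK_SET)                    # absolute position (like rewind)
    first = f.read(3)

    f.seek(1, os.SEEK_CUR)                    # relative to the current position
    word = f.read(6)

    f.seek(-5, os.SEEK_END)                   # relative to the end of the file
    tail = f.read()

print(end, first, word, tail)
```

Note that the sketch repositions the file after writing and before the first read, in line with the first portability note.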
The gets/fgets functions work like the C fgets
function, i.e., they read a line from standard input or the given file
including the trailing newline, if any. The fget function
reads an entire file at once and returns it as a string. The
ungetc/fungetc functions push back a single character on
standard input or the given input file, like the C ungetc
function. The C library only guarantees that pushing back a single ASCII
character will work, so the result of pushing back multiple or multibyte
characters is implementation-dependent.
Moreover, the following additional aliases are provided for C aficionados:
public ::readc as getc, ::freadc F as fgetc;
public ::writes S as puts, ::fwrites F S as fputs;
public ::writec C as putc, ::fwritec F C as fputc;
Open a new file for both reading and writing:
==> def F = fopen "test" "w+"
Write a string to the file:
==> fwrites F "The little brown fox.\n"
()
Current position is behind written string (at end-of-file):
==> ftell F
22
Rewind (go to the beginning of the file):
==> rewind F
()
Read back the string we've written before:
==> fgets F
"The little brown fox.\n"
Check that we're again at end-of-file:
==> feof F
true
Output another string:
==> fwrites F "The second line.\n"
()
Position behind the first string:
==> fseek F 22 SEEK_SET
()
Reread the second string:
==> fgets F
"The second line.\n"
And here's how to read an entire text file at once:
==> def T = fget (fopen "clib.q" "r")
To quickly compute a 32 bit checksum of the file:
==> sum (bytes (bytestr T)) mod 0x100000000
3937166
Finally, let's split the text into lines and add line numbers using
sprintf (see C-Style Formatted I/O):
==> def L = split "\n" T
==> def L = map (sprintf "%3d: %s\n") (zip [1..#L] L)
==> do writes (take 5 L)
  1:
  2: /* clib.q: Q's system module */
  3:
  4: /* This file is part of the Q programming system.
  5:
()
These functions provide an interface to the C printf and
scanf routines. They are all defined in clib.q and thus
included in the prelude.
public extern printf FORMAT ARGS, fprintf F FORMAT ARGS,
  sprintf FORMAT ARGS;
public extern scanf FORMAT, fscanf F FORMAT, sscanf S FORMAT;
Arguments to the printf routines and the results of the scanf
routines are generally encoded as tuples or single non-tuple values (if only
one item is read/written).
All the usual conversions and flags of C printf/scanf are
supported, except %p (pointer conversion). The basic h and
l length modifiers are also understood, but not the fancy ISO C99
extensions like ll or hh, or l modifiers on characters
and strings. Two further unsupported features of the printf functions
are the %n (number of written characters) conversion and explicit
argument indexing (m$); thus all arguments have to be in the same order
as specified in the printf format string. The %n conversion
is implemented for the scanf functions, though.
As these functions are simply wrappers for the corresponding C functions,
integer conversions are generally limited to values which fit into machine
integers. To handle integers of arbitrary sizes, you might treat them as
strings (%s) in the format string and do the actual conversion manually
with val or str.
Seasoned C programmers will appreciate that the wrapper functions
provided here are safe in that they check their arguments and
prevent buffer overflows, so they should never crash your program with a
segfault. To these ends, if a %s or %[...] conversion
without maximum field width is used with scanf, the field width
will effectively be limited to some (large) value chosen by the
implementation.
See the printf(3) and scanf(3) manual pages for a description of
the format string syntax. Some basic examples follow (<CR> indicates
that you hit the carriage return key to terminate a line):
==> printf "%d\n" 99
99
()
==> printf "%d\n" (99)
99
()
==> printf "%s %s %d\n" ("foo","bar",99)
foo bar 99
()
==> scanf "%d"
99<CR>
99
==> scanf "%s %s %d"
foo bar 99<CR>
("foo","bar",99)
As indicated, multiple values are denoted as tuples, and the printf
function accepts both a single value or a one-tuple for a single
conversion. The scanf function always returns a single, non-tuple value
if only a single conversion is specified. Zero items are represented using the
empty tuple. Note that you always have to supply the ARGS argument of
printf, thus you specify an empty tuple if there are no output
conversions:
==> printf "foo\n" ()
foo
()
The scanf function also returns an empty tuple if no input items are
converted. For instance (as usual, using the * flag with a scanf
conversion suppresses the corresponding input item):
==> scanf "%*s"
foo<CR>
()
Note that while scanf for most conversions skips an arbitrary
amount of leading whitespace, the trailing whitespace character at which
a conversion stops is not discarded by scanf. You can
notice this if you invoke, e.g., readc afterwards:
==> scanf "%s %d"; readc
foo 99<CR>
("foo",99)
"\n"
If you really have to skip the trailing whitespace character, you can do this with a suppressed character conversion, e.g.:
==> scanf "%s %d%*c"; writes "input: "||reads
foo 99<CR>
("foo",99)
input: <reads function waiting for input here>
The fprintf/fscanf functions work analogously, but are used when
writing or reading an arbitrary file instead of standard output or input. For
instance:
==> var msg = "You're not supposed to do that!"
==> fprintf ERROR "Error: %s\n" msg
Error: You're not supposed to do that!
()
The sprintf function returns the formatted text as a string instead of
writing it to a file:
==> sprintf "%s %s %d\n" ("foo","bar",99)
"foo bar 99\n"
Likewise, sscanf takes its input from a string:
==> sscanf "foo bar 99\n" "%s %s %d"
("foo","bar",99)
The %n conversion is especially useful with sscanf, since it
allows you to determine the number of characters which were actually consumed:
==> sscanf "foo bar 99 *** extra text here ***\n" "%s %s %d%n"
("foo","bar",99,10)
You might then use the character count, e.g., to check whether the input format matched the entire string, or whether there remains some text to be processed.
Some remarks about the role of the length modifiers h and
l are in order. Just as with the C scanf routines, you
need the l modifier to read a double precision value; a simple
%f will only read a single precision number:
==> scanf "%f"
1e100<CR>
inf
==> scanf "%lf"
1e100<CR>
1e+100
The printf functions, however, always print double precision numbers,
so the l modifier is not needed:
==> sprintf "%g" 1e100
"1e+100"
For the integer conversions, the h and l modifiers denote short
(usually 2 byte) and long (usually 4 byte) integer values. If the modifier is
omitted, the default integer type is used (this usually is the same as
long, but your mileage may vary).
As already indicated, the printf and scanf routines are limited
to machine integer sizes. Thus a scanf integer conversion will always
return a short or long integer value, depending on the length modifier
used. If a printf integer conversion is applied to a "big" integer
value, only the least significant bytes of the value are printed, as if the
printed number (represented in 2's complement if negative) had been cast to
the corresponding integer type in C. Thus the printed result will be
consistent with C printf output under all circumstances. For instance:
==> def N = 0xffff70008000 // big number
==> printf "%hu %lu\n" (N,N)
32768 1879080960
()
==> printf "%hd %ld\n" (N,N)
-32768 1879080960
()
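The values printed above follow directly from the C cast semantics. The following Python sketch reproduces them with integer arithmetic; it assumes a 16 bit short and a 32 bit long, as in the example, and the function name c_cast is a hypothetical helper rather than anything provided by clib.

```python
def c_cast(value, bits, signed):
    """Reduce an arbitrary-size integer to a C integer of the given width."""
    value &= (1 << bits) - 1                  # keep the least significant bits
    if signed and value >= 1 << (bits - 1):   # sign bit set: negative value
        value -= 1 << bits
    return value

N = 0xFFFF70008000
print(c_cast(N, 16, False), c_cast(N, 32, False))   # corresponds to %hu %lu
print(c_cast(N, 16, True), c_cast(N, 32, True))     # corresponds to %hd %ld
```

On systems where long is wider than 32 bits, the second value of each pair will of course differ.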
To correctly print a big integer value, you can convert it manually with Q's
built-in str function, then print the value using a %s
conversion:
==> printf "%s\n" (str 1234567812345678)
1234567812345678
()
Similarly, you can read a big integer value by reading it as a string and then applying the val builtin.
==> val (scanf "%s")
1234567812345678<CR>
1234567812345678
Here you might use the %[...] conversion to ensure that the number is
in proper format (the initial blank is needed here to skip any leading
whitespace):
==> val (scanf " %[0-9-]")
-1234567812345678
-1234567812345678
On output, the integer and floating point conversions can all be used with either integer or floating point arguments; integers will be converted to floating point values and vice versa if necessary:
==> printf "An integer: %d\n" 99.9
An integer: 99
()
==> printf "A floating point value: %e\n" 99
A floating point value: 9.900000e+01
()
These functions provide the same functionality as their C
counterparts. They are all to be found in system.q and thus you
have to explicitly import the system module to use them.
public extern rename OLD NEW;           // rename a file
public extern unlink NAME;              // delete a file
public extern truncate NAME LEN;        // truncate a file (U)
public extern getcwd, chdir NAME;       // get/set the working directory
public extern mkdir NAME MODE;          // create a new directory
public extern rmdir NAME;               // remove a directory
public extern readdir NAME;             // list the files in a directory
public extern link OLD NEW;             // create a hard link (U)
public extern symlink OLD NEW;          // create a symbolic link (U)
public extern readlink NAME;            // read a symbolic link
public extern mkfifo NAME MODE;         // create a named pipe (U)
public extern access NAME MODE;         // test access mode
public extern chmod NAME MODE;          // set the file mode
public extern chown NAME MODE UID GID;  // set file ownership (U)
public extern lchown NAME MODE UID GID; // set link ownership (U)
public extern utime NAME TIMES;         // set the file times
public extern umask N;                  // set/get file creation mask
public extern stat NAME, lstat NAME;    // file and link information
The stat/lstat functions return a tuple consisting of the
commonly available fields of the C stat struct, see
stat(2). For your convenience, the following mnemonic functions
are provided for accessing the different components:
public st_dev STAT, st_ino STAT, st_mode STAT, st_nlink STAT, st_uid STAT,
  st_gid STAT, st_rdev STAT, st_size STAT, st_atime STAT, st_mtime STAT,
  st_ctime STAT;
These all need the system module, so you have to import it in the
interpreter to make the following examples work. E.g.:
==> import system
With that out of the way, let's play around with some of these functions:
==> mkdir "tmp" 0777||chdir "tmp"||mkfifo "foo" 0666||\
rename "foo" "bar"||unlink "bar"||chdir ".."||rmdir "tmp"
()
(This creates a tmp subdirectory, changes to it, creates a new FIFO
special file, renames that file, deletes it, changes back to the original
directory, and removes the tmp directory, all in a single
expression which leaves the file system exactly as it was.)
Now for something more useful. We can retrieve the current umask while setting it to zero, and then reset it to the original value as follows:
==> def U = umask 0; oct; umask U || U; dec
022
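The reason for this two-step idiom is that the underlying C umask call can only report the old mask as a side effect of setting a new one. The same pattern is shown below as a Python sketch using the standard os.umask function; the helper name current_umask is made up for this example.

```python
import os

def current_umask():
    """Return the current file creation mask without changing it."""
    old = os.umask(0)    # setting a new mask returns the previous one
    os.umask(old)        # put the previous mask back right away
    return old

print(oct(current_umask()))
```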
List the files in the current directory:
==> readdir "."
[".","..","Makefile","givertcap","clib.c","clib.q","Makefile.am",
"Makefile.in","README-Clib","examples","Makefile.mingw"]
Get the size of a file:
==> st_size (stat "README-Clib")
355
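For comparison, here is the same kind of query against the C stat interface from Python, whose os.stat result exposes fields with the same names as the accessor functions listed above. This is just an illustrative sketch (not Q code) which creates a scratch file of a known size and reads back some of its stat fields.

```python
import os
import tempfile

# Create a scratch file with a known size.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 355)
    path = f.name

info = os.stat(path)
size, mode, mtime = info.st_size, info.st_mode, info.st_mtime
os.unlink(path)          # clean up the scratch file

print(size, oct(mode & 0o777), mtime)
```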
With the notable exception of the exit function which is included
in clib (and thus in the prelude), all the following functions
need an explicit import of the system module.
The system function returns the status code of the command if execution
was successful, and fails otherwise:
public extern system CMD;               // exec command using the shell
Clib also provides the usual UNIX process creation and management
routines. Most of these really require a UNIX system; no attempt is made
to emulate operations like fork on systems where they are not
implemented. Thus the only process operations which currently work under
Windows are system, exec, spawn, _spawn,
exit and getpid.
public extern fork;                     // fork a child process (U)
public extern exec PROG ARGS;           // execute program
public extern spawn PROG ARGS;          // execute program in child process
public extern _spawn MODE PROG ARGS;    // execute child with options
public extern nice INC;                 // change nice value (U)
public extern exit N;                   // exit process with given exit code
public extern pause;                    // pause until a signal occurs (U)
public extern raise SIG;                // raise signal in current process
public extern kill SIG PID;             // send signal to given process (U)
public extern getpid;                   // current process id
public extern getppid;                  // parent's process id (U)
public extern wait;                     // wait for any child process (U)
public extern waitpid PID OPTIONS;      // wait for given child process (U)
All these operations are simply wrappers for the corresponding C library
routines. Note, however, that the kill function takes the signal
to send as its first argument, which makes it easier to use
partial applications of the function, e.g., to iterate a kill operation
over a list of process numbers (as in `do (kill SIGTERM) PIDs').
The exec function performs a path search like the C
execlp/execvp function; the parameters for the program are
given as a string list ARGS, and as usual the first argument
should repeat the program file name. This function never returns unless
it fails. The spawn and _spawn operations are provided to
accommodate Windows' lack of fork and wait; these
functions work on both UNIX and Windows. The spawn function works
like exec, but runs the program in a new child process. It
returns the new process id (actually the process handle under
Windows). The _spawn function is like spawn, but accepts
an additional MODE parameter which determines how the child is to
be executed: P_WAIT (wait for the child, return its exit
status), P_NOWAIT (do not wait for the child, same as
spawn), P_OVERLAY (replace the current image with the new
process, same as exec) or P_DETACH (run the new process
in the background). (Note that the P_DETACH option is ignored on
UNIX systems; the correct way to code a "daemon" on UNIX is shown in
the examples section below.)
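As a minimal sketch of how the MODE parameter might be used, the following definitions run a program either synchronously or in the background. The names run_wait and run_bg are hypothetical helpers introduced here for illustration; they are not part of clib or system.q.

```q
import system;

/* hypothetical helpers, not part of system.q */
// run PROG with ARGS and wait for it; returns the child's exit status
run_wait PROG ARGS = _spawn P_WAIT PROG [PROG|ARGS];
// start PROG with ARGS without waiting; returns the process id (or handle)
run_bg PROG ARGS = _spawn P_NOWAIT PROG [PROG|ARGS];
```

Both helpers prepend the program name to the argument list, following the convention that the first argument repeats the program file name. For instance, `run_wait "ls" ["-l"]` would run `ls -l` and return only after it has finished.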
On UNIX, the following routines are provided to interpret the status
code returned by the system, _spawn, wait and
waitpid functions:
public extern isactive STATUS;
// process is active
public extern isexited STATUS, exitstatus STATUS;
// process has exited normally, get its exit code
public extern issignaled STATUS, termsig STATUS;
// process was terminated by signal, get the signal number
public extern isstopped STATUS, stopsig STATUS;
// process was stopped by signal, get the signal number
|
For more information about the process functions, we refer the reader to the corresponding UNIX manual pages.
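The status predicates above can be combined into a small decoding function. This is only a sketch for UNIX systems; describe is a hypothetical name, and the message texts are arbitrary.

```q
import system;

/* hypothetical helper: turn a status code into a readable message */
describe STATUS
  = sprintf "exited with code %d" (exitstatus STATUS) if isexited STATUS;
  = sprintf "terminated by signal %d" (termsig STATUS) if issignaled STATUS;
  = sprintf "stopped by signal %d" (stopsig STATUS) if isstopped STATUS;
  = "unknown status" otherwise;
```

It could be applied to the result of the system function, as in `describe (system "ls -l")`.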
Operations to access the process environment are also implemented. The
getenv function fails if the given variable is not set in the
environment; this lets you distinguish this error condition from a defined
variable with empty value. The setenv function overwrites an existing
definition of the given variable:
public extern getenv NAME, setenv NAME VAL; // get/set environment variables |
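Because getenv fails for an unset variable, a default value can be supplied with a second equation. The following is a sketch only; getenv_default is a hypothetical helper, not a clib function.

```q
/* hypothetical helper: value of NAME, or DEFAULT if NAME is not set */
getenv_default NAME DEFAULT
  = VAL where VAL:String = getenv NAME; // applies only if getenv succeeds
  = DEFAULT otherwise;
```

With this definition, `getenv_default "EDITOR" "vi"` would yield "vi" when EDITOR is not set, while a variable which is set to the empty string still yields "".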
On UNIX, the following operations provide access to process user and group information, as well as process groups and sessions. Not all operations may be implemented on all UNIX flavours. Please see the UNIX manual for a description of these functions.
/* User/group-related functions (U). */
public extern setuid UID, setgid GID; // set user/group id of process
public extern seteuid UID, setegid GID; // set effective user/group id
public extern setreuid RUID EUID, setregid RGID EGID;
// set real and effective ids
public extern getuid, geteuid; // get real/effective user id
public extern getgid, getegid; // get real/effective group id
public extern getlogin; // get real login name
// get/set supplementary group ids of current process
public extern getgroups, setgroups GIDS;
/* Session-related routines (U). */
public extern getpgid PID, setpgid PID PGID; // get and set process group
public extern getpgrp, setpgrp; // ditto, for calling process
public extern getsid PID; // get session id of process
public extern setsid; // create a new session
|
Invoke the system function to execute a shell command:
==> import system
==> system "ls -l"
|
You can also run the program directly with the spawn function:
==> spawn "ls" ["ls","-l"] |
Get and set an environment variable:
==> getenv "HOME"
"/home/ag"
==> getenv "MYVAR" // variable is undefined
getenv "MYVAR"
==> setenv "MYVAR" "foo bar"
()
==> getenv "MYVAR"
"foo bar"
|
Here are some examples demonstrating the use of named pipes and the
process functions on UNIX systems. The mkfifo function allows the
creation of so-called "FIFO special files" a.k.a. named pipes,
which provide a simple inter-process communication facility.
For instance, create a named pipe as follows:
==> mkfifo "pipe" 0666
()
|
You can then open the writeable end of the pipe:
==> def OUT = fopen "pipe" "w" |
Note that this call blocks until the input side of the pipe has been
opened. For this purpose, start another instance of the interpreter
(e.g., in another xterm), and from there open the pipe for
reading:
==> def IN = fopen "pipe" "r" |
Both fopen calls should now have finished, and you can write something
to the output end of the pipe in the first instance of the interpreter:
==> fwrites OUT "Hello, there!\n" |
Go to the other interpreter instance, and read back the string from there:
==> freads IN
"Hello, there!"
|
As usual, each end of the pipe is closed as soon as the corresponding file
object is no longer accessible. When you close the writeable end of the pipe
using, e.g., undef OUT in the first instance of the interpreter, the
input side of the pipe will reach end-of-file, and thus feof IN will
become true. After closing the pipe also on the input side, you can
remove the FIFO special file with the unlink function.
You can also use named pipes to set up a communication channel to child
processes created with fork. For instance:
import system;
def NAME = tmpnam;
def PIPE = mkfifo NAME 0666;
def MSG = "Hello there!\n";
test = printf "Parent writes: %s" MSG ||
fwrites (fopen NAME "w") MSG ||
writes "Parent waits for child ...\n" ||
printf "Parent: child has exited with code %d\n" wait
if fork > 0;
= printf "Child reads: %s\n" (freads (fopen NAME "r")) ||
writes "Child exiting ...\n" || exit 0
otherwise;
==> test
Parent writes: Hello there!
Parent waits for child ...
Child reads: Hello there!
Child exiting ...
Parent: child has exited with code 0
()
==> unlink NAME
()
|
Another method for accomplishing this with anonymous pipes is discussed in Low-Level I/O.
On UNIX, it is also possible to implement "daemons", i.e., processes which place themselves in the background and continue to run even when you log out. The following little script shows how to do this.
import system;
/* Becoming a daemon is easy: Just fork, have the parent exit, and call setsid
in the child to start a new session. The new process becomes a child of the
init process and has no controlling terminal. Thus it keeps running even if
you log out, until it gets killed or the system shuts down. */
daemon = setsid || main if fork = 0;
= exit 0 otherwise;
/* The main code of the daemon then closes file descriptors inherited from the
parent and starts executing. In this example we just open a logfile and
start logging messages in regular intervals. We also handle the condition
that we are terminated by a signal. */
main = do close [0,1,2] || log F "daemon started" ||
do (trap 1) [SIGINT, SIGTERM, SIGHUP, SIGQUIT] ||
catch (sig F) (loop F) where F:File = fopen "log" "w";
= perror "daemon" || exit 1 otherwise;
sig F (syserr SIG)
= log F (sprintf "daemon stopped by signal %d" (-SIG)) ||
exit 0;
loop F = sleep 5 || log F "daemon still alive" || loop F;
log F MSG = fprintf F "%s at %s" (MSG, ctime time) || fflush F;
|
These functions provide operations for direct manipulation of files on
the file descriptor level. They are all in the system module. For
a closer description of the following operations we refer the reader to
the corresponding UNIX manual pages.
public extern open NAME FLAGS MODE; // create a new descriptor
public extern close FD; // close a descriptor
public extern dup FD, dup2 OLDFD NEWFD; // duplicate a descriptor
public extern pipe; // create an unnamed pipe
public extern fstat FD; // stat descriptor
public extern fchdir FD; // change directory (U)
public extern fchmod FD MODE; // change file mode (U)
public extern fchown FD UID GID; // set file ownership (U)
public extern ftruncate FD LEN; // truncate a file (U)
public extern fsync FD, fdatasync FD; // sync the given file (U)
|
The following operations can be used on both file descriptors and file
objects. They read and write binary data represented as byte strings
(see Byte Strings), providing an interface to the system's
read/write(2) and fread/fwrite(3)
functions. The bread function returns a byte string of the given
size read from the given file. Note that the returned byte string may
actually be shorter than SIZE bytes because, e.g., end of file
has been reached or not enough input was currently available on a
pipe. The bwrite function returns the number of bytes actually
written which is usually the size of the byte string unless an error
occurs. The functions fail if an error occurred before anything was read
or written. It is the application's responsibility to check these error
conditions and handle them in an appropriate manner.
public extern bread FD SIZE; // read a byte string
public extern bwrite FD DATA; // write a byte string
|
For instance, the following function uses bread and bwrite
to copy an input to an output file, using chunks of 8192 bytes at a
time:
fcopy F G = () if bwrite G (bread F 8192) < 8192;
= fcopy F G otherwise;
|
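The fcopy function can be applied to descriptors obtained with open. The following sketch copies one named file to another; copyfile is a hypothetical helper, the 0644 mode is an arbitrary choice, and error handling is deliberately minimal (a descriptor already opened is not closed if the second open fails).

```q
import system;

/* hypothetical helper: copy file SRC to file DEST using fcopy from above */
copyfile SRC DEST = fcopy F G || close F || close G
  where F:Int = open SRC O_RDONLY 0,
        G:Int = open DEST (O_WRONLY or O_CREAT or O_TRUNC) 0644;
```

The type guards make the equation inapplicable if either open call fails, so that a failed copy shows up as an unevaluated copyfile expression.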
The file pointer of a descriptor can be positioned with lseek. Unlike
fseek, this function returns the new offset. To
determine the current position you can hence use an expression like
`lseek FD 0 SEEK_CUR'.
public extern lseek FD POS WHENCE; |
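Since lseek returns the new offset, it can also be used to determine the size of a file. The following is a sketch under the assumption that the SEEK_END constant is available alongside SEEK_SET and SEEK_CUR; fdsize is a hypothetical name.

```q
import system;

/* hypothetical helper: size of the file open on FD; the current position
   is saved first and restored afterwards */
fdsize FD:Int = lseek FD POS SEEK_SET || SIZE
  where POS:Int = lseek FD 0 SEEK_CUR,   // remember where we are
        SIZE:Int = lseek FD 0 SEEK_END;  // offset of end of file = size
```

The equation does not apply to descriptors which cannot be positioned (such as pipes), because lseek fails on them.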
Some terminal-related routines are also provided:
public extern isatty FD; // is descriptor a terminal? |
The following are UNIX-specific:
public extern ttyname FD; // terminal associated with descriptor
public extern ctermid; // name of controlling terminal
public extern openpty, forkpty; // pseudo terminal operations
|
The openpty function returns a pair (MASTER, SLAVE) of
file descriptors opened for both reading and writing on a "pseudo
terminal". MASTER is to be used in the controlling process,
while SLAVE can be used for the standard I/O streams in a child
process. The forkpty function combines openpty with
fork and makes the slave device the controlling terminal of the
child process; it returns a pair (PID, MASTER), where PID
is zero in the child process and the process id of the child in the
parent, and MASTER is the master end of the pseudo terminal to be
used by the parent. These functions are commonly used to implement
applications which drive other programs through a terminal emulation
interface.
On UNIX systems, clib also provides access to the following
fcntl operation (see Section 2 of the UNIX manual):
public extern fcntl FD CMD ARG; |
The ARG parameter of fcntl depends on the type of command
CMD which is executed. The available command codes and other
relevant values are defined as global variables, as listed below. Flags
are bitwise disjunctions of the symbolic values listed below. (The
following are the values present on most systems. Specific
implementations may provide additional flags.)
public var const
// fcntl command codes
F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL, F_GETLK, F_SETLK, F_SETLKW,
// lock types
F_RDLCK, F_WRLCK, F_UNLCK,
// file access modes and access mode bitmask
O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE,
// file descriptor flags
FD_CLOEXEC,
// status flags
O_CREAT, O_EXCL, O_TRUNC, O_APPEND, O_NONBLOCK, O_NDELAY, O_NOCTTY,
O_BINARY;
|
The following types of commands are implemented:
fcntl FD F_DUPFD ARG // duplicate a file descriptor
fcntl FD F_GETFD () // get file descriptor flags
fcntl FD F_SETFD FLAGS // set file descriptor flags
fcntl FD F_GETFL () // get status flags/access mode
fcntl FD F_SETFL FLAGS // set status flags
fcntl FD F_GETLK (TYPE,POS,LEN[,WHENCE]) // query file lock information
fcntl FD F_SETLK (TYPE,POS,LEN[,WHENCE]) // set an advisory file lock
fcntl FD F_SETLKW (TYPE,POS,LEN[,WHENCE]) // blocking variant of F_SETLK
|
The first five commands serve to duplicate descriptors and to retrieve
and change the file descriptor and status flags. The remaining commands
are used for advisory file locking. A file lock is specified as a triple
(TYPE, POS, LEN) or quadruple (TYPE, POS, LEN, WHENCE),
where TYPE is the type of lock (F_RDLCK, F_WRLCK or
F_UNLCK for read locks, write locks and unlocking, respectively),
POS the position in the file, LEN the number of bytes to
be locked (0 means up to the end of the file) and WHENCE
specifies how the POS argument is to be interpreted. (This
parameter has the same meaning as for the fseek and lseek
functions, see above. If WHENCE is omitted, it defaults to
SEEK_SET, i.e., absolute positions.) The value returned by the
F_GETLK command is the lock description with TYPE set to
F_UNLCK if the given lock would be accepted, and the description
of a current lock blocking the lock request otherwise. (In the latter
case the return value is actually a quadruple, with the id of a process
currently owning a conflicting lock in the last component.)
Note that the standard I/O operations use buffered I/O by default which
might interfere with record locking. Therefore in applications requiring
individual record locking you should work with the low-level operations
(open, bwrite, etc.) instead.
The following select function waits for a set of files to change
I/O status. Note that this operation is available on Windows as part of
the socket interface, but it only applies to sockets there.
public extern select FILES; |
The input is a tuple (IN, OUT, ERR, TIMEOUT) consisting of three
lists of file descriptors and/or file objects to be watched, and an
optional integer or floating point value specifying a timeout in
seconds. The function returns as soon as either a member of IN or
OUT becomes ready for performing an I/O operation (without
blocking), or an error condition is signaled for a member of
ERR. The returned value is a triple (IN, OUT, ERR) with
all the members of the original lists which are now ready for I/O. If
the timeout is exceeded before any of the files has become ready, a
triple of three empty lists is returned. If no timeout is specified then
the function may block indefinitely.
These examples are mostly UNIX-specific, thus Windows users might wish to skip ahead.
The following definitions show how the fcntl function can be used
to change a file's "non-blocking" flag. This is useful, e.g., if we
want to read from standard input or a pipe but do not want to be blocked
until input becomes available. Instead, having set the non-blocking
flag, input operations will fail immediately if there is no input to be
read right now.
/* set and clear the O_NONBLOCK flag of a file */
set_nonblock FD:Int = fcntl FD F_SETFL (FLAGS or O_NONBLOCK)
where FLAGS = fcntl FD F_GETFL ();
clr_nonblock FD:Int = fcntl FD F_SETFL (FLAGS and not O_NONBLOCK)
where FLAGS = fcntl FD F_GETFL ();
|
And here is how we can perform advisory locking on an entire file.
/* place an advisory read or write lock on an entire file (fail if error) */
rdlock FD:Int = () where () = fcntl FD F_SETLK (F_RDLCK, 0, 0);
wrlock FD:Int = () where () = fcntl FD F_SETLK (F_WRLCK, 0, 0);
/* remove the lock from the file */
unlock FD:Int = () where () = fcntl FD F_SETLK (F_UNLCK, 0, 0);
/* predicates to check whether a read or write lock could be placed */
rdlockp FD:Int = (LOCK!0 = F_UNLCK)
where LOCK:Tuple =
fcntl FD F_GETLK (F_RDLCK, 0, 0);
wrlockp FD:Int = (LOCK!0 = F_UNLCK)
where LOCK:Tuple =
fcntl FD F_GETLK (F_WRLCK, 0, 0);
|
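The same pattern extends to locking individual records. The sketch below assumes, like the definitions above, that a successful locking request returns (); lock_record and unlock_record are hypothetical names.

```q
import system;

/* hypothetical helpers: write-lock and unlock LEN bytes starting at absolute
   position POS; lock_record blocks until the lock is granted (F_SETLKW) */
lock_record FD:Int POS:Int LEN:Int
  = () where () = fcntl FD F_SETLKW (F_WRLCK, POS, LEN);
unlock_record FD:Int POS:Int LEN:Int
  = () where () = fcntl FD F_SETLK (F_UNLCK, POS, LEN);
```

Because WHENCE is omitted, POS is taken as an absolute position (SEEK_SET).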
Note that to apply these functions to standard file objects you can use
the fileno function (see Extended File Functions) as
follows:
==> rdlock (fileno F) |
The select function accepts both files and file descriptors as
input. Here is a way to test whether input is currently available from a
file/descriptor:
avail F = not null (select ([F],[],[],0)!0); |
This is useful, in particular, if the file is actually a pipe. For instance:
==> def F = popen "sleep 5; echo done" "r"
==> avail F // no input to be read yet, wait ...
false
==> avail F // ... input available now
true
==> fget F
"done\n"
|
However, most of the time select is used for multiplexing I/O
operations. For instance, the following loop processes input from a set
of files, one line at a time:
loop FILES = loop (proc FILES F)
where ([F|_],_,_) = select (FILES,[],[])
if not null FILES;
= () otherwise;
proc FILES F = // done with this file, get rid of it
filter (neq F) FILES
if feof F;
= // process a line
writes (fgets F) || FILES;
|
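The TIMEOUT component of the select argument can be used to bound the time spent waiting for input. The following is a sketch only; timed_gets is a hypothetical helper, and the fgets call may still block if the file becomes readable before a complete line has arrived.

```q
import system;

/* hypothetical helper: read a line from F if input arrives within SECS
   seconds, return () otherwise */
timed_gets F SECS
  = fgets F where ([_|_],_,_) = select ([F],[],[],SECS); // F is ready
  = () otherwise;                                        // timeout
```

The pattern [_|_] only matches a nonempty list, so the first equation applies exactly when select reports F as ready for reading.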
The pipe and dup2 operations provide a quick way to
reassign input and output of a child process and connect it to
corresponding file objects in the parent. For instance, here's how we
can implement a popen2 function which works like the built-in
popen routine, but allows you to redirect both the input and
output side of a child process:
import system;
/* Create two unnamed pipes, one for the parent to read and the child to
write, the other one for the child to read and the parent to write. */
popen2 CMD = spawn2 CMD (P_IN, P_OUT) (C_IN, C_OUT)
where (P_IN, C_OUT) = pipe, (C_IN, P_OUT) = pipe;
/* Fork the child and redirect its standard input and output streams to the
child's ends of the pipe. This is accomplished with dup2 SRC DEST which
closes the file descriptor DEST and then makes DEST a copy of SRC. In the
parent we use fdopen to open two new file objects for the parent's ends of
the pipes. */
spawn2 CMD (P_IN, P_OUT) (C_IN, C_OUT)
= close P_IN || close P_OUT ||
dup2 C_IN (fileno INPUT) || dup2 C_OUT (fileno OUTPUT) ||
exec "/bin/sh" ["/bin/sh", "-c", CMD]
if fork = 0;
= close C_IN || close C_OUT ||
(fdopen P_IN "r", fdopen P_OUT "w")
otherwise;
|
The popen2 function employs fork and exec to spawn
a child process which executes the given command using the shell, after
redirecting the child's input and output to two pipes. In the parent
process, the popen2 function returns a pair of files opened on
the other ends of the child's descriptors.
The following piece of Q code shows how to apply the popen2 function
defined above in order to pipe a string list into the UNIX sort program
and construct the sorted list from the output:
mysort STRL = do (fprintf OUT "%s\n") STRL || fclose OUT || digest IN
where (IN, OUT) = popen2 "sort";
digest IN = [] if feof IN;
= [freads IN|digest IN] otherwise;
|
Example:
==> mysort ["five","strings","to","be","sorted"]
["be","five","sorted","strings","to"]
==> wait // get exit code of child process (sort program)
(15804,0)
|
The following (UNIX-specific) operations from the system module
provide access to the POSIX termios interface. Terminal attributes
are stored in a "termios" structure, represented as a 7-tuple
(IFLAG, OFLAG, CFLAG, LFLAG, ISPEED, OSPEED, CC). The control
character set CC is represented as a list of character numbers,
indexed by the symbolic constants VEOF etc. See termios(3)
for further details.
public extern tcgetattr FD; // get terminal attributes
public extern tcsetattr FD WHEN ATTR; // set terminal attributes
public extern tcsendbreak FD DURATION; // send break
public extern tcdrain FD; // wait until all output finished
public extern tcflush FD QUEUE; // flush input or output queue
public extern tcflow FD ACTION; // control input/output flow
public extern tcgetpgrp FD; // get terminal process group
public extern tcsetpgrp FD PGID; // set terminal process group
/* Access components of the termios structure. */
public c_iflag ATTR, c_oflag ATTR, c_cflag ATTR, c_lflag ATTR, c_ispeed ATTR,
c_ospeed ATTR, c_cc ATTR;
|
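As a small sketch of how the accessor functions might be used, the following definition tests whether echoing is enabled on a terminal. The name echo_enabled is hypothetical; ECHO is the usual termios local mode flag.

```q
import system;

/* hypothetical helper: is the ECHO flag set in the local modes of FD? */
echo_enabled FD:Int = (c_lflag ATTR and ECHO) <> 0
  where ATTR:Tuple = tcgetattr FD; // fails if FD is not a terminal
```

For a file object F, the descriptor can be obtained with fileno, as in `echo_enabled (fileno F)`.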
This example shows how to use the termios functions to read a
password from the terminal without echoing. This is an almost literal
translation of the C program described in Richard Stevens: Advanced
Programming in the UNIX Environment, Addison-Wesley, 1993,
cf. p. 350. The main difference is that we merely ignore the
SIGINT and SIGTSTP signals instead of blocking them (the
latter is not supported by Q's trap builtin).
import system;
getpass PROMPT = fwritec F "\n" || unprep F SAVE || PW
where F:File = fopen ctermid "r+", SAVE = prep F,
PW = fwrites F PROMPT || fflush F || freads F;
/* prep F: ignore SIGINT and SIGTSTP and prepare the terminal */
prep F = tcsetattr (fileno F) TCSAFLUSH NATTR || (ATTR,TRAPS)
where TRAPS = map (trap SIG_IGN) [SIGINT,SIGTSTP],
ATTR = tcgetattr (fileno F),
(IF,OF,CF,LF,IS,OS,CC) = ATTR,
LF = LF and not (ECHO or ECHOE or ECHOK or ECHONL),
NATTR = (IF,OF,CF,LF,IS,OS,CC);
/* unprep F SAVE: revert to previous settings */
unprep F (ATTR,TRAPS)
= tcsetattr (fileno F) TCSAFLUSH ATTR ||
zipwith trap TRAPS [SIGINT,SIGTSTP];
|
The following functions from system.q provide a basic interface
to the GNU readline library. While readline doesn't really belong to the
POSIX interface, it is rather useful and the interpreter uses it anyway,
so it makes sense to also provide it as a part of the system interface.
public extern readline PROMPT, rl_line_buffer;
public extern add_history LINE, stifle_history MAX;
public extern read_history FNAME, write_history FNAME;
|
These functions work like their C counterparts. The readline
function prompts with the given string and reads an input line,
providing editing and history facilities. Basic bash-like filename
completion is also provided. The rl_line_buffer function returns
the text of the current input line, which is useful, e.g., if a custom
completion function (see below) needs to inspect surrounding context.
The add_history routine adds a line to the history. The maximum
size of the history can be set with stifle_history, which also
returns the size which was set previously. A negative MAX value
makes the size of the history unbounded (which is also the
default). Last but not least, the history can be read from and written
to a file with the read_history and write_history
functions.
For instance, the following commands read an input string, add it to the history, and finally save the history to a file:
==> import system
==> readline "input> "
input> foo bar
"foo bar"
==> add_history _
()
==> write_history "myhistory"
()
|
As another example, here is the definition of a little convenience
function which reads a line using readline and enters it into the
history if it is nonempty:
get_line PROMPT = if not null LINE then add_history LINE || LINE
where LINE:String = readline PROMPT;
|
Readline's standard filename completion facility can be augmented with a custom completion function. This is achieved by simply setting the following global variable to the desired function:
public var RL_COMPLETION_FUNCTION; |
The completion function is invoked with two arguments, the string to be
completed and the position index of that string in the input line
(rl_line_buffer, see above), and is expected to return a string
list with the possible completions. For instance, here is a simple
example of a function which checks a list of "command" words for
possible completions:
def RL_COMPLETION_FUNCTION = complete;
complete S _ = filter (is_prefix S) ["bar","foo","gnats","gnu"];
is_prefix X Y = (X=sub Y 0 (#X-1));
|
The position index argument is useful if the completion depends on the position inside the input line. For instance, the following function only attempts completion at the beginning of the input line:
complete S 0 = filter (is_prefix S) ["bar","foo","gnats","gnu"]; |
Readline's default behaviour is to try a custom completion first, if it
is available, and to fall back to the standard filename completion
otherwise. The latter can be suppressed by ending the list of
completions with a () entry:
complete S 0 = filter (is_prefix S) ["bar","foo","gnats","gnu"] ++ [()];
complete _ _ = [()] otherwise;
|
The text units for which readline attempts completion are the "words" of the input line. There is a `RL_WORD_BREAK_CHARS' variable which allows you to change readline's idea of what a word is, by setting the variable to a string containing the word delimiter characters. The default definition of this variable is as follows:
public var RL_WORD_BREAK_CHARS = " \t\n\"\\'`@$><=;|&{(";
|
Redefining RL_WORD_BREAK_CHARS affects both readline's cursor
movement commands and the behaviour of the completion routine. For
instance, to turn the comma into a word delimiter, you can change
RL_WORD_BREAK_CHARS as follows:
def RL_WORD_BREAK_CHARS = RL_WORD_BREAK_CHARS++","; |
The functions described here are all in the system module. Error
codes from various system operations can be retrieved with the following
functions:
public extern errno, seterrno N; // get/set last error code
public extern perror S, strerror N; // print error message
|
The perror function is commonly used to report error conditions
in system operations on the standard error file. For instance:
==> fopen "/etc/passw" "r"
fopen "/etc/passw" "r"
==> perror "fopen"
fopen: No such file or directory
()
|
If more elaborate formatting is required then you can use
strerror on the errno value to obtain the error message as
a string:
==> fprintf ERROR "fopen returned message `%s'\n" (strerror errno)
fopen returned message `No such file or directory'
()
|
Note that errno is only set when an error occurs in a system
call. You can use seterrno to reset the errno value before
a system operation to check whether there actually was an error while
executing the system call:
==> seterrno 0
()
==> fopen "/etc/passwd" "r"
<<File>>
==> perror "fopen"
fopen: Success
()
|
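The two steps, resetting errno and inspecting it after a failed operation, can be wrapped into a single definition. This is a sketch only; open_or_msg is a hypothetical helper.

```q
import system;

/* hypothetical helper: open NAME for reading and return the file object,
   or return the system's error message as a string if fopen fails */
open_or_msg NAME
  = F where F:File = seterrno 0 || fopen NAME "r";
  = strerror errno otherwise;
```

The caller can then distinguish the two outcomes by checking whether the result is a file or a string.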
The remaining operations are used to obtain various information about
the system and its information databases. For instance, the uname
operation returns a 5-tuple containing information identifying the
operating system (this operation is generally only available on
UNIX-like systems):
public extern uname;
/* Access components of uname result. */
public un_sysname UNAME, un_nodename UNAME, un_release UNAME,
un_version UNAME, un_machine UNAME;
|
The hostname of the system can be retrieved with the gethostname
function.
public extern gethostname; |
The password and group database can be accessed with the following
functions. Password entries are encoded as 7-tuples (NAME, PASSWD,
UID, GID, GECOS, DIR, SHELL), group entries as 4-tuples (NAME,
PASSWD, GID, MEMBERS). This information is only available on UNIX-like
systems.
public extern getpwuid UID, getpwnam NAME; // look up a password entry
public extern getpwent; // list of all pw entries
public extern getgrgid GID, getgrnam NAME; // look up group entry
public extern getgrent; // list of all group entries
/* Access components of password and group structures. */
public pw_name PW, pw_passwd PW, pw_uid PW, pw_gid PW, pw_gecos PW,
pw_dir PW, pw_shell PW;
public gr_name GR, gr_passwd GR, gr_gid GR, gr_members GR;
|
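The accessor functions make it easy to extract single fields from these entries. The following definitions are a sketch for UNIX-like systems; home_dir, group_members and my_name are hypothetical names.

```q
import system;

/* hypothetical helpers built from the accessors above; the Tuple guards
   make the equations inapplicable if the lookup fails */
home_dir USER = pw_dir PW where PW:Tuple = getpwnam USER;
group_members GROUP = gr_members GR where GR:Tuple = getgrnam GROUP;
// login name of the user running the current process
my_name = pw_name PW where PW:Tuple = getpwuid getuid;
```

For instance, `home_dir "root"` would look up the password entry of the root user and return its home directory.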
Moreover, the crypt function can be used to perform UNIX password
encryption (see crypt(3) for details).
public extern crypt KEY SALT; // (U) |
The following functions can be used to query host information as well as
the network protocols and services available on your system. This
information is closely related to the socket interface described in
Sockets. For a closer description of these operations we refer the
reader to the corresponding manual pages. Note that the
gethostent, getprotoent and getservent operations
are not available on Windows.
The host database: Host entries are of the form (NAME, ALIASES,
ADDR_TYPE, ADDR_LIST), where NAME denotes the official hostname,
ALIASES its alternative names, ADDR_TYPE the address
family and ADDR_LIST the list of addresses.
public extern gethostbyname HOST, gethostbyaddr ADDR;
public extern gethostent; // (U)
public h_name HENT, h_aliases HENT, h_addr_type HENT, h_addr_list HENT;
|
Note that both hostnames and IP addresses are specified as
strings. Hostnames are symbolic names such as "localhost", and
can also have a domain name specified, as in "www.gnu.org". IPv4
addresses use the well-known "numbers-and-dots" notation, like the
loopback address "127.0.0.1". IPv6 addresses are usually written
as eight 16-bit hexadecimal numbers that are separated by colons; two
colons are used to abbreviate strings of consecutive zeros. For example,
the IPv6 loopback address "0:0:0:0:0:0:0:1" can be abbreviated as
"::1".
The protocol database: Protocol entries are of the form
(NAME, ALIASES, PROTO) denoting official name, aliases and number
of the protocol.
public extern getprotobyname NAME;
public extern getprotobynumber PROTO;
public extern getprotoent; // (U)
public p_name PENT, p_aliases PENT, p_proto PENT;
|
The service database: Service entries are of the form (NAME,
ALIASES, PORT, PROTO) denoting official name, aliases, port number and
protocol name of the service. The NAME argument of
getservbyname can also be a pair (NAME, PROTO) to restrict
the search to services for the given protocol (given by its
name). Likewise, the PORT argument can also be given as
(PORT, PROTO).
public extern getservbyname NAME;
public extern getservbyport PORT;
public extern getservent; // (U)
public s_name PENT, s_aliases PENT, s_port PENT, s_proto PENT;
|
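As a brief sketch, the host and service databases can be queried through the accessor functions as follows. The names host_addrs and tcp_port are hypothetical helpers.

```q
import system;

/* hypothetical helpers; the Tuple guards make the equations inapplicable
   if the lookup fails */
// list of addresses of the given host
host_addrs HOST = h_addr_list HENT where HENT:Tuple = gethostbyname HOST;
// port entry of the named service, restricted to the tcp protocol
tcp_port NAME = s_port SENT where SENT:Tuple = getservbyname (NAME, "tcp");
```

The second definition uses the (NAME, PROTO) form of the argument to restrict the search to TCP services.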
For instance, here is some information retrieved from a typical Linux system:
==> import system
==> uname
("Linux","obelix","2.4.19-4GB","#2 Tue Mar 4 16:03:51 CET 2003","i686")
==> gethostname
"obelix"
==> gethostbyname gethostname
("obelix.local",["obelix"],2,["127.0.0.2"])
==> gethostbyaddr "::1"
("localhost",["ipv6-localhost","ipv6-loopback"],10,["::1"])
==> getprotobyname "tcp"
("tcp",["TCP"],6)
==> getservbyname "ftp"
("ftp",[],5376,"tcp")
==> getpwuid getuid
("ag","x",500,100,"Albert Gräf","/home/ag","/bin/bash")
==> getgrgid getgid
("users","x",100,[])
|
The following functions from the system module are available on
systems providing a BSD-compatible socket layer, which, besides BSD,
includes BeOS, Linux, Mac OS X, Windows and most recent System V
flavours. Sockets provide bidirectional communication channels on the
local machine as well as across the network. Sockets are represented by
file descriptors which can be written to and read from with the
send and recv functions. On most systems, socket
descriptors are just ordinary file descriptors which can also be used
with low-level I/O functions and fdopen as usual. However, on
some systems (in particular, BeOS and Windows), socket descriptors are
"special" and all socket I/O must be performed with the special socket
operations.
At creation time, a socket is described by the following attributes (see
socket(2) for more details):
The address family, which is one of AF_LOCAL (a.k.a. AF_UNIX a.k.a. AF_FILE),
AF_INET and AF_INET6. Not all protocol families may be
available on all systems (e.g., the Windows socket library only supports
AF_INET).
The socket type, which is one of SOCK_STREAM,
SOCK_DGRAM, SOCK_SEQPACKET, SOCK_RAW and
SOCK_RDM. Among these, SOCK_STREAM, SOCK_SEQPACKET
and SOCK_RDM are connection-based. Please note that not all
socket types are supported for all protocol families, and some socket
types may be entirely missing on non-UNIX systems.
The protocol. For the AF_LOCAL namespace, which refers to the local filesystem, the
protocol is always 0, the default protocol. The available protocols for
the internet namespaces can be retrieved from the protocol database (see
System Information).
Before another process can connect to a socket it must also be bound to
an address. The address format depends on the address family of the
socket. For the local namespace, the address is simply a filename on
the local filesystem. For the IPv4 namespace, it is a pair (HOST,
PORT) where HOST denotes the host name or IP address, specified
as a string, and PORT is a port number. For the IPv6 namespace,
it is a quadruple (HOST, PORT, FLOWINFO, SCOPEID). Host names and
known port numbers can be retrieved from the host and service databases
(see System Information).
The following operations are provided to create a socket, or a pair of connected sockets, for the given address family, socket type and protocol. They return the file descriptor of the socket (or a pair of file descriptors).
public extern socket FAMILY TYPE PROTO;
public extern socketpair FAMILY TYPE PROTO;     // (U)
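As a minimal sketch of how these calls can be combined (assuming a UNIX system, since socketpair is marked (U)), the following hypothetical roundtrip function pushes a string through one end of a connected pair and reads it back from the other, using the send and recv operations of this section. The function name and the 4096-byte buffer size are illustrative choices, not part of clib.

```q
import system;

/* hypothetical helper: write MSG to one end of a connected socket pair,
   read it back from the other end, then close both descriptors */
roundtrip MSG   = closesocket A || closesocket B || bstr DATA
                    where (A:Int,B:Int) = socketpair AF_LOCAL SOCK_STREAM 0,
                      N:Int = send A 0 (bytestr MSG),
                      DATA:ByteStr = recv B 0 4096;
```

If any step fails, the equation does not apply and roundtrip MSG is left unevaluated; a more careful version would also close the descriptors in that case.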
The shutdown function terminates data transmission on a
socket. You can stop reading, writing or both, depending on whether
HOW is SHUT_RD, SHUT_WR or SHUT_RDWR. Note
that this operation does not close the socket's file descriptor; for
this purpose closesocket is used (see below).
public extern shutdown SOCKET HOW;
The closesocket function closes a socket. On most systems this is
just identical to close (see Low-Level I/O), but, as
already noted, on some systems socket descriptors are special and you
must use this function instead.
public extern closesocket SOCKET;
The bind function binds a socket to an address. An address is also
assigned automatically when the socket is first used. However, if the
socket has to be found by another process, you will have to specify its
address explicitly with bind.
public extern bind SOCKET ADDR;
The following operations are used to start listening for and accept
connection requests on a socket. These operations are used on the server
side of a connection-based socket. The argument N of
listen denotes the maximum number of pending connection requests
for the server. After the call to listen, the server can accept
connections from a client with the accept function, which returns
a pair (SOCKET, ADDR), where SOCKET is a new socket
connected to the client, and ADDR is the client's address.
public extern listen SOCKET N;
public extern accept SOCKET;
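The server side of a connection-based socket might be sketched as follows. This is only an illustration: the names echo_server, accept_loop and ECHO_ADDR, the port number 5003, the backlog of 5 and the 4096-byte buffer are all assumptions made for this example. It uses bind, closesocket and the send and recv operations from this section.

```q
import system;

def ECHO_ADDR = ("localhost",5003); // hypothetical server address

/* create a stream socket, bind it to ECHO_ADDR and listen for clients */
echo_server     = listen FD 5 || accept_loop FD
                    where FD:Int = socket AF_INET SOCK_STREAM 0,
                      () = bind FD ECHO_ADDR;
                = perror "echo_server" otherwise;

/* accept one client at a time, echo back its first message, hang up */
accept_loop FD  = send CLIENT 0 (recv CLIENT 0 4096) ||
                  closesocket CLIENT || accept_loop FD
                    where (CLIENT:Int,_) = accept FD;
                = closesocket FD otherwise;
```

Each accepted connection gets its own descriptor CLIENT, while FD keeps listening for further requests. The loop ends, and the listening socket is closed, as soon as accept fails.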
The connect function is used to initiate a connection on a
socket. This function can be used on both connection-based and
connectionless sockets. In the former case, connect can only be
invoked once. In the latter case, it can be invoked multiple times, and
sets the remote socket for subsequent send and receive operations.
public extern connect SOCKET ADDR;
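A client for a connection-based IPv4 socket could look roughly like this. The sketch assumes that a server is already listening at ADDR and answers each connection with a single message; the function name request and the 4096-byte buffer size are hypothetical. It uses the send and recv operations from this section.

```q
import system;

/* hypothetical helper: connect to ADDR, send the string MSG and return
   the reply as a string; ADDR is a (HOST,PORT) pair as used by bind */
request ADDR MSG = closesocket FD || bstr REPLY
                    where FD:Int = socket AF_INET SOCK_STREAM 0,
                      N:Int = connect FD ADDR || send FD 0 (bytestr MSG),
                      REPLY:ByteStr = recv FD 0 4096;
```

If the connection cannot be established, the subsequent send does not yield an integer, so the equation does not apply and the request expression is left unevaluated.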
The following routines retrieve information about a socket. The local
address of a socket and the address of the remote socket it is connected
to can be retrieved with getsockname and
getpeername. Socket options, specified using a protocol level
LEVEL and an option index OPT, can be queried and changed
with getsockopt and setsockopt. The option values are
encoded as byte strings (cf. Byte Strings). For a description of
the available options see getsockopt(2).
public extern getsockname SOCKET;
public extern getpeername SOCKET;
public extern getsockopt SOCKET LEVEL OPT;
public extern setsockopt SOCKET LEVEL OPT VAL;
Finally, the following specialized I/O functions are used to transmit
data over a socket. All data is encoded as byte strings. The receive
operations return the received data (which may be shorter than the
requested size if not enough data is currently available); the send
operations return the number of bytes actually written. For
recvfrom/sendto the data is encoded as a pair (ADDR,
DATA) which includes the source/destination address; these operations
are typically used for connectionless sockets. The FLAGS argument
is used to specify special transmission options (see the MSG_*
constants at the beginning of clib.q).
public extern recv SOCKET FLAGS SIZE, send SOCKET FLAGS DATA;
public extern recvfrom SOCKET FLAGS SIZE, sendto SOCKET FLAGS DATA;
The following script demonstrates how we can implement a connectionless
server in the IPv4 namespace which repeatedly accepts a request from a
client and sends back an answer. In this example the requests are
strings denoting Q expressions; the server evaluates each expression and
sends back the result as a string. The client reads input from the user,
transmits it to the server and prints the received answer. Note that the
transmitted strings are represented as byte strings, as required by the
recvfrom and sendto operations. The bytestr and
bstr functions are used to convert between character and byte
strings, see Byte Strings.
import system;

def BUFSZ = 500000; // buffer size

/* the server: receive messages, evaluate them as Q expressions, and
   send back the results */

def SERVER = ("localhost",5001); // the server address

server          = server_loop FD
                    where FD:Int = socket AF_INET SOCK_DGRAM 0,
                      () = bind FD SERVER;
                = perror "server" otherwise;

server_loop FD  = sendto FD 0 (ADDR,eval MSG) || server_loop FD
                    where (ADDR,MSG) = recvfrom FD 0 BUFSZ;
                = server_loop FD otherwise;

/* evaluate an expression encoded as a byte string, catch syntax
   errors and exceptions, convert result back to a byte string */

eval MSG        = catch exception (bytestr (str VAL))
                    where 'VAL = valq (bstr MSG);
                = bytestr ">>> SYNTAX ERROR" otherwise;

exception _     = bytestr ">>> ABORTED";

/* the client: read input from user, send it to the server, print
   returned result */

def CLIENT = ("localhost",5002); // the client address

client          = client_loop FD
                    where FD:Int = socket AF_INET SOCK_DGRAM 0,
                      () = bind FD CLIENT;
                = perror "client" otherwise;

client_loop FD  = sendto FD 0 (SERVER,bytestr MSG) ||
                  printf "%s\n" (bstr (recv FD 0 BUFSZ)) || client_loop FD
                    if not null MSG
                    where MSG:String = writes "\nclient> " || flush || reads;
                = () otherwise;
For instance, we can invoke the server in a secondary thread and then execute the client as follows:
==> def S = thread server

==> client

client> prd [1..50]
30414093201713378043612608166064768844377641568960512000000000000

client> 1+)
>>> SYNTAX ERROR

client> quit
>>> ABORTED

client>
On systems where the POSIX threads library or some compatible
replacement is available (this includes Windows and most modern UNIXes),
clib provides functions for handling multiple threads of
control. Threads, a.k.a. "lightweight processes", allow you
to realize "multithreaded scripts" consisting of different tasks which
together perform some computation in a distributed manner. All tasks are
executed concurrently. Thus you can, e.g., perform some lengthy
calculation in a background task while you go on evaluating other
expressions in the interpreter's command loop. You can also have tasks
communicate via mutexes, conditions and semaphores.
The operations described in this section (which are all contained in
clib and thus included in the prelude) are in close
correspondence with POSIX 1003.1b. However, some operations are named
differently, and semaphores provide the extra functionality of sending
data from one thread to another. Mutexes are also supported, mostly for
the purpose of handling critical sections involving operations with
side-effects (I/O etc.). Mutexes are not required to make
conditions work since these have their own internal mutex handling. For
more information on POSIX threads, please refer to the corresponding
section in the UNIX manual.
Please note that these functions will only work as advertised if the interpreter has been built with POSIX thread support. Moreover, in the current implementation the interpreter effectively serializes multithreaded scripts on the reduction level and thus user-level threads cannot really take advantage of multi-processor machines.
12.13.1 Thread Creation and Management
12.13.2 Realtime Scheduling
12.13.3 Mutexes
12.13.4 Conditions
12.13.5 Semaphores
12.13.6 Threads and Signals
12.13.7 Thread Examples
Clib threads are represented using handles (objects of type
Thread). Note that a thread is canceled automatically as soon as
its handle is garbage collected, thus you should keep the handle around
as long as the thread is needed. For convenience, thread handles are
numbered arbitrarily, starting at 0 which denotes the main thread, and
are ordered by the thread numbers. This is handy, e.g., if you want to
use thread handles as indices in a dictionary.
public extern type Thread;              // thread handle type
public isthread THREAD;                 // check for thread objects
public extern thread_no THREAD;         // thread number
public extern this_thread;              // handle of the current thread
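To illustrate the numbering scheme, here is a small sketch of a function which reports the thread it is evaluated in. The name whoami is hypothetical; the sketch relies only on the statement above that thread number 0 denotes the main thread.

```q
/* hypothetical helper: describe the calling thread */
whoami          = "main thread" if (thread_no this_thread = 0);
                = "thread #" ++ str (thread_no this_thread) otherwise;
```

Evaluated at the interpreter's prompt, whoami should report the main thread; evaluated inside a secondary thread, it reports that thread's number.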
The basic thread operations are listed below. The thread function
starts evaluating its special argument in a new thread, and returns its
handle. You can wait for a thread to terminate and obtain the evaluated
result with the result function. (If there is no result, because
the thread has been canceled, or was aborted with halt,
quit or a runtime error, result fails.) Note that
halt or quit in a thread which is not the main thread only
terminates the current thread; however, the exit function,
cf. Process Control, always exits from the
interpreter. You can also terminate the current thread immediately and
return a given value as its result with the return function; in
the main thread, this function is equivalent to halt and the
return value is ignored. Moreover, all threads except the main thread
can also be canceled from any other thread using the cancel
function. Finally, the yield function allows the interpreter to
switch threads at any given point (normally the interpreter will only
switch contexts in certain builtins and when a new rule is activated).
public extern special thread X;         // start new thread
public extern return X;                 // terminate thread with result X
public extern cancel THREAD;            // cancel THREAD
public extern result THREAD;            // wait for THREAD, return result
public extern yield;                    // allow context switch
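A minimal sketch of the typical usage pattern: start a computation in a background thread, keep its handle in a variable, and collect the value later. The variable names T and P are arbitrary choices for this example.

```q
/* compute a product in a background thread; the handle is stored in T
   so that the thread is not canceled by the garbage collector */
def T = thread (prd [1..10]);

/* wait for the thread to terminate and fetch its value */
def P = result T;
```

Once the thread has finished, P should hold the value of prd [1..10], i.e., 3628800. Because the argument of thread is special, the product is not evaluated until the new thread starts running.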
Clib threads always use deferred cancellation, hence thread
cancellation requests are usually not honored immediately, but are
deferred until the thread reaches a cancellation point where it is
safe to do so. Cancellation points occur at certain C library calls
listed in the POSIX threads documentation, when a new equation is
activated in the Q interpreter, and when yield is called.
You can also check whether a thread is still active or has been
canceled. If neither condition holds, then the thread has already been
terminated and you can obtain its result with the result
operation.
public extern active THREAD;            // check if THREAD is active
public extern canceled THREAD;          // check if THREAD was canceled
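The three possible states can be distinguished with a small helper like the following. The function name status is hypothetical, and the sketch assumes that active and canceled return truth values as their descriptions suggest.

```q
/* hypothetical helper: classify a thread using the two predicates */
status T:Thread = "active" if active T;
                = "canceled" if canceled T;
                = "terminated" otherwise;
```

Note that the answer is only a snapshot: a thread reported as active may terminate or be canceled immediately afterwards, so use result when you actually need to wait for a thread.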