
Section 8.16.1.3:
$TEXT_STRING

This page defines two generic abstract classes named $TEXT_STRING, which differ in the number of class arguments they take.

abstract class $TEXT_STRING{ELT < $IS_EQ} < $STRINGS

Inheritance map $IS_EQ $ELT $HASH $STRINGS

Formal Definitions

This abstract class defines a state component which is a set of all instantiations of objects of any class sub-typing from this class in addition to the vdm model types used wherever this class name is used. Note that SAME has to be an instantiated class, not an abstract one.

types

SAME = object_type ;
$STRING_ELT = set of object_type

state

multi : $STRING_ELT
inv multi_types ==
forall obj in set multi_types & sub_type($STRING_ELT,obj)
NOTE See the important note about vdm state in the notes on vdm-sl usage in this specification.

This abstract class characterises the concept of all forms of simple string whether binary, text or other as sequences of the argument class (elements) which must sub-type from $IS_EQ. Classes which sub-type from this shall have immutable semantics!


index_lib

This feature is the culture and coding which is associated with the string. It need not be the default culture and coding for the environment in which the program is executing, since a program may manipulate culture objects independently of local textual representations.

index_lib : LIBCHARS
Formal Signature
index_lib(self : SAME) res : LIBCHARS
Pre-condition

Since the string has to exist then so does this component. The pre-condition, therefore, is vacuously true.

Post-condition

This is also vacuously true, since it is a component of every string of text.

This feature provides access to all of the cultural and environment dependencies relating to this character string.


abstract class $TEXT_STRING{ELT < $IS_EQ, FTP < $FTEXT_STRING{ELT}, STP < $TEXT_STRING{ELT}} < $TEXT_STRING{ELT}, $BINARY

Inheritance map $IS_EQ $ELT $HASH $FSTRINGS $STRINGS $FTEXT_STRING{ELT} $TEXT_STRING{ELT}

Formal Definitions

This abstract class defines a state component which is a set of all instantiations of objects of any class sub-typing from this class in addition to the vdm model types used wherever this class name is used. Note that SAME has to be an instantiated class, not an abstract one.

types

SAME = object_type ;
$TEXT_STRING_ELT_FTP_STP = set of object_type

state

multi : $TEXT_STRING_ELT_FTP_STP
inv multi_types ==
forall obj in set multi_types & sub_type($TEXT_STRING_ELT_FTP_STP,obj)
NOTE See the important note about vdm state in the notes on vdm-sl usage in this specification.

This abstract class characterises the concept of a text string as a sequence of the argument class (elements) which must sub-type from $IS_EQ. The second and third class arguments are the 'corresponding' mutable ($FTEXT_STRING{ELT}) and immutable (sub-typing from $TEXT_STRING{ELT}) string classes. Classes which sub-type from this class shall have immutable semantics!


build

The provision of this feature is required to permit sub-typing classes to convert binary data into a string of text. It is complemented by the binstr feature defined below.

build (
cursor : BIN_CURSOR,
lib : LIBCHARS
) : SAME
Formal Signature
build(cursor : BIN_CURSOR, lib : LIBCHARS) res : SAME
Pre-condition
pre let rest = BIN_CURSOR.remaining(cursor) in
rest > 0
and rest mod LIBCHARS.my_size(lib) = 0
Post-condition
post let tail = BIN_CURSOR.get_remainder(cursor) in tail = binstr(res)

This routine builds the result string from the binary string indicated using the encoding and repertoire defined by lib. If the binary string indicated by cursor does not contain an integral number of character codes in the given repertoire and encoding, then void is returned and the cursor is not moved.
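The check described above can be sketched in Python. This is only an illustration under stated assumptions: the name build_text, the code_size parameter (standing in for LIBCHARS.my_size(lib)) and the choice of big-endian integers as character codes are hypothetical and are not part of the library.

```python
from typing import List, Optional


def build_text(data: bytes, code_size: int) -> Optional[List[int]]:
    """Split data into fixed-width codes, or return None (modelling void)."""
    # Mirrors the pre-condition: some data remains and it holds a whole
    # number of character codes of the given size.
    if len(data) == 0 or len(data) % code_size != 0:
        return None
    # Assumption: each code is a big-endian unsigned integer.
    return [
        int.from_bytes(data[i:i + code_size], "big")
        for i in range(0, len(data), code_size)
    ]
```

With two-octet codes, four octets give two codes, whereas three octets give None, modelling the void result.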


is_upper

This predicate tests whether every element of the string is an upper-case letter (that is, in the class 'upper' as defined by the cultural specification of the current execution environment). Note that where a script does not define any upper-case letters, or has no case distinction at all, the result will be identically false even though the characters are letters.

is_upper : BOOL
Formal Signature
is_upper(self : SAME) res : BOOL
Pre-condition
pre size(self) > 0
Post-condition
post res =
forall index in inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Upper_Case

This predicate returns true if and only if every element of self is upper-case, otherwise false. Where there is no case distinction in the script concerned then this returns identically false.


is_lower

This predicate tests whether every element of the string is a lower-case letter (that is, in the class 'lower' as defined by the cultural specification of the current execution environment). Note that where a script does not define any lower-case letters, or has no case distinction at all, the result will be identically false even though the characters are letters.

is_lower : BOOL
Formal Signature
is_lower(self : SAME) res : BOOL
Pre-condition
pre size(self) > 0
Post-condition
post res =
forall index in inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Lower_Case

This predicate returns true if and only if every element of self is lower-case, otherwise false. Where there is no case distinction in the script concerned then this returns identically false.
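The two case predicates can be modelled in Python as below. This is a sketch, assuming that Python's per-character Unicode classification stands in for CHAR_CLASS.kind; the function names are illustrative only.

```python
def is_upper(text: str) -> bool:
    """True only if every element of text is an upper-case letter."""
    if not text:
        raise ValueError("pre-condition: size(self) > 0")
    # Testing element by element means digits, punctuation and letters of
    # scripts without case distinction all make the result False.
    return all(ch.isupper() for ch in text)


def is_lower(text: str) -> bool:
    """True only if every element of text is a lower-case letter."""
    if not text:
        raise ValueError("pre-condition: size(self) > 0")
    return all(ch.islower() for ch in text)
```

Note that this is stricter than calling isupper on the whole Python string, which ignores uncased characters such as digits.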


char

char (
index : CARD
) : ELT
Formal Signature
char(self : SAME, index : CARD) res : ELT
Pre-condition
pre index < size(self)
Post-condition

Note that the index in this post-condition is incremented by one to take account of the indexing difference between Sather and vdm.

post res = self(index + 1)

This routine returns the element to be found at the indicated position in self.


upper

This routine creates a copy of self in which all lower case letters are replaced by an upper case equivalent if one exists. Note that there are scripts (eg Armenian) which have lower case letters to which there is no corresponding upper case letter. If no upper case equivalent exists then no change is made to a letter code. Non-letter codes are not changed.

upper : SAME
Formal Signature
upper(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let upindices : set of nat1 =
{index | index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Lower_Case} in
forall idx in set upindices & self(idx) in set UNICODE.Lower_only
or res(idx) = CHAR_MAPPING.to_domain(self(idx))

This routine returns a copy of self in which every lower case character has been converted to its upper case equivalent provided one exists.
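The element-wise nature of this conversion can be sketched in Python. The sketch assumes that a lower-case letter whose Python upper-case mapping is not a single character (for example the German sharp s) may be treated as having no upper-case equivalent; that choice, and the function name, are illustrative assumptions rather than library behaviour.

```python
def upper(text: str) -> str:
    """Copy of text with each lower-case letter replaced where possible."""
    out = []
    for ch in text:
        mapped = ch.upper()
        # Only replace lower-case letters that map to exactly one element;
        # everything else (including non-letters) is copied unchanged.
        out.append(mapped if ch.islower() and len(mapped) == 1 else ch)
    return "".join(out)
```

The result always has the same number of elements as the original, which is what the index-based post-condition requires.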


lower

This routine creates a copy of self in which all upper case letters are replaced by a lower case equivalent. Non-letter codes are not changed.

lower : SAME
Formal Signature
lower(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let upindices : set of nat1 =
{index | index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Upper_Case} in
forall idx in set upindices & res(idx) = CHAR_MAPPING.to_range(self(idx))

This routine returns a copy of self in which every upper case character has been converted to its lower case equivalent.


capitalize

This routine creates a copy of self in which the first character of each word is converted to its upper case equivalent (if one exists). A word starts either at the first character of the string, provided that it is not white space or punctuation, or at any character which is not white space or punctuation and which immediately follows a white space or punctuation character.

capitalize : SAME
Formal Signature
capitalize(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let space : set of ELT = CHAR_TYPES.classes(CHAR_CLASS.Space) union
CHAR_TYPES.classes(CHAR_CLASS.Punctuation) in
let capindices : set of nat1 = {index | index in set inds self &
((index = 1) and self(index) not in set space)
or ((index > 1) and self(index) not in set space
and self(index - 1) in set space)} in
forall idx in set capindices & self(idx) in set UNICODE.Lower_only
or res(idx) = CHAR_MAPPING.to_domain(self(idx))

This routine returns a copy of self in which the first character of every word (from the beginning of the string or after punctuation or a whitespace) is converted to its upper case equivalent if one exists.
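The word-start rule can be illustrated with a short Python sketch. It assumes that ASCII white space and punctuation (string.whitespace and string.punctuation) stand in for the Space and Punctuation character classes; a real implementation would take these from the culture in use. The function name is illustrative.

```python
import string

# Assumption: ASCII separators stand in for CHAR_CLASS.Space and Punctuation.
SEPARATORS = set(string.whitespace) | set(string.punctuation)


def capitalize(text: str) -> str:
    """Copy of text with the first character of every word upper-cased."""
    out = []
    after_separator = True  # the start of the string behaves like a word break
    for ch in text:
        if after_separator and ch not in SEPARATORS:
            mapped = ch.upper()
            # Keep the original element if no single-element equivalent exists.
            out.append(mapped if len(mapped) == 1 else ch)
        else:
            out.append(ch)
        after_separator = ch in SEPARATORS
    return "".join(out)
```

Unlike Python's own str.capitalize, this changes the first character of every word and leaves all other characters untouched.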


repeat

This feature returns a text string which is the concatenation of the given number of copies of self.

repeat (
cnt : CARD
) : SAME
Formal Signature
repeat(self : SAME, cnt : CARD) res : SAME
Pre-condition
pre (size(self) > 0)
and (cnt > 0)
Post-condition
post size(res) = cnt * size(self)
and forall idx in set {1,...,cnt} &
let offset : nat = (idx - 1) * size(self) in
forall index in set inds self &
res(offset + index) = self(index)

This routine returns a new string which contains the contents of self concatenated cnt times.
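The relationship between the result and self can be checked directly in a Python sketch. The function name is illustrative, and zero-based indices are used in place of the one-based vdm indices.

```python
def repeat(text: str, cnt: int) -> str:
    """Return text concatenated cnt times, checking the intended property."""
    if not text or cnt <= 0:
        raise ValueError("pre-condition: size(self) > 0 and cnt > 0")
    res = text * cnt
    n = len(text)
    # Every copy k of the original must agree with it element by element.
    assert len(res) == cnt * n
    assert all(res[k * n + i] == text[i] for k in range(cnt) for i in range(n))
    return res
```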


replace

This feature enables arbitrary element substitution to be made over the entire text string.

replace (
old_elt : ELT,
new_elt : ELT
) : SAME
Formal Signature
replace(self : SAME, old_elt : ELT, new_elt : ELT) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post forall index in inds self & ((self(index) = old_elt)
and (res(index) = new_elt))
or ((self(index) <> old_elt)
and (self(index) = res(index)))

This routine returns a new string which is a copy of self apart from which each occurrence of old_elt has been replaced by new_elt.


replace

This second variant of this feature enables simple set substitution to be made. The string argument is treated as if it were a set of elements, and every element of self which occurs in that set is replaced by the given replacement element.

replace (
test_set : STP,
new_elt : ELT
) : SAME
Formal Signature
replace(self : SAME, test_set : STP, new_elt : ELT) res : SAME
Pre-condition
pre size(self) > 0
and STP.size(test_set) > 0
Post-condition
post forall index in inds self & ((self(index) in set elems test_set)
and (res(index) = new_elt))
or ((self(index) not in set elems test_set)
and (self(index) = res(index)))

This routine returns a copy of self in which all occurrences of any element in test_set are replaced by new_elt.
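Both variants of replace can be sketched in Python as follows. Python has no overloading on argument type, so the two variants are given the hypothetical names replace_elt and replace_any here.

```python
def replace_elt(text: str, old_elt: str, new_elt: str) -> str:
    """Copy of text with every occurrence of old_elt replaced by new_elt."""
    return "".join(new_elt if ch == old_elt else ch for ch in text)


def replace_any(text: str, test_set: str, new_elt: str) -> str:
    """Copy of text with every element found in test_set replaced."""
    members = set(test_set)  # the string argument is treated as a set
    return "".join(new_elt if ch in members else ch for ch in text)
```

In both cases the result has exactly the same length as the original, since substitution is element for element.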


remove

This feature returns a copy of self in which every occurrence of elt has been deleted.

remove (
elt : ELT
) : SAME
Formal Signature
remove(self : SAME, elt : ELT) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post res = [self(index) | index in set inds self & self(index) <> elt]

This routine returns a copy of self from which all occurrences of elt have been removed.


remove

This feature returns a copy of self in which every occurrence of an element which is in the test_set argument has been deleted. The string argument is treated as if it were a set of elements.

remove (
test_set : STP
) : SAME
Formal Signature
remove(self : SAME, test_set : STP) res : SAME
Pre-condition
pre size(self) > 0
and STP.size(test_set) > 0
Post-condition
post res = [self(index) | index in set inds self & self(index) not in set elems test_set]

This routine returns a copy of self from which all elements contained in test_set have been removed.
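The two remove variants amount to filtering the sequence while preserving order, which a Python sketch makes explicit. The names remove_elt and remove_any are hypothetical, chosen because Python cannot overload on argument type.

```python
def remove_elt(text: str, elt: str) -> str:
    """Copy of text with every occurrence of elt deleted."""
    return "".join(ch for ch in text if ch != elt)


def remove_any(text: str, test_set: str) -> str:
    """Copy of text with every element that occurs in test_set deleted."""
    members = set(test_set)  # the string argument is treated as a set
    return "".join(ch for ch in text if ch not in members)
```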


escape

This routine provides a facility to convert a text string into one with escape elements inserted. This is frequently useful when it is necessary to process the string by some external service which may treat the elements in elist specially unless preceded by an escape element. The list argument is treated as if it were a set of elements. Note that the list argument may be empty, in which case the only change which occurs is the duplication of every escape element.

escape (
escape : ELT,
elist : STP
) : SAME
Formal Signature
escape(self : SAME, esc : ELT, elist : STP) res : SAME
Pre-condition
pre size(self) > 0
Post-condition
post let test_set : set of ELT = elems elist union {esc} in
res = escaped(self,esc,test_set)


escaped : SAME * ELT * set of ELT -> SAME

escaped(me,escape,test_set) ==
let loc_res : SAME =
let head = hd me in
if head in set test_set then
[escape,head]
else
[head] in
if tl me = [] then
loc_res
else
loc_res ^ escaped(tl me,escape,test_set)

This routine returns a text string which is a copy of self in which all elements occurring in elist - and the escape element itself - are preceded by the escape element.
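The recursive definition above is equivalent to a single pass over the string, as this Python sketch shows. The function and parameter names are illustrative only.

```python
def escaped(text: str, esc: str, elist: str) -> str:
    """Copy of text with esc inserted before each special element."""
    # The escape element itself is always treated as special.
    special = set(elist) | {esc}
    return "".join(esc + ch if ch in special else ch for ch in text)
```

With an empty elist the only effect is that each escape element is doubled.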


strip

The structure of text consists of a sequence of pages within each of which there are one or more lines of text, any number of which may be empty. This routine strips from the end of the string any number of contiguous line marks (more than one, it will be remembered, denoting blank lines).

strip : SAME
Formal Signature
strip(self : SAME) res : SAME
Pre-condition
pre size(self) > 0
Post-condition

For the purposes of specification a line mark is considered to be a single element in the text string. Where an implementation uses two or more elements then they shall appear as being one for the purposes of addition/removal from a string.

post let lm : ELT = LIBCHARS.default.Line_Mark in
res = stripped(self,lm)


stripped : SAME * ELT -> SAME

stripped(me,lm) ==
if me(size(me)) = lm then
if size(me) = 1 then
[]
else
stripped(me(1,...,(size(me) - 1)),lm)
else
me

This routine returns a copy of self from the end of which all contiguous line marks have been removed.
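An iterative Python sketch of the same behaviour follows. It assumes that the line mark is the single character "\n"; the specification obtains it from LIBCHARS, and the function name is illustrative.

```python
def strip_line_marks(text: str, line_mark: str = "\n") -> str:
    """Copy of text without any trailing run of line marks."""
    end = len(text)
    # Walk back over the contiguous line marks at the end of the string.
    while end > 0 and text[end - 1] == line_mark:
        end -= 1
    return text[:end]
```

Line marks inside the string are kept; only the trailing run is removed, and a string consisting solely of line marks becomes empty.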


minus

This feature returns a copy of self from which the first occurrence (if any) of str has been removed.

minus (
str : STP
) : SAME
Formal Signature
minus(self : SAME, str : STP) res : SAME
Pre-condition
pre size(self) > 0
and size(self) >= STP.size(str)
Post-condition
post let tmp : [seq of ELT] be st
(head ^ tmp ^ tail = self)
and ((tmp = str)
or (tmp = nil)) in
res = head ^ tail

This routine returns a copy of self from which the first (if any) occurrence of str has been deleted.


minus

This variant of the minus feature returns a copy of self from which the first occurrence after the given index position (if any) of str has been removed.

minus (
str : STP,
start : CARD
) : SAME
Formal Signature
minus(self : SAME, str : STP, start : CARD) res : SAME
Pre-condition
pre size(self) > 0
and size(self) >= STP.size(str) + start
Post-condition
post let ignored : [seq of ELT] be st ignored ^ self(1,...,(start + 1)) = self in
let tmp : [seq of ELT] be st
(head ^ tmp ^ tail = self)
and ((tmp = str)
or (tmp = nil)) in
res = ignored ^ head ^ tail

This routine returns a copy of self from which the first (if any) occurrence of str after the starting index has been deleted.
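Both variants of minus can be modelled by one Python function with an optional starting position. This is a sketch under the assumption that start is a zero-based index at which the search begins; the function name is illustrative.

```python
def minus(text: str, sub: str, start: int = 0) -> str:
    """Copy of text without the first occurrence of sub at or after start."""
    if not sub:
        return text  # nothing to remove
    pos = text.find(sub, start)
    if pos < 0:
        return text  # no occurrence: the result equals the original
    return text[:pos] + text[pos + len(sub):]
```

Only one occurrence is ever removed, so later occurrences of the substring remain in the result.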


rev!

This feature corresponds to the elt! feature. This one yields the values of the individual elements of self starting with the one with the highest index and thereafter successively lower indices.

rev! : ELT
Formal Signature

Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.

rev_iter(self : SAME) yld : ELT
Pre-condition
pre size(self) > 0
Post-condition

This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).

post yld = self(size(self) - size(history~))
and history = history~ ^ yld
Quit condition

For quit actions see the specification of the quit statement.

errs QUIT : size(history) = size(self) -> quit

This iter yields the elements of self in reverse order of the indices.
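A Python generator gives a rough model of this iter: each resumption yields the next element, and exhausting the generator plays the part of the quit condition. The name rev_iter follows the formal signature above; the rest is an illustrative sketch.

```python
from typing import Iterator


def rev_iter(text: str) -> Iterator[str]:
    """Yield the elements of text from the highest index to the lowest."""
    for index in range(len(text) - 1, -1, -1):
        yield text[index]
    # Falling off the end corresponds to the quit condition: as many
    # elements have been yielded as there are elements in text.
```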


Codes from text strings

A text string consists of text elements which may have one or more codes per element (in Telugu or Vietnamese, for example). Internationalising the required library has therefore made it necessary to introduce the concept of a character code, represented by the class CHAR_CODE. The routines in this section are provided to manipulate these when doing such things as code/character conversion/substitution operations.

create

All forms of text string require this form of creation operation in order that composition of characters may be effected. This merely returns a text string containing the element denoted by the single code. Note that this code may not be a combining code (see the class UNICODE for further information on this).

create (
code : CHAR_CODE
) : SAME
Formal Signature
create(code : CHAR_CODE) res : SAME
Pre-condition

Since the code can have any value and the string takes its encoding from that, the pre-condition is vacuously true.

Post-condition
post size(res) = 1

This creation routine returns a single element string formed from the encoding given.


code!

This is the first of a pair of code yielding iters. Do not assume that the number of codes yielded will correspond to the number of elements in the text string. That is only true for text strings in which all elements happen to have a single code!

code! : CHAR_CODE
Formal Signature

Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.

code_iter1(self : SAME) yld : CHAR_CODE
Pre-condition
pre size(self) > 0
Post-condition

This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).

post let codes : seq of ELT be st codes = self in
yld = codes(card history~ + 1)
and history = history~ ^ yld
Quit condition

For quit actions see the specification of the quit statement.

errs QUIT : let codes : seq of ELT be st codes = self in
card(history) = card(codes) -> quit

This iter yields each individual character encoding in self in sequence using the repertoire and encoding of the text string.


code!

code! (
start : CARD
) : CHAR_CODE
Formal Signature

Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.

code_iter2(self : SAME, start_code : CARD) yld : CHAR_CODE
Pre-condition
pre size(self) > 0
Post-condition

This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).

post let codes : seq of CHAR_CODE be st codes = self((start_code + 1),..., card self) in
yld = codes(card history~ + 1)
and history = history~ ^ yld
Quit condition

For quit actions see the specification of the quit statement.

errs QUIT : let codes : seq of CHAR_CODE be st codes = self((start_code + 1),..., card self) in
card(history) = card(codes) -> quit

This iter yields individual character encodings in self in sequence beginning with the first code in the element at the given index in the string.
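The difference between counting elements and counting codes can be illustrated in Python. This sketch assumes that a text element is a base code followed by any combining codes, as classified by Python's unicodedata module; the names elements and code_iter are hypothetical.

```python
import unicodedata
from typing import Iterator, List


def elements(codes: str) -> List[str]:
    """Group code points into elements: a base plus its combining codes."""
    out: List[str] = []
    for ch in codes:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch  # attach a combining code to the preceding base
        else:
            out.append(ch)
    return out


def code_iter(elts: List[str], start: int = 0) -> Iterator[int]:
    """Yield every code, beginning with the first code of element start."""
    for elt in elts[start:]:
        for ch in elt:
            yield ord(ch)
```

For the input "e\u0323\u0302t" (a Vietnamese letter written with two combining marks, followed by "t") there are two elements but four codes, which is why the number of codes yielded must not be assumed to equal the size of the string.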


Comments or enquiries should be made to Keith Hopper.
Page last modified: Thursday, 25 May 2000.