Section 8.16.1.4:
This page defines two generic abstract classes named $TEXT_STRING, which differ in the number of class arguments they take.
This abstract class defines a state component which is a set of all instantiations of objects of any class sub-typing from this class in addition to the vdm model types used wherever this class name is used. Note that SAME has to be an instantiated class, not an abstract one.
types
    SAME = object_type ;
    $STRING_ELT = set of object_type
state
    multi : $STRING_ELT
inv multi_types == forall obj | obj in set multi_types & sub_type($STRING_ELT,obj)
NOTE | See the important note about vdm state in the notes on vdm-sl usage in this specification. |
This abstract class characterises the concept of all forms of simple string whether binary, text or other as sequences of the argument class (elements) which must sub-type from $IS_EQ. Classes which sub-type from this shall have immutable semantics!
All forms of text string require to be able to create a new string from a single text element. This is independent of the type of element.
create(val : ELT) : SAME
create(val : ELT) res : SAME
Since this is a creation feature, the pre-condition is vacuously true.
post res = [val]
This creation feature returns a new text string consisting of the single element given.
This feature identifies the culture, character encoding and repertoire which are associated with the string. It need not be the default culture and coding for the environment in which the program is executing, since a program may manipulate culture objects independently of local textual representations.
index_lib(self : SAME) res : LIBCHARS
Since the string argument is not optional, the pre-condition is vacuously true.
This is also vacuously true, since it is a component of every string of text.
This feature provides access to all of the cultural and environment dependencies relating to this character string.
abstract class $TEXT_STRING{ELT < $IS_EQ, FTP < $FTEXT_STRING{ELT}, STP < $TEXT_STRING{ELT}} < $TEXT_STRING{ELT}, $SEARCH{ELT}, $BINARY
This abstract class defines a state component which is a set of all instantiations of objects of any class sub-typing from this class in addition to the vdm model types used wherever this class name is used. Note that SAME has to be an instantiated class, not an abstract one.
types
    SAME = object_type ;
    $TEXT_STRING_ELT_FTP_STP = set of object_type
state
    multi : $TEXT_STRING_ELT_FTP_STP
inv multi_types == forall obj in set multi_types & sub_type($TEXT_STRING_ELT_FTP_STP,obj)
NOTE | See the important note about vdm state in the notes on vdm-sl usage in this specification. |
This abstract class characterises the concept of a text string as a sequence of the argument class (elements) which must sub-type from $IS_EQ. The second and third class arguments are the 'corresponding' mutable (sub-typing from $FTEXT_STRING{ELT}) and immutable (sub-typing from $TEXT_STRING{ELT}) string classes. Classes which sub-type from this class shall have immutable semantics!
The specification of the strip feature in this class needs the following auxiliary functions.
functions
lmark () res : STP
post res = CHAR_STR.str(LIBCHARS.Line_Mark(STP.index_lib(buffer)))

lm_tail : SAME -> BOOL
lm_tail(str) ==
    let test = str(1, ..., (len str - len lmark())) in
        test = lmark()

remove_lm : SAME -> SAME
remove_lm(str) ==
    if str = []
    then []
    elseif lm_tail(str)
    then remove_lm(str(1, ..., (len str - len lmark())))
    else str
    end
This feature replaces the one inherited from $BINARY which makes use of the execution environment default repertoire and encoding in building the resultant text string.
build(cursor : BIN_CURSOR) : SAME
build(cursor : BIN_CURSOR) res : SAME
pre not cursor.is_done
post let width = lib.my_size in
    ((BIN_CURSOR.remaining(cursor) mod width > 0) or
     not exists idx1, idx2 | idx1 in set inds cursor.buffer and
                             idx2 in set inds cursor.buffer and
                             (idx2 = idx1 + width - 1) &
         REP_MAP.is_valid_encoding(LIBCHARS.culture(LIBCHARS.default()).charmap, cursor.buffer(idx1, ..., idx2))
     and (cursor.index = cursor~.index))
    or
    (cursor.is_done and
     let res be st forall idx | idx in set inds res &
         let start = idx - 1 * width, finish = start + width - 1 in
             binstr(res(idx)) = cursor.buffer(start, ..., finish))
This routine builds a new string from the binary string indicated using the encoding and repertoire defined by the external execution environment. If there is not an exact number of character codes in the string then void is returned and the cursor has not been moved.
This feature makes use of the given encoding and repertoire rather than the execution environment default in building the resultant text string.
build(cursor : BIN_CURSOR, lib : LIBCHARS) : SAME
build2(cursor : BIN_CURSOR, lib : LIBCHARS) res : SAME
pre not cursor.is_done
post let width = lib.my_size in
    ((BIN_CURSOR.remaining(cursor) mod width > 0) or
     not exists idx1, idx2 | idx1 in set inds cursor.buffer and
                             idx2 in set inds cursor.buffer and
                             (idx2 = idx1 + width - 1) &
         REP_MAP.is_valid_encoding(LIBCHARS.culture(lib).charmap, cursor.buffer(idx1, ..., idx2))
     and (cursor.index = cursor~.index))
    or
    (cursor.is_done(self) and
     let res be st forall idx | idx in set inds res &
         let start = idx - 1 * width, finish = start + width - 1 in
             binstr(res(idx)) = cursor.buffer(start, ..., finish))
This routine builds a new text string from the binary string indicated using the encoding and repertoire defined by lib. If there is not an exact number of character codes in the string then void is returned and the cursor has not been moved.
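As an informal illustration of the whole-number-of-codes rule described above, the following Python sketch models the cursor as a byte buffer plus an index and returns both the result and the new cursor position. The function name build_codes, the big-endian byte order and the representation of the result as a list of integer codes are all assumptions made for the example; the validity check against the culture's character map (REP_MAP.is_valid_encoding in the specification) is deliberately omitted.

```python
from typing import List, Optional, Tuple

def build_codes(buffer: bytes, index: int, width: int) -> Tuple[Optional[List[int]], int]:
    """Model of build: return (codes, new_index), or (None, index) on failure.

    Assumptions: fixed-width codes of `width` bytes, big-endian byte order,
    and no check that each code is valid in the target repertoire.
    """
    remaining = len(buffer) - index
    if remaining <= 0:
        raise ValueError("pre-condition violated: cursor is already done")
    if remaining % width != 0:
        # Not a whole number of character codes: void result, cursor unmoved.
        return None, index
    codes = [
        int.from_bytes(buffer[pos:pos + width], "big")
        for pos in range(index, len(buffer), width)
    ]
    # On success the cursor has consumed the rest of the buffer.
    return codes, len(buffer)
```

In this model a caller can detect failure by testing the first component for None and can rely on the returned index being unchanged in that case.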
This feature provides a facility for removing line marks from the end of a string (if there are any there). Multiple line marks at the end will be removed, irrespective of any escaping mechanism.
strip : SAME
strip(self : SAME) res : SAME
Since the self argument is not optional then this pre-condition is vacuously true.
This post-condition uses the auxiliary function remove_lm defined above.
post res = remove_lm(self)
This feature removes as many line marks as are found at the end of the string, returning the result (which may, of course, be empty).
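The behaviour of remove_lm can be sketched informally in Python. This is a model under stated assumptions, not the library implementation: a text string is represented as a Python str, and the constant LINE_MARK (a hypothetical name) stands in for lmark(), which the specification obtains from the LIBCHARS object associated with the string.

```python
# Illustrative model only: LINE_MARK stands in for lmark(), which the
# specification obtains from the LIBCHARS object of the string.
LINE_MARK = "\n"

def remove_lm(text: str, line_mark: str = LINE_MARK) -> str:
    """Return text with every trailing occurrence of line_mark removed."""
    if not line_mark:
        return text  # an empty mark would never let the loop terminate
    while text.endswith(line_mark):
        text = text[: len(text) - len(line_mark)]
    return text
```

As the description says, the result may be empty: a string consisting only of line marks strips to the empty string.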
This predicate tests to determine if the string contains all upper-case letters (being defined by the current execution environment cultural specification as being in the class 'upper'). Note that where a script does not define any upper case letters - or has no case distinction at all then the result will be identically false - even though the characters are letters.
is_upper(self : SAME) res : BOOL
pre size(self) > 0
post res =
forall index | index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Upper_Case
This predicate returns true if and only if every element of self is upper-case, otherwise false. Where there is no case distinction in the script concerned then this returns identically false.
This predicate tests to determine if the string contains all lower-case letters (being defined by the current execution environment cultural specification as being in the class 'lower'). Note that where a script does not define any lower case letters - or has no case distinction at all then the result will be identically false - even though the characters are letters.
is_lower(self : SAME) res : BOOL
pre size(self) > 0
post res =
forall index | index in set inds self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Lower_Case
This predicate returns true if and only if every element of self is lower-case, otherwise false. Where there is no case distinction in the script concerned then this returns identically false.
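The two case predicates can be modelled informally in Python as follows. The sketch assumes that Python's Unicode character properties are an acceptable stand-in for the cultural classes 'upper' and 'lower' of the execution environment, which in the specification are supplied by CHAR_CLASS; the pre-condition is modelled by raising ValueError.

```python
def _require_non_empty(text: str) -> None:
    # Models the pre-condition size(self) > 0.
    if len(text) == 0:
        raise ValueError("pre-condition violated: size(self) > 0")

def is_upper(text: str) -> bool:
    """True if and only if every element of text is an upper-case letter."""
    _require_non_empty(text)
    return all(ch.isupper() for ch in text)

def is_lower(text: str) -> bool:
    """True if and only if every element of text is a lower-case letter."""
    _require_non_empty(text)
    return all(ch.islower() for ch in text)
```

For a script with no case distinction both predicates return false in this model, since such characters are neither upper-case nor lower-case.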
This feature returns a single indexed character element from the string.
char(self : SAME, index : CARD) res : ELT
pre index < size(self)
Note that the index in this post-condition is incremented by one to take account of the indexing difference between Sather and vdm.
post index = index~ + 1 and res = self(index~)
This routine returns the element to be found at the indicated position in self.
This routine creates a copy of self in which all lower case letters are replaced by an upper case equivalent if one exists. Note that there are scripts (eg Armenian) which have lower case letters to which there is no corresponding upper case letter. If no upper case equivalent exists then no change is made to a letter code. Non-letter codes are not changed.
upper : SAME
upper(self : SAME) res : SAME
pre size(self) > 0
post let upindices : set of nat1 = {index | forall index in set dom self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Lower_Case} in
    forall idx | idx in set upindices & self(idx) in set UNICODE.Lower_only or res(idx) = CHAR_MAPPING.to_domain(self(idx))
This routine returns a copy of self in which every lower case character has been converted to its upper case equivalent provided one exists.
This routine creates a copy of self in which all upper case letters are replaced by a lower case equivalent. Non-letter codes are not changed.
lower : SAME
lower(self : SAME) res : SAME
pre size(self) > 0
post let upindices : set of nat1 = {index | forall index in set dom self & CHAR_CLASS.kind(self(index)) = CHAR_CLASS.Upper_Case} in
    forall idx in set upindices & res(idx) = CHAR_MAPPING.to_range(self(idx))
This routine returns a copy of self in which every upper case character has been converted to its lower case equivalent.
This routine creates a copy of self in which the first character of each word is converted to its upper case equivalent (if one exists). A word starts either at the first character in the string, unless that character is white space or punctuation, or at the first character following a white space or punctuation character, unless that character is itself white space or punctuation.
capitalize : SAME
capitalize(self : SAME) res : SAME
pre size(self) > 0
post let space : set of ETP = CHAR_TYPES.classes(CHAR_CLASS.Space) union CHAR_TYPES.classes(CHAR_CLASS.Punctuation) in
    let capindices : set of nat1 = {index | forall index | index in set inds self & ((index = 1) and self(index) not in set space) or (self(index) not in set space and self(index - 1) in set space)} in
        forall idx | idx in set capindices & self(idx) in set UNICODE.Lower_only or res(idx) = CHAR_MAPPING.to_domain(self(idx))
This routine returns a copy of self in which the first character of every word (from the beginning of the string or after punctuation or a whitespace) is converted to its upper case equivalent if one exists.
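The word-start rule can be sketched in Python as shown here. The sketch assumes that Python's str.isspace and the Unicode general categories beginning with "P" approximate the Space and Punctuation classes of CHAR_CLASS; the helper name is_separator is hypothetical. A character whose upper-case form is not a single character is left unchanged, which models the "if one exists" proviso only approximately. The pre-condition size(self) > 0 is not enforced.

```python
import unicodedata

def is_separator(ch: str) -> bool:
    # White space, or any Unicode punctuation category (Pc, Pd, Ps, Pe, Po, ...).
    return ch.isspace() or unicodedata.category(ch).startswith("P")

def capitalize(text: str) -> str:
    """Upper-case the first character of each word, leaving the rest as is."""
    out = []
    prev_is_sep = True  # the start of the string behaves like a separator
    for ch in text:
        sep = is_separator(ch)
        if prev_is_sep and not sep:
            up = ch.upper()
            # Keep the character when no one-to-one upper-case form exists.
            out.append(up if len(up) == 1 else ch)
        else:
            out.append(ch)
        prev_is_sep = sep
    return "".join(out)
```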
This feature returns a text string which is the concatenation of self the given number of times.
repeat(self : SAME, cnt : CARD) res : SAME
pre (size(self) > 0) and (cnt > 0)
post forall idx | idx in set {1,...,cnt} & let start : nat1 = (idx - 1) * size(self) + 1 in forall index, index2 | index in set {start,...,(start + size(self))} and index2 in set inds self & self(index2) = res(index)
This routine returns a new string which contains the contents of self concatenated cnt times.
This feature enables arbitrary element substitution to be made over the entire text string.
replace(old_elt : ELT, new_elt : ELT) : SAME
replace(self : SAME, old_elt : ELT, new_elt : ELT) res : SAME
pre size(self) > 0
post forall index | index in set inds self & ((self(index) = old_elt) and (res(index) = new_elt)) or (self(index) = res(index))
This routine returns a new string which is a copy of self except that each occurrence of old_elt has been replaced by new_elt.
This second variant of the feature enables simple set substitution to be made. The test_set string is treated as if it were a set of elements, and any element of self which occurs in that set is replaced by the given replacement element.
replace(test_set : STP, new_elt : ELT) : SAME
replace(self : SAME, test_set : STP, new_elt : ELT) res : SAME
pre size(self) > 0 and STP.size(test_set) > 0
post forall index | index in set inds self & ((self(index) in set dom test_set) and (res(index) = new_elt)) or (self(index) = res(index))
This routine returns a copy of self in which all occurrences of any element in test_set are replaced by new_elt.
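Both replace variants can be modelled informally in Python. The names replace_elt and replace_any are hypothetical, chosen because Python has no overloading; strings are modelled as Python str values with single characters as elements, and the pre-conditions are not enforced.

```python
def replace_elt(text: str, old_elt: str, new_elt: str) -> str:
    """First variant: replace each occurrence of old_elt by new_elt."""
    return "".join(new_elt if ch == old_elt else ch for ch in text)

def replace_any(text: str, test_set: str, new_elt: str) -> str:
    """Second variant: test_set is treated as a set of elements."""
    members = set(test_set)
    return "".join(new_elt if ch in members else ch for ch in text)
```

In both cases the result has the same length as the original, which corresponds to the index-by-index form of the post-conditions.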
This feature returns a copy of self in which every occurrence of the argument has been deleted.
remove(elt : ELT) : SAME
remove(self : SAME, elt : ELT) res : SAME
pre size(self) > 0
post res = [self(index) | forall index | index in set inds self & self(index) <> elt]
This routine returns a copy of self from which all occurrences of elt have been removed.
This feature returns a copy of self in which every occurrence of an element which is in the test_set argument has been deleted. The string argument is treated as if it were a set of elements.
remove(test_set : STP) : SAME
remove2(self : SAME, test_set : STP) res : SAME
pre size(self) > 0 and STP.size(test_set) > 0
post res = [self(index) | forall index | index in set inds self & self(index) not in set dom test_set]
This routine returns a copy of self from which all elements contained in test_set have been removed.
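The two remove variants correspond to simple filters, sketched here in Python. The names remove_elt and remove_any are hypothetical, strings are modelled as Python str values, and the pre-conditions are not enforced.

```python
def remove_elt(text: str, elt: str) -> str:
    """First variant: drop every occurrence of the single element elt."""
    return "".join(ch for ch in text if ch != elt)

def remove_any(text: str, test_set: str) -> str:
    """Second variant: drop every element that occurs in test_set."""
    members = set(test_set)
    return "".join(ch for ch in text if ch not in members)
```

The relative order of the surviving elements is preserved, as the sequence comprehension in the post-condition requires.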
This routine provides a facility to convert a text string into one with escape elements inserted. This is frequently useful when it is necessary to process the string by some external service which may treat the elements in elist specially unless preceded by an escape element. The list argument is treated as if it were a set of elements. Note that the list argument may be empty, in which case the only change which occurs is the duplication of every escape element.
escape(escape : ELT, elist : STP) : SAME
escape(self : SAME, escape : ELT, elist : STP) res : SAME
pre len self > 0
post let test_set : set of ELT = dom elist union {escape} in
    res = escaped(self, escape, test_set)

escaped : SAME * ELT * set of ELT -> SAME
escaped(me, escape, test_set) ==
    let loc_res : SAME =
        let head = hd me in
            if head in set test_set
            then [escape, head]
            else [head]
    in
        if tl me = []
        then loc_res
        else loc_res ^ escaped(tl me, escape, test_set)
This routine returns a text string which is a copy of self in which all elements occurring in elist - and the escape element itself - are preceded by the escape element.
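The recursive auxiliary function escaped can be modelled iteratively in Python. This is a sketch under the assumption that strings are Python str values and elements are single characters; the parameter name esc is used here only to avoid clashing with the function name.

```python
def escape(text: str, esc: str, elist: str) -> str:
    """Precede every element found in elist, and esc itself, by esc."""
    test_set = set(elist) | {esc}  # the escape element is always included
    out = []
    for ch in text:
        if ch in test_set:
            out.append(esc)
        out.append(ch)
    return "".join(out)
```

With an empty elist the only effect is that each escape element in the original is doubled, matching the note in the description.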
This feature returns a copy of self from which the first occurrence (if any) of str has been removed.
minus(str : STP) : SAME
minus(self : SAME, str : STP) res : SAME
pre len self > 0 and len self >= STP.size(str)
post let tmp : [seq of ELT] be st (head ^ tmp ^ tail = self) and ((tmp = str) or (tmp = nil)) in res = head ^ tail
This routine returns a copy of self from which the first (if any) occurrence of str has been deleted.
This variant of the minus feature returns a copy of self from which the first occurrence after the given index position (if any) of str has been removed.
minus2(self : SAME, str : STP, start : CARD) res : SAME
pre size(self) > 0 and len self >= STP.size(str) + start
post let ignored : [seq of ELT] be st ignored ^ self(1,...,(start + 1)) = self in let tmp : [seq of ELT] be st (head ^ tmp ^ tail = self) and ((tmp = str) or (tmp = nil)) in res = ignored ^ head ^ tail
This routine returns a copy of self from which the first (if any) occurrence of str after the starting index has been deleted.
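Both minus variants can be sketched with a single Python function. The sketch assumes that the start argument is a zero-based index at which the search begins (following the Sather indexing convention mentioned elsewhere on this page); whether an occurrence beginning exactly at start counts is an interpretation, not something the text above settles. The pre-conditions are not enforced.

```python
def minus(text: str, sub: str, start: int = 0) -> str:
    """Return text without the first occurrence of sub found from start on."""
    if not sub:
        return text  # nothing to remove
    pos = text.find(sub, start)
    if pos < 0:
        return text  # no occurrence: the copy is unchanged
    return text[:pos] + text[pos + len(sub):]
```

Only one occurrence is removed; later occurrences of sub remain in the result.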
This feature corresponds to the elt! feature. This one yields the values of the individual elements of self starting with the one with the highest index and thereafter successively lower indices.
rev! : ELT
Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.
rev_iter(self : SAME) yld : ELT
pre size(self) > 0
This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).
post yld = self(size(self) - size(history~)) and history = history~ ^ yld
For quit actions see the specification of the quit statement.
errs QUIT : len history = len self -> quit
This iter yields the elements of self in reverse order of the indices.
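A Sather iter corresponds loosely to a Python generator, which gives a compact model of this feature. In the sketch below the generator simply ends where the specification's QUIT condition (all elements already yielded) would apply; the pre-condition size(self) > 0 is not enforced, so an empty string yields nothing.

```python
from typing import Iterator

def rev_iter(text: str) -> Iterator[str]:
    """Yield the elements of text from the highest index down to the lowest."""
    for index in range(len(text) - 1, -1, -1):
        yield text[index]
```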
A text string consists of text elements which may have one or more codes per element (in Telugu or Vietnamese, for example). One of the necessary features of internationalising the required library, therefore, has resulted in the concept of a character code - the class CHAR_CODE. The routines in this section are provided to manipulate these when doing such things as code/character conversion/substitution operations.
All forms of text string require this form of creation operation in order that composition of characters may be effected. This merely returns a text string containing the element denoted by the single code. Note that this code may not be a combining code (see the class UNICODE for further information on this).
create(code : CHAR_CODE) res : SAME
Since the code can have any value and the string takes its encoding from that, the pre-condition is vacuously true.
post len res = 1
This creation routine returns a single element string formed from the encoding given.
This is the first of a pair of code yielding iters. Do not assume that the number of codes yielded will correspond to the number of elements in the text string. That is only true for text strings in which all elements happen to have a single code!
Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.
code_iter1(self : SAME) yld : CHAR_CODE
pre size(self) > 0
This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).
post let codes : seq of ELT be st codes = self in yld = codes(card history~ + 1) and history = history~ ^ yld
For quit actions see the specification of the quit statement.
errs QUIT : let codes : seq of ELT be st codes = self in card history = card codes -> quit
This iter yields each individual character encoding in self in sequence using the repertoire and encoding of the text string.
This variant starts yielding codes at the indicated starting index.
Note that the formal name of the iter has been changed, replacing the exclamation mark iter symbol with a name acceptable to vdm tools.
code_iter2(self : SAME, start_code : CARD) yld : CHAR_CODE
pre len self > 0
This post-condition makes use of the history concept from vdm++ (see the vdm dialect notes).
post let codes : seq of CHAR_CODE be st codes = self((start_code + 1),..., card self) in yld = codes(card history~ + 1) and history = history~ ^ yld
For quit actions see the specification of the quit statement.
errs QUIT : let codes : seq of CHAR_CODE be st codes = self((start_code + 1),..., card self) in card(history) = card codes -> quit
This iter yields individual character encodings in self in sequence beginning with the first code in the element at the given index in the string.
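The difference between elements and codes can be illustrated with a small Python generator. The model is an assumption made for the example: a text string is a list of elements, each element is a Python str holding one or more code points, and a code is the integer value of a code point. The start argument is a zero-based element index, so iteration begins with the first code of that element.

```python
from typing import Iterator, List

def code_iter(elements: List[str], start: int = 0) -> Iterator[int]:
    """Yield every code of every element, beginning at element index start."""
    for element in elements[start:]:
        for code_point in element:
            yield ord(code_point)

# An element made of a base letter plus a combining accent has two codes.
sample = ["e\u0301", "a"]
codes = list(code_iter(sample))
```

Here sample has two elements but codes has three entries, which is why the number of codes yielded must not be assumed to equal the number of elements.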
Comments or enquiries should be made to Keith Hopper. Page last modified: Wednesday, 4 April 2001.