Section 8.3.2.6:
Inheritance Diagram
types SAME = UNICODE ; UNICODE = token ;
This class implements the concept of a character encoding which is applicable to a given culture.
The following feature is required to be implemented for this class in accordance with the specification given in $IS_EQ from which $ORDERED{UNICODE} sub-types :-
The following feature is required to be implemented for this class in accordance with the specification given in $IS_LT{UNICODE} from which $ORDERED{UNICODE} sub-types :-
The following feature is required to be implemented for this class in accordance with the specification given in $IS_NIL which is inherited from the class $ORDERED from which this sub-types :-
The following feature is required to be implemented for this class in accordance with the specification given in $NIL from which $ORDERED{UNICODE} sub-types:-
The majority of the reader routines below are either individual codes or code tables of some kind. This section has therefore been divided into five sub-groups.
This small group of properties is defined for uniformity with other classes -
The following table of codes is universal in the sense that all previous international 8-bit standards use identical code values, which is essential for backward compatibility. It is strongly recommended that these names be used, since a later revision of the standard may change the code values concerned. Each of these names returns a value of this class.
SPACE | EXCLAMATION_MARK | QUOTATION_MARK | NUMBER_SIGN |
DOLLAR_SIGN | PERCENT_SIGN | AMPERSAND | APOSTROPHE |
LEFT_PARENTHESIS | RIGHT_PARENTHESIS | ASTERISK | PLUS_SIGN |
COMMA | HYPHEN_MINUS | FULL_STOP | SOLIDUS |
DIGIT_ZERO | DIGIT_ONE | DIGIT_TWO | DIGIT_THREE |
DIGIT_FOUR | DIGIT_FIVE | DIGIT_SIX | DIGIT_SEVEN |
DIGIT_EIGHT | DIGIT_NINE | COLON | SEMICOLON |
LESS_THAN_SIGN | EQUALS_SIGN | GREATER_THAN_SIGN | QUESTION_MARK |
COMMERCIAL_AT | LATIN_CAPITAL_LETTER_A | LATIN_CAPITAL_LETTER_B | LATIN_CAPITAL_LETTER_C |
LATIN_CAPITAL_LETTER_D | LATIN_CAPITAL_LETTER_E | LATIN_CAPITAL_LETTER_F | LATIN_CAPITAL_LETTER_G |
LATIN_CAPITAL_LETTER_H | LATIN_CAPITAL_LETTER_I | LATIN_CAPITAL_LETTER_J | LATIN_CAPITAL_LETTER_K |
LATIN_CAPITAL_LETTER_L | LATIN_CAPITAL_LETTER_M | LATIN_CAPITAL_LETTER_N | LATIN_CAPITAL_LETTER_O |
LATIN_CAPITAL_LETTER_P | LATIN_CAPITAL_LETTER_Q | LATIN_CAPITAL_LETTER_R | LATIN_CAPITAL_LETTER_S |
LATIN_CAPITAL_LETTER_T | LATIN_CAPITAL_LETTER_U | LATIN_CAPITAL_LETTER_V | LATIN_CAPITAL_LETTER_W |
LATIN_CAPITAL_LETTER_X | LATIN_CAPITAL_LETTER_Y | LATIN_CAPITAL_LETTER_Z | LEFT_SQUARE_BRACKET |
REVERSE_SOLIDUS | RIGHT_SQUARE_BRACKET | CIRCUMFLEX_ACCENT | LOW_LINE |
GRAVE_ACCENT | LATIN_SMALL_LETTER_A | LATIN_SMALL_LETTER_B | LATIN_SMALL_LETTER_C |
LATIN_SMALL_LETTER_D | LATIN_SMALL_LETTER_E | LATIN_SMALL_LETTER_F | LATIN_SMALL_LETTER_G |
LATIN_SMALL_LETTER_H | LATIN_SMALL_LETTER_I | LATIN_SMALL_LETTER_J | LATIN_SMALL_LETTER_K |
LATIN_SMALL_LETTER_L | LATIN_SMALL_LETTER_M | LATIN_SMALL_LETTER_N | LATIN_SMALL_LETTER_O |
LATIN_SMALL_LETTER_P | LATIN_SMALL_LETTER_Q | LATIN_SMALL_LETTER_R | LATIN_SMALL_LETTER_S |
LATIN_SMALL_LETTER_T | LATIN_SMALL_LETTER_U | LATIN_SMALL_LETTER_V | LATIN_SMALL_LETTER_W |
LATIN_SMALL_LETTER_X | LATIN_SMALL_LETTER_Y | LATIN_SMALL_LETTER_Z | LEFT_CURLY_BRACKET |
VERTICAL_LINE | RIGHT_CURLY_BRACKET | TILDE |
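The names in the table above appear to follow the Unicode character names, with underscores in place of spaces and hyphens. The following Python sketch relies on that observation, which is an assumption and not something this section states, to look up the code value behind a name. The helper `code_point` and the table `_HYPHENATED` are hypothetical and are not part of this class.

```python
import unicodedata

# Hypothetical helper table: three names in the table correspond to Unicode
# character names that contain a hyphen, so they are listed explicitly.
_HYPHENATED = {
    "HYPHEN_MINUS": "HYPHEN-MINUS",
    "LESS_THAN_SIGN": "LESS-THAN SIGN",
    "GREATER_THAN_SIGN": "GREATER-THAN SIGN",
}


def code_point(reader_name: str) -> int:
    """Return the code value for one of the names in the table above.

    Assumes the name is the Unicode character name with underscores
    substituted for spaces; unicodedata.lookup raises KeyError otherwise.
    """
    unicode_name = _HYPHENATED.get(reader_name, reader_name.replace("_", " "))
    return ord(unicodedata.lookup(unicode_name))


if __name__ == "__main__":
    for name in ("SPACE", "SOLIDUS", "LATIN_CAPITAL_LETTER_A", "TILDE"):
        print(f"{name}: U+{code_point(name):04X}")
```

Using the symbolic name in place of a literal code value is the practice recommended above.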
The Unicode standard defines a small group of codes which are not valid characters, but are defined for use in manipulating codes for communication purposes. Either the ISO/IEC standard or the Unicode documents should be studied for the exact semantics associated with these encodings.
This long group of tables comprises the defined groupings in the Unicode Standard. The table ranges include codes which are reserved and codes which are not yet allocated within the specific ranges, although many ranges are full. The entries in the list below are ordered by the code point of the first element in each range.
The following reader routines contain all and only valid code ranges. Gaps which are currently 'reserved' or otherwise unallocated do not appear in these tables. In practice it is expected that the predicates defined in this class will be used to determine the category or validity of a particular code.
This predicate returns true if and only if the given numeric argument is a valid bit-pattern for a character encoding in the given culture.
is_valid(val : CARD) res : BOOL
Since this operation is a predicate then this pre-condition is vacuously true.
Note that this post-condition performs an abstract character code creation in order to determine if the result is in the domain of the character repertoire. Such an operation could not be performed in general with executable code. It is used solely for specification purposes.
post let loc_ch = create(val) in (exists script | script in set dom SCRIPTS & let loc_dom = dom Code_Groups(script) in res = (loc_ch in set loc_dom)) or (exists symbol | symbol in set dom SYMBOLS & let loc_sym_dom = dom Symbolics(symbol) in res = (loc_ch in set loc_sym_dom))
This predicate returns true if and only if the bit-pattern of the numeric argument forms a valid Unicode encoding.
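The post-condition of is_valid amounts to a membership test over the script tables and the symbol tables. A minimal Python sketch of that idea follows. The tables `CODE_GROUPS` and `SYMBOLICS` are toy stand-ins with illustrative contents only; the real tables are far larger and are not reproduced here.

```python
# Toy stand-ins for the specification's Code_Groups and Symbolics maps.
# The ranges are illustrative assumptions, not the contents of the real tables.
CODE_GROUPS = {
    "Latin": set(range(0x41, 0x5B)) | set(range(0x61, 0x7B)),
    "Greek": set(range(0x391, 0x3A2)) | set(range(0x3B1, 0x3CA)),
}
SYMBOLICS = {
    "Punctuation": {0x21, 0x2C, 0x2E, 0x3A, 0x3B, 0x3F},
    "Spacing": {0x20, 0xA0},
}


def is_valid(val: int) -> bool:
    """True when val lies in the domain of some script or symbol table."""
    in_script = any(val in codes for codes in CODE_GROUPS.values())
    in_symbols = any(val in codes for codes in SYMBOLICS.values())
    return in_script or in_symbols


if __name__ == "__main__":
    for val in (0x41, 0x20, 0x07):
        print(f"U+{val:04X} valid: {is_valid(val)}")
```

Unlike the post-condition, the sketch tests the numeric value directly, so no abstract character code has to be created first.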
This feature creates a character code from the given numeric value, used as a bit-pattern.
create(val : CARD) res : SAME
pre is_valid(val)
post let loc_res : seq of OCTET be st loc_res = res in loc_res = CARD.binstr(val)(1, ..., asize) and lib(res) = plib
This feature creates a new character code which has the bit-pattern representation which is the same as the value given.
This feature creates a new Unicode encoding which is that of the given rune - or, if the argument contains more than one encoding, that of the first element.
create2(rn : RUNE) res : SAME
Since the argument is not optional then this pre-condition is vacuously true.
post res = create(CHAR_CODE.card(RUNE.code(rn)))
This feature creates a new Unicode encoding which has the same encoding as that of the first element in the given argument.
This predicate returns true only if self is a combining code as defined in the Unicode version 3.1 standard.
is_combining(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set dom Combining
This predicate returns true if and only if self is a combining encoding in the Unicode domain.
This predicate returns true only if self is a letter code in the given script as defined in ISO/IEC 10646-1:2000.
is_letter(self : SAME, script : SCRIPTS) res : BOOL
Since this feature is a predicate and neither argument is optional then this pre-condition is vacuously true.
post res = self in set dom Letters(script.enum)
This predicate returns true if and only if self is a letter code in the domain of the given script.
This predicate returns true only if self is a letter code in any script defined in the Unicode standard.
is_letter2(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = exists script | script in set dom Letters & self in set dom Letters(script)
This predicate returns true if and only if self is a letter encoding in any script defined in the Unicode domain.
This predicate returns true only if self is an encoding for a lower case letter to which there is a corresponding upper case letter.
is_up_mapped(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set dom Case_Pair
This predicate returns true if and only if self is an encoding for a lower case letter to which there is a mapped upper case letter in the Unicode domain.
This predicate returns true only if self is an encoding for a lower case letter in the Unicode standard. Note that there are a number of encodings for which there is no corresponding upper case character.
is_lower(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set dom Case_Pair or self in set dom Lower_only
This predicate returns true if and only if self is an encoding for a lower case letter in the Unicode domain.
This predicate returns true only if self is an encoding for an upper case letter to which there is a corresponding lower case letter.
is_down_mapped(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set rng Case_Pair
This predicate returns true if and only if self is an encoding for an upper case letter to which there is a mapped lower case letter in the Unicode domain.
This predicate returns true only if self is an encoding for an upper case letter in the Unicode standard. Note that there are a number of encodings for which there is no corresponding lower case character.
is_upper(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set rng Case_Pair or self in set dom Upper_only
This predicate returns true if and only if self is an encoding for an upper case letter in the Unicode domain.
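The four case predicates above differ only in which table they consult: the domain of Case_Pair holds mapped lower case codes, its range holds mapped upper case codes, and Lower_only and Upper_only hold the unmapped ones. The Python sketch below models this with toy tables; the table contents are illustrative assumptions and not the data of the specification.

```python
# Toy tables (illustrative assumptions only).
# CASE_PAIR maps each mapped lower case code to its upper case code.
CASE_PAIR = {lower: lower - 0x20 for lower in range(0x61, 0x7B)}  # a-z -> A-Z
LOWER_ONLY = {0x00DF}  # LATIN SMALL LETTER SHARP S, used here as an unmapped example
UPPER_ONLY = {0x2102}  # DOUBLE-STRUCK CAPITAL C, used here as an unmapped example

_MAPPED_UPPER = set(CASE_PAIR.values())  # corresponds to rng Case_Pair


def is_up_mapped(cp: int) -> bool:
    return cp in CASE_PAIR  # self in set dom Case_Pair


def is_down_mapped(cp: int) -> bool:
    return cp in _MAPPED_UPPER  # self in set rng Case_Pair


def is_lower(cp: int) -> bool:
    return cp in CASE_PAIR or cp in LOWER_ONLY


def is_upper(cp: int) -> bool:
    return cp in _MAPPED_UPPER or cp in UPPER_ONLY


if __name__ == "__main__":
    for cp in (0x61, 0x41, 0x00DF, 0x2102):
        print(f"U+{cp:04X}", is_lower(cp), is_up_mapped(cp),
              is_upper(cp), is_down_mapped(cp))
```

A code in LOWER_ONLY satisfies is_lower but not is_up_mapped, which is the distinction the notes above draw attention to.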
This predicate returns true only if self is an encoding for an invisible mark in the Unicode standard which occupies space on a presentation medium.
is_spacing(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = let loc_space = dom Symbolics(SYMBOLS.Spacing) in self in set loc_space
This predicate returns true if and only if self is an encoding for a character which is invisible on a presentation medium but nevertheless occupies space when rendered.
This predicate returns true only if self is either an encoding for an invisible mark in the Unicode standard which occupies space on a presentation medium or a bit-pattern corresponding to the name of a control function which effects movement but no marking on a presentation medium.
is_white_space(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = (let loc_space = dom Symbolics(SYMBOLS.Spacing) in self in set loc_space) or CONTROL_CODES.is_space(CONTROL_CODES.create(card(self)))
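This predicate returns true if and only if self is either an encoding for a character which is invisible on a presentation medium but nevertheless occupies space when rendered, or a bit-pattern corresponding to a control function which effects movement but makes no mark.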
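The difference between is_spacing and is_white_space is that the latter also accepts bit-patterns of control functions that move the active position without marking. A small Python sketch follows. The sets `SPACING` and `CONTROL_SPACE` are illustrative assumptions; in particular, the set of control functions accepted by CONTROL_CODES.is_space is not given in this section.

```python
# Illustrative assumptions: a few spacing characters and the bit-patterns of
# the control functions HT, LF, VT, FF and CR.
SPACING = {0x0020, 0x00A0, 0x3000}
CONTROL_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D}


def is_spacing(cp: int) -> bool:
    """Invisible character that nevertheless occupies space."""
    return cp in SPACING


def is_white_space(cp: int) -> bool:
    """Spacing character, or bit-pattern of a movement-only control function."""
    return is_spacing(cp) or cp in CONTROL_SPACE


if __name__ == "__main__":
    for cp in (0x20, 0x09, 0x41):
        print(f"U+{cp:04X}", is_spacing(cp), is_white_space(cp))
```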
This predicate returns true only if self is an encoding for a numeric symbol in the given script as defined in ISO/IEC 10646-1:2000. This does not mean that it is possible to convert such an encoding into a numeric value - see is_digit below.
is_numeric(script : SCRIPT) : BOOL
is_numeric(self : SAME, script : SCRIPTS) res : BOOL
Since this feature is a predicate and neither argument is optional then this pre-condition is vacuously true.
post res = self in set dom Numeric(script.enum)
This predicate returns true if and only if self is an encoding for a numeric symbol in the domain of the given script.
This predicate returns true only if self is an encoding representing a numeric value in any script defined in the Unicode standard. This does not mean that it is possible to convert such an encoding into a numeric value - see is_digit below.
is_numeric2(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = exists script | script in set dom Numeric & self in set dom Numeric(script)
This predicate returns true if and only if self is an encoding representing a numeric symbol in the Unicode domain.
This predicate returns true only if self is a decimal digit encoding in the given script as defined in ISO/IEC 10646-1:2000.
is_digit(self : SAME, script : SCRIPTS) res : BOOL
Since this feature is a predicate and neither argument is optional then this pre-condition is vacuously true.
post res = self in set dom Decimal(script.enum)
This predicate returns true if and only if self is a decimal digit encoding in the domain of the given script.
This predicate returns true only if self is a decimal digit encoding in any script defined in the Unicode standard.
is_digit2(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = exists script | script in set dom Decimal & self in set dom Decimal(script)
This predicate returns true if and only if self is a decimal digit encoding in any script defined in the Unicode domain.
This predicate returns true only if self is an octal digit encoding in the given script as defined in ISO/IEC 10646-1:2000.
is_octal_digit(script : SCRIPT) : BOOL
is_octal_digit(self : SAME, script : SCRIPTS) res : BOOL
Since this feature is a predicate and neither argument is optional then this pre-condition is vacuously true.
post res = self in set dom Decimal(script.enum) and digit_value(self) < OCTET::Octet_Bits
This predicate returns true if and only if self is an octal digit encoding in the domain of the given script.
This predicate returns true only if self is an octal digit code in any script defined in the Unicode standard.
is_octal_digit2(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = (exists script | script in set dom Decimal & self in set dom Decimal(script)) and digit_value(self) < OCTET::Octet_Bits
This predicate returns true if and only if self is an octal digit encoding in any script defined in the Unicode domain.
This predicate returns true only if self is a hexadecimal digit code defined in the Unicode standard. Note that this can only be true in the Latin script.
is_hex_digit(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set dom Decimal(SCRIPTS.Latin) or self in set (LATIN_CAPITAL_LETTER_A, ..., LATIN_CAPITAL_LETTER_F) or self in set (LATIN_SMALL_LETTER_A, ..., LATIN_SMALL_LETTER_F)
This predicate returns true if and only if self is a hexadecimal digit encoding in the Unicode domain; such encodings occur only in the Latin script.
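The script-independent digit predicates above can be approximated in Python with the standard `unicodedata` module. This is a sketch under two assumptions: that OCTET::Octet_Bits has the value 8, and that Python's character database, which follows a later Unicode version than the one cited here, is an acceptable stand-in for the Decimal tables.

```python
import unicodedata
from typing import Optional

OCTAL_LIMIT = 8  # assumed value of OCTET::Octet_Bits


def _decimal(cp: int) -> Optional[int]:
    """Decimal digit value of the code, or None when it is not a decimal digit."""
    return unicodedata.decimal(chr(cp), None)


def is_digit2(cp: int) -> bool:
    return _decimal(cp) is not None


def is_octal_digit2(cp: int) -> bool:
    value = _decimal(cp)
    return value is not None and value < OCTAL_LIMIT


def is_hex_digit(cp: int) -> bool:
    # Latin decimal digits, or the letters A to F in either case.
    return 0x30 <= cp <= 0x39 or 0x41 <= cp <= 0x46 or 0x61 <= cp <= 0x66


if __name__ == "__main__":
    for cp in (ord("7"), ord("8"), 0x0663, ord("f")):
        print(f"U+{cp:04X}", is_digit2(cp), is_octal_digit2(cp), is_hex_digit(cp))
```

Note that a decimal digit outside the Latin script satisfies is_digit2 but not is_hex_digit, in line with the remark that hexadecimal digits exist only in the Latin script.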
This predicate returns true only if self is an encoding defined in the Unicode standard for which a rendering engine will produce a mark.
is_print(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = not is_spacing(self)
This predicate returns true if and only if self is an encoding in any script defined in the Unicode domain for a character which, when rendered on some presentation medium is visible.
This predicate returns true only if self is an encoding defined in the Unicode standard for a punctuation symbol.
is_punct(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = self in set dom Symbolics(SYMBOLS.Punctuation)
This predicate returns true if and only if self is an encoding in any script defined in the Unicode domain for a punctuation symbol.
This predicate returns false since control codes, not being characters, do not form part of Unicode.
is_control(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = false
This predicate returns true if and only if self is a control code - this is identically false.
This predicate returns true if self is a character encoding corresponding to the encoding of characters defined by ISO 646-IRV. Note that this is identical to ISO 646-US and US-ASCII and is also a subset of all standards in the ISO 8859 family.
is_646char(self : SAME) res : BOOL
Since this feature is a predicate and the argument is not optional then this pre-condition is vacuously true.
post res = (self in set dom Basic_Latin) and (let loc_dom = dom Code_Groups(SCRIPTS.Latin) in res = (self in set loc_dom) or (exists symbol | symbol in set dom SYMBOLS & let loc_sym_dom = dom Symbolics(symbol) in res = (self in set loc_sym_dom)))
This predicate returns true if and only if self is an encoding defined in ISO 646 7-bit character encoding standard.
This feature returns the numeric value of the bit-pattern of self as a cardinal number.
card(self : SAME) res : CARD
Since the argument is not optional then this pre-condition is vacuously true.
post create(res) = self
This feature returns the cardinal number corresponding to the code value (as a bit-pattern) of self.
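The post-condition of card states that card and create are inverses: creating a code from the result of card yields the original code. A minimal Python sketch of that relationship follows. The class `Unicode` and the range test inside `is_valid` are hypothetical placeholders; the real validity test is the table-based is_valid of this class.

```python
from dataclasses import dataclass


def is_valid(val: int) -> bool:
    # Placeholder assumption: any code point that is not a surrogate.
    return 0 <= val <= 0x10FFFF and not (0xD800 <= val <= 0xDFFF)


@dataclass(frozen=True)
class Unicode:
    """Hypothetical model of a value of this class: a bare bit-pattern."""

    bits: int

    @staticmethod
    def create(val: int) -> "Unicode":
        # Pre-condition of create: is_valid(val).
        if not is_valid(val):
            raise ValueError(f"not a valid bit-pattern: {val:#x}")
        return Unicode(val)

    def card(self) -> int:
        # Post-condition of card: create(res) = self.
        return self.bits


if __name__ == "__main__":
    code = Unicode.create(0x3B1)
    print(Unicode.create(code.card()) == code)
```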
This feature returns the character code corresponding to the value of self. The culture code kind of the result is Unicode.
code(self : SAME) res : CHAR_CODE
Since the argument is not optional then this pre-condition is vacuously true.
post let loc_lib : LIBCHARS = CHAR_CODE.lib(res) in res = CHAR_CODE.create(card(self),loc_lib) and CULTURE.kind(LIBCHARS.culture(loc_lib)) = CODE_KINDS.Unicode
This feature returns the character code which has the encoding which is self and for which the culture code kind is CODE_KINDS::Unicode.
This feature creates and returns a rune containing the single encoding self for which the culture code kind is Unicode.
rune(self : SAME) res : RUNE
Since the argument is not optional then this pre-condition is vacuously true.
post (is_combining(self) and res = RUNE.nil) or res = CHAR_CODE.rune(code(self))
This feature returns the single encoding rune corresponding to self, unless self is a combining code, in which case RUNE::nil shall be returned.
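The behaviour of rune for combining codes can be sketched in Python as follows. Two assumptions are made: a one-character string stands in for a RUNE and `None` for RUNE::nil, and the general categories Mn, Mc and Me from Python's `unicodedata` module approximate the Combining table of this class.

```python
import unicodedata
from typing import Optional


def is_combining(cp: int) -> bool:
    # Approximation: the mark categories Mn, Mc and Me all begin with "M".
    return unicodedata.category(chr(cp)).startswith("M")


def rune(cp: int) -> Optional[str]:
    """Single-encoding rune for cp, or None (standing in for RUNE::nil)."""
    if is_combining(cp):
        return None
    return chr(cp)


if __name__ == "__main__":
    print(rune(0x41))    # LATIN CAPITAL LETTER A
    print(rune(0x0301))  # COMBINING ACUTE ACCENT
```

A combining code cannot stand alone as the base of a rune, which is why the nil value is returned for it.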
This feature returns the lower case letter corresponding to self provided the pre-condition (that such a mapping exists) is satisfied.
to_lower : SAME
to_lower(self : SAME) res : SAME
pre is_down_mapped(self)
post to_upper(res) = self
This feature returns the encoding of the lower case letter equivalent to the upper case letter for which self is an encoding.
This feature returns the upper case letter corresponding to self provided the pre-condition (that such a mapping exists) is satisfied.
to_upper : SAME
to_upper(self : SAME) res : SAME
pre is_up_mapped(self)
post to_lower(res) = self
This feature returns the encoding of the upper case letter equivalent to the lower case letter for which self is an encoding.
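The two case conversions are specified as mutual inverses, each guarded by a pre-condition that the mapping exists. The Python sketch below illustrates this with a toy `CASE_PAIR` table covering only the basic Latin letters; the table and the use of `ValueError` to signal a violated pre-condition are assumptions made for the example.

```python
# Toy Case_Pair: lower case code -> upper case code (basic Latin only).
CASE_PAIR = {lower: lower - 0x20 for lower in range(0x61, 0x7B)}
# Inverse mapping: upper case code -> lower case code.
UPPER_TO_LOWER = {upper: lower for lower, upper in CASE_PAIR.items()}


def to_upper(cp: int) -> int:
    if cp not in CASE_PAIR:  # pre is_up_mapped(self)
        raise ValueError("no upper case mapping for this code")
    return CASE_PAIR[cp]


def to_lower(cp: int) -> int:
    if cp not in UPPER_TO_LOWER:  # pre is_down_mapped(self)
        raise ValueError("no lower case mapping for this code")
    return UPPER_TO_LOWER[cp]


if __name__ == "__main__":
    # Post-conditions: each conversion undoes the other.
    print(all(to_lower(to_upper(cp)) == cp for cp in CASE_PAIR))
    print(all(to_upper(to_lower(cp)) == cp for cp in UPPER_TO_LOWER))
```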
This feature returns the octal value corresponding to the digit character encoding self.
octal_value(self : SAME) res : CARD
pre is_octal_digit(self)
post exists sublist | sublist in set dom Decimal & let loc_rng = iota rng | rng in set dom sublist & self in set dom loc_rng in res = (iota idx | idx in inds loc_rng & self = loc_rng(idx)) - 1
This routine returns as a cardinal number the value of the octal digit corresponding to the encoding self.
This feature returns the decimal value corresponding to the digit character encoding self.
digit_value(self : SAME) res : CARD
pre is_digit(self)
post exists sublist | sublist in set dom Decimal & let loc_rng = iota rng | rng in set dom sublist & self in set dom loc_rng in res = (iota idx | idx in inds loc_rng & self = loc_rng(idx)) - 1
This routine returns as a cardinal number the value of the decimal digit corresponding to the encoding self.
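The post-condition of digit_value locates self in the ten-element digit sequence of its script and subtracts one from its one-based position. A Python sketch of that computation follows. The `DECIMAL` table is a toy stand-in listing three digit ranges; the script names used as keys are illustrative.

```python
# Toy Decimal table: each script maps to its ten digit codes in order 0..9.
DECIMAL = {
    "Latin": list(range(0x0030, 0x003A)),
    "Arabic": list(range(0x0660, 0x066A)),
    "Devanagari": list(range(0x0966, 0x0970)),
}


def digit_value(cp: int) -> int:
    for codes in DECIMAL.values():
        if cp in codes:
            one_based_index = codes.index(cp) + 1  # VDM sequences start at 1
            return one_based_index - 1
    raise ValueError("pre-condition violated: not a decimal digit")


if __name__ == "__main__":
    for cp in (ord("5"), 0x0663, 0x096F):
        print(f"U+{cp:04X} -> {digit_value(cp)}")
```

The same positional rule applies whatever the script, so no per-script arithmetic is needed.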
This feature returns the hexadecimal value corresponding to the hexadecimal digit character encoding self.
hex_digit_value(self : SAME) res : CARD
pre is_hex_digit(self)
post (is_digit(self) and res = digit_value(self)) or (let uc_seq = (LATIN_CAPITAL_LETTER_A, ..., LATIN_CAPITAL_LETTER_F) in res = (iota idx | idx in inds uc_seq & self = uc_seq(idx)) + 9) or (let lc_seq = (LATIN_SMALL_LETTER_A, ..., LATIN_SMALL_LETTER_F) in res = (iota idx | idx in inds lc_seq & self = lc_seq(idx)) + 9)
This routine returns as a cardinal number the value of the hexadecimal digit corresponding to the encoding self.
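The post-condition of hex_digit_value has three cases: a decimal digit keeps its digit value, and a letter A to F in either case takes its one-based position in the letter sequence plus nine. A Python sketch of the three cases follows; raising `ValueError` for a violated pre-condition is an assumption of the example.

```python
UC_SEQ = [ord(c) for c in "ABCDEF"]  # LATIN_CAPITAL_LETTER_A .. F
LC_SEQ = [ord(c) for c in "abcdef"]  # LATIN_SMALL_LETTER_A .. F


def hex_digit_value(cp: int) -> int:
    if 0x30 <= cp <= 0x39:  # Latin decimal digit
        return cp - 0x30
    for seq in (UC_SEQ, LC_SEQ):
        if cp in seq:
            one_based_index = seq.index(cp) + 1
            return one_based_index + 9
    raise ValueError("pre-condition violated: not a hexadecimal digit")


if __name__ == "__main__":
    for ch in "7Af":
        print(ch, hex_digit_value(ord(ch)))
```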
Comments or enquiries should be made to Keith Hopper. Page last modified: Friday, 1 June 2001.