8.17. File tom/CharEncoding

class tom.CharacterEncoding

The CharacterEncoding class defines the interface to the predicates and conversions of byte and character encodings.

inherits

Behaviour supers: All

instance tom.CharacterEncoding

inherits

Behaviour supers: All

methods


deferred String
  name;

Return the name of this encoding.


deferred char
  decode byte b;

Return the decoded byte b, i.e. the Unicode character corresponding to the byte b in the receiving encoding.


deferred byte
  encode char c;

Return the byte encoding of the character c. If the character c has no byte equivalent in the receiving encoding, an encoding-condition is signaled. The byte returned is then the byteValue of the object returned from signaling the condition, or 127 if nil is returned.
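The fallback rule can be sketched as follows. This is a Python illustration, not TOM code: the names `encode`, `handler`, `FALLBACK_BYTE` and `ASCII_MAP` are hypothetical, and a plain callback stands in for signaling an encoding-condition.

```python
# Hypothetical sketch of the encode fallback rule; not the TOM implementation.
FALLBACK_BYTE = 127  # used when the handler returns nil (None in this sketch)

# Assumed example map: USASCII characters to their byte values.
ASCII_MAP = {chr(i): i for i in range(128)}


def encode(c, encoding_map, handler):
    """Return the byte for character c, asking handler when c is unmappable."""
    code = encoding_map.get(c)
    if code is not None:
        return code
    # Stand-in for signaling an encoding-condition: the handler may return
    # a replacement byte value (playing the role of byteValue) or None.
    replacement = handler(c)
    if replacement is None:
        return FALLBACK_BYTE
    return replacement & 0xFF
```

A handler that always returns None yields 127 for every unmappable character, while a handler that returns `ord('?')` substitutes a question mark instead.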


deferred boolean
  isAlpha byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a letter.


deferred boolean
  isDigit byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a digit.


deferred boolean
  isLower byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a lowercase letter.


deferred boolean
  isPunct byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a punctuation character.


deferred boolean
  isSpace byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a space character.


deferred boolean
  isUpper byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is an uppercase letter.


deferred byte
  toLower byte b;

Return the lowercase version of the byte b, according to the receiving encoding. If the character is not uppercase, it is returned unchanged.


deferred byte
  toUpper byte b;

Return the uppercase version of the byte b, according to the receiving encoding. If the character is not lowercase, it is returned unchanged.


deferred int
  digitValue byte b;

Return the numeric value of the digit denoted by the byte b in the receiving encoding.


deferred int
  alphaValue byte b;

Return the index of the letter b relative to the start of its letter range. Thus, 'a' returns 0, 'f' returns 5, etc.
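For a USASCII-only encoding, digitValue and alphaValue might be sketched as below. This is a Python illustration with hypothetical function names. The treatment of uppercase letters (counted from 'A') and of bytes that are not letters or digits (an exception) is an assumption, since the text above does not specify it.

```python
# Hypothetical USASCII-only sketch of digitValue and alphaValue.

def digit_value(b):
    """Numeric value of the digit denoted by byte b."""
    if ord('0') <= b <= ord('9'):
        return b - ord('0')
    raise ValueError("byte %d is not a digit" % b)  # assumed behaviour


def alpha_value(b):
    """Index of the letter b relative to the start of its letter range."""
    if ord('a') <= b <= ord('z'):
        return b - ord('a')
    if ord('A') <= b <= ord('Z'):  # assumption: uppercase has its own range
        return b - ord('A')
    raise ValueError("byte %d is not a letter" % b)  # assumed behaviour
```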

class tom.CharEncoding

An instance of the CharEncoding class maintains information on a particular mapping for encoding a subset of Unicode characters to 8-bit bytes. An example of such a mapping is iso-8859-1, the well-known Western European byte encoding, of which USASCII is a subset.

inherits

State supers: State, Constants, Conditions, CharacterEncoding

variables

static MutableDictionary encodings;

Currently known encodings.

methods


ByteArray
  loadBytes int num
       from String name
  extension String ext;

Load num bytes from the file with the given name and the extension ext (without the leading dot). The full path of the file is obtained from the main Bundle.


instance (id)
  named String name;

Return the CharEncoding known by the given name. This always succeeds, as a CharEncoding reads the resources it needs on demand.
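The reason the lookup cannot fail is that no resource is read when the object is created. A minimal Python sketch of that idea follows. The class `LazyEncoding`, the function `named` with its `loader` parameter, and the caching in a module-level dictionary are all assumptions made for illustration, not the TOM implementation.

```python
# Hypothetical sketch of a by-name registry with on-demand resource loading.

class LazyEncoding:
    def __init__(self, name, loader):
        self.name = name
        self._loader = loader    # callable that reads the decoding table
        self._decoding = None    # not loaded until first use

    def decoding(self):
        """Return the decoding map, reading it only on first access."""
        if self._decoding is None:
            self._decoding = self._loader(self.name)
        return self._decoding


_encodings = {}  # plays the role of the static `encodings` dictionary


def named(name, loader):
    """Return the encoding registered under name, creating it if needed."""
    enc = _encodings.get(name)
    if enc is None:
        enc = LazyEncoding(name, loader)  # no file access happens here
        _encodings[name] = enc
    return enc
```

In this sketch a missing resource file would only surface when `decoding()` is first called, not when `named` returns.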

instance tom.CharEncoding

variables

public String name;

The name of this encoding.

CharArray decoding;

The decoding map.

IntDictionary encoding;

The encoding map.

ByteArray to_lower;

The byte map for conversion to lower case within the encoding.

ByteArray to_upper;

The byte map for conversion to upper case within the encoding.

ByteArray to_title;

The byte map for conversion to title case within the encoding.

ByteArray is_digit;

The bitmap for testing whether a byte is a digit.

ByteArray is_letter;

The bitmap for testing whether a byte is a letter.

ByteArray is_lower;

The bitmap for testing whether a byte is lower case.

ByteArray is_punct;

The bitmap for testing whether a byte is a punctuation character.

ByteArray is_space;

The bitmap for testing whether a byte is a space character.

ByteArray is_upper;

The bitmap for testing whether a byte is upper case.
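A predicate bitmap over all 256 byte values fits in 32 bytes. The Python sketch below shows one possible layout, where bit `b & 7` of byte `b >> 3` records membership of b. That layout and the helper names `make_bitmap` and `bit_is_set` are assumptions; the actual layout of the TOM resource files is not stated here.

```python
# Hypothetical sketch of a 256-bit predicate bitmap; the bit order is assumed.

def make_bitmap(members):
    """Build a 32-byte bitmap with one bit set per member byte value."""
    bits = bytearray(32)
    for b in members:
        bits[b >> 3] |= 1 << (b & 7)
    return bytes(bits)


def bit_is_set(bitmap, b):
    """Test whether byte value b is a member of the bitmap."""
    return bool(bitmap[b >> 3] & (1 << (b & 7)))


# Example: a digit predicate for USASCII.
IS_DIGIT = make_bitmap(range(ord('0'), ord('9') + 1))
```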

methods


id
  init String n;

Designated initializer.


char
  decode byte b;

Return the decoded byte b, i.e. the Unicode character corresponding to the byte b in the receiving encoding.


CharArray
  decoding;

Return the decoding map, reading it if necessary.


byte
  encode char c;

Return the byte encoding of the character c. If the character c has no byte equivalent in the receiving encoding, an encoding-condition is signaled. The byte returned is then the byteValue of the object returned from signaling the condition, or 127 if nil is returned.


IntDictionary
  encoding;

Return the encoding map, creating it from the decoding map if necessary.
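Deriving the encoding map from the decoding map amounts to inverting a table indexed by byte value. A minimal Python sketch, assuming the decoding map is a sequence of 256 characters and that the first byte wins when two bytes decode to the same character (both are assumptions; `build_encoding_map` is a hypothetical name):

```python
# Hypothetical sketch: invert a decoding table into an encoding dictionary.

def build_encoding_map(decoding):
    """Map each character's code point to the byte that decodes to it."""
    encoding = {}
    for byte_value, ch in enumerate(decoding):
        # setdefault keeps the first byte seen for a repeated character.
        encoding.setdefault(ord(ch), byte_value)
    return encoding


# Example decoding table: iso-8859-1 maps byte i to code point i.
LATIN1_DECODING = [chr(i) for i in range(256)]
```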


protected ByteArray
  loadConversion String conversion;

Load and return the table for the named conversion of the receiving encoding.


protected ByteArray
  loadPredicateSet String predicate;

Load and return the set for the named predicate of the receiving encoding.


boolean
  isAlpha byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a letter.


boolean
  isDigit byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a digit.


boolean
  isLower byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a lowercase letter.


boolean
  isPunct byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a punctuation character.


boolean
  isSpace byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is a space character.


boolean
  isUpper byte b;

Return TRUE if the character denoted by the byte b in the receiving encoding is an uppercase letter.


byte
  toLower byte b;

Return the lowercase version of the byte b, according to the receiving encoding. If the character is not uppercase, it is returned unchanged.


byte
  toUpper byte b;

Return the uppercase version of the byte b, according to the receiving encoding. If the character is not lowercase, it is returned unchanged.
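With a 256-entry byte map such as to_upper, case conversion reduces to a single table lookup, and bytes without an uppercase form map to themselves. The Python sketch below builds such a table for USASCII only. The names `make_to_upper_ascii`, `TO_UPPER` and `to_upper` are hypothetical, and a real table would be loaded from the encoding's resources rather than computed.

```python
# Hypothetical sketch of a byte map for conversion to upper case.

def make_to_upper_ascii():
    """Build a 256-entry table: identity, except a-z map to A-Z."""
    table = bytearray(range(256))
    for b in range(ord('a'), ord('z') + 1):
        table[b] = b - 32  # distance between 'a' and 'A' in USASCII
    return bytes(table)


TO_UPPER = make_to_upper_ascii()


def to_upper(b):
    """Return the uppercase version of byte b, or b itself if it has none."""
    return TO_UPPER[b]
```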


int
  digitValue byte b;

Return the numeric value of the digit denoted by the byte b in the receiving encoding.


int
  alphaValue byte b;

Return the index of the letter b relative to the start of its letter range. Thus, 'a' returns 0, 'f' returns 5, etc.

class tom.USASCIIEncoding

A replacement for a real CharEncoding used during program initialization.

inherits

State supers: State, CharacterEncoding

variables

static USASCIIEncoding shared;

The one and only USASCIIEncoding object.

methods


instance (id)
  shared;

Return the one and only USASCIIEncoding object.

instance tom.USASCIIEncoding

methods


String
  name;

We're really a dummy, so we do not have a name. In fact, that is how we're recognized.


char
  decode byte b;

This is acceptable for iso-8859-1.


byte
  encode char c;

This is acceptable for iso-8859-1.


boolean
  isAlpha byte b;

Undocumented.


boolean
  isDigit byte b;

Undocumented.


boolean
  isLower byte b;

Undocumented.


boolean
  isPunct byte b;

Undocumented.


boolean
  isSpace byte b;

Undocumented.


boolean
  isUpper byte b;

Undocumented.


byte
  toLower byte b;

Undocumented.


byte
  toUpper byte b;

Undocumented.


int
  digitValue byte b;

Undocumented.


int
  alphaValue byte b;

Undocumented.