Culture Description: File Formats
Introduction
For each culture to which the Required Library may be ported there
are four culture-dependent description files, which are produced by lcc, the
cultural 'compiler'. Three of them are compiled from the textual descriptions
specified by the ISO/IEC standards 14651 (International String Ordering -
Method for Comparing Character Strings and Description of the Common Template
Tailorable Ordering) and 14652 (Information Technology - Specifications for
Cultural Conventions); see Sather language specification section 2.
In addition to the three binary files produced in this way to enable a
program to operate in a culture-independent manner, there need to be various
message files (using the local repertoire and encoding).
All of these files have various components which are written to the file
preceded by an octet giving the count of following octets (which may, of
course, be zero). This is specified in the following structure tables using
the term Sized(xxx), which shall be read as indicating an object no bigger
than 256 octets, the first of which gives the number of following octets in
the object's binary string representation. This is particularly used for
indicating textual strings of arbitrary size in the target encoding.
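As an illustration of the Sized(xxx) convention, the following Python sketch
reads and writes one such object on a binary stream. It is a minimal sketch
rather than code from the Sather library; the function names read_sized and
write_sized are hypothetical.

```python
import io


def read_sized(stream):
    """Read one Sized(xxx) object: a count octet, then that many octets."""
    count_octet = stream.read(1)
    if len(count_octet) != 1:
        raise EOFError("missing count octet")
    count = count_octet[0]          # 0..255 following octets
    payload = stream.read(count)
    if len(payload) != count:
        raise EOFError("truncated Sized object")
    return payload


def write_sized(stream, payload):
    """Write payload preceded by a single octet giving its length."""
    if len(payload) > 255:
        raise ValueError("a Sized object holds at most 255 following octets")
    stream.write(bytes([len(payload)]) + payload)


# Round trip through an in-memory stream.
buffer = io.BytesIO()
write_sized(buffer, b"abc")
buffer.seek(0)
restored = read_sized(buffer)
```

The payload is returned as raw octets because the text is held in the target
encoding, which this sketch does not attempt to decode.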
Since each file contains many components of different object values, the
structure is described in terms of these objects, which in turn may be
specified in terms of other objects until at some 'level' the actual values
appear in the file itself.
NOTES
- Figures in the table columns headed 'Octets' give the
count of octets containing the value. Unless a Sather Required Library
class name is given in the comments column or the name includes the
wording 'bit-pattern' then the entity is normally an unsigned numeric
whole number. There is one exception to the unsigned number rule which is
indicated by a note to the relevant table.
- All multiple octet values are stored with the most significant octet
first on the file. In general this does not necessarily apply to
sized objects which are considered to be in binary string form ready for
sizing without any alteration to the binary string.
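The second note can be illustrated with a short Python sketch that decodes an
unsigned multiple-octet value stored most significant octet first. The helper
name read_uint is an assumption made for this example only.

```python
def read_uint(data, offset, octets):
    """Decode an unsigned whole number stored most significant octet first."""
    field = data[offset:offset + octets]
    if len(field) != octets:
        raise ValueError("not enough octets for the requested field")
    return int.from_bytes(field, byteorder="big", signed=False)


# A four-octet field holding 0x00000102.
sample = bytes([0x00, 0x00, 0x01, 0x02])
value = read_uint(sample, 0, 4)   # 258
```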
Repertoire Map File
This file contains data which enables codes to be converted to tokens and
vice versa. This is needed when the tokens are used in establishing ordering
weights when comparing such things as character strings. The file consists of
the sections described in the three following tables in the order given.
Header
Entity | Octets | Name |
map size | 4 | a |
octets per token | 1 | x |
octets per code | 1 | y |
Inmap Table
Entity | Octets | Name |
>a times<
    list size | 1 | b |
    >b times<
        Entity | Octets |
        token bit-pattern | x |
        code bit-pattern | y |
Outmap Table
Entity | Octets | Name |
>a times<
    token bit-pattern | x | |
    list size | 1 | c |
    >c times<
        Entity | Octets |
        code value | y |
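The three sections of the repertoire map file can be read in sequence. The
following Python sketch is illustrative only and is not taken from the Sather
sources. It assumes that each of the a Inmap entries is a list of b
token/code pairs, that each of the a Outmap entries is one token followed by
c codes, and that bit-patterns are best kept as raw octet strings. All
function names are hypothetical.

```python
import io


def _uint(stream, octets):
    """Read an unsigned number, most significant octet first."""
    raw = stream.read(octets)
    if len(raw) != octets:
        raise EOFError("unexpected end of file")
    return int.from_bytes(raw, "big")


def _bits(stream, octets):
    """Read a bit-pattern of the given width as raw octets."""
    raw = stream.read(octets)
    if len(raw) != octets:
        raise EOFError("unexpected end of file")
    return raw


def read_repertoire_map(stream):
    # Header: map size (a), octets per token (x), octets per code (y).
    map_size = _uint(stream, 4)
    token_octets = _uint(stream, 1)
    code_octets = _uint(stream, 1)

    # Inmap (assumed layout): a lists of b (token, code) pairs.
    inmap = []
    for _ in range(map_size):
        pairs = []
        for _ in range(_uint(stream, 1)):          # list size b
            token = _bits(stream, token_octets)
            code = _bits(stream, code_octets)
            pairs.append((token, code))
        inmap.append(pairs)

    # Outmap (assumed layout): a tokens, each with c codes.
    outmap = []
    for _ in range(map_size):
        token = _bits(stream, token_octets)
        list_size = _uint(stream, 1)               # list size c
        codes = [_bits(stream, code_octets) for _ in range(list_size)]
        outmap.append((token, codes))

    return inmap, outmap
```

A caller would typically open the file in binary mode and pass the file
object to read_repertoire_map.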
Ordering Specification File
While part of the cultural specifications, this file is used in string
ordering, containing as it does the weights attached for ordering purposes to
particular tokens (codes). This file, therefore, contains both a large
sequence of tokens with their corresponding weights as well as one or more
ranges of tokens when some common set of weights can be applied to all tokens
in some range. Note that the lettered size values are unique within the
entire file!
Header
Entity | Octets | Name |
code size | 1 | |
script count | 1 | a |
list size | 1 | c |
>a times<
    <Script Entry> | | |
<Undefined Entry> | | |
Script Entry
Entity | Octets | Name |
<Rules> | | |
table size | 4 | b |
>b times<
    <Table Entry> | | |
ranges size | 4 | c |
>c times<
    <Range Entry> | | |
Rules
Entity | Octets | Name |
rule count | 1 | d |
>d times<
    Entity | Octets | Comment |
    forward | 1 | class BOOL |
    positional | 1 | class BOOL |
Table Entry
Entity |
Sized(token value) |
<Weightings> |
Weightings
Entity | Octets |
>d times<
    Entity | Octets | Name |
    element count | 1 | e |
    >e times<
        Entity |
        Sized(element value) |
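The Rules and Weightings structures can be sketched in Python as follows.
This is an illustration under stated assumptions, not the library's own
reader: it assumes that the d used in Weightings is the rule count read from
the governing Rules structure, and that BOOL values use the bit-pattern 0x1
for true. All names are hypothetical.

```python
import io


def _octet(stream):
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("unexpected end of file")
    return raw[0]


def _sized(stream):
    """Read a Sized(xxx) object: count octet plus that many octets."""
    count = _octet(stream)
    raw = stream.read(count)
    if len(raw) != count:
        raise EOFError("truncated Sized object")
    return raw


def read_rules(stream):
    """Return a list of (forward, positional) pairs, one per rule."""
    rules = []
    for _ in range(_octet(stream)):                # rule count d
        forward = _octet(stream) == 1
        positional = _octet(stream) == 1
        rules.append((forward, positional))
    return rules


def read_weightings(stream, rule_count):
    """Return, for each of the d rules, a list of element values."""
    weightings = []
    for _ in range(rule_count):
        element_count = _octet(stream)             # element count e
        weightings.append([_sized(stream) for _ in range(element_count)])
    return weightings


def read_rules_then_weightings(stream):
    """Read a Rules structure followed directly by its Weightings."""
    rules = read_rules(stream)
    return rules, read_weightings(stream, len(rules))
```

The last helper matches the shape of an entry that consists of Rules
followed by Weightings; for a Table Entry or Range Entry the rule count would
be supplied from the enclosing Script Entry instead.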
Range Entry
Entity |
Sized(start code value) |
Sized(start code weight) |
Sized(end code value) |
Sized(end code weight) |
<Weightings> |
Undefined Entry
Entity |
<Rules> |
<Weightings> |
Cultural Specification File
This last file in the group specified by the international standards
contains all of the remaining cultural specification components as indicated
in ISO/IEC 14652. It is the smallest of these three files, despite the
complexity of the data structures it contains.
Note that the Boolean values are represented by the bit-patterns 0x0 for
false and 0x1 for true. If one of the 'present' components is false then the
next following table entry is omitted from the file. This is indicated by the
entry being written between brackets - "[" and "]".
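The 'present' convention can be sketched in Python as a flag octet followed
by a conditional read. The example uses the Paper Size entry (height and
width, four octets each, as specified later for this file); the function
names are hypothetical and the rejection of flag values other than 0x0 and
0x1 is an assumption of this sketch.

```python
import io


def read_bool(stream):
    """Read a BOOL octet: 0x0 is false, 0x1 is true."""
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("unexpected end of file")
    if raw[0] not in (0, 1):
        raise ValueError("BOOL octet must be 0x0 or 0x1")
    return raw[0] == 1


def read_optional(stream, read_entry):
    """Read a 'present' flag, then the bracketed entry only if it is true."""
    return read_entry(stream) if read_bool(stream) else None


def read_paper_size(stream):
    """Read height and width in millimetres, four octets each."""
    raw = stream.read(8)
    if len(raw) != 8:
        raise EOFError("truncated Paper Size entry")
    return int.from_bytes(raw[:4], "big"), int.from_bytes(raw[4:], "big")


# 'paper size present' is true, followed by 297 mm x 210 mm.
present = bytes([1]) + (297).to_bytes(4, "big") + (210).to_bytes(4, "big")
size = read_optional(io.BytesIO(present), read_paper_size)
```

When the flag is false the reader consumes only the flag octet, so the next
component is read from the correct position.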
Header
Entity | Octets | Comment |
code kind | 1 | class CODE_KINDS |
<Char Groups> | | |
<Cash Data> | | |
<Time Data> | | |
<Number Data> | | |
<Answers> | | |
paper size present | 1 | class BOOL |
[<Paper Size>] | | |
names present | 1 | class BOOL |
[<Name Data>] | | |
address present | 1 | class BOOL |
[<Address Data>] | | |
phone present | 1 | class BOOL |
[<Phone Data>] | | |
units present | 1 | class BOOL |
[<Measurement Units>] | | |
version present | 1 | class BOOL |
[<Version Data>] | | |
Char Groups
Entity | Octets | Name | Comment |
classes count | 1 | a | |
>a times<
    Entity | Octets | Comment |
    class kind | 1 | class CODE_KINDS |
    <One Class> | | |
"0xffffffff" | 4 | | a flag |
map count | 1 | b | |
>b times<
    Entity | Octets | Comment |
    map kind | 1 | class CHAR_MAPPINGS |
    <One Map> | | |
"0xffffffff" | 4 | | a flag |
One Class
Entity | Octets | Name |
code size (octets) | 1 | a |
range count | 1 | b |
>b times<
    Entity | Octets |
    range low | a |
    range high | a |
One Map
Entity | Octets | Name |
code size (octets) | 1 | a |
maplet count | 1 | b |
>b times<
    Entity | Octets |
    base code | a |
    offset | a |
    count | a |
Note that in the above structure the offset is a signed number
formed from the given number of octets.
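The signed offset is the one exception to the unsigned number rule. The
Python sketch below reads a One Map structure from a byte string and decodes
the offset as a signed value. The specification does not state the signed
representation, so two's complement is assumed here; the function name
read_one_map is hypothetical.

```python
def read_one_map(data, pos=0):
    """Return ([(base, offset, count), ...], next_position)."""
    code_size = data[pos]          # a: octets per code
    maplet_count = data[pos + 1]   # b: number of maplets
    pos += 2
    maplets = []
    for _ in range(maplet_count):
        fields = [data[pos + i * code_size: pos + (i + 1) * code_size]
                  for i in range(3)]
        if len(fields[2]) != code_size:
            raise ValueError("truncated maplet")
        base = int.from_bytes(fields[0], "big")
        # Assumption: the signed offset is stored in two's complement.
        offset = int.from_bytes(fields[1], "big", signed=True)
        count = int.from_bytes(fields[2], "big")
        maplets.append((base, offset, count))
        pos += 3 * code_size
    return maplets, pos


# One maplet with two-octet codes: base 0x0061, offset -32, count 26.
sample = bytes([2, 1, 0x00, 0x61, 0xFF, 0xE0, 0x00, 0x1A])
maplets, end = read_one_map(sample)
```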
Cash Data
Entity | Octets | Comment |
<Cash Format> | | local |
<Cash Format> | | international |
<Cash Format> | | duo local |
<Cash Format> | | duo international |
Sized(valid from date) | | |
Sized(valid to date) | | |
Sized(duo valid from date) | | |
Sized(duo valid to date) | | |
exchange rate numerator | 4 | |
exchange rate denominator | 4 | |
Cash Format
Entity | Octets | Name | Comment |
code kind | 1 | | class CODE_KINDS |
Sized(decimal mark) | | | |
Sized(thousands separator) | | | |
Sized(positive sign) | | | |
Sized(negative sign) | | | |
section count | 1 | a | |
>a times<
    Entity | Octets |
    section digit count | 1 |
Sized(currency symbol) | | | |
places precision | 1 | | |
plus symbol precedes | 1 | | class BOOL |
plus separation spec | 1 | | class MON_SPACING |
plus sign position | 1 | | class SIGN_POSITIONS |
minus symbol precedes | 1 | | class BOOL |
minus separation spec | 1 | | class MON_SPACING |
minus sign position | 1 | | class SIGN_POSITIONS |
Note that there are a number of single octet logical values in the time and
date structure which have the bit-pattern 0x1 if true - in which case the
following entity is next. If the value is 0x0 (false) then the following
entity is omitted from the file. The entity is written in square brackets to
indicate this possibility.
Time Data
Entity | Octets | Comment |
date format present | 1 | |
[<Date/Time Format>] | | date |
time format present | 1 | |
[<Date/Time Format>] | | time |
date/time format present | 1 | |
[<Date/Time Format>] | | date and time |
era date format present | 1 | |
[<Date/Time Format>] | | era date |
era format present | 1 | |
[<Date/Time Format>] | | era |
era year format present | 1 | |
[<Date/Time Format>] | | era year |
relative time format present | 1 | |
[<Date/Time Format>] | | relative time |
weekday count | 1 | |
first day in week | 1 | |
first week contains | 1 | |
first day display | 1 | |
first workday | 1 | |
calendar direction | 1 | class CAL_DISPLAY_ORDERS |
time zones present | 1 | |
[<Time Zones>] | | |
alt digits present | 1 | |
[<Alternate Digits>] | | |
Date/Time Format
Entity | Octets | Name | Comment |
code kind | 1 | | class CODE_KINDS |
Sized(prefix) | | | |
element count | 1 | a | |
>a times<
    Entity | Octets | Comments |
    modifier | 1 | class DT_COMPS |
    component | 1 | class DT_COMPS |
    Sized(separator) | | |
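A Date/Time Format consists of a code kind, a Sized prefix and a counted list
of modifier/component/separator triples. The Python sketch below reads that
layout. It is illustrative only: the names are hypothetical, and the class
values (CODE_KINDS, DT_COMPS) are left as raw octet numbers because their
enumeration values are not given here.

```python
import io
from typing import NamedTuple


class Component(NamedTuple):
    modifier: int       # raw octet; class DT_COMPS in the table
    component: int      # raw octet; class DT_COMPS in the table
    separator: bytes    # Sized(separator), in the target encoding


def _octet(stream):
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("unexpected end of file")
    return raw[0]


def _sized(stream):
    count = _octet(stream)
    raw = stream.read(count)
    if len(raw) != count:
        raise EOFError("truncated Sized object")
    return raw


def read_format(stream):
    """Return (code_kind, prefix, [Component, ...])."""
    code_kind = _octet(stream)
    prefix = _sized(stream)
    components = []
    for _ in range(_octet(stream)):                # element count a
        modifier = _octet(stream)
        component = _octet(stream)
        separator = _sized(stream)
        components.append(Component(modifier, component, separator))
    return code_kind, prefix, components
```

The same field sequence (code kind, Sized prefix, count, then triples with a
Sized separator) is used by the Name, Address and Phone Format tables, so a
reader of this shape could plausibly serve all four.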
Time Zones
Entity | Octets | Name |
time zone count | 1 | a |
>a times<
    <Time Zone> | | |
Time Zone
Entity | Octets | Comment |
Sized(standard time name) | | |
<Elapsed Time> | | standard time offset |
standard offset added | 1 | if West of UTC |
Sized(daylight saving time name) | | |
<Elapsed Time> | | daylight saving time offset |
daylight saving offset added | 1 | if West of UTC |
rule count | 1 | a |
>a times<
    <TZ Rule> | | |
Elapsed Time
Entity | Octets |
days since 31 Dec 1899 | 4 |
milliseconds | 4 |
Note that the days component of the elapsed time will inevitably be zero in
this context!
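An Elapsed Time value can be turned into a duration, and a day count into a
calendar date, as in the Python sketch below. It assumes that day count 0
denotes 31 December 1899 itself, which is a literal reading of "days since
31 Dec 1899"; the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: day count 0 corresponds to 31 December 1899.
EPOCH = date(1899, 12, 31)


def read_elapsed_time(data, offset=0):
    """Decode days (4 octets) and milliseconds (4 octets) as a timedelta."""
    field = data[offset:offset + 8]
    if len(field) != 8:
        raise ValueError("an Elapsed Time needs eight octets")
    days = int.from_bytes(field[:4], "big")
    millis = int.from_bytes(field[4:], "big")
    return timedelta(days=days, milliseconds=millis)


def day_count_to_date(days):
    """Convert a count of days since 31 Dec 1899 to a calendar date."""
    return EPOCH + timedelta(days=days)


# A time zone offset of one hour: zero days, 3 600 000 milliseconds.
raw = (0).to_bytes(4, "big") + (3_600_000).to_bytes(4, "big")
one_hour = read_elapsed_time(raw)
```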
TZ Rule
Entity | Octets | Comment |
range low | 4 | |
range high | 4 | |
<Rule Element> | | start DST |
<Rule Element> | | stop DST |
Rule Element
Entity | Octets | Comments |
day number | 2 | in week |
week number | 1 | in month |
month | 1 | in year |
time of day | 4 | in milliseconds |
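Because a Rule Element has a fixed layout of eight octets, it can be decoded
in one step. The Python sketch below uses the standard struct module with a
big-endian format string matching the octet counts in the table; the class
and function names are hypothetical.

```python
import struct
from typing import NamedTuple

# ">" = most significant octet first, no padding:
# H = 2 octets, B = 1 octet, B = 1 octet, I = 4 octets.
RULE_ELEMENT = struct.Struct(">HBBI")


class RuleElement(NamedTuple):
    day_number: int        # day in week
    week_number: int       # week in month
    month: int             # month in year
    time_of_day_ms: int    # time of day in milliseconds


def read_rule_element(data, offset=0):
    """Decode one Rule Element starting at the given offset."""
    return RuleElement(*RULE_ELEMENT.unpack_from(data, offset))


# A TZ Rule holds two consecutive Rule Elements (start DST, stop DST),
# so the second one begins RULE_ELEMENT.size octets after the first.
sample = RULE_ELEMENT.pack(1, 5, 3, 7_200_000)
element = read_rule_element(sample)
```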
Alternate Digits
Entity | Octets | Name |
digit count | 1 | a |
>a times<
Note that the alternate digits are given in sequence from the encoding for
the numeric value zero upwards.
Number Format
Entity | Octets | Name | Comments |
code kind | 1 | | class CODE_KINDS |
Sized(decimal mark) | | | |
Sized(thousands separator) | | | |
number of sections | 1 | a | |
>a times<
    Entity | Octets |
    section length (chars) | |
Answers
Reg Expression |
Sized(Yes) |
Sized(No) |
Paper Size
Entity | Octets | Comments |
height | 4 | in millimetres |
width | 4 | in millimetres |
Name Data
Entity | Octets |
format present | 1 |
[<Name Format>] | |
cardinality | 1 |
Sized(General) | |
Sized(Mr) | |
Sized(Mrs) | |
Sized(Miss) | |
Sized(Ms) | |
Name Format
Entity | Octets | Name | Comments |
code kind | 1 | | class CODE_KINDS |
Sized(prefix) | | | |
component count | 1 | a | |
>a times<
    Entity | Octets | Comments |
    modifier code | 1 | class NAME_COMPS |
    component code | 1 | class NAME_COMPS |
    Sized(separator) | | |
Address Data
Entity | Octets | Name | Comments |
<Address Format> | | | |
code kind | 1 | | class CODE_KINDS |
component count | 1 | a | |
>a times<
    Entity | Octets | Comments |
    key | 1 | class ADDRESS_ELEMS |
    Sized(data string) | | |
Address Format
Entity | Octets | Name | Comments |
code kind | 1 | | class CODE_KINDS |
Sized(prefix) | | | |
component count | 1 | a | |
>a times<
    Entity | Octets | Comments |
    modifier code | 1 | class ADDR_COMPS |
    component code | 1 | class ADDR_COMPS |
    Sized(separator) | | |
Phone Data
Entity | Octets | Comment |
international format present | 1 | class BOOL |
[<Phone Format>] | | international |
domestic format present | 1 | class BOOL |
[<Phone Format>] | | domestic |
Sized(foreign access) | | |
Sized(domestic access) | | |
Phone Format
Entity | Octets | Name | Comments |
code kind | 1 | | class CODE_KINDS |
Sized(prefix) | | | |
component count | 1 | a | |
>a times<
    Entity | Octets | Comments |
    modifier code | 1 | class PHONE_COMPS |
    component code | 1 | class PHONE_COMPS |
    Sized(separator) | | |
Version Data
Entity | Octets | Name | Comment |
code kind | 1 | | class CODE_KINDS |
element count | 1 | a | |
>a times<
category count | 1 | b | |
>b times<
    Entity | Octets | Comment |
    category spec | 1 | class CATEGORIES |
date | 4 | | days since 31 Dec 1899 |
Culture-dependent Sather Environment Description File
This file contains the target culture character codes for a number of
punctuation (and other) marks used either in parsing the cultural
specification files or in generating the above binary files. They are
expressed as names taken from the textual form of the repertoire map
specification which appear in this file as 34 encodings. These are
followed by file mode strings and file path punctuation symbols related to
the run-time environment on any particular computer system. The reader is
referred to the source files in
SATHER_HOME/resources/lcc-Data/definitions/sather for detailed information.
Comments or enquiries should be made to Keith Hopper.
Page last modified: Thursday, 9 March 2000.