Math Characters
- Math characters have the math property
- Math characters are numeric, variable, or operator, but not a combination
- Properties are useful in parsing math plain text
- MathML doesn't use these properties: every quantity is explicitly tagged
- Properties still can be useful for inputting text for MathML (no one wants to type all those tags!)
- Sometimes default properties need to be overruled
- Would be useful to have more math properties
Unicode Math Characters
- 340 math characters in Unicode 3.0: ASCII, U+2200 – U+22FF, arrows, combining marks
- 996 math alphanumeric characters are in Unicode 3.1's Plane 1
- 591 new math symbols and operators are in Unicode 3.2's BMP
- One math variant selector
- One new combining character (reverse solidus overlay)
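The Plane 1 math alphanumerics are complete styled alphabets (bold, italic, script, and so on), not new letters. The following minimal sketch maps ASCII letters to the Mathematical Italic set. The function name `to_math_italic` is hypothetical. The sketch also shows one real subtlety of the block: a few code points are reserved "holes" because the styled letter was already encoded in the BMP, so italic small h must map to U+210E PLANCK CONSTANT rather than U+1D455.

```python
import unicodedata

# First code points of the Mathematical Italic letters in the
# Mathematical Alphanumeric Symbols block (Plane 1).
ITALIC_CAPITAL_A = 0x1D434
ITALIC_SMALL_A = 0x1D44E
# U+1D455 is reserved: italic small h already exists in the BMP
# as U+210E PLANCK CONSTANT, so that character is used instead.
ITALIC_SMALL_H = "\u210E"


def to_math_italic(text: str) -> str:
    """Map ASCII letters to Mathematical Italic; leave other characters as-is."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif ch == "h":
            out.append(ITALIC_SMALL_H)  # fill the hole with the BMP character
        elif "a" <= ch <= "z":
            out.append(chr(ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


# Print the code point and Unicode name of each converted character.
for ch in to_math_italic("Fhx"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This prints MATHEMATICAL ITALIC CAPITAL F (U+1D439), PLANCK CONSTANT (U+210E), and MATHEMATICAL ITALIC SMALL X (U+1D465). Because the style is part of the character itself, the distinction between, say, italic x and bold x survives in plain text.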
Unicode 3.1 and 3.2 add 996 new alphanumeric symbols (3.1) and 591 new symbols and operators (3.2) to the 340 math symbols already encoded in Unicode 3.0, for a total of 1927 math symbols. This repertoire is the result of input from many sources, notably the STIX project, and enables one to display virtually all standard mathematical symbols. In addition, this math support lends itself to a reasonably successful plain-text encoding that's much more compact than MathML or TeX. Note that the plain-text encoding, like TeX and presentation MathML, has ambiguities in higher math semantics. For example, a superscript might mean raising the base to a power, or it might specify an element of a vector or tensor.
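The total and the superscript ambiguity can be written out as a small worked example. The two superscripts below have identical presentation, yet only context tells a reader (or a program) which meaning is intended:

```latex
340 + 996 + 591 = 1927

x^{2} = x \cdot x \quad \text{(power)}
\qquad \text{vs.} \qquad
v^{i},\ i = 1, \dots, n \quad \text{(component of a vector or tensor)}
```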
Since mathematicians and other scientists continually invent new mathematical symbols, the plan is to add them as they become accepted in the scientific community.
Math Property
Unicode assigns a math property to characters that are typically used for mathematics. Math characters are classified as numeric, variable, or operator, but not a combination. So if you want to use a digit as a variable or an alphabetic character as an operator, you need a higher-level protocol. It would be useful to have more math properties, such as operator types: relational, binary, unary, and n-ary. Presumably if such properties were defined, they would be informative rather than normative, especially since characters are occasionally used in other ways.
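The numeric/variable/operator split, and the need to overrule it, can be sketched in a few lines. Python's `unicodedata` module does not expose the derived Math property (which is General_Category Sm plus the Other_Math characters), so this sketch assumes the General_Category as a stand-in: Nd for numeric, letters for variable, Sm for operator. The function names and the plain dictionary standing in for a "higher-level protocol" are hypothetical.

```python
import unicodedata
from typing import Mapping, Optional


def default_math_class(ch: str) -> Optional[str]:
    """Classify one character as 'numeric', 'variable' or 'operator'.

    Approximation: uses General_Category as a stand-in for the Unicode
    Math property, which unicodedata does not expose.
    """
    cat = unicodedata.category(ch)
    if cat == "Nd":
        return "numeric"
    if cat.startswith("L"):
        return "variable"
    if cat == "Sm":
        return "operator"
    return None  # not classified by this sketch


def math_class(ch: str, overrides: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the default class unless a higher-level override applies."""
    if overrides and ch in overrides:
        return overrides[ch]
    return default_math_class(ch)


print([math_class(c) for c in "2x+y"])
# The differential d is a letter by default; an override makes it an operator.
print(math_class("d", {"d": "operator"}))
```

Each character gets exactly one class, mirroring the "not a combination" rule; anything else, such as treating a letter as an operator, has to come from the override layer.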
Unicode Character Semantics
Unicode's character properties are useful in parsing math plain text. TeX uses comparable character properties (its math character classes) in its algorithms. MathML doesn't use these properties: every quantity is explicitly tagged. This leads to markup that's substantially more verbose than it would be if the tags were used only to overrule the default Unicode semantics. But the consensus of the MathML committee is that problems would occur if the tags were omitted, even when the Unicode semantics are valid. Properties can still be useful for entering text for MathML (no one wants to type all those tags!), as with the plain-text notation.
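As a rough sketch of how default properties can spare the user from typing tags, the hypothetical `to_mathml` below infers MathML token elements from each character's General_Category: a run of digits becomes one `<mn>`, each letter an `<mi>`, and anything else an `<mo>`. It assumes juxtaposed letters are separate variables and handles only flat expressions, with no superscripts or fractions. It also shows where defaults fail: a function name like "sin" would come out as three identifiers unless a higher-level rule overrules it.

```python
import unicodedata
from xml.sax.saxutils import escape

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def tag_for(ch: str) -> str:
    """Choose a MathML token element from the character's default category."""
    cat = unicodedata.category(ch)
    if cat == "Nd":
        return "mn"
    if cat.startswith("L"):
        return "mi"
    return "mo"


def to_mathml(text: str) -> str:
    """Convert a flat plain-text expression to MathML token elements."""
    parts = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # spacing is the renderer's job, not the input's
            continue
        tag = tag_for(text[i])
        j = i + 1
        if tag == "mn":
            # Consecutive digits form a single number.
            while j < len(text) and tag_for(text[j]) == "mn":
                j += 1
        parts.append(f"<{tag}>{escape(text[i:j])}</{tag}>")
        i = j
    return f'<math xmlns="{MATHML_NS}"><mrow>' + "".join(parts) + "</mrow></math>"


print(to_mathml("2x+10"))
```

The five-character input expands to a full MathML element, which illustrates the verbosity the paragraph above describes.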
Nonstandard Characters
- People will always invent new math characters that aren't yet standardized.
- Use the private use area (PUA) for these, with a higher-level marking that these are for math
- This approach can lead to collisions in the math community (unless a standard is maintained)
- Cut/copy/paste of plain text can collide with other uses of the private use area
Mathematicians are by nature inventive people and will continue to invent new symbols to express their theories. Until these symbols are used by a number of people, they shouldn't be standardized. Nevertheless, one needs a way to handle these symbols in their initial nonstandard usage. The private use area (PUA, U+E000 – U+F8FF) can be used for such nonstandard symbols. It's a tricky business, since the PUA is used for many purposes. For example, Microsoft operating systems use it to round-trip codes that aren't currently in Unicode, most notably many Chinese characters. That usage may well change, since many of those characters have been encoded in Plane 2 (Extension B) and hence are now standardized. When using the PUA, it's a good idea to have higher-level backup that defines what kind of characters are involved. If they are used as math symbols, it would be good to assign them a math attribute that's maintained in a rich-text layer parallel to the plain text. Rich-text programs such as Microsoft Word and Internet Explorer maintain such layers.
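A parallel rich-text layer can be as simple as the plain text plus a list of character ranges carrying a math attribute. The `RichText` class below is a hypothetical minimal sketch, not how Word or Internet Explorer actually store their layers. It checks only the BMP range cited above and reports PUA characters that lack the math attribute, whose meaning cannot be recovered from the plain text alone. Copying only the `text` field drops the layer, which is exactly the cut/copy collision risk noted in the list above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP private use area


def is_bmp_pua(ch: str) -> bool:
    """True if the character lies in the BMP private use area."""
    return PUA_START <= ord(ch) <= PUA_END


@dataclass
class RichText:
    """Plain text plus a parallel layer of [start, end) runs marked as math."""
    text: str
    math_runs: List[Tuple[int, int]] = field(default_factory=list)

    def mark_math(self, start: int, end: int) -> None:
        self.math_runs.append((start, end))

    def is_math(self, index: int) -> bool:
        return any(s <= index < e for s, e in self.math_runs)

    def unmarked_pua(self) -> List[int]:
        """Indices of PUA characters that carry no math attribute."""
        return [i for i, ch in enumerate(self.text)
                if is_bmp_pua(ch) and not self.is_math(i)]


# U+E000 stands in for a hypothetical, not-yet-standardized operator.
doc = RichText("x \uE000 y")
print(doc.unmarked_pua())  # [2]: ambiguous without the math layer
doc.mark_math(0, 5)
print(doc.unmarked_pua())  # []: the layer now says it is a math symbol
```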