
Markup

Presentation markup

Presentation markup directs how the math should be rendered.

<mrow>
    <mi>E</mi>
    <mo>=</mo>
    <mrow>
        <mi>m</mi>
        <mo>&InvisibleTimes;</mo>
        <msup>
            <mi>c</mi>
            <mn>2</mn>
        </msup>
    </mrow>
</mrow>

Each MathML element falls into one of three categories: presentation elements, content elements, and interface elements. Just as titles, sections, and paragraphs capture the higher-level syntactic structure of a textual document, presentation elements express the syntactic structure of math notation. Content elements describe mathematical objects directly, rather than the notation that represents them.
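To make the idea of syntactic structure concrete, here is a minimal sketch that parses the presentation markup for E = mc² with Python's standard xml.etree.ElementTree module and prints one indented line per element. The ROLES table and the outline function are names made up for this illustration, and the &InvisibleTimes; entity is written as the literal character U+2062 because ElementTree does not know MathML's named entities.

```python
import xml.etree.ElementTree as ET

# Presentation markup for E = m c^2. &InvisibleTimes; is replaced by the
# literal character U+2062 (an assumption made so the stock parser accepts it).
PRESENTATION = (
    "<mrow><mi>E</mi><mo>=</mo>"
    "<mrow><mi>m</mi><mo>\u2062</mo>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>"
)

# Hypothetical short labels for the elements used here (not from the source).
ROLES = {
    "mrow": "horizontal group",
    "mi": "identifier",
    "mo": "operator",
    "mn": "number",
    "msup": "superscript",
}


def outline(node, depth=0):
    """Return one indented line per element, mirroring the notation's nesting."""
    label = ROLES.get(node.tag, "unknown")
    line = "  " * depth + f"{node.tag} ({label})"
    text = (node.text or "").strip()
    if text:
        # !a escapes invisible characters such as U+2062 so they show up.
        line += f": {text!a}"
    lines = [line]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(PRESENTATION))))
```

The outline only reports how the notation is laid out: a row containing an identifier, an operator, and a nested row with a superscript. Nothing in it says that the superscript means exponentiation.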

Content markup

Content markup describes the meaning of the expression, not the format.

<apply>
    <eq/>
    <ci>E</ci>
    <apply>
        <times/>
        <ci>m</ci>
        <apply>
            <power/>
            <ci>c</ci>
            <cn>2</cn>
        </apply>
    </apply>
</apply>
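Because content markup encodes meaning, a program can compute with it directly. The following is a rough sketch, not a conforming MathML processor: it assumes the MathML 2 style in which each operator is an empty element that comes first inside apply, and it supports only the two operators this example needs. The names OPERATORS, evaluate, and solve_equation are hypothetical.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Content markup for E = m * c^2, written in the <apply> form (an assumption).
CONTENT = """
<apply>
  <eq/>
  <ci>E</ci>
  <apply>
    <times/>
    <ci>m</ci>
    <apply><power/><ci>c</ci><cn>2</cn></apply>
  </apply>
</apply>
"""

# Only the operators used above; a real evaluator would need many more.
OPERATORS = {
    "times": lambda args: reduce(operator.mul, args),
    "power": lambda args: args[0] ** args[1],
}


def evaluate(node, env):
    """Evaluate a content-markup subtree, looking identifiers up in env."""
    if node.tag == "cn":
        return float(node.text.strip())
    if node.tag == "ci":
        return env[node.text.strip()]
    if node.tag == "apply":
        op, *operands = list(node)
        return OPERATORS[op.tag]([evaluate(x, env) for x in operands])
    raise ValueError(f"unsupported element: {node.tag}")


def solve_equation(xml_text, env):
    """For <apply><eq/><ci>name</ci>rhs</apply>, return (name, value of rhs)."""
    relation, lhs, rhs = list(ET.fromstring(xml_text))
    if relation.tag != "eq" or lhs.tag != "ci":
        raise ValueError("expected an equation with an identifier on the left")
    return lhs.text.strip(), evaluate(rhs, env)


if __name__ == "__main__":
    print(solve_equation(CONTENT, {"m": 2.0, "c": 3.0}))
```

With m = 2 and c = 3 the right-hand side evaluates to 18. Doing the same from presentation markup would first require guessing that the superscript denotes a power and that the invisible operator denotes multiplication.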

Unicode and Markup

  • Unicode was never intended to represent all aspects of text
  • Language attribute: sort order, word breaks
  • Rich (fancy) text formatting: built-up fractions
  • Content tags: headings, abstract, author, figure
  • Glyph variants: Poetica font: 58 ampersands; Mantinia font: novel ligatures (TT, TE, etc.)
  • MathML adds XML tags for math constructs, but seems awfully wordy

There is a gray zone between rich (fancy) text and plain text: embedded codes. In fact, general rich text can be represented as plain text with embedded fields, as illustrated by Hewlett-Packard's PCL5 print format and various markup languages. One problem with embedded rich text is that it is hard to edit, since cursor movement has to skip over the embedded fields. Another is that the embedded fields can confuse text-scanning programs such as spelling and grammar checkers. Unicode itself defines a BiDi (bidirectional) algorithm for mixing left-to-right and right-to-left text that does use a few embedded codes, such as U+200E (left-to-right mark) and U+200F (right-to-left mark). Similarly, one can embed plane-14 language tags in a plain-text document, although higher-level markup is recommended for that purpose. In this talk, we discuss the addition of a few characters that let most mathematical expressions be represented in plain text with only a couple of embedded symbols.
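The effect of embedded codes on text-scanning programs is easy to reproduce. The sketch below puts a left-to-right mark inside a word, shows that a naive substring search no longer finds the word, and then strips the codes. Filtering on the Unicode general category Cf (format) is one plausible approach chosen for this illustration, not a recommendation from the source, and strip_format_codes is a hypothetical helper name.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
TAG_E = "\U000E0065"  # plane-14 TAG LATIN SMALL LETTER E
TAG_N = "\U000E006E"  # plane-14 TAG LATIN SMALL LETTER N


def strip_format_codes(text):
    """Drop every character whose general category is Cf (format).

    Caution: Cf also covers U+200C and U+200D (zero width non-joiner and
    joiner), which carry meaning in several Indic scripts, so a real
    checker would need a narrower filter.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Looks like "spelling checker" on screen, but contains four invisible codes.
sample = f"spell{LRM}ing{RLM} checker{TAG_E}{TAG_N}"

if __name__ == "__main__":
    print(len(sample), len(strip_format_codes(sample)))
    print("spelling" in sample, "spelling" in strip_format_codes(sample))
```

The raw string is four code points longer than what the reader sees, and the search for "spelling" succeeds only after the embedded codes are removed.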

©2003-2007 Microsoft Corporation. All rights reserved.