Each of the shaping engines in Uniscribe contains the shaping knowledge for a particular script or closely related group of scripts. This shaping knowledge focuses on the basic element of each script, which will vary depending on the nature of the writing system. In the Indic scripts, for example, the basic element that needs to be processed is the syllable; in the Arabic script the basic element is always a pair of letters, with the second letter of a pair becoming the first letter of the next.
Uniscribe analyses and prepares strings of Unicode text by breaking runs (i.e. strings of text in a single script with uniform formatting) into clusters corresponding to the basic element for that script. It then applies the kind of character preprocessing that some complex scripts require, such as reordering of certain characters in the string.
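To make the notion of a run concrete, the following is a minimal Python sketch that splits a string into runs of a single script. It is not Uniscribe's itemization algorithm: the `SCRIPT_RANGES` table, `script_of` and `split_runs` are hypothetical names, the code point ranges are deliberately coarse, and formatting (font, size) is ignored entirely.

```python
from itertools import groupby

# Coarse, illustrative code point ranges (an assumption, not a full script table).
SCRIPT_RANGES = [
    ("Latin", 0x0041, 0x024F),
    ("Arabic", 0x0600, 0x06FF),
    ("Devanagari", 0x0900, 0x097F),
]

def script_of(ch):
    """Return a coarse script name for one character."""
    cp = ord(ch)
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Other"

def split_runs(text):
    """Group consecutive characters that share a script into runs."""
    return [(script, "".join(chars))
            for script, chars in groupby(text, key=script_of)]

# Latin letters, then a Devanagari syllable, then two Arabic letters.
runs = split_runs("abc\u0915\u093F\u0645\u0646")
```

A real itemizer also has to decide what to do with characters shared between scripts, such as spaces and punctuation; this sketch simply places them in their own "Other" runs.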
Character preprocessing takes logical order text as supplied by the client, and outputs it in a form that can take efficient advantage of glyph processing. Once this pre-processing is complete, Uniscribe takes advantage of OpenType Layout Services to render complex scripts, activating specific layout features based on cluster analysis.
Uniscribe is an OTLS client, using the library functions to apply specific OpenType Layout features that are required to correctly render complex scripts. This does not preclude an application that uses Uniscribe from also being an OTLS client itself. Applications may use Uniscribe for basic rendering of complex script text, but interact with OTLS directly to offer users additional discretionary typographic features, such as stylistic variant forms. These discretionary features would be enabled by applying OpenType Layout lookups to the GIDs received from Uniscribe.
In this example, a short piece of complex script text is shown as it makes its way from input to rendering. The focus will be on what happens to the text within Uniscribe. The sample text is a single word in the Sanskrit language, as written in the Devanagari script. It is a long, compound word that exhibits many of the character pre-processing and OTL feature requirements of Indic scripts. The word is extracted from a short sentence in the Aitreyopanishad; the sample word is indicated in red and transliterated below.
Note: In this example, the Sanskrit text is displayed in the Microsoft Devanagari UI font, Mangal. This is not an ideal font for classical Sanskrit, being somewhat simplified and with a limited set of ligatured conjunct forms, but it has the benefit of being well hinted for low resolution, which will make the illustrations easier to follow.
Here are the characters in the backing string for our sample word, as input by the user using the Windows 2000 Sanskrit keyboard. Beneath them are the codepoints stored by the application in logical order.
Because we are dealing with a single word, using a single font at a single size, our sample constitutes a run of text, as understood by both Uniscribe and OTLS. The first task of the Indic script shaping engine is to break the run into clusters. As mentioned above, the basic element of script processing for Indic scripts is the syllable, so the result of this operation will be separation of our word into syllables. The script shaping engine makes use of Unicode character properties to identify the different types of characters in the run, and its own knowledge of the possible relationships of these characters to identify syllable boundaries. Here are the characters, still in logical order, separated into clusters as indicated by the blue bars.
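The clustering step can be approximated in a few lines of Python. This is a simplified sketch, not the shaping engine's actual rules: it assumes that a combining mark always joins the current cluster and that a consonant joins it only when it directly follows a virama, and it ignores nukta forms, ZWJ and ZWNJ. The function names are hypothetical, and the test word is a short illustrative one, not the sample word from the illustrations.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

def is_consonant(ch):
    # Devanagari consonants KA..HA; other consonant forms are ignored here.
    return "\u0915" <= ch <= "\u0939"

def is_mark(ch):
    # Combining marks: matras, virama, anusvara, visarga, and so on.
    return unicodedata.category(ch).startswith("M")

def split_clusters(run):
    """Split a Devanagari run, in logical order, into syllable-like clusters."""
    clusters = []
    for ch in run:
        joins_previous = bool(clusters) and (
            is_mark(ch)
            or (is_consonant(ch) and clusters[-1].endswith(VIRAMA))
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# KA VIRAMA SSA | TA VIRAMA RA I-matra | YA
clusters = split_clusters("\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F")
```

The characters stay in logical order throughout; the function only decides where the cluster boundaries fall.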
Once the run is separated into clusters, the shaping engine analyses each cluster to determine if any character reordering is necessary. The rules for character processing in Devanagari are explained in the Unicode Standard. [8] In the next illustration, the characters affected by reordering are indicated in red.
Only four clusters in our sample run require character reordering; three of these involve below-base and above-base forms of the ra consonant, and the other involves moving the i matra to the left of the consonant conjunct. Once this character processing has been done, Uniscribe calls text layout functions in OTLS to apply OpenType substitution and positioning features. All the features required to render the Indic scripts supported by Uniscribe are published in the OpenType specification.
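Two of the reorderings mentioned above can be sketched for a single cluster as follows. This is an illustration under stated assumptions rather than Uniscribe's implementation: the i matra is moved to the very start of the cluster, and an initial ra plus virama (the above-base form, or reph) is moved to the very end. The exact target positions used by a real shaping engine may differ, below-base ra is not handled, and `reorder_cluster` is a hypothetical name.

```python
RA = "\u0930"       # DEVANAGARI LETTER RA
VIRAMA = "\u094D"   # DEVANAGARI SIGN VIRAMA
I_MATRA = "\u093F"  # DEVANAGARI VOWEL SIGN I

def is_consonant(ch):
    # Devanagari consonants KA..HA only.
    return "\u0915" <= ch <= "\u0939"

def reorder_cluster(cluster):
    """Return a reordered copy of one cluster given in logical order."""
    chars = list(cluster)
    # Initial RA + VIRAMA before another consonant: move the pair to the end
    # (assumed position for the above-base form).
    if (len(chars) > 2 and chars[0] == RA and chars[1] == VIRAMA
            and is_consonant(chars[2])):
        chars = chars[2:] + chars[:2]
    # The i matra is stored after the consonants but drawn to their left.
    if I_MATRA in chars:
        chars.remove(I_MATRA)
        chars.insert(0, I_MATRA)
    return "".join(chars)

# TA VIRAMA RA I-matra  ->  I-matra TA VIRAMA RA
reordered = reorder_cluster("\u0924\u094D\u0930\u093F")
```

Clusters that contain neither pattern pass through unchanged, which matches the observation that only some clusters in a run need reordering.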
Uniscribe completely insulates client applications from the shaping knowledge required for complex scripts. Once Uniscribe has finished calling OTLS to apply the required features for a complex script like Devanagari, it can pass the glyph string back to the application and to device drivers and system font rasterizers. The application, meanwhile, only needs to manage the original backing string of logical order Unicode text.
Uniscribe never changes the backing string, and any character reordering required by Unicode script shaping rules occurs in a buffer. Uniscribe maintains one index from the buffered characters to the original backing string, and another from the buffered characters to the font glyph string. Client applications can utilize additional Uniscribe APIs to control cursor positioning and caret movement in the rendered text.
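The relationship between the untouched backing string and the reordered buffer can be sketched like this. The data layout is an assumption made for illustration (Uniscribe's internal structures are not described here), only the i matra reordering is modelled, and the second index, from buffered characters to glyphs, is left out. `build_buffer` and the span format are hypothetical.

```python
I_MATRA = "\u093F"  # DEVANAGARI VOWEL SIGN I

def build_buffer(backing, spans):
    """Build a reordered buffer plus an index back into the backing string.

    `spans` is a list of (start, end) cluster boundaries into `backing`.
    Returns (buffer_text, buffer_to_backing), where buffer_to_backing[i]
    is the backing-string position of buffer character i.
    """
    buffer_chars = []
    buffer_to_backing = []
    for start, end in spans:
        order = list(range(start, end))
        # Move the first i matra in the cluster, if any, to the front.
        matra_positions = [i for i in order if backing[i] == I_MATRA]
        if matra_positions:
            order.remove(matra_positions[0])
            order.insert(0, matra_positions[0])
        for i in order:
            buffer_chars.append(backing[i])
            buffer_to_backing.append(i)
    return "".join(buffer_chars), buffer_to_backing

backing = "\u0924\u094D\u0930\u093F\u092F"  # TA VIRAMA RA I-matra YA
buffer_text, index = build_buffer(backing, [(0, 4), (4, 5)])
# `backing` is unchanged; `index` maps each buffer slot to its original position.
```

With an index of this kind, a position in the reordered text can always be traced back to a character in the logical-order string that the application owns.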
Uniscribe likewise insulates font developers from complex script shaping requirements by taking on the task of analyzing clusters and preparing them for OpenType layout. This means that type designers and developers can work with efficient and predictable sets of lookups and features, rather than trying to define the incredibly large number of complicated contextual lookups that would be necessary to render directly from the Unicode backing string. Because the OpenType lookup types were not designed to perform all the reordering required by Unicode shaping rules, some complex script rendering would be impossible without the kind of character preprocessing available in Uniscribe.