Mkt1font, vpl2vpl and vpl2ovp: programs to generate accented fonts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

John D. Smith
~~~~~~~~~~~~~

These three programs, all made available under the GNU General Public License, address the same requirement using the same basic algorithms: their aim is to make it easy to create versions of existing fonts containing whatever accented characters the user may need, arranged according to whatever encoding he/she may favour. Mkt1font does this by reading in the two files that define a Type 1 PostScript font and writing out new versions of them; vpl2vpl does it by reading in the file that defines a TeX virtual font and writing out a new version of it; vpl2ovp does the same as vpl2vpl, but its output is a 16-bit virtual font suitable for Omega, the Unicode-aware development from TeX. In each case information about what accented characters are required and where they should be located is supplied by means of a simple definition file, which has the same format for all three programs.

Note that most Type 1 PostScript fonts are commercial products subject to both copyright and licensing restrictions. If the license for such a font forbids reverse-engineering it, it is presumably illegal to run mkt1font on it; in any case it would certainly be a serious breach of copyright to make modified versions of the font available to third parties. The author of mkt1font will not be held responsible for any such misuses of the software. (No such legal problems surround the use of TeX virtual fonts, which merely specify new ways to use existing fonts.)

However, would-be users may wish to note the existence of a body of high-quality freeware lookalikes for many of the standard Type 1 fonts: like mkt1font and vpl2vpl, the Ghostscript fonts developed by URW++ Design and Development Incorporated are made available under the GNU General Public License (see the file COPYING), which permits them to be modified and redistributed provided that certain simple decencies are observed. These fonts are available from many ftp sites, including those forming the Comprehensive TeX Archive Network (CTAN), for example ftp.tex.ac.uk: look for the most recent font archive file in the directory /tex-archive/support/ghostscript/gnu/fonts.

A detailed specification of the form of the definition file used by mkt1font and vpl2vpl is given below in the instructions for using the programs, but it may be helpful to give a quick idea of its contents here. Each line of the file contains something similar to "128 x acute", which is an instruction to create a new character formed from "x" with an acute accent and located at position 128 in the character set. Once a new character has been defined in this way, it "exists" in just the same way as any other character, under the name "xacute" (or whatever); thus it can even receive further accents. The two lines

    129 e macron
    130 emacron breve

will create a "long e" and a "long-or-short e" at the positions specified. (If only the latter is desired, they can both be assigned the same position in the encoding -- the second will overwrite the first.) Subscript as well as superscript accents are available. Some sample definition files are included to give an idea of the format: note that two of these (CSX.def and Norman.def), used for Indian language material in Roman script, contain the triple-accented character "runderdotmacron acute"!

In addition to accented characters proper, both programs can produce digraphs -- single characters composed of pairs of other characters.
In itself this may seem to be of limited use, since the digraph should always be indistinguishable from its two constituent characters printed consecutively. However, some Unicode-style encodings require that certain character-pairs be treated as digraphs. In addition, use of a digraph in subsequent definitions allows an accent to be applied to two characters as a pair: so, for example,

    131 k h
    131 kh underbar

will produce a character consisting of an underlined "kh" at position 131.

The algorithms used in both programs take great care to secure the most attractive results, placing accents centrally over/under their characters at appropriate heights/depths, varying their position according to the angle of slant in Italic and similar fonts, providing new characters with kerning information (which will be resolutely ignored by most word-processors), and -- in the case of mkt1font -- giving hinting instructions to improve output on low-resolution printers. However, users may feel that some characters need "tweaking". This tends to happen when the part of the character adjacent to the accent is highly asymmetrical: subscript accents below "r" may need moving to the left, while superscript accents above "d" may need moving to the right. It is relatively easy to modify the output of vpl2vpl to make such improvements by hand (vpl files are designed to be easy for humans to read, and vpl2vpl identifies each character with a comment). Type 1 fonts are another matter; mkt1font generates a human-readable and commented ".dis" file as well as the ".pfb" font file, and it is possible to make changes to this and rebuild the pfb file with t1asm (see below), but this is not quite as simple as modifying a TeX virtual font.

For details on how vpl2ovp handles access to accented characters by means of a system of ligatures, see further the CHANGES file.

The programs are written in Perl, and have not been tested on systems other than Unix. (Apart from the #!
line at the top, the only Unix specificities the author is aware of are two references to "/dev/null" in mkt1font and the general assumption that filenames can be of any length.) At run time, mkt1font invokes two further font utilities, t1asm and t1disasm, written by I. Lee Hetherington and available from CTAN ftp sites: look in the directory /tex-archive/fonts/utilities/t1utils.

Detailed instructions for using the programs follow. In each case these instructions can also be seen by typing the name of the program with a "-h" option.


Mkt1font
~~~~~~~~

Syntax: mkt1font -n fontname -d definition-file -a afm-file -f font-file
        [-s shrink-factor] [-c candrabindu-adjustment] [-b]

Mkt1font creates new Type 1 PostScript fonts based on existing fonts ("input fonts"). In order to do this it makes use of I. Lee Hetherington's programs t1asm and t1disasm, which must be present on the system. A successful run will generate an AFM file and a PFB file, as well as a DIS file containing a disassembled version of the PFB file (including useful comments).

Four options must be specified on the command line, as follows:

-n should name the font it is intended to generate. Avoid names such as "myfont", as both mkt1font and other programs attempt to draw conclusions from the name; better would be something like "Utopia_French-BoldItalic". The generated font files will also use this for their basename.

-d should refer to a font definition file. This file (which could usefully be named, e.g., "French.def") should consist of lines of character definitions, in the form

    "number" "character"
or
    "number" "character" "accent"

Here "number" represents the character's position in the new encoding and may be expressed in decimal, octal or hex; "character" names the character (e.g. "comma", "eight", "A") or consists of the word ".notdef" (indicating that the specified number's "slot" in the new encoding is to be empty); and "accent" optionally names an accent to be placed on the character.

In addition to the standard accents available in PostScript fonts, "underbar" and "underdot" are also available, as are "under" versions of all the normal superscript accents ("underdieresis", "underring", etc.). The Indian accent "candrabindu" may also be specified: it is formed by overprinting a breve with a dotaccent. Finally, "overdot" may be used as a synonym for "dotaccent".

If the character named in the "accent" position is not in fact a valid accent character, the program interprets the definition as a request for a digraph formed from the "character" and the "accent". A digraph consisting of, say, "k" and "h" will be indistinguishable from the letters "k" and "h" printed consecutively, but the digraph "kh" can itself receive accents like any other character: see next paragraph.

A new character (such as "amacron" or "kh") may be freely used in the "character" position of a further definition (such as "amacron breve" or "kh underbar"). There is no constraint on the ordering of definitions within a definition file. The definition of "a macron" does not have to precede that of "amacron breve": requests for "impossible" characters are deferred until their constituents have had a chance to come into being. "Slots" for which no new definition is given retain the definition they have in the input font. The definition file may also contain blank lines and comments (introduced by "#").

-a should refer to the AFM (Adobe Font Metrics) file for the input font.

-f should refer to the binary font file (PFB) for the input font. (In fact the equivalent ASCII file (PFA) is also acceptable for input, but the output font is always in PFB format.)

-s may optionally give the factor, expressed as a per-thousand value, by which normally superscript accents (such as dieresis, ring) should be shrunk when they are used as subscript accents (such as underdieresis, underring). Values of around 800 may be found useful.

-c may optionally give two comma-separated numerical values to adjust the x and y coordinates of the dotaccent placed within a breve to form the candrabindu accent.

-b may optionally be specified to block the use of predefined accented characters, forcing mkt1font to define its own versions. This may be useful to secure a consistent appearance in cases where a font designer does not share mkt1font's views on where accents should be placed.

The -h option prints this help.


Vpl2vpl
~~~~~~~

Syntax: vpl2vpl -d definition-file [-s shrink-factor]
        [-c candrabindu-adjustment] [-b] vpl-file

Vpl2vpl creates new TeX virtual fonts based on existing fonts or virtual fonts ("input fonts"). A successful run will read a pl (Property List) or vpl (Virtual Property List) file and a definition file, and will generate a new vpl (Virtual Property List) file on standard output.

The input font is assumed to adhere to the standard TeX encoding for text fonts unless it was created with either of the programs afm2pl or afm2tfm, in which case it is assumed to conform to (respectively) the Adobe Standard Encoding or the encoding specified in the file dvips.enc. In either case, the name of the input font is assumed to be the name of the input file without its .vpl or .pl extension: it must conform to normal TeX conventions for naming fonts, as vpl2vpl attempts to draw conclusions from it about the kind of font it is dealing with. A typical complete sequence of commands to create a new virtual font might therefore be

    tftopl cmr10.tfm cmr10.pl
    vpl2vpl -d ISO-Latin1.def cmr10.pl >cmr10_isol1.vpl
    vptovf cmr10_isol1.vpl cmr10_isol1.vf cmr10_isol1.tfm

for a Computer Modern font, or

    afm2pl Times-Roman.afm rptmr.pl
    pltotf rptmr.pl rptmr.tfm
    vpl2vpl -d ISO-Latin1.def rptmr.pl >ptmr-isol1.vpl
    vptovf ptmr-isol1.vpl ptmr-isol1.vf ptmr-isol1.tfm

for a PostScript font.

Another approach for a PostScript font is to use afm2tfm:

    afm2tfm Times-Roman.afm -t dvips.enc -v ptmr rptmr
    vpl2vpl -d ISO-Latin1.def ptmr.vpl >ptmr-isol1.vpl
    vptovf ptmr-isol1.vpl ptmr-isol1.vf ptmr-isol1.tfm

-- but this is now deprecated, as afm2tfm generates incorrect values for the heights of some characters, and this can lead to bad accent placing.

In order to keep the whole upper half of the character set free for the requirements of the encoding specified in the definition file, certain modifications are made to input fonts following the dvips.enc encoding to bring them into greater conformity with the TeX norm. In particular, the characters dotaccent and hungarumlaut are placed in the positions assigned by TeX ("5F, "7D), not those enforced by dvips.enc ("C7, "CD). The f-ligatures, double quotes and dashes are also moved from the upper half of the character set to their normal TeX positions. As a result, the following characters are not found in the lower half of the character set: quotesingle, quotedbl, backslash, underscore, braceleft, bar, braceright. These characters can, however, be assigned positions in the output font if they are needed. (Indeed, they could all be explicitly restored to their dvips.enc positions if this were desired.)

Options:

-d should refer to a font definition file. This file (which could usefully be named, e.g., "French.def") should consist of lines of character definitions, in the form

    "number" "character"
or
    "number" "character" "accent"

Here "number" represents the character's position in the new encoding and may be expressed in decimal, octal or hex; "character" names the character (e.g. "comma", "eight", "A") or consists of the word ".notdef" (indicating that the specified number's "slot" in the new encoding is to be empty); and "accent" optionally names an accent to be placed on the character.

In addition to the standard accents available in PostScript fonts, "underbar" and "underdot" are also available, as are "under" versions of all the normal superscript accents ("underdieresis", "underring", etc.). The Indian accent "candrabindu" may also be specified: it is formed by overprinting a breve with a dotaccent. Finally, "overdot" may be used as a synonym for "dotaccent".

If the character named in the "accent" position is not in fact a valid accent character, the program interprets the definition as a request for a digraph formed from the "character" and the "accent". A digraph consisting of, say, "k" and "h" will be indistinguishable from the letters "k" and "h" printed consecutively, but the digraph "kh" can itself receive accents like any other character: see next paragraph.

A new character (such as "amacron" or "kh") may be freely used in the "character" position of a further definition (such as "amacron breve" or "kh underbar"). There is no constraint on the ordering of definitions within a definition file. The definition of "a macron" does not have to precede that of "amacron breve": requests for "impossible" characters are deferred until their constituents have had a chance to come into being. "Slots" for which no new definition is given retain the definition they have in the input font. The definition file may also contain blank lines and comments (introduced by "#").

-s may optionally give the factor, expressed as a per-thousand value, by which normally superscript accents (such as dieresis, ring) should be shrunk when they are used as subscript accents (such as underdieresis, underring). Values of around 800 may be found useful.

-c may optionally give two comma-separated numerical values to adjust the x and y coordinates of the dotaccent placed within a breve to form the candrabindu accent. A coordinate scheme using "DESIGNUNITS R 1000" is assumed.

-b may optionally be specified to block the use of predefined accented characters, forcing vpl2vpl to define its own versions. This may be useful to secure a consistent appearance in cases where a font designer does not share vpl2vpl's views on where accents should be placed.

-h prints this help.


Vpl2ovp
~~~~~~~

Syntax: vpl2ovp -d definition-file [-s shrink-factor]
        [-c candrabindu-adjustment] [-b] vpl-file

Vpl2ovp creates new Omega virtual fonts based on existing TeX fonts or virtual fonts ("input fonts"). A successful run will read a pl (Property List) or vpl (Virtual Property List) file and a definition file, and will generate a new ovp (Omega Virtual Property List) file on standard output.

The input font is assumed to adhere to the standard TeX encoding for text fonts unless it was created with either of the programs afm2pl or afm2tfm, in which case it is assumed to conform to (respectively) the Adobe Standard Encoding or the encoding specified in the file dvips.enc. In either case, the name of the input font is assumed to be the name of the input file without its .vpl or .pl extension: it must conform to normal TeX conventions for naming fonts, as vpl2ovp attempts to draw conclusions from it about the kind of font it is dealing with. A typical complete sequence of commands to create a new virtual font might therefore be

    tftopl cmr10.tfm cmr10.pl
    vpl2ovp -d Unicode1.def cmr10.pl >cmr10-uni1.ovp
    ovp2ovf cmr10-uni1.ovp cmr10-uni1.ovf cmr10-uni1.ofm

for a Computer Modern font, or

    afm2pl Times-Roman.afm rptmr.pl
    pltotf rptmr.pl rptmr.tfm
    vpl2ovp -d Unicode1.def rptmr.pl >ptmr-uni1.ovp
    ovp2ovf ptmr-uni1.ovp ptmr-uni1.ovf ptmr-uni1.ofm

for a PostScript font.

Another approach for a PostScript font is to use afm2tfm:

    afm2tfm Times-Roman.afm -t dvips.enc -v ptmr rptmr
    vpl2ovp -d Unicode1.def ptmr.vpl >ptmr-uni1.ovp
    ovp2ovf ptmr-uni1.ovp ptmr-uni1.ovf ptmr-uni1.ofm

-- but this is now deprecated, as afm2tfm generates incorrect values for the heights of some characters, and this can lead to bad accent placing.

In order to keep the whole of the character range "F0-"FF free for the requirements of the encoding specified in the definition file, certain modifications are made to input fonts following the dvips.enc encoding to bring them into greater conformity with the TeX norm. In particular, the characters dotaccent and hungarumlaut are placed in the positions assigned by TeX ("5F, "7D), not those enforced by dvips.enc ("C7, "CD). The f-ligatures, double quotes and dashes are also moved from the upper half of the original 8-bit character set to their normal TeX positions. As a result, the following characters are not found in the lower half of the character set: quotesingle, quotedbl, backslash, underscore, braceleft, bar, braceright. These characters can, however, be assigned positions in the output font if they are needed. (Indeed, they could all be explicitly restored to their dvips.enc positions if this were desired.)

Options:

-d should refer to a font definition file. This file (which could usefully be named, e.g., "Unicode1.def") should consist of lines of character definitions, in the form

    "number" "character"
or
    "number" "character" "accent"

Here "number" represents the character's position in the new encoding and may be expressed in decimal, octal or hex; "character" names the character (e.g. "comma", "eight", "A") or consists of the word ".notdef" (indicating that the specified number's "slot" in the new encoding is to be empty); and "accent" optionally names an accent to be placed on the character.

In addition to the standard accents available in PostScript fonts, "underbar" and "underdot" are also available, as are "under" versions of all the normal superscript accents ("underdieresis", "underring", etc.). The Indian accent "candrabindu" may also be specified: it is formed by overprinting a breve with a dotaccent. Finally, "overdot" may be used as a synonym for "dotaccent".

Note that all accents used in defining accented characters must themselves be defined in the .def file. Those which exist in the source font should simply be referenced by name in their appropriate Unicode position (e.g. "0x0304 macron"); those which do not should be defined as the character "space" followed by the name of the accent (e.g. "0x0310 space candrabindu").

If the character named in the "accent" position is not in fact a valid accent character, the program interprets the definition as a request for a digraph formed from the "character" and the "accent". A digraph consisting of, say, "k" and "h" will be indistinguishable from the letters "k" and "h" printed consecutively, but the digraph "kh" can itself receive accents like any other character: see next paragraph.

A new character (such as "amacron" or "kh") may be freely used in the "character" position of a further definition (such as "amacron breve" or "kh underbar"). There is no constraint on the ordering of definitions within a definition file. The definition of "a macron" does not have to precede that of "amacron breve": requests for "impossible" characters are deferred until their constituents have had a chance to come into being. "Slots" for which no new definition is given retain the definition they have in the input font. The definition file may also contain blank lines and comments (introduced by "#").
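
The way definitions are read and "impossible" characters deferred can be illustrated with a short Python sketch. This is an illustration only, not the programs' own Perl code: the function names, the reduced accent list, and the treatment of a leading "0x" as hex and a leading "0" as octal are all assumptions made for the example, and no glyph construction or Omega-specific handling is attempted.

```python
# Illustrative sketch only; names and number syntax are assumptions.
ACCENTS = {"acute", "grave", "macron", "breve", "dieresis",
           "dotaccent", "underbar", "underdot", "candrabindu"}  # reduced list


def parse_number(token):
    """Read a slot number: 0x.. as hex, leading 0 as octal, else decimal."""
    t = token.lower()
    if t.startswith("0x"):
        return int(t, 16)
    if len(t) > 1 and t.startswith("0"):
        return int(t, 8)
    return int(t, 10)


def parse_definitions(text):
    """Return (slot, character, second) tuples; second is None if absent."""
    defs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError("bad definition line: %r" % raw)
        second = fields[2] if len(fields) == 3 else None
        defs.append((parse_number(fields[0]), fields[1], second))
    return defs


def resolve(defs, font_glyphs):
    """Assign names to slots, deferring definitions whose parts do not exist yet."""
    known = set(font_glyphs)
    encoding = {}
    pending = list(defs)
    while pending:
        deferred = []
        for slot, char, second in pending:
            if char == ".notdef":
                encoding[slot] = None             # slot deliberately left empty
                continue
            # An accent needs only its base character; a digraph needs both parts.
            needs_second = second is not None and second not in ACCENTS
            if char not in known or (needs_second and second not in known):
                deferred.append((slot, char, second))
                continue
            name = char if second is None else char + second
            known.add(name)                       # now usable in later definitions
            encoding[slot] = name                 # a later definition overwrites
        if len(deferred) == len(pending):         # no progress was made
            raise ValueError("unresolvable definitions: %r" % deferred)
        pending = deferred
    return encoding


sample = """
130 emacron breve      # listed before its constituent exists
129 e macron
0x83 k h               # digraph
0203 kh underbar       # same slot (octal), overwrites the plain digraph
"""
print(resolve(parse_definitions(sample), {"e", "k", "h"}))
```

In the sketch a deferred definition is simply retried on the next pass, so when two definitions share a slot the outcome can depend on which one had to wait; how the real programs order such cases is not specified here.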

-s may optionally give the factor, expressed as a per-thousand value, by which normally superscript accents (such as dieresis, ring) should be shrunk when they are used as subscript accents (such as underdieresis, underring). Values of around 800 may be found useful.

-c may optionally give two comma-separated numerical values to adjust the x and y coordinates of the dotaccent placed within a breve to form the candrabindu accent. A coordinate scheme using "DESIGNUNITS R 1000" is assumed.

-b may optionally be specified to block the use of predefined accented characters, forcing vpl2ovp to define its own versions. This may be useful to secure a consistent appearance in cases where a font designer does not share vpl2ovp's views on where accents should be placed.

-h prints this help.


John D. Smith
jds10@cam.ac.uk
http://bombay.indology.info