by Markus Kuhn
Summary: Please do not use the ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote'). Your text will otherwise look quite strange with most modern fonts (for example, on Windows and Mac systems). Only old X Window System fonts and some old video terminals display ASCII 0x60 and 0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards. If you can use only ASCII typewriter characters, use the apostrophe character (0x27) as both the left and the right quotation mark (as in 'quote'). If you can use Unicode characters, directional quotation marks are available as the characters U+2018, U+2019, U+201C, and U+201D (as in ‘quote’ or “quote”).
Background
The Unicode and ISO 10646 standards define the following characters:
U+0022 | QUOTATION MARK | " | neutral (vertical), used as an opening or closing quotation mark; the preferred characters in English for paired double quotation marks are U+201C and U+201D |
U+0027 | APOSTROPHE | ' | neutral (vertical) glyph with mixed usage; the preferred character for the apostrophe is U+2019; the preferred characters in English for paired single quotation marks are U+2018 and U+2019 |
U+0060 | GRAVE ACCENT | ` | |
U+00B4 | ACUTE ACCENT | ´ | |
U+2018 | LEFT SINGLE QUOTATION MARK | ‘ | |
U+2019 | RIGHT SINGLE QUOTATION MARK | ’ | this is the preferred character to use as an apostrophe |
U+201C | LEFT DOUBLE QUOTATION MARK | “ | |
U+201D | RIGHT DOUBLE QUOTATION MARK | ” | |
ASCII and ISO 8859 were designed only to support the very restricted typographic style available to typewriter users. The two ASCII characters
0x22 | QUOTATION MARK | " |
0x27 | APOSTROPHE | ' |
are supposed to represent the neutral (vertical) glyphs commonly used on typewriters. They should not be used as directional quotation marks.
ISO 8859 and Unicode fonts must display the two accent characters
0x60 | GRAVE ACCENT | ` |
0xB4 | ACUTE ACCENT | ´ |
as mutually symmetric shapes.
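As a quick cross-check of the character names listed in this section, the following sketch prints the name that Python's standard `unicodedata` module reports for each code point. The helper name `describe` and the list `QUOTE_CODE_POINTS` are only illustrative.

```python
import unicodedata

# Code points discussed in this section.
QUOTE_CODE_POINTS = [0x22, 0x27, 0x60, 0xB4, 0x2018, 0x2019, 0x201C, 0x201D]


def describe(code_point):
    """Return a 'U+XXXX NAME' string for one code point (illustrative helper)."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))


if __name__ == "__main__":
    for cp in QUOTE_CODE_POINTS:
        print(describe(cp))
```

Note that the names only identify the characters; how a given font draws 0x27 and 0x60 is a separate matter.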
The problem
Unfortunately, X Window System fonts have long contained the following mutually symmetrical glyphs:
0x27 | APOSTROPHE | ’ (drawn like a right single quotation mark) |
0x60 | GRAVE ACCENT | ‘ (drawn like a left single quotation mark) |
These shapes were even sanctioned by a North American version of the ISO 646 standard (ANSI X3.4, also known as ASCII), which defined 0x27 as "apostrophe (closing single quote; acute accent)", but they should have been changed when the fonts were extended to cover ISO 8859-1, which added a separate acute accent at 0xB4. Obviously, 0x27/0x60 and 0x60/0xB4 cannot both be pairs of mutually symmetric glyphs while 0x27 and 0xB4 have different shapes. Since 0x60 and 0xB4 are defined as accents by modern standards, their symmetric shape takes precedence, but this was not fixed in the X fonts until 2004 (slightly earlier in the versions shipped with XFree86).
The old X fonts encouraged some authors of Unix software and documentation to abuse 0x60 together with 0x27 as directional quotation marks. This practice looked somewhat acceptable, as in ‘quote’, when displayed with old X fonts, but it looks rather ugly, as in `quote', in most other modern display environments (for example, with properly designed Windows and Mac TrueType fonts, but also on many vintage video terminals from the 1970s and 1980s, such as those from Siemens/Nixdorf and many other manufacturers).
For example, 0x60 and 0x27 appear in Windows NT 4.0 with the Lucida Console TrueType font (size 14) like this:
Unicode and ISO 10646 distinguish very clearly between the non-directional, typewriter-style ASCII single quotation mark and apostrophe U+0027, as in 'quote', and the directional quotation marks U+2018 and U+2019, as in ‘quote’.
Unicode 2.1 explicitly says that U+2019 is the preferred apostrophe character for punctuation, as in “We’ve been here before.” The Unicode Standard also notes:
“For historical reasons, U+0027 is a particularly overloaded character. In ASCII, it is used to represent a punctuation mark (such as right single quotation mark, left single quotation mark, apostrophe punctuation, vertical line, or prime) or a modifier letter (such as apostrophe modifier or acute accent). Punctuation marks generally break words; modifier letters generally are considered part of a word. In many systems, it is always represented as a straight vertical line and can never represent a curly apostrophe or a right quotation mark.”
What to do?
If you write any Unix software, please make sure that you do not use the ASCII character 0x60 (`) as a left quotation mark, as in `quote'. Change it to use the character 0x27 (') on both sides, as in 'quote'. If you work in an environment where the UTF-8 encoding is already used everywhere (for example, Plan9 and newer GNU/Linux installations), you may even decide to use proper directional quotation marks, as in ‘quote’ or “quote”.
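One way to follow this advice inside a program is to decide at run time which quotation marks the output encoding can represent. The sketch below is only an illustration: the function name `quote` and its `encoding` parameter are assumptions made here and are not taken from any existing library.

```python
def quote(text, encoding="ascii"):
    """Wrap text in the best single quotation marks the encoding supports.

    Uses U+2018/U+2019 if both can be encoded; otherwise falls back to the
    neutral ASCII apostrophe (0x27) on both sides. Never emits 0x60.
    """
    left, right = "\u2018", "\u2019"
    try:
        (left + right).encode(encoding)
    except UnicodeEncodeError:
        # The target encoding has no directional quotes: use 0x27 twice.
        left = right = "'"
    return left + text + right
```

A caller could pass something like `sys.stdout.encoding` as the `encoding` argument, so that UTF-8 terminals get directional quotation marks and plain ASCII output gets the apostrophe on both sides.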
Check your source directories with
grep \` *
to find out where modifications are needed. Then use (with due care!) something like
perl -pi.bak -e "s/\`/'/g;" file1 file2 ...
to make the necessary replacements automatically, or make the edits manually.
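The blanket perl replacement above also rewrites backticks that are legitimate, such as shell command substitution. A more conservative sketch, shown here in Python, replaces 0x60 only where it opens a short run that is closed by 0x27 on the same line. The pattern and the function name `fix_ascii_quotes` are assumptions for illustration; this is a heuristic, and its output still needs manual review.

```python
import re

# A single grave accent, a short run that starts and ends with a non-space
# character and contains no quote characters, then a single apostrophe,
# as in `quote'. Command substitutions such as `date` are not matched
# because they end in a second grave accent, and TeX-style ``quote''
# is skipped by the lookbehind and lookahead.
_PAIR = re.compile(r"(?<!`)`([^`'\s](?:[^`'\n]{0,78}[^`'\s])?)'(?!')")


def fix_ascii_quotes(line):
    """Rewrite `quote' as 'quote'; leave other grave accents alone."""
    return _PAIR.sub(r"'\1'", line)
```

For example, `fix_ascii_quotes("see the `quote' example")` rewrites the pair, while a line containing only a command substitution in backticks is returned unchanged.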
The use of 0x60 (grave accent) as a special control character in the Unix shell (to indicate command substitution, as in `command` or better $(command)), in Perl, in Lisp, or in TeX/troff (to denote a proper left single quotation mark) does not need to be changed and remains unaffected. Donald Knuth's TeXbook (chapter 2, page 3, end of the second paragraph) has warned TeX users since 1986 that the apostrophe and grave accent may appear in the shapes required by ISO and Unicode and not as used in the rest of the TeXbook. The Unix m4 macro processor is probably the only widely used tool that uses the `quote' combination as part of its input syntax; however, even this can be modified via changequote.
Why should we fix this?
There are several reasons why the old X fonts had to be fixed, and with them the associated practice of using the ASCII grave accent as a quotation mark:
- Obviously, the grave accent and the acute accent must be mutually symmetrical, which was not the case in the old X fonts.
- The Unicode 4.0 standard explicitly says that U+0027 is a “neutral (vertical) glyph with mixed usage” and displays the entire ASCII section like this:
- The ISO 10646, ISO 8859, and ISO 646/ECMA-6 standards also show the upright typewriter apostrophe for U+0027 and have U+0060 and U+00B4 as accents symmetric to each other.
- The ANSI X3.4:1986 (“ASCII”) code table, which was printed with the OCR-B font, also shows the upright typewriter apostrophe. Historically, the originally proposed use of 0x60 in the international 7-bit coded character set was as a grave accent (ISO TC97/SC2 meeting, 29-31 October 1963), and its meaning was only later extended in the US implementation of the standard to also cover use as a left single quotation mark (CACM 8(4) 207-214, 1965).
- Most European keyboards have key labels for the apostrophe and both accents, and these have always resembled the shapes shown in the ISO and Unicode standards. The photo below shows the relevant keys highlighted on a standard German PC keyboard, which has the acute/grave accent key to the left of the backspace key and the number sign/apostrophe key below it:
It can confuse users if the key labels and the glyph shapes in fonts do not match, as was the case with the old X fonts.
- Microsoft and Apple fonts also follow the modern standards and do not agree with the old X fonts. X11 users really should not be misled about how the characters they use will appear on other standards-conforming systems. Otherwise, they will not notice that, for example, all users of a Windows web browser (screenshot: Internet Explorer 5) see "back quotes" as in `quote'.
- Since XFree86 4.0 added support for TrueType fonts, users of GNU/Linux systems increasingly use modern fonts with the straight 0x27 glyph and get odd-looking quotation marks from older software that tries to display ASCII directional quotes (mostly various GNU packages).
- The characters 0x27 (apostrophe) and 0x22 (quotation mark) are often used to abbreviate minutes and seconds or feet and inches, which is another reason why 0x27 should look like a single-stroke version of 0x22 and not like a directional quotation mark.
Updated X Window System core BDF fonts, in which the apostrophe and grave accent were fixed along with various other errors, have been available since 1998. They have replaced the old fonts in XFree86 since version 4.0 and in the X.Org sample implementation since X11R6.8.
Related tips
PostScript
PostScript has a rather complicated history of how it maps ASCII bytes to glyphs. In PostScript fonts, each glyph is identified not by a code position but by a glyph name such as "quotesingle". After the publication of the Unicode standard, Adobe released a PostScript glyph name to Unicode mapping table. When a PostScript interpreter displays text, it uses an encoding vector to map the 8-bit values found in text strings to the glyph names found in fonts.
Unicode position | Unicode name | Glyph | PostScript glyph name | Standard encoding vector | ISOLatin1 encoding vector | CE encoding vector |
---|---|---|---|---|---|---|
U+0022 | QUOTATION MARK | " | quotedbl | 0x22 | 0x22 | 0x22 |
U+0027 | APOSTROPHE | ' | quotesingle | 0xA9 | n/a | 0x27 |
U+0060 | GRAVE ACCENT | ` | grave | 0xC1 | 0x91 | 0x60 |
U+00B4 | ACUTE ACCENT | ´ | acute | 0xC2 | 0x92/0xB4 | 0xB4 |
U+2018 | LEFT SINGLE QUOTATION MARK | ‘ | quoteleft | 0x60 | 0x60 | 0x91 |
U+2019 | RIGHT SINGLE QUOTATION MARK | ’ | quoteright | 0x27 | 0x27 | 0x92 |
U+201C | LEFT DOUBLE QUOTATION MARK | “ | quotedblleft | 0xAA | n/a | 0x93 |
U+201D | RIGHT DOUBLE QUOTATION MARK | ” | quotedblright | 0xBA | n/a | 0x94 |
PostScript provides several predefined 8-bit encoding vectors, and printer driver authors can easily add their own. As the table above shows, the original PostScript StandardEncoding followed a practice similar to that of the old X fonts, with all its flaws: it assigned the ASCII bytes 0x60 and 0x27 to the opening and closing quotation marks ("quoteleft" and "quoteright" in PostScript glyph-name terminology, or U+2018 and U+2019 in Unicode).
When ISO 8859-1 came out, Adobe added to PostScript another predefined encoding vector called ISOLatin1Encoding. This was supposed to be compatible with ISO 8859-1, but positions 0x60 and 0x27 remained unchanged from the old StandardEncoding vector. It therefore does not correctly print the ISO 8859-1 characters 0x27 and 0x60, which correspond to the Unicode characters U+0027 and U+0060 and must be represented by the PostScript glyphs “quotesingle” and “grave”. The authors of Adobe's PostScript Language Reference, third edition (Addison-Wesley, ISBN 0-201-37922-8) acknowledge this in section E.5, footnote 3, page 783, where they note that the “ISOLatin1Encoding encoding vector deviates from the ISO 8859-1 standard” and that an application that wants to “exactly comply with the ISO standard must create a modified encoding vector”. The newer CE encoding vector (Central European, corresponding to Windows CP1250), now also described in the PostScript Language Reference, correctly assigns 0x27 to "quotesingle" and 0x60 to "grave".
If you write a PostScript driver, use the official Unicode to PostScript mapping table to map ASCII, ISO 8859, and ISO 10646 characters to PostScript glyphs, just as the updated Type 1 renderer in XFree86 4.0 does. Do not use the ISOLatin1Encoding encoding vector to print ISO 8859-1 text without first changing it to assign 0x27 to "quotesingle" and 0x60 to "grave". (You may also want to assign 0x2D = HYPHEN-MINUS to the PostScript "hyphen" glyph instead of the "minus" assignment used by ISOLatin1Encoding.)
TeX
The cmtt10 font in TeX's Computer Modern family follows the example of the PostScript StandardEncoding by providing straight double quotes and directional single quotes at the ASCII positions 0x22, 0x60, and 0x27. It also provides a straight single quote, a grave accent, and an acute accent at code positions 0x0d, 0x12, and 0x13 respectively, but lacks directional double quotes:
U+0022 QUOTATION MARK | " | " |
U+0027 APOSTROPHE | \char"0D | ' |
U+0060 GRAVE ACCENT | \char"12 | ` |
U+00B4 ACUTE ACCENT | \char"13 | ´ |
U+2018 LEFT SINGLE QUOTATION MARK | ` | ‘ |
U+2019 RIGHT SINGLE QUOTATION MARK | ' | ’ |
So, to demonstrate the result of abusing the ASCII grave accent and straight quote as directional quotes in a document written in LaTeX, you can write \texttt{\char"12quote\char"0D}. The non-typewriter fonts in Computer Modern have no straight single or double quotes.
Use the LaTeX upquote package (\usepackage{upquote}) to map the ASCII characters 0x27 and 0x60 to the correct glyphs in verbatim modes.
References
- Michael Everson: On the apostrophe and quotation marks, with a note on Egyptian transliteration characters, ISO/IEC JTC1/SC2/WG2 working group document N2043, 1999-07-24.
- Adobe: Unicode and glyph names, 1997–2003.
- UTF-8 and Unicode FAQ for Unix/Linux.
- Unicode fonts and tools for X11.
- Bruno Haible explains how to output Unicode quotation marks in a portable way using GNU gettext.
- The Unicode Standard, Version 4.0, Addison-Wesley, 2003, ISBN 0-321-18578-1.
- Jukka Korpela: Character histories: notes on some ASCII code positions.
- Markus Kuhn: Apostrophe and acute accent confusion. This is a page about the common mistake of using the acute or grave accent (U+00B4 or U+0060) as an apostrophe instead of the apostrophe character itself (U+0027 or, better, U+2019). This is now a frequent mistake, made by users of German, Swedish, Spanish, and other PC keyboards on which the acute accent key is easier to reach than the (shifted) apostrophe key. The acute/grave accent key should always be a non-spacing (dead) key, so that it is less likely to be misused to enter wrong apostrophes.
- David A. Wheeler: Curling Quotes in HTML, SGML, and XML.
created 12/19/1999 – last modified 12/11/2007 – http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html