General Knowledge about Fonts

Knowledge about rules
Knowledge about design

Knowledge about rules

Character codes

Each character is assigned its own identification number for use on a PC, and this number is referred to as the character code. Different international standards and the standards of specific countries have assigned different codes to the same character so a given character has a Unicode^® number, a Shift_JIS number, a JIS number, etc.

Character codes

Character sets

Font files are provided in the form of character sets which bring together multiple characters.

There are several types of character sets, including Microsoft Corporation's Windows^® Codepage (CP), ISO (International Organization for Standardization) standards, JIS (Japanese Industrial Standards), ARIB (Association of Radio Industries and Businesses) standards, GB (Chinese national standards), etc. The types of characters and the number of characters included differ between character sets.

Character sets for each language for fonts provided by Ricoh (sorted by language)

Japanese

Character Set	Character Type	Total Characters
CP932	JIS X 0201(Half-width）	158
	JIS X 0208(Non-Kanji）	616
	JIS X 0208 Kanji Level 1	2,965
	JIS X 0208 Kanji Level 2	3,390
	NEC special	89
	IBM^®extension	374
	NEC selected IBM^®extension	0
	Total	7,592
JIS X 0208-1990	JIS X 0208(Non-Kanji）	577
	JIS X 0208 Kanji Level 1	2,965
	JIS X 0208 Kanji Level 2	3,390
	Total	6,932
CP932+JIS X 0213:2004	JIS X 0201(Half-width）	158
	JIS X 0208(Non-Kanji）	616
	JIS X 0208 Kanji Level 1	2,965
	JIS X 0208 Kanji Level 2	3,390
	NEC special	89
	IBM^®extension	374
	NEC selected IBM^®extension	0
	JIS X 0213 Non-Kanji	575
	JIS X 0213 Kanji Level 3	1,071
	JIS X 0213 Kanji Level 4	2,348
	Total	11,586
ARIB STD-B24	JIS X 0201(Half-width）	158
	JIS X 0208(Non-Kanji）	577
	JIS X 0208 Kanji Level 1	2,965
	JIS X 0208 Kanji Level 2	3,390
	Additional code	399
	Additional kanji	137
	Total	7,626

About ARIB and ARIB-compatible fonts

ARIB (Association of Radio Industries and Businesses) is an incorporated Japanese industry association which enacts standards concerning digital broadcasts and cellular telephones in Japan. Digital broadcasts are supposed to use characters compatible with ARIB standards.

At Ricoh, we provide fonts compatible with ARIB STD-B24 (a data broadcasting encoding system and transmission system for digital broadcasting). This standard applies to broadcasting systems up to Hi-Vision HDTV.

About ARIB and ARIB-compatible fonts

Simplified Chinese characters

Used on Chinese mainland, etc.

Character Set	Character Type	Total Characters
GB18030-2005(Mandatory part)	1 Byte Half-width ASCII	95
	2 Byte/Section1(Non-Kanji)	728
	2 Byte/Section5(Non-Kanji)	166
	2 Byte/Section2(Non-Kanji)	6,763
	2 Byte/Section3 (Non-Kanji)	6,080
	2 Byte/Section4(Kanji)	8,160
	4 Byte (Kanji)	6,530
	Total	28,522

GB (GB standards)

China has national standards known as GB (GB standards), which are approved and issued by the Standardization Administration of China (SAC). The name "GB" comes from the first letters of the Chinese term for "national standards," which is "Guójiā Biāozhǔn."

Simplified Chinese character certification

When simplified Chinese characters are included on products used in China, for bitmap fonts companies are supposed to use standard fonts owned by the Chinese government and for outline fonts they are supposed to use fonts that have received Chinese government certification.

At Ricoh, we provide outline fonts (TrueType Font/RT Font) that have passed the conformity assessment carried out by the Chinese government's official certification organization (Conformance Test Center For Information Technology Standards: CTCITS). If you are searching for the standard bitmap fonts owned by the Chinese government, please inquire using the free consultation function.

Traditional Chinese characters

Used in Taiwan, etc.

Character Set	Character Type	Total Characters
CP950	Half-width characters	95
	Big5（Non-Kanji)	408
	Big5 1st Standard Collection	5,399
	Big5 2nd Standard Collection	7,652
	E-TEN	7
	Border fragment	25
	Graphic pattern	1
	Euro symbol	1
	Total	13,588

Korean

Character Set	Character Type	Total Characters
KS X 1001:2004(without hanja) + KS X 1003-1993	Half-width characters	95
	Master	94
	Symbol	895
	Hangul	2,350
	Total	3,434

European languages

Region / Language	Character Set	Total Characters
Western Europe	ISO8859-1	191
Western Europe	CP1252	218
Central Europe	ISO8859-2	191
Central Europe	CP1250	218
Southern Europe	ISO8859-3	184
Northern Europe	ISO8859-4	191
Cyrillic (Russian, etc.)	ISO8859-5	191
Cyrillic (Russian, etc.)	CP1251	223
Greek	ISO8859-7	188
Greek	CP1253	206
Turkish	ISO8859-9	191
Turkish	CP1254	216
North Germanic Languages	ISO8859-10	191
Baltic	ISO8859-13	191
Baltic	CP1257	211
Celtic	ISO8859-14	191

[ Character set support status by language ]

	ISO8859															CP
Languages	1	2	3	4	5	6	7	8	9	10	11	13	14	15	16	1250	1251	1252	1253	1257
Afrikaans	○													○				○
Albanian	○	○							○				○	○	○	○		○
Arabic						○
Baltic language family												○								○
Basque	○								○				○	○				○
Belarusian					○												○
Bosnian		○														○
Breton	○								○				○	○				○
Bulgarian					○												○
Catalan	○								○				○	○				○
Cornish (Celtic)	○								○				○					○
Croatian		○													○	○
Czech		○														○
Danish	○			○					○	○			○	○				○
Dutch	△[1]								△[1]					△[1]				△[1]
English	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○
Esperanto			○
Estonian				○						○		○		○				○		○
Euro symbol [€(U+20AC)]							○							○	○			○	○
Faroese	○									○				○				○
Finnish	△			○					△	○		○	△	○	○			○		○
French	△		△						△				△	○	○			○
Frisian	○								○					○				○
Gaelic (Isle of Man Gaelic)													○
Galician	○								○				○	○				○
German	○	○	○	○					○	○			○	○	○	○		○
Greenlandic	○			○					○	○			○	○				○
Hebrew								○
Hungarian		○													○	○
Icelandic	○									○				○				○
Indonesian	○													○				○
Ireland Gaelic (new spelling standard)	○								○	○			○	○	○			○
Ireland Gaelic (old spelling standard)													○
Irish	○													○				○
Italian	○		○						○				○	○	○			○
Kurdish									○
Latin	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○	○
Latvian				○								○								○
Lithuanian				○						○		○								○
Luxembourgish	○								○				○	○				○
Macedonian					○												○
Maltese			○
Modern Greek							○												○
North Germanic language family										○
Norwegian	○			○					○	○		○	○	○				○		○
Polish		○										○			○	○				○
Portuguese	○		○						○				○	○				○
Rhaeto-Romance	○								○				○	○				○
Rumanian		△[2]													○	△[2]
Russian					○												○
Sami				○						○
Scotland Gaelic	○								○				○	○				○
Serbian					○												○
Slovak		○														○
Slovene		○		○						○					○	○
Sorbian		○														○
Spanish	○								○				○	○				○
Swahili	○													○				○
Swedish	○			○					○	○			○	○				○
Thai											○
Turkish			△						○
Ukrainian					△												△
Welsh													○

[Legend]

○:Fully supported.
△:Almost fully supported, but there are some characters which are not supported.
[1]:[Ĳ(U+0132)/ĳ(U+0133)] missing
[2]:[Ș(U+0218)/ș(U+0219)] and [Ț(U+021A)/ț(U+021B)] (characters with comma accent) are missing, but those characters were previously unified with [Ș(U+015E)/ș(U+015F)] and [Ț(U+0162)/ț(U+0163)] (characters with cedilla).

Arabic

Character Set	Character Type	Total Characters
CP1256＋137characters	CP1256 Regulated character	223
	CP1256 Non-regulated character	137
	Total	360

Hebrew

Character Set	Character Type	Total Characters
CP1255＋82characters	CP1255 Regulated character	200
	CP1255 Non-regulated character	82
	Total	282

Thai

Character Set	Character Type	Total Characters
CP874	CP874 Regulated character	192

Vietnamese

Character Set	Character Type	Total Characters
CP1258＋104characters	CP1258 Regulated character	214
	CP1258 Non-regulated character	104
	Total	318

Hindi

Character Set	Character Type	Total Characters
Unicode Devanagari defined characters	Unicode Devanagari defined characters	155

OCR

OCR (Optical Character Recognition) is technology used to read handwritten or printed characters with an image scanner or digital camera and then convert the characters to character codes that can be used by a computer.

Character Set	Character Type	Total Characters
JIS X 9010:1984 OCR-B Basic	Numerals	10
	Alphabets (uppercase, lowercase)	52
	Symbols, etc.	31
	Others	3
	Total	96
JIS X 9010:1984 OCR-B Basic OCR-K	Numerals	10
	Alphabets (uppercase, lowercase)	52
	Symbols, etc.	31
	Others	3
	Katakana, etc.	63
	Total	159
JIS X 9006:1979 OCR-HN	Numerals	10
	Others	1
	Total	11

Barcodes

With barcodes, the characters that can be used, the character representation method, and the number of digits vary by barcode type.

Standard	Character Set	Character Types
CODE39	JIS X 0503 CODE39 basic specifications	Numbers English upper case characters Symbols
CODE128	JIS X 0504 CODE128 basic specifications	Numbers English uppercase characters English lowercase characters Symbols
NW-7	JIS X 0506 CODABAR (NW-7)	Numbers Symbols
Postal services customer barcode	Postal services customer barcode	Symbols

Knowledge about design

Fixed width (fixed pitch) and proportional

Among fonts, there are fixed-width (fixed-pitch) fonts where all characters are the same width, and there are proportional fonts where the width of each character is different.

Fixed width (fixed pitch) and proportional

In addition, even for the same proportions, character width differs depending on the font.

Fixed width (fixed pitch) and proportional

JIS90 character forms and JIS2004 character forms

In Japanese character sets, JIS X 0208 (established 1978; revised 1983, 1990, and 1997) and JIS X 0213 (established 2000; revised 2004) exist. The character forms specified in the JIS X 0208 1990 revised edition are referred to as the JIS90 character forms and the those specified in the JIS X 0213 2004 revised edition are referred to as the JIS2004 character forms. Between the various standards, the character forms for some characters differ.

Example (left: JIS90 character form, right: JIS2004 character form)

JIS90 character forms and JIS2004 character forms

Universal design fonts (UD fonts)

Fonts specially designed so that a wide range of people find them easy to read, easy to see, and difficult to misread are known as universal design fonts (UD fonts).

Ricoh's UD font policy

①"Easy to read"

・Adjust character height and width so that sentences can be read naturally.

① Easy to read

②"Easy to see"

②-1:Enlarge character internal space.

②-2:Make the differences between characters with similar forms clear.

②-3:Make the difference between dakuten and handakuten clear.

②-4:For characters with many strokes, vary the line thickness.

③"Beauty"

・Delete decorative portions to make characters simple and beautiful.

③ Beauty

・Curved portion represented with large and gently curved line.

③ Beauty

Extended characters

The term "extended characters" refers to characters not included in character sets based on the various standards (Unicode and JIS).

[When are extended characters necessary]

We recommend the use of extended characters to persons using fonts in ways such as those below.

・To use both the characters that have duplicate character codes but different character forms
・To use both new and old character forms
・To use characters to which character codes have not been assigned

[Mechanism for adding extended characters]

Add an extended character area within the font file
In the case of Unicode, a PUA (Private Use Area) is provided as an extended character area.

[Mechanism for adding extended characters]

In the case of TrueType Fonts used in Windows^®, the method shown below is also possible.

Link an extended character file (TTE format) to the font file

[Mechanism for adding extended characters]

Since extended characters are added to the font file of a specific font, if the user switches to another font, the characters added as extended characters will not be displayed.

[Mechanism for adding extended characters]

Variant characters and IVS

The term "variant character" refers to a character that has the same meaning or the same pronunciation/reading as another character but a different character form.

Variant characters and IVS

In the past, extended characters were used for the display of variant characters, but in recent years, another method has been developed.

In a font file, there may be multiple variant characters assigned the same character code, but in such cases it is necessary to have a mechanism known as IVS (Ideographic Variation Sequence/Selector) in order to differentiate how the variants with the same code are used (by specifying and displaying the character to be used).

IVS image

Specifically, the character forms are differentiated by adding at the back of the character code a branch number (E0100, E0101, etc.) to designate each character form.
This branch number is called a VS (Variation Selector).

Character-size balance when using multiple languages

When using multiple languages, it is necessary to consider the character-size balance.

European languages writing when priority is not given to the balance with the Japanese

There is the usual kind of European languages text display, but the viewer may be given the impression that there is a large
difference in character size when the text switches from one language to another.

Character-size balance when using multiple languages

European languages writing when priority is given to the balance with the Japanese

Ricoh has adopted this method.

European languages writing when priority is given to the balance with the Japanese

Ligature

The term "ligature" refers to cases where a combination of two or more characters are replaced by a single character. This term is not used for cases where two or more characters are displayed together by adjusting their positions rather than by replacing them with a single character.

Ligature

Bitmap Font replacement

Including the Bitmap Font in the TrueType Font (and RT Font) makes it possible to prevent character illegibility by giving priority to displaying with the bitmap font using the subject number of dots.

Red characters are Bitmap Font.

	When Bitmap Font included	When Bitmap Font not included
24dot
21dot
20dot
18dot
16dot
14dot
13dot
12dot
10dot
8dot
7dot

About registered trademarks and trademarks