Diacritics, or diacritical marks, are those curious glyphs added to a letter. The term derives from Ancient Greek.
ā ī ū ṅ ñ ṇ ṭ ṭh ḍ ḍh ṇ ḷ ṃ ṁ ŋ
Pali is a phonetic language and has no written alphabet of its own. Ever since the 1st century, scholars have relied on their own native alphabets to write Pali! European scholars have thus transliterated Pali into the Roman alphabet, and this required its augmentation with additional characters represented by letter-pairs and diacritics. This was fine whilst Pali literature was mainly printed, but with the introduction of computers the problem arose of how to represent these characters within a standard ASCII font.
Many different methods have been adopted over the years, which unfortunately means there is no standard way of representing Pali's diacritic characters within the limited character sets then available on PCs. As a result, the student will encounter a variety of legacy approaches, some of which include:
- Ignoring them altogether. This method is used by Access to Insight. For example:
panatipata veramani sikkha-padam samadiyami.
- Another method is to represent long vowels with doubled characters (aa ii uu) and to place punctuation marks before letters to represent the consonants (.r .t .th .d .dh .n .m .s .l ~n "n). This is also called the Velthuis system. For example:
paa.naatipaataa verama.nii sikkhaa-pada.m samaadiyaami.
- The development of HTML led a few to use some sort of HTML accent: ä ï ü, à ì ù, â î û, ñ etc. Example:
pâ.nâtipâtâ verama.nî sikkhâ-pada.m samâdiyâmi.
- Others have employed capitalized letters to represent the diacritics. Though simple, this makes it hard to distinguish between the palatal and guttural n. Example:
pANAtipAtA veramaNI sikkhA-padaM samAdiyAmi.
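To make the first approach concrete, here is a rough Python sketch of how text with Unicode diacritics could be reduced to the plain form used when the marks are ignored altogether. The function name `strip_diacritics` is made up for this illustration, and the input is the same precept written with Unicode diacritics. It relies on Unicode decomposition, so a letter such as ŋ, which has no decomposed form, would pass through unchanged.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks so that a-macron becomes a, n-dot-under becomes n, etc."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(strip_diacritics("pāṇātipātā veramaṇī sikkhā-padaṃ samādiyāmi."))
# prints: panatipata veramani sikkha-padam samadiyami.
```

Note that this conversion only works in one direction: once the marks are gone, ṇ, ṅ, ñ and n all look the same.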
However, the introduction of modern Unicode fonts has made the problem of representing diacritics trivial. Unicode fonts include Tahoma, Arial Unicode MS and the latest version of Times New Roman (in MS Word from 2010 onward), and these fonts have all the characters we need. But it's a pain to have to insert them as special symbols...
So how can I type them? - Direct Input
This brings us to the second problem, which is how to enter these characters using a standard keyboard. Many tools/apps have created their own methods, often through menu selection.
To aid input of diacritics I've created a JavaScript tool: Pali diacritics converter tool. Simply type in Velthuis character combinations as below and they will magically change into Unicode.
Here's a table comparing the codes:
| character | UDP | Unicode number | Velthuis | HTML code |
|---|---|---|---|---|
| ā | a macron | \u0101 | aa | ā |
| ñ | n tilde | \u00f1 | ~n | ñ |
| ī | i macron | \u012b | ii | ī |
| ḍ | d dot-under | \u1e0d | .d | ḍ |
| ṅ | n dot-over | \u1e45 | "n | ṅ |
| ḷ | l dot-under | \u1e37 | .l | ḷ |
| ṭ | t dot-under | \u1e6d | .t | ṭ |
| ṁ | m dot-over | \u1e41 | "m | ṁ |
| ū | u macron | \u016b | uu | ū |
| ṇ | n dot-under | \u1e47 | .n | ṇ |
| ṃ | m dot-under | \u1e43 | .m | ṃ |
| ŋ | | \u014b | | ŋ |
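For readers curious how such a conversion can work, here is a minimal Python sketch of the idea. It is not the code of the converter tool mentioned above (which is written in JavaScript); the names `VELTHUIS_TO_UNICODE` and `velthuis_to_unicode` are invented for this example, and only the lowercase sequences from the comparison table are handled.

```python
import re

# Velthuis sequence -> Unicode character, following the comparison table.
# Only lowercase Pali letters are covered in this sketch.
VELTHUIS_TO_UNICODE = {
    "aa": "\u0101",  # a macron
    "ii": "\u012b",  # i macron
    "uu": "\u016b",  # u macron
    '"n': "\u1e45",  # n dot-over
    "~n": "\u00f1",  # n tilde
    ".n": "\u1e47",  # n dot-under
    ".t": "\u1e6d",  # t dot-under (so ".th" becomes t dot-under followed by h)
    ".d": "\u1e0d",  # d dot-under
    ".l": "\u1e37",  # l dot-under
    ".m": "\u1e43",  # m dot-under
    '"m': "\u1e41",  # m dot-over
}

# One regular expression that matches any of the sequences above.
_PATTERN = re.compile("|".join(re.escape(seq) for seq in VELTHUIS_TO_UNICODE))

def velthuis_to_unicode(text: str) -> str:
    """Replace every Velthuis sequence in text with its Unicode character."""
    return _PATTERN.sub(lambda m: VELTHUIS_TO_UNICODE[m.group(0)], text)

print(velthuis_to_unicode("paa.naatipaataa verama.nii sikkhaa-pada.m samaadiyaami."))
# prints: pāṇātipātā veramaṇī sikkhā-padaṃ samādiyāmi.
```

A full stop directly followed by one of the letters n, t, d, l or m (with no space in between) would be converted by mistake, so a real tool needs to be more careful with ordinary punctuation than this sketch is.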
What I really want is: Pali Keyboard
There are also dedicated keyboard tools, for instance the Pali Keyboard, which allow direct typing of diacritic characters in word processors, web browsers, etc. by simple key combinations. Simply follow the instructions to install and start typing away...
A quick note on looking up words - the Niggahīta
Another source of variation among texts is the representation of the nasal niggahīta sound (also called anusvāra), which in western script has been transliterated as ŋ, ṁ or ṃ.
Just to add to the confusion, when occurring in the middle of a word, the ‘ṃ’ in some Pali texts can be substituted by a nasal (ṅ, ñ, ṇ, n, or just plain m), which means some texts spell words with ‘ṃ’ and some with a nasal! We will look at this issue in the next post, Pali alphabet & Dictionaries.
Dictionaries generally use the nasal. So if you come across a Pali word which has ‘ṃ’ in the middle of it, you have to replace it with a nasal in order to find the word in the dictionary! Also the order of words in dictionaries containing the niggahīta ṃ can be difficult to navigate and so I recommend using the search function.
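The substitution can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that the nasal is chosen by the class of the consonant that follows (ṅ before k/g, ñ before c/j, ṇ before ṭ/ḍ, n before t/d, m before p/b). The names `NASAL_BEFORE` and `niggahita_to_nasal` are invented here, and the conventions of any particular dictionary may differ, so treat it as a starting point rather than a rule.

```python
import re
import unicodedata

# Assumed mapping: following consonant -> nasal of the same class.
NASAL_BEFORE = {
    "k": "\u1e45", "g": "\u1e45",            # n dot-over before k, kh, g, gh
    "c": "\u00f1", "j": "\u00f1",            # n tilde before c, ch, j, jh
    "\u1e6d": "\u1e47", "\u1e0d": "\u1e47",  # n dot-under before t/d dot-under
    "t": "n", "d": "n",                      # plain n before t, th, d, dh
    "p": "m", "b": "m",                      # plain m before p, ph, b, bh
}

# A niggahita (m dot-under or m dot-over) directly followed by one of those consonants.
_PATTERN = re.compile("[\u1e43\u1e41]([kgcj\u1e6d\u1e0dtdpb])")

def niggahita_to_nasal(word: str) -> str:
    """Replace a word-internal niggahita with the nasal matching the next consonant."""
    word = unicodedata.normalize("NFC", word)
    return _PATTERN.sub(lambda m: NASAL_BEFORE[m.group(1)] + m.group(1), word)

for w in ["saṃgha", "paṃca", "padaṃ"]:
    print(w, "->", niggahita_to_nasal(w))
```

A niggahīta at the end of a word, or before a letter not in the mapping (such as y), is left as it is.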
Again, as an aid I've created another tool, the Pali Dictionary lookup tool, which will take Unicode and re-format it to be consistent with the PED. It also produces direct links to any entry too!
As an aside, it’s also important to note that ‘ti, the marker of direct speech, affects the spelling of the word immediately preceding it (due to Sandhi) in two ways: an immediately preceding vowel is lengthened, and the niggahīta ṃ changes to a nasal before ‘ti and sometimes ‘ca. So, when looking up words, these effects must first be reversed.
See the next post on the Pali alphabet & Dictionaries.