Tuesday, October 11 2022

Writing Hindi using the devanagari package in Overleaf

Rishi Raj Jain (@rishi_raj_jain_)

Are you finding it difficult to write Hindi in Overleaf? You've come to the right place. With engines like XeLaTeX or LuaLaTeX it's easy, but writing Hindi with pdfLaTeX requires an unconventional approach. This guide is heavily inspired by this answer and is meant only to give the solution more visibility. It also corrects the mistakes found in that answer and provides helper code for the process.

XeLaTeX or LuaLaTeX

To write Hindi with XeLaTeX or LuaLaTeX, use either of the following approaches:

1. Via the fontspec package (set the Overleaf compiler to XeLaTeX or LuaLaTeX):

\documentclass{article}
\usepackage{fontspec} % Load the fontspec package
\newfontfamily\hindifont{Noto Sans Devanagari}[Script=Devanagari] % Set script to Devanagari (with LuaLaTeX, also add Renderer=HarfBuzz)

\begin{document}
{\hindifont नमस्कार} % Hindi renders correctly
\end{document}

2. Via the polyglossia package:

\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{english} % Keep English as the main document language
\setotherlanguages{hindi} % Enables \texthindi{...}
\newfontfamily\hindifont{Noto Sans Devanagari}[Script=Devanagari]

\begin{document}
\texthindi{नमस्कार} % Hindi renders correctly
\end{document}

Both approaches require a Unicode-aware engine. However, commonly used submission platforms such as arXiv, IEEE, or Elsevier explicitly require submissions to compile with engines that lack Unicode-aware font support, such as pdfLaTeX or LaTeX.

pdfLaTeX

To make Hindi work natively with pdfLaTeX, we follow two steps: 1. transliterate the Hindi text to the Velthuis scheme, and 2. preprocess the file with a Python script that produces a pdfLaTeX-compatible version.

Step 1: Convert your Hindi text to Velthuis

Enter your Hindi text at https://shreevatsa.appspot.com/sanskrit/transliterate.html to get its Velthuis transliteration.
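To see what the transliterator does, here is a minimal Python sketch of the core Velthuis rules. It covers only the five consonants in नमस्कार plus a few vowel signs (an assumption for brevity; the full tables appear in the script in Step 2), and to_velthuis is a hypothetical name. A consonant followed by a vowel sign takes that vowel, a consonant followed by the virama (्) is written bare, and any other consonant carries the inherent vowel a:

```python
import re

# Tiny subset of the Devanagari -> Velthuis tables (assumption: only the
# letters needed for this demo are included).
CONSONANTS = {'\u0915': 'k', '\u0928': 'n', '\u092E': 'm', '\u0930': 'r', '\u0938': 's'}
VOWEL_SIGNS = {'\u093E': 'aa', '\u093F': 'i', '\u0940': 'ii'}
VIRAMA = '\u094D'

def to_velthuis(text: str) -> str:
    cons = '|'.join(CONSONANTS)
    signs = '|'.join(VOWEL_SIGNS)
    # Pass 1: consonant + vowel sign -> consonant + that vowel
    text = re.sub(f'({cons})({signs})',
                  lambda m: CONSONANTS[m.group(1)] + VOWEL_SIGNS[m.group(2)], text)
    # Pass 2: consonant + virama -> bare consonant (no inherent vowel)
    text = re.sub(f'({cons}){VIRAMA}', lambda m: CONSONANTS[m.group(1)], text)
    # Pass 3: every remaining consonant carries the inherent vowel 'a'
    text = re.sub(f'({cons})', lambda m: CONSONANTS[m.group(1)] + 'a', text)
    return text

print(to_velthuis('नमस्कार'))  # namaskaara
```

The velthuis() function in the script applies the same three passes in the same order, just with complete tables.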

[Screenshot: https://a.storyblok.com/f/117912/1851x473/b4a9699213/screenshot_2021-09-13_at_9-49-39_am.png]

Step 2: Preprocess your content

[Skip this step if you're using Kaggle] Install the Devanagari preprocessor (devnag): https://tex.stackexchange.com/questions/33360/how-can-i-compile-the-devanagari-preprocessor-in-linux?answertab=votes#tab-top
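Before running the script, it can help to confirm that the preprocessor is actually reachable, since the script invokes it by the bare command name devnag. A small standard-library sketch (has_preprocessor is a hypothetical helper name):

```python
import shutil

def has_preprocessor(name: str = 'devnag') -> bool:
    # shutil.which returns the full path of an executable found on PATH, or None
    return shutil.which(name) is not None

if __name__ == '__main__':
    if has_preprocessor():
        print('devnag found at', shutil.which('devnag'))
    else:
        print('devnag not found on PATH; install it first')
```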

Create a file named name.dn with the transliterated content (make sure each transliterated snippet is preceded by \dn, as in {\dn ...}):

cat > name.dn << 'EOF'
\documentclass{article}
\usepackage{devanagari}

\begin{document}
{\dn namaskaara} % Velthuis transliteration of नमस्कार
\end{document}
EOF
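If you have many Velthuis snippets, writing name.dn by hand gets tedious. The sketch below is one way to generate it from Python; write_dn_file is a hypothetical helper, and it assumes the same minimal preamble as the name.dn example:

```python
from pathlib import Path

# Same minimal preamble as the name.dn example (assumption: nothing else is needed)
TEMPLATE = r"""\documentclass{article}
\usepackage{devanagari}

\begin{document}
%s
\end{document}
"""

def write_dn_file(snippets, path='name.dn'):
    # Wrap every Velthuis snippet in {\dn ...} so devnag converts it
    body = '\n\n'.join(r'{\dn %s}' % s for s in snippets)
    Path(path).write_text(TEMPLATE % body, encoding='utf-8')
    return body
```

For example, write_dn_file(['namaskaara']) writes a file equivalent to the name.dn above, minus the comment.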

Run the following Python 3 script in the same directory as the file created above:

# process.py
# A repaired version of https://gist.github.com/shreevatsa/4d476ac26a367fa68984d8c06867d7dd#file-get-dn-py
# Requires Python 3.7+ (uses subprocess.run with capture_output)

import os
import re
import subprocess
import sys

consonants = {
    0x0915: ['k'],
    0x0916: ['kh'],
    0x0917: ['g'],
    0x0918: ['gh'],
    0x0919: ['"n'],
    0x091A: ['c'],
    0x091B: ['ch'],
    0x091C: ['j'],
    0x091D: ['jh'],
    0x091E: ['~n'],
    0x091F: ['.t'],
    0x0920: ['.th'],
    0x0921: ['.d'],
    0x0922: ['.dh'],
    0x0923: ['.n'],
    0x0924: ['t'],
    0x0925: ['th'],
    0x0926: ['d'],
    0x0927: ['dh'],
    0x0928: ['n'],
    0x092A: ['p'],
    0x092B: ['ph'],
    0x092C: ['b'],
    0x092D: ['bh'],
    0x092E: ['m'],
    0x092F: ['y'],
    0x0930: ['r'],
    0x0932: ['l'],
    0x0933: ['L'],
    0x0935: ['v'],
    0x0936: ['"s'],
    0x0937: ['.s'],
    0x0938: ['s'],
    0x0939: ['h'],
    0x0958: ['q'],
    0x0959: ['.kh'],
    0x095A: ['.g'],
    0x095B: ['z'],
    0x095C: ['R'],
    0x095D: ['Rh'],
    0x095E: ['f'],
}
vowel_signs = {
    0x093E: ['aa'],
    0x093F: ['i'],
    0x0940: ['ii'],
    0x0941: ['u'],
    0x0942: ['uu'],
    0x0943: ['.r'],
    0x0944: ['.R', '.r.r'],
    0x0947: ['e'],
    0x0948: ['ai'],
    0x0949: ['~o'],
    0x094B: ['o'],
    0x094C: ['au'],
    0x0962: ['.l'],
    0x0963: ['.ll', '.l.l'],
}
vowels = {
    0x0905: ['a'],
    0x0906: ['aa'],
    0x0907: ['i'],
    0x0908: ['ii'],
    0x0909: ['u'],
    0x090A: ['uu'],
    0x090B: ['.r'],
    0x090C: ['.l'],
    0x090F: ['e'],
    0x0910: ['ai'],
    0x0913: ['o'],
    0x0914: ['au'],
    0x0960: ['.R'],
    0x0961: ['.L'],
    0x0972: ['~a'],
}
other = {
    # 0x002E: ['..'],
    0x0901: ['/'],
    0x0902: ['.m'],
    0x0903: ['.h'],
    0x093D: ['.a'],
    0x094D: ['&'],
    0x0950: ['.o'],
    0x0964: ['|'],
    0x0965: ['||'],
    0x0966: ['0'],
    0x0967: ['1'],
    0x0968: ['2'],
    0x0969: ['3'],
    0x096A: ['4'],
    0x096B: ['5'],
    0x096C: ['6'],
    0x096D: ['7'],
    0x096E: ['8'],
    0x096F: ['9'],
    0x0970: ['@'],
    0x0971: ['#'],
}

re_consonant = '|'.join(chr(n) for n in consonants)
re_vowel_sign = '|'.join(chr(n) for n in vowel_signs)
re_vowel = '|'.join(chr(n) for n in vowels)
re_other = '|'.join(chr(n) for n in other)
re_virama = chr(0x094D)
re_a = vowels[0x0905][0]  # 'a'

def velthuis(devanagari):
    text = devanagari
    text = re.sub('(%s)(%s)' % (re_consonant, re_vowel_sign),
                  lambda match: consonants[ord(match.group(1))][0] + vowel_signs[ord(match.group(2))][0],
                  text)
    text = re.sub('(%s)(%s)' % (re_consonant, re_virama),
                  lambda match: consonants[ord(match.group(1))][0],
                  text)
    text = re.sub('(%s)' % re_consonant,
                  lambda match: consonants[ord(match.group(1))][0] + re_a,
                  text)
    text = re.sub('(%s)' % re_vowel,
                  lambda match: vowels[ord(match.group(1))][0],
                  text)
    text = re.sub('(%s)' % re_other,
                  lambda match: other[ord(match.group(1))][0],
                  text)
    return text

def wikner(devanagari):
    text = devanagari
    text = re.sub('(%s)(%s)' % (re_consonant, re_vowel_sign),
                  lambda match: consonants[ord(match.group(1))][-1] + vowel_signs[ord(match.group(2))][-1],
                  text)
    text = re.sub('(%s)(%s)' % (re_consonant, re_virama),
                  lambda match: consonants[ord(match.group(1))][-1],
                  text)
    text = re.sub('(%s)' % re_consonant,
                  lambda match: consonants[ord(match.group(1))][-1] + re_a,
                  text)
    text = re.sub('(%s)' % re_vowel,
                  lambda match: vowels[ord(match.group(1))][-1],
                  text)
    text = re.sub('(%s)' % re_other,
                  lambda match: other[ord(match.group(1))][-1],
                  text)
    return text

random_filename = 'lwfzal3XBeV8H10I8f4n'

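# Added nukta support from https://gist.github.com/ritwikmishra/9f8d6de45aff8fbe959d4260269d9eeb
# Maps "consonant + nukta sign (U+093C)" to the precomposed letter, because the
# consonants table above is keyed by precomposed code points (e.g. U+0958).
# Note: य़, ऱ and ऴ have no entry in the consonants table and pass through unchanged.
nukta_dict = {
    '\u0915\u093C': '\u0958',  # क़
    '\u0916\u093C': '\u0959',  # ख़
    '\u0917\u093C': '\u095A',  # ग़
    '\u091C\u093C': '\u095B',  # ज़
    '\u0921\u093C': '\u095C',  # ड़
    '\u0922\u093C': '\u095D',  # ढ़
    '\u092B\u093C': '\u095E',  # फ़
    '\u092F\u093C': '\u095F',  # य़
    '\u0930\u093C': '\u0931',  # ऱ
    '\u0933\u093C': '\u0934',  # ऴ
}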

def get_preprocessed(filename, ext):
    preprocessor = {
        'dn': 'devnag',
        'skt': './skt',
    }
    assert ext in preprocessor, ext
    with open(filename, encoding='utf-8') as f:
        text = f.read()
    # Normalize nukta letters first (this loop was mis-indented in the original)
    for k, v in nukta_dict.items():
        text = text.replace(k, v)
    transliterated = velthuis(text) if ext == 'dn' else wikner(text)
    infile = '%s-%s.%s' % (random_filename, ext, ext)
    # Wrap everything in {\dn ...} (or {\skt ...}) for the preprocessor
    with open(infile, 'w', encoding='utf-8') as f:
        f.write(r'{\%s %s}' % (ext, transliterated))
    # subprocess.run waits for the process, so returncode is actually set
    p = subprocess.run([preprocessor[ext], infile], capture_output=True, text=True)
    out, err, ret = p.stdout, p.stderr, p.returncode
    if err or ret:
        print('input: <%s>' % text)
        print('transliterated: <%s>' % transliterated)
        print('stdout: <%s>' % out)
        print('stderr: <%s>' % err)
        print('returned: <%s>' % ret)
        raise ValueError('%s failed on %s' % (preprocessor[ext], filename))
    outfile = '%s-%s.tex' % (random_filename, ext)
    with open(outfile) as f:
        translation = f.read()
    os.remove(outfile)
    os.remove(infile)
    # Strip the wrapper emitted around the converted text; rstrip() drops a
    # trailing newline so that [:-1] removes the closing brace
    prefix = r'\def\DevnagVersion{2.15}{\dn ' if ext == 'dn' else r'{\skt '
    translation = translation.rstrip()[len(prefix):-1]
    return translation

if __name__ == '__main__':
    ext = 'dn'  # 'dn' for devnag (Hindi); 'skt' for the Sanskrit preprocessor
    filename = 'name.dn'  # Source file; on Kaggle this was '../input/santshr/name.dn'
    print(get_preprocessed(filename, ext))  # pdfLaTeX-ready content
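Why does the script need an explicit nukta table instead of Unicode normalization? The consonants table keys letters like क़ by their precomposed code points (U+0958 to U+095E), but Hindi text often arrives as the base consonant followed by the nukta sign U+093C. NFC normalization does not bridge that gap, because these precomposed letters are composition exclusions in Unicode. A quick check with Python's unicodedata module:

```python
import unicodedata

decomposed = '\u0915\u093C'   # क followed by the nukta sign
precomposed = '\u0958'        # क़, the code point the consonants table maps to 'q'

# NFC leaves the pair decomposed: U+0958 is a Unicode composition exclusion
print(unicodedata.normalize('NFC', decomposed) == precomposed)   # False
# ...and even turns the precomposed letter *into* the decomposed pair
print(unicodedata.normalize('NFC', precomposed) == decomposed)   # True
```

So any NFC-normalized input will carry nukta letters in decomposed form, which is exactly what an explicit decomposed-to-precomposed mapping handles.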

For example, running the script on the name.dn file created above prints the output shown below. That output won't work in Overleaf as-is, so copy only the content between \begin{document} and \end{document} into your Overleaf document, whose preamble must load \usepackage{devanagari}.
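Picking out that content by hand is easy to get wrong, so here is a small sketch that extracts it with a regular expression. extract_body is a hypothetical helper name, and the sample string is a placeholder rather than real devnag output:

```python
import re

def extract_body(tex: str) -> str:
    r"""Return the text between \begin{document} and \end{document}."""
    # DOTALL lets '.' span newlines; non-greedy '.*?' stops at the first \end{document}
    match = re.search(r'\\begin\{document\}(.*?)\\end\{document\}', tex, re.DOTALL)
    if match is None:
        raise ValueError('no \\begin{document} ... \\end{document} block found')
    return match.group(1).strip()

# Placeholder input; in practice pass the string returned by get_preprocessed()
sample = "\\documentclass{article}\n\\usepackage{devanagari}\n\\begin{document}\nBODY\n\\end{document}\n"
print(extract_body(sample))  # BODY
```

Paste the returned string into the body of your Overleaf document.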

[Screenshot: https://a.storyblok.com/f/117912/358x196/ef9e00c8c5/screenshot_2021-09-13_at_10-30-55_am.png]

Voila! You're ready to submit 🚀

If you run into any errors while following this process, feel free to email me at: rishi18304@iiitd.ac.in
