ISSN 2319 – 1953
International Journal of Scientific Research in Computer Science Applications and Management Studies
An Efficient English To Hindi Translator
Dhawal Jain¹, Aditi Jadhav², Ateeq Ansari³, Aditi Raut⁴
¹,²,³,⁴ Department of Computer Engineering, St. John College of Engineering & Management, Palghar, Maharashtra, India
¹ jaindhawal05@gmail.com, ² jadhavadi25@gmail.com, ³ ateeqnsr8@gmail.com, ⁴ aditir@sjcet.co.in
Abstract— Machine translation is the translation of one natural language into another by automated computing. Its primary objective is to bridge the language gap between people, communities, or countries that speak different languages. India is a multilingual country; different states have different regional languages, but not all Indians are polyglots. There are 22 constitutionally recognized languages and ten prominent scripts. The majority of Indians, especially those in remote villages, do not understand, read, or write English, so an efficient language translator is needed. Machine translation systems that translate text from one language to another will help build a better-informed Indian society, free of language barriers. Because English is a universal language and Hindi is the language used by the majority of Indians, we propose an English-to-Hindi machine translation system based on a recurrent neural network (RNN), long short-term memory (LSTM), and an attention mechanism.

Keywords— RNN, LSTM, Attention mechanism.

I. INTRODUCTION
Machine translation (MT) has been under development since the 1940s. An MT system translates text or speech from one natural language to another. It is needed to convert documents or text from other commonly used languages into our native language, and so it overcomes language barriers. Natural language processing (NLP) is the field of computer science that strives to fill this gap. Neural machine translation requires minimal domain knowledge and is conceptually simple. A single large neural network is trained and can generate very long word sequences. Unlike a standard machine translation system, the model does not explicitly store large phrase tables and language models. The first successful demonstration of an MT system was carried out by Georgetown University in collaboration with IBM in 1954. The importance of machine translation arises from the socio-political significance of translation in communities where more than one language is spoken. In addition, our system uses an attention mechanism.
Hindi is a widely spoken language and the principal official language of India, whereas English is spoken worldwide and is an internationally well-known language. English was introduced in India during the British period. Both English and Hindi are therefore major, widely used languages, and there is a need for a translator that converts one into the other. Here we study English-to-Hindi translation. Awareness has recently grown in India of the use of regional languages such as Hindi for government documents and other purposes. In this context, it has become essential to create an MT system that can translate English into several regional languages. Also, many websites are in English and are of no use to rural people who do not know English and so cannot understand the information on the site. Hence a translator is needed that converts English into Hindi, which people can easily understand.

II. LITERATURE REVIEW
The paper [1] focuses on rule-based machine translation. It is based on corpus management and a multilingual database. The system architecture comprises a parser and morphological tools that analyse the grammar of the source language and then transform it into the target language. The method suggested in [1] requires a deep understanding of the grammatical structure of both the source and target languages.
Statistical machine translation relies on statistics; the idea behind it comes from information theory. Translation is performed according to a probability distribution. The method suggested in the paper [2] uses the Bayes decision rule and statistical theory to minimize errors. The approach discussed in that paper has a word alignment problem between phrases and a language modeling problem.
In [3], a hybrid mechanism, i.e., a combination of rule-based and statistical machine translation, is used. The architecture comprises a splitter, parser, declension tagger, sentence rules, reordering, lexical dictionary, and translator. The source text is passed through the splitter, which divides each sentence into words, and then the parser analyses the syntactic and semantic structure. The declension tagger inflects nouns, adjectives, and pronouns to indicate number, case, and gender. Reordering is then performed, and lexical rules are used to translate the source language into the target language.
The paper [4] is based on neural machine translation. The architecture discussed in it comprises an encoder, decoder, residual connections, etc. This approach models the conditional probability of translating a source sentence into a target sentence, and it provides a more accurate translation.

III. METHODOLOGY
A. Architecture Diagram:
The system consists of the following modules:
1. Encoder-Decoder Model
2. LSTM
3. Attention Mechanism
IJSRCSAMS
Volume 8, Issue 1 (January 2019) www.ijsrcsams.com
Fig. 1. Architecture Diagram
B. Encoder-Decoder Model:
The encoder-decoder model is a way of organizing recurrent neural networks (RNNs) to tackle sequence-to-sequence prediction problems in which the numbers of input and output time steps differ. The model was developed for machine translation, such as translating sentences from English to Hindi.
The model involves two sub-models, as follows:
Encoder: The encoder is an RNN model that reads the entire source sequence into a fixed-length encoding.
Decoder: The decoder is an RNN model that uses the encoded input sequence and decodes it to output the target sequence.
Fig. 2 shows the relationship between the encoder and the decoder models.
Fig. 2. Encoder-Decoder Model
The LSTM recurrent neural network is used as both the encoder and the decoder. The encoder output represents the source sequence and is used to start the decoding process, which is conditioned on the words already produced as output so far. The hidden state of the encoder at the final time step of the input is used to initialize the state of the decoder.
C. Long Short-term Memory:
Long short-term memory (LSTM) units are units of a recurrent neural network. An RNN composed of LSTM units is often called an LSTM network. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. There are several architectures of LSTM units. An LSTM cell takes an input and stores it for some time, which is equivalent to applying the identity function to the input. Because the derivative of the identity function is constant, the gradient does not vanish when an LSTM network is trained with backpropagation through time. The activation function of the LSTM gates is often the logistic function. A typical architecture comprises a memory cell, an input gate, an output gate, and a forget gate.
Input Gate: The input gate is responsible for adding information to the cell state.
Forget Gate: The forget gate is responsible for removing information from the cell state.
Output Gate: The output gate controls what the cell produces as output.
Fig. 3. Long Short Term Memory
D. Attention Mechanism:
The encoder-decoder model is an end-to-end model that performs well on challenging sequence-to-sequence prediction problems such as machine translation. However, the model appears to be limited on very long sequences because of the fixed-length encoding of the source sequence. Attention is a mechanism that provides a richer encoding of the source sequence, from which a context vector is built for use by the decoder. The attention mechanism allows the model to learn which encoded words in the source sentence to pay attention to, and to what degree, during the prediction of each word in the target sentence. The hidden state for each input time step is gathered from the encoder, rather than only the hidden state of the final time step of the source sequence. A context vector is built specifically for each output word in the target sentence. First, each hidden state from the encoder is scored using a neural network, and the scores are then normalized to a probability distribution over the encoder's hidden states. Finally, the probabilities are used to compute a weighted sum of the encoder hidden states, producing a context vector to be used in the decoder.
Fig. 4. Attention Mechanism
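As a rough illustration of the scoring, normalization, and weighted-sum steps of the attention mechanism, the following minimal Python sketch computes a context vector for a single decoding step. It is not the implementation used in this paper: the scoring neural network is replaced by a plain dot product, and the function names and toy vectors are assumptions made only for illustration.

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: scoring is a dot product between the decoder state and
    each encoder hidden state, standing in for a learned scoring network.
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)  # probability over encoder hidden states
    dim = len(encoder_states[0])
    # Weighted sum of the encoder hidden states gives the context vector.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy data: three encoder hidden states (one per source word), 2-D each.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attention_context(decoder_state, encoder_states)
print("attention weights:", weights)
print("context vector:", context)
```

In this toy example the first and third encoder states receive equal, larger weights because they align with the decoder state, while the second state contributes little to the context vector. A new set of weights would be computed for every output word.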
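The gate behaviour described for the LSTM unit, and the way an encoder reduces a source sequence of any length to a fixed-length state, can be sketched as follows. This is a simplified scalar cell under stated assumptions, not the trained network from this paper: the weights in `params`, the gate names, and the helper functions are hypothetical and chosen only to make the mechanics visible.

```python
import math


def sigmoid(x):
    """Logistic function, the usual activation for LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps each gate name to hypothetical (w_x, w_h, bias) weights.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # input gate: how much new information to add
    f = sigmoid(pre("forget"))   # forget gate: how much old cell state to keep
    o = sigmoid(pre("output"))   # output gate: how much of the cell to expose
    g = math.tanh(pre("cand"))   # candidate value to write into the cell
    c = f * c_prev + i * g       # updated cell state
    h = o * math.tanh(c)         # hidden state, i.e. the output
    return h, c


def encode(sequence, p):
    """Read a whole sequence and return the final (h, c) state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h, c


# Arbitrary illustrative weights, not trained values.
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.0, 0.0, 10.0),  # large bias keeps the forget gate near 1
    "output": (0.3, 0.2, 0.0),
    "cand": (1.0, 0.0, 0.0),
}

short_state = encode([0.2, -0.4], params)
long_state = encode([0.2, -0.4, 0.9, 0.1, -0.3], params)
print("state after 2 inputs:", short_state)
print("state after 5 inputs:", long_state)
```

Both calls return a state of the same size regardless of input length, which is the fixed-length encoding that the decoder starts from. With the forget gate held near 1, the cell state is carried forward almost unchanged from step to step.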
E. Implementation:
The project converts English text into Hindi. The input can be an English document or a text file, and after processing we obtain the output as Hindi text.
Training Phase:
In this phase, we trained the model on English-Hindi bilingual data for 300 epochs. The training data includes English sentences and words together with their corresponding Hindi sentences and words.
Testing:
In the testing phase, we tested various inputs in formats such as PDF and DOC. After training for 300 epochs, we achieved an accuracy of 90 to 95%. Most of the input sentences yield a correct output.

IV. RESULTS
We successfully tested our proposed framework with more than twenty individual sentences covering different contexts. The following examples illustrate the output for a given input:
Fig. 5. Input Textbox
Fig. 6. Output Textbox
Other results are shown in the table below.
Sr. No. | Input (English) | Output (Hindi)
1 | You're kidding! | मज़ाक कर रहे हो!
2 | Is there a cafe? | यहाँ कैफे है क्या?
3 | Come if you can. | अंदर आ जाओ।
4 | Make a better translation of the sentence that you are translating. Do not let translations into other languages influence you. | आप जिस वाक्य का अनुवाद कर रहे हैं, उस ही का अच्छी तरह से अनुवाद करें। दूसरी भाषाओं के अनुवादों से प्रभावित न होने दें।
The graph in Fig. 6 shows the accuracy of the implemented system. The x-axis shows the epoch and the y-axis shows the accuracy.

V. CONCLUSIONS
In this paper, we built an English-to-Hindi translator using an RNN. We experimented with long short-term memory (LSTM) and an attention mechanism. Using the attention mechanism and LSTM, correct translation into the target language is made possible. In this project, we added a feature that lets the user directly upload the document to be translated, which reduces typing time. To make the translation process more efficient, new rules can be added to the system.

ACKNOWLEDGMENT
We thank our guide, Ms. Aditi Raut, who extended valuable guidance and help through the various stages of the project's development. Her valuable suggestions were of immense help throughout the project work.
We convey our sincere regards to our respected principal, Dr. G.V. Mulgund, and Head of Department, Dr. G.A. Walikar, for their valuable support.
REFERENCES
[1] Shachi Mall, Umesh Jaiswal. 2013. Developing a system for machine
translation from Hindi to English. In 2013 4th International Conference
on Computer and Communication Technology (ICCCT).
[2] A. R. Babhulgaonkar, S. V. Bharad. 2017. Statistical Machine
Translation. In 2017 1st International Conference on Intelligent System
and Information Management (ICISIM), October 5-6, 2017,
Aurangabad, India.
[3] Jayshree Nair, Amrutha Krishnan, Deetha R. 2017. An efficient English
to Hindi machine translation using a hybrid mechanism. 2016 Intl.
Conference on Advances in Computing, Communications, and
Informatics (ICACCI), Sept. 21-24, 2016, Jaipur, India.
[4] Karthik Revanuru, Kaushik Turlapaty, and Shrisha Rao. 2017. Neural
Machine Translation of Indian Languages. In Compute '17: 10th Annual
ACM India Compute Conference, November 16–18, 2017, Bhopal,
India.
[5] Brenda Reyes Ayala, Jiangping Chen. 2017. A Machine Learning
Approach to Evaluating Translation Quality. IEEE, 2017.
[6] Hybrid machine translation for English to Marathi: A research
evaluation in Machine Translation, March 2016
[7] Kamala Kant Yadav, Dr. Umesh Chandra Jaiswal. A Survey Paper on
Performance Improvement of Word Alignment in English to Hindi
Translation System. In 2017 International Conference on Intelligent
Computing and Control (I2C2)
[8] Pankaj Kumar, Sheetal Srivastava, Monica Joshi. Syntax Directed
Translator for English to Hindi Language. In 2015 IEEE International
Conference on Research in Computational Intelligence and
communication Networks.
[9] Brian Sam Thomas, Rajat Dogra, Bhaskar Dixit, Aditi Raut. Automatic
Image and Video Colourisation using Deep Learning. In 2018
International Conference on Smart City and Emerging
Technology (ICSCET), Mumbai, 2018.