ChatBot (साइलि)

The multilingual bot: your future personal assistant.

ChatBot Data Collection and Management

The bot is trained from a JSON file per language. Each file contains intents; every intent has a tag, and these tags are the labels (classes) for our classifier.

Each tag has its respective patterns: a pattern list is simply an array of example sentences a user might type.

 

For example, the tag greeting could have patterns like the list below:

Hi, hey, hello, how are you, good morning, good day, etc. Similarly, each tag will have responses as well as the patterns:

Hey, hello, thanks for visiting, etc.

 

Likewise, we will have multiple tags, each with its own patterns and responses. Examples of other tags are:

Goodbye, thank you, and [in the case of our course sector] ⇒ courses, durations, fees, locations, teachers, timing, and so on.
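As an illustration, one language's intents file might look like the sketch below. The field names "tag", "patterns", and "responses" follow the common chatbot-intents convention; the exact schema of the project's files is an assumption.

```json
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Hey", "Hello", "How are you", "Good morning", "Good day"],
      "responses": ["Hey!", "Hello!", "Thanks for visiting!"]
    },
    {
      "tag": "courses",
      "patterns": ["Which courses do you have?", "What can I study here?"],
      "responses": ["We offer several courses; which subject interests you?"]
    }
  ]
}
```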


So whenever a new question or sentence reaches the bot, it tries to recognize the tag the sentence belongs to and picks a random response from that class.

The input does not need to match the exact sentences we gave during training: the bot categorizes the sentence from the words and phrases it contains and proceeds to an appropriate answer.

 

This is the way the bot works, and how we train it.

So, how is it built? With the following:

 

NLP concepts

Stemming

Stemming is the process of reducing a word to its root form.

It chops off the word's suffix, leaving a stem shared by the word's variants.

E.g. “organize”, “organizes”, “organizing”

⇒ [“organ”, “organ”, “organ”]

 

The exact output depends on the stemmer we choose; several stemmers are available, so the choice is up to us.
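To illustrate the idea, here is a toy suffix-stripping stemmer. It is a sketch only; a real project would typically use NLTK's PorterStemmer, which produces the same output for this example.

```python
def toy_stem(word: str) -> str:
    """Chop a few common English suffixes to approximate the root (illustration only)."""
    for suffix in ("izing", "izes", "ize", "ing", "es", "s"):
        # Keep at least a 3-letter stem so short words survive
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["organize", "organizes", "organizing"]])
# → ['organ', 'organ', 'organ']
```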

 

Tokenization

 

Splitting a string into meaningful units, e.g. words, punctuation characters, numbers.

 

E.g. “Where can I take the 2 courses?”

[“Where”, “can”, “I”, “take”, “the”, “2”, “courses”, “?”]
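A minimal tokenizer sketch using Python's `re` module. NLTK's `word_tokenize` is the usual choice; this regex version is an assumption that avoids needing NLTK's downloaded data.

```python
import re

def tokenize(sentence: str) -> list[str]:
    # \w+ matches words/numbers; [^\w\s] matches each punctuation character on its own
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Where can I take the 2 courses?"))
# → ['Where', 'can', 'I', 'take', 'the', '2', 'courses', '?']
```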

 

Punctuation

Punctuation characters carry little meaning for intent classification, so we strip them during preprocessing.

Stop Words

Very common words (“the”, “is”, “a”, …) that add little signal and can be filtered out before training.

Term Frequency

How often each word occurs in a sentence; a common weighting scheme for bag-of-words vectors.

Bag of Words

Converting a string into a vector of numbers, with one entry per word of the vocabulary.

This representation is simple and language-agnostic, which makes it feasible for multiple languages.
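A sketch of the bag-of-words encoding, assuming the vocabulary has already been collected from all training patterns:

```python
def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Return a binary vector: 1.0 where the vocabulary word appears in the tokens."""
    token_set = set(tokens)
    return [1.0 if word in token_set else 0.0 for word in vocabulary]

vocab = ["course", "do", "fee", "have", "hello", "which", "you"]
print(bag_of_words(["which", "course", "do", "you", "have"], vocab))
# → [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
```

Because the vector depends only on a vocabulary list, the same encoding works unchanged for English, French, or Nepali tokens.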

 

Our NLP preprocessing Pipeline

 

“Which course do you have?”

Tokenize

 


 

[“Which”, “course”, “do”, “you”, “have”, “?”]

Lower+stem

 

[“which”, “course”, “do”, “you”, “have”, “?”]

Exclude punctuation characters

 

[“which”, “course”, “do”, “you”, “have”]

Bag of words (from NLTK)

 

X = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
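The pipeline steps above, chained into one function. This is a sketch: the stemmer here is a simplified stand-in for the project's NLTK-based one, and the small vocabulary is illustrative.

```python
import re
import string

def stem(word: str) -> str:
    # Simplified stand-in for a real stemmer such as NLTK's PorterStemmer
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(sentence: str, vocabulary: list[str]) -> list[float]:
    tokens = re.findall(r"\w+|[^\w\s]", sentence)                # 1) tokenize
    tokens = [stem(t.lower()) for t in tokens]                   # 2) lower + stem
    tokens = [t for t in tokens if t not in string.punctuation]  # 3) drop punctuation
    return [1.0 if w in tokens else 0.0 for w in vocabulary]     # 4) bag of words

vocab = ["course", "do", "fee", "have", "which", "you"]
print(preprocess("Which course do you have?", vocab))
# → [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
```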



2) Training the data of multiple languages together

 

We need to provide our data first: one intents file per language.

French intents file.

English intents file.

Nepali intents file.


 

3) A PyTorch model for each training module

 

 

Here, again, I do the feature engineering needed to build the model.

Looping over the folder that contains the different language files, I read the full set of intents from each file. On those intents I apply tokenization, the stop-word filter, and the stemmer.
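A sketch of that loading loop, assuming a `data/` folder of per-language intent files; the folder name, file layout, and the simple `.split()` tokenization here are assumptions standing in for the real preprocessing.

```python
import json
from pathlib import Path

def load_training_pairs(data_dir: str = "data"):
    """Collect (pattern_tokens, tag) pairs from every language's intents file."""
    pairs, tags = [], set()
    for path in Path(data_dir).glob("*.json"):   # e.g. english.json, french.json, nepali.json
        with open(path, encoding="utf-8") as f:
            intents = json.load(f)["intents"]
        for intent in intents:
            tags.add(intent["tag"])
            for pattern in intent["patterns"]:
                # Real pipeline: tokenize, stem, and drop stop words here
                pairs.append((pattern.lower().split(), intent["tag"]))
    return pairs, sorted(tags)
```

Training all languages together simply means every file contributes rows to the same `pairs` list before vectorizing.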


 


 

After vectorizing, the words are ready for training and testing, and I define my neural network together with the batch size, learning rate, number of epochs, and layers.


 

After this, training starts with the optimizer, and based on the available hardware the model chooses the GPU or CPU version of PyTorch.
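A sketch of such a setup: a small feed-forward classifier plus device selection and optimizer. The layer sizes, hyperparameters, and class name are illustrative assumptions, not the project's exact values.

```python
import torch
import torch.nn as nn

class IntentNet(nn.Module):
    """Feed-forward classifier: bag-of-words vector in, one score per tag out."""
    def __init__(self, input_size: int, hidden_size: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = IntentNet(input_size=54, hidden_size=8, num_classes=7).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```

The training loop would then iterate over batches for the chosen number of epochs, computing `criterion(model(x), y)` and stepping the optimizer.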

 

4) Saving/logging the model and implementing chat for each language module and class

 

 

Now I save my model at its respective path so that I can load it back later.
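A sketch of that save/load step, bundling the weights with the vocabulary and tag list needed at chat time. The checkpoint keys and the tiny stand-in model are assumptions.

```python
import os
import tempfile
import torch
import torch.nn as nn

def save_checkpoint(model, all_words, tags, path):
    # Bundle weights with the vocabulary and tags needed to rebuild the chat side
    torch.save({"model_state": model.state_dict(),
                "all_words": all_words,
                "tags": tags}, path)

def load_checkpoint(model, path):
    data = torch.load(path)
    model.load_state_dict(data["model_state"])
    model.eval()  # inference mode for chatting
    return data["all_words"], data["tags"]

# Tiny demonstration with a stand-in model
path = os.path.join(tempfile.mkdtemp(), "saili_model.pth")
save_checkpoint(nn.Linear(3, 2), ["hi", "bye"], ["greeting", "goodbye"], path)
words, tags = load_checkpoint(nn.Linear(3, 2), path)
```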


 

 

5) Serving it with a web framework like Django or Flask

 

 

 

For serving the chatbot, we have multiple options: a web framework, a mobile application, or even a plain API.

Here I am using Django to hold the chat conversation. I have created a view, a template, and a URL route.

 

Django Model, View, and Template

 

The HTML page sends the user's input string to the view function through the URL route. The view first checks which language the input belongs to, using the model and the blueprint JSON files. It then measures the resemblance of the sentence to the known tags with the neural network and obtains a score for the prediction. If the prediction is confident, the bot returns an answer from the matching tag; otherwise it returns a predefined “I don't understand” message.
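The core of that view, written as a plain function for clarity. The threshold value, names, and dictionary shapes are assumptions; inside Django this logic would be called from the view and the result returned as JSON to the template.

```python
import random

CONFIDENCE_THRESHOLD = 0.75          # minimum softmax score to trust a prediction
FALLBACK = "Sorry, I do not understand."

def pick_response(tag_scores: dict[str, float],
                  responses_by_tag: dict[str, list[str]]) -> str:
    """Answer from the highest-scoring tag, or fall back when unsure."""
    best_tag = max(tag_scores, key=tag_scores.get)
    if tag_scores[best_tag] >= CONFIDENCE_THRESHOLD:
        return random.choice(responses_by_tag[best_tag])
    return FALLBACK
```

Returning a canned fallback below the threshold is what keeps the bot from answering confidently on sentences far from its training data.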

 

Automatic Speech Recognition (ASR)

 

In the audio section, I am using the Python library gTTS for text-to-speech and SpeechRecognition for speech-to-text.

The reply message is pronounced in the voice of the language the intent belongs to, so each language gets its actual accent.

 

For speech-to-text, I set the recognition language manually.
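A sketch of that audio layer. gTTS and SpeechRecognition are the real libraries named above, but the language-code mapping and function names here are assumptions; the network-dependent calls are kept inside the helpers so the mapping itself runs anywhere.

```python
# Assumed mapping from intents-file language to Google language codes
LANG_CODES = {"english": "en", "french": "fr", "nepali": "ne"}

def speech_lang(intent_language: str) -> str:
    """Map an intents-file language name to a gTTS / Google Speech code."""
    return LANG_CODES.get(intent_language.lower(), "en")

def speak(text: str, intent_language: str, out_path: str = "reply.mp3"):
    from gtts import gTTS  # needs network access at call time
    gTTS(text=text, lang=speech_lang(intent_language)).save(out_path)

def listen(audio_path: str, intent_language: str) -> str:
    import speech_recognition as sr
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    # The recognition language is set manually, as described above
    return recognizer.recognize_google(audio, language=speech_lang(intent_language))
```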

 

 

Integration of the Model

 

The module can easily be integrated with Facebook and WhatsApp.

 

Future Modeling

 

Saili (साइलि) can become a personal assistant like Siri, Alexa, or Cortana.

 

Thanks to EPITA and the team for this wonderful platform.