Sentiment Analysis: A Way To Improve Your Business

In this blog post, we introduce an important field of artificial intelligence known as sentiment analysis. It is used to discover an individual’s beliefs, emotions, and feelings about a product or a service. As the tutorial proceeds, you will see how the approach is implemented, with a flow diagram to follow along. To make things more practical, live code is included in one of the sections.

In the concluding sections, we present our approach to customizing the basic sentiment analysis algorithm and provide an API where users can test the customized approach in practice.

Now, let’s define sentiment analysis through customer reviews. Given a piece of customer feedback, sentiment analysis measures the attitude the user expresses in that text towards the different aspects of a product or a service.

The contents of this blog post are as follows:

  1. Using Regex
  2. Using nltk
  3. APIs to web scraping
  4. Bag of Words
  5. TF-IDF

What is Sentiment Analysis?

Sentiment analysis is the process of using natural language processing, text analysis, and statistics to analyze customer opinions or sentiments. The most reputable businesses understand the sentiment of their customers: what people are saying, how they’re saying it, and what they mean.

In theory, it is the computational study of opinions, attitudes, views, emotions, and sentiments expressed in a particular text. That text can come in a variety of formats, such as reviews, news, comments, or blogs.

Why is sentiment analysis important for giant e-commerce websites?

In today’s world, marketing and branding have become the strength of colossal businesses, and such businesses leverage social media to build a connection with their customers. The main aim of this connection is to encourage two-way communication, where everyone benefits from online engagement. At the same time, two huge platforms are emerging in the field of marketing. Below, we’ll see why these two enormous platforms have become such a useful source for analyzing the sentiments of customers.

Flipkart and Amazon India are emerging as the two colossal players in the swiftly expanding online retail industry in India. Although Amazon started its operations in India much later than Flipkart, it is giving tough competition to Flipkart.

Generalized approach for sentiment analysis

Sentiment analysis uses various Natural Language Processing (NLP) methods and algorithms. Two processes show how a machine learning classifier is put to work: training and prediction. Take a look.

  1. The training process: In this process (a), the model learns to associate a particular text input with the corresponding output, shown as the tag in the image. The tag comes from the labelled samples used for training. The feature extractor transforms the text input into a feature vector. Pairs of feature vectors and tags (e.g. positive, neutral, or negative) are fed into the machine learning algorithm to generate a model.
  2. The prediction process: The feature extractor is used to transform unseen text inputs into feature vectors. These feature vectors are then fed into the model, which generates the predicted tags (positive, negative, or neutral).
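To make the two processes concrete, here is a minimal sketch in plain Python. It is not the code used in this post: the tiny Naive Bayes classifier, the function names (extract_features, train, predict), and the four training sentences are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Feature extractor: lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def train(samples):
    """Training process: learn per-tag word counts from (text, tag) pairs."""
    word_counts = defaultdict(Counter)  # tag -> word -> count
    tag_counts = Counter()              # tag -> number of training texts
    for text, tag in samples:
        word_counts[tag].update(extract_features(text))
        tag_counts[tag] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return {"word_counts": word_counts, "tag_counts": tag_counts,
            "vocabulary": vocabulary}


def predict(model, text):
    """Prediction process: score unseen text against every tag, keep the best."""
    features = extract_features(text)
    total_texts = sum(model["tag_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_tag, best_score = None, float("-inf")
    for tag, n_texts in model["tag_counts"].items():
        counts = model["word_counts"][tag]
        total_words = sum(counts.values())
        score = math.log(n_texts / total_texts)  # log prior of the tag
        for word, freq in features.items():
            # Laplace smoothing keeps unseen words from zeroing the probability.
            score += freq * math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical labelled samples, invented for this sketch.
training_data = [
    ("I love this phone, the battery is great", "positive"),
    ("Excellent quality and fast delivery", "positive"),
    ("Terrible product, it broke in a week", "negative"),
    ("Very bad experience, poor quality", "negative"),
]
model = train(training_data)
print(predict(model, "great quality, love it"))  # positive
print(predict(model, "bad product, terrible"))   # negative
```

The same extract_features function is used in both processes, which is the important point: the model only understands feature vectors built the way it saw them during training.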

Representations such as word embeddings make it possible for words with similar meanings to have a similar representation, which can improve the performance of classifiers. In the later sections, you’ll learn about sentiment analysis with the bag-of-words model, data collection, and so on.

We now explain the general approach for implementing sentiment analysis using a predefined library. We will implement three phases of this approach with live code: data gathering, data cleaning, and predicting. Each line of the code is explained in the respective section. The general approach is also shown in the flowchart below.

Steps for sentiment analysis using the predefined library

Data collection via web scraping

Data scraping and web scraping are closely related: in both, we extract data from a specific URL. Data scraping is a technique in which a computer program extracts data and turns it into human-readable content so that you can read, store, and access it easily.

Selenium is an open-source tool used for testing web applications. It can also be used for web scraping, because it drives a real browser and can therefore reach content that is rendered by JavaScript.

Scrapy is a Python framework that provides developers with a complete package for crawling and scraping. It serves a similar purpose to Beautiful Soup.

We’re going to use one of the best libraries, Beautiful Soup. Let’s look at what Beautiful Soup actually is.

Beautiful Soup is a Python library used to parse HTML and XML files.

Here, we will import Beautiful Soup along with ‘urllib’, and then parse the source with the ‘lxml’ parser.

To begin, we need to import ‘Beautiful Soup’ and ‘urllib’.

In the source, we will mention the path of that particular URL.

Then we will save the parsed data in the soup variable. If you want to read the complete file, ‘print’ the soup variable.

If you want to check a specific tag, ‘print’ that tag from the soup variable.
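The steps above might look like the following minimal sketch. It is an illustration rather than the original code of this post: the fetch_html helper, the example.com URL, and the inline HTML are assumptions, and Python’s built-in "html.parser" is swapped in for ‘lxml’ so that the sketch has no extra dependency.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download the raw HTML of a page (needs network access)."""
    with urlopen(url) as response:
        return response.read()


# Inline HTML stands in for fetch_html("https://example.com/reviews"),
# a placeholder URL, so the sketch runs without a network connection.
source = """
<html><head><title>Product reviews</title></head>
<body>
  <p class="review">Great phone, the battery lasts two days.</p>
  <p class="review">Stopped working after a week.</p>
</body></html>
"""

# "html.parser" ships with Python; pass "lxml" instead if lxml is installed.
soup = BeautifulSoup(source, "html.parser")

print(soup.prettify())    # the complete parsed document
print(soup.title.string)  # the text of one specific tag

# Collect the text of every tag that holds a review.
reviews = [p.get_text(strip=True) for p in soup.find_all("p", class_="review")]
print(reviews)
```

On a real site the tag names and class names will differ, so inspect the page source first and adjust the find_all call accordingly.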

The scraper can then replicate or store the website’s data or content elsewhere and use it for further processing. Web scraping is also used for illegal purposes, including the undercutting of prices and the theft of copyrighted content. An online entity targeted by a scraper can suffer severe financial losses, especially if it is a business that relies strongly on competitive pricing models or deals in content distribution.

Data Preprocessing (Cleaning)

Data preprocessing is the data mining technique used to transform raw data into a useful and efficient format. If your data hasn’t been cleaned and preprocessed, your model will not work well.

Using Regex

Line 1: \W matches non-word characters such as punctuation, and \d matches digits.

Line 2: Removes links from the text.

Line 3: The function returns the cleaned value to the part of the program where it was called.
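A cleaning function along these lines could be sketched as follows. The function name clean_text and the exact patterns are assumptions, not the original code; note that this sketch removes links before punctuation so that the URL is still recognizable when it is matched.

```python
import re


def clean_text(text):
    """Strip links, punctuation, and digits from a piece of raw text."""
    # Remove links first, while the "http://" prefix is still intact.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # \W matches non-word characters (punctuation, symbols, spaces) and
    # \d matches digits; each run of them becomes a single space.
    text = re.sub(r"[\W\d_]+", " ", text)
    # Trim the ends and normalise the case before returning to the caller.
    return text.strip().lower()


print(clean_text("Loved it!!! 5 stars, see https://example.com/review?id=42 now"))
# loved it stars see now
```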

Using nltk

Line 4: split() converts a string into a list based on the separator passed as its argument. If no separator is given, whitespace is used by default. join() converts a list back into a string, so joining is the reverse of splitting.

Line 5: The first step is to convert the string into a list of tokens. Each token is then iterated over, and any token that is a stop word is removed.

Line 6: In the final step, the filtered tokens are joined back into a string.

Line 7: The function returns the filtered value to the part of the program where it was called.
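The split, filter, and join steps can be sketched like this. To keep the sketch runnable without downloading any corpus, a small hand-written stop-word set stands in for the much larger list that nltk provides through stopwords.words("english"); the set contents and the function name remove_stop_words are assumptions.

```python
# Tiny illustrative stop-word set; nltk's English list is far longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to", "was", "i"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop stop words from a whitespace-separated string."""
    tokens = text.split()  # no separator given, so split on whitespace
    # Keep only the tokens that are not stop words (case-insensitive check).
    filtered = [token for token in tokens if token.lower() not in stop_words]
    return " ".join(filtered)  # join is the reverse of split


print(remove_stop_words("This is the best phone I bought"))  # best phone bought
```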

Predicting the scores using predefined library SentimentIntensityAnalyzer

Line 8: sid is an object of the class SentimentIntensityAnalyzer(). This class is taken from nltk.sentiment.vader.

Line 9: It returns four sentiment scores (negative, neutral, positive, and compound). The compound score is computed by summing the lexicon ratings and normalizing the result to lie between -1 and 1. If the compound value is greater than or equal to 0.05, the sentence is positive; if it is less than or equal to -0.05, the sentence is negative; if it lies between those two thresholds, the sentence is neutral.

Line 10: The function returns the output to the calling function.
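The thresholds described above can be sketched as a small helper. The function names label_from_compound and vader_sentiment are assumptions; the nltk calls inside vader_sentiment are the library’s real ones, but that function needs nltk installed and the ‘vader_lexicon’ resource downloaded (nltk.download("vader_lexicon")), so the import is kept inside it.

```python
def label_from_compound(compound):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def vader_sentiment(text):
    """Score text with VADER and attach a label (requires nltk + vader_lexicon)."""
    # Imported lazily so label_from_compound works without nltk installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sid = SentimentIntensityAnalyzer()
    scores = sid.polarity_scores(text)  # keys: 'neg', 'neu', 'pos', 'compound'
    return scores, label_from_compound(scores["compound"])


print(label_from_compound(0.6))    # positive
print(label_from_compound(-0.3))   # negative
print(label_from_compound(0.01))   # neutral
```

With nltk set up, calling vader_sentiment("The battery life is great") would return the four scores together with the label derived from the compound value.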

Code Snippet for General Approach

Two files are used: index.html, which collects the text from the end user, and result.html, which renders the response. This small application is built using Django, and the snippet is only meant to show how the predefined approach works. In the next section, we discuss how to create a custom approach to build a better sentiment analysis API.

Discussion on custom sentiment analysis approach

In this approach, data gathering remains the same, as it is the basic step needed for any approach. After the data is gathered, different regex patterns can be applied to clean it, and the data can then go through nltk operations such as stemming, stop-word removal, and lemmatization to clean it more effectively. Here, some custom functions can be developed based on the requirements and structure of the data set. After this step, you have refined text, which is passed to a mechanism that converts the text into tensors or integers. This could be word embedding, a bag of words, or tf-idf. The benefit of word embedding over the latter two methods is that it preserves the semantic relationships between words and helps capture the context better. The output is passed to a deep learning or machine learning model. I would suggest plotting a graph of the data: if the graph shows a non-linear relationship, it is good to opt for deep learning; otherwise, opt for machine learning.
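As a rough illustration of how refined text becomes numbers, here is a minimal bag-of-words and tf-idf sketch in plain Python. It uses the textbook formula tf × log(N / df); libraries typically apply smoothed variants, so their values will differ. The function names and the three sample sentences are assumptions.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each count by how rare the word is across the documents."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # avoid dividing by zero for an empty document
        weighted.append([(c / total) * math.log(n_docs / df[j])
                         for j, c in enumerate(row)])
    return vocab, weighted


docs = ["good phone good battery", "bad battery", "good camera"]
vocab, counts = bag_of_words(docs)
print(vocab)      # ['bad', 'battery', 'camera', 'good', 'phone']
print(counts[0])  # [0, 1, 0, 2, 1]

_, weights = tf_idf(docs)
print([round(w, 3) for w in weights[0]])
```

In the first document, "phone" ends up with a higher weight than "good" even though "good" occurs twice, because "phone" appears in only one document.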

Once the model has been chosen in the previous step, it is time to feed the tensors to the model for training. Training time depends on the amount of data you have. Once this step is complete, I would recommend saving the model so that the prediction phase only needs to load it instead of training it again. If you are using Keras, for example, the model can be saved with model.save() and restored later with load_model().

If you are building the model with PyTorch, the usual pattern is to save the model’s state_dict with torch.save() and restore it later with load_state_dict().

Once the model is saved, it can be loaded for real-time prediction. Saving the model lets you load it from the checkpoint instead of training it again for each prediction. For the prediction phase, it is important to create the features of the real-time testing data in the same way as for training before feeding them to the saved model. The model may overfit or underfit, so you may need to tweak the hyperparameters when creating the model.
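The save-once, load-for-prediction pattern can be sketched without any deep learning framework. In the sketch below, Python’s pickle module stands in for the Keras or PyTorch save calls, and the "model" is just a hypothetical dictionary of word weights; the names extract_features and score, the weights, and the file name are all assumptions.

```python
import os
import pickle
import tempfile
from collections import Counter


def extract_features(text):
    """The same feature extractor must be used for training and prediction."""
    return Counter(text.lower().split())


def score(model, text):
    """Sum the learned weight of every known word in the text."""
    features = extract_features(text)
    return sum(model["weights"].get(word, 0.0) * n for word, n in features.items())


# Pretend these weights came out of a training run (invented values).
model = {"weights": {"good": 1.0, "great": 1.5, "bad": -1.0, "poor": -1.5}}

# Save the trained model once...
path = os.path.join(tempfile.mkdtemp(), "sentiment_model.pkl")
with open(path, "wb") as fh:
    pickle.dump(model, fh)

# ...and later load it for prediction instead of training again.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

print(score(loaded, "good camera"))                   # 1.0
print(score(loaded, "great phone but poor battery"))  # 0.0
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.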

Conclusion

In this short tutorial, we have seen what sentiment analysis is and why it is used. Amazon and Flipkart use it extensively to increase their sales and productivity. We also implemented a general approach with code and explained every line of it. Finally, we discussed a custom approach that can make the solution more robust.

Paradise Techsoft Solutions Pvt Ltd

Paradise Techsoft grows into the potential of latest AI trends, Web design & development and Digital Marketing Techniques. Stay up to date http://bit.ly/2lZdMb2