Practical - 5

Aim: Data pre-processing and text analytics using Orange.


What is text analytics? 

Text analytics is the automated process of translating large volumes of unstructured text into quantitative data to uncover insights, trends, and patterns. Combined with data visualization tools, this technique enables companies to understand the story behind the numbers and make better decisions.
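To make "translating text into quantitative data" concrete, here is a minimal bag-of-words sketch in plain Python. It is only an illustration of the idea, not how Orange does it internally; the sample sentences and the helper name bag_of_words are made up for this example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample documents, invented for this illustration.
docs = [
    "Orange makes text mining easy",
    "Text mining turns text into numbers",
]

counts = [bag_of_words(d) for d in docs]

# The vocabulary is every distinct word seen in any document.
vocabulary = sorted(set().union(*counts))

# One row per document, one column per vocabulary word.
matrix = [[c[word] for word in vocabulary] for c in counts]

for doc, row in zip(docs, matrix):
    print(row, "<-", doc)
```

Each document becomes a row of word counts, which is the kind of numeric table that later analysis steps can work with.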

What is sentiment analysis?

Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics and biometrics to systematically identify, extract, quantify and study affective states and subjective information.

Can we get the sentiment of people from their tweets? Why is it useful?

Yes, we can classify the sentiment of a tweet as positive, negative, or neutral. This is useful because, for example, if we are running a campaign, people's tweets can tell us whether users are participating in it and whether they like it or not.
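A very rough way to label a tweet as positive, negative, or neutral is to count words from small positive and negative word lists. The sketch below assumes two tiny hand-made word lists and a hypothetical helper tweet_sentiment; real tools use much larger lexicons or trained models, so treat it only as an illustration of the idea.

```python
import re

# Tiny hand-made word lists; these are assumptions for this sketch only.
POSITIVE = {"good", "great", "love", "like", "happy"}
NEGATIVE = {"bad", "poor", "hate", "dislike", "sad"}


def tweet_sentiment(tweet):
    """Label a tweet by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical tweets about a campaign.
for tweet in [
    "I love this campaign, great idea!",
    "I hate the new campaign",
    "The campaign starts on Monday",
]:
    print(tweet_sentiment(tweet), "<-", tweet)
```

A simple word count like this misses negation and sarcasm ("not good" still counts as positive), which is one reason proper sentiment tools are more involved.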

Pre-Processing on Data

As shown in the image above, we can apply pre-processing techniques in Orange and see how the data changes.

Discretization

Discretization is the process of transforming continuous variables, models, or functions into a discrete form. We do this by creating a set of contiguous intervals (bins) that span the range of the variable, model, or function.
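As a small illustration of the idea, the following plain-Python sketch does equal-width binning: it splits the range of a column into a fixed number of intervals of the same width and replaces each value with the index of its interval. The function name equal_width_bins and the sample values are made up for this example; equal-width is only one of several possible binning strategies.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        bins.append(min(index, n_bins - 1))
    return bins


# Hypothetical continuous measurements.
sepal_lengths = [4.3, 5.0, 5.8, 6.4, 7.1, 7.9]
labels = equal_width_bins(sepal_lengths, 3)
print(labels)
```

With three bins over the range 4.3 to 7.9, each interval is 1.2 wide, so the six values end up in bins 0, 0, 1, 1, 2, 2.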

This is how we can apply discretization to data in the Orange tool.

Continuization

Continuization is the process of converting discrete attributes into continuous ones, or of removing discrete attributes from the table.
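One common way to turn a discrete attribute into continuous ones is one-hot (indicator) encoding, where each category gets its own 0/1 column. The sketch below is plain Python and only illustrates that one option; the helper name one_hot and the sample column are assumptions for this example.

```python
def one_hot(values):
    """Replace a discrete column with one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


# Hypothetical discrete attribute.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

print(categories)
for value, row in zip(colors, encoded):
    print(row, "<-", value)
```

Every row contains exactly one 1.0, in the column that matches its original category.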

This is how we can apply continuization to data in the Orange tool.


Normalization

Normalization is a technique of scaling or mapping values into a common range. Min-max normalization applies a linear transformation to the original range of the data: for a target range of [0, 1], each value v becomes (v - min) / (max - min).
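Min-max normalization can be sketched in a few lines of plain Python. The function name min_max_normalize, its new_min and new_max parameters, and the sample values are made up for this illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range to stretch; map everything to new_min.
        return [new_min] * len(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Hypothetical column of heights in centimetres.
heights = [150, 160, 170, 180, 200]
scaled = min_max_normalize(heights)
print(scaled)
```

The smallest height maps to 0, the largest to 1, and everything in between keeps its relative position, which is what "linear transformation" means here.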

This is how we can apply normalization to data in the Orange tool.

Randomization

Randomization is the process of adding noise to the data so that the behaviour of individual records is masked.
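Following the definition above, here is a minimal plain-Python sketch that masks individual records by adding Gaussian noise to each value. It is an illustration of the definition only and not necessarily what Orange's own randomization step does; the function name add_noise, the noise scale, and the sample values are assumptions.

```python
import random


def add_noise(values, scale, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `scale` to each value."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    return [v + rng.gauss(0.0, scale) for v in values]


# Hypothetical sensitive column.
salaries = [42000.0, 51000.0, 38500.0, 60250.0]
masked = add_noise(salaries, scale=500.0, seed=0)

for original, noisy in zip(salaries, masked):
    print(f"{original:>10.2f} -> {noisy:>10.2f}")
```

Individual values change, but because the noise has zero mean, overall statistics such as the average stay roughly the same when there are many records.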

This is how we can apply randomization to data in the Orange tool.

We can also perform all of this pre-processing with a Python script in the Orange tool.


This is the code for discretization in Python

This is the code for continuization in Python

This is the code for normalization in Python

This is the code for randomization in Python

 Orange File Link
