Aim: Data pre-processing and text analytics using Orange.
What is text analytics?
Text analytics is the automated process of translating large volumes of unstructured text into quantitative data to uncover insights, trends, and patterns. Combined with data visualization tools, this technique enables companies to understand the story behind the numbers and make better decisions.
What is sentiment analysis?
Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics and biometrics to systematically identify, extract, quantify and study affective states and subjective information.
Can we get the sentiment of people from their tweets? Why is it useful?
Yes, we can determine the sentiment of a person's tweet, that is, whether it is positive, negative or neutral. This is useful because, for example, if we are running a campaign, the tweets people post can tell us whether users are participating in it and whether they like it or not.
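To make the idea of labelling a tweet as positive, negative or neutral concrete, here is a minimal plain-Python sketch. It is not how Orange computes sentiment; the word lists and the function name `tweet_sentiment` are hypothetical and chosen only for illustration.

```python
# Toy lexicon-based sentiment scoring (illustrative only).
# POSITIVE and NEGATIVE are tiny made-up word lists, not a real lexicon.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "hate", "terrible", "sad", "awful"}


def tweet_sentiment(text):
    """Return 'positive', 'negative' or 'neutral' for a tweet."""
    # Lowercase each word and strip common punctuation and tweet symbols.
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["I love this campaign!", "This is terrible.", "Joined the campaign today"]:
    print(tweet, "->", tweet_sentiment(tweet))
```

A real analysis would use a much larger lexicon or a trained model, but the counting logic above shows the basic principle.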
Pre-Processing on Data
As shown in the image above, we can apply pre-processing techniques and see how the data changes.
Discretization
Discretization is the process of transforming continuous variables, models or functions into a discrete form. We do this by creating a set of contiguous intervals that span the range of the variable, model or function of interest.
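The interval idea can be sketched in plain Python as equal-width binning. This is only one possible discretization scheme and does not use the Orange API; the function name `equal_width_bins` and the sample ages are assumptions made for illustration.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins contiguous, equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    labels = []
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        labels.append(min(idx, n_bins - 1))
    return labels


ages = [18, 22, 25, 31, 40, 47, 52, 60]  # hypothetical sample data
print(equal_width_bins(ages, 3))
```

Each continuous value is replaced by the index of the interval it falls into, which is the discrete form described above.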
This is how we can apply discretization to data in Orange
Continuization
Continuization is the process through which we can convert discrete attributes into continuous ones, or remove discrete attributes from the table.
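One common way to turn a discrete attribute into continuous ones is one-hot encoding, where each category becomes its own 0/1 column. The sketch below is plain Python rather than the Orange API; `one_hot` and the sample colours are hypothetical names and data.

```python
def one_hot(values):
    """Convert a discrete attribute into one numeric 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]  # hypothetical discrete attribute
categories, rows = one_hot(colors)
print(categories)
for row in rows:
    print(row)
```

Other treatments are possible, such as mapping ordered categories to a single number or dropping the attribute entirely.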
Normalization
Normalization is a technique of scaling or mapping data to a common range. The technique that applies a linear transformation to the original range of the data is called Min-Max Normalization.
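Min-Max Normalization maps a value x to (x - min) / (max - min), optionally rescaled to a chosen target range. A minimal plain-Python sketch, assuming a target range of [0, 1] by default (the function name `min_max_normalize` and its parameters are illustrative, not part of Orange):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range to scale; map everything to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


scores = [10, 20, 30, 50]  # hypothetical sample data
print(min_max_normalize(scores))
```

The smallest value always maps to `new_min` and the largest to `new_max`, with everything else placed proportionally in between.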
This is how we can apply normalization to data in Orange
Randomization
Randomization is the process of adding noise to the data so that the behaviour of individual records is masked.
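Following the definition above, noise can be added by perturbing each value with a small random amount. This plain-Python sketch assumes zero-mean Gaussian noise; the function name `add_noise`, the `sigma` and `seed` parameters, and the sample salaries are all hypothetical, and this is not a reproduction of Orange's own randomization step.

```python
import random


def add_noise(values, sigma, seed=None):
    """Return a copy of values with zero-mean Gaussian noise added to each entry."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    return [v + rng.gauss(0.0, sigma) for v in values]


salaries = [50000.0, 52000.0, 61000.0]  # hypothetical sample data
print(add_noise(salaries, sigma=500.0, seed=42))
```

A larger `sigma` hides individual records more thoroughly but also distorts the data more, so the value is a trade-off between privacy and accuracy.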
This is how we can apply randomization to data in Orange
We can also perform all of this pre-processing using a Python script in Orange.
This is the code for discretization in Python
This is the code for continuization in Python
This is the code for normalization in Python
This is the code for randomization in Python