RESEARCH PAPER
Tweet Clustering in Indonesian Language Twitter Social Media using Naive Bayes Classifier Method
 
Publication date: 2018-12-25
 
Eurasian J Anal Chem 2018;13(6):emEJAC181147
 
ABSTRACT
Twitter is a social media platform widely used for many purposes, particularly information sharing, communication, entertainment, and self-expression. Tweets cover a wide range of topics, such as culture, sports, culinary, tourism, music, politics and others. The purpose of this research is to build an application that groups tweets from Twitter into sports and non-sports categories using the Naive Bayes classifier method. Text mining is a technique used to handle classification, clustering, information extraction and information retrieval problems, and classifying tweets automatically requires one of these text mining techniques. The probabilities learned from the training data are used to process tweet documents whose category is not yet known. Each tweet document first goes through text pre-processing and is then split into unigrams (one word), bigrams (two words) and trigrams (three words). To determine the category of an unlabeled tweet document, the categories produced by the three n-gram levels are compared. The system was tested using 100 to 2000 training tweets in each category and 10 test tweets in each category. The accuracy of the categorized tweets was 60% with 100 training tweets, 65% with 200 training tweets, and 90% with 2000 training tweets. The conclusion is that the more training data is used for learning, the higher the success rate of categorizing a tweet document.
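The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the three n-gram results are compared or whether smoothing is used, so the majority vote in `classify`, the add-one (Laplace) smoothing, the simplified `tokenize` pre-processing step, and the four toy training tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal pre-processing: lowercase, drop punctuation, split on spaces.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def ngrams(tokens, n):
    # n=1 gives unigrams, n=2 bigrams, n=3 trigrams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-grams of a fixed size n."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.doc_counts = Counter()         # label -> number of training tweets
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            feats = ngrams(tokenize(text), self.n)
            self.counts[label].update(feats)
            self.doc_counts[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = ngrams(tokenize(text), self.n)
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab) + 1  # +1 reserves a slot for unseen n-grams
        scores = {}
        for label in self.doc_counts:
            total = sum(self.counts[label].values())
            # Log prior plus add-one smoothed log likelihoods (assumed smoothing).
            score = math.log(self.doc_counts[label] / total_docs)
            for f in feats:
                score += math.log((self.counts[label][f] + 1) / (total + v))
            scores[label] = score
        return max(scores, key=scores.get)


def classify(models, text):
    # Assumed comparison rule: majority vote over the three n-gram models.
    votes = Counter(m.predict(text) for m in models)
    return votes.most_common(1)[0][0]


# Toy training data, invented for illustration only.
train_docs = [
    "timnas indonesia menang di pertandingan sepak bola",
    "pemain bulu tangkis meraih medali emas",
    "harga makanan di pasar naik hari ini",
    "konser musik malam ini sangat meriah",
]
train_labels = ["sports", "sports", "non-sports", "non-sports"]

models = [NGramNaiveBayes(n).fit(train_docs, train_labels) for n in (1, 2, 3)]
print(classify(models, "timnas indonesia menang lagi"))
print(classify(models, "konser musik malam ini"))
```

With two categories and three models, the vote cannot tie. A real system would train on far more tweets per category, as in the 100 to 2000 range reported above.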
eISSN:1306-3057