Frequency of tags
Goal : To compute the frequency of hashtags in Twitter data over a given time period.
Data : The data set we are using can be accessed from here. Every row describes one tweet and consists of a timestamp, user ID, hashtags and URLs. We extract only the hashtags, since they are all this analysis needs. The data set contains 22.5 million tweets in total.
Separating tags : First we extract the hashtags from every tweet, and then we remove the tweets that contain the same hashtag more than once.
Code :
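# Raw strings keep the backslashes in the Windows paths literal.
src = r'G:\main 3 datasets\tweets-nov-2012.json\twitter.csv'
dst = r'G:\main 3 datasets\tweets-nov-2012.json\twitter1.csv'
# Opening both files in one "with" closes them even if a line fails to parse.
with open(src) as fin, open(dst, "w") as adda:
    for line in fin:
        x = line.split(":")
        # Take the third ":"-separated field as the hashtag list and drop its last 11 characters.
        y = x[2][:-11]
        # Strip the two leading characters and the last one, then split on commas.
        z = y[2:len(y) - 1].split(",")
        for i in z:
            adda.write(i)
        adda.write(" \n")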
This produces a file in which each row holds all the hashtags of one tweet. Next we remove every tweet that contains the same hashtag more than once.
Code2 :
src = r'G:\main 3 datasets\tweets-nov-2012.json\twitter1.csv'
dst = r'G:\main 3 datasets\tweets-nov-2012.json\twitter2.csv'
with open(src) as fin, open(dst, "w") as adda:
    for line in fin:
        x = line.split(" ")
        # Keep the tweet only if no hashtag appears more than once.
        if len(x) == len(set(x)):
            adda.write(line)
About 80k tweets are removed this way. Since that is a small fraction of the 22.5 million tweets, we can drop them. We remove these rows because frequent itemset and association rule methods expect each transaction to list an item only once.
The next and final step is to mine the frequent itemsets and the association rules.
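The post does not show the code for this step, so the following is only a minimal sketch of what frequent itemset and association rule mining compute, not the method actually used here. It counts itemsets by brute force, which is fine for a toy example but would not scale to millions of tweets; dedicated algorithms such as Apriori or FP-growth are the usual choice. The function names, the `max_size` limit and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every itemset of up to max_size items; keep the frequent ones."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # each item counts once per transaction
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    min_count = min_support * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def association_rules(freq, min_confidence):
    """Return (antecedent, consequent, confidence) with one-item consequents."""
    rules = []
    for itemset, n in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = tuple(i for i in itemset if i != consequent)
            # confidence = count(antecedent and consequent) / count(antecedent)
            confidence = n / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


# Toy transactions (hypothetical, not taken from the data set).
transactions = [
    ["android", "androidgames", "gameinsight"],
    ["android", "androidgames", "gameinsight"],
    ["ipad", "ipadgames", "gameinsight"],
    ["love"],
]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.8)
for antecedent, consequent, confidence in rules:
    print(list(antecedent), "=>", [consequent], confidence)
```

Every subset of a frequent itemset is itself frequent, which is why the lookup `freq[antecedent]` always succeeds.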
Results : The frequent itemsets we get with a minimum support of 0.005 are
["gameinsight"], 1255588
["androidgames"], 707722
["androidgames","gameinsight"], 704138
["android"], 590420
["android","androidgames"], 576085
["android","androidgames","gameinsight"], 574468
["android","gameinsight"], 574478
["PeoplesChoice"], 490642
["ipadgames"], 382201
["ipadgames","gameinsight"], 379872
["ipad"], 348771
["ipad","ipadgames"], 340866
["ipad","ipadgames","gameinsight"], 339881
["ipad","gameinsight"], 339883
["Android"], 231391
["Android","androidgames"], 130041
["Android","androidgames","gameinsight"], 129513
["Android","gameinsight"], 132560
["GetGlue"], 203517
["TeamFollowBack"], 199024
["MTVEMA"], 198811
["sales"], 187489
["EMAWinBieber"], 173581
["iphonegames"], 168656
["iphonegames","gameinsight"], 167216
["iphone"], 162476
["iphone","iphonegames"], 138861
["iphone","iphonegames","gameinsight"], 138841
["iphone","gameinsight"], 139045
["TuitUtil"], 161994
["EMAVoteOneDirection"], 150123
["InstantFollowBack"], 149567
["InstantFollowBack","TeamFollowBack"], 140699
["\u52a3\u5316\u30b3\u30d4\u30fc"], 122426
["RT"], 120826
["love"], 112880
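As a rough sanity check on the list above, the support threshold can be turned into a minimum tweet count. The sketch below assumes that support is measured against the filtered file and uses the approximate figures quoted in the text (22.5 million tweets, about 80k removed), so the result is only an estimate.

```python
total_tweets = 22_500_000  # approximate size quoted in the Data section
removed = 80_000           # approximate number of tweets dropped for duplicates
min_support = 0.005

# An itemset is frequent if it appears in at least this many tweets.
min_count = min_support * (total_tweets - removed)
print(round(min_count))

# The smallest count in the list, ["love"] with 112880, clears this estimate.
print(112880 >= min_count)
```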
The association rules we get with a minimum confidence of 0.8 are
["iphonegames","gameinsight"] => ["iphone"], 0.8303093005454023
["iphone"] => ["iphonegames"], 0.8546554568059282
["iphone"] => ["gameinsight"], 0.8557879317560747
["iphone","gameinsight"] => ["iphonegames"], 0.9985328490776367
["ipad"] => ["ipadgames"], 0.9773346981257043
["ipad"] => ["gameinsight"], 0.9745162298470916
["androidgames","gameinsight"] => ["android"], 0.8158457575077612
["ipad","ipadgames"] => ["gameinsight"], 0.9971103014087648
["android","androidgames"] => ["gameinsight"], 0.9971931225426803
["android","gameinsight"] => ["androidgames"], 0.9999825928930264
["iphone","iphonegames"] => ["gameinsight"], 0.9998559710789927
["androidgames"] => ["gameinsight"], 0.994935864647418
["androidgames"] => ["android"], 0.8139989996071904
["ipadgames","gameinsight"] => ["ipad"], 0.8947250652851487
["iphonegames"] => ["gameinsight"], 0.9914619106346646
["iphonegames"] => ["iphone"], 0.8233386301109952
["InstantFollowBack"] => ["TeamFollowBack"], 0.9407088462027051
["android"] => ["androidgames"], 0.9757206734189221
["android"] => ["gameinsight"], 0.9729988821516886
["Android","gameinsight"] => ["androidgames"], 0.9770141822570911
["ipad","gameinsight"] => ["ipadgames"], 0.9999941156221406
["Android","androidgames"] => ["gameinsight"], 0.9959397420813436
["ipadgames"] => ["gameinsight"], 0.9939063477070965
["ipadgames"] => ["ipad"], 0.8918500998165887
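The confidence of a rule X => Y is the count of tweets containing both X and Y divided by the count of tweets containing X. As a small check, the sketch below recomputes two of the values from the itemset counts listed in the results; the helper name `confidence` is only illustrative.

```python
def confidence(union_count, antecedent_count):
    """Confidence of X => Y: count(X and Y) / count(X)."""
    return union_count / antecedent_count


# ["iphone"] => ["iphonegames"]: 138861 tweets with both, 162476 with "iphone".
iphone_rule = confidence(138861, 162476)

# ["InstantFollowBack"] => ["TeamFollowBack"]: 140699 with both, 149567 with the first.
followback_rule = confidence(140699, 149567)

print(iphone_rule, followback_rule)
```

Both values agree with the listed confidences of about 0.8547 and 0.9407.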