twitter - Best Features for Term Level Clustering -


At the moment, I am working on a project related to mining Twitter data. The aim of the project is to find themes that can be used to represent a set of tweets. To find the themes, we came up with the idea of term-level clustering. The terms are important concepts extracted using text-mining tools. My main question is: what are the best features to define term similarity? In this project, due to an insufficient amount of data, I am doing unsupervised learning, namely clustering with the k-means algorithm. I have already extracted some features. As I understand it, one way to approximate the semantic meaning of a term (not its actual meaning) is to look at the context in which the term is mentioned. Therefore, what I have at the moment is the preceding and following word and the POS of the term. For instance:

I drink a cup of xyz and had a spoon of abc yesterday.

By looking at the preceding words and their POS tags (cup/NN and of/IN before xyz, spoon/NN and of/IN before abc), we can tell that xyz and abc might be a liquid material or some component. I know this sounds naive and, in fact, I don't get good clusters. In addition to the previous features, I have also considered named entity types as features, for instance entity types such as Person, Location, Problem (in the medical sense), MedTerm, etc.
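To make the context features concrete, here is a minimal sketch in plain Python. It assumes the sentence has already been tokenized and POS-tagged by some external tool; the function name `context_features`, the `window` parameter, and the hand-written tags are all hypothetical and only illustrate the idea.

```python
def context_features(tagged, index, window=2):
    """Collect word/POS pairs around the token at `index`.

    `tagged` is a list of (word, pos) tuples for one sentence.
    Returns a dict such as {"prev1": "of/IN", "prev2": "cup/NN", ...}.
    """
    features = {}
    for offset in range(1, window + 1):
        before = index - offset
        if before >= 0:
            word, pos = tagged[before]
            features[f"prev{offset}"] = f"{word.lower()}/{pos}"
        after = index + offset
        if after < len(tagged):
            word, pos = tagged[after]
            features[f"next{offset}"] = f"{word.lower()}/{pos}"
    return features


# Hand-tagged example sentence; the tags are illustrative, not tagger output.
tagged = [
    ("I", "PRP"), ("drink", "VBP"), ("a", "DT"), ("cup", "NN"),
    ("of", "IN"), ("xyz", "NN"), ("and", "CC"), ("had", "VBD"),
    ("a", "DT"), ("spoon", "NN"), ("of", "IN"), ("abc", "NN"),
    ("yesterday", "NN"), (".", "."),
]

xyz_features = context_features(tagged, 5)   # token "xyz"
abc_features = context_features(tagged, 11)  # token "abc"
```

Both terms share the `prev1` value `of/IN`, and their `prev2` values are both container-like nouns, which is the kind of overlap the clustering is supposed to pick up.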

So, what are the common features for term-level clustering? Any comments and suggestions are appreciated. I am open to any guidance, such as papers, links, etc.

Edit: in addition to those features, I have extracted the head noun of each term and considered it as one of the features. I am thinking of using the head noun in the case of multi-word terms.

Well, let me see if I understood correctly what you need. You have extracted/found some terms that you want to be the centres of clusters, and you want to find the terms similar to them so that they are grouped in the proper cluster?

In general you need to define a similarity measure (distance), and here is the main point: what do you want the similarity or distance measure to determine? If you are looking for term-to-term similarity at the level of letters, you can try things like Levenshtein distance, for example. But if you want to find contextually similar terms, which are written in a different way but mean the same thing, that is different from Levenshtein and considerably harder to do.
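For the letter-level case, a minimal sketch of Levenshtein distance (the classic dynamic-programming formulation, written in plain Python) could look like this. The function name `levenshtein` is just a label chosen here; in practice an existing library implementation would probably be faster.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string `a` into string `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Note that this only captures spelling similarity: two terms that mean the same thing but share few letters will still be far apart.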

What is important to keep in mind is that you need a measure of similarity to find similar terms. I see that what you call features are things like named entity types, and k-means is bad when dealing with non-continuous data.
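One possible way around this, offered only as a sketch and not as the one right answer, is to compare categorical features with a simple matching dissimilarity (the fraction of features whose values differ) instead of a Euclidean distance. This is the kind of measure that k-modes style clustering builds on. The function name and the feature dicts below are hypothetical.

```python
def matching_dissimilarity(a, b):
    """Fraction of categorical features on which two terms disagree.

    `a` and `b` are dicts mapping feature name -> categorical value.
    A feature missing from one dict counts as a mismatch.
    Returns a value between 0.0 (identical) and 1.0 (nothing in common).
    """
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    mismatches = sum(1 for key in keys if a.get(key) != b.get(key))
    return mismatches / len(keys)


# Hypothetical feature dicts for two terms.
xyz = {"prev1": "of/IN", "prev2": "cup/NN", "entity": "MedTerm"}
abc = {"prev1": "of/IN", "prev2": "spoon/NN", "entity": "MedTerm"}

d = matching_dissimilarity(xyz, abc)  # one of three features differs
```

A dissimilarity like this can be fed to a clustering method that accepts arbitrary pairwise distances, rather than forcing the categories into the numeric space that k-means assumes.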

