A Tag Clustering Method to Deal with Syntactic Variations on Collaborative Social Networks – ICWE 2009


Folksonomies have emerged as a common way of annotating and categorizing content using a set of tags that are created and managed collaboratively. Tags carry the semantic information within a folksonomy and thus provide the link to ontologies. The appeal of folksonomies comes from the low effort required to create and maintain them, since they are community-generated. However, they present important drawbacks regarding their limited navigation and searching capabilities, in contrast with other methods such as taxonomies, thesauruses and ontologies. One of these drawbacks is an effect of their flexibility for tagging, which frequently produces multiple syntactic variations of the same tag. The difficulty of clustering tags containing syntactic variations increases as the length of the tag decreases. Similarity measures allow the correct identification of tag variations when tag lengths are greater than five symbols. In this paper we propose the use of cosine relatedness measures in order to cluster tags with lengths less than or equal to five symbols. We build a discriminator based on the combination of a fuzzy similarity measure and a cosine measure, and we analyze the results obtained.
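
The idea of combining a string similarity with a cosine measure for short tags can be illustrated with a minimal Python sketch. It is not the discriminator from the paper: the abstract does not define the fuzzy similarity, so a normalized Levenshtein similarity stands in for it, and the cosine measure is assumed here to be computed over tag co-occurrence counts. The function names, the `context` structure, and both thresholds are hypothetical.

```python
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Stand-in for the fuzzy similarity: 1 minus normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cosine(u: dict, v: dict) -> float:
    """Cosine of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_cluster(tag_a, tag_b, context,
                 sim_threshold=0.7, cos_threshold=0.5, short_len=5):
    """Decide whether two tags look like variations of the same tag.

    context maps a lowercase tag to {co-occurring tag: count} (assumed
    representation). Thresholds are illustrative, not taken from the paper.
    """
    a, b = tag_a.lower(), tag_b.lower()
    if string_similarity(a, b) < sim_threshold:
        return False
    # Longer tags: the string similarity alone is treated as sufficient.
    if min(len(a), len(b)) > short_len:
        return True
    # Short tags: additionally require similar usage contexts.
    return cosine(context.get(a, {}), context.get(b, {})) >= cos_threshold


context = {
    "blog": {"web": 3, "writing": 2},
    "blogs": {"web": 2, "writing": 2},
    "cats": {"pets": 4, "animals": 1},
    "cars": {"auto": 3, "engine": 2},
}
print(same_cluster("blog", "blogs", context))
print(same_cluster("cats", "cars", context))
```

In this toy data, "blog" and "blogs" are close as strings and share their co-occurring tags, while "cats" and "cars" are equally close as strings but share no context, which is the kind of short-tag ambiguity a string measure alone cannot resolve.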


José Javier Astrain, Francisco Echarte, Alberto Córdoba, Jesús Villadangos


  • Excel file with tag similarities: download


Astrain, J. J., Echarte, F., Córdoba, A., and Villadangos, J. 2009. A Tag Clustering Method to Deal with Syntactic Variations on Collaborative Social Networks. In Proceedings of the 9th International Conference on Web Engineering (San Sebastián, Spain, June 24–26, 2009). M. Gaedke and M. Grossniklaus, Eds. Lecture Notes in Computer Science, vol. 5648. Springer-Verlag, Berlin, Heidelberg, 434–441.