MERLOT Journal of Online Learning and Teaching

Vol. 2, No. 4, December 2006

Learning by Tagging: The Role of Social Tagging in
Group Knowledge Formation


Jude Yew
Doctoral Student
School of Information
University of Michigan
Ann Arbor, MI

Faison P. Gibson
Assistant Professor
School of Business
Eastern Michigan University
Ypsilanti, MI


Stephanie D. Teasley
Research Associate Professor
School of Information
University of Michigan
Ann Arbor, MI




This research presents a case study on the use of Social Tagging in an undergraduate classroom at the University of Michigan during the Fall 2005 semester.  Students were between 20 and 22 years of age.  Students tagged their individual blog posts to contribute to themes and conversations in an online learning environment.  Using content analysis of the blog posts and tags as well as semi-structured interviews, the study examines the role of online social tagging for tracking and aiding group knowledge formation. 



This paper presents a case study from an ongoing research project that investigates knowledge and community formation in online learning environments that employ social tagging. These learning environments allow the user to organize and display online content, such as blogposts and bookmarks, with meaningful keywords or tags presented in a public and collaborative manner. Such labeling of online content potentially allows the individual learner and the community to use technology and social conventions to organize knowledge, coordinate with others, and facilitate the sensemaking efforts of the community (Mathes, 2004).

This study makes the argument that social tagging systems employed within a learning community can both facilitate the process and provide evidence of knowledge formation within the group. To investigate this, we first put forward a theoretical argument for why social tagging systems should be employed to facilitate the production of group knowledge. We then present an analysis of an undergraduate business school class’ online learning environment that utilized social tagging.  

The case for social tagging 

Tagging describes the activity of marking online content with keywords, called “tags”, as a way to organize content for future navigation, filtering or search. Tags are not based on a controlled vocabulary but are left to the user’s wishes, although, as shown in this study, group norms and social processes can play a significant role in an individual’s choice of tags, leading to fairly consistent assignment of specific tags (Mathes, 2004).  Assigning tags to categorize an object is an act of knowledge production because it makes apparent the mental models, or internal representations of knowledge, that one associates with the object (Pauen, 2002). The argument being made here is that by allowing students to associate keywords with objects, we are enacting the associative structure of knowledge formation (von Ahn & Dabbish, 2004). New knowledge is formed in the allocation of tags, as the individual has to make sense of the new object by associating it with prior understandings and classifications of objects. For instance, by categorizing a digital photograph with the tag “vacation”, we immediately provide information about the content of the photograph without anyone having to view it. The tag “vacation” also tells others how we have contextualized the photo. Thus, the use of tags can both facilitate the formation of new knowledge and provide evidence of how this knowledge evolves over time.

Tagging is social because the tags are visible to the whole group, with the potential for influencing the tags adopted by each group member. We believe that social tagging systems employed within a learning community can facilitate knowledge formation within the group. In addition, social tagging can provide evidence of knowledge formation to both the group members and to researchers/analysts. In a class, the tags used by individual students to categorize online content also function as a “repository” of how that particular student made sense of and assimilated the material being taught in the class (Argote, 1999; Weick, Sutcliffe & Obstfeld, 2005). When tags are made public and shared, other students in the class are able to tap into the knowledge being formed by the individual student. Students are able to view the tags used by others and employ those tags to inform their own understanding, creating an iterative learning loop (Russell, Stefik, Pirolli & Card, 1993). Additionally, the tags employed by one member of the class can “self-propagate” and become a “linguistic meme” that enables the entire class to organize and coordinate their online discussion and, in the process of doing so, establish a common understanding of the material being taught (Heath & Seidel, undated).


The setting     

This study took place in Business Information Technology 320 (BIT320), a database and information class offered at the University of Michigan. The class was offered to undergraduates aged 20 to 22 at the Business school, and a large part of the class was devoted to group work in which students were expected to create information databases based on the technologies taught in the syllabus. BIT320 also used blogs and RSS (an XML format for syndicating blog content) to create an online space where both the professor and the students could share their knowledge. The class website was dubbed the “Class Remix” to encourage participants to improve upon, change, integrate, or otherwise “remix” the group’s knowledge contributions, similar to Lessig’s notion of a remix culture (Koman, 2005). Participation in the Class Remix was mandated through a class policy that stipulated 5 blogposts per week, which were then aggregated on the site (available on the web and pictured in Figure 1).  Students were encouraged to create a vibrant learning community where group knowledge was built collectively by sharing relevant links, questions, answers, and observations of the material taught in the class.

In this environment, students could post about new ideas, or they could respond to the contributions of others by writing a response in their own blog and linking back to the original poster.  In this way, conversations (initial post, comment, response to comment, etc.) effectively occurred across student blogs. When engaging in these sorts of conversations, students were encouraged to reuse at least some of the tags that previous posters had used, as well as to add any new tags they found relevant.  As a result, whole conversations came to be grouped by tag and were made findable by tag. A limitation of the system was that once a post was tagged and saved, the tags could not be changed.
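The way shared tags make a conversation findable can be illustrated with a minimal Python sketch. It is not the Class Remix implementation, which is not described in detail here; the post records, author names, and the `build_tag_index` helper are hypothetical and serve only to show how reusing a previous poster's tags groups replies into one retrievable stream.

```python
from collections import defaultdict

# Hypothetical posts as (author, title, tags); none of this is study data.
posts = [
    ("student_a", "Why is my query slow?", ["classquestions", "SQL"]),
    ("student_b", "Re: slow query", ["classquestions", "SQL", "databases"]),
    ("student_c", "Phones abroad", ["opinionslug", "technology"]),
]


def build_tag_index(posts):
    """Map each lower-cased tag to the (author, title) pairs that carry it."""
    index = defaultdict(list)
    for author, title, tags in posts:
        # A set ignores a tag accidentally repeated on the same post.
        for tag in {t.lower() for t in tags}:
            index[tag].append((author, title))
    return index


index = build_tag_index(posts)

# Following one tag retrieves the whole exchange, in posting order.
for author, title in index["classquestions"]:
    print(author, "-", title)
```

Because the reply reuses the tags of the post it answers, both entries appear under "classquestions" and "sql", which is the grouping effect described above.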


Figure 1: Screen capture of class "remix" website (04/14/06)

Unlike more orthodox and prescribed forms of classification, social tagging allowed the users in the community to assign any keyword/category to their contribution that they deemed relevant. Various visualizations, such as the tag cloud on the class website (highlighted in blue in the lower right corner of Figure 1), helped members of the class to be aware of the current and most frequently submitted topics/posts. The class remix website can be seen as an archive of the students’ contributions, and can be used to document the students’ evolving understanding and the knowledge formation that took place during the course.

Data & Methodology

Data for this study were composed of participants’ contributions to the class remix website and in-person interviews. To better understand the role of the remix site in the participants’ learning, content analysis was performed on the student blog posts and the tags they employed to describe these posts. Additionally, the students’ grades in the class and semi-structured interviews with seven of the eleven participants in the class provided complementary data. In the following section, the server log analysis, the key findings generated by the interviews, and the content analysis of the blogposts are reported.


Table 1 outlines the total number of blogposts made by each student in the class during the term, the total number of tags that they associated with their blogposts and the average number of tags per blog post contributed to the class website.  

With the exception of three students, everyone in the class met the instructor’s minimum requirement of five blogposts a week (highlighted in Table 1 by the red line).



Blog | Total Posts | Total Tags | Avg. Tags/Post
The Blogstar
Musings of William h (Avg. Tags/Post: 1.7561)
Matt’s Musings
jb's blog
Shady Waters
Pink Footsie
Tigerlily's Blog
Kevin’s Blog
Blogonautic Solutions (instructor)
Table 1: Total blog posts and tags and avg. tags per post
(13 weeks x 5 blog posts/week = 65 minimum required posts)

The instructor’s purpose for stipulating a minimum requirement of contributions was to encourage the students to fully utilize the system, and to ensure sustained participation from the students. The instructor’s rationale for mandating participation online is illustrated in the following quote: 

… This is one of those things where initially people have some hesitation …  I mean there's just all that group anxiety that comes into play and so you got to get over that hump, you got to get over it early and just start making it happen. It’s also practice (that) makes it better … (Inst1interview, 0:32:50)  

As shown by the Average Tags/Post column in Table 1, participants tended to use more than one tag to describe the content of each blog contribution, a common practice in this type of system (Kroski, 2005). Because of the great number of tags being employed, one issue that emerges is the vocabulary problem (Furnas, Landauer, Gomez & Dumais, 1987): there are multiple ways to describe an object or idea, and random pairs of people label an object similarly at most 20% of the time (Furnas et al., 1987). Because of the vocabulary problem, participants in the class were forced to determine exactly what the common vocabulary for describing their blog posts should be. One student described how the group made sense of multiple tags as follows:

So when you have hundreds of tags, it's really the case that only a few of them are important.  And that was the case here. And so people were able to figure that out, and that we had sort of themes. So at any given point in time, maybe 10 tags would be important. (Stud2 interview, 0:13:51) 

This pattern was reflected in the analysis of the server logs. In total, 143 distinct tags were used 1780 times during the term. However, not all tags were used equally. As indicated by the quote from Student 2 above, a small number of keywords were used far more frequently than the others. Figure 2 highlights the “Long Tail” phenomenon (Anderson, 2004), a heavily skewed distribution in which a large proportion of the 143 keywords contributed were used only once or twice.
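A tag frequency distribution of this kind can be tabulated with a few lines of Python. The sketch below is illustrative only: the `post_tags` list is invented rather than taken from the server logs, and the `tag_frequencies` helper is a hypothetical name. For scale, the reported totals imply an average of roughly 12.4 uses per tag (1780 / 143), a figure that a skewed distribution makes unrepresentative of most tags.

```python
from collections import Counter

# Hypothetical tag assignments, one inner list per blog post (not study data).
post_tags = [
    ["technology", "opinionslug"],
    ["technology", "classquestions"],
    ["technology", "SQL"],
    ["opinionslug", "ipod"],
    ["classquestions", "XML"],
]


def tag_frequencies(post_tags):
    """Count how many times each tag was applied across all posts."""
    return Counter(tag for tags in post_tags for tag in tags)


freq = tag_frequencies(post_tags)

distinct_tags = len(freq)           # number of different tags
total_uses = sum(freq.values())     # number of tag assignments overall
ranked = freq.most_common()         # tags ordered from most to least used
# Tags used only once or twice make up the "tail" of the distribution.
tail = [tag for tag, count in ranked if count <= 2]

print(distinct_tags, total_uses)
print(ranked[0])
print(len(tail) / distinct_tags)    # share of tags that sit in the tail
```

Plotting the counts in `ranked` in order would give the kind of curve shown in Figure 2.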

Figure 2: Tag Frequency Distribution


Of the 20 most frequently used tags shown in Table 2, the top four tags (highlighted in Table 2 below) were used at least three times more frequently than the others.


Table 2: Top 20 Distinct Tags by frequency used

 By investigating the timing of when certain tags were adopted and their patterns of use, the formation of group knowledge and convention can be represented.  As shown in Table 3, the top four tags were adopted by the students early on in the semester and their continual use resulted in them becoming conventions for the students in the class to talk about specific subjects in their blog contributions. 




Source | Earliest date published
Kevin’s blog
Pink Footsie
jb’s blog
Tigerlily’s blog


Table 3: Top 4 tags by source and earliest date published

Other more specific tags like SQL, XML and Databases were used only during the part of the term where that subject was the most heavily discussed in class. The instructor of the class represented the phenomenon as follows,  

… a tag winds out being a term or label that people introduce. They introduced it to have a shorthand for referring to some phenomenon. And then if they re-use this term at given points in time, they're saying that phenomenon is there. And so what winds up happening is you see that there are themes, and basically these are recurring uses of tags. (Inst1 interview, 0:15:47)

The formation of “themes” within the class suggests how social tagging aids with the formation of group knowledge around specific course content. The frequency of use of the top four tags and the instructor’s comments support the claim that those tags are functioning as artifacts/repositories of the shared understanding between the individuals in the class (Argote, 1999). And because these tags have been used by every member of the class at one point or another during the term, group knowledge or shared understanding has been formed as a result of the “learning loop” that occurs through their use (Russell et al, 1993).

The differential use of tags

Content coding of the student interviews revealed that not all tags were used in the same way. There were two kinds of tags: functional tags (e.g. “opinionslug” or “classquestions”) and content tags (e.g. “technology” and “XML”).

Functional tags are labels that indicate some form of utility or function to the members of the class. For example, the “classquestions” tag was deliberately used by the instructor of the class as a way to easily indicate and highlight questions or problems that the students may be having with the material being taught. One functional tag, “opinionslug”, was a keyword first coined by a student, Pink Footsie. “Opinionslug” was used to indicate contributions that were personal opinions or views on either the content matter or the administrative aspects of the class. According to Student 2,

… at first it was only Pink Footsie who used that ... cause she was the one who invented it ... but then as we started reading more and understanding what she meant by 'opinionslug' ... we definitely all started using it ... but if you just started looking at this (tag) you would probably have no idea what it was ... So it was a kind of inner group understanding. (Stud2 Interview, 0:27:58) 

From the illustration of the use of the “opinionslug” tag, we can see that an explicit purpose/function is signaled through its use and it prepares the reader of the contribution to both understand and react appropriately to what is being said in the blogpost.  

Another example of a functional tag is “classquestions” which seemed to be a term coined by Tigerlily’s blog but was actually stipulated by the instructor to create threads of interaction that could be retrieved by the students later on. Student 2 indicated that,  

he (the instructor) told us that if ever we had a class question we had to call it "classquestion" ... and if you actually clicked on classquestions you would actually see a stream. (Stud2 interview, 0:33:48)   

The adoption of tags to continue a thread of interaction was practiced by Student 2, who explained that the popularity of certain tags had to do with the fact that they highlighted interesting threads of conversation: 

It definitely had to do with the fact that she (a classmate) would have had to have an interesting enough post where I would reply to it or I would make a post about her post ... and so then when I was picking out my tags I would look at what she called it ...just because I am conscious of that and want to make sure that you could find out stream of conversation ... if it was something really boring that no one answered then it probably wouldn't catch on. (Stud2, 0:29:26) 

Thus we can see that functional tags like “opinionslug” and “classquestions” signaled an explicit purpose, and their high frequency of use indicates that the convention of using these tags to highlight the function of a blog post became a social norm within the class.

In contrast, content tags were topics that the class dealt with explicitly. There was a certain amount of ambiguity in how content tags were used and perceived by the students in the class. This ambiguity could be because content tags embodied meanings that went beyond the shared understandings of the students and had significance outside of the class as well. An example of a content tag and how it was used can be seen in Student 1’s comparison of how her use of the “XML” tag differs from the “opinionslug” tag:

Well with XML it's harder ... if I had a question about XML and someone answered it and put XML in the tags... it's fine but there's so many different things to call it ... you know it could have been about databases, it could have been about writing code ... whereas with "opinionslug" it was very obvious you were going to call opinionslug because you were basically preaching on your opinions. (Stud1 interview, 0:30:40) 

This sentiment was shared by Student 3, who used the content tag “technology” in the following way:

For example, when I first started my blog, I was trying to come up with a common thread to a lot of the things, so I use the word "Technology" a lot in my blog. That's such a vague word you know ... And at the same time if I was just looking, or had a couple of minutes to spend, then I would say, "give me something interesting about ‘technology’ that's going on" and I wanted that broad topic. (Stud3 interview, 0:26:30) 

What the student quotes highlight is the issue of polysemy, or the multiple meanings of words (Furnas et al., 1987). Polysemy is a double-edged sword in the use of social tagging systems. It would seem that popular content tags like “technology” were deliberately used to signal the content of the blog post and to appeal broadly to as many individuals as possible. However, the problem with such tags is that they are also highly ambiguous and often have to be paired with other terms such as “ipod” and “Microsoft” to qualify their meaning. As a result, many tags associated with blog posts tended to be used only once or to occupy a minute position in the tag cloud.

From the analysis of how tags are used by the students, we can see that it is much more difficult to base assertions of group knowledge formation around popular or frequently used tags. What is shown is that the students used tags according to a shared notion of the tags’ function. Very often, tags were used to continue threads of conversation and to signal the content of the blogpost. As a result, the group knowledge that is formed around the students’ use of tags does not necessarily represent their understanding of the content but rather the shared understanding of how the tags are used to signal norms of participation within the class.  

To further explore how tags were used, content analysis of the text in the students’ blog posts was conducted to determine the correlation of ideas and concepts in the text of the students’ blog posts with the tags that were used. However, it is obvious from the previous section that keywords like “technology” were broad and that the content analysis of the students’ blog post would not necessarily reveal any correlation between the content of the students’ contribution with the keywords chosen.  

For example, one particular blog post contributed by Matt’s Musings was labeled with the following tags: “opinionslug”, “technology”, and “blogging”. A word frequency analysis of the text in the blog post highlighted only one co-occurrence of the tags with the content of the post: the tag “technology” appeared once in the body text. The subject of the blog post was mainly cellular phone technology in the US and other countries, so in general the “technology” tag represented the post only very broadly. What is interesting to note is that functional tags such as “opinionslug” tend not to occur in the body of the post, as they represent the function, not the content, of the post. Again this highlights the differentiation between the purpose and use of content versus functional tags.
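The co-occurrence check described above can be approximated as follows. This is a minimal sketch, not the content analysis procedure used in the study: the post body is invented, the `tag_cooccurrence` helper is hypothetical, and matching is limited to exact whole-word matches after lower-casing, which is an assumption about how co-occurrence was counted.

```python
import re
from collections import Counter


def tag_cooccurrence(body, tags):
    """Return how often each tag appears as a whole word in the post body."""
    # Tokenize on runs of letters, digits, and apostrophes, ignoring case.
    words = Counter(re.findall(r"[a-z0-9']+", body.lower()))
    # A Counter yields 0 for words that never occur, so absent tags map to 0.
    return {tag: words[tag.lower()] for tag in tags}


# Hypothetical post text; only the tag names come from the paper.
body = "Cell phone technology in the US lags behind what I saw abroad."
tags = ["opinionslug", "technology", "blogging"]

print(tag_cooccurrence(body, tags))
```

In this invented example only the content tag occurs in the body, while the functional tag does not, mirroring the pattern reported for the Matt’s Musings post.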

The idea of a shared vocabulary is crucial to the formation of group knowledge. Having a common language enables the processes of establishing mutual beliefs and mutual assumptions in group communication, processes that are essential to the formation of a community (Clark & Brennan, 1991). As indicated in the previous section, tags like “opinionslug” and “classquestions” functioned as a way for the students to communicate and interact with each other. They allowed students to signal the intentions of their contributions and to publicly solicit and provide help to each other. Student 3 articulates this sentiment in the following comment:

On the occasions when I answered questions, which was rare, or when I responded to somebody else's blog, I tried to use the same tags that they (the other students) used when they wrote ... I would intentionally try and incorporate those into my tags, and maybe if it had to do with something else, also include the other tags just to try to cover my bases so that somebody else could follow the same kind of logic or thread-line, get to their blog and then my blog.  (Stud3 interview, 0:21:08) 

Thus, the tags proved useful to learning because they provided a common vocabulary with which the students were able to interact with each other. This aspect of interaction seemed to be the predominant learning benefit that the students experienced during the term.

It was these interactions, made public on the class “remix” website through the tags, that the students valued. For them, the system added a new layer of social interactions on top of the physical interactions that were going on during the class.  Student 2 makes this point as follows: 

I think that this contributed to the class so much ... you know it made us more friendly with each other ... we'd come in the next day and we'd be like "Oh my god! Did you read what Student x wrote." Literally, it was so nerdy but we did. And ... the professor would start cracking jokes like "Student Y mis-spelled this word in her blog" and he would mispronounce it during lecture on purpose ... and we all got the joke cause we all read the blog. It really contributed to the bonding and how we got along with each other. (Stud2, 0:45:26) 

The role of blogging in learning 

While this study concentrated on the use of social tagging, an important premise was that group blogging might help students learn.  One way to explore this premise is to test the extent to which blogging performance was correlated with performance in other aspects of the class.  Fortunately, the case study provides data to perform this test.  As part of the grading process, the instructor computed a blog index for each student (Table 4).  This index consisted of the instructor’s rating of the quality of each student’s overall blog output multiplied by the total number of posts the student produced.  Quality was a function of the length and relevance of student posts.  The blog index was significantly correlated with the students’ final grades less the blogging component of the course (r(9) = .663, p < .05). Examining the components of the blog index reveals that total posts is significantly correlated with the grade in other components of the course (r(9) = .692, p < .05). However, the quality of posts is not significantly correlated with the students’ final grade (r(9) = .383, p > .05). These correlations suggest that students who interacted more often, by posting blog contributions to the learning remix website, tended to achieve better performance.
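The blog index and the significance of the reported correlations can be retraced with a short Python sketch. It rests on stated assumptions: the function names are hypothetical, the test is taken to be a two-tailed t-test on Pearson's r (the paper does not say which test was used), and 2.262 is the standard two-tailed critical value of t for 9 degrees of freedom at the .05 level. No student-level data are reproduced; only the three reported coefficients are used.

```python
import math


def blog_index(total_posts, post_quality):
    """Blog index as defined in the text: total posts times post quality."""
    return total_posts * post_quality


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, df):
    """t value for testing whether a correlation r differs from zero."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


# Assumed test: two-tailed, alpha = .05, df = 9 (eleven participants).
T_CRIT_DF9 = 2.262

reported = {"blog index": 0.663, "total posts": 0.692, "post quality": 0.383}
for label, r in reported.items():
    t = t_statistic(r, 9)
    print(f"{label}: t = {t:.2f}, significant = {t > T_CRIT_DF9}")
```

Under these assumptions the blog index and total posts exceed the critical value while post quality does not, which is consistent with the p values reported above.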


Total Posts | Post Quality | Blog Index (Total posts * Post quality) | Final Grade less Blogging Component

Table 4: Class performance with blog index & final grade

The reasons for improved performance may be varied.  For one, these measures may all simply be correlated with underlying traits of the learner such as diligence and intelligence.  However, learning in higher education is by its nature an intensely social process. People communicate and process information interactively.  The blogging environment, along with the use of social tagging, provided students with an environment that offered greater opportunities to interact regarding class material than could be afforded during the allotted class time.  Those who took advantage of this opportunity more often performed better in other aspects of the class. 


The main hypothesis of this study is that the use of social tagging can aid with group knowledge formation in the classroom. The findings indicate that social tagging enabled the process of group knowledge formation as well as the labeling of that content. Social tagging enabled the students in the class to not only interact with each other through a shared vocabulary, but also develop a set of common norms and practices. For instance, the use of functional tags provided members of the class with a means to indicate the purpose of their blogposts. Blogposts tagged with “opinionslug” highlighted that the author would be getting on his personal soapbox and airing his views. This enabled other students to make a choice of either avoiding or reading that particular posting, without the need to look at the title or the body of the blogpost. Additionally, the use of the tags was a way students kept track of their interactions with each other. The class norm of using the same tags as the post that one is responding to enabled students to identify and track the interactions they had with each other.  

Thus the evidence presented by this analysis strongly shows that, through the use of social tagging, the students built shared vocabulary and norms for interacting with each other in the online learning environment. This can be understood as the mechanism by which group knowledge can begin to form. Instead of uncovering the “what” of group knowledge (its content), this study uncovered instead, the “how” (its process).


References

Anderson, C. (2004). “The Long Tail.” Wired, 12.10, October 2004. Retrieved on Oct. 13th, 2005 from

Argote, L. (1999). “Organizational Learning: Creating, Retaining, and Transferring Knowledge”. In, Organizational Memory. Kluwer Academic Publishers, pp. 67-97. 

Clark, H. H., & Brennan, S. E. (1991). Grounding in Communication. In Resnick, L. B., Levine, J. M., & Teasley, S. D. (Eds.) Perspectives on Socially Shared Cognition (pp. 127-149), Washington, DC: American Psychological Association. 

Furnas, G. W., Landauer, T. K., Gomez, L. M., Dumais, S. T. (1987). "The vocabulary problem in human-system communication." Communications of the Association for Computing Machinery, 30 (11), Nov 1987: 964-971.

Heath, Chip and Victor Seidel. (Undated) Language as a coordinating mechanism: How linguistic memes help direct appropriate action. Working paper, 

Koman, R. (2005). Remixing Culture: An Interview with Lawrence Lessig. Retrieved October 19th, 2005, from 

Kroski, E. (2005). The Hive Mind: Folksonomies and User-Based Tagging. Infotangle, December 7th, 2005. Retrieved on Jan. 2nd 2006 from 

Mathes, Adam (2004). Folksonomies - Cooperative Classification and Communication Through Shared Metadata, December, 2004.  Retrieved on Dec. 1, 2006 from  

Pauen, S. (2002). "Biobehavioral Development, Perception, and Action: Evidence for Knowledge–Based Category Discrimination in Infancy". In Child Development Volume 73 Issue 4 (July/August 2002). Retrieved on 16th December 2005 from

Russell, D. M., Stefik, M. J., Pirolli, P., Card, S. K. (1993) "Cost structure of sensemaking" Proceedings of the Conference on Human Factors in Computing Systems - INTERACT '93 and CHI '93. ACM, New York, NY, USA: 269-276. 

von Ahn, L. and L. Dabbish (2004). Labeling Images with a Computer Game. In, Proceedings of ACM CHI 2004, pp. 319-326. 

Weick, K., Sutcliffe, K. & Obstfeld, D. (2005) Organizing and the Process of Sensemaking. Organization Science, Vol. 16, No. 4, July – August 2005, pp. 409-421.


Manuscript received 31 Aug 2006; revision received 5 Dec 2006.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.



Copyright © 2005 MERLOT. All Rights Reserved.
Portions Copyright by MERLOT Community Members. Used with Permission.