Getting started with social network analysis

I teach an MA course in Advanced Media Methodologies at the University of Cape Town. This year I’m presenting an elective which introduces Media students to Social Network Analysis. I’m really looking forward to teaching the course and seeing how a conceptual grounding in social network analysis and the techniques of visualisation will change the work my students are able to produce for their dissertations.

We don’t have much class time and there are many new skills to learn, so I decided to design the course around a series of readings and exercises that students can work through to prepare before class.

Here is a first draft of the outline with the course readings and exercises. Any feedback welcome!

Analysing Social Media: Text, image, network


Early adopters (joined pre Dec 2007) in my own Twitter network

Week 1: Reading and exercise

Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying Online Social Networks. Journal of Computer-Mediated Communication, 3(1).

  1. Create a blog (if you don’t have one already); you can use a free blogging site. You’ll be posting your answers to the class assignments on the blog.
  2. After reading this week’s article by Garton et al., prepare and pilot a short interview. Your interview should explore a research participant’s use of social media to communicate with his/her strong ties, and should be designed to yield both quantitative and qualitative data. Post a short rationale for the interview questions on your blog and bring the questions to class next week.
Spreadsheet listing connections in our class
  3. Complete the Connections spreadsheet. We will use it to map social networks during class.
    1. Click through to the editable spreadsheet on Google Drive.
    2. Add your details below the final line of data in the spreadsheet.
    3. I have already added my details and the fact that I know all of you.
    4. In the first column, put your name. In the second column, next to your name, add the name of another student you already know in the class, using one line per student. (I have already added the connections between the Interactive Media production students.)
    5. In the third column, indicate which class you know that student from.
    6. If you know the student from more than one class, add another line with your name, the student’s name and the name of the additional class.
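Because each student records connections from their own side, and adds an extra line for each additional class, the same pair of people can appear on several lines of the Connections spreadsheet. Here is a minimal Python sketch of how those rows could be collapsed into one undirected edge per pair, weighted by the number of classes the pair shares. It assumes the three-column layout described above (your name, the student you know, the class), treats "knowing" someone as mutual, and uses made-up names; it is not part of NodeXL.

```python
# Hypothetical rows following the three-column Connections layout:
# (your name, student you know, class you know them from).
rows = [
    ("Ann", "Ben", "Interactive Media"),
    ("Ann", "Ben", "Research Methods"),
    ("Ben", "Ann", "Interactive Media"),  # the same tie, recorded from Ben's side
    ("Cat", "Ann", "Research Methods"),
]


def weighted_edges(rows):
    """Collapse rows into undirected edges weighted by shared classes.

    Assumes ties are mutual, so (Ann, Ben) and (Ben, Ann) are the same edge.
    """
    classes_per_pair = {}
    for person, known, course in rows:
        pair = tuple(sorted((person, known)))  # order-independent key
        classes_per_pair.setdefault(pair, set()).add(course)
    return {pair: len(courses) for pair, courses in classes_per_pair.items()}


print(weighted_edges(rows))
# {('Ann', 'Ben'): 2, ('Ann', 'Cat'): 1}
```

Counting distinct classes per pair, rather than rows, means a tie entered by both people is not double-counted.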

Week 2: Readings and exercises

Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing Social Media Networks with NodeXL. Morgan Kaufmann. (Chapters 3 and 10)

Bruns, A., & Burgess, J. (2012). Researching News Discussion on Twitter. Journalism Studies, 13(5-6), 801–814.

1. As shown in class, and using the data from the Connections spreadsheet:

  • Download NodeXL and follow the installation instructions. You will need a Windows PC with Excel (or Windows and Excel installed on your Mac). You will also need internet access on the machine. NodeXL will not work on the UCT network behind the firewall.
  • Work through the NodeXL tutorial
  • Create a NodeXL sociogram to depict the relationships recorded in the Connections spreadsheet
  • Calculate the graph metrics. What are the various centrality measures? What do these numbers mean? What does this suggest to you?
  • Are there any clusters? What do you notice about them? What does this mean?
  • What is the graph density? What does this tell us?
  • How can you make the graph more readable?
  • Create a matrix to depict the relationships.
  • How would you go about showing how everyone in the class communicates with fellow students and tutors about the social media assignments?
  • Do you have any criticism of the data we collected or how NodeXL represents it? How could we improve the data in the graph?

2. Advanced (for students who want to use social network data for creative projects)
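For the graph-metric questions in exercise 1 (density, centrality, the matrix), it can help to check a few numbers by hand. The Python sketch below is not NodeXL's own implementation; it computes graph density, normalised degree centrality and an adjacency matrix for a small undirected network. The edge list is made up, and NodeXL reports further measures, such as betweenness and closeness centrality, that are not reproduced here.

```python
# Made-up undirected edge list standing in for the class network.
edges = [("Ann", "Ben"), ("Ann", "Cat"), ("Ben", "Cat"), ("Cat", "Dan")]


def build_graph(edges):
    """Return a sorted node list and a set of neighbours for each node."""
    nodes = sorted({name for edge in edges for name in edge})
    neighbours = {name: set() for name in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return nodes, neighbours


def density(nodes, neighbours):
    """Existing ties divided by possible ties, n(n - 1) / 2 when undirected."""
    n = len(nodes)
    if n < 2:
        return 0.0
    ties = sum(len(adj) for adj in neighbours.values()) // 2
    return ties / (n * (n - 1) / 2)


def degree_centrality(nodes, neighbours):
    """Number of ties per node, divided by the maximum possible (n - 1)."""
    n = len(nodes)
    if n < 2:
        return {name: 0.0 for name in nodes}
    return {name: len(neighbours[name]) / (n - 1) for name in nodes}


def adjacency_matrix(nodes, neighbours):
    """Row i, column j is 1 if nodes[i] and nodes[j] are tied, else 0."""
    return [[1 if b in neighbours[a] else 0 for b in nodes] for a in nodes]


nodes, neighbours = build_graph(edges)
print(round(density(nodes, neighbours), 3))  # 0.667
print(degree_centrality(nodes, neighbours)["Cat"])  # 1.0
for name, row in zip(nodes, adjacency_matrix(nodes, neighbours)):
    print(name, row)
```

In this toy network there are 4 ties out of 6 possible, and Cat, who is tied to everyone else, has the maximum degree centrality of 1.0.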

Week 3: Readings and exercises

Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing Social Media Networks with NodeXL. Morgan Kaufmann. (Chapter 10)

Papacharissi, Z., & de Fatima Oliveira, M. (2012). Affective News and Networked Publics: The Rhythms of News Storytelling on #Egypt. Journal of Communication, 62(2), 266–282.

Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media, 57(1), 34–52.

  1. Read Hansen et al. Chapter 10 and download your own set of Twitter data to explore and graph your own personal network on Twitter.
  2. Download Twitter search data for a keyword that interests you.

Optional (for creative projects):

  1. Read Chapters 6–11 of Stanton, J. (2013). Introduction to Data Science.
  2. Conduct your own popularity contest to compare and graph Twitter activity around two words or phrases which are in the news right now.
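For the popularity contest, once you have collected tweets for both phrases, the basic comparison is a count of matching tweets per day. Here is a minimal Python sketch, assuming the tweets have already been reduced to (date, text) pairs. The sample tweets and the two phrases are invented, and a real export will have its own columns.

```python
from collections import Counter

# Invented (date, text) pairs standing in for downloaded search results.
tweets = [
    ("2013-03-01", "Watching the budget speech live"),
    ("2013-03-01", "Who will win the election?"),
    ("2013-03-02", "Budget reaction from economists"),
    ("2013-03-02", "Election posters everywhere"),
    ("2013-03-02", "Election debate tonight"),
]


def daily_counts(tweets, phrase):
    """Count tweets per day whose text contains `phrase`, ignoring case."""
    phrase = phrase.lower()
    return Counter(day for day, text in tweets if phrase in text.lower())


for phrase in ("budget", "election"):
    print(phrase, dict(sorted(daily_counts(tweets, phrase).items())))
```

Plain substring matching is crude (it also matches "budgets" or "#budget"), so it's worth deciding what counts as a mention before graphing the two series side by side.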

OpenSocial – Google Code

I was starting to get “profile fatigue” after joining about four social networking sites in the last few years, when I realised how interesting it is to watch people buzz around in swarms from one social site to another. OpenSocial is another good Google idea: a way of hedging bets on which social site will be the darling of the in-crowd and which will turn out to be just the flavour of LAST month. And if OpenSocial takes off, Google gets to keep its strong position as the switch between networks. I wonder whether Facebook will join the list of sites that support OpenSocial, or whether it will stay smugly isolated on its own proprietary continent.
Anyone want to help me work out an Intro to JavaScript assignment for my students which also involves writing a miniature social application with OpenSocial?

Long tails and fat cats: Social networks and inequality

I’ve been fascinated by the idea of the “long tail” and online media for about a year now. The long tail describes the shape of a distribution graph. For example, if you graphed the number of readers for each blog, arranged in descending order of popularity, you’d find that a small number of blogs have a large number of readers, while an incredibly large number of blogs (the long tail) each have only a few readers. The long tail is used to explain how the diversity of online audiences and content has fuelled the growth of new media aggregators and filterers such as Google.
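To make the shape concrete, here is a small Python sketch that ranks items by popularity and measures how much of the total audience sits in the tail rather than the head. The reader counts are invented, generated on the assumption that popularity falls off roughly as 1/rank (a Zipf-like power law); real blog audiences will not follow this curve exactly.

```python
def tail_share(counts, head_size):
    """Share of the total that falls outside the `head_size` most popular items."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    return sum(ranked[head_size:]) / total if total else 0.0


# Invented audience sizes for 1,000 blogs, assuming readership falls off
# roughly as 1/rank; not real data.
readers = [1000 // rank for rank in range(1, 1001)]

share = tail_share(readers, head_size=10)
print(f"Top 10 blogs: {1 - share:.0%} of readers; other 990: {share:.0%}")
```

With this made-up curve, the 990 blogs outside the top ten together hold more readers than the top ten do, which is the sense in which the tail matters in aggregate even though each blog in it is small.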
How the long tail works – for some
Here’s a wry comment on the failure of the Excite search engine. Although the most popular searches on Excite were for predictable terms such as “sex”, “Britney Spears” and “mp3”, 97% of its traffic came from the “long tail”: a hugely diverse range of largely unique queries. While Excite failed to figure out what to do with its long tail, Google (which copied Overture) put it to work. Google is still systematically “optimising” its techniques for making money from this diverse audience by using targeted keyword advertising. This is a huge shift from traditional marketing, which sees audiences as segments or categories. For example, the games industry produces loads of games tailored for “18-35 year old males” (naked women on the box, big weapons, lots of blood and gore), and a much smaller number intended for “tweenie girls” (hot pink box, Barbie etc).
How tagging works
The long-tail approach to marketing doesn’t categorise an audience, but rather plays a game of “tag”. Newspapers traditionally categorise a story as “news”, “sport”, “entertainment” etc. Tags, or “folksonomies”, work by breaking away from fixed categories and allowing an organic and evolving vocabulary for labelling or annotating content. Bloggers tag their posts, and social sites such as delicious allow users to tag content. In keyword advertising, an advertiser “tags” their product or service with a set of keywords, bids for these keywords on a search engine such as Google or Yahoo, and then waits for a user to match the tag with a search query.
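A toy Python sketch of the keyword-matching idea: each advertiser "tags" an ad with keywords and a bid, and a search query is matched to the highest bidder whose keywords overlap with it. The advertisers, keywords and bids are hypothetical, and real search-engine ad auctions take much more than the bid into account (relevance and pricing rules, for example), so this illustrates only the tag-matching step.

```python
# Hypothetical ads: keywords chosen by each advertiser plus a bid per click.
ads = [
    {"advertiser": "ShoeShop", "keywords": {"running", "shoes"}, "bid": 0.40},
    {"advertiser": "TrailCo", "keywords": {"trail", "running"}, "bid": 0.55},
    {"advertiser": "BookBarn", "keywords": {"novels", "books"}, "bid": 0.30},
]


def match_ad(query, ads):
    """Return the highest-bidding ad sharing a keyword with the query, or None."""
    terms = set(query.lower().split())
    candidates = [ad for ad in ads if ad["keywords"] & terms]
    if not candidates:
        return None
    return max(candidates, key=lambda ad: ad["bid"])


print(match_ad("Running shoes", ads)["advertiser"])  # TrailCo
print(match_ad("garden tools", ads))  # None
```

Notice that no audience category appears anywhere: the ad is shown purely because the searcher's words overlap with the advertiser's tags.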
These new patterns of media use have been seen to herald the death of the blockbuster. It’s argued that, as people are free to choose from more diverse sources of entertainment, they are less likely to all flock en masse to see the same films and listen to the same music.
The long tail has also been heralded as good news for small, specialised content producers, who now use the web and search engines to target smaller groups of people with very specific interests. From the perspective of developing countries, then, this surely sounds more democratic, and a move away from homogenised “one-size-fits-all” mass produced “McMedia”. Sadly, it also suggests all sorts of new recipes for inequality – the long tail is indeed a “power law” in more senses than one.
First, here’s a sober explanation of who actually profits from the long tail model of media distribution. The long tail allows fat-cat profits, in many cases because users produce content for the fat cats for free. In the article “Where’s the money in the long tail”, Ventureblog argues that the long tail model only turns a significant profit for media aggregators (e.g. Yahoo’s Flickr) and filterers (e.g. search engines, which get to make money from directing the traffic). So a strong centralising tendency is emerging as people attempt to make sense of the diversity.
Second, as social networks settle, it’s getting harder and harder to get the kind of attention needed to make any kind of a splash — without a major marketing campaign, that is.
The “long tails” seen in graphs of blog audiences do mean that the vast majority of blogs will be read by only a handful of people. Increasingly, well-capitalised media organisations have huge advantages here, since they have the resources to create content, and more importantly, are able to market the content to audiences via other forms of media.
Blogs to riches – the haves and have nots of the blogging boom.
Most residents of the blogoburbs who talk about social networking and social software don’t feel the need to extend their theories to account for the position of whole groups of people who are not connected, or who occupy a marginal position within global social networks – these people are not in the Rolodex.
Here are two ways that developing countries are probably being systematically marginalised in the social networks that rule the web.
* The search engines favour older, more established content through the time bias in ranking systems – this is a particular problem for those in developing countries who arrived at the Internet party unfashionably late. It remains to be seen whether new localised and community-based versions of search will be able to undo this bias.
* It’s who you know – online networking is about making connections with powerful celebrity players, whose viral marketing will get you attention. Alternatively, you need to be promoted in the media consumed by your target audience.
Systems such as Google or Wikipedia are too immense to comprehend easily, and their social effects are similarly complex. Nicholas Carr challenges the technorati and their implicit trust in these statistically “optimised” systems. As he points out, just because something (like the Google algorithm) is technically elegant doesn’t mean we should accept it and all its social consequences.

Where I have a problem is in [the] implicit trust that the optimization of the system, the achievement of the mathematical perfection of the macroscale, is something to be desired. To people, “optimization” is a neutral term. The optimization of a complex mathematical, or economic, system may make things better for us, or it may make things worse. It may improve society, or degrade it. We may not be able to apprehend the ends, but that doesn’t mean the ends are going to be good.

Read Carr’s whole entry here.