Assignment 3: Visualizing Networks and Final Assignment

Deadline: 16-01-2023 9:00AM

The third assignment and the final assignment have been merged. You now only have to hand in one assignment, consisting of two parts: visualizing networks, and a free-choice final assignment.

I. Visualizing Networks

The assignment here is to analyze and visualize a social network of Star Wars characters, as well as to understand the measures used to analyze this data-set.

Data

The data for this assignment can be found on the Datasets page. It is taken from Gabasova (2016) and consists of network data on the number of interactions and mentions between characters per scene. Because Gabasova’s network data is in json format and we haven’t discussed that format in detail, I have processed the json files into separate csv files for nodes and edges. See the metadata of both Gabasova’s original data and of my csvs.

Analysis and visualization

Analyze the network in terms of centrality and modularity, and visualize it with Gephi or any other appropriate software.
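
If you want to double-check Gephi’s numbers, the following is a minimal sketch of the same measures in Python with networkx. It is for orientation only; the file names and column names (id, label, source, target, weight) are assumptions, so consult the csv metadata for the actual ones.

    # Minimal sketch (Python + networkx). File and column names are assumptions;
    # consult the csv metadata for the real ones.
    import csv
    import networkx as nx

    G = nx.Graph()

    # Nodes file: assumed columns "id" and "label".
    with open("starwars-nodes.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            G.add_node(row["id"], label=row.get("label", row["id"]))

    # Edges file: assumed columns "source", "target", "weight".
    with open("starwars-edges.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            G.add_edge(row["source"], row["target"], weight=float(row.get("weight", 1)))

    # Ten highest-ranking nodes by (unweighted) betweenness and by degree.
    betweenness = nx.betweenness_centrality(G)
    degree = dict(G.degree())
    print(sorted(betweenness, key=betweenness.get, reverse=True)[:10])
    print(sorted(degree, key=degree.get, reverse=True)[:10])

    # Louvain communities (networkx >= 2.8) and their sizes.
    communities = nx.community.louvain_communities(G, weight="weight", seed=42)
    for i, community in enumerate(communities):
        print(f"community {i}: {len(community)} nodes")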

Report

Your report (500-1500 words) should be in PDF and should contain:

  • A clear reference to the network data used
  • A discussion of your design decisions
  • The following measures of your network:
    • a table with the ten highest-ranking nodes as measured by Betweenness and Degree, and an interpretation of these measures with respect to the data
    • the number of nodes per Louvain community and, if possible, an interpretation of the communities
  • An explanation, in your own words and with reference to outside sources where applicable, of:
    • What is the difference between Betweenness and Closeness?
    • How does Louvain Modularity work? (For reference, the modularity formula that Louvain optimizes is given just below this list.)
  • A discussion of challenges and opportunities (if any)
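
For reference (this is not a substitute for your own explanation): the Louvain method repeatedly moves nodes between communities and then aggregates communities into super-nodes, each time trying to increase the modularity Q of the partition, commonly written as

    Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)

where A_ij is the (weighted) edge between nodes i and j, k_i is the (weighted) degree of node i, m is the total edge weight in the network, and δ(c_i, c_j) equals 1 when i and j are in the same community and 0 otherwise.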

II. Free Choice

The aim of the second part of the final project is to create a self-contained information visualization of a story contained within a data-set of your choice. What does this entail?

Self-contained

Self-contained in this case means that your visualization can be viewed and understood independently.

  • viewed independently: you should not have to download specialized software to view the visualization (i.e. it is in a familiar file format such as .png or .pdf).
  • understood independently: you do not have to read a meta-data file, your report, or any other external source to understand the basic story your visualization is trying to convey. Links to external sources may be present in the visualization, but they should only act as additional information for people who, for example, want to check your references or read further into the topic.

Story

Find a story that you would want to visualize. I suggest starting from a phenomenon you are interested in and looking for a suitable set of data you can use for this (see below). Find a story in this data, but do not worry too much if this story does not overturn your or anyone else’s expectations. The most important thing is to independently find and visualize information.

Data-set(s) of your choice

The idea for this project is not that you create an entirely new set of data yourself through research, as that would take too long (save that for your Master’s and potentially PhD thesis). Instead, you should use one or more data-sets that are already available as the basis for your information visualization.

There are a lot of data-sets available online. The following is a selection of data-sets and portals that may provide what you need (in no particular order):

  • UN Data: The portal for all sorts of United Nations macro-scale data.
  • IATI Registry: A portal where major development NGOs provide information on the projects they undertake in as transparent a way as possible.
  • Natural Earth: Free raster and vector maps, including info on borders, names, and more. 
  • SEDAC: NASA’s Socio-Economic Data and Applications Center
  • European Data Portal: A portal that collects even more data from across the public sector in Europe.
  • DANS: The KNAW’s (Koninklijke Nederlandse Akademie van Wetenschappen) data repository, curating more than 200,000 data-sets based on (Dutch) academic research.
  • The Trans-Atlantic Slave Trade Database: A specific data-set focused on the records of 36,000 slaving voyages.
  • Connected Histories: Integrated search for British history sources from 1500-1900, including Old Bailey Online, the Digital Panopticon, and many more.
  • Delpher: Over 100 million digitized Dutch newspaper pages.
  • Stanford Large Network Dataset Collection: It’s both a large collection and a collection of large networks!
  • KONECT: Another network collection (also large, though not all of its networks are large) from the University of Koblenz.
  • Data.World: A commercial platform that hosts a large collection of data-sets from various sources. Not all data are open, however, and a sign-up is required.

It is virtually impossible to keep track of all the Open Data collections or portals, let alone data-sets, out there. If you find a good one that you think should be listed here for the benefit of future participants, let me know!

Report

Aside from the visualization you will also write a report (max. 1500 words) that contains:

  • An explanation of the story you are visualizing
  • The data and design tools you used and the choices you made in creating this visualization, described in enough detail that a reader could in theory roughly replicate the visualization
    • do not forget to reference course literature or other sources of information where appropriate!
  • Challenges encountered during the project and opportunities you see for this or similar information visualizations
  • A project timetable, including an overview of time spent on the project and on optional self-learning

Hand-in

You will have to hand in the following before 16 January 2023, 9:00 AM, via Brightspace:

  • The visualization(s) you created in PDF or PNG for both parts I and II
  • Your reports for parts I and II separately

Grading

Grading will be based on the following elements:

  • Mind at work (3 points)
  • Visualization(s) (4 points)
  • Network analysis (1 point)
  • Report (2 points)

Resit

Deadline: 01-03-2023 9:00AM

The resit of the third assignment and the final assignment is due before 1 March 2023, 9:00 AM. It is essentially the same as the final assignment, with a few differences. Read closely:

I. Visualizing Networks

The assignment here is to analyze and visualize a social network of Lord of the Rings characters, as well as to understand (and to show that you understand) the measures used to analyze this data-set.

Data

The data for this assignment can be found on the Datasets page. It is taken from GitHub user Raphtory and consists of network data on the number of sentences in which two characters co-occur over the entirety of The Lord of the Rings. Because this data has a slightly different structure, I have merged it into a single Gephi-usable csv for you.

Analysis and visualization

Analyze the network in terms of centrality and modularity, and visualize it with Gephi or any other appropriate software.

Report

Your report (500-1500 words) should be in PDF and should contain:

  • A clear reference to the network data used
  • A discussion of your design decisions
  • The following measures of your network:
    • a table with the ten highest-ranking nodes as measured by Closeness and Degree, and an interpretation of these measures with respect to the data
    • the number of nodes per Louvain community and, if possible, an interpretation of the communities
  • An explanation, in your own words and with reference to outside sources where applicable, of:
    • How does edge weight influence centrality in general, and Betweenness centrality in particular? (A small code sketch for experimenting with this follows below this list.)
    • How does Louvain Modularity work (and how does edge weight influence it)?
  • A discussion of challenges and opportunities (if any)
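
If you want to experiment with the edge-weight questions empirically, below is a minimal Python/networkx sketch, assuming a Gephi-style edge list with Source, Target and Weight columns; the actual file and column names are assumptions, so check the csv I provide.

    # Minimal sketch (Python + pandas + networkx); file and column names are assumptions.
    import pandas as pd
    import networkx as nx

    edges = pd.read_csv("lotr-edges.csv")
    G = nx.from_pandas_edgelist(edges, source="Source", target="Target", edge_attr="Weight")

    # Shortest-path based centralities read edge weights as *costs*, so a
    # co-occurrence count has to be inverted before it can act as a distance.
    for u, v, d in G.edges(data=True):
        d["distance"] = 1.0 / d["Weight"]

    unweighted = nx.betweenness_centrality(G)                   # ignores weights
    weighted = nx.betweenness_centrality(G, weight="distance")  # uses inverted weights

    # Louvain, in contrast, reads weights as tie *strengths*, so the raw counts
    # can be passed in directly.
    communities = nx.community.louvain_communities(G, weight="Weight", seed=42)

Comparing the unweighted and weighted rankings for a few characters is a good starting point for your written answer.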

II. Free Choice

The second part of the final project’s aim is to create a self-contained information visualization of a story contained within a data-set of your choice. This is the same as Part II of Assignment 3 (i.e. the non-resit final assignment; see above).

However, you are not allowed to use the same data-set as you did in your first attempt.