Visual network analysis with Gephi
Tutorial 03 in a series on controversy mapping
Anders Kristian Munk, Feb 12
In this tutorial, we will cover the basics of doing a visual network analysis in Gephi.
We will use the networks produced in Tutorial 02 as case examples.
Gephi is a piece of software for manipulating network graphs in a visual and explorative format.
The principles of visual network analysis are covered by Venturini et al. (2015) and the design of Gephi specifically by Bastian et al.
Some basic operations: Layout, coloring, node size
Networks (a.k.a. graphs) can be stored in a variety of file formats, such as .gdf or .gml, which are basically two lists: one with nodes, and one with edges connecting nodes.
Depending on the format, nodes and edges can have a variety of attributes associated with them (e.g. color, category, weight, etc.).
The native graph file format for Gephi is .gexf (this is the file format we will be using for our graphs), but Gephi will load most other graph file formats without problems.
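To make the "two lists" idea concrete, here is a minimal sketch that writes a tiny .gexf document with Python's standard library. The three pages, the links between them, and the weights are made up for illustration, and only the basic GEXF 1.2 elements (nodes, edges, ids, labels, weights) are used.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy data: three pages and the links between them.
nodes = [("0", "Page A"), ("1", "Page B"), ("2", "Page C")]
edges = [("0", "1", 1.0), ("1", "2", 2.0)]  # (source id, target id, weight)

def build_gexf(nodes, edges):
    """Return a GEXF 1.2 document as an ElementTree element."""
    root = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})
    # First list: the nodes, each with an id and a label.
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": label})
    # Second list: the edges, each pointing from a source node to a target node.
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target, weight) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(i), "source": source, "target": target, "weight": str(weight),
        })
    return root

gexf_text = ET.tostring(build_gexf(nodes, edges), encoding="unicode")
print(gexf_text)
```

Saving the printed text to a file ending in .gexf should give you something Gephi can open, although real exports usually carry more attributes than this.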
When you open a graph file, the following import report should appear (I am opening the network of Circumcision pages connected by in-text links).
Opening a .gexf file (or any other graph file format) generates an ‘import report’.
Generally, you should be able to recognize the number of nodes and edges displayed here.
Also, if Gephi encounters problems opening the graph file, these will be displayed in the ‘Issues’ window.
You will normally be able to open the file anyway.
Click ‘OK’ if you are happy with the information you see.
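If you want to know in advance which numbers to expect in the import report, you can count the nodes and edges yourself. The sketch below does that for a small, made-up GEXF document embedded as a string; for a real file you would read the text from disk instead.

```python
import xml.etree.ElementTree as ET

# A tiny GEXF document standing in for the real file (hypothetical content).
GEXF_SAMPLE = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="Page A"/>
      <node id="1" label="Page B"/>
      <node id="2" label="Page C"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>"""

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def count_nodes_and_edges(gexf_text):
    """Return (number of nodes, number of edges) found in a GEXF string."""
    root = ET.fromstring(gexf_text)
    names = [local_name(element.tag) for element in root.iter()]
    return names.count("node"), names.count("edge")

node_count, edge_count = count_nodes_and_edges(GEXF_SAMPLE)
print(node_count, "nodes,", edge_count, "edges")
```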
The graph will open and be displayed in a random layout (i.e. nodes placed randomly in space) such as the one below.
Initial random layout of the graph in the ‘Overview’ pane.
You can turn node labels on and off by clicking the big ‘T’ in the menu at the bottom of the ‘Graph’ window, and you can scale the label sizes by using the slider on the right side of the menu at the bottom of the ‘Graph’ window.
If you want to see all the information for a specific node you can use the ‘Edit node attributes’ tool (cursor with a small question mark) located in the menu on the left side of the ‘Graph’ window.
You can zoom in and out of the graph using the mouse scroll.
You can also right-click and drag the graph to reposition your view.
Layout
In order to help us explore the structure of the graph and see clusters, bridges, and structural holes, we can use a force-directed layout algorithm.
Force-directed layouts (or force vector or spring-based layouts) will push nodes apart from each other.
Edges between nodes will act as springs pulling these nodes together.
Stronger edges (heavier edge weights) will act as stronger springs.
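The push and pull described above can be sketched in a few lines of Python. This is a toy layout written for illustration, not ForceAtlas2: the force formulas, the step size, the circular starting positions (chosen so the result is reproducible) and the example weights are all assumptions.

```python
import math

def force_layout(nodes, edges, iterations=1000, step=0.02, repulsion=1.0):
    """Toy force-directed layout: all pairs repel, edges pull like springs."""
    # Start on a circle so the result is reproducible (Gephi starts from random positions).
    pos = {
        n: [math.cos(2 * math.pi * i / len(nodes)), math.sin(2 * math.pi * i / len(nodes))]
        for i, n in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, fading with distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-6)
                push = repulsion / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along edges: the pull grows with distance and with edge weight.
        for a, b, weight in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += weight * dx
            force[a][1] += weight * dy
            force[b][0] -= weight * dx
            force[b][1] -= weight * dy
        # Move every node a small step in the direction of its net force.
        for n in nodes:
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return pos

def distance(layout, a, b):
    """Euclidean distance between two nodes in a layout."""
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])

# Hypothetical example: a heavy edge between a and b, a light edge between b and c.
nodes = ["a", "b", "c"]
edges = [("a", "b", 5.0), ("b", "c", 1.0)]
layout = force_layout(nodes, edges)
print(distance(layout, "a", "b"), distance(layout, "b", "c"))
```

In this sketch the heavily weighted pair ends up closer together than the lightly weighted pair, which is the behavior to look for when reading a force-directed layout.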
We can choose the ForceAtlas2 layout from the ‘Layout’ dropdown on the left side of the ‘Graph’ window.
Choosing a layout algorithm.
Before running the layout let us review some of the parameters available to us.
Eventually, the idea is to iteratively keep tweaking these parameters as we see the result of the layout.
This, then, is just the initial setup.
‘Scaling’ will control the size of the area over which your layout can spread.
If you need more space between nodes, increasing the scaling is a good option.
Gravity controls the degree to which nodes are pulled towards the center of the network.
If you have parts of the network floating far away from the center, increasing gravity is a good option.
Finally, if you have a cluttering of nodes overlapping each other you can turn on ‘Prevent overlap’.
However, this option should not be used until the Layout is otherwise complete as it may prevent nodes from finding their place.
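To get a feeling for what ‘Scaling’ and ‘Gravity’ do, it can help to look at the forces as numbers. The sketch below follows the force formulas as we understand them from the published description of ForceAtlas2; treat them as an assumption, since Gephi's actual implementation may differ in detail. The function names and the example values are ours.

```python
def repulsion_force(scaling, degree_a, degree_b, distance):
    """Repulsion between two nodes: multiplied by 'Scaling', fading with distance."""
    return scaling * (degree_a + 1) * (degree_b + 1) / distance

def gravity_force(gravity, degree, distance_to_center, stronger=False):
    """Pull towards the center; the 'Stronger' variant grows with distance."""
    pull = gravity * (degree + 1)
    return pull * distance_to_center if stronger else pull

# Hypothetical numbers: two nodes of degree 2 that are 20 units apart.
print(repulsion_force(10, 2, 2, 20.0))   # Scaling 10
print(repulsion_force(50, 2, 2, 20.0))   # Scaling 50 pushes five times harder

# A node of degree 2 floating 100 units away from the center.
print(gravity_force(1.0, 2, 100.0))                 # regular gravity ignores the distance
print(gravity_force(1.0, 2, 100.0, stronger=True))  # stronger gravity scales with it
```

Under these assumptions, raising the scaling spreads the whole layout out, while the ‘Stronger’ gravity option acts most on exactly those parts of the network that have drifted far away.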
When your parameters are set, click ‘Run’.
Setting the parameters of the ForceAtlas2 layout.
The result of a ForceAtlas2 layout with Scaling 50, and Gravity set to ‘Stronger’ but decreased to 0.
Size
Given that this is a directed network where one page points to another page through a link, it could make sense to visualize the most cited nodes.
We can do that in the ‘Appearance’ pane to the left of the ‘Graph’ window.
Select ‘Nodes’, ‘Ranking’ and the icon with growing concentric circles.
From the dropdown, select ‘In-Degree’.
This will size the nodes by the volume of incoming edges from other nodes in the network.
Set minimum and maximum size for the nodes and click ‘Apply’.
Setting node size in the ‘Appearance’ pane.
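What the ranking does can be sketched as follows: count the incoming edges for each node, then stretch those counts between the minimum and maximum size. The edge list and the size range are made up, and the plain linear interpolation is an assumption about the default behavior.

```python
from collections import Counter

# Hypothetical directed edges: (source page, target page).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "C"), ("A", "C")]

def in_degree(nodes, edges):
    """Count the incoming edges for every node (0 if nothing points to it)."""
    counts = Counter(target for _, target in edges)
    return {node: counts.get(node, 0) for node in nodes}

def rank_to_size(values, min_size=10.0, max_size=50.0):
    """Map each value linearly onto the range [min_size, max_size]."""
    lowest, highest = min(values.values()), max(values.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all values are equal
    return {
        node: min_size + (value - lowest) / span * (max_size - min_size)
        for node, value in values.items()
    }

degrees = in_degree(nodes, edges)
sizes = rank_to_size(degrees)
print(degrees)
print(sizes)
```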
Nodes sized by in-degree.
Color
Finally, let us add some color to the nodes.
As is the case with sizing, we can color nodes by many different parameters.
In this case, we will calculate the modularity of the graph and color the nodes by the resulting modularity classes.
From the ‘Statistics’ pane on the right of the ‘Graph’ window, run ‘Modularity’.
For now, we can use default settings.
The algorithm will try to cut the graph into communities where nodes are strongly related to each other inside the community and weakly connected to nodes in other communities.
Click ‘OK’ to run the algorithm.
Running the Modularity algorithm.
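The idea of "strongly related inside, weakly connected outside" can be expressed as a score. The sketch below computes Newman's modularity Q for a partition that we supply by hand on a small, made-up undirected graph; it only scores a given partition and does not search for the best one, which is the part Gephi does for us.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman's modularity Q for an undirected, unweighted graph and a given partition."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    # For each community: share of internal edges minus the share expected by chance.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical graph: two triangles joined by a single bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting the graph along the bridge scores higher than lumping everything together, which is why the two triangles would be colored as separate modularity classes.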
When the results are in we can open the ‘Appearance’ pane on the left side of the ‘Graph’ window and select ‘Nodes’, ‘Partition’ and the color palette icon.
From the dropdown menu, select ‘Modularity Class’ and click ‘Apply’.
Coloring nodes by Modularity Class.
Nodes colored by modularity.
More advanced stuff
Now that we have mastered the basics of graph layout, colors, and node sizing, we can think about ways of applying those operations in combination with various kinds of network statistics, different ways of filtering the graph, or additional overlaid information about nodes and edges.
Adding and visualizing node attributes
I have run the script that queries my Wikipedia category member pages for mentions of specific keywords.
This produces a .csv output (in this case I have queried members of the Circumcision category for mentions of terms like ‘Muslim’ or ‘Jewish’).
If I want to use that information in Gephi, I need to import the .csv file into a project with a network of the same nodes.
In this case, I have a network opened already with member pages from the Circumcision category connected by in-text links.
If I switch from ‘Overview’ to ‘Data Laboratory’ (very top above the workspace tabs) I can see the list of nodes in my network displayed as a table.
Currently, there are no attributes except for the ‘Modularity Class’ I computed for coloring earlier in this tutorial.
The Data Laboratory showing a table of nodes.
To import the .csv with keyword mentions, click ‘Import Spreadsheet’ in the bar above the nodes table.
Select the .csv file from the right location on your computer.
We now have to go through 3 steps to complete the import.
First, some general CSV options will appear.
You just have to check that the rows and columns are parsed correctly.
Below I can see my keywords figuring as independent columns with a row for each page.
That looks right.
CSV import, first window.
Second, I need to check if the values are parsed correctly.
Since my values are numbers (counting mentions of a keyword for each page) the values should be parsed as integers (not strings).
This also looks right.
CSV import, second window.
Third, I need to make sure that the import is appending the .csv to my existing network rather than opening the .csv as a new workspace.
This is not the default setting.
Therefore, I have to change the tick-box to ‘Append to existing workspace’ before clicking ‘OK’.
CSV import, third window.
When I inspect the nodes table in the ‘Data Laboratory’ I should now be able to see my existing nodes, still with the ‘Modularity Class’ column intact, but also with the new columns for keyword counts that I just imported.
In this particular case, if I scroll to the bottom of the node list, a few nodes have been created that were not in my network already.
I can spot them because they do not have a ‘Label’.
The reason for this redundancy is the fact that my network only includes those pages from the Circumcision category that have in-text links to other pages from the Circumcision category.
My scrape for keywords, however, returns results on all pages from the Circumcision category, including some that are not linked in my network.
We can select these redundant nodes on the table, right-click the selection and delete it.
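The append-and-clean-up step can also be sketched outside Gephi. The example below assumes that the .csv has an ‘Id’ column matching the node ids in the network; the page names and counts are made up. Unlike Gephi, which creates the extra nodes and leaves them without a label, the sketch simply lists the rows that have no matching node.

```python
import csv
import io

# Hypothetical node table from the network: Id -> Label.
network_nodes = {"Page A": "Page A", "Page B": "Page B"}

# Hypothetical keyword counts; "Page C" is in the category but not in the network.
KEYWORD_CSV = """Id,Jewish,Muslim
Page A,24,1
Page B,2,18
Page C,0,0
"""

def append_attributes(nodes, csv_text):
    """Attach the CSV columns to existing nodes; return (merged table, unmatched ids)."""
    merged = {node_id: {"Label": label} for node_id, label in nodes.items()}
    unmatched = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node_id = row.pop("Id")
        # Parse the mention counts as integers rather than strings.
        counts = {keyword: int(value) for keyword, value in row.items()}
        if node_id in merged:
            merged[node_id].update(counts)
        else:
            unmatched.append(node_id)  # these correspond to the redundant rows
    return merged, unmatched

merged, unmatched = append_attributes(network_nodes, KEYWORD_CSV)
print(merged)
print("Not in the network:", unmatched)
```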
Deleting redundant nodes in the Data Laboratory after the .csv import.
When we switch back to the ‘Overview’ from the ‘Data Laboratory’ we will now have some new options available in the ‘Appearance’ tab.
Below I first size the nodes by the degree to which they mention the term ‘Jewish’ and then color them by the degree to which they mention the term ‘Jewish’.
In both cases, I click the ‘Spline…’ option in the bottom left corner of the ‘Appearance’ pane before applying sizes or colors.
This allows me to use a different (non-linear) distribution of the mention counts when I visualize them.
Given that many pages mention ‘Jewish’ once or twice, whereas a few mention ‘Jewish’ more than 20 times, and given that I am interested in making all mentions visible, I choose a distribution that gives higher weight to the smaller mention counts.
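The effect of such a non-linear distribution can be sketched with a simple power function. This is a stand-in, not Gephi's spline editor, where you pick or draw the curve yourself; the exponent, the size range and the mention counts are assumptions chosen for illustration.

```python
def scaled_size(count, max_count, min_size=10.0, max_size=50.0, exponent=1.0):
    """Map a mention count onto a node size.

    exponent=1.0 is a linear mapping; an exponent below 1 bends the curve so
    that small counts receive a larger share of the size range.
    """
    share = (count / max_count) ** exponent
    return min_size + share * (max_size - min_size)

# Hypothetical mention counts: most pages mention the term once or twice, one page 25 times.
counts = [1, 2, 25]
for count in counts:
    linear = scaled_size(count, max(counts))
    curved = scaled_size(count, max(counts), exponent=0.5)
    print(count, round(linear, 1), round(curved, 1))
```

With the curved mapping, a page with a single mention is drawn visibly larger than under the linear mapping, while the page with the most mentions keeps the maximum size in both cases.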
Sizing nodes by mentions of the term ‘Jewish’.
Nodes sized by the degree to which they mention ‘Jewish’.
Selecting a non-linear spline for node sizes.
Nodes resized with a non-linear spline to give more equal visual presence to nodes that only mention ‘Jewish’ once or twice.
Coloring nodes by mentions of the term ‘Jewish’ (also with a non-linear spline).
References (for the entire tutorial series)
Barry, A. The anti-political economy. Economy and society, 31(2), 268–284. (16 pages)
Bastian, M., Heymann, S., & Jacomy, M. Gephi: an open source software for exploring and manipulating networks. ICWSM, 8, 361–362.
, & Matamoros-Fernández, A. Mapping sociocultural controversies across digital media platforms: one week of #gamergate on Twitter, YouTube, and Tumblr. Communication Research and Practice, 2(1), 79–96. (17 pages)
Callon, M. The role of lay people in the production and dissemination of scientific knowledge. Science, Technology and Society, 4(1), 81–94. (13 pages)
Collins, H. The seven sexes: A study in the sociology of a phenomenon, or the replication of experiments in physics. Sociology, 9(2), 205–224. (19 pages)
DiSalvo, C., Lukens, J., Lodato, T., Jenkins, T., & Kim, T. Making public things: how HCI design can express matters of concern. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. (9 pages)
Elgaard Jensen, T., Kleberg Hansen, A., Ulijaszek, S., Munk, A., Madsen, A., Hillersdal, L. and Jespersen, A. Identifying notions of environment in obesity research using a mixed-methods approach. (10 pages)
Epstein, S. The construction of lay expertise: AIDS activism and the forging of credibility in the reform of clinical trials. Science, Technology, & Human Values, 20(4), 408–437. (29 pages)
Jasanoff, S. Genealogies of STS. Social Studies of Science, 42(3), 435–441. (6 pages)
Latour, B. Why has critique run out of steam? From matters of fact to matters of concern. Critical inquiry, 30(2), 225–248. (23 pages)
Latour, B., Jensen, P., Venturini, T., Grauwin, S., and Boullier, D. ‘The whole is always smaller than its parts’ – a digital test of Gabriel Tardes’ monads. The British Journal of Sociology, 2012, 63(4), 590–615. (25 pages)
Law, J., & Singleton, V. ANT, multiplicity and policy. Critical policy studies, 8(4), 379–396. (18 pages)
Marres, N. Issues spark a public into being: A key but often forgotten point of the Lippmann-Dewey debate. Making things public: Atmospheres of democracy, 208–217. (9 pages)
Marres, N. Why map issues? On controversy analysis as a digital method. Science, Technology, & Human Values, 40(5), 655–686. (31 pages)
Marres, N., & Moats, D. Mapping controversies with Social Media: The case for symmetry. Social Media + Society, 1(2). (17 pages)
Merton, R. Priorities in scientific discovery: a chapter in the sociology of science. American sociological review, 22(6), 635–659. (24 pages)
Munk, A. Mapping Wind Energy Controversies Online: Introduction to Methods and Datasets. (24 pages)
Pinch, T., & Leuenberger, C. Studying scientific controversy from the STS perspective. Concluding remarks on panel ‘Citizen Participation and Science and Technology (11 pages)
Rogers, R. Digital Traces in Context| Otherwise Engaged: Social Media from Vanity Metrics to Critical Analytics. International Journal of Communication, 12, 23. (22 pages)
Rogers, R. Foundations of Digital Methods: Query Design. In M. Schaefer & K. van Es (Eds.), The Datafied Society: Studying Culture through Data (pp. 75–94). Amsterdam University Press. (19 pages)
Stengers, I. The cosmopolitical proposal. Making things public: Atmospheres of democracy, 994–1003. (9 pages)
Thompson, C. When elephants stand for competing philosophies of nature: Amboseli National Parc, Kenya. Law et A. ), Complexities, 166–190. (24 pages)
Venturini, T. Diving in magma: how to explore controversies with actor-network theory. Public understanding of science, 19(3), 258–273. (25 pages)
Venturini, T. Building on faults: how to represent controversies with digital methods. Public understanding of science, 21(7), 796–812. (16 pages)
Venturini, T., Baya Laffite, N., Cointet, J., Gray, I., Zabban, V., & De Pryck, K. Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, 1(2). (19 pages)
Venturini, T., Jacomy, M., & Carvalho, P. Visual Network Analysis.
Mapping knowledge controversies: science, democracy and the redistribution of expertise. Progress in Human Geography, 33(5), 587–598. (11 pages)
Whatmore, S., & Landström, C. Flood apprentices: an exercise in making things public. Economy and society, 40(4), 582–610.