Module 2, Practical 8

In this practical we will keep working with data structures. In particular, we will see a very important and quite complex data structure called graph.

Graphs

Graphs are mathematical structures made of two key elements: nodes (or vertices) and edges.
Nodes are objects that we want to represent and edges are relationships among these objects. Mathematically, a graph is an entity \(G = (N,E)\) where \(N\) is a set of nodes and \(E = N \times N\) is the set of edges.

Nodes are most often represented as circles, while edges (relationships) are represented by lines/arrows connecting the nodes. An example follows:

_images/graph_1.png

Relations represented by edges can be symmetric (e.g. sibling_of: if \(X\) is sibling of \(Y\) then \(Y\) is sibling of \(X\)) and in this case the edges are just lines rather than arrows. When edges are symmetric the graph is undirected.

If relationships are not symmetric (i.e. \(X \rightarrow Y\) does not imply \(Y \rightarrow X\)) we put an arrow to indicate the direction of the relationship among the nodes and in this case we say the graph is directed.

Some terminology (from the lecture):

_images/graphTerms.png

The degree of a node is the number of connections it has with other nodes. In directed graphs the in-degree is the number of incoming edges, while the out-degree is the number of outgoing edges.

_images/degree.png

A path in the graph is a sequence of nodes connected by edges.

_images/path.png

Graph ADT

Graphs are dynamic data structures in which nodes and edges can be added/removed. The description of the Graph Abstract Data Type follows (from the lecture):

_images/graphADT.png

This is the most general definition, as in some cases nodes and edges can only be added and not removed.

There are two classic ways of implementing a Graph: adjacency matrices and linked lists.

Implementation as adjacency matrix

A square matrix \(G\) having the size \(N \times N\) where \(N\) is the number of nodes, is used to represent every possible connection among the nodes of the graph. In particular \(G[i,j] = 1\) if the graph has an edge connecting node \(i\) to node \(j\), if that is not the case \(G[i,j] = 0\).

An example of graph as adjacency matrix follows (from lecture):

_images/adjacency.png

This representation of a graph has both advantages and disadvantages:

  • it is quite flexible as it is possible to put weights on the values of the matrix instead of only 0 and 1;

  • it is quite quick to check the presence of an edge (both ways!): this just requires a lookup in the matrix G;

  • it uses a lot of space and most of the values often are 0 (sparse matrix, a lot of space is therefore wasted);

  • in undirected graphs, the matrix is symmetric therefore half of the space could be saved.

Let’s see how we can implement a directed weighted graph as an adjacency matrix in Python.

[1]:
class DiGraphAsAdjacencyMatrix:
    def __init__(self):
        self.__nodes = list() # a set would be better, but we need an index to define
                              # the adjacency matrix
        self.__matrix = list()

    def __len__(self):
        """gets the number of nodes"""
        return len(self.__nodes)

    def nodes(self):
        return self.__nodes
    def matrix(self):
        return self.__matrix

    def __str__(self):
        header = "\t".join([n for n in self.__nodes])
        data = ""
        for i in range(0,len(self.__matrix)):
            data += str(self.__nodes[i]) + "\t"
            data += "\t".join([str(x) for x in self.__matrix[i]]) + "\n"

        return "\t"+ header +"\n" + data

    def insertNode(self, node):
        #add the node if not there already.
        if node not in self.__nodes:
            self.__nodes.append(node)
            #add a row and a column of zeros in the matrix
            if len(self.__matrix) == 0:
                #first node
                self.__matrix = [[0]]
            else:
                N = len(self.__nodes)
                for row in self.__matrix:
                    row.append(0)
                self.__matrix.append([0 for x in range(N)])

    def insertEdge(self, node1, node2, weight):
        i = -1
        j = -1
        if node1 in self.__nodes:
            i = self.__nodes.index(node1)
        if node2 in self.__nodes:
            j = self.__nodes.index(node2)
        if i != -1 and j != -1:
            self.__matrix[i][j] = weight

    def deleteEdge(self, node1,node2):
        """removing an edge means setting its
        corresponding slot in the matrix to 0"""
        i = -1
        j = -1
        if node1 in self.__nodes:
            i = self.__nodes.index(node1)
        if node2 in self.__nodes:
            j = self.__nodes.index(node2)
        if i != -1 and j != -1:
            self.__matrix[i][j] = 0

    def deleteNode(self, node):
        """removing a node means removing
        its corresponding row and column in the matrix"""
        i = -1

        if node in self.__nodes:
            i = self.__nodes.index(node)
        #print("Removing {} at index {}".format(node, i))
        if node != -1:
            self.__matrix.pop(i)
            for row in self.__matrix:
                row.pop(i)
            self.__nodes.pop(i)

    def adjacent(self, node):
        """Your treat! (see exercise 1)"""

    def edges(self):
        """Your treat! (see exercise1). Returns all the edges"""

if __name__ == "__main__":
    G = DiGraphAsAdjacencyMatrix()

    for i in range(6):
        n = "Node_{}".format(i+1)
        G.insertNode(n)

    for i in range(0,4):
        n = "Node_" + str(i+1)
        six = "Node_6"
        n_plus = "Node_" + str((i+2) % 6)
        G.insertEdge(n, n_plus,0.5)
        G.insertEdge(n, six,1)
    G.insertEdge("Node_5", "Node_1", 0.5)
    G.insertEdge("Node_5", "Node_6", 1)
    G.insertEdge("Node_6", "Node_6", 1)
    print(G)

    G.insertNode("Node_7")
    G.insertEdge("Node_1", "Node_7", -1)
    G.insertEdge("Node_2", "Node_7", -2)
    G.insertEdge("Node_5", "Node_7", -5)
    G.insertEdge("Node_7", "Node_2", -2)
    G.insertEdge("Node_7", "Node_3", -3)

    print("Size is: {}".format(len(G)))
    print("Nodes: {}".format(G.nodes()))
    print("\nMatrix:")
    print(G)
    G.deleteNode("Node_7")
    G.deleteEdge("Node_6", "Node_2")
    #no effect, nodes do not exist!
    G.insertEdge("72", "25",3)
    print(G)
        Node_1  Node_2  Node_3  Node_4  Node_5  Node_6
Node_1  0       0.5     0       0       0       1
Node_2  0       0       0.5     0       0       1
Node_3  0       0       0       0.5     0       1
Node_4  0       0       0       0       0.5     1
Node_5  0.5     0       0       0       0       1
Node_6  0       0       0       0       0       1

Size is: 7
Nodes: ['Node_1', 'Node_2', 'Node_3', 'Node_4', 'Node_5', 'Node_6', 'Node_7']

Matrix:
        Node_1  Node_2  Node_3  Node_4  Node_5  Node_6  Node_7
Node_1  0       0.5     0       0       0       1       -1
Node_2  0       0       0.5     0       0       1       -2
Node_3  0       0       0       0.5     0       1       0
Node_4  0       0       0       0       0.5     1       0
Node_5  0.5     0       0       0       0       1       -5
Node_6  0       0       0       0       0       1       0
Node_7  0       -2      -3      0       0       0       0

        Node_1  Node_2  Node_3  Node_4  Node_5  Node_6
Node_1  0       0.5     0       0       0       1
Node_2  0       0       0.5     0       0       1
Node_3  0       0       0       0.5     0       1
Node_4  0       0       0       0       0.5     1
Node_5  0.5     0       0       0       0       1
Node_6  0       0       0       0       0       1

The matrix above represents the following graph:

_images/star.png

Download the complete source file: DiGraphAsAdjacencyMatrix.py

Exercises

  1. Consider the DiGraphAsAdjacencyMatrix graph class. Add the following methods:

  • adjacent(self, node) : given a node returns all the nodes connected to it;

  • adjacentEdge(self, node, incoming=True) : given a node, returns all the nodes close to it (incoming if “incoming=True” or outgoing if “incoming = False”) as a list of triplets (source_node, dest_node, weight);

  • edges(self) : returns all the edges in the graph as triplets (i,j, weight);

  • edgeIn(self, node1, node2) : check if the edge node1 –> node2 is in the graph;

Test the code with:

G = DiGraphAsAdjacencyMatrix()
for i in range(6):
    n = "Node_{}".format(i+1)
    G.insertNode(n)

for i in range(0,4):
    n = "Node_" + str(i+1)
    six = "Node_6"
    n_plus = "Node_" + str((i+2) % 6)
    G.insertEdge(n, n_plus,0.5)
    G.insertEdge(n, six,1)
G.insertEdge("Node_5", "Node_1", 0.5)
G.insertEdge("Node_5", "Node_6", 1)
G.insertEdge("Node_6", "Node_6", 1)


G.insertNode("Node_7")
G.insertEdge("Node_1", "Node_7", -1)
G.insertEdge("Node_2", "Node_7", -2)
G.insertEdge("Node_5", "Node_7", -5)
G.insertEdge("Node_7", "Node_2", -2)
G.insertEdge("Node_7", "Node_3", -3)


G.deleteNode("Node_7")
G.deleteEdge("Node_6", "Node_2")
#no effect, nodes do not exist!
G.insertEdge("72", "25",3)
print(G)

print("\nNodes connected to Node_6:")
print(G.adjacent("Node_6"))
print("\nNodes connected to Node_4:")
print(G.adjacent("Node_4"))
print("\nNodes connected to Node_3:")
print(G.adjacent("Node_3"))
print("Edges outgoing from Node_3:")
print(G.adjacentEdge("Node_3", incoming = False))
print("Edges incoming to Node_3:")
print(G.adjacentEdge("Node_3", incoming = True))
print("\nEdges incoming to Node_6:")
print(G.adjacentEdge("Node_6", incoming = True))
print("\nEdges incoming to Node_743432:")
print(G.adjacentEdge("Node_743432", incoming = True))
print("\nAll edges:")

print(G.edges())

print("\nIs (Node_4,Node_5) there? {}".format( G.edgeIn("Node_4","Node_5")))
print("Is (Node_4,Node_3) there? {}".format( G.edgeIn("Node_4","Node_3")))
print("Is (Node_3,Node_4) there? {}".format( G.edgeIn("Node_3","Node_4")))
print("Is (Node_6,Node_6) there? {}".format( G.edgeIn("Node_6","Node_6")))

Show/Hide Solution

  1. Extend the DiGraphAsAdjacencyMatrix class creating a subclass DiGraphAmAnalyzer and adding the following methods:

  • getTopConnected_incoming(self): finds the node with the highest number of in-coming connections;

  • getTopConnected_outgoing(self): finds the node with the highest number of out-going connections;

  • hasPath(self, node1,node2) to check if there is a path connecting node1 to node2 (if it exists return the path as a list of pair of nodes, otherwise None;

You can test your methods with the following code:

G = DiGraphAmAnalyzer()
for i in range(6):
    n = "Node_{}".format(i+1)
    G.insertNode(n)

for i in range(0,4):
    n = "Node_" + str(i+1)
    six = "Node_6"
    n_plus = "Node_" + str((i+2) % 6)
    G.insertEdge(n, n_plus,0.5)
    G.insertEdge(n, six,1)

G.insertEdge("Node_5", "Node_1", 0.5)
G.insertEdge("Node_5", "Node_6", 1)
G.insertEdge("Node_6", "Node_6", 1)
print("Top connected (outgoing):")
print(G.getTopConnected_outgoing())
print("Top connected (incoming):")
print(G.getTopConnected_incoming())
print("\nAdding edge Node_5 -- 0.5 --> Node_5")
G.insertEdge("Node_5", "Node_5", 0.5)
print("Top connected (outgoing):")
print(G.getTopConnected_outgoing())
print("\nAre Node_1 and Node_4 connected?")
print("{}".format(G.hasPath("Node_1","Node_4")))
print("\nRemoving Node_6")
G.deleteNode("Node_6")
print("Top connected (outgoing):")
print(G.getTopConnected_outgoing())
print("Top connected (incoming):")
print(G.getTopConnected_incoming())
G.insertNode("Node_alone")
G.insertNode("Node_alone2")
G.insertEdge("Node_alone", "Node_alone2", 1)

print("\nAre Node_1 and Node_alone2 connected?")
print(G.hasPath("Node_1", "Node_alone2"))
print("Are Node_alone2 and Node_alone connected?")
print(G.hasPath("Node_alone2", "Node_alone"))

Show/Hide Solution

Implementation as (adjacency) linked list

In this case a graph \(G\) is represented as an adjacency linked list, where each node \(N\) has a linked-list of nodes connected to it in \(G\). In the case of directed graphs, every node contains a list of all the nodes reachable through some outgoing edges, while in the case of undirected graphs the list will be of all nodes connected together by means of an edge.

Some examples follow for both the cases of directed

_images/linkedlistdir.png

and undirected graphs (from lecture):

_images/linkedlistundir.png

The implementation through adjacency linked lists has both advantages and disadvantages:

  • it is flexible, nodes can be complex objects (with the only requirement of the attribute linking to the neighboring nodes);

  • in general, it uses less space, only that required by the pointers encoding for the existing edges;

  • checking presence of an edge is in general slower (this requires going through the list of source node);

  • getting all incoming edges of a node is slow (requires going through all nodes!). A workaround to this problem is to store not only outgoing-edges but also incoming edges (but this requires more memory).

Let’s see how we can implement a directed weighted graph as an adjacency linked list in Python.
We will use a dictionary to represent nodes and corresponding connections.
Therefore, we are not going to implement a linked list, but rather a dictionary of outgoing edges. That is, each node N is a dictionary of edges (the key of the dictionary is the node to which N is connected and the value is the weight).
[4]:
class DiGraphLL:
    def __init__(self):
        """Every node is an element in the dictionary.
        The key is the node id and the value is a dictionary
        with second node as key and the weight as value
        """
        self.__nodes = dict()

    def insertNode(self, node):
        test = self.__nodes.get(node, None)

        if test == None:
            self.__nodes[node] = {}
            #print("Node {} added".format(node))

    def insertEdge(self, node1, node2, weight):
        test = self.__nodes.get(node1, None)
        test1 = self.__nodes.get(node2, None)
        if test != None and test1 != None:
            #if both nodes exist othewise don't do anything
            test = self.__nodes[node1].get(node2, None)
            if test != None:
                exStr= "Edge {} --> {} already existing.".format(node1,node2)
                raise Exception(exStr)
            else:
                #print("Inserted {}-->{} ({})".format(node1,node2,weight))
                self.__nodes[node1][node2] = weight

    def deleteNode(self, node):
        test = self.__nodes.get(node, None)
        if test != None:
            self.__nodes.pop(node)
        # need to loop through all the nodes!!!
        for n in self.__nodes:
            test = self.__nodes[n].get(node, None)
            if test != None:
                self.__nodes[n].pop(node)

    def deleteEdge(self, node1,node2):
        test = self.__nodes.get(node1, None)
        if test != None:
            test = self.__nodes[node1].get(node2, None)
            if test != None:
                self.__nodes[node1].pop(node2)

    def __len__(self):
        return len(self.__nodes)

    def nodes(self):
        return list(self.__nodes.keys())

    def graph(self):
        return self.__nodes

    def __str__(self):
        ret = ""
        for n in self.__nodes:
            for edge in self.__nodes[n]:

                ret += "{} -- {} --> {}\n".format(str(n),
                                                  str(self.__nodes[n][edge]),
                                                  str(edge))
        return ret

    def adjacent(self, node, incoming = True):
        """Your treat! (see exercise 3)"""

    def edges(self):
        """Your treat! (see exercise 3). Returns all the edges"""


if __name__ == "__main__":
    G = DiGraphLL()
    for i in range(6):
        n = "Node_{}".format(i+1)
        G.insertNode(n)

    for i in range(0,4):
        n = "Node_" + str(i+1)
        six = "Node_6"
        n_plus = "Node_" + str((i+2) % 6)
        G.insertEdge(n, n_plus,0.5)
        G.insertEdge(n, six,1)
    G.insertEdge("Node_5", "Node_1", 0.5)
    G.insertEdge("Node_5", "Node_6", 1)
    G.insertEdge("Node_6", "Node_6", 1)
    print(G)

    G.insertNode("Node_7")
    G.insertEdge("Node_1", "Node_7", -1)
    G.insertEdge("Node_2", "Node_7", -2)
    G.insertEdge("Node_5", "Node_7", -5)
    G.insertEdge("Node_7", "Node_2", -2)
    G.insertEdge("Node_7", "Node_3", -3)

    print("Size is: {}".format(len(G)))
    print("Nodes: {}".format(G.nodes()))
    print("Graph:")
    print(G)
    G.deleteNode("Node_7")
    G.deleteEdge("Node_6", "Node_2")
    #nodes do not exist! Therefore nothing happens!
    G.insertEdge("72", "25",3)
    print(G)
    print("Nodes: {}".format(G.nodes()))
    G.deleteEdge("72","25")
    print("Nodes: {}".format(G.nodes()))
    print(G)
Node_1 -- 0.5 --> Node_2
Node_1 -- 1 --> Node_6
Node_2 -- 0.5 --> Node_3
Node_2 -- 1 --> Node_6
Node_3 -- 0.5 --> Node_4
Node_3 -- 1 --> Node_6
Node_4 -- 0.5 --> Node_5
Node_4 -- 1 --> Node_6
Node_5 -- 0.5 --> Node_1
Node_5 -- 1 --> Node_6
Node_6 -- 1 --> Node_6

Size is: 7
Nodes: ['Node_1', 'Node_2', 'Node_3', 'Node_4', 'Node_5', 'Node_6', 'Node_7']
Graph:
Node_1 -- 0.5 --> Node_2
Node_1 -- 1 --> Node_6
Node_1 -- -1 --> Node_7
Node_2 -- 0.5 --> Node_3
Node_2 -- 1 --> Node_6
Node_2 -- -2 --> Node_7
Node_3 -- 0.5 --> Node_4
Node_3 -- 1 --> Node_6
Node_4 -- 0.5 --> Node_5
Node_4 -- 1 --> Node_6
Node_5 -- 0.5 --> Node_1
Node_5 -- 1 --> Node_6
Node_5 -- -5 --> Node_7
Node_6 -- 1 --> Node_6
Node_7 -- -2 --> Node_2
Node_7 -- -3 --> Node_3

Node_1 -- 0.5 --> Node_2
Node_1 -- 1 --> Node_6
Node_2 -- 0.5 --> Node_3
Node_2 -- 1 --> Node_6
Node_3 -- 0.5 --> Node_4
Node_3 -- 1 --> Node_6
Node_4 -- 0.5 --> Node_5
Node_4 -- 1 --> Node_6
Node_5 -- 0.5 --> Node_1
Node_5 -- 1 --> Node_6
Node_6 -- 1 --> Node_6

Nodes: ['Node_1', 'Node_2', 'Node_3', 'Node_4', 'Node_5', 'Node_6']
Nodes: ['Node_1', 'Node_2', 'Node_3', 'Node_4', 'Node_5', 'Node_6']
Node_1 -- 0.5 --> Node_2
Node_1 -- 1 --> Node_6
Node_2 -- 0.5 --> Node_3
Node_2 -- 1 --> Node_6
Node_3 -- 0.5 --> Node_4
Node_3 -- 1 --> Node_6
Node_4 -- 0.5 --> Node_5
Node_4 -- 1 --> Node_6
Node_5 -- 0.5 --> Node_1
Node_5 -- 1 --> Node_6
Node_6 -- 1 --> Node_6

Download the complete source file: DiGraphLL.py

Exercises

  1. Consider the DiGraphLL graph class. Add the following methods:

  • adjacent(self, node) : given a node returns all the nodes connected to it (both incoming and outgoing);

  • adjacentEdge(self, node, incoming=True) : given a node, returns all the nodes close to it (incoming if “incoming=True” or outgoing if “incoming = False”) as a list of pairs (node, other, weight);

  • edges(self) : returns all the edges in the graph as pairs (i,j, weight);

  • edgeIn(self, node1, node2) : check if the edge node1 –> node2 is in the graph;

Test your methods with the test code from the previous exercise, changing DiGraphAsAdjacencyMatrix with DiGraphLL.

Show/Hide Solution

  1. Extend the DiGraphLL class creating a subclass DiGraphLLAnalyzer by adding the following methods:

  • getTopConnected_incoming(self): finds the node with the highest number of in-coming connections;

  • getTopConnected_outgoing(self): finds the node with the highest number of out-going connections;

  • hasPath(self, node1,node2) to check if there is a path connecting node1 to node2 (if it exists return the path as a list of pair of nodes, otherwise None;

Test your class with the following code:

G = DiGraphLLAnalyzer()
    for i in range(6):
        n = "Node_{}".format(i+1)
        G.insertNode(n)

    for i in range(0,4):
        n = "Node_" + str(i+1)
        six = "Node_6"
        n_plus = "Node_" + str((i+2) % 6)
        G.insertEdge(n, n_plus,0.5)
        G.insertEdge(n, six,1)
    G.insertEdge("Node_5", "Node_1", 0.5)
    G.insertEdge("Node_5", "Node_6", 1)
    G.insertEdge("Node_6", "Node_6", 1)
    print("Top connected (outgoing):")
    print(G.getTopConnected_outgoing())
    print("Top connected (incoming):")
    print(G.getTopConnected_incoming())
    print("\nAdding edge Node_5 -- 0.5 --> Node_5")
    G.insertEdge("Node_5", "Node_5", 0.5)
    print("Top connected (outgoing):")
    print(G.getTopConnected_outgoing())
    print("\nAre Node_1 and Node_4 connected?")
    print("{}".format(G.hasPath("Node_1","Node_4")))
    print("\nRemoving Node_6")
    G.deleteNode("Node_6")
    print("Top connected (outgoing):")
    print(G.getTopConnected_outgoing())
    print("Top connected (incoming):")
    print(G.getTopConnected_incoming())
    G.insertNode("Node_alone")
    G.insertNode("Node_alone2")
    G.insertEdge("Node_alone", "Node_alone2", 1)

    print("\nAre Node_1 and Node_alone2 connected?")
    print(G.hasPath("Node_1", "Node_alone2"))
    print("Are Node_alone2 and Node_alone connected?")
    print(G.hasPath("Node_alone2", "Node_alone"))

Show/Hide Solution