Hello. Today we're going to talk about how to find important nodes in a network. Recall these friendship network among 34 people in a karate club. Based on the structure of this network, which would you say are the five most important nodes in this network? Of course, there are many ways of thinking about this question. We're going to start thinking about different ways in which you could imagine answering this particular question. One way to answer the question would be to say, well, nodes who have a very high degree, nodes who have lots of friends are important nodes. If you use that definition, then we'll find out the five most important nodes are nodes 34, 1, 33, 3, and 2, these nodes right here. There are other ways in which you could imagine answering this question. Another way would be to say that nodes who are important are nodes who are very close to other nodes in network, nodes who have high proximity to other nodes in network. If you use that definition, then the five most important nodes in the network would be nodes 1, 3, 34, 32, and 9. Instead of having node 33, we'll have node 9, and then instead of having node 2, we'll have node 32 and all the other ones stay the same. Yet another way of thinking about importance would be to say that nodes who are important are nodes who tend to connect other nodes in the network. We could imagine measuring importance by the fraction of shortest paths that pass through a particular node. If we do that, if we define it in that way, we find the five most important nodes in the network are nodes 1, 34, 33, 3, and 32. Instead of having node number 9, we'll have node number 33 in the top five and all other nodes would stay the same. At this point, these three definitions of importance are very informal and the goal of this video and the following videos is going to be to get a more precise definition of how to measure importance in a network. This general topic is called network centrality, and these centrality measures are the ones that allows us to find the most important nodes in a particular network. The cases in which we want to use these techniques is when, for example, sometimes we are interested in finding who the influential nodes in a social network are? Which are the nodes that are very good at disseminating information to many other nodes? Or on the other hand, which are the nodes that are good at preventing bad behaviors or epidemics from spreading on a social network? These can be used in other networks as well, so we can use centrality measures to find hubs in a transportation network, or important pages on the web, or more generally, these measures allows us to find nodes that prevent the network from breaking up. Nodes that if we were to remove them, they would cause the network to fracture and break up into different components. Here's a list of centrality measures that are commonly used. We're going to be talking about some of these in this course, and in this video, we're going to be focusing on degree centrality and closeness centrality. A degree centrality makes the assumption that important nodes have many connections. This is the most basic way in which you could define importance or centrality. Simply that nodes who have lots of neighbors, lots of friends, are very important. It makes sense. Of course, depending on the type of network you have, you'd have to use different types of degrees. If you have an undirected network, you can simply use a degree, but if you have a directed network, then you have to choose between using in-degree or out-degree or a combination of both. Let's get into exactly how we would define this. We'll say that the degree centrality of a node v is going to be the ratio between its degree and the number of nodes in the graph minus 1. In this case, a node would have a centrality of one if it's connected to every single node in the network, and a centrality of zero if it's connected to no node in the network. This measure goes between zero and one, with one being the case where you're most connected. Let's see how we can use network x to find the degree of centrality of the nodes in this network. First, let me load up the karate club graph and that let me convert the labels of the nodes so that they match what we see here in this figure. Now, we can use the function degree centrality to measure the centrality of older nodes in the network. Here, this returns a dictionary of centralities of every node, and we can, for example, look at the degree centrality of node number 34, which is 0.515, and that's because node 34 has 17 connections and there are 34 nodes in the network, so that is 17 or 33. The degree centrality of node 33 is 0.364 and that is 12 over 33. Now, like I said, in directed networks, we have the choice of using the in-degree centrality or the out-degree centrality of a node, and everything else is defined in the same way, so the in-degree centrality of a node v is going to be its in-degree divided by the number of nodes in the graph minus 1. We can use the function in-degree centrality in network x to find the in-degree centrality old and nodes in a directed network. In this case, A will have in-degree of 0.143, which is two over 14, there are 15 nodes in this network, and L would have a in-degree centrality of 0.214, which is 3 over 14, and that's because node L has an in-degree of three. The very same way we can define using the out-degree instead of the in-degree centrality, using the function out-degree centrality, and the out-degree centrality of A is 0.214 because it has an out-degree of three, and the out-degree centrality of node L is 0.071 because L has an out-degree of one. Another way of measuring centrality is what we call closeness centrality. The assumption here is that nodes that are important are going to be a short distance away from all other nodes in the network. Recall that we measured distance between two nodes by looking at the length of the shortest path between them. The way we're going to find that closeness centrality of node v is going to be by taking the ratio of the number of nodes in the network minus one divided by the sum over all the other nodes in the network and the distance between node v and those nodes. That's the sum in the denominator in the definition of centrality. Let's say that we want to use network x to find the closeness centrality of this node 32, we can use the function closeness centrality which returns a dictionary of the centrality of the closeness centrality of other nodes. Here we find the node 32 has a closeness centrality of 0.541. So using the definition of closeness centrality, let's see how these 0.541 comes about. First of all, let me look at the sum of the lengths of the shortest paths from node number 32 to all the other nodes in the network. I'm using here the shortest path length function, which you've seen before which gives you the length of all the shortest paths from the node number 32 to all the other nodes and that sum here is 61. Then if we take the number of nodes in the graph minus 1 divided by 61, that's how we get these 0.541. Of course we're making the implicit assumption that all the nodes can actually reach all the other nodes. But of course this is not always the case in particular, when we have directed graphs, this is often not the case. Even for undirected graphs, you could have multiple connected components and you can have it so that some nodes cannot reach other nodes. What happens? How do we measure closeness centrality when a node cannot actually reach all the other nodes? Let's take this example, so node L here can only reach node M. How do we measure its closeness centrality? We have a couple of options here, and the first option we can simply only consider the nodes that L can actually reach in order to measure its closeness centrality. The way this would work is that we define this set RL to be the set of nodes that L can actually reach and we define the closeness centrality of L to be the ratio of the number of nodes that L can reach divided by the sum of the distances from L to other nodes that L can actually reach. For node L here, this would be simply one because L can only reach node M. RL here is just the set M with just node M and L can reach M in just one step. The closeness centrality of L here would be one over one which is one. This is the highest possible centrality a node can have in a network, which seems a little bit problematic because node L can only reach one node and we're saying that it has the highest possible centrality that any node can have. These seems unintuitive. Here's where Option 2 comes in. In Option 2, we again only consider the nodes that L can reach, but then we normalize by the fraction of nodes that L can reach. The way this looks here is that when we compute the closest centrality of L, we have the same ratio of RL over the sum. But now we're going to multiply that ratio by the fraction of nodes that L can reach RL divided by the number of nodes in the graph minus one. Basically we're normalizing by the fraction of nodes that L can reach and so if L can not reach many nodes, we're going to be multiplying this other fraction by a very small number. In this case, if we do that, we find that L has a closeness centrality of 0.71 which is more reasonable than defining it to be one. One thing to note here is, in this new definition when we're normalizing, we're not changing the definition of closeness centrality when the graph is connected and every node can reach every other node. That's because when that's the case, our RL for node L would equal N minus 1 and this formula that you see here would be the exact same formula that we had before. We're not changing the definition, this definition can still apply even if the graph is completely connected. How can we use network x to find the closeness centrality? Well, we can use the function closeness centrality and here you get the option of normalizing or not normalizing. For example, if we choose not to normalize, then the closest centrality of node L would be one, as we saw before. If you choose to normalize, then it's closest centrality would be 0.071. In summary, today, we talked about centrality measures and how they aim to find the most important nodes in a network. We said that there are many, many different ways of defining centrality and today we talked about two specific ones. A very basic definition called degree centrality which makes the assumption that important nodes are those who have many, many connections. We use this formula to measure the centrality of a node, and it's simply the ratio between the degree of the node and the number of nodes in the graph minus one. Depending on whether we have a direct or undirect graph, we can use the degree of the node or the in-degree or the out-degree and these are the different functions that you can use in network x to apply them. The second centrality measure that we looked at it was closeness centrality, and the assumption that it makes is that important nodes are close to other nodes. Here is the general formula that we use to compute it and here we can choose to normalize or not normalize as we discussed, and the function that we use in network x to compute it is the function closeness centrality. This is it for this video and I hope to see you next time. Thanks.