Graph

Graph patterns & laws

0

Recently I was trying to learn something about random graph generators and I ended up reading about interesting properties of real graphs. I describe some of them in this post, but first, let me give a short intro on the graph generators.

Random Graph Generators

You’ve probably noticed that we are surrounded by networks. The Internet, Facebook, people and their social relations, road networks, power networks, and the list goes on. In computer science, networks can be represented by graphs and a whole scientific branch is devoted to understanding the graph theory. Apart from working with solid theoretical grounds, scientists also like to run simulations and test ideas empirically time to time. That’s exactly when the graph generators come into play.

Imagine that you propose a new theory explaining how diseases spread and you want to test your hypothesis on a decently sized social graph such as the one Facebook has. You can ask them to share their data with you, but unless you work for the company, it’s not going to happen. You have to create your own artificial graph, in other words, you need to generate a random graph.

Patterns & Laws

But how random should the random graph be? Are graphs that we see in the real world random? That’s one of three questions Christos Faloutsos [1], a professor at Carnegie Mellon University, is asking in his talk and a paper he co-authored. It turns out they are not, mostly because their creation is constraint by many factors. And it does make sense. We don’t become friends with totally random people (usually), we are more likely to meet people in our proximity and we tend to meet friends of our friends. The air traffic network is probably shaped by travel demands of customers, airlines don’t pick destinations at random. You see the logic.

The remaining two questions are building on the following scenario: suppose that there is a social network (like Facebook) with 1 million people registered and you know that the average number of friends per user is 50.

  • If you randomly select a user, how many friends would you expect this person has?
  • Would you be surprised if there is a user with thousands of friends?

The intuition tells us that if the average number of connections is 50, it should be around 50, right? And a user with thousands of friends? That doesn’t look very realistic. Both of these thoughts are actually wrong. Below I describe a couple of laws & patterns describing such unintuitive behavior for both static and dynamic graphs.

Static graphs

Skewed Degree Distribution

The degree of a node is the number of neighbors it is linked to. So if you have 200 connections on LinkedIn, your account’s degree is 200. It’s been discovered that real networks tend to have many nodes with a low degree but only a few high degree nodes, hence the skewed distribution. Given our LinkedIn example, it means that most accounts have very small number of connections (perhaps their owners are not active anymore), however, accounts exist which are connected with thousands of others (check out these guys). This distribution, very often a power-law distribution, ensures that if you pick a node by random it’s likely to have very few neighbors. On the other hand, you shouldn’t be surprised to find nodes with degrees that are orders of magnitude higher that the average degree in the network.

Small World

Maybe you’ve heard about the “six degrees of separation”, which is exactly the same thing as the small-world phenomenon. It says that you are 6 or less “friend-of-a-friend” cycles from anyone else in the world. It means that there are at most 5 people separating you from Dalai Lama, a fisherman somewhere in East Timor or me. The diameter of a network, which is defined as the number of hops in the longest from all shortest paths between every two nodes in the network, is surprisingly small for real graphs. Facebook analysed their social graph and discovered that the average degree of separation on their website is 4.57 and, moreover, it’s shrinking over time. But more about that below.

Dynamic graphs

The previously mentioned qualities describe static graphs, however, when we look at graphs and how they evolve over time, we’ll discover some patterns that are no less surprising.

Densification Power Law

It feels natural that if we add nodes, the number of edges linearly grows with them and the average node degree remains constant. Although, that’s not the case. According to Leskovec et al. [2], the number of edges grows faster which causes graphs to become denser over time. One possible explanation is that over time, it’s very likely that some of the newly added nodes will have a very high node degree and therefore will contribute to the number of edges greatly.

Shrinking Diameter

Apart from the densification power law, it has also been shown that as new nodes are added, the graph diameter actually becomes smaller or stabilizes.

References

[1] Graph mining: Laws, generators, and algorithms., Chakrabarti, Deepayan, and Christos Faloutsos. ACM computing surveys (CSUR) 38.1 (2006)
[2] Graphs over time: densification laws, shrinking diameters and possible explanations., Leskovec, Jure, Jon Kleinberg, and Christos Faloutsos. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 2005.