Back to the Index page

About

A community in a graph is a subset of vertices that are densely connected to each other and are in turn less densely connected to the other vertices in the graph.

We analysed the properties of the English Wikipedia graph, that is, the graph with each vertex standing for a single Wikipedia article and an egde standing for a Wikipedia link. We discovered that the Wikipedia graph has a distinguished community structure. Moreover, we observed that Wikipedia articles falling into the same community are more semantically similar to each other than the ones randomly chosen across the corpus, that is, communities in the Wikipedia graph serve as grouping units for semantically similar Wikipedia articles.

Due to the Wikipedia graph being scale-free one, communities that comprise the Wikipedia graph have their sizes distributed closely to the power law, with a few large communities and a long tail of small ones. We discovered that larger communities inherit the properties of the whole graph, i.e., they possess community structure in turn. Dividing a large community into further sub-communities has the effect of obtaining a further division of the group of Wikipedia articles into nested sub-groups of more semantically similar articles.

Dividing each community while it exhibits good sub-community structure, we organize Wikipedia articles into the hierarchy. The hierarchy relies on solely the link structure of Wikipedia and is fully automatic.

Community titles are selected using PageRank, that is, using purely link-based analysis as well.

Inherited Terms of use

The Wikipedia article abstracts displayed on this site are licenced under the
GNU Free Documentation License. The abstracts use materials from the respective Wikipedia articles.

People

Properties of the community structure of the Wikipedia graph researched by Dmitry Lizorkin, Olena Medelyan and Maria Grineva.


Back to the Index page