At this point, you should be familiar with some of the basic features
of Ligra such as the vertexSubset
datatype and core parts of the API
such as edgeMap
. For examples of how to use these, please refer to either
the previous tutorial or the API reference.
In this tutorial, we will introduce and use other parts of the core API such
as vertexFilter
in order to compute the k-core of a graph. Given
$G = (V, E)$, the $k$-core of $G$ is the maximal set of vertices $S \subseteq V$
such that every vertex $v \in S$ has degree at least $k$ in $G[S]$ (the induced
subgraph of $G$ over $S$). $k$-cores have been useful in the study of social networks
and bioinformatics, and have natural applications in the visualization of
complex networks.
In the $k$-core problem, our goal is to produce a map $f : V \rightarrow \mathbb{N}$ such that for each $v \in V$, $f(v)$ is the largest $k$ for which $v$ belongs to the $k$-core. We refer to $f(v)$ as the core number of $v$. A commonly used measure is the degeneracy of a graph, which is simply the largest $k$ for which the $k$-core of $G$ is non-empty.
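For example, take a graph on vertices $\{a, b, c, d\}$ in which $a$, $b$, and $c$ form a triangle and $d$ is connected only to $a$. Every vertex has degree at least 1, so the 1-core is the whole graph; removing $d$ leaves the triangle, in which every vertex has degree 2, so the 2-core is $\{a, b, c\}$; the 3-core is empty. Hence $f(d) = 1$, $f(a) = f(b) = f(c) = 2$, and the degeneracy is 2.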
Here's a simple algorithm for computing the $k$-cores of a graph. Our goal
is to produce an array cores
that contains the core-number for each vertex.
Initialize $k$ to 1, with all vertices initially active. Check all active
vertices and remove any whose induced degree is less than $k$, decrementing the
induced degrees of the removed vertices' neighbors. Continue this process until
no more vertices can be removed; the active vertices that remain form the
$k$-core of the graph. For any vertex $v \in V$ removed during round $k$, set
cores[v] = k-1. Then increment $k$ and continue until no active vertices remain.
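Before turning to the Ligra version, here is a minimal sequential sketch of this peeling process in plain C++, independent of the Ligra API (the names PeelCores, adj, and cores are ours, purely for illustration):
#include &lt;vector&gt;
using namespace std;

// Sequential peeling: given an adjacency list, return each vertex's core number.
vector<long> PeelCores(const vector<vector<long> >& adj) {
  long n = adj.size();
  vector<long> deg(n), cores(n, 0);
  vector<bool> alive(n, true);
  long remaining = n;
  for (long i = 0; i < n; i++) deg[i] = adj[i].size();
  for (long k = 1; remaining > 0; k++) {
    bool changed = true;
    while (changed) { // strip vertices of induced degree < k until a fixed point
      changed = false;
      for (long v = 0; v < n; v++) {
        if (alive[v] && deg[v] < k) {
          alive[v] = false; cores[v] = k - 1; remaining--; changed = true;
          for (long u : adj[v]) if (alive[u]) deg[u]--; // update neighbors' induced degrees
        }
      }
    }
  }
  return cores;
}
On the triangle-plus-pendant example above, PeelCores returns core number 2 for each triangle vertex and 1 for the pendant.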
Let's get started! cd
into the tutorial/
directory and open up the file
called KCore.C
.
#include "ligra.h"
template <class vertex>
void Compute(graph<vertex>& GA, commandLine P) {
const long n = GA.n;
bool* active = new bool[n];
{parallel_for(long i=0;i<n;i++) active[i] = 1;}
vertexSubset Frontier(n, n, active);
long* coreNumbers = new long[n];
}
Last time we explicitly created a vertexSubset, we used a constructor that
took a single vertex $v$ and produced the subset $\{v\}$. The default
representation of such a small subset is sparse. Here, we are creating a dense
vertexSubset. The difference is that a sparse subset represents a subset of
size $k$ using an array containing $k$ vertex-ids, while a dense representation
uses an array of $n$ bits. For large subsets containing a constant fraction of
the vertices, the dense representation is more space-efficient, and it lets us
decide membership in the subset in $O(1)$ time.
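For instance, the two constructions look roughly as follows (a sketch; start is a hypothetical vertex id, and the dense constructor is the one used in Compute above):
vertexSubset SparseSet(n, start);  // sparse: the singleton {start}, stored as one vertex-id
bool* bits = newA(bool,n);
{parallel_for(long i=0;i<n;i++) bits[i] = 1;}
vertexSubset DenseSet(n, n, bits); // dense: all n vertices, one bit per vertex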
We also need to keep around an array containing the induced degrees of vertices.
long* Degrees = newA(long,n);
parallel_for(long i=0;i<n;i++) {
  coreNumbers[i] = 0;
  Degrees[i] = GA.V[i].getOutDegree(); // initial induced degree = degree in G
}
long largestCore = -1;
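One detail worth noting: we initialize Degrees from getOutDegree(). This is correct here because the tutorial inputs are loaded as symmetric (undirected) graphs via the -s flag, so each vertex's out-degree equals its degree in the undirected graph.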
Great! We now need to implement the logic that separates a vertexSubset
into two sets: vertices with induced degree $< k$ and vertices with induced
degree $\geq k$. In order to do this, we will use a new part of the Ligra API:
vertexFilter
, which takes as input a vertexSubset
and a function
that, given the id of a vertex, determines whether the vertex should be
included in the returned vertexSubset or not. The function has the following
signature:
template <class F>
vertexSubset vertexFilter(vertexSubset V, F filter);
We need to supply it with a structure that is callable on values of the
vertex-id type (in our case, long), i.e., one that overloads the ()
operator for a long argument. The most basic structure that can be passed
to vertexFilter is:
struct Vertex_F {
inline bool operator () (long i) {
return true;
}
};
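Calling vertexFilter with it is then a one-liner (a sketch, reusing the Frontier from Compute):
vertexSubset Copy = vertexFilter(Frontier, Vertex_F()); // contains the same vertices as Frontier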
Note that this filter returns true for every vertex, so the returned
vertexSubset contains exactly the vertices of the input. We implement the
structure to produce a vertexSubset containing vertices with induced degree
$\geq k$ as follows:
template<class vertex>
struct Deg_AtLeast_K {
vertex* V;
long *Degrees;
long k;
Deg_AtLeast_K(vertex* _V, long* _Degrees, long _k) :
V(_V), k(_k), Degrees(_Degrees) {}
inline bool operator () (long i) {
return Degrees[i] >= k;
}
};
The following structure selects all vertices with induced degree $< k$, i.e., the vertices to be removed in this round. As vertices satisfying this predicate are removed, we have to do some bookkeeping: in particular, we record each removed vertex's core number as $k-1$ and set its induced degree to 0 (the reason for zeroing the degree will become clear when we update degrees below).
template<class vertex>
struct Deg_LessThan_K {
vertex* V;
long* coreNumbers;
long* Degrees;
long k;
Deg_LessThan_K(vertex* _V, long* _Degrees, long* _coreNumbers, long _k) :
V(_V), k(_k), Degrees(_Degrees), coreNumbers(_coreNumbers) { }
inline bool operator () (long i) {
if(Degrees[i] < k) { coreNumbers[i] = k-1; Degrees[i] = 0; return true; }
else return false;
}
};
The last part of the implementation is to update the induced degrees of the
neighbors of removed vertices. We can do this by using edgeMap
. We'll pass
the following struct to edgeMap
.
struct Update_Deg {
long* Degrees;
Update_Deg(long* _Degrees) : Degrees(_Degrees) {}
inline bool update (long s, long d) {
Degrees[d]--;
return 1;
}
inline bool updateAtomic (long s, long d) {
writeAdd(&Degrees[d],(long)-1);
return 1;
}
inline bool cond (long d) { return Degrees[d] > 0; }
};
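Note how the pieces fit together: Deg_LessThan_K sets Degrees[i] = 0 when it removes a vertex, and cond returns Degrees[d] > 0, so edgeMap skips updates targeting vertices that have already been removed; their counters are never decremented again.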
What should our loop do? $k$ starts at 1. We don't know in advance what the degeneracy (recall: the maximum non-empty $k$-core) of the graph is, but $n$ is a simple upper bound.
for (long k = 1; k <= n; k++) {
// TODO
}
What should we do inside of the loop? Inductively assume that we've computed the $(k-1)$-core for some $1 < k \leq n$. In order to compute the $k$-core, we keep filtering out vertices with induced degree $< k$ until either (1) no vertices remain in the graph or (2) all remaining vertices have induced degree $\geq k$. In code:
for (long k = 1; k <= n; k++) {
while (true) {
vertexSubset toRemove =
vertexFilter(Frontier,Deg_LessThan_K<vertex>(GA.V,Degrees,coreNumbers,k));
vertexSubset remaining =
vertexFilter(Frontier,Deg_AtLeast_K<vertex>(GA.V,Degrees,k));
Frontier.del();
Frontier = remaining;
if (0 == toRemove.numNonzeros()) { // fixed point. found k-core
toRemove.del();
break;
}
else {
vertexSubset output = edgeMap(GA,toRemove,Update_Deg(Degrees));
toRemove.del(); output.del();
}
}
if(Frontier.numNonzeros() == 0) { largestCore = k-1; break; }
}
Putting everything together, the complete KCore.C looks like this:
#include "ligra.h"
struct Update_Deg {
long* Degrees;
Update_Deg(long* _Degrees) : Degrees(_Degrees) {}
inline bool update (long s, long d) {
Degrees[d]--;
return 1;
}
inline bool updateAtomic (long s, long d){
writeAdd(&Degrees[d],(long)-1);
return 1;
}
inline bool cond (long d) { return Degrees[d] > 0; }
};
template<class vertex>
struct Deg_LessThan_K {
vertex* V;
long* coreNumbers;
long* Degrees;
long k;
Deg_LessThan_K(vertex* _V, long* _Degrees, long* _coreNumbers, long _k) :
V(_V), k(_k), Degrees(_Degrees), coreNumbers(_coreNumbers) {}
inline bool operator () (long i) {
if(Degrees[i] < k) { coreNumbers[i] = k-1; Degrees[i] = 0; return true; }
else return false;
}
};
template<class vertex>
struct Deg_AtLeast_K {
vertex* V;
long *Degrees;
long k;
Deg_AtLeast_K(vertex* _V, long* _Degrees, long _k) :
V(_V), k(_k), Degrees(_Degrees) {}
inline bool operator () (long i) {
return Degrees[i] >= k;
}
};
template <class vertex>
void Compute(graph<vertex>& GA, commandLine P) {
const long n = GA.n;
bool* active = newA(bool,n);
{parallel_for(long i=0;i<n;i++) active[i] = 1;}
vertexSubset Frontier(n, n, active);
long* coreNumbers = newA(long,n);
long* Degrees = newA(long,n);
{parallel_for(long i=0;i<n;i++) {
coreNumbers[i] = 0;
Degrees[i] = GA.V[i].getOutDegree();
}}
long largestCore = -1;
for (long k = 1; k <= n; k++) {
while (true) {
vertexSubset toRemove =
vertexFilter(Frontier,Deg_LessThan_K<vertex>(GA.V,Degrees,coreNumbers,k));
vertexSubset remaining =
vertexFilter(Frontier,Deg_AtLeast_K<vertex>(GA.V,Degrees,k));
Frontier.del();
Frontier = remaining;
if (0 == toRemove.numNonzeros()) { // fixed point. found k-core
toRemove.del();
break;
}
else {
vertexSubset output = edgeMap(GA,toRemove,Update_Deg(Degrees));
toRemove.del(); output.del();
}
}
if(Frontier.numNonzeros() == 0) { largestCore = k-1; break; }
}
cout << "Degeneracy is " << largestCore << endl;
Frontier.del(); free(coreNumbers); free(Degrees);
}
Compile the application by running make
, which will produce a binary called
KCore
.
Let's try running KCore on one of the input graphs.
$ ./KCore -s ../inputs/rMatGraph_J_5_100
largestCore was 4
Running time : 0.00594
largestCore was 4
Running time : 0.00559
largestCore was 4
Running time : 0.0057
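By default, a Ligra application is run three times on the input graph, which is why three running times are reported above.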
Great! You can try running KCore on more interesting examples, such
as the twitter graph (note that -b indicates that the graph is stored
in binary format, and -rounds 1 indicates that the application should be
run just once):
$ numactl -i all ./KCore -s -b -rounds 1 twitter_sym
largestCore was 2488
Running time : 280