зеркало из https://github.com/microsoft/spark.git
Add PageRank example and data
This commit is contained in:
Родитель
f096f4eaf1
Коммит
5e35d39e0f
|
@ -470,10 +470,40 @@ things to worry about.)
|
|||
# Graph Algorithms
|
||||
<a name="graph_algorithms"></a>
|
||||
|
||||
This section should describe the various algorithms and how they are used.
|
||||
GraphX includes a set of graph algorithms in to simplify analytics. The algorithms are contained in the `org.apache.spark.graphx.lib` package and can be accessed directly as methods on `Graph` via an implicit conversion to [`Algorithms`][Algorithms]. This section describes the algorithms and how they are used.
|
||||
|
||||
[Algorithms]: api/graphx/index.html#org.apache.spark.graphx.lib.Algorithms
|
||||
|
||||
## PageRank
|
||||
|
||||
PageRank measures the importance of each vertex in a graph, assuming an edge from *u* to *v* represents an endorsement of *v*'s importance by *u*. For example, if a Twitter user is followed by many others, the user will be ranked highly.
|
||||
|
||||
Spark includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We can compute the PageRank of each user as follows:
|
||||
|
||||
{% highlight scala %}
|
||||
// Load the implicit conversion to Algorithms
|
||||
import org.apache.spark.graphx.lib._
|
||||
// Load the datasets into a graph
|
||||
val users = sc.textFile("graphx/data/users.txt").map { line =>
|
||||
val fields = line.split("\\s+")
|
||||
(fields(0).toLong, fields(1))
|
||||
}
|
||||
val followers = sc.textFile("graphx/data/followers.txt").map { line =>
|
||||
val fields = line.split("\\s+")
|
||||
Edge(fields(0).toLong, fields(1).toLong, 1)
|
||||
}
|
||||
val graph = Graph(users, followers)
|
||||
// Run PageRank
|
||||
val ranks = graph.pageRank(0.0001).vertices
|
||||
// Join the ranks with the usernames
|
||||
val ranksByUsername = users.leftOuterJoin(ranks).map {
|
||||
case (id, (username, rankOpt)) => (username, rankOpt.getOrElse(0.0))
|
||||
}
|
||||
// Print the result
|
||||
println(ranksByUsername.collect().mkString("\n"))
|
||||
{% endhighlight %}
|
||||
|
||||
|
||||
## Connected Components
|
||||
|
||||
## Shortest Path
|
||||
|
|
|
@ -0,0 +1,12 @@
|
|||
2 1
|
||||
3 1
|
||||
4 1
|
||||
6 1
|
||||
3 2
|
||||
6 2
|
||||
7 2
|
||||
6 3
|
||||
7 3
|
||||
7 6
|
||||
6 7
|
||||
3 7
|
|
@ -0,0 +1,6 @@
|
|||
1 BarackObama
|
||||
2 ericschmidt
|
||||
3 jeresig
|
||||
4 justinbieber
|
||||
6 matei_zaharia
|
||||
7 odersky
|
|
@ -106,7 +106,7 @@ object PageRank extends Logging {
|
|||
* @tparam ED the original edge attribute (not used)
|
||||
*
|
||||
* @param graph the graph on which to compute PageRank
|
||||
* @param tol the tolerance allowed at convergence (smaller => more * accurate).
|
||||
* @param tol the tolerance allowed at convergence (smaller => more accurate).
|
||||
* @param resetProb the random reset probability (alpha)
|
||||
*
|
||||
* @return the graph containing with each vertex containing the PageRank and each edge
|
||||
|
|
Загрузка…
Ссылка в новой задаче