Commit 28d925d

add the support for the sparklens in spark-3.0.0 and later version of spark with scala-2.12
Author: SaurabhChawla
Date: Fri Mar 19 22:37:13 2021 +0530
Committer: Saurabh Chawla <[email protected]>
(cherry picked from commit 479468d)
1 parent 8c17881 commit 28d925d

File tree

5 files changed: +37 −16 lines


README.md

Lines changed: 20 additions & 5 deletions

````diff
@@ -92,16 +92,31 @@ Note: Apart from the console based report, you can also get an UI based report s
 `--conf spark.sparklens.report.email=<email>` along with other relevant confs mentioned below.
 This functionality is available in Sparklens 0.3.2 and above.
 
-Use the following arguments to `spark-submit` or `spark-shell`:
+Use the following arguments to `spark-submit` or `spark-shell` for spark-3.0.0 and later versions of Spark:
+```
+--packages qubole:sparklens:0.4.0-s_2.12
+--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
+```
+
+Use the following arguments to `spark-submit` or `spark-shell` for spark-2.4.x and lower versions of Spark:
 ```
 --packages qubole:sparklens:0.3.2-s_2.11
 --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
 ```
 
+
 #### 2. Run from Sparklens offline data ####
 
 You can choose not to run sparklens inside the app, but at a later time. Run your app as above
-with additional configuration parameters:
+with additional configuration parameters.
+For spark-3.0.0 and later versions of Spark:
+```
+--packages qubole:sparklens:0.4.0-s_2.12
+--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
+--conf spark.sparklens.reporting.disabled=true
+```
+
+For spark-2.4.x and lower versions of Spark:
 ```
 --packages qubole:sparklens:0.3.2-s_2.11
 --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
@@ -111,7 +126,7 @@ with additional configuration parameters:
 This will not run reporting, but instead create a Sparklens JSON file for the application which is
 stored in the **spark.sparklens.data.dir** directory (by default, **/tmp/sparklens/**). Note that this will be stored on HDFS by default. To save this file to s3, please set **spark.sparklens.data.dir** to s3 path. This data file can now be used to run Sparklens reporting independently, using `spark-submit` command as follows:
 
-`./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg <filename>`
+`./bin/spark-submit --packages qubole:sparklens:0.4.0-s_2.12 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg <filename>`
 
 `<filename>` should be replaced by the full path of sparklens json file. If the file is on s3 use the full s3 path. For files on local file system, use file:// prefix with the local file location. HDFS is supported as well.
 
@@ -124,11 +139,11 @@ running via `sparklens-json-file` above) with another option specifying that is
 event history file. This file can be in any of the formats the event history files supports, i.e. **text, snappy, lz4
 or lzf**. Note the extra `source=history` parameter in this example:
 
-`./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg <filename> source=history`
+`./bin/spark-submit --packages qubole:sparklens:0.4.0-s_2.12 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg <filename> source=history`
 
 It is also possible to convert an event history file to a Sparklens json file using the following command:
 
-`./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.EventHistoryToSparklensJson qubole-dummy-arg <srcDir> <targetDir>`
+`./bin/spark-submit --packages qubole:sparklens:0.4.0-s_2.12 --class com.qubole.sparklens.app.EventHistoryToSparklensJson qubole-dummy-arg <srcDir> <targetDir>`
 
 EventHistoryToSparklensJson is designed to work on local file system only. Please make sure that the source and target directories are on local file system.
````
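The README's version-to-package mapping can be sketched as a small Scala helper (the object and method names here are illustrative, not part of Sparklens):

```scala
// Illustrative helper (not part of Sparklens): choose the --packages
// coordinate that matches the running Spark version, per the README text above.
object SparklensPackage {
  def forSparkVersion(sparkVersion: String): String = {
    val major = sparkVersion.takeWhile(_ != '.').toInt
    if (major >= 3) "qubole:sparklens:0.4.0-s_2.12" // spark-3.0.0 and later (Scala 2.12)
    else "qubole:sparklens:0.3.2-s_2.11"            // spark-2.4.x and lower (Scala 2.11)
  }
}
```

For example, `SparklensPackage.forSparkVersion("3.0.1")` yields the Scala 2.12 artifact.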

build.sbt

Lines changed: 3 additions & 3 deletions

```diff
@@ -1,13 +1,13 @@
 name := "sparklens"
 organization := "com.qubole"
 
-scalaVersion := "2.11.8"
+scalaVersion := "2.12.10"
 
-crossScalaVersions := Seq("2.10.6", "2.11.8")
+crossScalaVersions := Seq("2.10.6", "2.11.12", "2.12.10")
 
 spName := "qubole/sparklens"
 
-sparkVersion := "2.0.0"
+sparkVersion := "3.0.1"
 
 spAppendScalaVersion := true
```

src/main/scala/com/qubole/sparklens/QuboleJobListener.scala

Lines changed: 2 additions & 1 deletion

```diff
@@ -257,7 +257,8 @@ class QuboleJobListener(sparkConf: SparkConf) extends SparkListener {
     if (stageCompleted.stageInfo.failureReason.isDefined) {
       //stage failed
       val si = stageCompleted.stageInfo
-      failedStages += s""" Stage ${si.stageId} attempt ${si.attemptId} in job ${stageIDToJobID(si.stageId)} failed.
+      // attemptId is deprecated; from spark-3.0.0, attemptNumber is used to get the attempt id
+      failedStages += s""" Stage ${si.stageId} attempt ${si.attemptNumber} in job ${stageIDToJobID(si.stageId)} failed.
       Stage tasks: ${si.numTasks}
       """
       stageTimeSpan.finalUpdate()
```
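The message change above can be illustrated with a stand-in for Spark's `StageInfo` (a minimal sketch; the real class lives in `org.apache.spark.scheduler` and exposes `attemptNumber` while deprecating `attemptId`):

```scala
// Minimal stand-in for org.apache.spark.scheduler.StageInfo, illustration only.
case class StageInfoStub(stageId: Int, attemptNumber: Int, numTasks: Int)

// Builds the same style of failure message as QuboleJobListener, reading
// attemptNumber instead of the deprecated attemptId.
def failureMessage(si: StageInfoStub, jobId: Int): String =
  s"Stage ${si.stageId} attempt ${si.attemptNumber} in job $jobId failed. Stage tasks: ${si.numTasks}"
```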

src/main/scala/com/qubole/sparklens/helper/HDFSConfigHelper.scala

Lines changed: 11 additions & 6 deletions

```diff
@@ -24,11 +24,16 @@ import org.apache.spark.deploy.SparkHadoopUtil
 object HDFSConfigHelper {
 
   def getHadoopConf(sparkConfOptional:Option[SparkConf]): Configuration = {
-    if (sparkConfOptional.isDefined) {
-      SparkHadoopUtil.get.newConfiguration(sparkConfOptional.get)
-    }else {
-      val sparkConf = new SparkConf()
-      SparkHadoopUtil.get.newConfiguration(sparkConf)
-    }
+    // From Spark 3.0.0, SparkHadoopUtil is private to the spark package, so
+    // reflection is used here to access its newConfiguration method.
+    val sparkHadoopUtilClass = Class.forName("org.apache.spark.deploy.SparkHadoopUtil")
+    val sparkHadoopUtil = sparkHadoopUtilClass.newInstance()
+    val newConfigurationMethod = sparkHadoopUtilClass.getMethod("newConfiguration", classOf[SparkConf])
+    if (sparkConfOptional.isDefined) {
+      newConfigurationMethod.invoke(sparkHadoopUtil, sparkConfOptional.get).asInstanceOf[Configuration]
+    } else {
+      val sparkConf = new SparkConf()
+      newConfigurationMethod.invoke(sparkHadoopUtil, sparkConf).asInstanceOf[Configuration]
+    }
   }
 }
```
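The reflection pattern in `getHadoopConf` can be exercised standalone. A minimal sketch, run here against a JDK class so it needs nothing beyond the standard library:

```scala
// Same look-up-by-name pattern as getHadoopConf: resolve the class, instantiate
// it, resolve the method by its parameter types, invoke, and use the result.
// (getDeclaredConstructor().newInstance() is the non-deprecated spelling of
// the newInstance() call used in the commit.)
val targetClass = Class.forName("java.lang.StringBuilder")
val instance    = targetClass.getDeclaredConstructor().newInstance()
val appendMeth  = targetClass.getMethod("append", classOf[String])
appendMeth.invoke(instance, "sparklens")
val result = instance.toString
```

The trade-off of this design: reflection defers the `SparkHadoopUtil` binding from compile time to runtime, so the same jar can load whether or not the class is accessible at compile time against a given Spark version.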

version.sbt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-version in ThisBuild := "0.3.2"
+version in ThisBuild := "0.4.0"
```
