I do most of my development in Python and Scala these days. I had totally forgotten how verbose Java is until I saw some Apache Spark coding examples.
The code in Python:
    file = spark.textFile("hdfs://...")
    errors = file.filter(lambda line: "ERROR" in line)
    # Count all the errors
    errors.count()
    # Count errors mentioning MySQL
    errors.filter(lambda line: "MySQL" in line).count()
    # Fetch the MySQL errors as an array of strings
    errors.filter(lambda line: "MySQL" in line).collect()
The same code in Scala:
    val file = spark.textFile("hdfs://...")
    val errors = file.filter(line => line.contains("ERROR"))
    // Count all the errors
    errors.count()
    // Count errors mentioning MySQL
    errors.filter(line => line.contains("MySQL")).count()
    // Fetch the MySQL errors as an array of strings
    errors.filter(line => line.contains("MySQL")).collect()
And finally in Java:
    JavaRDD<String> file = spark.textFile("hdfs://...");
    JavaRDD<String> errors = file.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("ERROR"); }
    });
    // Count all the errors
    errors.count();
    // Count errors mentioning MySQL
    errors.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("MySQL"); }
    }).count();
    // Fetch the MySQL errors as an array of strings
    errors.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("MySQL"); }
    }).collect();
That is roughly twice as many lines of code for the same logic, most of it anonymous-class boilerplate.
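If you want to poke at the logic without a Spark cluster, here is a rough local sketch of the same filter-and-count steps using plain Python lists. The sample log lines and the variable names (`log_lines`, `mysql_errors`, and so on) are made up for illustration, and a list comprehension is only a stand-in for Spark's `filter`; it is not how Spark executes the job.

```python
# Hypothetical sample data standing in for the HDFS file in the examples above.
log_lines = [
    "INFO server started",
    "ERROR MySQL connection refused",
    "ERROR disk quota exceeded",
    "WARN slow query on MySQL",
    "ERROR MySQL timeout after 30s",
]

# Keep only the error lines (the local analogue of file.filter(...)).
errors = [line for line in log_lines if "ERROR" in line]

# Count all the errors
error_count = len(errors)

# Fetch the MySQL errors as a list of strings
mysql_errors = [line for line in errors if "MySQL" in line]

# Count errors mentioning MySQL
mysql_error_count = len(mysql_errors)

print(error_count, mysql_error_count)
```

Unlike this eager, in-memory version, Spark evaluates `filter` lazily and only does the work when an action such as `count()` or `collect()` is called.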