Conditional iteration in a single job in Apache Spark
I am working on an iterative algorithm using Apache Spark, which claims to be perfect for that. The examples I have found so far create a single job with a hardcoded number of iterations. I need the algorithm to run until a condition is met.
My current implementation launches a new job for each iteration, like this:
    var data = sc.textFile(...).map().cache()

    while (data.filter(...).isEmpty()) {
      // Run the algorithm (also handles caching)
      data = performStep(data)
    }
This is pretty inefficient. Between iterations I have to wait a long time for the next job to start. With 4 servers I wait around 10 seconds between jobs; with 32 servers, around 100 seconds. In total I end up spending at least half of the runtime waiting between jobs.
I find that conditional iterations are quite common in these types of algorithms, for example as stopping criteria in machine learning, so I was hoping this can be improved.
Is there a more efficient way of doing this? For example, is there a way to run a conditional repetition in a single job? Thanks!