Conditional iterations in Google Cloud Dataflow


I am looking at opportunities for implementing a data analysis algorithm using Google Cloud Dataflow. Mind you, I have no experience with Dataflow yet; I am doing research on whether it can fulfill my needs.

Part of my algorithm contains conditional iterations, that is, it continues until some condition is met:

    PCollection data = ...;
    while (needsMoreWork(data)) {
        data = doAStep(data);
    }

I have looked around in the documentation, and as far as I can see, I am only able to do "iterations" if I know the exact number of iterations before the pipeline starts. In that case, the pipeline construction code can create a sequential pipeline with a fixed number of steps.
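The fixed-count case described above amounts to unrolling the loop while the pipeline graph is being built. The sketch below is a plain-Java simulation, not Dataflow SDK code: a "pipeline" is modeled as a list of stages, and `doAStep` (halving every value) is a hypothetical stand-in for the real transform. The point it illustrates is that the construction loop only appends stages, so the iteration count has to be known before anything runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class UnrolledPipeline {

    // Hypothetical stand-in for one algorithm step; in Dataflow this would be
    // a transform applied to a PCollection.
    static List<Long> doAStep(List<Long> data) {
        return data.stream().map(x -> x / 2).collect(Collectors.toList());
    }

    // Build the pipeline: the iteration count must be fixed here, before
    // anything executes, so the loop merely appends one stage per iteration.
    static List<UnaryOperator<List<Long>>> construct(int iterations) {
        List<UnaryOperator<List<Long>>> stages = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            stages.add(UnrolledPipeline::doAStep);
        }
        return stages;
    }

    // "Run" the constructed pipeline by applying its stages in order.
    static List<Long> run(List<UnaryOperator<List<Long>>> stages, List<Long> input) {
        List<Long> data = input;
        for (UnaryOperator<List<Long>> stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<UnaryOperator<List<Long>>> pipeline = construct(3);
        System.out.println(run(pipeline, List.of(80L, 17L))); // prints [10, 2]
    }
}
```

Nothing in `construct` can look at the data, which is exactly why a data-dependent `while` condition does not fit this model.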

The only "solution" I can think of is to run each iteration in a separate pipeline, store the intermediate data in some database, and decide in the pipeline construction code whether or not to launch a new pipeline for the next iteration. This seems like an extremely inefficient solution!

Are there any ways to perform this kind of conditional iteration in Google Cloud Dataflow?

Thanks!

For the time being, the two options you've mentioned are both reasonable. You could even combine the two approaches: create a pipeline that performs a few iterations (each becoming a no-op once needsMoreWork is false), and have a main Java program that submits the pipeline multiple times until needsMoreWork is false.
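The combined approach from the answer can be sketched as a driver loop around a pipeline with a small, fixed number of guarded steps. This is again a plain-Java simulation under stated assumptions, not Dataflow SDK code: `needsMoreWork`, `doAStep` and the unroll factor `STEPS_PER_PIPELINE` are hypothetical, and an in-memory list stands in for the intermediate storage between submissions. In a real pipeline, evaluating the guard inside the job would require computing the condition from the data (for example as a side input), which is not shown here.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IterativeDriver {

    static final int STEPS_PER_PIPELINE = 3; // hypothetical unroll factor

    // Hypothetical convergence test: keep going while any value exceeds 1.
    static boolean needsMoreWork(List<Long> data) {
        return data.stream().anyMatch(x -> x > 1);
    }

    // Hypothetical algorithm step: halve every value.
    static List<Long> doAStep(List<Long> data) {
        return data.stream().map(x -> x / 2).collect(Collectors.toList());
    }

    // One pipeline submission: a fixed number of unrolled steps, each of
    // which is a no-op once needsMoreWork is false.
    static List<Long> runPipeline(List<Long> input) {
        List<Long> data = input;
        for (int i = 0; i < STEPS_PER_PIPELINE; i++) {
            data = needsMoreWork(data) ? doAStep(data) : data;
        }
        return data; // in practice this would be written to intermediate storage
    }

    static final class Result {
        final List<Long> data;
        final int submissions;

        Result(List<Long> data, int submissions) {
            this.data = data;
            this.submissions = submissions;
        }
    }

    // Main program: resubmit the pipeline until the condition is met.
    static Result drive(List<Long> input) {
        List<Long> data = input;
        int submissions = 0;
        while (needsMoreWork(data)) {
            data = runPipeline(data);
            submissions++;
        }
        return new Result(data, submissions);
    }

    public static void main(String[] args) {
        Result r = drive(List.of(1000L, 17L));
        System.out.println(r.data + " after " + r.submissions + " submissions");
        // prints [1, 0] after 3 submissions
    }
}
```

The unroll factor trades per-job overhead against wasted work: larger values mean fewer submissions, but the final job may execute several no-op steps after the condition is already satisfied.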

We've seen this use case a few times and hope to address it natively in the future. Native support is being tracked in https://github.com/googlecloudplatform/dataflowjavasdk/issues/50.

