JobRunner
Samza jobs are started using a script called run-job.sh.
You provide two parameters to the run-job.sh script. One is the config location, and the other is a factory class that is used to read your configuration file. The run-job.sh script actually executes a Samza class called JobRunner. The JobRunner uses your ConfigFactory to get a Config object from the config path.
The Config object is just a wrapper around Map<String, String>.
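To make the "wrapper around a map" idea concrete, here is a minimal sketch of a factory that turns properties-style text into a map-backed config object. SimpleConfig and SimplePropertiesConfigFactory are hypothetical stand-ins written for this illustration, not Samza's own Config or ConfigFactory classes, and the keys in the usage note are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for a Config: a read-only wrapper around
// Map<String, String> with a couple of convenience accessors.
final class SimpleConfig {
    private final Map<String, String> values;

    SimpleConfig(Map<String, String> values) {
        // Copy defensively so later changes to the caller's map are not visible.
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }

    String get(String key) {
        return values.get(key);
    }

    int getInt(String key, int defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

// Hypothetical stand-in for a ConfigFactory that understands the
// java.util.Properties text format.
final class SimplePropertiesConfigFactory {
    static SimpleConfig fromProperties(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        Map<String, String> values = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            values.put(name, props.getProperty(name));
        }
        return new SimpleConfig(values);
    }
}
```

Calling SimplePropertiesConfigFactory.fromProperties("job.name=hello-world\ncontainers=2\n") yields an object whose get("job.name") returns the string value and whose getInt("containers", 1) parses it as an integer. A real factory would read from the config path passed to run-job.sh rather than from an in-memory string.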
Once the JobRunner gets your configuration, it gives your configuration to the StreamJobFactory class defined by the “job.factory” property. Samza ships with three job factory implementations: ThreadJobFactory, ProcessJobFactory and YarnJobFactory. The StreamJobFactory’s responsibility is to give the JobRunner a job that it can run.
Once the JobRunner gets a job, it calls submit() on the job. This method tells the StreamJob implementation to start the SamzaContainer. In the case of ProcessJobFactory, the job uses a run-container.sh script to execute the SamzaContainer in a separate process, which starts one SamzaContainer locally on the machine where you ran run-job.sh.
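The hand-off described above (look up the factory class named in the configuration, ask it for a job, then submit the job) can be sketched roughly as follows. The StreamJob and StreamJobFactory interfaces here are simplified stand-ins declared for this example and take a plain map instead of a Config object. RecordingJob, RecordingJobFactory and JobRunnerSketch are hypothetical names, and the real JobRunner may do more than this.

```java
import java.util.Map;

// Simplified stand-ins for the interfaces named in the text.
interface StreamJob {
    StreamJob submit();
}

interface StreamJobFactory {
    StreamJob getJob(Map<String, String> config);
}

// Hypothetical job that only records that submit() was called,
// instead of starting a SamzaContainer.
final class RecordingJob implements StreamJob {
    boolean submitted = false;

    @Override
    public StreamJob submit() {
        submitted = true;
        return this;
    }
}

// Hypothetical factory that hands back a RecordingJob.
final class RecordingJobFactory implements StreamJobFactory {
    @Override
    public StreamJob getJob(Map<String, String> config) {
        return new RecordingJob();
    }
}

final class JobRunnerSketch {
    // Instantiates the factory named by "job.factory", asks it for a job,
    // and submits that job.
    static StreamJob run(Map<String, String> config) {
        String factoryClassName = config.get("job.factory");
        if (factoryClassName == null) {
            throw new IllegalArgumentException("missing required property: job.factory");
        }
        try {
            StreamJobFactory factory = (StreamJobFactory) Class
                    .forName(factoryClassName)
                    .getDeclaredConstructor()
                    .newInstance();
            return factory.getJob(config).submit();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not create " + factoryClassName, e);
        }
    }
}
```

Loading the factory by class name is what lets the same runner start a local job or a YARN job purely from configuration: only the value of the "job.factory" property changes.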
This flow differs slightly when you use YARN, but we’ll get to that later.