Deploy Samza Job To CDH
This tutorial assumes you have successfully run hello-samza and now want to deploy the job to a cluster running Cloudera's Distribution Including Apache Hadoop (CDH). It is based on CDH 5.0.0 and uses hello-samza as the example job.
Upload Package to Cluster
There are a few ways to upload the package to the cluster’s HDFS. If the job package is not yet on the cluster, scp it from your local machine to a cluster node. Then run:
hadoop fs -put path/to/samza-job-package-0.8.0-dist.tar.gz /path/for/tgz
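The upload step can be sketched as a small shell function. This is only an illustration: upload_package is a hypothetical helper name, and it assumes /path/for/tgz is meant to be an HDFS directory that may not exist yet, which the tutorial does not state.

```shell
# Hypothetical helper (not part of Samza or CDH): upload a job package to HDFS.
# Assumes the `hadoop` CLI is on the PATH of the cluster node you run this on.
# If the tarball is still on your local machine, copy it over first, e.g.
#   scp path/to/package.tar.gz <user>@<cluster node>:   (placeholders, fill in your own)
upload_package() {
  local_tgz="$1"   # path to the job package tarball on this machine
  hdfs_dir="$2"    # target HDFS directory, e.g. /path/for/tgz

  # Create the target directory if needed, upload, then list it to confirm.
  hadoop fs -mkdir -p "$hdfs_dir" &&
    hadoop fs -put "$local_tgz" "$hdfs_dir/" &&
    hadoop fs -ls "$hdfs_dir"
}
```

Call it with the local tarball path and the HDFS directory. The final listing is a quick way to confirm the package landed where yarn.package.path will later point.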
Get Deploying Scripts
Untar the job package (this assumes you will run the job from the current directory):
tar -xvf path/to/samza-job-package-0.8.0-dist.tar.gz -C ./
Add Package Path to Properties File
vim config/wikipedia-parser.properties
Set yarn.package.path to the HDFS location of the uploaded package:
yarn.package.path=hdfs://<hdfs name node ip>:<hdfs name node port>/path/to/tgz
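If you would rather script this change than edit the file in vim, something like the following could work. It is a minimal sketch: set_yarn_package_path is a hypothetical helper, and it assumes the property appears at most once per line in standard key=value form.

```shell
# Hypothetical helper: set yarn.package.path in a Java properties file.
# Removes any existing yarn.package.path line and appends the new value,
# leaving every other property untouched.
set_yarn_package_path() {
  props_file="$1"    # e.g. config/wikipedia-parser.properties
  package_uri="$2"   # e.g. hdfs://<hdfs name node ip>:<hdfs name node port>/path/to/tgz
  tmp_file="$(mktemp)"

  # grep -v exits non-zero when it prints nothing; that is not an error here.
  grep -v '^yarn\.package\.path=' "$props_file" > "$tmp_file" || true
  printf 'yarn.package.path=%s\n' "$package_uri" >> "$tmp_file"
  mv "$tmp_file" "$props_file"
}
```

Writing to a temporary file and moving it into place avoids relying on sed's in-place flag, which behaves differently across platforms.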
Set YARN Environment Variable
export HADOOP_CONF_DIR=/etc/hadoop/conf
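Before submitting, it may help to confirm that the directory actually holds the cluster's YARN client configuration. The check below is a sketch under the assumption that a usable configuration directory contains a yarn-site.xml file; check_hadoop_conf_dir is a hypothetical name, and /etc/hadoop/conf may not be the right location on every installation.

```shell
# Hypothetical sanity check: does the Hadoop configuration directory look usable?
# Uses the first argument if given, otherwise falls back to $HADOOP_CONF_DIR.
check_hadoop_conf_dir() {
  conf_dir="${1:-${HADOOP_CONF_DIR:-}}"

  if [ -z "$conf_dir" ] || [ ! -d "$conf_dir" ]; then
    echo "HADOOP_CONF_DIR is unset or not a directory: '$conf_dir'" >&2
    return 1
  fi

  # Assumption: the YARN client settings live in yarn-site.xml.
  if [ ! -f "$conf_dir/yarn-site.xml" ]; then
    echo "no yarn-site.xml found in $conf_dir" >&2
    return 1
  fi
}
```

Running check_hadoop_conf_dir with no arguments after the export returns a non-zero status and prints a message if the directory is missing or incomplete.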
Run Samza Job
bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/config/wikipedia-parser.properties
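The --config-path value is a file:// URI built from an absolute path, which is why the command uses $PWD. As a hedged illustration, the hypothetical helper below (samza_config_uri is not part of Samza) builds that URI from a relative or absolute path and fails early if the properties file is missing.

```shell
# Hypothetical helper: turn a properties file path into a file:// URI
# suitable for --config-path, failing if the file does not exist.
samza_config_uri() {
  config_file="$1"

  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 1
  fi

  # Resolve the containing directory to an absolute path.
  config_dir="$(cd "$(dirname "$config_file")" && pwd)"
  printf 'file://%s/%s\n' "$config_dir" "$(basename "$config_file")"
}
```

Its output can be substituted into the command as --config-path="$(samza_config_uri config/wikipedia-parser.properties)".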