@InterfaceStability.Evolving public interface StreamApplication extends SamzaApplication<StreamApplicationDescriptor>
StreamApplication
describes the inputs, outputs, state, configuration and the processing logic for the
application in Samza's High Level API.
A typical StreamApplication
implementation consists of the following stages:
SystemDescriptor
s,
InputDescriptor
s, OutputDescriptor
s and TableDescriptor
s
MessageStream
s, OutputStream
s and Table
s from the
provided StreamApplicationDescriptor
.
MessageStream.filter(FilterFunction)
The following example StreamApplication
removes page views older than 1 hour from the input stream:
public class PageViewFilter implements StreamApplication {
public void describe(StreamApplicationDescriptor appDescriptor) {
KafkaSystemDescriptor trackingSystemDescriptor = new KafkaSystemDescriptor("tracking");
KafkaInputDescriptor<PageViewEvent> inputStreamDescriptor =
trackingSystemDescriptor.getInputDescriptor("pageViewEvent", new JsonSerdeV2<>(PageViewEvent.class));
KafkaOutputDescriptor<PageViewEvent>> outputStreamDescriptor =
trackingSystemDescriptor.getOutputDescriptor("recentPageViewEvent", new JsonSerdeV2<>(PageViewEvent.class)));
MessageStream<PageViewEvent> pageViewEvents = appDescriptor.getInputStream(inputStreamDescriptor);
OutputStream<PageViewEvent> recentPageViewEvents = appDescriptor.getOutputStream(outputStreamDescriptor);
pageViewEvents
.filter(m -> m.getCreationTime() > System.currentTimeMillis() - Duration.ofHours(1).toMillis())
.sendTo(recentPageViewEvents);
}
}
All operator function implementations used in a StreamApplication
must be Serializable
. Any
context required within an operator function may be managed by implementing the InitableFunction.init(org.apache.samza.context.Context)
and
ClosableFunction.close()
methods in the function implementation.
Functions may implement the ScheduledFunction
interface to schedule and receive periodic callbacks from the
Samza framework.
Implementation Notes: Currently StreamApplication
s are wrapped in a StreamTask
during execution. The
execution planner will generate a serialized DAG which will be deserialized in each StreamTask
instance used
for processing incoming messages. Execution is synchronous and thread-safe within each StreamTask
. Multiple
tasks may process their messages concurrently depending on the job parallelism configuration.
describe