Low Latency Web-Scale Fraud Prevention

eBay Enterprise is the world’s largest omni-channel commerce provider. The engineering team at eBay chose Apache Samza to build PreCog, their horizontally scalable anomaly detection system.

PreCog relies heavily on Samza’s high-performance, fault-tolerant local storage. Its architecture had the following requirements, all of which Samza met:

Web-scale: Scale to a large number of users and a large volume of data per user. It should also be possible to scale horizontally by adding more commodity hardware.
Low-latency: Process customer interactions in real time, reacting in milliseconds instead of hours.
Fault-tolerance: Gracefully tolerate and handle hardware failures.


The PreCog anomaly-detection system comprises multiple tiers. Each tier consists of multiple Samza jobs that process the output of the previous tier.

Ingestion tier: This tier ingests a variety of historical and real-time data from various sources (people, places, etc.) into Kafka.

Fanout tier: This tier consists of Samza jobs that process the Kafka events, fan them out, and re-partition them by facets such as email address, IP address, credit-card number, and shipping address.
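The fan-out and re-partitioning step can be illustrated with a minimal Python sketch. It is not PreCog's code and does not use the Samza or Kafka APIs: the facet field names, the `fan_out` and `partition_for` helpers, and the choice of CRC32 as the hash are all assumptions made for illustration. The point it shows is that each incoming event is emitted once per facet, keyed by that facet's value, so all events that share a value land in the same partition.

```python
import zlib

# Hypothetical facet field names; the text lists these facets only as examples.
FACETS = ("email", "ip_address", "card_number", "shipping_address")


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32, chosen for illustration).

    Python's built-in hash() is salted per process for strings, so it would not
    give the same partition across workers.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def fan_out(event: dict, num_partitions: int) -> list:
    """Emit one (facet, key, partition, event) record per facet present in the event."""
    records = []
    for facet in FACETS:
        value = event.get(facet)
        if value is None:
            continue  # skip facets this event does not carry
        key = f"{facet}:{value}"
        records.append((facet, key, partition_for(key, num_partitions), event))
    return records
```

With this layout, two events that share an email address produce email-keyed records in the same partition, so a single downstream task sees every event for that address.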

Compute tier: The Samza jobs in this tier consume messages from the fan-out tier and compute various key metrics and derived features. Features used to evaluate fraud include:

  1. Number of transactions per customer per day
  2. Change in the number of daily transactions over the past few days
  3. Dollar value of each transaction per day
  4. Change in the dollar value of transactions over a sliding time window
  5. Number of transactions per shipping address
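Two of the features above can be sketched in a few lines of Python. This is an illustrative sketch, not PreCog's implementation: the class names are hypothetical, days are represented as integer ordinals, transactions are assumed to arrive in timestamp order, and in-memory structures stand in for the local storage a Samza task would use.

```python
from collections import defaultdict, deque


class DailyTransactionCounter:
    """Counts transactions per (customer, day) and reports the day-over-day change."""

    def __init__(self):
        # A dict stands in for the task's local key-value store (assumption).
        self._counts = defaultdict(int)

    def record(self, customer_id, day):
        self._counts[(customer_id, day)] += 1

    def count(self, customer_id, day):
        return self._counts.get((customer_id, day), 0)

    def change(self, customer_id, day):
        """Count on `day` minus the count on the previous day."""
        return self.count(customer_id, day) - self.count(customer_id, day - 1)


class SlidingAmountWindow:
    """Tracks the total transaction amount inside a trailing time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def add(self, timestamp, amount):
        """Add a transaction, evict expired ones, and return the change in the total."""
        before = self.total
        self._events.append((timestamp, amount))
        self.total += amount
        cutoff = timestamp - self.window_seconds
        # Drop transactions that fell out of the window (assumes in-order timestamps).
        while self._events and self._events[0][0] <= cutoff:
            _, expired_amount = self._events.popleft()
            self.total -= expired_amount
        return self.total - before
```

Because the fanout tier has already keyed events by facet, each task only needs to hold state for the keys in its own partitions.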

Assembly tier: This tier comprises Samza jobs that join the output of the compute tier with additional data sources and make a final determination on transaction fraud.
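The join-then-decide shape of this tier can be sketched as follows. The source does not describe PreCog's actual decision logic, so everything specific here is hypothetical: the `assemble` function, the `daily_txns` and `blocked_address` fields, and the threshold of 20 are placeholders chosen only to show a keyed join followed by a rule.

```python
def assemble(features: dict, reference: dict, max_daily_txns: int = 20) -> dict:
    """Join computed features with reference data by customer id and flag suspicious ones.

    features:  customer_id -> {"daily_txns": int}         (hypothetical compute-tier output)
    reference: customer_id -> {"blocked_address": bool}   (hypothetical additional source)
    """
    decisions = {}
    for customer_id, feats in features.items():
        # Left join: a customer with no reference data is still evaluated.
        ref = reference.get(customer_id, {})
        reasons = []
        if feats.get("daily_txns", 0) > max_daily_txns:  # placeholder threshold
            reasons.append("high_daily_volume")
        if ref.get("blocked_address", False):
            reasons.append("blocked_address")
        decisions[customer_id] = {"suspicious": bool(reasons), "reasons": reasons}
    return decisions
```

A production system would more likely score transactions with a model than with fixed rules; the sketch only illustrates where the join sits relative to the decision.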

To monitor the PreCog pipeline, eBay uses Samza’s JMXMetricsReporter and ingests the reported metrics into OpenTSDB/HBase. The metrics are then visualized using Grafana.

Key Samza features: Stateful processing, windowing, Kafka integration, JMX metrics
