Use Cases

Here is a description of a few of the popular use cases for Apache Kafka. For an overview of a number of these areas in action, see this blog post.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc.). In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.

In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.
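
As a rough illustration of this pattern, here is a minimal sketch using the Java producer and consumer clients. The broker address, the "orders" topic, the record key, and the group id are placeholder assumptions for illustration, not anything prescribed by Kafka.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MessagingSketch {
    public static void main(String[] args) {
        // Publish a single message, as a data producer would.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }

        // Read it back, as a decoupled downstream application would.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "order-processors"); // consumers in a group share the topic's partitions
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s -> %s%n", record.key(), record.value());
            }
        }
    }
}
```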

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.

Activity tracking is often very high volume as many activity messages are generated for each user page view.
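
A minimal sketch of the topic-per-activity-type layout described above, using the Java producer. The topic names, user id, and JSON payloads are hypothetical choices for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ActivityTracker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One central topic per activity type; keying by user id keeps
            // each user's events ordered within a single partition.
            producer.send(new ProducerRecord<>("page_views", "user-123",
                    "{\"page\":\"/pricing\",\"ts\":1700000000}"));
            producer.send(new ProducerRecord<>("searches", "user-123",
                    "{\"query\":\"kafka streams\",\"ts\":1700000001}"));
        }
    }
}
```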

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.

Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a lightweight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.
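
As a sketch of one such pipeline stage, the following uses the current Kafka Streams API to consume raw articles, apply a trivial normalization, and publish the result to a downstream topic. The application id, topic names, and the normalization step itself are illustrative assumptions, not part of the documentation above.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ArticleCleanser {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-cleanser"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // One stage of the pipeline: consume raw crawled articles, normalize
        // them, and publish the cleansed content to a topic for the next stage.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("articles");
        raw.mapValues(content -> content.trim().toLowerCase())
           .to("articles-cleansed");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```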

Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

Commit Log

Kafka can serve as a kind of external commit log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to the Apache BookKeeper project.
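
A small sketch of setting a topic up for this role with the Java AdminClient, enabling log compaction so Kafka retains at least the latest record per key. The topic name, partition count, and replication factor are placeholder choices.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactedTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps at least the latest value per key,
            // so a failed node can replay the topic to restore its state.
            NewTopic topic = new NewTopic("node-state", 1, (short) 3)
                    .configs(Collections.singletonMap("cleanup.policy", "compact"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```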
