Kafka Terminology

In this tutorial I will explain basic Kafka terms and terminology.

  • Apache Kafka is an open-source distributed event streaming platform.
  • The Kafka core is written in Scala and Java; Kafka Streams and KSQL are written in Java.
  • It provides fast, highly scalable, and redundant messaging through a publish-subscribe (pub-sub) model.

Use cases:

  • Messaging System
  • Activity Tracking
  • Metrics Gathering from Different Locations
  • Application Logs
  • Stream Processing
  • Integration with other Big Data Technologies

Some terms to explain:

  • Cluster: This is a set of Kafka brokers.
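  • ZooKeeper: This is the cluster coordinator; it tracks the brokers and cluster metadata. Newer Kafka versions can run without it by using KRaft mode.
  • Broker: This is a Kafka server, that is, a single running Kafka server process.
  • Topic: This is a named stream of messages that is split into one or more log partitions; a broker can host partitions of several topics.
  • Offset: This is a sequential identifier for each message within a partition.
  • Partition: This is an immutable, ordered sequence of records that is continually appended to, forming a structured commit log.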
  • Producer: This is the program that publishes data to topics.
  • Consumer: This is the program that processes data from the topics.
  • Retention period: This is the time to keep messages available for consumption.
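
To make the relationship between topics, partitions, offsets, and the retention period more concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Kafka client API: the class names (Record, Partition, Topic), the "page-views" topic, the byte-sum partitioning rule, and the timestamps are all assumptions made for illustration. Real Kafka chooses partitions with a different hash function and applies retention to log segments on disk rather than to single records.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int       # position of the record within its partition
    timestamp: float  # creation time in seconds (hypothetical clock value)
    value: str


@dataclass
class Partition:
    """Append-only, ordered log of records (toy model)."""
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, value, timestamp):
        record = Record(self.next_offset, timestamp, value)
        self.records.append(record)
        self.next_offset += 1  # offsets keep growing, even after cleanup
        return record.offset

    def read_from(self, offset):
        # Return every retained record at or after the requested offset.
        return [r for r in self.records if r.offset >= offset]

    def enforce_retention(self, now, retention_seconds):
        # Drop records that are older than the retention period.
        self.records = [r for r in self.records
                        if now - r.timestamp <= retention_seconds]


@dataclass
class Topic:
    name: str
    partitions: list

    def publish(self, key, value, timestamp):
        # Simplified rule: derive the partition from the key so that
        # records with the same key land in the same partition, in order.
        index = sum(key.encode()) % len(self.partitions)
        return index, self.partitions[index].append(value, timestamp)


topic = Topic("page-views", [Partition(), Partition()])
p1, o1 = topic.publish("user-a", "home", timestamp=0.0)
p2, o2 = topic.publish("user-a", "cart", timestamp=50.0)
```

In this sketch both records share a key, so they end up in the same partition with consecutive offsets. Calling enforce_retention(now=100.0, retention_seconds=60.0) on that partition would drop the first record while the second keeps its original offset, which is why an offset identifies a record only within its own partition.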

There are four major APIs in Kafka:

  • Producer API – Permits an application to publish streams of records.
  • Consumer API – Permits an application to subscribe to topics and process streams of records.
  • Connector API – Permits building and running reusable producers and consumers that link topics to existing applications or data systems.
  • Streams API – Permits an application to act as a stream processor that consumes input streams from topics, transforms them, and produces output streams to topics.
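
The roles behind the Producer, Consumer, and Streams APIs can be sketched with a small in-memory stand-in for a broker. This is only an illustration under stated assumptions, not the real Kafka client library: ToyBroker, run_stream, the topic names, and the group names ("stream-app", "billing", "audit") are hypothetical, and each topic is reduced to a single list with no partitions. The Connector API is not modeled here.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker; each topic is a single list."""

    def __init__(self):
        self.topics = defaultdict(list)
        # Maps (group, topic) to the next offset that group will read.
        self.positions = defaultdict(int)

    def publish(self, topic, value):
        # Producer role: append a record to the end of a topic.
        self.topics[topic].append(value)

    def poll(self, group, topic):
        # Consumer role: return unread records, then advance the
        # group's position so the same records are not returned twice.
        start = self.positions[(group, topic)]
        records = self.topics[topic][start:]
        self.positions[(group, topic)] = start + len(records)
        return records


def run_stream(broker, source, sink, transform):
    # Streams role: consume from one topic, transform each record,
    # and produce the result to another topic.
    for value in broker.poll("stream-app", source):
        broker.publish(sink, transform(value))


broker = ToyBroker()
broker.publish("words", "kafka")
broker.publish("words", "topic")

run_stream(broker, "words", "upper-words", str.upper)

billing = broker.poll("billing", "upper-words")
audit = broker.poll("audit", "upper-words")
again = broker.poll("billing", "upper-words")
```

Because every group keeps its own position, "billing" and "audit" each receive the full transformed stream, while the second poll by "billing" returns nothing new. That independent tracking of positions is what distinguishes the pub-sub model from a queue in which a message is handed to only one reader.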