In this tutorial I will explain basic Kafka terms and terminology.
- Apache Kafka is an open source streaming platform.
- The Kafka core is written in Scala and Java, and Kafka Streams and KSQL are written in Java.
- It provides fast, highly scalable, and redundant messaging through a publish-subscribe (pub-sub) model.
Common use cases:
- Messaging System
- Activity Tracking
- Gather metrics from different locations
- Application Logs
- Streams Processing
- Integration with other Big Data Technologies
Some terms to explain:
- Cluster: This is a set of Kafka brokers.
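- ZooKeeper: This is the cluster coordinator; it tracks which brokers are alive and stores cluster metadata (recent Kafka versions can use the built-in KRaft mode instead).
- Broker: This is a Kafka server; the term also refers to the Kafka server process itself.
- Topic: This is a named stream of records, split into one or more log partitions; a broker can host partitions of several topics.
- Offset: This is a sequential identifier for each message within a partition.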
- Partition: This is an immutable and ordered sequence of records continually appended to a structured commit log.
- Producer: This is the program that publishes data to topics.
- Consumer: This is the program that reads and processes data from the topics.
- Retention period: This is the time to keep messages available for consumption.
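To make the relationship between topics, partitions, offsets, and the retention period more concrete, here is a minimal in-memory sketch. It is not Kafka code: the names `Record`, `Partition`, `Topic`, `partition_for`, and `enforce_retention` are hypothetical, the CRC32-based key hashing is only a stand-in for Kafka's own partitioner, and real brokers apply retention to whole log segments rather than to single records.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int       # position of the record within its partition
    timestamp: float  # append time, in seconds
    key: str
    value: str


@dataclass
class Partition:
    """Append-only, ordered log; offsets only ever grow."""
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, key, value, now):
        record = Record(self.next_offset, now, key, value)
        self.records.append(record)
        self.next_offset += 1
        return record.offset

    def read(self, from_offset):
        # Consumers read forward starting at an offset of their choosing.
        return [r for r in self.records if r.offset >= from_offset]

    def enforce_retention(self, now, retention_seconds):
        # Drop records older than the retention period.
        # Surviving records keep their original offsets.
        self.records = [
            r for r in self.records if now - r.timestamp <= retention_seconds
        ]


class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption for illustration: CRC32 of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value, now):
        index = self.partition_for(key)
        return index, self.partitions[index].append(key, value, now)


topic = Topic("page-views", num_partitions=3)
p1, o1 = topic.publish("user-1", "home", now=0.0)
p2, o2 = topic.publish("user-1", "cart", now=50.0)

# Apply a 60-second retention period at time 100: the first record expires.
topic.partitions[p1].enforce_retention(now=100.0, retention_seconds=60.0)
remaining = topic.partitions[p1].read(from_offset=0)
```

Records with the same key land in the same partition, so their relative order is preserved, and the offset of the surviving record does not change when older records are removed.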
There are four major APIs in Kafka:
- Producer API – Permits an application to publish streams of records.
- Consumer API – Permits an application to subscribe to topics and process streams of records.
- Connector API – Permits building and running reusable producers and consumers that connect Kafka topics to existing applications or data systems.
- Streams API – Permits an application to act as a stream processor: it consumes input streams from one or more topics, transforms them, and produces output streams to one or more topics.
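The division of labour between the four APIs can be sketched with a toy in-memory model. This only illustrates the roles and is not the Kafka client API: `InMemoryBroker`, `Producer`, `Consumer`, `run_stream`, and `run_source_connector` are all hypothetical names, and each topic is simplified to a single partition.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a cluster: one single-partition log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)

    def append(self, topic, value):
        self.logs[topic].append(value)
        return len(self.logs[topic]) - 1  # offset of the new record

    def fetch(self, topic, offset):
        return self.logs[topic][offset:]


class Producer:
    """Producer role: publishes records to a topic."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, value):
        return self.broker.append(topic, value)


class Consumer:
    """Consumer role: reads a topic and remembers its own position."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0  # next offset to read

    def poll(self):
        records = self.broker.fetch(self.topic, self.position)
        self.position += len(records)
        return records


def run_stream(broker, source, sink, transform):
    """Streams role: consume from source, transform, produce to sink."""
    consumer = Consumer(broker, source)
    producer = Producer(broker)
    for value in consumer.poll():
        producer.send(sink, transform(value))


def run_source_connector(broker, external_rows, topic):
    """Connector role: copy rows from an existing system into a topic."""
    producer = Producer(broker)
    for row in external_rows:
        producer.send(topic, row)


broker = InMemoryBroker()
run_source_connector(broker, ["error: disk full", "info: started"], "app-logs")
Producer(broker).send("app-logs", "error: timeout")
run_stream(broker, "app-logs", "app-logs-upper", str.upper)

reader = Consumer(broker, "app-logs-upper")
first_batch = reader.poll()   # all three transformed records
second_batch = reader.poll()  # empty: the consumer position has advanced
```

The stream processor and the connector in this sketch are built from the same producer and consumer pieces, which mirrors how the higher-level APIs reuse the lower-level ones.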