Kafka is a distributed streaming platform. Its three distinguishing capabilities are:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system
  • Store streams of records in a fault-tolerant, durable way
  • Process streams of records as they occur
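
The three capabilities above can be illustrated with a toy in-memory log. This is only a sketch, not Kafka's API: ToyLog, publish and read_from are hypothetical names invented for illustration, and a plain Python list stands in for Kafka's replicated, fault-tolerant storage.

```python
class ToyLog:
    """Append-only list of records; a hypothetical stand-in for one stream."""

    def __init__(self):
        self._records = []

    def publish(self, record):
        """Append a record and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, position):
        """Return all records at or after `position`; reading deletes nothing."""
        return self._records[position:]


log = ToyLog()
for reading in (21.5, 22.0, 23.5):
    log.publish(reading)

# Two independent subscribers read the same stored records.
dashboard = log.read_from(0)
late_joiner = log.read_from(1)

# Process records as they are read: keep readings above a threshold.
alerts = [r for r in log.read_from(0) if r > 22.5]
```

Unlike a classic queue, reading does not remove records, so any number of subscribers can read the same stream from whatever position they choose.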

Kafka is generally used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably move data between systems or applications.
  • Building real-time streaming applications that transform or react to the streams of data.


Key Kafka terminology:

  • Producer – An application that sends messages to Kafka
  • Message – A small to medium-sized piece of data
  • Consumer – An application that reads messages from Kafka
  • Broker – The name given to a Kafka server
  • Cluster – A group of computers sharing a workload for a common purpose; a Kafka cluster is a group of brokers
  • Topic – A unique name for a Kafka stream
  • Partition – One part of a topic; a topic is split into partitions so its data can be spread across brokers
  • Offset – A sequence ID given to each message as it arrives in a partition
  • Globally unique identifier of a message – Topic name -> Partition number -> Offset
  • Consumer group – A group of consumers acting as a single logical unit
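
How topics, partitions, offsets and consumer groups fit together can be sketched with a small in-memory model. Everything here is an assumption made for illustration: ToyTopic, send, fetch and assign_partitions are hypothetical names, CRC32 stands in for whatever hash the real partitioner uses, and round-robin stands in for the real group assignment logic.

```python
import zlib


class ToyTopic:
    """A named stream split into a fixed number of partitions (hypothetical)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append to the partition chosen by key hash.

        Returns the message's coordinates: (topic, partition, offset).
        """
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return (self.name, partition, offset)

    def fetch(self, partition, offset):
        """Look up one message by partition number and offset."""
        return self.partitions[partition][offset]


def assign_partitions(num_partitions, consumers):
    """Spread partitions round-robin over the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


topic = ToyTopic("orders", num_partitions=3)
first = topic.send("customer-42", "order created")
second = topic.send("customer-42", "order paid")

# Each partition is read by exactly one consumer in the group.
group = assign_partitions(3, ["consumer-a", "consumer-b"])
```

In this sketch, messages with the same key land in the same partition and receive increasing offsets, so the triple (topic, partition, offset) identifies exactly one message. Within the group, each partition has a single owner, which is how the consumers share the work while acting as one logical unit.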


