Kafka Interview Questions

Here is a list of top Kafka interview questions based on my exposure to Apache Kafka.

Kafka Basics

  • What is Apache Kafka?
  • What are topics and partitions?
  • What are the various components of Kafka?
  • What is a consumer group?
  • What is an offset?
  • What are the different ways to commit an offset?
  • Does Kafka provide ordering guarantees?

What is the importance of the __consumer_offsets topic?

An offset is a pointer that marks the position up to which data has been consumed or produced for a given topic and partition. Consumer offsets are stored in an internal topic named __consumer_offsets inside the Kafka cluster. Consumers commit offsets to the cluster either via auto-commit or by committing manually in code. The commit is analogous to a relational database commit.
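A minimal Python sketch of these commit semantics, using an invented in-memory `PartitionLog` in place of a real broker and the __consumer_offsets topic (all names here are illustrative):

```python
# Illustrative in-memory model of Kafka's offset-commit semantics.
# No broker is involved; PartitionLog and its methods are invented for the sketch.

class PartitionLog:
    def __init__(self, messages):
        self.messages = list(messages)      # the append-only partition log
        self.committed = {}                 # group id -> committed offset
                                            # (the role __consumer_offsets plays)

    def consume(self, group_id, max_records):
        """Read from the last committed offset for this group."""
        start = self.committed.get(group_id, 0)
        return start, self.messages[start:start + max_records]

    def commit(self, group_id, offset):
        """Record how far this group has consumed (a manual commit)."""
        self.committed[group_id] = offset


log = PartitionLog(["m0", "m1", "m2", "m3", "m4"])

# First poll: nothing committed yet, so the group starts at offset 0.
start, batch = log.consume("group-a", max_records=3)
log.commit("group-a", start + len(batch))   # commit the position AFTER the batch

# A "restarted" consumer in the same group resumes at the committed offset.
start, batch = log.consume("group-a", max_records=3)
print(start, batch)                         # 3 ['m3', 'm4']
```

The point of the sketch: the committed offset belongs to the group, not the consumer, which is why a replacement consumer picks up exactly where the previous one committed.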

Can two consumers consume from the same topic?

It depends on the group id. As long as the consumers belong to different consumer groups, they can both consume the full topic. In other words, a message within the topic is consumed by only one consumer in a given consumer group. For a message to be processed by multiple consumers, those consumers must be placed in different consumer groups.
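The group behaviour above can be sketched in plain Python (the round-robin assignor and all names are illustrative; real Kafka uses configurable partition assignors):

```python
# Toy model of consumer-group delivery; no real broker involved.
# Each partition is assigned to exactly one consumer per group, so within a
# group every message is seen once, while a second group sees all messages again.

from collections import defaultdict

def assign(partitions, consumers):
    """Round-robin partition assignment within one consumer group."""
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)

def deliver(topic, assignment):
    """Messages each consumer receives under a given assignment."""
    return {c: [m for p in parts for m in topic[p]]
            for c, parts in assignment.items()}

topic = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}   # partition -> messages

group_a = deliver(topic, assign(list(topic), ["a1", "a2"]))   # 2 consumers
group_b = deliver(topic, assign(list(topic), ["b1"]))         # 1 consumer

# Within group-a, the topic's messages are split between a1 and a2 ...
print(group_a)   # {'a1': ['a', 'b', 'd', 'e'], 'a2': ['c']}
# ... while group-b independently receives every message.
print(group_b)   # {'b1': ['a', 'b', 'c', 'd', 'e']}
```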

  • What is the benefit of partitioning?
  • What is backpressure?

Kafka Monitoring

  • How are you monitoring Apache Kafka? Have you used Kafka Manager?
  • Kafka exposes metrics. Are you collecting them or plotting them somewhere? If yes, how?

Kafka Internals

Kafka Topology

  • Explain the topology of Apache Kafka
  • How many nodes of Apache Kafka?
  • How many Zookeeper instances?
  • What is the data scale? How much data are you managing using Kafka?
  • How many topics are there?
  • What do the partitions look like?
    • How many partitions per topic?
    • Have you considered a topic partition per tenant? What would be the use case?
    • How will you model data with Kafka? What things will you keep in mind?
  • How much data do you get per minute/day?
  • What are the message inflow and outflow rates?
  • What is the bandwidth usage?
  • What is disk space usage?

Kafka Topics

  • Explain the anatomy of a topic
    • Topics
    • Partitions
    • Offsets
    • Log 
    • Log Segments
    • Log compaction
(Figure: Anatomy of a Kafka topic)
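As a rough illustration of the log/segment relationship above, here is a toy Python model. The segment "filenames" mimic Kafka's on-disk layout (segments named after the offset of their first record); the record-count threshold is invented for the sketch, since real Kafka rolls segments by bytes and time:

```python
# Toy sketch of how a partition's log is split into segments (illustrative only).
# A new segment rolls after SEGMENT_SIZE records instead of a byte threshold.

SEGMENT_SIZE = 3

segments = {}            # base offset -> list of records in that segment
for offset in range(8):  # append 8 records to the partition
    base = (offset // SEGMENT_SIZE) * SEGMENT_SIZE
    segments.setdefault(base, []).append(f"record-{offset}")

# Segment names in the style of Kafka's on-disk layout:
for base in sorted(segments):
    print(f"{base:020d}.log -> {len(segments[base])} records")
```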

Kafka Availability

  • What if a Kafka node dies? Is there any monitoring in place? How much time would it take to find out that a node is missing?
  • How are you replacing a Kafka node in production? How much time does it take to replace a Kafka node?
  • Kafka needs Zookeeper. If a Zookeeper node dies, how are you replacing it and how does Kafka cope? Are you stopping Kafka, replacing the Zookeeper IP, and restarting?
  • What happens to the producer when a broker is down? How are you handling this in the producer?
  • What happens when a broker is down and no ISR is available?
  • What is the average downtime in a month?

How to increase the replication factor for a topic

Increasing the replication factor is a four-step process:

  • Check existing partition assignment
  • Create a custom reassignment plan ( a JSON file)
  • Do the reassignment process
  • Verify the assignments
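The custom reassignment plan in step two is a JSON file in which you list the desired replica set per partition; adding broker IDs raises the replication factor. Topic name, partition counts, and broker IDs below are illustrative:

```json
{
  "version": 1,
  "partitions": [
    { "topic": "my-topic", "partition": 0, "replicas": [1, 2] },
    { "topic": "my-topic", "partition": 1, "replicas": [2, 3] }
  ]
}
```

The plan is then applied with the stock kafka-reassign-partitions.sh tool, passing the file via --reassignment-json-file with --execute, and checked afterwards with --verify.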


Kafka Producer Configuration or Tuning Kafka Producer

Describe the producer configurations you need to take care of when tuning a Kafka producer.

  • Compression
  • Batch size
  • Sync or Async
  • Acks
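The bullets above map onto standard producer configuration keys; a sketch as a properties fragment (the values are illustrative starting points, not recommendations):

```properties
# Compression: trade CPU for smaller batches on the wire
compression.type=lz4
# Maximum batch size in bytes per partition; larger batches improve throughput
batch.size=32768
# How long to wait for a batch to fill before sending (async batching)
linger.ms=10
# Durability: 0 = fire-and-forget, 1 = leader ack only, all = all in-sync replicas
acks=all
```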

Kafka Manager

  • What is an in-sync replica and how does it differ from a normal replica?
  • What are broker skew and broker spread?
  • What is the purge strategy?
  • How does Kafka manage replication? What is your replication strategy?

Kafka provisioning & Installation

  • How are you installing Kafka?
  • How are you provisioning Kafka in the cloud?
    • What tools are you using: Packer, Terraform, Ansible…?
  • How often do you make changes to production Kafka?
  • Are you using Docker? Is Zookeeper bundled in the same container?
  • How are you monitoring Kafka and Zookeeper?

Kafka & Zookeeper Backup and Restore

Kafka Performance & Throughput

  • How can Kafka partitions help improve throughput? Do more partitions mean more parallel streams and higher throughput?
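One nuance worth raising in an answer: within a single consumer group, parallelism is capped by the partition count, because each partition goes to at most one consumer. A toy Python sketch (round-robin assignment and all names are illustrative):

```python
# Illustrative: in one consumer group, extra consumers beyond the partition
# count sit idle, so partitions are the unit of parallelism.

def assign(num_partitions, consumers):
    """Round-robin assignment; surplus consumers get no partitions."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 3 partitions, 5 consumers: only 3 consumers get work.
a = assign(3, ["c1", "c2", "c3", "c4", "c5"])
busy = [c for c, parts in a.items() if parts]
print(busy)          # ['c1', 'c2', 'c3']
```

So adding partitions does enable more parallel streams, but only up to the point where consumers, brokers, or disks become the bottleneck.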

What makes Kafka scalable, fast, and high-throughput?

  • Zero copy – Kafka hands data to the OS kernel directly (the sendfile mechanism) rather than copying it through the application layer, which moves data fast.
  • Batches data in chunks – Kafka batches data into chunks, which minimises cross-machine latency and the buffering/copying that accompanies it.
  • Avoids random disk access – because Kafka is an immutable commit log, it does not need to rewind the disk and do many random I/O operations; it accesses the disk sequentially. This lets a physical disk deliver speeds comparable to memory.
  • Scales horizontally – the ability to have thousands of partitions for a single topic, spread among thousands of machines, means Kafka can handle huge loads.

Kafka Broker Logging Configuration

How will you debug an issue with Kafka? What are the various log files used by a Kafka broker?

  • controller.log – partition and replica state changes, admin tasks like partition reassignment; rolls every hour; log level TRACE
  • state-change.log – changes in server state; rolls every hour
  • server.log – server messages and any problems that might cause the server to crash or shut down; rolls every hour; log level INFO
  • kafka-request.log – requests received by the broker; rolls hourly; log level WARN
  • kafkaServer-gc.log – JVM garbage collector logs; rolls hourly; log level INFO
