Here is a list of top Kafka interview questions based on my exposure to Apache Kafka.
- What is Apache Kafka?
- What are topics and partitions?
- What are the various components of Kafka?
- What is a consumer group?
- What is an offset?
- What are the different ways to commit an offset?
- What is the importance of the __consumer_offsets topic?
What is the importance of the __consumer_offsets topic?
An offset is a pointer that marks the position up to which data has been consumed or produced for a given topic and partition. All consumer offsets are stored in an internal topic named __consumer_offsets inside the Kafka cluster. Offsets are committed by consumers either automatically (auto-commit) or manually in code. The commit is analogous to a relational database commit.
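Which partition of __consumer_offsets holds a group's commits is derived from a hash of the group.id. A minimal Python sketch, assuming the default of 50 partitions and Java's String.hashCode semantics (the group name below is just an example):

```python
def java_string_hashcode(s: str) -> int:
    """Reimplementation of Java's String.hashCode with 32-bit overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

def consumer_offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets that stores commits for this group.
    Mirrors abs(groupId.hashCode) % offsets.topic.num.partitions, where
    abs() is a bitmask so the result stays non-negative."""
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions

# An example group id; the mapping is deterministic, so all commits for
# one group always land on the same __consumer_offsets partition.
print(consumer_offsets_partition("my-consumer-group"))
```

Because the mapping is stable, all offset commits and fetches for a given group go through one partition's leader, which is what makes that broker the group coordinator.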
- Can two consumers consume from the same topic?
- What is the benefit of partitioning?
- What is backpressure?
- How are you monitoring Apache Kafka? Have you used Kafka Manager?
- Kafka exposes metrics. Are you collecting them or plotting them somewhere? If yes, how?
- How does Kafka store its data?
- Why is Kafka so fast?
- What is the zero-copy approach?
- How Kafka distributes the topic partitions among the brokers
- Explain the topology of Apache Kafka. How many nodes of Apache Kafka are there?
- How many Zookeeper instances?
- What is the data scale? How much data are you managing using Kafka?
- How many topics are there?
- What do the partitions look like?
- How many partitions per topic?
- Have you considered a topic partition per tenant? What would the use case be?
- How will you model data with Kafka? What will you keep in mind?
- How much data do you get per minute/day?
- What are the message inflow and outflow rates?
- What is the bandwidth usage?
- What is disk space usage?
- Explain the anatomy of topic
- Log Segments
- Log compaction
- What if a Kafka node dies? Is there any monitoring in place? How much time would it take to detect a missing node?
- How are you replacing a Kafka node in production? How much time does it take to replace a Kafka node?
- Kafka needs Zookeeper, so if a Zookeeper node dies, how are you replacing it, and how does Kafka cope? Are you stopping Kafka, replacing the Zookeeper IP, and restarting?
- What happens to the producer when a broker is down? How are you handling this in the producer?
- What happens when a broker is down and no ISR is available?
- What is the average downtime in a month?
How to increase replication factor for a topic
Increasing the replication factor is a four-step process:
- Check existing partition assignment
- Create a custom reassignment plan (a JSON file)
- Execute the reassignment
- Verify the assignments
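The custom reassignment plan above is a JSON file listing, per partition, the full desired replica list. An illustrative plan that raises a two-partition topic from one replica to two (topic name and broker IDs are hypothetical):

```json
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [1, 2]},
    {"topic": "my-topic", "partition": 1, "replicas": [2, 3]}
  ]
}
```

The plan is applied with `kafka-reassign-partitions.sh --reassignment-json-file increase.json --execute` and checked afterwards with the same command using `--verify`.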
See https://whiteboardtalks.com/how-to-increase-replication-factor-for-a-kafka-topic/ for more details.
Kafka Producer Configuration or Tuning Kafka Producer
Describe the producer configurations you need to take care of when configuring a Kafka producer.
- Batch size
- Sync or Async
- What is an in-sync replica, and how does it differ from a normal replica?
- What are broker skew and broker spread?
- What is the purge strategy?
- How does Kafka manage replication? What is your replication strategy?
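The batch size and sync-vs-async behaviour above map to producer settings; an illustrative fragment (values are examples, not recommendations):

```properties
# Batch up to 32 KB per partition, waiting up to 10 ms for a batch to fill
batch.size=32768
linger.ms=10
# acks=all waits for the in-sync replicas (effectively synchronous w.r.t. durability);
# acks=1 or acks=0 trade durability for latency
acks=all
# Compress whole batches on the producer
compression.type=snappy
```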
Kafka provisioning & Installation
- How are you installing Kafka?
- How are you provisioning Kafka in cloud?
- What tools are you using: Packer, Terraform, Ansible…?
- How often do you make changes to prod Kafka?
- Are you using Docker? Is Zookeeper bundled in the same container?
- How are you monitoring Kafka and Zookeeper?
Kafka & Zookeeper Backup and Restore
- Are you backing up Kafka’s data? Are you backing up Kafka’s topic configurations?
- What tools have you used for backup and restore of Kafka?
- Burry.sh https://github.com/mhausenblas/burry.sh
- Uses https://github.com/spredfast/kafka-connect-s3 to take the backup of a topic
- Uses Exhibitor UI for backing up Zookeeper
Kafka Performance & Throughput
- How can Kafka partitions help improve throughput? Do more partitions mean more parallel streams and more throughput?
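Parallelism is bounded by the partition count: within a consumer group, each partition is consumed by at most one consumer. A toy Python sketch of key-based partitioning (Kafka's default partitioner actually uses murmur2; the hash below is a deliberately simple stand-in to illustrate the idea):

```python
def toy_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner:
    # same key -> same partition, different keys spread across partitions.
    return sum(key) % num_partitions  # deterministic toy hash

# Hypothetical records keyed by user id
records = [(f"user-{i}".encode(), f"event-{i}") for i in range(10)]

# Route each record to one of 3 partitions
assignments = {}
for key, value in records:
    assignments.setdefault(toy_partition(key, 3), []).append(value)

# Each partition can now be consumed by a separate consumer in the group,
# so 3 partitions allow up to 3 parallel streams.
for p, batch in sorted(assignments.items()):
    print(p, batch)
```

More partitions allow more consumers to work in parallel, but each partition adds overhead (file handles, leader elections, replication traffic), so "more" is not unconditionally faster.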
What makes Kafka scalable, fast, and high-throughput?
- Zero Copy – see https://en.wikipedia.org/wiki/Zero-copy; data is moved by the OS kernel directly rather than being copied up through the application layer, which moves data fast.
- Batch Data in Chunks – Kafka is all about batching data into chunks. This minimises cross-machine latency and the buffering/copying that accompanies it.
- Avoids Random Disk Access – because Kafka is an immutable commit log, it does not need to seek and perform many random I/O operations; it accesses the disk sequentially. This lets it get speeds from a physical disk comparable with memory.
- Can Scale Horizontally – The ability to have thousands of partitions for a single topic spread among thousands of machines means Kafka can handle huge loads.
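The zero-copy point can be demonstrated with the sendfile system call, which is the mechanism behind Java's `FileChannel.transferTo` used by Kafka. Here it is via Python's `os.sendfile` (Linux semantics assumed; file-to-file transfer this way needs kernel >= 2.6.33):

```python
import os
import tempfile

# Create a source file standing in for a Kafka log segment
payload = b"records from a kafka log segment"
src = tempfile.NamedTemporaryFile(delete=False)
src.write(payload)
src.close()

dst_path = src.name + ".out"
with open(src.name, "rb") as fin, open(dst_path, "wb") as fout:
    # sendfile moves the bytes inside the kernel: no copy into user space,
    # unlike a read()/write() loop through an application buffer
    sent = os.sendfile(fout.fileno(), fin.fileno(), 0, len(payload))

print(sent)  # number of bytes transferred
```

In Kafka's actual serving path the destination is a socket rather than a file, so a fetch response can stream from the page cache to the network without ever entering the broker's user-space buffers.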
Kafka Broker Logging Configuration
How will you debug an issue with Kafka? What are the various log files used by the Kafka broker?
| Name | What is being logged | Rolling configuration | Log level |
| --- | --- | --- | --- |
| controller.log | Partition and replica states, admin tasks like partition reassignment | Every hour | TRACE |
| state-change.log | Changes in server state | | |
| server.log | Server messages and any problems which might cause the server to crash or shut down | Every hour | |
| kafka-request.log | Requests received by the broker | Hourly | |
| kafkaServer-gc.log | The JVM garbage collector logs | Single file | |
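The rolling behaviour and levels in this table come from the broker's log4j.properties. An excerpt-style sketch, with appender names following the stock Kafka distribution (exact contents may differ by version):

```properties
# controller.log: rolls hourly, controller logger at TRACE
log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log
log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.logger.kafka.controller=TRACE, controllerAppender

# state-change.log: broker/partition state transitions
log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log
log4j.logger.state.change.logger=INFO, stateChangeAppender
```

Raising a specific logger (e.g. the request logger) to DEBUG or TRACE is usually the first step when debugging a broker issue, since the default levels keep the noisier logs quiet.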