Apache Kafka is a next-generation distributed messaging system. Originally developed by engineers at LinkedIn, Kafka has gained widespread popularity in a very short time; in fact, many IT job openings floated by top companies list Kafka as an essential skill. This Kafka course would acquaint you with ways to integrate Kafka into your organization's data pipeline.
Why Is Learning Kafka An Important Career Decision?
With Kafka, you can tap into online data generated from various sources to learn about page visits, clicks, user activity, logins, post likes, shares, comments, page load times, performance, logs, and other important analytics metrics.
Kafka training would upskill you to track abnormal user behavior on organizational web pages, deliver targeted ads, display relevance-based search results, and show recommendations based on prior user activity.
Best Practices For Optimizing The Performance of Kafka
1) Keep Monitoring Brokers For Optimum Capacity Planning
To plan capacity effectively and optimize throughput, you need to monitor each broker's:
- Transmit (TX) network throughput
- Receive (RX) network throughput
- Disk space
- Disk I/O
- CPU usage
You can plan for maximum capacity and maintain cluster performance by tracking these parameters.
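As an illustration of how these metrics feed into capacity planning, the sketch below checks how much usable network bandwidth a broker has left once a safety margin is reserved. All figures here (NIC capacity, safety factor, TX/RX load) are hypothetical; substitute readings from your own monitoring system.

```python
# Hypothetical capacity-planning arithmetic: how much usable network
# bandwidth remains on a broker once a safety margin is reserved.
# All figures (NIC capacity, safety factor, TX/RX load) are illustrative.

NIC_CAPACITY_MBPS = 1250.0  # ~10 GbE expressed in MB/s
SAFETY_FACTOR = 0.75        # keep headroom for bursts and rebalances

def network_headroom(tx_mbps: float, rx_mbps: float,
                     capacity_mbps: float = NIC_CAPACITY_MBPS,
                     safety: float = SAFETY_FACTOR) -> float:
    """Usable bandwidth (MB/s) left after current transmit + receive load."""
    budget = capacity_mbps * safety
    return budget - (tx_mbps + rx_mbps)

# A broker pushing 400 MB/s out and taking 250 MB/s in:
print(network_headroom(400.0, 250.0))  # 287.5 MB/s of headroom
```

A negative result from a check like this signals that the broker is already past its budget and the cluster needs more brokers or rebalanced partitions.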
2) Spread Partition Leadership Across The Cluster's Brokers
Partition leadership consumes a significant share of a broker's network I/O resources. If you run Kafka with a replication factor of 3, for example, the leader must:
- Receive the partition's data from producers
- Transmit a copy to each of the two replicas
- Transmit the data to every consumer that wants to consume it
A leader is therefore taxed more heavily than a follower where network I/O is concerned: followers only need to write incoming data, whereas leaders must also frequently read from disk to serve it. Hence, it is advisable to distribute partition leadership among the cluster's brokers.
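Kafka can also rebalance leadership for you. The following `server.properties` settings, shown here with their default values, enable periodic preferred-leader election so leadership drifts back toward an even spread after broker restarts:

```properties
# Broker settings (server.properties) for automatic leadership rebalancing;
# the values shown are the defaults.
auto.leader.rebalance.enable=true
leader.imbalance.per.broker.percentage=10
leader.imbalance.check.interval.seconds=300
```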
3) Monitor Brokers for Symptoms of Potential Problems
You should periodically check the brokers for symptoms of:
- Un-preferred leaders
- In-sync replica (ISR) shrinks
- Under-replicated partitions
If any of these symptoms are evident, problems may show up in the cluster soon. For example, frequent ISR shrinks for a given partition indicate that the partition's data rate exceeds the leader's ability to serve its replica threads and consumers.
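One way to automate the under-replicated-partition check is to parse the per-partition lines printed by `kafka-topics.sh --describe`. The helper below is a hypothetical sketch, and the sample output it parses is illustrative:

```python
# Hypothetical helper: count under-replicated partitions by parsing the
# per-partition lines that `kafka-topics.sh --describe` prints.
# The sample output below is illustrative.

def count_under_replicated(describe_output: str) -> int:
    """A partition is under-replicated when its ISR is smaller than its replica set."""
    count = 0
    for line in describe_output.splitlines():
        if "\tPartition:" not in line:
            continue  # skip topic summary lines
        fields = {}
        for part in line.split("\t"):
            if ":" in part:
                key, value = part.split(":", 1)
                fields[key.strip()] = value.strip()
        if len(fields["Isr"].split(",")) < len(fields["Replicas"].split(",")):
            count += 1
    return count

sample = (
    "Topic: orders\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2,3\n"
    "Topic: orders\tPartition: 1\tLeader: 2\tReplicas: 2,3,1\tIsr: 2\n"
)
print(count_under_replicated(sample))  # 1
```

In practice you would wire a check like this into your monitoring system and alert when the count stays above zero.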
4) Change Properties of Apache Log4j When Required
A Kafka broker's logging can consume an excessive amount of disk space. You would need to modify the Apache Log4j properties to keep this in check.
Refrain from eliminating logging entirely, though, as broker logs are helpful for reconstructing the sequence of events if an incident happens.
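As a sketch, the `log4j.properties` fragment below swaps the broker's default daily-rolling server-log appender for a size-bounded one, so logs are capped rather than disabled. The size and backup-count values are illustrative, not recommendations:

```properties
# Illustrative log4j.properties overrides: bound the size and number of
# broker server-log files instead of turning logging off.
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```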
5) Manage Cleaning-Up of Unused Topics Adequately
If no messages have been seen on a topic for a specified number of days, you should consider the topic inactive and remove it from the cluster. This prevents extra metadata from building up inside the cluster, so resources are not consumed unnecessarily.
You should also disable the automatic creation of topics, or else enforce a transparent policy for cleaning up unused topics.
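Disabling automatic topic creation is a one-line broker setting. Topic deletion must also be enabled for clean-up to work (it is by default in recent Kafka versions):

```properties
# server.properties: stop unused topics from appearing silently, and
# allow inactive ones to be deleted.
auto.create.topics.enable=false
delete.topic.enable=true
```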
6) Set Aside Adequate Memory For High Throughput Brokers
You should make provision for serving partition data directly from the operating system's file-system cache wherever possible. For this to work, consumers must keep up with the incoming data; if a consumer lags, the broker is compelled to read from the disk subsystem instead. To avoid this, set aside sufficient memory to sustain high-throughput brokers.
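There is no broker setting that allocates the page cache directly; instead, keep the JVM heap modest so the rest of the machine's RAM stays available to the operating system's cache. A common sketch, with an illustrative heap size, is to set the environment variable read by `kafka-server-start.sh`:

```shell
# Keep the broker JVM heap small (6 GB here is illustrative) so the
# remaining RAM serves as page cache for partition reads.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
```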
7) Isolate Topics To Subsets Of Brokers
If a large cluster's SLOs (Service Level Objectives) call for high throughput, you need to isolate topics to a subset of brokers, grouped according to the requirements of the business.
For example, if the same cluster is used by many online transaction processing (OLTP) systems, isolating each system's topics to its own subset of brokers limits the probable blast radius of an incident.
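If you need to move an existing system's topics onto their own brokers, the `kafka-reassign-partitions.sh` tool accepts a JSON plan like the sketch below. The topic name, partition numbers, and broker IDs are all hypothetical:

```json
{
  "version": 1,
  "partitions": [
    {"topic": "payments-events", "partition": 0, "replicas": [4, 5, 6]},
    {"topic": "payments-events", "partition": 1, "replicas": [5, 6, 4]}
  ]
}
```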
8) Avoid Using Older Clients
Your brokers would be saddled with additional load when older clients are used with newer topic message formats, because the broker must convert the formats on the client's behalf. This should be avoided if possible.
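If you cannot upgrade old consumers right away, one stop-gap on pre-3.0 brokers is to pin the on-disk message format to the clients' version so no down-conversion is needed per fetch. The version shown is purely illustrative, and this trades away newer format features:

```properties
# server.properties (pre-3.0 brokers only): keep the on-disk format at the
# old clients' version to avoid per-fetch down-conversion.
log.message.format.version=0.10.2
```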
9) Test The Broker In A Production-Like Environment
You may test the broker's performance on a desktop by using the loopback interface with a replication factor of 1, but this simulated topology differs markedly from a real production environment. You should not assume that the broker's performance on the desktop will match production.
On the loopback interface, network latency is insignificant, and since no replication is involved, the time needed to receive leader acknowledgements will vary significantly.
You can gain further insights into the operational dynamics of Kafka from here: https://www.youtube.com/watch?v=hyJZP-rgooc
Complete Your Kafka Training Now
Learning Kafka would entitle you to a host of benefits on the career front. Your organization, too, can count on your skills to stay familiar with the latest trends and user inclinations through accurate predictive analysis of real-time big data.
Source: Dice Insights