The Internet of Things (IoT) is getting more and more traction as valuable use cases come to light. A key challenge, however, is integrating devices and machines to process the data in real time and at scale. Apache Kafka® and its surrounding ecosystem, which includes Kafka Connect, Kafka Streams, and ksqlDB, have become the technology of choice for integrating and processing these kinds of datasets.
Kafka-native options to note for MQTT integration beyond Kafka client APIs like Java, Python, .NET, and C/C++ are:
Before I discuss these in more detail, let’s take a look at some common use cases where Confluent Platform and Confluent Cloud are used for IoT projects today.
Confluent Platform and Confluent Cloud are already used in many IoT deployments, both in Consumer IoT and Industrial IoT (IIoT). Most scenarios require a reliable, scalable, and secure end-to-end integration that enables bidirectional communication and data processing in real time. Some specific use cases are:
Machine learning plays a huge role in many of these use cases, regardless of the industry, and you can read Using Apache Kafka to Drive Cutting-Edge Machine Learning for more insights.
Let’s now take a look at the 10,000-foot view of a robust IoT integration architecture.
IoT integration architectures need to integrate the edge (devices, machines, cars, etc.) with the datacenter (on premises, cloud, and hybrid) to be able to process IoT data.
To be flexible and future ready, an IoT integration architecture should possess the following requirements:
But several challenges increase the complexity of IoT integration architectures:
Given these requirements and challenges, let’s take a look now at how MQTT and other IoT standards help integrate datacenters and the edge.
There are many IoT standards and technologies available on the market. If we had to choose, these are the most common options for implementing IoT integrations:
No wonder technical know-how is not evenly distributed in both realms. In the IoT environment, for example, a large number of protocols for data exchange have developed in recent years. Only MQTT will seem familiar to an automation technology employee.
In the same way, industrial protocols are a book with seven seals for software engineers. It may be that some industrial protocols are well suited for a specific IoT solution, just as certain security features of modern IoT protocols are suited for industry. But that doesn’t move much.
MQTT has become the standard solution for most IoT scenarios today, especially outside of IIoT. Although MQTT is the focus of this blog post, in a future article I will cover MQTT integration with IIoT and its proprietary protocols, like Siemens S7, Modbus, and ADS, through leveraging PLC4X and its Kafka integration. For more details about using Kafka Connect and PLC4X for IIoT integration scenarios, you can check out these slides on flexible and scalable integration in the automation industry and the accompanying video explaining the relation between IIoT, Apache Kafka, and PLC4X.
Based on my conversations with industrial customers—who are pained by the challenges of closed, inflexible interfaces—I noticed that more and more IIoT devices and machines also provide an MQTT interface that can be integrated into modern systems.
Regarding the tradeoffs of MQTT, consider the pros and cons:
Pros
Cons
These tradeoffs show that MQTT is built for IoT scenarios but requires help when it comes to integrating into the enterprise architecture of a company. This is where the event streaming platform Apache Kafka and its ecosystem come into play.
Apache Kafka is an event streaming platform that combines messaging, storage, and processing of data to build highly scalable, reliable, secure, and real-time infrastructure. Those who use Kafka often use Kafka Connect as well to enable integration with any source or sink. Kafka Streams is also useful, because it allows continuous stream processing. From an IoT perspective, Kafka presents the following tradeoffs:
Pros
Cons
Since Kafka was not built for IoT communication at the edge, the combination of Apache Kafka and MQTT together are a match made in heaven for building scalable, reliable, and secure IoT infrastructures.
How do you integrate both?
The following sections demonstrate three Kafka-native options, meaning you generally do not need an additional technology besides MQTT devices/gateways/brokers and Confluent Platform to integrate and process IoT data.
Kafka Connect is a framework included in Apache Kafka that integrates Kafka with other systems. Its purpose is to make it easy to add new systems to scalable and secure event streaming pipelines while leveraging all the features of Apache Kafka, such as high throughput, scalability, and reliability. The easiest way to download and install new source and sink connectors is via Confluent Hub. You can find installation steps, documentation, and even the source code for connectors that are open source.
The Kafka Connect MQTT connector is a plugin for sending and receiving data from a MQTT broker.
The MQTT broker is persistent and provides MQTT-specific features. It consumes push data from IoT devices, which Kafka Connect pulls at its own pace, without overwhelming the source or getting overwhelmed by the source. Out-of-the-box scalability and integration features like Kafka Connect Converters and Single Message Transforms (SMTs) are further advantages of using Kafka Connect connectors.
The MQTT connectors are independent of a specific MQTT broker implementation. I have seen several projects start with Mosquitto and then move towards a reliable, scalable broker like HiveMQ during the transition from a pilot project to pre-production.
In some scenarios, the main challenge and requirement is to ingest data into Kafka for further processing and analytics in other backend systems. In this case, an MQTT broker is just added complexity, cost, and operational overhead.
Confluent MQTT Proxy delivers a Kafka-native MQTT proxy that allows organizations to eliminate the additional cost and lag of intermediate MQTT brokers. MQTT Proxy accesses, combines, and guarantees that IoT data flows into the business without adding additional layers of complexity.
MQTT Proxy is horizontally scalable, consumes push data from IoT devices, and forwards it to Kafka brokers with low latency. No MQTT broker is required as an intermediary. The Kafka broker is the source of truth responsible for persistence, high availability, and reliability of the IoT data. Please note that producing from Kafka to IoT devices is not implemented yet at the time of writing this blog post.
However, although everybody thinks about IoT standards like MQTT or OPC UA when integrating IoT devices, oftentimes REST and HTTP(S) are a much simpler solution.
REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients.
Why might you use HTTP(S) for an IoT integration? Due to various reasons, REST Proxy makes implementation and deployment simpler, faster, and easier compared with IoT-specific technologies:
No matter how you decide to integrate IoT devices, building a reliable end-to-end monitoring infrastructure is essential.
Distributed systems are hard to monitor and secure. A Kafka cluster is not much different—you have to monitor and secure the Kafka brokers, ZooKeeper nodes, client consumer groups (Java, Python, Go, REST, etc.), and Connect and ksqlDB clusters.
In terms of monitoring your whole Kafka infrastructure end to end, Confluent Control Center delivers insights into the inner workings of your Kafka clusters and the data that flows through them. Control Center gives the administrator monitoring and management capabilities through curated dashboards, so that they can deliver optimal performance and meet SLAs for their clusters. This includes:
With security features like Role-Based Access Control (RBAC), you also have the ability to enable simple and standardized authentication and authorization for all components of the Confluent Platform.
The use cases for IoT integration scenarios are always similar: integrate with devices or machines; ingest the event streaming data in real time into the Kafka cluster (on premises or in the cloud); process the data with Kafka Streams and ksqlDB; and then send the data back to the device or machine, and/or to the other sinks like a database, analytics tool, or any other business application.
With Kafka-native options like clients, MQTT connectors, MQTT Proxy, or REST Proxy, you can integrate IoT technologies and interfaces to establish a powerful but simple architecture without using additional tools. This is especially recommended in 24/7 mission-critical deployments, where each additional component increases complexity, risk, and cost. You have many options, so choose the one that suits your situation best.
If you want to read a complete story about building an end-to-end IoT architecture from edge to cloud, read Enabling Connected Transformation with Apache Kafka and TensorFlow on Google Cloud Platform, which focuses on Google Cloud Platform, Confluent Cloud, and MQTT integration for building a scalable and reliable machine learning infrastructure.
The content of this blog post is also captured in this interactive lightboard recording called End-to-End Integration: IoT Edge to Confluent Cloud.
If you’re encountering similar challenges and use cases in your company, feel free to reach out and I’d be happy to discuss with you further.
Download the Confluent Platform to get started with the leading distribution of Apache Kafka.
We are proud to announce the release of Apache Kafka 3.9.0. This is a major release, the final one in the 3.x line. This will also be the final major release to feature the deprecated Apache ZooKeeper® mode. Starting in 4.0 and later, Kafka will always run without ZooKeeper.
In this third installment of a blog series examining Kafka Producer and Consumer Internals, we switch our attention to Kafka consumer clients, examining how consumers interact with brokers, coordinate their partitions, and send requests to read data from Kafka topics.