
Stream Processing at the Edge: Why Embracing Failure is the Winning Strategy

As a former network and storage systems administrator, I've been amazed to watch the cloud abstract away the complexity of infrastructure. Managed services today allow enterprises to scale systems without getting nearly as deep into the low-level plumbing of networking, storage, and data systems as they once had to.
That's why I'm fascinated by the widespread adoption of edge computing architectures. With this rush to the edge (a $378 billion market by 2028, according to IDC), enterprises are diving into some of distributed computing's hardest challenges: constrained networks, messy failure scenarios, and streaming data requirements that break the mold for engineers who still think of data as something static in a database.
Let’s take a closer look at why the edge is so challenging, how it’s pushing against conventional ways that platform teams think about data, and why the stakes are so high for getting this important and fast-growing architecture right.
From Industrial IoT to Mainstream Enterprise Applications
Edge computing’s early use cases came from industrial IoT, where network connectivity is spotty and milliseconds matter. In factories, predictive maintenance systems at the edge play critical operational roles, like shutting down overheated machinery just in time to avoid disaster. These systems need ultra-low latency, localized processing, and ways to handle dropped connections.
But edge computing has moved far beyond just industrial settings.
Today, businesses algorithmically process every data point they can to make decisions. Getting edge computing right means handling data generated at endpoints outside your centralized infrastructure, such as smart devices, cell phones, and connected vehicles, and making that data replayable, resilient, and highly available. It's a problem facing any company with remote sites, fleets of devices that need to "phone home," or any use case where AI training or inference happens. Edge computing is about routing data and extracting value as close to real time as possible, wherever that value makes the greatest impact.
Architecting for Unreliable Networks and Inevitable Data Failures
Edge environments are often defined by unstable network connections. Devices shut off. DSL lines in rural areas drop. Uptime is wildly inconsistent. The first design principle of edge computing: ensure endpoints can recover from failure and deliver data once connections return.
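To make that principle concrete, here is a minimal sketch of how an edge endpoint's producer client might be configured for durable, retried delivery using Apache Kafka (covered in more detail later in this piece). The broker address, topic name, and values shown are placeholders, not recommendations.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ResilientEdgeProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "edge-broker:9092"); // placeholder address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            // Survive flaky links: wait for full acknowledgment, keep retrying,
            // and let the broker deduplicate so retries never create duplicates.
            props.put("acks", "all");
            props.put("enable.idempotence", "true");
            props.put("delivery.timeout.ms", "300000"); // keep retrying for up to 5 minutes
            props.put("retries", Integer.toString(Integer.MAX_VALUE));

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("sensor-readings", "machine-42", "temp=87.5"));
                producer.flush(); // block until buffered records are acknowledged
            }
        }
    }

With acks set to all and idempotence enabled, the client keeps retrying through transient outages without writing duplicate records once the connection returns.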
Latency is another failure domain that’s critical in edge architectures. If industrial machines hit certain pressure or temperature thresholds, a split-second delay in command response — caused by bandwidth congestion, for example — can have catastrophic consequences. A good network isn’t enough if a latency spike prevents a crucial data point from reaching its destination.
Data sovereignty and locality add another layer of edge complexity. For example, gambling companies can’t move certain data types across state lines due to regulation. Financial services firms face consumer privacy laws that limit where they can process and analyze data. Many edge use cases require sanitizing data before it leaves a region, to stay compliant and protect customers.
Seeing Data as Events: A New Architectural Mindset
Two common but flawed approaches trip up enterprises trying to solve these edge computing problems.
The first: piping data from every edge site back to a central hub and running services there. This adds latency, complicates sovereignty, and creates a single point of failure.
The second: thinking about edge data in traditional database terms. Historically, data has been treated as something static — organized into schemas and retrieved for later analysis. That “data at rest” model treats persistence as a first-class design characteristic, and events as afterthoughts.
Streaming data flips that model. Instead of storing data to act on it later, it prioritizes acting in real time as events happen. It emphasizes the “happening” over the “thing,” letting systems continuously process and respond to events, including recovering from failures. This is essential at the edge, where latency and sovereignty requirements often call for processing to happen closer to the source.
Why Stream Processing is a Natural Fit for Challenging Edge Architectures
Stream processing provides a flexible and reliable data substrate that allows for in-flight manipulation of data.
The most widely used technology for streaming data is Apache Kafka, which is built on an immutable, append-only log for durability and replayability. Kafka is distributed by design, which allows for scalability and high availability, even at the edge. If something fails, consumer applications can replay events from the log, so no data is lost. Kafka supports exactly-once semantics, transactional processing, and asynchronous replication (e.g., cluster linking), helping systems recover from connectivity issues while maintaining consistency. That makes it a great fit for environments with spotty connectivity or high availability requirements.
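As a rough illustration of that replayability, the sketch below shows a consumer that reads only committed records and, when its consumer group has no stored offsets, starts from the oldest event still retained in the log. The broker address, group ID, and topic name are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReplayingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "edge-broker:9092"); // placeholder address
            props.put("group.id", "edge-replay-demo");          // placeholder group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            props.put("isolation.level", "read_committed"); // only see committed transactions
            props.put("auto.offset.reset", "earliest");     // no stored offsets? start at the oldest retained event

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("sensor-readings"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Reprocess the event; the log itself is unchanged, so this is safe to repeat.
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }

Because the log is immutable, re-reading it after a failure is safe: the consumer simply resumes from its last committed offset, or from the earliest retained offset if it has none.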
Apache Flink complements Kafka with stateful stream processing. Its fault tolerance relies on checkpointing and state snapshots — saving the application’s state periodically to durable storage. In the event of a failure, Flink recovers from the last checkpoint, minimizing disruption and avoiding inconsistencies. Flink also processes streams in near real time, enabling edge use cases like data sanitization, aggregation, and enrichment, all while staying resilient.
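A minimal Flink job sketch along those lines might look like the following. The in-memory source, filter threshold, and 10-second checkpoint interval are illustrative only; a real edge deployment would typically read from a Kafka topic instead.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EdgeSanitizationJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Snapshot operator state every 10 seconds so a failed job can resume
            // from its last checkpoint instead of reprocessing everything.
            env.enableCheckpointing(10_000);

            // Illustrative source; in practice this would be a Kafka connector.
            DataStream<Double> readings = env.fromElements(87.5, 19.2, -999.0, 102.3);

            readings
                .filter(temp -> temp > -50.0 && temp < 200.0) // sanitize obviously bad sensor values
                .print();

            env.execute("edge-sanitization-sketch");
        }
    }

If the job crashes, Flink restores operator state from the most recent completed checkpoint and resumes processing from there rather than starting over.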
Kafka and Flink provide the best starting point for an event-driven data infrastructure that’s highly compatible with edge architectures.
High Rewards for Getting Edge and Streaming Right
Cloud computing’s great victory has been making infrastructure far more usable by default. The promise of allowing engineers to focus on creating value instead of managing infrastructure has become such a truism it’s now cliché.
What makes the edge exciting is that it's relatively complex and still maturing, offering huge technology and business advantages to companies that get it right. Being good at the edge today is like being good at web apps in the late '90s, microservices in the mid-2000s, or Kubernetes in the mid-2010s.
For platform teams already running multi-tenant systems, dealing with failure in massively distributed, ephemeral environments isn’t a new challenge. But for enterprises still stuck in a “data at rest” mindset, the cost of entry into the edge is adopting event-driven streaming architectures.
The payoff is worth it. Stream processing unlocks real-time data flow and real-time insight, and supports a culture that’s ready to respond instantly to a changing world.
About the author: Joseph Morais serves as a technical champion and data streaming evangelist at Confluent. Before joining Confluent, Joseph was a senior technical account manager at AWS, helping enterprise customers scale through their cloud journeys. He also worked at Amino Payments, where he focused on Kafka, Apache Hadoop, NGINX, and automation initiatives, and on Urban Outfitters' e-commerce operations team, focusing on agile methodology, CI/CD, containerization, public cloud architecture, and infrastructure-as-code projects.