Microservices might be the bee’s knees for enterprises looking to run applications in a more agile, cloud-native environment, but for data aggregation platform provider Segment the immaturity of the ecosystem forced it to ditch microservices for the comfort of a monolithic architecture.
Segment lets enterprise customers tap into its host API to access more than 200 tools. Those tools allow enterprises to glean insights about their customers that can be used for advertising, marketing, and operational direction. Enterprises using the platform include Crate & Barrel, Gap, IBM, and New Relic.
Segment’s platform routes customer event data between the point of creation and that customer’s APIs, which are typically connected to an analytics platform like Google Analytics. This can mean that Segment is tasked with routing hundreds of thousands of events per second.
Most of Segment’s cloud operations run on top of Amazon Web Services (AWS), with a small percentage also on Google Cloud Platform (GCP). Those cloud operations entail 250 different microservices totaling up to 16,000 individual containers supporting 300 billion events every month.
Segment jumped into microservices early, initially adopting the architecture in 2015. Microservices are typically small, single-function applications that, when combined, form a larger, more complex application. The benefit of a microservices architecture is that those smaller components can be changed individually without significantly impacting the larger application.
Calvin French-Owen, co-founder and CTO at Segment, wrote at the time that greater visibility was a significant reason for the move.
“When we’re getting paged at 3 a.m. on a Tuesday, it’s a million-times easier to see that a given worker is backing up compared to adding tracing through every single function call of a monolithic app,” he noted in a blog post. “That’s not to say you can’t get good visibility from more tightly coupled code, it’s just rarer to have all the right visibility from day one.”
However, the company early last year found itself inundated with challenges using microservices and began to look at ways to resolve those issues. Basically, it found that an error in any one of the routing connections would cause requests to begin piling up as the microservice-based application attempted to correct the issue. The system would try to compensate for the backup with auto-scaling that would begin transferring those backed-up requests to other connection ports that were also being used by other customers. Those ports would then get overloaded and system performance issues would migrate across its customer base.
“On the output side we are sending out up to 200 or more APIs to our customers,” French-Owen explained. “If each is moderately well behaved we might see them having one bad day per year. But with 200 or more APIs we are seeing an outage every day and a half.”
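French-Owen’s arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes outages are independent and that each destination API has exactly one bad day in a 365-day year, neither of which comes from Segment. The function names are illustrative.

```python
# Back-of-the-envelope check of the outage arithmetic in the quote.
# Assumptions (not from Segment): outages are independent, and each
# destination API has exactly one bad day per 365-day year.

DAYS_PER_YEAR = 365


def expected_days_between_outages(destinations, bad_days_per_year=1):
    """Average gap, in days, between outages somewhere in the fleet."""
    outages_per_year = destinations * bad_days_per_year
    return DAYS_PER_YEAR / outages_per_year


def probability_some_outage_today(destinations, bad_days_per_year=1):
    """Chance that at least one destination is having its bad day today."""
    p_single = bad_days_per_year / DAYS_PER_YEAR
    return 1 - (1 - p_single) ** destinations


gap = expected_days_between_outages(200)
chance = probability_some_outage_today(200)
print(f"average gap between outages: {gap:.2f} days")
print(f"chance of at least one outage on a given day: {chance:.0%}")
```

Under those assumptions, 200 destinations work out to an outage somewhere roughly every 1.8 days, which is in line with the “day and a half” figure French-Owen cited.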
Even worse, those requests would overwhelm the system’s storage capabilities and result in some of that data getting lost. That was considered unacceptable.
“While our systems would automatically scale in response to increased load, the sudden increase in queue depth would outpace our ability to scale up, resulting in delays for the newest events,” explained Alexandra Noonan, a software engineer at Segment, in a blog post. “Delivery times for all destinations would increase because destination ‘X’ had a momentary outage.”
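The effect Noonan describes, in which one failing destination delays events bound for every other destination, is commonly called head-of-line blocking. The toy simulation below is not Segment’s code. It assumes one worker per queue that attempts a single delivery per tick and retries a failed event in place, and it compares one shared queue with hypothetical per-destination queues. The function name `deliver` and the tick model are made up for illustration.

```python
from collections import deque


def deliver(events, outage, shared):
    """Return {destination: worst delivery delay in ticks}.

    events -- list of (arrival_tick, destination), sorted by arrival tick
    outage -- {destination: (first_down_tick, last_down_tick)}
    shared -- True: one FIFO for everything; False: one FIFO per destination
    Each queue has one worker that attempts one delivery per tick.
    """
    queues = {}
    pending = deque(events)
    worst = {}
    tick = 0
    while pending or any(queues.values()):
        # Enqueue everything that has arrived by this tick.
        while pending and pending[0][0] <= tick:
            arrival, dest = pending.popleft()
            key = "all" if shared else dest
            queues.setdefault(key, deque()).append((arrival, dest))
        for queue in queues.values():
            if not queue:
                continue
            arrival, dest = queue[0]
            start, end = outage.get(dest, (1, 0))  # empty range: never down
            if start <= tick <= end:
                continue  # delivery failed; the event stays at the head
            queue.popleft()
            worst[dest] = max(worst.get(dest, 0), tick - arrival)
        tick += 1
    return worst


# Ten events alternating between destinations X and A; X is down for ticks 0-19.
events = [(t, "X" if t % 2 == 0 else "A") for t in range(10)]
outage = {"X": (0, 19)}

shared_result = deliver(events, outage, shared=True)
isolated_result = deliver(events, outage, shared=False)
print("shared queue:  ", shared_result)
print("isolated queue:", isolated_result)
```

In the shared case, events for the healthy destination A wait behind the stuck X event for the whole outage. With separate queues, only X’s own events are delayed.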
French-Owen explained that for the company’s most complex data aggregation platform, using a microservices architecture actually reduced productivity and increased complexity. “It was a good idea when we first approached it, but we found it to actually be more challenging to support,” he said.
Noonan added that the microservices approach failed because Segment was not set up to handle the corresponding scaling and updating required. “We lacked the proper tooling for testing and deploying the microservices when bulk updates were needed,” she wrote. “As a result, our developer productivity quickly declined.”
Segment’s issues with microservices are not unique. A recent survey conducted by Dimensional Research and commissioned by monitoring company LightStep found that 73 percent of enterprises using microservices said it was harder to report problems.
For Segment, the only answer was to move its platform back to a monolithic architecture.
Back to the Monolith
A monolithic architecture is constructed as a single unit, with all of the pieces intertwined. In theory, this provides greater performance and efficiency because all of the parts are optimized to work together. But it also makes updates and maintenance more challenging.
As part of its transition, Segment’s engineering team developed an aggregator – dubbed Centrifuge – that replaced the individual queues from the microservices-based platform and was responsible for sending events to the single monolithic service. It also developed a test suite for recording and saving a destination’s test traffic.
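The article does not describe how Segment’s test suite works. As a rough illustration of the general record-and-replay idea, the sketch below saves a destination’s response the first time a request is made and replays it from disk afterwards, so later test runs do not hit the network. The class name `TrafficRecorder`, the JSON tape format, and the `example.invalid` URL are all hypothetical.

```python
import json
import os
import tempfile


class TrafficRecorder:
    """Record responses on first use and replay them afterwards.

    Hypothetical helper for illustration; not Segment's implementation.
    """

    def __init__(self, path, send):
        self.path = path  # file where recorded traffic is saved
        self.send = send  # callable that performs the real request
        self.tape = {}
        if os.path.exists(path):
            with open(path) as f:
                self.tape = json.load(f)

    def request(self, method, url, body):
        # Identical requests map to the same key, so they replay the same response.
        key = json.dumps([method, url, body], sort_keys=True)
        if key not in self.tape:
            self.tape[key] = self.send(method, url, body)
            with open(self.path, "w") as f:
                json.dump(self.tape, f, indent=2)
        return self.tape[key]


# Stand-in for a real HTTP call, so the example runs without a network.
calls = []


def fake_send(method, url, body):
    calls.append(url)
    return {"status": 200, "body": "ok"}


tape_path = os.path.join(tempfile.mkdtemp(), "destination_x.json")
url = "https://example.invalid/track"
first = TrafficRecorder(tape_path, fake_send).request("POST", url, {"event": "signup"})
second = TrafficRecorder(tape_path, fake_send).request("POST", url, {"event": "signup"})
print(first == second, "real calls made:", len(calls))
```

The second recorder reads the saved file and never calls `fake_send`, which is the property that lets a large test suite run quickly and without depending on each destination’s live endpoint.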
“With every destination living in one service we had a good mix of CPU and memory-intense destinations, which made scaling the service to meet demand significantly easier,” Noonan wrote. “The large worker pool can absorb spikes in load, so we no longer get paged for destinations that process small amounts of load.”
The company did admit to some downsides from the move back to a monolithic architecture. These included more difficult fault isolation, less effective in-memory caching, and dependency updates that, if mishandled, would have a broader impact on operations.
French-Owen said that while the move back to a monolithic architecture was the right decision for this specific platform, Segment does still use microservices for other services. “For us it just made sense in this instance to move back to something that we were familiar with and knew we could have more control over,” he said.