
Deepak Vadgama

Software developer, amateur photographer


As engineers, we love to build systems. The more complex the system, the more challenging it feels and the more satisfaction we get from designing and building it. Today, it is easy to brainstorm a new system with your LLM and implement it at lightning speed. Even as the effort shrinks, we still get the dopamine hit, the satisfaction of having built a working system. When the effort to build systems drops by, say, 10x, you need to be more careful about saying yes to building things. Among other problems, from customers being bombarded with features to long-term technical debt piling up, I feel a bit saddened by the over-engineering that is never utilized over the lifetime of the project.

My organization at Amazon has 120-odd engineers (160+ people overall). As the only Principal Engineer in the org, I encounter a lot of new systems and features being built around me. Today, I came across two over-engineered systems.

The first one was a router for an agentic system. The router component, in addition to doing a bunch of other things, would get structured inputs from the clients (UI) and call the appropriate agent. So far so good. To perform basic validations of the input, as you do, it would need to access a set of rules, written in a custom format and configured in another service. The intent was that business users would themselves come and add new agents or new configurations, and that these could be rolled out without any changes to the code.

Well, the system has been live for 6 months, we have not exposed that feature to a single user, and I can bet my bottom dollar we never will. Instead, what we have now is a simple web service with a dependency on another service, which has its own code package and operational overhead. Not to mention the running costs, the need to keep these two coupled systems consistent, and yet another AWS Lambda to manage.
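For contrast, here is a minimal sketch of what the simple version of such a router could look like, with the validation rules living in code next to the agents they guard. This is not the actual system: the agent names, the required fields, and the `Request` type are all made up for illustration. Adding a new agent is a small code change and a deploy, which is arguably cheaper than running a second service for a self-serve feature nobody uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    agent: str      # which agent the client (UI) wants to reach
    payload: dict   # structured input from the client


# Hypothetical agents, stand-ins for whatever the real agents do.
def billing_agent(payload: dict) -> str:
    return f"billing handled invoice {payload['invoice_id']}"


def support_agent(payload: dict) -> str:
    return f"support handled ticket {payload['ticket_id']}"


# The "rules" are plain data in the same package as the router:
# agent name -> (required payload fields, handler).
AGENTS: dict[str, tuple[frozenset[str], Callable[[dict], str]]] = {
    "billing": (frozenset({"invoice_id"}), billing_agent),
    "support": (frozenset({"ticket_id"}), support_agent),
}


def route(request: Request) -> str:
    """Validate the input against the in-code rules, then call the agent."""
    if request.agent not in AGENTS:
        raise ValueError(f"unknown agent: {request.agent}")
    required, handler = AGENTS[request.agent]
    missing = required - set(request.payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return handler(request.payload)


if __name__ == "__main__":
    print(route(Request("billing", {"invoice_id": "INV-1"})))
```

If business users ever do need to configure agents themselves, the `AGENTS` table is the one place that would have to move out of code, and that can be decided when the need is real.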

The second one was a system for reconciling financial records at a very large scale. The engineer proposed a design for a recon engine. Roughly, this is how the conversation went.

“It will be able to connect to any upstream data service like S3, DynamoDB, or an event bus, and push data to downstream systems.” Alright.

“It will be able to accept any transformation required for these events in a specific format so that customers can configure it.” I have seen this before.

“It will also allow customers to add scripts for transformations of its events.” There it is.

“It will run in the customer’s AWS account so that anyone in the company can use it.” Alarm bells ringing.

What was proposed was essentially an ESB (Enterprise Service Bus) with a bit of matching logic. When anyone can build their own recon system using GenAI, what is the USP of this one? By the same logic, why are we building a custom system instead of using an open source one? If no one uses it beyond our own two projects, will the ROI be justified? What will be the maintenance, operational and engineering overhead of a system that runs in the customer’s AWS account? Can you easily patch it, can you debug an issue, will you even get access to the system, especially if it holds customers’ PII data? Have you objectively compared all approaches across the spectrum, from a domain-specific custom solution to a full-blown ESB?

On one hand, I don’t blame the engineers involved. We have all been there. Only after years of seeing systems and projects fail do we gain that wisdom and start seeing patterns. There is also a multitude of cultural attributes in play, due to which I am sure my feedback will meet strong pushback. I will hear the rationale for why this is the correct way, and these over-engineered systems & features will still get built. Sigh.

Simple systems are easy to manage and debug. They arguably exhibit fewer bugs and are more secure due to a smaller attack surface. It is easier to add features to them. The team can keep the whole system in their heads, making delta changes, bug fixes, and project-related communication much simpler & more effective.

In the age of GenAI, it’s easy to say yes, but very difficult to say “you ain’t gonna need it, keep it simple.”

