The shift from humans doing data work to agents doing it happened faster than most engineering teams could adapt. Two years ago, wrangling messy data from multiple sources into something a downstream process could consume was a manual slog. Today, that task gets handed to an agent while the engineer moves on to something else. The pipeline did not get smaller. The operator changed.
Gabriel Linero is a Lead Software Engineer at EPAM Systems, where his work on Spark optimization and cost architecture cut pipeline runtimes in half and reduced AWS data pipeline costs by 30 percent. Earlier roles at Perficient Latin America included a migration to Databricks that delivered a 10x reduction in query latency and an end-to-end data quality framework that dropped unexpected model failures by 60 percent. He is now pursuing a Master's in Computer Science at the University of Southern California while watching the ground shift beneath the discipline he has spent his career in.
"With multi-agent architectures, you can delegate each step of a data or ML pipeline to specialized agents, run them in parallel, and consolidate the results. What used to be manual work is now fully automated," says Linero.
How the work gets distributed: "A single agent receives the task and decides, 'I cannot handle this myself. I need someone on the front end, someone on the back end, someone on integrations.' Based on that single prompt, it provides context to each agent to focus only on its specific task," Linero says. A recent sprint building a full web application ran on 10 to 20 coordinated agents working in parallel.
Exploration is the ideal entry point: Read-only access, low blast radius. "You provide read access and let the agent explore your data platform. It is perfect for ramping up on a new codebase because the risk is very low." Production changes are a different story. Linero insists human review stays in the loop before anything ships, and he warns against letting agents push directly to main branches or trigger automated CI/CD without a verification gate.
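That verification gate can be as blunt as a hard stop in the deployment path. A minimal sketch of the idea, with hypothetical names (this is not Linero's actual tooling):

```python
class ApprovalRequired(Exception):
    """Raised when an agent-generated change reaches the gate unreviewed."""


def apply_change(change: dict) -> str:
    # Agents may propose changes, but nothing merges or triggers CI/CD
    # without an explicit human sign-off recorded on the change itself.
    if not change.get("approved_by"):
        raise ApprovalRequired(f"change {change['id']} needs human review")
    return f"merged {change['id']} (approved by {change['approved_by']})"
```

The point is structural: the agent can do everything up to the gate, but the gate itself never checks an agent's opinion, only a human's.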
This orchestration layer is itself being automated, Linero notes. The question of how many agents to spawn for a given task used to be an engineering decision. Now a coordinating "manager" agent makes that call based on exploratory context, spinning up specialists the way a team lead assigns tickets.
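The delegate-and-consolidate pattern is straightforward to sketch. Here the specialist "agents" are stubs standing in for model calls, and the specialist names are illustrative, not a description of any particular framework:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialists -- in practice each would be a model call
# with its own narrowly scoped context.
def frontend_agent(task: str) -> str:
    return f"frontend plan for: {task}"

def backend_agent(task: str) -> str:
    return f"backend plan for: {task}"

def integration_agent(task: str) -> str:
    return f"integration plan for: {task}"

SPECIALISTS = {
    "frontend": frontend_agent,
    "backend": backend_agent,
    "integrations": integration_agent,
}

def manager(task: str, needed: list[str]) -> dict[str, str]:
    """Spawn the chosen specialists in parallel and consolidate results --
    the 'team lead assigning tickets' pattern."""
    with ThreadPoolExecutor(max_workers=len(needed)) as pool:
        futures = {name: pool.submit(SPECIALISTS[name], task) for name in needed}
        return {name: f.result() for name, f in futures.items()}
```

In the automated version Linero describes, the `needed` list itself comes from an exploratory pass by the manager rather than from an engineer.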
Context is the whole game: Thin prompts produce hallucinations that propagate down the chain, and once a bad assumption enters step one, the rest of the pipeline inherits it. Linero's fix is to stop pasting ticket descriptions into a prompt box and start wiring agents directly into the systems where context actually lives. "Most Jira descriptions assume the reader already has context, so they skip details. If you copy-paste that into a model, it might work or it might not. But if you connect directly to Jira through an MCP server or API, the agent can read the ticket, the related tickets, the comments, and the parent spike. It collects real context about how your team actually works."
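The ticket-plus-its-neighborhood idea can be sketched as a walk over linked issues. The in-memory store below stands in for a Jira-like tracker; in practice the agent would read the same data through an MCP server or the tracker's REST API, and all ticket keys and fields here are made up:

```python
# Stand-in for the tracker. A real agent would fetch these records live.
TICKETS = {
    "DATA-101": {
        "summary": "Backfill orders table",
        "description": "See parent spike for schema decisions.",
        "comments": ["Watch out for the 2023 partition format."],
        "links": ["DATA-90"],
    },
    "DATA-90": {
        "summary": "Spike: orders schema redesign",
        "description": "Agreed on partitioning by order_date.",
        "comments": [],
        "links": [],
    },
}

def gather_context(key: str, store=TICKETS, seen=None) -> list[str]:
    """Collect the ticket, its comments, and every linked ticket, so the
    agent starts from real team context instead of a pasted description."""
    seen = seen if seen is not None else set()
    if key in seen or key not in store:
        return []
    seen.add(key)
    ticket = store[key]
    context = [f"{key}: {ticket['summary']} -- {ticket['description']}"]
    context += [f"{key} comment: {c}" for c in ticket["comments"]]
    for linked in ticket["links"]:
        context += gather_context(linked, store, seen)
    return context
```

The `seen` set matters: issue links are often cyclic, and without it the walk never terminates.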
Feedback loops beat one-shot prompts: Giving agents tools that let them test, observe, and adjust produces more deterministic behavior than hoping the first generation lands. Without that loop, a hallucination in an early step quietly corrupts every downstream task, and the agent rarely catches its own mistake on the way back through.
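A feedback loop of this kind reduces to generate, verify, retry with the failure fed back in. A minimal sketch with stub functions (the SQL example and the error message are invented for illustration):

```python
def refine(generate, verify, max_attempts: int = 3) -> str:
    """Generate a candidate, check it against a real signal, and feed the
    failure back into the next attempt -- instead of trusting the first shot."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts: {feedback}")

# Stub agent: emits a broken date filter first, fixes it once told why it failed.
def generate(feedback):
    return "WHERE ds = 2024-01-01" if feedback is None else "WHERE ds = '2024-01-01'"

def verify(candidate):
    # Stand-in for actually executing the query against the warehouse.
    ok = "'" in candidate
    return ok, None if ok else "date literal must be quoted"
```

Without the `verify` signal, the first candidate would have shipped and the unquoted literal would have surfaced somewhere downstream.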
The same discipline extends to cost. The default instinct is to reach for the most capable model every time, and Linero argues that is how token budgets vanish overnight.
Tier models to the task: "I do not use Opus for everything. Most tasks do not require it. I use Haiku for most work and save the most intelligent model for complex orchestration, like spawning other agents."
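The tiering policy amounts to a routing table with a cheap default. A sketch, using the tier labels from the quote rather than real model identifiers, and with task categories invented for illustration:

```python
# Cheap tier by default; escalate only named categories of work.
DEFAULT_MODEL = "haiku"
ESCALATIONS = {
    "orchestration": "opus",  # spawning and coordinating other agents
    "architecture": "opus",
}

def pick_model(task_kind: str) -> str:
    """Route routine work to the cheap tier; reserve the most capable
    model for the tasks that genuinely need it."""
    return ESCALATIONS.get(task_kind, DEFAULT_MODEL)
```

Note the asymmetry: escalation is an explicit, enumerated choice, while the cheap model is what you get for everything else.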
Prompt shape drives spend: "Just saying 'give me a concise summary' instead of 'give me a summary' can greatly reduce tokens. Most people do not think about that." Quota-based subscriptions, he adds, force engineers to develop the habit the way a Spark job forces you to think about cluster sizing.
The through-line connects directly to the cost engineering work Linero has been doing for years. Migrating Spark workloads from EMR Serverless to EMR on EKS, or refactoring small jobs into native Python on a single Kubernetes pod, required the same instinct he now applies to model selection. The workload tells you what it needs. Overprovisioning is a choice.
What makes the agent shift interesting is not that the work is new. It is that the discipline is the same, applied to a different layer of the stack. The engineers who already know how to scope, instrument, and constrain a system are the ones who will build agent fleets that actually ship. The rest will spend a lot of tokens discovering the hard way that autonomy without boundaries is just expensive chaos.