Pipeline vs Parallel vs Hierarchical: Choosing Your Agent Architecture
Excerpt: Not every problem needs the same agent pattern. Here's a practical decision framework for choosing between pipeline, parallel, and hierarchical architectures.
I review a lot of agent systems. Student projects, production codebases, open-source repos, startup prototypes. And I see the same mistake over and over: people pick one architecture pattern -- almost always pipeline -- and use it for everything.
It is the hammer problem. You learn pipeline agents, everything looks like a sequential nail. But choosing the wrong agent architecture is not just an academic concern. It is the difference between a system that responds in two seconds and one that takes thirty. Between a system that gracefully handles failure and one that collapses when a single component hiccups. Between a system your team can debug in production and one that turns every incident into an archaeological expedition.
So let us break down the three core patterns, when each one shines, and -- critically -- when each one will hurt you.
Pattern 1: Pipeline Architecture
The idea: Agents execute in sequence. Each agent takes the output of the previous agent as input and passes its result to the next. Think assembly line.
The canonical example:
Document → Extract Entities → Classify Sentiment → Generate Summary → Format Output
Each step refines or transforms the previous result. The entity extractor finds names, dates, and organizations. The classifier uses those entities to assess sentiment. The summarizer uses both entities and sentiment to produce a coherent summary. The formatter wraps it all up for the end user.
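The assembly-line idea can be sketched in a few lines. This is a minimal illustration, not a real implementation: the stage functions are hypothetical stand-ins for model calls, and the dict-based state shape is an assumption.

```python
from typing import Callable

# Hypothetical stage functions -- in a real system each would call a model.
def extract_entities(doc: str) -> dict:
    return {"doc": doc, "entities": ["Acme Corp", "2024-01-15"]}

def classify_sentiment(state: dict) -> dict:
    state["sentiment"] = "neutral"  # uses the entities found upstream
    return state

def generate_summary(state: dict) -> dict:
    state["summary"] = f"{len(state['entities'])} entities, {state['sentiment']} tone"
    return state

def run_pipeline(doc: str, stages: list[Callable]) -> dict:
    """Run each stage in order, passing the result forward."""
    result = doc
    for stage in stages:
        result = stage(result)  # a failure here halts everything downstream
    return result

report = run_pipeline("Quarterly filing text",
                      [extract_entities, classify_sentiment, generate_summary])
```

Note how the single `for` loop is the whole control flow: that is exactly why pipelines are easy to debug, and exactly why one failed stage stops everything after it.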
When pipeline is the right call:
- Tasks have clear, sequential dependencies (step 3 literally cannot run without step 2's output)
- Data lineage matters (you need to trace exactly how an output was produced)
- The system needs to be auditable (compliance, regulatory, medical)
- You are early in development and need something debuggable
The strengths are real. Pipeline systems are the easiest to reason about. When something goes wrong, you check each stage in order. The data flows one direction. There is no ambiguity about what happened when. If you are building your first agent system, start here. Not because it is the best architecture, but because it is the one where you will learn the most about what you actually need.
But the weaknesses are equally real:
**Speed:** Pipeline is the slowest pattern by definition. Total latency = sum of all stage latencies. If you have five agents and each takes two seconds, your user waits ten seconds. Every time.
**Fragility:** A single point of failure at any stage kills the entire pipeline. If your entity extractor fails, nothing downstream can run. You need retry logic, fallback strategies, and graceful degradation at every stage.
**Rigidity:** Adding a new step means modifying the chain. Reordering steps can break downstream assumptions. The more stages you add, the harder the system is to modify.
A real-world lesson: One team I worked with built a pipeline for processing customer support tickets: classify intent, extract key details, draft response, review for policy compliance, send. Five stages. It worked beautifully -- until they needed to add language detection at the front. That single addition broke the entity extractor's assumptions, which broke the classifier's input format, which cascaded through the entire chain. Three days of debugging for what should have been a trivial addition.
Pipeline is a great starting point. It is rarely the final architecture.
Pattern 2: Parallel Architecture
The idea: Multiple agents work simultaneously on independent tasks. Results are aggregated at the end. Think divide and conquer.
The canonical example:
Code Review Request
├── Agent A: Analyze Python files
├── Agent B: Analyze JavaScript files
├── Agent C: Analyze TypeScript files
└── Agent D: Check dependency vulnerabilities
↓
Aggregator: Combine all findings into unified report
Each agent operates on a different slice of the problem. None of them need each other's output. They can run at the same time.
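A sketch of this fan-out, using `asyncio.gather` as the concurrency mechanism. The `analyze` coroutine is a hypothetical stand-in for a slow model call; the point is that total wait time is roughly the slowest slice, and `return_exceptions=True` lets one failed agent degrade gracefully instead of crashing the whole review.

```python
import asyncio

# Hypothetical analyzer -- stands in for a slow model call on one slice.
async def analyze(language: str, files: list[str]) -> dict:
    await asyncio.sleep(0.01)
    return {"language": language, "issues": len(files)}

async def review(slices: dict[str, list[str]]) -> list[dict]:
    # Fan out: all analyzers run concurrently.
    tasks = [analyze(lang, files) for lang, files in slices.items()]
    # return_exceptions=True: a single failure becomes a skippable result,
    # not a crash -- the other agents' findings survive.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]

findings = asyncio.run(review({
    "python": ["a.py", "b.py"],
    "typescript": ["app.ts"],
}))
```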
When parallel is the right call:
- Tasks are genuinely independent (no data dependencies between them)
- Speed is critical (user is waiting, SLA is tight)
- Tasks are roughly similar in nature (all doing analysis, or all doing generation)
- Partial results are acceptable (if one agent fails, the others' results are still useful)
The strengths are compelling. Parallel is the fastest pattern. Total latency = latency of the slowest agent, not the sum. If you have four agents that each take two seconds, your user waits two seconds, not eight. That is a 4x improvement. For user-facing applications, that is the difference between "snappy" and "I'm going to go make coffee."
Parallel is also naturally fault-tolerant. If one agent fails, you still have results from the others. Your code review might be missing the TypeScript analysis, but the Python, JavaScript, and dependency findings are still there. You degrade gracefully instead of crashing entirely.
But aggregation is where parallel gets hard:
**The merge problem:** Combining results from multiple agents is non-trivial. What if Agent A and Agent C found the same bug in shared code? What if their recommendations contradict each other? The aggregator needs to be smart enough to deduplicate, reconcile, and prioritize.
**No inter-agent communication:** Agent A cannot say "hey, I found something in the Python code that Agent C should know about when reviewing the TypeScript." Each agent operates in isolation. If the problem actually has hidden dependencies between slices, parallel will miss them.
**Uneven workloads:** If Agent A finishes in 500ms and Agent D takes 30 seconds, you are bottlenecked by Agent D. Your theoretical speedup evaporates. You need to think carefully about how you partition the work.
A production pattern that works well: Fan-out/fan-in with a smart aggregator. The fan-out sends tasks to parallel agents. Each agent returns structured results (not free-form text). The fan-in aggregator uses a deterministic merge strategy -- not another LLM call, but actual code that deduplicates, ranks by severity, and formats the output. Using an LLM for aggregation adds latency and nondeterminism to the one step where you want speed and consistency.
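A minimal sketch of that deterministic fan-in step. The finding shape (an `id`, a `severity`, a `message`) is an assumption for illustration; the mechanics are the point: deduplicate by id, keep the more severe duplicate, rank by severity, all in plain code with no LLM call.

```python
# Deterministic merge: plain code, no model call, same output every time.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def aggregate(findings_per_agent: list[list[dict]]) -> list[dict]:
    """Deduplicate by finding id, keep the highest severity, rank the result."""
    merged: dict[str, dict] = {}
    for findings in findings_per_agent:
        for f in findings:
            existing = merged.get(f["id"])
            # Two agents reporting the same finding: keep the more severe one.
            if existing is None or SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[existing["severity"]]:
                merged[f["id"]] = f
    return sorted(merged.values(), key=lambda f: SEVERITY_RANK[f["severity"]])

report = aggregate([
    [{"id": "sqli:db.py:42", "severity": "critical", "message": "unsanitized query"}],
    [{"id": "sqli:db.py:42", "severity": "high", "message": "unsanitized query"}],
    [{"id": "unused:app.ts:1", "severity": "low", "message": "unused import"}],
])
```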
Pattern 3: Hierarchical Architecture
The idea: A manager agent receives a task, breaks it into subtasks, delegates to specialist agents, reviews results, and may re-delegate or re-plan based on what comes back. Think CEO delegating to department heads.
The canonical example:
User: "Research the competitive landscape for AI tutoring platforms"
↓
Manager Agent
├── Plans: need market data, competitor list, pricing analysis, feature comparison
├── Delegates: "Research Agent, find top 10 competitors"
│ ↓ Returns list of 10 companies
├── Re-plans: "3 of these are defunct. Research Agent, find 3 replacements"
│ ↓ Returns 3 more
├── Delegates: "Analysis Agent, compare pricing models for these 10"
├── Delegates: "Analysis Agent, compare feature sets for these 10"
│ ↓ Both return structured comparisons
├── Reviews: "Pricing data is missing for 2 companies"
├── Re-delegates: "Research Agent, find pricing for Company X and Company Y"
│ ↓ Returns pricing
└── Synthesizes: Final comprehensive report
The key difference from pipeline and parallel is that the plan is not fixed in advance. The manager adapts based on intermediate results. It can retry, re-route, spawn new subtasks, or abandon a line of inquiry that is not productive.
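The plan-delegate-review-replan loop can be sketched as follows. Everything here is a toy stand-in: the `worker` and `review` functions are hypothetical replacements for LLM calls, and a real manager would carry much richer state. What the sketch shows is the shape of the control flow: the task list is not fixed in advance, because reviewing results can add to it.

```python
from typing import Callable

def run_manager(goal: str, worker: Callable, review: Callable,
                max_rounds: int = 5) -> dict:
    """Delegate, review, and re-plan until the reviewer finds no gaps."""
    results: dict[str, str] = {}
    pending = [goal]                      # initial plan: a single task
    for _ in range(max_rounds):
        for task in pending:
            results[task] = worker(task)  # delegate to a specialist
        pending = review(results)         # review may reveal new subtasks
        if not pending:                   # no gaps left: stop iterating
            break
    return results

# Toy specialist: canned answers in place of real agent calls.
def worker(task: str) -> str:
    return {"find competitors": "A, B, C",
            "find pricing": "$10, $20, $30"}.get(task, "n/a")

# Toy reviewer: once it sees the competitor list, it asks for pricing.
def review(results: dict[str, str]) -> list[str]:
    if "find competitors" in results and "find pricing" not in results:
        return ["find pricing"]           # re-plan: a gap was found
    return []

report = run_manager("find competitors", worker, review)
```

The `max_rounds` cap matters in practice: without it, a manager that keeps finding gaps will loop (and bill you) forever.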
When hierarchical is the right call:
- The full scope of work cannot be known before starting
- Tasks may reveal new tasks (research that uncovers unexpected findings)
- Quality matters more than speed (the manager can iterate until satisfied)
- The domain is complex enough that no single prompt can handle it
The strengths are powerful. Hierarchical is the most flexible pattern. It can handle ambiguity, adapt to surprises, and produce higher-quality results than either pipeline or parallel because the manager can iterate. It is the closest thing to how a skilled human actually manages complex work: plan, delegate, review, adjust.
But the weaknesses are significant:
**The manager is a bottleneck.** Every decision flows through the manager agent. If the manager is slow, everything is slow. If the manager makes a bad delegation decision, the entire subtree of work is wasted.
**Higher latency.** The plan-delegate-review-replan loop adds multiple round trips. A hierarchical system that produces great results might take 60 seconds where a pipeline takes 10.
**Harder to debug.** The execution path is dynamic. You cannot look at the code and know what will happen. You have to look at the logs from a specific run and trace the manager's decisions. When something goes wrong, you are debugging the manager's *judgment*, not just its code.
**Cost multiplier.** Every delegation is an LLM call. Every review is an LLM call. Every re-plan is an LLM call. A hierarchical system can easily make 10-20x more API calls than a pipeline solving the same problem.
The Decision Framework
After building and reviewing dozens of agent systems, here is the decision tree I use. It is simple on purpose -- most architecture decisions should be.
START: What is the task?
│
▼
Q1: Do the subtasks depend on each other's output?
│
├── YES → Pipeline
│ (Sequential dependencies require sequential execution)
│
└── NO → Continue
│
▼
Q2: Can you define all subtasks before execution starts?
│
├── YES → Parallel
│ (Known, independent tasks = run them all at once)
│
└── NO → Hierarchical
(Unknown scope = you need a manager to plan dynamically)
Two questions. That is it.
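The decision tree is small enough to write down directly:

```python
def choose_architecture(subtasks_depend_on_each_other: bool,
                        all_subtasks_known_upfront: bool) -> str:
    """The two-question decision tree as code."""
    if subtasks_depend_on_each_other:
        return "pipeline"       # sequential dependencies -> sequential execution
    if all_subtasks_known_upfront:
        return "parallel"       # known, independent tasks -> run them all at once
    return "hierarchical"       # unknown scope -> a manager plans dynamically
```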
In practice, most systems use a combination. You might have a hierarchical manager that delegates to parallel worker groups, where each worker runs a short pipeline. That is fine. The framework still applies -- you are just applying it at different levels of the system.
Advanced Pattern: The Blackboard
There is a fourth pattern worth knowing, even though it is less common: the Blackboard architecture.
The idea: All agents share a common data structure (the "blackboard"). Any agent can read from and write to the blackboard. Agents are triggered when data they care about appears.
Blackboard (shared state)
├── Agent A watches for: raw_documents → writes: extracted_entities
├── Agent B watches for: extracted_entities → writes: relationships
├── Agent C watches for: raw_documents → writes: sentiment_scores
└── Agent D watches for: relationships + sentiment_scores → writes: final_analysis
The blackboard decouples agents from each other. Agent A does not know Agent B exists. It just writes entities to the blackboard. Agent B does not know where entities come from. It just reads them and produces relationships.
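A single-threaded sketch of that decoupling. The agent names and keys are illustrative, and a real blackboard would need locking and concurrency control (which is precisely where the pattern gets dangerous). Each agent declares what it watches and what it writes; the blackboard triggers any agent whose inputs are present and whose output is not.

```python
from typing import Callable

class Blackboard:
    def __init__(self):
        self.state: dict[str, object] = {}
        self.agents: list[tuple[set[str], str, Callable]] = []

    def register(self, watches: set[str], writes: str, fn: Callable) -> None:
        # An agent = the keys it watches, the key it writes, and its logic.
        self.agents.append((watches, writes, fn))

    def post(self, key: str, value: object) -> None:
        self.state[key] = value
        self._run_ready()

    def _run_ready(self) -> None:
        # Keep triggering agents whose watched keys are all present and
        # whose output key has not been produced yet.
        progressed = True
        while progressed:
            progressed = False
            for watches, writes, fn in self.agents:
                if writes not in self.state and watches <= self.state.keys():
                    self.state[writes] = fn(self.state)
                    progressed = True

bb = Blackboard()
bb.register({"raw_documents"}, "extracted_entities", lambda s: ["Acme"])
bb.register({"extracted_entities"}, "relationships",
            lambda s: [("Acme", "supplier")])
bb.post("raw_documents", ["doc1.txt"])
```

Posting one raw document cascades automatically: entities appear, which triggers the relationship agent, with neither agent knowing the other exists.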
When to use it: When you have a complex system where agents need to share intermediate state, but you do not want to hardcode the communication paths. It is particularly powerful when agents may be added or removed dynamically, or when the same intermediate result is consumed by multiple downstream agents.
**The catch:** Shared mutable state is responsible for an outsized share of the nastiest bugs in software history. The blackboard pattern trades the simplicity of pipeline for the flexibility of shared state, and you pay for that flexibility with concurrency bugs, race conditions, and debugging sessions that make you question your career choices.
Use it when you need it. Respect it when you do.
Advanced: Orchestrator + Blackboard
For production systems at scale, the most robust pattern I have seen combines hierarchical orchestration with a blackboard for state management.
The orchestrator (manager agent) handles planning and delegation. The blackboard handles state. Worker agents read from and write to the blackboard, and the orchestrator monitors the blackboard to decide what to do next.
This gives you the adaptability of hierarchical, the decoupled communication of blackboard, and a single point of control for observability and debugging. It is more complex to build, but for systems with more than five or six agents, it is dramatically easier to operate.
Advanced: Dynamic Spawning
One more pattern for the toolbox: dynamic agent spawning. Instead of pre-defining your pool of agents, the orchestrator creates agents on demand based on the task.
Orchestrator receives: "Analyze this codebase"
→ Detects: Python, TypeScript, Rust, SQL
→ Spawns: PythonAnalyzer, TypeScriptAnalyzer, RustAnalyzer, SQLAnalyzer
→ Each runs in parallel
→ Orchestrator aggregates results
If the codebase only had Python and SQL, only two agents would be spawned. The system scales to the problem automatically.
This pattern pairs naturally with parallel execution and is increasingly common in production systems where the input is variable and unpredictable.
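The detection-and-spawning step above can be sketched as a simple grouping function. The extension map and the idea of one analyzer slice per language are illustrative assumptions:

```python
# Map file extensions to the analyzer each file needs (illustrative).
EXT_TO_LANG = {".py": "Python", ".ts": "TypeScript", ".rs": "Rust", ".sql": "SQL"}

def spawn_analyzers(files: list[str]) -> dict[str, list[str]]:
    """Group files by detected language: one analyzer slice per language found."""
    slices: dict[str, list[str]] = {}
    for f in files:
        for ext, lang in EXT_TO_LANG.items():
            if f.endswith(ext):
                slices.setdefault(lang, []).append(f)
    return slices

# Only Python and SQL present -> only two analyzer slices get created.
slices = spawn_analyzers(["main.py", "schema.sql", "util.py"])
```

Each slice would then be handed to a parallel worker, which is why this pattern composes so naturally with fan-out/fan-in.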
Choosing in Practice
Here is what I tell my students: start with pipeline. Not because it is always the best choice, but because it forces you to think sequentially about your problem, which reveals the actual data dependencies. Once you understand those dependencies, you can make an informed decision about where parallel execution is safe and where hierarchical delegation adds value.
The worst agent architectures I have seen are the ones where someone started with the most complex pattern because it sounded impressive. Start simple. Let the problem tell you when you need more complexity.
And if you want to see these patterns in action -- not in a blog post, but in an interactive visualization where you can watch agents communicate, fail, recover, and produce results -- we built an Orchestration Pattern Visualizer as part of our curriculum. It lets you toggle between pipeline, parallel, and hierarchical execution of the same task and see exactly how the data flows, where the bottlenecks are, and why the architecture choice matters.
Because reading about architecture is useful. Watching it run is better. Building it yourself is where the learning actually happens.
This is Part 5 of our series on AI agent engineering. Next up: how we track AI costs per student and why that boring infrastructure might be the most important thing we built.