Environment-in-the-Loop

How camel-kit uses the execution environment as a dynamic participant in code generation

Code migration tools typically follow a linear path: analyze source code, generate target code, hope it works. When it doesn’t — a dependency that won’t resolve, a Docker service that won’t start, a component that doesn’t exist for the target runtime — the developer is left debugging alone. The AI agent finished its job and moved on.

Camel-Kit takes a different approach. The execution environment is not an afterthought — it’s a first-class participant in the code generation pipeline. Inspired by the Environment-in-the-Loop paradigm (Li et al., ReCode ‘26), camel-kit creates a closed feedback loop where environment signals actively drive code refinement.

The Core Insight

“Without automated environment interaction, the automation of code migration is only half complete.” — Li et al., ReCode ‘26

Traditional code generation treats the environment as static: generate code based on specifications, then verify at the end. This approach has three problems:

  1. Late discovery of failures — dependency conflicts, missing runtime extensions, and service availability issues are only found after significant code has been generated
  2. No automatic recovery — when the environment rejects the generated code, the AI can’t fix it without starting over
  3. Disconnected testing — test generation happens independently of test execution, so tests are never iteratively refined based on actual runtime behavior

Camel-Kit solves all three by embedding environment interaction at every stage of the pipeline.

How It Works

The pipeline creates a continuous feedback loop between three concerns: code generation, environment verification, and test validation.

Before Any Code Is Generated

The first step of /camel-execute is an environment probe — a lightweight feasibility check that runs before any implementer subagent is dispatched.

The probe generates a throwaway skeleton in a temporary directory:

  • pom.xml with all planned dependencies
  • docker-compose.yaml with required services
  • An empty route (just enough to verify the runtime boots)
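
As a rough illustration, the "empty route" part of that skeleton might be no more than a single no-op Java route; the class and endpoint below are illustrative, not camel-kit's actual output:

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical probe skeleton: one no-op route whose only job is to prove
// that the runtime boots with the planned dependencies on the classpath.
public class ProbeRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("timer:probe?repeatCount=1")
            .log("probe: runtime started");
    }
}
```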

Then it runs three checks:

| Check | What It Validates | Command |
| --- | --- | --- |
| Dependency resolution | All Maven artifacts exist and resolve | ./mvnw dependency:resolve |
| Docker services | Required databases, brokers, etc. can start | docker compose up -d |
| Runtime startup | The framework itself boots | Runtime-specific start command |
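
The first two checks are plain shell invocations run from the skeleton directory. A minimal sketch, assuming a Unix shell and a Maven wrapper in that directory (the class and its bounded-wait policy are illustrative; the runtime-startup check is omitted here because its command is runtime-specific):

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical probe runner for the dependency and Docker checks.
public class EnvironmentProbe {

    static boolean run(List<String> command) throws Exception {
        Process p = new ProcessBuilder(command).inheritIO().start();
        // Bound each check so a hung download or container does not stall the probe.
        if (!p.waitFor(5, TimeUnit.MINUTES)) {
            p.destroyForcibly();
            return false;
        }
        return p.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        boolean depsOk = run(List.of("./mvnw", "dependency:resolve"));
        boolean servicesOk = run(List.of("docker", "compose", "up", "-d"));
        System.out.println("dependency resolution: " + (depsOk ? "ok" : "failed"));
        System.out.println("docker services: " + (servicesOk ? "ok" : "failed"));
    }
}
```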

If a check fails, the probe classifies the error:

  • Mechanical failure (wrong artifact name, port conflict) — auto-fix and re-probe
  • Architectural failure (component doesn’t exist for this runtime) — trigger automatic re-planning

The skeleton is deleted after the probe completes. The real implementation generates proper project files.

After Code Is Generated

The verification loop (/camel-verify) runs Citrus integration tests to validate the generated code against real infrastructure.

Three phases:

| Phase | What Happens |
| --- | --- |
| Build | Compile the project (./mvnw compile). Classify and fix build errors. Skipped for JBang. |
| Test | Run Citrus YAML tests via camel test run. Tests are self-contained: Testcontainers start services, the Camel integration launches within the test, send/receive actions validate behavior. |
| Report | Structured summary of phases, fixes applied, and issues found. |

Each phase retries up to 15 times. On each iteration, errors are classified and routed to the appropriate fix:

| Fix Target | When Used |
| --- | --- |
| Self-repair | Missing dependency, Docker config issue: fix directly |
| camel-implement | Route logic error: re-generate from the design spec |
| camel-validate | Wrong component options: re-verify against the MCP catalog |
| camel-test | Test itself is wrong: re-generate the test from the design spec |
| re-plan | Persistent architectural failure: modify the design and re-implement |
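
Schematically, one phase of that loop looks like the sketch below. Only the 15-iteration cap and the fix targets come from the description above; the interfaces and the keyword-based classifier are stand-ins for what the agent actually does:

```java
import java.util.function.Supplier;

// Illustrative retry-and-route loop for a single verification phase.
public class VerifyLoop {

    enum FixTarget { SELF_REPAIR, CAMEL_IMPLEMENT, CAMEL_VALIDATE, CAMEL_TEST, RE_PLAN }

    record PhaseResult(boolean success, String error) {}

    private static final int MAX_ITERATIONS = 15;

    boolean runPhase(Supplier<PhaseResult> phase) {
        for (int attempt = 1; attempt <= MAX_ITERATIONS; attempt++) {
            PhaseResult result = phase.get();
            if (result.success()) {
                return true;
            }
            // Classify the failure and hand it to the matching fix target.
            applyFix(classify(result.error()), result.error());
        }
        return false; // iterations exhausted: escalate to the user
    }

    FixTarget classify(String error) {
        if (error.contains("ClassNotFoundException")) return FixTarget.SELF_REPAIR;
        if (error.contains("ResolveEndpointFailedException")) return FixTarget.CAMEL_VALIDATE;
        if (error.contains("FailedToCreateRouteException")) return FixTarget.CAMEL_IMPLEMENT;
        return FixTarget.CAMEL_TEST;
    }

    void applyFix(FixTarget target, String error) {
        // In camel-kit this dispatches to a subagent; here it is only logged.
        System.out.printf("routing '%s' to %s%n", error, target);
    }
}
```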

When the Approach Is Wrong

Sometimes the problem isn’t in the code — it’s in the plan. A component that works in isolation might conflict with another, or a runtime extension might not exist for the chosen platform.

When fix attempts fail repeatedly, camel-kit automatically re-plans:

  1. Identify the scope — which design document sections need to change
  2. Find alternatives via MCP — query the catalog for components that fulfill the same role
  3. Modify the design — update only the affected sections, preserving everything else
  4. Re-implement and re-verify — generate new code and run tests again

The re-plan loop runs up to 3 rounds. If the same failure class persists after a round, it short-circuits immediately rather than trying the same approach again. After 3 rounds, it escalates to the user with a full report of what was tried.
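
A minimal sketch of that control flow, assuming a caller that can report the failure class seen after each round (names and types are hypothetical):

```java
// Hypothetical re-plan control flow: at most three rounds, short-circuiting
// as soon as a round ends with the same failure class it started with.
public class ReplanLoop {

    record Outcome(boolean success, String failureClass) {}

    interface Round { Outcome modifyDesignAndReverify(String failureClass); }

    static final int MAX_ROUNDS = 3;

    static boolean replan(String initialFailureClass, Round round) {
        String failureClass = initialFailureClass;
        for (int i = 1; i <= MAX_ROUNDS; i++) {
            Outcome outcome = round.modifyDesignAndReverify(failureClass);
            if (outcome.success()) {
                return true;
            }
            if (outcome.failureClass().equals(failureClass)) {
                // Same failure class after a full round: stop repeating the
                // same approach and escalate now.
                break;
            }
            failureClass = outcome.failureClass();
        }
        return false; // escalate to the user with a report of what was tried
    }
}
```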

Two-tier promotion model:

The system decides when to re-plan based on how experienced developers think about errors:

  • Tier 1 (immediate): After one failed fix, query the MCP catalog. If the catalog confirms the component doesn’t exist for this runtime — re-plan immediately. A senior developer would check the docs first, not try 15 random fixes.

  • Tier 2 (progressive): After three failed fixes on the same error class — the approach is wrong, not just the code. Re-plan.

The Closed Loop

Design → (you approve) → Plan → Probe → Implement → Verify, with a re-plan path back to the design whenever verification hits an architectural failure.

One approval gate. You approve the design (the architecture, the components, the integration patterns). After that, planning, probing, implementation, and verification flow continuously. If the environment discovers a problem, the system fixes it — either at the code level (self-repair, re-implement) or at the design level (re-plan).

This means:

  • No wasted implementation work — the probe catches infeasible plans before code is generated
  • No manual test debugging — test failures route automatically to the right fix target
  • No silent failures — every error is classified, every fix attempt is tracked, every escalation includes context
  • No stale tests — when tests are wrong, they’re re-generated from the design spec, not manually patched

Error Taxonomy

Every error discovered during probing or verification is classified and routed. The classification determines what gets fixed and how.

Mechanical vs Architectural

The probe and verify loop use an “assume mechanical, promote on failure” rule:

Mechanical errors (fixable without changing the plan):

  • Wrong Maven artifact name
  • Docker port conflict
  • Missing transitive dependency
  • Docker image tag not found
  • Incorrect property key

Action: auto-fix and re-probe/re-verify.

Architectural errors (the plan itself is infeasible):

  • Component doesn't exist for target runtime
  • Irreconcilable dependency conflict
  • Component removed in target version
  • Private/licensed Docker image
  • Incompatible component combination

Action: trigger the re-plan loop (max 3 rounds).

The key insight: MCP is the oracle that distinguishes mechanical from architectural. When a dependency fails, the probe queries camel_catalog_component — if MCP confirms the component doesn’t exist for this runtime/version, it’s architectural. If MCP returns a valid artifact with a different name, it’s mechanical.
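
In code, that oracle check might look like the following sketch. Only the camel_catalog_component tool name comes from the text; the client interface and method names are invented for illustration:

```java
import java.util.Optional;

// Hypothetical use of the MCP catalog as the mechanical-vs-architectural oracle.
public class DependencyFailureClassifier {

    enum ErrorKind { MECHANICAL, ARCHITECTURAL }

    interface CatalogClient {
        // Imagined wrapper around a camel_catalog_component call.
        Optional<String> lookupArtifact(String component, String runtime, String version);
    }

    ErrorKind classify(String failedComponent, String runtime, String version, CatalogClient catalog) {
        Optional<String> artifact = catalog.lookupArtifact(failedComponent, runtime, version);
        if (artifact.isEmpty()) {
            // Catalog confirms the component does not exist for this
            // runtime/version: the plan itself is infeasible.
            return ErrorKind.ARCHITECTURAL;
        }
        // Catalog returned a valid artifact (possibly under a different name):
        // fix the coordinate and re-probe.
        return ErrorKind.MECHANICAL;
    }
}
```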

How Errors Promote

Not every error reveals its nature immediately. The system uses a two-tier promotion model that mirrors how experienced developers think:

Tier 1: Immediate Promotion (0–1 fix attempts)

After one failed fix, query the MCP catalog. If it confirms the failure is structural (component missing, extension unavailable), skip further fix attempts and re-plan immediately. Like a senior developer who checks the docs first instead of trying 15 random fixes.

Tier 2: Progressive Promotion (3 failed fix attempts)

After three failed fixes on the same error class, each with a different strategy, the approach is wrong, not just the code. Promote to re-plan. Like a developer who tries fixing the code a few times before questioning the spec.
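
Reduced to a predicate, the promotion decision could be sketched as below; the thresholds come from the two tiers above, and everything else is illustrative:

```java
// Illustrative decision logic for the two-tier promotion model.
public class PromotionPolicy {

    boolean shouldReplan(int failedFixesOnSameClass, boolean catalogConfirmsStructural) {
        // Tier 1: after a single failed fix, trust the MCP catalog's verdict.
        if (failedFixesOnSameClass >= 1 && catalogConfirmsStructural) {
            return true;
        }
        // Tier 2: three failed fixes on the same error class mean the
        // approach is wrong, not just the code.
        return failedFixesOnSameClass >= 3;
    }
}
```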

Fix Routing

Every classified error routes to a specific fix target. The taxonomy covers errors from both the probe (pre-implementation) and the verification loop (post-implementation):

| Error Category | Examples | Fix Target |
| --- | --- | --- |
| Missing dependency | ClassNotFoundException, unresolved artifact | Self-repair (add to pom.xml) |
| Version conflict | NoSuchMethodError, BOM misalignment | Self-repair (align versions) |
| Wrong component options | ResolveEndpointFailedException | camel-validate (re-verify via MCP) |
| Route logic error | FailedToCreateRouteException, wrong output | camel-implement (re-generate route) |
| Test is wrong | Assertion expects wrong value, test parse error | camel-test (re-generate test) |
| Docker/service issue | Connection refused, container won't start | Self-repair (restart, fix config) |
| Architectural | Component doesn't exist, irreconcilable conflict | Re-plan (modify design, max 3 rounds) |
| Unresolvable | Build tool error, Quarkus augmentation failure | Escalate to user |

What Makes This Different

Most AI coding tools follow a generate-and-hope model: produce code, let the developer figure out if it works. Some add a build check at the end. Camel-Kit goes further:

| Aspect | Generate-and-Hope | Camel-Kit EITL |
| --- | --- | --- |
| When environment is checked | After all code is generated | Before (probe) and after (verify) |
| What happens on failure | User debugs | Auto-fix, re-generate, or re-plan |
| Test strategy | Generate tests, never run them | Generate tests, run them, fix them |
| Feedback to design | None; design is immutable | Re-plan loop modifies design documents |
| Service management | Manual Docker Compose | Testcontainers in self-contained tests |