TL;DR: I built a coding agent for Solo IDE that generates frontend code. The default output looked like every other AI-generated UI — Inter font, purple gradient, white background. A 20-line <frontend_aesthetics> prompt block fixed it. Here's how the entire prompt system works, and the specific techniques that made the biggest difference.
The "AI Slop" Problem
If you've asked Claude, GPT, or any LLM to build you a React component, you've seen it. The same look every time: Inter or system fonts, a purple-to-blue gradient, white cards with rounded corners, generic spacing. It's technically correct but feels lifeless. Developers started calling it "AI slop" — output that's statistically average because that's what the model was trained on.
When I started building Solo IDE's coding agent, every generated page looked the same. Different prompt, same aesthetic. It didn't matter if I asked for a dashboard, a landing page, or a settings screen — they all converged on the same visual DNA.
The fix wasn't about better components or design tokens. It was about prompt engineering.
How Solo's Prompt System Works
Solo's coding agent doesn't use a single monolithic system prompt. Instead, it composes prompts from reusable XML-tagged blocks:
```ts
// server/src/agents/coding/prompts/system.ts
export function buildSystemPrompt(opts: SystemPromptOptions): string {
  return `<role>
You are an AI coding agent running inside Solo IDE...
</role>
<environment>
- Working directory: ${opts.workspaceRoot}
- Model: ${opts.model}
</environment>
<frontend_aesthetics>
...the magic block...
</frontend_aesthetics>
<icon_policy>
...icon enforcement...
</icon_policy>
<performance_best_practices>
...Vercel-sourced rules...
</performance_best_practices>`;
}
```
Each block is an independent concern. Change <icon_policy> once and it applies everywhere — every agent variant, every task type. This composition pattern came from porting Orchids (the Electron predecessor), which had ~10,700 lines of prompt engineering across 10 files.
The architecture is simple, but the blocks themselves are where things get interesting.
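The pattern can be sketched in a few lines. This is a hypothetical simplification, not Solo's actual implementation: `PromptBlock`, `renderBlock`, and `composeSystemPrompt` are illustrative names, and the block bodies are placeholders.

```typescript
// Hypothetical sketch of the composition pattern: each policy is a
// named, self-contained XML-tagged block, and the system prompt is
// just the blocks rendered and joined in order.
type PromptBlock = { tag: string; body: string };

function renderBlock(b: PromptBlock): string {
  return `<${b.tag}>\n${b.body}\n</${b.tag}>`;
}

function composeSystemPrompt(blocks: PromptBlock[]): string {
  return blocks.map(renderBlock).join("\n\n");
}

// Editing one block (say, icon_policy) never touches the others.
const prompt = composeSystemPrompt([
  { tag: "role", body: "You are an AI coding agent running inside Solo IDE..." },
  { tag: "icon_policy", body: "Use Phosphor Icons as the ONLY icon library." },
]);
```

Because each block is an opaque unit to the composer, swapping in a revised `icon_policy` is a one-line change that propagates to every agent variant.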
The Block That Changed Everything
Here's the complete <frontend_aesthetics> block — the single highest-impact piece of prompt engineering in the entire codebase:
```
<frontend_aesthetics>
You tend to converge toward generic, "on distribution" outputs.
In frontend design, this creates what users call the "AI slop"
aesthetic. Avoid this: make creative, distinctive frontends
that surprise and delight.
Typography: Choose fonts that are beautiful, unique, and
interesting. Avoid generic fonts like Arial and Inter.
Color & Theme: Commit to a cohesive aesthetic. Dominant colors
with sharp accents outperform timid, evenly-distributed palettes.
Motion: Focus on high-impact moments: one well-orchestrated
page load with staggered reveals creates more delight than
scattered micro-interactions.
Backgrounds: Create atmosphere and depth rather than defaulting
to solid colors.
Avoid: Overused font families (Inter, Roboto), cliched color
schemes (purple gradients on white), predictable layouts.
You still tend to converge on common choices (Space Grotesk,
for example) across generations. Avoid this.
</frontend_aesthetics>
```
Why does this work? Three techniques that I think are transferable to any AI coding tool:
1. Name the failure mode in the model's vocabulary
The opening line — "You tend to converge toward generic, 'on distribution' outputs" — uses a term from machine learning. Language models understand "on distribution" at a conceptual level. It's telling the model: I know you're going to default to the statistical average, and I'm explicitly asking you not to.
Most prompts say "be creative." This one says "I know how you fail to be creative, and here's the technical term for it." The model can reason about its own tendencies when you frame them in terms it was trained on.
2. Pair every prohibition with an escape hatch
Saying "don't use Inter" isn't enough. The model will just pick the next most common font (Roboto, system-ui). The block works because every "don't" has a corresponding "do":
- Don't use generic fonts → "fonts that are beautiful, unique, and interesting"
- Don't use evenly-distributed palettes → "dominant colors with sharp accents"
- Don't scatter micro-interactions → "one well-orchestrated page load with staggered reveals"
The model needs to know where to go, not just where not to go.
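One way to enforce this discipline is to store each prohibition together with its escape hatch, so a "don't" can never ship without a "do". A hypothetical sketch (the `StyleRule` type and these rule strings are illustrative, not Solo's actual data):

```typescript
// Hypothetical helper: every prohibition carries its escape hatch,
// and the prompt text is generated from the pairs.
type StyleRule = { avoid: string; prefer: string };

const rules: StyleRule[] = [
  { avoid: "generic fonts (Inter, Roboto)", prefer: "fonts that are beautiful, unique, and interesting" },
  { avoid: "timid, evenly-distributed palettes", prefer: "dominant colors with sharp accents" },
  { avoid: "scattered micro-interactions", prefer: "one well-orchestrated page load with staggered reveals" },
];

function renderRules(rs: StyleRule[]): string {
  // Each line pairs the failure mode with the direction to go instead.
  return rs.map((r) => `Avoid ${r.avoid}; instead, prefer ${r.prefer}.`).join("\n");
}
```

The type system then does the nagging for you: a rule without a `prefer` field won't compile.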
3. Call out convergence explicitly
The last line is sneaky: "You still tend to converge on common choices (Space Grotesk, for example) across generations. Avoid this."
After I added the rest of the block, the model stopped using Inter — and started using Space Grotesk for everything. This line is a second-order correction. It tells the model: even your "creative" choices will converge if you're not careful.
Beyond Aesthetics: Four More Blocks
The frontend aesthetics block handles visual quality. But there were other gaps in the generated code that needed their own policies.
Icon Enforcement
Left to its own devices, the model picks a random icon library every time — sometimes Lucide, sometimes Heroicons, sometimes raw emoji characters. I standardized on Phosphor Icons and made the system prompt enforce it:
```
<icon_policy>
CRITICAL: Use Phosphor Icons (@phosphor-icons/react) as the
ONLY icon library.
- NEVER use lucide-react, heroicons, react-icons
- NEVER use emoji characters as icons
Available icons: ${getPhosphorIconList()}
</icon_policy>
```
The getPhosphorIconList() call injects all 1,512 real Phosphor icon names into the prompt. This costs ~4K tokens but it means the model picks from actual existing icons instead of hallucinating names like <SettingsGear /> that don't exist in any library.
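A minimal version of that helper could simply join the generated name array into one string. This is a sketch under assumptions: the article doesn't show `getPhosphorIconList()`'s actual body, and the array here is truncated to three entries for illustration.

```typescript
// Minimal sketch of what getPhosphorIconList() could look like.
// The real generated array has 1,512 entries; three shown here.
const PHOSPHOR_ICON_NAMES = ["Acorn", "AddressBook", "AirTrafficControl"] as const;

function getPhosphorIconList(): string {
  // A comma-separated list keeps the injection compact; ~1,500 short
  // PascalCase names works out to roughly 4K tokens.
  return PHOSPHOR_ICON_NAMES.join(", ");
}
```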
I extracted the icon names from the Phosphor font metadata (selection.json) — a one-time script that reads the IcoMoon font manifest and converts kebab-case names to PascalCase React components:
```ts
// phosphor-icons-reference.ts (auto-generated, 1,512 entries)
export const PHOSPHOR_ICON_NAMES = [
  "Acorn", "AddressBook", "AirTrafficControl",
  "Airplane", "AirplaneTilt", "Airplay", ...
] as const;
```
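The generation script could look something like the sketch below. This assumes IcoMoon's usual `selection.json` shape (`{ icons: [{ properties: { name: "air-traffic-control" } }] }`); the function names are mine, not the original script's.

```typescript
// One-time generation script (sketch): read the IcoMoon font manifest
// and emit a TypeScript file of PascalCase icon names.
import { readFileSync, writeFileSync } from "node:fs";

// "air-traffic-control" -> "AirTrafficControl"
function toPascalCase(kebab: string): string {
  return kebab
    .split("-")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

function generateIconReference(manifestPath: string, outPath: string): void {
  const manifest = JSON.parse(readFileSync(manifestPath, "utf8"));
  const names: string[] = manifest.icons
    .map((icon: { properties: { name: string } }) => toPascalCase(icon.properties.name))
    .sort();
  writeFileSync(
    outPath,
    `// auto-generated, do not edit\nexport const PHOSPHOR_ICON_NAMES = ${JSON.stringify(names, null, 2)} as const;\n`
  );
}
```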
Emoji Prohibition
This one surprised me. I'd get generated code with status badges using raw emoji — a green checkmark emoji for success, a red X for failure. Looks fine in a code review, looks terrible in production. A dedicated block fixed it:
```
<no_emoji_policy>
STRICT: Never use emoji characters anywhere in generated code.
- No emoji in JSX text, button labels, placeholders
- No emoji in error messages or status indicators
- No emoji in comments or console.log statements
Use Phosphor Icons for all visual indicators instead.
</no_emoji_policy>
```
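Prompt policies can still leak, so a cheap post-generation check is a useful backstop. A hypothetical guardrail (not part of Solo as described) using JavaScript's Unicode property escapes:

```typescript
// Hypothetical lint backstop for the no-emoji policy: flag any
// pictographic character that slips into generated source.
const EMOJI_PATTERN = /\p{Extended_Pictographic}/u;

function containsEmoji(source: string): boolean {
  return EMOJI_PATTERN.test(source);
}
```

Note that `Extended_Pictographic` also matches a few legacy symbols like `©`, so in practice you might whitelist those or scan only string literals.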
Performance Guardrails
I embedded Vercel's published React performance rules directly into the system prompt. The CRITICAL-tier rules catch the most common mistakes:
```
<performance_best_practices>
CRITICAL - Eliminating Waterfalls:
- Use Promise.all() for independent async operations
- Never sequential awaits for independent data
CRITICAL - Bundle Size:
- Import directly from subpaths, not barrel files
- Use dynamic(() => import('./Heavy'), { ssr: false })
  for heavy components
</performance_best_practices>
```
Without this, the model writes sequential await calls for data that could be fetched in parallel — every single time. The prompt doesn't need to explain why waterfalls are bad. It just needs to say "use Promise.all() for independent operations" and the model follows through.
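The waterfall in question looks like this (with placeholder fetchers; `fetchUser`/`fetchPosts` are illustrative, not Solo APIs):

```typescript
// Two independent data sources, stubbed for illustration.
async function fetchUser(): Promise<string> { return "user"; }
async function fetchPosts(): Promise<string[]> { return ["post"]; }

// What the model writes by default: a waterfall. If each request
// takes 300ms, the total is ~600ms because the second await can't
// start until the first resolves.
async function waterfall() {
  const user = await fetchUser();
  const posts = await fetchPosts();
  return { user, posts };
}

// What the rule asks for: both requests start together, so the
// total is the slower of the two (~300ms), not the sum.
async function parallel() {
  const [user, posts] = await Promise.all([fetchUser(), fetchPosts()]);
  return { user, posts };
}
```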
Accessibility Baseline
```
<accessibility_basics>
- Interactive elements must be keyboard accessible
- All images must have meaningful alt text
- Respect prefers-reduced-motion
- Use semantic HTML: nav, main, section, article
</accessibility_basics>
```
Simple, but without it, the model defaults to <div onClick> instead of <button>, skips alt text, and ignores reduced-motion preferences.
What I'd Do Differently
If I were starting over, I'd write the prompt blocks before building any of the agent infrastructure. The prompt system is where 80% of the output quality comes from. I spent weeks building streaming UI, tool approval dialogs, and session persistence — all important — but the prompt blocks had a bigger impact on user experience than any of those features.
I'd also version the prompt blocks. Right now they're string literals in TypeScript files. If I had them in a more structured format (separate Markdown files, a prompt registry), I could A/B test changes and track which block revisions correlate with better output quality.
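A versioned registry might look like the sketch below. This is entirely hypothetical, not something Solo implements: the `PromptRegistry` class and its API are my illustration of the idea.

```typescript
// Hypothetical prompt registry: each block is stored with a version
// tag, so revisions can be A/B tested and correlated with output quality.
type BlockVersion = { version: string; body: string };

class PromptRegistry {
  private blocks = new Map<string, BlockVersion[]>();

  register(tag: string, version: string, body: string): void {
    const versions = this.blocks.get(tag) ?? [];
    versions.push({ version, body });
    this.blocks.set(tag, versions);
  }

  // Compose a prompt from a specific revision of each block,
  // e.g. { icon_policy: "v2" } for one arm of an A/B split.
  compose(selection: Record<string, string>): string {
    return Object.entries(selection)
      .map(([tag, version]) => {
        const match = this.blocks.get(tag)?.find((v) => v.version === version);
        if (!match) throw new Error(`no ${tag}@${version} registered`);
        return `<${tag}>\n${match.body}\n</${tag}>`;
      })
      .join("\n\n");
  }
}
```

The selection object doubles as an experiment record: log it alongside each generation and you can attribute quality changes to specific block revisions.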
Key Takeaways
- Name the failure mode using the model's own vocabulary. "On distribution" works better than "be creative" because the model can reason about its own statistical tendencies.
- Every prohibition needs an escape hatch. Don't just say "don't use Inter"; tell the model where to go instead. Otherwise it picks the next most common option.
- Inject real data into prompts. The Phosphor icon list costs tokens but prevents hallucinated icon names. If your agent needs to pick from a fixed set, put the full set in the prompt.
- Separate concerns into XML blocks. One block per policy. Composition over monolith. Change <icon_policy> without touching <performance_best_practices>.
- Watch for second-order convergence. The model's "creative" choices converge too. After it stops using Inter, it starts using Space Grotesk for everything. Call this out explicitly.
- Embed rules from authoritative sources verbatim. The Vercel performance rules work better than my own paraphrased versions because the model has likely seen them in training data and can apply them more reliably.
The prompt system described here is part of Solo IDE, a Tauri 2 desktop IDE I'm building with a Rust backend, Node.js agent sidecar, and React 19 frontend. The full prompt source is ~250 lines across 4 agent files — small enough to read in one sitting.