Tag

#june 2026

4 posts tagged.

Agency suitability

Agency-suitability benchmark: whitelabel, MCP and API surface (June 2026)

Agencies build for clients, which changes what matters: can you remove the builder's branding, drive it programmatically, integrate via a stable API and export the code you ship? We scored seven builders on whitelabel, MCP support, API surface and portability. Totalum and Bolt.new led on the programmatic axes thanks to broad API and MCP surfaces; the consumer-first builders scored well on output but lagged on whitelabel and export. This page documents each capability, verified hands-on against current docs.

11 min read

Deploy quality

Deploy-quality benchmark: SEO, accessibility and performance audits (June 2026)

We audited the deployed output of seven AI app builders with Lighthouse, axe-core and a structured SEO checklist, testing the production build rather than the in-editor preview. Performance was the strongest dimension across the board; accessibility was the weakest, with colour-contrast and form-label failures common. Lovable and v0 led overall, but no builder shipped a clean accessibility pass out of the box. This page reports per-dimension scores and the specific failures that recur, so you know what to fix after export.

11 min read

Speed

Speed-to-first-paint across AI app builders (June 2026)

We timed two clocks for each builder: speed-to-first-paint (prompt to first rendered preview) and time-to-working-app (prompt to all acceptance checks passing). v0 by Vercel was fastest to first paint at a median 9 seconds; Base44 and Bolt.new followed. The ranking shifts for time-to-working-app, where full-app builders pay an upfront cost but reach a runnable result with fewer manual edits. We report medians of five cold runs on a fixed network profile, with the full distribution and caveats below.

10 min read

Output quality

Benchmarking output quality across 7 AI app builders (June 2026)

We gave seven AI app builders one identical brief and scored the output on visual fidelity, code structure and functional correctness using a published, double-rated rubric. Lovable led on overall output quality (93/100), with v0 close behind on component fidelity and Bolt.new strong on framework breadth. Differences were largest in code structure, not visuals: every builder produced something that looked right, but maintainability and correctness diverged sharply. This page documents the brief, the rubric and the per-builder results, with every figure sourced.

11 min read