Field Experiment + Research Report

April 2026 · 22 min read

Four AIs, One Desk: How the AI Shopping Layer Is Forming Above the Aggregators

The same shopping prompt run through ChatGPT, Gemini, Perplexity, and Claude returned four largely non-overlapping shortlists, and four distinct paradigms for how the purchase journey now begins.

You want a new desk for your office. Fifteen years ago that sentence started at Google and then moved to Amazon. Today it can start anywhere — in a chat window, in an AI assistant embedded in your browser, in a search result that has quietly become a conversation. What happens next depends on which AI you asked, and the differences are larger than anyone is saying.

I ran the same prompt — "I am shopping for a new desk for my office. Any suggestions?" — through ChatGPT, Gemini, Perplexity, and Claude. Four assistants, one query, four entirely different shopping experiences. Different brands, different price anchors, different elicitation behavior, different commerce affordances. These are not four flavors of the same tool. They are four distinct paradigms for how the purchase journey now begins, and the choice of AI materially shapes what the consumer considers, what they trust, and what they eventually buy.

The Experiment

ChatGPT — The Storefront

Picking a desk is one of those decisions that quietly shapes your entire workday. The right one depends on a few key things: space, ergonomics, storage, and aesthetic. Here's a curated set of strong options.

Best overall (most people): Eureka Ergonomic Ark Executive Standing Desk — $1,799. Sweet spot for high-end styling without going extreme; built-in storage and power features.
Best budget-friendly: SEDETA 55" Electric Standing Desk with Storage — $279.99. Affordable electric standing desk with drawers and integrated power.
Best mid-range modern: Eureka Ergonomic 55" Standing Desk with Drawers — $429. Clean modern design, wood finish, compact footprint.
Best luxury statement: iMovR Lander Executive Standing Desk (Solid Wood) — $8,195. Heirloom-quality executive desk with top-tier stability.

[Followed by comparison table and quick advice: "If you're on Zoom a lot → executive. If clean aesthetics → mid-range modern. If buy once, never replace → solid wood luxury."]

If you want, tell me room size, monitors, style — and I can narrow it down.

Paradigm: storefront. Commerce is the default output. Four specific products with prices and buy-ready framing land before you've said anything about your space, budget, or aesthetic. Elicitation is offered as optional refinement after the recommendation is anchored. Two of the four picks are Eureka Ergonomic, which suggests some brand-preference structure is operating in the ranking.

Gemini — The Catalog

Choosing the right desk depends on how you work. Here are top-rated options for 2026 by best use.

Best for movement & health: FlexiSpot E7 Standing Desk — $389.99 (4.8★, 5k+ reviews). Exceptional stability, quiet motor, height presets. FlexiSpot EW8 Standing Desk — $269.99 (4.6★, 1k+ reviews). Integrated drawer and three USB charging ports.
Best for large setup & storage: Fort Myers Reversible Multi-Functional Desk — $203.99. 360° swivel, L-shape/straight/compact configuration. Realspace Magellan 59" L-Shape Corner Desk — $399.99. Leatherette inlay, slide-out tray, four drawers. Ashley Roanhowe 68" Home Office Desk — $829.99. Hardwood solids, oak veneers, six dovetail drawers, USB-A/C.
Best for small spaces: IKEA Micke Desk — $99.99. Slimline, budget-friendly, cable management.

Summary: For healthiest workflow, FlexiSpot E7. For multi-tasking, Fort Myers. For high-end, Ashley Roanhowe.

Paradigm: catalog. Same commerce-native orientation as ChatGPT, but backed by Google's Shopping Graph. Merchant cards with ratings, review counts, and outbound links are the infrastructure showing through. FlexiSpot dominates the standing desk slot the way Eureka did on ChatGPT. Structure is nearly identical; brand stack is completely different.

Perplexity — The Research Layer

Your best desk depends on how you use the office. A few strong options:

Standing desk for everyday work: FlexiSpot E2. Sit-stand flexibility, good for long hours.
Compact sit-stand option: Branch Duo. Space-saving, quiet, minimalist.
L-shaped for more workspace: Sauder Palladia or Realspace Magellan. Better for multi-monitor and paperwork.
Executive-style: Sauder Heritage Hill or Hooker Furniture Brookhaven. Traditional, storage-heavy, polished look.

How to choose: Pick standing if ergonomics matter most. Pick L-shape if you want to spread out. Pick executive if appearance and storage matter more than movement. Measure your room first.

What's your budget and room size?

Follow-ups: Top desks head-to-head · 2026 buyer's guide · What size fits a small office · Best budget desks under $200 · Standing vs. sitting pros and cons

Paradigm: research layer. Brands are named, but without the product-card commerce treatment. Recommendations feel advisory rather than transactional. The "follow-ups" section treats shopping as an iterative research query rather than a purchase occasion. Closest to the classical search-then-compare pattern, translated into conversational form.

Claude — The Consultant

Happy to help you find the perfect desk! A few quick questions first:

Q: What's your budget range? → $700–$1,500
Q: What style are you going for? → Traditional/classic
Q: Any specific needs? → Multiple monitors, storage & drawers

Great choices. A traditional desk with storage and multi-monitor support in that range opens up solid options. [After web search:]

Bush Cabot L-Shaped Executive Desk (~$700–$900). Timeless look, full-sized hutch with glass doors, 4-port USB hub, double-pedestal storage, ample surface for multiple monitors.
Sauder Edge Water Executive Desk (~$700–$1,000). Roomy surface, T-lock drawer system, full-extension slides, flip-down keyboard panel.
GarveeHome Executive Desk (~$800–$1,100). 400-lb capacity, adjustable shelf, four drawers, full-extension file drawer, built-in charging station.

Things to keep in mind: surface depth of at least 30 inches for monitors. Confirm drawers support letter/legal/A4 for real filing. If running multiple heavy monitors, confirm 350+ lb capacity. Cable management — grommets and trays matter.

My top pick for your needs: the Bush Cabot L-Shaped — nails traditional aesthetic, ample storage, ideal for spreading out multiple monitors.

Paradigm: consultant. Claude refused to recommend anything until it had constraints. Budget, style, and needs came first as structured elicitation. Only after those constraints were set did it web-search and surface products — with shopping considerations (desk depth, file drawer sizing, weight capacity) rather than commerce cards. The inverse of ChatGPT's pattern: deliberation is mandatory, not optional.

What Just Happened

Four paradigms, not four flavors: storefront (commerce-first, elicitation optional), catalog (commerce-first, merchant-metadata-rich), research layer (advisory, iterative), consultant (elicitation-first, constraint-driven).

[Figure: The four behavior paradigms. Same prompt, four entirely different shopping experiences across ChatGPT, Gemini, Perplexity, and Claude.]

The most under-appreciated finding is not that the outputs look different. It's that the brand consideration sets barely overlap. ChatGPT's stack was Eureka-heavy. Gemini's stack was FlexiSpot-heavy. Perplexity mixed FlexiSpot and Branch with Sauder, Realspace, and Hooker Furniture. Claude surfaced Bush, Sauder, and GarveeHome. Same opening prompt, same consumer, four substantially different shortlists, and no brand appeared in more than two of them.
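To put a rough number on how little the shortlists overlap, here is a minimal sketch that computes pairwise Jaccard similarity over the brands named in each response. The brand sets are transcribed from the four responses above. Treating "Fort Myers" as a brand is my assumption, since Gemini gave it only as part of a product name, and the function names are illustrative. Claude's list was also produced after constraints were elicited, so the comparison is approximate rather than like-for-like.

```python
from itertools import combinations

# Brands named in each assistant's shortlist, transcribed from the responses above.
# Assumption: "Fort Myers" is counted as a brand, although the response only
# shows it as part of a product name.
shortlists = {
    "ChatGPT": {"Eureka Ergonomic", "SEDETA", "iMovR"},
    "Gemini": {"FlexiSpot", "Fort Myers", "Realspace", "Ashley", "IKEA"},
    "Perplexity": {"FlexiSpot", "Branch", "Sauder", "Realspace", "Hooker Furniture"},
    "Claude": {"Bush", "Sauder", "GarveeHome"},
}


def jaccard(a, b):
    """Brands common to both sets, as a share of all brands in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(lists):
    """Return {(assistant_a, assistant_b): (shared_brands, jaccard_score)}."""
    return {
        (x, y): (lists[x] & lists[y], jaccard(lists[x], lists[y]))
        for x, y in combinations(lists, 2)
    }


if __name__ == "__main__":
    for (x, y), (shared, score) in pairwise_overlap(shortlists).items():
        print(f"{x} vs {y}: {score:.2f} {sorted(shared)}")
```

Under these assumptions, only two of the six pairs share any brand: Gemini and Perplexity (FlexiSpot and Realspace, a score of 0.25) and Perplexity and Claude (Sauder, about 0.14). ChatGPT's shortlist shares no brand with any of the other three.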

This is not neutral infrastructure. Whatever algorithmic, training, partnership, or ranking forces produce these differences, the practical effect is that brand equity now has a platform-distribution problem that looks a lot like retail distribution fifty years ago — you have to be on the shelf in all four stores, except the shelves are opaque and change with every query.

What the Research Says

The behavior is no longer hypothetical. Adobe's March 2025 survey of more than 5,000 U.S. consumers found that 39% had already used generative AI for online shopping and 53% planned to do so that year; a follow-up in August 2025 showed 38% — a persisting behavior, not a spike. Salesforce's 2025 retail research reported that 17% of shoppers had used an AI assistant for product searches in the past year, and 30% had used one while shopping in physical stores. Bloomreach found that 61% of surveyed U.S. consumers had used an AI tool to help them shop online, and 54% said their search habits had become more conversational over the prior year.

Adoption is uneven across categories. A 2026 Vogue Business survey found that 69% of respondents used AI chatbots at least occasionally, but only 2% regularly used them for fashion and beauty shopping. That resistance points to something structural: AI is strongest where a purchase can be specified as constraints, trade-offs, and specs, and weakest where identity, aesthetics, and embodied fit dominate. A desk is mostly a constraint problem. A jacket is mostly not.

The research literature shows AI is already changing consideration sets, not just search speed. A 443-response comparative study found that products recommended by ChatGPT were more likely to be adopted than those from conventional recommenders, with trust-transfer effects strongest for lower- and medium-awareness brands. A 2026 Decision Support Systems study found that AI-generated reviews significantly increased purchase intention for search products — and had no effect for experience products.

The commerce rails are being built fast. OpenAI launched Instant Checkout and the Agentic Commerce Protocol in late 2025. Google built the Universal Commerce Protocol on top of a Shopping Graph that now holds more than 50 billion product listings, refreshed at more than 2 billion updates per hour. Etsy has integrated direct purchasing with OpenAI, Google, Microsoft, and Stripe. The marketplaces are not being displaced. They are becoming upstream suppliers to AI platforms.

Impulse, Migrating

For fifteen years, the dominant engine of online impulse buying has been aggregator recommenders — Amazon's "customers also bought," Etsy's algorithmic feed, eBay's personalized surfacing. Those systems are not getting worse. They are being moved downstream in the purchase journey. The impulse moment is migrating off the aggregator and onto the AI conversation.

The mechanism is structural, and it comes down to a lifetime-value asymmetry almost nobody is naming clearly.

Amazon's recommender cannot tell you to buy a desk from Wayfair. It is structurally incapable. Every recommendation is zero-sum within Amazon's catalog, because Amazon's LTV equation is bounded by Amazon's venue. When Amazon recommends the wrong product, the cost is absorbed in returns and the relationship survives because the consumer has no better alternative inside the walled garden. Same for Etsy, eBay, Walmart, Target. Aggregator recommenders optimize for venue retention, not user outcomes — because venue retention is the business model.

[Figure: Aggregator business model versus AI assistant business model. Aggregator LTV is bounded by the venue; AI assistant LTV spans every future decision a user brings to the assistant.]

An AI assistant's LTV equation is cross-commerce. If Claude or ChatGPT recommends the wrong desk, the consumer doesn't just stop trusting it for desks. They stop trusting it for laptops, insurance, the next home renovation, the mortgage refi question, the college decision. The trust surface spans every future purchase and every non-purchase decision the user might bring to the assistant. The payoff window is longer and the category surface is wider than any aggregator can ever have. Which means the AI company's incentive to get this desk right is, structurally, orders of magnitude larger than Amazon's incentive to get the same choice right.

This is how aggregator dominance historically breaks. Amazon broke book retail by caring about cross-book LTV more than any single bookstore could. Then it broke general retail by caring about cross-category LTV more than any single category retailer could. The AI layer runs the same move one level higher: caring about cross-commerce LTV more than any single aggregator can. And in the middle phase — where we are now — consumers may genuinely get better recommendations, because the AI's economic interest really does align with getting the choice right across a trust surface no aggregator can match.

The honest complication is that this is also how new aggregation forms. Consumers are not escaping the walled garden. They are trading Amazon's garden for OpenAI's or Google's — and the new garden is much larger, spans more categories, and is much less visible because it doesn't look like a storefront. Pew found that when a Google search page showed an AI summary, users clicked traditional result links only 8% of the time versus 15% without one. The visibility of the aggregation is collapsing as its scope expands.
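The two Pew click rates are easier to weigh as a relative change than as raw percentages. The sketch below does only that arithmetic on the two figures quoted above; the helper name `relative_change` is illustrative, and nothing here goes beyond those two numbers.

```python
def relative_change(baseline, observed):
    """Fractional change from baseline to observed (negative means a drop)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline


# Click rates on traditional result links, as quoted from Pew above.
click_rate_without_summary = 0.15
click_rate_with_summary = 0.08

absolute_drop_points = (click_rate_without_summary - click_rate_with_summary) * 100
relative_drop = -relative_change(click_rate_without_summary, click_rate_with_summary)

print(f"{absolute_drop_points:.0f} percentage points, {relative_drop:.0%} relative drop")
```

A seven-point gap sounds small, but relative to the 15% baseline it means clicks on traditional links fell by nearly half when an AI summary was present.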

The online impulse of the next decade will not be driven primarily by Etsy's homepage algorithm or Amazon's "frequently bought together." It will be driven by what gets surfaced in the AI conversation, and how persuasively the AI presents it. Which brings up the question that matters most.

The Alignment Question

If AI assistants have cross-commerce LTV that genuinely aligns their interests with user outcomes, then in principle they guide consumers better than aggregator recommenders ever could. "In principle" is doing enormous work in that sentence. The direction of alignment depends entirely on the business model sitting behind the AI — and the four platforms in the desk experiment are not aligned the same way.

Anthropic (Claude) sells subscriptions. The user is the customer. Revenue is tied directly to sustained trust and willingness to pay, and willingness to pay is tied directly to the assistant actually being useful. Claude's refusal to recommend a desk before eliciting constraints is consistent with that model — the assistant is optimizing for the quality of your decision, because that is what the business model monetizes. This is the cleanest current alignment between cross-commerce LTV and user benefit.

OpenAI (ChatGPT) has a subscription tier but is actively building a merchant-side revenue layer through Instant Checkout and the Agentic Commerce Protocol. OpenAI currently says shopping results are organic and unsponsored, and ranks merchants by factors that include "Instant Checkout availability." That is a subtle but real merchant-economic signal already embedded in the ranking function. The paradigm ChatGPT exhibited in the desk test — storefront-first, commerce-ready recommendations before elicitation — is structurally compatible with merchant economics over time.

Google (Gemini) runs on a quarter-century of advertising muscle memory. The Shopping Graph is built on paid placement. The institutional reflex to monetize the consumer relationship against the consumer's own interest is the deepest in the industry. Google's AI commerce path is not yet as paid-placement-dominant as its search path, but the institutional gravity points in one direction, and it is not toward user-LTV alignment.

Perplexity sits in the most ambiguous position. It has been experimenting with advertising formats, and its revenue model remains in flux. The research paradigm it currently exhibits is closer to advisory than transactional, but its long-run economic shape will determine whether that persists or migrates toward the Google model.

The practical takeaway: consumers currently have no visibility into which economic model shapes the AI they are talking to. The AI that refuses to recommend until it understands your constraints is not more scrupulous than the one that leads with four specific products — it is monetized differently. That is not a minor technical footnote. It is the single most important thing a consumer can understand about AI shopping in 2026.

What This Means

For consumers: the AI you choose is a commerce decision, not a productivity decision. The paradigm shapes what you see. The business model shapes what the paradigm is optimized for. The difference between a subscription-funded AI and an ad-funded AI is as large as the difference between a fiduciary advisor and a commissioned salesperson. Both can be useful. They are not the same relationship.

For brands: the shelf has moved. Your brand equity now has a platform-distribution problem that looks like retail distribution in 1975, except the shelves are opaque, reshuffle with every query, and are governed by ranking signals the retailer does not disclose. Being on Amazon is no longer enough. Being discoverable in the AI conversation is the new distribution question, and it is one most brands have not yet started answering.

For the aggregators: the topline looks fine today, and that is exactly what you would expect in the early phase of this kind of shift. Amazon's GMV is up. Etsy is integrating. eBay is adapting. None of that tells you whether the long-run economic surface of commerce is migrating upstream — and the structural answer is that it is, because cross-commerce LTV beats single-venue LTV on a long enough timeline, and the timeline is shorter than it looks.
