How To Find Pages On Your Site That ChatGPT May Be Hallucinating

Posted in: SEO


As AI chatbots like ChatGPT become increasingly popular, more users are landing on your website via links shared directly from these tools. Sometimes, though, LLMs like ChatGPT send visitors to URLs on your site that don’t actually exist. These “hallucinated URLs” are URLs that ChatGPT got wrong, made up, or pulled from pages you no longer have on your site. All of these create dead ends for users leaning into the new semantic search powered by LLMs.

Based on our analysis of over 18,000 landing page visits from ChatGPT, 3.35% of URLs simply didn’t exist on the destination site and were hallucinated.

With more users navigating the buyer’s journey with ChatGPT, having it send traffic to pages that don’t exist could be causing you to miss out on business opportunities. Here’s how you can quickly identify these hallucinated URLs using Google Analytics 4 (GA4) and Screaming Frog:

Step 1: Open Your Landing Page Report in GA4

To get started, head into Google Analytics 4 and open up your Landing Page report. You’ll find this by navigating to the “Reports” section, then going to “Engagement” and selecting “Landing page.” This report shows you every page people use as their entry point onto your site, which makes it the perfect place to start tracking AI-driven traffic.

[Screenshot: the Landing page report in GA4]

Step 2: Filter To Traffic Sent From ChatGPT

Next, you’ll want to zero in on visits that came only from ChatGPT. At the top of your report, click the add filter option and include only rows where the Session source / medium contains “chatgpt.com.” (You may see this appear as “chatgpt.com / referral” or something similar, so use “contains” instead of “exact match” to make sure you capture all relevant traffic.)

[Screenshot: filtering the report to ChatGPT traffic]
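If you would rather export everything and filter afterward, the same “contains” logic can be sketched in a few lines of Python. This is only an illustration: the column name "Session source / medium" and the sample rows are assumptions, so check the header of your own export before relying on it.

```python
# Assumed column name; confirm it against the header row of your own export.
SOURCE_COLUMN = "Session source / medium"


def is_chatgpt_row(row, needle="chatgpt.com"):
    """Return True when the row's source / medium contains the needle.

    Uses a case-insensitive "contains" check rather than an exact match,
    so values like "chatgpt.com / referral" are still captured.
    """
    return needle in row.get(SOURCE_COLUMN, "").lower()


def filter_chatgpt(rows):
    """Keep only the rows that were referred by ChatGPT."""
    return [row for row in rows if is_chatgpt_row(row)]


# Made-up sample rows for illustration only.
rows = [
    {"Landing page": "/pricing", SOURCE_COLUMN: "chatgpt.com / referral"},
    {"Landing page": "/blog", SOURCE_COLUMN: "google / organic"},
    {"Landing page": "/guide", SOURCE_COLUMN: "chatgpt.com / (not set)"},
]

chatgpt_rows = filter_chatgpt(rows)
print([row["Landing page"] for row in chatgpt_rows])
```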

Step 3: Select A Date Range

Before exporting, set the date range so you’re not just looking at a few recent visits. Adjust the date selector in the top right corner so your report covers all data since August 1, 2024, if you want to catch the widest range of traffic. (OpenAI announced its SearchGPT prototype in late July 2024, and ChatGPT search rolled out more broadly that fall.) This helps ensure you’re seeing the full picture of ChatGPT-driven activity over time.

Step 4: Export Your Data

With the filters and date range set, export your landing page report. You can export as a CSV or send it directly to Google Sheets. What you’ll have now is a list of all landing page URLs that ChatGPT has sent visitors to.
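If you plan to work with the CSV outside of Google Sheets, here is a minimal Python sketch for pulling the unique landing pages out of it. It assumes the export starts with a few comment lines beginning with "#" and that the path column is named "Landing page". Both are assumptions about your file, and the sample text below is made up.

```python
import csv
import io


def read_landing_pages(csv_text, column="Landing page"):
    """Return the unique landing page paths from an exported report.

    Assumes comment lines start with "#" and that `column` matches the
    header in your export (adjust it if yours is named differently).
    """
    # Drop blank lines and comment lines before handing the text to csv.
    lines = [
        line
        for line in csv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))

    paths = []
    for row in reader:
        path = (row.get(column) or "").strip()
        # Skip empty values, GA4's "(not set)" bucket, and duplicates.
        if path and path != "(not set)" and path not in paths:
            paths.append(path)
    return paths


# Made-up sample that mimics the assumed export layout.
sample = """# Example export header (layout assumed)
# Date range: 20240801-20250101

Landing page,Sessions
/pricing,12
/blog/old-post,3
(not set),5
/pricing,1
"""

print(read_landing_pages(sample))
```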

Step 5: Add In Your Domain

Depending on your export, you may need to add your domain back onto the URLs. Many GA4 exports will just give you the path (like /my-page) instead of the full URL. To fix this, open the file in Google Sheets and use a formula to prepend your domain. For example, put your domain, including the protocol (like “https://yourdomain.com”), in column A and keep the path in column B. Then use a formula like =A2 & B2 to combine them, where A2 is your domain and B2 is the path.

[Screenshot: adding your domain to the path in Google Sheets]
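The same step can be sketched in Python if you are working with the export in a script instead of a spreadsheet. The helper name to_full_url and the domain “https://yourdomain.com” are placeholders for illustration.

```python
def to_full_url(base, path):
    """Prepend a site root (including protocol) to a path from the export."""
    path = path.strip()
    # Make sure exactly one slash separates the domain from the path.
    if not path.startswith("/"):
        path = "/" + path
    return base.rstrip("/") + path


# Placeholder domain and paths for illustration.
base = "https://yourdomain.com"
paths = ["/my-page", "blog/old-post", "/pricing?plan=pro"]

for path in paths:
    print(to_full_url(base, path))
```

Stripping the trailing slash from the domain and adding a leading slash to the path avoids doubled or missing slashes, which would otherwise show up as false 404s in the crawl.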

Step 6: Crawl The List of URLs With Screaming Frog

Once you have your list of full URLs, it’s time to check which of these pages actually exist on your site. Copy your finalized URL list and open Screaming Frog. Switch Screaming Frog to List Mode (under “Mode” in the menu), then paste in your URLs and run the crawl. Screaming Frog will quickly scan each URL and report the HTTP status for each.

After the crawl is complete, export the results and filter for any URLs returning a 404 Not Found status. These are the pages that ChatGPT referred users to that simply don’t exist on your website.
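If you don’t have Screaming Frog on hand, a rough alternative is to check the status codes yourself. The sketch below is not what Screaming Frog does internally. It is a simple stand-in that uses Python’s standard library, and the function names and example URLs are made up for illustration. Note that it follows redirects, so a URL that redirects to a live page will report the final status rather than the 301.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails.

    Sends a HEAD request; some servers handle HEAD differently from GET,
    so treat surprising results as a prompt to check the URL by hand.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "url-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions that carry the code.
        return err.code
    except OSError:
        # DNS failures, timeouts, refused connections, and similar.
        return None


def find_missing(urls, get_status=fetch_status):
    """Return the URLs whose status is 404 Not Found."""
    return [url for url in urls if get_status(url) == 404]


# Offline demo with stubbed statuses, so nothing is requested here.
fake_statuses = {
    "https://yourdomain.com/pricing": 200,
    "https://yourdomain.com/made-up-page": 404,
}
print(find_missing(list(fake_statuses), get_status=fake_statuses.get))
```

To run it against your real list, call find_missing(urls) without the get_status argument, and consider pausing between requests so you don’t hammer your own server.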

Next Steps

Finding and understanding hallucinated URLs helps you improve user experience by reducing frustration from visitors landing on nonexistent pages. If there are corresponding pages on your site, you can 301 redirect the hallucinated URLs to them. If there aren’t, consider building pages that meet that demand.
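To get a head start on a redirect map, you could let a script suggest the closest existing page for each hallucinated URL. This is a hedged sketch that uses simple string similarity from Python’s difflib module. The paths and the 0.6 cutoff are assumptions, and every suggestion should be reviewed by a person before you set up a 301.

```python
import difflib


def suggest_redirects(missing_paths, live_paths, cutoff=0.6):
    """Map each missing path to its most similar live path, or None.

    Similarity is purely textual, so a close match is only a candidate
    for a redirect, not proof that the pages cover the same topic.
    """
    suggestions = {}
    for path in missing_paths:
        matches = difflib.get_close_matches(path, live_paths, n=1, cutoff=cutoff)
        suggestions[path] = matches[0] if matches else None
    return suggestions


# Made-up paths for illustration.
live = ["/services/seo-audit", "/blog", "/contact"]
missing = ["/services/seo-audits", "/zzz-qqq"]

for old, new in suggest_redirects(missing, live).items():
    print(old, "->", new or "no close match, consider a new page")
```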

Cleaning up hallucinated traffic helps ensure you’re measuring your AI-driven traffic more accurately, so your reporting isn’t inflated by dead ends.
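To see how much of your own ChatGPT traffic is hitting dead ends, you can compute the share of sessions that landed on missing URLs. The sketch below assumes you have session counts per URL from your export and the list of 404s from your crawl. The numbers are invented for illustration and are not from our analysis.

```python
def hallucinated_share(sessions_by_url, missing_urls):
    """Return the fraction of sessions that landed on missing URLs."""
    total = sum(sessions_by_url.values())
    if total == 0:
        return 0.0
    missing_set = set(missing_urls)
    missing_sessions = sum(
        count for url, count in sessions_by_url.items() if url in missing_set
    )
    return missing_sessions / total


# Invented numbers for illustration only.
sessions = {"/pricing": 90, "/made-up-page": 6, "/old-guide": 4}
missing = ["/made-up-page", "/old-guide"]

share = hallucinated_share(sessions, missing)
print(f"{share:.1%} of ChatGPT sessions landed on missing pages")
```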

In our review of over 18,000 landing page visits from ChatGPT, we found that 3.35% of the URLs it recommended didn’t actually exist on the sites in question. This means a significant chunk of ChatGPT-driven traffic is being sent to phantom pages, a challenge for both marketers and SEOs.
