Common scenarios
| Symptom | Likely cause | Fix |
|---|---|---|
| Crawl has been “In Progress” for hours, no counters moving | Firecrawl pipeline stall or webhook delivery lost | Open the datasource → Cancel crawl, then Start crawl again. If it recurs, file a ticket with the datasource ID — the back-end crawl jobs are traceable. |
| AI cites non-English pages even after adding an English-only filter | Old pages from before the filter change are still indexed | Open the datasource → Pages view → filter by language or URL pattern → bulk Delete. The next crawl respects the new filter going forward. |
| AI answers from blog/funnel pages that shouldn’t be in the index | exclude_paths is empty or too narrow | Add the section’s path pattern to Exclude paths (e.g. /blog/*, /pricing/*). Then bulk-delete the existing indexed pages that match from the Pages view. |
| Some articles are missing after the crawl | page_limit hit (default 100), paths blocked by include_paths, or pages require JS rendering for content | Raise page_limit. Remove over-narrow include_paths. For JS-heavy pages, spot-check by hitting a single page’s Resync — if it comes back empty, the page isn’t renderable by the crawler and needs a different source. |
| AI surfaces the same article twice | Both the crawler and a direct integration (Zendesk/Intercom/Front) are indexing the same content | Pick one. Disconnect the other. The direct integration is almost always the better choice for Help Center content — the crawler stays for other domains. |
| Crawl completes but no pages synced | The entry URL returned no links, or every link was blocked by include_paths / exclude_paths | Open the URL in a browser and confirm the page links out to the content you expect. Review include_paths; overly specific patterns filter out everything. |
| None of the above | — | Contact support with the datasource ID, approximate time of the issue, and a sample URL that’s missing/wrong. |
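Several of the fixes above come down to how include/exclude paths filter crawled URLs. The following is a minimal sketch of that filtering logic, not the product's actual implementation: the helper name, the glob-style matching, and the "exclude wins over include" rule are all assumptions for illustration.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def should_index(url: str, include_paths: list[str], exclude_paths: list[str]) -> bool:
    """Hypothetical filter: decide whether a crawled URL gets indexed.

    Assumed rules: exclude patterns always win, and an empty include
    list means "include everything".
    """
    path = urlparse(url).path
    # Any exclude match drops the page outright.
    if any(fnmatch(path, pat) for pat in exclude_paths):
        return False
    # With include patterns set, the page must match at least one.
    if include_paths and not any(fnmatch(path, pat) for pat in include_paths):
        return False
    return True

# Blog pages are skipped; help-center articles pass.
print(should_index("https://example.com/blog/launch", [], ["/blog/*", "/pricing/*"]))  # False
print(should_index("https://example.com/help/setup", ["/help/*"], ["/blog/*"]))        # True
```

Note the failure mode from the table: an include list like `["/help/en/articles/exact-slug"]` makes every other URL fail the include check, which is how an over-narrow include_paths produces a crawl with zero synced pages.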
Limits at a glance
| Limit | Value |
|---|---|
| Page limit per crawl | 1–5000 (default 100) |
| Minimum crawl interval | 24 hours |
| Default crawl interval | 168 hours (7 days) |
| Binary file types | Always excluded |
| Auth-gated URLs | Not supported |
| JavaScript rendering | Best effort via Firecrawl — not guaranteed for heavy SPAs |
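If you set crawl options programmatically, the numeric limits above can be checked client-side before submitting. A sketch under assumed field names (`page_limit`, `interval_hours` are illustrative, not the real API schema):

```python
def validate_crawl_config(page_limit: int = 100, interval_hours: int = 168) -> None:
    """Raise ValueError for values outside the documented limits.

    Defaults mirror the table: 100 pages per crawl, weekly interval.
    Field names are assumptions for illustration only.
    """
    if not 1 <= page_limit <= 5000:
        raise ValueError(f"page_limit must be 1-5000, got {page_limit}")
    if interval_hours < 24:
        raise ValueError(f"crawl interval must be at least 24 hours, got {interval_hours}")

validate_crawl_config()  # defaults pass silently
```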
Related documentation
- **Connect a website**: re-check the URL, limits, and include/exclude paths.
- **Website overview**: when to use the crawler vs a direct integration.
- **Crawl API**: programmatic exclude/include/resync.
- **Connect a knowledge source**: switch to a direct integration if one fits.