The Crawl API allows you to programmatically crawl and index websites into your knowledge base. This enables your AI agents to access and reference content from your website, documentation, or any other web-based resources when responding to customer inquiries.

Overview

Website crawling enables:
  • Automated Content Indexing - Automatically extract and index content from websites
  • Knowledge Base Integration - Crawled content is added directly to your knowledge base
  • Real-time Status Tracking - Monitor crawl progress and completion status
  • Flexible Configuration - Control include/exclude paths, page limits, and crawl intervals
  • Page Management - Exclude, include, delete, or resync individual pages

How It Works

  1. Create a Datasource - Provide a website URL and configuration options
  2. Crawl Starts Automatically - By default, a crawl begins immediately after creation
  3. Monitor Progress - Check crawl status and track page processing
  4. Manage Pages - Review crawled pages, exclude irrelevant ones, or resync outdated content
  5. Scheduled Recrawls - Datasources automatically recrawl on a configurable interval
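
Step 3 (monitoring progress) usually comes down to a polling loop. The sketch below is a minimal illustration and not an official client: `wait_for_crawl` and the `get_status` callable are hypothetical names, and how the status is actually fetched (endpoint URL, authentication, response shape) is left to the caller because this page does not specify it. Only the status strings are taken from the crawl job statuses documented on this page.

```python
import time
from typing import Callable

# Statuses after which a crawl job is not expected to change again.
# Grouping these three as "terminal" is an assumption based on their descriptions.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_crawl(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the crawl job reaches a terminal status.

    get_status is a hypothetical caller-supplied function that returns the
    current status string of one crawl job (for example by calling the
    Get Crawl Job endpoint and reading its status field).
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"crawl still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays; in normal use the default `time.sleep` applies.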

Crawl Job Statuses

  • pending - Crawl job created, waiting to start
  • scraping - Crawl is actively running and extracting content
  • completed - Crawl finished successfully, content has been indexed
  • failed - Crawl encountered an error and could not complete
  • cancelled - Crawl was manually cancelled before completion
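
Client code often benefits from treating these statuses as a closed set rather than free-form strings. The following is a small sketch under that assumption; the `CrawlJobStatus` class and its `is_active` / `is_terminal` helpers are illustrative names, not part of the API, and the active/terminal split is inferred from the descriptions above.

```python
from enum import Enum


class CrawlJobStatus(str, Enum):
    """The five crawl job statuses listed above, as a closed set."""

    PENDING = "pending"
    SCRAPING = "scraping"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

    @property
    def is_active(self) -> bool:
        # Assumption: a job that is waiting or running can still change state.
        return self in (CrawlJobStatus.PENDING, CrawlJobStatus.SCRAPING)

    @property
    def is_terminal(self) -> bool:
        # Everything that is not active is treated as finished.
        return not self.is_active
```

Parsing a raw value with `CrawlJobStatus("scraping")` returns the matching member, while an unrecognized string raises `ValueError`, which surfaces unexpected statuses early instead of letting them pass silently.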

Page Sync Statuses

  • synced - Page content is indexed in the knowledge base
  • pending - Page is waiting to be synced
  • error - Page failed to sync
  • excluded - Page is excluded from syncing
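
As an illustration of how these statuses might be used after listing pages, the sketch below counts pages per sync status and picks out the ones that failed. The page records are assumed to be dictionaries with `url` and `sync_status` keys; those field names and both function names are hypothetical, since the response format is not described on this page.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence

# The four page sync statuses listed above.
PAGE_SYNC_STATUSES = ("synced", "pending", "error", "excluded")


def summarize_pages(pages: Sequence[Mapping[str, str]]) -> Dict[str, int]:
    """Count pages per sync status, reporting zero for statuses not seen."""
    counts = Counter(page["sync_status"] for page in pages)
    return {status: counts.get(status, 0) for status in PAGE_SYNC_STATUSES}


def pages_needing_resync(pages: Sequence[Mapping[str, str]]) -> List[str]:
    """Return the URLs of pages whose sync failed (status 'error')."""
    return [page["url"] for page in pages if page["sync_status"] == "error"]
```

The URLs returned by `pages_needing_resync` are natural candidates for a resync of individual pages.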

Note: Crawling large websites can take significant time and resources. Use include/exclude paths to focus on relevant content and set appropriate page limits.
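
To make the effect of include/exclude paths and a page limit concrete, here is a local sketch of one plausible filtering rule. It assumes simple path-prefix matching with excludes taking precedence over includes; the real crawler may match differently (for example with glob or regex patterns), and `select_urls` and its parameter names are hypothetical.

```python
from typing import Iterable, List, Sequence
from urllib.parse import urlparse


def select_urls(
    urls: Iterable[str],
    include_paths: Sequence[str],
    exclude_paths: Sequence[str],
    page_limit: int,
) -> List[str]:
    """Keep URLs that pass the include/exclude rules, up to page_limit.

    Assumed rules: an empty include list means "include everything",
    matching is by path prefix, and an exclude match always wins.
    """
    selected: List[str] = []
    for url in urls:
        if len(selected) >= page_limit:
            break
        path = urlparse(url).path or "/"
        if include_paths and not any(path.startswith(p) for p in include_paths):
            continue
        if any(path.startswith(p) for p in exclude_paths):
            continue
        selected.append(url)
    return selected
```

Under these assumptions, including `/docs`, excluding `/docs/internal`, and setting a limit of 2 would keep only the first two public documentation pages encountered.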

Available Endpoints

Datasource Management

Create Datasource

Create a new website datasource and start crawling

List Datasources

List all website datasources for your organization

Get Datasource

Get datasource details with page stats

Update Datasource

Update datasource configuration

Delete Datasource

Delete a website datasource

Crawl Operations

Start Crawl

Start a new crawl for a datasource

Cancel Crawl

Cancel an active crawl

List Crawl Jobs

View crawl history for a datasource

Get Crawl Job

Check the status of a specific crawl job

Page Management

List Pages

List crawled pages with filtering options

Exclude Pages

Exclude pages from future syncs

Include Pages

Re-include previously excluded pages