# Crawl

> The Crawl API allows you to programmatically crawl and index websites into your knowledge base.

This enables your AI agents to access and reference content from your website, documentation, or any other web-based resources when responding to customer inquiries.

## Overview

Website crawling enables:

* **Automated Content Indexing** - Automatically extract and index content from websites
* **Knowledge Base Integration** - Crawled content is added directly to your knowledge base
* **Real-time Status Tracking** - Monitor crawl progress and completion status
* **Flexible Configuration** - Control include/exclude paths, page limits, and crawl intervals
* **Page Management** - Exclude, include, delete, or resync individual pages

## How It Works

1. **Create a Datasource** - Provide a website URL and configuration options
2. **Crawl Starts Automatically** - By default, a crawl begins immediately after creation
3. **Monitor Progress** - Check crawl status and track page processing
4. **Manage Pages** - Review crawled pages, exclude irrelevant ones, or resync outdated content
5. **Scheduled Recrawls** - Datasources automatically recrawl on a configurable interval

## Crawl Job Statuses

* **`pending`** - Crawl job created, waiting to start
* **`scraping`** - Crawl is actively running and extracting content
* **`completed`** - Crawl finished successfully, content has been indexed
* **`failed`** - Crawl encountered an error and could not complete
* **`cancelled`** - Crawl was manually cancelled before completion

## Page Sync Statuses

* **`synced`** - Page content is indexed in the knowledge base
* **`pending`** - Page is waiting to be synced
* **`error`** - Page failed to sync
* **`excluded`** - Page is excluded from syncing

Crawling large websites can take significant time and resources. Use include/exclude paths to focus on relevant content and set appropriate page limits.

## Available Endpoints

### Datasource Management

* Create a new website datasource and start crawling
* List all website datasources for your organization
* Get datasource details with page stats
* Update datasource configuration
* Delete a website datasource

### Crawl Operations

* Start a new crawl for a datasource
* Cancel an active crawl
* View crawl history for a datasource
* Check the status of a specific crawl job

### Page Management

* List crawled pages with filtering options
* Exclude pages from future syncs
* Re-include previously excluded pages
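
As a rough illustration of the "Monitor Progress" step, the crawl job statuses described earlier can be read as two groups: in-progress (`pending`, `scraping`) and terminal (`completed`, `failed`, `cancelled`). The Python sketch below polls until a terminal status appears. It is only a sketch under stated assumptions: `wait_for_crawl` and `fetch_status` are hypothetical names, not part of the Crawl API, and because this page does not show the status endpoint's URL or response shape, the caller supplies a function that returns the status string.

```python
import time
from typing import Callable

# Status values taken from the "Crawl Job Statuses" list on this page.
IN_PROGRESS_STATUSES = {"pending", "scraping"}
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_crawl(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a crawl job until it reaches a terminal status, then return it.

    `fetch_status` is a hypothetical, caller-supplied function that asks
    the crawl status endpoint for the job's current status string.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status not in IN_PROGRESS_STATUSES:
            raise ValueError(f"unexpected crawl status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl still {status!r} after {timeout} seconds")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for real API responses.
    simulated = iter(["pending", "scraping", "scraping", "completed"])
    print(wait_for_crawl(lambda: next(simulated), poll_interval=0.0))  # prints "completed"
```

Passing the status lookup in as a function keeps the loop independent of any particular HTTP client or authentication scheme. For large sites, a longer `poll_interval` and `timeout` are likely more appropriate than the defaults assumed here.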