Adjust the settings of your crawler with a crawler.toml file alongside your code.
name
entry
schedule
*/10 * * * * -> every 10 minutes
45 23 * * * -> every day at 11:45pm UTC
30 12 * * 2 -> every Tuesday at 12:30pm UTC
59 1 14 3 * -> every March 14 at 1:59am UTC
@manual -> only start a crawl by manually clicking a button
@hourly -> every hour at the beginning of the hour (0 * * * *)
@daily -> every day at midnight (0 0 * * *)
@weekly -> every week at midnight on Sunday (0 0 * * 0)
@monthly -> every month at 12am on the 1st of the month (0 0 1 * *)
crawl.maxRequests
The crawl stops when maxMinutes or maxRequests is reached, whichever happens first.
crawl.maxRequestsPerOrigin
crawl.maxMinutes
The crawl stops when maxMinutes or maxRequests is reached, whichever happens first.
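As a rough illustration of the schedule and crawl limit options above, a crawler.toml might contain something like the following. This is a minimal sketch: the option names come from this page, but the value types (a string for schedule, integers for the limits) and the specific numbers are assumptions chosen for the example.

```toml
# Illustrative values only; types and numbers are assumptions.

# Start a crawl every 10 minutes (cron syntax, evaluated in UTC).
schedule = "*/10 * * * *"

# The crawl stops when either limit is reached, whichever happens first.
crawl.maxRequests = 1000   # hypothetical request cap
crawl.maxMinutes = 30      # hypothetical time cap
```

With these example values, a crawl that has run for 30 minutes stops even if it has made fewer than 1000 requests, and vice versa.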
crawl.matchDomains
Wildcards (*) match between dot (.) characters.
Provide values to allow or deny adding certain URLs to the queue.
Domain glob examples:
example.com -> only crawl URLs whose hostname equals example.com
*.example.com -> only crawl single-level subdomains of example.com
**.example.com -> only crawl multi-level subdomains of example.com
**.io -> only crawl .io domains
!**.tk -> do not crawl .tk domains
!**.gov
!**.onion
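The domain globs above could be combined in crawler.toml roughly as follows. This sketch assumes that crawl.matchDomains accepts an array of glob strings, which this page does not state explicitly; how allow and deny patterns interact when both are present is also not described here, so treat the combination as illustrative.

```toml
# Assumption: matchDomains is an array of glob strings.
crawl.matchDomains = [
  "example.com",     # allow URLs whose hostname equals example.com
  "*.example.com",   # allow single-level subdomains of example.com
  "!**.tk",          # deny .tk domains
]
```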
crawl.matchPaths
Wildcards (*) match between slash (/) characters.
Provide values to allow or deny adding certain URLs to the queue.
Path glob examples:
/jobs -> only crawl URLs whose pathname equals /jobs
/jobs/* -> only crawl URLs one level under the /jobs directory
/jobs/** -> only crawl URLs whose pathname starts with /jobs/
**/*.html -> only crawl URLs that have a .html file extension
!**/*.gif -> do not crawl URLs that have a .gif file extension
!**/*.avif
!**/*.bak
!**/*.dmg
!**/*.exe
!**/*.gz
!**/*.m4a
!**/*.mov
!**/*.mp3
!**/*.mp4
!**/*.ogg
!**/*.pdf
!**/*.psd
!**/*.rar
!**/*.rpm
!**/*.wasm
!**/*.wav
!**/*.webm
!**/*.xsd
!**/*.zip
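A path filter built from the globs above might look like this in crawler.toml. As with domains, the array-of-strings shape is an assumption, and the particular mix of allow and deny patterns is only an example.

```toml
# Assumption: matchPaths is an array of glob strings.
crawl.matchPaths = [
  "/jobs",       # allow the /jobs page itself
  "/jobs/**",    # allow everything under /jobs/
  "!**/*.pdf",   # deny URLs with a .pdf file extension
  "!**/*.zip",   # deny URLs with a .zip file extension
]
```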
crawl.ignoreQueryParams
queue.batchSize
queue.maxConcurrency
auto.
queue.maxRetries
Retries are timed using the Retry-After header; if absent, with exponential backoff.
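The queue options might be set together as in the sketch below. The option names are taken from this page; the numeric values are placeholders, and the assumption that queue.maxConcurrency accepts the string "auto" is inferred from the mention of auto above rather than stated outright.

```toml
# Placeholder values; check the allowed ranges before using them.
queue.batchSize = 10            # hypothetical batch size
queue.maxConcurrency = "auto"   # assumed to accept "auto"
queue.maxRetries = 3            # hypothetical retry limit
```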
vector.enabled
Set to true to opt in to using a vector database with your crawler.
A vector database gets created the first time your crawler is deployed while this value is true.
You must set this value to true if your handler returns embeddings.
vector.dimensions
vector.metric
bucket.enabled
Set to true to opt in to using a bucket with your crawler.
A bucket gets created the first time your crawler is deployed while this value is true.
You must set this value to true if your handler returns the attach property.
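Opting in to both storage features could look like the following. The enabled flags follow directly from the descriptions above; the values shown for vector.dimensions and vector.metric are hypothetical, since this page does not list the accepted values, and the dimensions would need to match whatever embedding model your handler uses.

```toml
# Opt in to a vector database (needed if the handler returns embeddings).
vector.enabled = true
vector.dimensions = 768     # hypothetical; match your embedding model
vector.metric = "cosine"    # hypothetical; accepted metrics not listed here

# Opt in to a bucket (needed if the handler returns the attach property).
bucket.enabled = true
```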