OpenCourtData Bot
Technical reference for webmasters. This page describes how the OpenCourtData web crawler operates, how to identify it, and how to control its behaviour on your site.
User-Agent string
Every HTTP request made by this crawler carries the following User-Agent header:
OpenCourtDataBot/1.0 (+https://bot.opencourtdata.uk; bot@opencourtdata.uk)
| Field | Value | Description |
|---|---|---|
| Product token | OpenCourtDataBot/1.0 | Name and version of the crawler; robots.txt User-agent lines match on the name, OpenCourtDataBot |
| Info URL | https://bot.opencourtdata.uk | This page: crawler documentation for webmasters |
| Contact | bot@opencourtdata.uk | Direct email for crawler-related issues |
The User-Agent header alone can be spoofed. To confirm that a request originates from this crawler, see Verifying crawler identity below.
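Log analysis and rate-limit rules often begin by spotting the product token. The following is a minimal Python sketch that extracts the claimed version and matches the token case-insensitively, as robots.txt matching does. The function name is hypothetical.

```python
import re
from typing import Optional

# Product token from the table above; the version is captured as dotted digits.
_PRODUCT_RE = re.compile(r"\bOpenCourtDataBot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def claimed_crawler_version(user_agent: Optional[str]) -> Optional[str]:
    """Return the version if the User-Agent claims to be OpenCourtDataBot, else None.

    A match shows only what the client claims; it is not proof of identity.
    """
    match = _PRODUCT_RE.search(user_agent or "")
    return match.group(1) if match else None
```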
Schedule & crawl rate
| Property | Value |
|---|---|
| Crawl frequency | Once daily at 06:00 UTC — not continuous |
| Concurrency | Maximum 5 simultaneous Lambda workers |
| Minimum inter-request delay | 1 second per domain (honouring Crawl-delay if larger) |
| Response timeout | 20 seconds |
| Maximum body size | 10 MB per page |
| Protocols | HTTPS only |
| Infrastructure | AWS Lambda, region eu-west-2 (London) |
| IP range source | AWS published IP ranges — eu-west-2, service LAMBDA |
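The IP range source can be checked against the ip-ranges.json file that AWS publishes. The sketch below uses only the Python standard library. The helper names are hypothetical, and the region and service defaults are copied from the table above, so confirm that this service label appears in the current file before relying on it. Every AWS customer in the region shares these ranges, so a match shows AWS origin, not this crawler specifically.

```python
import ipaddress
import json
import urllib.request

# Published AWS address ranges: JSON with "prefixes" (IPv4) and "ipv6_prefixes".
AWS_IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ranges(url: str = AWS_IP_RANGES_URL) -> dict:
    """Download and decode the published ranges (requires network access)."""
    with urllib.request.urlopen(url, timeout=20) as response:
        return json.load(response)


def load_networks(ranges: dict, region: str = "eu-west-2", service: str = "LAMBDA") -> list:
    """Collect the IPv4 and IPv6 networks listed for one region/service pair."""
    networks = []
    for entry in ranges.get("prefixes", []):
        if entry.get("region") == region and entry.get("service") == service:
            networks.append(ipaddress.ip_network(entry["ip_prefix"]))
    for entry in ranges.get("ipv6_prefixes", []):
        if entry.get("region") == region and entry.get("service") == service:
            networks.append(ipaddress.ip_network(entry["ipv6_prefix"]))
    return networks


def in_published_ranges(ip: str, networks: list) -> bool:
    """True if the connecting address lies in any network (v4/v6 mismatches never match)."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

In practice you would call `load_networks(fetch_ranges())` and refresh the result periodically, because AWS updates the file.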
Configuring robots.txt
This crawler checks your robots.txt file before every request and strictly honours Disallow rules and Crawl-delay directives. Because crawls run once daily, changes to your robots.txt take effect at the next crawl, within 24 hours.
Allow all crawling (default)
No configuration is necessary if you wish to allow this bot to crawl your site.
Disallow specific paths
User-agent: OpenCourtDataBot
Disallow: /private/
Disallow: /admin/
Crawl-delay: 5
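To preview how a robots.txt will apply before the next crawl, Python's `urllib.robotparser` can evaluate it for this crawler's product token. This is a sketch and not the crawler's own parser. `robotparser` applies the first matching rule in file order and does not support `*` or `$` path wildcards, so it may treat complex files differently. The helper name is hypothetical. The 1-second floor comes from the schedule table.

```python
from urllib.robotparser import RobotFileParser

# The Disallow example from this section, one directive per line.
ROBOTS_TXT = """\
User-agent: OpenCourtDataBot
Disallow: /private/
Disallow: /admin/
Crawl-delay: 5
"""

AGENT = "OpenCourtDataBot/1.0"
MINIMUM_DELAY_SECONDS = 1  # documented per-domain floor


def check_rules(robots_txt: str, url: str):
    """Return (allowed, effective_delay_seconds) for AGENT under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(AGENT, url)
    # Wait at least the 1 s floor, or the declared Crawl-delay if it is larger.
    declared = parser.crawl_delay(AGENT) or 0
    return allowed, max(MINIMUM_DELAY_SECONDS, declared)
```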
Block all crawling
User-agent: OpenCourtDataBot
Disallow: /
Rules in a wildcard User-agent: * group are also respected.
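You can check the blocking cases in the same way. This sketch assumes the crawler selects robots.txt groups as RFC 9309 describes: a group that names OpenCourtDataBot takes precedence over the `*` group, and the `*` group applies only when no named group exists. Python's `urllib.robotparser` follows the same precedence. The helper and constant names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

AGENT = "OpenCourtDataBot/1.0"


def allowed(robots_txt: str, path: str) -> bool:
    """Evaluate a robots.txt body for AGENT against a URL path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(AGENT, path)


BLOCK_BOT = "User-agent: OpenCourtDataBot\nDisallow: /\n"
BLOCK_EVERYONE = "User-agent: *\nDisallow: /\n"
# The named group wins over the wildcard group for this crawler.
WILDCARD_PLUS_NAMED = (
    "User-agent: *\nDisallow: /\n\n"
    "User-agent: OpenCourtDataBot\nDisallow: /private/\n"
)
```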
Verifying crawler identity
Two methods are available to confirm that a request originates from this crawler:
Method 1 — Reverse DNS lookup
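Perform a reverse DNS lookup on the connecting IP address, then a forward lookup on the returned hostname, and confirm that it resolves back to the same IP (forward-confirmed reverse DNS). Requests originate from AWS Lambda in eu-west-2, so the hostname will be within the compute.amazonaws.com domain space. This confirms only that the request came from AWS infrastructure. Any AWS customer can send traffic from the same address space, so use Method 2 for a definitive check.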
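The following Python sketch performs the forward-confirmed reverse DNS check. The resolvers are passed in as parameters so they can be replaced in tests, and the function name is hypothetical. The sketch handles IPv4 only, because `socket.gethostbyname_ex` returns IPv4 addresses, and it checks the public compute.amazonaws.com suffix. A pass shows only that the address belongs to an AWS host.

```python
import socket
from typing import Callable, List, Optional, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def forward_confirmed_hostname(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
    suffix: str = ".compute.amazonaws.com",
) -> Optional[str]:
    """Return the PTR hostname if it ends with `suffix` and resolves back to `ip`, else None."""
    try:
        hostname, _, _ = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None
    if not hostname.lower().rstrip(".").endswith(suffix):
        return None
    try:
        _, _, addresses = forward(hostname)
    except OSError:
        return None
    # Forward confirmation: the PTR name must map back to the connecting address.
    return hostname if ip in addresses else None
```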
Method 2 — Cryptographic signature (Web Bot Auth)
Every request carries Signature and Signature-Input headers (RFC 9421 HTTP Message Signatures using Ed25519). You can verify these with the public key published in the key directory. See Web Bot Auth & cryptographic signing below for details.
Simulating a request
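You can reproduce the headers of a crawler request using curl. The command does not include the signature headers, so the request will not pass Web Bot Auth verification: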
curl -v \
  -H "User-Agent: OpenCourtDataBot/1.0 (+https://bot.opencourtdata.uk; bot@opencourtdata.uk)" \
  -H "Accept: text/html,application/xhtml+xml" \
  "https://example.gov.uk/page"
Web Bot Auth & cryptographic signing
This crawler implements Web Bot Auth, a protocol proposed by Cloudflare and published as IETF Internet-Drafts, built on RFC 9421 HTTP Message Signatures. Every request is signed with an Ed25519 private key. Edge networks and WAFs that support Web Bot Auth can verify the crawler's identity in-band, without IP allowlisting.
Signed components
The following components are included in each signature:
("@method" "@path" "@authority" "user-agent");tag="web-bot-auth";keyid="opencourtdata-bot-key-v1"
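Verification starts by rebuilding the RFC 9421 signature base. This is one line per covered component, followed by an `@signature-params` line that repeats the Signature-Input value verbatim. The minimal Python sketch below handles only the kinds of component listed above. The function names are hypothetical, and the component list is read with a simple regular expression rather than a full Structured Fields parser.

```python
import re
from urllib.parse import urlsplit


def covered_components(signature_params: str) -> list:
    """Return the quoted component identifiers in the leading (...) of a Signature-Input value."""
    inner = re.match(r"\s*\(([^)]*)\)", signature_params)
    if inner is None:
        raise ValueError("Signature-Input value must start with an inner list")
    return re.findall(r'"([^"]+)"', inner.group(1))


def signature_base(method: str, url: str, headers: dict, signature_params: str) -> str:
    """Build the RFC 9421 signature base for @method, @path, @authority and header fields.

    Other derived components (e.g. @query) and component parameters are not handled.
    """
    parts = urlsplit(url)
    derived = {
        "@method": method,                   # case-sensitive, used as sent
        "@path": parts.path or "/",          # an empty path normalises to "/"
        "@authority": parts.netloc.lower(),  # assumes no userinfo or explicit default port
    }
    fields = {name.lower(): value for name, value in headers.items()}
    lines = []
    for component in covered_components(signature_params):
        if component in derived:
            value = derived[component]
        elif component.startswith("@"):
            raise ValueError(f"unsupported derived component {component}")
        elif component in fields:
            value = fields[component].strip()
        else:
            raise ValueError(f"covered header {component} is missing")
        lines.append(f'"{component}": {value}')
    # The final line repeats the Signature-Input value verbatim, with no trailing newline.
    lines.append(f'"@signature-params": {signature_params}')
    return "\n".join(lines)
```

The Signature header carries an Ed25519 signature over the UTF-8 bytes of this string.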
Headers present on every request
| Header | Description |
|---|---|
| Signature-Agent | URL of this bot information page (https://bot.opencourtdata.uk) |
| Signature-Input | Signed component list, key ID, and creation timestamp |
| Signature | Base64-encoded Ed25519 signature over the listed components |
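On the wire, Signature-Input and Signature are Structured Field dictionaries keyed by a signature label that the signer chooses. The label `sig1` used in the tests is an assumption, not a documented value. The sketch below splits a single-member header of each kind, and the function names are hypothetical. A production verifier should use a full RFC 8941 parser and confirm that both headers carry the same label.

```python
import base64
import re

# One dictionary member whose value is a byte sequence: label=:BASE64:
_SIGNATURE_RE = re.compile(r"^\s*([a-z*][a-z0-9_\-.*]*)=:([A-Za-z0-9+/=]*):\s*$")


def split_signature_input(header: str):
    """Split 'label=(...);params' into (label, '(...);params'). Single member only."""
    label, sep, value = header.strip().partition("=")
    if not sep or not value.startswith("("):
        raise ValueError("unexpected Signature-Input format")
    return label, value


def parse_signature(header: str):
    """Decode 'label=:base64:' into (label, raw signature bytes). Single member only."""
    match = _SIGNATURE_RE.match(header)
    if match is None:
        raise ValueError("unexpected Signature format")
    return match.group(1), base64.b64decode(match.group(2), validate=True)
```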
Public key directory
The public signing key is published in the standard key discovery path:
https://bot.opencourtdata.uk/.well-known/http-message-signatures-directory
This file is served with Content-Type: application/http-message-signatures-directory+json as required by the specification.
Each key's validity window is given by its nbf and exp fields in the JWKS. WAF configurations should check these fields and be updated when a new key is published.
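The following sketch selects the verification key from the directory and enforces the nbf/exp window, using only the Python standard library. It accepts a keyid that matches either a `kid` member or the key's RFC 7638 JWK thumbprint, which is the identifier that Web Bot Auth drafts use. Whether this directory publishes a `kid` is an assumption, as are the function names.

```python
import base64
import hashlib
import json
import time
from typing import Optional


def b64url_decode(data: str) -> bytes:
    """Decode unpadded base64url, as used for JWK members."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 SHA-256 thumbprint of an OKP JWK: sorted required members, no whitespace."""
    required = {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]}
    canonical = json.dumps(required, separators=(",", ":"), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def select_key(directory: dict, keyid: str, now: Optional[float] = None) -> bytes:
    """Return the raw 32-byte Ed25519 public key for `keyid` if it is currently valid."""
    now = time.time() if now is None else now
    for jwk in directory.get("keys", []):
        if jwk.get("kty") != "OKP" or jwk.get("crv") != "Ed25519":
            continue
        if keyid not in (jwk.get("kid"), jwk_thumbprint(jwk)):
            continue
        if "nbf" in jwk and now < jwk["nbf"]:
            raise ValueError("key not yet valid (nbf)")
        if "exp" in jwk and now >= jwk["exp"]:
            raise ValueError("key expired (exp)")
        raw = b64url_decode(jwk["x"])
        if len(raw) != 32:
            raise ValueError("Ed25519 public keys are 32 bytes")
        return raw
    raise LookupError(f"no Ed25519 key matches keyid {keyid!r}")
```

You can pass the returned bytes to an Ed25519 library, for example `Ed25519PublicKey.from_public_bytes(raw).verify(signature, data)` from the `cryptography` package. Here `data` is the RFC 9421 signature base encoded as UTF-8, and `verify` raises `InvalidSignature` on failure.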
Data policy
- What we index: Publicly accessible UK court hearing lists, judgment metadata, and court directory pages. We do not access areas requiring authentication.
- What we store: Structured metadata only (page title, URL, links, HTTP status, SHA-256 content hash). Raw HTML is not retained beyond the duration of the crawl.
- Personal data: We do not seek out or process personal data beyond what is already published in official public court records. If you believe personal data has been indexed in error, contact us at bot@opencourtdata.uk.
- Right to erasure: We will action takedown requests for personal data within 72 hours. Block our crawler via robots.txt and we will not re-index that content.
- Retention: Indexed data is retained for up to 12 months before being purged or archived.
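The metadata described under "What we store" can be pictured as a record like the one below. This is an illustrative sketch. The class and field names are hypothetical, and hashing the raw response body is an assumption about what the content hash covers.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical per-page record: metadata only, no raw HTML."""
    url: str
    http_status: int
    title: str
    links: Tuple[str, ...]
    content_sha256: str  # hex digest of the response body (assumed)


def make_record(url: str, status: int, title: str, links: Iterable[str], body: bytes) -> CrawlRecord:
    """Keep only the hash of the body; the caller can discard the body afterwards."""
    return CrawlRecord(
        url=url,
        http_status=status,
        title=title,
        links=tuple(links),
        content_sha256=hashlib.sha256(body).hexdigest(),
    )
```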
Contact
If you have questions or concerns about this crawler — including crawl rate complaints, data removal requests, or suspected misuse — contact us directly:
| Channel | Address | Response time |
|---|---|---|
| Crawler issues & takedowns | bot@opencourtdata.uk | Within 2 business days |
| Main site | opencourtdata.uk | — |