# ChatAIze.RabbitHole

C# library for scraping text content from websites.

Rabbit Hole is a small, deterministic web text scraper for .NET. It discovers links within a root URL and extracts readable text from HTML pages. The output is a Markdown-like string suited for indexing, summarization, or offline processing.

## Installation

```shell
dotnet add package ChatAIze.RabbitHole
```
## Quick Start

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();

// Discover links: depth 2 fetches the root page and yields its links
await foreach (var link in scraper.ScrapeLinksAsync("https://example.com", depth: 2))
{
    Console.WriteLine(link);
}

// Extract readable text and metadata from a single page
var page = await scraper.ScrapeContentAsync("https://example.com");
Console.WriteLine(page.Title);
Console.WriteLine(page.Content);
```
## Scraping Links and Content Together

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();

// Crawl to depth 3 and print the title of every discovered page
await foreach (var link in scraper.ScrapeLinksAsync("https://example.com", depth: 3))
{
    var page = await scraper.ScrapeContentAsync(link);
    Console.WriteLine($"{page.Url} -> {page.Title}");
}
```
## Cancellation

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();

// Stop crawling after 30 seconds
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

await foreach (var link in scraper.ScrapeLinksAsync("https://example.com", depth: 3, cts.Token))
{
    Console.WriteLine(link);
}
```
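When the token fires mid-enumeration, the standard .NET convention is for the iteration to throw `OperationCanceledException`. Assuming the library follows that convention (this is not stated in its docs), a defensive sketch looks like:

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    await foreach (var link in scraper.ScrapeLinksAsync("https://example.com", depth: 3, cts.Token))
    {
        Console.WriteLine(link);
    }
}
catch (OperationCanceledException)
{
    // Timeout reached; treat whatever was yielded so far as the result
    Console.WriteLine("Crawl stopped after 30 seconds.");
}
```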
## Filtering Links

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();

await foreach (var link in scraper.ScrapeLinksAsync("https://example.com", depth: 3))
{
    // Only scrape documentation pages
    if (!link.Contains("/docs/"))
    {
        continue;
    }

    var page = await scraper.ScrapeContentAsync(link);
    Console.WriteLine(page.Content);
}
```
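Since `ScrapeContentAsync` throws `HttpRequestException` for non-success status codes, a long crawl usually wants to skip broken pages rather than abort. A minimal sketch:

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();

await foreach (var link in scraper.ScrapeLinksAsync("https://example.com", depth: 3))
{
    try
    {
        var page = await scraper.ScrapeContentAsync(link);
        Console.WriteLine(page.Content);
    }
    catch (HttpRequestException ex)
    {
        // Non-success status code (404, 500, ...): log and move on
        Console.Error.WriteLine($"Skipping {link}: {ex.Message}");
    }
}
```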
## How Crawling Works

Crawl depth is controlled by the `depth` parameter:

- `depth: 2` fetches the root page and yields its links, but does not fetch those links.
- `depth: 3` fetches the root page and each linked page once, but does not go deeper.

Link handling:

- Relative links (starting with `/`) are resolved against the root host.
- `mailto:`, `tel:`, and anchor-only (`#...`) links are skipped.
- Only `text/html` pages are processed; some link types are skipped entirely (see `WebsiteScraper` for the list).
- Pages without extractable data yield a `PageDetails` instance with null metadata and content.

Content extraction:

- The title, description, and keywords come from `<title>`, `<meta name="description">`, and `<meta name="keywords">`.
- Body text is extracted from `article`, `main`, or `div.content`, falling back to the entire document.
- `h1`-`h6` map to `#`-style headings; lists become `-` or numbered list items.
- The output is Markdown-like and optimized for readability, not strict Markdown compliance.
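Because every `PageDetails` field except `Url` is nullable, and pages without extractable data come back with null metadata and content, it is worth null-checking before indexing. A small sketch:

```csharp
using ChatAIze.RabbitHole;

var scraper = new WebsiteScraper();
var page = await scraper.ScrapeContentAsync("https://example.com");

// Fall back gracefully when metadata is missing
var title = page.Title ?? "(untitled)";

if (string.IsNullOrWhiteSpace(page.Content))
{
    Console.WriteLine($"No readable text found at {page.Url}");
}
else
{
    Console.WriteLine($"{title}: {page.Content.Length} characters");
}
```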
Example output:

```markdown
# Welcome

This is a [link](https://example.com/about).

- First item
- Second item
```

### Notes

- `ScrapeLinksAsync` performs best-effort crawling and skips pages that fail to load or parse.
- `ScrapeContentAsync` throws `HttpRequestException` for non-success status codes.
- Scope matching is prefix-based: `/docs` and `/docs-old` are both treated as in-scope.
- Only anchor tags (`<a href=...>`) are used for link discovery.

## API Reference

### WebsiteScraper

```csharp
public async IAsyncEnumerable<string> ScrapeLinksAsync(
    string url,
    int depth = 2,
    CancellationToken cancellationToken = default)

public async ValueTask<PageDetails> ScrapeContentAsync(
    string url,
    CancellationToken cancellationToken = default)
```

### PageDetails

```csharp
public sealed record PageDetails(
    string Url,
    string? Title,
    string? Description,
    string? Keywords,
    string? Content);
```

## Development

Build the library:
```shell
dotnet build
```

Run the preview app:

```shell
dotnet run --project ChatAIze.RabbitHole.Preview
```

## License

GPL-3.0-or-later. See LICENSE.txt.