5 packages tagged with “content-extraction”
A .NET 9 library for extracting web content using Playwright automation. Supports both raw HTML extraction and cleaned markdown conversion with intelligent content filtering.
Official .NET client library for Oblien Search API - AI-powered web search, content extraction, and deep research capabilities with real-time streaming.
Enhanced .NET MAUI WebView control with advanced browser capabilities, real-time content monitoring, and PDF processing. Features include custom user-agent configuration, debounced DOM change detection, PDF handling, cookie management, and seamless cross-platform support for Android, iOS, and Windows.
Key Features:
- Custom User-Agent and browser detection bypass
- Real-time content monitoring with intelligent debouncing (1-second delay)
- Automatic PDF download handling and text extraction
- Link extraction with Routes (all page links) and BodyRoutes (content-only links)
- Full cookie and storage support
- WebRTC and WebGL compatibility
- Cross-platform implementation (Android, iOS, Windows)
- Event-based content updates via PageDataChanged event
- JavaScript injection support
- Production-ready with optimized performance
Perfect for applications requiring advanced web content interaction, monitoring, and processing.
PageProbe is a modern, extensible .NET 8 web crawling library for extracting links, multimedia, metadata, images, and text from web pages. It supports robust crawling with depth control, robots.txt compliance, and export to multiple formats (CSV, JSON, XML, Markdown, Text). Designed for reliability, testability, and easy integration in .NET applications.
AI-powered text summarization for .NET using GPT-4, Claude, Llama and 400+ models. Summarize articles, documents, legal texts, research papers, meeting notes, and more. Supports multiple summary styles (bullet points, executive, abstract), custom lengths, key point extraction, and batch processing. Perfect for content curation, document processing, and automated reporting.