69 packages tagged with “spider”
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real-world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams). --------------------------------------- This library is sponsored by ZZZ Projects: https://entityframework-extensions.net/ https://eval-expression.net/ https://dapper-plus.net/ --------------------------------------- HAP is trusted by companies worldwide, with over 150 million downloads.
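Since the description says the object model closely mirrors System.Xml, the same XPath query pattern can be shown with only the BCL on well-formed markup (a minimal sketch: HtmlAgilityPack applies an equivalent `SelectNodes`/`SelectSingleNode` surface to tolerant, real-world HTML; the snippet below deliberately uses `System.Xml` instead so it stands alone):

```csharp
using System.Xml;

// XPath query over well-formed markup using System.Xml only.
// HtmlAgilityPack exposes a near-identical node/XPath surface for HTML
// that need not be well-formed; this sketch shows the shared query shape.
public static class XPathDemo
{
    public static string FirstLinkHref(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);

        // Find the first anchor element that carries an href attribute.
        var node = doc.SelectSingleNode("//a[@href]");
        return node?.Attributes["href"]?.Value;
    }
}
```

The same `//a[@href]` expression works unchanged against an HTML DOM in libraries that model themselves on System.Xml.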
Deprecated, as there is a new maintainer for the original HAP project. Please check the new repo at https://github.com/zzzprojects/html-agility-pack. This is a port of the HtmlAgilityPack library, created by Simon Mourrier and Jeff Klawiter, to the .NET Core platform. This NuGet package can be used with the Universal Windows Platform, ASP.NET 5 (using .NET Core) and the full .NET Framework 4.6. Original description: This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real-world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
Abot is an open-source C# web crawler built for speed and flexibility. It takes care of the low-level plumbing (multithreading, HTTP requests, scheduling, link parsing, etc.). You just register for events to process the page data. You can also plug in your own implementations of core interfaces to take complete control over the crawl process.
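The "register for events to process the page data" pattern can be sketched in plain C# (a conceptual illustration only; the types below are hypothetical stand-ins, not Abot's actual API — consult the Abot repository for its real crawler and event types):

```csharp
using System;

// Hypothetical event args carrying a crawled page's data.
public class PageCrawledEventArgs : EventArgs
{
    public Uri Uri { get; }
    public string Content { get; }
    public PageCrawledEventArgs(Uri uri, string content) { Uri = uri; Content = content; }
}

// Conceptual event-driven crawler: consumers subscribe to events instead
// of subclassing the crawler or polling for results.
public class SimpleCrawler
{
    public event EventHandler<PageCrawledEventArgs> PageCrawled;

    // A real crawler would fetch the page itself; this sketch takes the
    // fetched content as a parameter and just raises the event.
    public void Crawl(Uri uri, string fetchedContent) =>
        PageCrawled?.Invoke(this, new PageCrawledEventArgs(uri, fetchedContent));
}

public static class CrawlDemo
{
    public static string LastContent;

    public static void Run()
    {
        var crawler = new SimpleCrawler();
        // Subscribers decide what "process the page data" means.
        crawler.PageCrawled += (sender, e) => LastContent = e.Content;
        crawler.Crawl(new Uri("https://example.com/"), "<html><body>hi</body></html>");
    }
}
```

The design keeps crawl scheduling and page processing decoupled: the crawler owns the fetch loop, subscribers own the business logic.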
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real-world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
A powerful C# web crawler that makes advanced crawling features easy to use. AbotX builds upon the open source Abot C# Web Crawler by providing a powerful set of wrappers and extensions.
Web scraper / crawler / spider. Supports robots protocol and user agent.
Core library for grabbing information and media from supported sources
HtmlMonkey is a lightweight HTML/XML parser written in C#. It allows you to parse an HTML or XML string into a hierarchy of node objects, which can then be traversed or queried using jQuery-like selectors. The library also supports creating node objects from code and producing HTML or XML from those objects.
Web Crawling and Scraping Framework
Aspose.Total for .NET is the most complete package of all .NET file format APIs offered by Aspose. It empowers developers to create, edit, render, print and convert between a wide range of popular document formats within any .NET, C#, ASP.NET and VB.NET applications.
For personal use; .NET Framework 4.5.
A simple-to-use, modular spider for web crawling, with an example-rich GitHub repository.
[DEPRECATED] Use the new package RafaelEstevam.Simple.Spider. See GitHub for details.
A simple but powerful web crawler library
A downloader for the plugin-based crawler, implemented with CEF.
Stateful programmatic web browsing, based on Python-Mechanize, which is based on Andy Lester’s Perl module WWW::Mechanize.
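"Stateful" browsing chiefly means carrying cookies and session state across requests; the .NET building block for that is a shared `CookieContainer` (a minimal BCL-only sketch, not this package's actual API):

```csharp
using System.Net;
using System.Net.Http;

// Minimal sketch of session state for programmatic browsing: one
// CookieContainer shared by every request through the same handler, so
// cookies set by one response are sent on the next request automatically.
public static class StatefulSession
{
    public static HttpClient Create(CookieContainer jar)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = jar,
            UseCookies = true  // let the handler read/write the jar for us
        };
        return new HttpClient(handler);
    }
}
```

A Mechanize-style library layers form filling, link following and history on top of exactly this kind of cookie-aware client.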
A Plugin Manager plugin that generates a robots.txt file based on DenySpider attributes on classes or methods within controllers. If UserSessionMiddleware.Plugin is also installed, it will check whether a bot is trying to access a page it has been denied, and return a 403 Forbidden result.
This client library enables working with robots.txt. Key features:
- Parse robots.txt into a typed object.
- Look up Allowed/Disallowed/Crawl-delay based on User-Agent.
- Traverse the sitemap in robots.txt for URLs.
For more info see: https://github.com/nicholasbergesen/robotsSharp/master/README.md
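The allowed/disallowed lookup such a library performs can be illustrated with a stripped-down prefix matcher (a conceptual sketch only — plain path prefixes for a single already-selected user-agent group, ignoring wildcards and crawl-delay; this is not the library's actual API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual robots.txt decision: given (Allow?, PathPrefix) rules for one
// user-agent group, apply the longest-matching-prefix rule, with Allow
// winning a length tie, as in common robots.txt practice.
public static class RobotsRules
{
    public static bool IsAllowed(IEnumerable<(bool Allow, string Path)> rules, string path)
    {
        var match = rules
            .Where(r => path.StartsWith(r.Path, StringComparison.Ordinal))
            .OrderByDescending(r => r.Path.Length)  // most specific rule first
            .ThenByDescending(r => r.Allow)         // Allow wins a length tie
            .FirstOrDefault();

        // No matching rule means the path is allowed by default.
        return match.Path == null || match.Allow;
    }
}
```

For example, with `Disallow: /private/` plus `Allow: /private/open/`, a URL under `/private/open/` is permitted because the Allow rule's prefix is longer.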
A SQLite-based storage engine for the SimpleSpider. See examples and documentation on the GitHub page.
Core code for the plugin-based crawler.
It helps you use HAP in an easier and more meaningful way via reflection. It works somewhat like Entity Framework. Go to the wiki on the GitHub page for a tutorial: https://github.com/parsalotfy/HtmlAgilityPack_Helper/wiki
LightningChart® is the fastest 2D and 3D WPF / WinForms / UWP data visualization toolkit for science and finance. It includes SignalTools components for real-time sound-device mic-in, audio out, FFT spectrum, an arbitrary multi-channel signal generator, and a WAV file stream reader.
2 WPF APIs included:
- Non-bindable, for best performance
- Bindable, for great performance, MVVM and property-binding features
LightningChart is an entirely GPU-accelerated (DirectX9, DirectX11 and WARP), performance-optimized data visualization control for presenting masses of data in 2D XY graphs, 3D XYZ, polar and Smith charts in real time. LightningChart has 1500+ properties and 150+ event handlers, which allow you to create the most flexible charting applications.
- Flexible XY charts
- Advanced 3D charts
- Smith charts
- Polar charts
- Pie/donut 3D charts
- Volumetric rendering
- Off-line vector maps and HERE on-line maps support
- Trader API
Alternatively, you can download the SDK installer (from www.LightningChart.com). Among other things, it contains an Interactive Examples app (demo) with hundreds of examples, which are easy to browse, run and extract as separate Visual Studio projects.
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real-world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams). This fork of HtmlAgilityPack has a fix in place for the RemoveChild(keepGrandChildren) bug reported here: https://htmlagilitypack.codeplex.com/workitem/9113
The Crawler-Lib Engine Test Helper simplifies the testing of tasks. It can be used to develop unit tests and integration tests for tasks.
For internal service calls.
A crawler library based on a Selenium/Chrome browser pool.
A lightweight, fast, multithreaded, multi-pipeline and flexibly configurable web crawler.