IronWebScraper is a C# web scraping library that lets developers simulate and automate human browsing behavior to extract content, files, and images from web applications as native .NET objects. It manages politeness and multithreading in the background, leaving the developer's own application easy to understand and maintain.

IronWebScraper can be used to migrate content from existing websites, build search indexes, and monitor changes to website structure and content.

Its functionality includes:
» Read and extract structured content from web pages using the HTML DOM, JavaScript, XPath, and jQuery-style CSS selectors.
» Fast multithreading allows hundreds of simultaneous requests.
» Politely avoid overloading remote servers using IP/domain-level throttling and, optionally, by respecting robots.txt.
» Manage multiple identities, DNS, proxies, user agents, request methods, custom headers, cookies, and logins.
» Data exported from websites becomes native C# objects, which can be stored or used immediately.
» Exceptions are managed in all but the developer's own code. Errors and captchas are automatically retried on failure.
» Save, pause, resume, and autosave scrape jobs.
» Built-in web cache allows for action replay, crash recovery, and querying existing web scrape data. Change scrape logic on the fly, then replay the job without internet traffic.

Supports: .NET Framework 4.6.2+, .NET Core 3.1+, .NET Standard 2.0+, .NET 5, .NET 6, .NET 7, .NET 8, and .NET 9 on Windows, Linux, macOS, Mobile, AWS, and Azure.

Licensing and support are available for commercial deployments.

For code examples, documentation, and more, visit https://ironsoftware.com/csharp/webscraper/. For support, please email us at support@ironsoftware.com.
PM> Install-Package IronWebScraper
using IronWebScraper;

namespace YourApp
{
    public class Program
    {
        private static void Main(string[] args)
        {
            var ScrapeJob = new BlogScraper();
            ScrapeJob.Start();
        }
    }

    public class BlogScraper : WebScraper
    {
        public override void Init()
        {
            // Log all scraper activity
            LoggingLevel = LogLevel.All;
            // Queue the first page; Parse is called with its response
            Request("https://www.zyte.com/blog/", Parse);
        }

        public override void Parse(Response response)
        {
            // Record the title of every post listed on the page
            foreach (HtmlNode title_link in response.Css(".oxy-post-title"))
            {
                string strTitle = title_link.TextContentClean;
                Scrape(new ScrapedData() { { "Title", strTitle } });
            }

            // Follow the first pagination link, if one exists
            if (response.CssExists("div.oxy-easy-posts-pages > a[href]"))
            {
                string next_page = response.Css("div.oxy-easy-posts-pages > a[href]")[0].Attributes["href"];
                Request(next_page, Parse);
            }
        }
    }
}

Dive deeper with our extensive documentation and examples:
Tutorials: Step-by-step guides to help you scrape your first website.
Code Examples: Concise code snippets that are easy to run.
How-To Guides: Practical, goal-oriented instructions to solve specific problems.
Demo: Demonstration guides that showcase how IronWebScraper works.
API Reference: Detailed technical descriptions of the API and its components.
Extract Web Data: Precisely extract structured content, images, and files from web pages using CSS selectors, XPath, or direct DOM manipulation.
Scrape Efficiently: Run hundreds of simultaneous requests with fast multithreading while automatically managing politeness with request throttling.
Manage Identity: Customize scraper identity by managing proxies, user agents, and cookies, and handle user logins with precision.
Control Job Flow: Manage long-running tasks with the ability to save, pause, and resume jobs.
Debug and Replay: Use the built-in web cache for crash recovery or to re-run scrapes with modified logic without making new internet requests.
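
The features above are mostly configured inside Init(). The following is a minimal sketch, not a definitive setup: the class name PoliteScraper, the target URL, the user agent string, the job name, and all numeric limits are hypothetical. The property and method names (MaxHttpConnectionLimit, OpenConnectionLimitPerHost, RateLimitPerHost, ThrottleMode, ObeyRobotsDotTxt, Identities, EnableWebCache, and Start with a job name) are written from memory of the IronWebScraper tutorials, so check them against the API Reference for your installed version.

```csharp
using System;
using IronWebScraper;

namespace ConfigSketch
{
    // Hypothetical scraper used only to illustrate configuration options.
    public class PoliteScraper : WebScraper
    {
        public override void Init()
        {
            // Scrape efficiently: cap total and per-host concurrency (example values).
            MaxHttpConnectionLimit = 80;
            OpenConnectionLimitPerHost = 25;

            // Politeness: minimum delay between requests to the same host.
            RateLimitPerHost = TimeSpan.FromMilliseconds(50);
            ThrottleMode = Throttle.ByDomainHostName;
            ObeyRobotsDotTxt = true;

            // Manage identity: requests are sent using the registered identities.
            Identities.Add(new HttpIdentity
            {
                UserAgent = "Mozilla/5.0 (compatible; ExampleBot/1.0)", // made-up value
                UseCookies = true
            });

            // Debug and replay: cache responses so the job can be re-run offline.
            EnableWebCache();

            Request("https://example.com/", Parse); // placeholder URL
        }

        public override void Parse(Response response)
        {
            // Record the text of every top-level heading on the page.
            foreach (HtmlNode heading in response.Css("h1"))
            {
                Scrape(new ScrapedData() { { "Heading", heading.TextContentClean } });
            }
        }
    }

    public class Program
    {
        private static void Main(string[] args)
        {
            var job = new PoliteScraper();
            // Control job flow: starting with a job name is assumed to let
            // the scraper save its state and resume the same job later.
            job.Start("polite-scraper-job");
        }
    }
}
```

Keeping every limit in Init() means the Parse logic stays unchanged when the throttling or identity settings are tuned.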
Platforms: .NET 10, .NET 9, .NET 8, .NET 7, .NET 6, .NET 5, .NET Core 2.x and 3.x, .NET Standard 2
Framework: .NET Framework 4.6.2 (and above)
App Models: Console, Web, and Desktop Apps
Operating Systems: Windows, macOS, Linux (Debian, CentOS, Ubuntu)
Cloud & Containerization Platforms: Azure, AWS, Docker
IDEs: Microsoft Visual Studio or JetBrains ReSharper and Rider
IronWebScraper is a commercially licensed product.
Have a question or running into an issue?
Email Support: Reach out to our team directly at support@ironsoftware.com.
Live Chat Support: https://ironsoftware.com/csharp/webscraper/#helpscout-support
Report a Bug: https://ironsoftware.com/ticket-submission/
Community: https://ironsoftware.com/company/iron-slack-community/