
Html Agility Pack HTML Parsing Engine


Attention: to get the latest official Html Agility Pack releases, please use the NuGet package.

Html Agility Pack is an HTML parsing engine written for .NET. It is available for many .NET platforms, including .NET CF (Compact Framework), WP7 (Windows Phone 7) and Silverlight.


What exactly is the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).
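To make the read/write DOM and the tolerance for malformed markup concrete, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `TolerantParsingDemo` and the sample markup are made up for illustration, while `LoadHtml`, `SelectSingleNode`, `InnerHtml` and `InnerText` are HAP members. The snippet never closes its `<p>` or `<b>` tag, yet it loads without throwing and can still be queried and edited with XPath.

```csharp
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class TolerantParsingDemo // hypothetical demo class
{
    // Loads a deliberately malformed snippet, finds the <b> element with
    // XPath, rewrites its content and returns the element's new text.
    public static string Run()
    {
        var doc = new HtmlDocument();
        // Neither <p> nor <b> is closed; HAP still builds a DOM tree.
        doc.LoadHtml("<div><p>Hello <b>world</div>");

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode bold = doc.DocumentNode.SelectSingleNode("//b");
        if (bold == null)
        {
            return null;
        }

        // The DOM is writable: replace the children of <b> with new text.
        bold.InnerHtml = "changed";
        return bold.InnerText;
    }
}
```

Calling `TolerantParsingDemo.Run()` returns `changed`; writing `doc.DocumentNode.OuterHtml` afterwards would give you the edited page back as HTML.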

Html Agility Pack now supports LINQ to Objects (via a LINQ to XML-like interface). Check out the new beta to play with this feature.
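A rough idea of what the LINQ support looks like: `HtmlNode.Descendants(name)` returns an `IEnumerable<HtmlNode>`, so the standard `System.Linq` operators apply to it. This is a sketch assuming a HAP version that ships the LINQ API (the current NuGet package does); `LinqDemo` and `GetHrefs` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class LinqDemo // hypothetical demo class
{
    // Returns the href of every <a> element that has a non-empty href,
    // in document order.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("a")                            // LINQ to XML-style axis
            .Select(a => a.GetAttributeValue("href", "")) // "" when the attribute is missing
            .Where(href => !string.IsNullOrEmpty(href))
            .ToList();
    }
}
```

Compared with the XPath route, this keeps the query in C# and lets the compiler check it, which some people find easier to read than an XPath string.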

Sample applications:

Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.
Web scanners. You can easily get to img/src or a/href values with a bunch of XPath queries.
Web scrapers. You can easily scrape any existing web page into an RSS feed, for example, with just an XSLT file serving as the binding. An example of this is provided.
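The scanner and page-fixing uses can be sketched in a few lines, assuming the HtmlAgilityPack NuGet package: one XPath query collects the `img/@src` values, and `SetAttributeValue`, `HtmlNode.CreateNode` and `AppendChild` patch the DOM. `ScanAndFixDemo` and its methods are hypothetical helpers, and the particular fix (adding an empty `alt`, appending a paragraph) is only an example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class ScanAndFixDemo // hypothetical demo class
{
    // Web scanner: collect every img/src value with a single XPath query.
    public static List<string> ImageSources(HtmlDocument doc)
    {
        var sources = new List<string>();
        // By default SelectNodes returns null, not an empty collection, on no match.
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
        if (imgs != null)
        {
            foreach (HtmlNode img in imgs)
            {
                sources.Add(img.GetAttributeValue("src", ""));
            }
        }
        return sources;
    }

    // Page fixing: give every <img> lacking an alt attribute an empty one,
    // then append a new paragraph to <body>.
    public static void FixPage(HtmlDocument doc)
    {
        HtmlNodeCollection noAlt = doc.DocumentNode.SelectNodes("//img[not(@alt)]");
        if (noAlt != null)
        {
            foreach (HtmlNode img in noAlt)
            {
                img.SetAttributeValue("alt", "");
            }
        }

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body != null)
        {
            body.AppendChild(HtmlNode.CreateNode("<p>Fixed by HAP</p>"));
        }
    }
}
```

After `FixPage`, `doc.Save(...)` writes the patched page out again, which is the "fix the page the way you want" workflow described above.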

There is no dependency on anything other than .NET's XPath implementation. There is no dependency on Internet Explorer's MSHTML DLL, W3C's HTML Tidy, an ActiveX/COM object, or anything like that. The input also does not have to adhere to XHTML or XML, although you can actually produce XML using the tool. The version posted here on CodePlex is for the .NET Framework 2.0. If you need the old version, please go to the old page or drop me a note.
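As for producing XML, here is a minimal sketch: setting the `OptionOutputAsXml` flag on an `HtmlDocument` makes `Save` emit well-formed XML even when the input HTML was sloppy (unquoted attributes, an unclosed `<br>`). It assumes the HtmlAgilityPack NuGet package; `XmlOutputDemo` and `ToXml` are hypothetical names.

```csharp
using System.IO;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class XmlOutputDemo // hypothetical demo class
{
    // Parses loose HTML and re-serializes the resulting DOM as XML.
    public static string ToXml(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputAsXml = true; // write XML instead of HTML on Save
        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer); // Save(TextWriter) writes the whole document
            return writer.ToString();
        }
    }
}
```

For example, the string returned by `XmlOutputDemo.ToXml("<div><p class=note>Hi<br>there</div>")` can be handed to `System.Xml.XmlDocument.LoadXml`, so the rest of a System.Xml pipeline can take over from there.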

Examples
http://htmlagilitypack.codeplex.com/wikipage?title=Examples

Download
http://htmlagilitypack.codeplex.com/

For More Info

http://runtingsproper.blogspot.in/2009/11/easily-extracting-links-from-snippet-of.html
http://runtingsproper.blogspot.in/2009/09/introduction-to-htmlagilitypack-library.html


Sample Code

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(@"C:\Sample.HTM");

// Select every <a> element that has an href attribute.
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//a[@href]");

// Run only if there are links in the document.
if (linkNodes != null)
{
    foreach (HtmlNode linkNode in linkNodes)
    {
        HtmlAttribute attrib = linkNode.Attributes["href"];
        // attrib.Value holds the link target; do whatever else you need here
    }
}
