
Remove All HTML Tags in a PDF File




Do you want to remove all of the HTML tags contained within your PDF form?

I'll list the two major options for tackling this issue: using an HTML parser and using a regular expression.

Option 1: Use the HTML Agility Pack

The HTML Agility Pack is an agile parser that reads, writes and handles most of what you would need to do with HTML in .NET. (As a bonus, it is also available through NuGet.)

The code below, taken from a related Stack Overflow discussion, strips all of the HTML tags from some text:

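// Requires the HtmlAgilityPack NuGet package and a reference to System.Web.
using System.Linq;
using System.Text;
using System.Web;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Properties.Resources.HtmlContents);
// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
var textNodes = doc.DocumentNode.SelectNodes("//body//text()");
var text = textNodes?.Select(node => node.InnerText) ?? Enumerable.Empty<string>();
StringBuilder output = new StringBuilder();
foreach (string line in text)
{
   output.AppendLine(line);
}
string textOnly = HttpUtility.HtmlDecode(output.ToString());

I haven't worked with the HTML Agility Pack myself, but I have heard nothing but good things about it, so I am listing it.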

Option 2: Use a Regular Expression

If you already have the entire contents of your PDF as a string, you can use the following regular expression to strip out all of the HTML tags it contains (be aware that this may affect the appearance of your PDF):

<[^>]*>

You can use it in the following way:

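// Uses a regular expression to strip your HTML tags.
// RegexOptions.Compiled improves performance only when the Regex instance is reused; for a one-off call it can be omitted.
string result = new Regex("<[^>]*>", RegexOptions.Compiled).Replace(yourString, "");

This is likely the "easier" method, but it is far from perfect. A regular expression does not understand HTML structure, so a tag whose attribute value contains a > character, for example, will be cut in the wrong place.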
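The regular expression only removes the tags themselves, so entities such as &amp; stay in the text as written. A minimal sketch of one way to handle that is shown here. It assumes the content is already in a string, and the class name HtmlTagStripper and the method name StripTags are hypothetical, chosen only for illustration. It keeps a single compiled Regex instance for reuse and decodes entities with WebUtility.HtmlDecode from System.Net after the tags are removed.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class HtmlTagStripper
{
    // Same pattern as in the post, built once so the compiled regex is reused across calls.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes anything that looks like a tag, then decodes HTML entities (e.g. &amp; becomes &).
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        string withoutTags = TagPattern.Replace(html, "");
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Calling HtmlTagStripper.StripTags("<p>Tom &amp; <b>Jerry</b></p>") returns "Tom & Jerry". Decoding after stripping, rather than before, avoids treating an escaped &lt; in the text as the start of a tag.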
