Remove All HTML from Text in Excel

Add to Favorites

How to quickly and easily remove all HTML from data copied into Excel. This tutorial includes two simple methods, one without a macro and one with a macro.


Remove HTML from Text in Excel

Remove HTML using a Macro


Remove HTML from Text in Excel

  1. Select the cell that contains the HTML and hit Ctrl + H to go to the Find/Replace window.
  2. In the Find what: input, type <*> and then leave the Replace with: input blank.
  3. If you selected more than one cell, click the Replace All button to work on all of the selected cells; if you selected just one single cell, click just the Replace button, otherwise it will perform a Find/Replace on the entire worksheet. After this, a window will appear and tell you have many replacements were made.
  4. That's it!

Now all of the html will have been removed from the data and you're good to go!


How it Works

HTML code is always contained within tags that begin with < and end with > and the above method uses a wildcard character * inbetween the < and the >. This tells Excel to match anything that starts and ends with < and > and then, since we replace it with an empty string, or nothing, Excel replaces all of the HTML tags with nothing, which removes all of the tags.

The macro used below does the same thing, just with some programming code instead of the standard Find/Replace feature in the worksheet.

Weird Formatting

Sometimes when you do this, if you have a lot of text in the cell, Excel will "wrap" it and then create a huge cell that goes way down the worksheet.


To get rid of this, select the cell, go to the Home tab and click the button Wrap Text (Excel sometimes automatically applies this and you have to turn it off to get a normal sized cell again.)


Remove HTML using a Macro

Sub HTML_Removal()

Dim Cell As Range

With CreateObject("vbscript.regexp")
    .Pattern = "\<.*?\>"
    .Global = True

    For Each Cell In Selection
        Cell.Value = .Replace(Cell.Value, "")

End With

End Sub

This removes all html from the cells that you select.

Simply select some cells, run the macro, and that's it.



HTML is always contained within tags which begin with < and end with > and this is how this tutorial removes the HTML, by removing those tags. However, if you copy broken or bad HTML into Excel and use the above methods, it might return unexpected results. For instance, if there is no closing tag > on a tag, then more data than is expected might be removed. There is very little that you can do about this without putting the HTML first through a validator type setup, but you should be aware that issues are possible if the HTML is not in the incorrect format in one way or another. But, for the most part, you probably won't run into this issue when working with small data sets.

Make sure to download the sample file to get the above examples and code in Excel.

Question? Ask it in our Excel Forum

Downloadable Files: Excel File

Similar Content on TeachExcel
3 Methods to Remove Spaces Between Text in Excel
Tutorial: How to remove spaces from the middle of text in Excel.  This includes removing all space...
Fastest Way to Remove Formatting in Excel
Tutorial: Stop wasting time and follow this tip! This shows you the fastest and easiest way to remo...
Quickly Clear all Formatting in Excel
Tutorial: How to remove all formatting from cells at once in Excel. This includes removing any and...
Remove All Filtering From a Worksheet in Excel
Macro: This Excel macro removes all filtering from a worksheet in Excel. This allows you to compl...
Display all Formulas at Once in Excel
Tutorial: How to view all of the formulas at once in Excel so that you can troubleshoot the spreadsh...
Get Text from Comments in Excel Including the Author of the Comment - UDF
Macro: Output all text from a cell comment, including comment author, with this UDF in Excel. Thi...