Python Pandas DataFrame: One Column Contains HTML Special Characters Such As &amp; and &lt;. Is There a Way to Remove Them?

example dataframe

I am only showing an example here. Is there a way to remove all of the special characters (i.e. not just the "&amp;" and "&lt;" shown)?

Answer

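I think the following would work with only one pass through the text. Note that the pattern only matches named entities made of letters (such as &amp; or &lt;); numeric references like &#39; are not matched.

import re
re.sub(r"&[a-zA-Z]+?;", "", corpus_of_text)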

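In a dataframe I think it's just the following (regex=True is needed in pandas 2.0 and later, where str.replace defaults to regex=False and rejects a compiled pattern):

cleaned_values = df['column2'].str.replace(re.compile(r"&[a-zA-Z]+?;"), "", regex=True)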
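
Since the question asks about all special characters, here is a rough sketch of a broader pattern that also covers numeric references such as &#39; and &#x27;. The names ENTITY_RE, strip_entities and decode_entities are made up for this illustration and are not part of pandas or the original answer. The second helper uses the standard library's html.unescape, which is an alternative if you would rather turn the entities back into real characters than delete them.

```python
import html
import re

# Hypothetical pattern: named entities (&amp;), decimal references (&#39;)
# and hexadecimal references (&#x27;).
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);")


def strip_entities(value):
    """Remove HTML entities from a string; non-strings are returned unchanged."""
    if not isinstance(value, str):
        return value
    return ENTITY_RE.sub("", value)


def decode_entities(value):
    """Alternative: replace entities with the characters they stand for."""
    if not isinstance(value, str):
        return value
    return html.unescape(value)


sample = "Tom &amp; Jerry &lt;3 it&#39;s"
print(strip_entities(sample))   # entities deleted
print(decode_entities(sample))  # entities decoded
```

On a dataframe this could probably be applied as df['column2'].map(strip_entities), assuming column2 holds strings (missing values are passed through untouched by the isinstance check).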
source: stackoverflow.com