Not Recognising Hyphen On Split

I'm working with about 24k text files and am splitting some lines on '-'. It works for some files but fails to split for others.

company_participants is a list with N >= 1 elements, with each element consisting of a name followed by a hyphen ("-"), followed by the job title. To get the names, I use:

names_participants = [name.split('-')[0].strip() for name in company_participants]

On closer inspection, I found that in the failing files the separator is not recognised as "-" for some reason.

For example, the first element in company_participants is "robert isom - president"

Calling company_participants[0].split()[2] returns "-" since I've split on whitespace, and the hyphen is the third element (index 2).

When I then check whether this is equal to "-", I get False.

company_participants[0].split()[2] == "-"  # Item at index 2 is the hyphen
# Output = False

Any idea what's going on here? Is there something else that looks like a hyphen but isn't one?

Many thanks!

Answer

So I found that this has actually been answered elsewhere on StackOverflow.

Apparently I'm dealing with a "dash" and not a "hyphen". I couldn't see the difference with the naked eye, but when I copied the symbol from that answer and compared against it, company_participants[0].split()[2] == "–" returned True.
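
Rather than comparing by eye, one way to check which character is actually in the string is to print its code point and Unicode name. This is only a minimal sketch: describe_chars is a helper name made up here, and the sample string assumes the separator is an en dash (U+2013), which may not be the case in every file.

```python
import unicodedata

def describe_chars(text):
    """List (character, code point, Unicode name) for every character
    that is neither alphanumeric nor whitespace."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isalnum() and not ch.isspace()
    ]

# Assumption: the separator in the problem files is an en dash (U+2013).
sample = "robert isom \u2013 president"
print(describe_chars(sample))

# For comparison, the same line with a plain keyboard hyphen.
print(describe_chars("robert isom - president"))
```

A plain keyboard hyphen is reported as HYPHEN-MINUS (U+002D), so anything else in that position explains why the == "-" check returns False.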

#textDataProblems
#didNotSeeThatComing

Thank you StackOverflow!
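
For anyone hitting the same issue, here is a rough sketch of how the original list comprehension could be made tolerant of several dash look-alikes. The set of characters in DASHES is my own illustrative, non-exhaustive pick, participant_name is a hypothetical helper, and the second list entry is invented for the example. It also assumes the separator is surrounded by whitespace, so hyphenated names are left alone.

```python
import re

# Hyphen-minus plus a few look-alike characters (illustrative, not exhaustive):
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2212"

# Assumption: the separator always has whitespace on both sides.
DASH_SPLIT = re.compile(r"\s+[" + re.escape(DASHES) + r"]\s+")

def participant_name(entry):
    """Return the text before the first whitespace-delimited dash."""
    return DASH_SPLIT.split(entry, maxsplit=1)[0].strip()

company_participants = [
    "robert isom \u2013 president",        # en dash, as in the problem files
    "jane doe - chief financial officer",  # hypothetical entry with a plain hyphen
]

names_participants = [participant_name(entry) for entry in company_participants]
print(names_participants)
```

Normalising every dash variant to "-" with str.translate before splitting would be another option; the regex just keeps the change local to the split.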

source: stackoverflow.com