How To Split String By Space And Words List

Assume that I have the strings below:

"USD Notional Amount: USD 50,000,000.00"
"USD Fixed Rate Payer Currency Amount: USD 10,000,000"
"USD Fixed Rate Payer Payment Dates: Annually"
"KRW Fixed Rate Payer Payment Dates: Annually"

Simply using the split function:

import pandas as pd

df = pd.DataFrame(["USD Notional Amount: USD 50,000,000.00",
                   "USD Fixed Rate Payer Currency Amount: USD 10,000,000",
                   "USD Fixed Rate Payer Payment Dates: Annually",
                   "KRW Fixed Rate Payer Payment Dates: Annually"])

# Split every row on whitespace
df[0].apply(lambda x: x.split())

[OUTPUT]

0    [USD, Notional, Amount:, USD, 50,000,000.00]                 
1    [USD, Fixed, Rate, Payer, Currency, Amount:, USD, 10,000,000]
2    [USD, Fixed, Rate, Payer, Payment, Dates:, Annually]         
3    [KRW, Fixed, Rate, Payer, Payment, Dates:, Annually]    

I want to preserve the compound terms in this list:

words_list = ["Notional Amount:","Fixed Rate Payer Currency Amount:","Fixed Rate Payer Payment Dates:"]

What I want is to split each string into a list of strings, like below:

["USD","Notional Amount:","USD", "50,000,000.00"]
["USD","Fixed Rate Payer Currency Amount:","USD","10,000,000"]
["USD","Fixed Rate Payer Payment Dates:","Annually"]
["KRW","Fixed Rate Payer Payment Dates:","Annually"]

When I split these strings, I would like to keep certain phrases intact rather than always splitting on spaces. Does anyone know how to do this kind of string split in Python? Any thoughts?

Answer

As Xhattam said, there is probably no generic way to do what you want.

However, assuming that you know which strings with spaces you don't want to split, you can do the following (from your example):

test = "USD Notional Amount: USD 50,000,000.00"
a = ['Notional Amount:', 'Fixed Rate Payer Currency Amount:', 'Fixed Rate Payer Payment Dates:']

for element in a:
    if element in test:
        # Remove the compound phrase from the string
        my_list = test.replace(element, '')
        # Collapse the double space left behind by the removal
        my_list = my_list.replace('  ', ' ')
        # Split what remains into a list of words
        my_list = my_list.split(' ')
        # Insert the phrase you removed back into the list at the wanted index
        my_list.insert(1, element)
        break

Now you should be able to print my_list and get the following result:

print(my_list)
['USD', 'Notional Amount:', 'USD', '50,000,000.00']

This is a specific example: it assumes the phrase belongs at index 1, which holds for all four of your strings. You can easily adapt it to your other strings.
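If the phrase will not always sit at index 1, one possible generalization is a regular expression that tries each known phrase first and falls back to any run of non-whitespace characters. The sketch below is only an illustration, not the method from the answer above: the function name split_keep_phrases is made up for this example, and it assumes that every phrase you want to keep is listed in words_list and starts at a word boundary.

```python
import re

# Phrases from the question that must survive as single tokens.
words_list = [
    "Notional Amount:",
    "Fixed Rate Payer Currency Amount:",
    "Fixed Rate Payer Payment Dates:",
]


def split_keep_phrases(text, phrases):
    """Split text on whitespace, keeping each known phrase as one token.

    Hypothetical helper written for this example.
    """
    # Try longer phrases first so a phrase is never cut short by a
    # shorter phrase that happens to be its prefix.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [re.escape(p) for p in ordered]
    # The last alternative matches any ordinary word.
    pattern = re.compile("|".join(alternatives + [r"\S+"]))
    return pattern.findall(text)


lines = [
    "USD Notional Amount: USD 50,000,000.00",
    "USD Fixed Rate Payer Currency Amount: USD 10,000,000",
    "USD Fixed Rate Payer Payment Dates: Annually",
    "KRW Fixed Rate Payer Payment Dates: Annually",
]

for line in lines:
    print(split_keep_phrases(line, words_list))
```

With the DataFrame from the question, the same function could presumably be applied row by row with df[0].apply(lambda x: split_keep_phrases(x, words_list)).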

source: stackoverflow.com