# Numpy String Partitioning: Perform Multiple Splits

## 23 July 2019 - 1 answer

I have an array of strings, each containing one or more words. I want to split / partition the array on a separator (a blank in my case), with as many splits as there are separators in the element containing the most separators. However, `numpy.char.partition` performs only a single split, regardless of how often the separator appears.

I've got:

``````
>>> a = np.array(['word', 'two words', 'and three words'])
>>> np.char.partition(a, ' ')
array([['word', '', ''],
       ['two', ' ', 'words'],
       ['and', ' ', 'three words']], dtype='<U11')
``````

I'd like to have:

``````
array([['word', '', '', '', ''],
       ['two', ' ', 'words', '', ''],
       ['and', ' ', 'three', ' ', 'words']], dtype='<U8')
``````

Approach #1

The `partition` functions don't seem to partition on every occurrence of the separator. To handle this case, we can use `np.char.split` to get the split strings and then use masking and array assignment, like so -

``````
import numpy as np

def partitions(a, sep):
    # Split based on sep
    s = np.char.split(a, sep)

    # Get concatenated split strings
    cs = np.concatenate(s)

    # Get params
    N = len(a)
    l = np.array(list(map(len, s)))
    el = 2*l-1
    ncols = el.max()

    out = np.zeros((N, ncols), dtype=cs.dtype)

    # Set up valid mask that starts at first col and covers el entries per row
    mask = el[:, None] > np.arange(ncols)

    # Assign separator into valid ones
    out[mask] = sep

    # Set up valid mask that has True at positions where words are to be assigned
    mask[:, 1::2] = False

    # Assign words
    out[mask] = cs
    return out
``````
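As a small illustration of the masking step, the sketch below builds the two masks described in the comments above for the sample input. It assumes the per-row piece counts `[1, 2, 3]` that splitting on a blank gives; the names `fill_mask` and `word_mask` are made up for this example and are not part of the answer's function.

```python
import numpy as np

# Number of pieces per row for ['word', 'two words', 'and three words']
# when split on ' ' (assumed here rather than computed).
l = np.array([1, 2, 3])

# Each row needs its pieces plus the separators between them.
el = 2 * l - 1

# True for the first el[i] columns of row i.
fill_mask = el[:, None] > np.arange(el.max())

# Keep only the even columns, which is where the words go.
word_mask = fill_mask.copy()
word_mask[:, 1::2] = False

print(fill_mask.astype(int))
print(word_mask.astype(int))
```

Each row of `fill_mask` covers the `2*l-1` slots that row needs. Clearing the odd columns leaves exactly `l.sum()` slots, one per word, so a boolean-mask assignment of the concatenated words fills them in row-major order.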

Sample runs -

``````
In [32]: a = np.array(['word', 'two words', 'and three words'])

In [33]: partitions(a, sep=' ')
Out[33]:
array([['word', '', '', '', ''],
       ['two', ' ', 'words', '', ''],
       ['and', ' ', 'three', ' ', 'words']], dtype='<U5')

In [44]: partitions(a, sep='ord')
Out[44]:
array([['w', 'ord', ''],
       ['two w', 'ord', 's'],
       ['and three w', 'ord', 's']], dtype='<U11')
``````

Approach #2

Here's another with a loop, to save on memory -

``````
import numpy as np

def partitions_loopy(a, sep):
    # Get params
    N = len(a)
    l = np.char.count(a, sep)+1       # number of pieces per row
    ncols = 2*l.max()-1
    out = np.zeros((N, ncols), dtype=a.dtype)
    for i, (a_i, L) in enumerate(zip(a, l)):
        ss = a_i.split(sep)
        out[i, 1:2*L-1:2] = sep       # separators at odd positions
        out[i, :2*L:2] = ss           # words at even positions
    return out
``````
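To cross-check either approach on small inputs, a plain-Python reference can be handy. The following is only a sketch: `partitions_reference` is a hypothetical helper name, it applies `str.partition` repeatedly to each element and pads short rows with empty strings, and it makes no attempt to be fast.

```python
import numpy as np

def partitions_reference(a, sep):
    """Hypothetical reference: repeated str.partition, padded with ''."""
    rows = []
    for s in a:
        head, found, tail = str(s).partition(sep)
        parts = [head]
        # Keep partitioning the remainder while the separator is found.
        while found:
            parts.append(found)
            head, found, tail = tail.partition(sep)
            parts.append(head)
        rows.append(parts)
    # Pad every row to the width of the longest one.
    ncols = max(map(len, rows))
    return np.array([r + [''] * (ncols - len(r)) for r in rows])

a = np.array(['word', 'two words', 'and three words'])
print(partitions_reference(a, ' '))
```

Comparing its result with the vectorized or loopy version via `np.array_equal` is a quick sanity test before running on larger arrays.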