Numpy String Partitioning: Perform Multiple Splits
I have an array of strings, each containing one or more words. I want to split (partition) each element on a separator (a blank in my case), with as many splits as there are separators in the element that contains the most of them. However, numpy.char.partition performs only a single split, regardless of how often the separator appears:
I've got:
>>> a = np.array(['word', 'two words', 'and three words'])
>>> np.char.partition(a, ' ')
array([['word', '', ''],
       ['two', ' ', 'words'],
       ['and', ' ', 'three words']], dtype='<U11')
I'd like to have:
array([['word', '', '', '', ''],
       ['two', ' ', 'words', '', ''],
       ['and', ' ', 'three', ' ', 'words']], dtype='<U8')
Answer
Approach #1
The partition functions don't seem to split on every occurrence of the separator. To handle this case, we can use np.char.split to get the split strings, then fill the output array with masking and array assignment, like so -
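import numpy as np

def partitions(a, sep):
    # Split each string on sep (gives an object array of lists)
    s = np.char.split(a, sep)
    # Flatten all split pieces into one 1D string array
    cs = np.concatenate(s)
    # Get params: pieces per row and cells (pieces + separators) per row
    N = len(a)
    l = np.array(list(map(len, s)))
    el = 2*l - 1
    ncols = el.max()
    # Promote the dtype so a separator longer than every piece is not truncated
    out = np.zeros((N, ncols), dtype=np.promote_types(cs.dtype, np.asarray(sep).dtype))
    # Set up a valid mask that runs from the first column to the last used cell of each row
    mask = el[:, None] > np.arange(ncols)
    # Assign the separator into the valid cells
    out[mask] = sep
    # Keep True only at the positions where words are to be assigned
    mask[:, 1::2] = 0
    # Assign words
    out[mask] = cs
    return out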
Sample runs -
In [32]: a = np.array(['word', 'two words', 'and three words'])
In [33]: partitions(a, sep=' ')
Out[33]:
array([['word', '', '', '', ''],
       ['two', ' ', 'words', '', ''],
       ['and', ' ', 'three', ' ', 'words']], dtype='<U5')
In [44]: partitions(a, sep='ord')
Out[44]:
array([['w', 'ord', ''],
       ['two w', 'ord', 's'],
       ['and three w', 'ord', 's']], dtype='<U11')
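To make the masking step easier to follow, here is a small sketch that rebuilds the two masks for the sample input and prints them. The names fill_mask and word_mask are only illustrative and do not appear in the function above, which reuses a single mask variable.

```python
import numpy as np

a = np.array(['word', 'two words', 'and three words'])
sep = ' '

s = np.char.split(a, sep)             # object array of lists of pieces
l = np.array([len(p) for p in s])     # pieces per row: 1, 2, 3
el = 2 * l - 1                        # cells used per row: 1, 3, 5

# True for every cell a row actually uses (words and separators)
fill_mask = el[:, None] > np.arange(el.max())

# Same mask with the odd (separator) columns cleared, leaving word slots only
word_mask = fill_mask.copy()
word_mask[:, 1::2] = False

print(fill_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 1]]
print(word_mask.astype(int))
# [[1 0 0 0 0]
#  [1 0 1 0 0]
#  [1 0 1 0 1]]
```

The number of True cells in word_mask equals the total number of split pieces, which is why the flattened pieces can be assigned through the mask in one step.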
Approach #2
Here's another approach that uses a loop, to save on memory -
def partitions_loopy(a, sep):
    # Get params: pieces per row and the widest row
    N = len(a)
    l = np.char.count(a, sep) + 1
    ncols = 2*l.max() - 1
    out = np.zeros((N, ncols), dtype=a.dtype)
    for i, (a_i, L) in enumerate(zip(a, l)):
        ss = a_i.split(sep)
        # Separators go into the odd columns between the words
        out[i, 1:2*L-1:2] = sep
        # Words go into the even columns
        out[i, :2*L:2] = ss
    return out
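For checking either approach on small inputs, a plain-Python reference can help. The sketch below repeatedly applies str.partition and pads short rows with empty strings. partition_all is a hypothetical helper name, not a NumPy function, and it assumes a non-empty list of strings and a non-empty separator.

```python
import numpy as np

def partition_all(strings, sep):
    """Reference sketch: apply str.partition repeatedly, then pad rows with ''."""
    rows = []
    for text in strings:
        head, found, tail = text.partition(sep)
        parts = [head]
        # 'found' is '' once the separator no longer occurs in the remainder
        while found:
            parts.append(found)
            head, found, tail = tail.partition(sep)
            parts.append(head)
        rows.append(parts)
    ncols = max(len(r) for r in rows)
    return np.array([r + [''] * (ncols - len(r)) for r in rows])

expected = partition_all(['word', 'two words', 'and three words'], ' ')
print(expected)
# [['word' '' '' '' '']
#  ['two' ' ' 'words' '' '']
#  ['and' ' ' 'three' ' ' 'words']]
```

Comparing this result with the output of the functions above using np.array_equal is a quick way to confirm they agree on a given input.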
source: stackoverflow.com