
Splitting Korean And Number

- 1 answer

I need to separate the Korean letters from the two numbers on either side of them.

The Korean part can be one to three characters long, which makes everything even more complicated.

Here are the Korean Unicode code point ranges I know:

ㄱ ~ ㅎ: 0x3131 ~ 0x314e
ㅏ ~ ㅣ: 0x314f ~ 0x3163
가 ~ 힣: 0xac00 ~ 0xd7a3
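Those three ranges can be combined into a single regex character class. Here is a minimal Python 3 sketch; the names `KOREAN_CLASS` and `is_korean` are hypothetical, chosen only for illustration.

```python
import re

# Character class built from the three ranges listed above:
#   U+3131-U+314E  consonant jamo (ㄱ ~ ㅎ)
#   U+314F-U+3163  vowel jamo (ㅏ ~ ㅣ)
#   U+AC00-U+D7A3  precomposed syllables (가 ~ 힣)
# The first two ranges are contiguous, so they merge into U+3131-U+3163.
KOREAN_CLASS = "[\u3131-\u3163\uac00-\ud7a3]"


def is_korean(text: str) -> bool:
    """Return True if text is non-empty and every character is in the ranges above."""
    return re.fullmatch(KOREAN_CLASS + "+", text) is not None


print(is_korean("가행견"))  # True
print(is_korean("2019가"))  # False
```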

The number at the front is always 4 digits, and the number at the end is always 5 digits.

Here are some examples:

2019개회54321
2017가51584
2019가행견16997

The output I need looks like this:

Var_A = "2019"
Var_B = "가"
Var_C = "23220"

Thanks in advance ;)


Answer

There is no need for regex here. Since you know the length of the numbers, you can just slice the string.

To get the first 4 digits:

yourString[:4]

To get the Korean part:

yourString[4:-5]

To get the last 5 digits:

yourString[-5:]
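The three slices can be combined into one helper. This is a minimal Python sketch that assumes every input follows the fixed 4-digit / Korean / 5-digit layout; the function name `split_code` and its length check are hypothetical additions for illustration, not part of the answer.

```python
from typing import Tuple


def split_code(s: str) -> Tuple[str, str, str]:
    """Split strings like '2019가51584' into (front digits, Korean, back digits).

    Assumes the layout is always 4 digits, then at least one Korean
    character, then 5 digits. Only the overall length is checked.
    """
    if len(s) < 10:  # 4 digits + at least 1 Korean character + 5 digits
        raise ValueError(f"too short for the 4/Korean/5 layout: {s!r}")
    return s[:4], s[4:-5], s[-5:]


for code in ["2019개회54321", "2017가51584", "2019가행견16997"]:
    var_a, var_b, var_c = split_code(code)
    print(var_a, var_b, var_c)
# 2019 개회 54321
# 2017 가 51584
# 2019 가행견 16997
```

Because the lengths of both numbers are fixed, the Korean part is simply whatever lies between them, so its length (one to three characters) never needs to be known in advance.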

If you really want a regex, you can use this:

^(\d{4})([\u3131-\u3163\uac00-\ud7a3]+?)(\d{5})$

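In Python 3, str patterns already use Unicode matching, so no extra flag is needed; in Python 2, remember to turn on the re.UNICODE option and use a unicode pattern. The front 4 digits, the Korean text, and the last 5 digits will be in groups 1, 2 and 3, respectively.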
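Here is the regex applied in a minimal Python 3 sketch. The pattern is used unchanged; the names `PATTERN` and `split_with_regex` are hypothetical, chosen for illustration.

```python
import re
from typing import Optional, Tuple

# The pattern from the answer, unchanged. Inside a raw string the \uXXXX
# escapes are interpreted by the re module itself (supported for str
# patterns since Python 3.3).
PATTERN = re.compile(r"^(\d{4})([\u3131-\u3163\uac00-\ud7a3]+?)(\d{5})$")


def split_with_regex(s: str) -> Optional[Tuple[str, ...]]:
    """Return (front digits, Korean text, back digits), or None if s doesn't match."""
    m = PATTERN.match(s)
    if m is None:
        return None
    return m.groups()  # groups 1, 2 and 3 as a tuple


print(split_with_regex("2019가행견16997"))  # ('2019', '가행견', '16997')
print(split_with_regex("2019ABC16997"))     # None
```

Unlike plain slicing, the regex also rejects strings whose middle part is not Korean. Note that in Python 3, `\d` in a str pattern matches any Unicode decimal digit, not only 0-9; pass `re.ASCII` when compiling if only ASCII digits should be accepted (the flag does not affect the explicit Korean character class).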

source: stackoverflow.com