Extract Last Paragraph From Text With Python

I am analyzing service desk tickets and need to extract the first timestamp from the comments column, that is, the date and time at which a service desk analyst first interacted with a ticket. I have used the datefinder.find_dates() function, which works reasonably well, but some ticket comments are very technical and contain lots of numbers and IP addresses. This seems to confuse datefinder.find_dates(), and it often returns irrelevant data. I searched for a tutorial on the function but found nothing helpful; it does not seem to be very popular. I also found two related Stack Overflow questions, but they don't address my issue. Because datefinder.find_dates() does not work well when the text contains a lot of numeric data, the only other option I can see is to extract the timestamp from the last paragraph of every observation, since it is always located at the beginning of that paragraph. I have not managed to do this myself, hence the question.

Here is a snippet of how most of the data is laid out:

2019-04-10 12:43:54 - Andras Eger (Work notes)
Sim life cycle attached

2019-04-09 17:25:38 - Timea Magyar (Additional comments)
Thank you for contacting us.
We confirm that we have received your email and we are processing the 
case.
As soon as we get any update from the resolver team, we will inform you.

2019-04-09 17:25:25 - Timea Magyar (Work notes)
VTIS: INC000033296089

2019-04-09 17:22:10 - Timea Magyar (Work notes)
This New Incident was raised on behalf of Daniel Orejuela from [code]<a 
href='new_call.do?sys_id=0b580c90dbf837404cd858a5dc961989&
sysparm_stack=new_call_list.do?sysparm_query=active=true'>CALL0109649</a>
[/code][code]<br><p><span>Call Notes

So the main question is: how do I extract the date and time from the last paragraph of every observation? In this case, the output should be:

2019-04-09 17:22:10

Answer

First split your input on \n\n (blank lines), take the last item of the resulting list, and then apply a regex to it.

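import re

text = "..."  # the full comments field of one ticket

# Paragraphs are separated by blank lines; the oldest entry is the last one.
# strip() keeps a trailing blank line from producing an empty last paragraph.
last_paragraph = text.strip().split("\n\n")[-1]

# Match a timestamp of the form YYYY-MM-DD HH:MM:SS and keep the first hit.
result = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", last_paragraph)[0]

print(result)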

Result:

2019-04-09 17:22:10
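
If the oldest entry can itself contain blank lines (the HTML call notes in the sample suggest it might), splitting on \n\n may return a fragment with no timestamp in it. One possible variation, sketched here under the assumption that every entry starts a line with a timestamp followed by " - " as in the snippet, is to match only those entry headers and take the last one. The names ENTRY_START and first_interaction are made up for this illustration, and the sample string is a shortened, partly invented stand-in for a real ticket.

```python
import re
from datetime import datetime

# Hypothetical helper pattern: a timestamp at the very start of a line,
# followed by " - " as in the entry headers of the sample data.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - ", re.MULTILINE
)

def first_interaction(comments):
    """Return the oldest entry timestamp as a datetime, or None if none is found."""
    stamps = ENTRY_START.findall(comments)
    if not stamps:
        return None
    # Entries are listed newest first, so the last header is the first interaction.
    return datetime.strptime(stamps[-1], "%Y-%m-%d %H:%M:%S")

# Shortened illustrative sample; the final line is invented to show that
# blank lines and stray numbers inside the last entry do not matter.
sample = (
    "2019-04-10 12:43:54 - Andras Eger (Work notes)\n"
    "Sim life cycle attached\n"
    "\n"
    "2019-04-09 17:22:10 - Timea Magyar (Work notes)\n"
    "This New Incident was raised on behalf of Daniel Orejuela\n"
    "\n"
    "Call Notes with a blank line above and an IP such as 10.0.0.1\n"
)

print(first_interaction(sample))
```

Because the pattern is anchored to the start of a line, numbers and IP addresses elsewhere in the text are ignored, and the function returns None instead of raising when a ticket has no matching header.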
source: stackoverflow.com