Extract Last Paragraph From Text With Python
I am analyzing service desk tickets and I need to extract the first timestamp from the comments column. That is, I need to know the date and time at which the service desk analyst interacted with a ticket for the first time. I have used the datefinder.find_dates() function and that works reasonably well, but some ticket comments are very technical and contain lots of numbers and IP addresses. This seems to confuse datefinder.find_dates(), and a lot of the time it just returns irrelevant data. I have tried searching for a tutorial on the function, but none of the ones I found are helpful; it seems that this function is not very popular.
I have also found two related Stack Overflow questions, but they don't address my issue.
Because datefinder.find_dates() does not work well when the text contains a lot of numeric data, the only other option I can see is to extract the timestamp from the last paragraph of every observation, since it is always located at the beginning of the last paragraph. I have not been able to do this myself, hence the question.
Here is a snippet of how most of the data is laid out:
2019-04-10 12:43:54 - Andras Eger (Work notes)
Sim life cycle attached
2019-04-09 17:25:38 - Timea Magyar (Additional comments)
Thank you for contacting us.
We confirm that we have received your email and we are processing the
case.
As soon as we get any update from the resolver team, we will inform you.
2019-04-09 17:25:25 - Timea Magyar (Work notes)
VTIS: INC000033296089
2019-04-09 17:22:10 - Timea Magyar (Work notes)
This New Incident was raised on behalf of Daniel Orejuela from [code]<a
href='new_call.do?sys_id=0b580c90dbf837404cd858a5dc961989&
sysparm_stack=new_call_list.do?sysparm_query=active=true'>CALL0109649</a>
[/code][code]<br><p><span>Call Notes
So the main question is: how do I extract the date and time from the last paragraph of every observation? In this case, the output should be:
2019-04-09 17:22:10
Answer
First split your input on \n\n, take the last element of the resulting list, and then use a regex.
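import re

text = "..."  # the full comment text of one ticket

# Entries are separated by a blank line; take the last one.
last_paragraph = text.split("\n\n")[-1]
# Match a timestamp such as 2019-04-09 17:22:10.
result = re.findall(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}", last_paragraph)[0]
print(result)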
Result:
2019-04-09 17:22:10
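To run the same idea over every ticket, a small helper might be more convenient than indexing with [0], which raises an IndexError when a comment has no timestamp. The following is only a sketch: the function name first_interaction and the sample comments list are made up for illustration, and it assumes entries are separated by a blank line as in the answer above.

```python
import re
from datetime import datetime

# Matches a "YYYY-MM-DD HH:MM:SS" stamp at the start of a line.
TIMESTAMP_AT_LINE_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", re.MULTILINE
)


def first_interaction(comment):
    """Return the timestamp that opens the last paragraph, or None.

    Hypothetical helper: assumes paragraphs are separated by a blank line.
    """
    if not isinstance(comment, str):
        return None  # e.g. a missing value in the comments column
    last_paragraph = comment.strip().split("\n\n")[-1]
    match = TIMESTAMP_AT_LINE_START.search(last_paragraph)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


# Illustrative data only, modelled on the snippet in the question.
comments = [
    "2019-04-10 12:43:54 - Andras Eger (Work notes)\n"
    "Sim life cycle attached\n\n"
    "2019-04-09 17:22:10 - Timea Magyar (Work notes)\n"
    "This New Incident was raised on behalf of Daniel Orejuela",
    "no timestamp here, only an address like 10.0.0.1",
    None,
]

results = [first_interaction(c) for c in comments]
print(results)
```

If the tickets live in a pandas DataFrame, the helper could presumably be applied to the column with something like df["comments"].apply(first_interaction), where the column name is an assumption.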