Solr: Deleting Documents With Angle Brackets In Id

I'm trying to delete documents from a Solr index using pysolr, both by id and by query. In both cases the operation fails for ids that contain angle brackets, such as cr-10.1002/(sici)1520-6688(199621)15:2<476::aid-pam7>3.3.co;2-2, with the following error:

pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: Unexpected character '4' (code 52) in content after '<' (malformed start element?). at [row,col {unknown-source}]: [1,53]]

The Lucene query parser documentation at https://lucene.apache.org/core/7_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters does not mention escaping angle brackets at all. I tried escaping them anyway, with no luck.

Any idea what I can do to delete these documents?

EDIT: updated the ID to match the error

Answer

I ended up using the JSON API like this:

import requests

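# Update endpoint of the collection
url = 'http://localhost:8983/solr/collection/update'

ids_to_delete = ['a', 'b<c', 'd:e']
# Send the ids as a JSON body; JSON strings need no escaping for '<', '>' or ':'
response = requests.post(url, json={'delete': ids_to_delete})
response.raise_for_status()  # fail loudly on an HTTP error such as the 400 above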
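
The wording of the error ("malformed start element") suggests the original request body was parsed as XML, where a raw `<` inside the id looks like the start of a new tag. The following is a minimal sketch of that idea, not pysolr's actual code: the helper names `json_delete_by_ids` and `xml_delete_by_id` are hypothetical, and it only builds request bodies without contacting Solr. It compares the JSON body used above with an XML delete message whose id has been XML-escaped.

```python
import json
from xml.sax.saxutils import escape


def json_delete_by_ids(ids):
    """Build a JSON delete-by-id body (hypothetical helper)."""
    # JSON strings can carry '<' and '>' unchanged.
    return json.dumps({"delete": list(ids)})


def xml_delete_by_id(doc_id):
    """Build an XML delete message with the id XML-escaped (hypothetical helper)."""
    # escape() replaces '&', '<' and '>' with '&amp;', '&lt;' and '&gt;'.
    return "<delete><id>{}</id></delete>".format(escape(doc_id))


doc_id = "cr-10.1002/(sici)1520-6688(199621)15:2<476::aid-pam7>3.3.co;2-2"

print(json_delete_by_ids([doc_id]))
print(xml_delete_by_id(doc_id))
```

If the XML route is needed for some reason, the point is that the escaping has to happen at the XML level; Lucene query-parser escaping does not help, because the failure occurs before any query is parsed.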
source: stackoverflow.com