Grab Specific Text From XML
Hello :) This is my first Python program, but it doesn't work.
What I want to do:
- import an XML file and grab only Example.swf (the text inside <vector_file>) from

    <page id="Example">
      <info> <title>page 1</title> </info>
      <vector_file>Example.swf</vector_file>
    </page>

- then download the associated file from a website (https://website.com/.../.../Example.swf)
- then rename it 1.swf (or page 1.swf)
- and loop until I reach the last file, at the end of the page (Exampleaa_idontknow.swf → 231.swf)
- convert all the files to PDF
What I have done (but it is useless, because of AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'xpath'):
    import re
    import urllib.request
    import requests
    import time
    import requests
    import lxml
    import lxml.html
    import os
    from xml.etree import ElementTree as ET

    DIR="C:/Users/mypath.../"
    for filename in os.listdir(DIR):
        if filename.endswith(".xml"):
            with open(file=DIR+".xml",mode='r',encoding='utf-8') as file:
                _tree = ET.fromstring(text=file.read())
            _all_metadata_tags = _tree.xpath('.//vector_file')
            for i in _all_metadata_tags:
                print(i.text + '\n')
        else:
            print("skipping for filename")
First of all, you need to make up your mind about which module you're going to use: lxml or xml? Import only one of them.
lxml has more features, but it's an external dependency.
xml is more basic, but it is built-in. Both modules share a lot of their API, so they are easy to confuse. Check that you're looking at the correct documentation.
For what you want to do, the built-in module is good enough. However, the .xpath() method is not supported there; the method you are looking for here is called .findall().
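`.findall()` understands a limited subset of XPath, which is enough for a path like `.//vector_file`. A minimal sketch of the difference, using the snippet from the question as an in-memory literal (`SAMPLE`, `root` and `names` are names made up for this illustration; a real file should still be handed to the parser by filename):

```python
from xml.etree import ElementTree as ET

# Sample taken from the question; parsed from a literal only for illustration.
SAMPLE = """<page id="Example">
  <info><title>page 1</title></info>
  <vector_file>Example.swf</vector_file>
</page>"""

root = ET.fromstring(SAMPLE)

# The built-in Element has no .xpath() method (that one belongs to lxml).
print(hasattr(root, "xpath"))

# .findall() takes a path expression and returns a list of matching elements.
names = [element.text for element in root.findall(".//vector_file")]
print(names)
```

The first `print` shows why the original code raised `AttributeError`; the second shows the same path expression working once it is passed to `.findall()`.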
Then you need to remember to never parse XML files by opening them as plain text files, reading them into a string, and parsing that string. Not only is this wasteful, it's fundamentally the wrong thing to do. XML parsers have built-in automatic encoding detection. This mechanism makes sure you never have to worry about file encodings, but you have to use it, too.
It's not only better, but also less code to write: use ET.parse() and pass a filename.
    import os
    from xml.etree import ElementTree as ET

    DIR = r'C:\Users\mypath'

    for filename in os.listdir(DIR):
        # Only look at XML files, regardless of the extension's case.
        if not filename.lower().endswith(".xml"):
            print("skipping for filename")
            continue

        fullname = os.path.join(DIR, filename)
        # Let the parser open the file so it can detect the encoding itself.
        tree = ET.parse(fullname)

        for vector_file in tree.findall('.//vector_file'):
            print(vector_file.text + '\n')
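To make the encoding point concrete, here is a small sketch under stated assumptions: the file name `sample.xml`, its content, and the ISO-8859-1 encoding are all invented for the demonstration. The file declares its own encoding, so `ET.parse()` decodes it correctly, while opening it as UTF-8 text (as in the question) fails before any parsing happens.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical document: declared and stored as ISO-8859-1, not UTF-8.
xml_text = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<page><vector_file>caf\u00e9.swf</vector_file></page>'
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.xml")
    with open(path, "wb") as handle:
        handle.write(xml_text.encode("iso-8859-1"))

    # ET.parse() reads the bytes itself and honors the declared encoding.
    parsed_name = ET.parse(path).find(".//vector_file").text

    # Reading the same file as UTF-8 text raises a decoding error.
    try:
        with open(path, encoding="utf-8") as handle:
            handle.read()
        utf8_read_ok = True
    except UnicodeDecodeError:
        utf8_read_ok = False

print(parsed_name)
print(utf8_read_ok)
```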
If you only expect a single <vector_file> element per file, or if you only care about the first such element, use .find() instead of .findall():

    vector_file = tree.find('.//vector_file')
    if vector_file is None:
        print('Nothing found')
    else:
        print(vector_file.text + '\n')
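If only the text is needed, `.findtext()` might shorten this a little further: it returns the text of the first match, or a default (`None` unless you pass one) when nothing matches. A small sketch on in-memory documents, where `first_vector_file` is a helper name invented here and not part of the module:

```python
from xml.etree import ElementTree as ET


def first_vector_file(root):
    """Return the text of the first <vector_file>, or None if there is none."""
    return root.findtext(".//vector_file")


# Two tiny illustrative documents: one with the element, one without.
with_match = ET.fromstring("<page><vector_file>Example.swf</vector_file></page>")
without_match = ET.fromstring("<page><info/></page>")

print(first_vector_file(with_match))
print(first_vector_file(without_match))
```

The explicit `is None` check above remains the clearer choice when you need the element itself rather than just its text.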