Multiple Errors Importing Huge CSV. How To Diagnose?
I have a csv with over 1 million rows that I'm trying to import. Unfortunately I can't share a sample of the data but this is the code I'm using to import it:
transactions = pd.read_csv('bank_raw_data.csv', sep=',', error_bad_lines=False, warn_bad_lines=True, engine='python', encoding='ISO-8859-1', escapechar='\\', skiprows=[i for i in range(1,263)])
I skip rows that have errors, and below is a section of errors I'm getting:
Skipping line 1294103: ',' expected after '"'
Skipping line 1300423: field larger than field limit (131072)
Skipping line 1300695: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead
Skipping line 1294273: Expected 21 fields in line 1294273, saw 31
Unfortunately I can't check the CSV in Excel due to its size, so I don't know what's going on in line 12455 etc.
Any advice on how to diagnose these errors?
I have also changed the encoding to encoding='cp1252' but get the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4082: character maps to <undefined>
The reason I tried cp1252 as the encoding is this:
with open('bank_raw_data.csv') as f:
    print(f)
<_io.TextIOWrapper name='bank_raw_data.csv' mode='r' encoding='cp1252'>
But it fails.
You can check a specific line (line x) in PowerShell with:
Get-Content filename.csv | Select -Index x-1
Note that Select -Index starts at 0, so to read line 10 you'd pass index 9. With awk, where NR counts from 1, the equivalent is:
cat filename.csv | awk 'NR==x'
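If you'd rather stay in Python, the same line lookup can be sketched roughly as below. The helper name show_lines is hypothetical, not part of pandas. It opens the file in binary mode, so nothing is decoded and stray bytes such as 0x9d or NULL bytes show up as escapes instead of raising UnicodeDecodeError. It assumes the line numbers pandas reports match physical lines in the file, which may not hold if quoted fields contain embedded newlines or if skipped rows shift the count.

```python
def show_lines(path, line_numbers):
    """Return {line_number: raw_bytes} for the requested 1-based line numbers.

    Hypothetical helper for illustration; reads the file as raw bytes.
    """
    wanted = set(line_numbers)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    # Binary mode: no decoding happens, so bad bytes cannot raise an error.
    with open(path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            if number in wanted:
                found[number] = raw
            if number >= last:
                break  # stop early instead of scanning the whole file
    return found

# Example use (file name taken from the question):
# for number, raw in show_lines("bank_raw_data.csv", [1294103, 1300695]).items():
#     print(number, raw)  # bytes repr shows NULLs as \x00 and 0x9d as \x9d
```

Printing the raw bytes this way should make it easier to see whether a skipped line has an unescaped quote, extra delimiters, or a NULL byte.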