Problems Reading a German CSV File in Python
I have a German CSV file that I want to read with pd.read_csv.
Data:
The original file has two columns (A, B), and the separator should be ';'.
Problem: When I run the command:
dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
encoding='utf-8', header=None, sep=';')
I get the error:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3
Half-solution: I understand this could have several causes, but when I run the command:
dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
encoding='utf-8', header=None, sep='delimiter')
I get the following dataset back:
0
0 Etat;Die ARD-Tochter Degeto hat sich verpflich...
1 Etat;App sei nicht so angenommen worden wie ge...
2 Etat;'Zum Welttag der Suizidprävention ist es ...
3 Etat;Mitarbeiter überreichten Eigentümervertre...
4 Etat;Service: Jobwechsel in der Kommunikations...
So I get only one column instead of the two desired columns.
Target: Any idea how to load the dataset correctly, so that I get:
      0  1
0  Etat  Die ARD-Tochter Degeto hat sich verpflich...
1  Etat  App sei nicht so angenommen worden wie ge...
Hints/Tries:
When I use the search function on my data in Excel, I also do not find any ; in it.
It seems that some lines have more than two columns (as you can see, for example, in lines 3 and 13 of my example).
Answer
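One possible solution is to create a one-column DataFrame by using a separator that does not occur in the data, such as delimiter, and then use Series.str.split with the n parameter and expand=True to build the new DataFrame. With n=1 each line is split only at the first ';', so any further semicolons stay inside the second column: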
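import pandas as pd

# Read each line into a single column by using a separator that does not occur in the data.
# A multi-character separator is treated as a regex, so the Python engine is requested explicitly.
dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
                      encoding='utf-8', header=None, sep='delimiter', engine='python')
# A more general choice is a character that does NOT occur in the data, such as the yen sign ¥:
#dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
#                      encoding='utf-8', header=None, sep='¥', engine='python')
# Split only at the first ';' so that later semicolons stay inside column B.
df = dataset[0].str.split(';', n=1, expand=True)
df.columns = ['A','B']
print(df)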
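To try the approach above without the original file, here is a minimal, self-contained sketch. The sample text is made up for illustration and only stands in for articles.csv; its second line contains an extra ';' inside the article text, which is the kind of line that makes sep=';' fail. The names sample, raw and extra are hypothetical. The sketch first lists the rows with more than one semicolon and then splits every row at the first ';' only.

```python
import io

import pandas as pd

# Hypothetical sample standing in for articles.csv (not the real data).
# The second line has a second ';' inside the article text.
sample = (
    "Etat;Die ARD-Tochter Degeto hat sich verpflichtet\n"
    "Etat;Service: Jobwechsel; neue Aufgaben in der Kommunikation\n"
    "Sport;Ein kurzer Text ohne weiteres Semikolon\n"
)

# Read every line into one column by using a separator that is assumed
# not to occur anywhere in the data (here the yen sign).
raw = pd.read_csv(io.StringIO(sample), header=None, sep="¥", engine="python")

# Flag the rows that contain more than one ';'. These are the rows that
# would have produced more than two fields with sep=';'.
extra = raw[0].str.count(";") > 1
print(raw.index[extra].tolist())

# Split at the first ';' only, so later semicolons remain in column B.
df = raw[0].str.split(";", n=1, expand=True)
df.columns = ["A", "B"]
print(df)
```

If the real data could contain the chosen separator character, or double quotes that the parser might interpret as quoting, reading the file line by line and splitting each line once on ';' would be a reasonable alternative.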