Pandas: import a MultiIndex CSV with index levels in the same column
I have a multiindex csv with the following format:
; ;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017
CO2;;;;;;;;;;;;;;;;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165;2102;2034;2095;2106;2067;2060;1935;1985;1983;1893;1865;1750;1728;1777;1736
020000 Forestry;AZZ;40;42;39;43;46;50;49;49;46;52;62;62;67;60;63;66;67;66
030000 Fishing;AZZ;785;767;746;722;645;655;629;580;501;485;472;441;351;384;352;382;387;377
; ;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017
More CO2;;;;;;;;;;;;;;;;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165;2102;2034;2095;2106;2067;2060;1935;1985;1983;1893;1865;1750;1728;1777;1736
020000 Forestry;AZZ;40;42;39;43;46;50;49;49;46;52;62;62;67;60;63;66;67;66
030000 Fishing;AZZ;785;767;746;722;645;655;629;580;501;485;472;441;351;384;352;382;387;377
So both levels of the MultiIndex are actually on the same column.
I am trying to import it as follows:
df=pd.read_csv('my.csv',sep=";",header=[0],index_col=[0])
But this returns the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 24: invalid start byte
I am not sure what position 24 refers to, or how to proceed with importing the file.
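The position in the message is a byte offset into the data being decoded, not a row or column number. Byte 0xb5 is only valid as a continuation byte in UTF-8, so it cannot start a character; in ISO-8859-1 the same byte is the micro sign (µ). A minimal sketch of the same failure, using a made-up byte string (`raw` is hypothetical and not taken from the actual file):

```python
# Hypothetical text containing a micro sign, encoded as ISO-8859-1 (Latin-1).
raw = "emissions in µg".encode("ISO-8859-1")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the byte offset of the offending byte in the input.
    bad_offset = exc.start
    bad_byte = raw[bad_offset]
    print(f"offset {bad_offset}, byte {bad_byte:#x}")

# Decoding with the encoding the data was written in succeeds.
text = raw.decode("ISO-8859-1")
print(text)
```

If the file really is ISO-8859-1, passing that encoding to the reader avoids the error.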
Here is a link to the file: https://wetransfer.com/downloads/338c3aa2ef68052b45d29c509d5bf82120191009073413/88bc558e72adc48e8683d8af2792d51d20191009073413/81d59b
Desired Output
                                                        2000    2001    2002    2003 ...
CO2       010000 Agriculture and horticulture    AZZ  2312.0  2249.0  2165.0  2102.0 ...
          020000 Forestry                        AZZ    40.0    42.0    39.0    43.0 ...
          030000 Fishing                         AZZ   785.0   767.0   746.0   722.0 ...
          060000 Extraction of oil and gas       BZ1  2174.0  2190.0  2184.0  2188.0 ...
          080090 Extraction of gravel and stone  BZ2   295.0   332.0   304.0   277.0 ...
                                                        2000    2001    2002    2003 ...
More CO2  010000 Agriculture and horticulture    AZZ  2312.0  2249.0  2165.0  2102.0 ...
          020000 Forestry                        AZZ    40.0    42.0    39.0    43.0 ...
          030000 Fishing                         AZZ   785.0   767.0   746.0   722.0 ...
          060000 Extraction of oil and gas       BZ1  2174.0  2190.0  2184.0  2188.0 ...
          080090 Extraction of gravel and stone  BZ2   295.0   332.0   304.0   277.0 ...
Answer
Setting the encoding explicitly works for me. After that, some processing is needed to build the MultiIndex:
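import pandas as pd

# Read the file with the encoding it was actually written in
df = pd.read_csv('AirEmissions117.csv',
                 sep=";",
                 encoding="ISO-8859-1")

# Flag the "type" rows (e.g. CO2): their last 5 columns contain only NaN
m = df.iloc[:, -5:].isna().all(axis=1)
# Insert a new first column holding the type, forward-filled over the data rows
df.insert(0, 'type', df.iloc[:, 0].where(m).ffill())
# Drop the type rows and use the first three columns as the MultiIndex
df = df[~m].set_index(df.columns[:3].tolist())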
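To see the steps above on data that can be run without the original file, here is a small sketch. It uses an abbreviated copy of the sample from the question (only three year columns, and the repeated year header before the second block is omitted), read from a string instead of a file. Because the sample is narrower, the NaN check looks at the last 2 columns rather than 5; the names `sample` and `out` are only for illustration.

```python
import io

import pandas as pd

# Abbreviated sample based on the question: three year columns only,
# without the repeated year header before the second block.
sample = """; ;2000;2001;2002
CO2;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165
020000 Forestry;AZZ;40;42;39
More CO2;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165
030000 Fishing;AZZ;785;767;746
"""

df = pd.read_csv(io.StringIO(sample), sep=";")

# Rows whose trailing columns are all NaN are the "type" rows (CO2, More CO2).
m = df.iloc[:, -2:].isna().all(axis=1)

# Keep the first column only on type rows, then forward-fill it downwards.
df.insert(0, "type", df.iloc[:, 0].where(m).ffill())

# Drop the type rows and move the three label columns into the index.
out = df[~m].set_index(df.columns[:3].tolist())
print(out)
```

The year columns come out as floats because the type rows contained NaN when the file was parsed, which matches the `2312.0`-style values in the desired output.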
source: stackoverflow.com