
Concatenating Header List To Dataframe In Pandas

I am having trouble concatenating two simple DataFrames. I first load a .txt file containing the data set, and then another file containing the header for that data set.

First, I load the two DataFrames:

import pandas as pd

# data set: 26 columns, no header row
df = pd.read_csv(file_dir + file_name, sep=',', header=None, encoding='latin-1', low_memory=False)
# header file: a single row holding the 26 column names
df_column_names = pd.read_csv(file_dir + file_name_cols, sep=',', header=None, encoding='latin-1', low_memory=False)
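To reproduce the setup without the original files, here is a minimal sketch that substitutes `io.StringIO` buffers for the two .txt files. The sample rows and the three-column header are hypothetical stand-ins for the real 26-column data set; the point is only that a header file with one line of names is read as a single-row DataFrame.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the data file and the header file (3 columns instead of 26)
data_txt = io.StringIO("1,100,x\n2,200,y\n")
header_txt = io.StringIO("symboling,normalized-losses,make\n")

df = pd.read_csv(data_txt, sep=",", header=None)
df_column_names = pd.read_csv(header_txt, sep=",", header=None)

print(df.shape)               # (2, 3): rows x columns of data
print(df_column_names.shape)  # (1, 3): the header file is read as a single row
```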

Afterwards, I create a list from the header DataFrame by first transposing the table and then converting it into a list:

list_names = df_column_names.T.values.tolist()
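The nesting comes from `.values`, which always returns a 2-D NumPy array: transposing the 1×N header frame gives N rows of one element each, and `tolist()` turns every row into its own one-element list. A small sketch with three of the column names shows this:

```python
import pandas as pd

# 1-row header frame, as produced by read_csv(..., header=None) on the header file
df_column_names = pd.DataFrame([["symboling", "normalized-losses", "make"]])

transposed = df_column_names.T           # shape (3, 1): one name per row
list_names = transposed.values.tolist()  # each row becomes a one-element list

print(transposed.shape)  # (3, 1)
print(list_names)        # [['symboling'], ['normalized-losses'], ['make']]
```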

Finally, I assign the list as the column names to get the desired DataFrame:

df.columns = list_names

But I receive the following error message:

ValueError: Length mismatch: Expected axis has 26 elements, new values have 1 elements

The dimensions of my objects are: df is a DataFrame of shape (204, 26), df_column_names is a DataFrame of shape (1, 26), and list_names is a list of length 26.

After reading other threads (the most similar ones were here and here), I checked the indexes of my two DataFrames, and both seem OK:

In [4]: print(df.index)
RangeIndex(start=0, stop=205, step=1)

In [5]: print(df_column_names.index)
RangeIndex(start=0, stop=1, step=1)

In [6]: len(list_names)
Out[6]: 26

This is what list_names looks like:

In [7]: list_names
Out[7]: 
[['symboling'],
 ['normalized-losses'],
 ['make'],
 ['fuel-type'],
 ['aspiration'],
 ['num-of-doors'],
 ['body-style'],
 ['drive-wheels'],
 ['engine-location'],
 ['wheel-base'],
 ['length'],
 ['width'],
 ['height'],
 ['curb-weight'],
 ['engine-type'],
 ['num-of-cylinders'],
 ['engine-size'],
 ['fuel-system'],
 ['bore'],
 ['stroke'],
 ['compression-ratio'],
 ['horsepower'],
 ['peak-rpm'],
 ['city-mpg'],
 ['highway-mpg'],
 ['price']]

Thanks in advance for your help and advice.


Answer

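Your list_names is a list of lists, but df.columns needs a flat list of labels. When every element of the assigned list is itself a list, pandas treats the inner lists as the levels of a MultiIndex, so your 26 one-element lists become a single column label with 26 levels. That is why the error says the new values have 1 element.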
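A minimal reproduction on a three-column frame (the frame and the names "a", "b", "c" are illustrative) shows the same error for a nested list, and that flattening the inner one-element lists with a list comprehension makes the assignment work:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]])  # 3 columns, a smaller version of the 26-column df
nested = [["a"], ["b"], ["c"]]  # same structure as list_names: one-element lists

error_message = ""
try:
    df.columns = nested  # inner lists are read as MultiIndex levels -> 1 label
except ValueError as err:
    error_message = str(err)

print(error_message)
# Length mismatch: Expected axis has 3 elements, new values have 1 elements

df.columns = [name[0] for name in nested]  # flatten: keep the single item of each inner list
print(list(df.columns))  # ['a', 'b', 'c']
```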

You need to amend this line:

list_names = df_column_names.T.values.tolist()

To this:

df_column_names = df_column_names.transpose() # transpose dataframe if necessary
list_names = df_column_names[0].tolist()
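Here is the fix end to end as a sketch, using hypothetical in-memory contents in place of the two .txt files. After the transpose, the names run down the column labelled 0, so selecting that column yields a flat list of strings:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two .txt files (3 columns instead of 26)
df = pd.read_csv(io.StringIO("1,100,x\n2,200,y\n"), sep=",", header=None)
df_column_names = pd.read_csv(
    io.StringIO("symboling,normalized-losses,make\n"), sep=",", header=None
)

df_column_names = df_column_names.transpose()  # names now run down column 0
list_names = df_column_names[0].tolist()       # column 0 as a flat list of strings

df.columns = list_names
print(list(df.columns))  # ['symboling', 'normalized-losses', 'make']
```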

The transpose is only needed when the column names are stored across the first row rather than down the first column. That is the case here, since df_column_names has shape (1, 26).
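Two equivalent routes skip the transpose. These are alternatives to the approach above, sketched with hypothetical in-memory file contents: `iloc[0]` selects the first row of the header frame as a Series, and `read_csv` can apply the labels while loading through its `names` parameter.

```python
import io

import pandas as pd

header_txt = "symboling,normalized-losses,make\n"  # hypothetical header file content
data_txt = "1,100,x\n2,200,y\n"                    # hypothetical data file content

# Option 1: take the first row of the header frame directly, no transpose needed
df_column_names = pd.read_csv(io.StringIO(header_txt), sep=",", header=None)
list_names = df_column_names.iloc[0].tolist()

df = pd.read_csv(io.StringIO(data_txt), sep=",", header=None)
df.columns = list_names

# Option 2: pass the names to read_csv so they are applied while loading
df2 = pd.read_csv(io.StringIO(data_txt), sep=",", header=None, names=list_names)

print(list(df.columns) == list(df2.columns))  # True
```

Option 2 avoids a separate assignment step, which can be convenient when the header file is always read first.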

source: stackoverflow.com