Extract part of the file name and put it in a column of the Excel file, for 8 files

I am trying to extract part of each Excel file's name and add it as a new column in that same file. I want to do this for 8 files.

I can slice the part of the file name I need, but I am unable to add it as a column.

import glob
import pandas as pd

output = pd.DataFrame()
for file in glob.glob("*.xlsx"):

    x = file
    slice1, slice2 = 13, 17
    final = [x[slice1:slice2]]
    x['Out'] = final
    output.to_excel("outPut.xlsx", index = False, na_rep = "NA", header=True)

For example:

I have 8 Excel files named "ABC_Alphwise_OUT1", "ABC_Alphwise_OUT2" and so on. I first want to slice each file name to get "OUT1", "OUT2" and so on.

Then I want "OUT1" added as a column in the file "ABC_Alphwise_OUT1", "OUT2" added as a column in "ABC_Alphwise_OUT2", and so on.

I have given two sample inputs and outputs.


ABC_Alphwise_OUT1 : 1st Excel input

ABC_Alphwise_OUT2 : 2nd Excel input


ABC_Alphwise_OUT1 : 1st Excel Output

ABC_Alphwise_OUT2 : 2nd Excel Output



As I understand it now, each Excel file has some number of rows, x. You want to append a column to that file in which the extracted part of the file name (file_slice) is repeated x times, once per row.

First, get file_slice and load the Excel file as a pandas DataFrame. Then get the number of rows in the sheet (len(df.index)) and create a list containing file_slice repeated that many times. The list is assigned to a new column, Out, and the DataFrame is then written back over the original Excel file.

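import glob
import pandas as pd

# Characters 13 to 16 of "ABC_Alphwise_OUT1.xlsx" are "OUT1"
slice1, slice2 = 13, 17

for file in glob.glob('*.xlsx'):
    # Part of the file name to store, e.g. "OUT1"
    file_slice = file[slice1:slice2]
    df = pd.read_excel(file)
    # Repeat the slice once per row of the sheet
    file_slice_list = [file_slice] * len(df.index)
    df['Out'] = pd.Series(file_slice_list).values
    # Overwrite the original file, now with the extra column
    df.to_excel(file, index=False, na_rep='NA', header=True)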
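
If the prefixes of the file names ever differ in length, fixed slice positions will pick up the wrong characters. Here is a minimal sketch of one alternative, assuming the part you want is always whatever follows the last underscore in the file name. The helper names name_suffix, add_out_column and tag_excel_files are made up for this example, not something from the question. It also relies on the fact that assigning a single value to a DataFrame column repeats it on every row, so no list is needed.

```python
from pathlib import Path

import pandas as pd


def name_suffix(path):
    """Return the text after the last underscore of the file name, without the extension."""
    # Path("ABC_Alphwise_OUT1.xlsx").stem is "ABC_Alphwise_OUT1"
    return Path(path).stem.rsplit("_", 1)[-1]


def add_out_column(df, path, column="Out"):
    """Return a copy of df with the file-name suffix repeated on every row."""
    result = df.copy()
    # Assigning a scalar broadcasts it to all rows of the new column
    result[column] = name_suffix(path)
    return result


def tag_excel_files(folder="."):
    """Add the column to every .xlsx file in folder (this overwrites the files)."""
    for path in sorted(Path(folder).glob("*.xlsx")):
        df = pd.read_excel(path)
        add_out_column(df, path).to_excel(path, index=False, na_rep="NA")
```

Calling tag_excel_files() in the folder that holds the 8 files should then do the same job as the loop above. Since it overwrites the originals, it may be worth trying it on copies first.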