What Is The Fastest Way To Retrieve Header Names From Excel Files Using Pandas

I have some large Excel files, and I'm collecting their column names into a unique list. The code below works, but it takes ~9 minutes! Does anyone have suggestions for speeding it up?

import pandas as pd

# Raw string, so the backslashes in the Windows path are not read as escape sequences
get_col = list(pd.read_excel(r"E:\DATA\dbo.xlsx", nrows=1, engine='openpyxl').columns)
print(get_col)

Answer

Going through pandas just to get the column names of a large Excel file adds a lot of overhead. You can read the header row directly with openpyxl instead, opening the workbook in read-only mode:

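from openpyxl import load_workbook

# read_only=True loads rows lazily instead of reading the whole workbook into memory
wb = load_workbook(r"E:\DATA\dbo.xlsx", read_only=True)

columns = ()

for sheet in wb.worksheets:
    # Only row 1; values_only=True yields plain values instead of Cell objects
    for row in sheet.iter_rows(min_row=1, max_row=1, values_only=True):
        columns = row

wb.close()  # read-only workbooks keep the file open until closed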

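If the workbook has only one sheet, columns ends up as a tuple of that sheet's header values (empty header cells come back as None). With several sheets, the loop overwrites columns each time, so it holds the header row of the last sheet.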
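The question mentions collecting column names from several large files into a unique list, while the code above reads a single workbook. A minimal sketch of extending the same read-only approach follows, assuming openpyxl is installed and that every sheet keeps its header in row 1. The helper names header_rows and unique_column_names are hypothetical, and the two sample workbooks are made-up data created in a temporary folder only so the example runs on its own; in practice you would pass your own file paths.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def header_rows(path):
    """Return {sheet title: tuple of row-1 values} for one workbook (hypothetical helper)."""
    # read_only=True loads rows lazily, so iteration can stop after the header row
    wb = load_workbook(path, read_only=True)
    try:
        headers = {}
        for sheet in wb.worksheets:
            # max_row=1 limits iteration to the header row; empty sheets yield nothing
            for row in sheet.iter_rows(min_row=1, max_row=1, values_only=True):
                headers[sheet.title] = row
        return headers
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def unique_column_names(paths):
    """Collect header names from every sheet of every file, without duplicates.

    A dict is used as an insertion-ordered set, so names keep the order in
    which they first appear. Empty header cells (None) are skipped.
    """
    seen = {}
    for path in paths:
        for row in header_rows(path).values():
            for name in row:
                if name is not None:
                    seen.setdefault(name, None)
    return list(seen)


# Demo: two small sample workbooks with hypothetical data
tmp_dir = tempfile.mkdtemp()

wb = Workbook()
wb.active.title = "Sales"
wb.active.append(["id", "name", "price"])
wb.active.append([1, "widget", 9.99])
wb.create_sheet("Returns").append(["id", "reason"])
path_a = os.path.join(tmp_dir, "a.xlsx")
wb.save(path_a)

wb = Workbook()  # default sheet title is "Sheet"
wb.active.append(["name", "region", None, "price"])
path_b = os.path.join(tmp_dir, "b.xlsx")
wb.save(path_b)

names = unique_column_names([path_a, path_b])
print(names)  # ['id', 'name', 'price', 'reason', 'region']
```

Using a dict rather than a set keeps the column names in first-seen order, which is usually easier to read than an arbitrarily ordered set.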

source: stackoverflow.com