Scrape Data From Webpage With BeautifulSoup - How To Append Data To Existing Dataframe?

With the following code I am trying to scrape table data from a website (reference: https://towardsdatascience.com/web-scraping-scraping-table-data-1665b6b2271c):

df = pd.DataFrame(columns=headings)
for i in range (102,158):
    URL = 'http://bulibox.de/abschlusstabellen/'
    URL_ = URL + 'B100' +str(i+1) + '.html'
    r = urllib.request.urlopen(URL_).read()
    soup = BeautifulSoup(r,'lxml')
    table = soup.find('table' ,attrs={'class':'abschluss'})
    body = table.find_all("tr")
    head = body[0]
    body_rows = body[1:]
    headings = []
    for item in head.find_all('th'):
        item = (item.text).rstrip('\n')
        headings.append(item)
    all_rows = [] 
    for row_num in range(len(body_rows)):
        row = []
        for row_item in body_rows[row_num].find_all("td"):
            aa = re.sub("(\xa0)|(\n)|,","",row_item.text)
            row.append(aa)
        all_rows.append(row)
    df1 = pd.DataFrame(data=all_rows,columns=headings)
    df.append(df1, ignore_index=True)

I initialized df as an empty DataFrame with only the correct column names, then used a loop to go over the pages of the website. It seems to work partially, because df1 holds the data from the last page. But df is still the empty DataFrame I initialized. What did I do wrong here?

Answer

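df.append does not modify df in place. It returns a new DataFrame, and your loop discards that return value, so df stays empty. Appending to a DataFrame inside a loop is not the best strategy anyway (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0). Instead, collect the per-page DataFrames in a Python data structure such as a list or dict and, after the loop, concat them to get your DataFrame: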

data = []  # one DataFrame per page
for i in range(102, 158):
    # do stuff here
    df1 = ...
    data.append(df1)
df = pd.concat(data, ignore_index=True)  # concatenate once, after the loop

Output:

>>> df
     Platz             Mannschaft Spiele    S-U-N        Tore Pkt.         Statistik
0       1.       TSV 1860 München     34  20-10-4  80:40(+40)   50  Saison 1965/1966
1       2.      Borussia Dortmund     34   19-9-6  70:36(+34)   47  Saison 1965/1966
2       3.         Bayern München     34   20-7-7  71:38(+33)   47  Saison 1965/1966
3       4.          Werder Bremen     34  21-3-10  76:40(+36)   45  Saison 1965/1966
4       5.             1. FC Köln     34   19-6-9  74:41(+33)   44  Saison 1965/1966
...    ...                    ...    ...      ...         ...  ...               ...
1005   14.             Hertha BSC     34  8-11-15  41:52(-11)   35  Saison 2020/2021
1006   15.  DSC Arminia Bielefeld     34   9-8-17  26:52(-26)   35  Saison 2020/2021
1007   16.             1. FC Köln     34   8-9-17  34:60(-26)   33  Saison 2020/2021
1008   17.       SV Werder Bremen     34  7-10-17  36:57(-21)   31  Saison 2020/2021
1009   18.          FC Schalke 04     34   3-7-24  25:86(-61)   16  Saison 2020/2021

[1010 rows x 7 columns]
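
To make the pattern concrete, here is a minimal sketch that runs offline. It is not the exact code behind the output above: SAMPLE_PAGES and parse_table are hypothetical names, the two HTML snippets are abridged stand-ins for downloaded pages (rows taken from the output above, columns cut down to three), and "html.parser" is used in place of "lxml" only to avoid the extra dependency.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-ins for two downloaded pages (abridged, three columns only).
SAMPLE_PAGES = [
    """<table class="abschluss">
    <tr><th>Platz</th><th>Mannschaft</th><th>Pkt.</th></tr>
    <tr><td>1.</td><td>TSV 1860 München</td><td>50</td></tr>
    <tr><td>2.</td><td>Borussia Dortmund</td><td>47</td></tr>
    </table>""",
    """<table class="abschluss">
    <tr><th>Platz</th><th>Mannschaft</th><th>Pkt.</th></tr>
    <tr><td>17.</td><td>SV Werder Bremen</td><td>31</td></tr>
    <tr><td>18.</td><td>FC Schalke 04</td><td>16</td></tr>
    </table>""",
]


def parse_table(html):
    """Turn one 'abschluss' table into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "abschluss"})
    rows = table.find_all("tr")
    # First row holds the column headings, the rest hold the data.
    headings = [th.text.strip() for th in rows[0].find_all("th")]
    body = [
        [re.sub(r"\xa0|\n|,", "", td.text) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=headings)


# Collect one DataFrame per page in a plain list ...
frames = []
for html in SAMPLE_PAGES:
    frames.append(parse_table(html))

# ... and build the final DataFrame once, after the loop.
df = pd.concat(frames, ignore_index=True)
print(df)
```

In the real loop you would replace SAMPLE_PAGES with the HTML fetched from each URL. Because pd.concat also returns a new DataFrame, its result has to be assigned, which is exactly the step missing from df.append(df1, ignore_index=True) in the question.
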
source: stackoverflow.com