How To Keep Running The Loop In Python Even After A KeyError

I am having trouble with code that builds a pixel map, specifically in the loop that groups the data by pixel within the selected area. I can't get past a KeyError. How can I handle this?

I am working with Python 3.7. I have already tried adding a check inside the loop, but the loop does not get past the first pixel, which seems to be empty. I also tried try: and except KeyError:, but in the end I get a row that I can't reshape because, obviously, the loop just skips the empty sub-dataframes. Below are the main steps of the code; 'lin' and 'col' are the integers that give the pixel position of each measurement:

1st tryout:

mean_val=[]
row=[]

for i in range (0,Ypix):

   for j in range (0,Xpix):

      data_pix = data.groupby(['lin', 'col']).get_group((i,j))[['ref', 'th']]

      if KeyError:
                data_pix = pd.DataFrame()

       else:
                mean_level= data_pix['ref'].mean()  
                row.append(mean_level)

mean_val = np.array(row).reshape(Ypix, Xpix) 

2nd tryout:

mean_val=[]
row = []

for i in range (0,Ypix):

  for j in range (0,Xpix):

      try:
         data_pix=data.groupby(['lin', 'col']).get_group((i,j))[['ref', 'th']]

      except KeyError:
         data_pix = pd.DataFrame()

      else:

         mean_level= data_pix['ref'].mean()  
         row.append(mean_level)

mean_val = np.array(row).reshape(Ypix, Xpix)

I expected to end up with a row that could be reshaped into the map, with at least an empty pixel wherever there is no data, so that the reshape works. The errors are the following:

1st tryout:

Traceback (most recent call last):
File "grid.py", line 385, in <module>
    proc.process()

File "grid.py", line 106, in process
    data_pix = data.groupby(['lin', 'col']).get_group((i,j))[['ref', 'th']]

File "C:\xxx\yyy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\groupby\groupby.py", line 680, in get_group

  raise KeyError(name)

KeyError: (0, 0)

2nd tryout:

Traceback (most recent call last):
  File "grid.py", line 379, in <module>
    proc.process()

File "grid.py", line 276, in process

   mean_val = np.array(row).reshape(Ypix, Xpix) 

ValueError: cannot reshape array of size 1506 into shape (50,50)

Could anyone help me, please?

Answer

I suppose your groupby yields groups for only a fraction of the possible combinations of i and j (for some combinations of i / j there is no corresponding group).

In that case exception handling alone (as proposed in the other answer) will not do the job, because you:

  • gather data only for existing groups,
  • then try to reshape them as if you had data for all groups.
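
The size mismatch can be reproduced with a small sketch. The frame below is made-up sample data (the question provides none), describing a 2 x 2 grid in which pixel (1, 0) has no measurements:

```python
import numpy as np
import pandas as pd

# Made-up sample data: pixel (1, 0) has no measurements.
data = pd.DataFrame({
    'lin': [0, 0, 0, 1],
    'col': [0, 0, 1, 1],
    'ref': [1.0, 3.0, 5.0, 7.0],
})
Ypix, Xpix = 2, 2

grouped = data.groupby(['lin', 'col'])
row = []
for i in range(Ypix):
    for j in range(Xpix):
        try:
            data_pix = grouped.get_group((i, j))
        except KeyError:
            continue  # the missing pixel is skipped, so nothing is appended
        row.append(data_pix['ref'].mean())

print(len(row))  # fewer values than Ypix * Xpix

try:
    np.array(row).reshape(Ypix, Xpix)
except ValueError as exc:
    print(exc)  # same kind of error as in the 2nd tryout
```

Only three means are collected for four pixels, which is why the reshape cannot succeed.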

My suggestion: instead of looping over every combination of i / j and handling the exception for each missing group, compute the means once, only for the groups that actually exist, and fill an intermediate result from them. Something like:

means = data.groupby(['lin', 'col'])['ref'].mean()

The result is a Series with:

  • a MultiIndex composed of lin and col (the pixel coordinates),
  • values: the mean of ref for each group.
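
To see that structure concretely, here is a short sketch on made-up data (an assumption on my side, since no sample was given) in which pixel (1, 0) is empty:

```python
import pandas as pd

# Made-up sample data: pixel (1, 0) has no measurements.
data = pd.DataFrame({
    'lin': [0, 0, 0, 1],
    'col': [0, 0, 1, 1],
    'ref': [1.0, 3.0, 5.0, 7.0],
})

means = data.groupby(['lin', 'col'])['ref'].mean()

print(list(means.index.names))  # names of the two MultiIndex levels
print(means.loc[(0, 0)])        # mean of 'ref' for pixel (0, 0)
print((1, 0) in means.index)    # the empty pixel has no entry at all
```

The Series holds one entry per existing group only, so empty pixels have to be added in a separate step.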

Then transcode this Series into your result table (of shape Ypix x Xpix), filling the remaining cells with some value meaning "no data" (e.g. 0).

Note: As you didn't provide any sample data, I couldn't run any tests, so all of the above is based on how I understood your case, and it most likely needs some corrections / additions to become really working code.

Supplement: How to perform transcoding (example)

Assume that means (the source Series) is:

lin  col
0    0      1
     1      2
     2      3
1    0      4
     1      5
     2      6
2    0      7
     1      8
     2      9
Name: ref, dtype: int64

Run:

Xpix = 5; Ypix = 5       # Target array size (example)
# Convert to a DataFrame (rows: lin, columns: col); fill_value covers
# (lin, col) pairs that are missing inside the existing rows / columns
df1 = means.unstack(fill_value=0)
df1.columns.name = None  # Drop the name from the column index ('col')
df1.index.name = None    # Drop the name from the row index ('lin')
# Reindex (change the shape), and fill the added rows / columns with "empty" values
df1 = df1.reindex(index=range(Ypix), columns=range(Xpix), fill_value=0)

The result is:

   0  1  2  3  4
0  1  2  3  0  0
1  4  5  6  0  0
2  7  8  9  0  0
3  0  0  0  0  0
4  0  0  0  0  0

Now you have a DataFrame with default column and row indices; if you wish, you can take df1.values, the underlying NumPy array.
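
Putting the steps together, here is a minimal end-to-end sketch on made-up data. The helper name pixel_map is hypothetical, and using NaN instead of 0 as the "no data" marker is my assumption: it keeps empty pixels distinguishable from pixels whose mean really is 0.

```python
import numpy as np
import pandas as pd

def pixel_map(data, Ypix, Xpix):
    """Return a (Ypix, Xpix) array with the mean of 'ref' per pixel.

    Hypothetical helper; pixels without data are left as NaN.
    """
    means = data.groupby(['lin', 'col'])['ref'].mean()
    # Rows: lin, columns: col. Missing (lin, col) pairs become NaN.
    grid = means.unstack()
    # Add rows / columns that do not occur in the data at all (also NaN).
    grid = grid.reindex(index=range(Ypix), columns=range(Xpix))
    return grid.to_numpy()

# Made-up sample data: pixel (1, 0) and the whole column 2 are empty.
data = pd.DataFrame({
    'lin': [0, 0, 0, 1],
    'col': [0, 0, 1, 1],
    'ref': [1.0, 3.0, 5.0, 7.0],
})

mean_val = pixel_map(data, Ypix=2, Xpix=3)
print(mean_val.shape)
print(mean_val)
```

The array already has the target shape, so no reshape is needed afterwards; if 0 is preferred as the marker, np.nan_to_num(mean_val) converts the NaN cells.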

source: stackoverflow.com