
Assign Random Values Equally To Pandas Dataframe


I have a pandas DataFrame, say df, which looks like this:

Region  ID
A       111
A       222
A       333
A       444
B       555
B       666
B       777
C       888
C       999

Each region has a weight. In this case, A's weight is 2, B's weight is 2 and C's weight is 1.

A weight is never larger than the number of rows for that region in the "Region" column, meaning A's weight will never be more than 4, as we have 4 records for A.

I want to make a new column and, in this column, assign random integer values from 1 up to the region's weight, BUT these random values have to be equally distributed. For more clarity, I expect the new dataframe to look like this:

Region  ID   Random_Value
A       111      1
A       222      2 
A       333      1
A       444      2
B       555      2
B       666      2
B       777      1
C       888      1
C       999      1

When the number of rows for a region is odd, like "B", I want to assign the random values equally, but the remaining row can have any random integer value in the range.

When the number of rows for a region is even, like "A", and its weight is 2, I need to assign random integer values from 1 to 2 inclusive, and each of these integers should appear the same number of times.

I tried many ways but had no success. Is there a way to solve this problem?

My code is the following:

import numpy as np

df['Random_Value'] = np.nan

# Boolean mask for region A (the column is named 'Region', with a capital R)
A = df['Region'] == 'A'

df.loc[A, 'Random_Value'] = np.random.randint(1,3, size=A.sum())

Answer

Suppose you have a dictionary that stores each region's weight.

weight_dict = {'A':2, 'B':2, 'C':1}

I used:

  1. groupby, then a loop over it to get each group from the dataframe.
  2. np.arange to generate the possible values (1 up to the weight) from weight_dict.
  3. np.repeat to build the pool of values to draw from.
  4. np.random.choice with replace=False to draw from that pool without replacement.

Then create the new column with np.concatenate to combine the list of arrays.
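As a rough illustration of steps 2 to 4, here is what the intermediate arrays look like for a single group, using region B from the question (three rows, weight 2). The names `repeats` and `pool` are chosen here for illustration only, and the count is cast to int because np.repeat expects an integer number of repeats.

```python
import numpy as np

group_size = 3   # region B has three rows
weight = 2       # region B's weight

weight_range = np.arange(1, weight + 1)                  # array([1, 2])
repeats = int(np.ceil(group_size / len(weight_range)))   # 2
pool = np.repeat(weight_range, repeats)                  # array([1, 1, 2, 2])

# Drawing without replacement means no value can be picked more
# often than it appears in the pool (here: at most twice).
draw = np.random.choice(pool, group_size, replace=False)
print(weight_range, pool, draw)
```

For this group the draw always contains one value twice and the other once, which is the "remainder can be anything" case from the question.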

ls = []

for idx, d in df.groupby('Region'):

    group_size = d.shape[0]

    weight_range = np.arange(1, weight_dict[idx]+1)

    # np.ceil returns a float; np.repeat needs an integer repeat count
    combination = np.repeat(weight_range, int(np.ceil(group_size/len(weight_range))))

    ls.append(np.random.choice(combination, group_size, replace=False))

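# groupby yields the regions in sorted order and the result is assigned by position,
# so this assumes df is already sorted by Region
df['Random_Value'] = np.concatenate(ls)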

df

  Region   ID  Random_Value
0      A  111             2
1      A  222             1
2      A  333             1
3      A  444             2
4      B  555             1
5      B  666             2
6      B  777             2
7      C  888             1
8      C  999             1

You can print each variable inside the loop to see what happens at each step.
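One caveat about the pool approach: it caps each value at ceil(group_size / weight) occurrences, but for larger weights the counts can still differ by more than one (for example 5 rows with weight 4 could come out as 1, 1, 2, 2, 3). If a stricter balance is wanted, a possible variant, not the method above, is to give every value floor(group_size / weight) slots, hand the leftover rows distinct values at random, and shuffle. The sketch below assumes that approach; `assign_balanced` is a hypothetical helper name, and it writes by row position so the dataframe does not have to be sorted by Region.

```python
import numpy as np
import pandas as pd

def assign_balanced(df, weight_dict, seed=None):
    """Return a Series of values in 1..weight, balanced within each region."""
    rng = np.random.default_rng(seed)
    result = np.empty(len(df), dtype=int)
    # .indices maps each region to the integer positions of its rows,
    # so the rows do not need to be sorted by Region.
    for region, pos in df.groupby("Region").indices.items():
        weight = weight_dict[region]
        candidates = np.arange(1, weight + 1)
        base, extra = divmod(len(pos), weight)
        values = np.repeat(candidates, base)
        # The leftover rows get distinct values picked at random.
        leftover = rng.choice(candidates, size=extra, replace=False)
        result[pos] = rng.permutation(np.concatenate([values, leftover]))
    return pd.Series(result, index=df.index, name="Random_Value")

df = pd.DataFrame({
    "Region": list("ABACABCBA"),  # deliberately not sorted
    "ID": [111, 555, 222, 888, 333, 666, 999, 777, 444],
})
weight_dict = {"A": 2, "B": 2, "C": 1}
df["Random_Value"] = assign_balanced(df, weight_dict, seed=0)
print(df.groupby("Region")["Random_Value"].value_counts())
```

The printed counts should show that, within each region, no value appears more than once more often than any other.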

source: stackoverflow.com