# Maintaining Proportionality Of Sum Constrained Row When Changing A Single Row Value

## 11 August 2019 - 1 answer

I have a dataset that consists of compositional data. Each column represents the percentage (decimal value) of a component in the mixture whole. Each row sums to 1.

If one component in the mixture changes, the rest must change accordingly so as to satisfy the sum constraint.

I am performing a multiple linear regression with this data and it requires some transformations such that the regression coefficients are meaningful and interpretable. The dataset contains zero values and this is an issue for the particular type of transformation I am trying to implement.

Before I can perform this transformation, the recommended action is to replace all zero values with a small number and to adjust the remaining components so that the sum constraint is still satisfied.

In the dummy df below, some rows contain more than one zero value.

``````
import pandas as pd

data = {'X1': [0.21, 0.08, 0.57, 0.03],
        'X2': [0.27, 0.56, 0.0, 0.02],
        'X3': [0.0, 0.14, 0.0, 0.45],
        'X4': [0.13, 0.02, 0.26, 0.37],
        'X5': [0.39, 0.2, 0.17, 0.13]}

df = pd.DataFrame(data)

print(df)
``````

Let's consider just one row. The formula is as follows.

Let the original value be `r_i`. For a change of `delta_i` in component `r_i`, the new value `x_i` is

`x_i = r_i + delta_i`

To maintain the relative proportions between the remaining components, let `r_j` be the original value of any other component. The new value `x_j` of the jth component is then

`x_j = r_j - (r_j / (1 - r_i)) * delta_i`, for `j != i`
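As a quick check of the formula on one row, here is a minimal sketch assuming NumPy. The helper name `shift_component` is hypothetical and only used for illustration.

```python
import numpy as np

def shift_component(row, i, delta_i):
    """Add delta_i to component i and rescale the others so the row still sums to 1."""
    r = np.asarray(row, dtype=float)
    r_i = r[i]
    # x_j = r_j - (r_j / (1 - r_i)) * delta_i for every j != i
    x = r - (r / (1.0 - r_i)) * delta_i
    # x_i = r_i + delta_i (overwrites the value computed for position i above)
    x[i] = r_i + delta_i
    return x

row = [0.21, 0.27, 0.0, 0.13, 0.39]  # first row of the dummy df
new_row = shift_component(row, i=2, delta_i=1e-6)

print(new_row)
print(new_row.sum())                         # sum constraint
print(new_row[0] / new_row[1], 0.21 / 0.27)  # ratio X1/X2 before and after
```

The other components lose `delta_i` in total, split in proportion to their original sizes, which is why their ratios to each other do not change.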

I am struggling to write an appropriate loop for this problem: one that searches the dataset for zero values, adds a small number to each cell containing a zero, and then adjusts the rest of that row with the formula described above.

EDIT:

Sorry for the bad representation of the maths formula.

For the first row in the dummy df the application of the formula is straightforward as there is only one zero in the row:

It's important that the relative proportions between the remaining components remain the same. You can see that here as I update the zero value to a small number.

For the third row in the dummy df things get a bit more complicated. I update the first (X2) zero value by adding a small number. The second (X3) zero value remains zero because the formula multiplies and divides zero. So I make a second update such that X2 and X3 are now small non-zero values, and this is shown on the third row of the table below.

The relative proportions between the remaining components are maintained in the same way when a row contains more than one zero.
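The two-step update for the third row can be sketched as a loop over the zero positions. This is a minimal illustration assuming NumPy and a small value of `1e-6`; the variable names are chosen here and are not from the original post.

```python
import numpy as np

delta = 1e-6
row = np.array([0.57, 0.0, 0.0, 0.26, 0.17])  # third row of the dummy df

# Positions of the zeros, found once before any update is made.
zero_positions = np.flatnonzero(row == 0)

for i in zero_positions:
    r_i = row[i]
    # Shrink every component proportionally ...
    updated = row - (row / (1.0 - r_i)) * delta
    # ... then set the component being changed to r_i + delta.
    updated[i] = r_i + delta
    row = updated

print(row)
print(row.sum())
```

Because the second update also rescales X2, the first replaced zero ends up at `delta * (1 - delta)`, slightly below `delta`, while the row still sums to 1 and the ratios between X1, X4 and X5 are unchanged.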

I can't think of a loop for the first problem, never mind the second! Also, don't worry about the large numbers created by dividing by a small number in the relative proportions tables, I'll deal with that later.

Here is the answer following edit:

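``````
import pandas as pd

# Show 10 decimal places.
pd.options.display.float_format = '{:.10f}'.format

data = {'X1': [0.21, 0.08, 0.57, 0.03],
        'X2': [0.27, 0.56, 0.0, 0.02],
        'X3': [0.0, 0.14, 0.0, 0.45],
        'X4': [0.13, 0.02, 0.26, 0.37],
        'X5': [0.39, 0.2, 0.17, 0.13]}

df = pd.DataFrame(data)

delta_i = 0.000001
r_i = 0.0


def adjust_proportion(r_j, r_i, delta_i):
    # Provided formula.
    return r_j - ((r_j / (1 - r_i)) * delta_i)


# For row-wise application.
def adjust_row(row, r_i, delta_i):
    row = row.copy()  # Do not mutate the row that apply() passes in.

    # Get all zeros and their count in the row.
    zero_mask = row == 0
    zero_count = row[zero_mask].shape[0]

    # Apply the formula to the non-zero values once per zero in the row.
    for i in range(zero_count):
        row[~zero_mask] = adjust_proportion(row[~zero_mask], r_i, delta_i)

    # Increase the masked (zero) values by delta_i across the board.
    # With more than one zero the row sums to 1 only up to terms of order delta_i**2.
    row[zero_mask] += delta_i

    return row


# Apply ROW-WISE using axis=1 and keep the returned DataFrame.
df = df.apply(lambda x: adjust_row(x, r_i, delta_i), axis=1)
print(df)

# Check sums.
print(df.sum(axis=1))
``````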

This gives the following result:

There are more optimal ways but this should take care of the general logic.
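One possible vectorized alternative is sketched below. It is a different technique from the loop above, not the same algorithm: every zero is set to exactly `delta`, and the non-zero values in a row with `k` zeros are scaled by `1 - k * delta`. It assumes each row already sums to 1; the variable names are illustrative.

```python
import pandas as pd

data = {'X1': [0.21, 0.08, 0.57, 0.03],
        'X2': [0.27, 0.56, 0.0, 0.02],
        'X3': [0.0, 0.14, 0.0, 0.45],
        'X4': [0.13, 0.02, 0.26, 0.37],
        'X5': [0.39, 0.2, 0.17, 0.13]}
df = pd.DataFrame(data)

delta = 1e-6

zero_mask = df.eq(0)             # True where a cell is exactly zero
n_zeros = zero_mask.sum(axis=1)  # number of zeros per row (k)

# Scale each row so its non-zero values sum to 1 - k * delta,
# then write delta into the cells that were zero.
scale = 1 - n_zeros * delta
adjusted = df.mul(scale, axis=0).mask(zero_mask, delta)

print(adjusted)
print(adjusted.sum(axis=1))
```

Scaling all non-zero values in a row by the same factor keeps their relative proportions, and the row total is `(1 - k * delta) + k * delta = 1`, so no loop over rows or zeros is needed.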