How To Loop The Data With Condition?

- 1 answer

I have a data set in which the 1st column is age (numerical), the 2nd column is gender (categorical), and the 3rd column is savings (numerical).

I want to find the mean and standard deviation of each numerical column and the mode of each categorical column.

I tried to collect the indices of the columns whose type is "num" and loop over them to calculate the mean and standard deviation, using the remaining indices to calculate the mode of the categorical data (here, the 2nd column). However, I got stuck in the loop.

import numpy as np

data = np.array([[11, "male",1222],[23,"female",333],[15,"male",542]])

# type of the data above
types = ["num","cat","num"]

idx = []
for i in range(2): 
    if (types[i] == "num"):
       idx.append(types[i].index)

for i in idx:
    np.mean(data[:,i].astype("float64"))

I would like the code to compute the mean and standard deviation for the numerical data and the mode for the categorical data. If possible, please avoid importing any other package (I'm not sure whether `index' comes from its own package or not).

Answer

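The parentheses in the if statement are optional and can be removed, but they are not what breaks the loop. There are two bugs: types[i].index appends the bound string method index rather than a position, so append i instead, and range(2) stops before the third column, so use range(len(types)).

...

idx = []
for i in range(len(types)):
    if types[i] == "num":
        idx.append(i)
...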

Edit: Instead of looping over a range, I would suggest iterating over your types list with enumerate, which gives you the index of each item directly.

for index, _type in enumerate(types):
    if _type == 'num':
        idx.append(index)
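
Putting the pieces together, here is a minimal sketch of the whole task using only NumPy. The helper name summarize and the layout of its return value are my own choices for illustration, not something from the question. It assumes every column not marked "num" is categorical, takes the mode via np.unique with return_counts=True, and on a tie picks the label that sorts first. Note that std() returns the population standard deviation by default; pass ddof=1 if you want the sample version.

```python
import numpy as np

# Mixing numbers and strings makes NumPy store every value as a string,
# which is why the numerical columns are cast to float64 below.
data = np.array([[11, "male", 1222], [23, "female", 333], [15, "male", 542]])
types = ["num", "cat", "num"]


def summarize(data, types):
    """Hypothetical helper: map each column index to its summary statistics."""
    summary = {}
    for index, _type in enumerate(types):
        column = data[:, index]
        if _type == "num":
            values = column.astype("float64")
            # std() uses ddof=0 (population standard deviation) by default.
            summary[index] = {"mean": values.mean(), "std": values.std()}
        else:
            # Mode: the label with the highest count.
            labels, counts = np.unique(column, return_counts=True)
            summary[index] = {"mode": labels[counts.argmax()]}
    return summary


result = summarize(data, types)
for index, stats in result.items():
    print(index, stats)
```

With the sample data, this reports a mean of 699.0 for the savings column and "male" as the mode of the gender column.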

source: stackoverflow.com