Python, Measure Unique Elements By Row Ignoring Specific Values

- 1 answer

I have a data frame with 6 columns, where each row holds a sequence of numbers.

pd.DataFrame(FixByteOrderUnique)
Out[518]: 
         0   1   2    3    4  5
0       58  68  58   59   -1 -1
1       59  69  59   58   -1 -1
2       93  94  93   33   -1 -1
3       58  59  58   68   -1 -1
4       92  94  92   33   -1 -1
5       59  58  59   69   -1 -1
6       57  48  57   79   -1 -1
7       15  26  15  101   -1 -1

For each row, I want to count the number of unique elements, excluding the values -1, 100, 101 and 102. Valid numbers are in the range [0, 99].

What I did was write a function that excludes -1 from the count:

def myfunc(row):
    if -1 in row.values:
        return row.nunique() - 1
    else:
        return row.nunique()

and then call my function like this

pd_sequences['unique'] = pd.DataFrame(FixByteOrderUnique).apply(myfunc, axis=1)

How can I extend my function so that only numbers in [0, 99] are eligible for the uniqueness count?

Answer

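You can change myfunc to

def myfunc(row):
    # Keep only the values in [0, 99], then count the distinct ones.
    return row[(row < 100) & (row > -1)].nunique()

This uses boolean indexing: with axis=1, each row is passed to myfunc as a Series, and the mask drops -1 and anything of 100 or above before nunique() is applied.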
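To check this against the sample in the question, here is a minimal self-contained sketch. The names count_valid_unique, low and high are made up for illustration and are not from the original post. The data is copied from the printed frame. The last two lines show a possible vectorized alternative that is not part of the answer: it masks out-of-range values to NaN and relies on nunique() skipping NaN by default.

```python
import pandas as pd

# Rows copied from the question; -1 is padding and 101 is out of range.
data = [
    [58, 68, 58, 59, -1, -1],
    [59, 69, 59, 58, -1, -1],
    [93, 94, 93, 33, -1, -1],
    [58, 59, 58, 68, -1, -1],
    [92, 94, 92, 33, -1, -1],
    [59, 58, 59, 69, -1, -1],
    [57, 48, 57, 79, -1, -1],
    [15, 26, 15, 101, -1, -1],
]
df = pd.DataFrame(data)


def count_valid_unique(row, low=0, high=99):
    """Count distinct values in `row` that lie in [low, high] (hypothetical helper)."""
    return row[(row >= low) & (row <= high)].nunique()


# Row-wise apply, as in the answer: each row arrives as a Series.
df_counts = df.apply(count_valid_unique, axis=1)

# Alternative (an assumption, not from the answer): mask invalid cells to NaN,
# then count per row; nunique() ignores NaN by default.
vectorized = df.where((df >= 0) & (df <= 99)).nunique(axis=1)

print(df_counts.tolist())
print(vectorized.tolist())
```

For the last row (15, 26, 15, 101, -1, -1) both versions count only 15 and 26, whereas the function in the question would still count 101.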

source: stackoverflow.com