Getting The Location Of Indices Missing From Secondary DataFrame

Please examine the commented text in the code below in order to understand the problem.

import pandas as pd
import numpy as np

primary = pd.DataFrame(
    data = ['little','mary','had','a','swan'],
    index =pd.DatetimeIndex(['2015-09-25 12:00:00', 
                           '2015-09-25 13:00:00',
                           '2015-09-25 14:00:00',
                           '2015-09-25 15:00:00',
                           '2015-09-25 16:00:00']),
    columns=['some_nonsense'])

secondary = pd.DataFrame(
    data = ['mommy',np.nan],
    index =pd.DatetimeIndex(['2015-09-25 14:00:00',
                           '2015-09-25 15:00:00']),
    columns=['copy_me'])

# 1. secondary dataframe values have already been computed
# 2. we want to assign them to the primary dataframe for available dates
# 3. once done, we want to return dataframe index locations for missing values
# 4. nan is one of the valid values the secondary dataframe can take

primary['copy_me'] = secondary['copy_me']

print(secondary)
print(primary)

# The values have been copied successfully.
# But how do I get the positions of the primary index entries that are missing from secondary?
# If I knew these positions, I could pass them to my computing function.
# The expected result is as follows:

missing_indices = np.array([0,1,4])
print('needed result: ', missing_indices)

Answer

If I understand correctly, this might help:

(~primary.index.isin(secondary.index)).nonzero()[0]

Breakdown:

  1. Find which primary index labels are present in secondary (primary.index.isin(secondary.index)).
  2. Negate that condition (~).
  3. Find the positions where the value is non-zero, meaning True, using numpy.nonzero (.nonzero()[0]; the [0] is needed because nonzero returns a tuple of arrays).
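
The three steps above can be sketched as a self-contained script using the data from the question. The intermediate names (present, absent, nan_positions) are only illustrative. The last lines show why checking the copied column for NaN would probably not be enough here: NaN is a valid value in secondary, so that check also flags position 3.

```python
import numpy as np
import pandas as pd

primary = pd.DataFrame(
    data=['little', 'mary', 'had', 'a', 'swan'],
    index=pd.DatetimeIndex(['2015-09-25 12:00:00',
                            '2015-09-25 13:00:00',
                            '2015-09-25 14:00:00',
                            '2015-09-25 15:00:00',
                            '2015-09-25 16:00:00']),
    columns=['some_nonsense'])

secondary = pd.DataFrame(
    data=['mommy', np.nan],
    index=pd.DatetimeIndex(['2015-09-25 14:00:00',
                            '2015-09-25 15:00:00']),
    columns=['copy_me'])

# Step 1: boolean array, True where a primary label also exists in secondary
present = primary.index.isin(secondary.index)

# Step 2: negate it, True where the label is missing from secondary
absent = ~present

# Step 3: integer positions of the True entries ([0] unpacks the 1-tuple)
missing_indices = absent.nonzero()[0]
print('missing positions:', missing_indices)  # [0 1 4]

# For comparison: a NaN check on the copied column also reports position 3,
# because secondary legitimately holds NaN at 15:00.
primary['copy_me'] = secondary['copy_me']
nan_positions = primary['copy_me'].isna().to_numpy().nonzero()[0]
print('NaN positions:', nan_positions)  # [0 1 3 4]
```

np.flatnonzero(absent) returns the same positions as absent.nonzero()[0], if the tuple indexing seems awkward.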
source: stackoverflow.com