.Net Core Fastest Way To Check For Existing Ids (Large Dataset)

- 1 answer

Once a week I run a background service that checks an export file which contains a list of ids that may need to be merged into my database. Currently my database has ~3.5 million records in it.

The first thing I do after getting all the possible new ids from the file is check if the id is already in the database. This check takes around 9-10 minutes to complete. This is a background job so its not the end of the world, I just wonder if it can be more efficient.

Here is what I'm currently using (seems to be the fastest way I've found)

var newIds = new List<int>();     
var existingIds = _context.{currentRecords}.Select(i => i.Id).ToList();
var ingestIds = ingestList.Select(i => i.Id).ToList();
newIds = ingestIds.Where(id => !existingIds.Contains(id)).ToList();

other things i've tried which are slower

foreach (var id in ingestIds)
   if (!_context.{currentRecords}.Any(i => i.Id == id)) newIds.Add(id);

I've tried a couple of other slightly different methods based on searches, but they were slower and I've apparently mis-placed the code from those attempts. Any help is greatly appreciated! Thanks!



I think the problem is that you get all the data to the client-side. Maybe you can try this;

    List<TKey> ingestIds = ingestList.Select(i => i.Id);

    var existingIds = _context.{currentRecords}.Where(i => ingestIds.Contains(i.Id)).Select(i => i.Id).ToList();

    var newIds = ingestIds.Except(existingIds);

This will execute where and select operations in database.