
Firebase Function Timeout When Calling Set Many Times On Realtime Db

In short: I have a scheduled Firebase function that downloads a file, converts it into chunks of JSON and imports those into a Realtime Database. I've given the function 2GB of memory and 540 seconds (9 minutes, the maximum) to get its work done, but it still ends in a timeout about 50% of the time. There must be a leak somewhere, but it seems to be in my blind spot.

In detail: I have the following scheduled function that runs once every hour. The function calls a method updateDatabase, which in turn downloads a file of about 35MB from an external server (using the method getNewData). This file is a whitespace-separated (kid you not) file containing 2500 rows and 2500 columns of data. I need each of these data points in a database so I can read them out quickly later.

At first I simply converted this into JSON and tried to import that into the database, since 35MB didn't seem like that big of a deal. However, this made the function run out of memory every single time. So I decided to split it up into smaller portions.

So when the file is downloaded I first split it into rows. Then I loop over the rows, split each one into columns and do a set on the Realtime Database at reference /grid/[currentrow]. This means I call set 2500 times on each run. I've tried awaiting each row as well as using Promise.all() (the current version), but both sometimes just seem to hang at some point. I don't get any errors in the log, just a timeout when the 540 seconds have passed.

The scheduled function:

exports.scheduledDataUpdate = functions
  .region("europe-west1")
  .runWith({
    timeoutSeconds: 540,
    memory: '2GB'
  })
  .pubsub.schedule("0 * * * *")
  .onRun(async () => {
    try {
      await updateDatabase();
      console.log('Database updated');
      return true;
    } catch(error) {
      console.error(error);
      return false;
    }
  });

which calls the method updateDatabase:

async function updateDatabase() {
  let data;
  try {
    data = await getNewData();
  } catch(error) {
    console.error(error);
    return null;
  }
  console.log('Data download complete');

  const lines = data.split("\n"); // split the data into rows (2500)
  const dbUpdates = []; // collect all set() promises so they can be awaited together

  lines.forEach((line, r) => {
    const cols = line.split(/\s+/); // split the row into columns (2500)
    dbUpdates.push(admin.database().ref(`/grid/${r}`).set(cols).then(() => {
      if(r > 1 && r % 500 === 0) {
        console.log(`updated row ${r}`); // just to get some info in the logs on whether rows were processed
      }
      return true;
    }).catch((error) => {
      console.error(`Error updating row ${r} -- ${error}`);
    }));
  });

  return Promise.all(dbUpdates);
}

and, in case you are curious, the method getNewData:

const https = require("https"); // Node's built-in https module, used for the download below
const dataUrl = 'https://some.server/somefile'; // I guess you guess this is different in my code (it is)

async function getNewData(url = dataUrl) {
  console.log('Start download of data');
  return await new Promise((resolve, reject) => {
    https.get(url, response => {
      if (response.statusCode === 200) {
        let data = "";
        response
          .on("data", chunk => {
            data += chunk;
          })
          .on("end", () => {
            resolve(data);
          })
          .on("error", error => {
            console.error(`error while downloading data: ${error}`);
            reject(error);
          });
      } else {
        switch (response.statusCode) {
          case 301:
          case 302:
          case 303:
          case 307:
            console.warn(`Redirected to ${response.headers.location} [${response.statusCode}]`);
            // follow the redirect and resolve with its result
            resolve(getNewData(response.headers.location));
            break;
          default:
            reject(new Error(`Could not download new data (error ${response.statusCode})`));
        }
      }
    });
  });
}

When I look at the quotas that Firebase offers (I'm running on the Blaze pay-as-you-go plan), this really shouldn't be that much of a problem to run. Clearly I'm missing something or making some stupid mistake, but for the life of me, I don't see it.

Update 1: An example of a log with a timeout (as asked for by @FrankvanPuffelen):

7:15:03.224 p.m.     scheduledDataUpdate      Function execution started
7:15:03.546 p.m.     scheduledDataUpdate      Start download of data
7:15:08.035 p.m.     scheduledDataUpdate      Data download complete
7:22:28.390 p.m.     scheduledDataUpdate      updated row 500
7:24:03.244 p.m.     scheduledDataUpdate      Function execution took 540021 ms, finished with status: 'timeout'

Answer

As you can see in the comments on the original question, Frank van Puffelen suggested this use case is simply more than the Functions environment can handle. I've done a lot of profiling and I can only conclude that this is indeed the case. So I tried the following things:

  1. Instead of trying to update 2500 rows of 2500 items each in a single hourly run (so 2500 sets, each with an object of 2500 items), I split the work into chunks of 500 rows every 12 minutes (see the first sketch after this list). That helps somewhat: the function now gets the whole database updated every hour without hitting a timeout, albeit with some delay for some rows (which is fine in my case). Unfortunately, each of these runs took around seven minutes to complete, which adds up in CPU seconds quite a bit. We're not talking massive amounts of money here, but it would come to some 40-50 dollars per month. So I wanted to try and get things a bit quicker.
  2. I realized that quite a lot of fields (around 35%, as it turned out) contained the value -1.00, which basically means "no data". (By the way, every real data item is a positive number with two decimals.) Instead of adding those to the database I figured I could solve this on the end where we request the data later on: if a value doesn't exist in the database, that means it was -1.00 to start with. So for each row (or line if you will), I now create an object with just the fields that have a positive value (see the second sketch after this list). I then ran this over all rows once, so the empty fields got deleted. And lo and behold, after that things began running fast, and I mean really fast. Instead of 7 minutes per 500 rows, we're now done in less than 10 seconds.
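
For reference, a minimal sketch of what the chunked variant could look like. The 12-minute schedule and 500-row chunk size come from the answer above, but deriving the chunk from the current 12-minute slot of the hour, as well as the name scheduledChunkedUpdate, are my own assumptions for illustration; the actual code isn't shown here.

// Sketch only -- assumes the same functions/admin/getNewData setup as in the question.
const CHUNK_SIZE = 500;

exports.scheduledChunkedUpdate = functions
  .region("europe-west1")
  .runWith({ timeoutSeconds: 540, memory: "2GB" })
  .pubsub.schedule("*/12 * * * *") // five runs per hour, 500 rows each = 2500 rows/hour
  .onRun(async () => {
    const data = await getNewData();
    const lines = data.split("\n");

    // pick the chunk based on which 12-minute slot of the hour we are in (0..4)
    const slot = Math.floor(new Date().getMinutes() / 12);
    const start = slot * CHUNK_SIZE;
    const end = Math.min(start + CHUNK_SIZE, lines.length);

    const dbUpdates = [];
    for (let r = start; r < end; r++) {
      const cols = lines[r].split(/\s+/);
      dbUpdates.push(admin.database().ref(`/grid/${r}`).set(cols));
    }
    await Promise.all(dbUpdates);
    console.log(`Updated rows ${start} to ${end - 1}`);
    return true;
  });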
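
And a rough sketch of the second change: building a sparse row object that skips the -1.00 "no data" fields. The -1.00 sentinel and the "missing key means -1.00" convention come from the answer; the helper name toSparseRow and keying fields by column index are assumptions for illustration.

// Build an object containing only the columns with a real (positive) value.
// Readers treat a missing key as -1.00 ("no data").
function toSparseRow(line) {
  const sparse = {};
  line.split(/\s+/).forEach((value, c) => {
    const num = parseFloat(value);
    if (!isNaN(num) && num > 0) { // skip the -1.00 sentinel (and any non-numeric junk)
      sparse[c] = value;          // store under the column index
    }
  });
  return sparse;
}

// Plugged into the existing loop, e.g.:
//   admin.database().ref(`/grid/${r}`).set(toSparseRow(lines[r]));
// set() replaces the whole row, so previously stored -1.00 fields
// disappear after one full pass over all rows.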

So yeah, the size of your data has a major influence on performance. Thanks again @frankvanpuffelen for the hint in the right direction!
