How To Create Large Number Of Entities In Cloud Datastore

My requirement is to create a large number of entities in Google Cloud Datastore. I have CSV files and, combined, the number of entities is around 50k. I have tried the following:

1. Read a CSV file line by line and create entities in the Datastore. Issue: it works, but the request times out and cannot create all the entities in one go.

2. Uploaded all the files to the Blobstore and read them into the Datastore. Issue: I tried a Mapper function to read the CSV files uploaded to the Blobstore and create entities in the Datastore, but the mapper does not work if the file size grows beyond 2 MB. I also tried simply reading the files in a servlet, but again hit the timeout issue.

I am looking for a way to create a large number (50k+) of entities in the Datastore in one go.



Number of entities isn't the issue here (50K is relatively trivial). Finishing your request within the deadline is the issue.

It is unclear from your question where you are processing your CSVs, so I am guessing it is part of a user request - which means you have a 60 second deadline for task completion.

Task Queues

I would suggest you look into using Task Queues: when a CSV that needs processing is uploaded, push a task onto a queue for background processing.
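A minimal sketch of that hand-off. On App Engine you would `from google.appengine.api import taskqueue`; the stub class below only stands in for it so the pattern can be shown self-contained, and the `/process-csv` URL and `blob_key` parameter are hypothetical names for your worker endpoint:

```python
class _FakeTaskQueue(object):
    """Stand-in for google.appengine.api.taskqueue, for illustration only."""

    def __init__(self):
        self.tasks = []

    def add(self, url, params):
        # The real taskqueue.add also accepts queue_name=, countdown=, etc.
        self.tasks.append({'url': url, 'params': params})


taskqueue = _FakeTaskQueue()


def handle_upload(blob_key):
    """User-facing upload handler: enqueue the work and return immediately,
    instead of parsing the CSV inside the 60-second user request."""
    taskqueue.add(url='/process-csv', params={'blob_key': blob_key})
    return 'queued'
```

The key point is that the user request only records *what* to process; the actual Datastore writes happen later in the task handler, under the longer task deadline.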

When working with Task Queues, the tasks themselves still have a deadline, but one that is longer than 60 seconds (10 minutes on automatically scaled instances). You should read more about deadlines in the docs to make sure you understand how to handle them, including catching the DeadlineExceededError, so that you can save where you are up to in a CSV and resume from that position when the task is retried.
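The resume logic might look like the sketch below, which checkpoints an offset instead of relying on catching the error. The deadline constants and the idea of returning the next row index are assumptions about how you would wire this into your task handler; actual entity creation is stubbed out:

```python
import csv
import io
import time

TASK_DEADLINE = 600   # 10 min for tasks on automatically scaled instances
SAFETY_MARGIN = 60    # stop well before the hard deadline


def process_csv(text, start_row=0, budget=TASK_DEADLINE - SAFETY_MARGIN):
    """Process CSV rows from start_row onward within a time budget.

    Returns (next_row, rows_created): next_row is the offset to resume
    from on retry, or None when the whole file has been processed.
    """
    started = time.time()
    created = []
    reader = csv.reader(io.StringIO(text))
    for i, row in enumerate(reader):
        if i < start_row:
            continue  # already handled by an earlier attempt
        if time.time() - started > budget:
            return i, created  # persist i somewhere; retry resumes here
        created.append(row)  # real code would put a Datastore entity here
    return None, created
```

When a retry comes in, you pass the saved offset as `start_row`, and the task picks up exactly where the previous attempt stopped.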

Caveat on catching DeadlineExceededError

Warning: The DeadlineExceededError can potentially be raised from anywhere in your program, including finally blocks, so it could leave your program in an invalid state. This can cause deadlocks or unexpected errors in threaded code (including the built-in threading library), because locks may not be released. Note that (unlike in Java) the runtime may not terminate the process, so this could cause problems for future requests to the same instance. To be safe, you should not rely on the DeadlineExceededError, and instead ensure that your requests complete well before the time limit.

If you are concerned about the above, and cannot ensure your task completes within the 10 min deadline, you have 2 options:

  1. Switch to a manually scaled instance, which gives you a 24-hour deadline.
  2. Ensure your task saves progress and returns an error well before the 10 min deadline, so that it can be resumed correctly without having to catch the error.