Ad

BigQuery / Shopify Order Data Query

My order import from Shopify creates a new entry in BigQuery for every order if something has changed since the last import, this way you could see how the order properties change over time and not just the last import state. This also creates multiple entries in the table for the same order where the only unique part is the _sdc_batched_at and sdc_sequence values. I'm seeing sometimes as many as 30 entries of the same order.

Table schema...

order:
  order_number: Int
  fulfillments: Array
  _sdc_batched_at: DateTime
  _sdc_sequence: Int

What I've done...

I've created a partitioned table that essentially boils down to a subset of entries between a given date range and where fulfillments > 0

Initial Query to reduce dataset...

with orders as (
    select order_number, fulfillments, _sdc_batched_at, _sdc_sequence
    from `project.shopify.orders`
    where created_at between '2018-11-08' and '2018-11-15'
    and ARRAY_LENGTH(fulfillments) > 0
)

The problem... I'm running into issues trying to use distinct or group by since fulfillments is an array and that throws things off. How do I write a query that will only return the latest order entry by _sdc_batched_at value?

Sample Data

    [
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 02:46:21.270 UTC",
        "_sdc_sequence": "1541817507934"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 03:16:16.606 UTC",
        "_sdc_sequence": "1541819139795"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
        "_sdc_sequence": "1541821046476"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
        "_sdc_sequence": "1541822755508"
    },
    {
        "order_number": "2212",
        "fulfillments": [
            {
                "tracking_url": null,
                "id": "617029074993",
                "tracking_company": "ups",
                "tracking_number": "Z1234567890"
            }
        ],
        "_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
        "_sdc_sequence": "1541821046476"
    },
    {
        "order_number": "2212",
        "fulfillments": [
            {
                "tracking_url": null,
                "id": "617029074993",
                "tracking_company": "ups",
                "tracking_number": "Z1234567890"
            }
        ],
        "_sdc_batched_at": "2018-11-10 04:1:07.952 UTC",
        "_sdc_sequence": "1541822755508"
    }
    ]

Expected Result

Return only the latest entry by _sdc_batched_at value

{
    "order_number": "5545",
    "fulfillments": [
    {
        "tracking_url": null,
        "id": "617029074993",
        "tracking_company": "ups",
        "tracking_number": "Z1234567890"
    }
    ],
    "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
    "_sdc_sequence": "1541822755508"
},
{
    "order_number": "2212",
    "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
    ],
    "_sdc_batched_at": "2018-11-10 04:1:07.952 UTC",
    "_sdc_sequence": "1541822755508"
}
Ad

Answer

Below is for BigQuery Standard SQL

SELECT AS VALUE ARRAY_AGG(t ORDER BY _sdc_batched_at DESC LIMIT 1)[OFFSET(0)] 
FROM `project.shopify.orders` t
GROUP BY order_number   

obviously you can add all needed for your WHERE clause

Ad
source: stackoverflow.com
Ad