Google BigQuery Enables Global Queries

Google has launched global queries in BigQuery, which let a single SQL statement run across distributed, multi-region datasets. The feature removes a common obstacle to scaling analytics infrastructure: the need to centralize data before querying it. It is particularly relevant for organizations in regulated environments such as healthcare, which must manage data residency and compliance.

- Before this feature, querying data across different geographic regions required building and maintaining complex, costly, and time-consuming ETL pipelines to copy and centralize datasets into a single location. This update provides a "zero-ETL" experience by allowing a single SQL statement to directly query distributed data.
- The functionality is disabled by default to prevent accidental data transfers and costs. Administrators must explicitly enable it for each project, and users require the specific `bigquery.jobs.createGlobalQuery` IAM permission to run these queries.
- The cost of a global query has four components: the compute cost for subqueries in remote locations, the compute cost for the final query in the execution region, data replication charges for moving intermediate results, and the cost of temporarily storing that copied data.
- In terms of system design, this capability supports modern distributed architectures like data mesh by allowing decentralized, domain-owned data to be queried in place without requiring central ingestion, respecting data residency and sovereignty rules.
- For analytics engineering workflows, this simplifies dbt projects by enabling models to directly reference source tables in different regions (e.g., `ref('us_customers')` joining `ref('eu_orders')`) within a single DAG, eliminating the need for separate tools or processes to handle cross-region data movement.
- This feature enhances data observability for distributed systems by providing a unified view of data health across regions. It simplifies monitoring for key pillars of observability (such as freshness, volume, and distribution) without first needing to centralize the data.
- AI copilots and assistants that translate natural language to SQL can now generate more powerful queries that answer business questions across global datasets in a single step. This was previously not possible without a pre-aggregated, centralized view of the data.
- There is a daily data transfer limit of 180 TB between each pair of regions for global queries, and a project can execute up to 10,000 copy jobs per day as part of these queries.
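
The four cost components and the two daily limits listed above can be put together in a small planning sketch. This is only an illustration, not part of BigQuery or its client libraries: the `GlobalQueryCost` class, the `within_daily_limits` function, and the sample figures are hypothetical, and it assumes that transfer volume has already been totaled per region pair and that usage exactly at a limit is still allowed.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits as quoted in the summary above.
DAILY_TRANSFER_LIMIT_TB = 180   # per pair of regions
DAILY_COPY_JOB_LIMIT = 10_000   # per project


@dataclass
class GlobalQueryCost:
    """Hypothetical container for the four cost components of one global query.

    All amounts are in the same currency; real rates are not modeled here.
    """
    remote_compute: float  # subqueries executed in remote locations
    final_compute: float   # final query in the execution region
    replication: float     # moving intermediate results between regions
    temp_storage: float    # temporarily storing the copied data

    def total(self) -> float:
        # The total is simply the sum of the four components.
        return (
            self.remote_compute
            + self.final_compute
            + self.replication
            + self.temp_storage
        )


def within_daily_limits(
    transferred_tb_by_pair: dict[tuple[str, str], float],
    copy_jobs: int,
) -> bool:
    """Return True if planned daily usage stays within both quoted limits.

    Assumption: keys are region pairs already aggregated by the caller,
    and usage exactly at a limit counts as allowed.
    """
    if copy_jobs > DAILY_COPY_JOB_LIMIT:
        return False
    return all(
        tb <= DAILY_TRANSFER_LIMIT_TB
        for tb in transferred_tb_by_pair.values()
    )


if __name__ == "__main__":
    # Made-up figures, purely for illustration.
    cost = GlobalQueryCost(
        remote_compute=1.0,
        final_compute=2.0,
        replication=0.5,
        temp_storage=0.25,
    )
    print("estimated total:", cost.total())
    print("within limits:", within_daily_limits({("US", "EU"): 42.0}, copy_jobs=120))
```

Keeping the components separate, rather than tracking one total, makes it easier to see which part of a global query dominates the bill, for example replication charges when intermediate results are large.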
