Federated queries on BigQuery: The Zero ETL approach for querying distributed data

7/13/2023

In the world of data, querying information from different sources can be a complex, time-consuming, and resource-intensive task. With federated queries on BigQuery, an approach called “Zero ETL” has taken hold, opening up new possibilities for analyzing distributed data.

Before the introduction of the Zero ETL approach offered by federated queries on BigQuery, the predominant approach for querying distributed data was ETL (Extract, Transform, Load).

ETL is a process that involves three main phases: extracting data from different sources, transforming them to make them consistent and suitable for analysis, and finally loading the data into a single centralized location.
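To make the three phases concrete, here is a minimal sketch of an ETL flow using only the Python standard library. The source data, the `invoices` table, and every function name are hypothetical, and an in-memory SQLite database stands in for the centralized location; a real pipeline would read from separate systems and load into a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice these would live in separate systems.
CRM_CSV = "customer_id,country\n1,it\n2,DE\n"
BILLING_ROWS = [
    {"customer": "1", "amount_eur": "19.90"},
    {"customer": "2", "amount_eur": "5.00"},
]


def extract():
    """Extract: read raw records from each source."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, BILLING_ROWS


def transform(crm, billing):
    """Transform: align keys and types so both sources are consistent."""
    country_by_id = {int(r["customer_id"]): r["country"].upper() for r in crm}
    return [
        (int(b["customer"]), country_by_id[int(b["customer"])], float(b["amount_eur"]))
        for b in billing
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the central store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(customer_id INTEGER, country TEXT, amount_eur REAL)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three phases in order and return the loaded database."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn
```

The data becomes queryable only after `run_etl()` has finished, and every change in a source requires another run.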

However, the ETL approach has some limitations. Implementing and maintaining ETL data flows requires significant time and resources. It also introduces a delay between the acquisition of data and its availability for analysis, since the data must be extracted, transformed, and loaded before it can be used.

Federated queries on BigQuery allow you to access and query data from different sources without having to move or copy it to a single location.

This approach removes the need for ETL, the traditional process of copying and transforming data into a single structure before it can be queried.

Instead, federated queries run directly against external data sources, providing real-time access to the data as it currently stands in the source.
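In BigQuery SQL, a federated query is written with the `EXTERNAL_QUERY` function, which takes a connection ID and a statement in the external database's own SQL dialect. The sketch below builds such a query as a string. The helper names, the connection ID `my-project.us.orders-db`, and the `orders` table are assumptions made for illustration; `run_query` additionally assumes that the `google-cloud-bigquery` client library is installed and that credentials are configured.

```python
def sql_string_literal(value: str) -> str:
    """Quote a value as a BigQuery triple-quoted string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"""{escaped}"""'


def build_federated_query(connection_id: str, external_sql: str) -> str:
    """Wrap a source-dialect statement in BigQuery's EXTERNAL_QUERY function."""
    return (
        "SELECT *\n"
        f"FROM EXTERNAL_QUERY({sql_string_literal(connection_id)}, "
        f"{sql_string_literal(external_sql)})"
    )


def run_query(sql: str):
    """Run the query on BigQuery and return the rows as dictionaries.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    The import is local so the builder above works without the library.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Hypothetical connection ID and source table.
    print(
        build_federated_query(
            "my-project.us.orders-db",
            "SELECT id, status FROM orders WHERE status = 'open'",
        )
    )
```

The inner statement is executed by the external database, so it must be valid in that database's dialect rather than in BigQuery SQL.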

The Zero ETL approach offered by federated queries has several advantages. First, it reduces the complexity and costs associated with ETL. By eliminating the need to move and transform data, organizations can save time, resources, and storage costs.

In addition, this approach offers immediate access to real-time data. Because queries run directly on external data sources, users can analyze and obtain updated information without having to wait for the ETL process to complete.

This is especially beneficial in scenarios where working with real-time data is crucial, such as the analysis of log data or transactional data.

Another strength of the Zero ETL approach is flexibility. Federated queries provide access to relational databases such as Cloud SQL and Cloud Spanner, while external tables extend the same principle to storage services such as Google Cloud Storage.

This means that organizations can combine data from different sources without having to consolidate it in one place. This ability to query distributed data opens up new possibilities for advanced analysis and obtaining a complete view of the data.
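As an illustration of combining sources, the sketch below builds a query that joins a table stored in BigQuery with the result of a federated query against an operational database. The function name, the connection ID, and the `customers` and `orders` tables with their columns are all hypothetical.

```python
def build_join_query(connection_id: str, warehouse_table: str) -> str:
    """Build a join between a BigQuery table and a federated query result."""
    # Basic guard, since the values are interpolated into SQL text.
    if "'" in connection_id or "`" in warehouse_table:
        raise ValueError("identifiers must not contain quote characters")
    return f"""
SELECT
  c.customer_id,
  c.segment,
  o.order_count
FROM `{warehouse_table}` AS c
JOIN EXTERNAL_QUERY(
  '{connection_id}',
  'SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id'
) AS o
  ON o.customer_id = c.customer_id
""".strip()


if __name__ == "__main__":
    # Hypothetical connection ID and BigQuery table.
    print(build_join_query("my-project.eu.orders-db", "my-project.analytics.customers"))
```

Aggregating inside the external statement, as done here with `GROUP BY`, keeps the amount of data transferred from the source small.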

It should be emphasized that the Zero ETL approach does not mean that ETL is completely eliminated. In some cases, it may still be necessary to transform or aggregate the data before it can be queried effectively.

However, the use of federated queries significantly reduces dependence on traditional ETL, allowing organizations to obtain information faster and with less effort.

BigQuery provides a number of advanced features for federated queries, including the ability to create custom extensions for data sources and performance optimization by distributing queries across parallel computing nodes.

This helps maintain high performance even on large volumes of distributed data.

Conclusions

Federated queries on BigQuery offer a Zero ETL approach to querying distributed data. By eliminating the need to copy and transform data, this approach reduces the complexity, cost, and time needed to access information.

Federated queries open up new perspectives for advanced analysis and obtaining a complete overview of data, offering immediate access to real-time data and the flexibility to query different data sources.

With BigQuery, organizations can embrace the Zero ETL approach and fully exploit the potential of their distributed data.

Author

Emanuele Giallella

Data Engineer

Contacts: amministrazione-value@we-plus.eu