In the previous section of this course, you explored Dataflow and Pub/Sub, Google Cloud’s solutions for processing streaming data. Now let’s focus your attention on BigQuery. You’ll begin by exploring BigQuery’s two main services, storage and analytics, and then get a demonstration of the BigQuery user interface. After that, you’ll see how BigQuery ML provides a data-to-AI lifecycle all within one place. You’ll also learn about BigQuery ML project phases, as well as key commands. Finally, you’ll get hands-on practice using BigQuery ML to build a custom ML model. Let’s get started.
BigQuery is a fully managed data warehouse. A data warehouse is a large store, containing terabytes or even petabytes of data gathered from a wide range of sources within an organization, that’s used to guide management decisions. At this point, it’s useful to consider the main difference between a data warehouse and a data lake. A data lake is just a pool of raw, unorganized, and unclassified data, which has no specified purpose. A data warehouse, on the other hand, contains structured and organized data, which can be used for advanced querying. Being fully managed means that BigQuery takes care of the underlying infrastructure, so you can focus on using SQL queries to answer business questions without worrying about deployment, scalability, and security.
Let’s look at some of the key features of BigQuery. BigQuery provides two services in one: storage plus analytics. It’s a place to store petabytes of data. For reference, 1 petabyte is equivalent to 11,000 movies at 4K quality. BigQuery is also a place to analyze data, with built-in features like machine learning, geospatial analysis, and business intelligence, which we will look at a bit later on.
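To put the petabyte comparison in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes decimal (SI) units, which the narration does not specify, and the function name is made up for illustration.

```python
# Assumption: decimal (SI) units, so 1 PB = 10**15 bytes and 1 GB = 10**9 bytes.
PETABYTE_BYTES = 10**15
GIGABYTE_BYTES = 10**9
MOVIES_PER_PETABYTE = 11_000  # figure quoted in the narration


def implied_movie_size_gb(movies_per_pb: int = MOVIES_PER_PETABYTE) -> float:
    """Average size in GB of one movie if `movies_per_pb` movies fill 1 PB."""
    return PETABYTE_BYTES / movies_per_pb / GIGABYTE_BYTES


if __name__ == "__main__":
    print(f"{implied_movie_size_gb():.1f} GB per movie")
```

Under that assumption, the quoted figure works out to roughly 91 GB per 4K movie.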
BigQuery is a fully managed serverless solution, meaning that you don’t need to provision resources or manage servers in the backend and can focus instead on using SQL queries to answer your organization’s questions in the frontend. If you’ve never written SQL before, don’t worry. This course provides resources and labs to help.
BigQuery has a flexible pay-as-you-go pricing model where you pay for the number of bytes of data your query processes and for any permanent table storage. If you prefer to have a fixed bill every month, you can also subscribe to flat-rate pricing where you have a reserved amount of resources for use.
Data in BigQuery is encrypted at rest by default without any action required from a customer. By encryption at rest, we mean encryption used to protect data that is stored on a disk, including solid-state drives, or backup media.
BigQuery has built-in machine learning features so you can write ML models directly in BigQuery using SQL. Also, if you decide to use other professional tools, such as Vertex AI from Google Cloud, to train your ML models, you can export datasets from BigQuery directly into Vertex AI for a seamless integration across the data-to-AI lifecycle.
So what does a typical data warehouse solution architecture look like? The input data can be either real-time or batch data. If you recall from the last module, when we discussed the four challenges of big data, in modern organizations the data can be in any format (variety), any size (volume), any speed (velocity), and possibly inaccurate (veracity). If it’s streaming data, which can be either structured or unstructured, high speed, and large volume, Pub/Sub is needed to ingest the data. If it’s batch data, it can be directly uploaded to Cloud Storage. After that, both pipelines lead to Dataflow to process the data. That’s where we ETL (extract, transform, and load) the data if needed.
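The two ingestion paths just described can be summarized in a few lines of Python. This is a conceptual sketch only: it does not call any Google Cloud API, and the names `InputKind` and `pipeline_stages` are hypothetical, introduced here purely for illustration.

```python
from enum import Enum


class InputKind(Enum):
    STREAMING = "streaming"
    BATCH = "batch"


def pipeline_stages(kind: InputKind) -> list[str]:
    """Return the ordered services the data passes through (illustrative only)."""
    # Streaming data enters through Pub/Sub; batch data is uploaded to Cloud Storage.
    entry = "Pub/Sub" if kind is InputKind.STREAMING else "Cloud Storage"
    # Both paths then go through Dataflow (ETL if needed) and land in BigQuery.
    return [entry, "Dataflow", "BigQuery"]


if __name__ == "__main__":
    for kind in InputKind:
        print(kind.value, "->", " -> ".join(pipeline_stages(kind)))
```

The only difference between the two paths is the entry point. From Dataflow onward, streaming and batch data follow the same route.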
BigQuery sits in the middle, linking data processing in Dataflow with data access through analytics, AI, and ML tools. The job of BigQuery’s analytics engine at the end of a data pipeline is to ingest all the processed data after ETL, store and analyze it, and possibly output it for further use such as data visualization and machine learning.
BigQuery outputs usually feed into two buckets: business intelligence tools and AI/ML tools. If you’re a business analyst or data analyst, you can connect to visualization tools like Looker, Data Studio, Tableau, or other BI tools. If you prefer to work in spreadsheets, you can query both small and large BigQuery datasets directly from Google Sheets and even perform common operations like pivot tables. Alternatively, if you’re a data scientist or machine learning engineer, you can directly call the data from BigQuery through AutoML or Workbench. These AI/ML tools are part of Vertex AI, Google’s unified ML platform.
BigQuery is like a common staging area for data analytics workloads. When your data is there, business analysts, BI developers, data scientists, and machine learning engineers can be granted access to your data for their own insights.