Hello, everyone. Welcome to our next presentation on the Structured Query Language. This is our first lesson within Module number 3, and we're going to be talking about Group functions. Now in module 2, we covered the Select statement and how I can modify a select statement with a where clause to reduce the number of rows that I'm bringing in for my input table. I can use the distinct keyword to reduce or eliminate the duplicate rows in my answer set. I can include an Order By on my query to cause SQL to sort my answer set. I can limit the number of rows that are shown in my answer set with the limit command. We learned about dates and date functions in Module 2, and we wrapped up talking about the topic of nulls and what does it mean to have a null value in a column? That's kind of a summary of what we did in Module 2. In Module 3, we're going to be talking about Group Functions and the Group By which kind of goes with Group Functions and the having clause. That's what we're going to cover in this next set of lectures in module number 3. We begin by talking about Group Functions. A typical query that contains a Group Function is going to start with a table and I query that table and it's going to select some rows and columns to appear in my answer set. However, when I use a function, SQL is going to create an interim answer set because it's not done yet. It's going to process the first part of my query, which could be a Select statement with a WHERE clause, create a temporary answer set, and then go back and do further processing against that answer set. Specifically with a Group Function, where it processes the Group Function against the columns and rows in my interim answer set and then creates a final answer set. These functions process against the interim answer set and produces a final answer set that contains only the final result after the function. There's an important concept that you need to remember when you're using GROUP functions and that is, if I'm using a Group Function and a WHERE clause, the WHERE clause reduces the number of rows that appear in my interim answer set. If I want a subset of rows from my input table to show up, in my final answer, I use a WHERE clause, and then that will select a subset of rows that are passed into the interim answer set and then the GROUP function does its calculation against the interim answer set after the WHERE clause has been processed. That's just a concept that I want you to hold that thought and think about that while we're going through the next few slides. There's going to come a point where you may decide to select a subset of rows and it's important to understand, you can select that subset of rows with a WHERE clause against the input table or you can select a subset of rows from your interim answer set that goes into your final answer set and it's important for you to realize the distinction between these two. I'll talk more on this later. There are five SQL functions, and here they are. I can do a query with a SUM function, which is going to add up all the values in a column, all the values that are not null. All of these Group functions ignore nulls. I can do an AVG Group Function which will calculate the average of all the non-null values in a column I can count and you've seen that before, I showed you the count Group Function prior. It can count rows or non-null values as it goes through a column. I can do a mean group function, which will go across an entire column of values and return the lowest non-null value in a column and the max is the converse. It will go through all the rows and find the highest value in a column. Those are my five group functions. We're going to look at each of them and do some examples with each of them so you understand how they work. The sum and the average group functions only work with numeric columns. So if you've got a text column with character data in it you can't do sums and averages. That makes sense, you would only want to do that with numbers. However, the MIN and MAX functions, they can be used with any type of column. If I've got a numeric column, the MIN and MAX will find the lowest or the highest number value in that column. If I've got a text column, the MIN and MAX will find the highest or lowest character value in that column according to the sort sequence or collating sequence of the computer that you're using. If it's numeric data, it will find the lowest and highest numbers in that column. If it's character data, it will find the lowest or highest character value, starting with the first character, and then the second and then the third. Look at them, basically, in alphabetical order or the collating sequence of that character data. The count function can be used with any datatype in the column. It doesn't matter. It will count non-null values within the column or if I use an asterisk, it'll just count rows in the table. We're going to look at examples of each of these. This first one, you've seen a query like this before where I can just do a count against the employees table, a COUNT star, and it's going to count how many rows are in that table. I think we used an example like this in the prior lesson. You can see it went to the employees table and counted. It does a count star, which basically, it's counting rows that are not null, and there aren't any rows that have all nulls in all columns, so it's really just counting all the rows in the table. You can see it came up with the solution of there's nine rows in the employee table. The next query is going to do Count Distinct. This is an example that we looked at in a prior lesson. It's going to find out how many unique countries are in the customers table. Some of you may remember the answer to that even though there's 87 customers, there are 21 unique countries represented in that customer table. Next query, we're going to do a sum and we're going to look at a column called unitsinstock. In the North wins database, units in stock is in the products table, and it's a number that represents how many of that product do we have in inventory? We're going to select a sum of units and stock, call it inventory, and it's coming from the products table. Let's run this query and let's see how much stuff we've got in inventory. That comes back and shows that we've got a total number of units in stock across all the different products in our products table is 3,119. We're going to take a look at the MAX group function and how it works. It's going to look at that products table and look at the unit price. Every product has a unit price. What does one of those products cost? It's going to find the highest priced product in our product table. And as we run this query, we can see that the highest priced product has a price of $263.50. We can do the same thing with the MIN and find the lowest priced product in that product table. That is a product with a cost of $2.50. That just gives you a quick glimpse of all the different group functions and how they work. One more that we can look at is the average. Each product in the product table has a price. This is going to tell us the average unit price across all those products, in the product table. The average unit price is $28 and a long decimal number, digits to the right of the decimal place. You might look at that and think , "That's never meaningful, we're down to very tiny fractions of a cent." Well, when you calculate an average with a column that's defined as a double-precision, it's going to carry a lot of decimal places. But that's not very meaningful as we're looking at this. I want to show you how we can make that look more meaningful to the people that we're running queries for as we do our analytics. There is a function within Postgres called, to_ character, to_char, sometimes people call it that. I can take this average unit price that it's calculating with that long string of numbers to the right of the decimal place, and I can convert it to character. I take the average unit price, run it through this to_char function, and after the column name, there's a comma, and then there's a series of text characters to indicate how do I want this thing to look when I convert it to a character. Keep in mind inside the database, this is stored as a double-precision number. I want to convert it to a character representation of that so that it displays nicely in the answer set of my query. I'm telling it that I want three digits followed by a comma, followed by three digits, followed by a decimal point, followed by two digits. I can run this query, and we'll see that the output comes out a lot prettier. We'll see that when I do this query, it gets rid of all those extra numbers to the right of the decimal. But it actually rounds the result. It doesn't truncate the result, it rounds the result to the 100th of a dollar value here. See that it rounded the result. I also want to show you something interesting about the group function that does account. You'll see this one is counting rows. When I say, select count star from the employees table, it's going to come back and tell me, "You've got nine employees." I can do a select count employee ID. Every row in the employee table has a primary key of employee ID, and I can count those. It's going to look through there and find every non-null employee ID and count those. There's a little bit of a difference between these two queries. They're going to yield the same result, which is nine. If I count rows, there's nine rows in that table. If I count not-null primary keys, there's nine of them. You cannot have a null primary key. These queries are similar, but it's important to understand the difference between a count star, counting rows, and counting a particular column. Because for that column, it's just going to count non-null values. Now, there is an important rule in place when you're using group functions that the level of the group function must match the level of the detail in your select statement. Now that statement probably deserves some explanation. So let me explain to you. In this slide, you can see I've got a query here that is selecting from the order details table. If you can imagine North wins business. They sell things to customers and customers place orders, and one order may contain many detail lines. So this is looking at the detailed data from the order details table. I'm selecting the order ID, the product ID, the unit price, and the quantity on a given detail line on an order. You can see what that data is going to look like. Where I've got Order ID, and you can see that there are three products on this first order. The order is 10248, and this first product is product ID 11 with a unit price of 14 and a quantity of 12, and I could calculate unit price times quantity to see the value of that line on this order. So this orders got three different products on it, and this order 10249 has two different products on it, and 10250 has three. This is the query that I can run that shows me the detail that's on that order. Now there's no group function of group-buying here. If I add a group function to this query, like this, I can select the order ID and the product ID, and a sum of unit price and a sum of quantity that will yield one row in my answer set, and it's going to show me the order ID of the first row and the product ID of the first row, but the sum is across all the rows. I want you to see that there's a mismatch here. There's a mismatch. This sum is across all the rows for these four orders. This sum is across all the rows for these four orders. Yet when I structure my query like this, selecting the order ID, the product ID, and then the sum, these sums are across many rows, but this order ID and product ID is across one row. So that's a mismatch where these sums don't go with this order ID and this product ID. This query is not legitimate, because it violates this rule, it violates the rule that the level of the group function must match the level of the detail in my select. In this case, the detail in my select is order ID and product ID. But I'm summing across all orders and all products within this subset of orders. So you can see the mismatch there. If I run this query, I will actually get an error that says this, it's a bad query. In this case, Postgres didn't process my query. Look at the error message I get that the column orderdetails.orderid must appear in the GROUP BY clause or be used in an aggregate function. Now, that's a lot of words and I will further explain what they really mean. But basically, to dive deeper into this error and what this rule really means, we need to take a look at the GROUP BY. I've broken this up into two presentations. One about group functions and the other about the GROUP BY, and we are going to move on in our next lesson to talk about the GROUP BY. That's it for now. I'll see you in the next video. Thank you.