Hi folks. Welcome back. This is Module 2, data manipulation. In data science, we need a very thorough understanding of our data. A lot of the time we have to modify, drop, clean, and process our data so that we can feed the cleaned data into analysis. Data manipulation is important, and it is a skill we have to learn very well. NumPy and Pandas are the two packages we need to learn in this module. From this module on, we're going to learn those packages directly in Google Colab, because you can only really learn them when you can run the code, modify the code, and make errors when you do something incorrectly, so that you remember how to fix it and how to do it right next time. I'm going to upload this IPython notebook so you can either run it together with me or run it by yourself later on.
The first package we're going to learn is NumPy. NumPy stands for Numerical Python. It is specialized for dealing with numbers, and it is very efficient in both time and space. NumPy is the foundation for many other widely used packages, such as Pandas and Matplotlib. This will be the most critical package you need to learn for data science. Let's get started.
As we have learned, in order to use a package, we need to import it. Here we have something new, which is the keyword as. In the previous module, we just imported random and then called it by the name random. However, a lot of the time we want to save typing effort or make the name shorter, so for convenience we can give the package a nickname and then refer to it by that nickname. For example, we really don't want to type NumPy all over our program, so we can import NumPy under a nickname, which by convention is np. When you see np in the field, don't read it as "not possible"; np stands for NumPy, a very useful Python package.
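To make the aliasing concrete, here is a minimal sketch of the two import styles side by side. The variable name module_name is only there for illustration; it is not from the notebook.

```python
import random        # plain import: we refer to the module by its full name
import numpy as np   # aliased import: "np" is the conventional nickname

# The alias is just another name bound to the same package.
module_name = np.__name__
print(module_name)   # numpy

# The plain import is still used under its own name.
print(random.randint(1, 6))
```

The alias changes nothing about the package itself. It only changes the name we type.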
We imported NumPy and gave it the new name np. Whenever we want to call NumPy, we just use np. It is similar to a variable, but this variable is assigned a package, which is NumPy.
Let's move on to basic arrays. You can think of an array as a list of things such as 0, 1, 2, 3, and so on. Lists and arrays are both sequential data structures: the numbers sit next to one another, like people sitting on a bench. That is a very useful data structure for storing numbers. We just put things in the slots one by one, and then we can use them. In this part, we're going to learn a very simple conversion from a Python list, which wraps the numbers in square brackets, to a NumPy array. There are some differences between a NumPy array and a Python list, in terms of the type of data, the size of the data, and so on. We're not going to cover those in this class. We just need to learn how to use arrays without knowing what is inside, because we are going to focus on the application side. If you're interested in the differences between a NumPy array and a Python list, you can simply Google it and find the answer yourself.
In order to create a NumPy array, we need to have a list first. This list will contain numbers, either integers or floats. Let's use integers as an example. Here, we have the integers 0, 1, 2, 3 in a list. Then we just feed this list to np.array. np.array is a built-in NumPy function that converts a valid input, such as a list, into a NumPy array. Given the list 0, 1, 2, 3, it simply creates an array with 0, 1, 2, 3 as its elements. This is the result, and this is how we can easily convert a list into a NumPy array so that we can further process and manipulate our data. We can assign the returned NumPy array to a variable called a. Then we can print out the type of a, which is numpy.ndarray.
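The conversion just described can be sketched as follows. The name numbers for the intermediate list is an assumption for illustration; the array is called a as in the lecture.

```python
import numpy as np

numbers = [0, 1, 2, 3]   # a plain Python list of integers
a = np.array(numbers)    # convert the list into a NumPy array

print(a)         # [0 1 2 3]
print(type(a))   # <class 'numpy.ndarray'>
```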
Now we know an array is a sequence of numbers, a collection of numbers in order. What is the nd in ndarray? NumPy is the name of the library, we all know that, and nd stands for n-dimensional, so an ndarray is an n-dimensional array. As we are going to learn later, NumPy is specialized for multi-dimensional data. A matrix is 2D, and you can have 3D, 4D, or even more dimensions as well. That is where NumPy is used. We can print out the shape of a, this particular NumPy ndarray. The shape is (4,). That's it, because we have only one dimension. It is a simple list with four elements, so four is the size of the first dimension, and we know there is only one dimension because there is only one value in this tuple.
To access the elements in the NumPy array, we can use an index, just as we did with a list. We put the index in square brackets after a, as in a[0] or a[3]. The first position is index 0, the second position is 1, the third position is 2, and the fourth position is 3. I purposely used 0, 1, 2, 3 as the values here to remind you that in Python we always start with zero as the first index. The first element has index zero, which is different from R. Everything is fine as long as we use an index within the range. In this case, the minimum index is zero and, counting 0, 1, 2, 3, the maximum index is three. The fourth element, which is the last element, has index three. If you use an index outside the range from zero to three, you will get an error, because there is no index four, which would be a fifth element in your list or array. If you see this error, don't panic. You may just have used a non-existing index.
There is also another way of looking at the list of numbers. 0, 1, 2, 3 goes from left to right. If we look from right to left instead, we get the rightmost element first, then the second from the right, and so on. We can use negative 1 for the rightmost element, which is three.
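Here is a small sketch of the shape, the indexing, and the out-of-range error described above. The try/except wrapper and the names first, last, rightmost, and out_of_range are assumptions added for illustration, so that the cell keeps running after the error.

```python
import numpy as np

a = np.array([0, 1, 2, 3])

print(a.shape)       # (4,)  one dimension of size four

first = a[0]         # first element, index 0
last = a[3]          # fourth (last) element, index 3
print(first, last)   # 0 3

# Index 4 would be a fifth element, which does not exist.
out_of_range = False
try:
    a[4]
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)

rightmost = a[-1]    # counting from the right
print(rightmost)     # 3
```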
Negative 1 is the same as positive 3, because the rightmost element is the fourth from the left. Then you can have negative 4, and that will be our first element, because counting from the right it is the fourth one. Again, if you try to use a non-existing index, no matter whether you count from left to right or from right to left, you are going to get an error.
The next part is about slicing. We learned that an array is a collection of numbers. What if we just want a subset of these numbers, a piece of them? To do that, we write a with a colon inside the brackets. The zero before the colon indicates the index where we want to start slicing. The second number, after the colon, is missing here, but it would be the index where we want to cut off the sub-array. Zero, colon, and then nothing basically means we go from index 0 to the end of the array. In other words, we want the complete array, which is 0, 1, 2, 3, exactly the same as a. If we change the first number to one, we start from index 1, which is our second element, so we get a smaller piece of the list. We do not have the zero anymore, but we do have 1, 2, 3. Another option is to leave the first number empty, which means we start from the beginning, and specify only the ending point. The one thing you have to keep in mind is that this ending point is exclusive. If you go from the beginning to one, the element at index 1 is excluded, so you get only the first element in the array, which is zero. We can also play with negative numbers in a slice. Here we're going to get an array containing only 3, because we are counting from the right side to the left side.
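The slices discussed so far can be sketched like this. The variable names are only for illustration, and a[-1:] is an assumed example of a negative slice that yields an array containing only 3; the notebook may use a different expression.

```python
import numpy as np

a = np.array([0, 1, 2, 3])

print(a[-4])         # 0  negative 4 is the first element

whole = a[0:]        # start at index 0, run to the end
print(whole)         # [0 1 2 3]

tail = a[1:]         # start at index 1, run to the end
print(tail)          # [1 2 3]

head = a[:1]         # start at the beginning, stop before index 1
print(head)          # [0]

last_only = a[-1:]   # assumed example: start at the rightmost element
print(last_only)     # [3]
```

Note that a slice always gives back an array, even when it holds a single element, while a plain index such as a[-4] gives back the element itself.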
Here again we take a piece by counting from the right side toward the left-hand side, and the negative 1 is the rightmost position. This negative 1 is the same as index three, as we learned before.
Then we can make another choice, which is to specify a step value, as we learned with range. In this case, we consider the whole list from the beginning, because we do not have any value in the first position, to the end, because we do not have any value in the second position either. However, we have a two as the step, so we start from the beginning and take every other element, which gives a sub-array of zero and two. In the next one, we start from index 1, the second element, and again take every other value, so we get one and three.
This has been a very basic introduction to NumPy. In the rest of the module, we are going to learn more about array types, dimensions, shapes, indexing, statistics, linear algebra, and useful functions. I'll see you later.
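For reference, the step slicing from this video can be sketched as follows. The names evens and odds are assumptions for illustration only.

```python
import numpy as np

a = np.array([0, 1, 2, 3])

evens = a[::2]    # whole array, taking every second element
print(evens)      # [0 2]

odds = a[1::2]    # start at index 1, then every second element
print(odds)       # [1 3]
```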