If you have been in a conversation about machine learning, you have probably heard terms like feature, sample, and variable. We will define some of those terms in this lecture. After this video, you will be able to describe what a feature is and how it relates to a sample, name some alternative terms for feature, and summarize how a categorical feature differs from a numerical feature.

Before we delve into the methods for processing and analyzing data, let's start by defining some terms used to describe data, starting with sample and variable. A sample is an instance or example of an entity in your data. This is typically a row in your dataset. This figure shows part of a dataset of values related to weather. Each row is a sample representing weather data for a particular day. The table in the figure shows four samples of weather data, each for a different day. In this table, each sample has five values associated with it. These values are different pieces of information about the sample, such as the sample ID, sample date, minimum temperature, maximum temperature, and rainfall on that day. We call these different values variables of the sample.

There are many names for sample and variable. Some other terms for sample that you might hear in a machine learning context include record, example, row, instance, and observation. It is helpful to realize that all of these terms mean the same thing in machine learning. That is, they all refer to a specific example of an entity in your dataset. There are also many names for the term variable, such as feature, column, dimension, attribute, and field. All of these terms refer to a specific characteristic of each sample in your dataset.

An important point to emphasize about variables is that each one has a data type associated with it. The most common data types are numeric and categorical.
There are other data types as well, such as string and date, but we will focus on the two more common ones, numeric and categorical. As the name implies, numeric variables are variables that take on number values. Numeric variables can be measured, and their values can be sorted in some way. Note that a numeric variable can take on just integer values or be continuous valued. It can also have just positive numbers, just negative numbers, or both.

Let's go over some examples of various numeric variables. A person's height is a positive, continuous-valued number. The score on an exam is a non-negative number that ranges between 0 and 100%. The number of transactions per hour is a non-negative integer, whereas the change in a stock price can be either positive or negative.

Variables with labels, names, or categories for values instead of numbers are called categorical variables. For example, a variable that describes the color of an item, such as the color of a car, can have values such as red, silver, blue, white, and black. These are non-numeric values that describe some quality or characteristic of an entity. These values can be thought of as names or labels that can be sorted into categories. Therefore, categorical variables are also referred to as qualitative variables or nominal variables. Some examples of categorical variables are gender, marital status, type of customer (for example, teenager, adult, senior), product category (for example, electronics, kitchen, bathroom), and color of an item.

To summarize, a sample is an instance or example of an entity in your data. A variable captures a specific characteristic of each entity. So a sample has many variables to describe it. Data from real applications are often multidimensional, meaning that there are many dimensions or variables describing each sample. Each variable has a data type associated with it; the most common data types are numeric and categorical.
Note that there are many terms to describe these data-related concepts.
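
As a small illustration of the ideas in this lecture, here is a minimal Python sketch of a weather table like the one described, using only the standard library. The numbers, dates, and field names are invented for illustration and are not taken from the figure. The `condition` field is a hypothetical extra variable added only so that the table contains a categorical variable, and `variable_kind` is a hypothetical helper that makes a rough guess from the Python type of the values.

```python
from datetime import date
from numbers import Number

# Hypothetical weather data. Each dict is one sample (a row); each key is
# one variable (a column). All values are invented for illustration.
samples = [
    {"sample_id": 1, "date": date(2024, 1, 1), "min_temp": 3.5,
     "max_temp": 10.2, "rainfall": 0.0, "condition": "clear"},
    {"sample_id": 2, "date": date(2024, 1, 2), "min_temp": 4.1,
     "max_temp": 9.8, "rainfall": 2.4, "condition": "rain"},
    {"sample_id": 3, "date": date(2024, 1, 3), "min_temp": -1.0,
     "max_temp": 6.3, "rainfall": 0.0, "condition": "cloudy"},
    {"sample_id": 4, "date": date(2024, 1, 4), "min_temp": 2.2,
     "max_temp": 8.7, "rainfall": 5.1, "condition": "rain"},
]


def variable_kind(values):
    """Roughly guess a variable's data type from the Python type of its values."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numeric"
    if all(isinstance(v, str) for v in values):
        return "categorical"
    return "other"  # for example, dates


# Turn the rows around: collect the values of each variable across all samples.
columns = {name: [s[name] for s in samples] for name in samples[0]}
kinds = {name: variable_kind(vals) for name, vals in columns.items()}

for name, kind in kinds.items():
    print(f"{name}: {kind}")

# Numeric values can be sorted by magnitude.
coldest_first = sorted(columns["min_temp"])

# Categorical values are labels; here we just list the distinct categories
# (alphabetical order is only for stable display).
categories = sorted(set(columns["condition"]))
```

A type-based guess like this is crude. For instance, the sample ID is stored as an integer and so is reported as numeric, even though it acts more like a label than a measurement, and a column of free text would be reported as categorical. In practice, the data type of each variable is usually decided by someone who knows what the values mean.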