This is video six of the third course. This video is about file name validation. A program can receive a file name from different sources. We already saw an example where the program receives a file name on the command line. We can also have a program that receives a file name from user input on the keyboard or through a graphical interface. We can also imagine a program that receives file names through network requests. The perfect example of that is an HTTP server that receives a lot of file names and returns the requested files.
Often, the program can access many files whose content it should not reveal, corrupt, or change. If we go back to the example of our HTTP server, the program has access to a lot of files on the computer: configuration files, executable files, and things like that. The program has to protect these files, so that if a request asks for the configuration file of the HTTP server, the program will not send back the configuration file or any other important file, which may be a list of users or other things.
Let's imagine a situation together. We will talk about a very bad FTP server. The FTP server runs as root to be able to access all files on the system. If one user wants to access a file, she will be able to, and if [inaudible] also wants to access this file, they will be able to. When the user logs in, entering a name and password, the FTP server changes to the home directory of this user. But what happens if the code of the FTP server does not check for the use of "../" in the file name? In this case, the user will be able to access any file on the system.
Let's imagine a new sudo command. The usual version uses a configuration file that lists all the users who are allowed to access root privileges. We will use another way: the new sudo command just checks the user's home folder for a file with the private key.
If this private key matches a public key in the configuration, the user will be able to access root privileges, so to run commands using sudo. What happens if a hacker replaces the file that is supposed to contain their key with a symbolic link to the key file of another user? For sure, the user who does that is most probably not able to read the key of the other user. But when sudo is run, it runs as root, and most probably the other user lets root access the file with the keys because they want to use the new sudo command. This way the hacker would be able to get the privileges, even though they were not supposed to.
We can all agree that file names are not safe and need to be validated. A good program never assumes a file name is valid and always verifies the status of the file open operation. That is the minimal file name validation. Often the program also has to check for a leading / when a relative name is expected, check for the ../ folder shortcut in a relative name, and verify whether the file is a symbolic link, using lstat. We'll see in the example, on Linux and then on Windows, how to check whether a file is a symbolic link or not.
This is the SymLink_a example. We see here that the code is written to be able to run on both Windows and Linux. For now we'll focus on the Linux code. First, some data types. The function that follows links returns FOLLOW_LINK if the file name was the name of a link, FOLLOW_NOT_A_LINK if the file name was the name of a real file, and FOLLOW_ERROR if we get an error while trying to discover whether the file was a link or not. The program gets one argument, and the argument is a file name. As in the earlier example, we get the file name from the command line. We have the declarations of the static functions. One function is used to follow a link: it tells us whether the name is a link or not, and if it is a link, it gives us the link target.
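The minimal checks listed above (a leading / and the ../ shortcut in a relative name) can be sketched in C as follows. This is only an illustration under assumptions: the function name is_safe_relative_name is hypothetical and is not the validation function from the SymLink_a example, and a real program would usually need more checks than these two.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from the course example): returns true only
   when `name` is a non-empty relative path with no ".." component. */
static bool is_safe_relative_name(const char *name)
{
    const char *p;

    assert(name != NULL);           /* a NULL pointer is a caller bug */

    if (name[0] == '\0')
        return false;               /* empty names are rejected */
    if (name[0] == '/')
        return false;               /* leading '/' means an absolute path */

    p = name;
    while (*p != '\0') {
        /* Length of the current path component, up to the next '/'. */
        size_t len = strcspn(p, "/");

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;           /* ".." would climb out of the base folder */
        p += len;
        if (*p == '/')
            p++;                    /* skip the separator */
    }
    return true;
}
```

With this sketch, a name like docs/readme.txt is accepted, while /etc/passwd and a/../../b are rejected. A name such as a..b is accepted because only a whole component equal to ".." is treated as the folder shortcut.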
We also have a file name validation function that in a real program would do some real validation, but here it is just a minimal function that returns its result to the main function. Here we verify the number of arguments, and we validate the file name first, before even trying to verify whether it is a link or not. We do this validation because we don't want the program to access, or run checks on, a name that is clearly not valid from the start.
After that we copy the file name into a local variable. We use the strncpy function in order not to cause a buffer overflow on the L-filename variable, which is always good to do. We copy the file name into the L-filename variable because each time we follow a link, we replace the name in the L-filename variable with the destination of the link. I say each time because a link can have another link as its target. The program must be prepared to follow not only one link, but many links. That is why we have a while loop: while the file is a link, we follow it. We'll see later how the follow link function works. Each time we follow a link, we validate the name of the target, as we did with the first file name we received directly from the user.
At the end, we check the value returned by the follow link function, and if it was an error, we just report the error to the user. If it was not an error, then because we got out of the while loop, we know that it is FOLLOW_NOT_A_LINK, so we have the real file name, and the real file name was validated inside the loop using the validate file name function. Then, because the program does absolutely nothing else, we just display the real name of the input file.
Let's take a look at the follow link function. The function contains the code to run on Windows and on Linux. We'll look only at the Linux code for now, and just after this part of the video we'll do another part on Windows. This is the Linux part of the code, so we use the lstat function.
It's exactly like the stat function, except that if the file is a link, the lstat function returns the information about the link itself and not about the target of the link. We call lstat, and afterwards we check the information returned by lstat to see whether the file is a link. If the file is not a link, lstat returns the information about the real file. In this case, the S_ISLNK macro returns false and we simply return from the function saying that no, this is not a link. If we get a link, we want to retrieve the destination of the link, so this is what this function does, and we put the destination of the link directly in the same variable where we received the input link name. Based on the return value of the retrieve link destination function, we return FOLLOW_LINK or FOLLOW_ERROR.
We have two versions of the retrieve link destination function: one for Windows and another for Linux. We'll jump to the Linux version now. Here is the function. We simply use the readlink function, which is a standard function on Linux. It gets the destination, the target of the link, and puts it in the variable. The first argument is the name of the link, and the other arguments are the output buffer and the output buffer size. Here, we verify that the size is not larger than the output size. The variable I use should be large enough for every possible target on a Linux system. But the code that is here will simply truncate the name. If the variable is not long enough, the program will not work correctly in this case, but it will not cause a buffer overflow.
Let's try the program. We could build it with make, but the program I have was already compiled, so I can use the executable directly. I can give it an invalid name. It says that the name is not valid. I can give it the name of the source file. It does not indicate that the file was a link. I already created a link in this folder. We see that I have a link whose target is named destination.
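Before continuing with the demonstration, the lstat and readlink steps described above can be sketched as follows. This is a minimal sketch and not the code from the SymLink_a example: the function name follow_link and its signature are assumptions, only the three result names come from the walkthrough. It also differs from the example as described in two ways: it reads the target into a temporary buffer instead of directly into the input variable, and it reports an error instead of truncating a target that does not fit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Result names follow the walkthrough; the enum itself is an assumption. */
typedef enum { FOLLOW_LINK, FOLLOW_NOT_A_LINK, FOLLOW_ERROR } FollowResult;

/* If `name` is a symbolic link, replace it in place with the link target.
   `size` is the capacity of the `name` buffer in bytes. */
static FollowResult follow_link(char *name, size_t size)
{
    struct stat info;
    char target[PATH_MAX];
    ssize_t len;

    assert(name != NULL && size > 0);

    /* lstat describes the link itself, not the file it points to. */
    if (lstat(name, &info) != 0)
        return FOLLOW_ERROR;
    if (!S_ISLNK(info.st_mode))
        return FOLLOW_NOT_A_LINK;

    /* readlink does not add a terminating '\0' and may silently truncate,
       so a result that fills either buffer is treated as an error. */
    len = readlink(name, target, sizeof target);
    if (len < 0 || (size_t)len >= sizeof target || (size_t)len >= size)
        return FOLLOW_ERROR;

    memcpy(name, target, (size_t)len);
    name[len] = '\0';
    return FOLLOW_LINK;
}
```

A caller would typically loop while the result is FOLLOW_LINK and validate the name after each step, as the main function in the walkthrough does. Note that this sketch, like the example, uses the target string as is: a relative target is relative to the folder containing the link, and a chain of links that points back to itself would loop forever, so a real program would likely resolve the path against the link's folder and cap the number of iterations.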
I also have a file named destination just next to it. If I run my program and just give it the name of the link, we see that it starts with the name link, realizes that link is a symbolic link to destination, and knows that the real file name is destination.
We are now on Windows, still with the SymLink_a example. I put breakpoints in the functions that are not the same on Linux and on Windows, so we'll be able to trace the code that is unique to the Windows version. I also configured a Visual Studio project, which I can show you, to pass the name of the source file as the argument. For the first test we'll test with a file that is really a file, a file that is not a link. I start. We are in the FollowLink function, in the part that works on Windows. The first thing the function does is call the CreateFile function. I use the A version just because I want to use ASCII file names. I need to make a note about the flag here: we pass the function the FILE_FLAG_OPEN_REPARSE_POINT flag. This tells CreateFile not to open the target of the symbolic link if the file is a symbolic link, but to open the symbolic link itself.
The file is now open. We call another Windows function to retrieve information about the file. The function we call is GetFileInformationByHandle, and we get some information about the file in the Linfo variable, which is a structure. We check one bit in the dwFileAttributes field to know whether this file is a reparse point, which is the Windows name for a symbolic link. A reparse point can also be something other than a symbolic link. It could be a mount point, which is also called a reparse point on Windows. This example is written specifically for symbolic links, so it would not work with a mount point. We verify whether the file is a symbolic link.
The file is not a symbolic link, simply because we gave it a name that is clearly a normal file, so we close the handle and return FOLLOW_NOT_A_LINK, just to indicate to the rest of the program that this is not a link.
I can change the project properties and give it the name link. I can do that because I already created the link. We can see here in the DIR listing that we have a link of type SYMLINK, a symbolic link. The target is the file destination in the same folder, and we can see that file just there. I start the program. Same thing: we call CreateFileA, again with the FILE_FLAG_OPEN_REPARSE_POINT flag. The open worked. We get the information. This time the information has the bit set. Yes, it's not a regular file, it's a symbolic link.
We'll now call the RetrieveLinkDestination function, the Windows version. We step into this function. The function receives the address where to put the name of the target, and also receives a handle to the symbolic link. To retrieve the target of the symbolic link, we need to use DeviceIoControl, and we send an I/O control code defined by Windows. We pass a buffer. If DeviceIoControl succeeds, this structure will contain some information and will also contain, in the data, the name of the target. DeviceIoControl did not report an error, so the assertion verifies that the returned size does not go over the size of the buffer we provided. The buffer is an unsigned char array, but to be able to read the data inside it, I need to cast the pointer to this array to a pointer to the ReparseInfo structure we saw just before.
I will use two fields, so let's go back to the definition. There are two fields here that are of interest to us: PrintNameOffset, which says where in the data the name of the target begins, and then the length of this name. There are two names in the data. One is the name the user is used to seeing, the normal name with drive letter and folder name.
The other name is closer to how the system sees it, so it replaces the drive letter with the disk name, something like that. For this example we'll use the print name, and the validation will work with the print name too, because the form with the drive letter works fine. So I compute the number of characters and the start. I just divide by the size of a Unicode character, because the name length and the name offset are given in bytes, and the Data field is an array of Unicode characters. The condition in the loop here is just to avoid causing a buffer overflow on the output variable. The loop does a quick translation from Unicode to ASCII, where we just take the lower eight bits of each Unicode character. It works, but it's really not the way to do it in a real program. I do it like that only to avoid the extra complexity needed to do a real Unicode to ASCII translation. I step over the loop. The return value indicates that, yes, we retrieved the name of the link, and we close the handle.
I will let the program complete. It will follow the link again. The second time it will open the real file, and if we look at the file name, we see that the name is the target name, so the file name destination. It's not a link, so we return FOLLOW_NOT_A_LINK. If we look at what happened on the console, we see that, yes, it is a link, the target of the link was the file destination, and we get the final message with the real input file name.
On Windows, by default a user must have administrator privileges to be able to create a symbolic link. Because of that, Windows applications usually trust symbolic links and don't do validation as shown in the SymLink_a example. The fact that Windows requires administrator privileges to create a symbolic link is a good indication of how dangerous Linux symbolic links can be if the application doesn't do the correct validation.
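The offset and length arithmetic from the Windows walkthrough above can be illustrated with a small portable sketch. It is not the code from the SymLink_a example: the function name narrow_print_name is hypothetical, uint16_t stands in for the Windows wide character type, and the data array stands in for the Data field of the reparse structure. Like the example, it keeps only the lower eight bits of each character, which is not a correct conversion for real programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies a name stored as 16-bit characters into a
   narrow string. `offset_bytes` and `length_bytes` are given in bytes, as
   in the walkthrough, while `data_count` is the number of 16-bit elements
   available in `data`. Returns the number of characters written. */
static size_t narrow_print_name(const uint16_t *data, size_t data_count,
                                size_t offset_bytes, size_t length_bytes,
                                char *out, size_t out_size)
{
    /* Convert byte quantities to element counts. */
    size_t start = offset_bytes / sizeof(uint16_t);
    size_t count = length_bytes / sizeof(uint16_t);
    size_t i;

    assert(data != NULL && out != NULL && out_size > 0);

    out[0] = '\0';
    /* Refuse a name that would extend past the end of the input data. */
    if (start > data_count || count > data_count - start)
        return 0;

    /* Stop one short of out_size to leave room for the terminator. */
    for (i = 0; i < count && i + 1 < out_size; i++)
        out[i] = (char)(data[start + i] & 0xFF);  /* lower eight bits only */
    out[i] = '\0';
    return i;
}
```

The loop condition plays the same role as the one described in the walkthrough: it truncates the name rather than writing past the end of the output variable.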
A program can use the solution shown in the last example to check whether a file is a symbolic link or not, but it can also use a simpler way. On Linux, the open function accepts a flag named O_NOFOLLOW. If this flag is passed to the open function, open will not follow a symbolic link, so the open operation fails if the file to open is a symbolic link. It can be a very easy way to protect a program against attacks using symbolic links. The Windows CreateFile function also has a flag, the same flag we used in the last example. The flag is named FILE_FLAG_OPEN_REPARSE_POINT, and it causes CreateFile not to follow the link, but it does not fail the operation: it opens the symbolic link itself instead. This is a little less simple than on Linux, and we still have to do another step to know whether the open file is the file we were trying to open or a symbolic link. It comes back to exactly what we saw in the SymLink_a example.
Some programs need a lot more when they work with files. Take the example we saw in situation number 1, a very bad FTP server. If we think about a really secure FTP server, it will run as the user who makes the request. After the FTP server accepts the connection and validates the user name and password, it simply changes the privileges of the program to make it run as the user who requests the file or the service. This way the operating system itself is able to validate all the permissions as requested. It's a far better approach than having the program try to validate all the permissions by itself. Many servers work the same way. In some cases an HTTP server will also impersonate, so change privileges, but on a per-thread basis, in order to have a thread that runs with the same privileges as the user who sent the request. That is only possible if the HTTP server is able to know which user made the request and is able to validate that user.
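Returning to the O_NOFOLLOW flag described above, here is a minimal sketch of how it can be used on Linux. The helper name open_no_symlink and its was_link output parameter are assumptions made for illustration. The sketch relies on open failing with ELOOP when the last component of the path is a symbolic link, which is the documented Linux behavior; other systems may report a different error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` read-only and refuses to follow a
   symbolic link in the last path component. Returns a file descriptor,
   or -1 on failure. `*was_link` is set to true when the failure was
   ELOOP, which on Linux is what O_NOFOLLOW reports for a symbolic link
   (ELOOP can also mean too many links were met while resolving the path). */
static int open_no_symlink(const char *path, bool *was_link)
{
    int fd;

    assert(path != NULL && was_link != NULL);

    fd = open(path, O_RDONLY | O_NOFOLLOW);
    *was_link = (fd < 0 && errno == ELOOP);
    return fd;
}
```

The flag only covers the final component of the name. Symbolic links in the folders leading to the file are still followed, so this check complements the file name validation rather than replacing it.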
A program can also retrieve the permissions of a file before accessing it. After receiving a file name, a program can use C functions to read the permissions set on this file and verify whether the user is allowed to read or write it. These techniques are outside the scope of the secure C++ learning path, but I thought it was important to talk about them. This is the end of this video. The next video is about seconds.