Hello, I'd like to go over the first lab for module two, which is the multi-CPU lab. There are some simple instructions here, but let's actually look at the lab itself. There are a number of files here, depending on which stage you're at and which scripts are running: some input files and their lock files, and the same thing on the output side, depending on which of the CPU scripts are running. The lock files are just used as a signal that something is being done; there's actually nothing in them. The Python scripts output files with CSV content, so floating-point numbers; two of them will be running and each writes its own CSV, but they need a lock file so that your CUDA code isn't writing to or reading from those files at the same time. It keeps things synchronized. The same thing applies on the output side: the lock files tell the Python code not to touch anything until it sees the lock file go away.

Then you'll have your CUDA code, but first let's go quickly over the producer/consumer code; it's fairly simple. It takes an input ID, which is A or B, and you don't need to worry about modifying it. It basically generates random numbers and writes them to input_<inputID>, so input_A.csv for example, along with the lock file, and then it reads the output back once content has been written to it. If you did want to change it, two arguments are passed in: the input ID, in our case A or B, and the number of elements, in case you want larger or smaller datasets. That's what run input A and run input B do for you.

The CUDA code is fairly simple as well: the main logic and the kernel just compute the difference between corresponding items in the two vectors that were passed in via the input CSVs. There's some code to allocate memory on the host and the device and copy back and forth, plus the kernel launch, which takes the number of threads per block and the amount of data and determines how many blocks need to be run. Another point of interest is the part that writes to the two output files; again, you shouldn't have to modify this. It just writes to the output CSV files, opening and closing the lock file around the write. Retrieving does the same thing: it creates a lock file saying "I'm reading this," reads the data in, and once it's done reading it clears out the old output content and writes back out. There's also some code to parse arrays of floats that are passed in as strings, which is what the input CSV files contain; it just tokenizes the line and builds an array of floats instead of strings.

The basic process by which this runs is very similar to what you've done in the past: you allocate your memory, retrieve your data, copy it to the device, execute the kernel, get the results back out, and place them into files. One note: I'm using unified memory at times, so there won't necessarily be explicit copies back from device to host; I've simplified that a little. Once that's done, you remove the lock for the input, telling the Python code it can proceed, and it's then free to write its own fresh random values back into the input CSV. This can run essentially forever; that's what the while loop does. It just continually lets the Python and CUDA code, running on multiple CPUs and the GPU, communicate and execute asynchronously.
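To tie those pieces together, here is a minimal sketch of the CUDA side of that loop. The file names (input_A.csv, input_A.lock, and so on), the kernel name vectorDiff, and the exact lock-file protocol shown here are assumptions made for illustration; the lab's actual code may differ in those details.

```cuda
// Hedged sketch of the handshake: wait for input locks, read the CSVs,
// run the difference kernel, write the output under its own lock.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <unistd.h>        // access(), usleep()
#include <cuda_runtime.h>

// The kernel: element-wise difference of the two input vectors.
__global__ void vectorDiff(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] - b[i];
}

// Empty signal file -- the lock files carry no content, only their existence.
static void touchLock(const char *path) {
    if (FILE *f = fopen(path, "w")) fclose(f);
}

// Read one line of comma-separated floats: conceptually what the lab's
// string-to-float parsing helper does (tokenize, then convert each token).
static std::vector<float> readCsv(const char *path) {
    std::vector<float> vals;
    FILE *f = fopen(path, "r");
    if (!f) return vals;
    static char line[1 << 16];
    if (fgets(line, sizeof line, f))
        for (char *tok = strtok(line, ","); tok; tok = strtok(nullptr, ","))
            vals.push_back(strtof(tok, nullptr));
    fclose(f);
    return vals;
}

int main() {
    const int threadsPerBlock = 256;
    while (true) {                                // loops until you hit Ctrl-C
        // 1. Wait for the producers' locks to disappear, then claim the
        //    inputs with our own locks ("I'm reading this") while parsing.
        while (access("input_A.lock", F_OK) == 0 ||
               access("input_B.lock", F_OK) == 0) usleep(1000);
        touchLock("input_A.lock");
        touchLock("input_B.lock");
        std::vector<float> a = readCsv("input_A.csv");
        std::vector<float> b = readCsv("input_B.csv");
        int n = (int)(a.size() < b.size() ? a.size() : b.size());
        if (n == 0) { remove("input_A.lock"); remove("input_B.lock"); continue; }

        // 2. Unified (managed) memory, so there is no explicit
        //    cudaMemcpy back from the device to the host.
        float *da, *db, *dout;
        cudaMallocManaged(&da, n * sizeof(float));
        cudaMallocManaged(&db, n * sizeof(float));
        cudaMallocManaged(&dout, n * sizeof(float));
        memcpy(da, a.data(), n * sizeof(float));
        memcpy(db, b.data(), n * sizeof(float));

        // 3. Enough blocks to cover n elements at threadsPerBlock each.
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vectorDiff<<<blocks, threadsPerBlock>>>(da, db, dout, n);
        cudaDeviceSynchronize();

        // 4. Write the output CSV while holding its lock, so the Python side
        //    doesn't read a half-written file; remove the lock when done.
        touchLock("output_A.lock");
        if (FILE *out = fopen("output_A.csv", "w")) {
            for (int i = 0; i < n; ++i) fprintf(out, i ? ",%f" : "%f", dout[i]);
            fprintf(out, "\n");
            fclose(out);
        }
        remove("output_A.lock");

        // 5. Remove the input locks: the Python code is now free to write
        //    fresh random values into the input CSVs.
        remove("input_A.lock");
        remove("input_B.lock");

        cudaFree(da); cudaFree(db); cudaFree(dout);
    }
    return 0;
}
```

The point of the managed allocations is exactly the simplification mentioned above: because host and device share the same pointers, the results can be written straight to the CSV after a cudaDeviceSynchronize, without an explicit copy back out.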
Well, asynchronously within themselves, but synchronized between them according to a certain pattern. So you'll build your code, run input A and input B, and just keep them going. Then you'll run the multi-CPU CUDA script, which runs the CUDA program until you hit Ctrl-C. That'll be the case for all of these: they end up in different terminals, and you just go to each one and hit Ctrl-C to stop it. Then you can look at the output files; here the first one puts out 80.36, 41 and 16.19, so you're getting different output. Note that these are not necessarily synchronized at any point, so comparing A and B does not mean that, if you look at one file while the Python code is still writing to the other, they'll represent the same values; they'll be somewhat out of sync. The end would probably be the only time things are in sync, as long as everything has run and the Python loop hasn't restarted. One thing you'll notice, though, is that output A and output B should be opposites of each other, i.e. negatives of each other, because one is the difference from the perspective of A and the other from the perspective of B. Hopefully this has been helpful; feel free to modify things, you have a lot of options here.
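To make that last point concrete: from A's perspective the output is a[i] - b[i], and from B's perspective it's b[i] - a[i], so whenever the two output files were produced from the same pair of inputs, each value in output A should be the negative of the corresponding value in output B. If you want to sanity-check that yourself, here is a small hypothetical helper; it is not part of the lab, the file names and tolerance are assumptions, and a nonzero mismatch count usually just means the two files came from different iterations of the loop.

```cpp
// Hypothetical checker (not the lab's code): compares output_A.csv and
// output_B.csv and reports how many value pairs are not (roughly) negations.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cmath>
#include <vector>

static std::vector<float> readCsv(const char *path) {
    std::vector<float> vals;
    FILE *f = fopen(path, "r");
    if (!f) return vals;
    static char line[1 << 16];
    if (fgets(line, sizeof line, f))
        for (char *tok = strtok(line, ","); tok; tok = strtok(nullptr, ","))
            vals.push_back(strtof(tok, nullptr));
    fclose(f);
    return vals;
}

int main() {
    std::vector<float> a = readCsv("output_A.csv");
    std::vector<float> b = readCsv("output_B.csv");
    size_t n = a.size() < b.size() ? a.size() : b.size();
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (std::fabs(a[i] + b[i]) > 1e-4f) ++mismatches;  // expect a[i] == -b[i]
    printf("%zu of %zu pairs are not negations (nonzero usually means the two "
           "files came from different iterations)\n", mismatches, n);
    return 0;
}
```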