Designing a system, at the end of the day, means that we have a task that has to be accomplished. We can see a task as a set of instructions that have to be executed. Just to keep it simple: what if we have three multiplications that have to be executed? If we execute these three multiplications on a general-purpose processor, we have an instruction set that we are going to use and we are fixed in the number of resources that can be used to execute those three multiplications. What if we execute them on an ASIC? Again, it is somehow easy, because we are constrained by the way in which the ASIC has been implemented and we know how to map those instructions onto the resources that are available in the system. So we are starting from a problem where we have a task that, in our example, is made of three multiplications.

Now, we know that with the FPGA things are different: the FPGA is a game-changing technology, and the reason is quite simple. The way in which we are going to show it is by drawing a chart. In this chart we have the performance on the y axis; we can consider performance as the overall execution time. The x axis is the area of our chip or, if you prefer, the number of resources that we are going to use to implement our system, the system that is going to be used to implement the task. So, on the x axis we have the number of resources. Obviously, as we know, resources on an FPGA come in different kinds: there are a lot of different blocks that can be used. Just to keep it simple, we are going to consider them generically: they are the resources used to implement the task.

Now, three multiplications: what we can have is something like this. We can execute all three multiplications in parallel, because there is no data dependency between them. In terms of resources, executing three multiplications in parallel is quite expensive. What about the performance? Well, if the three multiplications are executed in parallel, the overall execution time is the time needed to provide the data to the multiplication units, to make the multiplication, and then to send back the results. So, at the end of the day, the overall execution time is quite low. This is the case of the three multiplications in parallel, and it gives us a point basically here: a high request in terms of resources and a very efficient architecture.

Now, what if we do something different? By this I mean that we may not be willing to use all the resources available, so we are going to constrain them: instead of having the three multiplications executed in parallel, we are going to have just one multiplier implemented on our device, and in order to be able to complete the task in time we are going to reuse the same multiplier over and over again. That means we are quite low in terms of resources, but quite high in terms of overall execution time. This is the case with one multiplication unit, so we are somehow here.

And that is exactly why I love FPGAs so much: because none of this has been defined yet. We can play as much as we want with the resources according to our needs. Let's see what I'm talking about.
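To make these two extreme points concrete, here is a minimal sketch of the idea in Python, with invented cycle counts; the constants and the function are illustrative assumptions, not figures for any real device:

```python
# Toy timing model for three independent multiplications.
# T_MUL and T_IO are invented numbers, only meant to show the trend.

T_MUL = 1   # cycles for one multiplication
T_IO = 2    # cycles to move operands in and results back out

def exec_time(num_multipliers, num_ops=3):
    """Overall execution time when num_ops independent multiplications
    are scheduled on num_multipliers multiplier units."""
    passes = -(-num_ops // num_multipliers)   # ceil(num_ops / num_multipliers)
    return T_IO + passes * T_MUL

print(exec_time(3))   # fully parallel: 3 units, lowest execution time
print(exec_time(1))   # one reused unit: fewest resources, highest execution time
```

In the chart, the fully parallel mapping is the bottom-right point (many resources, low execution time) and the single reused multiplier is the top-left one (few resources, high execution time).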
Now, can we do better than this in terms of performance, which means: can we improve the performance? Well, actually the answer is no! We cannot do better, because that is the best implementation that we can have, so we cannot go below this line. Now, what about this part of the chart? Can we use less than this amount of resources? The answer is again no: we cannot use fewer than one multiplication unit, because otherwise we would not be able to implement our design at all.

Now, can we do worse in terms of performance with respect to what we were obtaining with three multiplication units executed in parallel, so can we move up here? The answer is yes, we can do worse. Obviously, the idea is to try to be as close as possible to this point, but because of communication overhead and so on and so forth, we may have solutions that end up somewhere here. And can we move to the right in terms of resources? Well, again, the answer is yes, but do we need to have more multiplication units? No, so I would say that this is a kind of useless implementation: it is not necessary to try to explore anything that is higher in terms of resources, because we are not going to gain in terms of performance, which means that we are just going to waste resources. So, it is not really useful.

What about the performance of the single-multiplier solution: can we do worse than this? Well, obviously the answer is yes, and the reason is the same as before: we can have some communication overhead and so on and so forth, so we may end up not exactly here but a bit higher up. So this point is the optimal situation that we may have, and this is a more realistic one. Can we do even worse? Well, obviously we can, but again it is not a great idea. This region is not unfeasible, it is just stupid, let's say, because we do not want worse performance with respect to what we can get. So, this is basically going to define what we may have.

Now, with respect to this, can we draw a sort of line, just to keep it simple, connecting these two dots together? What do we have here? Well, let's see. A point on this line basically tells you: I am going to play with the resources and this is the performance I am going to get, the kind of degraded performance that we can expect with this number of resources. Can a point sit below this line? No: everything below this line is a sort of infeasible solution, so we are going to get rid of that space. What remains is our feasible solution space. That is exactly where we are going to play in order to define our solution.

And why are we not just shooting for the best? Well, it is always a matter of trade-offs: being the best in terms of execution time may not imply being the best in terms of power or energy consumption, so we may not be willing to use this number of resources to accomplish our task, and we are going to lower it. We know that we are going to lower the performance in execution time, but because of the overall scenario that may be the solution that we want to have.
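Sticking with the toy model from before, the feasible solution space can be sketched as a simple check; the two extreme points, the straight trade-off line and all numbers are assumptions made only for this illustration:

```python
# Sketch of the feasible-solution-space check, reusing the toy numbers above:
# best case (3 multipliers) takes 3 cycles, worst case (1 multiplier) takes 5.
R_MIN, R_MAX = 1, 3      # fewest / most multiplier units that make sense
T_BEST, T_WORST = 3, 5   # execution times at those two extreme points

def frontier(resources):
    """Execution time on the straight line connecting the two extreme points."""
    slope = (T_BEST - T_WORST) / (R_MAX - R_MIN)
    return T_WORST + slope * (resources - R_MIN)

def classify(resources, exec_time):
    if resources < R_MIN or exec_time < T_BEST:
        return "infeasible"   # beyond the hard bounds: cannot exist
    if exec_time < frontier(resources):
        return "infeasible"   # faster than the trade-off line allows
    if resources > R_MAX:
        return "useless"      # feasible, but wastes resources for no gain
    return "feasible"         # inside the design space we want to explore

print(classify(2, 4.5))  # realistic point: feasible
print(classify(2, 3.0))  # too optimistic for two units: infeasible
```

The "useless" label corresponds to using more than three multipliers: the check does not forbid it, it just marks it as wasted area.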
Now, let's see what happens if we add the device, the FPGA, into this scenario. The FPGA has a fixed area, which means that a device has a fixed number of resources that can be used to implement a task. What if our device has a certain number of resources, say, here? So, here we have the device resources available, and what we observe is that with that device, at best, those are the performances we can obtain.

That is great but, unfortunately, because we want to play devil's advocate, what we are going to say is: okay, you know what? The performance that we want to obtain is somewhere here. That is a bad situation. Why? Because the device is achieving a performance that is not as low as the one we are expecting, and by low we mean not as good as the one we are expecting; remember, for the overall execution time, the lower the better. This is bad because with that device we cannot achieve that performance.

That is totally true, but what if we use a sort of, let's say, virtual device? The virtual device provides us a sort of bigger area that can be exploited. Obviously, those are the expected performances, but this is a virtual device, so it is not something that we really have: it is something that we would like to have but do not. So, what does it mean to have a virtual device? It means that over time we are going to reconfigure our FPGA in order to have more resources available, paying for this in terms of reconfiguration time.

Now the game is: how big is this reconfiguration time going to be? If the reconfiguration time is as big as this line, awesome, we are done, because we have a virtual device that implies an overhead for reconfiguration but still meets our expected performance. So, those are the expected performances: that is great. What if we are not able to find a way in which, with the reconfiguration, we meet this performance? In other terms: what if our virtual device, let's say virtual device star, instead of this is going to have this reconfiguration time? Well, that is again something that we are not willing to do, because it is going to be higher, which means worse performance with respect to the one we are expecting. So, that is exactly where the reconfiguration comes into play.
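Put as a single condition (the symbols below are names introduced just for this recap, not notation from the lecture), the virtual device is worth having only when

\(T_{\mathrm{exec}}^{\mathrm{virtual}} + T_{\mathrm{reconf}} \le T_{\mathrm{expected}}\)

that is, the execution time achieved on the enlarged, virtual device plus the reconfiguration overhead must still stay below the expected-performance line; the virtual device star case is exactly the one where this inequality fails.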
So, let's try to sum up all the things that we have seen. We have the optimal solution in terms of peak performance, we have a sort of worst implementation in terms of peak performance, but we also have a solution space, a design exploration space, in which we can play not just with peak performance but also taking into consideration energy, power, latency and so on and so forth. So, it is not just best versus worst: it is the solution space that we want to explore in order to have something that, implemented on our device, meets the designer's expected performance, the system designer's expected performance.

What if the expected performance can be met, which means: the expected performance line is here, higher in this graph than the performance we can achieve with our device? That is great, that is exactly where FPGAs are going to be useful: we can play in between the worst and the optimal, between the green and the purple points, and we can configure our device to achieve this performance. What if the expected performance is lower than the one we can obtain with our device? Not really an ideal situation, but we can try to make it with the reconfiguration.

That means that we are going to extend our device, and we are going to gain an advantage from the fact that the reconfiguration can take place over time, placing in a future configuration the resources that were not available in the previous one. If the reconfiguration overhead introduces an extension of the overall execution time that still stays below the expected-performance line, we are done; otherwise, we are not.

Now, just to make it clear: what do we mean when we say that we are going to reconfigure the device? Well, the reconfiguration does not have to be a complete reconfiguration of the device. We can decide to reconfigure just a portion of the device, in order to obtain the extension that defines the virtual device needed to meet that performance. So, it is not necessarily a complete reconfiguration: it can be a partial reconfiguration of the FPGA.
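As a closing illustration, here is a tiny sketch of that last check, under the purely invented assumption that partial reconfiguration time scales with the fraction of the device being reconfigured; all the numbers and names are hypothetical:

```python
# Hypothetical numbers: execution-time budget, execution time on the
# virtual device, and full-device reconfiguration time.
T_EXPECTED = 10.0     # expected-performance line set by the system designer
T_EXEC_VIRTUAL = 6.0  # execution time once the virtual (bigger) device exists
T_RECONF_FULL = 8.0   # time to reconfigure the whole FPGA

def meets_expectation(reconfigured_fraction):
    """True if execution on the virtual device plus the (possibly partial)
    reconfiguration overhead still stays below the expected-performance line."""
    t_reconf = T_RECONF_FULL * reconfigured_fraction
    return T_EXEC_VIRTUAL + t_reconf <= T_EXPECTED

print(meets_expectation(1.0))  # complete reconfiguration: 6 + 8 = 14 > 10 -> False
print(meets_expectation(0.4))  # partial reconfiguration: 6 + 3.2 = 9.2 <= 10 -> True
```

With these made-up numbers, a complete reconfiguration would blow the budget, while reconfiguring only a portion of the device keeps the overall execution time below the expected-performance line, which is exactly the point of partial reconfiguration.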