Welcome to this presentation on parallelization. You'll learn how parallelization helps improve the speed and efficiency of your jobs by executing subJobs at the same time.

With sequential processing, the subJobs in a job are executed in a linear fashion: the first subJob must complete its execution before the next subJob can begin. Typically, you design jobs to run sequentially when the output of one subJob is required as an input for another subJob. This is considered synchronous computing. The total time it takes for a job to complete is the sum of the times taken by the individual subJobs it contains.

In parallel processing, multiple instances of certain subJobs can be run in parallel so that the work is distributed among the subJobs. Although there are environmental considerations, such as application and hardware capacity, in general this design can improve your task completion speed. In this example, the task of processing and storing data is shared by two subJobs, which reduces the execution time from 23 to 17 seconds.

If there is no dependency among the outputs of the various subJobs, you can design parallel subJobs to execute different tasks. For example, after the input data is processed, you may choose to use parallel subJobs to store, aggregate, and archive the processed data at the same time. In this case, the total time taken by the job is only as long as the time taken by the longest subJob. Executing multiple subJobs simultaneously is considered asynchronous computing.

You can employ different methods in Talend Studio to process data concurrently, including multithreading, the dedicated tParallelize component, the parallel execution option of database components, and the automatic parallelization of your data flows. You can enable multithreaded execution for Talend jobs at the project or job level, which executes any unconnected subJobs at the same time as the main job.
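The timing difference between synchronous and asynchronous execution can be sketched outside of Talend. The following Python snippet is purely illustrative (Talend jobs are designed graphically, and the subJob names here are hypothetical): three independent "subJobs" are run first one after another, then concurrently, so the parallel run takes roughly as long as the longest task rather than the sum of all three.

```python
# Conceptual sketch only -- not Talend-generated code. Three independent
# "subJobs" (store, aggregate, archive) are simulated with sleeps.
import time
from concurrent.futures import ThreadPoolExecutor

def sub_job(name: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for real work (I/O, DB writes, ...)
    return f"{name} done"

tasks = {"store": 0.3, "aggregate": 0.2, "archive": 0.1}

# Sequential (synchronous) execution: total ~= 0.3 + 0.2 + 0.1 seconds.
start = time.perf_counter()
sequential = [sub_job(name, secs) for name, secs in tasks.items()]
t_seq = time.perf_counter() - start

# Parallel (asynchronous) execution: total ~= 0.3 s, the longest subJob.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(sub_job, tasks.keys(), tasks.values()))
t_par = time.perf_counter() - start

print(sequential == parallel, t_par < t_seq)
```

The results are identical either way; only the wall-clock time changes, which is the point of running independent subJobs asynchronously.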
For greater control over which subJobs are executed in parallel, use the dedicated tParallelize component. Connect the subJobs you'd like to run in parallel to this component; any additional subJobs can run either in parallel or synchronously.

Database output components include an option to enable parallel execution. This allows the processed data to be split into multiple smaller fragments that are processed in parallel, rather than a single larger fragment, making more efficient use of the database's resources at runtime.

Talend Studio includes an option to enable automatic parallelization for the first component in your subJob. If you enable this function for an input component, Studio automatically inserts the following components where required: tPartitioner, to split the input records into multiple threads; tCollector, to feed the threads to the next component; tDepartitioner, to create groups of the records generated by each thread; and tRecollector, to take the groups of records and feed them to the next component.

In this demo, we have two similar subJobs that each generate one million entries and write them to a file. As you can see here, multithreaded execution is deselected at the job level. When we execute the job, the first subJob completes before the second subJob starts. Now we'll enable multithreaded execution on the Job tab and run the job again. This time, both subJobs run at the same time.

Regardless of whether multithreading is enabled in the job properties, you can use different triggers on the dedicated tParallelize component to execute subJobs in parallel or sequentially. We'll use the parallelize trigger for the two top subJobs and the synchronize trigger for the third subJob. We'll run the job again, and this time the first two subJobs execute at the same time, and the third subJob executes only after both parallel subJobs have completed.
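The four auto-parallelization stages can be mimicked in plain Python to make the data flow concrete. This is a minimal sketch, assuming a simple round-robin split; the function names merely echo the Talend component names (tPartitioner, tCollector, tDepartitioner, tRecollector) and are not Talend APIs.

```python
# Conceptual sketch of the auto-parallelization stages; the function
# names are illustrative, not Talend APIs.
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """tPartitioner: split the input rows into n roughly equal fragments."""
    return [rows[i::n] for i in range(n)]

def process(fragment, thread_id):
    """Per-thread work fed by tCollector; here we just tag each row
    with the index of the fragment (thread) that handled it."""
    return [(row, thread_id) for row in fragment]

def depart_and_recollect(groups):
    """tDepartitioner groups each thread's output; tRecollector merges
    the groups back into a single flow for the next component."""
    return [row for group in groups for row in group]

rows = list(range(10))
fragments = partition(rows, 2)
with ThreadPoolExecutor(max_workers=2) as pool:
    groups = list(pool.map(process, fragments, range(2)))
merged = depart_and_recollect(groups)

print(len(merged), sorted(value for value, _ in merged) == rows)
```

Every input row survives the round trip: the partitioned fragments are processed independently and then recombined into one flow, exactly the shape of the pipeline Studio inserts for you.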
Alternatively, you can configure the synchronize trigger to execute as soon as the first subJob completes, or to terminate the entire job if any of the parallel subJobs fail.

Now we'll use the parallel execution feature of the tDBOutput component. The input file contains a large number of entries, which the tDBOutput component writes to a MySQL database. First, we run the job without enabling parallel execution for the component. We see that one million rows are written to the database in roughly 16 seconds. Next, we open the advanced settings of the tDBOutput component and select the Enable parallel execution option, with the number of parallel executions set to two. Notice that the Designer adds an x2 to tDBOutput to signify the number of parallel executions configured for the component. We run the job again, and this time the same job executes in less than 13 seconds, thanks to the parallel writes to the database.

Instead of using dedicated components to run subJobs in parallel, you can automatically parallelize your jobs by right-clicking the first component and selecting the option to set parallelization. Studio inserts several icons to depict the actions taken on the data being processed. Initially, the rows of data are split equally into the number of threads defined on the Parallelization tab of the main row, which is five. Data from the different threads is collected before being processed by the tMap and tLogRow components. The tMap component is configured so that every row of data is appended with the Thread_ID of the thread that processed it. This ensures that we can identify which thread processed each individual row of data. Let's cancel out of here. Because there are five threads, this tLogRow component should generate five separate data tables. The de-partitioning section groups the outputs of the individual threads, and the recollecting section captures the grouped results and feeds them to the next tLogRow component.
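The tMap step in the demo appends a Thread_ID column so each row records which thread processed it. The sketch below shows the same idea outside Talend, using Python's thread identifier as a stand-in for Thread_ID; the column names and the two-way split are illustrative assumptions, not Talend behavior.

```python
# Conceptual sketch only: tag each processed row with an identifier of
# the thread that handled it, analogous to the Thread_ID column added
# by tMap in the demo.
import threading
from concurrent.futures import ThreadPoolExecutor

def tag_rows(fragment):
    # threading.get_ident() stands in for Talend's Thread_ID value.
    tid = threading.get_ident()
    return [{"value": row, "Thread_ID": tid} for row in fragment]

rows = list(range(1, 11))
fragments = [rows[0::2], rows[1::2]]  # two fragments, as in the x2 demo

with ThreadPoolExecutor(max_workers=2) as pool:
    tagged = [r for group in pool.map(tag_rows, fragments) for r in group]

print(len(tagged))
```

Grouping the tagged rows by Thread_ID reproduces the per-thread tables printed by the first tLogRow component, while the flat `tagged` list corresponds to the recollected output shown by the second one.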
This tLogRow component displays a single table because all the thread outputs were combined at the recollection stage. Let's run the job. When it completes, we see in the console that the first tLogRow component printed five tables, one for each thread, with the data rows processed by that thread. However, the output of the second tLogRow component is a larger table that contains all the data processed by the different threads combined into one. Note that you can manually add the dedicated tPartitioner, tDepartitioner, tCollector, and tRecollector components to your jobs if required.

In this presentation, you learned how to improve the efficiency of job execution by implementing parallelization in your integration jobs. We discussed how to execute subJobs in parallel and how to use dedicated components to choose which subJobs are executed in parallel and which are executed sequentially. Finally, we discussed the automatic parallelization feature of Studio, which processes rows of data in different execution threads to improve job performance. For more information about Talend Studio, please check out the other presentations in this series. Thanks for watching.