Running the program in parallel mode

Now let's discuss how to run the created template in parallel mode.

In fact, when you compile and run a program from the IDE, it runs as a single process. To run the program in parallel mode, a "supervising" program is required, which, first, launches the required number of instances of the program and, second, intercepts the messages sent by these instances ("processes") and forwards them to their destinations.

It should be noted that the instances of a "real" (not training) parallel program are usually run on different computers connected in a network (a cluster) or on supercomputers with a large number of processors, because this gives the parallel program its maximum efficiency. Of course, to check the correctness of a training parallel program, it is enough to run it on a single local computer. However, the supervising program is required in this case, too.

The Programming Taskbook uses a special application from the MPICH system as the supervising program. In MPICH 1.2.5, it is named MPIRun.exe and is located in the MPICH\mpd\bin directory; in MPICH2 1.3, it is named mpiexec.exe and is located in the MPICH2\bin directory. To run an executable file in parallel mode, it is enough to run the appropriate supervising program (MPIRun.exe or mpiexec.exe), passing it the full name of the executable file, the required number of processes (i.e., running instances of the program), and some additional parameters.

Since such launches have to be performed repeatedly while debugging the program, it is desirable to create a batch file (bat-file) containing the call of the supervising program with all necessary parameters. Even so, testing the parallel program this way is not very convenient: every time you make corrections to the program, you have to recompile it, then leave the IDE and run the bat-file. After checking the results of the program execution, you have to return to the IDE to make further changes to the program, and so on.

In order to simplify launching the program in parallel mode, the Programming Taskbook performs many of the required actions automatically. Let's demonstrate this with our project for solving the MPI1Proc2 task, which is ready for launch. Press the [F5] key; the program will be compiled and launched. As a result, a console window appears on the screen.

After several lines of informational messages, this window displays the command line that runs the ptprj.exe program in parallel mode under the control of mpiexec.exe. The number "5", specified before the full name of the ptprj.exe file, means that five instances of the exe-file will be launched. The -nopopup_debug option disables the output of error messages in a separate window (since these messages will eventually be displayed in the Programming Taskbook window), and the -localonly option ensures that all processes of the parallel program run on the local computer.
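The structure of such a command line can be sketched in a few lines of Python. This is only an illustration: the function name and both file paths are hypothetical, and the order of the two options is an assumption. The text above confirms only that the process count comes before the full name of the executable file and that the -nopopup_debug and -localonly options are present.

```python
def build_launch_command(mpiexec_path, exe_path, process_count):
    """Return the argument list for launching exe_path under mpiexec.exe."""
    if process_count < 1:
        raise ValueError("process_count must be at least 1")
    return [
        mpiexec_path,
        "-nopopup_debug",    # no separate window for error messages
        "-localonly",        # all processes run on the local computer
        str(process_count),  # number of instances, placed before the exe name
        exe_path,
    ]


if __name__ == "__main__":
    # Both paths are hypothetical and serve only as an illustration.
    command = build_launch_command(
        r"C:\MPICH2\bin\mpiexec.exe", r"C:\PT4Work\ptprj.exe", 5
    )
    print(" ".join(command))
```

Keeping the arguments in a list rather than a single string postpones the question of quoting paths that contain spaces until the command is actually written out or executed.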

Remark 1. If a parallel program named ptprj.exe has not been launched before, a "Windows Security Alert" window may appear on the screen, warning that some features of the program have been blocked by the Windows firewall. In this case, click the "Allow access" button in this window.

Immediately after the console window appears, the Programming Taskbook window is displayed.

In our case, the launch is treated as an acquaintance run, because the processes of our program do not perform any input-output operations.

To finish the program, close the Programming Taskbook window by clicking the "Exit (Esc)" button or by pressing [Esc] or [F5]. After the Programming Taskbook window is closed, the console window is closed as well, and we return to the IDE.

Thus, we can run our program in parallel mode directly from the IDE. This is possible thanks to a rather complicated mechanism implemented in the Programming Taskbook core. Let's describe this mechanism briefly.

The program launched from the IDE does not solve the task and runs in non-parallel mode. This instance of the program creates and runs the batch file $pt_run$.bat, which contains a call of the mpiexec.exe application with the required parameters. The mpiexec.exe application, in turn, runs the required number of program instances in parallel mode, and these processes try to solve the task. In particular, the Programming Taskbook sends input data to all processes and receives the obtained results from them.
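The first step of this mechanism, creating the batch file, can be illustrated with a short Python sketch. It is not the Taskbook's actual implementation: the helper names are hypothetical, and the real contents of $pt_run$.bat may differ from the single line written here. Only the file name and the fact that the file calls mpiexec.exe with parameters come from the description above.

```python
from pathlib import Path

BATCH_NAME = "$pt_run$.bat"  # file name taken from the description above


def quote_argument(argument):
    """Wrap an argument in double quotes if it contains a space."""
    return f'"{argument}"' if " " in argument else argument


def write_launch_batch(directory, command):
    """Write a one-line batch file that calls the supervising program.

    `command` is a list of arguments whose first element is the
    supervising program; the file layout is an assumption.
    """
    batch_path = Path(directory) / BATCH_NAME
    line = " ".join(quote_argument(part) for part in command)
    batch_path.write_text(line + "\n", encoding="utf-8")
    return batch_path
```

In this sketch the starting instance would then execute the returned file and wait for it to finish, which matches the order of events described in the text: the batch file ends only after all processes started by mpiexec.exe have finished.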

It should be noted that the Programming Taskbook window is displayed by the master process of the parallel program, while all slave processes (as well as the first instance of the program, which creates and runs the batch file) are running in the "invisible" mode.

After the Programming Taskbook window is closed, all processes finish, then the batch file finishes, and finally the first instance of the program finishes and we return to the IDE.

Remark 2. The "start" instance of the program performs one additional action: if the parallel program deadlocks, it automatically unloads all hung processes of the parallel program from memory. If the Programming Taskbook window does not appear on the screen within 10-15 seconds, the parallel program is deadlocked (or is in an infinite loop). In this situation, close the console window by pressing [Ctrl]+[C] or [Ctrl]+[Break] several times. If the "start" instance finds that the batch file has finished but some processes of the parallel program are still in memory, it automatically unloads all these processes. This action is important because hung processes make it impossible to recompile the program.

Remark 3. Sometimes only some of the slave processes stop responding. In this case, the master process usually displays its window and reports which slave processes are hung (and also displays the results from the slave processes that are not hung). This information may be useful for locating and correcting program errors.

The master process considers a slave process to be hung if it does not receive a response from this slave process within a certain period of time, which is proportional to the number of processes. By default, this interval is 3 * K seconds, where K is the number of processes (the corresponding information is displayed in the console window). In very rare cases, when tasks are performed on slow computers, some slave processes may not be able to complete their calculations within the waiting time, and the master process will consider them hung even though the solution of the task is correct. In such cases, you can change the timeout by using the pop-up menu items of the Programming Taskbook window named "Increase the response time for slave processes" and "Reduce the response time for slave processes".
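The timeout rule can be made concrete with a small Python sketch. The function names and the seconds_per_process parameter are hypothetical; the parameter merely models the adjustable response time, since the text does not say by how much the menu items change it. Only the default formula of 3 * K seconds is taken from the description above.

```python
def response_timeout(process_count, seconds_per_process=3):
    """Return the waiting time in seconds: 3 * K by default."""
    if process_count < 1:
        raise ValueError("process_count must be at least 1")
    return seconds_per_process * process_count


def hung_slaves(silence_by_rank, process_count, seconds_per_process=3):
    """Return the ranks of slave processes silent for longer than the timeout.

    `silence_by_rank` maps a slave rank to the number of seconds
    since its last response (an assumed bookkeeping structure).
    """
    timeout = response_timeout(process_count, seconds_per_process)
    return sorted(
        rank for rank, silence in silence_by_rank.items() if silence > timeout
    )


if __name__ == "__main__":
    print(response_timeout(5))  # 15 seconds for 5 processes
    silence = {1: 2.0, 2: 20.0, 3: 0.5, 4: 16.0}
    print(hung_slaves(silence, 5))     # [2, 4]
    print(hung_slaves(silence, 5, 4))  # [] with a 20-second timeout
```

The last call shows why a longer response time helps on a slow computer: with a larger per-process interval, processes that merely need more time are no longer reported as hung.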

Last revised:
01.01.2025