Programming Taskbook

Electronic problem book on programming

Main

PT for MPI-2

Task groups

MPI9Matr

Solution example

PT for MPI-2 | Task groups | MPI9Matr

Parallel matrix algorithms

All numeric data in tasks are integers. Matrices should be input and output by rows. Files with the matrix elements also contain them in a row-major order.

The number of processes in tasks related to the band algorithms (MPI9Matr2–MPI9Matr20) does not exceed 5. The number of processes in tasks related to the block algorithms (MPI9Matr21–MPI9Matr44) does not exceed 16.

Use the char[12] array to store the file name, use the MPI_Bcast function with the MPI_CHAR datatype parameter to send the file name from the master process to the slave processes.

The program templates for each task already contain descriptions of integer variables for storing the numeric data mentioned in tasks (in particular, the matrix sizes), pointers to arrays for storing the matrices themselves, as well as variables of the MPI_Datatype and MPI_Comm type. These variables should be used in all the functions that you need to implement when solving tasks. All names of variables correspond to the notations used in the task formulations. For arrays associated with bands or blocks of matrices, the names a, b, c, t are used; for arrays associated with the initial matrices A, B, and their resulting product C, the names with the underline are used (namely, a_, b_, c_).

Tasks with the file input-output (MPI9Matr8–MPI9Matr10, MPI9Matr18–MPI9Matr20, MPI9Matr29–MPI9Matr31, MPI9Matr42–MPI9Matr44) require the use of the MPI-2 library. To solve the other tasks in this group, you can use any version of MPI.

Non-parallel matrix multiplication algorithm

MPI9Matr1°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. Find and output a M × Q matrix C that is the product of the matrices A and B.

The formula for calculating the elements of the matrix C under the assumption that the rows and columns of all matrices are numbered from 0 is as follows: C_I,J = A_I,0·B_0,J + A_I,1·B_1,J + … + A_I,P-1·B_P-1,J, where I = 0, …, M − 1, J = 0, …, Q − 1.

To store the matrices A, B, C, use one-dimensional arrays of size M·P, P·Q, and M·Q placing elements of matrices in a row-major order (that is, the matrix element with indices I and J will be stored in the element of the corresponding array with the index I·N + J, where N is the number of columns of the matrix). The slave processes are not used in this task.

Band algorithm 1 (horizontal bands)

MPI9Matr2°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. In the first variant of the band algorithm of matrix multiplication, each matrix multiplier is divided into K horizontal bands, where K is the number of processes (hereinafter bands are distributed by processes and used to calculate a part of the total matrix product in each process).

The band of the matrix A contains N_A rows, the band of the matrix B contains N_B rows. The numbers N_A and N_B are calculated as follows: N_A = ceil(M/K), N_B = ceil(P/K), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains insufficient number of rows to fill the last band, then the zero-valued rows should be added to this band.

Add, if necessary, the zero-valued rows to the initial matrices, save them in one-dimensional arrays in the master process, and then send the matrix bands from these arrays to all processes as follows: a band with the index R is sent to the process of rank R (R = 0, 1, …, K − 1), all the bands A_R are of the size N_A × P, all the bands B_R are of the size N_B × Q. In addition, create a band C_R in each process to store the part of the matrix product C = AB which will be calculated in this process. Each band C_R is of the size N_A × Q and is filled with zero-valued elements.

Include all the above mentioned actions in a Matr1ScatterData function (without parameters). As a result of the call of this function, each process will receive the values N_A, P, N_B, Q, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers N_A, P, N_B, Q and the bands of the matrices A, B, C) in each process after calling the Matr1ScatterData function. Perform the input of initial data in the Matr1ScatterData function, perform the output of the results in the Solve function.

Note. To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.

MPI9Matr3°. Integers N_A, P, N_B, Q and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr2 task). Implement the first step of the band algorithm of matrix multiplication as follows: multiply elements in the bands A_R and B_R of each process and perform the cyclic sending each band B_R to the process of the previous rank (that is, from the process 1 to the process 0, from the process 2 to the process 1, …, from the process 0 to the process K − 1, where K is the number of processes).

Use the MPI_Sendrecv_replace function to send the bands. To determine the ranks of the sending and receiving processes, use the expression containing the % operator that gives the remainder of a division.

Include all the above mentioned actions in a Matr1Calc function (without parameters). Output the new contents of the bands C_R and B_R in each process; perform data input and output in the Solve function.

Note. As a result of multiplying the bands A_R and B_R, each element of the band C_R will contain a part of the terms included in the elements of the product AB; all elements of the band B_R and some of the elements of the band A_R will be used (in particular, the first N_B elements of the band A₀ will be used in the process 0 in the first step and the last N_B elements of the band A_K-1 will be used in the process K − 1 in the first step).

MPI9Matr4°. Integers N_A, P, N_B, Q and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr2 task). Modify the Matr1Calc function, which was implemented in the previous task; the modified function should provide execution of any step of the band algorithm of matrix multiplication.

To do this, add the parameter named step to the function (this parameter specifies the step number and may be in the range 0 to K − 1, where K is the number of processes) and use the value of this parameter in the part of the algorithm that deals with the recalculation of the elements of the band C_R (the cyclic sending of the bands B_R does not depend on the value of the parameter step).

Using two calls of the modified Matr1Calc function with the parameters 0 and 1, execute two initial steps of the band algorithm and output the new contents of the bands C_R and B_R in each process. Perform data input and output in the Solve function.

Note. The parameter step determines which part of the band A_R will be used for the next recalculation of the elements of the band C_R (note that these parts should be selected cyclically).

MPI9Matr5°. Integers N_A, P, N_B, Q and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr2 task). In addition, a number L with the same value is given in each process. The value of L is in the range 3 to K, where K is the number of processes, and determines the number of steps of the band algorithm.

Using the function Matr1Calc(I) (see the previous task) in a loop with the parameter I (I = 0, …, L − 1), execute the initial L steps of the band algorithm and output the new contents of the bands C_R and B_R in each process. Perform data input and output in the Solve function.

Remark. If the value of L is equal to K, then the bands C_R will contain the corresponding parts of the final matrix product AB.

MPI9Matr6°. An integer M (the number of rows of the matrix product) is given in the master process. In addition, integers N_A, Q and one-dimensional arrays filled with the N_A × Q bands of matrix C are given in each process (the given bands of C are obtained as a result of K steps of the band algorithm — see the MPI9Matr5 task). Send all the bands C_R to the master process and output the received matrix C of the size M × Q in this process.

To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (N_A·K) × Q. To send data to this array, use the MPI_Gather collective function.

Include all the above mentioned actions in a Matr1GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr1GatherData function.

MPI9Matr7°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr2 task).

Using successively the Matr1ScatterData, Matr1Calc (in a loop), and Matr1GatherData functions, that are developed in the MPI9Matr2−MPI9Matr6 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process.

In addition, output the current contents of the band C_R in each process after each call of the Matr1Calc function. Modify the Matr1Calc function (see the MPI9Matr4 task), before using in this task, as follows: the bands B_R should not be sent when the parameter step is equal to K − 1.

MPI9Matr8°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. Modify the initial stage of the band algorithm of matrix multiplication (see the MPI9Matr2 task) as follows: each process should read the corresponding bands of the matrices A and B directly from the given files using the MPI_File_seek and MPI_File_read_all collective functions (a new file view is not required).

To send the sizes of matrices and file names, use the MPI_Bcast collective function.

Include all these actions in a Matr1ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values N_A, P, N_B, Q, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers N_A, P, N_B, Q and the bands of the matrices A, B, C) in each process after calling the Matr1ScatterFile function. Perform the input of initial data in the Matr1ScatterFile function, perform the output of the results in the Solve function.

Remark. For some bands, some of their elements (namely, the last rows) or even the entire bands should not be read from the source files and will remain zero-valued ones. However, this situation does not require special processing, since the MPI_File_read_all function automatically stops reading the data (without generating any error message) when the end of the file is reached.

MPI9Matr9°. Integers N_A, Q and one-dimensional arrays filled with the N_A × Q bands C_R are given in each process (the given bands C_R are obtained as a result of K steps of the band algorithm of matrix multiplication — see the MPI9Matr5 task). In addition, an integer M (the number of rows of the matrix product) and the name of file (to store this product) are given in the master process.

Send the number M and the file name to all processes using the MPI_Bcast function. Write all the parts of the matrix product contained in the bands C_R to the resulting file, which will eventually contain a matrix C of the size M × Q. To write the bands to the file, use the MPI_File_seek and MPI_File_write_all collective functions.

Include all these actions (namely, the input of file name, sending number M and the file name, and writing all bands to the file) in a Matr1GatherFile function. Perform the input of all initial data, except the file name, in the Solve function.

Note. When writing data to the resulting file, it is necessary to take into account that some of the bands C_R may contain trailing zero-valued rows that are not related to the resulting matrix product (the number M should be sent to all processes in order to control this situation).

MPI9Matr10°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. Using successively the Matr1ScatterFile, Matr1Calc (in a loop), and Matr1GatherFile functions (see the MPI9Matr8, MPI9Matr5, MPI9Matr9 tasks), find a matrix C and write its elements to the resulting file.

In addition, output the current value of the c[step] in each process after each call of the Matr1Calc function, where c is a one-dimensional array containing the band C_R, and step is the algorithm step number (0, 1, …, K − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.

Band algorithm 2 (horizontal and vertical bands)

MPI9Matr11°. Integers P and Q are given in each process; in addition, a matrix B of the size P × Q is given in the master process. The number Q is a multiple of the number of processes K. Input the matrix B into a one-dimensional array of the size P·Q in the master process and define a new datatype named MPI_BAND_B that contains a vertical band of the matrix B. The width of the vertical band should be equal to N_B = Q/K columns. When defining the MPI_BAND_B datatype, use the MPI_Type_vector and MPI_Type_commit functions.

Include this definition in a Matr2CreateTypeBand(p, n, q, t) function with the input integer parameters p, n, q and the output parameter t of the MPI_Datatype type; the parameters p and n determine the size of the vertical band (the number of its rows and columns), and the parameter q determines the number of columns of the matrix from which this band is extracted.

Using the MPI_BAND_B datatype, send to each process (including the master process) the corresponding band of the matrix B in the ascending order of ranks of receiving processes. Sending should be performed using the MPI_Send and MPI_Recv functions; the bands should be stored in one-dimensional arrays of the size P·N_B. Output the received band in each process.

Remark. In the MPICH2 version 1.3, the MPI_Send function call is erroneous if the sending and receiving processes are the same. You may use the MPI_Sendrecv function to send a band to the master process. You may also fill a band in the master process without using tools of the MPI library.

MPI9Matr12°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. In the second variant of the band algorithm of matrix multiplication, the first multiplier (the matrix A) is divided into K horizontal bands and the second multiplier (the matrix B) is divided into K vertical bands, where K is the number of processes (hereinafter bands are distributed by processes and used to calculate a part of the total matrix product in each process).

The band of the matrix A contains N_A rows, the band of the matrix B contains N_B columns. The numbers N_A and N_B are calculated as follows: N_A = ceil(M/K), N_B = ceil(Q/K), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains insufficient number of rows (or columns) to fill the last band, then the zero-valued rows (or columns) should be added to this band.

Add, if necessary, the zero-valued rows or columns to the initial matrices, save them in one-dimensional arrays in the master process, and then send the matrix bands from these arrays to all processes as follows: a band with the index R is sent to the process of rank R (R = 0, 1, …, K − 1), all the bands A_R are of the size N_A × P, all the bands B_R are of the size P × N_B. In addition, create a band C_R in each process to store the part of the matrix product C = AB which will be calculated in this process. Each band C_R is of the size (N_A·K) × N_B and is filled with zero-valued elements.

The bands, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. To send the matrix sizes, use the MPI_Bcast collective function, to send the bands of the matrix A, use the MPI_Scatter collective function, to send the bands of the matrix B, use the MPI_Send and MPI_Recv functions and also the MPI_BAND_B datatype created by the Matr2CreateTypeBand function (see the previous task and a note to it).

Include all the above mentioned actions in a Matr2ScatterData function (without parameters). As a result of the call of this function, each process will receive the values N_A, P, N_B, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers N_A, P, N_B and the bands of the matrices A, B, C) in each process after calling the Matr2ScatterData function. Perform the input of initial data in the Matr2ScatterData function, perform the output of the results in the Solve function.

Notes. (1) When input the matrix B into an array in the master process, it should be taken into account that this array may contain elements corresponding to additional zero-valued columns.

(2) To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.

MPI9Matr13°. Integers N_A, P, N_B and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr12 task). Implement the first step of the band algorithm of matrix multiplication as follows: multiply elements in the bands A_R and B_R of each process and perform the cyclic sending each band A_R to the process of the previous rank (that is, from the process 1 to the process 0, from the process 2 to the process 1, …, from the process 0 to the process K − 1, where K is the number of processes).

Include all the above mentioned actions in a Matr2Calc function (without parameters). Output the new contents of the bands C_R and A_R in each process; perform data input and output in the Solve function.

Note. In this variant of the band algorithm, the bands A_R contain the full rows of the matrix A and the bands B_R contain the full columns of the matrix B, so, as a result of their multiplication, the band C_R will contain part of the elements of the final matrix product already at the first step of the algorithm (the other elements of the band C_R will remain zero-valued). The location of the found elements in the band C_R depends on the rank of the process (in particular, the first N_A rows of the band C₀ in the process 0 will be filled in the first step and the last N_A rows of the band C_K-1 in the process K − 1 will be filled in the first step).

MPI9Matr14°. Integers N_A, P, N_B and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr12 task). Modify the Matr2Calc function, which was implemented in the previous task; the modified function should provide execution of any step of the band algorithm of matrix multiplication.

To do this, add the parameter named step to the function (this parameter specifies the step number and may be in the range 0 to K − 1, where K is the number of processes) and use the value of this parameter in the part of the algorithm that deals with the recalculation of the elements of the band C_R (the cyclic sending of the bands A_R does not depend on the value of the parameter step).

Using two calls of the modified Matr2Calc function with the parameters 0 and 1, execute two initial steps of the band algorithm and output the new contents of the bands C_R and A_R in each process. Perform data input and output in the Solve function.

Note. The parameter step determines which rows of the band C_R will be calculated in this step of the algorithm (note that these rows are selected cyclically).

MPI9Matr15°. Integers N_A, P, N_B and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr12 task). In addition, a number L with the same value is given in each process. The value of L is in the range 3 to K, where K is the number of processes, and determines the number of steps of the band algorithm.

Using the function Matr2Calc(I) (see the previous task) in a loop with the parameter I (I = 0, …, L − 1), execute the initial L steps of the band algorithm and output the new contents of the bands C_R and A_R in each process. Perform data input and output in the Solve function.

Remark. If the value of L is equal to K, then the bands C_R will contain the corresponding parts of the final matrix product AB.

MPI9Matr16°. Integers M and Q (the numbers of rows and columns of the matrix product) are given in the master process. In addition, integers N_A, N_B and one-dimensional arrays filled with the (N_A·K) × N_B bands of the matrix C are given in each process (the given bands of C are obtained as a result of K steps of the band algorithm — see the MPI9Matr15 task). Send all the bands C_R to the master process and output the received matrix C of the size M × Q in this process.

To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (N_A·K) × (N_B·K). To send data to this array, use the MPI_Send and MPI_Recv functions and the MPI_BAND_C datatype created by the Matr2CreateTypeBand function (see the MPI9Matr11 task and a note to it).

Include all the above mentioned actions in a Matr2GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr2GatherData function.

Note. When output the matrix C in the master process, it should be taken into account that an array, which is intended for matrix storage, may contain elements corresponding to additional zero-valued columns.

MPI9Matr17°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr12 task).

Using successively the Matr2ScatterData, Matr2Calc (in a loop), and Matr2GatherData functions, that are developed in the MPI9Matr12−MPI9Matr16 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process.

In addition, output the current contents of the band C_R in each process after each call of the Matr2Calc function.

Modify the Matr2Calc function (see the MPI9Matr14 task), before using in this task, as follows: the bands A_R should not be sent when the parameter step is equal to K − 1.

MPI9Matr18°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. The number Q is a multiple of the number of processes K. Modify the initial stage of the band algorithm of matrix multiplication (see the MPI9Matr12 task) as follows: each process should read the corresponding bands of the matrices A and B directly from the given files.

To send the sizes of matrices and file names, use the MPI_Bcast collective function. Use the MPI_File_seek and MPI_File_read_all collective functions to read the horizontal bands of the matrix A. To read the vertical bands of the matrix B, set the appropriate file view using the MPI_File_set_view function and the MPI_BAND_B filetype defined with the Matr2CreateTypeBand function (see the MPI9Matr11 task), and then use the MPI_File_read_all function.

Include all these actions in a Matr2ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values N_A, P, N_B, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers N_A, P, N_B and the bands of the matrices A, B, C) in each process after calling the Matr2ScatterFile function. Perform the input of initial data in the Matr2ScatterFile function, perform the output of the results in the Solve function.

Note. A condition that the number Q is a multiple of K allows us to perform reading of the bands B_R using the same filetype in all processes.

If this condition is not fulfilled, then it would be necessary to use special types that ensure the correct reading from the file and write to the array of "truncated" bands of the matrix B in the last processes (in addition, in this case it would be necessary to send to each process the value of Q which is necessary for the correct type definition for "truncated" bands).

MPI9Matr19°. Integers N_A, N_B and one-dimensional arrays filled with the (N_A·K) × N_B bands C_R are given in each process (the given bands C_R are obtained as a result of K steps of the band algorithm of matrix multiplication — see the MPI9Matr15 task). In addition, an integer M (the number of rows of the matrix product) and the name of file (to store this product) are given in the master process. The number of columns Q of the matrix product is a multiple of the number of processes K (and, therefore, is equal to N_B·K).

To write the bands to the file, set the appropriate file view using the MPI_File_set_view function and the MPI_BAND_C filetype defined with the Matr2CreateTypeBand function (see the MPI9Matr11 task), and then use the MPI_File_write_all function.

Include all these actions (namely, the input of file name, sending number M and the file name, and writing all bands to the file) in a Matr2GatherFile function. Perform the input of all initial data, except the file name, in the Solve function.

Note. When writing data to the resulting file, it is necessary to take into account that the bands C_R may contain trailing zero-valued rows that are not related to the resulting matrix product (the number M should be sent to all processes in order to control this situation).

MPI9Matr20°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. The number Q is a multiple of the number of processes K. Using successively the Matr2ScatterFile, Matr2Calc (in a loop), and Matr2GatherFile functions (see the MPI9Matr18, MPI9Matr15, MPI9Matr19 tasks), find a matrix C and write its elements to the resulting file.

In addition, output the current value of the c[step] in each process after each call of the Matr2Calc function, where c is a one-dimensional array containing the band C_R, and step is the algorithm step number (0, 1, …, K − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.

Cannon's block algorithm

MPI9Matr21°. Integers M and P are given in each process; in addition, a matrix A of the size M × P is given in the master process. The number of processes K is a perfect square: K = K₀·K₀, the numbers M and P are multiples of K₀. Input the matrix A into a one-dimensional array of the size M·P in the master process and define a new datatype named MPI_BLOCK_A that contains a M₀ × P₀ block of the matrix A, where M₀ = M/K₀, P₀ = P/K₀.

When defining the MPI_BLOCK_A type, use the MPI_Type_vector and MPI_Type_commit functions. Include this definition in a Matr3CreateTypeBlock(m0, p0, p, t) function with the input integer parameters m0, p0, p and the output parameter t of the MPI_Datatype type; the parameters m0 and p0 determine the size of the block, and the parameter p determines the number of columns of the matrix from which this block is extracted.

Using the MPI_BLOCK_A datatype, send to each process (in ascending order of ranks of processes, including the master process) the corresponding block of the matrix A in a row-major order of blocks (that is, the first block should be sent to the process 0, the next block in the same row of blocks should be sent to the process 1, and so on). Sending should be performed using the MPI_Send and MPI_Recv functions; the blocks should be stored in one-dimensional arrays of the size M₀·P₀. Output the received block in each process.

Remark. In the MPICH2 version 1.3, the MPI_Send function call is erroneous if the sending and receiving processes are the same. You may use the MPI_Sendrecv function to send a block to the master process. You may also fill a block in the master process without using tools of the MPI library.

MPI9Matr22°. Integers M₀, P₀ and a matrix A of the size M₀ × P₀ are given in each process. The number of processes K is a perfect square: K = K₀·K₀. Input the matrix A into a one-dimensional array of the size M₀·P₀ in each process and create a new communicator named MPI_COMM_GRID using the MPI_Cart_create function. The MPI_COMM_GRID communicator defines a Cartesian topology for all processes as a two-dimensional periodic K₀ × K₀ grid (ranks of processes should not be reordered).

Include the creation of the MPI_COMM_GRID communicator in a Matr3CreateCommGrid(comm) function with the output parameter comm of the MPI_Comm type. Using the MPI_Cart_coords function for this communicator, output the process coordinates (I₀, J₀) in each process.

Perform a cyclic shift of the matrices A given in all processes of each grid row I₀ by I₀ positions left (that is, in descending order of ranks of processes) using the MPI_Cart_shift and MPI_Sendrecv_replace functions. Output the received matrix in each process.

MPI9Matr23°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. The number of processes K is a perfect square: K = K₀·K₀. In the block algorithms of matrix multiplication, the initial matrices are divided into K blocks and are interpreted as square block matrices of the order K₀ (hereinafter blocks are distributed by processes and used to calculate a part of the total matrix product in each process).

The block of the matrix A is of the size M₀ × P₀, the block of the matrix B is of the size P₀ × Q₀, the numbers M₀, P₀, Q₀ are calculated as follows: M₀ = ceil(M/K₀), P₀ = ceil(P/K₀), Q₀ = ceil(Q/K₀), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains insufficient number of rows (or columns) to fill the last blocks, then the zero-valued rows (or columns) should be added to these blocks.

Add, if necessary, the zero-valued rows or columns to the initial matrices (as a result, the matrices A and B will have the size (M₀·K₀) × (P₀·K₀) and (P₀·K₀) × (Q₀·K₀) respectively), save them in one-dimensional arrays in the master process, and then send the matrix blocks (in a row-major order) from these arrays to all processes (in ascending order of its ranks): the process R will receive the blocks A_R and B_R, R = 0, …, K − 1. In addition, create a block C_R in each process to store the part of the matrix product C = AB which will be calculated in this process. Each block C_R is of the size M₀ × Q₀ and is filled with zero-valued elements.

The blocks, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. To send the matrix sizes, use the MPI_Bcast collective function, to send the blocks of the matrices A and B, use the MPI_Send and MPI_Recv functions and also the MPI_BLOCK_A and MPI_BLOCK_B datatypes created by the Matr3CreateTypeBlock function (see the MPI9Matr21 task and a note to it).

Include all the above mentioned actions in a Matr3ScatterData function (without parameters). As a result of the call of this function, each process will receive the values M₀, P₀, Q₀, as well as one-dimensional arrays filled with the corresponding blocks of the matrices A, B, C. Output all obtained data (that is, the numbers M₀, P₀, Q₀ and the blocks of the matrices A, B, C) in each process after calling the Matr3ScatterData function. Perform the input of initial data in the Matr3ScatterData function, perform the output of the results in the Solve function.

Notes. (1) When input the matrices A and B into arrays in the master process, it should be taken into account that these arrays may contain elements corresponding to additional zero-valued columns.

(2) To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.

MPI9Matr24°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the corresponding blocks of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr23 task). Implement the initial block redistribution used in the Cannon's algorithm for block matrix multiplication.

To do this, define a Cartesian topology for all processes as a two-dimensional periodic K₀ × K₀ grid, where K₀·K₀ is equal to the number of processes (ranks of processes should not be reordered), and perform a cyclic shift of the blocks A_R given in all processes of each grid row I₀ by I₀ positions left (that is, in descending order of ranks of processes), I₀ = 0, …, K₀ − 1, and perform a cyclic shift of the blocks B_R given in all processes of each grid column J₀ by J₀ positions up (that is, in descending order of ranks of processes), J₀ = 0, …, K₀ − 1.

To create the MPI_COMM_GRID communicator associated with the Cartesian topology, use the Matr3CreateCommGrid function implemented in the MPI9Matr22 task. Use the MPI_Cart_coords, MPI_Cart_shift, MPI_Sendrecv_replace functions to perform the cyclic shifts (compare with MPI9Matr22).

Include all the above mentioned actions in a Matr3Init function (without parameters). Output the received blocks A_R and B_R in each process; perform data input and output in the Solve function.

MPI9Matr25°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the corresponding blocks of matrices A, B, C are given in each process. The blocks C_R are zero-valued, the initial redistribution for the blocks A_R and B_R has already been performed in accordance with the Cannon's algorithm (see the previous task). Implement one step of the Cannon's algorithm of matrix multiplication as follows: multiply elements in the blocks A_R and B_R of each process and perform a cyclic shift of the blocks A₀ given in all processes of each row of the Cartesian periodic grid by 1 position left (that is, in descending order of ranks of processes) and perform a cyclic shift of the blocks B₀ given in all processes of each grid column by 1 position up (that is, in descending order of ranks of processes).

To create the MPI_COMM_GRID communicator associated with the Cartesian topology, use the Matr3CreateCommGrid function implemented in the MPI9Matr22 task. Use the MPI_Cart_shift and MPI_Sendrecv_replace functions to perform the cyclic shifts (compare with MPI9Matr22).

Include all the above mentioned actions in a Matr3Calc function (without parameters). Output the new contents of the blocks C_R, A_R, and B_R in each process; perform data input and output in the Solve function.

Remark. A special feature of the Cannon's algorithm is that the actions at each step are not depend on the step number.

MPI9Matr26°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the corresponding blocks of matrices A, B, C are given in each process. The blocks C_R are zero-valued, the initial redistribution for the blocks A_R and B_R has already been performed in accordance with the Cannon's algorithm (see the MPI9Matr24 task). In addition, a number L with the same value is given in each process. The value of L is in the range 2 to K₀, where K₀·K₀ is the number of processes, and determines the number of steps of the Cannon's algorithm.

Using the function Matr3Calc (see the previous task) in a loop, execute the initial L steps of the Cannon's algorithm and output the new contents of the blocks C_R, A_R, and B_R in each process. Perform data input and output in the Solve function.

Note. If the value of L is equal to K₀, then the blocks C_R will contain the corresponding parts of the final matrix product AB.

MPI9Matr27°. Integers M and Q (the numbers of rows and columns of the matrix product) are given in the master process. In addition, integers M₀, Q₀ and one-dimensional arrays filled with the M₀ × Q₀ blocks of the matrix C are given in each process (the given blocks of C are obtained as a result of K₀ steps of the Cannon's algorithm — see the MPI9Matr26 task). Send all the blocks C_R to the master process and output the received matrix C of the size M × Q in this process.

To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (M₀·K₀) × (Q₀·K₀). To send the blocks C_R to this array, use the MPI_Send and MPI_Recv functions and the MPI_BLOCK_C datatype created by the Matr3CreateTypeBlock function (see the MPI9Matr21 task and a note to it).

Include all the above mentioned actions in a Matr3GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr3GatherData function.

MPI9Matr28°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr23 task).

Using successively the Matr3ScatterData, Matr3Init, Matr3Calc (in a loop), and Matr3GatherData functions, that are developed in the MPI9Matr23−MPI9Matr27 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process.

In addition, output the current contents of the block C_R in each process after each call of the Matr3Calc function.

The MPI_COMM_GRID communicator, which is used in the Matr3Init and Matr3Calc functions, should not be created several times. To do this, modify the Matr3CreateCommGrid function; the modified function should not perform any actions when it is called with the parameter comm that is not equal to MPI_COMM_NULL.

In addition, modify the Matr3Calc function (see the MPI9Matr25 task), before using in this task, as follows: add the parameter named step to this function (step = 0, …, K₀ − 1); the blocks A_R and B_R should not be sent when the parameter step is equal to K₀ − 1.

MPI9Matr29°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. Modify the stage of receiving blocks for the Cannon's algorithm of matrix multiplication (see the MPI9Matr23 task) as follows: each process should read the corresponding blocks of the matrices A and B directly from the given files. In this case, all processes should receive not only the sizes M₀, P₀, Q₀ of the blocks, but also the sizes M, P, Q of the initial matrices, which are needed to determine correctly the positions of blocks in the source files.

To send the sizes of matrices and file names, use the MPI_Bcast collective function. Use the MPI_File_read_at local function to read each row of the block (a new file view is not required).

Include all these actions in a Matr3ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values M, P, Q, M₀, P₀, Q₀, as well as one-dimensional arrays filled with the corresponding blocks of the matrices A, B, C. Output all obtained data (that is, the numbers M, P, Q, M₀, P₀, Q₀ and the blocks of the matrices A, B, C) in each process after calling the Matr3ScatterFile function. Perform the input of initial data in the Matr3ScatterFile function, perform the output of the results in the Solve function.

Note. For some blocks, some of their elements (namely, the last rows and/or columns) should not be read from the source files and will remain zero-valued ones. To determine the actual size of the block being read (the number of rows and columns), it is required to use the sizes of the initial matrices and the coordinates (I₀, J₀) of the block in a square Cartesian grid of order K₀ (note that I₀ = R/K₀, J₀ = R%K₀, where R is the process rank).

Remark. Whereas the values of P and Q are necessary to ensure the correct reading of the file blocks, the value of M is not required for this purpose, since the attempt to read data beyond the end of file is ignored (without generating any error message). However, the value of M is required at the final stage of the algorithm (see the next task), so it must also be sent to all processes.

MPI9Matr30°. Integers M, Q, M₀, Q₀ and one-dimensional arrays filled with the M₀ × Q₀ blocks C_R are given in each process (the given blocks C_R are obtained as a result of K₀ steps of the Cannon's block algorithm of matrix multiplication — see the MPI9Matr25 task). In addition, the name of file to store the matrix product is given in the master process.

Send the file name to all processes using the MPI_Bcast function. Write all the parts of the matrix product contained in the blocks C_R to the resulting file, which will eventually contain a matrix C of the size M × Q.

Use the MPI_File_write_at local function to write each row of the block to the file (a new file view is not required).

Include all these actions (namely, the input of file name, sending the file name, and writing all blocks to the file) in a Matr3GatherFile function. Perform the input of all initial data, except the file name, in the Solve function.

Note. When writing data to the resulting file, it is necessary to take into account that some of the blocks C_R may contain trailing zero-valued rows and/or columns that are not related to the resulting matrix product (see also the note and the remark for the previous task).

MPI9Matr31°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB.

Using successively the Matr3ScatterFile, Matr3Init, Matr3Calc (in a loop), and Matr3GatherFile functions (see the MPI9Matr29, MPI9Matr24, MPI9Matr25, and MPI9Matr30 tasks), find a matrix C and write its elements to the resulting file.

In addition, output the current value of the c[step] in each process after each call of the Matr3Calc function, where c is a one-dimensional array containing the block C_R, and step is the algorithm step number (0, 1, …, K₀ − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.

Fox's block algorithm

MPI9Matr32°. Integers M and P are given in each process; in addition, a matrix A of the size M × P is given in the master process. The number of processes K is a perfect square: K = K₀·K₀, the numbers M and P are multiples of K₀. Input the matrix A into a one-dimensional array of the size M·P in the master process and define a new datatype named MPI_BLOCK_A that contains a M₀ × P₀ block of the matrix A, where M₀ = M/K₀, P₀ = P/K₀.

When defining the MPI_BLOCK_A type, use the MPI_Type_vector and MPI_Type_commit functions. Include this definition in a Matr4CreateTypeBlock(m0, p0, p, t) function with the input integer parameters m0, p0, p and the output parameter t of the MPI_Datatype type; the parameters m0 and p0 determine the size of the block, and the parameter p determines the number of columns of the matrix from which this block is extracted.

Using the MPI_BLOCK_A datatype, send to each process (in ascending order of ranks of processes, including the master process) the corresponding block of the matrix A in a row-major order of blocks (that is, the first block should be sent to the process 0, the next block in the same row of blocks should be sent to the process 1, and so on). Sending should be performed using the MPI_Alltoallw function; the blocks should be stored in one-dimensional arrays of the size M₀·P₀. Output the received block in each process.

Notes. (1) Use the MPI_Send and MPI_Recv functions instead of the MPI_Alltoallw function when solving this task using the MPI-1 library.

(2) The MPI_Alltoallw function introduced in MPI-2 is the only collective function that allows you to specify the displacements for the sent data in bytes (not in elements). This gives opportunity to use it in conjunction with complex data types to implement any variants of collective communications (in our case, we need to implement a communication of the scatter type).

It should be noted that all array parameters of the MPI_Alltoallw function associated with the sent data must be differently defined in the master and slave processes. In particular, the array scounts (which determines the number of sent elements) must contain the values 0 in all the slave processes and the value 1 in the master process (the sent elements are of the MPI_BLOCK_A datatype).

At the same time, arrays associated with the received data will be defined in the same way in all processes; in particular, the zero-indexed element of the array rcounts (which determines the number of received elements) must be equal to M₀·P₀, and all other elements of this array must be equal to 0 (the received elements are of the MPI_INT datatype).

It is necessary to pay special attention to the correct definition of elements in the array sdispls of displacements for the sent data in the master process (in the slave processes, it is enough to use the zero-valued array sdispls).

MPI9Matr33°. Integers M₀, P₀ and a matrix A of the size M₀ × P₀ are given in each process. The number of processes K is a perfect square: K = K₀·K₀. Input the matrix A into a one-dimensional array of the size M₀·P₀ in each process and create a new communicator named MPI_COMM_GRID using the MPI_Cart_create function. The MPI_COMM_GRID communicator defines a Cartesian topology for all processes as a two-dimensional periodic K₀ × K₀ grid (ranks of processes should not be reordered).

Include the creation of the MPI_COMM_GRID communicator in a Matr4CreateCommGrid(comm) function with the output parameter comm of the MPI_Comm type. Using the MPI_Cart_coords function for this communicator, output the process coordinates (I₀, J₀) in each process.

On the base of the MPI_COMM_GRID communicator, create a set of communicators named MPI_COMM_ROW, which are associated with the rows of the initial two-dimensional grid. Use the MPI_Cart_sub function to create the MPI_COMM_ROW communicators.

Include the creation of the MPI_COMM_ROW communicators in a Matr4CreateCommRow(grid, row) function with the input parameter grid (the communicator associated with the initial two-dimensional grid) and the output parameter row (both parameters are of the MPI_Comm type). Output the process rank R₀ for the MPI_COMM_ROW communicator in each process (this rank must be equal to J₀).

In addition, send the matrix A from the grid element (I₀, I₀) to all processes of the same grid row I₀ (I₀ = 0, …, K₀ − 1) using the MPI_Bcast collective function for the MPI_COMM_ROW communicator. Save the received matrix in the auxiliary matrix T of the same size as the matrix A (it is necessary to copy the matrix A to the matrix T in the sending process before the call of the MPI_Bcast function). Output the received matrix T in each process.

MPI9Matr34°. Integers P₀, Q₀ and a matrix B of the size P₀ × Q₀ are given in each process. The number of processes K is a perfect square: K = K₀·K₀. Input the matrix B into a one-dimensional array of the size P₀·Q₀ in each process and create a new communicator named MPI_COMM_GRID, which defines a Cartesian topology for all processes as a two-dimensional periodic K₀ × K₀ grid.

Use the Matr4CreateCommGrid function (see the MPI9Matr33 task) to create the MPI_COMM_GRID communicator. Using the MPI_Cart_coords function for this communicator, output the process coordinates (I₀, J₀) in each process.

On the base of the MPI_COMM_GRID communicator, create a set of communicators named MPI_COMM_COL, which are associated with the columns of the initial two-dimensional grid. Use the MPI_Cart_sub function to create the MPI_COMM_COL communicators.

Include the creation of the MPI_COMM_COL communicators in a Matr4CreateCommCol(grid, col) function with the input parameter grid (the communicator associated with the initial two-dimensional grid) and the output parameter col (both parameters are of the MPI_Comm type). Output the process rank R₀ for the MPI_COMM_COL communicator in each process (this rank must be equal to I₀).

In addition, perform a cyclic shift of the matrices B given in all processes of each grid column J₀ by 1 position up (that is, in descending order of ranks of processes) using the MPI_Sendrecv_replace function for the MPI_COMM_COL communicator (to determine the ranks of the sending and receiving processes, use the expression containing the % operator that gives the remainder of a division). Output the received matrix in each process.

MPI9Matr35°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. The number of processes K is a perfect square: K = K₀·K₀. In the block algorithms of matrix multiplication, the initial matrices are divided into K blocks and are interpreted as square block matrices of the order K₀ (hereinafter blocks are distributed by processes and used to calculate a part of the total matrix product in each process).

Add, if necessary, the zero-valued rows or columns to the initial matrices (as a result, the matrices A and B will have the size (M₀·K₀) × (P₀·K₀) and (P₀·K₀) × (Q₀·K₀) respectively), save them in one-dimensional arrays in the master process, and then send the matrix blocks (in a row-major order) from these arrays to all processes (in ascending order of its ranks): the process R will receive the blocks A_R and B_R, R = 0, …, K − 1. In addition, create two blocks C_R and T_R filled with zero-valued elements in each process: the block C_R (of the size M₀ × Q₀) is intended to store the part of the matrix product C = AB, which will be calculated in this process, the block T_R (of the size M₀ × P₀) is an auxiliary one.

The blocks, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. To send the matrix sizes, use the MPI_Bcast collective function, to send the blocks of the matrices A and B, use the MPI_Alltoallw collective function and also the MPI_BLOCK_A and MPI_BLOCK_B datatypes created by the Matr4CreateTypeBlock function (see the MPI9Matr32 task and notes to it).

Include all the above mentioned actions in a Matr4ScatterData function (without parameters). As a result of the call of this function, each process will receive the values M₀, P₀, Q₀, as well as one-dimensional arrays filled with the blocks A_R, B_R, C_R, T_R. Output all obtained data (that is, the numbers M₀, P₀, Q₀ and the blocks A_R, B_R, C_R, T_R) in each process after calling the Matr4ScatterData function. Perform the input of initial data in the Matr4ScatterData function, perform the output of the results in the Solve function.

Notes. (1) When input the matrices A and B into arrays in the master process, it should be taken into account that these arrays may contain elements corresponding to additional zero-valued columns.

(2) To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.

MPI9Matr36°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the blocks A_R, B_R, C_R, T_R are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task). A virtual Cartesian topology in the form of a square grid of order K₀ is used for all processes, the value of K₀·K₀ is equal to the number of the processes. Each step of the Fox's block algorithm of matrix multiplication consists of two stages.

In the first stage of the first step, the block A_R is sent from the process with the grid coordinates (I₀, I₀) to all processes of the same grid row I₀ (I₀ = 0, …, K₀ − 1). The received block is saved in the block T_R in the receiving processes. Then the block T_R is multiplied by the block B_R from the same process and the result is added to the block C_R.

Implement the first stage of the first step of the Fox's algorithm. To do this, create the MPI_COMM_GRID and MPI_COMM_ROW communicators using the Matr4CreateCommGrid and Matr4CreateCommRow functions implemented in the MPI9Matr33 task. Use the MPI_Bcast function for the MPI_COMM_ROW communicator to send the blocks A_R (compare with MPI9Matr33).

Include all the above mentioned actions in a Matr4Calc1 function (without parameters). Output the new contents of the blocks T_R and C_R in each process; perform data input and output in the Solve function.

MPI9Matr37°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the blocks A_R, B_R, C_R, T_R are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task).

Implement the second stage of the first step of the Fox's algorithm of matrix multiplication as follows: perform a cyclic shift of the blocks B_R given in all processes of each column of the Cartesian periodic grid by 1 position up (that is, in descending order of ranks of processes).

To do this, create the MPI_COMM_GRID and MPI_COMM_COL communicators using the Matr4CreateCommGrid and Matr4CreateCommCol functions implemented in the MPI9Matr34 task, then use the MPI_Sendrecv_replace function for the MPI_COMM_COL communicator to perform the cyclic shift of the blocks B_R (compare with MPI9Matr34).

Include all the above mentioned actions in a Matr4Calc2 function (without parameters). Output the received blocks B_R in each process; perform data input and output in the Solve function.

MPI9Matr38°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the blocks A_R, B_R, C_R, T_R are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task).

Modify the Matr4Calc1 function, which was implemented in the MPI9Matr36 task; the modified function should provide execution of the first stage of any step of the Fox's algorithm. To do this, add the parameter named step to the function (this parameter specifies the step number and may be in the range 0 to K₀ − 1, where K₀ is the order of the Cartesian grid of processes) and use the value of this parameter in the part of the algorithm that deals with the sending the blocks A_R: the block A_R should be sent from the process with the grid coordinates (I₀, (I₀ + step)%K₀) to all processes of the same grid row I₀, I₀ = 0, …, K₀ − 1 (the recalculation of the elements of the block C_R does not depend on the value of the parameter step).

Using successively the calls of Matr4Calc1(0), Matr4Calc2(), Matr4Calc1(1) (the Matr4Calc2 function provides the second stage of each step of the algorithm — see the MPI9Matr37 task), execute two initial steps of the Fox's algorithm and output the new contents of the blocks T_R, B_R, and C_R in each process. Perform data input and output in the Solve function.

MPI9Matr39°. Integers M₀, P₀, Q₀ and one-dimensional arrays filled with the blocks A_R, B_R, C_R, T_R are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task). In addition, a number L with the same value is given in each process. The value of L is in the range 3 to K₀ and determines the number of steps of the Fox's algorithm.

Using successively the calls of Matr4Calc1(0), Matr4Calc2(), Matr4Calc1(1), Matr4Calc2(), …, Matr4Calc1(L − 1) (see the MPI9Matr38 and MPI9Matr37 tasks), execute the initial L steps of the Fox's algorithm and output the new contents of the blocks T_R, B_R, and C_R in each process. Perform data input and output in the Solve function.

Remark. If the value of L is equal to K₀, then the blocks C_R will contain the corresponding parts of the final matrix product AB. Note that the second stage (associated with the call of the Matr4Calc2 function) is not necessary at the last step of the algorithm.

MPI9Matr40°. Integers M and Q (the numbers of rows and columns of the matrix product) are given in the master process. In addition, integers M₀, Q₀ and one-dimensional arrays filled with the M₀ × Q₀ blocks of the matrix C are given in each process (the given blocks of C are obtained as a result of K₀ steps of the Fox's algorithm — see the MPI9Matr39 task).

Send all the blocks C_R to the master process and output the received matrix C of the size M × Q in this process. To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (M₀·K₀) × (Q₀·K₀). To send the blocks C_R to this array, use the MPI_Alltoallw collective function and the MPI_BLOCK_C datatype created by the Matr4CreateTypeBlock function (see the MPI9Matr32 task and notes to it).

Include all the above mentioned actions in a Matr4GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr4GatherData function.

MPI9Matr41°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr35 task).

Using successively the Matr4ScatterData, Matr4Calc1, Matr4Calc2, and Matr4GatherData functions, that are developed in the MPI9Matr35−MPI9Matr40 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process. The Matr4Calc1 and Matr4Calc2 functions should be called in a loop, the number of Matr4Calc2 function calls must be one less than the number of Matr4Calc1 function calls.

In addition, output the current contents of the block C_R in each process after each call of the Matr4Calc1 function.

The MPI_COMM_GRID, MPI_COMM_ROW, and MPI_COMM_COL communicators that are used in the Matr4Calc1 and Matr4Calc2 functions, should not be created several times. To do this, modify the Matr4CreateCommGrid, Matr4CreateCommRow, and Matr4CreateCommCol functions (see the MPI9Matr33 and MPI9Matr34 tasks); the modified functions should not perform any actions when it is called with the parameter comm that is not equal to MPI_COMM_NULL.

MPI9Matr42°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. The numbers M, P, and Q are multiples of the order K₀ of square grid of processes.

Modify the stage of receiving blocks for the Fox's algorithm of matrix multiplication (see the MPI9Matr35 task) as follows: each process should read the corresponding blocks of the matrices A and B directly from the given files.

To send the sizes of matrices and file names, use the MPI_Bcast collective function. To read the blocks from the files, set the appropriate file views using the MPI_File_set_view function and the MPI_BLOCK_A and MPI_BLOCK_B filetypes defined with the Matr4CreateTypeBlock function (see the MPI9Matr32 task), and then use the MPI_File_read_all function.

Include all these actions in a Matr4ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values M₀, P₀, Q₀, as well as one-dimensional arrays filled with the blocks A_R, B_R, C_R, T_R (the blocks C_R and T_R should contain zero-valued elements). Output all obtained data (that is, the numbers M₀, P₀, Q₀ and the blocks A_R, B_R, C_R, T_R) in each process after calling the Matr4ScatterFile function. Perform the input of initial data in the Matr4ScatterFile function, perform the output of the results in the Solve function.

Remark. A condition that the numbers M, P, Q are multiples of K₀ means that you do not need to add zero-valued rows and/or zero-valued columns to the blocks obtained from the matrices A and B, and therefore you may perform reading of the blocks A_R and B_R using the same filetypes (namely, MPI_BLOCK_A and MPI_BLOCK_B) in all processes.

If this condition is not fulfilled, then it would be necessary to use special types that ensure the correct reading from the file and write to the array of "truncated" blocks of the matrices A and B in some processes (in addition, in this case it would be necessary to send to each process the values of P and Q that are necessary for the correct type definition for "truncated" blocks).

MPI9Matr43°. Integers M₀, Q₀ and one-dimensional arrays filled with the M₀ × Q₀ blocks C_R are given in each process (the given blocks C_R are obtained as a result of K₀ steps of the Fox's block algorithm of matrix multiplication — see the MPI9Matr39 task). In addition, the name of file to store the matrix product is given in the master process. The numbers M and Q (the numbers of rows and columns of the matrix product) are multiples of the order K₀ of square grid of processes (thus, M = M₀·K₀, Q = Q₀·K₀).

To write the blocks to the files, set the appropriate file view using the MPI_File_set_view function and the MPI_BLOCK_C filetype defined with the Matr4CreateTypeBlock function (see the MPI9Matr32 task), and then use the MPI_File_write_all collective function.

Include all these actions (namely, the input of file name, sending the file name, and writing all blocks to the file) in a Matr4GatherFile function. Perform the input of all initial data, except the file name, in the Solve function.

Remark. A condition that the numbers M and Q are multiples of K₀ means that the blocks C_R do not contain "extra" zero-valued rows and/or columns, and therefore you may perform writing of the blocks C_R to the file using the same filetype (namely, MPI_BLOCK_C) in all processes.

MPI9Matr44°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. The numbers M, P, and Q are multiples of the order K₀ of square grid of processes.

Using successively the Matr4ScatterFile, Matr4Calc1, Matr4Calc2, and Matr4GatherFile functions (see the MPI9Matr42, MPI9Matr38, MPI9Matr37, and MPI9Matr43 tasks), find a matrix C and write its elements to the resulting file. The Matr4Calc1 and Matr4Calc2 functions should be called in a loop, the number of Matr4Calc2 function calls must be one less than the number of Matr4Calc1 function calls.

In addition, output the current value of the c[step] in each process after each call of the Matr4Calc1 function, where c is a one-dimensional array containing the block C_R, and step is the algorithm step number (0, 1, …, K₀ − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.

Last revised:
01.01.2025