Parallel matrix algorithms
Non-parallel matrix multiplication algorithm
MPI9Matr1°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. Find and output an M × Q matrix C that is the product of the matrices A and B. The formula for calculating the elements of the matrix C, under the assumption that the rows and columns of all matrices are numbered from 0, is as follows: CI,J = AI,0·B0,J + AI,1·B1,J + … + AI,P-1·BP-1,J, where I = 0, …, M − 1, J = 0, …, Q − 1. To store the matrices A, B, C, use one-dimensional arrays of size M·P, P·Q, and M·Q, placing the elements of the matrices in row-major order (that is, the matrix element with indices I and J will be stored in the element of the corresponding array with the index I·N + J, where N is the number of columns of the matrix). The slave processes are not used in this task.
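The triple loop with row-major indexing might look as follows (a minimal sketch; the function name and the use of int elements are assumptions made for brevity):

```c
/* Multiply an m x p matrix a by a p x q matrix b; all matrices are stored
   in one-dimensional arrays in row-major order, so the element (i, j) of
   an n-column matrix is located at the index i*n + j. */
void MatrixMult(const int *a, const int *b, int *c, int m, int p, int q)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < q; j++)
        {
            int s = 0;
            for (int k = 0; k < p; k++)
                s += a[i * p + k] * b[k * q + j];   /* sum of A[i,k]*B[k,j] */
            c[i * q + j] = s;
        }
}
```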
Band algorithm 1 (horizontal bands)
MPI9Matr2°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. In the first variant of the band algorithm of matrix multiplication, each matrix multiplier is divided into K horizontal bands, where K is the number of processes (hereinafter, the bands are distributed among the processes and used to calculate a part of the total matrix product in each process). The band of the matrix A contains NA rows, the band of the matrix B contains NB rows. The numbers NA and NB are calculated as follows: NA = ceil(M/K), NB = ceil(P/K), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains an insufficient number of rows to fill the last band then zero-valued rows should be added to this band. Add, if necessary, zero-valued rows to the initial matrices, save them in one-dimensional arrays in the master process, and then send the matrix bands from these arrays to all processes as follows: a band with the index R is sent to the process of rank R (R = 0, 1, …, K − 1), all the bands AR are of the size NA × P, all the bands BR are of the size NB × Q. In addition, create a band CR in each process to store the part of the matrix product C = AB which will be calculated in this process. Each band CR is of the size NA × Q and is filled with zero-valued elements. The bands, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. To send the matrix sizes, use the MPI_Bcast collective function; to send the bands of the matrices A and B, use the MPI_Scatter collective function. Include all the above mentioned actions in a Matr1ScatterData function (without parameters). As a result of the call of this function, each process will receive the values NA, P, NB, Q, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C.
Output all obtained data (that is, the numbers NA, P, NB, Q and the bands of the matrices A, B, C) in each process after calling the Matr1ScatterData function. Perform the input of initial data in the Matr1ScatterData function, perform the output of the results in the Solve function. Note. To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.
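The core of Matr1ScatterData might be sketched as follows (a sketch under assumptions: double matrix elements; on the master, the arrays a and b already hold the matrices padded with zero-valued rows up to NA·K and NB·K rows; all lowercase names are hypothetical):

```c
/* Fragment of a possible Matr1ScatterData; rank and k (the number of
   processes) are assumed to be obtained from MPI_Comm_rank/MPI_Comm_size. */
int sizes[4];                          /* NA, P, NB, Q packed into one array */
if (rank == 0)
{
    sizes[0] = (m + k - 1) / k;        /* NA = ceil(M/K) in integer arithmetic */
    sizes[1] = p;
    sizes[2] = (p + k - 1) / k;        /* NB = ceil(P/K) */
    sizes[3] = q;
}
MPI_Bcast(sizes, 4, MPI_INT, 0, MPI_COMM_WORLD);   /* one call for all sizes */
int na = sizes[0], pp = sizes[1], nb = sizes[2], qq = sizes[3];
double *pa = malloc(na * pp * sizeof(double)),     /* band AR */
       *pb = malloc(nb * qq * sizeof(double)),     /* band BR */
       *pc = calloc(na * qq, sizeof(double));      /* band CR, zero-valued */
MPI_Scatter(a, na * pp, MPI_DOUBLE, pa, na * pp, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Scatter(b, nb * qq, MPI_DOUBLE, pb, nb * qq, MPI_DOUBLE, 0, MPI_COMM_WORLD);
```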
MPI9Matr3°. Integers NA, P, NB, Q and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr2 task). Implement the first step of the band algorithm of matrix multiplication as follows: multiply elements in the bands AR and BR of each process and cyclically send each band BR to the process of the previous rank (that is, from the process 1 to the process 0, from the process 2 to the process 1, …, from the process 0 to the process K − 1, where K is the number of processes). Use the MPI_Sendrecv_replace function to send the bands. To determine the ranks of the sending and receiving processes, use an expression containing the % operator, which gives the remainder of a division. Include all the above mentioned actions in a Matr1Calc function (without parameters). Output the new contents of the bands CR and BR in each process; perform data input and output in the Solve function. Note. As a result of multiplying the bands AR and BR, each element of the band CR will contain a part of the terms included in the elements of the product AB; all elements of the band BR and some of the elements of the band AR will be used (in particular, the first NB elements of the band A0 will be used in the process 0 in the first step and the last NB elements of the band AK-1 will be used in the process K − 1 in the first step).
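The first step might be sketched as follows (a fragment under the same hypothetical lowercase names as above; the extra bound on l guards against columns that lie beyond the P columns of the band AR when P is not a multiple of K):

```c
/* Fragment of a possible Matr1Calc (first step). The process of rank `rank`
   currently holds the B band with index `rank`, so it uses the columns of
   its band AR starting at column rank*nb. */
int prev = (rank + k - 1) % k,   /* destination: the process of the previous rank */
    next = (rank + 1) % k;       /* source: the process of the next rank */
for (int i = 0; i < na; i++)
    for (int j = 0; j < qq; j++)
        for (int l = 0; l < nb && rank * nb + l < pp; l++)  /* stay inside AR */
            pc[i * qq + j] += pa[i * pp + rank * nb + l] * pb[l * qq + j];
MPI_Sendrecv_replace(pb, nb * qq, MPI_DOUBLE, prev, 0, next, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
```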
MPI9Matr4°. Integers NA, P, NB, Q and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr2 task). Modify the Matr1Calc function, which was implemented in the previous task; the modified function should provide execution of any step of the band algorithm of matrix multiplication. To do this, add the parameter named step to the function (this parameter specifies the step number and may be in the range 0 to K − 1, where K is the number of processes) and use the value of this parameter in the part of the algorithm that deals with the recalculation of the elements of the band CR (the cyclic sending of the bands BR does not depend on the value of the parameter step). Using two calls of the modified Matr1Calc function with the parameters 0 and 1, execute two initial steps of the band algorithm and output the new contents of the bands CR and BR in each process. Perform data input and output in the Solve function. Note. The parameter step determines which part of the band AR will be used for the next recalculation of the elements of the band CR (note that these parts should be selected cyclically).
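The only step-dependent quantity is the offset of the NB-column part of AR that takes part in the recalculation; this index arithmetic can be sketched separately (the function name is hypothetical):

```c
/* After `step` cyclic shifts toward lower ranks, the process of rank r
   holds the B band with index (r + step) % k; the matching part of its
   band AR therefore starts at this column offset. */
int ABandOffset(int r, int step, int k, int nb)
{
    return ((r + step) % k) * nb;
}
```

For example, with K = 4 and NB = 3, process 0 uses columns 0..2 at step 0 and columns 3..5 at step 1, while process 3 wraps around to columns 0..2 at step 1.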
MPI9Matr5°. Integers NA, P, NB, Q and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr2 task). In addition, a number L with the same value is given in each process. The value of L is in the range 3 to K, where K is the number of processes, and determines the number of steps of the band algorithm. Using the function Matr1Calc(I) (see the previous task) in a loop with the parameter I (I = 0, …, L − 1), execute the initial L steps of the band algorithm and output the new contents of the bands CR and BR in each process. Perform data input and output in the Solve function. Remark. If the value of L is equal to K then the bands CR will contain the corresponding parts of the final matrix product AB.
MPI9Matr6°. An integer M (the number of rows of the matrix product) is given in the master process. In addition, integers NA, Q and one-dimensional arrays filled with the NA × Q bands of matrix C are given in each process (the given bands of C are obtained as a result of K steps of the band algorithm — see the MPI9Matr5 task). Send all the bands CR to the master process and output the received matrix C of the size M × Q in this process. To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (NA·K) × Q. To send data to this array, use the MPI_Gather collective function. Include all the above mentioned actions in a Matr1GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr1GatherData function.
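Matr1GatherData might be sketched as follows (assuming double elements and the hypothetical names used above; note that the receive buffer is sized for NA·K rows even though only M rows are output):

```c
/* Fragment of a possible Matr1GatherData (assumes <stdio.h>, <stdlib.h>). */
double *c = NULL;
if (rank == 0)
    c = malloc(na * k * q * sizeof(double));     /* room for all K bands */
MPI_Gather(pc, na * q, MPI_DOUBLE, c, na * q, MPI_DOUBLE, 0, MPI_COMM_WORLD);
if (rank == 0)
    for (int i = 0; i < m; i++)                  /* trailing zero rows skipped */
        for (int j = 0; j < q; j++)
            printf("%f ", c[i * q + j]);
```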
MPI9Matr7°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr2 task). Using successively the Matr1ScatterData, Matr1Calc (in a loop), and Matr1GatherData functions developed in the MPI9Matr2−MPI9Matr6 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process. In addition, output the current contents of the band CR in each process after each call of the Matr1Calc function. Modify the Matr1Calc function (see the MPI9Matr4 task) before using it in this task, as follows: the bands BR should not be sent when the parameter step is equal to K − 1.
MPI9Matr8°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. Modify the initial stage of the band algorithm of matrix multiplication (see the MPI9Matr2 task) as follows: each process should read the corresponding bands of the matrices A and B directly from the given files using the MPI_File_seek and MPI_File_read_all collective functions (a new file view is not required). To send the sizes of matrices and file names, use the MPI_Bcast collective function. Include all these actions in a Matr1ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values NA, P, NB, Q, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers NA, P, NB, Q and the bands of the matrices A, B, C) in each process after calling the Matr1ScatterFile function. Perform the input of initial data in the Matr1ScatterFile function, perform the output of the results in the Solve function. Remark. For some bands, some of their elements (namely, the last rows) or even the entire bands should not be read from the source files and will remain zero-valued. However, this situation does not require special processing, since the MPI_File_read_all function automatically stops reading the data (without generating any error message) when the end of the file is reached.
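Reading the band of A inside Matr1ScatterFile might look like this (a sketch assuming the file stores raw double values and that pa was allocated with calloc, so any elements left unread stay zero-valued; names are hypothetical):

```c
MPI_File f;
MPI_File_open(MPI_COMM_WORLD, fileA, MPI_MODE_RDONLY, MPI_INFO_NULL, &f);
/* each process skips the bands of all lower-ranked processes */
MPI_File_seek(f, (MPI_Offset)rank * na * pp * sizeof(double), MPI_SEEK_SET);
MPI_File_read_all(f, pa, na * pp, MPI_DOUBLE, MPI_STATUS_IGNORE);
MPI_File_close(&f);
```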
MPI9Matr9°. Integers NA, Q and one-dimensional arrays filled with the NA × Q bands CR are given in each process (the given bands CR are obtained as a result of K steps of the band algorithm of matrix multiplication — see the MPI9Matr5 task). In addition, an integer M (the number of rows of the matrix product) and the name of a file (to store this product) are given in the master process. Send the number M and the file name to all processes using the MPI_Bcast function. Write all the parts of the matrix product contained in the bands CR to the resulting file, which will eventually contain a matrix C of the size M × Q. To write the bands to the file, use the MPI_File_seek and MPI_File_write_all collective functions. Include all these actions (namely, the input of the file name, sending the number M and the file name, and writing all bands to the file) in a Matr1GatherFile function. Perform the input of all initial data, except the file name, in the Solve function. Note. When writing data to the resulting file, it is necessary to take into account that some of the bands CR may contain trailing zero-valued rows that are not related to the resulting matrix product (the number M should be sent to all processes in order to control this situation).
MPI9Matr10°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. Using successively the Matr1ScatterFile, Matr1Calc (in a loop), and Matr1GatherFile functions (see the MPI9Matr8, MPI9Matr5, MPI9Matr9 tasks), find a matrix C and write its elements to the resulting file. In addition, output the current value of c[step] in each process after each call of the Matr1Calc function, where c is a one-dimensional array containing the band CR, and step is the algorithm step number (0, 1, …, K − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.
Band algorithm 2 (horizontal and vertical bands)
MPI9Matr11°. Integers P and Q are given in each process; in addition, a matrix B of the size P × Q is given in the master process. The number Q is a multiple of the number of processes K. Input the matrix B into a one-dimensional array of the size P·Q in the master process and define a new datatype named MPI_BAND_B that contains a vertical band of the matrix B. The width of the vertical band should be equal to NB = Q/K columns. When defining the MPI_BAND_B datatype, use the MPI_Type_vector and MPI_Type_commit functions. Include this definition in a Matr2CreateTypeBand(p, n, q, t) function with the input integer parameters p, n, q and the output parameter t of the MPI_Datatype type; the parameters p and n determine the size of the vertical band (the number of its rows and columns), and the parameter q determines the number of columns of the matrix from which this band is extracted. Using the MPI_BAND_B datatype, send to each process (including the master process) the corresponding band of the matrix B in the ascending order of ranks of receiving processes. Sending should be performed using the MPI_Send and MPI_Recv functions; the bands should be stored in one-dimensional arrays of the size P·NB. Output the received band in each process. Remark. In the MPICH2 version 1.3, the MPI_Send function call is erroneous if the sending and receiving processes are the same. You may use the MPI_Sendrecv function to send a band to the master process. You may also fill a band in the master process without using tools of the MPI library.
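Matr2CreateTypeBand reduces to a single MPI_Type_vector call (a sketch assuming double matrix elements):

```c
/* A vertical p x n band of a matrix with q columns consists of p blocks
   of n contiguous elements with a stride of q elements between blocks. */
void Matr2CreateTypeBand(int p, int n, int q, MPI_Datatype *t)
{
    MPI_Type_vector(p, n, q, MPI_DOUBLE, t);
    MPI_Type_commit(t);
}
```

The master may then send band R as a single element of this type starting at &b[R*nb], while the receiver stores it contiguously with an ordinary MPI_Recv of p*nb elements of type MPI_DOUBLE.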
MPI9Matr12°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. In the second variant of the band algorithm of matrix multiplication, the first multiplier (the matrix A) is divided into K horizontal bands and the second multiplier (the matrix B) is divided into K vertical bands, where K is the number of processes (hereinafter, the bands are distributed among the processes and used to calculate a part of the total matrix product in each process). The band of the matrix A contains NA rows, the band of the matrix B contains NB columns. The numbers NA and NB are calculated as follows: NA = ceil(M/K), NB = ceil(Q/K), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains an insufficient number of rows (or columns) to fill the last band then zero-valued rows (or columns) should be added to this band. Add, if necessary, zero-valued rows or columns to the initial matrices, save them in one-dimensional arrays in the master process, and then send the matrix bands from these arrays to all processes as follows: a band with the index R is sent to the process of rank R (R = 0, 1, …, K − 1), all the bands AR are of the size NA × P, all the bands BR are of the size P × NB. In addition, create a band CR in each process to store the part of the matrix product C = AB which will be calculated in this process. Each band CR is of the size (NA·K) × NB and is filled with zero-valued elements. The bands, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. To send the matrix sizes, use the MPI_Bcast collective function; to send the bands of the matrix A, use the MPI_Scatter collective function; to send the bands of the matrix B, use the MPI_Send and MPI_Recv functions and also the MPI_BAND_B datatype created by the Matr2CreateTypeBand function (see the previous task and a note to it).
Include all the above mentioned actions in a Matr2ScatterData function (without parameters). As a result of the call of this function, each process will receive the values NA, P, NB, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers NA, P, NB and the bands of the matrices A, B, C) in each process after calling the Matr2ScatterData function. Perform the input of initial data in the Matr2ScatterData function, perform the output of the results in the Solve function. Notes. (1) When inputting the matrix B into an array in the master process, it should be taken into account that this array may contain elements corresponding to additional zero-valued columns. (2) To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.
MPI9Matr13°. Integers NA, P, NB and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr12 task). Implement the first step of the band algorithm of matrix multiplication as follows: multiply elements in the bands AR and BR of each process and cyclically send each band AR to the process of the previous rank (that is, from the process 1 to the process 0, from the process 2 to the process 1, …, from the process 0 to the process K − 1, where K is the number of processes). Use the MPI_Sendrecv_replace function to send the bands. To determine the ranks of the sending and receiving processes, use an expression containing the % operator, which gives the remainder of a division. Include all the above mentioned actions in a Matr2Calc function (without parameters). Output the new contents of the bands CR and AR in each process; perform data input and output in the Solve function. Note. In this variant of the band algorithm, the bands AR contain the full rows of the matrix A and the bands BR contain the full columns of the matrix B, so, as a result of their multiplication, the band CR will contain part of the elements of the final matrix product already at the first step of the algorithm (the other elements of the band CR will remain zero-valued). The location of the found elements in the band CR depends on the rank of the process (in particular, the first NA rows of the band C0 in the process 0 will be filled in the first step and the last NA rows of the band CK-1 in the process K − 1 will be filled in the first step).
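The local multiplication at one step of this algorithm can be tested separately; a sketch (assuming double elements; all names, including the function name, are hypothetical):

```c
/* One multiplication step of band algorithm 2 in the process of rank r.
   pa is the na x p band of A currently held (the band with index
   (r + step) % k after `step` shifts), pb is the p x nb band BR, and
   pc is the (na*k) x nb band CR; the step fills rows off .. off+na-1 of CR. */
void Matr2Step(const double *pa, const double *pb, double *pc,
               int na, int p, int nb, int r, int step, int k)
{
    int off = ((r + step) % k) * na;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
        {
            double s = 0;
            for (int l = 0; l < p; l++)
                s += pa[i * p + l] * pb[l * nb + j];
            pc[(off + i) * nb + j] = s;
        }
}
```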
MPI9Matr14°. Integers NA, P, NB and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr12 task). Modify the Matr2Calc function, which was implemented in the previous task; the modified function should provide execution of any step of the band algorithm of matrix multiplication. To do this, add the parameter named step to the function (this parameter specifies the step number and may be in the range 0 to K − 1, where K is the number of processes) and use the value of this parameter in the part of the algorithm that deals with the recalculation of the elements of the band CR (the cyclic sending of the bands AR does not depend on the value of the parameter step). Using two calls of the modified Matr2Calc function with the parameters 0 and 1, execute two initial steps of the band algorithm and output the new contents of the bands CR and AR in each process. Perform data input and output in the Solve function. Note. The parameter step determines which rows of the band CR will be calculated in this step of the algorithm (note that these rows are selected cyclically).
MPI9Matr15°. Integers NA, P, NB and one-dimensional arrays filled with the corresponding bands of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr12 task). In addition, a number L with the same value is given in each process. The value of L is in the range 3 to K, where K is the number of processes, and determines the number of steps of the band algorithm. Using the function Matr2Calc(I) (see the previous task) in a loop with the parameter I (I = 0, …, L − 1), execute the initial L steps of the band algorithm and output the new contents of the bands CR and AR in each process. Perform data input and output in the Solve function. Remark. If the value of L is equal to K then the bands CR will contain the corresponding parts of the final matrix product AB.
MPI9Matr16°. Integers M and Q (the numbers of rows and columns of the matrix product) are given in the master process. In addition, integers NA, NB and one-dimensional arrays filled with the (NA·K) × NB bands of the matrix C are given in each process (the given bands of C are obtained as a result of K steps of the band algorithm — see the MPI9Matr15 task). Send all the bands CR to the master process and output the received matrix C of the size M × Q in this process. To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (NA·K) × (NB·K). To send data to this array, use the MPI_Send and MPI_Recv functions and the MPI_BAND_C datatype created by the Matr2CreateTypeBand function (see the MPI9Matr11 task and a note to it). Include all the above mentioned actions in a Matr2GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr2GatherData function. Note. When outputting the matrix C in the master process, it should be taken into account that an array, which is intended for matrix storage, may contain elements corresponding to additional zero-valued columns.
MPI9Matr17°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr12 task). Using successively the Matr2ScatterData, Matr2Calc (in a loop), and Matr2GatherData functions developed in the MPI9Matr12−MPI9Matr16 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process. In addition, output the current contents of the band CR in each process after each call of the Matr2Calc function. Modify the Matr2Calc function (see the MPI9Matr14 task) before using it in this task, as follows: the bands AR should not be sent when the parameter step is equal to K − 1.
MPI9Matr18°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. The number Q is a multiple of the number of processes K. Modify the initial stage of the band algorithm of matrix multiplication (see the MPI9Matr12 task) as follows: each process should read the corresponding bands of the matrices A and B directly from the given files. To send the sizes of matrices and file names, use the MPI_Bcast collective function. Use the MPI_File_seek and MPI_File_read_all collective functions to read the horizontal bands of the matrix A. To read the vertical bands of the matrix B, set the appropriate file view using the MPI_File_set_view function and the MPI_BAND_B file type defined with the Matr2CreateTypeBand function (see the MPI9Matr11 task), and then use the MPI_File_read_all function. Include all these actions in a Matr2ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values NA, P, NB, as well as one-dimensional arrays filled with the corresponding bands of the matrices A, B, C. Output all obtained data (that is, the numbers NA, P, NB and the bands of the matrices A, B, C) in each process after calling the Matr2ScatterFile function. Perform the input of initial data in the Matr2ScatterFile function, perform the output of the results in the Solve function. Note. The condition that the number Q is a multiple of K allows us to perform reading of the bands BR using the same file type in all processes. If this condition is not fulfilled then it would be necessary to use special types that ensure the correct reading from the file and writing to the array of "truncated" bands of the matrix B in the last processes (in addition, in this case it would be necessary to send to each process the value of Q which is necessary for the correct type definition for "truncated" bands).
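Reading a vertical band of B through a file view might be sketched as follows (assuming raw double values in the file; the displacement is given in bytes, after which the filetype makes only the process's own band visible; names are hypothetical):

```c
MPI_Datatype band;
Matr2CreateTypeBand(p, nb, q, &band);    /* used here as an MPI-IO filetype */
MPI_File f;
MPI_File_open(MPI_COMM_WORLD, fileB, MPI_MODE_RDONLY, MPI_INFO_NULL, &f);
MPI_File_set_view(f, (MPI_Offset)(rank * nb * sizeof(double)),
                  MPI_DOUBLE, band, "native", MPI_INFO_NULL);
MPI_File_read_all(f, pb, p * nb, MPI_DOUBLE, MPI_STATUS_IGNORE);
MPI_File_close(&f);
```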
MPI9Matr19°. Integers NA, NB and one-dimensional arrays filled with the (NA·K) × NB bands CR are given in each process (the given bands CR are obtained as a result of K steps of the band algorithm of matrix multiplication — see the MPI9Matr15 task). In addition, an integer M (the number of rows of the matrix product) and the name of a file (to store this product) are given in the master process. The number of columns Q of the matrix product is a multiple of the number of processes K (and, therefore, is equal to NB·K). Send the number M and the file name to all processes using the MPI_Bcast function. Write all the parts of the matrix product contained in the bands CR to the resulting file, which will eventually contain a matrix C of the size M × Q. To write the bands to the file, set the appropriate file view using the MPI_File_set_view function and the MPI_BAND_C file type defined with the Matr2CreateTypeBand function (see the MPI9Matr11 task), and then use the MPI_File_write_all function. Include all these actions (namely, the input of the file name, sending the number M and the file name, and writing all bands to the file) in a Matr2GatherFile function. Perform the input of all initial data, except the file name, in the Solve function. Note. When writing data to the resulting file, it is necessary to take into account that the bands CR may contain trailing zero-valued rows that are not related to the resulting matrix product (the number M should be sent to all processes in order to control this situation).
MPI9Matr20°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. The number Q is a multiple of the number of processes K. Using successively the Matr2ScatterFile, Matr2Calc (in a loop), and Matr2GatherFile functions (see the MPI9Matr18, MPI9Matr15, MPI9Matr19 tasks), find a matrix C and write its elements to the resulting file. In addition, output the current value of c[step] in each process after each call of the Matr2Calc function, where c is a one-dimensional array containing the band CR, and step is the algorithm step number (0, 1, …, K − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.
Cannon's block algorithm
MPI9Matr21°. Integers M and P are given in each process; in addition, a matrix A of the size M × P is given in the master process. The number of processes K is a perfect square: K = K0·K0, the numbers M and P are multiples of K0. Input the matrix A into a one-dimensional array of the size M·P in the master process and define a new datatype named MPI_BLOCK_A that contains an M0 × P0 block of the matrix A, where M0 = M/K0, P0 = P/K0. When defining the MPI_BLOCK_A type, use the MPI_Type_vector and MPI_Type_commit functions. Include this definition in a Matr3CreateTypeBlock(m0, p0, p, t) function with the input integer parameters m0, p0, p and the output parameter t of the MPI_Datatype type; the parameters m0 and p0 determine the size of the block, and the parameter p determines the number of columns of the matrix from which this block is extracted. Using the MPI_BLOCK_A datatype, send to each process (in ascending order of ranks of processes, including the master process) the corresponding block of the matrix A in a row-major order of blocks (that is, the first block should be sent to the process 0, the next block in the same row of blocks should be sent to the process 1, and so on). Sending should be performed using the MPI_Send and MPI_Recv functions; the blocks should be stored in one-dimensional arrays of the size M0·P0. Output the received block in each process. Remark. In the MPICH2 version 1.3, the MPI_Send function call is erroneous if the sending and receiving processes are the same. You may use the MPI_Sendrecv function to send a block to the master process. You may also fill a block in the master process without using tools of the MPI library.
MPI9Matr22°. Integers M0, P0 and a matrix A of the size M0 × P0 are given in each process. The number of processes K is a perfect square: K = K0·K0. Input the matrix A into a one-dimensional array of the size M0·P0 in each process and create a new communicator named MPI_COMM_GRID using the MPI_Cart_create function. The MPI_COMM_GRID communicator defines a Cartesian topology for all processes as a two-dimensional periodic K0 × K0 grid (ranks of processes should not be reordered). Include the creation of the MPI_COMM_GRID communicator in a Matr3CreateCommGrid(comm) function with the output parameter comm of the MPI_Comm type. Using the MPI_Cart_coords function for this communicator, output the process coordinates (I0, J0) in each process. Perform a cyclic shift of the matrices A given in all processes of each grid row I0 by I0 positions left (that is, in descending order of ranks of processes) using the MPI_Cart_shift and MPI_Sendrecv_replace functions. Output the received matrix in each process.
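The communicator creation and the row-wise shift might be sketched as follows (the helper name ShiftRowsLeft and the use of double elements are assumptions):

```c
/* A possible Matr3CreateCommGrid: a periodic K0 x K0 grid without reordering. */
void Matr3CreateCommGrid(MPI_Comm *comm)
{
    int k, k0 = 1, dims[2], periods[2] = {1, 1};
    MPI_Comm_size(MPI_COMM_WORLD, &k);
    while (k0 * k0 < k) k0++;                  /* K = K0*K0 by assumption */
    dims[0] = dims[1] = k0;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, comm);
}

/* Shift the m0 x p0 matrix a cyclically left by I0 positions within its row. */
void ShiftRowsLeft(double *a, int m0, int p0)
{
    MPI_Comm grid;
    int rank, coords[2], src, dst;
    Matr3CreateCommGrid(&grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);    /* coords[0] = I0, coords[1] = J0 */
    /* a negative displacement along dimension 1 shifts toward lower ranks */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(a, m0 * p0, MPI_DOUBLE, dst, 0, src, 0,
                         grid, MPI_STATUS_IGNORE);
}
```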
MPI9Matr23°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. The number of processes K is a perfect square: K = K0·K0. In the block algorithms of matrix multiplication, the initial matrices are divided into K blocks and are interpreted as square block matrices of the order K0 (hereinafter, the blocks are distributed among the processes and used to calculate a part of the total matrix product in each process). The block of the matrix A is of the size M0 × P0, the block of the matrix B is of the size P0 × Q0, the numbers M0, P0, Q0 are calculated as follows: M0 = ceil(M/K0), P0 = ceil(P/K0), Q0 = ceil(Q/K0), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains an insufficient number of rows (or columns) to fill the last blocks then zero-valued rows (or columns) should be added to these blocks. Add, if necessary, zero-valued rows or columns to the initial matrices (as a result, the matrices A and B will have the size (M0·K0) × (P0·K0) and (P0·K0) × (Q0·K0) respectively), save them in one-dimensional arrays in the master process, and then send the matrix blocks (in a row-major order) from these arrays to all processes (in ascending order of their ranks): the process R will receive the blocks AR and BR, R = 0, …, K − 1. In addition, create a block CR in each process to store the part of the matrix product C = AB which will be calculated in this process. Each block CR is of the size M0 × Q0 and is filled with zero-valued elements. The blocks, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. To send the matrix sizes, use the MPI_Bcast collective function; to send the blocks of the matrices A and B, use the MPI_Send and MPI_Recv functions and also the MPI_BLOCK_A and MPI_BLOCK_B datatypes created by the Matr3CreateTypeBlock function (see the MPI9Matr21 task and a note to it).
Include all the above mentioned actions in a Matr3ScatterData function (without parameters). As a result of the call of this function, each process will receive the values M0, P0, Q0, as well as one-dimensional arrays filled with the corresponding blocks of the matrices A, B, C. Output all obtained data (that is, the numbers M0, P0, Q0 and the blocks of the matrices A, B, C) in each process after calling the Matr3ScatterData function. Perform the input of initial data in the Matr3ScatterData function, perform the output of the results in the Solve function. Notes. (1) When inputting the matrices A and B into arrays in the master process, it should be taken into account that these arrays may contain elements corresponding to additional zero-valued columns. (2) To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.
MPI9Matr24°. Integers M0, P0, Q0 and one-dimensional arrays filled with the corresponding blocks of matrices A, B, C are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr23 task). Implement the initial block redistribution used in the Cannon's algorithm for block matrix multiplication. To do this, define a Cartesian topology for all processes as a two-dimensional periodic K0 × K0 grid, where K0·K0 is equal to the number of processes (ranks of processes should not be reordered). Then perform a cyclic shift of the blocks AR given in all processes of each grid row I0 by I0 positions left (that is, in descending order of ranks of processes), I0 = 0, …, K0 − 1, and a cyclic shift of the blocks BR given in all processes of each grid column J0 by J0 positions up (also in descending order of ranks of processes), J0 = 0, …, K0 − 1. To create the MPI_COMM_GRID communicator associated with the Cartesian topology, use the Matr3CreateCommGrid function implemented in the MPI9Matr22 task. Use the MPI_Cart_coords, MPI_Cart_shift, MPI_Sendrecv_replace functions to perform the cyclic shifts (compare with MPI9Matr22). Include all the above mentioned actions in a Matr3Init function (without parameters). Output the received blocks AR and BR in each process; perform data input and output in the Solve function.
MPI9Matr25°. Integers M0, P0, Q0 and one-dimensional arrays filled with the corresponding blocks of matrices A, B, C are given in each process. The blocks CR are zero-valued, the initial redistribution for the blocks AR and BR has already been performed in accordance with the Cannon's algorithm (see the previous task). Implement one step of the Cannon's algorithm of matrix multiplication as follows: multiply the blocks AR and BR in each process, adding the result to the block CR, then perform a cyclic shift of the blocks AR given in all processes of each row of the Cartesian periodic grid by 1 position left (that is, in descending order of ranks of processes) and a cyclic shift of the blocks BR given in all processes of each grid column by 1 position up (also in descending order of ranks of processes). To create the MPI_COMM_GRID communicator associated with the Cartesian topology, use the Matr3CreateCommGrid function implemented in the MPI9Matr22 task. Use the MPI_Cart_shift and MPI_Sendrecv_replace functions to perform the cyclic shifts (compare with MPI9Matr22). Include all the above mentioned actions in a Matr3Calc function (without parameters). Output the new contents of the blocks CR, AR, and BR in each process; perform data input and output in the Solve function. Remark. A special feature of the Cannon's algorithm is that the actions at each step do not depend on the step number.
MPI9Matr26°. Integers M0, P0, Q0 and one-dimensional arrays filled with the corresponding blocks of matrices A, B, C are given in each process. The blocks CR are zero-valued, the initial redistribution for the blocks AR and BR has already been performed in accordance with the Cannon's algorithm (see the MPI9Matr24 task). In addition, a number L with the same value is given in each process. The value of L is in the range 2 to K0, where K0·K0 is the number of processes, and determines the number of steps of the Cannon's algorithm. Using the function Matr3Calc (see the previous task) in a loop, execute the initial L steps of the Cannon's algorithm and output the new contents of the blocks CR, AR, and BR in each process. Perform data input and output in the Solve function. Note. If the value of L is equal to K0 then the blocks CR will contain the corresponding parts of the final matrix product AB.
MPI9Matr27°. Integers M and Q (the numbers of rows and columns of the matrix product) are given in the master process. In addition, integers M0, Q0 and one-dimensional arrays filled with the M0 × Q0 blocks of the matrix C are given in each process (the given blocks of C are obtained as a result of K0 steps of the Cannon's algorithm — see the MPI9Matr26 task). Send all the blocks CR to the master process and output the received matrix C of the size M × Q in this process. To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (M0·K0) × (Q0·K0). To send the blocks CR to this array, use the MPI_Send and MPI_Recv functions and the MPI_BLOCK_C datatype created by the Matr3CreateTypeBlock function (see the MPI9Matr21 task and a note to it). Include all the above mentioned actions in a Matr3GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr3GatherData function. Note. When outputting the matrix C in the master process, it should be taken into account that an array, which is intended for matrix storage, may contain elements corresponding to additional zero-valued columns.
MPI9Matr28°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr23 task). Using successively the Matr3ScatterData, Matr3Init, Matr3Calc (in a loop), and Matr3GatherData functions developed in the MPI9Matr23−MPI9Matr27 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process. In addition, output the current contents of the block CR in each process after each call of the Matr3Calc function. The MPI_COMM_GRID communicator, which is used in the Matr3Init and Matr3Calc functions, should not be created several times. To do this, modify the Matr3CreateCommGrid function; the modified function should not perform any actions when it is called with a parameter comm that is not equal to MPI_COMM_NULL. In addition, modify the Matr3Calc function (see the MPI9Matr25 task) before using it in this task, as follows: add a parameter named step to this function (step = 0, …, K0 − 1); the blocks AR and BR should not be sent when the parameter step is equal to K0 − 1.
MPI9Matr29°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. Modify the stage of receiving blocks for the Cannon's algorithm of matrix multiplication (see the MPI9Matr23 task) as follows: each process should read the corresponding blocks of the matrices A and B directly from the given files. In this case, all processes should receive not only the sizes M0, P0, Q0 of the blocks, but also the sizes M, P, Q of the initial matrices, which are needed to correctly determine the positions of the blocks in the source files. To send the sizes of matrices and file names, use the MPI_Bcast collective function. Use the MPI_File_read_at local function to read each row of the block (a new file view is not required). Include all these actions in a Matr3ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values M, P, Q, M0, P0, Q0, as well as one-dimensional arrays filled with the corresponding blocks of the matrices A, B, C. Output all obtained data (that is, the numbers M, P, Q, M0, P0, Q0 and the blocks of the matrices A, B, C) in each process after calling the Matr3ScatterFile function. Perform the input of initial data in the Matr3ScatterFile function, perform the output of the results in the Solve function. Note. For some blocks, some of their elements (namely, the last rows and/or columns) should not be read from the source files and will remain zero-valued. To determine the actual size of the block being read (the number of rows and columns), it is required to use the sizes of the initial matrices and the coordinates (I0, J0) of the block in a square Cartesian grid of order K0 (note that I0 = R/K0, J0 = R%K0, where R is the process rank). Remark. 
Whereas the values of P and Q are necessary to ensure the correct reading of the file blocks, the value of M is not required for this purpose, since an attempt to read data beyond the end of the file is ignored (without generating any error message). However, the value of M is required at the final stage of the algorithm (see the next task), so it must also be sent to all processes.
MPI9Matr30°. Integers M, Q, M0, Q0 and one-dimensional arrays filled with the M0 × Q0 blocks CR are given in each process (the given blocks CR are obtained as a result of K0 steps of the Cannon's block algorithm of matrix multiplication — see the MPI9Matr25 task). In addition, the name of file to store the matrix product is given in the master process. Send the file name to all processes using the MPI_Bcast function. Write all the parts of the matrix product contained in the blocks CR to the resulting file, which will eventually contain a matrix C of the size M × Q. Use the MPI_File_write_at local function to write each row of the block to the file (a new file view is not required). Include all these actions (namely, the input of file name, sending the file name, and writing all blocks to the file) in a Matr3GatherFile function. Perform the input of all initial data, except the file name, in the Solve function. Note. When writing data to the resulting file, it is necessary to take into account that some of the blocks CR may contain trailing zero-valued rows and/or columns that are not related to the resulting matrix product (see also the note and the remark for the previous task).
MPI9Matr31°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. Using successively the Matr3ScatterFile, Matr3Init, Matr3Calc (in a loop), and Matr3GatherFile functions (see the MPI9Matr29, MPI9Matr24, MPI9Matr25, and MPI9Matr30 tasks), find a matrix C and write its elements to the resulting file. In addition, output the current value of c[step] in each process after each call of the Matr3Calc function, where c is a one-dimensional array containing the block CR, and step is the algorithm step number (0, 1, …, K0 − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.
Fox's block algorithm
MPI9Matr32°. Integers M and P are given in each process; in addition, a matrix A of the size M × P is given in the master process. The number of processes K is a perfect square: K = K0·K0, the numbers M and P are multiples of K0. Input the matrix A into a one-dimensional array of the size M·P in the master process and define a new datatype named MPI_BLOCK_A that contains an M0 × P0 block of the matrix A, where M0 = M/K0, P0 = P/K0. When defining the MPI_BLOCK_A type, use the MPI_Type_vector and MPI_Type_commit functions. Include this definition in a Matr4CreateTypeBlock(m0, p0, p, t) function with the input integer parameters m0, p0, p and the output parameter t of the MPI_Datatype type; the parameters m0 and p0 determine the size of the block, and the parameter p determines the number of columns of the matrix from which this block is extracted. Using the MPI_BLOCK_A datatype, send to each process (in ascending order of ranks of processes, including the master process) the corresponding block of the matrix A in a row-major order of blocks (that is, the first block should be sent to the process 0, the next block in the same row of blocks should be sent to the process 1, and so on). Sending should be performed using the MPI_Alltoallw function; the blocks should be stored in one-dimensional arrays of the size M0·P0. Output the received block in each process. Notes. (1) Use the MPI_Send and MPI_Recv functions instead of the MPI_Alltoallw function when solving this task using the MPI-1 library. (2) The MPI_Alltoallw function introduced in MPI-2 is the only collective function that allows you to specify the displacements for the sent data in bytes (not in elements). This makes it possible to use it in conjunction with complex data types to implement any variants of collective communications (in our case, we need to implement a communication of the scatter type). 
It should be noted that all array parameters of the MPI_Alltoallw function associated with the sent data must be defined differently in the master and slave processes. In particular, the array scounts (which determines the number of sent elements) must contain the values 0 in all the slave processes and the value 1 in the master process (the sent elements are of the MPI_BLOCK_A datatype). At the same time, arrays associated with the received data will be defined in the same way in all processes; in particular, the zero-indexed element of the array rcounts (which determines the number of received elements) must be equal to M0·P0, and all other elements of this array must be equal to 0 (the received elements are of the MPI_INT datatype). It is necessary to pay special attention to the correct definition of elements in the array sdispls of displacements for the sent data in the master process (in the slave processes, it is enough to use the zero-valued array sdispls).
MPI9Matr33°. Integers M0, P0 and a matrix A of the size M0 × P0 are given in each process. The number of processes K is a perfect square: K = K0·K0. Input the matrix A into a one-dimensional array of the size M0·P0 in each process and create a new communicator named MPI_COMM_GRID using the MPI_Cart_create function. The MPI_COMM_GRID communicator defines a Cartesian topology for all processes as a two-dimensional periodic K0 × K0 grid (ranks of processes should not be reordered). Include the creation of the MPI_COMM_GRID communicator in a Matr4CreateCommGrid(comm) function with the output parameter comm of the MPI_Comm type. Using the MPI_Cart_coords function for this communicator, output the process coordinates (I0, J0) in each process. On the basis of the MPI_COMM_GRID communicator, create a set of communicators named MPI_COMM_ROW, which are associated with the rows of the initial two-dimensional grid. Use the MPI_Cart_sub function to create the MPI_COMM_ROW communicators. Include the creation of the MPI_COMM_ROW communicators in a Matr4CreateCommRow(grid, row) function with the input parameter grid (the communicator associated with the initial two-dimensional grid) and the output parameter row (both parameters are of the MPI_Comm type). Output the process rank R0 for the MPI_COMM_ROW communicator in each process (this rank must be equal to J0). In addition, send the matrix A from the grid element (I0, I0) to all processes of the same grid row I0 (I0 = 0, …, K0 − 1) using the MPI_Bcast collective function for the MPI_COMM_ROW communicator. Save the received matrix in the auxiliary matrix T of the same size as the matrix A (it is necessary to copy the matrix A to the matrix T in the sending process before the call of the MPI_Bcast function). Output the received matrix T in each process.
MPI9Matr34°. Integers P0, Q0 and a matrix B of the size P0 × Q0 are given in each process. The number of processes K is a perfect square: K = K0·K0. Input the matrix B into a one-dimensional array of the size P0·Q0 in each process and create a new communicator named MPI_COMM_GRID, which defines a Cartesian topology for all processes as a two-dimensional periodic K0 × K0 grid. Use the Matr4CreateCommGrid function (see the MPI9Matr33 task) to create the MPI_COMM_GRID communicator. Using the MPI_Cart_coords function for this communicator, output the process coordinates (I0, J0) in each process. On the basis of the MPI_COMM_GRID communicator, create a set of communicators named MPI_COMM_COL, which are associated with the columns of the initial two-dimensional grid. Use the MPI_Cart_sub function to create the MPI_COMM_COL communicators. Include the creation of the MPI_COMM_COL communicators in a Matr4CreateCommCol(grid, col) function with the input parameter grid (the communicator associated with the initial two-dimensional grid) and the output parameter col (both parameters are of the MPI_Comm type). Output the process rank R0 for the MPI_COMM_COL communicator in each process (this rank must be equal to I0). In addition, perform a cyclic shift of the matrices B given in all processes of each grid column J0 by 1 position up (that is, in descending order of ranks of processes) using the MPI_Sendrecv_replace function for the MPI_COMM_COL communicator (to determine the ranks of the sending and receiving processes, use the expression containing the % operator that gives the remainder of a division). Output the received matrix in each process.
MPI9Matr35°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process. The number of processes K is a perfect square: K = K0·K0. In the block algorithms of matrix multiplication, the initial matrices are divided into K blocks and are interpreted as square block matrices of the order K0 (hereinafter blocks are distributed by processes and used to calculate a part of the total matrix product in each process). The block of the matrix A is of the size M0 × P0, the block of the matrix B is of the size P0 × Q0, the numbers M0, P0, Q0 are calculated as follows: M0 = ceil(M/K0), P0 = ceil(P/K0), Q0 = ceil(Q/K0), where the operation "/" means the division of real numbers and the function ceil performs rounding up. If the matrix contains an insufficient number of rows (or columns) to fill the last blocks then zero-valued rows (or columns) should be added to these blocks. Add, if necessary, the zero-valued rows or columns to the initial matrices (as a result, the matrices A and B will have the size (M0·K0) × (P0·K0) and (P0·K0) × (Q0·K0) respectively), save them in one-dimensional arrays in the master process, and then send the matrix blocks (in a row-major order) from these arrays to all processes (in ascending order of their ranks): the process R will receive the blocks AR and BR, R = 0, …, K − 1. In addition, create two blocks CR and TR filled with zero-valued elements in each process: the block CR (of the size M0 × Q0) is intended to store the part of the matrix product C = AB, which will be calculated in this process, the block TR (of the size M0 × P0) is an auxiliary one. The blocks, like the initial matrices, should be stored in one-dimensional arrays in a row-major order. 
To send the matrix sizes, use the MPI_Bcast collective function, to send the blocks of the matrices A and B, use the MPI_Alltoallw collective function and also the MPI_BLOCK_A and MPI_BLOCK_B datatypes created by the Matr4CreateTypeBlock function (see the MPI9Matr32 task and notes to it). Include all the above mentioned actions in a Matr4ScatterData function (without parameters). As a result of the call of this function, each process will receive the values M0, P0, Q0, as well as one-dimensional arrays filled with the blocks AR, BR, CR, TR. Output all obtained data (that is, the numbers M0, P0, Q0 and the blocks AR, BR, CR, TR) in each process after calling the Matr4ScatterData function. Perform the input of initial data in the Matr4ScatterData function, perform the output of the results in the Solve function. Notes. (1) When inputting the matrices A and B into arrays in the master process, it should be taken into account that these arrays may contain elements corresponding to additional zero-valued columns. (2) To reduce the number of the MPI_Bcast function calls, all matrix sizes may be sent as a single array.
MPI9Matr36°. Integers M0, P0, Q0 and one-dimensional arrays filled with the blocks AR, BR, CR, TR are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task). A virtual Cartesian topology in the form of a square grid of order K0 is used for all processes, the value of K0·K0 is equal to the number of processes. Each step of the Fox's block algorithm of matrix multiplication consists of two stages. In the first stage of the first step, the block AR is sent from the process with the grid coordinates (I0, I0) to all processes of the same grid row I0 (I0 = 0, …, K0 − 1). The received block is saved in the block TR in the receiving processes. Then the block TR is multiplied by the block BR from the same process and the result is added to the block CR. Implement the first stage of the first step of the Fox's algorithm. To do this, create the MPI_COMM_GRID and MPI_COMM_ROW communicators using the Matr4CreateCommGrid and Matr4CreateCommRow functions implemented in the MPI9Matr33 task. Use the MPI_Bcast function for the MPI_COMM_ROW communicator to send the blocks AR (compare with MPI9Matr33). Include all the above mentioned actions in a Matr4Calc1 function (without parameters). Output the new contents of the blocks TR and CR in each process; perform data input and output in the Solve function.
MPI9Matr37°. Integers M0, P0, Q0 and one-dimensional arrays filled with the blocks AR, BR, CR, TR are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task). Implement the second stage of the first step of the Fox's algorithm of matrix multiplication as follows: perform a cyclic shift of the blocks BR given in all processes of each column of the Cartesian periodic grid by 1 position up (that is, in descending order of ranks of processes). To do this, create the MPI_COMM_GRID and MPI_COMM_COL communicators using the Matr4CreateCommGrid and Matr4CreateCommCol functions implemented in the MPI9Matr34 task, then use the MPI_Sendrecv_replace function for the MPI_COMM_COL communicator to perform the cyclic shift of the blocks BR (compare with MPI9Matr34). Include all the above mentioned actions in a Matr4Calc2 function (without parameters). Output the received blocks BR in each process; perform data input and output in the Solve function.
MPI9Matr38°. Integers M0, P0, Q0 and one-dimensional arrays filled with the blocks AR, BR, CR, TR are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task). Modify the Matr4Calc1 function, which was implemented in the MPI9Matr36 task; the modified function should provide execution of the first stage of any step of the Fox's algorithm. To do this, add the parameter named step to the function (this parameter specifies the step number and may be in the range 0 to K0 − 1, where K0 is the order of the Cartesian grid of processes) and use the value of this parameter in the part of the algorithm that deals with sending the blocks AR: the block AR should be sent from the process with the grid coordinates (I0, (I0 + step)%K0) to all processes of the same grid row I0, I0 = 0, …, K0 − 1 (the recalculation of the elements of the block CR does not depend on the value of the parameter step). Using successively the calls of Matr4Calc1(0), Matr4Calc2(), Matr4Calc1(1) (the Matr4Calc2 function provides the second stage of each step of the algorithm — see the MPI9Matr37 task), execute two initial steps of the Fox's algorithm and output the new contents of the blocks TR, BR, and CR in each process. Perform data input and output in the Solve function.
MPI9Matr39°. Integers M0, P0, Q0 and one-dimensional arrays filled with the blocks AR, BR, CR, TR are given in each process (thus, the given data coincide with the results obtained in the MPI9Matr35 task). In addition, a number L with the same value is given in each process. The value of L is in the range 3 to K0 and determines the number of steps of the Fox's algorithm. Using successively the calls of Matr4Calc1(0), Matr4Calc2(), Matr4Calc1(1), Matr4Calc2(), …, Matr4Calc1(L − 1) (see the MPI9Matr38 and MPI9Matr37 tasks), execute the initial L steps of the Fox's algorithm and output the new contents of the blocks TR, BR, and CR in each process. Perform data input and output in the Solve function. Remark. If the value of L is equal to K0 then the blocks CR will contain the corresponding parts of the final matrix product AB. Note that the second stage (associated with the call of the Matr4Calc2 function) is not necessary at the last step of the algorithm.
MPI9Matr40°. Integers M and Q (the numbers of rows and columns of the matrix product) are given in the master process. In addition, integers M0, Q0 and one-dimensional arrays filled with the M0 × Q0 blocks of the matrix C are given in each process (the given blocks of C are obtained as a result of K0 steps of the Fox's algorithm — see the MPI9Matr39 task). Send all the blocks CR to the master process and output the received matrix C of the size M × Q in this process. To store the resulting matrix C in the master process, use a one-dimensional array sufficient to store the matrix of the size (M0·K0) × (Q0·K0). To send the blocks CR to this array, use the MPI_Alltoallw collective function and the MPI_BLOCK_C datatype created by the Matr4CreateTypeBlock function (see the MPI9Matr32 task and notes to it). Include all the above mentioned actions in a Matr4GatherData function (without parameters). Perform the input of initial data in the Solve function, perform the output of the resulting matrix in the Matr4GatherData function. Note. When outputting the matrix C in the master process, it should be taken into account that an array, which is intended for matrix storage, may contain elements corresponding to additional zero-valued columns.
MPI9Matr41°. Integers M, P, Q, a matrix A of the size M × P, and a matrix B of the size P × Q are given in the master process (thus, the given data coincide with the given data in the MPI9Matr35 task). Using successively the Matr4ScatterData, Matr4Calc1, Matr4Calc2, and Matr4GatherData functions, that are developed in the MPI9Matr35−MPI9Matr40 tasks, find a matrix C, which is equal to the product of the initial matrices A and B, and output this matrix in the master process. The Matr4Calc1 and Matr4Calc2 functions should be called in a loop, the number of Matr4Calc2 function calls must be one less than the number of Matr4Calc1 function calls. In addition, output the current contents of the block CR in each process after each call of the Matr4Calc1 function. The MPI_COMM_GRID, MPI_COMM_ROW, and MPI_COMM_COL communicators that are used in the Matr4Calc1 and Matr4Calc2 functions, should not be created several times. To do this, modify the Matr4CreateCommGrid, Matr4CreateCommRow, and Matr4CreateCommCol functions (see the MPI9Matr33 and MPI9Matr34 tasks); the modified functions should not perform any actions when they are called with a parameter comm that is not equal to MPI_COMM_NULL.
MPI9Matr42°. Integers M, P, Q and two file names are given in the master process. The given files contain elements of a matrix A of the size M × P and a matrix B of the size P × Q. The numbers M, P, and Q are multiples of the order K0 of the square grid of processes. Modify the stage of receiving blocks for the Fox's algorithm of matrix multiplication (see the MPI9Matr35 task) as follows: each process should read the corresponding blocks of the matrices A and B directly from the given files. To send the sizes of matrices and file names, use the MPI_Bcast collective function. To read the blocks from the files, set the appropriate file views using the MPI_File_set_view function and the MPI_BLOCK_A and MPI_BLOCK_B file types defined with the Matr4CreateTypeBlock function (see the MPI9Matr32 task), and then use the MPI_File_read_all function. Include all these actions in a Matr4ScatterFile function (without parameters). As a result of the call of this function, each process will receive the values M0, P0, Q0, as well as one-dimensional arrays filled with the blocks AR, BR, CR, TR (the blocks CR and TR should contain zero-valued elements). Output all obtained data (that is, the numbers M0, P0, Q0 and the blocks AR, BR, CR, TR) in each process after calling the Matr4ScatterFile function. Perform the input of initial data in the Matr4ScatterFile function, perform the output of the results in the Solve function. Remark. The condition that the numbers M, P, Q are multiples of K0 means that you do not need to add zero-valued rows and/or zero-valued columns to the blocks obtained from the matrices A and B, and therefore you may perform reading of the blocks AR and BR using the same file types (namely, MPI_BLOCK_A and MPI_BLOCK_B) in all processes. 
If this condition were not fulfilled, it would be necessary to use special types that ensure the correct reading from the file and writing to the array of "truncated" blocks of the matrices A and B in some processes (in addition, in this case it would be necessary to send to each process the values of P and Q that are necessary for the correct type definition for "truncated" blocks).
MPI9Matr43°. Integers M0, Q0 and one-dimensional arrays filled with the M0 × Q0 blocks CR are given in each process (the given blocks CR are obtained as a result of K0 steps of the Fox's block algorithm of matrix multiplication — see the MPI9Matr39 task). In addition, the name of the file to store the matrix product is given in the master process. The numbers M and Q (the numbers of rows and columns of the matrix product) are multiples of the order K0 of the square grid of processes (thus, M = M0·K0, Q = Q0·K0). Send the file name to all processes using the MPI_Bcast function. Write all the parts of the matrix product contained in the blocks CR to the resulting file, which will eventually contain a matrix C of the size M × Q. To write the blocks to the files, set the appropriate file view using the MPI_File_set_view function and the MPI_BLOCK_C file type defined with the Matr4CreateTypeBlock function (see the MPI9Matr32 task), and then use the MPI_File_write_all collective function. Include all these actions (namely, the input of file name, sending the file name, and writing all blocks to the file) in a Matr4GatherFile function. Perform the input of all initial data, except the file name, in the Solve function. Remark. The condition that the numbers M and Q are multiples of K0 means that the blocks CR do not contain "extra" zero-valued rows and/or columns, and therefore you may perform writing of the blocks CR to the file using the same file type (namely, MPI_BLOCK_C) in all processes.
MPI9Matr44°. Integers M, P, Q and three file names are given in the master process. The first two file names are related to the existing files containing the elements of matrices A and B of the size M × P and P × Q, respectively, the third file should be created to store the resulting matrix product C = AB. The numbers M, P, and Q are multiples of the order K0 of the square grid of processes. Using successively the Matr4ScatterFile, Matr4Calc1, Matr4Calc2, and Matr4GatherFile functions (see the MPI9Matr42, MPI9Matr38, MPI9Matr37, and MPI9Matr43 tasks), find a matrix C and write its elements to the resulting file. The Matr4Calc1 and Matr4Calc2 functions should be called in a loop, the number of Matr4Calc2 function calls must be one less than the number of Matr4Calc1 function calls. In addition, output the current value of c[step] in each process after each call of the Matr4Calc1 function, where c is a one-dimensional array containing the block CR, and step is the algorithm step number (0, 1, …, K0 − 1). Thus, the element c[0] should be output on the first step of the algorithm, the element c[1] should be output on the second step of the algorithm, and so on.