Pemahaman mendalam mengenai sistem komputer (4) -cache lab

Depth Understanding Computer Systems Cache Lab



Mengoptimumkan fungsi transposisi matriks untuk menjadikan cache ketinggalan sesedikit mungkin. Ukuran cache adalah: 32 kumpulan, dipetakan secara langsung, 32 bait data setiap baris.

1. Misi



Tulis fungsi untuk melaksanakan matriks transpose. Dan buat cache terlewat sesedikit mungkin semasa panggilan fungsi. Tuliskan kod terakhir dalam fungsi berikut:



char transpose_submit_desc[] = 'Transpose submission' void transpose_submit(int M, int N, int A[N][M], int B[M][N])

32 × 32 : Jumlah rindu yang diperlukan kurang dari 300.



Terapkan idea partisi matriks.

Konflik miss sebenarnya ketika mengakses dua elemen di blok yang sama, kerana blok lain diakses di tengah, blok dimuat diusir, dan akses kedua meleset. Atas sebab ini, kita dapat mengakses beberapa elemen dalam blok yang sama sekaligus, dan kita tidak perlu lagi mengakses blok ini setelah akses, yang dapat mengurangkan jumlah kehilangan konflik. Di sini 8 elemen pertama berada di blok yang sama, kita dapat langsung mengeluarkan 8 elemen ini, dan kemudian blok di mana 8 elemen tersebut berada tidak perlu lagi diakses.

Pertimbangkan pemotongan 8x8:



void trans_1(int M, int N, int A[N][M], int B[M][N]) { int ii, jj, i, val1, val2, val3, val4, val5, val6, val7, val0 for(jj = 0 jj < 32 jj += 8) for(ii = 0 ii < 32 ii += 8) { for(i = ii i < ii + 8 i++) { val0 = A[i][jj] val1 = A[i][jj + 1] val2 = A[i][jj + 2] val3 = A[i][jj + 3] val4 = A[i][jj + 4] val5 = A[i][jj + 5] val6 = A[i][jj + 6] val7 = A[i][jj + 7] B[jj][i] = val0 B[jj + 1][i] = val1 B[jj + 2][i] = val2 B[jj + 3][i] = val3 B[jj + 4][i] = val4 B[jj + 5][i] = val5 B[jj + 6][i] = val6 B[jj + 7][i] = val7 } } }

64x64

Sekatan dan ubah 8x8 pertama di b

void trans_2(int M, int N, int A[N][M], int B[M][N]) { int i, j, ii, jj, val0, val1, val2, val3, val4, val5, val6, val7 for(ii = 0 ii < N ii += 8) { for(jj = 0 jj < M jj += 8) { //For each row in the 8*4 block for(i = 0 i < 4 i++) { val0 = A[ii + i][jj + 0] val1 = A[ii + i][jj + 1] val2 = A[ii + i][jj + 2] val3 = A[ii + i][jj + 3] val4 = A[ii + i][jj + 4] val5 = A[ii + i][jj + 5] val6 = A[ii + i][jj + 6] val7 = A[ii + i][jj + 7] B[jj + 0][ii + i] = val0 B[jj + 1][ii + i] = val1 B[jj + 2][ii + i] = val2 B[jj + 3][ii + i] = val3 B[jj + 0][ii + 4 + i] = val4 B[jj + 1][ii + 4 + i] = val5 B[jj + 2][ii + 4 + i] = val6 B[jj + 3][ii + 4 + i] = val7 } //First copy the first 4 rows for(i = 0 i < 4 i++)//Do the fantastic transformation! { //get this row of the right-upper 4*4 block val0 = B[jj + i][ii + 4] val1 = B[jj + i][ii + 5] val2 = B[jj + i][ii + 6] val3 = B[jj + i][ii + 7] //update this row to its correct value val4 = A[ii + 4][jj + i] val5 = A[ii + 5][jj + i] val6 = A[ii + 6][jj + i] val7 = A[ii + 7][jj + i] B[jj + i][ii + 4] = val4 B[jj + i][ii + 5] = val5 B[jj + i][ii + 6] = val6 B[jj + i][ii + 7] = val7 //update the left lower 4*4 block of B B[jj + 4 + i][ii + 0] = val0 B[jj + 4 + i][ii + 1] = val1 B[jj + 4 + i][ii + 2] = val2 B[jj + 4 + i][ii + 3] = val3 } //update the right lower 4*4 block for(i = 4 i < 8 i++) for(j = 4 j < 8 j++) B[jj + j][ii + i] = A[ii + i][jj + j] } } }

61x67

Sekatan 16x16

void trans_3(int M, int N, int A[N][M], int B[M][N]) { int ii, jj, i, j, val0, val1, val2, val3, val4, val5, val6, val7 for(ii = 0 ii + 16 < N ii += 16) for(jj = 0 jj + 16 < M jj += 16) { for(i = ii i < ii + 16 i++) { val0 = A[i][jj + 0] val1 = A[i][jj + 1] val2 = A[i][jj + 2] val3 = A[i][jj + 3] val4 = A[i][jj + 4] val5 = A[i][jj + 5] val6 = A[i][jj + 6] val7 = A[i][jj + 7] B[jj + 0][i] = val0 B[jj + 1][i] = val1 B[jj + 2][i] = val2 B[jj + 3][i] = val3 B[jj + 4][i] = val4 B[jj + 5][i] = val5 B[jj + 6][i] = val6 B[jj + 7][i] = val7 val0 = A[i][jj + 8] val1 = A[i][jj + 9] val2 = A[i][jj + 10] val3 = A[i][jj + 11] val4 = A[i][jj + 12] val5 = A[i][jj + 13] val6 = A[i][jj + 14] val7 = A[i][jj + 15] B[jj + 8][i] = val0 B[jj + 9][i] = val1 B[jj + 10][i] = val2 B[jj + 11][i] = val3 B[jj + 12][i] = val4 B[jj + 13][i] = val5 B[jj + 14][i] = val6 B[jj + 15][i] = val7 } } for(i = ii i < N i++) for(j = 0 j < M j++) B[j][i] = A[i][j] for(i = 0 i < ii i++) for(j = jj j < M j++) B[j][i] = A[i][j] }