16KB Cache Memory Controller - (1) RTL and Coverage

MiddleJo 2024. 9. 7. 20:52

Project date: 2024.06

Contents

1. Background

2. Task Definition and Overview

3. Source Code

4. Simulation Results

 

 

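1. Background

Any reasonably large design has to access memory.

A cache responds quickly, which shortens response time,

and it reduces the number of accesses to DRAM and similar devices, which lowers the load on memory.

The earlier testbenches only exercised a handful of hand-crafted cases,

so they cannot be said to have verified every case.

While designing a cache memory controller,

this post studies verification that raises coverage across a wide range of test cases.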

 

 

2. Task Definition and Overview

As in the figure above, a cache would normally access DRAM directly,

but to keep this design simpler,

it is designed on the assumption that a bus interface is present.

 

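CPU Interface
1. cpu_addr: 32-bit address
2. cpu_din/cpu_dout: 32-bit input/output data
3. cpu_cs: a transfer is requested while it is 1 (the RTL below treats it as active-high)
4. cpu_we: read/write select, 0 = read, 1 = write
5. cpu_nwait: flag marking the end of a transfer (1 = done, 0 = wait)

DRAM Interface
1. dram_addr: 32-bit address
2. dram_din/dram_dout: 32-bit input/output data
3. dram_cs: a transfer is requested while it is 1 (the RTL below treats it as active-high)
4. dram_we: read/write select, 0 = read, 1 = write
5. dram_nwait: flag marking the end of a transfer (1 = done, 0 = wait)

Cache Spec.
1. 4 words per line
2. 1024 lines, so the block address is 10 bits
3. The address is 32 bits, of which the tag is 18 bits
That is, 18 (tag) + 10 (block addr) + 2 (word addr) + 2 (byte offset within a word) = 32 bits

Cache Interface
1. cache_din/cache_dout: data (32*4) + tag (18) = 146 bits
2. The valid and dirty bits are kept separately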
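
The address split in the spec above can be sanity-checked with a small stand-alone sketch. The module name `addr_split_demo` and the task `show` are made up for illustration and are not part of the original sources; the two addresses are the first and third reads used later in top0.

```systemverilog
// Hypothetical stand-alone sketch, not part of the original sources.
module addr_split_demo;
  localparam int TAG_W   = 18; // addr[31:14]
  localparam int INDEX_W = 10; // addr[13:4], selects one of 1024 lines
  localparam int WORD_W  = 2;  // addr[3:2],  selects one of 4 words in a line
  localparam int BYTE_W  = 2;  // addr[1:0],  byte within a word

  // Print the cache fields of one byte address.
  task automatic show(input logic [31:0] addr);
    logic [TAG_W-1:0]   tag;
    logic [INDEX_W-1:0] index;
    logic [WORD_W-1:0]  word;
    tag   = addr[31:14];
    index = addr[13:4];
    word  = addr[3:2];
    $display("addr=%h tag=%h index=%h word=%0d", addr, tag, index, word);
  endtask

  initial begin
    // The four field widths must add up to the 32-bit address.
    if (TAG_W + INDEX_W + WORD_W + BYTE_W != 32)
      $fatal(1, "field widths do not sum to 32");
    // 1024 lines x 4 words x 4 bytes of data
    $display("capacity = %0d bytes",
             (1 << INDEX_W) * (1 << WORD_W) * (1 << BYTE_W));
    show(32'h00A37B9C); // first read in top0
    show(32'h00A3BB98); // third read in top0
  end
endmodule
```

Both addresses map to the same line index (0x3B9) but carry different tags (0x0028D and 0x0028E), which is why the third access in top0 misses on a line that is already in use. The capacity works out to 16384 bytes, matching the 16KB in the title.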

 

 

 

The state diagram and timing diagram are shown below.

 

3. Source Code

module cache (
				input	clk,
				input	n_reset,

				input				cpu_cs,
				input				cpu_we,
				input		[31:0]	cpu_addr,
				input		[31:0]	cpu_din,
				output reg	[31:0]	cpu_dout,
				output				cpu_nwait,

				output			dram_cs,
				output			dram_we,
				output	[31:0]	dram_addr,
				output	[31:0]	dram_din,
				input	[31:0]	dram_dout,
				input			dram_nwait
);

parameter	IDLE   = 4'b0000;
parameter	READ   = 4'b0001;
parameter	R_WMEM = 4'b0010;
parameter	R_RMEM = 4'b0011;
parameter	R_REND = 4'b0100;
parameter	R_OUT  = 4'b0101;
parameter	WRITE  = 4'b1001;
parameter	W_WMEM = 4'b1010;
parameter	W_RMEM = 4'b1011;
parameter	W_REND = 4'b1100;

reg	[3:0]	state, next;
wire		hit, dirty, valid;
reg	[1:0]	cnt;

always@(*) begin
	next = state;
	case(state)
		IDLE: begin
				if(cpu_cs == 1'b1) begin
					if(cpu_we == 1'b1) next = WRITE;
					else next = READ;
				end
			end
		READ: begin
				if(hit == 1'b1) begin
					if(cpu_cs == 1'b1) begin
						if(cpu_we == 1'b1) next = WRITE;
						else next = READ;
					end else begin
						next = IDLE;
					end
				end else begin
					if(dirty == 1'b1) next = R_WMEM;
					else next = R_RMEM;
				end
			end
		R_WMEM: begin
				if((dram_nwait == 1'b1) && (cnt == 3)) next = R_RMEM;
			end
		R_RMEM: begin
				if((dram_nwait == 1'b1) && (cnt == 3)) next = R_REND;
			end
		R_REND: begin
				if(dram_nwait == 1'b1) next = R_OUT;
			end
		R_OUT: begin
				if(cpu_cs == 1'b1) begin
					if(cpu_we == 1'b1) next = WRITE;
					else next = READ;
				end else begin
					next = IDLE;
				end
			end
		WRITE: begin
				if(hit == 1'b1) begin
					next = IDLE;
				end else begin
					if(dirty == 1'b1) next = W_WMEM;
					else next = W_RMEM;
				end
			end
		W_WMEM: begin
				if((dram_nwait == 1'b1) && (cnt == 3)) next = W_RMEM;
			end
		W_RMEM: begin
				if((dram_nwait == 1'b1) && (cnt == 3)) next = W_REND;
			end
		W_REND: begin
				if(dram_nwait == 1'b1) next = IDLE;
			end
	endcase
end

always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		state <= IDLE;
		cnt <= 0;
	end else begin
		state <= next;
		if((state == R_WMEM) || (state == R_RMEM) ||
				(state == W_WMEM) || (state == W_RMEM)) begin
			if(dram_nwait == 1'b1) begin
				cnt <= cnt + 1;
			end
		end
	end
end

reg		[31:0]	cpu_addr_d;
reg		[31:0]	cpu_din_d;
wire	[145:0]	cache_dout;
reg		[145:0]	cache_line;
always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		cpu_addr_d <= 'b0;
		cpu_din_d <= 'b0;
	end else begin
		if((cpu_cs == 1'b1) && (cpu_nwait == 1'b1)) begin
			cpu_addr_d <= cpu_addr;
			if(cpu_we == 1'b1) cpu_din_d <= cpu_din;
		end
	end
end
assign cpu_nwait = (state == IDLE) ||
					((state == READ) && (hit == 1'b1)) ||
					(state == R_OUT);
always@(*) begin
	if(state == READ) begin
		case(cpu_addr_d[3:2])
			2'b00: cpu_dout = cache_dout[31:0];
			2'b01: cpu_dout = cache_dout[63:32];
			2'b10: cpu_dout = cache_dout[95:64];
			2'b11: cpu_dout = cache_dout[127:96];
		endcase
	end else begin
		case(cpu_addr_d[3:2])
			2'b00: cpu_dout = cache_line[31:0];
			2'b01: cpu_dout = cache_line[63:32];
			2'b10: cpu_dout = cache_line[95:64];
			2'b11: cpu_dout = cache_line[127:96];
		endcase
	end
end

wire	cache_read = ((state == IDLE) && (cpu_cs == 1'b1))
					|| ((state == READ) && (cpu_cs == 1'b1) && (hit == 1'b1))
					|| ((state == R_OUT) && (cpu_cs == 1'b1));
wire	cache_write = ((state == WRITE) && (hit == 1'b1))
					|| ((state == R_REND) && (dram_nwait == 1'b1))
					|| ((state == W_REND) && (dram_nwait == 1'b1));
wire	cache_cs = cache_read || cache_write;
wire	cache_we = cache_write;
wire	[9:0]	cache_addr = (state == IDLE) || (state == READ) ||
														(state == R_OUT) ?
							cpu_addr[13:4] : cpu_addr_d[13:4];
// tag 18, 4 words
reg		[145:0]	cache_din;

always@(*) begin
	if(state == WRITE) begin
		cache_din[145:0] = cache_dout[145:0];
		if(cpu_addr_d[3:2] == 2'b00) cache_din[31:0] = cpu_din_d;
		if(cpu_addr_d[3:2] == 2'b01) cache_din[63:32] = cpu_din_d;
		if(cpu_addr_d[3:2] == 2'b10) cache_din[95:64] = cpu_din_d;
		if(cpu_addr_d[3:2] == 2'b11) cache_din[127:96] = cpu_din_d;
	end else if(state == R_REND) begin
		cache_din[145:128] = cpu_addr_d[31:14];
		cache_din[127:96] = dram_dout;
		cache_din[95:0] = cache_line[95:0];
	end else if(state == W_REND) begin
		cache_din[145:128] = cpu_addr_d[31:14];
		cache_din[127:96] = dram_dout;
		cache_din[95:0] = cache_line[95:0];
		if(cpu_addr_d[3:2] == 2'b00) cache_din[31:0] = cpu_din_d;
		if(cpu_addr_d[3:2] == 2'b01) cache_din[63:32] = cpu_din_d;
		if(cpu_addr_d[3:2] == 2'b10) cache_din[95:64] = cpu_din_d;
		if(cpu_addr_d[3:2] == 2'b11) cache_din[127:96] = cpu_din_d;
	end else begin
		cache_din = cache_line;
	end
end

mem_single #(
				.WD(146),
				.DEPTH(1024)
) i_cache_mem (
				.clk(clk),
				.cs(cache_cs),
				.we(cache_we),
				.addr(cache_addr),
				.din(cache_din),
				.dout(cache_dout)
);

reg		[1023:0]	valids;
reg		[1023:0]	dirtys;
always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		cache_line <= 'b0;
		valids <= 1024'b0;
		dirtys <= 1024'b0;
	end else begin
		if(state == READ) begin
			cache_line <= cache_dout;
		end else if(state == WRITE) begin
			cache_line <= cache_dout;
		end else if((state == R_RMEM) && (dram_nwait == 1'b1)) begin
			if(cnt == 2'b01) cache_line[31:0] <= dram_dout;
			if(cnt == 2'b10) cache_line[63:32] <= dram_dout;
			if(cnt == 2'b11) cache_line[95:64] <= dram_dout;
		end else if((state == R_REND) && (dram_nwait == 1'b1)) begin
			cache_line[127:96] <= dram_dout;
		end else if((state == W_RMEM) && (dram_nwait == 1'b1)) begin
			if(cnt == 2'b01) cache_line[31:0] <= dram_dout;
			if(cnt == 2'b10) cache_line[63:32] <= dram_dout;
			if(cnt == 2'b11) cache_line[95:64] <= dram_dout;
		end else if((state == W_REND) && (dram_nwait == 1'b1)) begin
			cache_line[127:96] <= dram_dout;
		end
		if((state == WRITE) && (hit == 1'b1)) begin
			dirtys[cpu_addr_d[13:4]] <= 1'b1;
		end else if((state == R_REND) && (dram_nwait == 1'b1)) begin
			dirtys[cpu_addr_d[13:4]] <= 1'b0;
		end else if((state == W_REND) && (dram_nwait == 1'b1)) begin
			dirtys[cpu_addr_d[13:4]] <= 1'b1;
		end
		if((state == R_REND) && (dram_nwait == 1'b1)) begin
			valids[cpu_addr_d[13:4]] <= 1'b1;
		end else if((state == W_REND) && (dram_nwait == 1'b1)) begin
			valids[cpu_addr_d[13:4]] <= 1'b1;
		end
	end
end

assign	valid = valids[cpu_addr_d[13:4]];
assign	dirty = dirtys[cpu_addr_d[13:4]];
wire	[17:0]	tag = cache_dout[145:128];
assign	hit = (tag == cpu_addr_d[31:14]) &&
			  (valid == 1'b1);


wire	dram_read = (state == R_RMEM) || (state == W_RMEM);
wire	dram_write = (state == R_WMEM) || (state == W_WMEM);
assign	dram_cs = dram_read || dram_write;
assign	dram_we = dram_write;
assign	dram_addr = (state == R_RMEM) || (state == W_RMEM) ?
						{cpu_addr_d[31:4], cnt, 2'b00} :
						{cache_line[145:128], cpu_addr_d[13:4],cnt, 2'b00};
assign	dram_din = (cnt == 2'b00) ? cache_line[31:0] :
				   (cnt == 2'b01) ? cache_line[63:32] :
				   (cnt == 2'b10) ? cache_line[95:64] :
									cache_line[127:96];

endmodule

 

 

module mem_single #(
		  WD = 128
		, DEPTH = 64
		, WA = $clog2(DEPTH)
) ( 
		  input					clk
		, input					cs
		, input					we
		, input		[WA-1:0]	addr
		, input		[WD-1:0]	din
		, output 	[WD-1:0]	dout
);

reg	[WD-1:0]	data[DEPTH-1:0];
reg	[WA-1:0]	addr_d;

always@(posedge clk) begin
	if(cs == 1'b1) begin
		if(we == 1'b1) data[addr] <= din;
		addr_d <= addr;
	end
end
assign dout = data[addr_d];

endmodule

 

SystemVerilog is used here for convenience.

To compile the code as Verilog-HDL,

move WA out of the parameter list into a localparam,

and, as a consequence, write the width directly in the input declaration instead of using WA.
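
The change described above might look like the following sketch. It assumes the cache configuration (146-bit words, 1024 entries) and uses a hypothetical module name, `mem_single_v2001`, so it is an illustration and not the author's actual file. Note also that `$clog2` was only standardized in Verilog-2005, which is one more reason to write the width out by hand for older tools.

```systemverilog
// Sketch of a Verilog-2001-compatible variant (module name is hypothetical).
module mem_single_v2001 #(
    parameter WD    = 146,
    parameter DEPTH = 1024
) (
    input               clk,
    input               cs,
    input               we,
    input  [9:0]        addr,   // width written out directly: log2(1024) = 10
    input  [WD-1:0]     din,
    output [WD-1:0]     dout
);

// A localparam cannot be used in the port list above because it is
// declared later, so it has to be kept equal to log2(DEPTH) by hand.
localparam WA = 10;

reg [WD-1:0] data [0:DEPTH-1];
reg [WA-1:0] addr_d;

always @(posedge clk) begin
    if (cs == 1'b1) begin
        if (we == 1'b1) data[addr] <= din;
        addr_d <= addr;         // registered address of the last enabled access
    end
end
assign dout = data[addr_d];     // read data follows the registered address

endmodule
```

The trade-off is that the port width no longer follows DEPTH, so this variant is only correct while DEPTH stays at 1024.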

 

 

- top0 (hand-written test cases)

module top_cache;

reg		clk, n_reset;

initial clk = 1'b0;
always #5 clk = ~clk;

reg	[31:0]	dram_data[0:64*1024*1024-1];

initial begin
	$vcdplusfile("cache.vpd");
	$vcdpluson(0,top_cache);
end

reg				cpu_cs;
reg				cpu_we;
reg		[31:0]	cpu_addr;
reg		[31:0]	cpu_din;
wire	[31:0]	cpu_dout;
wire			cpu_nwait;
initial begin
	n_reset = 1'b1;
	for(int i=0;i<64*1024*1024;i++) dram_data[i] = $random;
	#3;
	n_reset = 1'b0;
	#20;
	n_reset = 1'b1;
	cpu_cs = 1'b0;

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);

	// first miss
	cpu_cs = 1'b1;
	cpu_we = 1'b0;
	cpu_addr = 32'h00A37B9C;
	while(1) begin
		@(posedge clk);
		#6;
		cpu_cs = 1'b0;
		if(cpu_nwait == 1'b1) break;
	end

	// hit
	cpu_cs = 1'b1;
	cpu_we = 1'b0;
	cpu_addr = 32'h00A37B98;
	while(1) begin
		@(posedge clk);
		#6;
		cpu_cs = 1'b0;
		if(cpu_nwait == 1'b1) break;
	end

	// miss on the same cache line
	cpu_cs = 1'b1;
	cpu_we = 1'b0;
	cpu_addr = 32'h00A3BB98;
	while(1) begin
		@(posedge clk);
		#6;
		cpu_cs = 1'b0;
		if(cpu_nwait == 1'b1) break;
	end

	// write hit
	cpu_cs = 1'b1;
	cpu_we = 1'b1;
	cpu_addr = 32'h00A3BB90;
	cpu_din = 32'hFFFFFFFF;
	while(1) begin
		@(posedge clk);
		#6;
		cpu_cs = 1'b0;
		if(cpu_nwait == 1'b1) break;
	end

	// miss & write back because of dirty
	cpu_cs = 1'b1;
	cpu_we = 1'b0;
	cpu_addr = 32'h00A37B94;
	while(1) begin
		@(posedge clk);
		#6;
		cpu_cs = 1'b0;
		if(cpu_nwait == 1'b1) break;
	end

	// miss on the same cache line
	cpu_cs = 1'b1;
	cpu_we = 1'b0;
	cpu_addr = 32'h00A3BB90;
	while(1) begin
		@(posedge clk);
		#6;
		cpu_cs = 1'b0;
		if(cpu_nwait == 1'b1) break;
	end

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	$finish;
end

wire			dram_cs;
wire			dram_we;
wire	[31:0]	dram_addr;
wire	[31:0]	dram_din;
wire	[31:0]	dram_dout;
wire			dram_nwait;

cache i_cache(
				.clk(clk),
				.n_reset(n_reset),

				.cpu_cs(cpu_cs),
				.cpu_we(cpu_we),
				.cpu_addr(cpu_addr),
				.cpu_din(cpu_din),
				.cpu_dout(cpu_dout),
				.cpu_nwait(cpu_nwait),

				.dram_cs(dram_cs),
				.dram_we(dram_we),
				.dram_addr(dram_addr),
				.dram_din(dram_din),
				.dram_dout(dram_dout),
				.dram_nwait(dram_nwait)
);

reg		[1:0]	cnt;
reg				dram_we_d;
reg		[31:0]	dram_addr_d;
reg		[31:0]	dram_din_d;
always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		cnt <= 0;
	end else begin
		if((dram_cs == 1'b1) || (cnt > 0)) begin
			cnt <= cnt + 1;
		end
		if((dram_cs == 1'b1) && (dram_nwait == 1'b1)) begin
			dram_we_d <= dram_we;
			dram_addr_d <= dram_addr;
			dram_din_d <= dram_din;
		end
		if((dram_we_d == 1'b1) && (cnt == 3)) begin
			dram_data[dram_addr_d[31:2]] <= dram_din_d;
		end
	end
end

assign dram_dout = (dram_nwait==1'b1) ? dram_data[dram_addr_d[31:2]] : 'bx;
assign dram_nwait = (cnt == 0);

endmodule

 

As in the earlier approach, this testbench spells out four kinds of cases by hand.

It is written in SystemVerilog so that it can be extended easily.

 

 

- top1 (Random generation)

module top_cache;
parameter	DRAM_SIZE = 64*1024*1024;

reg		clk, n_reset;

initial clk = 1'b0;
always #5 clk = ~clk;

reg	[31:0]	dram_data[0:DRAM_SIZE-1];
reg	[31:0]	dram_data_ref[0:DRAM_SIZE-1];

initial begin
	$shm_open("./waveform");
	$shm_probe(top_cache,"AS");
end

reg				cpu_cs;
reg				cpu_we;
reg		[31:0]	cpu_addr;
reg		[31:0]	cpu_din;
wire	[31:0]	cpu_dout;
wire			cpu_nwait;
initial begin
	n_reset = 1'b1;
	for(int i=0;i<DRAM_SIZE;i++) begin
		dram_data[i] = $random;
		dram_data_ref[i] = dram_data[i];
	end
	#3;
	n_reset = 1'b0;
	#20;
	n_reset = 1'b1;
	cpu_cs = 1'b0;

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	#6;

	repeat(10000) begin                   // -----> 10000 cases
		cpu_cs = $random % 2;             // -----> access or not
		if(cpu_cs == 1'b1) begin
			cpu_we = $random % 2;         // -----> read or write
			cpu_addr = {$random & (DRAM_SIZE-1), 2'b00}; // ----> random addr
			if(cpu_we == 1'b1) cpu_din = $random;        // ----> random data on write
			else cpu_din = 'bx;
		end else begin
			cpu_we = 1'bx;
			cpu_addr = 'bx;
			cpu_din = 'bx;
		end
		while(1) begin
			@(posedge clk);
			#6;
			if(cpu_nwait == 1'b1) begin
				break;
			end else begin
				cpu_cs = 1'bx;
				cpu_we = 1'bx;
				cpu_addr = 'bx;
				cpu_din = 'bx;
			end
		end
	end

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	$finish;
end

wire			dram_cs;
wire			dram_we;
wire	[31:0]	dram_addr;
wire	[31:0]	dram_din;
wire	[31:0]	dram_dout;
wire			dram_nwait;

cache i_cache(
				.clk(clk),
				.n_reset(n_reset),

				.cpu_cs(cpu_cs),
				.cpu_we(cpu_we),
				.cpu_addr(cpu_addr),
				.cpu_din(cpu_din),
				.cpu_dout(cpu_dout),
				.cpu_nwait(cpu_nwait),

				.dram_cs(dram_cs),
				.dram_we(dram_we),
				.dram_addr(dram_addr),
				.dram_din(dram_din),
				.dram_dout(dram_dout),
				.dram_nwait(dram_nwait)
);

reg		[1:0]	cnt;
reg				dram_we_d;
reg		[31:0]	dram_addr_d;
reg		[31:0]	dram_din_d;
always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		cnt <= 0;
	end else begin
		if((dram_cs == 1'b1) || (cnt > 0)) begin
			cnt <= cnt + 1;
		end
		if((dram_cs == 1'b1) && (dram_nwait == 1'b1)) begin
			dram_we_d <= dram_we;
			dram_addr_d <= dram_addr;
			dram_din_d <= dram_din;
		end
		if((dram_we_d == 1'b1) && (cnt == 3)) begin
			dram_data[dram_addr_d[31:2]] <= dram_din_d;
		end
	end
end

assign dram_dout = (dram_nwait==1'b1) ? dram_data[dram_addr_d[31:2]] : 'bx;
assign dram_nwait = (cnt == 0);

reg		[31:0]	cpu_addr_d;
reg		[31:0]	cpu_din_d;
reg		[1:0]	cpu_prev_op;
always@(posedge clk) begin
	if(cpu_nwait == 1'b1) begin
		cpu_prev_op <= {cpu_cs, cpu_we};
		if(cpu_cs == 1'b1) begin
			cpu_addr_d <= cpu_addr;
			cpu_din_d <= cpu_din;
		end
		if(cpu_prev_op == 2'b10) begin
			if(dram_data_ref[cpu_addr_d[31:2]] == cpu_dout) begin
			end else begin
				$display("Error!! addr = %X, dram_data = %X, cpu_dout = %X",
					cpu_addr_d, dram_data_ref[cpu_addr_d[31:2]], cpu_dout);
				#10; $finish;
			end
		end else if(cpu_prev_op == 2'b11) begin
			dram_data_ref[cpu_addr_d[31:2]] <= cpu_din_d;
		end
	end
end


endmodule

 

To check the many cases that cannot be written out by hand,

random generation is used to verify 10000 cases.
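
One practical addition to a random testbench is an explicit seed, so that a failing run can be replayed and different runs can explore different sequences. The sketch below is an assumption layered on top of top1: the `+seed=` plusarg name and the module `seed_demo` are hypothetical, and the original testbench calls `$random` without a seed argument.

```systemverilog
// Hypothetical sketch: take the random seed from the command line.
module seed_demo;
  integer seed;
  integer first_draw;

  initial begin
    // Use +seed=<n> when given, otherwise fall back to a fixed default.
    if (!$value$plusargs("seed=%d", seed)) seed = 1;
    $display("seed = %0d", seed);      // log the seed before it is used

    // $random(seed) draws from the sequence selected by the seed
    // and updates the seed variable in place for the next call.
    first_draw = $random(seed);
    $display("first draw = %0d", first_draw);
  end
endmodule
```

In top1 this would mean passing the same `seed` variable to each `$random` call and rerunning with the logged value when a mismatch is reported.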

 

 

- top2 (Constrained Random generation)

module top_cache;
parameter	DRAM_SIZE = 64*1024*1024;

reg		clk, n_reset;

initial clk = 1'b0;
always #5 clk = ~clk;

reg	[31:0]	dram_data[0:DRAM_SIZE-1];
reg	[31:0]	dram_data_ref[0:DRAM_SIZE-1];

initial begin
	$shm_open("./waveform");
	$shm_probe(top_cache,"AS");
end

reg				cpu_cs;
reg				cpu_we;
reg		[31:0]	cpu_addr;
reg		[31:0]	cpu_din;
wire	[31:0]	cpu_dout;
wire			cpu_nwait;
reg		[31:0]	p_addr;
initial begin
	n_reset = 1'b1;
	for(int i=0;i<DRAM_SIZE;i++) begin
		dram_data[i] = $random;
		dram_data_ref[i] = dram_data[i];
	end
	#3;
	n_reset = 1'b0;
	#20;
	n_reset = 1'b1;
	cpu_cs = 1'b0;

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	#6;

	p_addr = {$random & (DRAM_SIZE-1), 2'b00};
	repeat(10000) begin
		cpu_cs = $random % 2;
		if(cpu_cs == 1'b1) begin
			cpu_we = $random % 2;
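			// Intended split: 75% near the previous address, 25% new random address.
			// Note: $random is signed, so $random%4 lies in -3..3 and "> 0" holds for
			// only about 3/8 of the draws; $urandom_range(3) > 0 would give 75%.
			if($random%4 > 0) begin
				// move up to 3 words away from the previous address
				if($random%2 == 0) cpu_addr = p_addr + $random%4 * 4;
				else cpu_addr = p_addr - $random%4 * 4;
				// clamp to the last word when the result leaves the DRAM range
				if(cpu_addr >= DRAM_SIZE*4) cpu_addr = (DRAM_SIZE-1) * 4;
			end else begin
				cpu_addr = {$random & (DRAM_SIZE-1), 2'b00};  // new random address
			end
			p_addr = cpu_addr;                                // remember the generated address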
			if(cpu_we == 1'b1) cpu_din = $random;
			else cpu_din = 'bx;
		end else begin
			cpu_we = 1'bx;
			cpu_addr = 'bx;
			cpu_din = 'bx;
		end
		while(1) begin
			@(posedge clk);
			#6;
			if(cpu_nwait == 1'b1) begin
				break;
			end else begin
				cpu_cs = 1'bx;
				cpu_we = 1'bx;
				cpu_addr = 'bx;
				cpu_din = 'bx;
			end
		end
	end

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	$finish;
end

wire			dram_cs;
wire			dram_we;
wire	[31:0]	dram_addr;
wire	[31:0]	dram_din;
wire	[31:0]	dram_dout;
wire			dram_nwait;

cache i_cache(
				.clk(clk),
				.n_reset(n_reset),

				.cpu_cs(cpu_cs),
				.cpu_we(cpu_we),
				.cpu_addr(cpu_addr),
				.cpu_din(cpu_din),
				.cpu_dout(cpu_dout),
				.cpu_nwait(cpu_nwait),

				.dram_cs(dram_cs),
				.dram_we(dram_we),
				.dram_addr(dram_addr),
				.dram_din(dram_din),
				.dram_dout(dram_dout),
				.dram_nwait(dram_nwait)
);

reg		[1:0]	cnt;
reg				dram_we_d;
reg		[31:0]	dram_addr_d;
reg		[31:0]	dram_din_d;
always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		cnt <= 0;
	end else begin
		if((dram_cs == 1'b1) || (cnt > 0)) begin
			cnt <= cnt + 1;
		end
		if((dram_cs == 1'b1) && (dram_nwait == 1'b1)) begin
			dram_we_d <= dram_we;
			dram_addr_d <= dram_addr;
			dram_din_d <= dram_din;
		end
		if((dram_we_d == 1'b1) && (cnt == 3)) begin
			dram_data[dram_addr_d[31:2]] <= dram_din_d;
		end
	end
end

assign dram_dout = (dram_nwait==1'b1) ? dram_data[dram_addr_d[31:2]] : 'bx;
assign dram_nwait = (cnt == 0);

reg		[31:0]	cpu_addr_d;
reg		[31:0]	cpu_din_d;
reg		[1:0]	cpu_prev_op;
always@(posedge clk) begin
	if(cpu_nwait == 1'b1) begin
		cpu_prev_op <= {cpu_cs, cpu_we};
		if(cpu_cs == 1'b1) begin
			cpu_addr_d <= cpu_addr;
			cpu_din_d <= cpu_din;
		end
		if(cpu_prev_op == 2'b10) begin
			if(dram_data_ref[cpu_addr_d[31:2]] == cpu_dout) begin
			end else begin
				$display("Error!! addr = %X, dram_data = %X, cpu_dout = %X",
					cpu_addr_d, dram_data_ref[cpu_addr_d[31:2]], cpu_dout);
				#10; $finish;
			end
		end else if(cpu_prev_op == 2'b11) begin
			dram_data_ref[cpu_addr_d[31:2]] <= cpu_din_d;
		end
	end
end


endmodule

 

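With purely random stimulus, it is hard to tell whether the situations of interest will actually occur.

A constraint is therefore added that is intended to produce a likely hit 75% of the time,

by moving from the previous address in word-sized steps.

The generated address is also kept within the valid DRAM range.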
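
One detail worth checking in this kind of constraint is the sign of `$random`. It returns a signed 32-bit value, so `$random%4` ranges over -3..3 and the test `$random%4 > 0` succeeds for only about 3 of 8 draws, not 3 of 4. The stand-alone sketch below (module and variable names are hypothetical) counts both variants so the ratios can be compared directly; `$urandom_range(3)` is shown as one possible unsigned alternative, not as what the original testbench uses.

```systemverilog
// Hypothetical sketch: compare a signed and an unsigned 75% test.
module near_ratio_demo;
  localparam int N = 10000;
  int signed_hits   = 0;
  int unsigned_hits = 0;
  int r;

  initial begin
    repeat (N) begin
      r = $random;                                 // signed 32-bit value
      if (r % 4 > 0) signed_hits++;                // remainder lies in -3..3
      if ($urandom_range(3) > 0) unsigned_hits++;  // value lies in 0..3
    end
    $display("$random%%4 > 0        : %0d of %0d", signed_hits, N);
    $display("$urandom_range(3) > 0 : %0d of %0d", unsigned_hits, N);
  end
endmodule
```

The first count should come out near 37.5% of N and the second near 75%. If the stimulus is changed accordingly, the coverage numbers would have to be measured again, since the results reported in this post were obtained with the code as listed.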

 

 

- top3 (using a task)

module top_cache;
parameter	DRAM_SIZE = 64*1024*1024;

reg		clk, n_reset;

initial clk = 1'b0;
always #5 clk = ~clk;

reg	[31:0]	dram_data[0:DRAM_SIZE-1];
reg	[31:0]	dram_data_ref[0:DRAM_SIZE-1];

initial begin
	$shm_open("./waveform");
	$shm_probe(top_cache,"AS");
end

reg				cpu_cs;
reg				cpu_we;
reg		[31:0]	cpu_addr;
reg		[31:0]	cpu_din;
wire	[31:0]	cpu_dout;
wire			cpu_nwait;

task mem_drive (
	input			cs,
	input			we,
	input	[31:0]	addr,
	input	[31:0]	din
);
begin
	cpu_cs = cs;
	if(cs == 1'b1) begin
		cpu_we = we;
		cpu_addr = addr;
		if(we == 1'b1) cpu_din = din;
		else cpu_din = 'bx;
	end else begin
		cpu_we = 1'bx;
		cpu_addr = 'bx;
		cpu_din = 'bx;
	end
	while(1) begin
		@(posedge clk);
		#6;
		if(cpu_nwait == 1'b1) begin
			break;
		end else begin
			cpu_cs = 1'bx;
			cpu_we = 1'bx;
			cpu_addr = 'bx;
			cpu_din = 'bx;
		end
	end
end
endtask

reg				cs;
reg				we;
reg		[31:0]	addr;
reg		[31:0]	din;
reg		[31:0]	p_addr;

initial begin
	n_reset = 1'b1;
	for(int i=0;i<DRAM_SIZE;i++) begin
		dram_data[i] = $random;
		dram_data_ref[i] = dram_data[i];
	end
	#3;
	n_reset = 1'b0;
	#20;
	n_reset = 1'b1;
	cpu_cs = 1'b0;

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	#6;

	// first miss
	mem_drive(1'b1, 1'b0, 32'h00A37B9C, 'b0);
	// hit
	mem_drive(1'b1, 1'b0, 32'h00A37B98, 'b0);
	// miss on the same cache line
	mem_drive(1'b1, 1'b0, 32'h00A3BB98, 'b0);
	// write hit
	mem_drive(1'b1, 1'b1, 32'h00A3BB90, 32'hFFFFFFFF);
	// miss & write back because of dirty
	mem_drive(1'b1, 1'b0, 32'h00A37B94, 'b0);
	// miss on the same cache line
	mem_drive(1'b1, 1'b0, 32'h00A3BB90, 'b0);

	p_addr = {$random & (DRAM_SIZE-1), 2'b00};
	repeat(10000) begin
		cs = $random % 2;
		we = $random % 2;
		if($random%4 > 0) begin
			if($random%2 == 0) addr = p_addr + $random%4 * 4;
			else addr = p_addr - $random%4 * 4;
			if(addr >= DRAM_SIZE*4) addr = (DRAM_SIZE-1) * 4;
		end else begin
			addr = {$random & (DRAM_SIZE-1), 2'b00};
		end
		din = $random;
		p_addr = addr;

		mem_drive(cs, we, addr, din);
	end

	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	@(posedge clk);
	$finish;
end

wire			dram_cs;
wire			dram_we;
wire	[31:0]	dram_addr;
wire	[31:0]	dram_din;
wire	[31:0]	dram_dout;
wire			dram_nwait;

cache i_cache(
				.clk(clk),
				.n_reset(n_reset),

				.cpu_cs(cpu_cs),
				.cpu_we(cpu_we),
				.cpu_addr(cpu_addr),
				.cpu_din(cpu_din),
				.cpu_dout(cpu_dout),
				.cpu_nwait(cpu_nwait),

				.dram_cs(dram_cs),
				.dram_we(dram_we),
				.dram_addr(dram_addr),
				.dram_din(dram_din),
				.dram_dout(dram_dout),
				.dram_nwait(dram_nwait)
);

reg		[1:0]	cnt;
reg				dram_we_d;
reg		[31:0]	dram_addr_d;
reg		[31:0]	dram_din_d;
always@(negedge n_reset or posedge clk) begin
	if(n_reset == 1'b0) begin
		cnt <= 0;
	end else begin
		if((dram_cs == 1'b1) || (cnt > 0)) begin
			cnt <= cnt + 1;
		end
		if((dram_cs == 1'b1) && (dram_nwait == 1'b1)) begin
			dram_we_d <= dram_we;
			dram_addr_d <= dram_addr;
			dram_din_d <= dram_din;
		end
		if((dram_we_d == 1'b1) && (cnt == 3)) begin
			dram_data[dram_addr_d[31:2]] <= dram_din_d;
		end
	end
end

assign dram_dout = (dram_nwait==1'b1) ? dram_data[dram_addr_d[31:2]] : 'bx;
assign dram_nwait = (cnt == 0);

reg		[31:0]	cpu_addr_d;
reg		[31:0]	cpu_din_d;
reg		[1:0]	cpu_prev_op;
always@(posedge clk) begin
	if(cpu_nwait == 1'b1) begin
		cpu_prev_op <= {cpu_cs, cpu_we};
		if(cpu_cs == 1'b1) begin
			cpu_addr_d <= cpu_addr;
			cpu_din_d <= cpu_din;
		end
		if(cpu_prev_op == 2'b10) begin
			if(dram_data_ref[cpu_addr_d[31:2]] == cpu_dout) begin
			end else begin
				$display("Error!! addr = %X, dram_data = %X, cpu_dout = %X",
					cpu_addr_d, dram_data_ref[cpu_addr_d[31:2]], cpu_dout);
				#10; $finish;
			end
		end else if(cpu_prev_op == 2'b11) begin
			dram_data_ref[cpu_addr_d[31:2]] <= cpu_din_d;
		end
	end
end


endmodule

 

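The four intended cases are run first, followed by the random cases.

The directed cases perform the same operation and differ only in their arguments,

so a task is written and reused for them.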

 

 

4. Simulation Results

A functional simulation is run to check the behavior.

The tools used are Xcelium and VCS.

VCS provides coverage measurement.

Coverage computes statement, line, and block coverage

to show how much of all possible behavior has actually been verified.

It can be used as follows; from here on only the results are shown.

- Commands

vcs -full64 -sverilog -debug_access+all top0.v cache.v mem_sim.v -cm line+cond+fsm+tgl+branch -cm_hier cov.conf

./simv -cm line+cond+fsm+tgl+branch
urg -full64 -dir ./simv.vdb -format text
vi urgReport/modinfo.txt

 

 

- cov.conf

+tree top_cache.i_cache

 

 

 

 

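4-1. top0 (hand-written test cases)

1. The first read is a miss, so the data is fetched from DRAM.

At this point four words, the requested one and its neighbors, are read and stored in the cache.

2. When a nearby address is read next, the data stored in the cache is returned immediately.

1. The next access misses, so data is fetched from DRAM again.

2. A write to one of the addresses held in the cache (a hit) then updates the cache data immediately.

1. A later read miss finds the dirty bit set to 1,

so the old line is first written back to DRAM at its original address.

2. After that, the data for the requested address is read from DRAM.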

 

 

- Coverage_top0

 

Hand-written cases cannot exercise every possible situation.

As a result, the coverage score is also very low.

 

 

 

4-2. top1 (random generation)

 

Random generation is driving 10000 cases here.

 

- Coverage_top1

 

Compared with top0, the score is far higher.

 

 

 

4-3. top2 (Constrained random generation)

As with top1, the waveform is not informative here, so let's go straight to the score.

 

 

- Coverage_top2

 

The score has gone up a little further.

 

 

 

 

As shown above, the uncovered items can be looked up directly and then verified again with a different approach.

At present, none of the uncovered items is a problem.

Testing therefore stops here.

 

 

 

※ Note: Instead of checking by hand, functional coverage can also be used.

program automatic test;
    covergroup fcov @(port_event);   // define the coverage group
        coverpoint sa;
        coverpoint da;
    endgroup: fcov

    bit[3:0] sa, da;
    event port_event;
    real coverage = 0.0;
    fcov port_fc = new();    // instantiate the coverage object

    initial while (coverage < 99.5) begin
        ...
        sa = pkt_ref.sa;
        da = pkt_ref.da;
        ->port_event;      // triggers sampling of the port_fc coverage group
        // port_fc.sample(); // alternative form of updating of bins

        coverage = $get_coverage(); // overall coverage  , coverage result query
        // coverage = port_fc.get_inst_coverage(); // instance coverage
    end
endprogram: test

Source: https://wikidocs.net/172887

 

I plan to consider this kind of approach for the next verification task.
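
Applied to this cache, a functional coverage model might look like the sketch below. It is only an illustration under stated assumptions: the module name `cache_fcov`, its ports, and the choice of coverpoints are hypothetical, and the state encodings are copied from the parameters of the `cache` module above.

```systemverilog
// Hypothetical functional-coverage sketch for the cache controller.
module cache_fcov (
  input logic       clk,
  input logic [3:0] state,
  input logic       hit,
  input logic       dirty
);
  // State encodings copied from the cache module.
  localparam logic [3:0] IDLE   = 4'b0000, READ   = 4'b0001,
                         R_WMEM = 4'b0010, R_RMEM = 4'b0011,
                         R_REND = 4'b0100, R_OUT  = 4'b0101,
                         WRITE  = 4'b1001, W_WMEM = 4'b1010,
                         W_RMEM = 4'b1011, W_REND = 4'b1100;

  covergroup cache_cg @(posedge clk);
    // Has every FSM state been visited?
    cp_state: coverpoint state {
      bins idle   = {IDLE};
      bins read   = {READ};
      bins r_wmem = {R_WMEM};
      bins r_rmem = {R_RMEM};
      bins r_rend = {R_REND};
      bins r_out  = {R_OUT};
      bins write  = {WRITE};
      bins w_wmem = {W_WMEM};
      bins w_rmem = {W_RMEM};
      bins w_rend = {W_REND};
    }
    // Lookup cycles only: the states in which hit and dirty are evaluated.
    cp_lookup: coverpoint state {
      bins read  = {READ};
      bins write = {WRITE};
    }
    cp_hit:   coverpoint hit   iff (state == READ || state == WRITE);
    cp_dirty: coverpoint dirty iff (state == READ || state == WRITE);
    // Read/write x hit/miss x clean/dirty: 8 combinations in total.
    lookup_x_hit_x_dirty: cross cp_lookup, cp_hit, cp_dirty;
  endgroup

  cache_cg cg = new();
endmodule
```

One way to attach it without editing the RTL would be a bind statement such as `bind cache cache_fcov u_fcov (.clk(clk), .state(state), .hit(hit), .dirty(dirty));`, which connects the checker to the internal signals of every `cache` instance. The cross then reports directly whether cases such as a write miss on a dirty line were ever exercised.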