FIFO Fundamentals and CDC Design


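1 Introduction

        A FIFO (First In First Out) is one of the most common buffer structures in digital systems. It is widely used for pipeline decoupling, rate matching, burst absorption, bus bridging, and cross-clock-domain data transfer. In digital IC and FPGA design a FIFO looks simple, but the real mistakes tend to cluster in a few places: full/empty detection, depth sizing, boundary conditions, reset, and clock domain crossing (CDC).

        This article follows a buffer-fundamentals learning path. It focuses on synchronous FIFOs, asynchronous FIFOs, peek FIFOs, and non-peek FIFOs, and gives synthesizable SystemVerilog RTL examples.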


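2 Learning Path

        A suggested order for studying FIFOs:

[Figure: FIFO learning path]

        The corresponding learning goals are:

Category        Content
Buffer basics   Understand buffering, backpressure, and burst absorption
FIFO types      sync FIFO, async FIFO, peek FIFO, non-peek FIFO
CDC             Understand what the Gray pointer and the synchronizer do in an async FIFO
Architecture    full/empty, depth sizing, boundary conditions, and reset
RTL skills      Be able to write a synthesizable SystemVerilog FIFO module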

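3 Basic Concepts of Buffers and FIFOs

3.1 Why a Buffer Is Needed

        The core job of a buffer is to decouple the producer from the consumer. When upstream and downstream differ in rate, respond with some latency, or see bursty traffic, a system without a buffer easily suffers throughput loss, data loss, or more complicated control logic.

        FIFOs are commonly used for pipeline decoupling, rate matching, burst absorption, bus bridging, and cross-clock-domain data transfer.

3.2 Common Buffer Types

Type             Description
Register buffer  Usually a single entry, used to add a pipeline stage
Skid buffer      Commonly used to improve ready/valid timing
Sync FIFO        Multi-entry buffer in a single clock domain
Async FIFO       FIFO that crosses clock domains
Peek FIFO        Lets the consumer look at the head entry without popping it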

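4 Synchronous FIFO Architecture

        In a synchronous FIFO, reads and writes happen in the same clock domain. A typical implementation consists of a storage array, a write pointer, a read pointer, and full/empty status logic.

[Figure: basic block diagram of a synchronous FIFO]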

4.1 count-based full/empty

        For a synchronous FIFO, the most direct approach is to keep a count of the entries currently stored:

empty = count == 0
full  = count == DEPTH

        The count update rules are:

Operation        Change in count
write only       count + 1
read only        count - 1
write and read   unchanged
no operation     unchanged

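4.2 Synchronous FIFO RTL

        Below is a count-based synchronous FIFO that supports a read and a write in the same cycle. In this implementation the pointers wrap by natural overflow, so DEPTH should be a power of 2.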

module sync_fifo #(
    parameter int DATA_WIDTH = 32,
    parameter int DEPTH      = 16,
    localparam int ADDR_WIDTH = $clog2(DEPTH),
    localparam int CNT_WIDTH  = $clog2(DEPTH + 1)
) (
    input  logic                  clk,
    input  logic                  rst_n,

    input  logic                  wr_en,
    input  logic [DATA_WIDTH-1:0] wr_data,
    output logic                  full,

    input  logic                  rd_en,
    output logic [DATA_WIDTH-1:0] rd_data,
    output logic                  empty,

    output logic [CNT_WIDTH-1:0]  level
);

    logic [DATA_WIDTH-1:0] mem [DEPTH];

    logic [ADDR_WIDTH-1:0] wr_ptr;
    logic [ADDR_WIDTH-1:0] rd_ptr;
    logic [CNT_WIDTH-1:0]  count;

    logic do_write;
    logic do_read;

    assign empty = (count == '0);
    assign full  = (count == DEPTH[CNT_WIDTH-1:0]);
    assign level = count;

    assign do_write = wr_en && (!full || rd_en);
    assign do_read  = rd_en && !empty;

    assign rd_data = mem[rd_ptr];

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_ptr <= '0;
        end else if (do_write) begin
            mem[wr_ptr] <= wr_data;
            wr_ptr      <= wr_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd_ptr <= '0;
        end else if (do_read) begin
            rd_ptr <= rd_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count <= '0;
        end else begin
            unique case ({do_write, do_read})
                2'b10: count <= count + 1'b1;
                2'b01: count <= count - 1'b1;
                default: count <= count;
            endcase
        end
    end

endmodule

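        The key points are: writes are gated by full and reads by empty, so overflow and underflow requests are ignored; a write is still accepted when the FIFO is full if a read happens in the same cycle; count is unchanged when a read and a write occur together; and rd_data is a combinational read of the head entry.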


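5 full / empty Detection

5.1 Why wr_ptr == rd_ptr Is Ambiguous

        In a circular FIFO, wr_ptr == rd_ptr can mean either of two things:

Case 1: the FIFO is empty
Case 2: the FIFO is full after wrap-around

        Extra information is therefore needed to tell full from empty. Common approaches include an occupancy counter, as in section 4.1, and pointers with an extra wrap bit, as in section 5.2.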

5.2 pointer with extra bit

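        For a FIFO whose depth is a power of 2, each pointer can be made one bit wider. The low bits are the memory address and the top bit is a wrap bit.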

DEPTH      = 8
ADDR_WIDTH = 3
PTR_WIDTH  = 4

The flags are then:

assign empty = (wr_ptr == rd_ptr);

assign full = (wr_ptr[ADDR_WIDTH]     != rd_ptr[ADDR_WIDTH]) &&
              (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);

        This method is common in both synchronous and asynchronous FIFOs.
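
        With the extra wrap bit, the occupancy can also be read off as the modular difference of the two pointers. The sketch below assumes DEPTH = 8 as in the example above; the package name fifo_ptr_pkg and the function names are illustrative and do not come from the original text.

```systemverilog
package fifo_ptr_pkg;
    localparam int ADDR_WIDTH = 3;               // DEPTH = 8, as in the example above
    localparam int PTR_WIDTH  = ADDR_WIDTH + 1;  // one extra wrap bit

    // Occupancy is the difference of the two pointers, wrapping modulo 2**PTR_WIDTH.
    function automatic logic [PTR_WIDTH-1:0] fifo_level(
        input logic [PTR_WIDTH-1:0] wr_ptr,
        input logic [PTR_WIDTH-1:0] rd_ptr
    );
        return wr_ptr - rd_ptr;
    endfunction

    // Empty: pointers are identical, wrap bit included.
    function automatic logic fifo_empty(
        input logic [PTR_WIDTH-1:0] wr_ptr,
        input logic [PTR_WIDTH-1:0] rd_ptr
    );
        return (wr_ptr == rd_ptr);
    endfunction

    // Full: same address bits, different wrap bit.
    function automatic logic fifo_full(
        input logic [PTR_WIDTH-1:0] wr_ptr,
        input logic [PTR_WIDTH-1:0] rd_ptr
    );
        return (wr_ptr[ADDR_WIDTH]     != rd_ptr[ADDR_WIDTH]) &&
               (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);
    endfunction
endpackage
```

        For instance, wr_ptr = 4'b1010 and rd_ptr = 4'b0010 share the address bits 010 but differ in the wrap bit, so the FIFO is full and the level is 8.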


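6 FIFO Depth Sizing and Calculation

        The core question in depth sizing is how much data, in the worst case, the FIFO must absorb before the downstream consumes it. In other words, compute the maximum FIFO occupancy:

occupancy(t) = total_write(t) - total_read(t)
depth >= max(occupancy)

        In practice, depth is not set from average throughput alone. It is set from the worst-case burst, downstream stalls, arbitration latency, CDC feedback latency, and the delay before backpressure takes effect.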

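6.1 General Method

        For any time window T, let:

W(T) = maximum number of items written in the window
R(T) = minimum number of items read in the window

        The FIFO must then satisfy:

depth >= max_over_T { W(T) - R(T) }

        If the system supports backpressure, the delay between the FIFO being about to fill and the upstream actually stopping must also be covered:

depth >= max_over_T { W(T) - R(T) } + backpressure_latency_write_count

where:

backpressure_latency_write_count = write_rate * response_latency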

6.2 Fast Write, Slow Read

        If the write rate is W, the read rate is R, the burst lasts B cycles, and W > R, the FIFO needs at least:

depth >= (W - R) * B

        For example:

write rate   = 1 item/cycle
read rate    = 0.5 item/cycle
burst length = 16 cycles

depth >= (1 - 0.5) * 16 = 8

        If the upstream needs 2 more cycles to stop writing after it sees full, the extra margin is:

extra = write_rate * response_latency
      = 1 * 2
      = 2

depth >= 8 + 2 = 10

        Implementations usually round up to a power of 2:

depth = 16
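
        The arithmetic above can be captured in a small helper for use in a testbench or an elaboration-time check. This is a minimal sketch: the package name fifo_depth_pkg, the function name burst_depth, and its argument names are assumptions for illustration. It returns the minimum depth before rounding to a power of 2.

```systemverilog
package fifo_depth_pkg;
    // depth >= (W - R) * B + W * response_latency, rounded up to a whole entry.
    function automatic int burst_depth(
        input real wr_rate,           // items per cycle during the burst
        input real rd_rate,           // items per cycle during the burst
        input int  burst_cycles,      // burst length B in cycles
        input int  response_latency   // cycles until the upstream actually stops
    );
        real need;
        need = (wr_rate - rd_rate) * burst_cycles + wr_rate * response_latency;
        // A negative result means the reader keeps up, so no storage is needed.
        return (need > 0.0) ? $rtoi($ceil(need)) : 0;
    endfunction
endpackage
```

        Calling burst_depth(1.0, 0.5, 16, 2) reproduces the value 10 derived above. Setting rd_rate to 0.0 models a downstream that is stalled for the whole window.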

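6.3 Downstream Stall

        If the upstream writes continuously and the downstream can stall for up to S cycles, the FIFO only fills during the stall:

depth >= write_rate * S

        For example:

write_rate = 1 item/cycle
max_stall  = 12 cycles

depth >= 12

        If the full feedback takes 3 cycles to take effect at the upstream:

extra = 1 * 3 = 3

depth >= 12 + 3 = 15

        In practice the usual choice is:

depth = 16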

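6.4 Periodic Rate Mismatch

        Some interfaces have the same average rate but different local rates. For example, in every 8-cycle window the upstream writes 4 items back to back in the first 4 cycles, and the downstream reads 4 items in the last 4 cycles. Both sides average 0.5 item/cycle, yet the instantaneous occupancy still grows.

        To compute the depth, list the writes and reads cycle by cycle:

cycle  write  read  occupancy
0      1      0     1
1      1      0     2
2      1      0     3
3      1      0     4
4      0      1     3
5      0      1     2
6      0      1     1
7      0      1     0

        Therefore:

depth >= max occupancy = 4

        This example shows that equal average bandwidth does not mean a depth of 1 is enough; the burst pattern has to be considered as well.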
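
        The cycle-by-cycle table can be reproduced with a short helper that walks a write/read pattern and tracks the peak occupancy. This is a sketch under assumptions: the package name fifo_occupancy_pkg and the function max_occupancy are illustrative, the pattern is fixed at 8 cycles, and a read in a given cycle only drains data that was already stored before that cycle.

```systemverilog
package fifo_occupancy_pkg;
    // Bit i of each vector is the activity in cycle i (1 = one item moved).
    function automatic int max_occupancy(
        input logic [7:0] wr_pattern,
        input logic [7:0] rd_pattern
    );
        int occ;
        int peak;
        occ  = 0;
        peak = 0;
        for (int cyc = 0; cyc < 8; cyc++) begin
            // Read first: it can only take an entry stored in an earlier cycle.
            if (rd_pattern[cyc] && (occ > 0)) occ--;
            if (wr_pattern[cyc])              occ++;
            if (occ > peak) peak = occ;
        end
        return peak;
    endfunction
endpackage
```

        With wr_pattern = 8'b0000_1111 and rd_pattern = 8'b1111_0000 the function returns 4, matching the table. An interleaved pattern such as 8'b0101_0101 against 8'b1010_1010 returns 1.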

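6.5 Bus Arbitration Latency

        If the read side of the FIFO has to wait for arbitration, for example when several masters share one downstream port, the read side may wait up to A cycles in the worst case before it is served.

        Assume:

write_rate = W item/cycle
arb_latency = A cycles

The FIFO then needs at least:

depth >= W * A

        If the read bandwidth is still lower than the write bandwidth after arbitration is won, the subsequent rate difference is added on top:

depth >= W * A + (W - R) * B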

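6.6 Asynchronous FIFO Depth Calculation

        Besides the rate difference, an asynchronous FIFO must account for CDC feedback latency. full depends on the read pointer synchronized into the write clock domain, and empty depends on the write pointer synchronized into the read clock domain, so each side sees full/empty with an inherent lag.

        A common estimate is:

depth >= burst_accumulation + cdc_feedback_margin

where:

burst_accumulation = max_write_accumulation - min_read_drain

and:

cdc_feedback_margin =
    synchronizer_latency
  + pointer_compare_latency
  + upstream_response_latency

        Note that the latency here has to be converted into the number of items the write side can still write:

cdc_feedback_margin_items = write_rate * feedback_latency_in_wr_clk

        For example:

wr_clk = 200 MHz
rd_clk = 100 MHz
write_rate = 1 item/wr_clk
read_rate  = 1 item/rd_clk
write burst = 32 wr_clk cycles

        During 32 write clock cycles the read clock advances only about 16 cycles, so:

write_count = 32
read_count  = 16

burst_accumulation = 32 - 16 = 16

        If synchronizing the read pointer into the write clock domain takes 2 cycles and the upstream takes 2 cycles to respond to full:

feedback_latency = 2 + 2 = 4 wr_clk cycles
cdc_feedback_margin_items = 1 * 4 = 4

depth >= 16 + 4 = 20

        Implementations usually round up to a power of 2:

depth = 32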
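
        The worked example can be expressed as a helper that takes the two clock frequencies directly. This is a minimal sketch, assuming one item per wr_clk while writing and one item per rd_clk while reading; the package name async_fifo_depth_pkg, the function async_depth, and its arguments are illustrative names. It returns the minimum depth before rounding.

```systemverilog
package async_fifo_depth_pkg;
    // Assumes 1 item per wr_clk cycle written and 1 item per rd_clk cycle read.
    function automatic int async_depth(
        input int wr_mhz,
        input int rd_mhz,
        input int burst_wr_cycles,   // burst length in wr_clk cycles
        input int sync_stages,       // pointer synchronizer latency in wr_clk cycles
        input int upstream_latency   // wr_clk cycles for the upstream to react to full
    );
        int reads;
        int accumulation;
        // Reads completed during the burst, rounded down to stay pessimistic.
        reads        = (burst_wr_cycles * rd_mhz) / wr_mhz;
        accumulation = (burst_wr_cycles > reads) ? (burst_wr_cycles - reads) : 0;
        // At 1 item per wr_clk, each cycle of feedback latency costs one entry.
        return accumulation + sync_stages + upstream_latency;
    endfunction
endpackage
```

        async_depth(200, 100, 32, 2, 2) gives 20, the same value as the hand calculation above.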

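6.7 Depth Rounding Rule

        Once the theoretical minimum depth is known, it still has to be rounded for implementation:

depth_final = next_power_of_2(depth_required + safety_margin)

        A synchronous FIFO does not strictly need a power-of-2 depth, but a power of 2 simplifies pointer wrap-around. For an asynchronous FIFO a power of 2 is strongly recommended, because the Gray pointer full/empty comparison relies on a standard binary count with a wrap bit, and only a power-of-2 count keeps the Gray sequence to a single bit change at the wrap.

        For example:

depth_required = 20
safety_margin  = 4

depth_final = next_power_of_2(24) = 32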
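
        next_power_of_2 is not a built-in SystemVerilog function, so one way to write it is with $clog2. The package name fifo_round_pkg and the helper final_depth are assumptions for illustration.

```systemverilog
package fifo_round_pkg;
    // Smallest power of 2 that is greater than or equal to n (1 for n <= 1).
    function automatic int next_power_of_2(input int n);
        return (n <= 1) ? 1 : (1 << $clog2(n));
    endfunction

    // depth_final = next_power_of_2(depth_required + safety_margin)
    function automatic int final_depth(
        input int depth_required,
        input int safety_margin
    );
        return next_power_of_2(depth_required + safety_margin);
    endfunction
endpackage
```

        Because both functions are constant functions, they can also be used in a localparam expression when the arguments are parameters.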

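6.8 FIFO Depth Calculation Checklist

[ ] What is the longest back-to-back burst?
[ ] What is the write rate inside the burst?
[ ] What is the read rate inside the burst?
[ ] What is the longest downstream stall, in cycles?
[ ] Is there arbitration wait? What is the longest wait, in cycles?
[ ] How many cycles pass between full/backpressure and the upstream actually stopping?
[ ] For an async FIFO, is the pointer synchronizer latency included?
[ ] Can reset or clock gating cause a short read/write imbalance?
[ ] Is an almost_full flag needed for early backpressure?
[ ] Is the final depth a power of 2?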
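
        For the almost_full item in the checklist, one possible sketch is a threshold on the occupancy that leaves room for the writes still in flight. The module name almost_full_flag and the MARGIN parameter are assumptions for illustration; level is taken to be an occupancy output like the one on the sync_fifo module in section 4.2.

```systemverilog
module almost_full_flag #(
    parameter int DEPTH  = 16,
    parameter int MARGIN = 3,   // assumed: writes that may still arrive after the flag is raised
    localparam int CNT_WIDTH = $clog2(DEPTH + 1)
) (
    input  logic [CNT_WIDTH-1:0] level,
    output logic                 almost_full
);

    // Raise the flag while MARGIN free entries remain, so in-flight writes still fit.
    assign almost_full = (level >= CNT_WIDTH'(DEPTH - MARGIN));

endmodule
```

        MARGIN would normally be set to write_rate * response_latency, the same quantity used for the backpressure margin in section 6.1.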

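6.9 Depth Selection Guidelines

Scenario                       Suggested depth
Simple pipeline decoupling     2 to 4
General ready/valid buffering  4 to 8
Burst absorption               8 to 64
Bus bridge / DMA               16 to 256
CDC FIFO                       Usually 8 or more, computed from the rates and the CDC latency
SRAM-based FIFO                64 or more is typical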

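7 Peek FIFO and Non-Peek FIFO

7.1 Non-Peek FIFO

        A non-peek FIFO is the ordinary FIFO. A read consumes the current head entry and advances the read pointer.

rd_en asserted -> pop head -> rd_ptr advance

        It suits streaming data, bus response queues, and ordinary pipeline buffers.

7.2 Peek FIFO

        A peek FIFO lets the consumer look at the head entry without popping it. Only a pop actually advances the read pointer.

peek: look at the head data, FIFO state unchanged
pop : consume the head data, rd_ptr advances

        Typical applications are ones where the consumer has to inspect the head entry before committing to consume it, for example arbitration or header-based routing decisions.

7.3 Peek FIFO RTL

        The peek FIFO below always drives the current head entry on peek_data. peek_data is valid when empty == 0, and only pop_en moves the read pointer.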

module peek_fifo #(
    parameter int DATA_WIDTH = 32,
    parameter int DEPTH      = 16,
    localparam int ADDR_WIDTH = $clog2(DEPTH),
    localparam int CNT_WIDTH  = $clog2(DEPTH + 1)
) (
    input  logic                  clk,
    input  logic                  rst_n,

    input  logic                  push_en,
    input  logic [DATA_WIDTH-1:0] push_data,
    output logic                  full,

    input  logic                  pop_en,
    output logic [DATA_WIDTH-1:0] peek_data,
    output logic                  peek_valid,
    output logic                  empty,

    output logic [CNT_WIDTH-1:0]  level
);

    logic [DATA_WIDTH-1:0] mem [DEPTH];

    logic [ADDR_WIDTH-1:0] wr_ptr;
    logic [ADDR_WIDTH-1:0] rd_ptr;
    logic [CNT_WIDTH-1:0]  count;

    logic do_push;
    logic do_pop;

    assign empty      = (count == '0);
    assign full       = (count == DEPTH[CNT_WIDTH-1:0]);
    assign level      = count;
    assign peek_valid = !empty;
    assign peek_data  = mem[rd_ptr];

    assign do_push = push_en && (!full || pop_en);
    assign do_pop  = pop_en && !empty;

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_ptr <= '0;
        end else if (do_push) begin
            mem[wr_ptr] <= push_data;
            wr_ptr      <= wr_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd_ptr <= '0;
        end else if (do_pop) begin
            rd_ptr <= rd_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count <= '0;
        end else begin
            unique case ({do_push, do_pop})
                2'b10: count <= count + 1'b1;
                2'b01: count <= count - 1'b1;
                default: count <= count;
            endcase
        end
    end

endmodule

        When verifying a peek FIFO, the main checks are:

peek_valid && !pop_en -> rd_ptr stable
pop_en && !empty      -> rd_ptr advance
peek_data             -> always points to current head

        If the underlying memory is a synchronous-read SRAM, peek_data may not be available combinationally, and a prefetch stage or an output register is needed.


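8 Asynchronous FIFO and CDC

        An asynchronous FIFO transfers data across clock domains. The core idea is that the data itself passes through a dual-port memory, only the Gray-coded read and write pointers are synchronized into the opposite clock domain, full is generated in the write clock domain, and empty is generated in the read clock domain.

[Figure: asynchronous FIFO CDC architecture]

8.1 Why Gray Code

        When a binary pointer increments, several bits may toggle at once, for example:

0111 -> 1000

        If such a pointer is synchronized directly across clock domains, the receiving side may sample an invalid intermediate value. A Gray code changes only 1 bit per increment, so the sampled value is always either the old or the new pointer, which makes it suitable for crossing clock domains.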

8.2 Binary and Gray Conversion

function automatic logic [PTR_WIDTH-1:0] bin2gray(
    input logic [PTR_WIDTH-1:0] bin
);
    return (bin >> 1) ^ bin;
endfunction
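
        The reverse conversion and a quick check of the single-bit-change property can be sketched as follows. PTR_WIDTH = 4 is an assumed value here, and the package name gray_pkg plus the functions gray2bin and gray_is_single_bit_step are illustrative additions; only bin2gray comes from the text above.

```systemverilog
package gray_pkg;
    localparam int PTR_WIDTH = 4;  // assumed width for this sketch

    function automatic logic [PTR_WIDTH-1:0] bin2gray(
        input logic [PTR_WIDTH-1:0] bin
    );
        return (bin >> 1) ^ bin;
    endfunction

    // Inverse conversion: binary bit i is the XOR of all Gray bits at or above i.
    function automatic logic [PTR_WIDTH-1:0] gray2bin(
        input logic [PTR_WIDTH-1:0] gray
    );
        logic [PTR_WIDTH-1:0] bin;
        for (int i = 0; i < PTR_WIDTH; i++) begin
            bin[i] = ^(gray >> i);
        end
        return bin;
    endfunction

    // Returns 1 when every increment, including the wrap from the maximum
    // value back to 0, flips exactly one Gray bit.
    function automatic bit gray_is_single_bit_step();
        logic [PTR_WIDTH-1:0] cur;
        logic [PTR_WIDTH-1:0] nxt;
        for (int v = 0; v < (1 << PTR_WIDTH); v++) begin
            cur = bin2gray(PTR_WIDTH'(v));
            nxt = bin2gray(PTR_WIDTH'(v + 1));
            if ($countones(cur ^ nxt) != 1) return 1'b0;
        end
        return 1'b1;
    endfunction
endpackage
```

        For the 0111 -> 1000 transition in section 8.1, the Gray values are 0100 and 1100, which differ in one bit only.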

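9 Asynchronous FIFO RTL

        Below is a classic asynchronous FIFO in SystemVerilog. To keep the pointer and Gray code comparisons simple, DEPTH should be a power of 2, and at least 4 because the full comparison slices the pointer at PTR_WIDTH-3.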

module async_fifo #(
    parameter int DATA_WIDTH = 32,
    parameter int DEPTH      = 16,
    localparam int ADDR_WIDTH = $clog2(DEPTH),
    localparam int PTR_WIDTH  = ADDR_WIDTH + 1
) (
    input  logic                  wr_clk,
    input  logic                  wr_rst_n,
    input  logic                  wr_en,
    input  logic [DATA_WIDTH-1:0] wr_data,
    output logic                  full,

    input  logic                  rd_clk,
    input  logic                  rd_rst_n,
    input  logic                  rd_en,
    output logic [DATA_WIDTH-1:0] rd_data,
    output logic                  empty
);

    logic [DATA_WIDTH-1:0] mem [DEPTH];

    logic [PTR_WIDTH-1:0] wr_ptr_bin;
    logic [PTR_WIDTH-1:0] wr_ptr_bin_next;
    logic [PTR_WIDTH-1:0] wr_ptr_gray;
    logic [PTR_WIDTH-1:0] wr_ptr_gray_next;

    logic [PTR_WIDTH-1:0] rd_ptr_bin;
    logic [PTR_WIDTH-1:0] rd_ptr_bin_next;
    logic [PTR_WIDTH-1:0] rd_ptr_gray;
    logic [PTR_WIDTH-1:0] rd_ptr_gray_next;

    logic [PTR_WIDTH-1:0] wr_ptr_gray_rdclk_q1;
    logic [PTR_WIDTH-1:0] wr_ptr_gray_rdclk_q2;

    logic [PTR_WIDTH-1:0] rd_ptr_gray_wrclk_q1;
    logic [PTR_WIDTH-1:0] rd_ptr_gray_wrclk_q2;

    logic do_write;
    logic do_read;

    function automatic logic [PTR_WIDTH-1:0] bin2gray(
        input logic [PTR_WIDTH-1:0] bin
    );
        return (bin >> 1) ^ bin;
    endfunction

    assign do_write = wr_en && !full;
    assign do_read  = rd_en && !empty;

    assign wr_ptr_bin_next  = wr_ptr_bin + {{(PTR_WIDTH-1){1'b0}}, do_write};
    assign wr_ptr_gray_next = bin2gray(wr_ptr_bin_next);

    assign rd_ptr_bin_next  = rd_ptr_bin + {{(PTR_WIDTH-1){1'b0}}, do_read};
    assign rd_ptr_gray_next = bin2gray(rd_ptr_bin_next);

    assign rd_data = mem[rd_ptr_bin[ADDR_WIDTH-1:0]];

    always_ff @(posedge wr_clk or negedge wr_rst_n) begin
        if (!wr_rst_n) begin
            wr_ptr_bin  <= '0;
            wr_ptr_gray <= '0;
        end else begin
            wr_ptr_bin  <= wr_ptr_bin_next;
            wr_ptr_gray <= wr_ptr_gray_next;
        end
    end

    always_ff @(posedge wr_clk) begin
        if (do_write) begin
            mem[wr_ptr_bin[ADDR_WIDTH-1:0]] <= wr_data;
        end
    end

    always_ff @(posedge rd_clk or negedge rd_rst_n) begin
        if (!rd_rst_n) begin
            rd_ptr_bin  <= '0;
            rd_ptr_gray <= '0;
        end else begin
            rd_ptr_bin  <= rd_ptr_bin_next;
            rd_ptr_gray <= rd_ptr_gray_next;
        end
    end

    always_ff @(posedge rd_clk or negedge rd_rst_n) begin
        if (!rd_rst_n) begin
            wr_ptr_gray_rdclk_q1 <= '0;
            wr_ptr_gray_rdclk_q2 <= '0;
        end else begin
            wr_ptr_gray_rdclk_q1 <= wr_ptr_gray;
            wr_ptr_gray_rdclk_q2 <= wr_ptr_gray_rdclk_q1;
        end
    end

    always_ff @(posedge wr_clk or negedge wr_rst_n) begin
        if (!wr_rst_n) begin
            rd_ptr_gray_wrclk_q1 <= '0;
            rd_ptr_gray_wrclk_q2 <= '0;
        end else begin
            rd_ptr_gray_wrclk_q1 <= rd_ptr_gray;
            rd_ptr_gray_wrclk_q2 <= rd_ptr_gray_wrclk_q1;
        end
    end

    always_ff @(posedge wr_clk or negedge wr_rst_n) begin
        if (!wr_rst_n) begin
            full <= 1'b0;
        end else begin
            full <= (wr_ptr_gray_next == {
                ~rd_ptr_gray_wrclk_q2[PTR_WIDTH-1:PTR_WIDTH-2],
                 rd_ptr_gray_wrclk_q2[PTR_WIDTH-3:0]
            });
        end
    end

    always_ff @(posedge rd_clk or negedge rd_rst_n) begin
        if (!rd_rst_n) begin
            empty <= 1'b1;
        end else begin
            empty <= (rd_ptr_gray_next == wr_ptr_gray_rdclk_q2);
        end
    end

endmodule

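9.1 Asynchronous FIFO Design Notes

        A few points in the RTL above matter most: binary pointers are used only inside their own clock domain, for addressing and for computing the next pointer; only the registered Gray pointers cross domains, each through a two-flop synchronizer; full and empty are registered and are computed from the next pointer value; and the memory write has no reset.

9.2 full Logic

        A common full comparison in an asynchronous FIFO is:

full <= (wr_ptr_gray_next == {
    ~rd_ptr_gray_sync[PTR_WIDTH-1:PTR_WIDTH-2],
     rd_ptr_gray_sync[PTR_WIDTH-3:0]
});

        This means the write pointer has caught up with the read pointer by one full lap: the address bits are equal while the wrap bit differs. In Gray code that condition shows up as the top two bits inverted and the remaining bits equal.

9.3 empty Logic

empty <= (rd_ptr_gray_next == wr_ptr_gray_sync);

        This means that after the next read the read pointer equals the write pointer already synchronized into the read clock domain, so the FIFO is empty.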
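
        The Gray-coded full comparison can be checked against the binary wrap-bit definition by brute force over every pointer pair. This is a sketch assuming ADDR_WIDTH = 3; the package name gray_flag_pkg and the functions gray_full and gray_full_matches_binary are illustrative names, not part of the RTL above.

```systemverilog
package gray_flag_pkg;
    localparam int ADDR_WIDTH = 3;               // assumed: DEPTH = 8
    localparam int PTR_WIDTH  = ADDR_WIDTH + 1;

    function automatic logic [PTR_WIDTH-1:0] bin2gray(
        input logic [PTR_WIDTH-1:0] bin
    );
        return (bin >> 1) ^ bin;
    endfunction

    // Gray-domain full test: top two bits inverted, remaining bits equal.
    function automatic logic gray_full(
        input logic [PTR_WIDTH-1:0] wr_gray,
        input logic [PTR_WIDTH-1:0] rd_gray
    );
        return (wr_gray == {~rd_gray[PTR_WIDTH-1:PTR_WIDTH-2],
                             rd_gray[PTR_WIDTH-3:0]});
    endfunction

    // Compare against the binary definition (same address, different wrap bit)
    // for every pair of pointer values. Returns 1 when they always agree.
    function automatic bit gray_full_matches_binary();
        logic [PTR_WIDTH-1:0] wb;
        logic [PTR_WIDTH-1:0] rb;
        logic                 bin_full;
        for (int w = 0; w < (1 << PTR_WIDTH); w++) begin
            for (int r = 0; r < (1 << PTR_WIDTH); r++) begin
                wb = PTR_WIDTH'(w);
                rb = PTR_WIDTH'(r);
                bin_full = (wb[ADDR_WIDTH]     != rb[ADDR_WIDTH]) &&
                           (wb[ADDR_WIDTH-1:0] == rb[ADDR_WIDTH-1:0]);
                if (gray_full(bin2gray(wb), bin2gray(rb)) != bin_full) return 1'b0;
            end
        end
        return 1'b1;
    endfunction
endpackage
```

        The reason it works: flipping the binary wrap bit changes the Gray value by exactly 1100...0, because the Gray conversion XORs the pointer with itself shifted right by one.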


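10 CDC Analysis Checklist

        When analyzing an async FIFO, work through the following checklist:

[ ] Is data transferred through a dual-port memory?
[ ] Is the data bus never synchronized directly?
[ ] Are binary pointers used only in their local clock domain?
[ ] Are pointers converted to Gray code before crossing domains?
[ ] Does each Gray pointer pass through a two-flop synchronizer?
[ ] Is full generated only in the wr_clk domain?
[ ] Is empty generated only in the rd_clk domain?
[ ] Does reset assertion clear the pointers in both clock domains?
[ ] Does reset release meet the synchronization requirements of each local clock domain?
[ ] Is DEPTH a power of 2?
[ ] Does the CDC tool recognize the synchronizers?
[ ] Does the Gray bus have a max delay or bus skew constraint?

        Common mistakes:

Mistake                                      Consequence
Binary pointer crosses domains directly      full/empty may be wrong
Data bus passed through two flops directly   Bits of a multi-bit word become inconsistent
full generated in the read clock domain      Write-side backpressure is wrong
empty generated in the write clock domain    Read-side control is wrong
Reset released asynchronously, unhandled     The two pointers start from inconsistent states
Gray bus left unconstrained                  After place and route, skew can make several bits appear to change in one sample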
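
        For the reset release item, a common pattern is a per-domain reset synchronizer that asserts asynchronously and releases synchronously to the local clock. The sketch below is one possible form; the module name reset_sync and its port names are assumptions, and one instance would be placed in each of the wr_clk and rd_clk domains.

```systemverilog
module reset_sync (
    input  logic clk,
    input  logic arst_n,     // asynchronous, active-low reset request
    output logic rst_n_sync  // asserts immediately, releases synchronously to clk
);

    logic meta_q;

    always_ff @(posedge clk or negedge arst_n) begin
        if (!arst_n) begin
            meta_q     <= 1'b0;
            rst_n_sync <= 1'b0;
        end else begin
            // Shift a 1 through two flops so release is aligned to clk.
            meta_q     <= 1'b1;
            rst_n_sync <= meta_q;
        end
    end

endmodule
```

        The output stays low until the second rising clock edge after arst_n is released, so the pointer registers in that domain never see reset release close to an active clock edge.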

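11 Architecture Analysis Focus

11.1 FIFO Type

        First confirm the FIFO type:

sync or async?
peek or non-peek?
register array or SRAM?
ready/valid interface or wr_en/rd_en interface?

11.2 full/empty

        For a synchronous FIFO:

Is count updated correctly?
Is a same-cycle read and write clearly defined?
Is read + write allowed when full?
Is write + read allowed when empty?

        For an asynchronous FIFO:

Is the Gray pointer correct?
Is the synchronization direction correct?
Are full/empty generated in their local domains?
Is the next pointer being compared?

11.3 depth sizing

        When sizing, state the following parameters explicitly: the maximum burst length, the write and read rates, the longest downstream stall, the arbitration latency, the backpressure response latency, and, for an async FIFO, the pointer synchronizer latency.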


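12 Verification Recommendations

        FIFO verification should cover boundary conditions, not just ordinary push/pop.

12.1 Sync FIFO Checks

[ ] After reset, empty=1 and full=0
[ ] A read when empty does not change state
[ ] A write when full does not overwrite unread data
[ ] Write continuously until full
[ ] Read continuously until empty
[ ] count holds on a same-cycle read/write
[ ] pointer wrap-around
[ ] Data ordering stays first in, first out

12.2 Peek FIFO Checks

[ ] peek_data points to the head entry
[ ] peek does not change rd_ptr
[ ] pop changes rd_ptr
[ ] Data is stable across consecutive peeks
[ ] Order is correct when a pop follows a peek
[ ] peek_valid=0 when empty

12.3 Async FIFO Checks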

[ ] wr_clk faster than rd_clk
[ ] rd_clk faster than wr_clk
[ ] random wr_en / rd_en
[ ] random reset
[ ] full boundary
[ ] empty boundary
[ ] long-run data ordering
[ ] no overflow
[ ] no underflow

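13 Recommended Exercises

13.1 Exercise 1: 4-entry Synchronous FIFO

Requirements: implement a 4-entry, 8-bit synchronous FIFO with count-based full and empty flags that accepts a read and a write in the same cycle.

        Reference RTL: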

module ex1_sync_fifo_4x8 (
    input  logic       clk,
    input  logic       rst_n,

    input  logic       wr_en,
    input  logic [7:0] wr_data,
    output logic       full,

    input  logic       rd_en,
    output logic [7:0] rd_data,
    output logic       empty
);

    localparam int DEPTH      = 4;
    localparam int ADDR_WIDTH = 2;
    localparam int CNT_WIDTH  = 3;

    logic [7:0] mem [DEPTH];

    logic [ADDR_WIDTH-1:0] wr_ptr;
    logic [ADDR_WIDTH-1:0] rd_ptr;
    logic [CNT_WIDTH-1:0]  count;

    logic do_write;
    logic do_read;

    assign empty = (count == 0);
    assign full  = (count == DEPTH[CNT_WIDTH-1:0]);

    assign do_write = wr_en && (!full || rd_en);
    assign do_read  = rd_en && !empty;

    assign rd_data = mem[rd_ptr];

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_ptr <= '0;
        end else if (do_write) begin
            mem[wr_ptr] <= wr_data;
            wr_ptr      <= wr_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd_ptr <= '0;
        end else if (do_read) begin
            rd_ptr <= rd_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count <= '0;
        end else begin
            unique case ({do_write, do_read})
                2'b10: count <= count + 1'b1;
                2'b01: count <= count - 1'b1;
                default: count <= count;
            endcase
        end
    end

endmodule

        The matching testbench below covers empty read, full write, normal push/pop, and simultaneous read/write:

module tb_ex1_sync_fifo_4x8;

    logic       clk;
    logic       rst_n;
    logic       wr_en;
    logic [7:0] wr_data;
    logic       full;
    logic       rd_en;
    logic [7:0] rd_data;
    logic       empty;

    ex1_sync_fifo_4x8 dut (.*);

    initial clk = 1'b0;
    always #5 clk = ~clk;

    task automatic push(input logic [7:0] data);
        @(negedge clk);
        wr_en   = 1'b1;
        wr_data = data;
        @(negedge clk);
        wr_en   = 1'b0;
    endtask

    task automatic pop(output logic [7:0] data);
        @(negedge clk);
        rd_en = 1'b1;
        // rd_data is a combinational view of the head entry, so capture it
        // before the clock edge that advances rd_ptr.
        data  = rd_data;
        @(negedge clk);
        rd_en = 1'b0;
    endtask

    logic [7:0] data;

    initial begin
        rst_n   = 1'b0;
        wr_en   = 1'b0;
        wr_data = '0;
        rd_en   = 1'b0;

        repeat (3) @(negedge clk);
        rst_n = 1'b1;

        assert(empty == 1'b1);
        assert(full  == 1'b0);

        // empty read
        @(negedge clk);
        rd_en = 1'b1;
        @(negedge clk);
        rd_en = 1'b0;
        assert(empty == 1'b1);

        // normal push/pop
        push(8'h11);
        push(8'h22);
        pop(data);
        assert(data == 8'h11);
        pop(data);
        assert(data == 8'h22);

        // full write
        push(8'hA0);
        push(8'hA1);
        push(8'hA2);
        push(8'hA3);
        assert(full == 1'b1);

        @(negedge clk);
        wr_en   = 1'b1;
        wr_data = 8'hFF;
        @(negedge clk);
        wr_en   = 1'b0;
        assert(full == 1'b1);

        // simultaneous read/write when full
        @(negedge clk);
        wr_en   = 1'b1;
        wr_data = 8'hB0;
        rd_en   = 1'b1;
        @(negedge clk);
        wr_en   = 1'b0;
        rd_en   = 1'b0;

        pop(data);
        assert(data == 8'hA1);

        $display("EX1 PASS");
        $finish;
    end

endmodule

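13.2 Exercise 2: Pointer-based Synchronous FIFO

Requirements: implement a synchronous FIFO that derives full and empty from the pointers alone, with no occupancy counter, and verify pointer wrap-around.

        Reference RTL follows. It uses pointers of ADDR_WIDTH + 1 bits: the low bits are the memory address and the top bit is the wrap bit.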

module ex2_sync_fifo_ptr #(
    parameter int DATA_WIDTH = 8,
    parameter int DEPTH      = 8,
    localparam int ADDR_WIDTH = $clog2(DEPTH),
    localparam int PTR_WIDTH  = ADDR_WIDTH + 1
) (
    input  logic                  clk,
    input  logic                  rst_n,

    input  logic                  wr_en,
    input  logic [DATA_WIDTH-1:0] wr_data,
    output logic                  full,

    input  logic                  rd_en,
    output logic [DATA_WIDTH-1:0] rd_data,
    output logic                  empty
);

    logic [DATA_WIDTH-1:0] mem [DEPTH];

    logic [PTR_WIDTH-1:0] wr_ptr;
    logic [PTR_WIDTH-1:0] rd_ptr;

    logic [PTR_WIDTH-1:0] wr_ptr_next;
    logic [PTR_WIDTH-1:0] rd_ptr_next;

    logic do_write;
    logic do_read;

    assign empty = (wr_ptr == rd_ptr);

    assign full = (wr_ptr[PTR_WIDTH-1]     != rd_ptr[PTR_WIDTH-1]) &&
                  (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);

    assign do_write = wr_en && (!full || rd_en);
    assign do_read  = rd_en && !empty;

    assign wr_ptr_next = wr_ptr + {{(PTR_WIDTH-1){1'b0}}, do_write};
    assign rd_ptr_next = rd_ptr + {{(PTR_WIDTH-1){1'b0}}, do_read};

    assign rd_data = mem[rd_ptr[ADDR_WIDTH-1:0]];

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_ptr <= '0;
        end else begin
            wr_ptr <= wr_ptr_next;
            if (do_write) begin
                mem[wr_ptr[ADDR_WIDTH-1:0]] <= wr_data;
            end
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd_ptr <= '0;
        end else begin
            rd_ptr <= rd_ptr_next;
        end
    end

endmodule

        A simple wrap-around test:

module tb_ex2_sync_fifo_ptr;

    logic       clk;
    logic       rst_n;
    logic       wr_en;
    logic [7:0] wr_data;
    logic       full;
    logic       rd_en;
    logic [7:0] rd_data;
    logic       empty;

    ex2_sync_fifo_ptr #(
        .DATA_WIDTH(8),
        .DEPTH(8)
    ) dut (.*);

    initial clk = 1'b0;
    always #5 clk = ~clk;

    task automatic cycle;
        @(negedge clk);
    endtask

    initial begin
        rst_n   = 1'b0;
        wr_en   = 1'b0;
        wr_data = '0;
        rd_en   = 1'b0;

        repeat (3) cycle();
        rst_n = 1'b1;

        // fill FIFO
        for (int i = 0; i < 8; i++) begin
            cycle();
            wr_en   = 1'b1;
            wr_data = 8'(i);
        end
        cycle();
        wr_en = 1'b0;
        assert(full == 1'b1);

        // drain 4 entries
        for (int i = 0; i < 4; i++) begin
            cycle();
            rd_en = 1'b1;
        end
        cycle();
        rd_en = 1'b0;
        assert(full == 1'b0);

        // write 4 entries to force write pointer wrap
        for (int i = 0; i < 4; i++) begin
            cycle();
            wr_en   = 1'b1;
            wr_data = 8'(8 + i);
        end
        cycle();
        wr_en = 1'b0;
        assert(full == 1'b1);

        // drain all entries
        for (int i = 0; i < 8; i++) begin
            cycle();
            rd_en = 1'b1;
        end
        cycle();
        rd_en = 1'b0;
        assert(empty == 1'b1);

        $display("EX2 PASS");
        $finish;
    end

endmodule

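13.3 Exercise 3: Peek FIFO

Requirements: implement a FIFO whose head entry can be inspected without being consumed, with separate peek and pop controls.

        Reference RTL follows. Here peek_en only indicates that the outside is looking at the head data and takes no part in the read pointer update; what actually changes the FIFO state is pop_en.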

module ex3_peek_fifo #(
    parameter int DATA_WIDTH = 8,
    parameter int DEPTH      = 4,
    localparam int ADDR_WIDTH = $clog2(DEPTH),
    localparam int CNT_WIDTH  = $clog2(DEPTH + 1)
) (
    input  logic                  clk,
    input  logic                  rst_n,

    input  logic                  push_en,
    input  logic [DATA_WIDTH-1:0] push_data,
    output logic                  full,

    input  logic                  peek_en,
    output logic [DATA_WIDTH-1:0] peek_data,
    output logic                  peek_valid,

    input  logic                  pop_en,
    output logic                  empty
);

    logic [DATA_WIDTH-1:0] mem [DEPTH];

    logic [ADDR_WIDTH-1:0] wr_ptr;
    logic [ADDR_WIDTH-1:0] rd_ptr;
    logic [CNT_WIDTH-1:0]  count;

    logic do_push;
    logic do_pop;

    assign empty      = (count == '0);
    assign full       = (count == DEPTH[CNT_WIDTH-1:0]);
    assign peek_valid = peek_en && !empty;
    assign peek_data  = mem[rd_ptr];

    assign do_push = push_en && (!full || pop_en);
    assign do_pop  = pop_en && !empty;

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_ptr <= '0;
        end else if (do_push) begin
            mem[wr_ptr] <= push_data;
            wr_ptr      <= wr_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd_ptr <= '0;
        end else if (do_pop) begin
            rd_ptr <= rd_ptr + 1'b1;
        end
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count <= '0;
        end else begin
            unique case ({do_push, do_pop})
                2'b10: count <= count + 1'b1;
                2'b01: count <= count - 1'b1;
                default: count <= count;
            endcase
        end
    end

endmodule

        The matching testbench:

module tb_ex3_peek_fifo;

    logic       clk;
    logic       rst_n;
    logic       push_en;
    logic [7:0] push_data;
    logic       full;
    logic       peek_en;
    logic [7:0] peek_data;
    logic       peek_valid;
    logic       pop_en;
    logic       empty;

    ex3_peek_fifo #(
        .DATA_WIDTH(8),
        .DEPTH(4)
    ) dut (.*);

    initial clk = 1'b0;
    always #5 clk = ~clk;

    task automatic push(input logic [7:0] data);
        @(negedge clk);
        push_en   = 1'b1;
        push_data = data;
        @(negedge clk);
        push_en   = 1'b0;
    endtask

    initial begin
        rst_n     = 1'b0;
        push_en   = 1'b0;
        push_data = '0;
        peek_en   = 1'b0;
        pop_en    = 1'b0;

        repeat (3) @(negedge clk);
        rst_n = 1'b1;

        // empty peek
        @(negedge clk);
        peek_en = 1'b1;
        @(posedge clk);
        #1;
        assert(peek_valid == 1'b0);
        @(negedge clk);
        peek_en = 1'b0;

        push(8'h55);
        push(8'h66);

        // continuous peek should not pop
        repeat (3) begin
            @(negedge clk);
            peek_en = 1'b1;
            @(posedge clk);
            #1;
            assert(peek_valid == 1'b1);
            assert(peek_data  == 8'h55);
        end
        @(negedge clk);
        peek_en = 1'b0;

        // pop after peek
        @(negedge clk);
        pop_en = 1'b1;
        @(negedge clk);
        pop_en = 1'b0;

        @(negedge clk);
        peek_en = 1'b1;
        @(posedge clk);
        #1;
        assert(peek_data == 8'h66);
        @(negedge clk);
        peek_en = 1'b0;

        $display("EX3 PASS");
        $finish;
    end

endmodule

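13.4 Exercise 4: Async FIFO

Requirements: implement an asynchronous FIFO with Gray-coded pointers, two-flop pointer synchronizers, and full/empty generated in their own clock domains.

        Reference RTL follows. It assumes DEPTH is a power of 2.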

module ex4_async_fifo #(
    parameter int DATA_WIDTH = 8,
    parameter int DEPTH      = 8,
    localparam int ADDR_WIDTH = $clog2(DEPTH),
    localparam int PTR_WIDTH  = ADDR_WIDTH + 1
) (
    input  logic                  wr_clk,
    input  logic                  wr_rst_n,
    input  logic                  wr_en,
    input  logic [DATA_WIDTH-1:0] wr_data,
    output logic                  full,

    input  logic                  rd_clk,
    input  logic                  rd_rst_n,
    input  logic                  rd_en,
    output logic [DATA_WIDTH-1:0] rd_data,
    output logic                  empty
);

    logic [DATA_WIDTH-1:0] mem [DEPTH];

    logic [PTR_WIDTH-1:0] wr_bin;
    logic [PTR_WIDTH-1:0] wr_bin_next;
    logic [PTR_WIDTH-1:0] wr_gray;
    logic [PTR_WIDTH-1:0] wr_gray_next;

    logic [PTR_WIDTH-1:0] rd_bin;
    logic [PTR_WIDTH-1:0] rd_bin_next;
    logic [PTR_WIDTH-1:0] rd_gray;
    logic [PTR_WIDTH-1:0] rd_gray_next;

    logic [PTR_WIDTH-1:0] wr_gray_rd_q1;
    logic [PTR_WIDTH-1:0] wr_gray_rd_q2;
    logic [PTR_WIDTH-1:0] rd_gray_wr_q1;
    logic [PTR_WIDTH-1:0] rd_gray_wr_q2;

    logic do_write;
    logic do_read;

    function automatic logic [PTR_WIDTH-1:0] bin2gray(
        input logic [PTR_WIDTH-1:0] bin
    );
        return (bin >> 1) ^ bin;
    endfunction

    assign do_write = wr_en && !full;
    assign do_read  = rd_en && !empty;

    assign wr_bin_next  = wr_bin + {{(PTR_WIDTH-1){1'b0}}, do_write};
    assign wr_gray_next = bin2gray(wr_bin_next);

    assign rd_bin_next  = rd_bin + {{(PTR_WIDTH-1){1'b0}}, do_read};
    assign rd_gray_next = bin2gray(rd_bin_next);

    assign rd_data = mem[rd_bin[ADDR_WIDTH-1:0]];

    always_ff @(posedge wr_clk or negedge wr_rst_n) begin
        if (!wr_rst_n) begin
            wr_bin  <= '0;
            wr_gray <= '0;
        end else begin
            wr_bin  <= wr_bin_next;
            wr_gray <= wr_gray_next;
        end
    end

    always_ff @(posedge wr_clk) begin
        if (do_write) begin
            mem[wr_bin[ADDR_WIDTH-1:0]] <= wr_data;
        end
    end

    always_ff @(posedge rd_clk or negedge rd_rst_n) begin
        if (!rd_rst_n) begin
            rd_bin  <= '0;
            rd_gray <= '0;
        end else begin
            rd_bin  <= rd_bin_next;
            rd_gray <= rd_gray_next;
        end
    end

    always_ff @(posedge rd_clk or negedge rd_rst_n) begin
        if (!rd_rst_n) begin
            wr_gray_rd_q1 <= '0;
            wr_gray_rd_q2 <= '0;
        end else begin
            wr_gray_rd_q1 <= wr_gray;
            wr_gray_rd_q2 <= wr_gray_rd_q1;
        end
    end

    always_ff @(posedge wr_clk or negedge wr_rst_n) begin
        if (!wr_rst_n) begin
            rd_gray_wr_q1 <= '0;
            rd_gray_wr_q2 <= '0;
        end else begin
            rd_gray_wr_q1 <= rd_gray;
            rd_gray_wr_q2 <= rd_gray_wr_q1;
        end
    end

    always_ff @(posedge wr_clk or negedge wr_rst_n) begin
        if (!wr_rst_n) begin
            full <= 1'b0;
        end else begin
            full <= (wr_gray_next == {
                ~rd_gray_wr_q2[PTR_WIDTH-1:PTR_WIDTH-2],
                 rd_gray_wr_q2[PTR_WIDTH-3:0]
            });
        end
    end

    always_ff @(posedge rd_clk or negedge rd_rst_n) begin
        if (!rd_rst_n) begin
            empty <= 1'b1;
        end else begin
            empty <= (rd_gray_next == wr_gray_rd_q2);
        end
    end

endmodule

        The matching testbench below uses different periods for the write and read clocks and checks data ordering with a scoreboard:

module tb_ex4_async_fifo;

    logic       wr_clk;
    logic       rd_clk;
    logic       wr_rst_n;
    logic       rd_rst_n;

    logic       wr_en;
    logic [7:0] wr_data;
    logic       full;

    logic       rd_en;
    logic [7:0] rd_data;
    logic       empty;

    logic [7:0] exp_q [$];

    ex4_async_fifo #(
        .DATA_WIDTH(8),
        .DEPTH(8)
    ) dut (.*);

    initial wr_clk = 1'b0;
    always #3 wr_clk = ~wr_clk;

    initial rd_clk = 1'b0;
    always #7 rd_clk = ~rd_clk;

    initial begin
        wr_rst_n = 1'b0;
        rd_rst_n = 1'b0;
        wr_en    = 1'b0;
        wr_data  = '0;
        rd_en    = 1'b0;

        repeat (5) @(posedge wr_clk);
        wr_rst_n = 1'b1;

        repeat (5) @(posedge rd_clk);
        rd_rst_n = 1'b1;
    end

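    bit         wr_done;
    logic [7:0] exp;

    // Write stimulus: request a write whenever full was low at the last edge.
    initial begin
        wait(wr_rst_n);
        for (int i = 0; i < 64; i++) begin
            @(posedge wr_clk);
            wr_en   <= !full;
            wr_data <= 8'(i);
        end
        @(posedge wr_clk);
        wr_en <= 1'b0;
        @(posedge wr_clk);
        wr_done = 1'b1;
    end

    // Scoreboard, write side: record only the writes the DUT actually accepts.
    // full can rise one cycle after the stimulus sampled it, so a request
    // alone does not prove that the data was stored.
    always @(posedge wr_clk) begin
        if (wr_rst_n && wr_en && !full) begin
            exp_q.push_back(wr_data);
        end
    end

    // Read stimulus: request a read whenever empty was low at the last edge.
    initial begin
        wait(rd_rst_n);
        repeat (10) @(posedge rd_clk);
        forever begin
            @(posedge rd_clk);
            rd_en <= !empty;
        end
    end

    // Scoreboard, read side: rd_data is the combinational head entry, so it is
    // checked at the edge that pops it, before rd_bin advances.
    always @(posedge rd_clk) begin
        if (rd_rst_n && rd_en && !empty) begin
            if (exp_q.size() == 0) begin
                $fatal(1, "Async FIFO underflow: read with empty scoreboard");
            end
            exp = exp_q.pop_front();
            assert(rd_data == exp)
                else $fatal(1, "Async FIFO mismatch: rd_data=%0h exp=%0h", rd_data, exp);
        end
    end

    // Finish once the writer is done and every accepted item has been read.
    initial begin
        wait(wr_done);
        do begin
            @(posedge rd_clk);
        end while (exp_q.size() != 0);
        repeat (20) @(posedge rd_clk);
        $display("EX4 PASS");
        $finish;
    end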

endmodule

14 One-Week Study Plan

Day    Content
Day 1  Buffer basics, sync FIFO, count-based full/empty
Day 2  pointer-based FIFO, wrap bit, off-by-one
Day 3  peek FIFO, head data, separating pop from peek
Day 4  depth sizing, burst/stall/rate mismatch
Day 5  async FIFO, Gray code, pointer synchronizer
Day 6  CDC analysis, reset, CDC waiver/constraint
Day 7  Capstone project: producer -> async FIFO -> random-stall consumer

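15 Summary

        The main thread of FIFO design can be summarized as:

Buffering
  -> Sync FIFO
  -> full/empty boundary
  -> depth sizing
  -> peek/non-peek behavior
  -> Async FIFO
  -> CDC safety

        The most important engineering skills are handling the full/empty boundary correctly, sizing depth from worst-case behavior instead of averages, crossing clock domains safely with Gray pointers and synchronizers, and writing and verifying synthesizable RTL.

        The FIFO is the foundation of buffer design and one of the most classic structures in CDC design. Mastering its architecture, RTL, and verification methods is an important basis for going on to bus bridges, NoCs, DMA, cache queues, and packet buffers.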
