This repository contains the current DSP block development for a larger FPGA sensor processing accelerator project. The current RTL implements a 5-point moving average FIR-style filter intended to process streaming 8-bit sensor/sample data.
The long-term system target is a Zynq-based PS-to-PL data path where software provides samples through memory/DMA, programmable logic processes the stream, and the processed output is returned for software-side use or visualization.
This repository is currently focused on the DSP kernel and its early verification environment, not the full AXI/DMA system yet.
This project is a work in progress.
Completed so far:
- 5-sample moving average RTL block
- Shift-register based streaming sample window
- Output valid generation after startup fill period
- SystemVerilog randomized stimulus testbench
- Functional coverage collection in Vivado XSim
- VCD waveform dumping for signal inspection
- MATLAB reference model for 5-point moving average behavior
- Makefile flow for compile, elaborate, simulation, GUI launch, lint, and clean
Still in progress:
- Self-checking scoreboard against a fixed MATLAB/SystemVerilog golden vector set
- Full valid-signal/output alignment verification
- AXI-Stream wrapper around the DSP kernel
- AXI DMA integration into the larger Zynq sensor processing accelerator
- PetaLinux or bare-metal C software control path
- Coverage closure beyond the current initial coverage report
This DSP block is intended to become part of a larger FPGA sensor processing accelerator.
Planned high-level data path:
Sensor/sample data
|
v
Processor system memory
|
v
AXI DMA MM2S
|
v
AXI-Stream DSP block
|
v
AXI DMA S2MM
|
v
Processed output in memory
The current repository implements and verifies the central DSP block only:
data_in[7:0] -> 5-sample moving average filter -> data_out[15:0]
The main RTL module is DSP_block.sv.
module DSP_block (
input logic clk,
input logic rst_n,
input logic [7:0] data_in,
output logic [15:0] data_out,
output logic data_out_valid
);
The design stores the most recent five samples in a shift-register window:
x0, x1, x2, x3, x4
On each clock cycle:
x0 <= newest sample
x1 <= previous x0
x2 <= previous x1
x3 <= previous x2
x4 <= previous x3
The output is computed as:
data_out = (x0 + x1 + x2 + x3 + x4) / 5
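The window update and averaging steps can be sketched in SystemVerilog as follows. This is a minimal illustration under stated assumptions, not the contents of DSP_block.sv: the module name mavg5_sketch, the asynchronous active-low reset, the 11-bit sum width, and the use of truncating integer division are all assumptions, and the valid logic is left out.

```systemverilog
// Hypothetical sketch only; the repository's DSP_block.sv may differ.
module mavg5_sketch (
    input  logic        clk,
    input  logic        rst_n,
    input  logic [7:0]  data_in,
    output logic [15:0] data_out
);
    logic [7:0]  x0, x1, x2, x3, x4;  // x0 is the newest sample
    logic [10:0] sum;                 // 5 * 255 = 1275 fits in 11 bits

    // Shift-register window: every sample moves one slot per clock.
    // Reset style (asynchronous, active low) is an assumption.
    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            {x0, x1, x2, x3, x4} <= '0;
        end else begin
            x0 <= data_in;
            x1 <= x0;
            x2 <= x1;
            x3 <= x2;
            x4 <= x3;
        end
    end

    // Widen each term before adding so the sum cannot overflow,
    // then divide by the window length (integer division truncates).
    always_comb begin
        sum      = 11'(x0) + 11'(x1) + 11'(x2) + 11'(x3) + 11'(x4);
        data_out = 16'(sum / 11'd5);
    end
endmodule
```

The largest possible sum is 1275, so the average never exceeds 255 and the upper byte of the 16-bit output stays zero in this sketch.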
The current RTL uses a counter to detect when enough samples have entered the window. Once the counter reaches 4, data_out_valid remains asserted.
assign data_out_valid = (counter == 3'd4);
assign data_out = sum;
The MATLAB model in Moving_average_FIR_golden.m implements a one-sided 5-point moving average over a randomized 8-bit data stream.
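The model generates 32 random samples. Note that randi(255,1,32) draws integers from 1 to 255, so a sample value of 0 never appears in this stream:
data_stream = randi(255,1,32);
It then computes valid 5-sample averages:
output(i) = int16(accumulator / 5);
MATLAB's int16() rounds to the nearest integer, whereas integer division in SystemVerilog truncates, so the model and an RTL that divides with the / operator can differ by one count on some windows.
Because a symmetric 5-point FIR window has a group delay of (N - 1) / 2 = 2 samples, the plot shifts the output by two samples to align the filtered result with the middle of the window.
plot(3:(length(data_stream)-2), output);
The MATLAB model documents the RTL intent: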
5-sample shift register
sum = x0 + x1 + x2 + x3 + x4
avg = sum / 5
valid_out after 5 samples
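The four lines above can be mirrored in a small SystemVerilog reference model, which is one possible starting point for the planned self-checking scoreboard. This is a sketch under assumptions: the class name mavg5_ref_model, the push() method, and the demo module are hypothetical and not part of the repository, and the division truncates, which differs from the rounding that MATLAB's int16() applies.

```systemverilog
// Hypothetical reference model: class and method names are illustrative only.
class mavg5_ref_model;
    localparam int N = 5;          // window length
    bit [7:0] window[$];           // newest sample at index 0

    // Push one sample; returns 1 once the window holds N samples.
    function bit push(input bit [7:0] sample, output bit [15:0] avg);
        int unsigned sum;
        sum = 0;
        window.push_front(sample);
        if (window.size() > N) void'(window.pop_back());
        foreach (window[i]) sum += window[i];
        avg = 16'(sum / N);        // truncating division (assumption)
        return (window.size() == N);
    endfunction
endclass

module ref_model_demo;
    mavg5_ref_model model;
    bit [15:0]      avg;
    bit             valid;

    initial begin
        model = new();
        // Feed 10, 20, ..., 60 and print the model's view of each step.
        for (int s = 10; s <= 60; s += 10) begin
            valid = model.push(8'(s), avg);
            $display("in=%0d valid=%0b avg=%0d", s, valid, avg);
        end
    end
endmodule
```

With this input, the fifth push is the first to report valid, with an average of 30 (150 / 5), and the sixth reports 40 (200 / 5). A scoreboard could compare these values against data_out whenever data_out_valid is high, once the DUT's latency relative to the model has been pinned down.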
The current SystemVerilog testbench is DSP_tb.sv.
It includes:
- Randomized transaction class
- 8-bit input sample constraint
- DUT instantiation
- Clock generation
- Reset sequencing
- 1000 randomized input samples
- Functional coverage collection
- VCD waveform dumping
class dsp_transaction;
rand bit [7:0] sample;
constraint sample_range_c {
sample inside {[0 : 255]};
}
endclass
The testbench randomizes a new sample on the negative clock edge so the value is stable before the next positive edge coverage sample.
for (int i = 0; i < 1000; i++) begin
@(negedge clk);
if (tr.randomize() == 1'b0) begin
$fatal(1, "Randomization could not be solved");
end
data_in = tr.sample;
end
The current testbench defines one covergroup:
covergroup dsp_cover_group @(posedge clk);
It tracks:
| Coverpoint | Purpose |
|---|---|
| cp_input | Covers low, mid, and high input sample ranges |
| cp_valid | Covers both valid and invalid output states |
| cp_output_valid | Covers output value ranges only when data_out_valid is asserted |
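The three coverpoints in the table could be declared along these lines. The covergroup name, coverpoint names, and sampling event come from this README; the wrapper module cov_sketch, the bin names, and every bin boundary are placeholders chosen for illustration, since the actual ranges in DSP_tb.sv are not listed here.

```systemverilog
// Hypothetical sketch: bin names and boundaries are assumptions.
module cov_sketch (
    input logic        clk,
    input logic [7:0]  data_in,
    input logic [15:0] data_out,
    input logic        data_out_valid
);
    covergroup dsp_cover_group @(posedge clk);
        // Input sample split into three ranges.
        cp_input: coverpoint data_in {
            bins low  = {[0:84]};
            bins mid  = {[85:169]};
            bins high = {[170:255]};
        }
        // Both states of the valid flag.
        cp_valid: coverpoint data_out_valid {
            bins not_valid = {0};
            bins valid     = {1};
        }
        // Output ranges, sampled only while the output is valid.
        cp_output_valid: coverpoint data_out iff (data_out_valid) {
            bins low  = {[0:15]};
            bins mid  = {[16:239]};
            bins high = {[240:255]};
        }
    endgroup

    dsp_cover_group cg = new();
endmodule
```

The iff guard is what restricts cp_output_valid to cycles where data_out_valid is asserted, so values seen during the startup fill period do not count toward any output bin.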
Vivado XSim coverage report:
| Metric | Result |
|---|---|
| Total group coverage | 77.7778% |
| Number of covergroups | 1 |
| Covergroup name | DSP_tb::dsp_cover_group |
| cp_input coverage | 100% |
| cp_valid coverage | 100% |
| cp_output_valid coverage | 33.3333% |
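The total is consistent with the three coverpoint results, assuming the covergroup figure is the unweighted mean of its coverpoints (the SystemVerilog default when every weight is 1 and the group has no crosses):

```latex
\[
\text{group coverage}
  = \frac{100\% + 100\% + 33.3333\%}{3}
  \approx 77.7778\%
\]
```

Under the same assumption, hitting one further cp_output_valid bin would raise that coverpoint to 66.6667% and the group total to about 88.8889%.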
The current output-valid coverpoint only hits the mid-range output bin. This is expected for the current randomized moving average stimulus because averaging random 8-bit samples tends to concentrate outputs near the middle of the value range. Additional directed tests are needed to intentionally hit low and high output bins.
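One way to reach the extreme output bins is to hold the input constant for longer than the window length, so that all five stored samples are identical. The sketch below assumes the DSP_block port list shown in this README; the testbench name directed_bins_tb, the drive_constant task, the 10-unit clock period, and the reset sequence are hypothetical.

```systemverilog
// Hypothetical directed test; compile together with DSP_block.sv.
module directed_bins_tb;
    logic        clk = 1'b0;
    logic        rst_n;
    logic [7:0]  data_in;
    logic [15:0] data_out;
    logic        data_out_valid;

    DSP_block dut (
        .clk            (clk),
        .rst_n          (rst_n),
        .data_in        (data_in),
        .data_out       (data_out),
        .data_out_valid (data_out_valid)
    );

    always #5 clk = ~clk;

    // Drive the same value on consecutive negative edges so it is
    // stable at each following positive edge.
    task automatic drive_constant(input logic [7:0] value, input int cycles);
        repeat (cycles) begin
            @(negedge clk);
            data_in = value;
        end
    endtask

    initial begin
        rst_n   = 1'b0;
        data_in = '0;
        repeat (2) @(negedge clk);
        rst_n = 1'b1;
        drive_constant(8'h00, 8);  // window fills with 0, average is 0
        drive_constant(8'hFF, 8);  // window fills with 255, average is 255
        @(negedge clk);
        $finish;
    end
endmodule
```

Eight cycles per value is an arbitrary choice; anything comfortably above five leaves several clocks during which the whole window holds the same value.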
The simulation generates a VCD file:
DSP_tb.vcd
The waveform includes:
- clk
- rst_n
- data_in
- data_out
- data_out_valid
- internal DUT shift registers x0 through x4
- internal counter
- internal sum
This is useful for debugging reset behavior, valid timing, shift-register movement, and output alignment.
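For reference, a VCD dump of this kind is typically enabled with the standard $dumpfile and $dumpvars system tasks. The standalone module below is only a sketch: the module name vcd_dump_sketch and its toggling clock are hypothetical, and how DSP_tb.sv actually sets up the dump is not shown in this README.

```systemverilog
// Hypothetical standalone example of VCD dumping.
module vcd_dump_sketch;
    logic clk = 1'b0;
    always #5 clk = ~clk;

    initial begin
        $dumpfile("DSP_tb.vcd");
        // Level 0 dumps the named scope and everything beneath it,
        // which is how internal DUT signals end up in the file.
        $dumpvars(0, vcd_dump_sketch);
        #100 $finish;
    end
endmodule
```

In a real testbench the scope argument would be the testbench module itself, so that the instantiated DUT and its internal registers are included.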
The included Makefile runs Vivado/XSim from WSL by converting the current WSL path into a Windows path and calling the Vivado 2025.2 environment script.
Current Vivado setup path:
VIVADO_SETTINGS := C:\AMDDesignTools\2025.2\Vivado\settings64.bat
The top-level simulation module is:
TOP := DSP_tb
The simulation snapshot is:
SNAPSHOT := sim_snapshot
List SystemVerilog files:
make files
Compile:
make compile
Elaborate:
make elab
Run simulation:
make sim
Open XSim GUI:
make gui
Run Verible lint:
make lint
Clean generated files:
make clean
Generate an HTML functional coverage report after simulation:
xcrg -dir xsim.covdb -report_dir xcrg_report -report_format html
| File | Purpose |
|---|---|
| DSP_block.sv | RTL implementation of the 5-point moving average DSP block |
| DSP_tb.sv | SystemVerilog testbench with randomized stimulus and functional coverage |
| Moving_average_FIR_golden.m | MATLAB reference model for one-sided 5-point moving average behavior |
| Makefile | Vivado/XSim automation from WSL into Windows Vivado tools |
| DSP_tb.vcd | Generated waveform dump from simulation |
| dashboard.html | Vivado XSim coverage dashboard |
| groups.html | Vivado XSim coverage group summary |
| grp0.html | Detailed coverpoint report for DSP_tb::dsp_cover_group |
This repository is intentionally labeled as WIP because the DSP block is still being developed as part of a larger FPGA sensor processing accelerator.
Current limitations:
- The testbench currently drives random stimulus and collects functional coverage, but it does not yet include a self-checking scoreboard.
- The MATLAB model and RTL are not yet connected through a shared fixed-vector regression flow.
- The valid-signal timing and first-valid output alignment still need to be locked against the golden model.
- The current RTL uses a simple always-valid streaming assumption and does not yet implement AXI-Stream tvalid/tready handshaking.
- AXI DMA, AXI-Lite control, and software integration are planned for the larger accelerator project but are not implemented in this DSP block repo yet.
Planned improvements:
- Add deterministic test vectors shared between MATLAB and SystemVerilog.
- Add a self-checking scoreboard that compares DUT output against expected moving average values.
- Add directed tests for low-output and high-output coverage bins.
- Add an AXI-Stream wrapper around the DSP block.
- Add tvalid, tready, and output-valid pipeline behavior.
- Integrate the DSP block into the larger Zynq AXI/DMA accelerator design.
- SystemVerilog RTL design
- Streaming DSP datapath design
- Shift-register based FIR filtering
- Randomized SystemVerilog stimulus
- Functional coverage with covergroups and coverpoints
- Vivado XSim simulation
- VCD waveform generation and debug
- MATLAB reference modeling
- Makefile-based simulation automation
- WSL-to-Windows Vivado tool flow
- Early FPGA accelerator integration planning
This WIP project implements the DSP kernel for a larger FPGA sensor processing accelerator. The current design is a 5-point moving average filter written in SystemVerilog, verified with randomized simulation stimulus, functional coverage, waveform generation, and a MATLAB reference model. The next phase is to turn the current randomized testbench into a self-checking golden-model regression and then wrap the DSP block for AXI-Stream/AXI DMA integration on Zynq.