After some optimization, I have achieved better QoR:
[Optimized for throughput]
1. Single-precision floating point
2. Fmax = 189MHz
3. Latency = 267tCLK
4. LUT6 = 11876
5. FF = 12702
6. DSP48 = 36
[Optimized for resource]
1. Single-precision floating point
2. Fmax = 201MHz
3. Latency = 554tCLK
4. LUT6 = 4886
5. FF = 3829
6. DSP48 = 12
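
To compare the two builds on a common scale, the cycle counts can be converted to wall-clock latency and the resource counts expressed as ratios. This is only a rough sketch: it assumes each design is actually clocked at its reported Fmax, and the dictionary keys and helper names (`latency_us`, `resource_ratio`) are made up for illustration. The numbers are copied from the lists above.

```python
# Figures copied from the two result lists above; nothing here is newly measured.
configs = {
    "throughput": {"fmax_mhz": 189, "latency_cycles": 267,
                   "lut6": 11876, "ff": 12702, "dsp48": 36},
    "resource":   {"fmax_mhz": 201, "latency_cycles": 554,
                   "lut6": 4886, "ff": 3829, "dsp48": 12},
}

def latency_us(cfg):
    """Latency in microseconds: cycles / MHz.

    Assumption: the design runs at its reported Fmax.
    """
    return cfg["latency_cycles"] / cfg["fmax_mhz"]

def resource_ratio(key):
    """How many times more of a resource the throughput build uses."""
    return configs["throughput"][key] / configs["resource"][key]

for name, cfg in configs.items():
    print(f"{name}: {latency_us(cfg):.2f} us")

for key in ("lut6", "ff", "dsp48"):
    print(f"{key}: {resource_ratio(key):.2f}x")
```

Under that assumption the throughput-optimized build finishes in about 1.41 us versus about 2.76 us for the resource-optimized one, at roughly 2.4x the LUTs, 3.3x the FFs, and 3x the DSP48s.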