Have you ever written code that behaves correctly under a simulator only to have intermittent failures in the field? Or maybe your code no longer functions properly when you compile with a newer version of your tool chain. You review your test bench and verify 100 percent complete test coverage and that all tests have passed with no errors--yet the problem stubbornly remains. While designers understandably place great emphasis on coding and simulation, they often have only a nodding acquaintance with the internal workings of the silicon within an FPGA. As a result, incorrect logic synthesis and timing problems, rather than logic errors, are the cause of most logic failures. But writing FPGA code that creates predictable, reliable logic is simple if designers take the right steps.
In FPGA design, logic synthesis and related timing closure occur during compilation. And many things, including I/O cell structure, asynchronous logic and timing constraints, can have a big impact on the compilation process, varying results with each pass through the tool chain. Let's take a closer look at ways to eliminate these variances to better and more quickly achieve timing closure. The I/O Cell Structure All FPGAs have I/O pins that can be highly customized. The customization affects timing, drive strength, termination and many other factors. When your I/O cell structure is not clearly defined, your tool chain will often use a default that may or may not be what you want. In the VHDL code below, the intent is to create a bidirectional I/O buffer named sda using the declaration "sda: inout std_logic;".
When the synthesis tool sees this block of code, there is no clear directive on how to implement the bidirectional buffer. As a result, the tool will take a best guess. One way to accomplish the task would be to use a bidirectional buffer on the I/O ring of the FPGA (indeed, this is the desired implementation). Another option would be a tristate output buffer and input buffer, both implemented in lookup table (LUT) logic. A final possibility would be to use a tristate output buffer on the I/O ring along with an input buffer in an LUT—and this is the option that most synthesizers will choose. All three methods yield valid logic, but the last two implementations result in additional routing delays when the signal moves between the I/O pin and the LUT. They also require additional timing constraints to ensure timing closure. FPGA Editor clearly shows in Figure 1 that our bidirectional I/O has portions scattered outside the I/O buffer.
Click on image to enlarge
The lesson? Don't let your synthesis tool guess how to implement critical sections of your code. Even if the synthesized logic happens to be what you want, it may change when the synthesis tool goes through a new revision. Clearly define your I/O logic and any critical logic. The following VHDL code shows how to implicitly define the I/O buffer using the Xilinx primitive IOBUF. Also note that all electrical properties of the buffer are likewise clearly defined.
In Figure 2, FPGA Editor clearly shows that our bidirectional I/O has been implemented entirely within the I/O buffer.
Click on image to enlarge
Trials of Asynchronous Logic Asynchronous code results in logic that is difficult to constrain, simulate and debug. Errors from asynchronous logic are often intermittent and nearly impossible to replicate. It's also not possible to generate a test bench to find errors due to asynchronous logic. While asynchronous logic may seem easy to spot, in fact it often goes undetected, so designers must be aware of the many ways that asynchronous logic lurks in our designs. All clocked logic requires a minimum setup-and-hold time, and this also applies to the reset input of flip-flops. The code below uses an asynchronous reset. Here, there is no possible way to apply timing constraints to meet the setup-and-hold time requirements of the flip-flop.
The next listing uses a synchronous reset. However, the reset signal for most systems may be a pushbutton switch or some other source that is not related to the system clock. Although reset is mostly static, and asserted or deasserted for long periods, there is still a change in level. It is the deassertion of reset, relative to the rising edge of the system clock, that can violate the setup-time requirements of a flip-flop, and there is no way to constrain this.
Once we realize that we can't directly feed an asynchronous signal into our synchronous logic, the problem becomes easy to fix. The code below creates a new reset called sys_reset that has been synchronized to our system clock sys_clk. When sampling asynchronous logic, metastability issues can arise. We can reduce the chance of its occurrence by using a laddered sample that is ANDed with the previous stages of the ladder.
So, let's assume you've taken care to make all your logic synchronous. Nevertheless, if you're not careful, your logic can easily become decoupled from the system clock. Don't let your tool chain use local routing resources for your system clock. Doing so will make your logic impossible to constrain. Remember to clearly define all your important logic. The VHDL code below uses the Xilinx primitive BUFG to force sys_clk onto a dedicated high-fan-out buffer that drives low-skew nets.
Some designs use a divided version of their single master clock to process deserialized data. The VHDL code below, process nibble_proc, shows an example of data being captured at one-quarter of the system clock rate.
It looks like everything is synchronous, but the nibble_proc uses a product term divide_by_4 to sample nibble_wide_data from clock domain sys_clk_bufg. Due to routing delays, there is no well-defined phase relationship between divde_by_4 and sys_clk_bufg. Moving divide_by_4 onto a BUFG will not help either, as the process incurs a routing delay. The solution is to keep nibble_proc on the sys_clk_bufg domain and use divide_by_4 as a qualifier, as shown below.
Applying the proper timing constraints is a necessity if you want your logic to perform properly. If you've taken care to ensure that 100 percent of your code is synchronous and all I/Os are registered, those steps will greatly simplify timing closure. Using the above code and assuming that the system clock is 100 MHz, the timing constraint file is easily done in four lines, as shown below.
Note that setup-and-hold times for I/O registered logic on Xilinx FPGAs are pretty much fixed and don't change much within a package. But we still apply them, mainly as a verification step to ensure that the design meets its system parameters. Three Easy Steps Designers will find that it's not hard to implement reliable code if they follow three simple steps.
Don't let your synthesis tool guess at what you want. Use Xilinx primitives to clearly define all I/O pins and critical logic. Be sure to define the electrical properties of your I/O pins. Make your logic 100 percent synchronous and reference all logic to your master clock domain.
Apply timing constraints to ensure timing closure. If you follow these three steps, you will have removed variances due to synthesis and timing. Abolishing those two significant obstacles will give you code that works with 100 percent reliability. This article was originally printed in Xcell Journal and reprinted here with the permission of Xilinx Inc. and Spirent Communications. |