arXiv posts FPGA sigmoid method
- Chintan Panchal, Ankur Changela, and Mohendra Roy posted an arXiv paper on April 26 describing an FPGA sigmoid circuit built with mixed-radix CORDIC.
- The design runs on a Xilinx Virtex-7 with 16-bit fixed point, uses 835 logic slices, zero digital signal processors, and reports 4.23×10^-4 mean error.
- The paper targets cheaper neural-network activation hardware on edge-class field-programmable gate arrays. (arxiv.org)
A sigmoid is the S-shaped function that turns a raw number into a value between 0 and 1, and hardware engineers keep trying to make it cheaper to compute on chips. (arxiv.org) The expensive part is the math: sigmoid normally depends on exponentials, which are awkward on field-programmable gate arrays, or FPGAs, that favor simple repeated operations. (arxiv.org)

One common workaround is CORDIC, short for coordinate rotation digital computer, a method that replaces hard multipliers with shift-and-add steps, like approximating a curve through many tiny turns. (arxiv.org) (inass.org)

Chintan Panchal, Ankur Changela, and Mohendra Roy said in an arXiv paper submitted April 26 that they built a sigmoid design around a mixed-radix hyperbolic rotation CORDIC. (arxiv.org) Their method first maps the sigmoid problem onto hyperbolic tangent, or tanh, then shrinks the input range to ±1 so the tanh stage only has to work over ±0.5. (arxiv.org 1) (arxiv.org 2)

The circuit starts with radix-2 steps for stable convergence, switches to radix-4 steps to move faster, and finishes with a radix-2 linear vectoring stage that divides hyperbolic sine by hyperbolic cosine, which yields tanh. (arxiv.org)

The authors said they implemented the fully pipelined design on a Xilinx Virtex-7 FPGA using 16-bit fixed-point arithmetic. (arxiv.org) They reported 835 logic slices, zero digital signal processor blocks, and a mean absolute error of 4.23×10^-4, which the paper said beat several recent sigmoid implementations. (arxiv.org)

That puts the paper in a familiar FPGA tradeoff: spend a small amount of logic and routing to avoid dedicated multiplier hardware, then use pipelining to keep throughput high. (arxiv.org 1) (arxiv.org 2)

The paper is not a product launch or a deployment report. It is an arXiv preprint that the authors said has been accepted for the 2026 International Conference on Applied Artificial Intelligence. (arxiv.org)
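
For readers who want to see the general idea in software, the sketch below follows the same broad outline the article describes: rewrite sigmoid through the standard identity sigmoid(v) = (1 + tanh(v/2)) / 2, compute hyperbolic sine and cosine with shift-and-add style rotations, then divide them with a linear vectoring loop. It is an illustration only, not the authors' circuit. It assumes plain radix-2 steps throughout (no radix-4 stage), uses floating point instead of 16-bit fixed point, is not pipelined, and all function names are made up for this example.

```python
import math


def hyperbolic_schedule(n):
    """Return n shift indices 1, 2, 3, 4, 4, 5, ... for hyperbolic CORDIC.

    Indices 4, 13, 40, ... are repeated, the standard fix that lets
    radix-2 hyperbolic rotations converge.
    """
    out, i, repeat = [], 1, 4
    while len(out) < n:
        out.append(i)
        if i == repeat and len(out) < n:
            out.append(i)          # repeated iteration
            repeat = 3 * repeat + 1
        i += 1
    return out


def tanh_cordic(theta, n=16):
    """Approximate tanh(theta) for small |theta| (about 1.1 or less)."""
    # Hyperbolic rotation: x tends to K*cosh(theta), y to K*sinh(theta).
    # The common gain K cancels in the ratio, so it is never corrected.
    x, y, z = 1.0, 0.0, theta
    for i in hyperbolic_schedule(n):
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i              # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)     # would be a small lookup table in hardware

    # Linear vectoring: drive y toward 0 while accumulating q = y / x.
    q = 0.0
    for i in range(1, n + 1):
        d = 1.0 if y >= 0 else -1.0
        t = 2.0 ** -i
        y -= d * x * t
        q += d * t
    return q


def sigmoid_cordic(v, n=16):
    """Approximate sigmoid(v) for v in [-1, 1] via tanh(v / 2)."""
    if not -1.0 <= v <= 1.0:
        raise ValueError("this sketch only handles inputs in [-1, 1]")
    return 0.5 * (1.0 + tanh_cordic(0.5 * v, n))


if __name__ == "__main__":
    for v in (-1.0, -0.25, 0.0, 0.5, 1.0):
        exact = 1.0 / (1.0 + math.exp(-v))
        print(f"v={v:+.2f}  cordic={sigmoid_cordic(v):.6f}  exact={exact:.6f}")
```

Every multiplication in the two loops is by a power of two or by ±1, which is why hardware versions of this scheme can get by with shifters and adders instead of multiplier blocks. The error figures this sketch produces say nothing about the paper's reported results, which come from a different iteration scheme and number format.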