Researchers post FPGA sigmoid method on arXiv

- Chintan Panchal, Ankur Changela, and Mohendra Roy posted an arXiv paper on April 26 describing an FPGA sigmoid circuit built with mixed-radix CORDIC.
- The design runs on a Xilinx Virtex-7 with 16-bit fixed-point arithmetic, uses 835 logic slices and zero digital signal processor (DSP) blocks, and reports a mean absolute error of 4.23×10^-4.
- The paper targets cheaper neural-network activation hardware on edge-class field-programmable gate arrays. (arxiv.org)

A sigmoid is the S-shaped function that turns a raw number into a value between 0 and 1, and hardware engineers keep trying to make it cheaper to compute on chips. (arxiv.org) The expensive part is the math: sigmoid normally depends on exponentials, which are awkward on field-programmable gate arrays, or FPGAs, that favor simple repeated operations. (arxiv.org) One common workaround is CORDIC, short for coordinate rotation digital computer, a method that replaces hard multipliers with shift-and-add steps, like approximating a curve through many tiny turns. (arxiv.org) (inass.org)

Chintan Panchal, Ankur Changela, and Mohendra Roy said in an arXiv paper submitted April 26 that they built a sigmoid design around a mixed-radix hyperbolic rotation CORDIC. (arxiv.org) Their method first maps the sigmoid problem onto hyperbolic tangent, or tanh, then shrinks the input range to ±1 so the tanh stage only has to work over ±0.5. (arxiv.org 1) (arxiv.org 2) The circuit starts with radix-2 steps for stable convergence, switches to radix-4 steps to move faster, and finishes with a radix-2 linear vectoring stage to divide hyperbolic sine by cosine. (arxiv.org)

The authors said they implemented the fully pipelined design on a Xilinx Virtex-7 FPGA using 16-bit fixed-point arithmetic. (arxiv.org) They reported 835 logic slices, zero digital signal processor blocks, and a mean absolute error of 4.23×10^-4, which the paper said beat several recent sigmoid implementations. (arxiv.org) That puts the paper in a familiar FPGA tradeoff: spend a small amount of logic and routing to avoid dedicated multiplier hardware, then use pipelining to keep throughput high. (arxiv.org 1) (arxiv.org 2)

The paper is not a product launch or a deployment report. It is an arXiv preprint that the authors said has been accepted for the 2026 International Conference on Applied Artificial Intelligence. (arxiv.org)
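The sigmoid-to-tanh mapping and the two CORDIC stages described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' circuit: it uses floating point rather than 16-bit fixed point, it uses radix-2 steps only (the paper's radix-4 stage is not reproduced), and the function names and iteration counts are hypothetical choices made for the example. It relies on the standard identity sigmoid(x) = (1 + tanh(x/2)) / 2, which is why inputs in ±1 only need tanh over ±0.5.

```python
import math


def hyperbolic_indices(n):
    """Return n shift indices 1, 2, 3, 4, 4, 5, ... for hyperbolic CORDIC.

    Indices 4, 13, 40, ... are repeated once; the repeats are the usual
    requirement for radix-2 hyperbolic CORDIC to converge.
    """
    indices = []
    i, next_repeat = 1, 4
    while len(indices) < n:
        indices.append(i)
        if i == next_repeat and len(indices) < n:
            indices.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1
    return indices


def cordic_sinh_cosh(theta, n=16):
    """Hyperbolic rotation mode: returns (sinh, cosh) of theta, both scaled
    by the same CORDIC gain. The gain cancels in the later division."""
    x, y, z = 1.0, 0.0, theta
    for i in hyperbolic_indices(n):
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i  # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)  # a lookup-table constant in hardware
    return y, x


def cordic_divide(y, x, n=16):
    """Linear vectoring mode: approximates y / x for x > 0 and |y / x| < 1
    by driving y toward zero with shift-and-add steps."""
    z = 0.0
    for i in range(1, n + 1):
        d = 1.0 if y < 0 else -1.0
        t = 2.0 ** -i
        y += d * x * t
        z -= d * t
    return z


def cordic_sigmoid(v):
    """Sigmoid for inputs in [-1, 1] via sigmoid(v) = (1 + tanh(v / 2)) / 2."""
    if not -1.0 <= v <= 1.0:
        raise ValueError("this sketch assumes the input is already reduced to [-1, 1]")
    sinh_scaled, cosh_scaled = cordic_sinh_cosh(v / 2.0)
    return 0.5 * (1.0 + cordic_divide(sinh_scaled, cosh_scaled))


if __name__ == "__main__":
    for v in (-1.0, -0.25, 0.0, 0.5, 1.0):
        exact = 1.0 / (1.0 + math.exp(-v))
        print(f"{v:+.2f}  cordic={cordic_sigmoid(v):.5f}  exact={exact:.5f}")
```

In hardware, each multiplication by 2^-i would be a bit shift and the atanh values would come from a small table, which is how a design of this kind can avoid dedicated multiplier blocks.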
