Table 2. Throughput of Native Arithmetic Instructions. (Number of Operations per Clock Cycle per Multiprocessor) Compute Capability

Architecture                                                                    , FER , FER , KPL , MAX
32-bit floating-point add multiply multiply-add                                 , 32  , 48  , 192 , 128
64-bit floating-point add multiply multiply-add                                 , 16  , 4   , 8   , 1
32-bit floating-point reciprocal reciprocal square root log2f/exp2f/sine/cosine , 4   , 8   , 32  , 32
32-bit integer add extended-precision add subtract extended-precision subtract  , 32  , 48  , 160 , 128
32-bit integer multiply multiply-add extended-precision multiply-add            , 16  , 16  , 32  , Multiple instructions
32-bit integer shift                                                            , 16  , 16  , 32  , 64
compare minimum maximum                                                         , 32  , 48  , 160 , 64
32-bit integer bit reverse bit field extract/insert                             , 16  , 16  , 32  , 64
32-bit bitwise AND / OR / XOR                                                   , 32  , 160 , 160 , 128
count of leading zeros most significant non-sign bit                            , 16  , 16  , 32  , Multiple instructions
population count                                                                , 16  , 16  , 32  , 32
warp shuffle                                                                    , N/A , N/A , 32  , 32
sum of absolute difference                                                      , 16  , 16  , 32  , 64
SIMD video instructions vabsdiff2                                               , N/A , N/A , 160 , Multiple instructions
SIMD video instructions vabsdiff4                                               , N/A , N/A , 160 , Multiple instructions
All other SIMD video instructions                                               , 16  , 16  , 32  , Multiple instructions
Type conversions from 8/16-bit integer to 32-bit types                          , 16  , 16  , 128 , 32
Type conversions from and to 64-bit types                                       , 16  , 4   , 8   , 4
All other type conversions                                                      , 16  , 16  , 32  , 32


Some tips:
* bit field operations are as fast as shift operations.
