Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization
The performance of FPGA-based token dataflow architectures is often limited by the long-tail distribution of parallelism in the compute paths of the dataflow graphs. This is known to limit the speedup of dataflow processing of Sparse LU factorization to only 3-10x over CPUs. One reason behind the limitation...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Conference Paper |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/81207 http://hdl.handle.net/10220/39179 |