Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization

Performance of FPGA-based token dataflow architectures is often limited by the long-tail distribution of parallelism in the compute paths of the dataflow graphs. This is known to limit the speedup of dataflow processing of Sparse LU factorization to only 3-10x over CPUs. One reason behind the limitation...

Bibliographic Details
Main Authors:	Siddhartha, Kapre, Nachiket
Other Authors:	School of Computer Engineering
Format:	Conference Paper
Language:	English
Published:	2015
Subjects:	Computer Science and Engineering
Online Access:	https://hdl.handle.net/10356/81207 http://hdl.handle.net/10220/39179

Similar Items

Heterogeneous dataflow architectures for FPGA-based sparse LU factorization
by: Siddhartha, et al.
Published: (2015)

Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization
by: Siddhartha, et al.
Published: (2015)

Limits of Statically-Scheduled Token Dataflow Processing
by: Kapre, Nachiket, et al.
Published: (2015)

Custom FPGA-based soft-processors for sparse graph acceleration
by: Kapre, Nachiket
Published: (2015)

An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse Graph-Oriented Workloads
by: Kapre, Nachiket, et al.
Published: (2015)

MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations
by: Ye, Deheng, et al.
Published: (2015)

Dataflow optimized overlays for FPGAs
by: Siddhartha
Published: (2019)

GraphMMU: Memory Management Unit for Sparse Graph Accelerators
by: Han, Jianglei, et al.
Published: (2015)

Analysis and optimization of a deeply pipelined FPGA soft processor
by: Cheah, Hui Yan, et al.
Published: (2015)

VLIW-SCORE: Beyond C for sequential control of SPICE FPGA acceleration
by: Kapre, Nachiket, et al.
Published: (2015)

Timing Fault Detection in FPGA-Based Circuits
by: Stott, Edward, et al.
Published: (2015)

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors
by: Kapre, Nachiket, et al.
Published: (2015)

Driving Timing Convergence of FPGA Designs through Machine Learning and Cloud Computing
by: Kapre, Nachiket, et al.
Published: (2015)

System-level FPGA device driver with high-level synthesis support
by: Vipin, Kizheppatt, et al.
Published: (2015)

Limits of FPGA acceleration of 3D Green's Function computation for geophysical applications
by: Kapre, Nachiket, et al.
Published: (2015)

Shift Register, Reconvergent-Fanout (SiRF) PUF Implementation on an FPGA
by: Jim Plusquellic
Published: (2022-11-01)

Exploiting input parameter uncertainty for reducing datapath precision of SPICE device models
by: Kapre, Nachiket
Published: (2013)

Application composition and communication optimization in iterative solvers using FPGAs
by: Rafique, Abid, et al.
Published: (2013)

Enhancing performance of Tall-Skinny QR factorization using FPGAs
by: Rafique, Abid, et al.
Published: (2015)

Hoplite: Building austere overlay NoCs for FPGAs
by: Kapre, Nachiket, et al.
Published: (2015)

Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors
by: Hegde, Gopalakrishna, et al.
Published: (2015)

FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models Using Gappa++
by: Martorell, Hélène, et al.
Published: (2015)

Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
by: Moorthy, Pradeep, et al.
Published: (2015)

Distributed dynamic partially stateful dataflow
by: Behrens, Jonathan (Jonathan Kyle)
Published: (2018)

Implementation of a general purpose dataflow multiprocessor
by: Papadopoulos, Gregory M. (Gregory Michael)
Published: (2005)

A dataflow/von Neumann hybrid architecture
by: Iannucci, Robert A
Published: (2005)

Managing parallelism and resources in scientific dataflow programs
by: Culler, David E
Published: (2005)

Simulation of a novel multiprocessor system based on dataflow principles
by: Zhou, Mo, S.M. Massachusetts Institute of Technology
Published: (2016)

Optimization of Microarchitecture and Dataflow for Sparse Tensor CNN Acceleration
by: Ngoc-Son Pham, et al.
Published: (2023-01-01)

Synthesis and optimization of pipelines for HW implementations of dataflow programs
by: Prihozhy, Anatoly, et al.
Published: (2015)

Scalable fault tolerance for high-performance streaming dataflow
by: Yuan, Gina, M. Eng. Massachusetts Institute of Technology
Published: (2020)

A formal model of non-determinate dataflow computation
by: Brock, Jarvis Dean
Published: (2006)

Efficient FPGA-based sparse matrix-vector multiplication with data reuse-aware compression
by: Li, Shiqing, et al.
Published: (2023)

Spatial hardware implementation for sparse graph algorithms in GraphStep
by: Delorimier, Michael, et al.
Published: (2015)

Using a denotational proof language to verify dataflow analyses
by: Hao, Melissa B. (Melissa Betty), 1979-
Published: (2014)

Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs
by: Rafique, Abid, et al.
Published: (2015)

Advanced topics in dataflow computing and multithreading
by: Bic, Lubomir, 1951-, et al.
Published: (1995)

Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction
by: Lim, Hui Hui, et al.
Published: (2015)

Dynamic 3-D facial compression using low rank and sparse decomposition
by: Chau, Lap-Pui, et al.
Published: (2013)

SPICE2: Spatial Processors Interconnected for Concurrent Execution for Accelerating the SPICE Circuit Simulator Using an FPGA
by: Kapre, Nachiket, et al.
Published: (2015)