Efficient Memory Organization for DNN Hardware Accelerator Implementation on PSoC
The use of deep learning solutions in different disciplines is increasing and their algorithms are computationally expensive in most cases. For this reason, numerous hardware accelerators have appeared to compute their operations efficiently in parallel, achieving higher performance and lower latenc...
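Main Authors: | Antonio Rios-Navarro, Daniel Gutierrez-Galan, Juan Pedro Dominguez-Morales, Enrique Piñero-Fuentes, Lourdes Duran-Lopez, Ricardo Tapiador-Morales, Manuel Jesús Dominguez-Morales |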
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-01-01 |
Series: | Electronics |
Subjects: | deep learning, embedded systems, PSoC, memory organization, FPGA, hardware accelerator |
Online Access: | https://www.mdpi.com/2079-9292/10/1/94 |
_version_ | 1797542340280188928 |
---|---|
author | Antonio Rios-Navarro, Daniel Gutierrez-Galan, Juan Pedro Dominguez-Morales, Enrique Piñero-Fuentes, Lourdes Duran-Lopez, Ricardo Tapiador-Morales, Manuel Jesús Dominguez-Morales |
author_sort | Antonio Rios-Navarro |
collection | DOAJ |
description | Deep learning solutions are increasingly used across disciplines, and their algorithms are computationally expensive in most cases. For this reason, numerous hardware accelerators have appeared that compute these operations efficiently in parallel, achieving higher performance and lower latency. These algorithms need large amounts of data to feed each of their computing layers, so the data transfers to and from the accelerators must be handled efficiently. Accelerators are commonly implemented on hybrid devices that combine an embedded computer, on which an operating system can run, with a field-programmable gate array (FPGA), on which the accelerator is deployed. In this work, we present a software API that organizes memory efficiently and avoids reallocating data from one memory area to another, achieving an 85% speed-up over the native Linux driver and reducing the frame computing time by 28% in a real application. |
first_indexed | 2024-03-10T13:29:13Z |
format | Article |
id | doaj.art-9b7c67ef574046b0a1d6b1b8d9f31c97 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T13:29:13Z |
publishDate | 2021-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-9b7c67ef574046b0a1d6b1b8d9f31c97; 2023-11-21T08:26:51Z; eng; MDPI AG; Electronics; 2079-9292; 2021-01-01; vol. 10, no. 1, art. 94; 10.3390/electronics10010094; Efficient Memory Organization for DNN Hardware Accelerator Implementation on PSoC; Antonio Rios-Navarro (0), Daniel Gutierrez-Galan (1), Juan Pedro Dominguez-Morales (2), Enrique Piñero-Fuentes (3), Lourdes Duran-Lopez (4), Ricardo Tapiador-Morales (5), Manuel Jesús Dominguez-Morales (6); affiliation of all seven authors: Robotics and Technology of Computers Lab, Universidad de Sevilla, Escuela Tecnica Superior de Ingenieria Informatica - Escuela Politecnica Superior, 41012 Sevilla, Spain; https://www.mdpi.com/2079-9292/10/1/94; deep learning, embedded systems, PSoC, memory organization, FPGA, hardware accelerator |
title | Efficient Memory Organization for DNN Hardware Accelerator Implementation on PSoC |
topic | deep learning, embedded systems, PSoC, memory organization, FPGA, hardware accelerator |
url | https://www.mdpi.com/2079-9292/10/1/94 |