Fast Serial Append File I/O Mode For Cilk

We introduce Serial Append, a new parallel file I/O mode well suited for parallel applications. Serial Append allows parallel applications to efficiently perform file I/O in parallel while maintaining a structure that makes the sequential file I/O operations easily available. This property of Serial...

Full description

Bibliographic Details
Main Author: CaracaÅ , Alexandru
Format: Article
Language:en_US
Published: 2003
Subjects:
Online Access:http://hdl.handle.net/1721.1/3859
_version_ 1811089938263834624
author CaracaÅ , Alexandru
author_facet CaracaÅ , Alexandru
author_sort CaracaÅ , Alexandru
collection MIT
description We introduce Serial Append, a new parallel file I/O mode well suited for parallel applications. Serial Append allows parallel applications to efficiently perform file I/O in parallel while maintaining a structure that makes the sequential file I/O operations easily available. This property of Serial Append lets serial applications, that are strictly dependent on sequential file I/O operation, to be easily parallelized. Serial Append can be used for example to parallelize compression utilities such as bz2, or to add logging to existent parallel applications, or as additional support to database applications. We present a cost efficient algorithm for performing Serial Append that uses concurrent Skip-Lists to guarantee write as well as read and seek file operations. We discuss the implementation of the algorithm for the Cilk multithreaded language. We also show the performance of several serial applications that were ported to use Serial Append. As an example of Serial Append consider a parallel execution (on several processors) of a multithreaded computation that performs output to a file. The output of the parallel execution of the program is scrambled, namely the output of one processor is interleaved with the output of the others. This does not correspond to the sequential (single processor) execution of the multithreaded computation. With negligible slow down in the parallel execution, Serial Append allows for the serial output to be easily available and reconstructed by saving the meta-data of the parallel execution of the computation in an additional structure. This permits efficient, parallel access, to the parallel file I/O operations as if they were performed sequentially. Cilk is a multithreaded language for writing parallel applications, and it has a provably good work-stealing scheduler. To make Serial Append work for Cilk it is important to keep track of the steal operations that occur in the parallel execution of a Cilk program. We define an I/O node to contain all the write operations that a processor performs between two steals. The implementation of Serial Append for Cilk is closely coupled with the Cilk runtime and scheduler. Meta-data about the parallel computation, namely the I/O nodes that are created when steal operations occur, is saved in a concurrent Skip-List. This is the only synchronization required during the parallel execution. By using a Skip-List, we improve the performance of the read and seek operations. On each steal operation, a new I/O node is created and inserted in the right order into the Skip-List, namely after the I/O node of the processor from which the current processor stole work. Each processor has its own file I/O buffer and all its write operations are performed into this buffer. In practice, an I/O node only contains the relevant meta-data that allows to access and retrieve the data written to the file I/O buffers. This scheme makes possible for all processors to perform parallel file I/O operations in their own buffers and maximizes the performance of write operations.
first_indexed 2024-09-23T14:27:51Z
format Article
id mit-1721.1/3859
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T14:27:51Z
publishDate 2003
record_format dspace
spelling mit-1721.1/38592019-04-12T13:40:39Z Fast Serial Append File I/O Mode For Cilk CaracaÅ , Alexandru Serial Append Cilk multithreaded language parallel computation Skip-Lists We introduce Serial Append, a new parallel file I/O mode well suited for parallel applications. Serial Append allows parallel applications to efficiently perform file I/O in parallel while maintaining a structure that makes the sequential file I/O operations easily available. This property of Serial Append lets serial applications, that are strictly dependent on sequential file I/O operation, to be easily parallelized. Serial Append can be used for example to parallelize compression utilities such as bz2, or to add logging to existent parallel applications, or as additional support to database applications. We present a cost efficient algorithm for performing Serial Append that uses concurrent Skip-Lists to guarantee write as well as read and seek file operations. We discuss the implementation of the algorithm for the Cilk multithreaded language. We also show the performance of several serial applications that were ported to use Serial Append. As an example of Serial Append consider a parallel execution (on several processors) of a multithreaded computation that performs output to a file. The output of the parallel execution of the program is scrambled, namely the output of one processor is interleaved with the output of the others. This does not correspond to the sequential (single processor) execution of the multithreaded computation. With negligible slow down in the parallel execution, Serial Append allows for the serial output to be easily available and reconstructed by saving the meta-data of the parallel execution of the computation in an additional structure. This permits efficient, parallel access, to the parallel file I/O operations as if they were performed sequentially. Cilk is a multithreaded language for writing parallel applications, and it has a provably good work-stealing scheduler. To make Serial Append work for Cilk it is important to keep track of the steal operations that occur in the parallel execution of a Cilk program. We define an I/O node to contain all the write operations that a processor performs between two steals. The implementation of Serial Append for Cilk is closely coupled with the Cilk runtime and scheduler. Meta-data about the parallel computation, namely the I/O nodes that are created when steal operations occur, is saved in a concurrent Skip-List. This is the only synchronization required during the parallel execution. By using a Skip-List, we improve the performance of the read and seek operations. On each steal operation, a new I/O node is created and inserted in the right order into the Skip-List, namely after the I/O node of the processor from which the current processor stole work. Each processor has its own file I/O buffer and all its write operations are performed into this buffer. In practice, an I/O node only contains the relevant meta-data that allows to access and retrieve the data written to the file I/O buffers. This scheme makes possible for all processors to perform parallel file I/O operations in their own buffers and maximizes the performance of write operations. Singapore-MIT Alliance (SMA) 2003-12-13T19:19:26Z 2003-12-13T19:19:26Z 2004-01 Article http://hdl.handle.net/1721.1/3859 en_US Computer Science (CS); 56925 bytes application/pdf application/pdf
spellingShingle Serial Append
Cilk multithreaded language
parallel computation
Skip-Lists
CaracaÅ , Alexandru
Fast Serial Append File I/O Mode For Cilk
title Fast Serial Append File I/O Mode For Cilk
title_full Fast Serial Append File I/O Mode For Cilk
title_fullStr Fast Serial Append File I/O Mode For Cilk
title_full_unstemmed Fast Serial Append File I/O Mode For Cilk
title_short Fast Serial Append File I/O Mode For Cilk
title_sort fast serial append file i o mode for cilk
topic Serial Append
Cilk multithreaded language
parallel computation
Skip-Lists
url http://hdl.handle.net/1721.1/3859
work_keys_str_mv AT caracaaalexandru fastserialappendfileiomodeforcilk