Symbolic automata for representing big code

Bibliographic Details
Main Authors: Yang, H, Peleg, H, Shoham, S, Yahav, E
Format: Journal article
Published: Springer Verlag 2016
Collection: OXFORD
Description: Analysis of massive codebases (“big code”) presents an opportunity for drawing insights about programming practice and enabling code reuse. One of the main challenges in analyzing big code is finding a representation that captures sufficient semantic information, can be constructed efficiently, and is amenable to meaningful comparison operations. We present a formal framework for representing code in large codebases. In our framework, the semantic descriptor for each code snippet is a partial temporal specification that captures the sequences of method invocations on an API. The main idea is to represent partial temporal specifications as symbolic automata: automata where transitions may be labeled by variables, and a variable can be substituted by a letter, a word, or a regular language. Using symbolic automata, we construct an abstract domain for static analysis of big code, capturing both the partialness of a specification and the precision of a specification. We show interesting relationships between lattice operations of this domain and common operators for manipulating partial temporal specifications, such as building a more informative specification by consolidating two partial specifications, and comparing partial temporal specifications.
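To make the idea of variable-labeled transitions concrete, the following is a minimal Python sketch, not the authors' formalization. The names `Var`, `SymbolicAutomaton`, `instantiate`, and `accepts`, and the `open`/`read`/`write`/`close` API, are hypothetical and chosen only for illustration. The sketch covers substitution of a variable by a letter or a word; substitution by a regular language and the lattice operations mentioned in the description are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A symbolic transition label (hypothetical name, not from the paper)."""
    name: str


@dataclass
class SymbolicAutomaton:
    initial: object
    accepting: set
    transitions: list  # (source, label, target); label is a method name (str) or a Var

    def instantiate(self, assignment):
        """Substitute assigned variables by non-empty words (tuples of method names).

        A one-element word models substitution by a single letter.
        Unassigned variables are left in place.
        """
        result = []
        for src, label, dst in self.transitions:
            if isinstance(label, Var) and label.name in assignment:
                word = assignment[label.name]
                if not word:
                    raise ValueError("this sketch only supports non-empty words")
                # Thread the word through fresh intermediate states.
                current = src
                for i, letter in enumerate(word[:-1]):
                    fresh = (src, label.name, dst, i)
                    result.append((current, letter, fresh))
                    current = fresh
                result.append((current, word[-1], dst))
            else:
                result.append((src, label, dst))
        return SymbolicAutomaton(self.initial, set(self.accepting), result)

    def accepts(self, calls):
        """NFA-style run over concrete labels; variables match no concrete call."""
        states = {self.initial}
        for call in calls:
            states = {dst for src, label, dst in self.transitions
                      if src in states and label == call}
        return bool(states & self.accepting)


# A partial specification: open, then something unknown (X), then close.
spec = SymbolicAutomaton(
    initial=0,
    accepting={3},
    transitions=[(0, "open", 1), (1, Var("X"), 2), (2, "close", 3)],
)

# Fill in the unknown part with the word "read write".
concrete = spec.instantiate({"X": ("read", "write")})
print(concrete.accepts(["open", "read", "write", "close"]))
print(concrete.accepts(["open", "close"]))
```

In this sketch, the uninstantiated `spec` accepts no concrete call sequence, since its variable transition stands for behavior that is still unknown; only after a substitution does it describe concrete sequences of method invocations.
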
ID: oxford-uuid:a594a7d2-c7c5-4a61-884b-0f44a89815b8
Institution: University of Oxford