Cross-platform binary analysis requires a common representation of binaries across platforms, on which a specific analysis can be performed. Recent work has proposed learning low-dimensional, numeric vector representations (i.e., embeddings) of disassembled binary code and performing binary analysis in the embedding space. Existing techniques fall short, however, in that they are either (i) specific to a single platform, producing embeddings that are not aligned across platforms, or (ii) not designed to capture the rich contextual information available in a disassembled binary. We present a novel deep learning-based method, XBA, which addresses both problems. To this end, we first abstract binaries as typed graphs, dubbed binary disassembly graphs (BDGs), which encode control-flow and other rich contextual information of different entities found in a disassembled binary, including basic blocks, external functions called, and string literals referenced. We then formulate binary code representation learning as a graph alignment problem, i.e., finding the node correspondences between BDGs extracted from two binaries compiled for different platforms. XBA uses graph convolutional networks to learn the semantics of each node by (i) using its rich contextual information encoded in the BDG, and (ii) aligning its embeddings across platforms. Our formulation allows XBA to learn semantic alignments between two BDGs in a semi-supervised manner, requiring only a limited number of node pairs to be aligned across platforms for training. Our evaluation shows that XBA can learn semantically rich embeddings of binaries aligned across platforms without a priori platform-specific knowledge. When trained with only 50% of the oracle alignments, XBA predicted, on average, 75% of the rest. Our case studies further show that the learned embeddings encode knowledge useful for cross-platform binary analysis.
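The graph-alignment formulation in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the two toy graphs, the one-hot node-type features, and the helper names (`gcn_propagate`, `align`, `hits_at_1`) are all hypothetical, and the single propagation step uses no learned weights, so the semi-supervised training on seed alignments is omitted entirely. It only shows why neighborhood context matters: two basic blocks with identical raw features become distinguishable once each node aggregates its neighbors.

```python
from typing import List, Tuple

Vector = List[float]

# Hypothetical one-hot features per node type: [basic block, external function, string literal].
BLOCK, EXTERN, STRING = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]


def gcn_propagate(features: List[Vector], edges: List[Tuple[int, int]]) -> List[Vector]:
    """One GCN-style step, H' = D^-1/2 (A + I) D^-1/2 H, without learned weights."""
    n, dim = len(features), len(features[0])
    # Undirected adjacency matrix with self-loops.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if adj[i][j]:
                w = 1.0 / (deg[i] * deg[j]) ** 0.5  # symmetric normalization
                for k in range(dim):
                    row[k] += w * features[j][k]
        out.append(row)
    return out


def l1_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def align(src: List[Vector], dst: List[Vector]) -> List[int]:
    """For each source node, return the index of the nearest target node."""
    return [min(range(len(dst)), key=lambda j: l1_distance(s, dst[j])) for s in src]


def hits_at_1(predicted: List[int], gold: List[Tuple[int, int]]) -> float:
    """Fraction of gold pairs (src, dst) whose nearest neighbor is correct."""
    return sum(predicted[s] == d for s, d in gold) / len(gold)


# Toy graph A: block0 - block1, block0 - extern, block1 - string.
feats_a = [BLOCK, BLOCK, EXTERN, STRING]
edges_a = [(0, 1), (0, 2), (1, 3)]

# Toy graph B: the same structure with the nodes listed in a different order.
feats_b = [STRING, BLOCK, EXTERN, BLOCK]
edges_b = [(3, 1), (3, 2), (1, 0)]

# Assumed ground-truth correspondences (node in A, node in B).
gold = [(0, 3), (1, 1), (2, 2), (3, 0)]

emb_a = gcn_propagate(feats_a, edges_a)
emb_b = gcn_propagate(feats_b, edges_b)
predicted = align(emb_a, emb_b)
score = hits_at_1(predicted, gold)
print(predicted, score)
```

In a trained model, the propagation step would include learned weight matrices and a nonlinearity, and a subset of the gold pairs would serve as seed alignments for training while the remainder is held out for evaluation.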
|Title of host publication||ISSTA 2022 - Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis|
|Editors||Sukyoung Ryu, Yannis Smaragdakis|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||13|
|Publication status||Published - 2022 Jul 18|
|Event||31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2022 - Virtual, Online, Korea, Republic of (Duration: 2022 Jul 18 → 2022 Jul 22)|
Bibliographical note
Funding Information:
We thank the anonymous reviewers for their insightful feedback. This material is based upon work partly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) under awards 2021R1F1A1057436 and 2022R1C1C1003551, the Office of Naval Research (ONR) under awards N00014-21-1-2409 and N00014-17-1-2232, the Defense Advanced Research Projects Agency (DARPA) under award N66001-20-C-4027, and the Defense Advanced Research Projects Agency (DARPA) Small Business Technology Transfer (STTR) Program Office under contracts W31P4Q-20-C-0052 and W912CG-21-C-0020. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NRF, MSIT, ONR, DARPA, the DARPA STTR Program Office, or any other South Korea and U.S. government agency. We also gratefully acknowledge an “Endeavor” research award from the Donald Bren School of Information and Computer Sciences at UC Irvine, and an award from the Yonsei University Research Fund of 2021 (2021-22-0039).
© 2022 ACM.
All Science Journal Classification (ASJC) codes
- Computational Theory and Mathematics
- Computer Science Applications