STUDIA UNIVERSITATIS BABEŞ-BOLYAI
STUDIA INFORMATICA - Issue no. 2 / 2023
Article: DEOBFUSCATING JAVASCRIPT CODE USING CHARACTER-BASED TOKENIZATION.
Authors: ALEXANDRU-GABRIEL SÎRBU.
DOI: 10.24193/subbi.2023.2.01
Published Online: 2023-12-22
pp. 5-21

Abstract: Deployed JavaScript code typically goes through minification, in which variables are renamed to single-character names and whitespace is removed so that the files are smaller and load faster. As a result, the code becomes unintelligible and harder to analyze manually. Since JavaScript experts can still understand it, machine learning approaches to deobfuscating the minified file are possible. We therefore propose a technique that finds, for each obfuscated variable, a fitting name that is both intuitive and meaningful given how that variable is used. The technique is based on a Sequence-to-Sequence model that generates the name character by character, so that it can cover all possible variable names. The proposed approach achieves an average exact name generation accuracy of 70.53%, outperforming the state of the art by 12%.

Received by the editors: 31 July 2023.
2010 Mathematics Subject Classification: 68T05, 68T50.
1998 CR Categories and Descriptors: I.2.6 [Learning]: Subtopic – Connectionism and neural nets; I.2.7 [Natural Language Processing]: Subtopic – Language generation.
Keywords and phrases: JavaScript deobfuscation, variable name prediction, Deep Learning, Recurrent Neural Network, Abstract Syntax Tree.
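
As a rough illustration of what character-based tokenization of variable names can look like, the sketch below encodes a name as a sequence of character ids and decodes it back. It is not the article's implementation: the vocabulary, the special tokens `SOS` and `EOS`, and the helper names `encode_name` and `decode_ids` are assumptions made only for this example.

```python
import string

# Hypothetical start/end-of-sequence markers (assumed, not from the article).
SOS, EOS = "<s>", "</s>"

# Assumed character vocabulary: ASCII letters, digits, "_" and "$",
# which covers typical ASCII JavaScript identifiers.
ALPHABET = string.ascii_letters + string.digits + "_$"
VOCAB = [SOS, EOS] + list(ALPHABET)
CHAR_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode_name(name):
    """Turn a variable name into a list of character ids, wrapped in SOS/EOS."""
    return [CHAR_TO_ID[SOS]] + [CHAR_TO_ID[ch] for ch in name] + [CHAR_TO_ID[EOS]]


def decode_ids(ids):
    """Rebuild a name from character ids, stopping at the first EOS."""
    chars = []
    for i in ids:
        tok = VOCAB[i]
        if tok == EOS:
            break
        if tok == SOS:
            continue
        chars.append(tok)
    return "".join(chars)


if __name__ == "__main__":
    ids = encode_name("userCount")
    print(ids)
    print(decode_ids(ids))
```

Because the output vocabulary is a small, fixed set of characters rather than a closed list of whole names, a decoder that emits one id at a time can in principle spell any identifier over that alphabet, which is the property the abstract attributes to character-by-character generation.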