|
1 | 1 | { |
2 | 2 | "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "code", |
| 5 | + "execution_count": null, |
| 6 | + "outputs": [], |
| 7 | + "source": [ |
| 8 | + "from cldk.utils.treesitter.tree_sitter_utils import TreeSitterUtils\n", |
| 9 | + "!pip install ollama" |
| 10 | + ], |
| 11 | + "metadata": { |
| 12 | + "collapsed": false |
| 13 | + }, |
| 14 | + "id": "3195a8c0612cb428" |
| 15 | + }, |
3 | 16 | { |
4 | 17 | "cell_type": "markdown", |
5 | 18 | "source": [ |
6 | | - "Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. In our recent paper [https://dl.acm.org/doi/10.1145/3597503.3639226] published at ICSE'24, we found that LLM-based code translation is very promising. In this example, we will walk through the steps of translating each Java class to Python and checking various properties of translated code, such as the number of methods, number of fields, formal arguments, etc.\n", |
| 19 | + "Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. In our recent paper [https://dl.acm.org/doi/10.1145/3597503.3639226] published at ICSE'24, we found that LLM-based code translation is very promising. In this example, we will walk through the steps of translating each Java class to Python and checking various properties of translated code, such as the number of methods, number of fields, formal arguments, etc.\n", |
7 | 20 | "\n", |
8 | 21 | "(Step 1) First, we will import all the necessary libraries" |
9 | 22 | ], |
|
83 | 96 | "collapsed": false |
84 | 97 | }, |
85 | 98 | "id": "1c86224032a6eb70" |
| 99 | + }, |
| 100 | + { |
| 101 | + "cell_type": "markdown", |
| 102 | + "source": [ |
| 103 | + "(Step 4) Translate each class in the application (provide the application path as an environment variable, ```JAVA_APP_PATH```) and check properties of the translated code, such as (a) the number of translated methods and (b) the number of translated fields." |
| 104 | + ], |
| 105 | + "metadata": { |
| 106 | + "collapsed": false |
| 107 | + }, |
| 108 | + "id": "518efea0d8c4d307" |
| 109 | + }, |
| 110 | + { |
| 111 | + "cell_type": "code", |
| 112 | + "execution_count": null, |
| 113 | + "outputs": [], |
| 114 | + "source": [ |
| 115 | + "from cldk.analysis.python.treesitter import PythonSitter\n", |
| 116 | + "from cldk.analysis.java.treesitter import JavaSitter\n", |
| 117 | + "import os\n", |
| 118 | + "# Create a new instance of the CLDK class\n", |
| 119 | + "cldk = CLDK(language=\"java\")\n", |
| 120 | + "# Create an analysis object over the Java application. The application path is read from the JAVA_APP_PATH environment variable\n", |
| 121 | + "analysis = cldk.analysis(project_path=os.environ[\"JAVA_APP_PATH\"], analysis_level=AnalysisLevel.symbol_table)\n", |
| 122 | + "# Go through all the classes in the application\n", |
| 123 | + "for class_name in analysis.get_classes():\n", |
| 124 | + "    # Get the location of the Java class\n", |
| 125 | + "    class_path = analysis.get_java_file(qualified_class_name=class_name)\n", |
| 126 | + "    # Skip classes without a source file; otherwise read the file content\n", |
| 127 | + "    if not class_path:\n", |
| 128 | + "        continue\n", |
| 129 | + "    with open(class_path, 'r', encoding='utf-8', errors='ignore') as f:\n", |
| 130 | + "        class_body = f.read()\n", |
| 131 | + "    # Sanitize the file content by removing comments.\n", |
| 132 | + "    tree_sitter_utils = cldk.tree_sitter_utils(source_code=class_body)\n", |
| 133 | + "    sanitized_class = JavaSitter().remove_all_comments(source_code=class_body)\n", |
| 134 | + "    translated_code = prompt_ollama(\n", |
| 135 | + "        message=sanitized_class,\n", |
| 136 | + "        model_id=\"granite-code:20b-instruct\")\n", |
| 137 | + "    py_cldk = PythonSitter()\n", |
| 138 | + "    all_methods = py_cldk.get_all_methods(module=translated_code)\n", |
| 139 | + "    all_functions = py_cldk.get_all_functions(module=translated_code)\n", |
| 140 | + "    all_fields = py_cldk.get_all_fields(module=translated_code)\n", |
| 141 | + "    if len(all_methods) + len(all_functions) != len(analysis.get_methods_in_class(qualified_class_name=class_name)):\n", |
| 142 | + "        print(f'Number of translated methods does not match in class {class_name}')\n", |
| 143 | + "    if len(all_fields) != len(analysis.get_class(qualified_class_name=class_name).field_declarations):\n", |
| 144 | + "        print(f'Number of translated fields does not match in class {class_name}')" |
| 145 | + ], |
| 146 | + "metadata": { |
| 147 | + "collapsed": false |
| 148 | + }, |
| 149 | + "id": "fe3be3de6790f7b3" |
86 | 150 | } |
87 | 151 | ], |
88 | 152 | "metadata": { |
|
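The Step 4 cell takes the application path from `JAVA_APP_PATH` and reads one source file per class. A minimal sketch of how that setup could be guarded is shown here, using only the standard library. The helper names `resolve_app_path` and `read_source` are hypothetical and not part of CLDK; the sketch only assumes that a class lookup may return an empty path.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_app_path(env_var: str = "JAVA_APP_PATH") -> Path:
    """Return the application directory named by the environment variable.

    Hypothetical helper: fails early with a readable message instead of
    letting the analysis start on a missing or wrong path.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable to the Java application path")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{env_var} does not point to a directory: {path}")
    return path


def read_source(class_path: Optional[str]) -> Optional[str]:
    """Return the file content, or None when no source file is known.

    Hypothetical helper: returning None lets the caller skip the class
    rather than calling open() on an empty path.
    """
    if not class_path:
        return None
    with open(class_path, "r", encoding="utf-8", errors="ignore") as f:
        return f.read()
```

In the loop, a class whose `read_source` result is `None` would simply be skipped before translation.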
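To make the method and field checks in Step 4 concrete, here is a rough sketch of the same kind of count done with Python's standard `ast` module. This is a stand-in for illustration, not how `PythonSitter` works internally, and `count_members` is a hypothetical name. It assumes the model's reply is plain, syntactically valid Python; a reply wrapped in Markdown fences or containing explanation text would raise `SyntaxError` in `ast.parse`.

```python
import ast
from typing import Tuple


def count_members(python_source: str) -> Tuple[int, int]:
    """Return (number of function definitions, number of distinct field names)."""
    tree = ast.parse(python_source)
    functions = 0
    fields = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Counts methods and module-level functions alike
            functions += 1
        elif isinstance(node, ast.ClassDef):
            # Class-level assignments such as `limit = 10` or `limit: int = 10`
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    fields.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
                elif isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                    fields.add(stmt.target.id)
        elif isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Store):
            # Instance fields assigned as `self.name = ...`
            if isinstance(node.value, ast.Name) and node.value.id == "self":
                fields.add(node.attr)
    return functions, len(fields)


translated = '''
class Counter:
    limit = 10

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
'''

print(count_members(translated))  # (2, 2)
```

Note that `__init__` is counted as a function here. Whether the Java-side method count includes constructors is worth checking before treating a mismatch as a translation error.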