Building on the version control workflow introduced in Chapter 4, remember that each time you refine the code, you should also commit and push the changes to your repository. This way, you capture an incremental history of the improvements (or partial reverts) made in collaboration with the LLM. Whether you are introducing new features or simply tweaking the logic, you can:
- Create a new commit after each LLM-generated update or fix.
- Write a concise commit message describing which improvements or features were added.
- Compare your latest code against previous versions using `git diff` or a GitHub pull request.
With that in mind, after the initial review of the LLM-generated code, you will likely have a list of potential improvements or missing elements. This marks the beginning of the Refine Code & Add Features step, where we iteratively enhance the AI's initial output to better align with our research needs and coding standards 34.
The first step in this refinement process is to create a wishlist of improvements 39. Based on your review in the previous chapter, jot down any aspects of the code that you would like to enhance or any features that are currently missing but would be beneficial. For our BMI harmonization example, this wishlist might include items such as implementing more robust handling of missing data, adding more detailed comments to explain the code's logic, or incorporating the calculation of an additional metric like height percentile into the output dataset 40. Identifying these desired changes provides a clear direction for the subsequent refinement efforts.
Once you have your wishlist, it is generally best to address issues iteratively 44. Instead of trying to implement all the changes at once, tackle them one set at a time. This approach helps to isolate any problems that might arise from the modifications and makes it easier to guide the LLM in the refinement process. Focus on one major improvement or missing feature in each iteration. For instance, you might first ask the LLM to add the BMI category definitions if they were absent in the initial code, and then in the next iteration, ask it to include the summary table of BMI category counts.
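To make the two iterations in this example concrete, the following minimal Python sketch shows what each one might add. It is an illustration under stated assumptions, not the code the LLM would necessarily produce: the function names `bmi_category` and `category_counts` are hypothetical, and the cut-offs are the commonly used WHO adult thresholds, which the text above does not specify and which may not suit your cohort.

```python
from collections import Counter
from typing import Iterable


# Iteration 1: add the BMI category definitions.
# Assumption: WHO adult cut-offs in kg/m^2 (not stated in the text).
def bmi_category(bmi: float) -> str:
    """Map a BMI value to a category label."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Iteration 2: add the summary table of category counts.
def category_counts(bmis: Iterable[float]) -> Counter:
    """Count how many BMI values fall into each category."""
    return Counter(bmi_category(b) for b in bmis)


if __name__ == "__main__":
    for label, count in category_counts([17.0, 22.0, 23.5, 27.0, 31.0]).items():
        print(f"{label}: {count}")
```

Keeping each iteration as its own small function means each one can be reviewed and committed separately, which matches the one-change-per-iteration approach described above.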
When you are ready to implement a change, it is crucial to re-prompt with specific instructions 45. Clearly state the exact part of the code that you want the LLM to modify or the new feature that you want it to add. Providing precise instructions will help to avoid ambiguity and ensure that the LLM understands exactly what you are asking for. For example, instead of a vague prompt like "improve the data cleaning," you could say: "Please modify the code in the 'Data Cleaning' section to also remove BMI entries where either the height or the weight is recorded as zero."
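As a rough illustration of the change such a specific prompt asks for, here is a minimal Python sketch. It assumes the dataset is held as a list of dictionaries with columns named `height_cm` and `weight_kg`; the `weight_kg` name and the function name `drop_zero_measurements` are hypothetical, and your own code may well use a data frame instead.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def drop_zero_measurements(records: List[Record]) -> List[Record]:
    """Return only the records whose height and weight are both non-zero.

    Assumed column names: 'height_cm' and 'weight_kg' (hypothetical).
    Missing values (None) are deliberately left untouched here, because
    the prompt only asks for zeros to be removed.
    """
    return [
        r for r in records
        if r["height_cm"] != 0 and r["weight_kg"] != 0
    ]


if __name__ == "__main__":
    sample = [
        {"id": 1, "height_cm": 170, "weight_kg": 65},
        {"id": 2, "height_cm": 0, "weight_kg": 70},
        {"id": 3, "height_cm": 160, "weight_kg": 0},
    ]
    print(drop_zero_measurements(sample))
```

Because the prompt names both the section and the exact condition, the resulting change is small enough to verify by reading it, and to check with `git diff` before committing.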
To further illustrate this process, consider some example refinement prompts 46. You might prompt the LLM with: "The code is good, but now please modify it to also remove BMI entries where height or weight is zero, and then output a summary table of how many patients fall into each BMI category." Or, if you noticed that the code did not include the calculation for height percentile, you could prompt: "Please add a calculation for height percentile based on the height_cm column and include it as a new column in the output dataset." These specific prompts guide the LLM to make targeted changes and add the desired functionality.
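The height percentile prompt leaves one detail open: percentile relative to what? The sketch below takes one possible reading, a within-sample percentile rank (the percentage of records whose height is at or below the given height). This is an assumption; a study might instead need percentiles against an external reference population, which would require different code. The function name `add_height_percentile` and the output column name `height_percentile` are hypothetical.

```python
from bisect import bisect_right
from typing import Any, Dict, List

Record = Dict[str, Any]


def add_height_percentile(records: List[Record]) -> List[Record]:
    """Return copies of the records with a 'height_percentile' column.

    Assumption: the percentile is a within-sample rank, defined as the
    percentage of records with height_cm at or below this record's
    height_cm. The input records are not modified.
    """
    sorted_heights = sorted(r["height_cm"] for r in records)
    n = len(sorted_heights)
    result = []
    for r in records:
        # bisect_right gives the number of heights <= this height.
        at_or_below = bisect_right(sorted_heights, r["height_cm"])
        result.append({**r, "height_percentile": 100.0 * at_or_below / n})
    return result


if __name__ == "__main__":
    sample = [{"height_cm": h} for h in (150, 160, 170, 180)]
    for row in add_height_percentile(sample):
        print(row)
```

When a prompt leaves a definition open like this, it is worth stating the intended definition in the prompt itself, so the choice is made by you rather than by the LLM.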
In essence, code refinement is an integral part of the LLM-assisted workflow 21. By iteratively addressing issues and adding features through clear and specific prompts, researchers can mold the AI-generated code into a solution that not only works but also meets their specific research needs and adheres to high standards of quality and functionality. This iterative process allows for a collaborative evolution of the code, where the LLM acts as a helpful assistant guided by your expertise.
Remember to commit and push each refined version to your repository. This incremental approach keeps a clear record of how your code evolved and ensures that all your improvements, small or large, are captured and can be revisited or merged as needed.