Fine-tune Code Llama on Amazon SageMaker JumpStart

Today, we are excited to announce the capability to fine-tune Code Llama models by Meta using Amazon SageMaker JumpStart. The Code Llama family of large language models (LLMs) is a collection of pre-trained and fine-tuned code generation models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned Code Llama models provide better accuracy and explainability than the base Code Llama models, as evident in their results on the HumanEval and MBPP datasets. You can fine-tune and deploy Code Llama models with SageMaker JumpStart using the Amazon SageMaker Studio UI with a few clicks or using the SageMaker Python SDK. Fine-tuning of Llama models is based on the scripts provided in the llama-recipes GitHub repo from Meta, using PyTorch FSDP, PEFT/LoRA, and Int8 quantization techniques.

In this post, we walk through how to fine-tune Code Llama pre-trained models via SageMaker JumpStart using the one-click UI and SDK experience available in the following GitHub repository.

What is SageMaker JumpStart

With SageMaker JumpStart, machine learning (ML) practitioners can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment.

What is Code Llama

Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets and sampling more data from that same dataset for longer. Code Llama features enhanced coding capabilities. It can generate code and natural language about code, from both code and natural language prompts (for example, “Write me a function that outputs the Fibonacci sequence”). You can also use it for code completion and debugging. It supports many of the most popular programming languages used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.

Why fine-tune Code Llama models

Meta published Code Llama performance benchmarks on HumanEval and MBPP for common coding languages such as Python, Java, and JavaScript. Code Llama Python models demonstrated varying performance on HumanEval across different coding languages and tasks, ranging from 38% for the 7B Python model to 57% for the 70B Python model. In addition, Code Llama models fine-tuned on the SQL programming language have shown better results, as evident in SQL evaluation benchmarks. These published benchmarks highlight the potential benefits of fine-tuning Code Llama models, enabling better performance, customization, and adaptation to specific coding domains and tasks.

No-code fine-tuning via the SageMaker Studio UI

To start fine-tuning your Llama models using SageMaker Studio, complete the following steps:

1. On the SageMaker Studio console, choose JumpStart in the navigation pane. You will find listings of over 350 models, spanning open source and proprietary models.
2. Search for Code Llama models. If you don’t see Code Llama models, you can update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps. You can also find other model variants by choosing Explore all Code Generation Models or searching for Code Llama in the search box.

SageMaker JumpStart currently supports instruction fine-tuning for Code Llama models. The following screenshot shows the fine-tuning page for the Code Llama 2 70B model.
3. For Training dataset location, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning.
4. Set your deployment configuration, hyperparameters, and security settings for fine-tuning.
5. Choose Train to start the fine-tuning job on a SageMaker ML instance.

We discuss the dataset format you need to prepare for instruction fine-tuning in the next section.

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

Fine-tune via the SageMaker Python SDK

In this section, we demonstrate how to fine-tune Code Llama models using the SageMaker Python SDK on an instruction-formatted dataset. Specifically, the model is fine-tuned for a set of natural language processing (NLP) tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts.

Complete the following steps to run your fine-tuning job. You can get the entire fine-tuning code from the GitHub repository.

First, let’s look at the dataset format required for instruction fine-tuning. The training data should be formatted in JSON lines (.jsonl) format, where each line is a dictionary representing a data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The following is a sample in JSON lines format:

{'system_prompt': 'a chat', 'question': 'Please focus on the efficiency of this problem and provide code in python:\nYou are given two strings `s` and `t` consisting of only lowercase English letters.\n\nReturn _the minimum number of characters that need to be appended to the end of_ `s` _so that_ `t` _becomes a **subsequence** of_ `s`.\n\nA **subsequence** is a string that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.\n\n**Example 1:**\n\n**Input:** s = "coaching ", t = "coding "\n**Output:** 4\n**Explanation:** Append the characters "ding " to the end of s so that s = "coachingding ".\nNow, t is a subsequence of s ( "**co**aching**ding** ").\nIt can be shown that appending any 3 characters to the end of s will never make t a subsequence.\n\n**Example 2:**\n\n**Input:** s = "abcde ", t = "a "\n**Output:** 0\n**Explanation:** t is already a subsequence of s ( "**a**bcde ").\n\n**Example 3:**\n\n**Input:** s = "z ", t = "abcde "\n**Output:** 5\n**Explanation:** Append the characters "abcde " to the end of s so that s = "zabcde ".\nNow, t is a subsequence of s ( "z**abcde** ").\nIt can be shown that appending any 4 characters to the end of s will never make t a subsequence.\n\n**Constraints:**\n\n* `1 <= s.length, t.length <= 105`\n* `s` and `t` consist only of lowercase English letters.', 'response': '### Explanation\n The algorithm maintains two pointers, `i` and `j`, to iterate through the given strings `s` and `t`. The pointer `i` iterates through the string `s` and the pointer `j` iterates through the string `t`. While iterating through `s`, if we find a character that matches the current character in `t`, we increment the `j` pointer. This is because the matched character can be a part of the subsequence we are looking for. The iteration stops when either `i` reaches the end of `s` or `j` reaches the end of `t`.\n\nAt the end, we subtract the index `j` from the length of `t`.
This gives us the minimum number of characters that need to be appended to the end of `s` so that `t` becomes a subsequence of `s`.\n\n### Pseudocode:\n```\nMIN-ADDITION(s, t)\n s_len = LENGTH(s)\n t_len = LENGTH(t)\n i = 1\n j = 1\n WHILE i <= s_len AND j <= t_len\n IF s[i] == t[j]\n j = j + 1\n ENDIF\n i = i + 1\n ENDWHILE\n RETURN t_len - j + 1\nENDMIN-ADDITION\n```\n\n### Code\n```python\ndef min_addition(s: str, t: str) -> int:\n s_len = len(s)\n t_len = len(t)\n i, j = 0, 0\n while i < s_len and j < t_len:\n if s[i] == t[j]:\n j += 1\n i += 1\n return t_len - j\n```\n'}

The training folder can contain a template.json file describing the input and output formats. The following is an example template:

{
    "prompt": "{system_prompt} ### Input: {question}",
    "completion": "{response}"
}

To match the template, each sample in the JSON lines files must include system_prompt, question, and response fields. In this demonstration, we use the Dolphin Coder dataset from Hugging Face.

After you prepare the dataset and upload it to the S3 bucket (a sketch of this preparation step appears at the end of this post), you can start fine-tuning using the following code:

from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = "meta-textgeneration-llama-codellama-7b"
model_version = "*"
train_data_location = f"s3://{your_own_bucket_hosting_training_data}/"  # training data in S3 bucket

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    hyperparameters=hyperparameters,  # a dictionary of training hyperparameters defined beforehand; see the sketch at the end of this post
    environment={"accept_eula": "false"},  # change `accept_eula` to "true" to accept the EULA
)

estimator.fit({"training": train_data_location})

You can deploy the fine-tuned model directly from the estimator, as shown in the following code. For details, see the notebook in the GitHub repository.

finetuned_predictor = estimator.deploy()

Fine-tuning techniques

Language models such as Llama are more than 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly high CUDA memory. Furthermore, training these models can be very slow due to the size of the model. Therefore, for efficient…
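The hyperparameters dictionary passed to JumpStartEstimator in the training code above is not defined in that snippet. As a hedged sketch rather than the post’s exact code, one common pattern is to retrieve the default JumpStart hyperparameters for the model, override a few values, and validate the result before training. The specific keys overridden below (epoch and per_device_train_batch_size) are assumptions; confirm them against the printed defaults for your model version.

```python
# Alias the module to avoid shadowing the `hyperparameters` dictionary used later.
from sagemaker import hyperparameters as jumpstart_hyperparameters

model_id = "meta-textgeneration-llama-codellama-7b"
model_version = "*"

# Start from the default hyperparameters that JumpStart publishes for this model.
my_hyperparameters = jumpstart_hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)
print(my_hyperparameters)

# Override a few settings; these key names are assumptions, so check them
# against the printed defaults for your model version.
my_hyperparameters["epoch"] = "1"
my_hyperparameters["per_device_train_batch_size"] = "2"

# Validate the resulting dictionary before launching the training job.
jumpstart_hyperparameters.validate(
    model_id=model_id, model_version=model_version, hyperparameters=my_hyperparameters
)
```

The resulting dictionary is what you would pass as hyperparameters=my_hyperparameters when constructing the JumpStartEstimator shown earlier.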
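The training code also assumes that the .jsonl files and template.json already exist in the S3 location passed as train_data_location. The following minimal sketch, which is not part of the original walkthrough, writes one illustrative sample plus the template file and uploads both with boto3; the bucket name, prefix, and sample content are placeholders.

```python
import json

import boto3

# One training sample in the system_prompt/question/response format expected
# by the template.json shown earlier (illustrative content, not taken from
# the Dolphin Coder dataset).
sample = {
    "system_prompt": "a chat",
    "question": "Please write a Python function that reverses a string.",
    "response": "def reverse_string(s: str) -> str:\n    return s[::-1]",
}

template = {
    "prompt": "{system_prompt} ### Input: {question}",
    "completion": "{response}",
}

# Write the JSON lines file (one JSON object per line) and the template file.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
with open("template.json", "w") as f:
    json.dump(template, f)

# Upload both files to the same S3 prefix; that prefix is what you pass to
# estimator.fit as the training channel. Bucket and prefix are placeholders.
bucket = "your-own-bucket-hosting-training-data"
prefix = "code-llama-finetune"
s3 = boto3.client("s3")
s3.upload_file("train.jsonl", bucket, f"{prefix}/train.jsonl")
s3.upload_file("template.json", bucket, f"{prefix}/template.json")
```

In practice, you would export the full Dolphin Coder dataset (or your own data) into one or more .jsonl files under the same prefix.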
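Finally, once finetuned_predictor = estimator.deploy() has completed, you can send prompts to the endpoint. The payload shape below (an "inputs" string plus a "parameters" dictionary) and the generation parameters are assumptions based on the typical JumpStart text generation interface; verify them against the notebook in the GitHub repository. The prompt mirrors the {system_prompt} ### Input: {question} training template.

```python
# Assumes `finetuned_predictor` from the deployment step above. The payload
# schema and generation parameters are assumptions; confirm them in the
# model's example notebook before relying on them.
prompt = (
    "a chat ### Input: Please write a Python function that checks "
    "whether a string is a palindrome."
)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}

response = finetuned_predictor.predict(payload)
print(response)
```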