Andrej Karpathy is a former research scientist and founding member of OpenAI. He was also the senior director of AI at Tesla.
Lately, he has been using GitHub Copilot, which is powered by OpenAI's Codex model (a descendant of GPT-3) to generate code. He tweeted this about it:
“Nice read on reverse engineering of GitHub Copilot. Copilot has dramatically accelerated my coding, it’s hard to imagine going back to “manual coding”. Still learning to use it but it already writes ~80% of my code, ~80% accuracy. I don’t even really code, I prompt. & edit.”
While ChatGPT has recently captivated the world, generative AI has been making significant inroads for the last few years. A key area has been helping with code development.
Yet these systems have some issues, such as security vulnerabilities. That is a conclusion of a paper by Stanford researchers, who note:
“We found that participants with access to an AI assistant often produced more security vulnerabilities than those without access, with particularly significant results for string encryption and SQL injection… Surprisingly, we also found that participants provided access to an AI assistant were more likely to believe that they wrote secure code than those without access to the AI assistant.”
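The SQL injection risk the study highlights is easy to illustrate. The sketch below (a hypothetical example, not code from the study) contrasts an insecure query built by string formatting, which AI assistants sometimes produce, with a parameterized query that treats user input strictly as data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name):
    # Vulnerable: user input is spliced directly into the SQL string,
    # so crafted input like "' OR '1'='1" matches every row.
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Safe: the placeholder binds the input as a value, never as SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # leaks all rows
print(find_user_safe("' OR '1'='1"))    # returns []
```

Both functions look plausible at a glance, which is precisely why the study's participants overestimated the security of assistant-generated code.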
Understanding AI Auto Coding Systems
For some time, IDEs (integrated development environments) have included smart features to improve coding, such as autocompletion, code suggestions, and advanced debugging.
But the emergence of large language models (LLMs) like GPT-3 and Codex, and of tools built on them such as Copilot and ChatGPT, has been transformative. These systems leverage generative AI techniques like transformers, unsupervised learning and reinforcement learning. By processing huge amounts of content, LLMs can understand and create sophisticated code.
For example, you can write a prompt like “Write a function in Python that averages the numbers from the XYZ database.” The AI system will do this. It will even understand the context, such as the relevant variable declarations to include.
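The result of such a prompt might look something like the sketch below. This is an illustrative guess at the kind of function a model could generate, using SQLite as a stand-in since the article's "XYZ database" and its schema are hypothetical:

```python
import sqlite3

def average_column(conn, table, column):
    # Hypothetical generated function: compute the average of one
    # numeric column. Table and column names are placeholders.
    row = conn.execute(f"SELECT AVG({column}) FROM {table}").fetchone()
    return row[0]

# Minimal demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(2.0,), (4.0,), (6.0,)])
print(average_column(conn, "readings", "value"))  # 4.0
```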
An AI-based coding system can also provide recommendations while someone is programming. For example, when you begin to write the header for a function, the system will suggest the rest of the code block, which you can accept by pressing Tab.
“Code generation is one of the early killer apps for generative AI,” said Muddu Sudhakar, the founder and CEO of Aisera, a generative AI startup. “These systems do not replace programmers. But they certainly make them much more productive.”
AI Coding Problems
There are a host of issues with AI code generation systems. They can “hallucinate,” which means that the code can seem solid but actually has flaws. In some cases, the code creation may stop mid-stream because of the complexity of the functions.
But these problems should not be a surprise. AI code generation systems are trained on huge amounts of public repositories, such as on GitHub. Some of the programs may not be well written or in accordance with common standards.
This can also introduce security vulnerabilities.
“Trusting that the AI will generate code to the specification of the request does not mean the code has been generated to incorporate the best libraries, considered supply chain risks, or has access to all of the close-source tools used to scan for vulnerabilities,” said Matt Duench, Senior Director of Product Marketing at Okta, an identity management company. “They can often lack the cybersecurity context of how that code functions within a company’s internal environment and source code.”
Another issue is that developers may not have the skills to identify the security problems, in part because the generated code looks so well structured.
“When you develop a program yourself, you have a pretty strong knowledge of what it does, line by line,” said Richard Ford, Chief Technology Officer at Praetorian, a cybersecurity firm. “While Internet sites such as StackOverflow already provide a corpus of code that developers can and do cut and paste into their own programs without full understanding, models like ChatGPT provide significantly more code with significantly less effort – potentially opening this ‘understanding gap’ wider.”
Managing AI Coding Security Issues
When it comes to managing the security risks of AI code generation systems, there should first be a thorough evaluation of the tool. What are the terms of service? How is the data used? Are there guardrails in place?
For example, one concern is potential intellectual property violations. The code used for training may carry licenses that do not permit its use for code generation.
To deal with this, ServiceNow teamed up with Hugging Face to create BigCode. The goal is to create a coding tool that abides by “open and responsible” AI.
Even if a tool is appropriate for your organization, there should also be effective code reviews. “When it comes to cybersecurity, these outputs should be carefully checked by a security expert who can complete a secure code review of the output,” said Duench. “Additionally, the output should be double-checked against a database of existing known vulnerabilities to identify potential areas of risk.”
Regardless, it seems like AI code generation systems are here to stay – and will have a major impact on IT. The technology will improve productivity, democratize development, and help to alleviate the developer shortage.
“I don’t think companies should respond by banning this kind of help,” said Ford. “The genie is out of the bottle, and the companies who will do best in this Brave New World will be those who embrace advances with care and thought, not those who either reject them outright or deploy them recklessly as if they were a panacea.”