Security Focus: Repository & Versioning Platforms

Ordinary repository and versioning platforms like Git and Subversion need security enhancement. This is because current developer culture and typical coding practices lead to source code security vulnerabilities. In this ByteScout developer feature, we will explore repo and versioning platform enhancements that enforce security measures to prevent attacks.

We will look at both software repositories and version control platforms because they work together functionally in continuous integration pipelines. The ways developers commonly build CI pipelines lead to security vulnerabilities and to access by unauthorized users. These security vulnerabilities often arise from the following issues:

Unsecured source code
Unencrypted source code (Python)
Hardcoded authentication credentials
Shared credentials (encryption keys, API keys)
Memory safety gaps (which the Rust compiler addresses through Ownership)

Security-focused repository and versioning platforms can help to resolve these issues. However, it is clear that the developer adoption of security practices is essential to the successful use of these platforms. Developers will require compliance training to ensure the adoption of best practices, even when using security-centric platforms.

As with all software today, there are both open source and paid platforms. Overlapping terminology and overlapping features across platforms somewhat blur the distinctions between them. But we can outline the security features essential to operating a secure development platform and delivery pipeline, including:

Automatic source code scanning
Source code encryption
Data warehouse encryption
Static code analysis
Memory safety enforcement

Today, these features are found in various components along the development cycle. OpenBSD is an operating system that features automatic encryption protocols. Private repos now feature code scanners, which automatically search for hard-coded passwords and secret keys when code is pushed to a repository. A private repo operating on OpenBSD could achieve a very high level of security. OpenCVS is an open-source CVS implementation developed within the OpenBSD project. Let’s look through the development cycle and discover the most secure developer platforms, with a special focus on repos and versioning.

Source Code Scanners

Enterprises will benefit from repository apps featuring automatic source code security scanners. These scanners offer configurable dashboard settings for scanning source code pushed to a repo platform. The number of such platforms is growing, driven by the familiar complaint that “Security is nailed on as an afterthought!” Here are a few:

Clouseau – Scans Git repos for security issues
Seekret – Scans code on GitLab, BitBucket and other repos
SourceClear – Scans Node.js apps for vulnerable dependencies
Snyk – Scans Node.js apps for security issues

What are these scanners actually hunting in a build? Jenkins is an automation server for building pipelines. In the Jenkins documentation, you can find code samples such as the following, which demonstrate how a pipeline script binds stored login credentials for testing a web app. This kind of script often sits unencrypted on a software repo, so any secret hardcoded into it, rather than referenced by a credential ID, is exposed:

// Bind a stored username/password credential for the duration of the block
withCredentials([usernamePassword(credentialsId: 'amazon',
                     usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD')]) {
    // Available as an environment variable; Jenkins masks it in console output
    sh 'echo $PASSWORD'
    echo "${env.USERNAME}"
}

// You can also request multiple credentials in a single call
withCredentials([usernamePassword(credentialsId: 'amazon',
                     usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD'),
                 string(credentialsId: 'slack-url',
                     variable: 'SLACK_URL')]) {
    sh 'echo $PASSWORD'
    echo "${env.SLACK_URL}"
}

// Older plugin versions may not support the helper symbols
// (usernamePassword, string, ...) yet, so call the binding class directly:
withCredentials([[$class: 'UsernamePasswordMultiBinding', credentialsId: 'amazon',
                  usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD']]) {
    // Available as an environment variable
    sh 'echo $PASSWORD'
    echo "${env.USERNAME}"
}

In fact, automation testing is the pipeline component most likely to contain usernames and passwords. This is because, in order to test a new version of software that requires bots to log in to actual accounts, a testware app must spin up virtual users and play back recorded user gestures. Scripts like the one above are used by virtual testers for automated sign-in.
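As a rough illustration of keeping such sign-in credentials out of the script itself, here is a minimal Python sketch that reads them from environment variables, which a CI system can inject at run time. The variable names TEST_USERNAME and TEST_PASSWORD and the helper names are assumptions made up for this example, not part of any particular tool.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def load_credentials(env=None, user_var="TEST_USERNAME", pass_var="TEST_PASSWORD"):
    """Return (username, password) read from environment variables.

    The variable names are hypothetical; use whatever your CI system injects.
    Passing a dict as `env` makes the function easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in (user_var, pass_var) if not env.get(name)]
    if missing:
        # Fail loudly instead of silently falling back to a hardcoded default.
        raise MissingCredentialError(
            "missing environment variables: " + ", ".join(missing)
        )
    return env[user_var], env[pass_var]
```

A test script would call load_credentials() once at startup. Nothing secret is committed to the repo, and a missing variable stops the run instead of tempting someone to paste a password into the source.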

Login Credentials Seep Out Through The Cracks

Even developers are sometimes surprised by the unanticipated places their authentication credentials show up. Many web servers auto-generate a log of every script run and keep a copy of the script. When a server script log contains a script with embedded credentials, every server administrator now has access to them! Attackers can use source scanners too!
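One way to limit this kind of leak, sketched here under stated assumptions, is to redact credential-like values before a message ever reaches the log. The example uses Python's standard logging module. The pattern, the logger name, and the [REDACTED] placeholder are illustrative choices, and a single regular expression like this will not catch every secret.

```python
import io
import logging
import re

# Hypothetical pattern: key=value or key: value pairs with a credential-like key.
SECRET_PATTERN = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[=:]\s*)(\S+)"
)


class RedactSecretsFilter(logging.Filter):
    """Replace credential-like values in log messages with a placeholder."""

    def filter(self, record):
        message = record.getMessage()  # apply %-formatting first
        record.msg = SECRET_PATTERN.sub(r"\1\2[REDACTED]", message)
        record.args = None  # the message is already fully formatted
        return True  # keep the (now redacted) record


def build_logger(stream):
    """Return a logger that writes redacted messages to the given stream."""
    logger = logging.getLogger("build-log-example")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactSecretsFilter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    log.info("login with user=%s password=%s", "bot", "hunter2")
    print(buffer.getvalue(), end="")
```

The filter is attached to the handler, so every message written through that handler is rewritten before it is stored.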

Enable Source Code Scanning on a Repo

You can configure source code security scanning tools so that, any time you push a commit to your repository, the tool detects and reports vulnerabilities. Strings that look like API secret keys, passwords, or other authentication credentials will be flagged by the scanner. A code scanner typically highlights code issues in red and adds other warning symbols. Warnings show up on commit lists and commit detail pages. Let’s look at a few code scanners. (Screenshot: a typical warning in which a code scanner flags code with a security risk.)

Clouseau Inspects Code Builds for Security Problems

Clouseau inspects Git commits, including source code and commit messages. It targets patterns of strings and text that resemble passwords, API keys, secret tokens, and SSH keys, as well as personal identification data. Developers can use regular expressions to specify patterns and filters for searching code. Clouseau also works from the command line.
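To give a feel for pattern-based scanning in general, here is a minimal Python sketch. It is not Clouseau's implementation. The pattern set is a small, hypothetical sample, and real scanners ship far more patterns and also walk commit history.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_password": re.compile(
        r"(?i)\b(?:password|passwd|pwd)\s*[=:]\s*['\"][^'\"]{4,}['\"]"
    ),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text, patterns=PATTERNS):
    """Return a list of (line_number, pattern_name, matched_text) findings."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in patterns.items():
            for match in pattern.finditer(line):
                findings.append((line_number, name, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = 'user = "bot"\npassword = "hunter2"\n'
    for line_number, name, text in scan_text(sample):
        print(f"line {line_number}: {name}: {text}")
```

Running scan_text over each file in a commit, or over the commit message itself, gives the kind of line-level findings a scanner reports.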

Clouseau is an open-source project, also hosted on GitHub. To use it, clone the Clouseau repo to a UNIX-like system where Python is available. You can search another repo against a base set of patterns by entering a command in this format:

$ bin/clouseau --url [target-repo-url]

And Clouseau will search the repo using expressions in the file clouseau/patterns/default.txt. To search using regular expressions, use a command of this format:

$ bin/clouseau --url https://github.com/ByteScout/catalog.git --term "regex here"

You can search a repo using one pattern file or multiple pattern files. Clouseau supports searching between two commits, or even searching the range of commits pushed since a given date. To test your Clouseau installation, make a commit where you intentionally add a string that looks like an SSN or API key to a source file.

SourceClear Detects Node.js Dependencies

We can scan Node.js application builds automatically to find vulnerable dependencies. SourceClear can repair some security issues automatically, and it can prevent an app with open issues from deploying to production. SourceClear also supports scanning Python, Java, and Ruby source code. (Screenshot: a typical scan report.)

Snyk Tests Code for Security Dependencies

You can configure Snyk to check a Node.js GitHub repo for predefined security risks, such as dependencies with known vulnerabilities. Snyk provides continuous alerts during the development cycle. Best of all, Snyk is free to use on public Node.js applications in GitHub repositories. Public npm packages can also be scanned.

Python Source Code Encryption

We will give special emphasis to Python source code vulnerabilities because Python is so widely used in app development today. The Aeroflot breach served as a sobering wake-up call for developers around the world. All of Aeroflot’s Python source code became publicly accessible on a repository because of insecure developer methods.

Although developers are more knowledgeable than anyone about data security, we are still human and subject to workflow pressure! And popular developer trends, especially in distributed computing, now lead to new kinds of source code and data vulnerabilities. Because of time pressure to meet deadlines involved in Agile and DevOps sprints and workflows, coders often take shortcuts.

Two important shortcuts related to security issues are actually interwoven in daily developer practices. So, we need to discuss these simultaneously:

Storing source code in plain text form on repositories (unencrypted)
Sharing access to authentication credentials among team members

For the benefit of those who may not be familiar with the differences between a compiled language like C++ and an interpreted language like Python, here is a brief introduction. Compiled languages ship as machine code, which is far harder for a human to read than source text, although a determined attacker can still reverse engineer a binary. By contrast, interpreted languages like Python sit in plain text, human-readable form until execution.
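A small sketch makes the readability point concrete. Even when Python source is turned into bytecode by the interpreter's own compile step, string literals such as hardcoded keys remain stored as plain constants. The sample source and the key values below are invented for the example.

```python
def recover_string_constants(source):
    """Compile source to a code object and return every string constant in it."""
    found = []

    def walk(code):
        for const in code.co_consts:
            if isinstance(const, str):
                found.append(const)
            elif hasattr(const, "co_consts"):  # nested function or class body
                walk(const)

    walk(compile(source, "<example>", "exec"))
    return found


# Hypothetical application code with hardcoded secrets.
EXAMPLE_SOURCE = '''
API_KEY = "sk_test_not_a_real_key"

def connect():
    return "db-password-123"
'''

if __name__ == "__main__":
    for text in recover_string_constants(EXAMPLE_SOURCE):
        print(text)
```

The returned list may also contain other strings the compiler stores, such as function names, depending on the Python version. The point is only that the secrets are among them.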

This means that if unauthorized users get access to authentication credentials then it will be much easier for them to steal Python source code. This is what happened to Aeroflot in a recent breach which led to the loss of their entire code base! What are the best developer methods for securing Python source code?

Compiled Python Kills Two Problems With One Snake!

A very exciting and efficient method for securing Python source code is to compile it to native machine code by way of C or C++! That’s right, there are compilers that translate Python source code into C or C++, which a standard C/C++ compiler then builds into machine code. And compiling Python this way comes with a bevy of surprising bonuses and benefits:

Source code no longer human-readable
Python code optimized and executes faster
Loosely typed Python variables can be given static C/C++ types
Memory safety can be stronger with garbage collection

We mentioned earlier that compiled machine code is not readable to humans the way source text is. If someone wants to steal Python source code that has been compiled to a binary, the original logic is no longer easy to identify.

Anyone who obtains plain Python source can easily reverse engineer it and discover the intellectual property of the designers. The original methods used by the owner to design an algorithm often comprise the core value of a company. In spy terms, it’s their secret formula! After Python is compiled, the original coding language is no longer obvious. This adds a nice layer of security to deployments. But there is yet another benefit to be reaped by compiling Python.

Compiling Python to machine code often makes the code run faster as well! Compiled code frequently runs much faster than interpreted code, especially once variables are given static types. And because C and C++ have stricter data types than Python, the compiled Python application can have a higher level of memory safety. Cython is a popular compiler for this purpose. Let’s have a look.

Compiling Python Code with Cython

Essentially, Cython is a source-to-source compiler: it reads Python source code and translates it to C or C++, which a regular C/C++ compiler then builds into machine code. As we’ve discussed, the binary output is more resistant to the theft of intellectual property than plain source. Furthermore, the compiled code usually runs faster than the Python code would normally run.

Developers will naturally want to know if other modules and dependencies will remain compatible with the compiled Python code. Fortunately, the answer is YES! A Python app compiled this way still uses the same Python runtime libraries (DLLs on Windows), so it can import other modules as before. Cython supports configuring various optimization parameters that have a bearing on the resulting binary.

The Cython platform compiles Python code into .pyd modules on Windows (.so files on Linux and macOS). At this point, another app called Nuitka can be used to generate .exe files and embed all required libraries and other dependencies for running the code. We are talking about beneficial security side-effects of compiling Python code. One such benefit is improved memory safety.

Stronger Data Types With Compiled Python

A super side-effect of compiling Python is that Cython can convert the loosely typed parameters and variables of Python to the static types of C and C++ when you declare them. Python data types are not strict by design, so Cython is effectively a Python language compiler with stronger data types. Resulting modules can likewise make calls to both Python and C/C++ APIs. The ultimate result can be increased memory safety and a reduced possibility of buffer overflow attacks!

Another Word on Memory Safety From the Rust Language Compiler

The choice of language nowadays often has more to do with the way compilers manage memory than any visible scripting features of the language. The Rust compiler, for example, reclaims the memory allocated for a vector at the end of the vector’s scope through a unique concept called Ownership. This memory safety feature enforced by the Rust compiler can be engineered into a comprehensive development security solution as we will see in the next sections. Have a look at this code segment:

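// Segment 1: bool implements Copy, so a1 is still usable after the call
fn main() {
    let a1 = true;
    let _y = change_truth(a1);
    println!("{}", a1);
}
fn change_truth(x: bool) -> bool {
    !x
}

// Segment 2 (a separate program): i32 implements Copy as well
fn main() {
    let a1 = 5;
    let _y = double(a1);
    println!("{}", a1);
}
fn double(x: i32) -> i32 {
    x * 2
}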

The above code segments only compile because the i32 and bool types implement the Copy trait. For a type that does not, such as a vector, passing the value to a function moves it, and the compiler generates an error if the original variable is used afterward.

Integrating Many Security Measures into One Platform

New security-centric developer platforms are rapidly evolving today. This is because security vulnerabilities arising from common developer practices to automate continuous integration are relatively new. These practices are technical and poorly understood by non-technical staff. Paid platforms are popping up that enhance the security features of existing free and open source versioning platforms like Apache Subversion.

In the next sections, we will show that a security focus can be added to all components in the development cycle. Furthermore, a cycle that includes only such components is the most security-robust. Let’s begin with an existing OS and probe all the way down to hardware-level security.

OpenBSD OS Features Encryption Protocols

UNIX-like OpenBSD is a security-oriented open-source operating system. Kernel operations are handled in innovative ways. For example, swap space is divided into small encrypted sections, each with a unique key. This keeps sensitive data that has been paged out of memory from being read back off the disk.

OpenBSD also assigns random process IDs to applications. This has the security benefit of unpredictability: random PIDs make apps more difficult to target. The bind system call assigns random port numbers. And files are also created with random inode numbers. These are just a few of the security measures which can be enforced at the OS level of our imaginary comprehensive solution. Now let’s look at a very real hardware contribution to the idea.

Hardware-Based Strategy

The IBM z14 mainframe server, for example, now features pervasive encryption of all data and code! The z14 architecture aims to solve security problems at the hardware level. However, very few companies can afford to implement this strategy. The z14 starts around $75,000 each! Most companies need economical options, which include a combination of paid and open source components.

Security-centric Developer Platforms

Inevitably, all developer platforms will assimilate increasing security screening functionality. Static code scanning is one such functionality. When machine learning improves to the extent that app simulation reveals security risks in addition to bugs, code scanning will become standard. After all, hackers can use code scanners too! They can use them to find credentials and gain access. Therefore enterprises must beat them to the punch.
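As a rough sketch of what static code scanning can look like, the Python example below parses source code into a syntax tree and flags assignments where a credential-like variable name is given a literal string. It uses only the standard ast module. The list of suspicious name fragments is an assumption for illustration, and a production scanner would cover many more cases.

```python
import ast

# Hypothetical list of name fragments that suggest a credential.
SUSPICIOUS_FRAGMENTS = ("password", "passwd", "secret", "token", "api_key")


def find_hardcoded_credentials(source):
    """Return sorted (line_number, variable_name) pairs for suspicious assignments."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only non-empty string literals count as hardcoded secrets here.
        is_literal = (
            isinstance(value, ast.Constant)
            and isinstance(value.value, str)
            and value.value
        )
        if not is_literal:
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                fragment in target.id.lower() for fragment in SUSPICIOUS_FRAGMENTS
            ):
                findings.append((node.lineno, target.id))
    return sorted(findings)


if __name__ == "__main__":
    code = 'DB_PASSWORD = "hunter2"\ntimeout = "30"\n'
    print(find_hardcoded_credentials(code))
```

Unlike a plain text search, this check works on the parsed structure, so a value read from the environment at run time is not flagged while a literal assigned to the same name is.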

Although it is hypothetical at this time, a security-based repo could actually integrate all of the methods discussed here, from developer training all the way down to hardware encryption! In fact, it seems inevitable that all of these methods will be integrated, because recent data breaches represent potential losses in the billions of dollars. Every possible security measure is financially justified.