The Problem With Digital Transformations

Digital Transformation is one of the most ubiquitous buzzwords zipping around the hipstersphere these days. Every day we hear of a new approach to DT: Agile, agile, cloud transformation, containerisation, data centre grandfathering and so on.

What does Digital Transformation actually mean?

Most people bearing solutions want to sell you something more complicated than what you already have – at a price. It looks great and will propel you into the next century and end all your problems, right?

However, the key problem that most businesses and government departments face today is that they consistently fail to remove complexity as they try to transform. A successful transformation plan should look to reduce complexity first rather than changing technology for the sake of it. In fact, I would go as far as to say that digital transformation is not about platforms or web pages or agile. It is about quantifying the data that needs to flow in and out of your department, ensuring you can secure that data properly, and then finding simple, innovative ways to let users access that data to scratch an itch. Anyone who turns up at your door telling you that you need a new web portal, phone app or widget as the focus for transformation is going to add complexity and should be avoided.

Simplify, prioritise and standardise. These are your paths to digital transformation success…

AWS Force MFA Policy for IAM Admins

Whenever I take on a new customer, one of the very first security checks I perform is to ensure all users are leveraging multi-factor authentication (MFA). Customers often ask if there is a way to force all users to use MFA. While there is no simple checkbox for this, there is an IAM policy that can be applied to all users, which strips away all IAM permissions (except those needed to configure MFA) until a user logs in with an MFA code. This policy is one of my top recommendations to customers looking to secure access to the AWS management console and API.

Fortunately, you do not need to write this policy from scratch! And neither did I… AWS released a blog post last summer describing how to implement this IAM policy. This has become the standard policy to enforce MFA across an organisation. This policy works great when applied to a user without any IAM permissions, but if you apply it to a user who already has AdministratorAccess or IAMFullAccess, there is a minor security risk. As an MSP and Consulting Partner of AWS, most of our new customers already have a number of users with full AdministratorAccess. Given this, I decided to expand upon the Force_MFA policy to mitigate the security risk that remains when applying it to users with AdministratorAccess permissions.

In this blog post, I will first break down the AWS Force_MFA policy by each statement ID (SID), and then I will describe the changes we made to cover the use case of applying this policy to users who already have full IAM rights. Remember, the purpose of the Force_MFA policy is to strip the user of all permissions unless they log in with MFA, while still giving the user enough permissions to log in to the console and configure MFA (only for their own account).

    {
        "Sid": "AllowAllUsersToListAccountsAndGetPasswordPolicyForReset",
        "Effect": "Allow",
        "Action": [
            "iam:ListAccountAliases",
            "iam:GetAccountPasswordPolicy",
            "iam:ListUsers"
        ],
        "Resource": [
            "*"
        ]
    }

This SID gives the user access to the IAM permissions to view the account alias when clicking on the IAM service, as well as the ability to list all users in the console. Without ListUsers, the user would not be able to click into their ID to make changes such as resetting the password or configuring MFA. It also gives users access to the account’s password policy, which is required to reset the password.

    {
        "Sid": "AllowIndividualUserToSeeTheirAccountInformation",
        "Effect": "Allow",
        "Action": [
            "iam:ChangePassword",
            "iam:CreateLoginProfile",
            "iam:DeleteLoginProfile",
            "iam:GetAccountPasswordPolicy",
            "iam:GetAccountSummary",
            "iam:GetLoginProfile",
            "iam:UpdateLoginProfile"
        ],
        "Resource": [
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:user/${aws:username}"
        ]
    }

This SID gives the user a number of different permissions required to view all of their account information and change their password. Notice that the resource is different than in the above SID. Here, we can limit the resource to just one specific user – i.e., user/${aws:username}, whereas in the previous SID, the actions defined must apply across the account – i.e., “*”.

The ${aws:username} is what is known as a policy variable. When the IAM policy is evaluated, the user name of the authenticated user replaces the policy variable. For example, for a user named alice, arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:user/${aws:username} resolves to arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:user/alice.

    {
        "Sid": "AllowIndividualUserToListTheirMFA",
        "Effect": "Allow",
        "Action": [
            "iam:ListVirtualMFADevices",
            "iam:ListMFADevices"
        ],
        "Resource": [
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:mfa/*",
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:user/${aws:username}"
        ]
    }

This SID gives the user the permissions required to view their MFA device. You can think of this one as read-only access to MFA for the authenticated user.

    {
        "Sid": "AllowIndividualUserToManageThierMFA",
        "Effect": "Allow",
        "Action": [
            "iam:CreateVirtualMFADevice",
            "iam:DeactivateMFADevice",
            "iam:DeleteVirtualMFADevice",
            "iam:EnableMFADevice",
            "iam:ResyncMFADevice"
        ],
        "Resource": [
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:mfa/${aws:username}",
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:user/${aws:username}"
        ]
    }

This SID allows the user to actually configure their (and only their) MFA device. This is write access to the authenticated user’s MFA configurations.

    {
        "Sid": "DoNotAllowAnythingOtherThanAboveUnlessMFAd",
        "Effect": "Deny",
        "NotAction": "iam:*",
        "Resource": "*",
        "Condition": {
            "Null": [
                "aws:MultiFactorAuthAge": "true"
            ]
        }
    }

In this SID, we thought it would be best to make an improvement based on our particular use case. The SID above uses the NotAction element to deny every possible API action other than the one listed (“iam:*”). There is also a condition attached: the SID should only take effect if the aws:MultiFactorAuthAge key is null – i.e., only evaluate the SID if the user has not authenticated with MFA. (Note that the aws:MultiFactorAuthAge key is only present when the user signed in using MFA; hence the Null portion of the condition.)

This SID works perfectly well if the policy is applied to a user who does not already have permissions to perform IAM actions. As a Consulting and MSP partner, when I support a new customer, I often apply this policy to user IDs that already have full AdministratorAccess rights attached. In that case, a Deny with NotAction set to “iam:*” is not strict enough, because the user still retains ALL iam:* permissions granted by the AdministratorAccess policy.

In summary, if this policy is applied to a user with admin rights, the user is really not forced to configure MFA if they do not want to. The user has the IAM rights to simply remove the Force_MFA policy from their ID rather than configuring MFA. The user could also just create a new, full admin user without having to configure MFA. To mitigate this risk, we decided to replace the last SID with the two below SIDs, which truly removes all permissions (regardless of whether or not the user already has full admin) until MFA is configured.

    {
        "Sid": "DenyEverythingExceptForBelowUnlessMFAd",
        "Effect": "Deny",
        "NotAction": [
            "iam:ListVirtualMFADevices",
            "iam:ListMFADevices",
            "iam:ListUsers",
            "iam:ListAccountAliases",
            "iam:CreateVirtualMFADevice",
            "iam:DeactivateMFADevice",
            "iam:DeleteVirtualMFADevice",
            "iam:EnableMFADevice",
            "iam:ResyncMFADevice",
            "iam:ChangePassword",
            "iam:CreateLoginProfile",
            "iam:DeleteLoginProfile",
            "iam:GetAccountPasswordPolicy",
            "iam:GetAccountSummary",
            "iam:GetLoginProfile",
            "iam:UpdateLoginProfile"
        ],
        "Resource": "*",
        "Condition": {
            "Null": [
                "aws:MultiFactorAuthAge": "true"
            ]
        }
    }

This SID is very similar to the original one it is replacing, except we are breaking down the iam:* action into only the specific actions that are required. This means that all other IAM actions will be denied.

    {
        "Sid": "DenyIamAccessToOtherAccountsUnlessMFAd",
        "Effect": "Deny",
        "Action": [
            "iam:CreateVirtualMFADevice",
            "iam:DeactivateMFADevice",
            "iam:DeleteVirtualMFADevice",
            "iam:EnableMFADevice",
            "iam:ResyncMFADevice",
            "iam:ChangePassword",
            "iam:CreateLoginProfile",
            "iam:DeleteLoginProfile",
            "iam:GetAccountSummary",
            "iam:GetLoginProfile",
            "iam:UpdateLoginProfile"
        ],
        "NotResource": [
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:mfa/${aws:username}",
            "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:user/${aws:username}"
        ]
    }    

This SID limits all actions that are user specific to only being allowed by the user who is authenticating by using NotResource. The NotResource element means that the SID will apply to all resources except for the one defined; in this case, the authenticated user. The SID explicitly denies any actions which may be granted from an AdministratorAccess policy. Remember that an explicit deny will always override any allow. Note that some may argue this last SID is really not necessary because once the admin user enables MFA, they will have access to these actions. If an attacker gains access to this account, the attacker can, of course, just configure MFA to gain access to these actions. I would argue that an attacker may not realise MFA is required to gain full admin access, so there is still value in this SID so that an attacker cannot log in and start performing these destructive actions on IAM accounts (like deleting access keys).

Here is a link to the full policy for implementation in your environment.
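If you prefer to roll the policy out from the command line, a sketch of one way to do it follows (the file name, policy name and group name are illustrative placeholders, not part of the original post):

    # Create the managed policy from a local JSON file
    aws iam create-policy \
        --policy-name Force_MFA \
        --policy-document file://force_mfa_policy.json

    # Attach it to a group that contains all human users
    aws iam attach-group-policy \
        --group-name all-users \
        --policy-arn arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:policy/Force_MFA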

 

Dave Snowden On Agile Methodology

The problem with most agile methodologies in the modern environment is ensuring they are applied in the correct context. Too many people stare so close to these practices that the means start to overshadow the ends.

Dave Snowden reminds us of the importance of cognitive neuroscience, anthropology and theory-informed practice in modern software development:

And that is why The Pentagon listens to him.

Maybe you should too.

Evolving Information Security For Maximum Impact

Modern Security Methods: Lean Principles

Information security is hard. Providing good security is even harder. Agile development methodology and the practice of Lean principles have allowed industry leaders to produce and deploy software faster and more frequently than in decades before. Naturally, our diligence in protecting this software, and our ways of doing so, must evolve too.

Security has always followed the innovation of technology, and rightly so — seeing that without the technology, there would be nothing to protect. It is pretty obvious from simply observing current events that protecting customer data and fortifying code against attacks is something that needs to be at the forefront of modern software development.

Unfortunately, in the security profession, there always seems to be a delay between a change in software development and a change in how that software is secured. Problems creep in when trying to improve development workflow while providing visible value to both the consumer of the software and the development organisation.

Old-school security certainly adds value to old-school operations, but frequent, iterative software releases cannot be protected by systems and methods designed to guard the monolith.

Security needs to evolve!

Old School: Out of the Loop

So, what are the constraints that antiquated methods put on flow?

There are stereotypes that you’ll hear in the IT and DevOps world, which include but are not limited to: painful change advisory board (CAB) meetings, and comically bureaucratic hierarchies of approval for technological change.

Although these processes were initially birthed from a genuine concern for security, they have in many ways lived up to their stereotypes. They have made quick patching, frequent vulnerability analysis, and integrated security ironically difficult, and in the long run, they can negatively affect the security posture of your organisation.

These limitations not only exist in infrastructure, but also in the software development lifecycle. In old-school information security, developers may work on a piece of software for months before a release. Often, security is the last stop on the tracks. When security then finds a problem with the code, the software may have to go through an extensive rewrite. In these instances, security is uninvolved, detached and not as effective as it has the potential to be.

Lean Adds Value

Value is another Lean idea that is missing from the conventional security model – or perhaps a better way to say that is ‘perceived value’. Successful security is hard to quantify, because its correct implementation shows up as a lack of impact, i.e. breaches, incidents or outages.

What’s even worse is that really bad security practices seemingly have the same quantifiable results, due to a lack of insight into the threat landscape. This leaves the security engineers, application security specialists and hackers in a difficult position in which they must find a way to express the risk that has been mitigated by the systems they have put in place.

New School: Continuous Deployment as a Security Feature

Successful and innovative companies like Etsy claim that continuous deployment is their number one security feature. This should be considered across all organisations.

First, frequent iterations in both software deployments and infrastructure changes allow for more thorough security and vulnerability analysis. If the change being pushed is small enough, it allows the security analysis to be extremely focused. More possible avenues of exploitation can be examined when the light is shining on only a few changes, and it’s all happening within the existing flow of the SDLC.

Second, continuous deployment forces security into a more intimate relationship with development. When that occurs, security has the opportunity to implement code-level security insight. These metrics can then be taken, and a real threat landscape can start to be revealed. This allows the value of security to be presented in a quantifiable manner. Instead of a “dysfunctional family” dynamic, development and security can start down the road of a more symbiotic relationship.

But How Do We Get There From Here?

Lean practices lend themselves well to progressive information security, but where does one start on this path? Is there developer training that needs to take place? Do policies need to be formed and instigated before change can occur?

I don’t have specific answers to these questions because the solutions differ between organisations. But I can say that it helps to have resources available to developers and IT operations team members. Technology employees, for the most part, are used to learning and absorbing new concepts but spend the majority of their focus elsewhere. If a simple catalogue of resources can be made available for consumption, it can contribute greatly to a cultural shift.

No organisation can be perfect; however, integrating high-quality, pragmatic security resources is essential to maximise your risk mitigation. Even if there is a lack of security awareness within a company, a pool of appropriately skilled resources can have a huge impact once you convince people that security adds value.

There is awesome potential for improving the value that security offers even if you maintain your current workflow. However, improving engagement combined with a modernised workflow will transform your entire security posture. All of this is not to say that there aren’t problems with modern methods — but Agile teams that value Lean principles at least have the framework to evolve quickly to fix those problems.

Where Should We Go From Here?

Too often I still encounter an aversion to security compliance. People pay lip-service to built-in security, but don’t want to embrace the rigour that comes with undertaking such a model. Product owners over-constrain their systems by accepting project timelines at the expense of maintaining quality throughout the software development cycle. However challenging this may be, there are some areas in which we can still gain traction.

It is the process of doing security that matters more than the solutions that are in place. Security is the ability to react and adapt in order to maintain the integrity of the data and resources that it is protecting. Security, for better or worse, is dependent upon people. Educating people in a progressive manner is key to creating this change.

Security Risks and Benefits of Docker Application Containers

Container Security

Running applications in containers rather than virtual machines is gaining traction in the IT community. This ecosystem presently revolves around Docker, a platform for packaging, distributing and managing Linux apps within containers. Though this technology is still maturing, it will evolve along a trajectory similar to that of VLANs and virtual machines. With this in mind, I’d like to highlight some of the security risks and benefits of using such containers.

The Degree of Isolation of Application Containers

Containers take advantage of the Linux kernel’s ability to create isolated environments that are often described as a “chroot on steroids.” Containers and the underlying host share a kernel. However, each container is assigned its own, mostly independent runtime environment with the help of Control Groups (cgroups) and namespaces. Each container receives its own network stack and process space, as well as its own instance of a file system.
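As a quick illustration of that isolation (assuming Docker is installed and can pull the public alpine image), a process listing taken inside a fresh container shows only the container’s own processes, because the container lives in its own PID namespace. The output looks something like this:

$ docker run --rm alpine ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 ps aux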

In the past, Docker did not provide each container with its own user namespace, which meant that it offered no user ID isolation. A process running in the container with UID 1000, for example, would have had the privileges of UID 1000 on the underlying host as well. Along these lines, a process running as root (UID 0) in a container had root-level privileges on the underlying host when interacting with the kernel.

Docker recognised the lack of namespace isolation as a limitation and has introduced User Namespaces as a result. As of this writing, Docker has introduced formal support into the software that lays the foundation for being able to map a container’s root user to a non-root user on the host.
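As a rough sketch (the details vary by Docker version and distribution), user namespace remapping can be switched on in the daemon configuration and the daemon restarted:

# /etc/docker/daemon.json -- remap container root to an unprivileged host user
{
    "userns-remap": "default"
}

# then restart the daemon so the setting takes effect
sudo systemctl restart docker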

Docker isolates many aspects of the underlying host from an application running in a container without root privileges. However, this separation is not as strong as that of virtual machines, which run independent OS instances on top of a hypervisor without sharing the kernel with the underlying OS. It’s too risky to run apps with different security profiles as containers on the same host, but there are security benefits to encapsulating applications in containers when they would otherwise run directly on the same host.

Locking Down and Patching Containers

A regular system often contains software components that aren’t required by its applications. In contrast, a proper Docker container includes only those dependencies that the application requires, as explicitly prescribed in the corresponding Dockerfile. This decreases the vulnerability surface of the application’s environment and makes it easier to lock down. The smaller footprint also decreases the number of components that need to be patched with security updates.
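For illustration only, a minimal Dockerfile for a small Python service might pull in nothing beyond the runtime and the application’s declared dependencies (the file names here are made up for the example):

FROM python:3-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]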

When patching is needed, the workflow is different from a typical vulnerability management approach:

  • Traditionally, security patches are installed on the system independently of the application, in the hopes that the update doesn’t break the app.
  • Containers integrate the app with dependencies more tightly and allow for the container’s image to be patched as part of the application deployment process.
  • Rebuilding the container’s image (e.g., “docker build”) allows the application’s dependencies to be automatically updated.
  • The container ecosystem changes the work that ops might traditionally perform, but that isn’t necessarily a bad thing.

Running a vulnerability scanner when distributing patches the traditional way doesn’t quite work in this ecosystem. What a container-friendly approach should entail is still unclear. However, it promises the advantage of requiring fewer updates, bringing dev and ops closer together and defining a clear set of software components that need to be patched or otherwise locked down.
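In practice, picking up upstream fixes can be as simple as rebuilding against a freshly pulled base image (a sketch, assuming an image tag of your own):

# Pull the latest base layers and rebuild without cached layers
docker build --pull --no-cache -t myapp:latest .

# Redeploy the freshly built image
docker run -d --name myapp myapp:latest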

Security Benefits and Weaknesses of Containers

Application containers offer operational benefits that will continue to drive the development and adoption of the platform. While the use of such technologies introduces risks, it can also provide security benefits:

  • Containers make it easier to segregate applications that would traditionally run directly on the same host. For instance, an application running in one container only has access to the ports and files explicitly exposed by another container.
  • Containers encourage treating application environments as transient, rather than static systems that exist for years and accumulate risk-inducing artefacts.
  • Containers make it easier to control what data and software components are installed through the use of repeatable, scripted instructions in setup files.
  • Containers offer the potential of more frequent security patching by making it easier to update the environment as part of an application update. They also minimise the effort of validating compatibility between the app and patches.

Not all is perfect in the world of application containers, of course. The security risks that come to mind when assessing how and whether to use containers include the following:

  • The flexibility of containers makes it easy to run multiple instances of applications (container sprawl) and indirectly leads to Docker images that exist at varying security patch levels.
  • The isolation provided by Docker is not as robust as the segregation established by hypervisors for virtual machines.
  • The use and management of application containers is not well-understood by the broader ops, infosec, dev and auditors community yet.

Containers are Here

The current state of application containers is reminiscent of the early days of other segmentation technologies, namely VLANs and virtual machines.

VLANs were created for performance and convenience. They allow defining Ethernet broadcast domains flexibly across multiple switches. As an isolation tool, they are useful for improving the network’s performance. They were initially seen as risky due to software, hardware and security implementation flaws. Security concerns with VLANs still exist, but the technology and management practices have matured to the point that rare is a network that doesn’t employ VLANs in some capacity.

Similarly, one could draw a parallel to the evolution of virtualisation technologies. Even today, the flexibility that they provide acts as a magnifying force for accomplishing great IT feats, while also giving the organisation plenty of opportunities to weaken its security posture. Yet, the proper use of virtual machines is now widely understood and accepted even in environments where data security is important.

Application containers, whether implemented using Docker, LXC, Rocket or another project, are gaining momentum. They are not going away. As the technology and associated processes mature they will address many of the risks outlined above. Security professionals can help shape the evolution of the container ecosystem by exploring its risks and mitigation strategies. We also need to be prepared to discuss containers with our dev and ops colleagues when the time comes.

That time is now.

Atom Editor Setup For Python

First Install For Your OS

Atom on Mac

Atom was originally built for the Mac, so setup should be a simple process. You can either hit the download button on the atom.io site or go to the Atom releases page at:

https://github.com/atom/atom/releases/latest

Here you can download the atom-mac.zip file explicitly.

Once you have that file, you can click on it to extract the binary and then drag the new Atom application into your “Applications” folder.

When you first open Atom, it will try to install the atom and apm commands for use in the terminal. In some cases, Atom might not be able to install these commands because it needs an administrator password. To check if Atom was able to install the atom command, for example, open a terminal window and type which atom. If the atom command has been installed, you’ll see something like this:

$ which atom
/usr/local/bin/atom
$

If the atom command wasn’t installed, the which command won’t return anything:

$ which atom
$

To install the atom and apm commands, run “Window: Install Shell Commands” from the Command Palette (cmd + shift + p), which will prompt you for an administrator password.

Atom on Windows

Atom comes with a Windows installer. You can download the installer from https://atom.io or from:

https://github.com/atom/atom/releases/latest

This will install Atom, add the atom and apm commands to your PATH, create shortcuts on the desktop and in the start menu, and also add an Open with Atom context menu in the Explorer.

FIGURE 1-2. Atom on Windows

If you just want to download a .zip of the latest Atom release for Windows, you can also get it from the Atom releases page at https://github.com/atom/atom/releases.

Atom on Linux

To install Atom on Linux, you can download a Debian package or RPM package either from the main Atom website at atom.io or from the Atom project releases page at https://github.com/atom/atom/releases.

On Debian, you would install the Debian package with dpkg -i:

$ sudo dpkg -i atom-amd64.deb

On RedHat or another RPM based system, you would use the rpm -i command:

$ rpm -i atom.x86_64.rpm

Atom from Source

If none of those options works for you or you just want to build Atom from source, you can also do that.

There are detailed and up to date build instructions for Mac, Windows, Linux and FreeBSD at: https://github.com/atom/atom/tree/master/docs/build-instructions

In general, you need Git, a C++ toolchain, and Node to build it. See the repository documentation for detailed instructions.

Setting up a Proxy

If you’re using a proxy, you can configure apm (Atom Package Manager) to use it by setting the https-proxy config in your ~/.atom/.apmrc file:

https-proxy = https://9.0.2.1:0

If you are behind a firewall and seeing SSL errors when installing packages, you can disable strict SSL by putting the following in your ~/.atom/.apmrc file:

strict-ssl = false

You can run apm config get https-proxy to verify it has been set correctly, and running apm config list lists all custom config settings.

Edit a Python file and use Atom’s Autocomplete

Let’s start by creating a new Python file and saving it with a .py extension (for example, test.py).

In the new file, if you type de, you’ll see that Atom suggests creating a new function. This is because Atom has detected that the file extension is a Python extension.


If you press the Tab key, you’ll see a template for a new function.


Note that fname is highlighted. This is because you can now type the name of your function and it will replace fname. Let’s name our function product.

Next, if you hit the Tab key again, the arguments of the function, arg, will now be selected. Just write x, y, as we need two different arguments for our function.

Finally, hit the Tab key again to select the body of our function, pass, and replace it with our code. The final function should look something like this:
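def product(x, y):
    return x * y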

Linter for Atom

Linter is an Atom package that provides a top-level API so that all of the linter plugins for Atom behave consistently. This means that all the extra packages you install to highlight your code (for example, to flag errors) will use a unified method.

To install it, just type:
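apm install linter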

For Python, the package we want is called linter-flake8; it’s an interface to flake8. To install it, you need to run:
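# flake8 itself (via pip), then the Atom package that wraps it
pip install flake8
apm install linter-flake8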


If you open Atom and you find an error that says

 The linter binary flake8 cannot be found

it usually means that Atom cannot find flake8 on its PATH (common when flake8 lives in /usr/local/bin). One way to fix this is to add the following line to your Atom init script (Atom –> Open Your Init Script):

process.env.PATH = ['/usr/local/bin/', process.env.PATH].join(':')

Moreover, there are linters for other languages like HTML, CSS or JavaScript. You can find a list here.

Further customisation for Python to follow PEP8

Here I’ll show you how you can configure Atom to follow PEP 8, the official Python style guide.

First, open the Atom –> Preferences window.

1. Use spaces instead of tabs.

Scroll down the Settings panel until you see the Soft Tabs option. Make sure it’s checked. This setting will convert tabs into spaces automatically.


2. Set the tab length to 4 spaces

A little below the Soft Tabs setting, you’ll see the Tab Length. Set it to 4 spaces.

3. Automatic PEP8 validation.

If you installed the linter-flake8 package discussed in the previous section, you already have automatic PEP 8 validation.

Keybindings customisation

In the same Preferences panel, you can see the Keybindings menu on the left. There, you’ll find a list of all the default keybindings active in your Atom editor.

By default, Atom confirms an autocomplete suggestion with both the Tab and Enter keys, but I only want to use the Tab key.

In order to disable Enter as an autocomplete confirm key, we need to go to the Keybindings menu where you’ll see a link that says your keymap file. Click on that link to open the keymap.cson file.

There, you need to write:
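# A sketch; the exact selector may differ slightly between Atom versions.
# Keep Tab as the confirm key and make Enter simply insert a newline:
'atom-text-editor:not(mini).autocomplete-active':
  'tab': 'autocomplete-plus:confirm'
  'enter': 'editor:newline'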

Other Useful Packages

Project manager: a package for saving your projects.

Atom Django: Django support for Atom

Minimap: Displays a small map of the current file on the right side of your document (like Sublime Text by default).

Script: Lets you run Python scripts in Atom

Beautifier: Autocorrect your PEP8 lint errors

$ pip install autopep8
$ apm install atom-beautify

Creating Bootable USB On Linux / Mac

On Mac:

# sudo su -   ## (or whatever)

# diskutil list

Identify your disk/USB device

# sudo diskutil unmountDisk /dev/disk3

or

# sudo umount /dev/xxx

# sudo dd if=input.iso of=/dev/usb_device

where input.iso is the OS iso file and /dev/usb_device is the USB device you’re writing to.
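On a Mac, writing to the raw device node with an explicit block size is usually much faster (a variation on the same command; adjust the disk number to match what diskutil list reported):

# sudo dd if=input.iso of=/dev/rdisk3 bs=1m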

Or create an iso using

# dd if=/dev/disk3 of=~/Downloads/SDcard_image.iso [bs=1m]   ## optionally specify block size (doesn’t really matter)

 

Load Balancing With Nginx

Overview

Nginx, the web server, is a fantastically simple and inexpensive front-end load balancer for web applications – large and small. Its ability to handle highly concurrent loads and its simple forwarding configuration make it an excellent choice.

Although it doesn’t have the bells and whistles of enterprise solutions from Citrix or F5, it is very capable of doing the job, and doing it very well. The biggest downfall, depending on your team’s skill set, is that there is no friendly GUI to guide you; the configuration has to be done in the Nginx configuration files using a text editor.

Don’t let that stop you from deploying Nginx. Many start-ups and relatively large technology companies rely on Nginx for load balancing their web applications.

Outcomes

There are many alternatives, but what follows is my experience of:

  • Deploying an Nginx server on CentOS 6
  • Load balancing three Apache web servers
  • Web servers 1 and 2 are new, powerful servers and should receive most of the connections
  • Web server 3 is old and should not receive too many connections
  • Connections should be persistent, so users remain on the same server when they log in, as session information isn’t replicated to the other servers

Server Configuration

Internal Hostname         OS           Role                  IP Address
slloadbal01.example.com   CentOS 6.5   Nginx Load Balancer   172.30.0.35
mywebapp01.example.com    CentOS 6.5   Apache Web Server     172.30.0.50
mywebapp02.example.com    CentOS 6.5   Apache Web Server     172.30.0.51
mywebapp03.example.com    CentOS 6.5   Apache Web Server     172.30.0.52

TABLE 1 – Front-end load balancer and back-end web servers

 

Application Configuration

Each web application server will be assigned the same public hostname, in addition to the real hostnames listed above. They will each have WordPress installed, with the exact same configuration and content.

Website Hostname   Application   Database Server
www.tctest.com     WordPress     webdb01.example.com

TABLE 2 – DNS information for the balanced web service and database

Listing the database isn’t really that relevant to this tutorial, other than to illustrate that the WordPress database is not hosted on any of the web servers.

Installing Nginx

  1. Create a YUM repo file for Nginx.

vi /etc/yum.repos.d/nginx.repo

  2. Add the following lines to it.

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=0
enabled=1

  3. Save the file and exit the text editor.
  4. Install Nginx.

yum install nginx

Configure Nginx

  1. Open the default site configuration file in a text editor.

vi /etc/nginx/conf.d/default.conf

  2. Add the upstream module to the top of the configuration file. The upstream name website1 can be replaced with a name of your choosing. All three back-end servers are defined by their internal DNS hostnames. You may use IP addresses instead.

upstream website1 {
    server mywebapp01.example.com;
    server mywebapp02.example.com;
    server mywebapp03.example.com;
}

  3. Assign weight values to the servers. The higher the value, the more traffic the server will receive relative to the other servers. Both mywebapp01 and mywebapp02 will be assigned a weight of 3 to spread load evenly between them. Mywebapp03 will, however, be assigned a lower weight of 1 to minimise its load; it will receive roughly one in every seven (3+3+1) connections.

upstream website1 {
    server mywebapp01.example.com weight=3;
    server mywebapp02.example.com weight=3;
    server mywebapp03.example.com weight=1;
}

  4. We need users who are logged into the WordPress CMS to always connect to the same server. If they don’t, they will be shuffled around the servers and constantly have to log in again. We use the ip_hash directive to force users to always communicate with the same server.

upstream website1 {
    ip_hash;
    server mywebapp01.example.com weight=3;
    server mywebapp02.example.com weight=3;
    server mywebapp03.example.com weight=1;
}

  5. Now we configure the server directive to listen for incoming connections and forward them to one of the back-end servers. Below the upstream directive, configure the server directive.

server {
    listen 80; # Listen on the external interface
    server_name www.tctest.com;

    location / {
        proxy_pass http://website1;
    }
}

  6. Save the configuration file and exit the text editor.
  7. Reload the default configuration into Nginx.

service nginx reload
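If you want to check the file for syntax errors before (or after) reloading, Nginx can validate its own configuration:

nginx -t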

Additional Options and Directives

Marking a Server as Down (offline)

You may need to bring one of the servers down for emergency maintenance, and you want to be able to do this without impacting your users. The down parameter allows you to do exactly that.

upstream website1 {
    ip_hash;
    server mywebapp01.example.com weight=3 down;
    server mywebapp02.example.com weight=3;
    server mywebapp03.example.com weight=1;
}

Health Checks

Enable health checks to automatically check the health of each server in an upstream group. By default, each server is checked every 5 seconds with an HTTP request; if the server doesn’t return a 2XX or 3XX status, it is flagged as unhealthy and will no longer have connections forwarded to it. Note that the health_check directive provides active health checks in the commercial NGINX Plus product, and it is placed in the location block that proxies to the upstream rather than in the upstream block itself.

upstream website1 {
    server mywebapp01.example.com;
    server mywebapp02.example.com;
    server mywebapp03.example.com;
}

server {
    listen 80;
    server_name www.tctest.com;

    location / {
        proxy_pass http://website1;
        health_check;
    }
}
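If you are running the open-source build of Nginx instead, a roughly equivalent passive safeguard can be expressed with the max_fails and fail_timeout parameters (a sketch using the same upstream):

upstream website1 {
    server mywebapp01.example.com max_fails=3 fail_timeout=30s;
    server mywebapp02.example.com max_fails=3 fail_timeout=30s;
    server mywebapp03.example.com max_fails=3 fail_timeout=30s;
}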

 

Upstream Server Ports

Unless a port is specified, all requests will be forwarded to port 80. If your back-end web servers are hosting the application on another port, you may specify it at the end of the server name/ip address.

upstream website1 {
    server mywebapp01.example.com:8080;
    server mywebapp02.example.com:8080;
    server mywebapp03.example.com:9000;
}

 

Backup Servers

You may have a requirement for a server to act as a hot standby for when nodes unexpectedly go down. A server marked as a backup only handles traffic when the primary servers are unavailable, and remains idle while they are healthy.

upstream website1 {
    server mywebapp01.example.com;
    server mywebapp02.example.com;
    server mywebapp03.example.com;

    server mywebbkup01.example.com backup;
}

 

More Options and Directives

There are so many different directives for managing Nginx load balancers that it doesn’t make sense to list them all here. I’ve kept it short to highlight popular options. I do recommend that you read through the upstream documentation for a complete list of capabilities.

Conclusion

You now have a functional load balancer for your website, spreading load among three nodes. It may not have all of the bells and whistles of enterprise balancers, but it is very fast and very efficient, and it can balance connections with minimal hardware resources. It certainly gives typical hardware load balancers a run for their money, which is why it is used by large websites all over the world.

If you do need a simple and lightning fast balancer for a web application, I would definitely recommend using Nginx.

 

Cynefin And Complexity Theory

Introduction To Complex Systems Theory

As an outspoken character, Dave Snowden usually provokes an opinion from most people with whom he comes into contact. He is best known for the ‘known knowns’ to ‘unknown unknowns’ concept, but fewer people have studied the official body of work from which this is derived – Cynefin.

However, his theories on complex systems are undoubtedly provocative and noteworthy.

I’ve found his theories on complex adaptive systems to be of particular relevance to cloud computing, micro-service oriented, adaptive software development.

Below is a pithy but insightful summation of his opinion of traditional management/development practices.

Enjoy.

For more, see his full talk on Agile development practices:

AWS VPC Design

Design Considerations for VPCs on AWS

Experiences of building VPCs

Few areas of cloud infrastructure are more important to get right from the start than the IP address layout of one’s Virtual Private Cloud (VPC). VPC design has far-reaching implications for scaling, fault-tolerance and security. It also directly affects the flexibility of your infrastructure: paint yourself into a corner, and you’ll spend ungodly amounts of time migrating instances across subnets to free up address space.

Fortunately, it’s easier to lay out a VPC the right way than the wrong way. You just have to keep a few principles in mind.

Subnets

Proper subnet layout is the key to a well-functioning VPC. Subnets determine routing, Availability Zone (AZ) distribution, and Network Access Control Lists (NACLs).

The most common mistake I’ve observed around VPC subnetting is the treatment of a VPC like a data centre network. VPCs are not data centres. They are not switches. They are not routers. (Although they perform the jobs of all three.) A VPC is a software-defined network (SDN) optimised for moving massive amounts of packets into, out of and across AWS regions. Your packet is picked up at the front door and dropped off at its destination. It’s as simple as that.

Because of that simplicity, a number of data centre and networking-gear issues are eliminated at the outset.

A bit of history: when I first started building data centres in the 90’s, we had 10 Mb/s ethernet switches. Ethernet uses Address Resolution Protocol (ARP) broadcasts to determine who’s where in the switch fabric. Because of that, network segments are chatty in direct proportion to the number of hosts on the broadcast domain. So anything beyond a couple hundred hosts would start to degrade performance. That, combined with the counter-intuitive nature of IPv4 subnet math, led to the practical effect of everyone using 24-bit subnets for different network segments. Three-octet addresses seemed to sit right in the sweet spot of all the constraints.

That thinking is no longer valid in a cloud environment. VPCs support neither broadcast nor multicast. What looks like ARP to the OS is actually the elegant function of the SDN. With that in mind, there is absolutely no reason to hack a VPC into 24-bit subnets. In fact, you have an important reason not to: waste. When you have a “middle-tier” subnet with 254 addresses available (or 126, 62, 30 or 14 in smaller subnets) and you only have 4 middle-tier hosts, the rest of those addresses are unavailable for the remainder of your workloads.

If instead you have a mixed-use subnet with 4,094 addresses, you can squeeze every last IP for autoscaling groups and more. Thus it behooves you to make your subnets as large as possible. Doing so gives you the freedom to dynamically allocate from an enormous pool of addresses.

Generally speaking, there are three primary reasons to create a new subnet:

  1. You need different hosts to route in different ways (for example, internal-only vs. public-facing hosts)
  2. You are distributing your workload across multiple AZs to achieve fault-tolerance. Always, ALWAYS do this.
  3. You have a security requirement that mandates NACLs on a specific address space (for example, the one in which the database with your customers’ personally identifiable information resides)

Let’s look at each of these factors in turn.

Routing

All hosts within a VPC can route to all other hosts within a VPC. Period. The only real question is what packets can route into and out of the VPC.

In fact, you could easily have a VPC that doesn’t allow packets to enter or leave at all. Just create a VPC without an Internet Gateway or Virtual Private Gateway. You’ve effectively black-holed it.

A VPC that can’t serve any network traffic would be of dubious value, so let’s just assume that you have an app that you’re making available to the Internet. You add an Internet Gateway and assign some Elastic IP addresses to your hosts. Does this mean they’re publicly accessible? No, it does not. You need to create a route table for which the Internet Gateway is the default route. You then need to apply that table to one or more subnets. After that, all hosts within those subnets will inherit the routing table. Anything destined for an IP block outside the VPC will go through the Internet Gateway, thus giving your hosts the ability to respond to external traffic.

That said, almost no app wants all its hosts to be publicly accessible. In fact, good security dictates the principle of least privilege. So any host that doesn’t absolutely need to be reachable directly from the outside world shouldn’t be able to send traffic directly out the front door. These hosts will need a different route table from the ones above.

Subnets can have only one route table (though route tables can be applied to more than one subnet). If you want one set of hosts to route differently from another, you need to create a new subnet and apply a new route table to it.

Fault-Tolerance

AWS provides geographic distribution out of the box in the form of Availability Zones (AZs). Every region has at least two.

Subnets cannot span multiple AZs. So to achieve fault tolerance, you need to divide your address space among the AZs evenly and create subnets in each. The more AZs, the better: if you have three AZs available, split your address space into four parts and keep the fourth segment as spare capacity.

In case it’s not obvious, the reason you need to divide your address space up evenly is so the layout of each AZ is the same as the others. When you create resources like autoscaling groups, you want them to be evenly distributed. If you create disjointed address blocks, you’re creating a maintenance nightmare for yourself and you will regret it later.

Security

The first layer of defence in a VPC is the tight control you have over what packets can enter and leave.

Above the routing layer are two levels of complementary controls: Security Groups and NACLs. Security Groups are dynamic, stateful and capable of spanning the entire VPC. NACLs are stateless (meaning you need to define inbound and outbound ports), static and subnet-specific.

Generally, you only need both if you want to distribute change control authority over multiple groups of admins. For instance, you might want your sys admin team to control the security groups and your networking team to control the NACL’s. That way, no one party can single-handedly defeat your network restrictions.

In practice, NACLs should be used sparingly and, once created, left alone. Given that they’re subnet-specific and punched down by IP addresses, the complexity of trying to manage traffic at this layer increases geometrically with each additional rule.

Security Groups are where the majority of work gets done. Unless you have a specific use-case like the ones described earlier, you’ll be better served by keeping your security as simple and straightforward as possible. That’s what Security Groups do best.

An Example

The above was meant as a set of abstract guidelines. I’d like to provide a concrete example to show how all this works together in practice.

The simplest way to lay out a VPC is to follow these steps:

  1. Evenly divide your address space across as many AZ’s as possible.
  2. Determine the different kinds of routing you’ll need and the relative number of hosts for each kind.
  3. Create identically-sized subnets in each AZ for each routing need. Give them the same route table.
  4. Leave yourself unallocated space in case you missed something. (Trust me on this one.)

So for our example, let’s create a standard n-tier app with web hosts that are addressable externally. We’ll use 10.0.0.0/16 as our address space.

The easiest way to lay out a VPC’s address space is to forget about IP ranges and think in terms of subnet masks.

For example, take the 10.0.0.0/16 address space above. Let’s assume you want to run across all three AZs available to you in us-west-2 so your Mongo cluster can achieve a reliable quorum. Doing this by address ranges would be obnoxious. Instead, you can simply say “I need four blocks—one for each of the three AZs and one spare.” Since subnet masks are binary, every bit you add to the mask divides your space in two. So if you need four blocks, you need two more bits. Your 16-bit becomes four 18-bits.

10.0.0.0/16: 
    10.0.0.0/18 — AZ A
    10.0.64.0/18 — AZ B
    10.0.128.0/18 — AZ C
    10.0.192.0/18 — Spare

Now within each AZ, you determine you want a public subnet, a private subnet and some spare capacity. Your publicly-accessible hosts will be far fewer in number than your internal-only ones, so you decide to give the public subnets half the space of the private ones. To create the separate address spaces, you just keep adding bits. To wit:

10.0.0.0/18 — AZ A
    10.0.0.0/19 — Private
    10.0.32.0/19
            10.0.32.0/20 — Public
            10.0.48.0/20 — Spare

Later on, if you want to add a “Protected” subnet with NACL’s, you just subdivide your Spare space:

10.0.0.0/18 — AZ A
      10.0.0.0/19 — Private
      10.0.32.0/19
              10.0.32.0/20 — Public
              10.0.48.0/20
                  10.0.48.0/21 — Protected
                  10.0.56.0/21 — Spare

Just make sure whatever you do in one AZ, you duplicate in all the others:

10.0.0.0/16:
    10.0.0.0/18 — AZ A
        10.0.0.0/19 — Private
        10.0.32.0/19
               10.0.32.0/20 — Public
               10.0.48.0/20
                   10.0.48.0/21 — Protected
                   10.0.56.0/21 — Spare
    10.0.64.0/18 — AZ B
        10.0.64.0/19 — Private
        10.0.96.0/19
                10.0.96.0/20 — Public
                10.0.112.0/20
                    10.0.112.0/21 — Protected
                    10.0.120.0/21 — Spare
    10.0.128.0/18 — AZ C
        10.0.128.0/19 — Private
        10.0.160.0/19
                10.0.160.0/20 — Public
                10.0.176.0/20
                    10.0.176.0/21 — Protected
                    10.0.184.0/21 — Spare
    10.0.192.0/18 — Spare

Your routing tables would look like this:

“Public”
    10.0.0.0/16 — Local
    0.0.0.0/0  —  Internet Gateway
“Internal-only” (ie, Protected and Private)
    10.0.0.0/16 — Local

Create those two route-tables and then apply them to the correct subnets in each AZ. You’re done.
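If you script your VPC builds with the AWS CLI, the wiring for one AZ might look roughly like the sketch below (the VPC, gateway, route table and subnet IDs are placeholders):

    # Public subnet for AZ A, using the /20 from the example layout
    aws ec2 create-subnet --vpc-id vpc-12345678 \
        --cidr-block 10.0.32.0/20 --availability-zone us-west-2a

    # "Public" route table: default route out through the Internet Gateway
    aws ec2 create-route-table --vpc-id vpc-12345678
    aws ec2 create-route --route-table-id rtb-11111111 \
        --destination-cidr-block 0.0.0.0/0 --gateway-id igw-22222222
    aws ec2 associate-route-table --route-table-id rtb-11111111 \
        --subnet-id subnet-33333333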

And in case anyone on your team gets worried about running out of space, show them this table:

    16-bit: 65534 addresses
    18-bit: 16382 addresses
    19-bit: 8190 addresses
    20-bit: 4094 addresses

Obviously, you’re not going to need 4,000 IP addresses for your web servers. That’s not the point. The point is that this VPC has only those routing requirements. There’s no reason to create new subnets in this VPC that don’t need to route differently within the same AZ.

Conclusion

Done properly, this method of planning goes a long way to ensuring you won’t get boxed in by an early decision. Everything that you’ll get into from here — Security Groups, Auto Scaling, Elastic Load Balancing, Amazon Relational Database Service, AWS Direct Connect, and more — will fit neatly into this model.