Application Deployment and Security (6)

You are learning how software development, applications, and networking integrate to make automation possible on a massive scale. You might already be thinking about how to deploy your application. And of course, as a NetAcad student, you already know that security must be part of any deployment.

This module introduces you to application deployment options. You will learn about the components of a continuous integration and continuous deployment pipeline, including containers and microservices. You will learn about examples of deployment in test environments and in production environments. You will also learn about deployment security measures and understand known vulnerabilities.

Understanding Deployment Choices with Different Models

Introduction to Deployment Choices

At this point in the course, you have learned about the basic concepts behind software development and APIs. Now it is time to look at deploying applications.

Even if you are a solo developer building an application for yourself, when deploying your application you must account for a number of different factors, from creating the appropriate environments, to properly defining the infrastructure, to basic security concepts. This simply means that developers need to do more than deliver application code: they need to concern themselves with how applications are deployed, secured, operated, monitored, scaled, and maintained.

Meanwhile, the physical and virtual infrastructure and platforms on which applications are being developed and deployed are quickly evolving. Part of this rapid evolution is aimed at making life easier for developers and operators. For example, platform paradigms such as Containers as a Service and Serverless Computing are designed to let developers focus on building core application functionality, without having to worry about handling underlying platform configuration, mechanics, scaling, and other operations.

But not all current development takes place on these platforms. Developers are confronted with an expanding "stack" of platform options: bare metal, virtual machines, containers, and others, are all hosted on infrastructures and frameworks of increasing flexibility and complexity.

This module discusses some of the places today's software "lives". It goes on to cover basic techniques for deploying and testing applications, plus workflow techniques and tools for delivering software (to development platforms, test environments, staging, and production) quickly and efficiently. Finally, it covers some networking and security basics. All developers should be familiar with these concepts.

Deployment Environments

Some chaos in the early stages of development is normal, but code should be well tested by the time it gets to users. To make that happen, code passes through a number of environments, and as it does, its quality and reliability increase. These environments are self-contained, and intended to mimic the ultimate environment in which the code will "live".

Typically, large organizations use a four-tier structure: development, testing, staging, and production.

Development environment

The development environment is where you do your coding. Usually, your development environment bears little resemblance to the final environment. The development environment is typically just enough for you to manage fundamental aspects of your infrastructure, such as containers or cloud networking. You may use an Integrated Development Environment (IDE) or other tool to make deployment easier.

This environment may also include “mock” resources that provide the form of the real resources, but not the content. For example, you might have a database with a minimal number of test records, or an application that mimics the output of a remote service. Each developer typically has their own development environment.

Testing environment

When you believe your code is finished, you may move on to a second environment that has been set aside for testing the code, though when working on small projects, the development and testing environments are often combined. This testing environment should be structurally similar to the final production environment, even if it is on a much smaller scale.

The testing environment often includes automated testing tools such as Jenkins, CircleCI, or Travis CI, as well as integration with a version control system. It should be shared among the entire team. It may also include code review tools such as Gerrit.

Staging environment

After the code has been tested, it moves to the staging environment. Staging should be as close as possible to the actual production environment, so that the code can undergo final acceptance testing in a realistic environment. Instead of maintaining a smaller-scale staging environment, some organizations maintain two matching production environments (a pattern often called blue-green deployment), one of which hosts the current release of an application, while the other stands by to receive a new release. In this case, when a new version is deployed, traffic is shifted (gradually, or suddenly in a "cut over") from the current production environment to the other one. With the next release, the process is done in reverse.

This is, of course, much more affordable in clouds, where an unused, virtualized environment can be torn down and rebuilt automatically when needed.

Production environment

Finally, the code arrives at the production environment, where end users interact with it. At this point it has been tested multiple times, and should be error free. The production environment itself must be sized and constructed to handle expected traffic, including surges that might come seasonally or with a particular event.

Handling those surges is something you can plan for when designing your infrastructure. Before looking at infrastructure, however, you need to know about different models that you can use for deploying your software.

Deployment Models

In the early days of computers, there were no choices regarding how to deploy your software; you simply installed it on the computer itself. Today this model is known as “bare metal,” but it is only one of a variety of options available to you. These options include virtual machines, containers, and newer options such as serverless computing.

Bare metal

The most familiar and most basic way to deploy software is to install it directly on the target computer, or the "bare metal." Besides being the simplest method, bare metal deployment has other advantages, such as giving software direct access to the operating system and hardware. This is particularly useful when you need access to specialized hardware, or for High Performance Computing (HPC) applications in which every bit of speed counts.


One place where bare metal can be a disadvantage, however, is in isolating different workloads from each other. In a bare metal environment, every application on the machine uses the same kernel, operating system, storage, and so on. There are things you can do to isolate some resources, but if isolation is a concern, other models are likely a better choice. Additionally, bare metal is not very flexible in terms of resources; a machine with 64 GB of RAM is not going to get larger or smaller unless someone physically adds or removes hardware.

More commonly, bare metal is now used as infrastructure to host virtualization (hypervisors) and cloud frameworks (orchestrators for virtual compute, storage, and networking resources). Cisco, among others, has pioneered development of software-defined hardware platforms (such as Cisco UCS) that make bare metal infrastructure easily configurable to serve both application and Infrastructure-as-a-Service requirements.

Virtual machines

One way to solve the flexibility and isolation problems is through the use of Virtual Machines, or VMs. A virtual machine is like a computer within your computer; it has its own computing power, network interfaces, and storage.

A hypervisor is software that creates and manages VMs. Open-source hypervisors include Linux KVM and Xen (both commonly deployed under cloud platforms such as OpenStack), and commercial hypervisors are available from vendors such as Oracle (VirtualBox), VMware (vSphere/ESXi, Fusion), Microsoft (Hyper-V), and others. Hypervisors are generally classified as either 'Type 1', which run directly on the physical hardware ('bare metal'), or 'Type 2', which run, usually as an application, under an existing operating system.

The use of VMs overcomes a number of these restrictions. For example, if you had three workloads you wanted to isolate from each other, you could create three separate virtual machines on one bare metal server.


A few things to notice here:

  1. Applications running on the VM are bounded by the VM. Access to resources outside of the VM is via virtual networks.
  2. Even though the VMs are running on the same computer, they can run different operating systems from one another (called 'guest operating systems'), and from the bare metal on which VMs are running (called the 'host operating system').
  3. The total amount of virtual memory allocated to these three VMs is greater than the amount of RAM available on the host machine. This is called “overcommitting”. This is possible because it is unlikely that all three VMs will need all of their virtual memory at the same time, and the hypervisor can timeshare VMs as needed. Overcommitting can lead to performance issues if resource consumption is too extreme.

VMs run on top of a hypervisor, such as KVM, QEMU, or VMware ESXi, which provides them with simulated hardware, or with controlled access to underlying physical hardware. Depending on its type, the hypervisor either runs directly on the hardware or sits on top of a host operating system; in either case, it manages the VMs.

VMs can be convenient for several reasons, not the least of which is that a VM image can be saved for future use, or so that others can instantiate and use. This enables you to distribute a VM, or at least, the means to use it. Applications that run VMs, such as VirtualBox and VMware, can also take snapshots, or backups, of a VM so that you can return it to its previous state, if necessary.

Because they are so much like physical machines, VMs can host a wide range of software, even legacy software. Newer application environments, like containers, may not be "real machine-like" enough to host applications that are not written with their limitations in mind.

Container-based infrastructure

Moving up the abstraction ladder from VMs, you will find containers. Software to create and manage or orchestrate containers is available from Docker, AWS (Elastic Container Service), Microsoft (Azure Container Service), and others.

Containers were designed to provide many of the same benefits as VMs, such as workload isolation and the ability to run multiple workloads on a single machine, but they are architected a bit differently.

For one thing, containers are designed to start up quickly, and as such, they do not include as much underlying software infrastructure. A VM contains an entire guest operating system, but a container shares the operating system of the host machine and uses container-specific binaries and libraries.

Where VMs emulate an entire computer, a container typically represents just an application or a group of applications. The value of using containers is that all of the libraries and binaries you need to run the application are included, so the user does not have to take that additional installation step.

An important distinction between a Docker container and a VM is that each VM has its own complete operating system. Containers only contain part of the operating system. For example, you may have an Ubuntu Linux host computer running a CentOS Linux VM, an Ubuntu Linux VM, and a Windows 10 VM. Each of these VMs has its own complete OS. This can be very resource intensive for the host computer.

With Docker, containers share the kernel of their host computer. For example, on the Ubuntu Linux host computer you may have an Ubuntu Linux container and a CentOS Linux container. Both of these containers share the same Linux kernel. However, you could not have a container running Windows 10 on this same Ubuntu Linux host computer, because Windows uses a different kernel. Sharing the same kernel requires far fewer resources than running separate VMs, each with its own kernel.

Containers also solve the problem that arises when multiple applications need different versions of the same library in order to run. Because each application is in its own container, it is isolated from any conflicting libraries and binaries.

Containers are also useful because of the ecosystem of tools around them. Tools such as Kubernetes make fairly sophisticated orchestration of containers possible, and the fact that containers are often designed to be stateless and to start up quickly means that you can save resources by not running them unless you need to.

Containers are also the foundation of cloud native computing, in which applications are generally stateless. This statelessness makes it possible for any instance of a particular container to handle a request. When you add this to another aspect of cloud computing that emphasizes services, serverless computing becomes possible.

Serverless computing

Let’s start with this important point: to say that applications are “serverless” is great for marketing, but it is not technically true. Of course your application is running on a server. It is just running on a server that you do not control, and do not have to think about. Hence the name “serverless”.

Serverless computing takes advantage of a modern trend toward applications built around services, in which an application calls another program or workload to accomplish a particular task. Serverless computing extends this idea by making those functions available on an "as needed" basis.

It works like this:

Step 1. You create your application.

Step 2. You deploy your application as a container, so that it can run easily in any appropriate environment.

Step 3. You deploy that container to a serverless computing provider, such as AWS Lambda, Google Cloud Functions, or even an internal Function as a Service infrastructure. This deployment includes a specification of how long the function should remain inactive before it is spun down.

Step 4. When necessary, your application calls the function.

Step 5. The provider spins up an instance of the container, performs the needed task, and returns the result.


What is important to notice here is that if the serverless app is not needed, it is not running, and you are not getting charged for it. On the other hand, if you are typically calling it multiple times, the provider might spin up multiple instances to handle the traffic. You do not have to worry about any of that.
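The provider-side flow in Steps 4 and 5 can be sketched as a single Python function, in the style of AWS Lambda's Python runtime (the handler(event, context) signature is real; the function body and event fields here are purely illustrative):

```python
# A minimal function in the style of AWS Lambda's Python runtime.
# The provider spins up a container instance, invokes handler() with
# the request payload, and returns the result to the caller.
# The event fields and return shape below are illustrative only.

def handler(event, context):
    # Read a value from the incoming request, with a default.
    name = event.get("name", "world")
    # Return a result; the provider relays this back to the caller.
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Outside a provider, you can exercise the same function directly:
print(handler({"name": "NetAcad"}, None))
# {'statusCode': 200, 'body': 'Hello, NetAcad!'}
```

Notice that the function itself contains no server, scaling, or lifecycle code; all of that is the provider's responsibility.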

Because the capacity goes up and down with need, it is generally referred to as “elastic” rather than “scalable.”

There is a huge advantage in paying only for the resources that are actually in use, as opposed to a virtual machine that may be running all the time, even when its capacity is not needed. On the other hand, the serverless computing model means that you have zero control over the host machine, so it may not be appropriate from a security perspective.

Types of Infrastructure

In the early days of computers, infrastructure was pretty straightforward. The software you so carefully wrote ran on a single computer. Eventually, you had a network that could link multiple computers together. From there, things just got more and more complicated. Let’s look at the various options for designing your infrastructure, such as different types of clouds, and what each does and does not do well.

On-Premises

Technically speaking, “on-premises” means any system that is literally within the confines of your building. In this case we are talking about traditional data centers that house individual machines which are provisioned for applications, rather than clouds, external or otherwise.

These traditional infrastructures are data centers with servers dedicated to individual applications, or to VMs, which essentially enable a single computer to act like multiple computers.

Operating a traditional on-premises data center requires servers, storage devices, and network equipment to be ordered, received, assembled in racks ("racked and stacked"), moved to a location, and cabled for power and data. This equipment must be provided with environmental services such as power protection, cooling, and fire prevention. Servers then need to be logically configured for their roles, operating systems and software must be installed, and all of it needs to be maintained and monitored.

All of this infrastructure work takes time and effort. Requests for resources need to go through the operations team, which can lead to delays of days, weeks, or even months while new hardware is obtained, prepared, and provisioned.

In addition, scaling an application typically means moving it to a larger server, which makes scaling up or down a major event. That means an application is almost always either wasting money with excess capacity that is not being used, or underperforming because it does not have enough resources.

These problems can be solved by moving to a cloud-based solution.

Private Cloud

The downside of on-premises infrastructure can be easily solved by cloud computing. A cloud is a system that provides self-service provisioning for compute resources, networking, and storage.


A cloud consists of a control plane, which enables you to perform requests. You can create a new VM, attach a storage volume, or even create new network and compute resources.

Clouds provide self-service access to computing resources, such as VMs, containers, and even bare metal. This means that users can log into a dashboard or use the command line to spin up new resources themselves, rather than waiting for IT to resolve a ticket. These platforms are sometimes referred to as Infrastructure-as-a-Service (IaaS). Common private cloud platforms include VMware (proprietary), OpenStack (open source), and Kubernetes (a container orchestration framework). Underlying hardware infrastructure for clouds may be provided by conventional networked bare-metal servers, or by more advanced, managed bare-metal or "hyperconverged" physical infrastructure solutions, such as Cisco UCS and Cisco HyperFlex, respectively.

What distinguishes a private cloud from other types of clouds is that all resources within the cloud are under the control of your organization. In most cases, a private cloud will be located in your data center, but that is not technically a requirement to be called “private.” The important part is that all resources that run on the hardware belong to the owner organization.

The advantage of a private cloud is that you have complete control over where it is located, which is important in situations where there are specific compliance regulations, and that you do not typically have to worry about other workloads on the system.

On the downside, you do have to have an operations team that can manage the cloud and keep it running.

Public Cloud

A public cloud is essentially the same as a private cloud, but it is managed by a public cloud provider. Public clouds can also run systems such as OpenStack or Kubernetes, or they can be specific proprietary clouds such as Amazon Web Services or Azure.

Public cloud customers may share resources with other organizations: your VM may run on the same host as a VM belonging to someone else. Alternatively, public cloud providers may provide customers with dedicated infrastructure. Most provide several geographically separate cloud 'regions' in which workloads can be hosted. This lets workloads be placed close to users (minimizing latency), supports geographic redundancy (the East Coast and West Coast regions are unlikely to be offline at the same time), and enables jurisdictional control over where data is stored.


Public clouds can be useful because you do not have to pay for hardware you are not going to use. You can scale up virtually indefinitely as long as the load requires it, then scale down when traffic is slow. Because you pay only for the resources you are actually using, this solution can be very economical: your application never runs out of resources, and you do not pay for resources you are not using. You also do not have to worry about maintaining or operating the hardware; the public cloud provider handles that. In practice, however, when your cloud reaches a certain size, the cost advantages tend to disappear, and you are better off with a private cloud.

There is one disadvantage of public cloud. Because you are sharing the cloud with other users, you may have to contend with situations in which other workloads take up more than their share of resources.

This problem is worse when the cloud provider is overcommitting. The provider assumes not all resources will be in use at the same time, and allocates more "virtual" resources than "physical" resources. For example, it is not unusual to see an overcommit ratio of 16:1 for CPUs, which means that for every physical CPU, there may be 16 virtual CPUs allocated to VMs. Memory can be overcommitted as well. With a ratio of 2:1 for memory, a server with 128 GB of RAM might be hosting 256 GB of workloads. With a public cloud you have no control over that (save for paying more for dedicated instances or other services that help guarantee service levels).
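To make the ratios concrete, here is the arithmetic as a short Python sketch (the ratios are the ones quoted above; the host sizes are hypothetical):

```python
# Overcommit arithmetic: virtual resources allocated vs. physical
# resources present. Ratios are from the text; host sizes are examples.

physical_cpus = 32
cpu_overcommit = 16                     # 16:1 vCPU-to-physical-CPU ratio
virtual_cpus = physical_cpus * cpu_overcommit
print(virtual_cpus)                     # 512 vCPUs can be handed out to VMs

physical_ram_gb = 128
ram_overcommit = 2                      # 2:1 memory ratio
allocatable_ram_gb = physical_ram_gb * ram_overcommit
print(allocatable_ram_gb)               # 256 GB of workloads on 128 GB of RAM
```

The gamble, of course, is that the allocated resources are never all used at once; when they are, performance suffers.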

Hybrid Cloud

As you might guess, hybrid cloud is the combination of two different types of clouds. Typically, hybrid cloud is used to bridge a private cloud and a public cloud within a single application.


For example, you might have an application that runs on your private cloud, but “bursts” to public cloud if it runs out of resources. In this way, you can save money by not overbuying for your private cloud, but still have the resources when you need them.

You might also go in the other direction, and have an application that primarily runs on the public cloud, but uses resources in the private cloud for security or control. For example, you might have a web application that serves most of its content from the public cloud, but stores user information in a database within the private cloud.

Hybrid cloud is often confused with multi-cloud, in which an organization uses multiple clouds for different purposes. What distinguishes hybrid cloud is the use of more than one cloud within a single application. As such, a hybrid cloud application has to be much more aware of its environment than an application that lives in a single cloud.

A non-hybrid cloud application and its cloud are like a fish and the ocean; the fish does not need to be aware of the ocean because the ocean is just there, all around the fish. When you start adding hybrid cloud capabilities to an application, that application has to be aware of what resources are available and from where.

It is best if the application itself does not have to handle these things directly. It is a better practice to have some sort of interface that the application can call when it needs more resources, and that interface makes the decision regarding where to run those resources and passes them back to the application. This way the resource mapping logic can be controlled independently of the application itself, and you can adjust it for different situations. For example, you may keep all resources internal during the testing and debugging phase, then slowly ramp up public cloud use.
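As a sketch of that design, the following Python code models a broker interface that decides where capacity comes from, so the application itself never chooses a cloud. All class and method names here are hypothetical, not part of any real product:

```python
# Hypothetical resource-broker sketch: the application asks for
# capacity, and the broker decides which cloud supplies it.

class ResourceBroker:
    def __init__(self, private_capacity):
        self.private_capacity = private_capacity  # free instances in the private cloud
        self.public_enabled = False               # e.g. kept off during testing/debugging

    def acquire(self, count):
        """Return which cloud should host 'count' new instances."""
        if count <= self.private_capacity:
            self.private_capacity -= count
            return "private"
        if self.public_enabled:
            return "public"                       # burst to the public cloud
        raise RuntimeError("out of capacity and public bursting is disabled")

broker = ResourceBroker(private_capacity=10)
print(broker.acquire(8))    # private
broker.public_enabled = True
print(broker.acquire(8))    # public (only 2 private instances remain)
```

Because the burst policy lives in the broker, you can change it (disable bursting, prefer a different region, and so on) without touching application code.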

One way to accomplish this is through a tool such as Cisco Hybrid Cloud Platform for Google Cloud, which manages networking, security, management, data center, open-source and API software and tools. This provides you with a single, consistent, secure environment for your application, enabling it to work across both on-premises data centers and the Google Cloud.

In addition, container orchestrators have become very popular with companies employing hybrid-cloud deployments. The orchestrators provide a cloud-vendor agnostic layer which the application can consume to request necessary resources, reducing the environmental awareness needed in the application itself.

Edge Cloud

The newest type of cloud is edge cloud. Edge cloud is gaining popularity because of the growth of the Internet of Things (IoT). These connected devices, such as connected cameras, autonomous vehicles, and even smartphones, increasingly benefit from computing power that exists closer to them on the network.

The two primary reasons that closer computing power helps IoT devices are speed and bandwidth. For example, if you are playing a first-person shooter game, even half a second of latency between when you pull the trigger and when the shot registers is unacceptable. Another instance where latency may be, literally, fatal is with self-driving vehicles. At 55 miles per hour, a car travels more than 40 feet in just 500 ms. If a pedestrian steps off the curb, the car cannot wait for instructions on what to do.
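You can verify that distance figure with a few lines of Python:

```python
# Distance a car travels at 55 mph during 500 ms of network latency.

mph = 55
feet_per_mile = 5280
seconds_per_hour = 3600

feet_per_second = mph * feet_per_mile / seconds_per_hour
distance_ft = feet_per_second * 0.5     # 500 ms of latency

print(round(feet_per_second, 1))        # 80.7 ft/s
print(round(distance_ft, 1))            # 40.3 ft -- "more than 40 feet"
```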

There is a second issue. Typically, a self-driving car avoids the latency problem by making its own decisions, but that leads to problems of its own. These vehicles use machine learning, which requires enormous amounts of data to be passed to and from the vehicle. It is estimated that these vehicles generate more than 4 TB of data every hour, and most networks cannot handle that kind of traffic (especially with the anticipated growth of these vehicles in the market).

To solve both of these problems, an edge cloud moves computing closer to where it is needed. Instead of transactions making their way from an end user in Cleveland, to the main cloud in Oregon, there may be an intermediary cloud, an edge cloud, in Cleveland. The edge cloud processes the data or transaction. It then either sends a response back to the client, or does preliminary analysis of the data and sends the results on to a regional cloud that may be farther away.


Edge cloud computing comprises one or more central clouds that act as a hub for the edge clouds themselves. Hardware for the edge clouds is located as close as possible to the user. For example, you might have edge hardware on the actual cell tower handling the signals to and from a user’s mobile phone.

Another area where you may see edge computing is in retail, where you have multiple stores. Each store might have its own internal cloud. This is an edge cloud which feeds into the regional cloud, which in turn might feed into a central cloud. This architecture gives local offices the benefits of having their own cloud (such as consistent deployment of APIs to ensure each store can be managed, updated, and monitored efficiently).

There is nothing "special" about edge clouds. They are just typical clouds. What makes them "edge" is where they are, and that they are connected to each other. There is one more thing about edge clouds, however. Because they often run on much smaller hardware than "typical" clouds, they may be more resource-constrained. In addition, edge cloud hardware must be reliable, efficient in terms of power usage, and preferably remotely manageable, because it may be located in a remote area, such as a cell tower in the middle of the desert, where servicing the hardware is difficult.

Creating and Deploying a Sample Application

What is Docker?

The most popular way to containerize an application is to deploy it as a Docker container. A container is a way of encapsulating everything you need to run your application, so that it can easily be deployed in a variety of environments. Docker is a way of creating and running that container. Specifically, Docker is a format that wraps a number of different technologies to create what we know today as containers. These technologies are:

  • Namespaces - These isolate different parts of the running container. For example, the process itself is isolated in the pid (process ID) namespace, the filesystem is isolated in the mnt (mount) namespace, and networking is isolated in the net namespace.
  • Control groups - Control groups (cgroups) are a standard Linux mechanism that enables the system to limit the resources, such as RAM or storage, used by an application.
  • Union File Systems - A union file system (UnionFS) builds the container's file system layer by layer, combining resources.

A Docker image is a set of read-only files which has no state. A Docker Image contains source code, libraries, and other dependencies needed to run an application. A Docker container is the run-time instance of a Docker image. You can have many running containers of the same Docker image. A Docker image is like a recipe for a cake, and you can make as many cakes (Docker containers) as you wish.

Images can in turn be stored in registries such as Docker Hub. Overall, the system looks like this:


So a simplified version of the workflow of creating a container looks like this:

Step 1. Either create a new image using docker build or pull a copy of an existing image from a registry using docker pull. (Depending on the circumstances, this step is optional. See step 3.)

Step 2. Run a container based on the image using docker run or docker container create.

Step 3. The Docker daemon checks to see if it has a local copy of the image. If it does not, it pulls the image from the registry.

Step 4. The Docker daemon creates a container based on the image and, if docker run was used, logs into it and executes the requested command.

As you can see, if you are going to create a container-based deployment of the sample application, you are going to have to create an image. To do that, you need a Dockerfile.

What is a Dockerfile?

If you have used a compiled language such as C, you may be familiar with the concept of a "makefile." This is the file that the make utility uses to compile and build all the pieces of the application.

That is what a Dockerfile does for Docker. It is a simple text file, named Dockerfile. It defines the steps that the docker build command needs to take to create an image that can then be used to create the target container.

You can create a very simple Dockerfile that creates an Ubuntu container. Use the cat command to create a file named Dockerfile containing the single line FROM ubuntu:latest, and then enter Ctrl+D to save it in your current directory:

devasc@labvm:~$ cat > Dockerfile
FROM ubuntu:latest
<Ctrl+D>
devasc@labvm:~$ 

That is all it takes, just that one line. Now you can use the docker build command to build the image, as shown in the following example. The -t option is used to name (tag) the build. Notice the period (.) at the end of the command, which specifies that the current directory should be used as the build context. Use docker build --help to see all the available options.

devasc@labvm:~$ docker build -t myubuntu:latest .
Sending build context to Docker daemon  983.3MB
Step 1/1 : FROM ubuntu:latest
latest: Pulling from library/ubuntu
692c352adcf2: Pull complete 
97058a342707: Pull complete 
2821b8e766f4: Pull complete 
4e643cc37772: Pull complete 
Digest: sha256:55cd38b70425947db71112eb5dddfa3aa3e3ce307754a3df2269069d2278ce47
Status: Downloaded newer image for ubuntu:latest
 ---> adafef2e596e
Successfully built adafef2e596e
Successfully tagged myubuntu:latest
devasc@labvm:~$

Enter the command docker images to see your image in the list of images on the DEVASC VM:

devasc@labvm:~$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
myubuntu            latest              adafef2e596e        3 days ago          73.9MB
ubuntu              latest              adafef2e596e        3 days ago          73.9MB
devasc@labvm:~$

Now that you have the image, use the docker run command to run it. You are now in a shell INSIDE the Docker container you created. Change to the home directory and enter ls to see that it is empty and ready for use. Enter exit to leave the Docker container and return to your DEVASC VM main operating system.

devasc@labvm:~$ docker run -it myubuntu:latest /bin/sh
# ls
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
# cd home
# ls
# exit
devasc@labvm:~$

Anatomy of a Dockerfile

Of course, if all a Dockerfile could do was start a clean operating system, that would be useful, but what you really need is a way to start with a template and build from there.

Note: The steps shown in the rest of this topic are for instruction purposes only. Additional details that you would need to complete these commands in your DEVASC VM are not provided. However, you will complete similar steps in the lab Build a Sample Web App in a Docker Container later in the topic.

Consider the following Dockerfile that containerizes a Python app:

FROM python
WORKDIR /home/ubuntu
COPY ./sample-app.py /home/ubuntu/.
RUN pip install flask
CMD python /home/ubuntu/sample-app.py
EXPOSE 8080

The commands in the Dockerfile above work as follows:

  • The FROM command specifies the base image for the new image. Here it pulls the default python image from Docker Hub, a Debian Linux-based image with the latest version of Python installed.
  • The WORKDIR command tells Docker to use /home/ubuntu as the working directory.
  • The COPY command tells Docker to copy the sample-app.py file from the Dockerfile’s current directory into /home/ubuntu.
  • The RUN command allows you to run commands directly in the container during the build. In this example, it installs Flask, the web framework that serves your app as a web application.
  • The CMD command will start the server when you run the actual container. Here, you use the python command to run the sample-app.py inside the container.
  • The EXPOSE command tells Docker that you want to expose port 8080. Note that this is the port on which Flask is listening. If you have configured your web server to listen somewhere else (such as HTTPS requests on port 443), this is the place to note it.
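
The sample-app.py file itself is not shown in this topic. As a rough sketch, a minimal Flask app consistent with this Dockerfile might look like the following; the route and the response text are assumptions based on the curl output shown later, not the actual course file:

```python
# Hypothetical sketch of sample-app.py; the real file is not shown in this topic.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    # Report the address the request came from
    return "You are calling me from {}\n".format(request.remote_addr)

# When run by the Dockerfile's CMD (python /home/ubuntu/sample-app.py), the file
# would start the server on all interfaces, matching EXPOSE 8080:
#     app.run(host="0.0.0.0", port=8080)
```

Whatever the real file contains, it must listen on port 8080 for the EXPOSE line above to be meaningful.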

Use the docker build command to build the image. In the following output, the image was previously built. Therefore, Docker takes advantage of what is stored in cache to speed up the process.

$ docker build -t sample-app-image .
Sending build context to Docker daemon  3.072kB
Step 1/6 : FROM python
 ---> 0a3a95c81a2b
Step 2/6 : WORKDIR /home/ubuntu
 ---> Using cache
 ---> 17befcf89bab
Step 3/6 : COPY ./sample-app.py /home/ubuntu/.
 ---> Using cache
 ---> c0b3a4f9c568
Step 4/6 : RUN pip install flask
 ---> Using cache
 ---> 8cf8226c9f31
Step 5/6 : CMD python /home/ubuntu/sample-app.py
 ---> Running in 267c5d569356
Removing intermediate container 267c5d569356
 ---> 75cd4bf1d02a
Step 6/6 : EXPOSE 8080
 ---> Running in cc82eaca2028
Removing intermediate container cc82eaca2028
 ---> 9616439582f8
Successfully built 9616439582f8
Successfully tagged sample-app-image:latest
$

As you can see, Docker goes through each step in the Dockerfile, starting with the base image, Python. If this image does not exist on your system, Docker pulls it from the registry. The default registry is Docker Hub. However, in a secure environment, you might set up your own registry of trusted container images. Notice that the image is actually a number of different images layered on top of each other, just as you are layering your own commands on top of the base image.

Notice that between steps, such as executing a command, Docker actually creates a new container and builds an intermediate image, a new layer, by saving that container. In fact, you can do that yourself by creating a container, making the changes you want, and then saving that container as a new image.

In the previous example, only a small number of the available Dockerfile commands were used. The complete list is available in the Docker documentation in the Dockerfile reference. Currently a list of available commands looks like this:

  • FROM
  • MAINTAINER
  • RUN
  • CMD
  • EXPOSE
  • ENV
  • COPY
  • ENTRYPOINT
  • VOLUME
  • USER
  • WORKDIR
  • ARG
  • ONBUILD
  • STOPSIGNAL
  • LABEL

Enter the command docker images to view a list of images. Notice that there are actually two images now cached on the machine. One is the Python image, which you used as your base. Docker has stored it so that if you rebuild your image, you will not have to download it again.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sample-app-image    latest              7b1fd666ae4c        About an hour ago   410MB
python              latest              daddc1037fdf        2 days ago          410MB
$

Start a Docker Container Locally

Now that the image is created, use it to create a new container and actually do some work by entering the docker run command, as shown in the following output. In this case, several parameters are specified. The -d parameter is short for --detach and says you want to run the container in the background. The -P parameter tells Docker to publish the container's exposed ports (in this case, 8080) on randomly chosen external ports.

$ docker run -d -P sample-app-image
1688a2c34c9e7725c38e3d9262117f1124f54685841e97c3c5225af88e30bfc5
$ 

You can see the container by listing processes:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
90edd03a9511        sample-app-image    "/bin/sh -c 'python …"   5 seconds ago       Up 3 seconds        0.0.0.0:32774->8080/tcp   jovial_sammet
$

There are a few things to note here. Working backwards, notice that Docker has assigned the container a name, jovial_sammet. You could also have named it yourself with the --name option. For example:

docker run -d -P --name pythontest sample-app-image

Notice also that, even though the container is listening on port 8080, that is just an internal port. Docker has specified an external port, in this case 32774, that will forward to that internal port. This lets you run multiple containers that listen on the same port without having conflicts. If you want to pull up your sample app website, you can use the public IP address for the host server and that port. Alternatively, if you were to call it from the host machine itself, you would still use that externalized port, as shown with the following curl command.

$ curl localhost:32774
You are calling me from 172.17.0.1
$

Docker also lets you specify a particular port to forward, so that you can create a more predictable system:

$ docker run -d -p 8080:8080 --name pythontest sample-app-image
$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
a51da037bf35        sample-app-image    "/bin/sh -c 'python …"   28 seconds ago      Up 27 seconds       0.0.0.0:8080->8080/tcp    pythontest
90edd03a9511        sample-app-image    "/bin/sh -c 'python …"   24 minutes ago      Up 24 minutes       0.0.0.0:32774->8080/tcp   jovial_sammet
$

When your container is running, you can log into it just as you would log into any physical or virtual host, by using the docker exec command from the host on which the container is running:

$ docker exec -it pythontest /bin/sh
# whoami
root
# pwd
/var/www/html
# exit
$

To stop and remove a running container, you can call it by its name:

$ docker stop pythontest
pythontest
$ docker rm pythontest
pythontest
$

Now if you look at the running processes again, you can see that it is gone.

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
90edd03a9511        sample-app-image    "/bin/sh -c 'python …"   25 minutes ago      Up 25 minutes       0.0.0.0:32774->8080/tcp   jovial_sammet
$

Save a Docker Image to a Registry

Now that you know how to create and use your image, it is time to make it available for other people to use. One way to do this is by storing it in an image registry.

By default, Docker uses the Docker Hub registry, though you can create and use your own registry. You will need to start by logging in to the registry:

$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: devnetstudent # This would be your username
Password:               # This would be your password
WARNING! Your password will be stored unencrypted in /home/ubuntu/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
$

Next, you commit a running container instance of your image. For example, the pythontest container is running in this example. Commit the container with the docker commit command.

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
54c44606344c        sample-app-image    "/bin/sh -c 'python …"   4 seconds ago       Up 2 seconds        0.0.0.0:8080->8080/tcp    pythontest
$ docker commit pythontest sample-app
sha256:bddc326383032598a1c1c2916ce5a944849d90e4db0a34b139eb315af266e68b
$

Next, use the docker tag command to give the image you committed a tag. The tag takes the following form:

<repository>/<imagename>:<tag>

The first part, the repository, is usually the username of the account storing the image. In this example, it is devnetstudent. Next is the image name, and finally the optional tag. (Remember, if you do not specify a tag, it defaults to latest.)
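
To make that form concrete, here is a toy parser for such references. The helper name is an assumption, and it deliberately ignores complications such as registry hostnames with ports, which real references can include:

```python
def parse_image_ref(ref):
    """Split a '<repository>/<imagename>:<tag>' reference into its parts.

    A toy sketch: real references may also include a registry host and port,
    which this helper does not handle.
    """
    name, _, tag = ref.partition(":")
    repo, _, image = name.rpartition("/")
    # The repository and tag are both optional; the tag defaults to 'latest'.
    return repo or None, image, tag or "latest"
```

For example, parse_image_ref("devnetstudent/sample-app:v1") yields the repository devnetstudent, the image name sample-app, and the tag v1, while a bare name like ubuntu yields no repository and the default tag latest.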

In this example, the tag could be v1, as shown here:

$ docker tag sample-app devnetstudent/sample-app:v1
$

Now the image is ready to be pushed to the repository:

$ docker push devnetstudent/sample-app:v1
The push refers to repository [docker.io/nickchase/sample-app]
e842dba90a43: Pushed
868914f88a69: Pushed
c7d71f6230b3: Pushed
1ed9b15dd229: Pushed
00947a3aa859: Mounted from library/python
7290ddeeb6e8: Mounted from library/python
d3bfe2faf397: Mounted from library/python
cecea5b3282e: Mounted from library/python
9437609235f0: Mounted from library/python
bee1c15bf7e8: Mounted from library/python
423d63eb4a27: Mounted from library/python
7f9bf938b053: Mounted from library/python
f2b4f0674ba3: Mounted from library/python
v1: digest: sha256:28e119f43e9c8e5e44f167d9baf113cc91d4f8b461714cd6bb578ebb0654f243 size: 3052
$

From here you can see that the new image is stored locally:

$ docker images
REPOSITORY                 TAG                 IMAGE ID            CREATED              SIZE
sample-app                 latest              bddc32638303        About a minute ago   410MB
devnetstudent/sample-app   v1                  bddc32638303        About a minute ago   410MB
$

Create a Development Environment

As you may recall, there are four different environments in a typical workflow:

  • The Development environment
  • The Testing environment
  • The Staging environment
  • The Production environment

Start by creating the development environment.

The development environment is meant to be convenient to the developer; it only needs to match the production environment where it is relevant. For example, if the developer is working on functionality that has nothing to do with the database, the development environment does not need a replica of the production database, or any database at all.

A typical development environment can consist of any number of tools, from Integrated Development Environments (IDEs) such as Eclipse to databases to object storage. The important part here is that it has to be comfortable for the developer.

In this case, you are going to build a simple Python app with tools available from the basic command line, Bash. You can also use Bash to perform testing and deployment tasks, so start with a Bash refresher.

Continuous Integration/Continuous Deployment (CI/CD)


Introduction to CI/CD

Continuous Integration/Continuous Deployment (CI/CD) is a philosophy for software deployment that figures prominently in the field of DevOps. DevOps itself is about communication and making certain that all members of the team are working together to ensure smooth operation.

Continuous Integration

Have you ever done a lot of work on an application and when you tried to merge it back into the main application, there were many merge conflicts, any one of which carried the potential to introduce major bugs? Continuous Integration is intended to eliminate this problem.

The idea behind Continuous Integration is that you, and all other developers on the project, continually merge your changes with the main branch of the existing application. This means that any given change set is small and the potential for problems is low. If everyone is using the main branch, anyone who checks out code is going to have the latest version of what everyone else is developing.

As part of this process, developers are expected to perform extensive, and usually automated, testing on their code before merging back into the main branch. The idea is that this catches most issues before they become more serious problems.

The Continuous Integration process provides a number of additional benefits, as every commit provides an opportunity for the system to perform additional tasks. For example, the pipeline might be set up to perform these tasks:

  • Code compilation
  • Unit test execution
  • Static code analysis
  • Integration testing
  • Packaging and versioning
  • Publishing the version package to Docker Hub or other package repositories
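
Conceptually, a pipeline like the one above is an ordered list of steps that stops at the first failure. A minimal sketch, with hypothetical step names and each step standing in for a real command such as a compiler, test runner, or docker build:

```python
def run_pipeline(steps):
    """Run (name, step) pairs in order; a step returning False stops the pipeline."""
    for name, step in steps:
        if not step():
            return "failed at: " + name
    return "pipeline succeeded"

# Example: the unit-test step fails, so packaging never runs.
result = run_pipeline([
    ("compile", lambda: True),     # stand-in for a real build command
    ("unit-test", lambda: False),  # stand-in for a test runner
    ("package", lambda: True),
])
```

Real CI servers such as Jenkins add triggering, logging, and artifact handling on top, but the stop-at-first-failure ordering is the core idea.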

Note that there are some situations that involve large and complicated changes, such as the addition of new features, in which changes must be grouped together. In this case, every commit may trigger only part of the CI pipeline, with the packaging and versioning steps running only when the entire feature is merged to the master.

In some cases, adjusting to this way of working requires a change in thinking on the part of the organization, or on the part of individual developers who may be used to working in their own branch, or on feature branches. This change is necessary, however, if you are going to achieve the next step: Continuous Delivery, which is not quite the same as Continuous Deployment.

Continuous Delivery

Continuous Delivery is the process of developing in sprints that are short enough so that the code is always in a deployable state. With Continuous Integration, small change sets are continuously integrated into the main code branch. Continuous Delivery means that those changes are engineered to be self-contained to the point where at any given time, you could deploy a working application.

The process looks something like this:

Step 1. Start with the version artifact that was created as part of the Continuous Integration process.

Step 2. Automatically deploy the candidate version on staging.

Step 3. Run integration tests, security tests, performance tests, scale tests, or other tests identified by the team or organization. These are known as gating tests because they determine whether this version of the software can be promoted further in the deployment process.

Step 4. If all gating tests pass, tag this build as suitable for production.
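
The four steps above can be sketched as a small gate function. The names here are hypothetical, and each gating test stands in for a real integration, security, performance, or scale test:

```python
def gate(build, gating_tests):
    """Tag a staged build as production-ready only if every gating test passes."""
    if all(test(build) for test in gating_tests):
        return dict(build, tag="production-ready")
    return dict(build, tag="rejected")

# A candidate version artifact from the CI process, with two passing gating tests.
candidate = {"version": "1.4.2"}
passed = gate(candidate, [lambda b: True, lambda b: True])
```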

Note that Continuous Delivery does not mean you deploy constantly; that process is called Continuous Deployment. Continuous Delivery ensures that you always have a version that you can deploy.


This process tells us two things:

  • You must think about testing in advance. In the module on Software Development we discussed “Test Driven Development,” and the general idea is that you write automated test routines that can be run by the CI/CD infrastructure.
  • If something breaks, everything stops. The idea behind this concept is that if a bug is discovered, all other development stops until it has been fixed, returning the system to a deployable state. This might be accomplished through finding and fixing the bug, or it might be accomplished by rolling back changes until the error disappears, but the important part is that the system must stay deployable. In practice, most organizations do not actually follow this procedure, but it is the primary idea behind CI/CD.

Continuous Deployment

Continuous Deployment is the ultimate expression of CI/CD. When changes are made, tested, integrated with the main branch, and tested again, they are deployed to production using automation. This means that code is being deployed to production constantly, which means your users are going to be your final testers. In other words, Continuous Deployment is a special type of Continuous Delivery, in which every build that is marked as ready for production gets deployed.

Some organizations favor this type of deployment because it means that users always have the most up to date code. Most organizations take a more cautious approach that requires a human to push the code to production.

Preventing impact to users

Although we try to do extensive testing as part of the CI/CD process, there is always the possibility that a bad build will pass the gate. In order to avoid impacting users, or at least to limit the impact, you can use deployment strategies such as:

  • Rolling upgrade - This is the most straightforward version of Continuous Delivery, in which changes are periodically rolled out in such a way that they do not impact current users, and nobody should have to "reinstall" the software.
  • Canary pipeline - In this case, the new version is rolled out to a subset of users (or servers, depending on the architecture). If these users experience problems, the changes can be easily rolled back. If these users do not experience problems, the changes are rolled out to the rest of production.
  • Blue-green deployment - In this case, an entirely new environment (Blue) is created with the new code on it, but the old environment (Green) is held in reserve. If users on the new environment experience problems, traffic can be diverted back to the original environment. If there are no problems within a specific amount of time, the new environment becomes the production environment and the old one is retired to be used for the next change.
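
To make the canary idea concrete, here is a toy routing function. The names are assumptions; real canary systems usually pin whole user or server cohorts to a version rather than deciding per request:

```python
def canary_router(new_fraction, roll):
    """Send a request to the 'new' version with probability new_fraction.

    `roll` is a number in [0, 1); in production it might come from a random
    source or from a hash of the user ID, so that each user stays pinned to
    one version for the whole rollout.
    """
    return "new" if roll < new_fraction else "old"
```

With new_fraction at 0.1, roughly one request in ten reaches the new version; if those users hit problems, the fraction can be dropped back to zero, which is the easy rollback the canary strategy promises.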

CI/CD Benefits

Companies are willing to make such a seemingly “drastic” change in their processes because of the benefits that come with using CI/CD for development. These benefits include:

  • Integration with agile methodologies - Agile development is built around the idea of short sprints, after which the developer team delivers a functional application with some subset of the required features. CI/CD works within that same short sprint framework. Every commit is a version of the “deliver a working version of the software” concept.
  • Shorter Mean Time To Resolution (MTTR) - Because change sets are small, it becomes much easier to isolate faults when they do occur, and to either fix them or roll them back and resolve any issues.
  • Automated deployment - With automated testing and predictable deployment comes the ability to do automated deployments. This means it is possible to use deployment strategies such as canary release pipeline deployment, in which one set of users gets the new feature set and the rest gets the old. This process enables you to get live testing of the new feature to ensure it is functioning as expected before rolling it out to the entire user base.
  • Less disruptive feature releases - With development proceeding in small chunks that always result in a deployable artifact, it is possible to present users with incremental changes rather than large-scale changes that can be disorienting to users.
  • Improved quality - All of these benefits add up to higher quality software because it has been thoroughly tested before wide scale adoption. And because error resolution is easier, it is more likely to be handled in a timely manner rather than accruing technical debt.
  • Improved time to market - Because features can be rolled out individually, they can be offered to users much more quickly than if they had to be deployed all at the same time.

Example Build Job for Jenkins

Note: The steps shown in the rest of this topic are for instruction purposes only. Additional details that you would need to complete these commands in your DEVASC VM are not provided. However, you will complete similar steps in the lab Build a CI/CD Pipeline Using Jenkins later in the topic.

In this part we show a deployment pipeline, which is normally created with a build tool such as Jenkins. These pipelines can handle tasks such as gathering and compiling source code, testing, and compiling artifacts such as tar files or other packages. All these examples show screenshots from an existing Jenkins server.

The fundamental unit of Jenkins is the project, also known as the job. You can create jobs that do all sorts of things, from retrieving code from a source code management repo such as GitHub, to building an application using a script or build tool, to packaging it up and running it on a server.

Here is a simple job that retrieves a version of the sample application from GitHub and runs the build script. Then you create a second job that tests the build to ensure that it is working properly.

First, create a New Item in the Jenkins interface by clicking the "create new jobs" link on the welcome page:


Enter a name, choose Freestyle project (so that you have the most flexibility) and click OK.


Scroll down to Source Code Management and select Git, then enter a GitHub repository URL for the Repository URL. Typically this is a repository to which you have write access, so that build scripts can be merged into it. It is best to store build scripts within the repository itself and keep them under version control.


Now scroll down to Build and click Add Build Step. Choose Execute shell.


In the Command box, add the command:

./buildscript.sh

This script is intended to be part of a repo, and is downloaded as a first step. From here you could use a post-build action to run another job, but you would have to create it first. Click Save to move on.


On the left-hand side, click Build Now to start the job.


You can see the job running in the left column. Move your mouse over the build number to get a pulldown menu that includes a link to the Console Output.


The console is where you can see all of the output from the build job. This is a great way to see what is happening if the build shows up with a red circle, indicating that the build has failed.


Click the Jenkins link and New Item to start a new job, then create another Freestyle job, this time called TestAppJob. This time, leave Source Code Management as None because you have already done all that in the previous job. But you have the option to set a Build Trigger so that this job runs right after the previous job, BuildAppJob.


Next scroll down and once again add a Build Step of Execute shell.

Add the following script as the command, using the address at which the sample app is reachable from the Jenkins server. (In this case it is set up locally, so the script uses localhost.)

if [ "$(curl localhost:8000/test)" = "You are calling me from 172.17.0.1" ]; then
   exit 0
else
   exit 1
fi

Taking this example step by step, first, check to see if a condition is true. That condition is whether the output of calling the test URL gives you back text that says "You are calling me from" followed by your IP address. Remember, the routine is running on the Jenkins server, so that is the IP address that you want in your script.

If that comes back correctly, exit with a code of 0, which means that there are no errors. This means the build script was successful. If it does not come back correctly, exit with a value of 1, which signals an error.

Next, you will look at the console output to see the results.

Now, put these jobs together into a pipeline.

Example Pipeline in Jenkins

Now that you have seen how these two jobs are built, look at how to build an actual pipeline. This example shows a third New Item, and this time Pipeline is the selected type.


You will notice that you have a number of different ways to trigger a pipeline, but in this case, just trigger it manually.



Scroll down to the Pipeline section.

This code can go into the script box:

node {
   stage('Preparation') {
       catchError(buildResult: 'SUCCESS') {
          sh 'sudo docker stop runningsample'
          sh 'sudo docker rm runningsample'
       }
   }
   stage('Build') {
       build 'BuildAppJob'
   }
   stage('Results') {
       build 'TestAppJob'
   }
}

Look at this one step at a time. First, you are executing this pipeline on a single node. The pipeline itself has three stages: Preparation, Build, and Results. These stages run sequentially; if an early stage fails, the pipeline stops.

Use the Preparation stage to stop and remove the container if it is already running. If there is not yet a running container, these commands return an error, so the script catches any errors and returns a “SUCCESS” value. This way, the pipeline continues on.

In the next stage (the Build stage), you simply call the BuildAppJob. If it succeeds, the pipeline moves on to call the TestAppJob. The results of TestAppJob determine whether the pipeline itself succeeds or fails. Click Save and then Build Now to run the pipeline.


Each time you build the pipeline, you will see the success or failure of each stage. If you do have a failure, you can easily see what is happening by running your mouse over the offending stage:


To make changes, click the Configure link, fix any problems, and save and build again.
Now, all of those build jobs and pipeline pieces have come together as CI/CD components in a line to get a set of jobs or scripts done. While it is a simple example, it can come in handy when you want to inspect other's build jobs or start building your own CI/CD pipeline.

Project Activity 4: Automated Software Testing and Deployment

In this activity, you will complete the following tasks:

  • Design an automated testing process.
  • Implement your automated testing process.

Refer to the DEVASC Project Rubric below for this activity to record your process and outcomes.

Scenario

The marketing department has received very favorable reviews and feedback about your application. Your manager has acquired funding and resources for your team to add some of the backlog features to your application.

As your team continues working on the code, each team member contributes changes to your shared application codebase on GitHub. Although you try to do manual code reviews for each commit, or pull request, sometimes a team member’s change breaks some functionality in the application. Many times, such issues are identified only when the application is deployed and running. This usually leads to long troubleshooting sessions and slows down development.

To eliminate the issues above and ease development, your team has been asked to use some of the best practices from the DevOps methodology. Design a testing process that is triggered automatically by GitHub whenever there is a new commit to the codebase. Only if all the test cases pass can the committed change be merged or used further. Your manager has recommended that, in addition to Jenkins, you explore other CI/CD tools, such as GitHub Actions and CircleCI, that might help you build this automated CI/CD pipeline.

Deliverable/Rubric:

  • Which features did your team choose to implement from your backlog?
  • What are the specific objectives of these features?
  • Why were these features chosen?
  • Document your team member roles, knowledge and skillsets, if anything has changed in relation to this new Project Activity.
  • Provide a brief description of your team strategy for completing this project if anything has changed in relation to this new Project Activity.

Final Deliverables

At this point, your project has been completed. Your team will present to your manager:

  • Presentation
  • Team activities and reflection
Presentation

Deliverable/Rubric: Create a presentation about the project you selected. Your presentation should include:

  • Application code with new features
  • Show changes pushed to GitHub
  • Provide test cases used
  • Describe CI/CD pipeline functionality
  • Reflection points – what issues have you faced while working on this activity, how did you find solutions, what have you learned, etc.
Team Activities and Reflection

Deliverable/Rubric: Your manager is interested in knowing how everyone is continuing to work together as a team. Here is a list of questions from your manager:

  • What did you enjoy about working as a team? What worked well?
  • What team problems did you encounter and how did you resolve them?
  • What technical problems did you encounter and how did you resolve them?
  • How was each team member held accountable individually and for the team as a whole?
  • What was your team's decision-making process?
  • Overall, how were the team dynamics and what were any lessons learned?

Networks for Application Development and Security

Introduction

These days, you must take networking into account for all but the simplest of use cases. This is especially true when it comes to cloud and container deployments. Here are some of the network components you need to consider when it comes to cloud deployment:

  • Firewalls
  • Load balancers
  • DNS
  • Reverse proxies

Firewall

Firewalls are a computer’s most basic defense against unauthorized access by individuals or applications. They can take any number of forms, from a dedicated hardware device to a setting within an individual computer’s operating system.

At its most basic level, a firewall accepts or rejects packets based on the IP addresses and ports to which they're addressed. For example, consider a web server. This server has on it the actual web server software, as well as the application that represents the site and the database that contains the content the site displays. Without a firewall, the server could be accessed in multiple ways:

  • A web browser can access the web application with an HTTP request to port 80 or an HTTPS request to port 443
  • A database client can access the database with a UDP request to port 5000
  • An SSH client can log into the server itself with a TCP request to port 22

But is that really what you want? You definitely want to access the web application, though perhaps you only want HTTPS requests. You definitely do NOT want anyone to access the database directly; in this case, only the web application really needs that access. You might want to be able to log in to the server using an SSH client, rather than having to have physical access to the machine.

To accomplish this, set up a firewall with specific “rules”, which are layered on top of each other. For example, you may have these rules:

  • Deny all access to anyone, with these rules...
    • Allow TCP requests to port 443 from anyone
    • Block all TCP requests to port 22
    • Block all TCP requests to port 80

(Both HTTPS and SSH use TCP requests.)


So in this case, the only access to the machine would be those HTTPS requests. Anything else would be blocked by the firewall.

In some cases, you do want to enable access, but not from just anybody. For example, you might set up your systems so that logins to sensitive systems can only come from a single machine, called a “jump box”. Everyone must log into that server first, and then log into the target machine from there. A jump box can be used to provide additional access while still maintaining a layer of security.


For example, if your jump box had an internal IP address of 172.0.30.42, your firewall rules might look like this:

  • Deny all access to anyone, except...
    • Allow TCP requests to port 443 from anyone
    • Allow TCP requests to port 22 from 172.0.30.42 only
    • Allow TCP requests to port 5000 from 172.0.30.42 only

With software deployments, here are some things to consider when it comes to firewalls:

  • Firewalls should prevent any outside access to the untested application.
  • Firewalls need to be configured so that the application can be appropriately tested. For example, if the application needs to access a development version of a database, the firewall rules will need to allow that.
  • The environment should be as close a replica of production as possible in order to catch any firewall-related configuration issues quickly.

Note that firewalls do not just keep traffic from coming in; they can also be configured to keep traffic from getting out. For example, schools often have firewalls set up that keep students from accessing all but a small handful of educational sites using the school network.

Load Balancer

A load balancer does exactly what it says; it takes requests and “balances” them by spreading them out among multiple servers. For example, if you have 10 servers hosting your web application, requests will come first to the load balancer, which will then parcel them out among those 10 hosts.


Load balancers can make their decisions on which servers should get a particular request in a few different ways:

Persistent sessions - If an application requires a persistent session, for example, a user needs to be logged in, the load balancer will send requests to the server handling that session.


Round robin - With round robin load balancing, the load balancer simply sends each request to the “next” server on the list.


Least connections - Often it makes sense to send requests to the server that is the least “busy”, that is, the one with the fewest active connections. In the figure, Server 3 receives the first request because it is currently not handling any transactions. Server 3 receives the second request because it still has the fewest active transactions. Server 1 and Server 3 both now have two active transactions, so the load balancer will now use the round robin method. Server 1 receives the third request, Server 3 the fourth request, and Server 1 the fifth request.


IP Hash - With this algorithm, the load balancer makes a decision based on a hash (an encoded value based on the IP address of the request). You can think of this as similar to when you attend an event and lines are formed for different stations based on the first letter of your last name. This is also a simple way to maintain consistent sessions.
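Two of these strategies, round robin and least connections, can be sketched in a few lines of Python. This is a toy model (server names and connection counts are invented); real load balancers also account for health checks, weights, and timeouts:

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]

# Round robin: hand requests to the "next" server in a fixed rotation.
rotation = cycle(servers)
def round_robin():
    return next(rotation)

# Least connections: pick the server with the fewest active transactions.
active = {"server1": 2, "server2": 4, "server3": 0}
def least_connections():
    return min(active, key=active.get)

print(round_robin())        # server1
print(round_robin())        # server2
print(least_connections())  # server3
```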


Other, more complicated algorithms can be used for deployment purposes. Some of these examples include:

  1. Blue-green deployment - Recall that this kind of deployment applies changes to a new production environment (blue) rather than making the changes on the existing production environment (green). A load balancer sends traffic to the blue environment when it is ready, and if issues arise, the load balancer can send traffic back to the green environment and changes can be rolled back.
  2. Canary deployment - This deployment starts by diverting a small fraction of your traffic to the blue environment. A load balancer can then increase the amount of traffic diverted to the blue environment until either issues are detected and traffic is sent back to the old environment, or all servers and users are on the new environment and the old one is retired or used for the next push.
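A canary rollout can be modeled as weighted routing: the load balancer sends a configurable fraction of requests to the new (blue) environment and raises that fraction as confidence grows. A toy sketch (the environment names and 5% starting fraction are illustrative):

```python
import random

def route(canary_fraction):
    """Send roughly canary_fraction of requests to the new (blue) environment."""
    return "blue" if random.random() < canary_fraction else "green"

random.seed(1)  # deterministic for the example
sample = [route(0.05) for _ in range(1000)]   # start the canary at 5%
print(sample.count("blue"))  # roughly 50 of the 1000 requests go to blue
```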

DNS

DNS, or the Domain Name System, is how servers on the internet translate human-readable names (such as developer.cisco.com or www.example.com) into machine-routable IP addresses such as 74.125.157.99 (for Google) or 208.80.152.201 (for Wikipedia). These IP addresses are necessary to actually navigate the internet.

In software deployment, this system is beneficial because you can change the meaning of these addresses. In this example, the application is coded to look for the database at database.example.com:5000, which lives at the IP address of 203.0.113.25.

In another example, you might create a development version of the application, and you would want it to hit a development version of the database, which lives at 172.24.18.36.

You can set the development machine to use a DNS server that lists database.example.com as 172.24.18.36. You can test the application against the test database without actually making any changes to the application.

Another way to use DNS as part of software deployment is to emulate some of the functions that might be performed by a load balancer. Do this by changing the IP address that the target server’s name resolves to when you are ready to go “live”. (This is not necessarily a good option, because DNS changes can take a day or more to propagate through the internet at large.)
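The development-versus-production trick above amounts to resolving the same name to different addresses depending on environment. A sketch, where an invented lookup table stands in for the real DNS servers (the IP addresses are the ones from the example above):

```python
# Stand-in lookup tables; in practice the development machines would simply
# point at a different DNS server (or use /etc/hosts entries).
DNS = {
    "production":  {"database.example.com": "203.0.113.25"},
    "development": {"database.example.com": "172.24.18.36"},
}

def resolve(name, environment):
    return DNS[environment][name]

# The application asks for the same name either way; only the answer changes.
print(resolve("database.example.com", "production"))   # 203.0.113.25
print(resolve("database.example.com", "development"))  # 172.24.18.36
```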

Reverse Proxy

A reverse proxy is similar to a regular proxy, but in the opposite direction: while a regular (forward) proxy makes requests from multiple computers look like they all come from the same machine, a reverse proxy makes responses look like they all come from the same server.

Here is an example of a forward proxy:


All requests coming from the internal network go through the proxy server, so it is impossible to know what is behind the proxy. Often this method is used to track or limit access, as in a school environment.

In the case of a reverse proxy, the situation is similar:


All requests to the network come to the proxy, where they are evaluated and sent to the appropriate internal server for processing. Like a forward proxy, a reverse proxy can evaluate traffic and act accordingly. In this way, it is similar to, and can be used as, a firewall or a load balancer.

Because it is so much like these functions, a reverse proxy can also be used for software deployment in similar ways.

Securing Applications


Securing the Data

It is no secret that security is a major issue in today’s world. That applies to both data and applications. If one is secure and the other is not, both are vulnerable.

In this part, you will look at some of the issues involved in securing both your data and your application, starting with data.

Data is not just the heart of your application; it is often called the new priceless resource, and it must be protected, both for practical and legal reasons. That applies whether data is being stored (also known as data at rest) or transferred from one server to another (also known as data in flight or in motion).

Best practices for storing encrypted data

When it comes to protecting data at rest, there are a few things you need to take into consideration.

Encrypting data

You have probably seen plenty of news stories about data breaches. These are typically a matter of individuals accessing data that is stored but not protected. In this context, this means that data is stored in such a way that not only can an individual gain access, but that when they do, the data is readily visible and usable.

Ideally, unauthorized persons or applications will never gain access to your systems, but obviously you cannot guarantee that. So when a person with bad intentions (who could just as easily be a disgruntled employee who has legitimate access) gets access to your database, you do not want them to see something like this:


Instead, you want them to see something more like this:


There are two methods for encrypting data: one-way encryption, and two-way encryption.

Two-way encryption is literally what it sounds like; you encrypt the data using a key, and then you can use that key (or a variation on it) to decrypt the data to get it back in plaintext. You would use this for information you would need to access in its original form, such as medical records or social security numbers.

One-way encryption is simpler, in that you can easily create an encrypted value without necessarily using a specific key, but you cannot decrypt it. You would use this for information you do not need to retrieve, only compare, such as passwords. For example, let’s say you have a user, bob, who has a password of munich. You could store the data as:

username        scrambled_password
bob             @#$%SD@$$drw

In this case, scrambling munich gives you @#$%SD@$$drw, but there is no way to get “munich” back from that. To check Bob’s password when he logs back in, you would need to do something like this:

select * from users where username = 'bob' and scrambled_password = scrambledversion('munich')

This would evaluate to:

select * from users where username = 'bob' and scrambled_password = '@#$%SD@$$drw'
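In real code, the invented scrambledversion() function would be a salted, slow one-way hash. A minimal sketch using Python’s standard library (function and variable names are illustrative):

```python
import hashlib
import os

def scramble(password, salt):
    # PBKDF2 applies SHA-256 many times, slowing down brute-force guessing.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()

salt = os.urandom(16)            # store the salt alongside the hash
stored = scramble("munich", salt)

# At login, re-scramble the submitted password and compare the results.
print(scramble("munich", salt) == stored)  # True
print(scramble("berlin", salt) == stored)  # False
```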

Of course, the question then becomes, if you are going to encrypt your data using a key, where do you store the key safely? You have a number of different options, from specialized hardware (good, but difficult and expensive), to using a key management service such as Amazon Key Management Service (uses specialty hardware but is easier and less expensive), to storing it in the database itself (which is not best practice, has no specialty hardware or physical characteristics, and is vulnerable to attack).

Software vulnerabilities

When it comes to software vulnerabilities, you need to worry about two different types: your own, and everyone else’s.

Most developers are not experts in security, so it is not uncommon to inadvertently code security vulnerabilities into your application. For this reason, there are a number of different code scanning tools, such as Bandit, Brakeman, and VisualCodeGrepper, that will scan your code looking for well-known issues. These issues may be embedded in code you have written yourself, or they may involve the use of other libraries.

These other libraries are how you end up with everyone else’s vulnerabilities. Even software that has been in use for decades may have issues, such as the Heartbleed bug discovered in OpenSSL, the software that forms the basis of much of the internet. The software has been around since 1998, but the bug was introduced in 2012 and sat, undetected, for two years before it was found and patched.

Make sure that someone in your organization is responsible for keeping up with the latest vulnerabilities and patching them as appropriate.

Storing too much data

Remember that hackers cannot get what you do not store. For example, if you only need a credit card authorization code for recurring billing, there is no reason to store the entire credit card number. This is particularly important when it comes to personally identifying information such as social security numbers and birthdays, and other information that could be considered “private”, such as a user’s history.

Unless you need data for an essential function, do not store it.

Storing data in the cloud

Remember that when you store data in “the cloud” you are, by definition, storing it on someone else’s computer. While in many cases a cloud vendor’s security may be better than that of most enterprises, you still have the issue that those servers are completely outside of your control. You do not know which employees are accessing them, or even what happens to hard drives that are decommissioned. This is particularly true when using SSDs as storage, because the architectural structure of an SSD makes it difficult or impossible to truly wipe every sector. Make sure that your cloud data is encrypted or otherwise protected.

Roaming devices

In May of 2006, the United States Department of Veterans Affairs lost a laptop that contained a database of personal information on 26.5 million veterans and service members. The laptop was eventually recovered, but it is still a great example of why information must be stored in a secure way, particularly because the world’s workforce is much more mobile now than it was in 2006.

In addition, apps are increasingly on devices that are even more portable than laptops, such as tablets and especially mobile phones, which are simply easier to lose. These might not even be traditional apps such as databases, but apps targeted at the end user. Avoid leaving your data vulnerable by encrypting it whenever possible.

Best practices for transporting data

Data is also vulnerable when it is being transmitted. In fact, it may be even more vulnerable because of the way the internet is designed, where packets pass through multiple servers (that may or may not belong to you) on their way to their final destination.

This structure makes your data vulnerable to “man in the middle” attacks, in which a server along the way can observe, steal, and even change the data as it goes by. To prevent these problems you can use:

  • SSH - When connecting to your servers, always use a secure protocol such as SSH, or secure shell, rather than an insecure protocol such as Telnet. SSH provides for authentication and encryption of messages between the source and target machines, making it difficult or impossible to snoop on your actions.
  • TLS - These days, the vast majority of requests to and from the browser use the https:// protocol (rather than http://). This protocol was originally secured with SSL, or Secure Sockets Layer, but over the years SSL has been gradually replaced with TLS, or Transport Layer Security. TLS provides message authentication and stronger ciphers than its predecessor. Whenever possible you should be using TLS.
  • VPN - A Virtual Private Network is perhaps the most important means for protecting your application. A VPN makes it possible to keep all application-related traffic inside your network, even when working with remote employees. The remote employee connects to a VPN server, which then acts as a proxy and encrypts all traffic to and from the user.

Using a VPN has several benefits. First, traffic to and from the user is not vulnerable to snooping or manipulation, so nobody can use that connection to damage your application or network. Second, because the user is essentially inside the private network, you can restrict access to development and deployment resources, as well as resources that do not need to be accessible to end users, such as raw databases.

What is SQL Injection?

SQL injection is a code injection technique that is used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution (e.g. to dump the database contents to the attacker). SQL injection must exploit a security vulnerability in an application's software. Two examples are when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements, or user input is not strongly typed and unexpectedly executed. SQL injection is mostly known as an attack vector for websites but can be used to attack any type of SQL database.

SQL injection attacks allow attackers to spoof identity, tamper with existing data, cause repudiation issues such as voiding transactions or changing balances, allow the complete disclosure of all data on the system, destroy the data or make it otherwise unavailable, and become administrators of the database server.

SQL in Web Pages

SQL injection is one of the most common web hacking techniques. It is the placement of malicious code in SQL statements, via web page input. It usually occurs when you ask a user for input, like their username/userid, and instead of a name/id, the user gives you an SQL statement that you will unknowingly run on your database.

Look at the following example, which creates a SELECT statement by adding a variable (uid) to a select string. The variable is fetched from user input using request.args.get("uid"):

uid = request.args.get("uid")
str_sql = "SELECT * FROM Users WHERE UserId = " + uid

One example is SQL injection based on the fact that 1=1 is always true (in SQL-speak).

Take a look at the code above, which creates an SQL statement to select a user profile with a given UID.

If there is not an input validator to prevent a user from entering "wrong" input, the user can enter input like this:

UID:

2019 OR 1=1

The output SQL statement will be like this:

SELECT * FROM UserProfiles WHERE UID = 2019 OR 1=1;

The SQL statement above is valid, but will return all rows from the "UserProfiles" table, because OR 1=1 is always TRUE.

What will happen if the "UserProfiles" table contains names, emails, addresses, and passwords?

The SQL statement will be like this:

SELECT UID, Address, Email, Name, Password FROM UserProfiles WHERE UID = 2019 or 1=1;

A malware creator or hacker might get access to all user profiles in database, by simply typing 2019 OR 1=1 into the input field.
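You can reproduce the 2019 OR 1=1 effect with Python’s built-in sqlite3 module. The table and rows below are invented for the demonstration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE UserProfiles (UID INTEGER, Name TEXT)")
db.executemany("INSERT INTO UserProfiles VALUES (?, ?)",
               [(2019, "alice"), (2020, "bob"), (2021, "carol")])

uid = "2019 OR 1=1"  # malicious user input
rows = db.execute("SELECT * FROM UserProfiles WHERE UID = " + uid).fetchall()
print(len(rows))  # 3 -- every row comes back, not just UID 2019
```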

Another example is SQL injection based on the fact that ""="" is always true. Suppose a legitimate user logs in like this:

Username: user_a Password: pass_user_a

Example:

u_name = request.args.get("username")
u_pass = request.args.get("password")
sql = 'SELECT * FROM UserProfiles WHERE Name ="' + u_name + '" AND Pass ="' + u_pass + '"'

Here is the expected SQL statement:

SELECT * FROM UserProfiles WHERE Name ="user_a" AND Pass ="pass_user_a"

But the hacker can get access to user names and passwords in a database by simply inserting " OR ""=" into the user name or password text box:

User Name:
" OR ""="
Password:
" OR ""="

The output code will create a valid SQL statement at server side, like this:

Output:

SELECT * FROM UserProfiles WHERE Name ="" OR ""="" AND Pass ="" OR ""=""

The SQL above is valid and will return all rows from the "UserProfiles" table, because OR ""="" is always TRUE.

SQL Injection based on batched SQL statements

Most databases support batched SQL statements. A batch of SQL statements is a group of two or more SQL statements, separated by semicolons.

The SQL statement below will return all rows from the "UserProfiles" table, then delete the "UserImages" table.

Example:

SELECT * FROM UserProfiles; DROP TABLE UserImages

Look at the following example:

uid = request.args.get("uid")
str_sql = "SELECT * FROM UserProfiles WHERE UID = " + uid

Now you input the UID:

2019; DROP TABLE UserImages

The SQL statement would look like this:

SQL:

SELECT * FROM UserProfiles WHERE UID = 2019; DROP TABLE UserImages;
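As an aside, some database APIs mitigate this particular attack by refusing to run batched statements in a single call. Python’s sqlite3 module is one example (the exact exception type varies by Python version; the table is invented for the demonstration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE UserImages (UID INTEGER)")

rejected = False
try:
    # The injected batch: a query followed by a destructive second statement.
    db.execute("SELECT * FROM UserImages; DROP TABLE UserImages")
except (sqlite3.Warning, sqlite3.ProgrammingError) as exc:
    rejected = True
    print("rejected:", exc)

print(db.execute("SELECT * FROM UserImages").fetchall())  # [] -- table survived
```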

Hopefully these examples influence you to design your data intake forms to avoid these common security hacks.

How to Detect and Prevent SQL Injection

SQL injection vulnerabilities exist because some developers neglect data validation and security. There are tools that can help detect flaws and analyze code.

Open source tools

To make detecting a SQL injection attack easy, developers have created good detection engines. Some examples are SQLmap or SQLninja.

Source code analysis tools

Source code analysis tools, also referred to as Static Application Security Testing (SAST) Tools, are designed to analyze source code and/or compiled versions of code to help find security flaws.

These tools can automatically find flaws such as buffer overflows, SQL Injection flaws, and others.

You can detect and prevent SQL injection by using a database firewall. Database Firewalls are a type of Web Application Firewall that monitor databases to identify and protect against database specific attacks. These attacks mostly seek to access sensitive information stored in the databases.

Work with a database firewall

SQL injection filtering works in a way similar to email spam filters. Database firewalls detect SQL injection based on signals such as the number of invalid queries from a host, the presence of OR and UNION blocks inside a request, and other suspicious patterns.

Use prepared statements

The use of prepared statements with variable binding (also known as parameterized queries) is how all developers should first be taught how to write database queries. They are simple to write, and easier to understand than dynamic queries. Parameterized queries force the developer to first define all the SQL code, and then pass in each parameter to the query later. This coding style allows the database to distinguish between code and data, regardless of what user input is supplied.

Prepared statements ensure that an attacker is not able to change the intent of a query, even if SQL commands are inserted by an attacker. In the safe example below, if an attacker were to enter the userID of tom' or '1'='1, the parameterized query would not be vulnerable and would instead look for a username which literally matched the entire string tom' or '1'='1.

// Get customer's name from parameter
String              custname = request.getParameter("custname");
// Perform input validation to detect attacks
String              query = "SELECT account_balance FROM user_data WHERE user_name = ? ";
PreparedStatement   pStatement = connection.prepareStatement( query );
pStatement.setString( 1, custname);
ResultSet           results = pStatement.executeQuery();
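The same defense looks similar in Python, here sketched with sqlite3’s ? placeholders (the table and data are invented): the injected string is bound as a literal value and can never change the query’s structure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_data (user_name TEXT, account_balance REAL)")
db.execute("INSERT INTO user_data VALUES ('tom', 100.0)")

custname = "tom' or '1'='1"  # attacker-supplied input
# The driver binds the value; it cannot change the structure of the query.
rows = db.execute("SELECT account_balance FROM user_data WHERE user_name = ?",
                  (custname,)).fetchall()
print(rows)  # [] -- no user is literally named tom' or '1'='1

# A legitimate lookup still works as expected.
print(db.execute("SELECT account_balance FROM user_data WHERE user_name = ?",
                 ("tom",)).fetchall())  # [(100.0,)]
```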

Use Stored Procedures

Stored procedures are not always safe from SQL injection. However, certain standard stored procedure programming constructs have the same effect as the use of parameterized queries when implemented safely. This is the norm for most stored procedure languages.

They require the developer to just build SQL statements with parameters, which are automatically parameterized unless the developer does something largely out of the norm. The difference between prepared statements and stored procedures is that the SQL code for a stored procedure is defined and stored in the database itself, and then called from the application. Both of these techniques have the same effectiveness in preventing SQL injection so your organization should choose which approach makes the most sense for you.

Note: 'Implemented safely' means the stored procedure does not include any unsafe dynamic SQL generation. Developers do not usually generate dynamic SQL inside stored procedures. However, it can be done, but should be avoided. If it cannot be avoided, the stored procedure must use input validation or proper escaping. This is to make sure that all user-supplied input to the stored procedure cannot be used to inject SQL code into the dynamically generated query. Auditors should always look for uses of sp_execute, execute, or exec within SQL Server stored procedures. Similar audit guidelines are necessary for similar functions for other vendors.

There are also several cases where stored procedures can increase risk. For example, on MS SQL Server, you have three main default roles: db_datareader, db_datawriter, and db_owner. Before stored procedures came into use, DBAs would give db_datareader or db_datawriter rights to the web service's user, depending on the requirements. However, stored procedures require execute rights, a role that is not available by default. Some setups where the user management has been centralized, but is limited to those three roles, cause all web apps to run under db_owner rights so that stored procedures can work. That means that if a server is breached, the attacker has full rights to the database, where previously they might only have had read access.

Whitelist Input Validation

Various parts of SQL queries are not legal locations for the use of bind variables, such as the names of tables or columns, and the sort order indicator (ASC or DESC). In such situations, input validation or query redesign is the most appropriate defense. For the names of tables or columns, ideally those values come from the code, and not from user parameters.

But if user parameter values are used for targeting different table names and column names, then the parameter values should be mapped to the legal/expected table or column names to make sure unvalidated user input does not end up in the query. Please note, this is a symptom of poor design and a full re-write should be considered.

Example of table name validation

String tableName;
switch(PARAM) {
  case "Value1": tableName = "fooTable";
                 break;
  case "Value2": tableName = "barTable";
                 break;
  ...
  default      : throw new InputValidationException("unexpected value provided"
                                                  + " for table name");
}

The tableName can then be directly appended to the SQL query because it is now known to be one of the legal and expected values for a table name in this query. Keep in mind that generic table validation functions can lead to data loss because table names are used in queries where they are not expected.

For something simple like a sort order, it would be best if the user supplied input is converted to a boolean, and then that boolean is used to select the safe value to append to the query. This is a standard need in dynamic query creation.

For example:

public String someMethod(boolean sortOrder) {
 String SQLquery = "some SQL ... order by Salary " + (sortOrder ? "ASC" : "DESC");
 ...
}

Any user input can be converted to a non-string (such as a date, numeric, boolean, or enumerated type). If you convert before the input is appended to a query, or used to select a value to append to the query, this conversion ensures you can append to a query safely.

Input validation is also recommended as a secondary defense in ALL cases. More techniques for implementing strong whitelist input validation are described in the Open Web Application Security Project (OWASP) Input Validation Cheat Sheet.

Escaping all user-supplied input

This technique should only be used as a last resort, when none of the above are feasible. Input validation is probably a better choice as this methodology is frail compared to other defenses and we cannot guarantee it will prevent all SQL Injection in all situations.

This technique is to escape user input before putting it in a query. Its implementation is very database-specific. It is usually only recommended to retrofit legacy code when implementing input validation is not cost effective. Applications built from scratch, or applications requiring low risk tolerance should be built or re-written using parameterized queries, stored procedures, or some kind of Object Relational Mapper (ORM) that builds your queries for you.

Escaping works like this: each DBMS supports one or more character escaping schemes specific to certain kinds of queries. If you escape all user-supplied input using the proper escaping scheme for the database you are using, the DBMS will not confuse that input with SQL code written by the developer, thus avoiding any possible SQL injection vulnerabilities.

There are some libraries and tools that can be used for Input Escaping. For example, OWASP Enterprise Security API or ESAPI is a free, open source, web application security control library that makes it easier for programmers to write lower-risk applications.

The ESAPI libraries are designed to make it easier for programmers to retrofit security into existing applications and serve as a solid foundation for new development.

Additional defenses

Beyond adopting one of the four primary defenses, we also recommend adopting all of these additional defenses in order to provide defense in depth. These additional defenses include:

Least privilege

To minimize the potential damage of a successful SQL injection attack, you should minimize the privileges assigned to every database account in your environment. Do not assign DBA or admin type access rights to your application accounts. We understand that this is easy, and everything just 'works' when you do it this way, but it is very dangerous. Start from the ground up to determine what access rights your application accounts require, rather than trying to figure out what access rights you need to take away. Make sure that accounts that only need read access are only granted read access to the tables for which they need access. If an account only needs access to portions of a table, consider creating a view that limits access to that portion of the data and assigning the account access to the view instead, rather than the underlying table. Rarely, if ever, grant create or delete access to database accounts.

If you adopt a policy where you use stored procedures everywhere, and do not allow application accounts to directly execute their own queries, then restrict those accounts to only be able to execute the stored procedures they need. Do not grant them any rights directly to the tables in the database.

SQL injection is not the only threat to your database data. Attackers can simply change the parameter values from one of the legal values they are presented with, to a value that is unauthorized for them, but the application itself might be authorized to access. Minimizing the privileges granted to your application will reduce the likelihood of such unauthorized access attempts, even when an attacker is not trying to use SQL injection as part of their exploit.

You should also minimize the privileges of the operating system account that the DBMS runs under. Do not run your DBMS as root or system! Most DBMSs run out of the box with a very powerful system account. For example, MySQL runs as system on Windows by default. Change the DBMS's OS account to something more appropriate, with restricted privileges.

Multiple database users

Web application designers should avoid using the same owner/admin account in the web applications to connect to the database. Different DB users could be used for different web applications.

In general, each separate web application that requires access to the database could have a designated database user account that the web-app uses to connect to the DB. That way, the designer of the application can have detailed access control, thus reducing the privileges as much as possible. Each DB user will then have select access to what it needs only, and write-access as needed.

As an example, a login page requires read access to the username and password fields of a table, but no write access of any form (no insert, update, or delete). However, the sign-up page certainly requires insert privilege to that table; this restriction can only be enforced if these web apps use different DB users to connect to the database.

SQL views

You can use SQL views to further increase the granularity of access by limiting read access to specific fields of a table or joins of tables. This can have additional benefits. For example, suppose that the system is required to store the passwords of the users, instead of salted-hashed passwords. The designer could use views to compensate for this limitation: revoke all access to the table (from all database users except the owner or admin), and create a view that outputs the hash of the password field and not the field itself. Any SQL injection attack that succeeds in stealing DB information will be restricted to stealing the hash of the passwords (even a keyed hash), because no database user for any of the web applications has access to the table itself.
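This view-based restriction can be sketched with sqlite3 (the schema is invented; SQLite has no per-user GRANT/REVOKE, so that part is described only in comments):

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
# Register a hashing function that the view can call.
db.create_function("sha256", 1,
                   lambda s: hashlib.sha256(s.encode()).hexdigest())

db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('bob', 'munich')")

# The view exposes a hash of the password, never the password itself.
# In a real DBMS you would also REVOKE table access from the web apps' users.
db.execute("""CREATE VIEW login_view AS
              SELECT username, sha256(password) AS password_hash FROM users""")

row = db.execute("SELECT * FROM login_view").fetchone()
print(row[0])               # bob
print("munich" in row[1])   # False -- only the hash is visible
```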

Secure the Application

Securing your application is a topic that merits its own book, but the next few pages illustrate some of the more common issues you should look for as part of your deployment process.

What is OWASP?

The Open Web Application Security Project (OWASP) is focused on providing education, tools, and other resources to help developers avoid some of the most common security problems in web-based applications. Resources provided by OWASP include:

  • Tools - OWASP produces tools such as the OWASP Zed Attack Proxy (ZAP), which looks for vulnerabilities during development, OWASP Dependency Check, which looks for known vulnerabilities in your project's dependencies, and OWASP DefectDojo, which streamlines the testing process.
  • Code projects - OWASP produces the OWASP ModSecurity Core Rule Set (CRS), generic attack detection rules that can be used with web application firewalls, and OWASP CSRFGuard, which helps prevent Cross-Site Request Forgery (CSRF) attacks.
  • Documentation projects - OWASP is perhaps best known for its documentation projects, which include the OWASP Application Security Verification Standard, the OWASP Top Ten, which describes the 10 most common security issues in web applications, and the OWASP Cheat Sheet Series, which explains how to mitigate those issues.

Let’s look at some of the most common of those Top Ten issues.

SQL injection

You have learned about using data in your application, and how to protect it. One of the issues with using data in your application is that if you incorporate user interaction, you can create a potentially dangerous situation.

For example, would you ever want to execute a statement like this?

select * from users where username = 'bob' and password = 'pass'; drop table products;

Of course not. If you did, you would delete your products table. But if you are not careful, you could do exactly that. How? Consider this example. Let's say you have a login form, and a user enters bob as the username and the following as the password:

'; drop table products; --

That is an odd thing to enter, but follow the example through. If you have code that simply concatenates what the user typed, you will get the equivalent of this:

username = "bob"
userpass = "'; drop table products; --"
sqlstatement = "select * from users where username='"+username+"' and password='"+userpass+"'"

If you make that substitution, you get:

sqlstatement = "select * from users where username='bob' and password=''; drop table products; --'"

In this case, the hacker does not even have to enter a valid password for bob (or any username, for that matter); the important part is that the dangerous statement, drop table products, gets executed no matter what. The double-dash (--) turns everything after it into a comment, so the leftover quote at the end cannot cause an error that would stop the statement from running.

How do you prevent it? While it is tempting to think that you can simply "sanitize" the inputs by removing single quotes ('), that is a losing battle. Instead, prevent it from happening by using parameterized statements. How to achieve this is different for every language and database, but here is how to do it in Python:

with connection.cursor() as cursor:
    cursor.execute(
        "SELECT * FROM users WHERE username = %(username)s and password = %(userpass)s",
        {'username': request.args.get('username'),
         'userpass': scrambled(request.args.get('userpass'))})
    result = cursor.fetchone()

By doing it this way, username and userpass are passed as parameters: the database driver escapes and quotes the values itself, so the user's input is always treated as data and can never break out of the string to create additional statements.
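
As a self-contained illustration of the same idea (using Python's built-in sqlite3 module and a toy schema rather than the MySQL driver shown above), a parameterized query binds the malicious input as plain data:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE users (username TEXT, password TEXT)")
cur.execute("CREATE TABLE products (id INTEGER, title TEXT)")
cur.execute("INSERT INTO users VALUES ('bob', 'pass')")

malicious = "'; drop table products; --"
# The ? placeholders bind values as data; the input can never become SQL
cur.execute("SELECT * FROM users WHERE username = ? AND password = ?",
            ('bob', malicious))
rows = cur.fetchall()
print(rows)  # no match, and the products table survives intact
```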

One place this situation often appears is in regard to search, where by definition the user is entering what will become part of the database statement. Consider this code:

from flask import Flask
from flask import request
from flask import render_template
sample = Flask(__name__)
DBHOST = 'NOT SET'
def getdb(targethost = ''):
    import MySQLdb
    if targethost == '':
        global DBHOST
        targethost = DBHOST
    return MySQLdb.connect(host=targethost,
                     user="devnetstudent",
                     passwd="pass",
                     db="products")
@sample.route("/")
def main():
    return render_template("index.html")
@sample.route("/test")
def test():
    return "You are calling me from "+request.remote_addr
@sample.route("/config")
def config():
    return render_template("config.html")
@sample.route("/get_config")
def get_config():
    global DBHOST
    return DBHOST
@sample.route("/config_action", methods=['GET', 'POST'])
def config_action():
    global DBHOST
    DBHOST = request.args.get('dbhost')
    return "Saved database host as "+DBHOST
@sample.route("/search")
def search():
    db = getdb()
    cur = db.cursor()
    search_term = request.args.get('search_term')
    cur.execute("select * from products where title like '%s'" % search_term)
    output = ""
    for row in cur.fetchall():
        output = output +  str(row[0]) + " -- " + str(row[1]) + "<br />"
    db.close()
    return output
if __name__ == "__main__":
    sample.run(host="0.0.0.0", port=80)

How would you fix this code?

...
@sample.route("/search")
def search():
    db = getdb()
    cur = db.cursor()
    search_term = request.args.get('search_term')
    # Here you are ensuring that the search term is treated as a string value
    cur.execute("select * from products where title like %(search_term)s", {'search_term': search_term})
    output = ""
    for row in cur.fetchall():
        output = output +  str(row[0]) + " -- " + str(row[1]) + "<br />"
    db.close()
    return output
if __name__ == "__main__":
    sample.run(host="0.0.0.0", port=80)

Cross-Site Scripting (XSS)

Cross site scripting attacks happen when user-submitted content that has not been sanitized is displayed to other users. The most obvious version of this exploit is where one user submits a comment that includes a script that performs a malicious action, and anyone who views the comments page has that script executed on their machine.

For example, consider a page displayed by this code:

...
@sample.route("/product_comments")
def product_comments():
    db = getdb()
    cur = db.cursor()
    prod_id = request.args.get('prod_id')
    cur.execute("select * from products where id = %(prod_id)s", {'prod_id': prod_id})
    output = ""
    for row in cur.fetchall():
        output = output +  str(row[0]) + ": " + str(row[1]) + "<br />"
    db.close()
    return output
…

This code simply extracts comment data from the database and displays it on the page. If a user named Robin were to submit content such as:

<script type="text/javascript">alert("Gotcha!")</script>

Then a user coming to the page would get content that looks like this:

Robin: <script type="text/javascript">alert("Gotcha!")</script>

When that user loads the page, they would see an alert box, triggered by the inserted Javascript.

Now, in this case, we are just displaying an alert, which is harmless. But that script could just as easily have done something malicious, such as stealing cookies, or worse.

The bigger problem is that you are dealing with more than the data stored in your database, which is the source of "Stored XSS Attacks". For example, consider this page, which displays content from a request parameter:

...
<h1>Search results for {{ request.args['search_term'] }}</h1>
{% for item in cursor %}
…

A hacker could trick someone into visiting your page with a link in an email that provides malicious code in a parameter:

http://www.example.com?search_term=%3Cscript%3Ealert%28%27Gotcha%21%27%29%3C%2Fscript%3E

This link, which includes a URL-encoded version of the script, would result in an unsuspecting user seeing a page of:

...
<h1>Search results for <script>alert('Gotcha!')</script></h1>
...

And that, of course, would execute the script.

This is called a Reflected XSS Attack.

So how do you prevent it?

The main strategy is to sanitize content where possible, and if it cannot be sanitized, do not display it.

Experienced web developers usually know to check for malicious content in comments, but there are other places you must check for “untrusted” content. OWASP recommends never displaying untrusted content in the following locations:

  • Inside script tags
  • Inside comments
  • As part of attribute names
  • As part of tag names
  • In CSS (within style tags)

You can display content in some locations, if it is sanitized first. These locations include:

  • As the content of an HTML tag
  • As the value of an attribute
  • As a variable within your Javascript

Sanitizing content can be a complicated process to get right, as you can see from the wide variety of options an attacker has. It is worth it to use a tool that is built just for sanitizing content, such as OWASP Java HTML Sanitizer, HtmlSanitizer, or Python Bleach.
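
For the simple case of displaying untrusted text as the content of an HTML tag, Python's standard library can do the escaping. This is a minimal sketch; for rich content, a dedicated sanitizer like those listed above is still the better choice:

```python
from html import escape

comment = '<script type="text/javascript">alert("Gotcha!")</script>'
# escape() converts <, >, &, and quotes into HTML entities,
# so the browser renders the script as text instead of executing it
safe = escape(comment)
print(safe)
```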

Cross-Site Request Forgery (CSRF)

Another type of attack that shares some aspects of XSS attacks is Cross Site Request Forgery (CSRF), sometimes pronounced “Sea Surf.” In both cases, the attacker intends for the user to execute the attacker’s code, usually without even knowing it. The difference is that CSRF attacks are typically aimed not at the target site, but rather at a different site, one into which the user has already authenticated.

Here is an example. Let's say the user logs into their bank website, http://greatbank.example.com. In another window, they are on a discussion page that includes an interesting-looking link, and they click it.

Unfortunately, the link was for http://greatbank.example.com/changeemail?new_email=attacker@example.com. The browser thinks this is just a normal link, so it calls the URL, sending the cookies for greatbank.example.com, which, as you recall, include the user's authentication credentials. As far as the bank is concerned, this request came from the user, and it executes the change. Now the attacker can go ahead and change the user's password, then log into the bank and do whatever damage they want.

Note that even if the user is smart enough not to click on a strange link like that, if a site is vulnerable to XSS attacks, this attack can be carried out without the user having to do anything. A carefully crafted <img> tag can achieve the same result.

An interesting aspect of CSRF is that the attacker never actually gets the results of the attack; they can only judge the results after the fact, and they have to be able to predict what the effects will be to take advantage of a successful attack.

CSRF attacks are notoriously difficult to prevent, but not impossible. One method is to include a hidden token that must accompany any requests from the user. For example, that bank login form might look like this:

...
<form action="https://greatbank.example.com" method="POST">
Username: <input type="text" name="username" style="width: 200px" />
Password: <input type="password" name="password" style="width: 200px" />
<br />
<input type="hidden" name="CSRFToken" value="d063937d-c117-46e6-8354-6f5d8faff095" />
<input type="submit" value="Log in">
</form>
…

That CSRFToken has to accompany every request from the user for it to be considered legitimate. Because it is impossible for the attacker to predict that token, a request such as https://greatbank.example.com/changeemail?new_email=attacker@example.com will automatically be rejected.
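
A minimal sketch of issuing and checking such a token. The dictionary stands in for a real per-user server-side session, and the function names are illustrative:

```python
import secrets

session = {}  # stands in for a real per-user server-side session

def issue_csrf_token():
    # 16 random bytes -> 32 hex characters, unguessable by an attacker
    token = secrets.token_hex(16)
    session['csrf_token'] = token
    return token

def validate_csrf_token(submitted):
    expected = session.get('csrf_token')
    # compare_digest performs a constant-time comparison
    return expected is not None and secrets.compare_digest(expected, submitted)

token = issue_csrf_token()
print(validate_csrf_token(token))            # True
print(validate_csrf_token('forged-value'))   # False
```

The server embeds the token in the form it renders and rejects any state-changing request whose token does not validate.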

The OWASP Top Ten

Now that you know about three of the most well-known attacks, here is the entire OWASP Top 10 list.

  • Injection - This item consists of all sorts of injection attacks. We talked earlier about SQL injection, but that is only the most common form. Other targets, such as LDAP directories and ORM query languages like Hibernate's HQL, are potentially vulnerable as well. In fact, any action that relies on user input is vulnerable, including direct operating system commands. You can mitigate these types of attacks by using parameterized APIs, escaping user input, and by using LIMIT clauses to limit exposure in the event of a breach.
  • Broken Authentication - This item relates to multiple problems with user credentials, from stolen credential databases to default passwords shipped with a product. You can mitigate these attacks by avoiding default passwords, by requiring multi-factor authentication, and by using techniques such as lengthening waiting periods after failed logins.
  • Sensitive Data Exposure - This item refers to when attackers steal sensitive information such as passwords or personal information. You can help to prevent these attacks by storing as little personal information as possible, and by encrypting the information you do store.
  • XML External Entities (XXE) - This item refers to attacks made possible by a feature of XML that enables users to incorporate external information using entities. You can solve this problem by disabling XML Entity and DTD processing, or by simply using another format, such as JSON, instead of XML.
  • Broken Access Control - This item refers to the need to ensure that you have not built an application that enables users to circumvent existing authentication requirements. For example, attackers should not be able to access admin functions just by browsing directly to them. In other words, do not rely on “security through obscurity”. Make sure to protect all resources and functions that need to be protected on the server side, ensuring that any and all access really is authorized.
  • Security Misconfiguration - This item refers to the need to ensure that the system itself is properly configured. Is all software properly patched and configured? Is the firewall running? Preventing these types of problems requires careful, consistent hardening of systems and applications. Reduce the available attack surface: install only the services you actually need, and separate unrelated components onto different systems to reduce the attack surface further.
  • Cross-Site Scripting (XSS) - This item refers to the ability for an attacker to use the dynamic functions of a site to inject malicious content into the page, either in a persistent way, such as within the body of comments, or as part of a single request. Mitigating these problems requires careful consideration of where you are including untrusted content in your page, as well as sanitizing any untrusted content you do include.
  • Insecure Deserialization - This item describes issues that can occur if attackers can access, and potentially change, serialized versions of data and objects, that is, text versions of objects that can be reconstituted into objects by the server. For example, if a user’s information is passed around as a JSON object that includes their access privileges, they could conceivably give themselves admin privileges by changing the content of that object. Because objects can include executable code, this exploit can be particularly dangerous, even if it is not necessarily simple to exploit. To help prevent issues, do not accept serialized objects from untrusted sources, or if you must, ensure validation before deserializing the objects.
  • Using Components with Known Vulnerabilities - One of the advantages today’s developers have is that most of the core functions you are trying to perform have probably already been written and included in an existing software package, and it is probably open source. However, many of the packages that are available also include publicly available exploits. To fix this, ensure that you are using only necessary features and secure packages, downloaded from official sources, and verified with a signature.
  • Insufficient Logging and Monitoring - This item reminds you that your most basic responsibility is to ensure that you are logging everything important that is happening in your system so that you can detect attacks, preferably before they succeed. It is particularly important to ensure that your logs are in a common format so that they can be easily consumed by reporting tools, and that they are auditable to detect (or better yet prevent) tampering.

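As one small illustration of the deserialization advice above, validate fields before trusting them. This sketch uses JSON rather than a binary object format, and the field names and allowed roles are made up for the example:

```python
import json

ALLOWED_ROLES = {'user', 'editor'}

def load_profile(serialized):
    data = json.loads(serialized)
    # Never trust privilege fields arriving in client-supplied serialized data
    if data.get('role') not in ALLOWED_ROLES:
        raise ValueError('untrusted role: %r' % data.get('role'))
    return data

profile = load_profile('{"username": "devnet_alice", "role": "user"}')
print(profile['username'])  # devnet_alice
```

A client that rewrites its serialized profile to claim "role": "admin" is rejected instead of silently gaining privileges.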
Evolution of Password Systems

Simple Plaintext Passwords

The first passwords were simple plaintext ones stored in databases. These allowed multiple users using the same core processor to have unique privacy settings. This was before sophisticated hacking networks and password-cracking programs came into being.

The rule for plaintext passwords is very simple: Just store them. You have a database with a table for all your users, and it would probably look something like (id, username, password). After the account is created, you store the username and password in these fields as plaintext, and on login, you extract the row associated with the inputted username and compare the inputted password with the password from the database. If it matches, you let your user in. Perfectly simple and easy to implement.

The following sample shows how to create/verify a new user profile with plaintext format:

######################################Plain Text #########################################################
@app.route('/signup/v1', methods=['POST'])
def signup_v1():
    conn = sqlite3.connect(db_name)
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS USER_PLAIN
           (USERNAME  TEXT    PRIMARY KEY NOT NULL,
            PASSWORD  TEXT    NOT NULL);''')
    conn.commit()
    try:
        c.execute("INSERT INTO USER_PLAIN (USERNAME, PASSWORD) VALUES (?, ?)",
                  (request.form['username'], request.form['password']))
        conn.commit()
    except sqlite3.IntegrityError:
        return "username has been registered."
    print('username: ', request.form['username'], ' password: ', request.form['password'])
    return "signup success"

The login function below verifies a newly signed-up user account in plaintext format; note that all of the user profile's data is stored and compared as plaintext.

def verify_plain(username, password):
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    c.execute("SELECT PASSWORD FROM USER_PLAIN WHERE USERNAME = ?", (username,))
    records = c.fetchone()
    conn.close()
    if not records:
        return False
    return records[0] == password
@app.route('/login/v1', methods=['GET', 'POST'])
def login_v1():
    error = None
    if request.method == 'POST':
        if verify_plain(request.form['username'], request.form['password']):
            error = 'login success'
        else:
            error = 'Invalid username/password'
    else:
        error = 'Invalid Method'
    return error

Obviously, plaintext is an insecure way of storing passwords. If your database were hacked, your users' passwords would be exposed to the hackers directly.

Password Hashing

Storing passwords is risky and complex at the same time. A simple approach to storing passwords is to create a table in your database that maps a username with a password. When a user logs in, the server gets a request for authentication with a payload that contains a username and a password. We look up the username in the table and compare the password provided with the password stored. A match gives the user access to the application. The security strength and resilience of this model depends on how the password is stored. The most basic, but also the least secure, password storage format is cleartext.

Storing passwords in cleartext is the equivalent of writing them down on a piece of digital paper. If an attacker were to break into the database and steal the passwords table, the attacker could then access each user account. This problem is compounded by the fact that many users re-use or use variations of a single password, potentially allowing the attacker to access other services beyond the one being compromised. The attack could also come from within the organization: a software engineer with access to the database could abuse that access, retrieve the cleartext credentials, and access any account.

Hashing

A more secure way to store a password is to transform it into data that cannot be converted back to the original password. This mechanism is known as hashing.

By dictionary definition, hashing refers to "chopping something into small pieces" to make it look like a "confused mess". That definition closely applies to what hashing represents in computing.

In cryptography, a hash function is a mathematical algorithm that maps data of any size to a bit string of a fixed size. This function input can be referred to as message or simply as input. The fixed-size string function output is known as the hash or the message digest. As stated by OWASP, hash functions used in cryptography have the following key properties:

  • It is easy and practical to compute the hash, but difficult or impossible to re-generate the original input if only the hash value is known.
  • It is difficult to create an initial input that would match a specific desired output.

Thus, in contrast to encryption, hashing is a one-way mechanism. The data that is hashed cannot be practically "unhashed".

Hashing example

Here is a code example to import the constructor method of the SHA-256 hash algorithm from the hashlib module:

from hashlib import sha256
# Create an instance of the sha256 class
h = sha256()
# Uses the update() method to update the hash object
h.update(b'devnetpassword1')
# Uses the hexdigest() method to get the digest of the string passed to the update() method
hash = h.hexdigest()
# The digest is the output of the hash function.
# Print the hash variable to see the hash value in the console:
print(hash)

As an example, you would see the following output:

a75e46e47a3c4cf3aaefe1e549949c90e90e0fe306a2e37d2880702a62b0ff31

Salted password

Many passwords, even when hashed, can still be guessed. Passwords mined from hacked sites have been collected into lists, and these lists have made hashed passwords much easier to crack. To guarantee the uniqueness of the passwords, increase their complexity, and prevent password attacks even when the inputs are the same, a salt (which is simply random data) is added to the input of a hash function. This is known as a salted password.

Using cryptographic hashing for more secure password storage

The irreversible mathematical properties of hashing make it a phenomenal mechanism to conceal passwords at rest and in motion. Another critical property that makes hash functions suitable for password storage is that they are deterministic. A deterministic function is a function that, given the same input, always produces the same output. This is vital for authentication because you need to have the guarantee that a given password will always produce the same hash. Otherwise, it would be impossible to consistently verify user credentials with this technique.

Adding salt to password hashing

A salt is added to the hashing process to force hash uniqueness. It increases complexity without increasing user requirements, and mitigates password attacks such as rainbow tables. In cryptography, salting hashes refers to adding random data to the input of a hash function to guarantee a unique output, which is the hash, even when the inputs are the same. Consequently, the unique hash produced by adding the salt can protect against different attack vectors, while slowing down dictionary and brute-force attacks.

Sample

Hashed passwords are not unique by themselves, due to the deterministic nature of hash functions: given the same input, a hash function always produces the same output. If devnet_alice and devnet_bob both choose devnetpassword1 as a password, their hash would be the same:

username          hash
devnet_alice      0e8438ea39227b83229f78d9e53ce58b7f468278c2ffcf45f9316150bd8e5201
devnet_ava        a75e46e47a3c4cf3aaefe1e549949c90e90e0fe306a2e37d2880702a62b0ff31
devnet_bob        0e8438ea39227b83229f78d9e53ce58b7f468278c2ffcf45f9316150bd8e5201
devnet_blaine     6421e62bf41b6d52963b42d5467e25ed18d0ef26e5dfde8825e639600d2d9698
devnet_devon      9314342333718a996b107ff2de51e8105466a9f48310f1b47b679f64d60f5264
devnet_dave       5d86d07ab6c68ccdeab2815b26598c6d9ce0db92f455d499f70bca5067cc841c

As this example reveals, devnet_alice and devnet_bob have the same password, because both share the same hash: 0e8438ea39227b83229f78d9e53ce58b7f468278c2ffcf45f9316150bd8e5201.

The attacker can better predict the password that legitimately maps to that hash. After the password is known, the same password can be used to access all the accounts that use that hash.

Mitigating password attacks with a salt

To mitigate the damage that a rainbow table or a dictionary attack could do, salt the passwords. According to OWASP Guidelines, a salt is a fixed-length cryptographically-strong random value that is added to the input of hash functions to create unique hashes for every input, regardless of whether the input is unique. A salt makes a hash function look non-deterministic, which is good as you do not want to reveal password duplications through your hashing.

Let's say that you have the password devnetpassword1 and the salt salt706173776f726473616c74a. You can salt that password by either appending or prepending the salt to it. For example, devnetpassword1salt706173776f726473616c74a and salt706173776f726473616c74adevnetpassword1 are both valid salted passwords. After the salt is added, you can hash the result and get different hash values: cefee7f060ed49766d75bd4ca2fd119d7fcabe795b9425f4fa9d7115f355ab8c and d00c162358af0e645b90bf291836cbf7d523157baf85e96492b151e0624ee041.

Let's say devnet_alice and devnet_bob both decide to use the same password, devnetpassword1. For devnet_alice, we'll use salt706173776f726473616c74a again as the salt. However, for devnet_bob, we'll use salt706173776f726473616253b as the salt:

Hashed and salted password examples

User: devnet_alice

Password: devnetpassword1

Salt: salt706173776f726473616c74a

Salted input: devnetpassword1salt706173776f726473616c74a

Hash (SHA-256): cefee7f060ed49766d75bd4ca2fd119d7fcabe795b9425f4fa9d7115f355ab8c

User: devnet_bob

Password: devnetpassword1

Salt: salt706173776f726473616253b

Salted input: devnetpassword1salt706173776f726473616253b

Hash (SHA-256): 41fffe05d7aca370abaff6762443d9326ce22107783b8ff5bb0cf576020fc1d5

Different users, same password. Different salts, different hashes. If someone looked at the full list of password hashes, no one would be able to tell that devnet_alice and devnet_bob both use the same password. Each unique salt extends the password devnetpassword1 and transforms it into a unique password.

In practice, you store the salt in cleartext along with the hash in your database. You would store the salt salt706173776f726473616c74a, the hash cefee7f060ed49766d75bd4ca2fd119d7fcabe795b9425f4fa9d7115f355ab8c, and the username together, so that when the user logs in, you can look up the username, append the salt to the provided password, hash it, and then verify whether the stored hash matches the computed hash.
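
Putting the pieces together, here is a minimal sketch of salted hashing and verification with the standard library. The function names are illustrative, and a real system should prefer a slow, dedicated password-hashing algorithm over plain SHA-256:

```python
import hashlib
import secrets

def hash_password(password, salt=None):
    # Generate a fresh random salt when none is supplied
    if salt is None:
        salt = secrets.token_hex(16)
    digest = hashlib.sha256((password + salt).encode()).hexdigest()
    return salt, digest  # store both; the salt may be kept in cleartext

def verify_password(password, salt, stored_digest):
    # Re-append the stored salt, re-hash, and compare in constant time
    candidate = hashlib.sha256((password + salt).encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)

salt_a, digest_a = hash_password('devnetpassword1')
salt_b, digest_b = hash_password('devnetpassword1')
print(digest_a != digest_b)  # True: same password, different salts
print(verify_password('devnetpassword1', salt_a, digest_a))  # True
```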

Additional factors for authentication

Even with the incorporation of password strength checkers, single-factor authentication still leaves your enterprise vulnerable. Hackers can crack any password; it may take significant time, but they can. Moreover, they can always steal passwords via social engineering. Thus, you should incorporate multi-factor authentication (MFA). Incorporating other authentication factors confounds hackers who may have cracked your password.

Single-factor authentication (SFA)

Single-factor authentication is the simplest form of authentication. With SFA, a person matches one credential to verify themselves online. The most popular example of this is a password (the credential) paired with a username. Most verification today uses this type of authentication method.

What are the risks of single-factor authentication?

Online sites can have users' passwords leaked by a hacker. Without an additional factor to confirm your identity, all a malicious user needs is your password to gain access. Hopefully it is not a website that stores additional sensitive information, such as your credit card number, home address, or other details used to identify you.

Often a user's password is simple so that it is easy to remember. The simpler the password, the easier it is to crack. A malicious user may guess your password because they know you personally, or because they were able to find out certain things about you, such as your birthdate, favorite actor/actress or pet’s name. A malicious user may also crack your password by using a bot to generate the right combination of letters/numbers to match your simple, secret identification method. In either example, it is going to be a hassle to recover your account(s). Hopefully your simple password is not being reused with other online entities.

Single-factor authentication (SFA) is quickly becoming the weak link of security measures. There is a growing number of products, websites, and apps that offer two-factor and multi-factor authentication. Whether it is just two factors, or three or more, MFA is the way to make your accounts much harder for attackers to access.

Two-factor authentication (2FA)

Two-factor authentication uses the same password/username combination, but with the addition of being asked to verify who a person is by using something only he or she owns, such as a mobile device. Putting it simply: it uses two factors to confirm an identity.

Multi-factor authentication (MFA)

Multi-factor authentication (MFA) is a method of computer access control in which a user is only granted access after successfully presenting several separate pieces of evidence to an authentication mechanism. Typically, at least two of the following categories are required for MFA: knowledge (something they know), possession (something they have), and inherence (something they are).

2FA is a subset of that. It is just a type of MFA where you only need two pieces of evidence, that is, two “factors”. When you log in to Google, Twitter, or LinkedIn, or you make a purchase on Amazon, you can use their two-step validation to require your password (something you know) and a special text sent to your phone (something you have). If you do not have your password and your phone, you do not get in.
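
The "special text sent to your phone" is often generated by a time-based one-time password (TOTP) algorithm, which the server and the device both compute from a shared secret. A simplified sketch of the idea, following the general shape of RFC 4226/6238 (the secret here is illustrative; real secrets are random):

```python
import hashlib
import hmac
import struct
import time

def totp(secret, now=None, step=30, digits=6):
    # The code changes every `step` seconds; both sides derive the
    # same counter from the current time
    counter = int((time.time() if now is None else now) // step)
    msg = struct.pack('>Q', counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    # Dynamic truncation: pick 4 bytes, mask the sign bit, keep 6 digits
    offset = digest[-1] & 0x0F
    code = struct.unpack('>I', digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

secret = b'devnet-shared-secret'
print(totp(secret, now=0))
```

Because the code depends on the current time window, a stolen code is useless moments later, which is what makes it a practical "something you have" factor.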

Password Cracking

Techniques for finding a password that allows entry are known collectively as "cracking" the security intended by the password.

Password guessing

Password guessing is an online technique that involves attempting to authenticate a particular user to the system. Password guessing may be detected by monitoring the failed login system logs. Clipping levels are used to differentiate between malicious attacks and normal users accidentally mistyping their passwords. Clipping levels define a minimum reporting threshold. Using the password guessing example, a clipping level might be established so that the audit system only alerts if failed authentication occurs more frequently than five times in an hour for a particular user. Clipping levels can help to differentiate attacks from legitimate mistakes; however, they can also cause false negatives if the attackers can glean the threshold beneath which they must operate.

Preventing successful password guessing attacks is typically done with account lockouts. Account lockouts are used to prevent an attacker from being able to simply guess the correct password by attempting a large number of potential passwords.
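
A minimal sketch of a lockout counter with a clipping level. The names and thresholds are illustrative, and a real system would persist this state and alert on it:

```python
import time

MAX_ATTEMPTS = 5        # clipping level: lock out after 5 failures
LOCKOUT_SECONDS = 300   # how long the lockout window lasts

failed_logins = {}      # username -> (failure count, time of first failure)

def record_failure(username, now=None):
    now = time.time() if now is None else now
    count, first = failed_logins.get(username, (0, now))
    failed_logins[username] = (count + 1, first)

def is_locked_out(username, now=None):
    now = time.time() if now is None else now
    count, first = failed_logins.get(username, (0, now))
    return count >= MAX_ATTEMPTS and (now - first) < LOCKOUT_SECONDS

for _ in range(5):
    record_failure('devnet_bob', now=100)
print(is_locked_out('devnet_bob', now=101))   # True
print(is_locked_out('devnet_bob', now=1000))  # False: window expired
```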

Dictionary attack

In cryptanalysis and computer security, a dictionary attack is a form of brute force attack for defeating a cipher or authentication mechanism by trying to determine its decryption key or passphrase by trying hundreds or sometimes millions of likely possibilities, such as words in a dictionary.

A dictionary attack is based on trying all the strings in a pre-arranged listing, typically derived from a list of words such as in a dictionary (hence the phrase dictionary attack). In contrast to a brute force attack, where a large proportion of the key space is searched systematically, a dictionary attack tries only those possibilities which are deemed most likely to succeed.

Dictionary attacks often succeed because many people have a tendency to choose short passwords that are ordinary words or common passwords, or simple variants obtained, for example, by appending a digit or punctuation character. Dictionary attacks are relatively easy to defeat, for example, by using a passphrase or otherwise choosing a password that is not a simple variant of a word found in any dictionary or listing of commonly used passwords.

Pre-computed dictionary attack or rainbow table attack

It is possible to achieve a time/space tradeoff by pre-computing a list of hashes of dictionary words, and storing these in a database using the hash as the key. This requires a considerable amount of preparation time, but allows the actual attack to be executed faster. The storage requirements for the pre-computed tables were once a major cost, but are less of an issue today because of the low cost of disk storage. Pre-computed dictionary attacks are particularly effective when a large number of passwords are to be cracked. The pre-computed dictionary need be generated only once, and when it is completed, password hashes can be looked up almost instantly at any time to find the corresponding password. A more refined approach involves the use of rainbow tables, which reduce storage requirements at the cost of slightly longer lookup-times. Pre-computed dictionary attacks, or "rainbow table attacks", can be thwarted by the use of salt, a technique that forces the hash dictionary to be recomputed for each password sought, making pre-computation infeasible, provided the number of possible salt values is large enough.
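
A toy sketch of a pre-computed table, and of how a salt defeats it (the wordlist and password are made up for the example):

```python
import hashlib
import secrets

wordlist = ['letmein', 'qwerty', 'password1', 'dragon']
# Build the table once: hash -> plaintext, reusable against any stolen hash
table = {hashlib.sha256(w.encode()).hexdigest(): w for w in wordlist}

stolen_unsalted = hashlib.sha256(b'password1').hexdigest()
print(table.get(stolen_unsalted))   # found instantly

salt = secrets.token_hex(16)
stolen_salted = hashlib.sha256(('password1' + salt).encode()).hexdigest()
print(table.get(stolen_salted))     # None: the table must be rebuilt per salt
```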

Social engineering

Social engineering for password cracking involves a person convincing or tricking another person into supplying the attacker with access.

Information security culture

Employee behavior can have a big impact on information security in organizations. Cultural concepts can help different segments of the organization work effectively towards information security. "Exploring the Relationship between Organizational Culture and Information Security Culture" provides the following definition of information security culture: "ISC is the totality of patterns of behavior in an organization that contribute to the protection of information of all kinds." (Reference: Lim, J. S., Chang, S., Maynard, S., & Ahmad, A. (2009). Exploring the Relationship between Organizational Culture and Information Security Culture.)

Techniques

All social engineering techniques are based on cognitive biases, where humans make decisions based on known or assumed attributes. Cognitive biases are exploited in various combinations to create attack techniques, such as stealing an employee's private information. Phone calls and conversations are the most common type of social engineering. Another example of a social engineering attack would be someone posing as a service provider such as exterminators, technicians, or safety inspectors to gain access to people or information systems.

Techniques can involve gaining trust through conversations on social media sites that lead to eventually asking for banking information or valuable passwords.

Another example would be using an authoritative, trusted source of information like a company bulletin board to change the help desk number to a false number to collect information. When an employee reads the posted, but fake, information on the board and calls the number, they are more willing to give over passwords or other credentials because they believe the post on the board is real.

Social engineering relies heavily on the six principles of influence established by Robert Cialdini in his book, Influence.

Six key principles of human influence

  • Reciprocity – Our social norms mean that we tend to return a favor when asked. You can imagine this works well for phone conversations to gain information.
  • Commitment and consistency – When people commit, whether in person, in writing, or on a web site, they are more likely to honor that commitment in order to preserve their self-image. An example is the checkbox on a popover on a web site stating, "I'll sign up later." When someone checks it, even when that original incentive goes away, they tend to honor that social contract, or go ahead and sign up so that they do not feel dishonest.
  • Social proof – When people see someone else doing something, such as stopping to look up at the sky, others will stop and do the same. This type of conformity lets attackers manipulate people into sharing information they would not usually disclose.
  • Authority – People nearly always obey authority figures, even if they are asked to do a dangerous or harmful task to another person. This authority principle means that attackers who seem to be authoritative or representing an authority figure are more likely to gain access.
  • Liking – Likable people are able to persuade others more effectively. This principle is well-known by sales people. People are easily persuaded by familiar people whom they like.
  • Scarcity – When people believe or perceive that something is limited in amount or available only for a short time, they act quickly to obtain the desired item or offer.

There are four social engineering vectors, or lines of attack, that can take advantage of these influence principles.

  • Phishing is the fraudulent gathering of information, especially through requests for financial information. The attempts often look like a real web site or email, but link to a collector site instead.
  • Vishing stands for "voice phishing"; it uses voice phone calls to gather private personal information for financial gain.
  • Smishing uses SMS text messages that create urgency and ask for a specific course of action, such as clicking a fake link or sending account information.
  • Impersonation involves in-person scenarios, such as wearing a service provider uniform to gain inside access to a building or system.

Password strength

Password strength is the measure of a password's effectiveness in resisting password cracking attacks. The strength of a password is determined by:

Length - The number of characters the password contains.

Complexity - Whether it uses a combination of letters, numbers, and symbols.

Unpredictability - How difficult the password is for an attacker to guess.

Here is a practical example of three passwords:

  • password
  • passw0rd123
  • #W)rdPass1$

The word password used as a password is very weak; it can be cracked almost instantly.

The second password, passw0rd123, is still very weak; it also takes little time to crack.

The third password, #W)rdPass1$, has real strength. One estimate is that it would take about 21 years to crack.
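The difference between these three passwords can be roughly quantified with a naive entropy estimate: length times bits per character, where the bits depend on which character classes are used. This is only a sketch; real crackers try dictionary words and common variants first, so a password like password is far weaker than its raw entropy suggests.

```python
import math
import string

def estimate_entropy_bits(password: str) -> float:
    # Approximate the character pool from the classes actually used.
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)  # 32 printable ASCII symbols
    # Each character contributes at most log2(pool) bits.
    return len(password) * math.log2(pool) if pool else 0.0

for pw in ["password", "passw0rd123", "#W)rdPass1$"]:
    print(pw, round(estimate_entropy_bits(pw), 1))
```

Running this shows the estimate climbing as length and character variety increase, matching the weak-to-strong ordering of the three examples above.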

Password strength checkers and validation tools

Password managers often include a password strength checker, and using such a tool is one of the best ways to ensure that you are choosing strong passwords. Password systems generally build in a strength validation tool as an input validator, to make sure a user's chosen password complies with current identity management guidelines.

Best practices

There are a few best practices to secure user login attempts. These include notifying users of suspicious behavior, and limiting the number of password and username login attempts.
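Limiting login attempts can be sketched with a simple in-memory failure counter. The threshold and lockout window below are illustrative; a real system would persist this state and notify the user of the suspicious activity.

```python
import time
from collections import defaultdict

MAX_ATTEMPTS = 10        # NIST suggests allowing at least 10 tries
LOCKOUT_SECONDS = 300    # illustrative lockout window

failed_logins = defaultdict(list)  # username -> timestamps of failures

def record_failure(username, now=None):
    failed_logins[username].append(time.time() if now is None else now)

def is_locked_out(username, now=None):
    now = time.time() if now is None else now
    # Keep only failures that are still inside the lockout window.
    recent = [t for t in failed_logins[username] if now - t < LOCKOUT_SECONDS]
    failed_logins[username] = recent
    return len(recent) >= MAX_ATTEMPTS

for _ in range(10):
    record_failure("alice", now=1000.0)
print(is_locked_out("alice", now=1000.0))  # True
print(is_locked_out("bob", now=1000.0))    # False
```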

NIST Digital Identity Guidelines

Here's a brief summary of the NIST 800-63B Digital Identity Guidelines:

  • 8-character minimum when a human sets it
  • 6-character minimum when set by a system/service
  • Support at least 64 characters maximum length
  • All ASCII characters (including space) should be supported
  • Truncation of the secret (password) shall not be performed when processed
  • Check chosen password with known password dictionaries
  • Allow at least 10 password attempts before lockout
  • No complexity requirements
  • No password expiration period
  • No password hints
  • No knowledge-based authentication, such as "Who was your best friend in high school?"
  • No SMS for two-factor authentication, instead use a one-time password from an app like Google Authenticator

You can read more about NIST Special Publication 800-63B on the National Institute of Standards and Technology site.
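Several of these guidelines translate directly into a simple input validator. The sketch below checks a few of them; the common-password set is a tiny stand-in for a real breached-password list, and the function name is illustrative.

```python
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}  # stand-in list

def validate_password(password):
    """Check a user-chosen password against a few NIST 800-63B rules."""
    problems = []
    if len(password) < 8:                      # 8-character minimum for humans
        problems.append("shorter than 8 characters")
    if len(password) > 64:                     # support at least 64 characters
        problems.append("longer than 64 characters")
    if password.lower() in COMMON_PASSWORDS:   # known-password dictionary check
        problems.append("found in common-password list")
    # Deliberately absent: complexity rules, expiration, and hints --
    # the guidelines recommend against them.
    return problems

print(validate_password("correct horse battery staple"))  # []
print(validate_password("qwerty"))
```

Note that a long passphrase with spaces passes cleanly, since the guidelines require supporting all ASCII characters, including the space.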

Lab – Explore the Evolution of Password Methods

In this lab, you will create an application that stores a username and password in plaintext in a database using Python code. You will then test the server to ensure not only that the credentials were stored correctly, but also that a user can use them to log in. You will then perform the same actions with a hashed password so that the credentials cannot be read. Storing credentials and other data securely is important to prevent servers and systems from being compromised.

You will complete the following objectives:

  • Part 1: Launch the DEVASC VM
  • Part 2: Explore Python Code Storing Passwords in Plain Text
  • Part 3: Explore Python Code Storing Passwords Using a Hash
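As a preview of the lab's hashed-password portion, here is one way to store and verify a salted, hashed password in Python. The in-memory users dictionary stands in for the lab's database, and the function names are illustrative.

```python
import hashlib
import hmac
import os

users = {}  # stand-in for the lab's database table

def register(username, password):
    # Store a per-user salt and a slow, salted hash -- never the plaintext.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    users[username] = (salt, digest)

def login(username, password):
    if username not in users:
        return False
    salt, digest = users[username]
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(attempt, digest)

register("alice", "S3cret!pass")
print(login("alice", "S3cret!pass"))  # True
print(login("alice", "wrong"))        # False
```

Because only the salt and digest are stored, an attacker who reads the database cannot recover the original password directly.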

Summary: Application Deployment and Security

What Did I Learn in this Module?

Understanding Deployment Choices with Different Models

Typically, large organizations use a four-tier structure: development, testing, staging, and production. In the early days of computers, there was only one way to deploy your software; you simply installed it on the computer itself. This model is known as “bare metal.” You now have other options including virtual machines, containers, and serverless computing. “On-premises” means any system that is literally within the confines of your building. Likely this would be traditional data centers that house individual machines which are provisioned for applications, rather than clouds, external or otherwise. Clouds provide self-service access to computing resources, such as VMs, containers, and even bare metal. The advantage of a private cloud is that you have complete control over where it’s located. A public cloud is essentially the same as a private cloud, but it is managed by a public cloud provider. Public cloud customers may share resources with other organizations. A hybrid cloud is the combination of two different types of clouds. Typically, hybrid cloud is used to bridge a private cloud and a public cloud within a single application. An edge cloud moves computing closer to where it’s needed. Instead of transactions making their way from an end user in Cleveland, to the main cloud in Oregon, there may be an intermediary cloud, an edge cloud, in Cleveland. The edge cloud processes the data or transaction. It then either sends a response back to the client, or does preliminary analysis of the data and sends the results on to a regional cloud that may be farther away.

Creating and Deploying a Sample Application

A container is a way of encapsulating everything you need to run your application, so that it can easily be deployed in a variety of environments. Docker is a way of creating and running that container. A “makefile” is the file the make utility uses to compile and build all the pieces of the application. In Docker, this is a simple text file called a Dockerfile. It defines the steps that the docker build command needs to take to create an image that can then be used to create the target container. Here are some of the available Dockerfile commands: FROM, MAINTAINER, RUN, CMD, EXPOSE, ENV, COPY, ENTRYPOINT, VOLUME, USER, WORKDIR, ARG, ONBUILD, STOPSIGNAL, and LABEL. You can use your image to create a new container and actually do some work. To do that, you want to run the image:

$ docker run -d -P sample-app-image
1688a2c34c9e7725c38e3d9262117f1124f54685841e97c3c5225af88e30bfc5

In this case, you’ve specified several parameters. The -d parameter is short for --detach and says you want to run it in the background, and -P tells Docker to publish it on the ports that you exposed (in this case, 8080).

You can make your image available for other people to use by storing it in an image registry. By default, Docker uses the Docker Hub registry, though you can create and use your own registry.

The development environment is meant to be convenient for the developer; it only needs to match the production environment where it’s relevant. A typical development environment can consist of any number of tools, from IDEs to databases to object storage. You built and ran a sample app in the hands-on lab, and you can set up your own environment to run the sample app locally. It’s recommended that you follow the steps in an Ubuntu VM, especially if you're using a Windows computer. You can deploy your application on bare metal or as a VM. You also have the option to deploy it as a containerized solution.

Continuous Integration/Continuous Deployment (CI/CD)

CI/CD is a philosophy for software deployment that figures prominently in the field of DevOps. DevOps itself is about communication and making certain that all members of the team are working together to ensure smooth operation. The idea behind Continuous Integration is that you, and all other developers on the project, continually merge your changes with the main branch of the existing application. This means that any given change set is small and the potential for problems is low. If everyone is using the main branch, anyone who checks out code is going to have the latest version of what everyone else is developing. Here are some benefits that come with using CI/CD for development:

  • Integration with agile methodologies
  • Shorter Mean Time To Resolution (MTTR)
  • Automated deployment
  • Less disruptive feature releases
  • Improved quality
  • Improved time to market

A deployment pipeline can be created with a build tool such as Jenkins. These pipelines can handle tasks such as gathering and compiling source code, testing, and compiling artifacts such as tar files or other packages. The fundamental unit of Jenkins is the project, also known as the job. You can create jobs that do all sorts of things, from retrieving code from a source code management repo such as GitHub, to building an application using a script or build tool, to packaging it up and running it on a server.

Networks for Application Development and Security

These days, you must consider networking for all but the simplest of use cases. This is especially true when it comes to cloud and container deployments. Some of the network services you need to consider when it comes to cloud deployment include: firewalls, load balancers, DNS, and reverse proxies. At its most basic level, a firewall accepts or rejects packets based on the IP addresses and ports to which they're addressed. A load balancer takes requests and balances them by spreading them out among multiple servers. DNS is how servers on the internet translate human-readable names into machine-routable IP addresses. IP addresses are required to navigate the internet. A reverse proxy is similar to a regular proxy; however, while a regular proxy works to make requests from multiple computers look like they all come from the same client, a reverse proxy works to make sure responses look like they all come from the same server.
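A load balancer's server-selection step can be sketched in a few lines. For example, the "least connections" method picks the server with the lowest current load; the server names and connection counts below are made up.

```python
def least_connections(servers):
    """Pick the server currently handling the fewest connections.

    `servers` maps a server name to its active connection count.
    """
    return min(servers, key=servers.get)

active = {"web1": 12, "web2": 4, "web3": 9}
print(least_connections(active))  # web2
```

A real load balancer also health-checks its servers and updates the counts as connections open and close, but the selection logic is this simple at its core.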

Securing Applications

You must secure data when it is at rest. There are two methods for encrypting data: one-way encryption, and two-way encryption. Data is also vulnerable when it’s being transmitted. When your data is in motion it is vulnerable to “man in the middle” attacks, in which a server along the way can observe, steal, and even change the data as it goes by. To prevent this, you can use several techniques including: SSH, TLS, and VPN. SQL injection is a code injection technique that is used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution. SQL injection must exploit a security vulnerability in an application's software. SQL injection vulnerabilities exist because some developers neglect data validation and security. There are tools that can help detect flaws and analyze code. OWASP is focused on providing education, tools, and other resources to help developers avoid some of the most common security problems in web-based applications. Resources provided by OWASP include: tools, code projects, and documentation projects.
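The difference between vulnerable and safe query construction can be demonstrated with Python's built-in sqlite3 module. The table and the injection payload below are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

name = "x' OR '1'='1"   # a classic injection payload

# UNSAFE: string concatenation lets the payload rewrite the query,
# so the OR clause matches every row in the table.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + name + "'").fetchall()

# SAFE: a parameterized query treats the payload as plain data.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(len(unsafe))  # 1 -- the injected condition matched the stored row
print(len(safe))    # 0 -- no user is literally named "x' OR '1'='1"
```

Parameterized queries (or an ORM that uses them) are the standard defense: the database driver never interprets user input as SQL.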

XSS attacks happen when user-submitted content that hasn’t been sanitized is displayed to other users. The most obvious version of this exploit is where one user submits a comment that includes a script that performs a malicious action, and anyone who views the comments page has that script executed on their machine. Another type of attack that shares some aspects of XSS attacks is CSRF. In both cases, the attacker intends for the user to execute the attacker’s code, usually without even knowing it. The difference is that CSRF attacks are typically aimed not at the target site, but rather at a different site, one into which the user has already authenticated.
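The standard defense against this kind of stored XSS is to escape user-submitted content before displaying it. Python's html.escape shows the idea; the comment text is a made-up payload.

```python
import html

comment = '<script>alert("stolen cookies")</script>'

# Escaping user-submitted content before display turns the script
# into harmless text instead of executable markup.
sanitized = html.escape(comment)
print(sanitized)
# &lt;script&gt;alert(&quot;stolen cookies&quot;)&lt;/script&gt;
```

Once escaped, the browser renders the comment as literal text, so the embedded script never executes on other users' machines.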

Here is the entire OWASP Top 10 list of attack types:

  • Injection
  • Broken Authentication
  • Sensitive Data Exposure
  • XML External Entities (XXE)
  • Broken Access Control
  • Security Misconfiguration
  • Cross-Site Scripting (XSS)
  • Insecure Deserialization
  • Using Components with Known Vulnerabilities
  • Insufficient Logging and Monitoring

The first passwords were simple plaintext ones stored in databases. A more secure way to store a password is to transform it into data that cannot be converted back to the original password, known as hashing. To guarantee the uniqueness of the passwords, increase their complexity, and prevent password attacks even when the inputs are the same, a salt (which is simply random data) is added to the input of a hash function. 2FA uses the same password/username combination, but with the addition of being asked to verify who a person is by using something only he or she owns, such as a mobile device. With MFA, a user is only granted access after successfully presenting several separate pieces of evidence to an authentication mechanism. Typically at least two of the following categories are required for MFA: knowledge (something they know); possession (something they have), and inherence (something they are).
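The app-generated one-time passwords used in 2FA are typically TOTP codes (RFC 6238), which the server and the authenticator app derive independently from a shared secret and the current time. A minimal sketch using only the standard library follows; the secret is the published RFC test key, not one you would ever use in practice.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, for_time=None, digits=6, step=30):
    """Time-based one-time password per RFC 6238 (HMAC-SHA1)."""
    key = base64.b32decode(secret_b32)
    counter = int((time.time() if for_time is None else for_time) // step)
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F   # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# Base32 encoding of the RFC 4226/6238 test key "12345678901234567890".
SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"
print(totp(SECRET, for_time=59))  # 287082 -- matches the RFC test vector
```

In production you would use a vetted library and protect the shared secret as carefully as a password, but this shows why both sides always agree on the same six-digit code within each 30-second window.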

Password guessing is an online technique that involves attempting to authenticate a particular user to the system. In cryptanalysis and computer security, a dictionary attack is a form of brute force attack for defeating a cipher or authentication mechanism: it tries to determine the decryption key or passphrase by testing hundreds or sometimes millions of likely possibilities. Pre-computing a list of hashes of dictionary words and storing them in a database using the hash as the key requires considerable preparation time, but allows the actual attack to be executed faster. Social engineering for password cracking involves a person convincing or tricking another person into supplying access. Password strength is the measure of a password’s effectiveness in resisting password cracking attacks. The strength of a password is determined by its length, complexity, and unpredictability.

Module 6: Application Deployment and Security Quiz

  1. In serverless computing, which term refers to the ability for resources surrounding an app to change and adjust capacity as needed?

    Topic 6.1.0 - In a serverless computing deployment model, the service provider provides resource capacity for customer applications. The resources and their capacity change as need changes and this is referred to as being elastic.

  2. Which Linux-based platform is used to create, run, and manage containers in a virtual environment?

    Topic 6.2.0 - Container engines create, run, and manage containers. Docker is a very popular container engine.

  3. What is Bash?

    Topic 6.2.0 - Bash is the name of a Linux script engine that lets a user do things from the command line. It is the default shell for most Linux distributions.

  4. Which load balancing technique will check the load status of multiple hosting servers and send the next incoming request to the server with the lowest load?

    Topic 6.4.0 - In the least connections request servicing method, the load balancer will check the load status of multiple hosting servers and send the next incoming request to the lowest load server.

  5. Which web application attack involves an attacker accessing, and potentially changing, serialized versions of data and objects?

    Topic 6.5.0 - An insecure deserialization attack occurs when an attacker gains access to, and potentially changes, serialized versions of data and objects. This attack can be mitigated by ensuring validation before deserializing objects.

  6. Which social engineering technique is carried out by someone attempting to gain access to a building by wearing a delivery service uniform?

    Topic 6.5.0 - Impersonation is a social engineering attack used to gain access to a system or network. Unlike other forms of social engineering attacks, impersonation occurs in person. The attacker impersonates someone whom others are likely to trust in an attempt to gain access to restricted resources.

  7. A company has remote employees who need to connect to the company network in order to participate in meetings and to share the data and progress of application development. Which data transportation security technique can be implemented to allow remote employees to securely connect to the company private network?

    Topic 6.5.0 - Data is vulnerable when it is transmitted over an insecure public network such as the internet. A company can use a virtual private network (VPN) to securely connect remote workers to the internal network and protect development and deployment resources as well as applications.

  8. Which two attacks target web servers through exploiting possible vulnerabilities of input functions used by an application? (Choose two.)

    Topic 6.5.0 - When a web application uses input fields to collect data from clients, threat actors may exploit possible vulnerabilities for entering malicious commands. The malicious commands that are executed through the web application might affect the OS on the web server. SQL injection and cross-site scripting are two different types of command injection attacks.

  9. Which statement describes the term containers in virtualization technology?

    Topic 6.1.0 - In a virtualization environment, containers are a specialized "virtual area" where multiple applications can run independently of each other while sharing the same OS and hardware. By sharing the host operating system, most of the software resources are reused, which leads to reduced boot time and optimized operation.

  10. A threat actor has used malicious commands to trick the database into returning unauthorized records and other data. Which web front-end vulnerability is the threat actor exploiting?

    Topic 6.5.0 - Web front-end vulnerabilities apply to apps, APIs, and services. Some of the most significant vulnerabilities are as follows:

    • Cross-site scripting: In a cross-site scripting (XSS) attack, the threat actor injects code, most often JavaScript, into the output of a web application. This forces client-side scripts to run the way that the threat actor wants them to run in the browser.
    • SQL injections: In a SQLi the threat actor targets the SQL database itself, rather than the web browser. This allows the threat actor to control the application database.
    • Broken authentication: Broken authentication includes both session management and protecting the identity of a user. A threat actor can hijack a session to assume the identity of a user especially when session tokens are left unexpired.
    • Security misconfiguration: Security misconfiguration consists of several types of vulnerabilities all of which are centered on the lack of maintenance to the web application configuration.
  11. What are three characteristics of a virtual machine? (Choose three.)

    Topic 6.1.0 - A virtual machine is a software emulation of a physical server including a CPU, memory, network interface, and operating system. The hypervisor is virtualization software that performs hardware abstraction. It allows multiple VMs to run concurrently in the virtual environment.

  12. What is a characteristic of the development environment in the four-tier deployment environment structure?

    Topic 6.1.0 - There are four deployment environments: Development, testing, staging, and production. The first environment is the development environment which is where coding takes place.

  13. What is CI/CD?

    Topic 6.3.0 - CI/CD (continuous integration/continuous delivery) is a philosophy for software deployment that figures prominently in the field of DevOps.
