You have learned about scaling automation. Now it is time to learn to automate the infrastructure of networks. This is a big change in the way experts think about networks, from design to creation and management. As a NetAcad student, you could not have picked a better time to explore automation and networking!
Sure, there are many tools for automating many tasks, and specialty tools for simulating a network for testing. But even more valuable to network engineers are speed with precision and safe self-service options. Puppet, Ansible, and Chef all work well for these use cases. In addition, pyATS and VIRL are tools that have been specifically created for the network.
Automating Infrastructure with Cisco
Introduction to Automating Infrastructure
In this module you will learn about automation. Automation is using code to configure, deploy, and manage applications together with the compute, storage, and network infrastructures, and the services on which they run.
The tools in this area include Ansible, Puppet, and Chef, to name a few. For automation with Cisco infrastructure, the platforms can integrate with those common tools, or provide direct API access to the programmable infrastructure. Whether in a campus/branch configuration, in your own data center, or as a service provider, there are Cisco tools that do more than network configuration management.
When you understand what automation is and what it can do for you, you'll be ready to visit the Cisco DevNet Automation Exchange to explore solutions that can work for you.
Cisco Automation Solutions
There are several use cases for automation for the network. Depending on the operational model you want to follow, you have choices in how to programmatically control your network configurations and infrastructure. Let’s look at Cisco automation through the DevNet Automation Exchange to understand the levels of complexity and choices available.
Walk: Read-only automation
Using automation tools, you can gather information about your network configuration. This scenario offers answers to the most basic and common question you can ask, which is "What changed?"
By gathering read-only data, you minimize the risk of causing a change that could break your network environment. Using GET requests is also a great way to start writing code solutions to data collection tasks. Plus, you can use a read scenario to audit configurations and take the natural next step, which is to put the configuration back into compliance. In the Automation Exchange, this shift is categorized as part of a walk-run-fly progression.
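As an illustration, here is a minimal Python sketch of the read-only pattern: issue a GET, save the result, and diff it against the last known-good copy. It assumes a device exposing a RESTCONF interface and the third-party requests library; the URL, credentials, and snapshot file name are placeholders, not part of any particular product.

```python
# Hypothetical read-only audit: fetch the running config over RESTCONF
# and report whether it differs from the last saved snapshot.
import difflib
import pathlib
import requests

DEVICE_URL = "https://10.0.0.1/restconf/data/Cisco-IOS-XE-native:native"  # placeholder
SNAPSHOT = pathlib.Path("last_known_good.json")                           # placeholder

response = requests.get(
    DEVICE_URL,
    auth=("admin", "password"),                     # placeholder credentials
    headers={"Accept": "application/yang-data+json"},
    verify=False,                                   # lab only; use real certificates in production
)
response.raise_for_status()
current = response.text

if SNAPSHOT.exists():
    diff = list(difflib.unified_diff(
        SNAPSHOT.read_text().splitlines(),
        current.splitlines(),
        lineterm="",
    ))
    print("\n".join(diff) if diff else "No changes detected.")
else:
    print("No snapshot yet; saving current configuration as the baseline.")

SNAPSHOT.write_text(current)  # GET only: nothing is pushed back to the device
```

Because the script only reads and compares, it answers "What changed?" without any risk of altering device state.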
Run: Activate policies and provide self-service across multiple domains
With these "Run stage" automation scenarios, you can safely enable users to provision their own network updates. You can also automate on-boarding workflows, manage day-to-day network configurations, and run through Day 0, Day 1, and daily (Day n) scenarios.
Fly: Deploy applications, network configurations, and more through CI/CD
For more complex automation and programmable examples, you want to go to the Fly stage of the DevNet Automation Exchange. Here you can get ahead of needs by monitoring and proactively managing your users and devices while also gaining insights with telemetry data.
There are many use cases for infrastructure automation, and you are welcome to add to the collection in the DevNet Automation Exchange.
Why Do We Need Automation?
Enterprises compete and control costs by operating quickly and being able to scale their operations. Speed and agility enable the business to explore, experiment with, and exploit opportunities ahead of their competition. Scaling operations lets the business capture market share efficiently and match capacity to demand.
Developers need to accelerate every phase of software building: coding and iterating, testing, and staging. DevOps practices require developers to deploy and manage apps in production, so developers should also automate those activities.
Below are some of the risks that can be incurred in manually-deployed and -managed environments.
Disadvantages of manual operations
Building a simple, monolithic web application server can take a practiced IT operator 30 minutes or more, especially when preparing for production environments. When this process is multiplied by dozens or hundreds of enterprise applications, multiple physical locations, data centers, and/or clouds, manual processes will, at some point, cause a break or even a network failure. This adds costs and slows down the business.
Manual processes such as waiting for infrastructure availability, manual app configuration and deployment, and production system maintenance, are slow and very hard to scale. They can prevent your team from delivering new capabilities to colleagues and customers. Manual processes are always subject to human error, and documentation meant for humans is often incomplete and ambiguous, hard to test, and quickly outdated. This makes it difficult to encode and leverage hard-won knowledge about known-good configurations and best practices across large organizations and their disparate infrastructures.
Financial costs
Outages and breaches are most often caused when systems are misconfigured. This is frequently due to human error while making manual changes. An often-quoted Gartner statistic (from 2014) places the average cost of an IT outage at upwards of $5,600 USD per minute, or over $300,000 USD per hour. The cost of a security breach can be even greater; in the worst cases, it represents an existential threat to human life, property, business reputation, and/or organizational survival.
Financial Costs of Server Outages
Dependency risks
Today's software ecosystem is decentralized. Developers no longer need to build and manage monolithic, full-stack solutions. Instead, they specialize by building individual components according to their needs and interests. Developers can mix and match the other components, infrastructure, and services needed to enable complete solutions and operate them efficiently at scale.
This modern software ecosystem aggregates the work of hundreds of thousands of independent contributors, all of whom share the benefits of participating in this vast collaboration. Participants are free to update their own work as needs and opportunities dictate, letting them bring new features to market quickly, fix bugs, and improve security.
Responsible developers attempt to anticipate and minimize the impact of updates and new releases on users by hewing closely to standards, deliberately engineering backwards compatibility, committing to provide long-term support for key product versions (e.g., the "LTS" versions of the Ubuntu Linux distribution), and other best practices.
This ecosystem introduces new requirements and new risks:
- Components need to be able to work alongside many other components in many different situations (known as being flexibly configurable), showing no more preference for specific companion components or architectures than absolutely necessary (known as being unopinionated).
- Component developers may abandon support for obsolete features and rarely-encountered integrations. This disrupts processes that depend on those features. It is also difficult or impossible to test a release exhaustively, accounting for every configuration.
- Dependency-ridden application setups tend to get locked into fragile and increasingly insecure deployment stacks. They effectively become monoliths that cannot easily be managed, improved, scaled, or migrated to new, perhaps more cost-effective infrastructures. Updates and patches may be postponed because changes are risky to apply and difficult to roll back.
Why Do We Need Full-Stack Automation?
Why do we need full-stack automation?
Infrastructure automation can deliver many benefits. These are summarized as speed, repeatability, and the ability to work at scale, with reduced risk.
Automation is a key component of functional software-defined infrastructure and distributed and dynamic applications. Below are additional benefits of full-stack automation.
Self-service
Automated self-service frameworks enable users to requisition infrastructure on demand, including:
- Standard infrastructure components such as database instances and VPN endpoints
- Development and testing platforms
- Hardened web servers and other application instances, along with the isolated networks and secured internet access that make them useful, safe, and resistant to errors
- Analytics platforms such as Apache Hadoop, Elastic Stack, InfluxData, and Splunk
Scale on demand
Apps and platforms need to be able to scale up and down in response to traffic and workload requirements and to use heterogeneous capacity. An example is burst-scaling from private to public cloud, and appropriate traffic shaping. Cloud platforms may provide the ability to automatically scale (autoscale) VMs, containers, or workloads on a serverless framework.
Observability
An observable system enables users to infer the internal state of a complex system from its outputs. Observability (sometimes abbreviated as o11y) can be achieved through platform and application monitoring. Observability can also be achieved through proactive production testing for failure modes and performance issues. But in a dynamic operation that includes autoscaling and other application behaviors, complexity increases and entities become ephemeral. A recent report by observability framework provider Datadog states that the average lifetime of a container under orchestration is only 12 hours; microservices and functions may live for only seconds. Making ephemeral entities observable and testing in production are only possible with automation.
Automated problem mitigation
Some software makers and observability experts recommend what is known as Chaos Engineering. This philosophy is based on the assertion that failure is normal: as applications scale, some parts are always failing. Because of this, apps and platforms should be engineered to:
- Minimize the effects of issues: Recognize problems quickly and route traffic to alternative capacity, ensuring that end users are not severely impacted, and that on-call operations personnel are not unnecessarily paged.
- Self-heal: Allocate resources according to policy and automatically redeploy failed components as needed to return the application to a healthy state in current conditions.
- Monitor events: Remember everything that led to the incident, so that fixes can be scheduled, and post-mortems can be performed.
Some advocates of Chaos Engineering even advocate using automation tools to cause controlled (or random) failures in production systems. This continually challenges Dev and Ops to anticipate issues and build in more resilience and self-healing ability. Open source projects like Chaos Monkey and "Failure-as-a-Service" platforms like Gremlin are purpose-built for breaking things, both at random and in much more controlled and empirical ways. An emerging discipline is called "failure injection testing."
Software-Defined Infrastructure: A Case for Automation
Software-defined infrastructure, also known as cloud computing, lets developers and operators use software to requisition, configure, deploy, and manage bare-metal and virtualized compute, storage, and network resources.
- Cloud computing also enables more abstract platforms and services, such as Database-as-a-Service (DBaaS), Platform-as-a-Service (PaaS), serverless computing, container orchestration, and more.
- Private clouds let businesses use expensive on-premises hardware much more efficiently.
- Public and hosted private clouds let businesses rent capacity at need, letting them move and grow (or shrink) faster, simplifying planning and avoiding fixed capital investment.
Benefits of cloud paradigms
- Self-service (platforms on demand) - Cloud resources can be available within minutes or hours of needing them, speeding all phases of development and enabling rapid scaling of production capacity. Applications can be scaled in the public cloud region, service set, or provider that is most cost-effective.
- Close specification, consistency, repeatability - Developers can capture and standardize unique configurations, maintaining configurational consistency of platforms through development, testing, staging, and production. Deploying a known-good application and configuration prevents bugs that can be introduced during manual platform configuration changes.
- Platform abstraction - Container technologies abstract apps and platforms away from one another, by encapsulating application dependencies and letting your containerized app run on a generically-specified host environment.
Challenges of cloud paradigms
Developers must pay close attention to platform design, architecture, and security. Cloud environments make new demands on applications. Public or private cloud frameworks have varying UIs, APIs, and quirks. This means that users cannot always treat cloud resources as the commodities they really should be, especially when trying to manage clouds manually.
Access control is critical, because cloud users with the wrong permissions can do a lot of damage to their organization's assets. Cloud permissions can be also challenging to manage, particularly in manually operated scenarios.
When cloud resources can be self-served quickly via manual operations, consumption can be hard to manage, and costs are difficult to calculate. Private clouds require frequent auditing and procedures for retiring unused virtual infrastructure. Public cloud users can be surprised by unexpected costs when pay-by-use resources are abandoned, but not torn down.
Many of these challenges can be addressed through automation.
Distributed and Dynamic Applications: Another Case for Automation
Large-scale enterprise and public-facing applications may need to manage heavy and variable loads of traffic, computation, and storage.
- The applications need to provide a good experience to their user base.
- They need to be resilient, highly available, protect user data integrity and security, and comply with local regulations about where and how data is stored.
- They need to grow (and shrink) efficiently to meet business needs, exploit trends, and run cost-effectively.
Single-server, "monolithic" application architectures, while conceptually simple, do not serve these needs very well. One server is a single point of failure, limits performance and capacity, and has limited upgrade capability. Duplicating the single server can increase capacity for very simple apps, but does not work for applications requiring data consistency across all instances. And it will not protect user data if there is a failure on the server on which their data resides.
For these and other reasons, modern application architectures are increasingly distributed. They are built up out of small and relatively light components that are sometimes called microservices. These components may be isolated in containers, connected via discovery and messaging services (which abstract network connectivity) and backed by resilient, scalable databases (which maintain state).
Microservices-based applications can (in theory) be scaled out service-by-service, as needed to meet traffic or performance demands. This architecture helps obtain the greatest benefit from hyperconverged infrastructure, which enables fine-tuning of compute, storage, and network connectivity and services to match dynamic application load requirements.
Microservices-Based Applications
Benefits of microservices
- Scalability - Microservices can be scaled and load-balanced as needed across many networked servers or even multiple, geographically-separate datacenters or public cloud regions. This eliminates single points of failure, and provides as much of each task-specific type of capacity as conditions demand, wherever it is needed, eliminating congestion.
- Infrastructure automation tools - Increasingly, the dynamism of microservices-based applications is provided by the infrastructure itself. Container "orchestrators" such as Kubernetes or Mesos automate on-demand scaling, self-healing, and more.
Challenges of microservices
- Increased complexity - Microservices mean that there are many more moving parts to configure and deploy, along with more demanding operations such as scaling on demand and self-healing.
- Automation is a requirement - Manual methods cannot realistically cope with the complexity of deploying and managing dynamic applications and their orchestrator platforms, with their high-speed, autonomous operations and their transitory and ephemeral bits and pieces.
Automating Infrastructure Summary
These business and technical needs, trends, and dynamics encourage developers and operators to use automation everywhere for the following tasks:
- Manage all phases of app building, configuration, deployment and lifecycle management. This includes coding, testing, staging, and production.
- Manage software-defined infrastructures on behalf of the applications you build.
- Preserve, update, and continually improve the automation code alongside your applications. This code helps you develop, test, stage, monitor, and operate your apps at production scale and in various environments. Increasingly, you can treat all of this code as one work product.
DevOps and SRE
Introduction to DevOps and SRE
For full-stack automation to be truly effective, it requires changes to organizational culture, including breaking down the historical divides between Development (Dev) and Operations (Ops).
This topic discusses the history of the DevOps approach and the core principles of successful DevOps organizations.
DevOps Divide
Historically, creating applications was the job of software developers (Dev), and ensuring that apps work for users and the business has been the specialized province of IT operations (Ops).
| Characteristic | Dev | Ops |
|---|---|---|
| Cares about: | Bespoke applications and how they work | Applications and how they run, plus infrastructure (historically, hardware), OS, network, commodity services, availability, security, scaling, and maintenance |
| Business treats as: | Profit center: demands resources | Cost center: provides and accounts for resources |
| Participates in on-call rotation: | Occasionally (and is disturbed only when issues are escalated to Dev) | Regularly (point of the spear) |
| Performance measured: | Abstractly (including bad metrics, like lines of code/day) | Concretely (SLA compliance, issues resolved) |
| Skills required: | More deep than broad: languages, APIs, architecture, tools, process, "frontend," "backend," etc. | More broad than deep: configuration, administration, OS, manufacturer-specific admin CLIs/APIs, shell, automation, DB, security, etc. |
| Agility required: | Much! Move fast, innovate, break things, fix later. | Not so much! Investments must be extensively justified and expectations managed. Longish timescales are normal. Safer to say no than yes. |
In the traditional, pre-virtualization, enterprise IT ecosystem, separating Dev from Ops seemed sensible. As long as infrastructure was based on investment in capital equipment (manually installed, provisioned, configured, and managed hardware), it required gatekeepers. The Ops gatekeepers worked on a different timescale from users and developers, and had different incentives and concerns.
Legacy bottlenecks
Traditionally, project resourcing or system scaling would be plan-driven, rather than demand-driven. Requisitioning, purchasing, installing, provisioning, and configuring servers or network capacity and services for a project could take months. With limited physical resources, resource-sharing was common.
The lack of simple ways to set up and tear down isolated environments and connectivity meant that organizations tended to create long-running systems that became single points of failure. Mixed-use networks were difficult to secure and required meticulous ongoing management.
Colloquially, such long-running, elaborate systems were referred to as "pets" because people would name them and care for that one system. The alternative is "cattle", which are stripped-down, ephemeral workloads and virtual infrastructure built and torn down by automation. This method ensures that there is a new system (or “cow”) available to take over the work.
Fusing Dev and Ops
Sophisticated tech organizations have been migrating away from these historic extremes for generations. The process accelerated with widespread adoption of server virtualization, cloud, and Agile software development. In the early 2000s, there began a movement to treat Dev and Ops as a single entity:
- Make coders responsible for deployment and maintenance
- Treat virtualized infrastructure as code
Evolution of DevOps
DevOps evolved and continues to evolve in many places in parallel. Some key events have shaped the discipline as we know it today.
Defining Moments 1: Site Reliability Engineering (SRE)
By 2003, the world's biggest and most-advanced internet companies had significantly adopted virtualization. They were dealing with large data centers and applications operated on a massive scale. There were failures that resulted in Dev vs. Ops finger-pointing, fast-growing and perpetually insufficient organizational headcounts, and on-call stress.
Google was among the first companies to understand and institutionalize a new kind of hybrid Dev+Ops job description. This was the Site Reliability Engineer (SRE). The role of the SRE is intended to fuse the disciplines and skills of Dev and Ops, creating a new specialty and best-practices playbook for doing Ops with software methods.
The SRE approach was adopted by many other companies. This approach is based on:
- Shared responsibility
- Embracing risk
- Acknowledgment of failure as normal
- Commitment to use automation to reduce or eliminate "toil"
- Measurement of everything
- Defining success in terms of meeting quantitative service-level objectives
Defining Moments 2: Debois and “Agile Infrastructure”
At Agile 2008, Belgian developer Patrick Debois gave a presentation called Agile Infrastructure and Operations. His presentation discussed how to apply Developer methods to Ops while maintaining that Dev and Ops should remain separate. Nevertheless, the following year, Debois went on to found the DevOpsDays event series.
Debois's presentation was influential in advancing discussions around automating virtual (and physical) infrastructure, using version-control (such as Git) to store infrastructure deployment code (procedural or declarative), and applying Agile methods to the development and maintenance of infrastructure-level solutions.
Defining Moments 3: Allspaw and Hammond
By the late 2000s, DevOps was increasing in popularity. Among the first compelling illustrations of its value was a presentation by John Allspaw and Paul Hammond at VelocityConf in 2009. In this presentation, the authors outline a simple set of DevOps best practices founded on the idea that both Dev and Ops cooperatively enable the business. These best practices include:
- Automated infrastructure
- Shared version control
- Single-step builds and deployments
The presentation also describes automation, teamwork, responsibility-sharing, transparency, trust, mutual accountability, and communications practices that have since become commonplace among DevOps/SRE practitioners. These practices include CI/CD, automated testing, metrics, SLOs/SLAs, and blame-free embrace of risk and inevitable failure.
Core Principles of DevOps
DevOps/SRE have many core principles and best practices:
- A focus on automation
- The idea that "failure is normal"
- A reframing of "availability" in terms of what a business can tolerate
Just as Agile Development can be seen as a method for defining and controlling management expectations for software projects, DevOps can be viewed as a way of structuring a healthy working culture for the technology-based parts of businesses. It can also be viewed as a way of reducing costs. By making failure normal and automating mitigation, work can move away from expensive and stressful on-call hours and into planned workday schedules.
Automation, avoiding toil, and retaining talent
Automation delivers speed and repeatability while eliminating manual, repetitive labor. This enables technical talent to spend time solving new problems, increasing the value to the business in the time spent.
DevOps/SRE practitioners are often expected to devote a significant fraction of working hours (50% or more in some scenarios) to delivering new operational capabilities and engineering reliable scaling, including development of automation tooling. This reduces hours spent on-call and intervening manually to understand and fix issues.
Acquisition and retention of technical talent requires organizations to cooperate with their innovators to minimize the boredom and stress of low-value labor and on-call work, and the risk of confronting inevitable technology failures in "fire drill" mode.
This is especially critical for businesses with intrinsically low margins, that profit by rapid growth and large scale and need to strictly control expenditures, particularly for skilled headcount. For all kinds of organizations, however, it is important that information technology work be perceived as a profit-center, rather than as a cost-center.
Failure is normal
The assumption that failures will occur does influence software design methodology. DevOps must build products and platforms with greater resiliency, higher latency tolerance where possible, better self-monitoring, logging, end-user error messaging, and self-healing.
When failures do occur and DevOps must intervene, the resulting activities should be viewed not simply as repair work, but as research to identify and rank procedural candidates for new rounds of automation.
SLOs, SLIs, and error budgets
Two linked ideas are critical to DevOps/SRE culture: (1) DevOps must deliver measurable, agreed-upon business value, and (2) doing so perfectly is a statistical impossibility. These ideas are codified in a Service Level Objective (SLO), which is defined in terms of real metrics called Service Level Indicators (SLIs).
SLIs are engineered to map to the practical reality of delivering a service to customers: they may represent a single threshold or provide more sophisticated bracketing to further classify outlier results. For example, an SLO might state that 99% of requests will be served within 50 milliseconds, while the SLIs capture information such as whether a single >50 ms request completed at all, or whether a particular request failed for your biggest customer.
SLO/SLI methodology permits cheaper, more rapid delivery of business value by removing the obligation to seek perfection in favor of building what is "good enough". It can also influence the pace, scope, and other aspects of development to ensure and improve adequacy.
One way of modeling SLO/SLI results requires establishing a so-called "error budget" for a service for a given period of time (day, week, month, quarter, etc.), and then subtracting failures to achieve SLO from this value. If error budgets are exceeded, reasonable decisions can be made, such as slowing the pace of releases until sources of error are determined, and specific fixes are made and tested.
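As a simple illustration of the arithmetic (not any particular vendor's tooling), the following Python sketch checks a latency SLI against a 99%-under-50 ms SLO and computes how much of a monthly error budget the failures consume. The request counts are invented sample numbers.

```python
# Toy SLO / error-budget calculation with invented sample numbers.
total_requests = 1_000_000        # requests served this month (sample value)
slow_or_failed = 12_500           # requests slower than 50 ms or errored (sample value)

slo_target = 0.99                 # SLO: 99% of requests served within 50 ms
error_budget = (1 - slo_target) * total_requests   # allowed "bad" requests: 10,000

sli = (total_requests - slow_or_failed) / total_requests
print(f"Measured SLI:     {sli:.4%}")
print(f"Error budget:     {error_budget:.0f} requests")
print(f"Budget consumed:  {slow_or_failed / error_budget:.1%}")

if slow_or_failed > error_budget:
    print("Error budget exceeded: consider slowing releases until fixes are made and tested.")
```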
The figure shows a horizontal bar, roughly three quarters blue and one quarter gold. An arrow about two thirds of the way along the blue section marks the SLA, an arrow at the start of the gold section marks the SLO, and the gold section is labeled as the error budget.
SLA vs. SLO and Error Budget
DevOps and SRE Summary
You have seen how DevOps/SRE is co-evolving with technologies like virtualization and containerization, enabling a unified approach and unified tool set to support coordinated application and infrastructure engineering.
Next, you will learn about some of the mechanics of infrastructure automation.
Basic Automation Scripting
Introduction to Basic Automation Scripting
Powerful automation tools like Ansible, Puppet, and Chef bring ease of use, predictability, discipline, and the ability to work at scale to DevOps work. But that does not mean that you cannot do some automation with more basic tools like Bash and Python. Automation tooling partly works by wrapping shell functionality, operating system utilities, API functions and other control plane elements for simplicity, uniformity, feature enrichment, and compatibility in DevOps scenarios. But tools still do not solve every problem of deployment and configuration.
That is why every automation tool has one or more functions that execute basic commands and scripts on targets and return results. For example, in Ansible, these functions include command, shell, and raw.
Sometimes it can be faster and simpler to use shell commands or scripts. Often, this is because many tool implementations begin by translating automation originally written in Bash, Python, or other languages, and you want to transfer that functionality quickly and accurately into the tool before porting and refactoring.
In summary, it is rare to look deep down into tool-maintained infra-as-code repos without finding some scripting. So having these skills is important!
Basic Tools for Automation Scripting
Shells are ubiquitous, so shell scripting is historically the bedrock of automation.
Bash
In Linux (and other operating systems), the shell interoperates with interactive I/O, the file system, and interprocess communication, providing ways to issue commands, supply input for processing, and pipe outputs to chains of powerful utilities.
The Bourne Again Shell (BASH) is the default on most Linux distributions. Because of its ubiquity, the terms "Bash" and "shell" are generally used interchangeably.
Using commands in a Bash script is much the same as using them directly from the command line. Very basic script development can simply be a matter of copying command-line expressions into a file after testing the CLI commands to see if they work.
By contrast, using Python or another high-level, sophisticated language for simple procedural automation is usually more challenging, and may not be worthwhile for simple projects.
Programming languages beyond Bash
Sophisticated languages improve on Bash when complexity and scale requirements increase. They are particularly useful when building and operating virtualized infrastructure in cloud environments, using SDKs like the AWS SDK for Python or the AWS SDK for JavaScript in Node.js. While Bash can be used to script access to the AWS CLI, you can use the built-in features and libraries of more sophisticated languages to parse complex returned datasets (such as JSON), manage many parallel operations, process errors, handle asynchronous responses to commands, and more.
To develop and execute scripts in your desired language, you may need to install and configure that language on your development system and on any remote target machines. Accessing system-level utility functions may require invoking libraries (such as the os library in Python), then wrapping what are Bash CLI commands in additional syntax for execution. You also need to handle return codes, timeouts, and other conditions in your preferred language environment.
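For example, a small Python sketch using the standard-library subprocess module shows what that wrapping looks like: run a shell command, enforce a timeout, and inspect the return code and output. The command shown is arbitrary; any shell command could be substituted.

```python
# Minimal sketch of wrapping a shell command in Python, with a timeout
# and explicit handling of the return code.
import subprocess

def run_command(command, timeout=30):
    """Run a shell command, returning (return_code, stdout, stderr)."""
    try:
        result = subprocess.run(
            command,
            shell=True,            # the string is interpreted by the shell, as in a Bash script
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 124, "", f"timed out after {timeout}s"
    return result.returncode, result.stdout, result.stderr

rc, out, err = run_command("uptime")
if rc == 0:
    print(out.strip())
else:
    print(f"command failed (rc={rc}): {err.strip()}")
```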
Procedural Automation
Using Bash, Python, or other conventional languages for automation usually means writing an imperative procedure. An imperative procedure is an ordered sequence of commands aimed at achieving a goal. The sequence may include flow-control, conditions, functional structure, classes, and more.
Such procedural automation can be very powerful. But it stays simple only if you are knowledgeable about how system utilities, CLIs/SDKs, and other interfaces work. You must also know about target system state.
Developing a procedure
As you know, if you make a little script to install and configure a piece of software on a remote target system, it may run okay the first time. Run it a second time, however, and your simple script might make a mess. It might throw an error and stop when it finds the application already installed, or worse, ignore such an error and go on to make redundant changes in config files.
To make this script safer, easier to use, more flexible, and reusable, you need to make it smarter and more elaborate. For example, you could enhance it to:
- Determine if it is running in a Debian or a CentOS environment, and use the correct package manager (apt or yum) and syntax.
- Determine if your target app is already installed in an appropriate version, and only try installing it if it is not present, stopping otherwise and making no further changes.
- Determine if it has made a copy of each config file before changing it, and use stream editors (awk, sed, etc.) to make precise changes, rather than carelessly appending text to config files and hoping the applications that consume these files will not break.
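A minimal Python sketch of the first two checks might look like the following. The package name is a placeholder, the environment is detected simply by which package tools are present, and a real script would need more robust error handling.

```python
# Sketch: detect the package manager and install a package only if it is
# not already present (a simple idempotency check). Package name is a placeholder.
import shutil
import subprocess

PACKAGE = "apache2"  # placeholder

def package_installed(name):
    """Return True if the package is already installed (Debian or RHEL/CentOS style)."""
    if shutil.which("dpkg"):
        check = ["dpkg", "-s", name]      # Debian-family query
    elif shutil.which("rpm"):
        check = ["rpm", "-q", name]       # RHEL/CentOS-family query
    else:
        raise RuntimeError("No supported package manager found")
    return subprocess.run(check, capture_output=True).returncode == 0

if package_installed(PACKAGE):
    print(f"{PACKAGE} already installed; making no changes.")
else:
    if shutil.which("apt-get"):
        installer = ["apt-get", "install", "-y", PACKAGE]
    else:
        installer = ["yum", "install", "-y", PACKAGE]
    subprocess.run(installer, check=True)
```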
As you develop and refine the scripts further, you will want them to accomplish some of the following tasks:
- Discover, inventory, and compile information about target systems, and ensure the scripts do this by default.
- Encapsulate the complexity of safely installing applications, making config file backups and changes, and restarting services into reusable forms, such as subsidiary scripts containing parameters, function libraries, and other information.
To ensure the scripts are efficient and reusable, you will:
- Standardize the ordering and presentation of parameters, flags, and errors.
- Create a code hierarchy that divides tasks logically and efficiently.
- Create high-level scripts for entire deployments and lower-level scripts for deployment phases.
- Separate deployment-specific data from the code, making the code as generic and reusable as possible.
The figure shows three flows through the same predefined process: starting from known state A, the process produces desired state B; starting from desired state B, it produces an unknown state; and starting from an unknown state, it produces yet another unknown state.
Simple Procedural Scripting Example
This type of scripting tends to be dangerous if starting state is not completely known and controlled. Applying the same changes again to a correctly-configured system may even break it.
Idempotency: a recurring theme in automation
Ultimately, the goal of almost any script is to achieve a desired state in a system, regardless of starting conditions. Carefully-written procedural scripts and declarative configuration tools examine targets before performing tasks on them, only performing those tasks needed to achieve the desired state.
This quality of software is called idempotency. There are a few basic principles of idempotency to follow:
- Ensure the change you want to make has not already been made - Also known as "First, do no harm". Doing nothing is almost always a better choice than doing something wrong and possibly unrecoverable.
- Get to a known-good state, if possible, before making changes - For example, you may need to remove and purge earlier versions of applications before installing later versions. In production infra-as-code environments, this principle becomes the basis for immutability. Immutability is the idea that changes are never made on live systems. Instead, change automation and use it to build brand-new, known-good components from scratch.
- Test for idempotency - Be scrupulous about building automation free from side effects.
- All components of a procedure must be idempotent - Only if all components of a procedure are known to be idempotent can that procedure as a whole be idempotent.
Executing Scripts Locally and Remotely
To configure remote systems, you need to access and execute scripts on them. There are many ways to do this:
- You can store scripts locally, transmit them to target machines with a shell utility like scp, then log into the remote machine using ssh and execute them.
- You can pipe scripts to a remote machine using cat | ssh and execute them in sequence with other commands, capturing and returning results to your terminal, all in one command.
- You can install a general-purpose secure file-transfer client like SFTP, then use that utility to connect to the remote machine, transfer your script file, set appropriate permissions, and then execute it.
- You can store scripts on a webserver and retrieve them on the remote machine with wget, curl, or other utilities; or store the scripts in a Git repository, install git on the remote machine, clone the repo to it, check out a branch, and execute the scripts found there.
- You can install a full remote-operations solution like VNC or NoMachine locally, install its server on the target (this usually requires also installing a graphical desktop environment), then transmit/copy and execute scripts.
- If your target devices are provisioned on a cloud framework, there is usually a way to inject a configuration script via the same CLI command or WebUI action that manifests the platform.
Almost every developer will end up using these and other methods at one point or another, depending on the task(s), environmental limitations, access to internet and other security restrictions, and institutional policy.
Understanding these methods and practicing them is important, because procedurally automating certain manual processes can still be useful, even when using advanced deployment tools for the majority of a DevOps task. To be clear, this is not good practice, but the state of the art in tooling is not yet perfect or comprehensive enough to solve every problem you may encounter.
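As one concrete illustration of the first approach, the hedged Python sketch below shells out to the standard scp and ssh clients. The host, user, and script names are placeholders, and key-based SSH authentication is assumed.

```python
# Sketch: copy a local script to a remote host and execute it over SSH.
# Host, user, and script names are placeholders; key-based SSH auth is assumed.
import subprocess

HOST = "user@192.0.2.10"          # placeholder target
SCRIPT = "configure_web.sh"       # placeholder local script

# 1. Copy the script to the remote machine.
subprocess.run(["scp", SCRIPT, f"{HOST}:/tmp/{SCRIPT}"], check=True)

# 2. Execute it remotely and capture the output.
result = subprocess.run(
    ["ssh", HOST, f"bash /tmp/{SCRIPT}"],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    print(f"remote script failed (rc={result.returncode}): {result.stderr}")
```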
Cloud Automation
Infrastructure-as-a-Service (IaaS) cloud computing frameworks are a typical target for automation. Cloud automation enables you to provision virtualized hosts, configure virtual networks and other connectivity, requisition services, and then deploy applications on this infrastructure.
Cloud providers and open source communities often provide specialized subsystems for popular deployment tools. These subsystems extract a complete inventory of resources from a cloud framework and keep it updated in real time while automation makes changes, which enables you to more easily write automation to manage these resources.
You can also manage cloud resources using scripts written in Bash, Python, or other languages. Such scripts are helped along by many tools that simplify access to automation targets. These include:
- CLIs and SDKs that wrap the REST and other interfaces of hardware, virtual infrastructure entities, higher-order control planes, and cloud APIs. This makes their features accessible from shells (and via Bash scripts) and within Python programs.
- Command-line tools and Python's built-in parsers can parse JSON and YAML output returned by CLIs and SDKs into pretty, easy-to-read formats and into native Python data structures for easy manipulation.
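For example, a short Python sketch (assuming the AWS CLI is installed and credentials are already configured; the specific query is only illustrative) shows the pattern of shelling out to a cloud CLI and parsing its JSON output into native Python structures:

```python
# Sketch: call a cloud CLI and parse its JSON output with Python's built-in parser.
# Assumes the AWS CLI is installed and credentials are already configured.
import json
import subprocess

result = subprocess.run(
    ["aws", "ec2", "describe-instances", "--output", "json"],
    capture_output=True,
    text=True,
    check=True,
)
data = json.loads(result.stdout)

# Walk the returned structure and print a simple inventory.
for reservation in data.get("Reservations", []):
    for instance in reservation.get("Instances", []):
        print(instance["InstanceId"], instance["State"]["Name"])
```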
Cloud CLIs and SDKs
IaaS and other types of infrastructure clouds also provide CLIs and SDKs that enable easy connection to their underlying interfaces, which are usually REST-based.
Cisco UCS - a bare metal cloud
If you are familiar with Cisco Compute products, including Unified Computing System (UCS), Hyperflex, UCS Manager, and the Intersight infrastructure management system, you know these are effectively a gateway to global SaaS management of an organization's UCS/Hyperflex infrastructure.
Cisco's main API for this infrastructure is the Cisco Intersight RESTful API. This is an OpenAPI-compatible API that can be interrogated with Swagger and other open source OpenAPI tools. These enable you to generate specialized SDKs for arbitrary languages and environments, and simplify the task of documenting the API (and maintaining SDKs).
Cisco provides and maintains a range of SDKs for the Intersight RESTful API, including ones for Python and Microsoft PowerShell. They also provide a range of Ansible modules.
VMware
VMware's main CLI is now Datacenter CLI, which enables command-line operation of vCenter Server API and VMware Cloud on AWS. It is written in Python and runs on Linux, Mac, and Windows.
VMware also provides vSphere CLI for Linux and Windows, which lets you manage ESXi virtualization hosts, vCenter servers, and offers a subset of DCLI commands. It also offers PowerCLI for Windows PowerShell, which provides cmdlets for vSphere, vCloud, vRealize Operations Manager, vSAN, NSX-T, VMware Cloud on AWS, VMware HCX, VMware Site Recovery Manager, and VMware Horizon environments.
VMware also offers a host of SDKs for popular languages (including Python), aimed at vSphere Automation, vCloud Suite, vSAN, and other products.
OpenStack
The OpenStack project provides the OpenStack Client (OSC), which is written in Python. The OSC lets you access OpenStack Compute, Identity, Image, Object Storage, and Block Storage APIs.
Installing the command-line clients also installs the bundled OpenStack Python SDK, enabling a host of OpenStack commands in Python.
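As a brief hedged example (assuming the openstacksdk package is installed and a "mycloud" entry exists in your clouds.yaml), listing compute instances from Python might look like this:

```python
# Sketch: list compute instances with the OpenStack Python SDK.
# Assumes openstacksdk is installed and a "mycloud" entry exists in clouds.yaml.
import openstack

conn = openstack.connect(cloud="mycloud")

for server in conn.compute.servers():
    print(server.name, server.status)
```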
OpenStack Toolkits are also available for many other popular languages.
Summary of Basic Automation Scripting
Summary
Basic automation scripting techniques are great to have in your toolbox, and understanding them will improve your facility as an operator and user of mature automation platforms.
Automation Tools
Introduction to Automation Tools
In this topic, you will learn about three of the most popular automation tools: Ansible, Puppet, and Chef.
You will also have the option to install one or all of them on your local workstation. If you want to try this, ensure that you have access to a Linux-based workstation, such as Ubuntu or macOS.
You should always refer to the tool's own installation documentation for your operating system.
What Do Automation Tools Do for Us?
What do automation tools do for us?
Automation tools like Ansible, Puppet, and Chef offer powerful capabilities compared to ad hoc automation strategies using Bash, Python, or other programming languages.
Simplify and standardize
Automation tools "wrap" operating system utilities and API functions to simplify and standardize access. Often, they also establish intelligent defaults that speed code drafting and testing. They make tool-centric code less verbose and easier to understand than scripts.
You can still access deeper underlying functionality when needed. Built-in shell access enables you to issue "raw" shell commands and to inject shell and other scripts into remote systems for delicate configuration tasks. You can also reuse legacy configuration code, and add functionality to the tool itself by composing modules and plugins in languages like Python or Ruby.
Automation tool modules enable best practices that make code safer and idempotency easier to achieve. For example, many Ansible functions can back up configuration files on target systems or retrieve copies and store them locally on a deployer machine before making changes. This helps to enable recovery if a deployment breaks, or is interrupted.
Accelerate development with out-of-the-box features
Automation tools typically provide some very powerful functionality to accelerate development. For example, by default, Ansible 2.4+ provides functionality that lets you easily retrieve configuration snapshots from Cisco ACI network fabrics. It also has complementary functionality to help you enable rollback of ACI configurations to a prior snapshotted state.
Facilitate reusability, segregate concerns, promote security
Modern automation tools strive to be "data-driven" and enable the following:
- Compilation of variable definitions
- Maintenance of server inventory and other details as structured data, separate from generic code
- Orderly means to inject variable values into code, config file templates, and other destinations at runtime
For example, Ansible Vault supports sophisticated functionality for encrypting sensitive files and variables, securely editing encrypted file contents, and more.
Perform discovery and manage inventory
Automation tools typically gather information from target devices as a default in the normal course of operations. This information includes hardware configuration, BIOS settings, operating system, configuration of network and other peripheral cards and subsystems, installed applications, and other details.
Some tools, like Cisco ACI, can also gather configuration details from individual devices and higher-order virtualization frameworks. Others feature dynamic inventory systems that enable automated extraction, compilation, and realtime updating of data structures, describing all resources configured in a private or public cloud estate.
Handle scale
Most automation tools can work in a local mode, as well as a client/server or distributed agent mode. This lets the tool manage thousands or tens of thousands of nodes.
Engage community
Most popular tools are available in open source core versions, helping the community to accelerate development, and find and fix bugs. Users of these tools also share deployment playbooks, manifests, recipes, and more. These are designed for use with the tool and may be distributed via GitHub and other public repositories, as well as on tool-provider-maintained repositories like Ansible Galaxy.
Critical Concepts
Idempotency: a review
Idempotent software produces the same desirable result each time that it is run. In deployment software, idempotency enables convergence and composability. Idempotent deployment components let you:
- More easily gather components in collections that build new kinds of infrastructure and perform new operations tasks
- Execute whole build/deploy collections to safely repair small problems with infrastructure, perform incremental upgrades, modify configuration, or manage scaling.
For example, suppose an operator misconfiguration causes a problem in a complex Kubernetes cluster. Idempotency lets you revert the most recent change in the deployment code and rebuild the cluster from scratch in five minutes, completely confident that you will have a working product.
This is the basic thinking behind infrastructure as code. Idempotency guarantees that your codebase achieves your desired goal by converging on a sequence of dependent goals. In this way, running your codebase against a base infrastructure target will have the idempotent result of producing a new or revised working infrastructure.
Procedural vs. declarative
Procedural code can achieve idempotency, but many infrastructure management, deployment, and orchestration tools have adopted another approach: the declarative model. A declarative model is a static representation of the desired end product. This model is used by middleware that incorporates deployment-specific details, examines present circumstances, and brings the real infrastructure into alignment with the model via the least disruptive, and usually least time-consuming, path.
These days, most popular automation tools are characterized as inherently procedural or declarative. Ansible and Puppet, for example, are often described as employing declarative Domain-Specific Languages (DSLs), whereas Chef is said to be more inherently procedural.
This is a somewhat artificial distinction, because all these platforms (as well as Bash, Python, etc.) are procedural at the lowest level: Ansible is based on Python, while Puppet and Chef are built on Ruby. All can make use of both declarative and procedural techniques as needed, and many real-world automation tasks require both approaches.
- name: Install Apache webserver
  apt:
    name: apache2
    state: present
    update_cache: yes
Figure 1. Typical terse Ansible declaration, which installs the Apache web server on an Ubuntu host
In this example, state: present could be replaced with state: absent to remove the package if found. The update_cache: yes setting performs the equivalent of apt-get update before attempting the installation.
Provisioning vs. configuration vs. deployment vs. orchestration
Operations people tend to think in terms of a hierarchy of infrastructure layers and associated task-domains:
- Provisioning refers to obtaining compute, storage, and network infrastructure (real or virtual), enabling communications, putting it into service, and making it ready for use by operators and developers (e.g., by installing an operating system, machine-level metrics, ssh keys, and the lowest level of operations tooling).
- Configuration means installing base applications and services, and performing the operations, tasks, and tests required to prepare a low-level platform to deploy applications or a higher-level platform.
- Deployment involves building, arranging, integrating, and preparing multi-component applications (such as database clusters) or higher-level platforms (like Kubernetes clusters), often across multiple nodes.
- Orchestration may refer to several things. When meant concretely, it usually refers to user-built or platform-inherent automation aimed at managing workload lifecycles and reacting dynamically to changing conditions (e.g., by autoscaling or self-healing), particularly in container environments. When meant abstractly, it may refer to processes or workflows that link automation tasks to deliver business benefits, like self-service.
People who come to operations from software development tend to have a looser perspective on how these terms should be used. They tend to use the term deployment for anything that is not orchestration. They make the strongest distinction between "things you need to do to make a system ready for testing/use" and "adjustments the system needs to make automatically, or that you may be asked to make for it."
People also use the phrase "configuration management" when describing IT automation tools in general. This can mean one of two things:
- An expanded idea of "configuration" that includes installing applications and services.
- A distinction between configuration and coding: domains where static descriptions can neatly represent work-products vs. domains in which processes must be made explicit and possible side-effects anticipated. For more information, research the connection between declarative automation and functional/logical programming, and the complementary connection between procedural automation and imperative programming.
Statelessness
Automation works best when applications can be made stateless. This means that redeploying them in place does not destroy, or lose track of, data that users or operators need.
- Not stateless - An app that saves important information in files, or in a database on the local file system.
- Stateless - An app that persists its state to a separate database, or that provides service that requires no memory of state between invocations.
The discussion of full-stack (infrastructure + applications) automation in this topic assumes that the applications being discussed are stateless and/or that you, the developer, have figured out how to persist state in your application so that your automation can work non-destructively.
Examples of Statelessness and Statefulness
This diagram depicts three kinds of applications:
- Stateless / No state to store - This app requires only atomic/synchronous interactions between client and server. Each request from client to server returns a result wholly independent of prior and subsequent requests. An example of this application is a public web server that returns an HTML page, image, or other data on request from a browser. The application can be scaled by duplicating servers and data behind a simple load balancer.
- Stateless / State stored in database - User state is stored in a database accessible to any webserver in the middle tier. An example of this application is a web server that needs to be aware of the correspondence between a user ID and user cookie. New webservers and copies of the website can be added freely without disrupting user sessions in progress and without requiring that each request from a given user be routed to the specific server that maintains their session.
- Stateful / State stored on server - A record of user state must be maintained across a series of transactions. An example of this application is a website that requires authentication. The app is not allowed to serve pages to a user who is not logged in. User state is typically persisted by giving the client an identifying cookie that is returned to the server with each new request and used to match an ID stored there. This application cannot be scaled just by adding servers. If a logged-in user is routed to a server that has not stored an ID matching the user's cookie, that server will not recognize them as being logged in, and will refuse their request.
Apps that need to maintain state are inconvenient candidates for full-stack automation, because state will be destroyed by an ad hoc rebuild of their supporting infrastructure. They also cannot be efficiently migrated from one pool of resources (for example, one set of application servers or hosts) to another.
Popular Automation Tools
The first modern automation tool was probably Puppet, introduced in 2005 as open source, and then commercialized as Puppet Enterprise by Puppet Labs in 2011.
Currently, the most popular tools are Ansible, Puppet, and Chef. They share the following characteristics:
- Relatively easy to learn
- Available in open source versions
- Plugins and adapters enable them to directly or indirectly control many types of resources: software-defined bare-metal infrastructure (Cisco UCS, Cisco devices), private cloud (VMware, OpenStack), and public cloud (Amazon Web Services, Microsoft Azure, Google Cloud Platform).
Many other solutions also exist. Private and public cloud providers often endorse their own tools for use on their platforms, for example, OpenStack's HEAT project, and AWS' CloudFormation. Other solutions, many aimed at the fast-growing market for container orchestration, pure infrastructure-as-code, and continuous delivery of infrastructure+applications, include SaltStack and Terraform.
Ansible
Ansible
Ansible is probably the most broadly popular of current automation solutions. It is available as open source, and in a commercial version with added features from Red Hat (now part of IBM), called Ansible Tower. Its name comes from the novels of speculative fiction author Ursula K. Le Guin, in which an "ansible" is a future technology enabling instant communication at cosmic distances.
Ansible's basic architecture is very simple and lightweight.
- Ansible's control node runs on virtually any Linux machine running Python 2 or 3, including a laptop, a Linux VM residing on a laptop of any kind, or on a small virtual machine adjacent to cloud-resident resources under management. All system updates are performed on the control node.
- The control node connects to managed resources over SSH. Through this connection, Ansible can:
- Run shell commands on a remote server, or transact with a remote router, or other network entity, via its REST interface.
- Inject Python scripts into targets and remove them after they run.
- Install Python on target machines if required.
- Plugins enable Ansible to gather facts from and perform operations on infrastructure that cannot run Python locally, such as cloud provider REST interfaces.
Ansible is substantially managed from the Bash command line, with automation code developed and maintained using any standard text editor. Atom is a good choice, because it permits easy remote work with code stored in nested systems of directories.
The figure shows an Ansible workstation (control node) at the top, exchanging commands and responses with managed resources. Over SSH, it configures physical or virtual hosts using shell commands and injected Python modules (no agents required). Over HTTPS/REST, it communicates with infrastructure managers, cloud operations APIs, and similar control planes, which in turn manage their own physical and/or virtual infrastructure and hosts.
Ansible Architecture
Installing Ansible
The Ansible control node application is installed on a Linux machine (often a virtual machine) from its public package repository. To install Ansible on your workstation, refer to the installation documentation appropriate to your device.
Ansible code structure
In the Ansible code structure, work is separated into YAML (.yml) files that contain a sequence of tasks, executed in top-down order. A typical task names and parameterizes a module that performs work, similar to a function call with parameters.
Ansible has hundreds of pre-built Python modules that wrap operating-system-level functions and meta-functions. Some modules, like raw, do only one thing: they present a command in string form to the shell, capture a return code and any console output, and return it in accessible variables. The apt module can be used to install, remove, upgrade, and modify individual packages or lists of packages on servers running a Debian Linux variant. If you want to learn more, the apt documentation will give you a sense of the scope and power of Ansible modules.
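For example, a single task invoking the apt module might look like the following sketch (the package name and target state are illustrative, and a Debian-based host is assumed):
- name: Ensure nginx is installed and up to date
  apt:
    name: nginx
    state: latest
    update_cache: yes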
Playbooks and roles
An Ansible playbook (or "series of plays") can be written as a monolithic document with a series of modular, named tasks. More often, developers will build a model of a complex DevOps task out of low-level playbook task sequences (called "roles"), then reference these in higher-level playbooks, sometimes adding additional tasks at the playbook level.
This segregation of concerns has many benefits.
- Clarity - Given a little context, almost anyone can interpret a higher-level playbook referencing clearly-named roles.
- Reusability and shareability - Roles are reusable, though they may be fairly closely bound to infrastructure specifics. They are also potentially shareable: the Ansible project maintains a repository of open source role definitions, called Ansible Galaxy.
Example playbook
The following sample playbook builds a three-node Kubernetes cluster on a collection of servers:
- It installs Python 2 on all servers and performs common configuration steps on all nodes via a role called configure-nodes, which installs the Kubernetes software and Docker as a container engine, and configures Docker to work with Kubernetes. (The actual Ansible commands are not shown.)
- It designates one node as the master, installing the Weave container network (one of many network frameworks that work with Kubernetes) and performing wrap-up tasks.
- It joins the k8sworker nodes to the k8smaster node.
- The statement become: true gives Ansible root privileges (via sudo) before attempting an operation.
- The line gather_facts: false in the first stanza prevents the automatic system-facts interrogator from executing on a target machine before Python is installed. When subsequent stanzas are executed, facts are compiled automatically, by default.
---
- hosts: all
become: true
gather_facts: False
tasks:
- name: install python 2
raw: test -e /usr/bin/python || (apt -y update && apt install -y python-minimal)
- hosts: all
become: true
roles:
- configure-nodes
- hosts: k8smaster
become: true
roles:
- create-k8s-master
- install-weave
- wrapup
- hosts:
- k8sworker1
- k8sworker2
become: true
roles:
- join-workers
...
Ansible project organization
Ansible projects are typically organized in a nested directory structure as shown below. The hierarchy is easily placed under version control and used for GitOps-style infrastructure as code. For an example, refer to “Directory Layout” in the Ansible documentation.
Ansible folder hierarchy elements
The Ansible folder/file hierarchy includes the following main elements (a minimal layout sketch follows the list):
- Inventory files - Also called hostfiles. These organize your inventory of resources (e.g., servers) under management. This enables you to aim deployments at a sequence of environments such as dev, test, staging, production. For more information about inventory files, refer to “How to build your inventory” in the Ansible documentation.
- Variable files - These files describe variable values that are pertinent to groups of hosts and individual hosts.
- Library and utility files - These optional files contain Python code for custom modules and the utilities they may require. You may wish to write custom modules and utilities yourself, or obtain them from Ansible Galaxy or other sources. For example, Ansible ships with a large number of modules already present for controlling main features of Cisco ACI, but also provides tutorials on how to compose additional custom modules for ACI features currently lacking coverage.
- Main playbook files - Written in YAML, these files may reference one another, or lower-level roles.
- Role folders and files - Each role folder tree aggregates resources that collectively enable a phase of detailed configuration. A role folder contains a /tasks folder with a main.yml tasks file, and may also contain a folder of asynchronous handler task files. For more information about roles, refer to "Roles" in the Ansible documentation.
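Putting these elements together, a minimal project tree might look like the following sketch (the names are illustrative; real projects often add group_vars, host_vars, and additional roles):
inventory               # hostfile listing managed systems
site.yml                # top-level playbook
group_vars/
    webservers.yml      # variables for the webservers group
roles/
    webservers/
        tasks/
            main.yml    # the role's task list
        handlers/
            main.yml    # asynchronous handlers
        files/          # static files the role deploys
        templates/      # Jinja2 templates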
Ansible at scale
Ansible's control node is designed to sit close to the infrastructure that it manages. For example, it may reside on a VM, or in a container, running in the same subnet as managed resources. Enterprises and organizations with many hosts under management tend to operate many Ansible control nodes, distributing across infrastructure pools as required.
If you are not doing rapid-fire continuous delivery, Ansible nodes do not even need to be maintained between deployments. If your Ansible deployment code is stored in version control, control nodes can be launched or scratch-built as you need.
There are scaling challenges for large organizations, such as managing and controlling access to many Ansible nodes flexibly and securely. This also includes putting remote controllers seamlessly and safely under control of centralized enterprise automation. For this, there are two control-plane solutions:
- The commercial Red Hat Ansible Tower product provides a sophisticated web interface, REST API, and rich, role-based access control options.
- The open source, feature-comparable AWX project, of which Ansible Tower is a value-added distribution. AWX, however, is said to represent a development branch that undergoes minimal testing, and it is not made available as signed binaries, which may be a problem for many enterprises.
Continuous delivery around Ansible deployment can be performed with any general-purpose CI/CD automation tool such as Jenkins or Spinnaker. Larger, more complex projects often use the Zuul open source gating framework, originally developed by the OpenStack Project and spun off independently in 2018. The AWX Project, among many others, is a Zuul-gated project.
Larger-scale Ansible implementations will also benefit from Ansible Vault, a built-in feature that enables encryption of passwords and other sensitive information. It provides a straightforward and easily-administered alternative to storing sensitive information in playbooks, roles, or elsewhere as plaintext.
Cisco Ansible resources
Cisco and the Ansible community maintain extensive libraries of Ansible modules for automating Cisco compute and network hardware including:
- A very large set of built-in modules for configuring Cisco Application Centric Infrastructure (ACI) fabrics via the Application Policy Infrastructure Controller (APIC). These modules execute on the Ansible control node (not on the APIC), communicating with the controller via its REST interface.
- Remote control of Cisco network devices running IOS-XR, plus modules for sending commands to and retrieving results from these devices via the CLI or via the standard NETCONF interface.
- Ansible modules for configuring Cisco UCS infrastructure via the Intersight REST interface.
Ansible Example
This exercise will let you view the structure of a simple Ansible playbook, which retrieves information about the container the demo environment resides in. Note that Ansible normally uses ssh to connect with remote hosts and execute commands. In this example, a line in the top-level playbook.yml file instructs Ansible to run this playbook locally, without requiring ssh.
connection: local
Normally, Ansible is used to perform deployment and configuration tasks. For example, you might use it to create a simple website on a remote host. Let's see how this might work.
Prerequisites
You can walk through this exercise yourself, or you can simply read along. If you want to complete this exercise, you will need:
- A target host running a compatible operating system (such as Ubuntu 18.04 server)
- SSH and keywise authentication configured on that host (a brief setup sketch follows below)
- Ansible installed on your local workstation
A fresh host with keywise SSH authentication already configured is typically how new virtual machines are delivered on private and public cloud frameworks.
For the target host, you can create one using a desktop virtualization tool such as VirtualBox.
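If keywise authentication is not already set up on your target, it can usually be configured from your workstation with a couple of commands (a sketch; the hostname target and user myname match the rest of this example):
ssh-keygen -t rsa -b 4096     # skip this if you already have a key pair
ssh-copy-id myname@target     # copy your public key to the target host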
Building an Ansible project file tree
For the purposes of this exercise, the target machine's (DNS-resolvable) hostname is simply target. If you have your own target host set up and configured, substitute your host name when you create your files.
With your target machine SSH-accessible, begin building a base folder structure for your Ansible project.
mkdir myproject
cd myproject
At the top level in your project folder, you need:
- An inventory file, containing information about the machine(s) on which you want to deploy.
- A top-level site.yml file, containing the most abstract level of instructions for carrying out your deployment.
- A role folder structure to contain your webservers role.
touch inventory
touch site.yml
mkdir roles
cd roles
ansible-galaxy init webservers
Creating your inventory file
Your inventory file for this project can be very simple. It defines a group containing the DNS-resolvable hostname of your target machine:
[webservers]
target # can also be IP address
You are defining a group called webservers and putting your target machine's hostname (or IP) in it. You could add new hostnames/IPs to this group block, or add additional group blocks, to assign hosts for more complex deployments. The name webservers is entirely arbitrary. For example, if you had six servers that you wanted to configure in a common way, with three then configured as web servers and three as database servers, your inventory might look like this:
[webservers]
target1
target2
target3
[dbservers]
target4
target5
target6
You don't actually need to create a common group, because Ansible provides means to apply a common configuration to all servers in an inventory, which you'll see in a moment.
Creating your top level playbook file
A top-level playbook typically describes the order, permissions, and other details under which lower-level configuration actions, defined in roles, are applied. For this example project, site.yml looks like this:
---
- hosts: webservers
become: true
roles:
- webservers
site.yml identifies which hosts you want to perform an operation on, and which roles you want to apply to these hosts. The line become: true tells Ansible that you want to perform the roles as root, via sudo.
Note that instead of hosts: webservers, you could apply this role to all target hosts (which, in this case, would work fine, because you only have one target) by substituting the line:
- hosts: all
Creating your webservers role
The next step is to create the role that installs and configures your web server. You've already created the folder structure for the role using ansible-galaxy. Code for the role is conventionally contained in a file called main.yml in the role's /tasks directory. You can edit the roles/webservers/tasks/main.yml file directly, to look like this:
---
- name: Perform updates and install apache2
apt:
name: apache2
state: present
update_cache: yes
- name: Insert new homepage index.html
copy:
src: index.html
dest: /var/www/html
owner: myname
mode: '0444'
The role has two tasks:
- Deploy Apache2.
- Copy a new index.html file into the Apache2 HTML root, replacing the default index.html page.
In the apt: stanza, you name the package, its required state, and instruct the apt module to update its cache. You are basically performing a sudo apt update before the installation happens.
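On the target, this stanza has roughly the same effect as running:
sudo apt update
sudo apt install -y apache2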
In the second stanza, Ansible's copy routine copies a file from your local system to a directory on the target and also changes its owner and permissions. This is the equivalent of:
chown myname index.html
chmod 444 index.html
Creating your index.html file
Of course, you'll need to create a new index.html file as well. The Ansible copy command assumes that such files will be stored in the /files directory of the role calling them, unless otherwise specified. Navigate to that directory and create the following index.html file, saving your changes afterward:
<html>
<head>
<title>My Website</title>
</head>
<body>
<h1>Hello!</h1>
</body>
</html>
Running your deployment
Now you're ready to run your deployment. From the top level directory of your project, you can do this with the statement:
ansible-playbook -i inventory -u myname -K site.yml
- -i names your inventory file.
- -u names your sudo user.
- -K tells Ansible to ask for your sudo password as it begins execution.
- site.yml is the file that governs your deployment.
If all is well, Ansible should ask you for your BECOME password (sudo password), then return results similar to the following:
BECOME password:
PLAY [webservers] **************************************************************
TASK [Gathering Facts] *********************************************************
ok: [192.168.1.33]
TASK [webservers : Perform updates and install apache2] ****************************
changed: [192.168.1.33]
TASK [webservers : Insert new homepage index.html] *********************************
changed: [192.168.1.33]
PLAY RECAP *********************************************************************
192.168.1.33 : ok=3 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
And now, if you visit the IP address of your target machine in a browser, you should see your new homepage.
Note that Ansible gives back a full report on each execution, noting whether a step was actually performed or whether Ansible determined that its desired goal was already reached (in that case, nothing happened, and the step is reported as 'ok'). This is an example of how Ansible maintains idempotency: you can typically run an Ansible deployment as many times as needed without putting a target system into an unknown state.
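For example, if you immediately re-ran the same deployment against the unchanged target, you would expect a recap along these lines (illustrative output):
PLAY RECAP *********************************************************************
192.168.1.33 : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0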
Ansible CI/CD walkthrough
Let's walk through the example as if it were part of a CI/CD pipeline.
A developer collaborating with you on GitHub commits a change to the website, such as in the index.html file.
Next, checks configured in the repository run syntax and sanity tests, as well as code review rules, against the pull request.
- If the checks pass, the commit is accepted and the CI/CD server is notified to run its tests.
- If the checks fail, the commit is rejected and the contributor is asked to resubmit.
Next, the CI/CD system, such as Jenkins, prepares an environment and runs predefined tests against the Ansible playbook. The pipeline should state the Ansible version it expects and install it each time. Here's an example pipeline:
pip install ansible==2.9.2
ansible --version
ansible-playbook main.yml --syntax-check
ansible-playbook -i staging-servers.cfg main.yml --check
ansible-playbook -i staging-servers.cfg main.yml -vvvv
With this pipeline, you make sure:
- There are no syntax errors in main.yml.
- Code review rules are being followed before the code even gets merged into the repository.
- All modules are spelled correctly and available in the environment, verified with the --check parameter.
- There's a common Ansible version, such as 2.9.2, because it's installed in the CI environment.
- The playbook can run in the staging environment on the staging servers.
- Other people can see what was run, using the highly verbose parameter, -vvvv.
After Jenkins is done running the job, you can get a notification that all is ready for staging, and you can then push these changes to production with a second pipeline. This is the power of version control, Ansible, and promotion through multiple environments using CI/CD.
Lab – Use Ansible to BackUp and Configure a Device
In this lab, you will explore the fundamentals of how to use Ansible to automate some basic device management tasks. First, you will configure Ansible in your DEVASC VM. Next, you will use Ansible to connect to the CSR1000v and back up its configuration. Finally, you will configure the CSR1000v with IPv6 addressing.
You will complete the following objectives:
- Part 1: Launch the DEVASC VM and CSR1000v VM
- Part 2: Configure Ansible
- Part 3: Use Ansible to Back Up a Configuration
- Part 4: Use Ansible to Configure a Device
Lab – Use Ansible to Automate Installing a Web Server
In this lab, you will first configure Ansible so that it can communicate with a webserver application. You will then create a playbook that will automate the process of installing Apache on the webserver. You will also create a customized playbook that installs Apache with specific instructions.
You will complete the following objectives:
- Part 1: Launch the DEVASC VM
- Part 2: Perform a Backup with Ansible
- Part 3: Configure IPv6 Addressing with Ansible
- Part 4: Use Ansible to install Apache on Web Servers
- Part 5: Add Options to Your Ansible Playbook for Apache Web Servers
Puppet
Puppet was released as open source in 2005 and commercialized as Puppet Enterprise by Puppet Labs in 2011.
Puppet’s core architecture has the following characteristics:
- A designated server to host main application components:
- The Puppet Server (historically called "Puppet Master")
- Facter, the fact-gathering service
- PuppetDB, which can store facts, node catalogs, and recent configuration event history
- A secure client, also known as a Puppet Agent, installed and configured on target machines. Clients and server are mutually authenticated with self-signed certificates, and SSL is used for transport. The agents gather facts (under control of the Facter service) and make configuration changes as directed by the Puppet Server.
- For cloud APIs and hardware that cannot run an agent, Puppet has modules available to enable these connections.
- In scaled-out implementations where many non-agent-capable devices are under management, Puppet enables a proxy agent to offload the work of directly connecting to device CLIs and exchanging information.
Operators communicate with the Puppet Server largely via SSH and the command line.
The Puppet Server can be a VM or even a Docker container for small self-teaching implementations, and Puppet provides a compact Docker install for this purpose, called Pupperware. Standard packages are available for building a Puppet Server on Linux, which is currently the only option for a Server install. Puppet Agents (also called Clients) are available for Linux, Windows, and MacOS.
The figure shows a DevOps workstation communicating over SSH with a Puppet Server running Facter, Hiera, and PuppetDB. The server manages physical or virtual hosts running Puppet Agent (communicating on ports 8140 and 8142 using TCP under SSL), reaches agentless devices either through a Puppet proxy agent or directly over SSH, and controls infrastructure managers and cloud operations APIs over HTTPS/REST. Modules on the Puppet Server drive each of these connections to the underlying physical and virtual infrastructure and hosts.
Puppet Architecture
Installing Puppet
Puppet Server requires fairly powerful hardware (or a big VM), and also requires a Network Time Protocol client to be installed, configured, and tested. You can find a wide range of how-to blog posts about Puppet installation, but the Puppet project is fast-moving and third-party posts and articles may be outdated or unreliable.
When you have Puppet Server running, you can begin installing Puppet Agents on hosts you wish to manage. The agents will then need to have the puppet.conf file configured to communicate with a Puppet Server. After the client service is started, it will have its certificate signed by the server. The Server will now be able to gather facts from the client and update the client state with any configuration changes.
Puppet code structure
Like Ansible, Puppet lets you store components of a project or discrete configuration in a directory tree (/etc/puppetlabs/code/environments). Subsidiary folders are created according to the configuration in puppet.conf, or by the operator.
As an example, you may declare environment = production in puppet.conf. Puppet will create a directory for this default environment, which will contain a modules subdirectory in which you can store subsidiary projects and manifests for things you need to build and configure (/etc/puppetlabs/code/environments/production/modules).
To begin a small project, you might create a folder inside this directory, and then within that folder, create another called manifests, where you would store the manifest files declaring operational classes, which are units of code describing a configuration operation. Manifest files typically end in the extension .pp and are written in Puppet's declarative language, which looks something like Ruby and was inspired by the Nagios configuration file format.
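As a minimal sketch of that syntax (the class name, file path, and content are purely illustrative), a manifest declaring a single file resource might look like this:
class motd {
  # Ensure /etc/motd exists with the declared content
  file { '/etc/motd':
    ensure  => file,
    content => "This node is managed by Puppet.\n",
  }
}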
Like Ansible and other configuration tools, Puppet provides a host of resources that can be invoked to define configuration actions to be performed on hosts and connected infrastructure. The basic idea is very similar to Ansible's practice, where a class that invokes a resource will be parameterized to function idempotently, and will be applied in context to produce the same desired result every time it runs.
Puppet comes with a set of basic resources (templates for performing configuration actions) built in. Many additional resources for performing all sorts of operations on all kinds of infrastructure can be downloaded and installed from Puppet Forge using the puppet module command.
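For example, the widely used Apache module published by Puppet can be pulled from the Forge with a single command, typically run on the Puppet Server (check the Forge listing for the current module name and version):
puppet module install puppetlabs-apache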
Puppet at scale
Puppet Server is somewhat monolithic, and a monolithic installation is recommended by the (open source) implementors, who say an appropriately configured Puppet Server can manage "up to 4000" hosts.
The first recommended step to accommodate more hosts is to create additional "compile masters," which compile catalogs for client agents, and to place these behind a load balancer to distribute the work.
Puppet Enterprise customers can further expand capacity by replacing PuppetDB with a stand-alone, customized database called PE-PostgreSQL. The Puppet Enterprise product offers many other conveniences as well, including a web-based console giving access to reports, logs, and enabling certain kinds of point-and-click configuration.
Cisco Puppet resources
Cisco and the Puppet community maintain extensive libraries of modules for automating Cisco compute and network hardware including:
- Cisco IOS modules enabling management of IOS infrastructure.
- Cisco UCS modules enabling control of UCS via UCS Manager
Puppet Example
This example describes how to install Puppet and then use Puppet to install Apache 2 on a device. You can simply read along to better understand Puppet.
This approximates the normal workflow for Puppet operations in an automated client/server environment. Note that modules can be completely generic and free of site-specific information, then separately and re-usably invoked to configure any number of hosts or infrastructure components. Because modules and manifests are composed as text files, they can easily be stored in coordinated fashion in a version control repository, such as Git.
Installing Puppet Server
Puppet Server requires fairly powerful hardware (or a big VM), and also requires a Network Time Protocol client to be installed, configured, and tested. Instructions for installing the server can be found in Puppet's documentation.
Installing Puppet Client
When you have Puppet Server running, you can install Puppet Agents on a host. For example, on a Debian-type Linux system, you can install Puppet Agent using a single command:
sudo apt-get install puppet-agent
Modify puppet.conf
When installed, the Puppet Agent needs to be configured to seek a Puppet Server. Add the following lines to the file /etc/puppet/puppet.conf:
[main]
certname = puppetclient
server = puppetserver
environment = production
runinterval = 15m
This tells the Client the hostname of your server (resolved via /etc/hosts) and the name under which the Client's authentication certificate will be requested and signed in the next step.
Start the puppet service on the Client:
sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
You should get a response similar to the following:
Notice: /Service[puppet]/ensure: ensure changed 'stopped' to 'running'
service { 'puppet':
ensure => 'running',
enable => 'true',
}
Note Puppet's declarative method of configuration: Puppet uses itself, with its Ruby-like declarative syntax, to configure and activate its own service.
Certificate signing
Puppet Agents use certificates to authenticate with the server before retrieving their configurations. When the Client service starts for the first time, it sends a request to its assigned server to have its certificate signed, enabling communication.
On the Server, issuing the ca list command returns a list of pending certificates:
sudo /opt/puppetlabs/bin/puppetserver ca list
The response should be similar to the following:
Requested Certificates:
puppetclient (SHA256) 44:9B:9C:02:2E:B5:80:87:17:90:7E:DC:1A:01:FD:35:C7:DB:43:B6:34:6F:1F:CC:DC:C2:E9:DD:72:61:E6:B2
You can then sign the certificate, enabling management of the remote node:
sudo /opt/puppetlabs/bin/puppetserver ca sign --certname puppetclient
The response:
Successfully signed certificate request for puppetclient
The Server and Client are now securely bound and able to communicate. This will enable the Server to gather facts from the Client, and let you create configurations on the Server that are obtained by the client and used to converge its state (every 15 minutes).
Creating a configuration
Like Ansible, Puppet lets you store components of a project or discrete configuration in a directory tree:
/etc/puppetlabs/code/environments
Subsidiary folders are created according to the configuration in puppet.conf or by the operator. In this example, having declared environment = production in puppet.conf, Puppet has already created a directory for this default site, containing a modules subdirectory in which you can store subsidiary projects and manifests for things you need to build and configure.
/etc/puppetlabs/code/environments/production/modules
You will now install Apache2 on your managed client. Puppet operations are typically performed as root, so become root on the Server temporarily by entering:
sudo su -
Navigate to the /modules folder in the /production environment:
cd /etc/puppetlabs/code/environments/production/modules
Create a folder structure to contain the apache2 module:
mkdir -p apache2/manifests
cd apache2/manifests
Inside this manifests folder, create a file called init.pp, which is a reserved filename for the initialization step in a module:
class apache2 {
package { 'apache2':
ensure => installed,
}
service { 'apache2':
ensure => true,
enable => true,
require => Package['apache2'],
}
}
The class definition orders the steps we want to perform:
Step 1. Invoke the package resource to install the named package. If you wanted to remove the package, you could change ensure => installed to read ensure => absent.
Step 2. Invoke the service resource to run if its requirement (in this case, that Apache2 is present) is met. Instruct it to ensure that the service is available, and then enable it to restart automatically when the server reboots.
Navigate to the associated manifests folder:
cd /etc/puppetlabs/code/environments/production/manifests
Create a site.pp file that invokes the module and applies it to the target machine:
node 'puppetclient' {
include apache2
}
Deploying the configuration
You have two options to deploy the completed configuration:
- Restarting the Puppet Server will now cause the manifests to be compiled and made available to the Puppet Agent on the named device. The agent will retrieve and apply them, installing Apache2 with the next update cycle:
systemctl restart puppetserver.service
- For development and debugging, you can invoke Puppet Agent on a target machine (in this case our Puppet Client machine):
sudo puppet agent -t
- The agent will immediately interrogate the server, download its catalog (the configurations that reference it) and apply it. The results will be similar to the following:
root@target:/etc/puppetlabs/code/environments/production/manifests# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Caching catalog for puppetagent
Info: Applying configuration version '1575907251'
Notice: /Stage[main]/Apache2/Package[apache2]/ensure: created
Notice: Applied catalog in 129.88 seconds
After the application has been successfully deployed, enter the target machine's IP address in your browser. This should bring up the Apache2 default homepage.
Chef
Chef provides a complete system for treating infrastructure as code. Some Chef products are commercially licensed, but free for personal use (in Chef Infra Server's case, for fewer than 25 managed nodes).
Chef's products and solutions enable infra-as-code creation, testing, organization, repository storage, and execution on remote targets, either from a stand-alone Chef Workstation, or indirectly from a central Chef Infra Server. You should be aware of the main Chef components:
- Chef Workstation - A standalone operator workstation, which may be all that smaller operations need.
- Chef Infra Client (the host agent) - Chef Infra Clients run on hosts and retrieve configuration templates and implement required changes. Cookbooks (and proxy Clients) enable control of hardware and resources that cannot run a Chef Infra Client locally (such as network devices).
- Chef Infra Server - Replies to queries from Chef Infra Clients on validated hosts and responds with configuration updates, which the Clients then use to converge their hosts' configuration.
Most configuration tasks can also be carried out directly between Chef Workstation and managed nodes and devices.
The figure shows a Chef Workstation (Chef and Knife CLIs, UI, and chef-repo) communicating over HTTPS with a Chef Infra Server running NGINX, PostgreSQL, Apache Solr, and Bookshelf. The server manages physical or virtual hosts running Chef Infra Client over HTTPS, reaches agentless physical and virtual infrastructure and hosts over SSH, and controls infrastructure managers and cloud operations APIs over HTTPS/REST. Cookbooks on the Chef Server drive these connections.
Chef Architecture
Components of Chef Workstation
Chef's operator workstation provides:
- Command-line tools for authoring, testing, and maintaining "cookbooks," and for applying them directly to hosts
- Tools for interacting with Chef Infra Servers to bootstrap new nodes and set policy
- Test Kitchen, a testing harness
- ChefSpec, which simulates the effects of code before implementing changes
- InSpec, a security/compliance auditing and testing framework
Chef provides hundreds of resources for performing common configuration tasks in idempotent fashion, as well as Chef Supermarket, a community-maintained sharing site for Cookbooks, custom resources, and other solutions.
Code is maintained in a local repository format called chef-repo, which can be synchronized with Git for enterprise-wide infra-as-code efforts. Within a repo, code is organized in "cookbook" folders, comprising "recipes" (actual linear configuration code, written in Chef's extended Ruby syntax), segregated attributes, custom resources, test code, and metadata.
Chef's domain-specific language (DSL) enables you to address configuration tasks by authoring a sequence of small, bracketed templates, each of which declares a resource and parameterizes it. Chef resources tend to be more abstract than Ansible's or Puppet's, which helps address cross-platform concerns. For example, the package resource can determine the kind of Linux, MacOS, or Windows environment that it is running on and complete a required installation in a platform-specific way.
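As a small sketch of that abstraction (the package name is illustrative), the same recipe fragment declares the desired state without naming a package manager:
package 'ntp' do
  action :install   # Chef picks the platform-appropriate installer
end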
Installing Chef Workstation
To begin using Chef, a good first step is to install Chef Workstation, which provides a complete operations environment. Workstation is available for Linux and Windows. Refer to the Chef Workstation downloads page for more information.
When Workstation is installed, you can use it immediately to start making configuration changes on accessible hosts. Some node preparation is helpful before trying to manage a target node with Chef. You should configure SSH keywise access to the host rather than using passwords. And it helps (if you are not running DNS) to include the IP address and hostname of the target machine in your Workstation machine's /etc/hosts file.
Running Chef at scale
Chef Infra Server was rewritten several years back in Erlang, to increase its capacity, enabling management of up to about 10,000 hosts. It can be configured for high availability by deploying its front-end services (including NGINX and stateless application logic) into an array of load-balanced proxies, which connect to a three-server active/active cluster supporting back-end services like elasticsearch, etcd, and PostgreSQL.
Chef also provides an array of products that together solve most of the problems enterprises face in dealing with increasingly-complex, large-scale, hybrid infrastructures. Its on-Workstation chef-repo structures harmonize with Git, enabling convenient version control and collaboration on DevOps code, and simplifying transitions to infra-as-code regimes. Its core philosophy of continuous configuration management dovetails well with the goal of continuous IT delivery.
Chef's built-in unit testing framework Test Kitchen, pre-deployment simulators, and companion auditing and security assessor InSpec provide the rest of a purpose-built DevOps test-driven development framework.
Cisco Chef Resources
Cisco has developed modified Chef Infra Agents that run in the guest shell of NX-OS switch equipment, enabling this hardware to work with Chef as if it were a managed host. Cisco has also developed, and maintains, a Cisco Chef Cookbook for NX-OS infrastructure, available on Chef Supermarket.
A GitHub public repo of cookbook and recipe code is also maintained, to enable control of a wide range of Cisco products.
Cisco UCS infrastructure is easily managed with Chef, via a cookbook enabling integration with Integrated Management Controllers. Management via UCS Manager and Intersight is possible via Python and/or PowerShell SDKs.
Chef Example – Install and Use Chef
This example describes how to install Chef and use it to install Apache 2 on a device. You can simply read along to better understand Chef.
Installing Chef Workstation
Chef Workstation provides a complete operations environment. Workstation is available for Linux and Windows. The following example assumes you're installing on an Ubuntu 18.04 LTS virtual machine.
If your machine is set up with a standard desktop, you can browse to the Chef Workstation downloads page, find the download for Ubuntu 18.04, and install it automatically with the Debian package manager.
Alternatively, you can install from the command line by copying the URL of the .deb package and using the following steps:
wget https://packages.chef.io/files/stable/chef-workstation/0.12.20/ubuntu/18.04/chef-workstation_0.12.20-1_amd64.deb
sudo dpkg -i chef-workstation_0.12.20-1_amd64.deb
Basic configuration management
After Workstation is installed, you can use it immediately to start making configuration changes on accessible hosts. You will use the chef-run command for this. It is a subsystem that takes care of bootstrapping a Chef Infra Agent onto the target host and then executes whatever commands you reference in files or provide in arguments.
The first time you use chef-run (or other Chef tools), you may be asked to accept licensing terms (type yes) for the utility you're using, or for subsystems it invokes.
For the first configuration exercise, you will provide the information Chef needs to install the ntp package. In the process, you provide the remote username, that user's sudo (become root) password, the name of the remote host, target (which is in your /etc/hosts file; otherwise, you would use its IP address here), and the resource and action to apply:
chef-run -U myname -sudo <password> target package ntp action=install
Chef connects to the node, initially via SSH, and bootstraps the Chef Infra Client onto it (if not already present). This can take a while, but chef-run helpfully shows you activity indicators. When the client is installed, the task is handed to it, and the process completes. You get back something that looks like this:
[✔] Packaging cookbook... done!
[✔] Generating local policyfile... exporting... done!
[✔] Applying package[ntp] from resource to target.
└── [✔] [target] Successfully converged package[ntp].
Note the vocabulary Chef uses to describe what it is doing. The configuration action that you request is treated as a policy that partially describes the target machine's desired state. Chef does what is required to make that policy happen, converging the machine to its desired state, where NTP is installed and its time-synchronization service is activated by default.
Installing Chef Infra Client
Chef Infra Client runs locally on conventional compute nodes. It authenticates to Chef Infra Server using public keypairs, which are generated and signed when a node is registered with a Chef Infra Server. This ensures that rogue nodes cannot request configuration information from the Server. Communications between authorized Clients and their Server are safely encrypted with TLS.
Chef Infra Client includes a discovery subsystem called Ohai, which collects system facts and uses them to determine whether (and how) the target system has drifted from its configuration, and needs to be converged.
Chef Workstation can bootstrap Infra Client onto target nodes. You can also preinstall Infra Client on nodes, for example, while creating new nodes on a public cloud. Below is an example script you might run on a target host to do this. Note that user data scripts run as root, so sudo is not required here. However, if you log into a remote host manually as a user (perhaps in the sudoers group) rather than as root, you would need to assume root privileges (using sudo su -) before creating and running this script locally.
The script uses a Chef-provided installer called Omnitruck to do this. The Omnitruck shell script figures out which kind of Linux distribution you are using and otherwise enables a safe, predictable installation of Chef software (you can also use it to install other Chef products). A Windows version of this script is also available that runs on PowerShell:
#!/bin/bash
apt-get update
apt-get install curl
curl -L https://omnitruck.chef.io/install.sh | bash -s once -c current -p chef
Note that the parameters shown above will install the latest version of the Chef client, and do not pin the version. This is dangerous for production work, because it permits updates to occur without warning, possibly introducing an incompatibility between the Client and Workstation or Server. The -v option lets you install a specific version of the Client, and pins it automatically. Bootstrapping a node with Chef installs the latest compatible version and pins it.
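A pinned install, following the same pattern as the script above and adding the -v option mentioned here, might look like the following sketch (the version shown is only an example; match it to the Workstation and Server versions you actually run):
curl -L https://omnitruck.chef.io/install.sh | bash -s once -c current -p chef -v 15.5.17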
Chef Infra Server prerequisites
Chef Infra Server stores configuration and provides it to Clients automatically, when polled, enabling Clients to converge themselves to a desired state. Downloadable packages are listed and linked on the Chef downloads page. The server is free to use for fewer than 25 hosts.
Before installing Chef Infra Server, install openssh-server and enable keywise access. You would also need to install NTP for time synchronization. You can do this with Chef, or you can do it manually:
sudo apt-get install ntp ntpdate net-tools
On an Ubuntu system, turn off the default timedatectl synchronization service to prevent it from interfering with NTP synchronization:
sudo timedatectl set-ntp 0
After NTP is installed, ensure that it is synchronizing with a timeserver in its default pool. This may take a few minutes, so repeat the command until you see something like this:
ntpstat
synchronised to NTP server (198.60.22.240) at stratum 2
time correct to within 108 ms
polling server every 256 s
When this shows up, you can install Chef Infra Server.
Installing Chef Infra Server
To install Chef Infra Server on Ubuntu 18.04, you can perform steps similar to the manual Workstation install, above, after obtaining the URL of the .deb package. At the time of writing, the current stable version was 13.1.13-1.
wget https://packages.chef.io/files/stable/chef-server/13.1.13/ubuntu/18.04/chef-server-core_13.1.13-1_amd64.deb
sudo dpkg -i chef-server-core_13.1.13-1_amd64.deb
After Chef Infra Server is installed, issue the following command to tell it to read its default configuration, initialize, and start all services. This is a long process, and is done by Chef itself, giving you an opportunity to see some of Chef's logic used to apply configuration details in converging a complex application to a desired state:
sudo chef-server-ctl reconfigure
The configuration process may initially fail on low-powered or otherwise constrained VMs, but because this is Chef (and thus idempotent), it can be run more than once in an attempt to get it to complete. This sometimes works. If it does not, it is a sign that you are trying to run on hardware (or a virtual machine) that is not powerful enough and should be upgraded before trying again.
When chef-server-ctl begins the reconfigure process on an initial run, you will be asked to accept several product licenses. Type yes at each prompt.
Create an Infra Server user:
sudo chef-server-ctl user-create <username> <firstname> <lastname> <email> '<password>' --filename <key_file_path_and_name.pem>
Provide your own preferred user information for the <>-bracketed terms (removing the <> brackets from your actual responses) and include a password. The argument to --filename provides a pre-existing and accessible path and filename for the .pem private key file that Chef will create for this user. This key file will need to be downloaded from the server and established on the Workstation to enable server management. It makes sense to store this key in a folder that is readily accessible from your OS user's (myname's) home directory. IMPORTANT: Remember the key file path and filename!
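A filled-in example might look like the following (every value here is a placeholder, shown only for illustration):
sudo chef-server-ctl user-create msmith Mary Smith mary.smith@example.com 'S3cur3-Pa55w0rd' --filename /home/myname/chef-keys/msmith.pem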
Next, you create an organization, which is a structure Chef uses to isolate different bodies of configuration policy from one another. These can be actual organizations, or a concept more like 'sites'. Chef issues each organization an RSA key which is used to authenticate hosts to Server and organization, thus providing a multi-tenant infrastructure.
Provide a short name for the organization, a full descriptive name, the username you created in the last step to associate with the organization, and an accessible path to store the validation key. By convention (though this is optional) the key can be called <ORGANIZATION>-validator.pem:
sudo chef-server-ctl org-create <short_name> '<full_organization_name>' --association_user <username_you_just_created> --filename <key_file_path_and_name.pem>
It makes sense to store this key in the same directory that you used to store the user key generated in the prior step.
Install Chef-Manage
You can also install the web interface for Chef server. This can be done by entering:
sudo chef-server-ctl install chef-manage
When the process completes, restart the server and manage components. These are Chef operations, and may take a while, as before.
sudo chef-server-ctl reconfigure
(lots of output)
sudo chef-manage-ctl reconfigure --accept-license
(lots of output)
The argument --accept-license prevents chef-manage-ctl from stopping to ask you about the unique licenses for this product. When this process is complete, you can visit the console in a browser via https://<IP_OF_CHEF_SERVER>. Note that most browsers will return an error about the server's self-signed certificate, and you will need to give permission to connect. Use the username/password you created above with chef-server-ctl user-create.
Initially, there is not much to see, but this will change when you register a node.
Finish configuring Workstation
Before Chef Workstation can talk to your Infra Server, you need to do a little configuration. To begin, retrieve the keys generated during Server configuration, and store them in the folder /home/myname/.chef, created during Workstation installation:
cd /home/myname/.chef
scp myname@chefserver:./path/*.pem .
/path/ is the path from your (myname) home directory on the Server to the directory in which the Server stored keys.
If you are not using keywise authentication to your Server, scp will ask for your password (your original user password, not your Chef Server user password). The . after user@host: refers to your original user's home directory, from which the path is figured. The wildcard expression finds files ending in .pem at that path. The closing dot means "copy to the current working directory" (which should be the .chef folder). Run the ls command from within the .chef folder to see if your keys made it.
Chef Example – Prepare to Use Knife
Simply read along with this example to better understand Chef.
Prepare to use Knife
Knife is a tool for managing cookbooks, recipes, nodes, and other assets, and for interacting with the Chef Infra Server. You need to give Knife a solid initial configuration.
Within the .chef folder, edit the (initially empty) file named config.rb, and include the following lines of code, adapting them to your environment:
chef_server_url 'https://chefserver/organizations/<short_name>'
client_key '/home/<myname>/.chef/<userkey.pem>'
cookbook_path [
'/home/<myname>/cookbooks'
]
data_bag_encrypt_version 2
node_name '<username>'
validation_client_name '<short_name>-validator'
validation_key '/home/<myname>/.chef/<short_name>-validator.pem'
This configuration identifies the Chef Server and the organization you created, the path to your user key (created during Infra Server post-install configuration), the path you want to use for cookbook file trees, a desired encryption level for "databag" storage (for now, just set it to the recommended value), your server username, a name derived from the name of your validation key, and the local path to that key itself. Change the <> bracketed names to your own values.
Save the config.rb file, then create the directory /home/myname/cookbooks:
mkdir /home/myname/cookbooks
Finally, issue the command:
knife ssl fetch
If you have correctly set up the config.rb, Knife will consult with your server, retrieve its certificate, and store it in the directory:
Certificate stored in: /home/myname/.chef/trusted_certs
Chef will find this automatically when it is time to connect with the server, providing assurance that the server is authentic.
Bootstrap a target node with knife
Now that Knife is configured, you can bootstrap your target node. It is fine that you are doing this a second time (remember, earlier, you installed Chef Infra Client manually).
To bootstrap, issue the following command, replacing variable fields with your information. The command is set up to use keywise authentication to the target machine. The somewhat-redundant --sudo and --use-sudo-password commands tell Knife to use sudo to complete its work. The -P option provides your sudo password on the target machine. <name_for_your_node> is an arbitrary name. The --ssh-verify-host-key never flag and argument cause the command not to pause and ask your permission interactively if/when it finds that you've never logged into this server before:
knife bootstrap target --connection-user <myname> -i ~/.ssh/id_ed25519 --sudo --use-sudo-password -P <sudo_password> --node-name <name_for_your_node> --ssh-verify-host-key never
If the command works correctly, you would get back something that looks like this. Note that Chef has detected the earlier installation and has not overwritten it:
[target] [sudo] password for myname:
[target] -----> Existing Chef Infra Client installation detected
[target] Starting the first Chef Infra Client Client run...
[target] +---------------------------------------------+
✔ 2 product licenses accepted.
+---------------------------------------------+
[target] Starting Chef Infra Client, version 15.5.17
[target]
[target] Creating a new client identity for target using the validator key.
[target]
[target] resolving cookbooks for run list: []
[target]
[target] Synchronizing Cookbooks:
[target]
[target] Installing Cookbook Gems:
[target]
[target] Compiling Cookbooks...
[target]
[target] [2019-12-10T15:16:56-05:00] WARN: Node target has an empty run list.
[target] Converging 0 resources
[target]
[target]
[target]
[target] Running handlers:
[target]
[target] Running handlers complete
[target] Chef Infra Client finished, 0/0 resources updated in 02 seconds
[target]
Now, if you check back in your browser and refresh Chef Manage, you should see that your target machine is now being managed by the server.
Chef Manage Displays Your Target Node
Chef Example – Putting it all Together
Simply read along with this example to better understand Chef.
Putting it all together
Now you will use everything together to create an actual recipe, push it to the server, and tell the target machine's client to requisition and converge on the new configuration. This is similar to the way Chef is used in production, but on a smaller scale.
To start, create a cookbook to build a simple website. Navigate to your cookbooks directory, create a new cookbook called apache2, and navigate into it:
cd /home/myname/cookbooks
chef generate cookbook apache2
cd apache2
Take a look around the cookbook folder structure. There are folders already prepared for recipes and attributes. Add an optional directory and subdirectory for holding files your recipe needs:
mkdir -p files/default
cd files/default
Files in the /default subdirectory of a /files directory within a cookbook can be found by file name alone; no paths are required.
Now create a homepage for your website:
vi index.html
<html>
<head>
<title>Hello!</title>
</head>
<body>
<h1>HELLO!</h1>
</body>
</html>
Save the file and exit, then navigate to the recipes directory, where Chef has already created a default.rb file for you. The default.rb recipe is executed by default when the cookbook is run.
cd ../../recipes
Add some stanzas to the default.rb file. Again, edit the file:
vi default.rb
The header at the top is created for you.
#
# Cookbook:: apache2
# Recipe:: default
#
# Copyright:: 2019, The Authors, All Rights Reserved.
apt_update do
action :update
end
package 'apache2' do
action :install
end
cookbook_file "/var/www/html/index.html" do
source "index.html"
mode "0644"
end
Underneath, the recipe performs three actions. The first resource you are invoking, apt_update, handles the apt package manager on Debian. You would use this to force the equivalent of sudo apt-get update on your target server, before installing the Apache2 package. The apt_update resource's action parameter can take several other values, letting you perform updates only under controlled conditions, which you would specify elsewhere.
The package function is used to install the apache2 package from public repositories. Alternative actions include :remove, which would uninstall the package, if found.
Finally, you use the cookbook_file resource to copy the index.html file from /files/default into a directory on the target server (Apache's default web root directory). What actually happens is that the cookbook, including this file, gets copied into a corresponding cookbook structure on the server, then served to the client, which executes the actions. The mode property performs the equivalent of chmod 644 on the file, which, when it reaches its destination, makes it universally readable and writable only by root.
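On the target, the end result of this stanza is roughly the same as copying the file into place and setting its mode by hand:
sudo cp index.html /var/www/html/index.html
sudo chmod 644 /var/www/html/index.html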
Save the default.rb file, then upload the cookbook to the server:
knife cookbook upload apache2
Uploading apache2 [0.1.0]
Uploaded 1 cookbook.
You can then confirm that the server is managing your target node:
knife node list
target
The Knife application can interoperate with your favorite editor. To enable this, perform the following export with your editor's name:
export EDITOR=vi
This lets the next command execute interactively, putting the node definition into vi to let you alter it manually.
knife node edit target
{
"name": "target",
"chef_environment": "_default",
"normal": {
"tags": [
]
},
"policy_name": null,
"policy_group": null,
"run_list": [
"recipe[apache2]"
]
}
As you can see, the expression "recipe[apache2]" has been added into the run_list array, which contains an ordered list of the recipes you want to apply to this node.
Save the file in the usual manner. Knife immediately pushes the change to the Infra Server, keeping everything in sync.
Finally, you can use the knife ssh command to identify the node, log into it non-interactively using SSH, and execute the chef-client application. This causes the node to immediately reload its state from the server (which has changed) and implement the new runlist on its host.
knife ssh 'name:target' 'sudo chef-client'
In this case, you would need to provide your sudo password for the target machine, when Knife asks for it. In a real production environment, you would automate this so you could update many nodes at once, storing secrets separately.
If all goes well, Knife gives you back a very long log that shows you exactly the content of the file that was overwritten (potentially enabling rollback), and confirms each step of the recipe as it executes.
target
target Starting Chef Infra Client, version 15.5.17
target resolving cookbooks for run list: ["apache2"]
target Synchronizing Cookbooks:
target - apache2 (0.1.0)
target Installing Cookbook Gems:
target Compiling Cookbooks...
target Converging 3 resources
target Recipe: apache2::default
target * apt_update[] action update
target - force update new lists of packages
target * directory[/var/lib/apt/periodic] action create (up to date)
target * directory[/etc/apt/apt.conf.d] action create (up to date)
target * file[/etc/apt/apt.conf.d/15update-stamp] action create_if_missing (up to date)
target * execute[apt-get -q update] action run
target - execute ["apt-get", "-q", "update"]
target
target * apt_package[apache2] action install
target - install version 2.4.29-1ubuntu4.11 of package apache2
target * cookbook_file[/var/www/html/index.html] action create
target - update content in file /var/www/html/index.html from b66332 to 3137ae
target --- /var/www/html/index.html 2019-12-10 16:48:41.039633762 -0500
target +++ /var/www/html/.chef-index20191210-4245-1kusby3.html 2019-12-10 16:48:54.411858482 -0500
target (the remainder of the output is a line-by-line diff showing the default Ubuntu Apache page being replaced by the new index.html)
target - background-color: #000000;
target -
target - color: #FFFFFF;
target - }
target -
target - div.content_section_text a:link,
target - div.content_section_text a:visited,
target - div.content_section_text a:active {
target - background-color: #DCDFE6;
target -
target - color: #000000;
target - }
target -
target - div.content_section_text a:hover {
target - background-color: #000000;
target -
target - color: #DCDFE6;
target - }
target -
target - div.validator {
target - }
target - </style>
target - </head>
target - <body>
target - <div class="main_page">
target - <div class="page_header floating_element">
target - <img src="/icons/ubuntu-logo.png" alt="Ubuntu Logo" class="floating_element"/>
target - <span class="floating_element">
target - Apache2 Ubuntu Default Page
target - </span>
target - </div>
target -<!-- <div class="table_of_contents floating_element">
target - <div class="section_header section_header_grey">
target - TABLE OF CONTENTS
target - </div>
target - <div class="table_of_contents_item floating_element">
target - <a href="#about">About</a>
target - </div>
target - <div class="table_of_contents_item floating_element">
target - <a href="#changes">Changes</a>
target - </div>
target - <div class="table_of_contents_item floating_element">
target - <a href="#scope">Scope</a>
target - </div>
target - <div class="table_of_contents_item floating_element">
target - <a href="#files">Config files</a>
target - </div>
target - </div>
target --->
target - <div class="content_section floating_element">
target -
target -
target - <div class="section_header section_header_red">
target - <div id="about"></div>
target - It works!
target - </div>
target - <div class="content_section_text">
target - <p>
target - This is the default welcome page used to test the correct
target - operation of the Apache2 server after installation on Ubuntu systems.
target - It is based on the equivalent page on Debian, from which the Ubuntu Apache
target - packaging is derived.
target - If you can read this page, it means that the Apache HTTP server installed at
target - this site is working properly. You should <b>replace this file</b> (located at
target - <tt>/var/www/html/index.html</tt>) before continuing to operate your HTTP server.
target - </p>
target -
target -
target - <p>
target - If you are a normal user of this web site and don't know what this page is
target - about, this probably means that the site is currently unavailable due to
target - maintenance.
target - If the problem persists, please contact the site's administrator.
target - </p>
target -
target - </div>
target - <div class="section_header">
target - <div id="changes"></div>
target - Configuration Overview
target - </div>
target - <div class="content_section_text">
target - <p>
target - Ubuntu's Apache2 default configuration is different from the
target - upstream default configuration, and split into several files optimized for
target - interaction with Ubuntu tools. The configuration system is
target - <b>fully documented in
target - /usr/share/doc/apache2/README.Debian.gz</b>. Refer to this for the full
target - documentation. Documentation for the web server itself can be
target - found by accessing the <a href="/manual">manual</a> if the <tt>apache2-doc</tt>
target - package was installed on this server.
target -
target - </p>
target - <p>
target - The configuration layout for an Apache2 web server installation on Ubuntu systems is as follows:
target - </p>
target - <pre>
target -/etc/apache2/
target -|-- apache2.conf
target -| `-- ports.conf
target -|-- mods-enabled
target -| |-- *.load
target -| `-- *.conf
target -|-- conf-enabled
target -| `-- *.conf
target -|-- sites-enabled
target -| `-- *.conf
target - </pre>
target - <ul>
target - <li>
target - <tt>apache2.conf</tt> is the main configuration
target - file. It puts the pieces together by including all remaining configuration
target - files when starting up the web server.
target - </li>
target -
target - <li>
target - <tt>ports.conf</tt> is always included from the
target - main configuration file. It is used to determine the listening ports for
target - incoming connections, and this file can be customized anytime.
target - </li>
target -
target - <li>
target - Configuration files in the <tt>mods-enabled/</tt>,
target - <tt>conf-enabled/</tt> and <tt>sites-enabled/</tt> directories contain
target - particular configuration snippets which manage modules, global configuration
target - fragments, or virtual host configurations, respectively.
target - </li>
target -
target - <li>
target - They are activated by symlinking available
target - configuration files from their respective
target - *-available/ counterparts. These should be managed
target - by using your helpers
target - <tt>
target - a2enmod,
target - a2dismod,
target - </tt>
target - <tt>
target - a2ensite,
target - a2dissite,
target - </tt>
target - and
target - <tt>
target - a2enconf,
target - a2disconf
target - </tt>. See their respective man pages for detailed information.
target - </li>
target -
target - <li>
target - The binary is called apache2. Due to the use of
target - environment variables, in the default configuration, apache2 needs to be
target - started/stopped with <tt>/etc/init.d/apache2</tt> or <tt>apache2ctl</tt>.
target - <b>Calling <tt>/usr/bin/apache2</tt> directly will not work</b> with the
target - default configuration.
target - </li>
target - </ul>
target - </div>
target -
target - <div class="section_header">
target - <div id="docroot"></div>
target - Document Roots
target - </div>
target -
target - <div class="content_section_text">
target - <p>
target - By default, Ubuntu does not allow access through the web browser to
target - <em>any</em> file apart of those located in <tt>/var/www</tt>,
target - <a href="http://httpd.apache.org/docs/2.4/mod/mod_userdir.html" rel="nofollow">public_html</a>
target - directories (when enabled) and <tt>/usr/share</tt> (for web
target - applications). If your site is using a web document root
target - located elsewhere (such as in <tt>/srv</tt>) you may need to whitelist your
target - document root directory in <tt>/etc/apache2/apache2.conf</tt>.
target - </p>
target - <p>
target - The default Ubuntu document root is <tt>/var/www/html</tt>. You
target - can make your own virtual hosts under /var/www. This is different
target - to previous releases which provides better security out of the box.
target - </p>
target - </div>
target -
target - <div class="section_header">
target - <div id="bugs"></div>
target - Reporting Problems
target - </div>
target - <div class="content_section_text">
target - <p>
target - Please use the <tt>ubuntu-bug</tt> tool to report bugs in the
target - Apache2 package with Ubuntu. However, check <a
target - href="https://bugs.launchpad.net/ubuntu/+source/apache2"
target - rel="nofollow">existing bug reports</a> before reporting a new bug.
target - </p>
target - <p>
target - Please report bugs specific to modules (such as PHP and others)
target - to respective packages, not to the web server itself.
target - </p>
target - </div>
target -
target -
target -
target -
target - </div>
target - </div>
target - <div class="validator">
target - </div>
target - </body>
target +<html>
target +<head>
target +<title>Hello!</title>
target +</head>
target +<body>
target +<h1>HELLO!</h1>
target +</body>
target </html>
target
target
target Running handlers:
target Running handlers complete
target Chef Infra Client finished, 4/7 resources updated in 02 minutes 31 seconds
At this point, you should be able to point a browser at the target machine's IP address, and see your new index page.
Chef now works to maintain this configuration and prevent drift. If you were to log into the target server and change the index.html file in /var/www/html (for example, changing the word "HELLO" to "GOODBYE"), Chef would revert the change the next time the agent runs (by default, within 30 minutes).
Summary of Automation Tools
Summary
This has been a high-level introduction to three modern DevOps toolkits. You should now be ready to:
- Deploy and integrate free versions of the major components of Ansible, Puppet, and/or Chef on a range of substrates, from desktop virtual machines (such as VirtualBox VMs) to cloud-based VMs on Azure, AWS, GCP or other IaaS platforms.
- Experience each platform's declarative language, style of infra-as-code building and organizing, and get a sense of the scope of its library of resources, plugins, and integrations.
- Get practice automating some of the common IT tasks you may do at work, or solve deployment and lifecycle management challenges you set yourself, in your home lab. Hands-on exercises and work will give you a complete sense of how each platform addresses configuration themes, and help you overcome everyday IT gotchas.
If you are building up your reputation in the community, know that almost nothing impresses IT peers as much as a well-executed, insightfully automated codebase for deploying or managing a complex, head-scratching piece of software. Entire companies are built on the bedrock of knowing how to deploy complicated systems in robust configurations, such as specialized databases, container orchestration platforms, and cloud frameworks like Kubernetes and OpenStack.
Be realistic, though. These are each extremely complex and sophisticated platforms that take users years to master, so do not get discouraged if you find them confusing! Reach out to the communities of the products you enjoy using (or that your workplace endorses) and you will learn more quickly.
Infrastructure as Code
Why Store Infrastructure as Code?
At this point, it is time to introduce a new term: immutability. This literally means "the state of being unchangeable," but in DevOps parlance, it refers to maintaining systems entirely as code, performing no manual operations on them at all.
These topics have touched several times on the concept of treating infrastructure as code, but thus far they have mostly been concerned with the mechanics. You know that it makes sense to automate deployment of full stacks, which are virtual infrastructures (compute/storage/network) plus applications. You have seen several approaches to writing basic automation code, looked at the mechanics of automation tools, and learned to store code safely in version control repositories and retrieve it from them.
You are now familiar with the idea of idempotency and related topics, and have seen how it might be possible to compose automation code that is very safe to run. This is code that does not break things, but instead puts things right and converges on the desired state described by a (partly or wholly) declarative model of the deployed system. You have also learned how code like this can be used to speed up operations, solving problems by brute force rather than detailed forensics and error-prone, time-consuming manual operations.
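To make the idea concrete, here is a minimal, hypothetical Python sketch of convergence toward a declared desired state. The file path, content, and mode below are invented for illustration and are not part of any particular tool; the point is that running the function repeatedly is safe and only changes what is out of compliance.

import os

# Desired state expressed as data (a tiny "declarative model" for one file).
DESIRED = {"path": "/tmp/motd", "content": "Welcome to the lab\n", "mode": 0o644}

def converge(desired):
    """Bring the described file into compliance; return True if anything changed."""
    changed = False
    path = desired["path"]
    if not os.path.exists(path) or open(path).read() != desired["content"]:
        with open(path, "w") as f:
            f.write(desired["content"])
        changed = True
    if (os.stat(path).st_mode & 0o777) != desired["mode"]:
        os.chmod(path, desired["mode"])
        changed = True
    return changed

# The second run reports no changes: the operation is idempotent.
print(converge(DESIRED), converge(DESIRED))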
GitOps: modern infrastructure-as-code
Committing to immutability enables you to treat your automation codebase the way you would any application code:
- You can trust that the codebase describes what is actually running on bare metal or cloud servers.
- You can manage the codebase using Agile procedures and structured use of version control to keep things clear and simple.
This is "GitOps", also referred to as "operations by pull request." In a typical GitOps setup, you might maintain a repository, such as a private repo on GitHub, with several branches called "Development," "Testing/UAT," and "Production."
The figure shows three horizontal rows labeled production, test, and development. At the beginning of the production line is the c1 oval that has a line out to a dotted C5 module at the very end with words above it: code review, merge, text. A circled 1 git pull number is directly across the test row and is on the line that goes from C1 down to C2 on the development line and connects to c3, c4, and c5 ovals with the circled number 2 iterate and test below it. A line goes from the development line up to the c5 on the test line with a circled 3 git request-pull icon. Above c5 is code review, merge, test and a line that goes up to the c5 on the production line labeled with a circled 4 git request-pull
GitOps is "Operations by Pull Request"
In this model, each branch deploys to segregated infrastructure.
- Development - Developers make changes in the Development branch, filing commits and making pull requests. These requests are fulfilled by an automated gating process and queued for automated testing. The operators see results of automated testing, and developers iterate until tests pass. The changes are then merged into the Test branch for the next level of review.
- Test - When changes are merged from Development into Test, the code is deployed to a larger test environment and is subjected to a more extensive set of automated tests.
- Production - Tested changes are again reviewed and merged to Production, from which release deployments are made.
By the time tested code is committed, evaluated, iterated, and merged to the Production branch, it has gone through at least two complete cycles of testing on successively more production-like infrastructure, and has also had conscious evaluation by several experts. The result is code that is reasonably free of bugs and works well. This operational model can be used to develop extensive self-service capability and self-healing compute/storage/network/PaaS, or implement large-scale distributed applications under advanced container orchestration.
Where can GitOps take you?
When your GitOps procedures and workflow, gating/CI-CD, automated testing and other components are in place, you can begin experimenting with elite deployment strategies that require a bedrock of flawless automation.
Blue/green deployments
Blue/green deployment is a method for reducing or eliminating downtime in production environments. It requires you to maintain two identical production environments (You do not have to call them Blue and Green. Any two colors such as Red and Black will do.). Develop the capability of quickly redirecting application traffic to one or the other (through ACI automation; load balancing; programmable DNS; or other means).
A release is deployed to the environment not currently in use (Green). After acceptance testing, redirect traffic to this environment. If problems are encountered, you can switch traffic back to the original environment (Blue). If the Green deployment is judged adequate, resources owned by the Blue deployment can be relinquished, and roles swapped for the next release.
Blue/Green Deployment
Note: Some DevOps practitioners differentiate between blue/green and red/black strategies. They say that in blue/green, traffic is gradually migrated from one environment to the other, so it hits both systems for some period; whereas in red/black, traffic is cut over all at once. Some advanced DevOps practitioners, like Netflix, practice a server-level variant of red/black they call "rolling red/black". With rolling red/black, servers in both environments are updated gradually and can be rolled back individually, so during an update as many as three (not two) versions of the application or stack may be running across the two environments at any given time.
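Returning to the basic blue/green cutover described above, the switch-and-rollback idea can be sketched in a few lines of Python. This is only an illustration: the DNS record, the addresses, and the update_dns() helper are hypothetical stand-ins for whatever traffic-redirection mechanism (ACI policy, load balancer, programmable DNS) your environment actually uses.

# Example environment addresses (placeholders for illustration).
ENVIRONMENTS = {"blue": "203.0.113.10", "green": "203.0.113.20"}

def update_dns(record, ip):
    # Stand-in for a real DNS or load-balancer API call.
    print("Pointing %s at %s" % (record, ip))

def cut_over(color, record="app.example.com"):
    """Redirect production traffic to the named environment and return it."""
    update_dns(record, ENVIRONMENTS[color])
    return color

live = cut_over("green")   # the new release goes live on Green
# If problems appear, rolling back is just another cutover:
live = cut_over("blue")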
Canary testing
Canary testing is similar to rolling blue/green deployment, but somewhat more delicate. The migration between old and new deployments is performed on a customer-by-customer (or even user-by-user) basis, and migrations are made intentionally to reduce risk and improve the quality of feedback. Some customers may be very motivated to gain access to new features and will be grateful for the opportunity. They will happily provide feedback on releases along efficient channels.
Automating Testing
Automated Test and Validation
In this topic, you will learn about a suite of products and practices created by Cisco and its user community to extend test automation to software-driven network configuration, and to reduce or eliminate uncertainty about how prospective network architectures will function and perform when fully implemented.
Testing infrastructure challenges
Automation tools like Ansible, Puppet, Chef, and others solve part of the problem by turning infrastructure into code. But DevOps typically needs more fine-grained ways to define and implement infrastructures, certify that deployed infrastructures are working as required, and proactively ensure their smooth operation. DevOps also needs ways to take preemptive action when failures are imminent, and to find and fix issues when errors occur.
When you use unit-testing tools like pytest in tandem with higher-order automation and in concert with continuous delivery (CI/CD), you can build environments where code can be automatically tested when changes are made.
Unit-testing frameworks make tests a part of your codebase, following the code through developer commits, pull requests, and code-review gates to QA/test and Production. This is especially useful in test-driven development (TDD) environments, where writing tests is a continuous process that actually leads development, automatically encouraging very high levels of test coverage.
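As a small illustration (not part of pyATS), the following pytest-style unit test exercises a hypothetical helper that builds a VLAN configuration payload. Saved as test_vlan.py, it runs with the pytest command and travels with the codebase through commits, pull requests, and code review.

import pytest

def build_vlan_payload(vlan_id, name):
    # Hypothetical helper: validate input and return a configuration payload.
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID out of range")
    return {"id": vlan_id, "name": name}

def test_build_vlan_payload_valid():
    assert build_vlan_payload(10, "users") == {"id": 10, "name": "users"}

def test_build_vlan_payload_rejects_bad_id():
    with pytest.raises(ValueError):
        build_vlan_payload(5000, "oops")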
The challenges of testing a network
The behavior and performance of a real-world network is collective, maintained by the configurations of many discrete pieces of equipment and software.
In traditional environments, connectivity and functionality are manually maintained across numerous individual pieces of equipment via diverse interfaces. Often, operations require that changes be made on in-service hardware carrying live traffic. This is difficult, time-consuming, extremely error-prone, and risky.
Network misconfigurations are often discovered only indirectly, when computers cannot talk to one another. As networks become more complex and carry more diverse and performance-sensitive traffic, security risks and performance degradations, which may be difficult both to discover and to quantify, are increasingly important consequences of misconfiguration.
Network management and testing are still complex even when network devices and connectivity become software-addressable and virtualized. Methods of building and configuring networks certainly change, but you still need to create a collective architecture for safe connectivity by touching numerous device interfaces in detailed ways.
Testing Software Defined Networks (SDN)
Cisco has made huge strides in developing Software Defined Networks (SDN) and middleware that let engineers address a physical network collective as a single programmable entity. In Cisco's case, this includes:
- Application Centric Infrastructure (ACI) – This is a comprehensive data center solution that runs on top of Nexus 9000 and APIC-enabled devices. It enables abstraction and programmability of total network functionality via the Application Policy Infrastructure Controller (APIC).
- Digital Network Architecture Center (Cisco DNA Center) – This is an open, extensible, software-driven architecture for Catalyst 9000 and other IOS XE devices in enterprise networks.
- REST API and SDKs enabling integration with automation tools like Ansible, Puppet, and Chef.
Solutions like ACI manage the whole network by converging models (often written in the declarative YANG data modeling language) that represent desired states of functionality and connectivity. The middleware enables a model to work harmoniously with other models that are currently defining system state and resource requirements. Engineers interact less often with individual devices directly, though the models still need to provide enough detail to enable configuration. The complex, fast-evolving state of large infrastructures can be maintained as code, enabling:
- Rapid re-convergence to desired states at need. If a device breaks and is replaced, it can be rapidly reintegrated with an existing network and its programmed functionality quickly restored, along with network behavior and performance.
- Portability, so that when a core application moves from one data center to another, its required network configuration accompanies it.
- Version control, CI/CD, and other tools to maintain, evolve, and apply the network codebase.
These innovations are increasingly popular with larger organizations, carriers, and other scaled-up entities. In many cases, however, networks still comprise several generations of diverse, hybrid, multi-vendor physical and virtual infrastructures, so the problem of deliberate, device-level configuration still looms.
And even when sophisticated SDN is available, SDN controller/orchestrators cannot always prevent misconfiguration. They can reject flawed code, perform sanity checks before applying changes, and inform when models make demands exceeding resource thresholds, but seemingly legitimate changes can still be applied.
A network test solution: pyATS
Python Automated Test System (pyATS) is a Python-based network device test and validation solution, originally developed by Cisco for internal use, then made available to the public and partially open-sourced. pyATS can be used to help check whether your changes work before putting them into production, and continue validation and monitoring in production to ensure smooth operations.
The pyATS ecology
pyATS originated as the low-level Python underpinning for the test system as a whole. Its higher-level library system, Genie, provides the necessary APIs and libraries that drive and interact with network devices, and perform the actual testing. The two together form the Cisco test solution we know as pyATS.
pyATS has several key features:
- pyATS framework and libraries can be leveraged within any Python code.
- It is modular, and includes components such as:
- AEtest executes the test scripts.
- Easypy is the runtime engine that enables parallel execution of multiple scripts, collects logs in one place, and provides a central point from which to inject changes to the topology under test.
- A CLI enables rapid interrogation of live networks, extraction of facts, and helps automate running of test scripts and other forensics. This enables very rapid 'no-code' debugging and correction of issues in network topologies created and maintained using these tools.
In SDN/cloud/virtual network environments, setup can involve actually building a topology, and cleanup can involve retiring it, reclaiming platform resources. This setup and cleanup can be done directly using pyATS code. pyATS provides an enormous interface library to Cisco and other infrastructure via a range of interfaces, including low-level CLI and REST APIs, as well as connectors to ACI and other higher-order SDN management frameworks.
pyATS can consume, parse, and implement topologies described in JSON, as YANG models, and from other sources, even from Excel spreadsheets.
pyATS can also be integrated with automation tools like Ansible for building, provisioning, and teardown. However, it may be better practice to do the reverse. Use Ansible, Puppet, or Chef to manage the infrastructure's entire codebase and have those products invoke Python (and pyATS) to deal with the details of network implementation. These tools also recruit ACI or other middleware to simplify the task, and permit segregated storage and versioning of YANG or other models defining concrete topologies.
Alternatively, you can invoke pyATS indirectly in several ways (including ways requiring minimal Python programming knowledge).
pyATS Example
The following content shows how to use pyATS to create and apply tests. You will need to be familiar with this information to help you complete the lab on the next page. Simply read along with this example to better understand pyATS.
Virtual environments
The pyATS tool is best installed for personal work inside a Python virtual environment (venv). A venv is an environment copied from your base environment, but kept separate from it. This enables you to avoid installing software that might permanently change the state of your system. Virtual environments exist in folders in your file system. When they are created, they can be activated, configured at will, and components installed in them can be updated or modified without changes being reflected in your host's configuration. The ability to create virtual environments is native to Python 3, but Ubuntu 18.04 may require you to install a python3-venv package separately.
The following instructions describe how to create a venv on Ubuntu 18.04 (where python3 is the default command). If you are using a different operating system, refer to the appropriate documentation for pip and virtual environments.
Ensure that python3-pip, the Python3 package manager, is in place. You should also install git, which you will need later:
sudo apt-get install python3-pip
sudo apt-get install python3-venv
sudo apt-get install git
Create a new virtual environment in the directory of your choice. In this example, it is called myproject.
python3 -m venv myproject
Venv creates the specified working directory (myproject) and corresponding folder structure containing convenience functions and artifacts describing this environment's configuration. At this point, you can cd to the myproject folder and activate the venv:
cd myproject
source bin/activate
Installing pyATS
You can install pyATS from the Python Package Index (PyPI) using pip.
Note: You may see "Failed building wheel for ...<wheelname>" errors while installing pyATS through pip. You can safely ignore those errors as pip has a backup plan for those failures and the dependencies are installed despite errors reported.
pip install pyats[full]
Verify that it was installed by listing the help:
pyats --help
Clone the pyATS sample scripts repo, maintained by Cisco DevNet, which contains sample files you can examine:
git clone https://github.com/CiscoDevNet/pyats-sample-scripts.git
cd pyats-sample-scripts
The installed target, pyats[full], includes the low-level underpinnings, various components and dependencies, and the high-level Genie libraries.
pyATS test case syntax
The test declaration syntax for pyATS is inspired by that of popular Python unit-testing frameworks like pytest. It supports basic testing statements, such as an assertion that a variable has a given value, and adds the ability to explicitly provide results (including a result reason and data) via specific APIs. This is demonstrated in the following excerpt from a basic test script. The pyATS test script can be found at basic/basic_example_script.py in the repository that you cloned previously. A portion of the script is shown below.
# imports and logger used by this excerpt (defined near the top of the full script)
from pyats import aetest
import logging

log = logging.getLogger(__name__)

class MyTestcase(aetest.Testcase):

    @aetest.setup
    def setup(self, section):
        '''setup section

        create a setup section by defining a method and decorating it with the
        @aetest.setup decorator. The method should be named 'setup' as good
        convention.

        setup sections are optional within a testcase, and always run first.
        '''
        log.info("%s testcase setup/preparation" % self.uid)

        # set some variables
        self.a = 1
        self.b = 2

    @aetest.test
    def test_1(self, section):
        '''test section

        create a test section by defining a method and decorating it with the
        @aetest.test decorator. The name of the method becomes the unique id
        labelling this test. There may be an arbitrary number of tests within a
        testcase.

        test sections run in the order they appear within a testcase body.
        '''
        log.info("test section: %s in testcase %s" % (section.uid, self.uid))

        # the testcase instance is preserved between sections, eg
        assert self.a == 1

    @aetest.test
    def test_2(self, section):
        '''
        you can also provide explicit results, reason and data using the result API.
        This information will be captured in the result summary.
        '''
        log.info("test section: %s in testcase %s" % (section.uid, self.uid))

        if self.b == 2:
            self.passed('variable b contains the expected value',
                        data = {'b': self.b})
        else:
            self.failed('variable b did not contain the expected value',
                        data = {'b': self.b})
If you click through and examine the entire test script, you will see it contains several sections:
- A common Setup block
- Multiple Testing blocks
- A common Cleanup block
These blocks contain statements that prepare and/or determine readiness of the test topology (a process that can include problem injection), perform tests, and then return the topology to a known state.
The Testing blocks, which are often referred to in pyATS documentation as the Test Cases, can each contain multiple tests, with their own Setup and Cleanup code. Best practice suggests that the common Cleanup section, at the end, be designed for idempotency. This means that it should check and restore all changes made by Setup and Test, restoring the topology to its original, desired state.
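Reduced to a skeleton, the overall shape of such a script looks roughly like the following sketch. The class and method names here are placeholders; only the aetest base classes and decorators come from pyATS.

from pyats import aetest

class CommonSetup(aetest.CommonSetup):
    @aetest.subsection
    def prepare_topology(self):
        # Check (or build) the test topology before any testcase runs.
        pass

class ExampleTestcase(aetest.Testcase):
    @aetest.setup
    def setup(self):
        self.value = 1

    @aetest.test
    def check_value(self):
        assert self.value == 1

    @aetest.cleanup
    def cleanup(self):
        # Per-testcase cleanup.
        pass

class CommonCleanup(aetest.CommonCleanup):
    @aetest.subsection
    def restore_topology(self):
        # Return the topology to its original, desired state (idempotent).
        pass

if __name__ == '__main__':
    aetest.main()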
pyATS scripts and jobs
A pyATS script is a Python file where pyATS tests are declared. It can be run directly as a standalone Python script file, generating output only to your terminal window. Alternatively, one or more pyATS scripts can be compiled into a "job" and run together as a batch, through the pyATS EasyPy module. This enables parallel execution of multiple scripts, collects logs in one place, and provides a central point from which to inject changes to the topology under test.
The pyATS job file can be found at basic/basic_example_job.py in the repository that you cloned previously. A portion of the job file is shown below.
import os
from pyats.easypy import run

def main():
    '''
    main() function is the default pyATS job file entry point that the Easypy module consumes
    '''
    # find the location of the script in relation to the job file
    script_path = os.path.dirname(os.path.abspath(__file__))
    testscript = os.path.join(script_path, 'basic_example_script.py')

    # execute the test script
    run(testscript=testscript)
If you have performed the installation steps and are now in a virtual environment containing the cloned repo, you can run this job manually to invoke the basic test case:
pyats run job basic/basic_example_job.py
If you see an error like RuntimeError: Jobfile 'basic_example_script' did not define main(), it means you have run the basic_example_script.py file rather than the basic_example_job.py file. If you instead see The provided jobfile 'pyats-sample-scripts/basic/basic_example_job.py' does not exist., double-check which directory you are working in; perhaps you have already changed into the pyats-sample-scripts repository directory.
Output
2020-03-01T12:38:50: %EASYPY-INFO: Starting job run: basic_example_job
2020-03-01T12:38:50: %EASYPY-INFO: Runinfo directory: /Users/agentle/.pyats/runinfo/basic_example_job.2020Mar01_12:38:48.974991
2020-03-01T12:38:50: %EASYPY-INFO: --------------------------------------------------------------------------------
2020-03-01T12:38:51: %EASYPY-INFO: Starting task execution: Task-1
2020-03-01T12:38:51: %EASYPY-INFO: test harness = pyats.aetest
2020-03-01T12:38:51: %EASYPY-INFO: testscript = /Users/agentle/src/pyats-sample-scripts/basic/basic_example_script.py
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting common setup |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting subsection subsection_1 |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: hello world!
2020-03-01T12:38:51: %AETEST-INFO: The result of subsection subsection_1 is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting subsection subsection_2 |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: inside subsection subsection_2
2020-03-01T12:38:51: %AETEST-INFO: The result of subsection subsection_2 is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: The result of common setup is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting testcase Testcase_One |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting section setup |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: Testcase_One testcase setup/preparation
2020-03-01T12:38:51: %AETEST-INFO: The result of section setup is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting section test_1 |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: test section: test_1 in testcase Testcase_One
2020-03-01T12:38:51: %AETEST-INFO: The result of section test_1 is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting section test_2 |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: test section: test_2 in testcase Testcase_One
2020-03-01T12:38:51: %AETEST-INFO: Passed reason: variable b contains the expected value
2020-03-01T12:38:51: %AETEST-INFO: The result of section test_2 is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting section cleanup |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: Testcase_One testcase cleanup/teardown
2020-03-01T12:38:51: %AETEST-INFO: The result of section cleanup is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: The result of testcase Testcase_One is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting common cleanup |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %AETEST-INFO: | Starting subsection clean_everything |
2020-03-01T12:38:51: %AETEST-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:51: %SCRIPT-INFO: goodbye world
2020-03-01T12:38:51: %AETEST-INFO: The result of subsection clean_everything is => PASSED
2020-03-01T12:38:51: %AETEST-INFO: The result of common cleanup is => PASSED
2020-03-01T12:38:51: %EASYPY-INFO: --------------------------------------------------------------------------------
2020-03-01T12:38:51: %EASYPY-INFO: Job finished. Wrapping up...
2020-03-01T12:38:52: %EASYPY-INFO: Creating archive file: /Users/agentle/.pyats/archive/20-Mar/basic_example_job.2020Mar01_12:38:48.974991.zip
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: | Easypy Report |
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: pyATS Instance : /Users/agentle/.local/share/virtualenvs/pyats-sample-scripts-b4vw68FQ/bin/..
2020-03-01T12:38:52: %EASYPY-INFO: Python Version : cpython-3.8.1 (64bit)
2020-03-01T12:38:52: %EASYPY-INFO: CLI Arguments : /Users/agentle/.local/share/virtualenvs/pyats-sample-scripts-b4vw68FQ/bin/pyats run job basic/basic_example_job.py
2020-03-01T12:38:52: %EASYPY-INFO: User : agentle
2020-03-01T12:38:52: %EASYPY-INFO: Host Server : AGENTLE-M-339A
2020-03-01T12:38:52: %EASYPY-INFO: Host OS Version : Mac OSX 10.14.6 (x86_64)
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: Job Information
2020-03-01T12:38:52: %EASYPY-INFO: Name : basic_example_job
2020-03-01T12:38:52: %EASYPY-INFO: Start time : 2020-03-01 12:38:50.019013
2020-03-01T12:38:52: %EASYPY-INFO: Stop time : 2020-03-01 12:38:51.732162
2020-03-01T12:38:52: %EASYPY-INFO: Elapsed time : 0:00:01.713149
2020-03-01T12:38:52: %EASYPY-INFO: Archive : /Users/agentle/.pyats/archive/20-Mar/basic_example_job.2020Mar01_12:38:48.974991.zip
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: Total Tasks : 1
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: Overall Stats
2020-03-01T12:38:52: %EASYPY-INFO: Passed : 3
2020-03-01T12:38:52: %EASYPY-INFO: Passx : 0
2020-03-01T12:38:52: %EASYPY-INFO: Failed : 0
2020-03-01T12:38:52: %EASYPY-INFO: Aborted : 0
2020-03-01T12:38:52: %EASYPY-INFO: Blocked : 0
2020-03-01T12:38:52: %EASYPY-INFO: Skipped : 0
2020-03-01T12:38:52: %EASYPY-INFO: Errored : 0
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: TOTAL : 3
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: Success Rate : 100.00 %
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: | Task Result Summary |
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script.common_setup PASSED
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script.Testcase_One PASSED
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script.common_cleanup PASSED
2020-03-01T12:38:52: %EASYPY-INFO:
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: | Task Result Details |
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script
2020-03-01T12:38:52: %EASYPY-INFO: |-- common_setup PASSED
2020-03-01T12:38:52: %EASYPY-INFO: | |-- subsection_1 PASSED
2020-03-01T12:38:52: %EASYPY-INFO: | -- subsection_2 PASSED
2020-03-01T12:38:52: %EASYPY-INFO: |-- Testcase_One PASSED
2020-03-01T12:38:52: %EASYPY-INFO: | |-- setup PASSED
2020-03-01T12:38:52: %EASYPY-INFO: | |-- test_1 PASSED
2020-03-01T12:38:52: %EASYPY-INFO: | |-- test_2 PASSED
2020-03-01T12:38:52: %EASYPY-INFO: | -- cleanup PASSED
2020-03-01T12:38:52: %EASYPY-INFO: -- common_cleanup PASSED
2020-03-01T12:38:52: %EASYPY-INFO: -- clean_everything PASSED
2020-03-01T12:38:52: %EASYPY-INFO: Sending report email...
2020-03-01T12:38:52: %EASYPY-INFO: Missing SMTP server configuration, or failed to reach/authenticate/send mail. Result notification email failed to send.
2020-03-01T12:38:52: %EASYPY-INFO: Done!
Pro Tip
Use the following command to view your logs locally: pyats logs view. This command automatically opens a page in your web browser displaying the pyATS test results in a GUI format.
This is a very simple example that uses the most basic pyATS functionality. There is no actual topology or testbed on which to run network-type tests. However, the output shows you the kind of detailed test log pyATS creates, including a section-by-section run log of the whole process, from setup to teardown, and appended comprehensive report sections:
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: | Task Result Summary |
2020-03-01T12:38:52: %EASYPY-INFO: +------------------------------------------------------------------------------+
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script.common_setup PASSED
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script.Testcase_One PASSED
2020-03-01T12:38:52: %EASYPY-INFO: Task-1: basic_example_script.common_cleanup PASSED
Each job run generates an archive .zip file, stored by default under your user's home directory, ~/.pyats/archive. You can list the files and unzip each archive file to view its contents (as regular text files), or use the built-in, browser-based log viewer, which serves the logs from a localhost web server:
pyats logs list
pyats logs view
pyATS testbed file
A testbed can be a single YAML file, or can be programmatically assembled from YAML (establishing basic structure) and Python files that make use of pyATS (and potentially Genie) library modules to:
- Define the testbed devices (routers, switches, servers, etc.), subsystems (such as ports, network cards) and their interconnections.
- Establish managerial connections with them, using pyATS' ConnectionManager class to create connections and operate upon them. Inside the pyATS topology model, devices are created as objects that include a connectionmanager attribute, containing an instance of the ConnectionManager class that manages active connections to the real-world device.
The testbed file is an essential input to the rest of pyATS library and ecosystem. It provides information to the framework for loading the right set of library APIs (such as parsers) for each device, and how to effectively communicate to them.
Real testbed files for large topologies can be long, deeply nested, and complex. A simple testbed.yaml file with one device, identified with <device_ip> below, might look like this example. To run the example, you would need to enter a real IP address for a device that matches the type and os settings.
Note: This is an example and it will not work with pyATS unless you enter real values for the username, password, and connection IP address.
devices:
  router1:
    type: router
    os: nxos
    platform: n9kv
    alias: under_test
    credentials:
      default:
        username: "%ENV{MY_USERNAME}"
        password: "%ENV{MY_PASSWORD}"
    connections:
      cli:
        protocol: ssh
        ip: "<device_ip>"
This example defines a router whose hostname is router1, with a supported OS.
- 'platform' is recommended, and is defined as the machine name (for example, a VM UUID) on which the component is running.
- The file provides default credentials for logging into the device, derived from variables set in your local environment (such as export MY_USERNAME=username).
- The file defines the connection method and protocol used to manage the device, and its IP address. pyATS currently communicates via Telnet, SSH, REST, RESTCONF (YANG), and NETCONF (YANG).
To validate that your testbed YAML file meets the pyATS requirements (and conforms to the standard schema), replace the username, password, and ip values, and then run a pyats validate command like so:
pyats validate testbed testbed.yaml
This command checks the content of your file, loads it, and displays any errors in the schema or format.
Note: It is possible to leverage pyATS libraries without a testbed file input, where you can elect to define devices, connections, and other testbed elements programmatically.
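For example, a device can be built directly from Python objects rather than YAML. The following is only a sketch based on the pyATS topology objects; the hostname, credentials, and IP address are placeholders for illustration.

from pyats.topology import Testbed, Device

testbed = Testbed('programmatic_example')

router1 = Device('router1',
                 os='nxos',
                 type='router',
                 credentials={'default': {'username': 'admin',
                                          'password': 'example_password'}},
                 connections={'cli': {'protocol': 'ssh',
                                      'ip': '192.0.2.1'}})

# Attach the device to the testbed; it can now be used like one loaded
# from a testbed.yaml file (for example, router1.connect()).
router1.testbed = testbed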
pyATS Library: Genie
Genie is the pyATS higher-level library system that provides APIs for interacting with devices, and a powerful CLI for topology and device management and interrogation.
When installed, it adds its features and functionalities into the pyATS framework.
For example, Genie features parsers for a wide range of network operating systems and infrastructure. Parsers are APIs that convert device output into Python structured data. To exercise parsers, enter the pyATS interactive shell. This is effectively the same as the Python interpreter/interactive shell, except that it provides niche functionality such as automatically loading your testbed file:
pyats shell --testbed-file testbed.yaml
Welcome to pyATS Interactive Shell
==================================
Python 3.6.9 (default, Nov 11 2019, 12:11:42)
[GCC 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.12)]
>>> from pyats.topology import loader
>>> testbed = loader.load('testbed.yaml')
*------------------------------------------------------------------------------
>>>
Note: If you are using a pyATS version older than v20.1, the equivalent command is instead genie shell.
Now you can access the loaded testbed's devices, establish connectivity, and parse device command outputs like this:
# connect to testbed devices in parallel
testbed.connect()
# parse device output from your "router1"
testbed.devices['router1'].parse('show interfaces')
The Device.parse() API returns the processed structured data (Python dictionary), enabling you to build your own business logic and/or tests on top. To see the list of all available platforms and supported parser commands in Genie library, visit Available Parsers in the Genie documentation.
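For instance, after parsing you might loop over the returned dictionary and flag anything unexpected. The keys used below ('enabled', 'oper_status') are examples only; the exact structure depends on the platform and command, so inspect the parsed output for your device first.

from pyats.topology import loader

testbed = loader.load('testbed.yaml')
device = testbed.devices['router1']
device.connect()

interfaces = device.parse('show interfaces')

# Flag interfaces that are administratively enabled but not operationally up.
for name, data in interfaces.items():
    if data.get('enabled') and data.get('oper_status') != 'up':
        print('%s is enabled but not up' % name)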
In addition, it is also possible to exercise the library's functionality through shell CLIs (pyats commands). You can interactively extract a comprehensive text description of the configuration and operational states of various protocols, features and hardware information in a given topology. For example:
pyats learn conf --testbed-file testbed.yaml --output my_network
The underlying Genie mechanism connects in parallel to all the devices defined in the testbed and collects their configurations (conf) in a human-readable file (in this case, called my_network). The output provides details about network state, including interface setup, VLANs, spanning-tree configuration, and many other features.
Now that the output file exists, it serves as a "gold standard" for this topology's configuration. At any subsequent point in time, if configuration drift seems to have taken hold and broken something, you can run Genie again:
pyats learn conf --testbed-file testbed.yaml --output my_broken_network
The diff command compares the configurations, quickly exposing differences.
pyats diff my_network my_broken_network
This command returns a set of diff files, detailing any changes and letting you quickly discover the cause of problems.
To see the list of "features" pyATS Genie currently supports (and can learn), refer to Available Models in the Genie documentation.
Many of pyATS's functions, such as parse and learn, can be exercised in Python directly (either in the interactive shell or in your .py script files, intended for programmers) and through the CLI interface (for non-programmers). You can learn more about this topic in the pyATS Getting Started Guide.
Putting it all together
This topic has provided a quick introduction to pyATS and its companion solutions. The next topic introduces VIRL, a Cisco tool for faithfully simulating networks on a server platform, along with some associated utilities.
Note: Windows platforms are not yet supported for using pyATS.
Lab – Automated Testing Using pyATS and Genie
In this lab, you will explore the fundamentals of pyATS and Genie.
You will complete the following objectives:
- Part 1: Launch the DEVASC VM
- Part 2: Create a Python Virtual Environment
- Part 3: Use the pyATS Testing Library
- Part 4: Use Genie to Parse IOS Command Output
- Part 5: Use Genie to Compare Configurations
- Part 6: Lab Cleanup and Further Investigation
Network Simulation
Network Simulation and VIRL
Network simulation provides a means to test network configurations, debug configuration code, and to work with and learn Cisco infrastructure and APIs in a safe, convenient, and non-cost-prohibitive way.
Cisco Virtual Internet Routing Lab (VIRL, pronounced 'viral') is a commercial product originally developed for internal use at Cisco, with broad and active community support. Now in version 2, VIRL can run on bare metal, or on large virtual machines on several hypervisor platforms (ESXi and VMware Workstation 12+ among them). The official name for VIRL v2.0 is Cisco Modeling Labs – Personal (or CML-Personal). You may find the tool referenced as both VIRL 2.0 and CML-P.
Though VIRL cannot duplicate the performance of elite hardware or SDN software components running in optimal production conditions, it mirrors Cisco functionality perfectly. The virtual equipment that runs inside VIRL uses the same code that runs inside actual Cisco products. This makes VIRL an ideal tool for learning, as well as a useful mechanism for trialing network configurations and fine-tuning automation for building and testing them.
VIRL components and workflow
VIRL provides a local CLI for system management, a REST interface for integration with automation, and a powerful UI that offers a complete graphical environment for building and configuring simulation topologies.
The UI comes with several sample topologies to get you started. Among these is a two-router IOS network simulation that can quickly be made active and explored. VIRL's Design Perspective view lets you modify existing simulations (after stopping them) or compose new simulations by dragging, dropping, and connecting network entities, configuring them as you go.
VIRL Displays a Visualization of Your Simulation
The visualization has clickable elements that let you explore configuration of entities and make changes via the WebUI, or by connecting to network elements via console. You can also extract individual device configurations, or entire simulated network configs, as .virl files.
VIRL files
VIRL also enables you to define simulations as code, enabling two-way integration with other software platforms for network management and testing.
VIRL's native configuration format is the .virl file, which is a human-readable YAML file. The .virl file contains complete descriptions of the IOS routers, their interface configurations and connections (plus other configuration information), credentials for accessing them, and other details. These files can be used to launch simulations via the VIRL REST API, and you can convert .virl files to and from "testbed" files for use with pyATS and Genie.
In the VIRL UI, you select a simulation, make VIRL read the device's configuration, and then it composes a .virl file to represent it. VIRL offers to save the topology in a new file that you can then open in an editor for review.
The .virl file provides a method for determining if configuration drift has occurred on the simulation. A simple diff command can compare a newly-extracted .virl file with the original .virl file used to launch the simulation, and differences will be apparent.
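If you prefer to script this check, Python's standard difflib module can produce the same kind of comparison. The two file names below are assumptions for illustration only.

import difflib

# Compare the original .virl file used to launch the simulation with a
# freshly extracted one to expose configuration drift.
with open('topology_original.virl') as original, open('topology_current.virl') as current:
    diff = difflib.unified_diff(original.readlines(), current.readlines(),
                                fromfile='topology_original.virl',
                                tofile='topology_current.virl')

for line in diff:
    print(line, end='')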
This technique, comparing a known-good configuration manifest with an extracted manifest describing current network state, also helps debug real-world networks for which authoritative, complete pyATS topologies are available.
Infrastructure and Automation Summary
What Did I Learn in this Module?
Automating Infrastructure with Cisco
Automation is using code to configure, deploy, and manage applications together with the compute, storage, and network infrastructures and services on which they run. You have choices in how to programmatically control your network configurations and infrastructure. Walk: Read-only automation. Run: Activate policies and provide self-service across multiple domains. Fly: Deploy applications, network configurations, and more through CI/CD. Manual processes are always subject to human error, and documentation meant for humans is often incomplete and ambiguous, hard to test, and quickly outdated. Automation is the answer to these problems. The benefits of full-stack automation are self-service, scale on demand, observability, and automated problem mitigation. Software-defined infrastructure, also known as cloud computing, lets developers and operators use software to requisition, configure, deploy, and manage bare-metal and virtualized compute, storage, and network resources. Modern application architectures are increasingly distributed. They are built up out of small and relatively light components that are sometimes called microservices.
DevOps and SRE
For full-stack automation to be truly effective, it requires changes to organizational culture, including breaking down the historical divides between Development (Dev) and Operations (Ops). DevOps evolved and continues to evolve in many places in parallel. Some key events have shaped the discipline as we know it today.
Defining Moments 1: Site Reliability Engineering (SRE). The role of the SRE is intended to fuse the disciplines and skills of Dev and Ops.
Defining Moments 2: Debois and “Agile Infrastructure”. Debois was a proponent of automating virtual infrastructure, using version control to store infrastructure deployment code, and applying Agile methods to the development and maintenance of infrastructure-level solutions.
Defining Moments 3: Allspaw and Hammond
John Allspaw and Paul Hammond gave a presentation at VelocityConf in 2009. It described automation, teamwork, responsibility-sharing, transparency, trust, mutual accountability, and communications practices.
DevOps/SRE have many core principles and best practices:
- A focus on automation
- The idea that "failure is normal"
- A reframing of "availability" in terms of what a business can tolerate
Basic Automation Scripting
Automation tooling works partly by wrapping shell functionality, operating system utilities, API functions, and other control-plane elements. But tools still don't solve every problem of deployment and configuration, which is why every automation tool has one or more functions that execute basic commands and scripts on targets and return results. In Ansible, for example, these functions include command, shell, and raw.
An imperative procedure is an ordered sequence of commands aimed at achieving a goal; the sequence may include flow control, conditions, functional structure, classes, and more. To configure remote systems, you need to access and execute scripts on them. Two ways (of several) to do this are: store scripts locally, transmit them to target machines with a shell utility like scp, then log into the remote machine using ssh and execute them; or pipe scripts to a remote machine using cat | ssh and execute them in sequence with other commands, capturing and returning results to your terminal, all in one command (see the sketch below).
Infrastructure-as-a-Service (IaaS) cloud computing frameworks are a typical target for automation. Cloud automation enables you to provision virtualized hosts, configure virtual networks and other connectivity, requisition services, and then deploy applications on this infrastructure. IaaS and other types of infrastructure cloud also provide CLIs and SDKs that enable easy connection to their underlying interfaces, which are usually REST-based.
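For example, the following Python sketch pipes a local script to a remote machine over ssh and captures the result, the programmatic equivalent of cat setup.sh | ssh user@host bash -s. The hostname, username, and script name are placeholders, not values from this course.

# Minimal sketch: pipe a local script to a remote host over ssh and capture its
# output, roughly equivalent to `cat setup.sh | ssh user@host bash -s`.
# The target address and script name below are hypothetical.
import subprocess

HOST = "admin@198.51.100.10"  # placeholder target machine

with open("setup.sh", "rb") as script:
    result = subprocess.run(
        ["ssh", HOST, "bash", "-s"],  # remote shell reads the script from stdin
        stdin=script,
        capture_output=True,
        check=True,                   # raise if the remote script exits non-zero
    )

print(result.stdout.decode())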
Automation Tools
Three of the most popular automation tools are Ansible, Puppet, and Chef. Automation tools like these offer powerful capabilities compared to ad-hoc automation strategies using Bash, Python, or other programming languages. Idempotent software produces the same desirable result each time it is run; in deployment software, idempotency enables convergence and composability. Procedural code can achieve idempotency, but many infrastructure management, deployment, and orchestration tools have adopted another approach: a declarative model, which is a static description of the desired end state. A minimal idempotent operation is sketched below.
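As a minimal illustration of idempotency (not taken from any particular tool), the Python sketch below ensures that a given line is present in a configuration file; running it once or a hundred times converges on the same end state. The file name and setting are purely illustrative.

# Minimal sketch of an idempotent operation: ensure a line is present in a file.
# Re-running it leaves the file unchanged once the desired state is reached.
from pathlib import Path

def ensure_line(path: str, line: str) -> bool:
    """Append `line` to the file at `path` only if it is missing.
    Returns True when a change was made, False when already converged."""
    target = Path(path)
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False                      # already in the desired state
    target.write_text("\n".join(existing + [line]) + "\n")
    return True

# Illustrative file and setting (not a real device configuration):
changed = ensure_line("sshd_config.sample", "PermitRootLogin no")
print("changed" if changed else "ok (no change needed)")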
Ansible's basic architecture is very simple and lightweight. Ansible's control node runs on virtually any Linux machine running Python 2 or 3. All system updates are performed on the control node. Plugins enable Ansible to gather facts from and perform operations on infrastructure that can't run Python locally, such as cloud provider REST interfaces. Ansible is substantially managed from the Bash command line, with automation code developed and maintained using any standard text editor.
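For instance, an ad-hoc connectivity check with Ansible's built-in ping module can be run directly from the Bash command line; the sketch below wraps that call in Python's subprocess module purely for illustration. The target address is a placeholder, and the example assumes the ansible CLI is installed on the control node.

# Minimal sketch: run an ad-hoc Ansible "ping" against one host.
# The IP address is a placeholder; the trailing comma tells Ansible it is an
# inline inventory rather than an inventory file.
import subprocess

result = subprocess.run(
    ["ansible", "all", "-i", "203.0.113.5,", "-m", "ping"],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)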
Puppet’s core architecture has the following components: a designated server, the Puppet Server, which hosts the main application components; Facter, the fact-gathering service; PuppetDB, which can store facts, node catalogs, and recent configuration event history; and a secure client, the Puppet Agent, installed and configured on target machines. Operators communicate with the Puppet Server largely via SSH and the command line.
Chef’s main components are the Chef Workstation, a standalone operator workstation; the Chef Infra Client (the host agent), which runs on hosts, retrieves configuration templates, and implements required changes; and the Chef Infra Server, which replies to queries from Chef Infra Clients on validated hosts with configuration updates, which the clients then use to converge host configuration.
Infrastructure as Code
Immutability literally means "the state of being unchangeable," but in DevOps parlance, it refers to maintaining systems entirely as code, performing no manual operations on them at all. Committing to immutability enables you to treat your automation codebase the way you would any application code:
● You can trust that the codebase describes what's actually running on bare metal or cloud servers.
● You can manage the codebase using Agile procedures and structured use of version control to keep things clear and simple.
Automating Testing
DevOps typically needs more fine-grained ways to define and implement infrastructure, certify that deployed infrastructure is working as required, proactively ensure its smooth operation, preemptively take action when failures are imminent, and find and fix issues when errors occur.
When you use unit-testing tools like pytest in tandem with higher-order automation and in concert with continuous integration/continuous delivery (CI/CD), you can build environments where code is automatically tested whenever changes are made.
Unit-testing frameworks make tests a part of your codebase, following the code through developer commits, pull requests, and code-review gates to QA/test and Production. This is especially useful in test-driven development (TDD) environments, where writing tests is a continuous process that actually leads development, automatically encouraging very high levels of test coverage.
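The sketch below shows what such a pytest unit test might look like for a hypothetical function that renders an interface configuration snippet. Saved as test_render.py, it is discovered and run automatically by the pytest command.

# Minimal sketch of pytest unit tests for a hypothetical config-rendering helper.
# Save as test_render.py and run with: pytest

def render_interface(name: str, ip: str, mask: str) -> str:
    """Toy function under test: render an IOS-style interface stanza."""
    return f"interface {name}\n ip address {ip} {mask}\n no shutdown"

def test_render_contains_ip_address():
    config = render_interface("GigabitEthernet0/1", "10.0.0.1", "255.255.255.0")
    assert "ip address 10.0.0.1 255.255.255.0" in config

def test_render_starts_with_interface_line():
    config = render_interface("GigabitEthernet0/1", "10.0.0.1", "255.255.255.0")
    assert config.splitlines()[0] == "interface GigabitEthernet0/1"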
Network Simulation
Network simulation provides a means to test network configurations, debug configuration code, and to work with and learn Cisco infrastructure and APIs in a safe, convenient, and non-cost-prohibitive way.
Cisco Virtual Internet Routing Lab (VIRL) can run on bare metal, or on large virtual machines on several hypervisor platforms.