Software Development and Design (3)

 As a NetAcad student, you probably already know that the world of networking is always changing and there is always something new to learn. Right now, IT specialists are being urged to learn to develop software that lets them automate many of the tasks of network creation, maintenance, and administration.

There are many software development methodologies to choose from, and you should learn from software engineering best practices. In your career, you will develop software collaboratively as well as independently. This means that source control will be a big part of your work as a developer. This module provides context for where software development is today, and shows you modern software development methods to create that all-important business or user outcome.

Software Development


Introduction

The software development process is also known as the software development life cycle (SDLC). This process is more than just coding. It also includes gathering requirements, creating a proof of concept, testing, and fixing bugs.

In this module, we discuss phases in the software development life cycle followed by methodologies for managing the real-world requirements of software projects. The methodologies discussed here begin with the waterfall method, which emphasizes up-front planning. Other methodologies in this topic, such as Agile and Lean, are more dynamic and adaptive than the waterfall method.

Software Development Life Cycle (SDLC)



Phases of the Software Development Life Cycle

The software development life cycle (SDLC) is the process of developing software, starting from an idea and ending with delivery. This process consists of six phases. Each phase takes input from the results of the previous phase. There is no standard SDLC, so the exact phases can vary, but the most common are:

Phase 1. Requirements & Analysis

Phase 2. Design

Phase 3. Implementation

Phase 4. Testing

Phase 5. Deployment

Phase 6. Maintenance

Historically, development teams usually followed these phases in order in the waterfall method. The goal of waterfall was to complete each phase of the SDLC down to the last detail before proceeding to the next, never returning to a prior phase, and writing everything down along the way.

Although the waterfall method is still widely used today, it is gradually being superseded by more adaptive, flexible methods that produce better software, faster, with less pain. These methods are collectively known as "Agile development."

It is important to understand that the SDLC can be applied many different ways. Its phases can be repeated, and the order reversed. Individual phases can be performed at many levels in parallel (for example, requirements can be gathered separately for user interface details, back-end integrations, operating and performance parameters, etc.).

We'll look at the phases of the SDLC in greater detail, and in their classic order (just remember: this is a description, not a prescription).

Requirements and Analysis Phase

The goal of the requirements and analysis phase is to answer several tiers of questions. These begin with exploring the stakeholders' current situation, needs, constraints, and present infrastructure, to determine what problem(s) the new software needs to solve.

After the problem is better defined, more concrete issues can be explored to determine where, and in what context, the new software will need to live.

When answers to such questions are compiled, it's time to begin exploring more precise requirements by focusing on desired features and user experience (UX).

Finally, the team begins assessing architectural options for building the software itself. For most enterprise applications, this means many iterations of defining requirements for the software's front end and back end. You will also need to provide points of integration for other applications, as well as services for lifecycle management.

After gathering the requirements, the team analyzes the results to determine the following:

  • Is it possible to develop the software according to these requirements, and can it be done on-budget?
  • Are there any risks to the development schedule, and if so, what are they?
  • How will the software be tested?
  • When and how will the software be delivered?

At the conclusion of this phase, the classic waterfall method suggests creating a Software Requirement Specification (SRS) document, which states the software's requirements and scope and is confirmed meticulously with stakeholders.

Design and Implementation Phases

Design

The design phase classically takes the SRS document from the Requirements & Analysis phase as input. During the design phase, the software architects and developers design the software based on the provided SRS.

At the conclusion of the design phase, the team creates High-Level Design (HLD) and Low-Level Design (LLD) documents. The HLD gives a "10,000-foot view" of the proposed software. It describes the general architecture, components and their relationships, and may provide additional detail. The LLD, based on the HLD document, describes in much greater detail the architecture of individual components, the protocols used to communicate among them, and enumerates required classes and other aspects of the design.

Implementation

The implementation phase classically takes the HLD and the LLD from the design phase as an input.

The implementation phase is often called the coding or development phase. During this phase, the developers take the design documentation and develop the code according to that design. All of the components and modules are built during this phase, which makes implementation the longest phase of the life cycle. During this phase, testing engineers are also writing the test plan.

At the conclusion of the implementation phase, functional code that implements all of the customer's requirements is ready to be tested.

Testing, Deployment, and Maintenance Phases

Testing

The testing phase classically takes the software code from the implementation phase as input. During this phase, the test engineers take the code and install it into the testing environment so they can follow the test plan. The test plan is a document that includes a list of every single test to be performed in order to cover all of the features and functionality of the software, as specified by the customer requirements. In addition to functional testing, the test engineers also perform:

  • Integration testing
  • Performance testing
  • Security testing

When code doesn’t pass a test, the test engineer identifies the bug, which gets passed to the developers. After the bug is fixed, the test engineers will re-test the software. This back and forth between the test and development engineers continues until all of the code passes all of the tests.

At the conclusion of the testing phase, a high quality, bug-free, working piece of software is ready for production, in theory. In practice, this rarely happens. Developers have learned how to test more efficiently, how to build testing into automated workflows, and how to test software at many different levels of detail and abstraction: from the tiniest of low-level function definitions to large-scale component aggregations. They've also learned that software is never bug-free, and must instead be made observable, tested in production, and made resilient so it can remain available and perform adequately, despite issues.

Deployment

The deployment phase takes the software from the testing phase as input. During the deployment phase, the software is installed into the production environment. If there are no deployment issues, the product manager works with the architects and qualified engineers to decide whether the software is ready to be released.

At the end of the deployment phase, the final piece of software is released to customers and other end users.

Maintenance

During the maintenance phase, the team:

  • Provides support for customers
  • Fixes bugs found in production
  • Works on software improvements
  • Gathers new requests from the customer

At the conclusion of the maintenance phase, the team is ready to work on the next iteration and version of the software, which brings the process back to the beginning of the SDLC and the requirements and analysis phase.


Software Development Methodologies

A software development methodology is also known as a Software Development Life Cycle model. A methodology is essentially a set of rules, steps, roles, and principles. Many different methodologies exist, but we will focus on the three most popular:

  • Waterfall
  • Agile
  • Lean

Each methodology has its own pros and cons. Deciding on which to use depends on many factors, such as the type of project, the length of the project, and the size of the team.

Waterfall Software Development


Waterfall is the traditional software development model, and is still practiced to this day. The waterfall model is nearly identical to the software development life cycle because each phase depends on the results of the previous phase.

With waterfalls, the water flows in one direction only. With the waterfall method, the process goes in one direction, and can never go backwards. Think of it like a relay race where one runner has to finish their distance before handing the baton off to the next person, who is waiting for them. The baton always goes in a forward motion.

It is said that the original waterfall model was created by Winston W. Royce. His original model consisted of seven phases:

  • System requirements
  • Software requirements
  • Analysis
  • Program design
  • Coding
  • Testing
  • Operations

As you can see, the waterfall model is really just one iteration of the software development life cycle. There are now many variations of the phases in the waterfall model, but the idea that each phase cannot overlap and must be completed before moving on remains the same.

Because the outcome of each phase is critical for the next, one wrong decision can derail the whole iteration; therefore, most implementations of the waterfall model require documentation summarizing the findings of each phase as the input for the next phase. If the requirements change during the current iteration, those new requirements cannot be incorporated until the next waterfall iteration, which can get costly for large software projects, and cause significant delays before requested features are made available to users.

Agile Software Development

The Agile method is flexible and customer-focused. Although methodologies similar to Agile were already being practiced, the Agile model wasn't formalized until 2001, when seventeen software developers, frustrated with the existing options, joined together and came up with the Manifesto for Agile Software Development, also known as the Agile Manifesto.

Agile Manifesto

According to the Agile Manifesto, the values of Agile are:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

The manifesto lists twelve different principles:

  1. Customer focus - Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  2. Embrace change and adapt - Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.
  3. Frequent delivery of working software - Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  4. Collaboration - Business people and developers must work together daily throughout the project.
  5. Motivated teams - Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  6. Face-to-face conversations - The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  7. Working software - Working software is the primary measure of progress.
  8. Work at a sustainable pace - Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  9. Agile environment - Continuous attention to technical excellence and good design enhances agility.
  10. Simplicity - Simplicity, the art of maximizing the amount of work not done, is essential.
  11. Self-organizing teams - The best architectures, requirements, and designs emerge from self-organizing teams.
  12. Continuous Improvement - At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Agile Methods

The Agile Manifesto, by design, wasn't precise about how Agile should work. After forging the Manifesto, its originators (and many others) kept evolving these ideas, absorbing new ideas from many sources, and testing them in practice. As a result, over the past few decades, many takes on Agile have emerged, and some have become widely popular. These include:

  • Agile Scrum - In rugby, the term scrum describes a point in gameplay where players crowd together and try to gain possession of the ball. Scrum focuses on small, self-organizing teams that meet daily for short periods and work in iterative sprints, constantly adapting deliverables to meet changing requirements.
  • Lean - Based on Lean Manufacturing, the Lean method emphasizes elimination of wasted effort in planning and execution, and reduction of programmer cognitive load.
  • Extreme Programming (XP) - Compared with Scrum, XP is more prescriptive about software engineering best-practices, and more deliberately addresses the specific kinds of quality-of-life issues facing software development teams.
  • Feature-Driven Development (FDD) - FDD prescribes that software development should proceed in terms of an overall model, broken out, planned, designed, and built feature-by-feature. It specifies a process of modeling used to estimate and plan feature delivery, and a detailed set of roles for both the core development team and support people.

Of the methodologies described above, Agile Scrum is probably the most popular. We'll discuss some Scrum terms and concepts below that have been more or less universally adopted by the Agile community across all methodologies.

Sprints

In the Agile model, the software development life cycle still applies. Unlike the waterfall method, where there is one long iteration of the SDLC, Agile is many quick iterations of the SDLC.

These quick iterations are called sprints, and the purpose of sprints is to accomplish the frequent delivery of working software principle of the Agile manifesto. A sprint is a specific period of time (time-boxed) which is usually between two weeks and four weeks, but preferably as short as possible. The duration of the sprint should be determined before the process begins and should rarely change.

During a sprint, each team takes on as many tasks, also known as user stories, as they feel they can accomplish within the time-boxed duration of the sprint. When the sprint is over, the software should be working and deliverable, but that doesn't necessarily mean that it will be delivered; a sprint doesn't always lead to a release, but Agile requires the software to remain deliverable.

Backlog

It is the role of the product owner to create the backlog. This backlog is made up of all of the features for the software, in a prioritized list. The features in the backlog are a result of the Requirements & Analysis phase, and include features that won't necessarily be in the immediate release. New features can be added to the backlog at any time, and the product owner can reprioritize the backlog based on customer feedback.

User stories

When a feature gets close to the top of the priority list, it gets broken down into smaller tasks called user stories. Each user story should be small enough that a single team can finish it within a single sprint. If it's too large to be completed in a single sprint, the team should break it down further. Because the software must be deliverable at the end of each sprint, a user story must also abide by those rules.

A user story is a simple statement of what a user (or a role) needs, and why. The suggested template for a user story is:

As a <user|role>, I would like to <action>, so that <value|benefit>
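
For example: "As a network administrator, I would like to receive an alert when a device interface goes down, so that I can restore connectivity before users notice."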

Completing a user story requires completing all of the phases of the SDLC. The user story itself should already have the requirements defined by the product owner, and the team taking on the user story needs to come up with a design for the task, implement it and test it.

Scrum Teams

Scrum teams are cross-functional, collaborative, self-managed and self-empowered. Ideally, these scrum teams should not be larger than 10 individuals, but they should be big enough to finish a user story within a sprint.

Every day, each scrum team should have a daily standup. A standup is a meeting that should last no longer than 15 minutes, and should take place at the same time every day. In fact, it's called a "standup" because ideally it should be short enough for the team to accomplish it without having to sit down.

The goal of the daily standup is to keep all team members in sync with what each person has accomplished since the last standup, what they are going to work on until the next standup, and what obstacles they have that may be preventing them from finishing their task. The scrum master facilitates these standups, and their job is to report and/or help remove obstacles.

Lean Software Development

Lean software development is based on Lean Manufacturing principles, which are focused on minimizing waste and maximizing value to the customer.

In its simplest form, Lean Software Development delivers only what the customers want. In the book Lean Software Development: An Agile Toolkit, there are seven principles for lean:

  • Eliminate waste
  • Amplify learning
  • Decide as late as possible
  • Deliver as fast as possible
  • Empower the team
  • Build integrity in
  • Optimize the whole

Eliminate waste

Waste is anything that does not add value for customers. The definition of value is subjective, however, because it's determined by the customer. Eliminating waste is the most fundamental lean principle, the one from which all the other principles follow.

To eliminate waste, you must first be able to recognize it. There are seven wastes of software development:

  • Partially Done Work
  • Extra Processes
  • Extra Features
  • Task Switching
  • Waiting
  • Motion
  • Defects

Partially done work

Partially done work is a waste because:

  • It doesn't add any value to the customer
  • The time and resources spent doing this work could have been used on something that is of value to the customer
  • The work usually isn't maintained, so it eventually becomes obsolete

Extra processes

Extra processes, such as unneeded paperwork and approvals, are a waste for much the same reasons as partially done work: they consume time and resources without adding value for the customer.

Extra features

If the customer didn't ask for it, it doesn't bring them value. It might be nice to have, but it's better to use the resources to build exactly what customers want.

Task switching

Humans need time to refocus when they switch to another task, and the time spent switching contexts is a waste. Because task switching wastes a person's time, assigning one person to multiple projects is itself a source of waste.

Waiting

Many people would agree that by definition, waiting is a big waste of time. So, any type of delay is a waste. Examples of a delay in software development are delays in:

  • starting the project
  • getting the right resources (staff)
  • getting the requirements from the customer
  • approvals of documentation
  • getting answers
  • making decisions
  • implementation
  • testing

Motion

Lean software development defines motion for two things: people and artifacts. Motion for people is when people need to physically walk from their desk to another team member to ask a question, collaborate, and so on. When they move from their desk, it is not only the time it takes for them to get to the destination that is a waste, but also the task switching time.

The motion of artifacts is when documents or code are moved from one person to another. Most of the time, the document doesn't contain all of the information for the next person, so either that person has to gather the information again, or the hand-off requires time, which is a waste.

Defects

Unnoticed defects (otherwise known as bugs) are a waste because of their impact. A defect can cause a snowball effect in other features, so the time it takes to find and debug it is a waste. In addition, the value of the software is reduced for the customer when they run into issues, so the defective feature itself ends up being a waste.

Lean Software Development (Cont.)

Amplify Learning with Short Sprints

To be able to fine tune software, there should be frequent short iterations of working software. By having more iterations:

  • Developers learn faster
  • Customers can give feedback sooner
  • Features can be adjusted so that they bring customers more value

Decide as Late as Possible

When there is uncertainty, it is best to delay the decision-making until as late as possible in the process, because it's better to base decisions on facts rather than opinion or speculation.

Also, when a decision isn't yet made, the software is built to be flexible in order to accommodate the uncertainty. This flexibility enables developers to make changes when a decision is made--or in the future, if requirements change.

Deliver as Fast as Possible

Delivering the software faster:

  • Enables customers to provide feedback
  • Enables developers to amplify learning
  • Gives customers the features they need now
  • Gives customers less time to change their minds
  • Makes everyone make decisions faster
  • Produces less waste

You'll notice that each of these benefits reflects at least one of the previously discussed lean principles.

Empower the Team

Each person has their own expertise, so let them make decisions in their area of expertise. When combined with the other principles such as eliminating waste, making late decisions, and fast deliveries, there isn't time for others to make decisions for the team.

Build Integrity In

Integrity for software is when the software addresses exactly what the customer needs. Another level of integrity is that the software maintains its usefulness for the customer.

Optimize the Whole

Although one of the principles is empowering the team, each expert must take a step back and see the big picture. The software must be built cohesively. The value of the software will suffer if each expert only focuses on their expertise and doesn't consider the ramifications of their decisions on the rest of the software.


Software Design Patterns


Introduction

Software design patterns are best practice solutions for solving common problems in software development. Design patterns are language-independent. This means that they can be implemented in any contemporary, general-purpose computing language, or in any language that supports object-oriented programming. Often, popular design patterns encourage creation of add-on frameworks that simplify implementation in widely-used languages and paradigms.

Artisans have always shared time-tested methods and techniques for problem-solving. Calling these things "design patterns" was first done systematically in the field of architecture and urban planning. Architectural patterns were organized by abstract class of problem solved, urging designers to recognize common core themes shared across numerous divergent contexts. For example, a bus stop and a hospital waiting room are both places in which people wait, so both can usefully implement features of the pattern A PLACE TO WAIT.

This way of thinking about patterns was quickly picked up by pioneers in object-oriented coding and Agile software development. In 1994, Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (known collectively as the Gang of Four (GoF)) published a book called Design Patterns: Elements of Reusable Object-Oriented Software. We'll offer a broad view of the patterns they identified and documented.

Program to an interface, not an implementation. Tightly coupling mainline program logic with implementations of specific functionality tends to make code hard to understand and maintain. Experience has shown that it works better to loosely couple logical layers by using abstract interfaces. For instance, the mainline code calls functions and methods in a generic way, and lower-level functions implement matching interfaces for the functionality they provide, ensuring, for example, that all serialization functions used in a program are called in similar fashion.

Object-oriented languages like Java formalize these ideas. They enable explicit declaration of interfaces that classes can implement. An interface definition is basically a collection of function prototypes, defining names and types for functions and parameters that higher-level logic might use to invoke a range of classes. For example, the interface to a range of 'vehicle' classes (e.g., class 'car,' 'motorcycle,' 'tractor') might include start_engine(), stop_engine(), accelerate(), and brake() prototypes.
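
To make this concrete, here is a minimal Python sketch of the vehicle example. (The document names no implementation language, so Python's abc module stands in here for Java-style interface declarations; the test_drive() helper is purely illustrative.)

from abc import ABC, abstractmethod

class Vehicle(ABC):
    """The abstract interface that higher-level logic programs against."""
    @abstractmethod
    def start_engine(self): ...

    @abstractmethod
    def accelerate(self): ...

class Car(Vehicle):
    def start_engine(self):
        print("Car engine started")

    def accelerate(self):
        print("Car accelerating")

class Tractor(Vehicle):
    def start_engine(self):
        print("Tractor engine started")

    def accelerate(self):
        print("Tractor accelerating, slowly")

def test_drive(vehicle):
    # Mainline code calls the generic interface; it never needs to know
    # which concrete vehicle class it has been handed.
    vehicle.start_engine()
    vehicle.accelerate()

test_drive(Car())
test_drive(Tractor())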

Favor object composition over class inheritance. Object-oriented languages enable inheritance: more generalized base classes can be inherited by derived classes. Thus a 'duck' class might inherit substantial functionality from a 'bird' class. This requires, however, that the bird class implement a very wide range of methods, most of which may not be used by a specific derived class.

The principle of favoring composition over inheritance suggests that a better idea may be to implement a specific class (such as 'duck') by composing only its required unique behaviors (such as 'quack') along with abstract interfaces (such as 'ducklike') to behavior classes ('fly', 'swim') that can be shared widely in similar fashion; a 'penguin' class implementing a 'penguinlike' interface can then share the 'swim' class, but not 'fly'. Organizing software in this way has proven to be more flexible, ultimately easier to maintain, and encourages reuse of code.
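
Here is a minimal Python sketch of the duck/penguin idea; the behavior classes and attribute names are illustrative assumptions, not a canonical implementation:

class Fly:
    def move(self):
        return "flying"

class Swim:
    def move(self):
        return "swimming"

class Quack:
    def sound(self):
        return "quack"

class Duck:
    # Composes small, shareable behavior objects instead of inheriting
    # everything from one huge 'bird' base class.
    def __init__(self):
        self.flyer = Fly()      # shared behavior
        self.swimmer = Swim()   # shared behavior
        self.voice = Quack()    # duck-specific behavior

class Penguin:
    # Reuses the Swim behavior, but deliberately has no Fly behavior.
    def __init__(self):
        self.swimmer = Swim()

duck = Duck()
print(duck.flyer.move(), duck.swimmer.move(), duck.voice.sound())
print(Penguin().swimmer.move())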

Software design patterns have already been proven to be successful, so using them can speed up development because developers don't need to come up with new solutions and go through a proof of concept to make sure they work.

The Original Design Patterns

In their Design Patterns book, the Gang of Four divided patterns into three main categories:

  • Creational - Patterns used to guide, simplify, and abstract software object creation at scale.
  • Structural - Patterns describing reliable ways of using objects and classes for different kinds of software projects.
  • Behavioral - Patterns detailing how objects can communicate and work together to meet familiar challenges in software engineering.

They listed a total of 23 design patterns, which are now considered the foundation of newer design patterns. Most of these patterns, at some level, express basic principles of good object-oriented software design.

Let's dive deeper into two of the most commonly used design patterns: the Observer design pattern (a Behavioral design pattern), and Model-View-Controller (MVC).

Observer Design Pattern

The observer design pattern is a subscription notification design that lets objects (observers or subscribers) receive events when there are changes to an object (subject or publisher) they are observing. Examples of the observer design pattern abound in today's applications. Think about social media. Users (observers) follow other users (subjects). When the subject posts something on their social media, it notifies all of the observers that there is a post, and the observers look at that update.

To implement this subscription mechanism:

  1. The subject must have the ability to store a list of all of its observers.
  2. The subject must have methods to add and remove observers.
  3. All observers must implement a callback to invoke when the publisher sends a notification, preferably using a standard interface to simplify matters for the publisher. This interface needs to declare the notification mechanism, and it needs to have parameters for the publisher to send the necessary data to the observer.

The execution of this design pattern looks like this:

  1. An observer adds itself to the subject's list of observers by invoking the subject's method to add an observer.
  2. When there is a change to the subject, the subject notifies all of the observers on the list by invoking each observer's callback and passing in the necessary data.
  3. The observer's callback is triggered, and therefore executed, to process the notification.
  4. Steps 2 and 3 continue whenever there is a change to the subject.
  5. When the observer is done receiving notifications, it removes itself from the subject's list of observers by invoking the subject's method to remove an observer.
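
The whole mechanism fits in a short Python sketch. The names used here (Subject, Follower, add_observer(), update(), and so on) are illustrative choices, not part of the pattern itself:

class Subject:
    """Publisher: stores its observers and notifies them of changes."""
    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def remove_observer(self, observer):
        self._observers.remove(observer)

    def post(self, message):
        # Notify every registered observer by invoking its callback.
        for observer in self._observers:
            observer.update(message)

class Follower:
    """Observer: implements the standard update() callback."""
    def __init__(self, name):
        self.name = name

    def update(self, message):
        print(f"{self.name} saw the new post: {message}")

subject = Subject()
fan = Follower("fan1")
subject.add_observer(fan)       # step 1: subscribe
subject.post("Hello, world!")   # steps 2 and 3: a change triggers the callback
subject.remove_observer(fan)    # step 5: unsubscribe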

The benefit of the observer design pattern is that observers get real-time data from the subject when a change occurs. Subscription mechanisms typically provide better performance than alternatives such as polling.

Model-View-Controller (MVC)


The Model-View-Controller (MVC) design pattern is sometimes considered an architectural design pattern. Its goal is to simplify development of applications that depend on graphical user interfaces. MVC abstracts code and responsibility into three different components: model, view, and controller. Communication between the components flows in one direction. This design pattern is commonly used in user interfaces and web applications.

Components

  • Model - The model is the application's data structure and is responsible for managing the data, logic and rules of the application. It gets input from the controller.
  • View - The view is the visual representation of the data. There can be multiple representations of the same data.
  • Controller - The controller is like the middleman between the model and view. It takes in user input and manipulates it to fit the format for the model or view.

The execution of the Model-View-Controller looks like this:

  1. The user provides input.
  2. The controller accepts the input and manipulates the data.
  3. The controller sends the manipulated data to the model.
  4. The model accepts the manipulated data, processes it, and sends the selected data (in the strictest forms of MVC, via the controller) to the view.
  5. The view accepts the selected data and displays it to the user.
  6. The user sees the updated data as a result of their input.
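
A compact Python sketch of this flow might look as follows; the to-do-list data and the method names are illustrative assumptions, not a canonical MVC implementation:

class Model:
    """Manages the application's data and the rules for changing it."""
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return list(self.items)     # the selected data to display

class View:
    """Renders whatever data it is handed; knows nothing about storage."""
    def render(self, items):
        print("Items:", ", ".join(items))

class Controller:
    """Middleman: shapes user input for the model, then updates the view."""
    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle_input(self, text):
        item = text.strip().lower()     # manipulate the input (step 2)
        items = self.model.add(item)    # model processes it (steps 3 and 4)
        self.view.render(items)         # view displays it (steps 5 and 6)

app = Controller(Model(), View())
app.handle_input("  Apples ")           # prints: Items: apples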

The benefit of the Model-View-Controller design pattern is that each component can be built in parallel. Because each component is abstracted, the only information each component needs is the input and output interface for the other two components. Components don't need to know about the implementation within the other components. What's more, because each component is only dependent on the input it receives, components can be reused as long as the other components provide the data according to the correct interface.

Version Control Systems



Types of Version Control Systems

Version control, also called version control systems, revision control or source control, is a way to manage changes to a set of files in order to keep a history of those changes. Think of all the times you have made a copy of a file before modifying it, just in case you want to revert to the original. Version control handles all of that for you.

Version control systems store the master set of files and the history of changes in a repository, also known as a repo. In order to make a change to a file, an individual must get a working copy of the repository onto their local system. The working copy is the individual's personal copy of the files, where they can make changes without affecting others. Some of the benefits of version control are:

  • It enables collaboration - Multiple people can work on a project (a set of files) at the same time without overwriting each other's changes.
  • Accountability and visibility - Know who made what changes, when they made those changes and why.
  • Work in isolation - Build new features independently without affecting the existing software.
  • Safety - Files can be reverted when a mistake is made.
  • Work anywhere - Files are stored in a repository, so any device can have a working copy.


There are three types of version control systems:

  • Local
  • Centralized
  • Distributed

Local Version Control System


Just like the name states, a Local Version Control System (LVCS) tracks files within a local system. A local version control system replaces the "make a copy of the file before editing further" scenario. The focus of a local version control system is mostly to be able to revert back to a previous version. This type of version control isn't meant to address most of the benefits listed above.

Local version control systems use a simple database to keep track of all of the changes to the file. In most cases, the system stores the delta between the two versions of the file, as opposed to the file itself. When the user wants to revert the file, the delta is reversed to get to the requested version.
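
As an illustration of delta storage (a sketch of the idea, not how any particular version control tool implements it), Python's standard difflib module can compute a delta between two versions of a file and later reverse it to recover either version:

import difflib

# Two versions of the same file, as lists of lines.
v1 = ["hostname R1\n", "interface eth0\n"]
v2 = ["hostname R1-edge\n", "interface eth0\n", "ip routing\n"]

# Store only the delta between the versions, not both full files.
delta = list(difflib.ndiff(v1, v2))

# "Reversing" the delta recovers either version on demand.
assert list(difflib.restore(delta, 1)) == v1   # revert to version 1
assert list(difflib.restore(delta, 2)) == v2   # rebuild version 2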

Centralized Version Control System



A Centralized Version Control System (CVCS) uses a server-client model. The repository (also known as the repo), which is the only copy of the set of files and history, is stored in a centralized location, on a server. Whenever an individual wants to make a change to a file, they must first get a working copy of the file from the repository onto their own system, the client.

In a centralized version control system, only one individual at a time can work on a particular file. In order to enforce that restriction, an individual must checkout the file, which locks the file and prevents it from being modified by anyone else. When the individual is done making changes, they must checkin the file, which applies the individual's changes to the master copy in the repo, tags a new version, and unlocks the file for others to make changes.

Distributed Version Control System

A Distributed Version Control System (DVCS) is a peer-to-peer model. The repository can be stored on a client system, but it is usually stored in a repository hosting service. When an individual wants to make a change to a file, they must first clone the full repository to their own system. This includes the set of files as well as all of the file history. The benefit of this model is that the full repository will be on multiple systems and can be used to restore the repository in the repository hosting service if an event such as data corruption occurs.

In a distributed version control system, every individual can work on any file, even at the same time, because the local file in the working copy is what is being modified. As a result, locking the file is not necessary. When the individual is done making the changes, they push the file to the main repository that is in the repository hosting service, and the version control system detects any conflicts between file changes.

Git

At the time of this writing, the most popular version control system in use is Git. Git is an open source implementation of a distributed version control system and the current mainstream choice for software development. Git:

  • Is easy to learn
  • Can handle all types of projects, including large enterprise projects
  • Has fast performance
  • Is built for collaborative projects
  • Is flexible
  • Has a small footprint
  • Has all the benefits of a distributed version control system
  • Is free

A Git client must be installed on a client machine. It is available for macOS, Windows, and Linux/Unix. Though some Git clients come with a basic GUI, Git's focus is the command line interface, which we will cover in detail later.

One key difference between Git and other version control systems is that Git stores data as snapshots instead of differences (the delta between the current file and the previous version). If a file does not change, Git uses a reference link to the last snapshot in the system instead of taking a new, identical snapshot.

Git's 3s

Git is organized in threes: three stages and three states.

Git Three Stages


There are three stages in Git:

  • repository (the .git directory)
  • working directory
  • staging area

REPOSITORY (.GIT DIRECTORY)

Because Git is a distributed version control system, each client has a full copy of the repository. When a project becomes a Git repository, a hidden .git directory is created, and it is essentially the repository. The .git directory holds metadata such as the files (compressed), commits, and logs (commit history).

WORKING DIRECTORY

The working directory is the folder that is visible in the filesystem. It is a copy of the files in the repository. These files can be modified, and the changes are only visible to the user of the client. If the client's filesystem gets corrupted, these changes will be lost, but the main repository remains intact.

STAGING AREA

The staging area stores the information about what the user wants added/updated/deleted in the repository. The user does not need to add all of their modified files to the stage/repo; they can select specific files. Although it is called an area, it is actually just an index file located in the .git directory.

Three States

Since there are three stages in Git, there are three matching states for a Git file:

  • committed - This version of the file has been saved in the repository (.git directory).
  • modified - The file has changed but has not been added to the staging area or committed to the repository.
  • staged - The modified file is ready to be committed to the repository.
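
To make the three states concrete, here is a minimal command sequence. It assumes a repository already created with git init, and the file name app.py is purely illustrative:

$ echo "print('hello')" > app.py    # app.py is modified (here, brand new and untracked)
$ git add app.py                    # app.py is now staged
$ git commit -m "Add app.py"        # app.py is now committed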

Local vs. Remote Repositories

Git has two types of repositories, local and remote.

A local repository is stored on the filesystem of a client machine, which is the same one on which the git commands are being executed.

A remote repository is stored somewhere other than the client machine, usually a server or repository hosting service. Remote repositories are optional and are typically used when a project requires collaboration between a team with multiple users and client machines.

A remote repository can be viewed as the "centralized" repository for Git, but that does not make it a CVCS. A remote repository with Git continues to be a DVCS, because the remote repository contains the full repository, including the code and the file history. When a client machine clones the repository, it gets the full repository without needing to lock it, as in a CVCS.

After the local repository is cloned from the remote repository, or the remote repository is created from the local repository, the two repositories are independent of each other until content changes are applied from one to the other through a manual Git command execution.

What is Branching?


Branching enables users to work on code independently without affecting the main code in the repository. When a repository is created, the code is automatically put on a branch called master. Users can have multiple branches, and those branches are independent of each other. Branching enables users to:

  • Work on a feature independently while still benefitting from a distributed version control system
  • Work on multiple features concurrently
  • Experiment with code ideas
  • Keep production, development, and feature code separately
  • Keep the main line of code stable

Branches can be local or remote, and they can be deleted. Local branches make it easy to try different code implementations because a branch can be used if it is successful and deleted if it is not. Merging a branch back to the parent branch is not mandatory.

Unlike other version control systems, Git's branch creation is lightweight, and switching between branches is almost instantaneous. Although branches are often visually drawn as separate paths, Git branches are essentially just pointers to the appropriate commit.


A branch is like a fork in the road: it starts with the code and history at the point of divergence, then builds its own path with new commits independently. As a result, branches have their own history, staging area, and working directory. When a user switches from one branch to another, the code in their working directory and the files in the staging area change accordingly, but the repository (.git) directory remains unchanged.

Wherever possible, you should work on branches rather than committing directly to the master branch, in order to prevent accidental updates that break the code.

GitHub and Other Providers

Working with Git is often associated with GitHub, but Git and GitHub are not the same. Git is an implementation of distributed version control and provides a command line interface. GitHub, a service provided by Microsoft, is a repository hosting service built on Git.

In addition to providing the distributed version control and source code management functionality of Git, GitHub also provides additional features such as:

  • code review
  • documentation
  • project management
  • bug tracking
  • feature requests

GitHub has evolved to support many forms of collaborative coding, including:

  • private repos visible only to designated teams
  • "social coding" projects that are public, but whose contributors may be anonymous
  • broad-based open source efforts with many contributors, sometimes numbering in the thousands

To enable project owners to manage in such widely-disparate scenarios, GitHub introduced the concept of the "pull request". A pull request is a way of formalizing a request by a contributor to review changes such as new code, edits to existing code, etc., in the contributor's branch for inclusion in the project's main or other curated branches. The pull request idiom is now universally-implemented in Git hosting services.

GitHub is not the only repository hosting service using Git; others include GitLab and Bitbucket.

Git Commands

Setting up Git

After installing Git to the client machine, you must configure it. Git provides a git config command to get and set Git's global settings, or a repository's options.

To configure Git, use the --global option to set the initial global settings.

Command: git config --global key value

Using the --global option will write to the global ~/.gitconfig file.

For each user to be accountable for their code changes, each Git installation must set the user's name and email. To do so, use the following commands:

$ git config --global user.name "<user's name>"
$ git config --global user.email "<user's email>"

where <user's name> and <user's email> are the user's name and email address, respectively.

Create a New Git Repository

Any project (folder) in a client's local filesystem can become a Git repository. Git provides a git init command to create an empty Git repository, or make an existing folder a Git repository. When a new or existing project becomes a Git repository, a hidden .git directory is created in that project folder. Remember that the .git directory is the repository that holds the metadata such as the compressed files, the commit history, and the staging area. In addition to creating the .git directory, Git also creates the master branch.

Command : git init

To make a new or existing project a Git repository, use the following command:

$ git init <project directory>

where the <project directory> is the absolute or relative path to the new or existing project. For a new Git repository, the directory in the provided path will be created first, followed by the creation of the .git directory.

Creating a Git repository doesn't automatically track the files in the project. Files need to be explicitly added to the newly created repository in order to be tracked. Details on how to add files to a repository will be covered later.


Get an Existing Git Repository

With Git, it is easy to get a copy of and contribute to existing repositories. Git provides a git clone command that clones an existing repository to the local filesystem. Because Git is a DVCS, it clones the full repository, which includes the file history and remote-tracking branches.

Command : git clone <repository> [target directory]

where <repository> is the location of the repository to clone. Git supports four major transport protocols for accessing the <repository>: Local, Secure Shell (SSH), Git, and HTTP. The [target directory] is optional and is the absolute or relative path in which to store the cloned files. If you don't provide a target directory, Git creates a directory named after the repository in the location where you executed the command.


When you execute the git clone command, Git:

  1. Creates the working directory on the local filesystem with the name of the repository or the specified name, if provided.
  2. Creates a .git directory inside the newly created folder.
  3. Copies the metadata of the repository to the newly created .git directory.
  4. Creates the working copy of the latest version of the project files.
  5. Duplicates the branch structure of the cloned, remote repository and enables tracking of changes made to each branch, locally and remotely — this includes creating and checking out a local active branch, "forked" from the cloned repository's current active branch.

Please see the official git clone documentation for more details and command line options.

View the Modified Files in the Working Directory

What has been changed in the working directory? What files were added in the staging area? Git provides a git status command to get a list of files that have differences between the working directory and the parent branch. This includes newly added untracked files and deleted files. It also provides a list of files that are in the staging area. Note that the difference is calculated based on the last commit that the local clone copied from the parent branch in the Git repository, not necessarily the latest version in the remote repository. If changes have been made since the repository was cloned, Git won't take those changes into account.

Command : git status

In addition to providing the list of files, the output of the git status command provides additional information such as:

  • Current branch of the working directory
  • Number of commits the working directory is behind the latest version of the parent branch
  • Instructions on how to update the local repository and how to stage/unstage files

Please see the official git status documentation for more details and command line options.

Compare Changes Between Files

Want to know what was changed in a file, or the difference between two files? Git provides a git diff command that is essentially a generic file comparison tool.

Command : git diff

Because this command is a generic file difference tool, it includes many options for file comparison. When using this command, the file does not need to be a Git tracked file.

For example, you can:

1. Show changes between the version of the file in the working directory and the last commit that the local clone copied from the parent branch in the Git repository:

$ git diff <file path>

2. Show changes between the version of the file in the working directory and a particular commit from the file history:

$ git diff <commit id> <file path>
3. Show changes between two commits of a file from the file history, where <file path> is the absolute or relative path of the file to compare and <commit id> is the id of the version of the file to compare:

$ git diff <commit id 1> <commit id 2> <file path>

4. Show the changes between two files in the working directory or on disk:

$ git diff <file path 1> <file path 2>

Please see the official git diff documentation for more details and command line options.

Adding and Removing Files

Adding Files to the Staging Area

After changes have been made to a file in the working directory, the file must first go to the staging area before the repository can be updated. Git provides a git add command to add file(s) to the staging area. The files being added to staging can include newly untracked files, existing tracked files that have been changed, or even tracked files that need to be deleted from the repository. Modified files don't need to be added to the staging area unless the changes need to be added to the repository.

Command : git add

This command can be used more than once before the Git repository is updated (using commit). Also, the same file can be added to the stage multiple times before a commit. Only the files that are specified in the git add command are added to the staging area.

To add a single file to the staging area:

$ git add <file path>

To add multiple files to the staging area, where <file path> is the absolute or relative path of a file to be added and can accept wildcards:

$ git add <file path 1> ... <file path n>

To add all the changed files to the staging area:

$ git add .

Remember that Git has three stages, so adding files to the staging area is just the first step of the two-step process to update the Git repository.

Please see the official git add documentation for more details and command line options.


Removing Files from the Git Repository

There are two ways to remove files from the Git repository.

OPTION 1

Git provides a git rm command to remove files from the Git repository. This command will add the removal of the specified file(s) to the staging area. It does not perform the second step of updating the Git repository itself.

Command : git rm

To remove the specified file(s) from the working directory and add this change to the staging area, use the following command:

$ git rm <file path 1> ... <file path n>

where <file path> is the absolute or relative path of the file to be deleted from the Git repository.

To add the specified file(s) to be removed to the staging area without removing the file(s) itself from the working directory, use the following command:

$ git rm --cached <file path 1> ... <file path n>

This command will not work if the file is already in the staging area with changes.

Please see the official git rm documentation for more details and command line options.



OPTION 2

This option is a two step process. First, use the regular filesystem command to remove the file(s). Then, stage the deletion using the git add command that was discussed earlier.

$ rm <file path 1> ... <file path n>
$ git add <file path 1> ... <file path n>

This two step process is equivalent to using the git rm <file path 1> ... <file path n> command. Unlike git rm --cached, this option does not allow the file to be preserved in the working directory.


Updating Repositories

Updating the Local Repository with the Changes in the Staging Area

Remember that in Git, changes to a file go through three stages: working directory, staging area, and repository. Getting the content changes from the working directory to the staging area can be accomplished with the git add command, but how do the updates get to the repository? Git provides a git commit command to update the local repository with the changes that have been added in the staging area.

Command : git commit

This command combines all of the content changes in the staging area into a single commit and updates the local Git repository. This new commit becomes the latest change in the Git repository. If there is a remote Git repository, it does not get modified with this command.

To commit the changes from the staging area, use the following command:

$ git commit

It is good software development practice to add a note to the commit to explain the reason for the changes. To commit the changes from the staging area with a message, use the following command:

$ git commit -m "<message>"

If the git commit command is executed without any content in the staging area, Git will return a message, and nothing will happen to the Git repository. This command only updates the Git repository with the content in the staging area. It will not take any changes from the working directory.

Please see the official git commit documentation for more details and command line options.


Updating the Remote Repository

In order to share the content changes from the local Git repository with others, the remote Git repository must be manually updated. Git provides a git push command to update the remote Git repository with the content changes from the local Git repository.

Command : git push

This command will not execute successfully if there is a conflict with adding the changes from the local Git repository to the remote Git repository. Conflicts occur when two people edit the same part of the same file. For example, if you clone the repository, and someone else pushes changes before you, your push may create a conflict. The conflicts must be resolved first before the git push will be successful.

To update the contents from the local repository to a particular branch in the remote repository, use the following command:

$ git push origin <branch name>

To update the contents from the local repository to the master branch of the remote repository, use the following command:

$ git push origin master

Please see the official git push documentation for more details and command line options.

Updating Your Local Copy of the Repository

Local copies of the Git repository do not automatically get updated when another contributor makes an update to the remote Git repository. Updating the local copy of the repository is a manual step. Git provides a git pull command to get updates from a branch or repository. This command can also be used to integrate the local copy with a non-parent branch.

Command : git pull

In more detail, when the git pull command is executed, the following steps happen:

  1. The local repository (.git directory) is updated with the latest commit, file history, and so on from the remote Git repository. (This is equivalent to the Git command git fetch.)
  2. The working directory and branch is updated with the latest content from step 1. (This is equivalent to the Git command git merge.)
  3. A single commit is created on the local branch with the changes from step 1. If there is a merge conflict, it will need to be resolved.
  4. The working directory is updated with the latest content.
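
Steps 1 and 2 can also be run as separate commands. Assuming a remote named origin, the following pair is roughly equivalent to a git pull:

$ git fetch origin               # update the local .git directory (step 1)
$ git merge origin/<branch>      # merge the fetched branch into the working branch (step 2)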

To update the local copy of the Git repository from the parent branch, use the following command:

$ git pull

Or

$ git pull origin

To update the local copy of the Git repository from a specific branch, use the following command:

$ git pull origin <branch>

Please see the official git pull documentation for more details and command line options.


Branching Features

Creating and Deleting a Branch

Branches are a very useful feature of Git. As discussed earlier, there are many benefits of using branches, but one major benefit is that it allows features and code changes to be made independent of the main code (the master branch).

There are two options for creating a branch.

OPTION 1

Git provides a git branch command to list, create, or delete a branch.

Command : git branch

To create a branch, use the following command:

$ git branch <branch name> <parent branch>

where <branch name> is the name to call the new branch and <parent branch> is the branch to branch off of. If <parent branch> is omitted, the new branch is based on the current branch.

When using this command to create a branch, Git will create the branch, but it will not automatically switch the working directory to this branch. You must use the git switch <branch name> command to switch the working directory to the new branch.

OPTION 2

Git provides a git checkout command to switch branches by updating the working directory with the contents of the branch.

Command : git checkout

To create a branch and switch the working directory to that branch, use the following command:

$ git checkout -b <branch name> <parent branch>

where <branch name> is the name to call the new branch and <parent branch> is the branch to branch off of.

Deleting a Branch

To delete a branch, use the following command:

$ git branch -d <branch name>

Please see the official git branch and git checkout documentation for more details and command line options.

GET A LIST OF ALL BRANCHES

To get a list of all the local branches, use the following command:

$ git branch

Or

$ git branch --list
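
Putting these commands together, a typical branch lifecycle might look like this (a sketch, assuming a repository with a master branch and a hypothetical feature branch named feature1):

$ git branch feature1 master      # create 'feature1' off of 'master'
$ git switch feature1             # point the working directory at the new branch
$ git branch --list               # 'feature1' is marked with an asterisk (current branch)
$ git switch master               # switch back before deleting
$ git branch -d feature1          # delete the branch once it is no longer needed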

Merging Branches

Branches diverge from one another when they are modified after they are created. To get the changes from one branch (the source) into another (the target), you must merge the source branch into the target branch. When Git merges the branch, it takes the changes/commits from the source branch and applies them to the target branch. During a merge, only the target branch is modified; the source branch is untouched and remains the same.

For example:


  1. At commit#1, Branch B branched off of Branch A.
  2. After the branches diverge, someone adds commit#2 to Branch A. Branch B does not get these changes.
  3. Someone adds commit#3 and commit#4 to Branch B. Branch A does not get these changes.
  4. Someone adds commit#5 to Branch A. Branch B does not get these changes.
  5. Now, Branch A and Branch B have diverged by two commits each.
  6. Let's say that Branch B wants the changes from Branch A since it diverged (commit#2 and commit#5). So, Branch A is the source branch and Branch B is the target branch. In this example, let's state that the commits were changes in different files. As a result, Git can combine the branches automatically: commit#2 and commit#5 are applied to Branch B, and Branch A remains the same.

FAST-FORWARD MERGE

A fast-forward merge is possible when the target branch has no new commits of its own since the branches diverged; Git can then apply the changes/commits from the source branch automatically and without conflicts. More generally, even when both branches have changed (as in the example above), Git can still merge automatically as long as the branches modified different files, or different lines of the same file; in that case the result is recorded as a new merge commit on the target branch. An automatic merge is the best case scenario when performing a merge.

In a fast-forward merge, Git integrates the different commits from the source branch into the target branch. Because branches are essentially just pointers to commits in the backend, a fast-forward merge simply moves the pointer that represents the HEAD of the target branch, rather than adding a new commit.

Note that a fast-forward merge is only possible when the target branch contains no commits that are missing from the source branch; otherwise, Git performs a regular (three-way) merge, which still completes automatically as long as no conflicts are encountered.

MERGE CONFLICTS

Modifying the same file on different branches increases the chances of a merge conflict. A merge conflict occurs when Git is not able to merge the branches automatically, because it does not know how to combine the competing changes to the file(s). When this occurs, the user must manually fix these conflicts before the branches can be merged. Fixing the conflict and completing the merge adds a new commit to the target branch containing the commits from the source branch, as well as the fixed merge conflict(s).
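
When a conflict occurs, Git writes conflict markers directly into the affected file. A sketch of what you might see, assuming a hypothetical config.txt that was changed on both branches:

<<<<<<< HEAD
timeout = 30
=======
timeout = 60
>>>>>>> feature1

To resolve the conflict, edit the file so it contains only the correct content, remove the marker lines, and then run git add config.txt followed by git commit to complete the merge.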


PERFORMING THE MERGE

Git provides a git merge command to join two or more branches together.

Command : git merge

To merge a branch into the client's current branch/repository, use the following command:

$ git merge <branch name>

where <branch name> is the source branch that is being merged into the current branch.

When using the git merge command, the target branch must be the current branch/repository, so to merge a branch into a branch that is not the client's current branch/repository, use the following commands:

$ git checkout <target branch name>
$ git merge <source branch name>

where <target branch name> is the target branch and the <source branch name> is the source branch.

To merge more than one branch into the client's current branch/repository, use the following command:

$ git merge <branch name 1> ... <branch name n>

This is called an octopus merge.

Please see the official git merge documentation for more details and command line options.

.diff Files

What is a .diff file?

Developers use a .diff file to show how two versions of a file differ. Using specific symbols, the file can be read by other systems and tools to determine how files should be updated. The differences are used to implement changes by comparing and merging the two versions. Some projects require changes to be submitted as a .diff file, called a patch. Because the changes from both versions are presented together in a single listing, this format is referred to as a unified diff.

The symbols and meanings in a unified diff file are shown below:

  • + : Indicates that the line has been added.
  • - : Indicates that the line has been removed.
  • /dev/null : Shows that a file has been added or removed.
  • (a space) or blank : Gives context lines around the changed lines.
  • @@ : A visual indicator that the next block of change information is starting. Within the changes for one file, there may be multiple such blocks.
  • index : Displays the hashes of the file versions being compared.

Example diff for a file named check-network.yml :

diff --git a/check-network.yml b/check-network.yml
index 09b4f0c..b1978ca 100644
--- a/check-network.yml
+++ b/check-network.yml
@@ -4,7 +4,7 @@
   roles:
     - ansible-pyats
   vars:
-    snapshot_file: "{{ inventory_hostname }}_bgp.json"
+    snapshot_file: "{{ inventory_hostname }}_routes.json"
   tasks:
   - set_fact:
       snapshot_data: "{{ lookup('file', snapshot_file) | from_json }}"
@@ -13,7 +13,7 @@
 #      var: snapshot_data
 #
   - pyats_parse_command:
-      command: show ip route bgp
+      command: show ip route
       compare: "{{ snapshot_data }}"
     register: command_output

Whether a changed line is marked with a "+" or a "-" depends on the order in which the two versions (hashes) are compared.

In this format, there are three lines shown above and below the exact changed line for context, but you can spot the differences by comparing the - line with the + line. One of the changes in this patch is to change the snapshot file name, replacing ...bgp.json with ...routes.json .

-    snapshot_file: "{{ inventory_hostname }}_bgp.json"
+    snapshot_file: "{{ inventory_hostname }}_routes.json"

You can always look at the difference between two files from a GitHub Pull Request as a unified diff by adding .diff to the GitHub URL.

Coding Basics



Methods, Functions, Modules, and Classes

It may seem easy to throw code in a file and make it work. But as project size and complexity grow, and as other developers (and stakeholders) get involved, disciplined methods and best practices are needed to help developers write better code and collaborate around it more easily.

One thing that most developers agree on is trying to write clean code. But, what is clean code?

Clean Code

Clean code is the result of developers trying to make their code easy to read and understand for other developers. What constitutes "clean code" is somewhat subjective, but here are some common principles:

  • Is the formatting of the code neat, and does it follow generally-accepted formatting practices for the computer language(s) involved, and/or meet specific requirements of the institutional, project, and/or team "stylebook"?
    • Does it stick with ALL tabs or ALL spaces?
    • Does it use the same number of tabs or spaces per indentation level, throughout the file? (Some languages, such as Python, make this a requirement.)
    • Does it have the indentation in the right places?
    • Does it use consistent formatting for syntax, such as the location of curly braces ({})?
  • Are variable, object, and other names used in the code intuitive?
  • Is the code organized in a way that makes sense? For example, are declarations grouped, with functions up top, mainline code at the bottom, or otherwise, depending on language and context?
  • Is the code internally documented with appropriate comments?
  • Does each line of code serve a purpose? Have you removed all unused code?
  • Is the code written so that common code can be reused, and so that all code can be easily unit-tested?

Clean code emphasizes:

  • standardized formatting and intuitive naming for ease of reading, understanding, and searching
  • overall organization to communicate intention and make things easier to find
  • modularity to facilitate testing and reuse
  • inline and other comments
  • other characteristics that help make code self-documenting

In theory, other developers should be able to understand, use, and modify your code without having to ask you questions. This accelerates development enormously, enabling reuse, debugging, security analysis, code review and merging, along with other processes. It lets longstanding projects (for example, open source projects) incorporate your code with greater confidence. And it lets organizations keep using your code efficiently, after you have moved on to other projects or jobs.

By contrast, code that is not clean quickly becomes technical debt. It may be unmaintainable, or unusable with complete confidence. It is code that needs to be refactored, or removed and rewritten, which is expensive and time-consuming.

What are some other reasons developers want to write clean code?

  1. Clean code is easier to understand, more compact, and better-organized, which tends to result in code that works correctly (fewer bugs) and performs as required.
  2. Clean code, being modular, tends to be easier to test using automated methods such as unit testing frameworks.
  3. Clean code, being standardized, is easier to scan and check using automated tools such as linters, or command-line tools like grep, awk, and sed.
  4. It just looks nicer.

Now that you understand the goals of writing clean code, you can dive deeper into coding best practices. Specifically, you will look at how to break code into methods and functions, modules, and classes.

Methods and Functions

Methods and functions share the same concept; they are blocks of code that perform tasks when executed. If the method or function is not executed, those tasks will not be performed. Although there are no absolute rules, here are some standard best-practices for determining whether a piece of code should be encapsulated (in a method or function):

  • Code that performs a discrete task, even if it happens only once, may be a candidate for encapsulation. Classic examples include utility functions that evaluate inputs and return a logical result (for example, compare two strings for length), perform I/O operations (for example, read a file from disk), or translate data into other forms (for example, parse and process data). In these cases, you encapsulate for clarity and testability, as well as for possible future re-use or extension.
  • Task code that is used more than once should probably be encapsulated. If you find yourself copying several lines of code around, it probably needs to be a method or function. You can accommodate slight variations in usage using logic to assess passed parameters (see below).

The most powerful thing about methods and functions is that they can be written once and executed as many times as you like. If used correctly, methods and functions will simplify your code, and therefore reduce the potential for bugs.

Syntax of a Function in Python:

# Define the function
def functionName():
  ...blocks of code...
# Call the function
functionName()

Arguments and Parameters

Another feature of methods and functions is the ability to execute the code based on the values of variables passed in on execution. These are called arguments. In order to use arguments when calling a method or function, the method or function needs to be written to accept these variables as parameters.

Parameters can be any data type and each parameter in a method or function can have a different data type. Arguments passed to the method or function must match the data type(s) expected by the method or function.

Some languages require the data type of each parameter to be declared (so-called statically-typed languages), while others make this optional.

Even when parameter type specification is not required, it is usually a good idea. It makes code easier to reuse because you can more easily see what kind of parameters a method or function expects. It also makes error messages clearer. Type mismatch errors are easy to fix, whereas a wrong type passed to a function may cause errors that are difficult to understand, deeper in the code.

Parameters and arguments add flexibility to methods and functions. Sometimes the parameter is just a boolean flag that determines whether certain lines of code should be executed in the method or function. Think of parameters as being the input to the method or function.

Syntax of a function using arguments and parameters in Python:

# Define the function
def functionName(parameter1,...,parameterN):
  # You can use the parameters just like local variables
  ...blocks of code...
# Call the function
functionName("argument1", 4, {"argument3":"3"})

The example above passes this function a string, an integer, and a dictionary containing a key and a value.

Return Statements

Methods and functions perform tasks, and can also return a value. In many languages, the return value is specified using the keyword return followed by a variable or expression. This is called the return statement. When a return statement is executed, the value of the return statement is returned and any code below it gets skipped. It is the job of the line of code calling the method or function to grab the value of the return, but it is not mandatory.

Syntax of a function with a return statement in Python:

# Define the function
def functionName(parameter1,...,parameterN):
  # You can use the parameters just like local variables
  ...blocks of code...
  someVariable = parameter1 * parameter2
  return someVariable
# Call the function
myVariable = functionName("argument1", 4, {"argument3":"3"})

In the above example, the returned value would be the string "argument1argument1argument1argument1", because multiplying a string by an integer in Python repeats the string that many times (here, "argument1" * 4).

Function Example

Let's say your original code looks like this:

# Print the circumference for circles with a radius of 2, 5, and 7
radius1 = 2
radius2 = 5
radius3 = 7
# Formula for a circumference is c = pi * diameter
# Formula for a diameter is d = 2 * radius
pi = 3.14 # (Will hardcode pi in this example)
circumference1 = pi * radius1 * 2
print ("Circumference of a circle with a radius of " + str(radius1) + " is " + str(circumference1))
circumference2 = pi * radius2 * 2
print ("Circumference of a circle with a radius of " + str(radius2) + " is " + str(circumference2))
circumference3 = pi * radius3 * 2
print ("Circumference of a circle with a radius of " + str(radius3) + " is " + str(circumference3))

As you can see, there is a lot of duplicate code here.

By using methods with parameters, your code can be cleaned up:

# Print the circumference for circles with a radius of 2, 5, and 7
# In this example, circumference and
# printCircumference are the names of
# the functions. 'radius' is a parameter to
# the function and can be used in the function.
# This function returns the value of the
# circumferenceValue to the code that called
# the function.
def circumference(radius):
  # Formula for a circumference is c = pi * diameter
  # Formula for a diameter is d = 2 * radius
  pi = 3.14 # (Will hardcode pi in this example)
  circumferenceValue = pi * radius * 2
  return circumferenceValue
def printCircumference(radius):
  myCircumference = circumference(radius)
  print ("Circumference of a circle with a radius of " + str(radius) + " is " + str(myCircumference))
radius1 = 2
radius2 = 5
radius3 = 7
# In the below line of code, the value of radius1 (2)
# is the argument to the printCircumference function
printCircumference(radius1)
printCircumference(radius2)
printCircumference(radius3)

Notice that the version of code that uses functions, parameters, and arguments results in no duplicate code. Also, by using functions, you are able to label blocks of code, which make their purposes more understandable. If this example were more complicated and there were a lot of lines of code within each function, having the blocks of code duplicated three times in the file would make it much harder to understand.

Methods vs. Functions

If methods and functions share the same concept, why are they named differently? The difference between methods and functions is that functions are standalone code blocks while methods are code blocks associated with an object, typically for object-oriented programming.

Method example

The code from the function example can be modified to turn the function into a method, producing the same result:

# Print the circumference for circles with a radius of 2, 5, and 7
# In this example, there is a class named Circle.
# Classes will be discussed later. circumference
# and printCircumference are the names of the
# methods in the class. These methods return the
# value of circumferenceValue to the code
# that called the method.
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def circumference(self):
        # Formula for a circumference is c = pi * diameter
        # Formula for a diameter is d = 2 * radius
        pi = 3.14 # (Will hardcode pi in this example)
        circumferenceValue = pi * self.radius * 2
        return circumferenceValue
    def printCircumference(self):
        myCircumference = self.circumference()
        print ("Circumference of a circle with a radius of " + str(self.radius) + " is " + str(myCircumference))
radius1 = 2
radius2 = 5
radius3 = 7
# Since Circle is a class, it must be instantiated
# with the value of the radius first.
circle1 = Circle(radius1)
# Since printCircumference is a method, it must be
# called using the [class instance].[method]
# syntax. Just calling printCircumference() will
# not work
circle1.printCircumference()
circle2 = Circle(radius2)
circle2.printCircumference()
circle3 = Circle(radius3)
circle3.printCircumference()

Modules

Modules are a way to build independent and self-contained chunks of code that can be reused. Developers typically use modules to divide a large project into smaller parts. This way the code is easier to read and understand, and each module can be developed in parallel without conflicts. A module is packaged as a single file. In addition to being available for integration with other modules, it should work independently.

A module consists of a set of functions and typically contains an interface for other modules to integrate with. It is essentially a library, and cannot itself be instantiated.

Module example

Below is a module with a set of functions saved in a script called circleModule.py. You will see this script again later in the lab for this topic.

# Given a radius value, print the circumference of a circle.
# Formula for a circumference is c = pi * 2 * radius

class Circle:

    def __init__(self, radius):
        self.radius = radius

    def circumference(self):
        pi = 3.14
        circumferenceValue = pi * self.radius * 2
        return circumferenceValue

    def printCircumference(self):
        myCircumference = self.circumference()
        print ("Circumference of a circle with a radius of " + str(self.radius) + " is " + str(myCircumference))

An application that exists in the same directory as circleModule.py could use this module by importing it, instantiating the class, and then using dot notation to call its methods, as follows:

from circleModule import Circle
      
# First instantiation of the Circle class.
circle1 = Circle(2)
# Call the printCircumference for the instantiated circle1 class.
circle1.printCircumference()

# Two more instantiations and method calls for the Circle class.
circle2 = Circle(5)
circle2.printCircumference()

circle3 = Circle(7)
circle3.printCircumference()

Classes

Object-oriented programming (OOP), as originally conceived, is based on some formally defined properties: encapsulation, data abstraction, polymorphism, and inheritance. In this course, using Python, you will focus on Python class structures as one manifestation of OOP.

In most OOP languages, and in Python, classes are a means of bundling data and functionality. Each class declaration defines a new object type.

Encapsulating functionality together with data storage in a single structure also accomplishes one aspect of data abstraction. Functions defined within a class are known as class methods. Classes may have class variables and object variables. As a new class object is created, new class data members and object data members (variables) are created. New classes may be defined, based on existing, previously defined classes, so that they inherit the properties, data members, and functionality (methods).

As with other Python data structures and variables, objects are instantiated (created) as they are first used, rather than being predeclared. A class may be instantiated (created) multiple times, and each with its own object-specific data attribute values. (Python classes also support class variables that are shared by all objects of a class.) Outside of the class name scope, class methods and data attributes are referenced using the dot notation: [class instance].[method name].

Note: Unlike other OOP languages, in Python, there is no means of creating 'private' class variables or internal methods. However, by convention, methods and variables with a single preceding underscore ( _ ) are considered private and not to be used or referenced outside of the class.
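
The following minimal sketch pulls these ideas together: a class variable shared by all instances, an object variable set per instance, inheritance, and the single-underscore convention. The Device and Router names are hypothetical, chosen only for illustration:

class Device:
    device_count = 0                 # class variable: shared by all instances

    def __init__(self, hostname):
        self.hostname = hostname     # object variable: unique to each instance
        Device.device_count += 1

    def _debug_label(self):          # leading underscore: private by convention only
        return "hostname=" + self.hostname

class Router(Device):                # Router inherits Device's data members and methods
    def describe(self):
        return self.hostname + " is a router"

r = Router("core-1")                 # instantiated on first use; no predeclaration
print(r.describe())                  # core-1 is a router
print(Device.device_count)           # 1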

Code Review and Testing


What is a Code Review and Why Should You Do This?

A code review is when developers look over the codebase, a subset of code, or specific code changes and provide feedback. These developers are often called reviewers. It is better to have more than one reviewer when possible.

It is best to have reviewers who understand the purpose of the code so that they can give quality and relevant feedback. The reviewers will provide comments, typically to the author of the code, on what they think needs to be fixed. Because a lot of comments can be subjective, it is up to the author to decide if the comment needs to be addressed, but it is good to have agreement from the reviewer(s) if it will not be fixed. This code review process only happens after the code changes are complete and tested.

The goal of code reviews is to make sure that the final code:

  • is easy to read
  • is easy to understand
  • follows coding best practices
  • uses correct formatting
  • is free of bugs
  • has proper comments and documentation
  • is clean

Doing code reviews has benefits for the whole team. The author gets input from the reviewers and learns additional best practices, other ways the code could have been implemented, and different coding styles. As a result of the review, the author learns from their mistakes and can write better code the next time. Code reviews aren't just for junior developers; they are a great learning process for all developers.

Code reviews also transfer knowledge about the code between developers. If the reviewers have to work on that piece of code in the future, they will have a better understanding of how it works.

Code reviews are a way to refine working code or spot potential bugs, which increases the quality of the code. In general, having another set of eyes on the code is never a bad thing.

Types of Code Reviews


There are many ways to do code reviews. Each one has its own benefits. The most common types of code review processes include:

  • Formal code review
  • Change-based code review
  • Over-the-shoulder code review
  • Email pass-around

Formal Code Review

A formal code review is where developers have a series of meetings to review the whole codebase. In this meeting, they go over the code line by line, discussing each one in detail. This type of code review process promotes discussion between all of the reviewers.

A formal code review enables reviewers to reach a consensus, which may result in better feedback. You might do a new code review every time the comments are addressed.

Details of the code review meetings, such as the attendees, the comments, and comments that will be addressed, are documented. This type of code review is often called Fagan inspection and is common for projects that use the waterfall software development methodology.

A modern adaptation of the formal code review is to have a single meeting to review only the code changes. This way, the code can benefit from the live discussion amongst reviewers. This is sometimes known as a walkthrough.

Change-Based Code Review

A change-based code review, also known as a tool-assisted code review, reviews code that was changed as a result of a bug, user story, feature, commit, etc.

In order to determine the code changes that need to be reviewed, a peer code review tool that highlights the code changes is typically used. This type of code review is initiated by the developers who made the code changes and are responsible for addressing the agreed upon comments. In this type of code review process, the reviewers usually perform the review independently and provide the comments via the peer code review tool.

A change-based code review makes it easy to determine the actual code changes to be reviewed and enables multiple reviewers to get a diverse look into the code.

Over-the-Shoulder Code Review

An over-the-shoulder code review is exactly what it sounds like. A reviewer looks over the shoulder of the developer who wrote the code. The developer who wrote the code goes through the code changes line by line and the reviewer provides feedback.

With this method, if the fix is not difficult, the code may be changed on the spot so that the reviewer can re-review it immediately. The benefit of an over-the-shoulder code review is that there is direct interaction between the author of the code and the reviewer, which allows for discussion about what is the right fix. The downside of this type of code review is that it typically involves only one reviewer, so the comments can be one-sided.

Email Pass-Around

An email pass-around review can occur following the automatic emails sent by source code management systems when a checkin is made. When the emails are sent, it is up to the other developers to review the code changes that were made in that checkin. The downside of this type of code review is that sometimes a single checkin can be just a piece of the whole code change, so it may not include the proper context to fully understand the code changes.

Testing

Why do coders test software? The simple answer is to make sure it works the way it is supposed to work. This answer conceals a wealth of nuance and detail.

To begin with, software testing is classically subdivided into two general categories:

  • Functional testing seeks to determine whether software works correctly. Does it behave as intended in a logical sense, from the lowest levels of detail examined with Unit Testing, to higher levels of complexity explored in Integration Testing?
  • Non-functional testing examines usability, performance, security, resiliency, compliance, localization, and many other issues. This type of testing finds out if software is fit for its purpose, provides the intended value, and minimizes risk.

You might think that functional testing happens early in the development cycle, and non-functional testing begins after parts of the software are built or even finalized. This is incorrect. Some types of non-functional testing (for example, determining whether a particular language, open source library, or component meets requirements of a design, or a standard) need to happen well before design is fixed.

"Agile" software development favors highly-adaptable, minimally-planned creation and extension of a Minimum Viable Product (MVP) over short sprints. This means the product exists, in some form, from very early on in the process. And that means it can be subject both to functional and non-functional tests from the start.

In fact, as you’ll see towards the end of this unit, some developers advocate using testing as a framework for guiding software development. This means capturing design requirements as tests, then writing software to pass those tests. This is called Test-Driven Development (TDD).

Let’s look at some methods and tools for testing the lines of code, blocks, functions, and classes.

Unit Testing

Detailed functional testing of small pieces of code (lines, blocks, functions, classes, and other components in isolation) is usually called Unit Testing. Modern developers usually automate this kind of testing using unit test frameworks. These test frameworks are software that lets you make assertions about testable conditions and determine if these assertions are valid at a point in execution. For example:

a = 2 + 2
assert a == 4

The assert keyword is native to Python. In this case, the assertion passes because 2 + 2 does, in fact, equal 4. On the other hand, if you were to have:

assert a == 5

The assertion would fail, raising an AssertionError.

Collecting assertions and reporting on tests is made easier with testing frameworks. Some examples of test frameworks for Python include:

  • unittest — This is a framework included in Python by default. It lets you create test collections as methods extending a default TestCase class.
  • PyTest — This is a framework that is easily added to Python (from pip repositories: pip3 install pytest). PyTest can run unittest tests without modification, but it also simplifies testing by letting coders build tests as simple functions rather than class methods. PyTest is used by certain more-specialized test suites, like PyATS from Cisco.

Both are used in this part, so you can see some of the differences between them.

Simple Unit Testing with PyTest

PyTest is handy because it automatically discovers tests: by default it collects files whose names match test_*.py or *_test.py, and within those files it executes any functions whose names begin with test (so names beginning with tests_ also qualify). A file named explicitly on the command line is collected regardless of its filename. So we can unit test a piece of code (such as a function) by copying it into a file, importing pytest, adding appropriately-named testing functions (names that begin with tests_), saving the file (here, under a filename beginning with tests_), and running it with PyTest.

Suppose we want to test the function add5() , which adds 5 to a passed value, and returns the result:

def add5(v):
    myval = v + 5
    return myval

We can save the function in a file called tests_mytest.py. Then import pytest and write a function to contain our tests, called tests_add5() :

# in file tests_mytest.py
import pytest
def add5(v):
    myval = v + 5
    return myval
def tests_add5():
    r = add5(1)
    assert r == 6
    r = add5(5)
    assert r == 10
    r = add5(10.102645)
    assert r == 15.102645

The tests in our testing function use the standard Python assert keyword. PyTest will compile and report on those results, both when collecting test elements from the file (a preliminary step in which PyTest imports the file and gathers its test functions, reporting syntax errors and other problems that emerge prior to runtime), and while running the tests_add5() function.

You can then run the tests using:

pytest tests_mytest.py

And get a result that looks something like this:

============================= test session starts ==============================
platform darwin -- Python 3.8.1, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /home/tana/python/mytest
collected 1 item                                                               
tests_mytest.py .                                                        [100%]
============================== 1 passed in 0.01s ===============================

Note that while the function under test is certainly trivial, many real-world programs contain functions that, like this one, perform math on their arguments. Typically, these functions are called by higher-level functions, which then do additional processing on the returned values.

If there is a mistake in a lower-level function, causing it to return a bad result, this will likely be reflected in higher-level output. But because of all the intermediary processing, it might be difficult or impossible to find the source of an error (or even note whether an error occurred) by looking at output of these higher-level functions, or at program output in general.

That is one reason why detailed unit testing is essential for developing reliable software. And it is a reason why unit tests should be added, each time you add something significant to code at any level, and then re-run with every change you make. We recommend that, when concluding a work session, you write a deliberately-broken unit test as a placeholder, then use a start-of-session unit test run to remind you where you left off.

Simple Unit Testing with unittest

The unittest framework demands a different syntax than PyTest. For unittest , you need to subclass the built-in TestCase class and test by overriding its built-in methods or adding new methods whose names begin with 'test_'. The example unit test script, above, could be modified to work with unittest like this:

import unittest
def add5(v):
    myval = v + 5
    return myval
class tests_add5(unittest.TestCase):
    def test_add5(self):
        self.assertEqual(add5(1),6)
        self.assertEqual(add5(5),10)
        self.assertEqual(add5(10.102645),15.102645)
if __name__ == '__main__':
    unittest.main()

As with PyTest, you import the unittest module to start. Your function follows.

To subclass the TestCase class, declare your own test class with TestCase as its parent (again called tests_add5, though this is now a class, rather than a function), so that it inherits all characteristics of TestCase. For more on Python object-oriented programming (OOP), see the documentation.

Next, use unittest's assertEqual method (this is one of a wide range of built-in test methods) in the same way that you used Python's native assert in the PyTest example. Basically, you are running your function with different arguments, and checking to see if returned values match expectations.

The last stanza is a standard way of enabling command-line execution of our program, by calling its main function, which, in this case, is defined by unittest.

Save this file (again as tests_mytest.py ), ensure that it is executable (for example, in Linux, using chmod +x tests_mytest.py ) and execute it, adding the -v argument to provide a verbose report:

python3 tests_mytest.py -v
test_add5 (__main__.tests_add5) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK

Integration Testing

After unit testing comes integration testing, which makes sure that all of those individual units you have been building fit together properly to make a complete application. For example, suppose an application that you are writing needs to consult a local web service to obtain configuration data, including the name of a relevant database host. You might want to test the values of variables set when these functions are called. If you were using PyTest, you could do that like this:

import requests   # python module that simplifies making web requests
def get_config():
    return requests.get("http://localhost/get_config").content
def set_config(dbhost):
    requests.get("http://localhost/config_action?dbhost="+dbhost)
save_dbhost = ""
def setUp():
    global save_dbhost
    save_dbhost = get_config()
def tearDown():
    global save_dbhost
    set_config(save_dbhost)
def test_setconfig():
    setUp()
    set_config("TESTVAL")
    assert get_config() == "ESTVAL"
    tearDown()

Note that your test_setconfig() method deliberately calls your setUp() function before running tests, and your tearDown() function afterward. In unittest, methods called setUp() and tearDown() are provided by the TestCase class, can be overridden in your defined subclass, and are executed automatically.

Running this code with PyTest might produce output like this:

============================== test session starts ===============================
platform linux2 -- Python 2.7.15+, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/ubuntu/deploysample, inifile:
collected 1 item                                                                 
test_sample_app.py F                                                       [100%]
==================================== FAILURES ====================================
_________________________________ test_setconfig _________________________________
    def test_setconfig():
        setUp()
        set_config("TESTVAL")
>       assert get_config() == "ESTVAL"
E       AssertionError: assert 'TESTVAL' == 'ESTVAL'
E         - TESTVAL
E         ? -
E         + ESTVAL
test_sample_app.py:21: AssertionError
------------------------------- Captured log call --------------------------------
connectionpool.py          225 DEBUG    Starting new HTTP connection (1): localhost:80
connectionpool.py          437 DEBUG    http://localhost:80 "GET /get_config HTTP/1.1" 200 7
connectionpool.py          225 DEBUG    Starting new HTTP connection (1): localhost:80
connectionpool.py          437 DEBUG    http://localhost:80 "GET /config_action?dbhost=TESTVAL HTTP/1.1" 200 30
connectionpool.py          225 DEBUG    Starting new HTTP connection (1): localhost:80
connectionpool.py          437 DEBUG    http://localhost:80 "GET /get_config HTTP/1.1" 200 7
============================ 1 failed in 0.09 seconds ============================

If you fix the broken test, you can see that everything runs perfectly:

============================== test session starts ===============================
platform linux2 -- Python 2.7.15+, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/ubuntu/deploysample, inifile:
collected 1 item                                                                 
test_sample_app.py .                                                       [100%]
============================ 1 passed in 0.07 seconds ============================

Again, you should run your integration tests before you make any changes for the day, whenever you make significant changes, and before you close out for the day. If you are using Continuous Integration, any errors you find must be corrected before you do anything else.

Note: You can run this script on your VM using pytest. However, understanding the output and fixing any errors is beyond the scope of this course.

Test-Driven Development (TDD)

Building small, simple unit and integration tests around small bits of code helps in two ways:

  • It ensures that units are fit for purpose. In other words, you make sure that units are doing what requirements dictate, within the context of your evolving solution.
  • It catches bugs locally and fixes them early, saving trouble later on when testing or using higher-order parts of your solution that depend on these components.

The first of these activities is as important as the second, because it lets testing validate system design or, failing that, guide local refactoring, broader redesign, or renegotiation of requirements.

Testing to validate design intention in light of requirements implies that you should write testing code before you write application code. Having expressed requirements in your testing code, you can then write application code until it passes the tests you have created in the testing code.

This is the principle of Test-Driven Development (sometimes called Test-First Development). The basic pattern of TDD is a five-step, repeating process (a minimal code sketch appears at the end of this topic):

  1. Create a new test (adding it to existing tests, if they already exist). The idea here is to capture some requirement of the unit of application code you want to produce.
  2. Run tests to see if any fail for unexpected reasons. If this happens, correct the tests. Note that expected failures, here, are acceptable (for example, if your new test fails because the function it is designed to test does not yet exist, that is an acceptable failure at this point).
  3. Write application code to pass the new test. The rule here is to add nothing more to the application besides what is required to pass the test.
  4. Run tests to see if any fail. If they do, correct the application code and try again.
  5. Refactor and improve application code. Each time you do, re-run the tests and correct application code if you encounter any failures.

By proceeding this way, the test harness leads and grows in lockstep with your application. This may be on a line-by-line basis, providing very high test coverage and high assurance that both the test harness and the application are correct at any given stopping-point. Co-evolving test and application code this way:

  • Obliges developers to consistently think about requirements (and how to capture them in tests).
  • Helps clarify and constrain what code needs to do (because it just has to pass tests), speeding development and encouraging simplicity and good use of design patterns.
  • Mandates creation of highly-testable code. This is code that, for example, breaks operations down into pure functions that can be tested in isolation, in any order, etc.
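
To make the five-step loop concrete, here is a minimal sketch using PyTest conventions. The slugify function and the myslug module are hypothetical, invented only for this illustration. First, capture a requirement as a test (steps 1 and 2; the initial run fails because myslug does not exist yet, which is an expected failure at this point):

# in file tests_slug.py
from myslug import slugify

def tests_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Already-clean  ") == "already-clean"

Then write just enough application code to pass the test (steps 3 and 4), and refactor from there (step 5), re-running pytest tests_slug.py after every change:

# in file myslug.py
def slugify(text):
    # Lowercase, trim surrounding whitespace, and replace spaces with hyphens
    return text.strip().lower().replace(" ", "-")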


Understanding Data Formats



Data Formats

REST APIs, which you'll learn about in the next module, let you exchange information with remote services and equipment. So do interfaces built on these APIs, including purpose-dedicated command-line interface tools and integration software development kits (SDKs) for popular programming languages.

When controlling these APIs through software, it is helpful to be able to receive and transmit information in forms that are standards-compliant, and machine- and human-readable. This lets you:

  • Easily use off-the-shelf software components and/or built-in language tools to convert messages into forms that are easy for you to manipulate and extract data from, such as data structures native to the programming language(s) you are using. You can also convert them into other standard formats that you may need for various purposes.
  • Easily write code to compose messages that remote entities can consume.
  • Read and interpret received messages yourself to confirm that your software is handling them correctly, and compose test messages by hand to send to remote entities.
  • More easily detect "malformed" messages caused by transmission or other errors interfering with communication.

Today, the three most popular standard formats for exchanging information with remote APIs are XML, JSON, and YAML. The YAML standard was created as a superset of JSON, so any legal JSON document can be parsed and converted to equivalent YAML, and (with some limitations and exceptions) vice-versa. XML, an older standard, is not as simple to parse, and in some cases, it is only partly (or not at all) convertible to the other formats. Because XML is older, the tools for working with it are quite mature.

Parsing XML, JSON, or YAML is a frequent requirement of interacting with application programming interfaces (APIs). Later in this course, you will learn more about REpresentational State Transfer (REST) APIs. For now, it is sufficient for you to understand that an oft-encountered pattern in REST API implementations is as follows (a Python sketch follows the list):

  1. Authenticate, usually by POSTing a user/password combination and retrieving an expiring token for use in authenticating subsequent requests.
  2. Execute a GET request to a given endpoint (authenticating as required) to retrieve the state of a resource, requesting XML, JSON, or YAML as the output format.
  3. Modify the returned XML, JSON, or YAML.
  4. Execute a POST (or PUT) to the same endpoint (again, authenticating as required) to change the state of the resource, again requesting XML, JSON, or YAML as the output format and interpreting it as needed to determine if the operation was successful.
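
Here is a minimal sketch of this pattern in Python, using the requests library and JSON. The URL, endpoints, and field names are hypothetical; real APIs differ in their authentication schemes and payloads:

import requests

BASE = "https://device.example.com/api"   # hypothetical API endpoint

# 1. Authenticate and obtain an expiring token
token = requests.post(BASE + "/auth",
                      json={"username": "admin", "password": "secret"}).json()["token"]
headers = {"Authorization": "Bearer " + token, "Accept": "application/json"}

# 2. GET the current state of a resource as JSON
config = requests.get(BASE + "/interfaces/eth0", headers=headers).json()

# 3. Modify the returned data locally
config["description"] = "uplink to core"

# 4. PUT the changed state back and check whether the operation succeeded
resp = requests.put(BASE + "/interfaces/eth0", json=config, headers=headers)
print(resp.status_code)   # for example, 200 on success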

XML

Extensible Markup Language (XML) is a derivative of Standard Generalized Markup Language (SGML), and is also the parent of HyperText Markup Language (HTML). XML is a generic methodology for wrapping textual data in symmetrical tags to indicate semantics. XML filenames typically end in ".xml".

An Example XML Document

For example, a simple XML document might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Instance list -->
<vms>
  <vm>
    <vmid>0101af9811012</vmid>
    <type>t1.nano</type>
  </vm>
  <vm>
    <vmid>0102bg8908023</vmid>
    <type>t1.micro</type>
  </vm>
</vms>

This example simulates information you might receive from a cloud computing management API that is listing virtual machine instances.

XML Document Body

For the moment, ignore the first line of the document, which is a special part known as the prologue (more on this below), and the second line, which contains a comment. The remainder of the document is called the body.

Notice how individual data elements within the body (readable character strings) are surrounded by symmetrical pairs of tags, the opening tag surrounded by < and > symbols, and the closing tag, which is similar, but with a "/" (slash) preceding the closing tag name.

Notice also that some tag pairs surround multiple instances of tagged data (for example, the <vm> and corresponding </vm> tags). The main body of the document, as a whole, is always surrounded by an outermost tag pair (for example, the <vms>...</vms> tag pair), also called the root tag pair.

The structure of the document body is like a tree, with branches coming off the root, containing possible further branches, and finally leaf nodes, containing actual data. Moving back up the tree, each tag pair in an XML document has a parent tag pair, and so on, until you reach the root tag pair.

User-Defined Tag Names

XML tag names are user-defined. If you are composing XML for your own application, best-practice is to pick tag names that clearly express the meaning of data elements, their relationships, and hierarchy. Tag names can be repeated as required to enclose multiple data elements (or tagged groupings of elements) of the same type. For example, you could create more <vm> tag pairs to enclose additional <vmid> and <type> groupings.

When consuming XML from an API, tag names and their meanings are generally documented by the API provider, and may be representative of usage defined in a public namespace schema.

Special Character Encoding

Data is conveyed in XML as readable text. As in most programming languages, encoding special characters in XML data fields presents certain challenges.

For example, a data field cannot contain text that includes the < or > symbols, used by XML to demarcate tags. If writing your own XML (without a schema), it is common to use HTML entity encodings to encode such characters. In this case the characters can be replaced with their equivalent &lt; and &gt; entity encodings. You can use a similar strategy to represent a wide range of special symbols, ligature characters, and other entities.

Note that if you are using XML according to the requirements of a schema, or defined vocabulary and hierarchy of tag names, (this is often the case when interacting with APIs such as NETCONF) you are not permitted to use HTML entities. In the rare case when special characters are required, you can use the characters' numeric representations, which for the less-than and greater-than signs are &#60; and &#62; respectively.

To avoid having to individually find and convert special characters, it is possible to incorporate entire raw character strings in XML files by surrounding them with so-called CDATA blocks. Here is an example:

<hungarian_sentence><![CDATA[Minden személynek joga van a neveléshez.]]></hungarian_sentence>

XML Prologue

The XML prologue is the first line in an XML file. It has a special format, bracketed by <? and ?>. It contains the tag name xml and attributes stating the version and a character encoding. Normally the version is "1.0", and the character encoding is "UTF-8" in most cases; otherwise, "UTF-16". Including the prologue and encoding can be important in making your XML documents reliably interpretable by parsers, editors, and other software.

Comments in XML

XML files can include comments, using the same commenting convention used in HTML documents. For example:

<!-- This is an XML comment. It can go anywhere -->

XML Attributes

XML lets you embed attributes within tags to convey additional information. In the following example, the XML version number and character encoding are attributes inside the xml prologue tag, and the vmid and type values are expressed as attributes of the vm tags rather than as nested elements:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Instance list -->
<vms>
  <vm vmid="0101af9811012" type="t1.nano" />
  <vm vmid="0102bg8908023" type="t1.micro"/>
</vms>

There are a few things to notice here:

  • Attribute values must always be included in single or double quotes.
  • An element may have multiple attributes, but only one attribute for each specific name.
  • If an element has no content, you can use a shorthand notation in which you put the slash inside the open tag, rather than including a closing tag.

XML Namespaces

Some XML messages and documents must incorporate a reference to specific namespaces to specify particular tagnames and how they should be used in various contexts. Namespaces are defined by the IETF and other internet authorities, by organizations, and other entities, and their schemas are typically hosted as public documents on the web. They are identified by Uniform Resource Names (URNs), used to make persistent documents reachable without the seeker needing to be concerned about their location.

The code example below shows use of a namespace, defined as the value of an xmlns attribute, to assert that the content of an XML remote procedure call should be interpreted according to the legacy NETCONF 1.0 standard. This code-sample shows a NETCONF remote procedure call instruction in XML. Attributes in the opening rpc tag denote the message ID and the XML namespace that must be used to interpret the meaning of contained tags. In this case, you are asking that the remote entity kill a particular session. The NETCONF XML schema is documented by IETF.

<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <kill-session>
    <session-id>4</session-id>
  </kill-session>
</rpc>

Interpreting XML

In the above example, what is represented is intended as a list or one-dimensional array (called 'vms') of objects (each identified as a 'vm' by its bracketing tags). Each vm object contains two key-value pairs denoting a unique instance ID and VM server type. A semantically-equivalent Python data structure might be declared like this:

vms = [
  {
    "vmid": "0101af9811012",
    "type": "t1.nano"
  },
  {
    "vmid": "0102bg8908023",
    "type": "t1.micro"
  }
]

The problem is that XML has no way of deliberately indicating that a certain arrangement of tags and data should be interpreted as a list. So we need to interpret the XML-writer's intention in making the translation. Mappings between XML tree structures and more-efficient representations possible in various computer languages require understanding data semantics. This is less true of more-modern data formats, like JSON and YAML, which are structured to map well onto common simple and compound data types in popular programming languages.

In this case, the <vm> tags bracketing each instance's data (id and type) are collapsed in favor of plain braces ( {} ) for grouping. This leaves you with a Python list of dictionaries (which you can think of as 'vm objects,' though they are not explicitly named in this declaration), each containing two key/value pairs.
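
A minimal sketch of this interpretation step, using Python's standard library xml.etree.ElementTree module to parse the earlier document into exactly that list of dictionaries:

import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<vms>
  <vm><vmid>0101af9811012</vmid><type>t1.nano</type></vm>
  <vm><vmid>0102bg8908023</vmid><type>t1.micro</type></vm>
</vms>"""

root = ET.fromstring(xml_text)            # parse the string into an element tree
vms = [{"vmid": vm.find("vmid").text,     # we decide that the repeated <vm> tags...
        "type": vm.find("type").text}     # ...should become items in a list
       for vm in root.findall("vm")]
print(vms)
# [{'vmid': '0101af9811012', 'type': 't1.nano'}, {'vmid': '0102bg8908023', 'type': 't1.micro'}]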

JSON

JSON, or JavaScript Object Notation, is a data format derived from the way complex object literals are written in JavaScript (which is, in turn, similar to how object literals are written in Python). JSON filenames typically end in ".json".

Here is a sample JSON file, containing some key/value pairs. Notice that two values are text strings, one is a boolean value, and two are arrays:

{
  "edit-config":
  {
    "default-operation": "merge",
    "test-operation": "set",
    "some-integers": [2,3,5,7,9],
    "a-boolean": true,
    "more-numbers": [2.25E+2,-1.0735],
  }
}

JSON Basic Data Types

JSON basic data types include numbers (written as positive and negative integers, as floats with a decimal, or in scientific notation), strings, Booleans (true and false), and null (an explicit empty value).

JSON Objects

As in JavaScript, individual objects in JSON comprise key/value pairs, which may be surrounded by braces, individually:

{"keyname": "value"}

This example depicts an object with a string value (for example, the word 'value'). A number or Boolean would not be quoted.

JSON Maps and Lists

Objects may also contain multiple key/value pairs, separated by commas, creating structures equivalent to complex JavaScript objects, or Python dictionaries. In this case, each individual key/value pair does not need its own set of braces, but the entire object does. In the above example, the key "edit-config" identifies an object containing five key/value pairs.

JSON compound objects can be deeply-nested, with complex structure.

JSON can also express JavaScript ordered arrays (or 'lists') of data or objects. In the above example, the keys "some-integers" and "more-numbers" identify such arrays.

No Comments in JSON

Unlike XML and YAML, JSON does not support any kind of standard method for including unparsed comments in code.

Whitespace Insignificant

Whitespace in JSON is not significant, and files can be indented using tabs or spaces as preferred, or not at all (which is a bad idea). This makes the format efficient (whitespace can be removed without harming meaning) and robust for use on command-lines and in other scenarios where whitespace characters can easily be lost in cut-and-paste operations.
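
Python's standard json module handles both parsing and serializing (a brief sketch; the string below is a fragment of the earlier example):

import json

text = '{"edit-config": {"default-operation": "merge", "some-integers": [2, 3, 5]}}'
data = json.loads(text)                          # parse: JSON string -> Python dict
print(data["edit-config"]["some-integers"])      # [2, 3, 5]
print(json.dumps(data, indent=2))                # serialize back to a JSON string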

YAML

YAML, an acronym for "YAML Ain't Markup Language", is a superset of JSON designed for even easier human readability. It is becoming more common as a format for configuration files, and particularly for writing declarative automation templates for tools like Ansible.

As a superset of JSON, YAML parsers can generally parse JSON documents (but not vice versa). This makes YAML better than JSON at some tasks, including the ability to embed JSON directly (including quotes) in YAML files. JSON can be embedded in JSON files too, but quotes must be escaped with backslashes (\") or encoded as HTML character entities (&quot;).

Here is a version of the JSON file from the JSON subsection, expressed in YAML. Use this as an example to understand how YAML works:

---
edit-config:
  a-boolean: true
  default-operation: merge
  more-numbers:
  - 225.0
  - -1.0735
  some-integers:
  - 2
  - 3
  - 5
  - 7
  - 9
  test-operation: set
...

YAML File Structure

As shown in the example, YAML files conventionally open with three dashes ( --- alone on a line) and end with three dots ( ... also alone on a line). YAML also accommodates the notion of multiple "documents" within a single physical file, in this case separating each document with three dashes on its own line.

YAML Data Types

YAML basic data types include numbers (written as positive and negative integers, as floats with a decimal, or in scientific notation), strings, Booleans ( true and false ), or nulls (value left blank).

String values in YAML are often left unquoted. Quotes are only required when strings contain characters that have meaning in YAML. For example, "{ " (a brace followed by a space) indicates the beginning of a map. Backslashes and other special characters or strings also need to be considered. If you surround your text with double quotes, you can escape special characters in a string using backslash expressions, such as \n for newline.

YAML also offers convenient ways of encoding multi-line string literals (more below).

Basic Objects

In YAML, values of basic (and complex) data types are associated with keys. Keys are normally unquoted, though they may be quoted if they contain colons (:) or certain other special characters. Keys also do not need to begin with a letter, though both of these features conflict with the requirements of most programming languages, so it is best to stay away from them if possible.

A colon (:) separates the key and value:

my_integer: 2
my_float: 2.1
my_exponent: 2e+5
'my_complex:key' : "my quoted string value\n"
0.2 : "can you believe that's a key?"
my_boolean: true
my_null: null # might be interpreted as empty string, otherwise

YAML Indentation and File Structure

YAML does not use brackets or containing tag pairs, but instead indicates its hierarchy using indentation. Items indented below a label are "members" of that labeled element.

The indentation amount is up to you. As little as a single space can be used where indentation is required, though a best-practice is to use two spaces per indent level. The important thing is to be absolutely consistent, and to use spaces rather than tabs.

Maps and Lists

YAML easily represents more complex data types, such as maps containing multiple key/value pairs (equivalent to dictionaries in Python) and ordered lists.

Maps are generally expressed over multiple lines, beginning with a label key and a colon, followed by members, indented on subsequent lines:

mymap:
  myfirstkey: 5
  mysecondkey: The quick brown fox

Lists (arrays) are represented in a similar way, but with optionally-indented members preceded by a single dash and space:

mylist:
  - 1
  - 2
  - 3

Maps and lists can also be represented in a so-called "flow syntax," which looks very much like JavaScript or Python:

mymap: { myfirstkey: 5, mysecondkey: The quick brown fox }
mylist: [1, 2, 3]
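Both spellings produce identical data structures. Here is a quick check, again a sketch assuming the third-party PyYAML package:

import yaml   # third-party PyYAML package: pip install pyyaml

block = """\
mymap:
  myfirstkey: 5
  mysecondkey: The quick brown fox
mylist:
  - 1
  - 2
  - 3
"""
flow = "mymap: { myfirstkey: 5, mysecondkey: The quick brown fox }\nmylist: [1, 2, 3]"
assert yaml.safe_load(block) == yaml.safe_load(flow)   # same parsed result either way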

Long Strings

You can represent long strings in YAML using a 'folding' syntax, where linebreaks are presumed to be replaced by spaces when the file is parsed/consumed, or a non-folding (literal) syntax. Long strings cannot contain escaped special characters, but may (in theory) contain colons, though some software does not observe this rule.

mylongstring: >
  This is my long string
  which will end up with no linebreaks in it
myotherlongstring: |
  This is my other long string
  which will end up with linebreaks as in the original

Note the difference in the two examples above. The greater-than ( > ) indicator gives us the folding syntax, while the pipe ( | ) indicator preserves linebreaks exactly as written.
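Parsing the two strings above makes the difference concrete (a sketch assuming the third-party PyYAML package; note that both styles keep a single trailing newline by default):

import yaml   # third-party PyYAML package: pip install pyyaml

doc = """\
mylongstring: >
  This is my long string
  which will end up with no linebreaks in it
myotherlongstring: |
  This is my other long string
  which will end up with linebreaks as in the original
"""
data = yaml.safe_load(doc)
print(repr(data["mylongstring"]))        # folded: internal linebreaks become spaces
print(repr(data["myotherlongstring"]))   # literal: linebreaks preserved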

Comments

Comments in YAML can be inserted anywhere except inside a long string literal, and are introduced by a hash sign followed by a space:

# this is a comment

More YAML Features

YAML has many more features, most often encountered when using it in the context of specific languages, like Python, or when converting to JSON or other formats. YAML 1.2, for instance, supports schemas and tags, which can be used to disambiguate the interpretation of values. To force a number to be interpreted as a string, you could use the !!str tag, which is part of the YAML "Failsafe" schema:

mynumericstring: !!str 0.1415
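With a parser in hand (a sketch assuming the third-party PyYAML package), you can see the tag override the default interpretation:

import yaml   # third-party PyYAML package: pip install pyyaml

print(yaml.safe_load("plain: 0.1415"))          # {'plain': 0.1415} - parsed as a float
print(yaml.safe_load("tagged: !!str 0.1415"))   # {'tagged': '0.1415'} - forced to a string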

Parsing and Serializing

Parsing means analyzing a message, breaking it into its component parts, and understanding their purposes in context. When messages are transmitted between computers, they travel as a stream of characters, which is effectively a string. This needs to be parsed into a semantically equivalent data structure containing data of recognized types (such as integers, floats, strings, etc.) before the application can interpret and act upon the data.

Serializing is roughly the opposite of parsing. To communicate information with a REST interface, for example, you may be called upon to take locally-stored data (e.g., data stored in Python dictionaries) and output this as equivalent JSON, YAML, or XML in string form for presentation to the remote resource.
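In Python, for example, the standard json module pairs the two operations. Here is a minimal round-trip sketch:

import json

record = {"user": {"username": "myemail@mydomain.com",
                   "key": "90823ff08409408aebcf4320384"}}
wire = json.dumps(record)      # serialize: dict -> JSON text for transmission
restored = json.loads(wire)    # parse: JSON text -> dict
assert restored == record      # a faithful round trip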

An Example

For example, imagine you wanted to send an initial query to some remote REST endpoint, inquiring about the status of running services. To do this, typically, you would need to authenticate to the REST API, providing your username (email), plus a permission key obtained via an earlier transaction. You might have stored username and key in a Python dictionary, like this:

auth = {
  "user": {
    "username": "myemail@mydomain.com",
    "key": "90823ff08409408aebcf4320384"
  }
}

But the REST API requires these values to be presented as XML in string form, appended to your query as the value of a key/value pair called "auth":

https://myservice.com/status/services?auth=<XML string containing username and key>

The XML itself might need to take this format, with Python key values converted to same-name tag pairs, enclosing data values:

<user>
  <username>myemail@mydomain.com</username>
  <key>90823ff08409408aebcf4320384</key>
</user>

You would typically use a serialization function (from a Python library) to output your auth data structure as a string in XML format, adding it to your query:

import dicttoxml    # serialization library: pip install dicttoxml
import requests     # HTTP request library: pip install requests

auth = {            # Python dict containing authentication info
  "user": {
    "username": "myemail@mydomain.com",
    "key": "90823ff08409408aebcf4320384"
  }
}
get_services_query = "https://myservice.com/status/services"
xmlstring = dicttoxml.dicttoxml(auth)   # convert dict to XML in string (bytes) form
myresponse = requests.get(get_services_query, params={"auth": xmlstring})  # query service

At this point, the service might reply, and the response body (myresponse.text) would contain a string like the following, with service names and statuses in XML format:

<services>
  <service>
    <name>Service A</name>
    <status>Running</status>
  </service>
  <service>
    <name>Service B</name>
    <status>Idle</status>
  </service>
</services>

You would then need to parse the XML to extract information into a form that Python could access conveniently.

import untangle     # XML parser library: pip install untangle

myresponse_python = untangle.parse(myresponse.text)
print(myresponse_python.services.service[1].name.cdata,
      myresponse_python.services.service[1].status.cdata)

In this case, the untangle library parses the XML into an object whose root element (services) contains a list (service[]) of elements giving the name and status of each service. You can then access the cdata value of an element to obtain the text content of the corresponding XML leaf node. Because list indexes start at zero, service[1] refers to the second service, so the above code would print:

Service B Idle

Popular programming languages such as Python generally incorporate easy-to-use parsing functions that can accept data returned by an I/O function and produce a semantically-equivalent internal data structure containing valid typed data. On the outbound side, they contain serializers that do the opposite, turning internal data structures into semantically-equivalent messages formatted as character strings.
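For instance, the same third-party PyYAML package used in the sketches above can also serialize native Python structures back into YAML text (a sketch; key order in the output is alphabetical by default):

import yaml   # third-party PyYAML package: pip install pyyaml

data = {"edit-config": {"a-boolean": True, "some-integers": [2, 3, 5]}}
print(yaml.safe_dump(data, default_flow_style=False))
# edit-config:
#   a-boolean: true
#   some-integers:
#   - 2
#   - 3
#   - 5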

Software Development and Design Summary



What Did I Learn in this Module?

Software Development

The software development life cycle (SDLC) is the process of developing software, starting from an idea and ending with delivery. This process consists of six phases. Each phase takes input from the results of the previous phase: 1. Requirements & Analysis, 2. Design, 3. Implementation, 4. Testing, 5. Deployment, and 6. Maintenance. Three popular software development models are waterfall, Agile, and Lean:

  • Waterfall - This is the traditional software development model. Phases cannot overlap; each must be completed before work moves on to the next.
  • Agile Scrum - In rugby, the term scrum describes a point in gameplay where players crowd together and try to gain possession of the ball. The Scrum methodology focuses on small, self-organizing teams that meet daily for short periods and work in iterative sprints, constantly adapting deliverables to meet changing requirements.
  • Lean - Based on Lean Manufacturing, the Lean method emphasizes elimination of wasted effort in planning and execution, and reduction of programmer cognitive load.

Software Design Pattern

Software design patterns are best practice solutions for solving common problems in software development. Design patterns are language-independent. In their Design Patterns book, the Gang of Four divided patterns into three main categories:

• Creational - Patterns used to guide, simplify, and abstract software object creation at scale.

• Structural - Patterns describing reliable ways of using objects and classes for different kinds of software projects.

• Behavioral - Patterns detailing how objects can communicate and work together to meet familiar challenges in software engineering.

The observer design pattern is a subscription notification design that lets objects (observers or subscribers) receive events when there are changes to an object (subject or publisher) they are observing.

The Model-View-Controller (MVC) design pattern is sometimes considered an architectural design pattern. Its goal is to simplify development of applications that depend on graphical user interfaces.

Version Control Systems

Version control is a way to manage changes to a set of files to keep a history of those changes. There are three types of version control systems: Local, Centralized, and Distributed.

Git is an open source implementation of a distributed version control system. Git has two types of repositories, local and remote. Branching enables users to work on code independently without affecting the main code in the repository. In addition to providing the distributed version control and source code management functionality of Git, GitHub also provides additional features such as: code review, documentation, project management, bug tracking, and feature requests. After installing Git to the client machine, you must configure it. Git provides a git config command to get and set Git's global settings, or a repository's options. Git has many other commands that you can use, including a host of branching options. Developers use a .diff file to show how two different versions of a file have changed.

Coding Basics

Clean code is the result of developers trying to make their code easy to read and understand for other developers. Methods and functions share the same concept; they are blocks of code that perform tasks when executed. If the method or function is not executed, those tasks will not be performed. Modules are a way to build independent and self-contained chunks of code that can be reused. In most OOP languages, and in Python, classes are a means of bundling data and functionality. Each class declaration defines a new object type.

Code Review and Testing

A code review is when developers look over the codebase, a subset of code, or specific code changes and provide feedback. The most common types of code review processes include: Formal code review, Change-based code review, Over-the-shoulder code review, and Email pass-around.

Software testing is subdivided into two general categories: functional and non-functional. Detailed functional testing of small pieces of code (lines, blocks, functions, classes, and other components in isolation) is usually called Unit Testing. After unit testing comes Integration Testing, which makes sure that all of those individual units fit together properly to make a complete application. Test-Driven Development (sometimes called Test-First Development) validates the intent of the design in light of requirements by writing testing code before writing application code. Having expressed requirements in testing code, developers can then write application code until it passes the tests.

Understanding Data Formats

Today, the three most popular standard formats for exchanging information with remote APIs are XML, JSON, and YAML.

Extensible Markup Language (XML) is a derivative of Standard Generalized Markup Language (SGML), and also the parent of HyperText Markup Language (HTML). XML is a generic methodology for wrapping textual data in symmetrical tags to indicate semantics. XML filenames typically end in ".xml".

JavaScript Object Notation (JSON) is a data format derived from the way complex object literals are written in JavaScript. JSON filenames typically end in ".json".

YAML Ain't Markup Language (YAML) is a superset of JSON designed for even easier human readability.

Parsing means analyzing a message, breaking it into its component parts, and understanding their purposes in context. Serializing is roughly the opposite of parsing.

  1. Which software development methodology prescribes that developers follow a strict process order by completing one step in the SDLC process before proceeding to the next step?

    Topic 3.1.0 - The Waterfall model is the earliest SDLC approach. The phases follow a linear sequential flow, where each phase begins only when the previous phase is complete.

  2. Which SDLC development methodology employs many quick iterations known as sprints?

    Topic 3.1.0 - In the agile software development model, the SDLC process is conducted in many quick iterations called sprints.

  3. Which two programming components are defined as blocks of code that perform tasks when executed? (Choose two.)

    Topic 3.4.0 - Methods and functions are both blocks of code that perform tasks when executed. Functions are standalone code blocks, whereas methods are code blocks associated with an object.

  4. A developer wants to find the location of the Python 3 executable file. Which command should the developer use?

    Topic 3.1.0 - The Linux which command is used to find the location of an executable file, searching the directories defined by the PATH environment variable (for example, which python3).

  5. Which SDLC phase concludes with functional code that satisfies customer requirements and is ready to be tested?

    Topic 3.1.0 - There are six phases in the SDLC process:

    1. Requirements & analysis: The product owner and qualified team members gather the requirements for the software to be built.
    2. Design: Software architects and developers design the software based on the provided software request specification.
    3. Implementation: Developers take the design documentation and develop the code according to that design. At the conclusion of this phase, functional code that implements customer requirements is ready to be tested.
    4. Testing: Test engineers take the code and install it into the testing environment so they can follow the test plan.
    5. Deployment: The software is installed into the production environment.
    6. Maintenance: The development team provides support for customers and works on software improvements.
  6. What are the three states of a Git file? (Choose three.)

    Topic 3.3.0 - There are three Git file states:

    • Committed: The version of the file saved in the .git directory.
    • Modified: The file has changed but has not been committed to the repository.
    • Staged: The modified file is ready to be committed to the repository.
  7. Which term is used to describe the first line of an XML document?

    Topic 3.6.0 - The XML prologue, which is the first line in an XML document, has a special format, bracketed by <? and ?>.

  8. How does an application use a module in Python?

    Topic 3.4.0 - A module, in Python, is a Python file with packaged functions. When functions contained in a module are needed in an application in Python, the application uses the import statement to include these functions in the application.

  9. What is the role of the controller component in the Model-View-Controller (MVC) flow?

    Topic 3.2.0 - The Model-View-Controller (MVC) design pattern abstracts code and responsibility into three distinct components: model, view, and controller. The controller accepts the input, manipulates the data, and sends the manipulated data to the model.

  10. Which code review method involves the developer going through the code line-by-line with the reviewer, allowing the developer to make changes on the spot?

    Topic 3.5.0 - In an over-the-shoulder code review the developer who wrote the code goes through the code changes line by line with a reviewer who provides feedback. In this type of review the code can be changed on the spot so that the reviewer can re-review it immediately.

Ref: [1]




