DevOps Basics

Some Resources

What is DevOps?

5 Levels

  1. Values
  2. Principles
  3. Methods
  4. Practices
  5. Tools

Why DevOps?

  1. High-performing IT organizations deploy more frequently, fail less, and recover faster
  2. Lean management and continuous delivery practices help deliver value faster
  3. High performance is achievable whether your apps are greenfield, brownfield, or legacy.

Core Values - CAMS

The four fundamental values to bring to a DevOps implementation.

"That the word DevOps gets reduced to technology is a manifestation of how badly we need a cultural shift." --Patrick Debois

See the article on DevOps culture.

Principles

Systems thinking

Amplifying feedback loops

Culture of continuous experimentation and learning

5 Most Prevalent DevOps Methodologies

  1. People over process over tools - Find who is responsible, define the process, and only then find and implement tools to solve the problem.
  2. Continuous Delivery - Code, test, and release continuously. See HP Case
  3. Lean Management - Work in small batches, work-in-progress limits, feedback loops, visualization. Has been shown to lead to better throughput and stability.
  4. Change Control - The Visible Ops Handbook. Operational success correlates with control over changes in the environment. Eliminate fragile artifacts, create a repeatable build process, manage dependencies, and create an environment for continuous improvement.
  5. Infrastructure as Code - Systems can and should be treated like code: checked into source control, reviewed, built, and tested.

10 Practices for DevOps Success

  1. Chaos Monkey - Netflix blog
  2. Blue/Green Deployment - Have two identical systems, of which one is live. Update the offline system, test it, and then point live traffic to it.
  3. Dependency injection - Loosely coupled dependencies. See Fowler
  4. Andon Cords - Anyone can halt the process when needed
  5. The Cloud - Allows you to treat infrastructure like you would any other program component. API-driven control.
  6. Embedded Teams - Add ops person to dev team and make dev team responsible for operations of their particular software.
  7. Blameless Postmortems - See How complex systems fail paper
  8. Public Status Page - Communication is the way for customers to keep trusting your service. See Transparent Uptime blog
  9. Developers on Call - Responsibility for the services you create. This tends to make sure the core problem is resolved quickly instead of operations relying on workarounds.
  10. Incident Command System - See Chapman
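
The blue/green deployment practice from the list above can be illustrated with a minimal sketch. The `Router` and `Environment` classes are hypothetical stand-ins for a load balancer or DNS switch and the two identical systems; they are assumptions for illustration, not a real API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Environment:
    name: str
    version: str


@dataclass
class Router:
    """Hypothetical stand-in for a load balancer or DNS switch."""
    environments: Dict[str, Environment]
    live: str  # name of the environment currently serving traffic

    def idle(self) -> str:
        # The idle environment is whichever one is not live.
        return next(name for name in self.environments if name != self.live)

    def deploy(self, version: str, smoke_test: Callable[[Environment], bool]) -> bool:
        target = self.environments[self.idle()]
        previous = target.version
        target.version = version        # update the offline system
        if not smoke_test(target):      # test it before it takes traffic
            target.version = previous   # roll back; live was never touched
            return False
        self.live = target.name         # point live traffic at it
        return True


router = Router(
    environments={
        "blue": Environment("blue", "1.0"),
        "green": Environment("green", "1.0"),
    },
    live="blue",
)
ok = router.deploy("1.1", smoke_test=lambda env: env.version == "1.1")
print(ok, router.live, router.environments[router.live].version)
```

Because the previously live system keeps its old version, switching back is just another pointer change, which is the main appeal of the approach.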

Tools

Culture and Communication

Wall of confusion - Impedance mismatch caused by dev teams usually being organized by app or business sector, while infra teams are often organized by technology stack. -> Ineffective -> Outsourcing -> New problems

Blameless Postmortems

Transparent Uptime

Rules for Postmortem Communication:

  1. Admit failure
  2. Sound like a human
  3. Have a communication channel (independent of your site)
  4. Above all else, be authentic

Trust Blockers

  1. Lack of context
  2. Conflicting goals

Open It Up

  1. Chat rooms
  2. Wiki pages
  3. Source code (read)
  4. Infrastructure
  5. Monitoring tools
  6. Ticket tracker

The Westrum Model

Minimum viable process - Get everybody on board and remove whatever is unnecessary

Management Best Practices

Kaizen: Continuous improvement

Building Blocks

Agile

Lean

A systematic approach for eliminating waste. DevOps is an extension of Agile infrastructure in which its process is iterative or repeated in cycles.

You strive to:

  1. Eliminate waste
  2. Amplify learning
  3. Decide as late as possible
  4. Deliver as fast as possible
  5. Empower the team
  6. Build in integrity
  7. See the whole

Muda: Work that absorbs resources but adds no value

Muri: Unreasonable work imposed on workers and machines

Mura: Work coming in unevenly instead of in a constant or regular flow

Wastes:

  1. Partially done work
  2. Extra features
  3. Relearning
  4. Handoffs
  5. Delays
  6. Task switching
  7. Defects

Eric Ries - The Lean Startup adapted lean as:

Build - Measure - Learn

  1. Build the minimum viable product
  2. Measure the outcome and internal metrics
  3. Learn about your problem and your solution
  4. Repeat. Go deep where needed

Lean Techniques

CAMS to CALMS

ITIL, ITSM and the SDLC

IT service management (ITSM) refers to the entirety of activities – directed by policies, organized and structured in processes and supporting procedures – that are performed by an organization to design, plan, deliver, operate and control information technology (IT) services offered to customers.

Information Technology Infrastructure Library or ITIL provides a comprehensive process-model based approach of designing, managing, and controlling IT processes.

ITIL Phases:

  1. Service Strategy
  2. Service Design
  3. Service Transition
  4. Service Operation
  5. Continual Service Improvement

Infrastructure Automation

Infrastructure as code

Configuration Management

Approaches:

Idempotent - The ability to execute repeatedly, resulting in the same outcome.
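
As a minimal sketch of idempotence in configuration management, the function below ensures a line is present in a file and changes nothing when the desired state already holds. The function name `ensure_line`, the file name, and the setting are all hypothetical, chosen only for illustration.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file; return True if the file changed."""
    existing = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            existing = handle.read().splitlines()
    if line in existing:
        return False  # desired state already holds, so do nothing
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return True


config_path = os.path.join(tempfile.mkdtemp(), "app.conf")
first = ensure_line(config_path, "max_connections=100")
second = ensure_line(config_path, "max_connections=100")
print(first, second)
```

The first call changes the file and the second is a no-op, so running it any number of times leaves the same result. A naive "append this line" script would not have that property.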

Self service - The ability for an end user to initiate a process without having to go through other people.

See the Golden image or Foil Ball

CM Tools

Services Directory/State Tracking Tools

Container Orchestration Tools

Private Container Services

CI & CD (Continuous Delivery)

Benefits:

  1. Time to market goes down
  2. Quality increases, not decreases
  3. Continuous Delivery limits work in progress
  4. Shortens lead times for changes
  5. Improves mean time to recover

How "little" can you deliver?

Important practices:

Continuous Delivery Pipeline:

Trace a single code change through the pipeline and answer the following:

  1. Can you audit a single change and trace it through the pipeline?
  2. What is the overall cycle time?
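
One way to put numbers on cycle time and commit frequency is sketched below. The change records, their field names, and the timestamps are invented for illustration; in practice they would come from your version control and deployment tooling.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: when a change was committed and when it reached production.
changes = [
    {"id": "a1", "committed": datetime(2024, 1, 8, 9, 0), "deployed": datetime(2024, 1, 8, 15, 0)},
    {"id": "b2", "committed": datetime(2024, 1, 8, 11, 0), "deployed": datetime(2024, 1, 9, 11, 0)},
    {"id": "c3", "committed": datetime(2024, 1, 9, 10, 0), "deployed": datetime(2024, 1, 9, 12, 0)},
]


def cycle_time(change: dict) -> timedelta:
    """Time from commit to production for a single change."""
    return change["deployed"] - change["committed"]


def commits_per_day(records: list) -> float:
    """Average number of commits per day on which at least one commit happened."""
    days = {record["committed"].date() for record in records}
    return len(records) / len(days)


median_cycle = median(cycle_time(change) for change in changes)
print(median_cycle, commits_per_day(changes))
```

The median is used rather than the mean so that one slow change does not dominate the figure; that is a design choice of this sketch, not something the pipeline prescribes.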

Flow - frequency of commits

Continuous Delivery requires automated testing:

  1. Unit Testing
  2. Code hygiene - Linters, formatters, and best practices
  3. Integration Testing
  4. Security Testing (Gauntlt)
  5. TDD/BDD/ATDD
  6. Infrastructure Testing
  7. Performance Testing

Tooling

Site reliability engineering (SRE)

Key success metrics

A circuit breaker detects when failures cross a threshold and stops the application from repeatedly executing the failing action, which protects the system from further failure.

Michael Nygard popularized the Circuit Breaker pattern in his book Release It!
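
The circuit breaker idea can be sketched in a few lines. This is a simplified illustration under assumed names (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`), not the implementation from the book or from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the protected action."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, action, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = action(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0       # a success closes the circuit again
        self.opened_at = None
        return result


def flaky():
    raise ConnectionError("backend unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("skipped")
    except ConnectionError:
        outcomes.append("failed")
print(outcomes)
```

After two real failures the breaker opens, and the remaining calls are rejected without touching the failing backend until the reset timeout has passed.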

See The Twelve-Factor App for good practices to avoid common problems.

How Complex Systems Fail

Monitoring

Logging

The 5 Ws of Logging:

Centralized logging: syslog -> Logstash

Principles:

  1. Don't collect log data that you will never use
  2. Only retain log data for as long as you are likely to need it
  3. Log whatever is useful, but alert only on things that need action. Use log levels so that errors require action and everything else is a warning
  4. Logging should meet business needs, not exceed them!
  5. Logs change
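
The "log broadly, alert narrowly" principle can be sketched with Python's standard `logging` module. The `AlertHandler` class is a hypothetical pager hook that only collects messages in a list, and the logger name and messages are invented for illustration.

```python
import logging


class AlertHandler(logging.Handler):
    """Hypothetical pager hook; here it only collects messages in a list."""

    def __init__(self):
        super().__init__(level=logging.ERROR)  # only ERROR and above alert
        self.alerts = []

    def emit(self, record):
        self.alerts.append(record.getMessage())


logger = logging.getLogger("orders")  # "orders" is an illustrative name
logger.setLevel(logging.INFO)
logger.propagate = False              # keep the example's output self-contained
alert_handler = AlertHandler()
logger.addHandler(logging.StreamHandler())  # everything at INFO and above is logged
logger.addHandler(alert_handler)            # only ERROR and above raises an alert

logger.info("order 42 accepted")
logger.warning("payment retry 1 of 3")
logger.error("payment failed after 3 retries")
print(alert_handler.alerts)
```

All three messages reach the log stream, but only the error reaches the alert handler, so warnings stay available for diagnosis without paging anyone.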

SRE Tools

Future