NETWORKWORLD Digital Spotlight
Devops
Introduction | Why devops’ time has come | Elements of devops | How Etsy makes Devops work | Resources
FALL 2014
digital spotlight
Devops :: FALL 2014
More! Faster!
Devops helps ops keep up
In the decade before this one, software developers were told to look for another profession, because all the coding jobs were headed offshore to India and China. My, how times have changed. The unemployment rate for developers in the US is under 3 percent and competition for top programming talent has never been fiercer.

No wonder: Web and mobile apps that engage customers and partners have become table stakes for businesses everywhere, while fresh software platforms abound, from smart TVs to Hadoop to whatever the Internet of things cooks up next. Plus, software must now be updated continuously to keep pace with accelerated change.

Devops provides the foundation to meet this wildly accelerated demand. It also serves the needs of agile development methodology, which raises software quality by stipulating shorter dev cycles and continual adjustment based on feedback from stakeholders. Both a philosophy and a set of automation tools, devops enables operations (and in some cases developers themselves) to set up dev and test environments on demand using a software-defined, cloudlike infrastructure.

In this Digital Spotlight, we explore devops’ agile development roots and examine the major types of tools and techniques used to build modern dev, test, and deployment environments. In addition, we present an in-depth interview with the vice president of operations for Etsy, a forward-looking e-commerce company that has built and maintained a successful devops environment for years. We hope you find this original content useful in pursuing your own devops strategy.

Eric Knorr, Editor in Chief
INSIDE

Introduction
By Eric Knorr

Why devops’ time has come
With vastly increased demand for new code, enterprises can no longer afford long, slow development cycles. Devops provides the acceleration. By Eric Knorr

Elements of devops
Devops is a little bit of philosophy and a lot of tools. Here’s how those tools help boost the efficiency of the entire application development lifecycle. By Martin Heller

How Etsy makes Devops work
Etsy, which describes itself as an online “marketplace where people around the world connect to buy and sell unique goods,” is often trotted out as a poster child for Devops. The company latched onto the concepts early and today is reaping the benefits as it scales to keep pace with rapid business growth. By John Dix

Resources

infoworld.com + NETWORKWORLD.COM
Why devops’ time has come
With vastly increased demand for new code, enterprises can no longer afford long, slow development cycles. Devops provides the acceleration.
By Eric Knorr
Devops mashes together development and operations into a single term. It does not, however, mean the unification of the two, which are different disciplines involving different skills and cultures.

Does devops mean dev and ops should at least understand each other’s needs better? Sure, although the idealism of that notion has faded since the devops movement began five years ago. More importantly, devops underscores that enterprises now recognize the need for quicker deployment of more and better applications.

Why? Because Web and mobile applications have become essential to connect with customers and partners and to capture their preferences and needs. Because new platforms, from cars to TVs to smartwatches, keep emerging, with no end in sight, thanks to the Internet of things. Because organizations now realize there’s no such thing as “one and done” with applications; you need to improve them and add capabilities continually.

Although the goal is to shorten the time-to-production of applications that meet or exceed expectations, devops has no direct effect on how well developers write code. Instead, devops adds automation and streamlines workflow throughout the entire cycle, enabling developers to build, test, and deploy modularly. It also allows stakeholders to review applications in progress, provide feedback, and change direction if necessary.

These ideas are not new. In fact, they originate with agile development, a methodology concocted more than a dozen years ago. Only recently, however, has a constellation of technologies gathered to support devops effectively. Devops’ underlying technologies, such as PaaS (platform as a service) and configuration management (e.g., Puppet and Chef), are relatively new. But older technologies for testing and deployment have been integrated as well, such as application lifecycle management, automated testing tools, release automation, configuration management, and application performance monitoring.

Agile is as agile does
To understand the appeal of devops, you need to go back to the original Agile Manifesto, published in 2001. Agile development was conceived as an antidote to waterfall methodologies, which progress in linear fashion through a series of stages, such as feasibility, requirements, external design, program specifications, coding, testing, and (finally) production. The waterfall method demanded that stakeholders compose highly detailed functional requirements up front. These would essentially be thrown over the transom to developers, who would create their own technical specifications and build away until the project was complete. Often, the result wouldn’t be what stakeholders wanted, simply because descriptions are open to
misinterpretations, and no one could anticipate design flaws that might emerge along the way. Many waterfall projects failed or left users dissatisfied.

The crew that wrote the Agile Manifesto had experienced these frustrations firsthand. Here are their 12 principles, which together changed application development forever:

1. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
2. Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
3. Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter time scale.
4. Businesspeople and developers must work together daily throughout the project.
5. Build projects around motivated individuals. Give them the environment and support they need and trust them to get the job done.
6. The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
7. Working software is the primary measure of progress.
8. Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
9. Continuous attention to technical excellence and good design enhances agility.
10. Simplicity – the art of maximizing the amount of work not done – is essential.
11. The best architectures, requirements, and designs emerge from self-organizing teams.
12. At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Agile development is all about change: faster time to market, smaller and more frequent builds, a welcoming attitude toward new requirements. But all that change creates gobs of work for operations, to the point where some argue that ops’ inability or reluctance to keep up has prevented agile methodology from realizing its potential.

That’s where devops comes in. It draws on a broad set of capabilities across tools traversing the entire development cycle in order to automate change as much as feasible. Underlying all of devops, however, is the imperative to configure dev and test environments to order quickly and with minimal fuss.
Software-defined infrastructure
Devops runs parallel to a larger trend in enterprise IT: cloud computing. Although the cloud comes in many shapes and sizes, the basic idea is that compute, storage, and network resources can be configured and scaled on demand, without admins scurrying around to manually provision hardware infrastructure. Devops does not need a full-blown IaaS (infrastructure as a service) cloud to function, but it does need some measure of that sort of automation.

Before large Web applications came into play, aside from a few Perl scripts, such automation was seldom required; configuring a handful of physical hosts for dev and test was not an enormous burden. But today, for Web or mobile apps that may suddenly spike to millions of users, we need scalability, and the virtualization of compute, storage, and network resources that underlies the cloud offers the means to deliver that scalability to dev and test as well as to production.

Add the iterative cycles of agile development, along with business demands for more and better apps, and it’s no wonder enterprise operations are turning to the cloud automation techniques first pioneered by Google and Amazon. IaaS-style functionality enables operations to be vastly more efficient, and thus remain essential to the organizations that employ them. You could say that this motivation is the driving force behind devops.

Ideally, operations sets up automated environments that enable developer self-service: A developer fills out a Web form and quickly obtains the dev and test environment he or she needs. Behind the scenes, servers spin up and a database containing a snapshot of the data relevant for the application comes to life.

Such configuration management tools as Puppet, Chef, Ansible, and Salt enable ops to script their own cloudlike functionality from the ground up. So-called “private cloud” offerings from the likes of Citrix, Microsoft, OpenStack, and VMware go further, offering clouds that approach the robustness of, say, Amazon Web Services EC2 or Google Compute Engine. Further up the stack, PaaS (platform as a service) offerings ride on top of IaaS to provide dev, test, and deployment options similar to those of an application server, only with IaaS scalability and support for multiple programming languages.

Some wonder whether operations might be automating itself out of existence. At one point, Adrian Cockcroft of Netflix caused a stir by coining the term “no-ops” to describe certain aspects of the company’s development cycle. Many so-called “full-stack” developers pride themselves on handling all aspects of the dev, test, and deployment lifecycle, even though it requires them to learn tools and procedures intended for operations. But most software development managers see configuring infrastructure as a poor use of developers’ time.

Whether in the public cloud or in the data center, configuring infrastructure is not a trivial task, as many EC2 customers can testify. Software-based configuration, which is the essence of the cloud, is making such tasks a whole lot easier.
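To make the self-service idea above a little more concrete, here is a minimal Python sketch of how a developer's request might be turned into an ordered provisioning plan. Everything in it is an assumption made for illustration: the `EnvironmentRequest` fields, the `plan_environment` function, and the wording of the steps are hypothetical and do not correspond to any particular product's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnvironmentRequest:
    """What a developer might enter in a self-service form (hypothetical fields)."""
    app_name: str
    server_count: int = 1
    database_snapshot: str = "latest"


def plan_environment(request: EnvironmentRequest) -> List[str]:
    """Turn a request into an ordered list of provisioning steps.

    The steps are plain strings here; a real setup would hand them to
    whatever configuration management or IaaS tooling operations has chosen.
    """
    if request.server_count < 1:
        raise ValueError("at least one server is required")
    # One step per server the developer asked for.
    steps = [
        f"create server {request.app_name}-{i}"
        for i in range(1, request.server_count + 1)
    ]
    # Then bring up a database holding a snapshot of the relevant data.
    steps.append(
        f"restore database snapshot '{request.database_snapshot}' "
        f"for {request.app_name}"
    )
    steps.append(f"report connection details for {request.app_name}")
    return steps
```

Calling `plan_environment(EnvironmentRequest("storefront", server_count=2))` yields two server steps followed by the database and reporting steps. The point of the sketch is only that the request is data, so the same plan can be produced on demand without a ticket to operations.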
A two-way street
Meanwhile, developers have their own sets of tools, as well as responsibilities that extend beyond just slinging code. Although the emphasis of devops is on increasing operational efficiency to support application development, devops also gives developers greater opportunity to test as they go in an environment that mirrors that of production. It’s a lot harder for developers to say, in response to something blowing up in production: “Hey, it worked fine when I tested it.”

Without devops, the whole methodology of smaller and more frequent builds and quick response to stakeholder feedback can be more theoretical than real. The embrace of constant change runs afoul of legacy processes.

The first principle of agile development is a commitment to software quality: not beautiful code for the ages, but applications that meet the needs of stakeholders and evolve as requirements evolve. One of the healthiest aspects of the devops trend is its emphasis on agility in service of business objectives. Agility, as clichéd as that word may have become, is still the greatest single benefit IT can bring to business.
DYNATRACE
DevOps aligns business requirements with IT performance, and recent studies have shown that organizations adopting DevOps practices have a significant competitive advantage over their peers. They react faster to changing market demands, get new features out faster, and have a higher success rate when executing changes. However, the term covers a wide range of topics and consequently means different things to different people. The way we think about it is in terms of CAMS: adopting a Culture of blame-free communication and collaboration, embracing Automation to focus on important tasks, introducing continuous Measurements, and encouraging Sharing of these measurements. To focus the entire team on performance, you must plug performance into the four pillars of CAMS:
• Culture: Tighten the Feedback Loops between Development and Operations
• Automation: Establish automated performance testing in Continuous Integration
• Measurement: Measure key performance metrics in CI, Test and Ops
• Sharing: Share the same tools and performance metrics data across Dev, Test and Ops
CULTURE: TIGHTEN THE FEEDBACK LOOPS BETWEEN DEVELOPMENT AND OPERATIONS
Culture is the hardest to change but is also the most important, because it means changing the way teams work together and share responsibility for the end users of their application. It not only encourages the adoption of agile practices in operations work, it also allows developers to learn from real-world Ops experiences and starts a mutual exchange that breaks down the walls between teams. Dynatrace helps enable this collaboration by providing a shared language that allows Ops, Test and Dev to focus on the actual problems they have to solve. It lets teams clearly state performance requirements that are well known to Dev, Test and Ops, and eliminates finger-pointing by abandoning guesswork on the root cause of performance issues.

AUTOMATION: ESTABLISH A PRACTICE OF AUTOMATED PERFORMANCE TESTING IN CI
Automated tests running in CI also help detect performance regressions on metrics such as number of SQL calls, page load time, and number of JS files or images. Both Operations and Test teams usually have a good understanding of performance, as they deal with it every day. These teams need to educate developers on the importance of performance in large-scale environments under heavy load, so that developers become aware of recent performance problems and how they were solved. This makes common problem patterns easier to prevent, and Dynatrace helps identify those patterns not only in production environments but also in earlier development stages, to prevent them from making it into production.

MEASUREMENT: MEASURE KEY PERFORMANCE METRICS IN CI, TEST AND OPS
With performance aspects being covered in earlier testing stages, performance engineers get time to focus on large-scale load tests that need to be executed in a production-like environment. This helps to find any data-driven, scalability, and third-party performance problems. Close collaboration with Ops ensures that tests can be executed either in the production environment or in a staged environment that mirrors production. Running tests against the production system gives better input for capacity planning and uncovers heavy-load application issues. Executing these tests in collaboration with Ops allows the teams to become more confident when releasing a new version and also helps with proper capacity planning steps. Furthermore, from an Ops perspective, defining a set of key performance metrics that is monitored in all stages and has been agreed on between developers, testers and operators allows for better collaboration in the future, as the meaning of metrics is known to everyone involved.

SHARING: SHARE THE SAME TOOLS AND PERFORMANCE METRICS DATA ACROSS DEV, TEST AND OPS
The more “traditional” testing teams are used to executing performance and scalability tests in their own environments at the end of a milestone. With less and less time for extensive testing, their test frameworks and environments have to become available to other teams to make performance tests a part of an automated testing practice in a Continuous Integration environment. Automatic collection and analysis of performance metrics, as done by Dynatrace, ensures that all performance aspects are covered. This once again entails defining a set of performance metrics that is applied across all phases, as this helps identify the root cause of performance issues in production, testing and development environments.

WHAT’S NEXT?
Dynatrace brings speed and confidence to DevOps by helping with various aspects relevant to adopting Continuous Delivery and DevOps practices. Have you considered making performance a part of your deployment pipeline? Check out our 30-day Free Trial and start using Dynatrace in Continuous Delivery today!
Try Dynatrace free for 30 days at: Dynatrace.com/FreeTrial
Elements of devops
Devops is a little bit of philosophy and a lot of tools. Here’s how those tools help boost the efficiency of the entire application development lifecycle.
By Martin Heller
Once upon a time, there was a developer who needed to write code against a database. So he asked the database administrator for access to the production database.

“Oh, dear me, no,” said the DBA. “You can’t touch our data. You need your own database. Ask operations.”

“Oh, dear me, no,” said the operations manager. “We don’t have a spare Oracle license, and it would take six months to get you that and the server on which to run it. But I’ll do what I can.”

You can see where this is going. You can even hear “bwahaha” after each answer. The developer eventually got a dressing down in a weekly meeting, and the DBA and ops manager were unusually silent and tried not to look at the developer. The developer left for a startup, became a black-hat hacker, and/or (horrible to tell) became a manager in various alternate universes.

What if the developer could have spun up a virtual machine already configured with trial versions of the correct operating system, the correct database, the correct table and index schemas, and syntactically valid test data? And what if all of this happened under the control of a configuration file and scripts while he brewed and drank a cup of coffee? How “agile” would that be?

Enter devops. Basically, devops offers a big box of tools that automate their way around requests that used to result in “no” for an answer. Developers get what they need to do their jobs, and operations can hold up their end of the bargain without too much trouble.

These tools can be divided into sets that support each step in the application development lifecycle, from coding to integration to deployment to monitoring to bug reporting.

[Figure: Developer workflow. Labels shown: defect manager work item, refresh tickets, analyze problem, check out code from repository, code solution, debug, test, send document set to repository, detect.]
[Figure: Integration and deployment workflow. Labels shown: code check-in, continuous integration server, build server, test runner, pass?, no, defect report, deploy, development server, promote, QA server, promote, staging server, promote, production server, defect reports.]

Developer tools

For a developer, working life revolves around a development environment. That has several pieces, which might be integrated or might be a selection of independent tools. Existing code lives in a repository, such as Git or Team Foundation Server (TFS), and the developer’s first task every day (after the daily stand-up meeting that agile organizations hold first thing) is to check out or clone all the code of interest from the shared repository. In an ideal world, nobody else’s check-ins or pushes would have an impact on the developer’s code, because everybody’s code would already be merged, integrated, and tested. In the real world, that won’t always be the case, and merging, integrating, and testing yesterday’s changes might be the second order of business.

In an ideal world, all code would be perfect. In the real world, there is no such thing as perfect code – the closest we can come is code that doesn’t have any known bugs. From the developer’s point of view, looking at the defect manager (be it Bugzilla, JIRA, Redmine, TFS, or any other tracker) and addressing any “tickets” (bug reports or task assignments) is the next order of business.

An IDE such as Eclipse or Visual Studio often has a window into the defect manager, or possibly even deeper ties, but at the very least the developer has a browser tab open to view his or her tickets. The developer will either continue yesterday’s project, or shelve that and handle a higher-priority ticket, if there is one. By the same token, IDEs often integrate tightly with repositories, but at the very least the developer has a command-line console open for check-ins and check-outs. And to complete the triangle, bug trackers often integrate with source code repositories.

The code editor is usually the core component of an IDE. The very best code editors for devops purposes show you the repository status of the code you’re examining, so you can tell immediately if you’re looking at outdated source code. They’ll also refresh your copy before you introduce merge conflicts.

Developers’ build tools depend on the programming language(s) they’re writing in, but in the case of compiled languages, developers want to be able to fire off builds from the IDE and capture the errors and warnings for editing purposes. It also helps if the code editor knows about the syntax of the language, so that it can flag errors in the background during coding and highlight the syntax with colors to help developers visually confirm that, for example, what they intended to be the name of an already-defined variable is correct.

When developers write and test code, they often spend the majority of the day running a debugger. When they are in an organization that has implemented devops, they often have the luxury of debugging in a virtualized environment that faithfully reflects the production environment. Without that, developers may have to use stub code to represent server actions or have local databases stand in for remote databases.

Test runners help developers run their unit tests and regression tests easily and on a regular basis. Ideally, the testing framework integrates with the IDE and any local repository, so that any new code can be tested immediately after check-in, while the developer has the design firmly in mind. The developer’s tests should flow into the code integration environment through the shared repository, along with the source code that the developer has debugged and tested.
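As a small illustration of a unit test that uses a local database as a stand-in for a remote one, here is a sketch built on Python’s standard `unittest` and `sqlite3` modules. The `orders` table, its columns, and the `count_open_orders` function are invented for the example; they are assumptions, not part of any system described in this article.

```python
import sqlite3
import unittest


def count_open_orders(conn):
    """Return the number of orders whose status is 'open'."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'open'"
    ).fetchone()
    return row[0]


def make_test_database():
    """Build an in-memory database that stands in for the remote one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (status) VALUES (?)",
        [("open",), ("open",), ("shipped",)],
    )
    return conn


class CountOpenOrdersTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh database, so tests cannot affect each other.
        self.conn = make_test_database()

    def tearDown(self):
        self.conn.close()

    def test_counts_only_open_orders(self):
        self.assertEqual(count_open_orders(self.conn), 2)

    def test_empty_table_counts_zero(self):
        self.conn.execute("DELETE FROM orders")
        self.assertEqual(count_open_orders(self.conn), 0)
```

Saved as a module, the tests can be run with `python -m unittest`, which is the kind of command a test runner or continuous integration server would invoke after each check-in.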
Code integration tools
Code integration tools take the code in a shared repository, build it, test it, and report on the results. This is often done using a continuous integration server, such as Jenkins, which will tie into automated build tools, automated test runners, automated reporting via email and defect managers, and actions on the repository. For example, if the build succeeds and all tests pass, all the current source code and built libraries and
executables can be tagged with the current build number in the repository. If critical tests fail, the relevant check-ins can be backed out of the shared repository and returned to the responsible developer(s) for bug fixes.

Some projects implement continuous integration for every code push, if the incremental build time is small. In other projects, a delay is introduced after a code push so that multiple pushes can be combined into the next build. Most projects, whether or not they use automatic builds and tests, and whether they integrate after code pushes or on demand throughout the day, also run nightly “clean” builds and tests, often on freshly provisioned test environments.
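The two behaviors just described, deciding what to do with a finished integration run and combining closely spaced pushes into one build, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Jenkins or any other server works internally: the names `CheckResult`, `decide_action`, and `batch_pushes` are hypothetical, and the choice to merely report a broken build or a non-critical test failure is an assumption the text does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    critical: bool = False


def decide_action(build_succeeded: bool, results: List[CheckResult]) -> str:
    """Return 'tag', 'back_out', or 'report' for one integration run."""
    if not build_succeeded:
        return "report"  # assumption: a broken build is reported, not tagged
    failures = [r for r in results if not r.passed]
    if not failures:
        return "tag"  # build succeeded and all tests passed
    if any(r.critical for r in failures):
        return "back_out"  # critical failure: return check-ins to developers
    return "report"  # assumption: non-critical failures are only reported


def batch_pushes(push_times: List[float], quiet_period: float) -> List[List[float]]:
    """Group push timestamps so that pushes arriving within quiet_period
    seconds of the previous push share a single build."""
    batches: List[List[float]] = []
    for t in sorted(push_times):
        if batches and t - batches[-1][-1] <= quiet_period:
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches
```

With a 60-second quiet period, pushes at 0, 30, 200, 210, and 500 seconds would be combined into three builds rather than five.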
Deployment tools and environments
If the continuous integration server is set up to deploy builds after they pass all tests, it will often rely on software deployment and configuration management tools. These often vary depending on the run-time platform and the additional infrastructure.

On the other hand, some configuration management tools, such as Ansible, Chef, Puppet, Salt, and Vagrant, work across a wide range of platforms by using widely supported languages. Ansible and Salt are Python-based systems; Chef, Puppet, and Vagrant are Ruby-based.

Ansible takes playbooks written in YAML and manages nodes over SSH. Chef uses a Ruby domain-specific language for its configuration recipes and uses an Erlang server as well as a Ruby client. Puppet uses a custom declarative language to describe system configuration; it usually uses an agent/master architecture for configuring systems, but it can also run in a self-contained architecture. There are more than 2,500 predefined modules listed in the Puppet Forge.

Salt, originally a tool for remote server management, has evolved into an award-winning open source, cloud-agnostic configuration management and remote execution application. Salt can manage and deploy Linux, UNIX, Windows, and Mac OS X systems, and it can orchestrate resources in any cloud.

Vagrant is a specialized configuration management tool for development environments, which acts as a wrapper for VirtualBox, VMware, and other virtual machine managers. Vagrant takes the sting out of reproducing configuration-dependent bugs.

PaaS (platform as a service) occupies an interesting niche in the cloud ecosystem. It’s basically a dev, test, and deployment platform that sits on top of IaaS (infrastructure as a service). PaaS can be deployed on premises or offered as a service by a public cloud provider. For example, the Pivotal Cloud Foundry PaaS can be deployed on premises on top of VMware’s version of a private cloud, or it can run in a public IaaS cloud such as Amazon EC2.

PaaS includes infrastructure, storage, database, information, and process as a service. Think of PaaS as providing computers, disks, databases, information streams, and business processes or meta-applications, all tied up in one “stack” or “sandbox.” Where a PaaS adds value over IaaS is in automating all of the provisioning of resources and applications, which can be a huge time saver.
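The declarative style mentioned above for configuration management, in which you describe the state a system should be in rather than the commands to run, can be illustrated with a short Python sketch. It is not Puppet’s language or any tool’s real API; `plan_changes`, `apply_changes`, and the package names and versions are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def plan_changes(
    desired: Dict[str, str], actual: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Compare desired package versions with what is installed.

    Returns (action, package, version) tuples. An empty list means the
    node already matches the description and nothing needs to be done.
    """
    changes = []
    for package, version in sorted(desired.items()):
        installed = actual.get(package)
        if installed is None:
            changes.append(("install", package, version))
        elif installed != version:
            changes.append(("set_version", package, version))
    return changes


def apply_changes(
    actual: Dict[str, str], changes: List[Tuple[str, str, str]]
) -> Dict[str, str]:
    """Apply a plan to an in-memory model of the node and return the new state."""
    updated = dict(actual)
    for _action, package, version in changes:
        updated[package] = version
    return updated
```

The useful property is that a second run finds nothing to do: after `apply_changes`, calling `plan_changes` with the same description returns an empty list, which is what makes it safe to apply the same description to a machine repeatedly.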
There are two kinds of VMs: system VMs, such as VMware, and process VMs, such as the Java Virtual Machine. For the purposes of deployment tools, we are interested in system VMs, in which we can deploy a PaaS, such as Cloud Foundry, or a server application, such as DB2. In turn, VMs can be deployed on dedicated server hardware, either on-premise or off-premise, or on an IaaS cloud.

System VMs offer excellent software isolation, at the expense of incurring some fairly heavyweight hypervisor overhead and using a lot of RAM. Various hypervisors and IaaS infrastructures offer differing amounts of load isolation and differing algorithms for allocating excess CPU capacity to VMs that need it.

Software containers such as Docker offer good-enough software isolation in most cases, with much less overhead than VMs. All PaaS systems with which I am familiar wrap applications in software containers. For example, OpenShift runs applications in containers called gears, and uses SELinux for gear isolation; Cloud Foundry runs built and packaged applications, called “droplets,” in Droplet Execution Agents, which use Warden Linux containers for isolation.

While Docker is the current media darling of the software container space, and most relevant vendors have signed on to support it, Docker is rather new and not yet universally supported. On the other hand, Docker can work independently of PaaS systems and can greatly simplify deployment for devops. Docker can make multiple clouds look like one big machine, and it can be used for build automation, continuous integration, testing, and other devops tasks. While Docker began as a Linux-only solution, it recently gained support for Windows as well.

In the grand scheme of a software lifecycle, each feature moves from design to development to testing to staging to production, while bug reports feed back to the developers for triage and fixes at each stage. For products that are released yearly, moving from one stage to another can be a manual process. For agile products that are released weekly or biweekly, release management is often automated. Part of what needs to be automated is the release process management; in addition, teams need to automate their tests, bug tracking, building, packaging, configuration, and promotion processes.
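An automated promotion process of the kind described above can be pictured as a loop over stages, where a build moves on only if the checks at its current stage pass. The following Python sketch assumes four stages named after the servers in the workflow diagram; the `promote` function and the idea of passing checks as callables are illustrative assumptions rather than a description of any release management product.

```python
from typing import Callable, Dict, List

STAGES = ["development", "QA", "staging", "production"]


def promote(build_id: str, checks: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Deploy a build stage by stage, stopping where a check fails.

    `checks` maps a stage name to a function that returns True when the
    build is fit to be promoted beyond that stage. Stages without a
    check are treated as passing. Returns the stages the build reached.
    """
    reached = []
    for stage in STAGES:
        reached.append(stage)  # the build is deployed to this stage
        check = checks.get(stage)
        if check is not None and not check(build_id):
            break  # failed here, so it is not promoted any further
    return reached
```

A build whose QA check fails would reach only development and QA, at which point a defect report would go back to the developers instead of the build moving on to staging.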
Runtime monitoring tools
Acceptance testing for products usually includes performance testing, which may go all the way up to fullblown load testing with realistic user profiles. Even so, application performance can change in production for
a number of reasons: a spike in usage (Black Friday, anyone?), a memory leak that manifests over time, a bad spot on a disk, an overloaded server, or an ill-considered database index that slows down updates after its underlying table gets big. Application performance monitoring is intended to continually create metrics for the key performance indicators that matter to your application. These are usually broken down into user metrics, such as time to see a page or complete a transaction, and system metrics, such as CPU and memory utilization. System metrics are typically available all of the time. Passive user metrics, often collected using network monitoring appliances, are of most value when the application is heavily used; active user metrics, collected by generating application requests and measuring the response times, are often reserved for non-peak-load periods. When your application isn’t performing the way you’d like, determining the root cause may be a frustrating and time-consuming process. Until recently, the DDCM (deep
dive component monitoring) agents intended to help you with root cause analysis generated too much overhead to be used in production; you would have to turn them on for a short period to try to capture the problem, then turn them off to allow production to resume at full capacity. In the past couple of years, however, new DDCM products on the market claim to be able to monitor a wide selection of languages and frameworks with minimal overhead, streamlining the root cause analysis process.
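The active user metrics described above, which are gathered by generating requests and timing the responses, come down to two small operations: measuring durations and summarizing them. Here is a hedged Python sketch using only the standard library. The `request` argument is a placeholder for whatever call exercises your application, and the nearest-rank percentile is one common summary among several, chosen here as an assumption.

```python
import math
import time
from typing import Callable, List


def measure_response_times(request: Callable[[], object], samples: int) -> List[float]:
    """Call `request` repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        request()
        durations.append(time.perf_counter() - start)
    return durations


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

Reporting a high percentile alongside the median is a common way to keep a handful of very slow transactions from hiding behind an acceptable average.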
Bug reporting and reproduction tools and environments
We mentioned defect managers earlier, but didn’t really elaborate on their use. In a best case, a reported defect will be accompanied by a detailed description, a root cause, and a script to reproduce the problem, and it will be assigned to the developer most familiar with the relevant code. In a worst case, a bug report will come from a frustrated user calling into tech support and include a conversation along these lines:
TS: What’s wrong?
User: It broke.
TS: What were you doing?
User: What I always do. It worked yesterday.
TS: Have you changed anything since yesterday?
User: I didn’t change nothin’.
Needless to say, such reports require some skill on the part of tech support to dig out enough of a description and steps to reproduce the
problem in order to allow a developer to work on it. It may also require remotely entering and running diagnostics on the user’s machine.
Sometimes such problems will not reproduce on a developer’s machine. One common reason for this is that the development box is too fast and has too much memory to show the problem; another possibility is that the developer has a library installed that the user lacks; and a third is that the user has another application installed that interferes with yours.
Once you’ve determined the user’s runtime environment, the developer can use configuration management tools to create a similar runtime environment in a VM. Vagrant, in particular, is intended for such purposes. The test VM may run locally on the developer’s machine, on a server, or on an IaaS cloud.
In some cases, the steps to reproduce the user’s problem would change the production database. In these situations, it’s useful to have a scaled-down copy of the production application running in a PaaS, so that changes never propagate to the production database.
Once a fix for the problem is identified and a change set added to the code repository, the revised application must at least be regression tested, and preferably all acceptance tests will be run. If the change is accepted, then the release manager or customer service manager needs to decide whether to propagate the change to production or schedule it for later integration and whether to give the user a patch or an operational work-around.
The never-ending circle
If the modern agile application lifecycle sounds a little like Ezekiel’s vision of a chariot having wheels within wheels, that’s OK: It is. One wheel set represents the sprints – typically one to two weeks – after which an application version is released from development to testing. Another wheel
set represents a given build’s climb from development to testing to staging to production. An inner wheel set represents the lifecycle of a story card or application feature. And the tiniest wheels represent bug reports and fixes.
In this complicated environment, development shops can easily bog down at any stage. The purpose of devops is to see that the routine things, such as bringing up a clean test database or promoting a build, are quick and easy, so that the developers can concentrate on building actual features and fixing real bugs.
Martin Heller is a developer, entrepreneur, and veteran technology journalist. A senior contributing editor for InfoWorld, he frequently reviews software related to application development.
Q&A
How Etsy makes Devops work
Etsy, which describes itself as an online “marketplace where people around the world connect to buy and sell unique goods,” is often trotted out as a poster child for Devops. The company latched onto the concepts early and today is reaping the benefits as it scales to keep pace with rapid business growth. Network World Editor in Chief John Dix caught up with Etsy VP of Technical Operations Michael Rembetsy to ask how the company put the ideas to work and what lessons it learned along the way.
Let’s start with a brief update on where the company stands today.
The company was founded and launched in 2005 and, by the time I joined in 2008 (the same year as Chad Dickerson, who is now CEO), there were about 35 employees. Now we have well over 600 employees and some 42 million members in over 200 countries around the world, including over 1 million active sellers. We don’t have sales numbers for this year yet, but in 2013 we had about $1.3 billion in Gross Merchandise Sales.
How, where and when did the company become interested in Devops?
When I joined things were growing in a very organic way, and that resulted in a lot of silos and barriers within the company and distrust between different teams. The engineering department, for example, put a lot of effort into building a middle layer – what I called the layer of distrust – to allow developers to talk to our databases in a faster, more scalable way. But it turned out to be just the opposite. It created a lot more barriers between database engineers and developers.
Everybody really bonded well together on a personal level. People were staying late, working long hours, socializing after hours, all the things people do in a startup to try to be successful. We had a really awesome office vibe, a very edgy feel, and we had a lot of fun, even though we had some underlying engineering issues that made it hard to get things out the door. Deploys were often very painful. We had a traditional mindset of, developers write the code and ops deploys it. And that doesn’t really scale.
Etsy photos courtesy Scott Beale
How often were you deploying in those early days?
Twice a week, and each deploy took well over four hours.
Twice a week was pretty frequent even back then, no?
Compared to the rest of the industry, sure. We always knew we wanted to move faster than everyone else. But in 2008 we compared ourselves to a company like Flickr, which was doing 10 deploys a day, which was unheard of. So we were certainly going a little bit faster than many companies, but the problem was we weren’t going fast with confidence. We were going fast with lots of pain and it was making the overall experience for everyone not enjoyable. You don’t want to continuously deploy pain to everyone. We knew there had to be a better way of doing it.
Where did the idea to change come from? Was it a universal realization that something had to give?
The idea that things were not working correctly came from Chad. He had seen quite a lot in his time at Yahoo, and knew we could do it better and we could do it faster. But first we needed to stabilize the foundation. We needed to have a solid network, needed to make sure that the site would be up, to build confidence with our members as well as ourselves, to make sure we were stable enough to grow. That took us a year
and a half. But we eventually started to figure out little things like, we shouldn’t have to do a full site deploy every single time we wanted to change the banner on the homepage. We don’t have any more banners on the homepage, but back in 2009 we did. The banner would rotate once a week and we would have to deploy the entire site in order to change it, and that took four hours. It was painful for everyone involved. We realized if we had a tool that would allow someone in member ops or engineering to go in and change that at the flick of a button we could make the process better for everyone. So that gave birth to a dev tools team that started building some tooling that would let people other than operational folks deploy code to change a banner. That was probably one of the first Devops-like realizations. We were like, “Hey, we can build a better tool to do some of what we’re doing in a full deploy.” That really sparked a lot of thinking within the teams. Then we realized we had to get
rid of this app in the middle because it was slowing us down, and so we started working on that. But we also knew we could find a better way to deploy than making a TAR file and SSH’ing and rsync’ing it out to a bunch of servers, and then running another command that pulls the server out of the load balancer, unpacks the code and then puts the server back in the load balancer. This used to happen while we sat there hoping everything was OK while we were deploying across something like 15 servers. We knew we could do it faster and we knew we could do it better.
The idea of letting developers deploy code onto the site really came about toward the end of 2009, beginning of 2010. And as we started adding more engineers, we started to understand that if developers felt the responsibility for deploying code to the site they would also, by nature, take responsibility for whether the site was up or down, take into consideration performance, and gain an understanding of the stress and fear of a deploy.
It’s a little intimidating when you’re pushing that big red button that says – Put code onto website – because you could impact hundreds of thousands of people’s livelihoods. That’s a big responsibility. But whether the site breaks is not really the issue. The site is going to break now and then. We’re going to fix it. It’s about making sure the developers and others deploying code feel empowered and confident in what they’re doing and understand what they’re doing while they’re doing it.
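Editor’s sketch: the manual routine described in this answer (pull a server out of the load balancer, unpack the new code, put the server back, and repeat across the fleet) is the kind of procedure that deploy tooling automates. The Python below is a minimal illustration under stated assumptions, not Etsy’s tooling: LoadBalancer is an in-memory stand-in, and install and healthy are hypothetical callbacks where a real script would copy and unpack code and run a health check.

```python
from typing import Callable, List, Set


class LoadBalancer:
    """In-memory stand-in for a real load balancer's pool of live servers."""

    def __init__(self, servers: List[str]) -> None:
        self.pool: Set[str] = set(servers)

    def remove(self, server: str) -> None:
        self.pool.discard(server)

    def add(self, server: str) -> None:
        self.pool.add(server)


def rolling_deploy(
    servers: List[str],
    lb: LoadBalancer,
    install: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Update one server at a time and stop at the first failed health check.

    Returns the servers that were updated and returned to the pool.
    """
    updated = []
    for server in servers:
        lb.remove(server)      # take the server out of rotation
        install(server)        # hypothetical: copy and unpack the new code
        if not healthy(server):
            # Leave the bad server out of rotation and halt the rollout so
            # the remaining servers keep serving the old, working code.
            break
        lb.add(server)         # put the server back in rotation
        updated.append(server)
    return updated


if __name__ == "__main__":
    fleet = ["web1", "web2", "web3"]
    balancer = LoadBalancer(fleet)
    done = rolling_deploy(fleet, balancer, install=lambda s: None, healthy=lambda s: True)
    print(done, sorted(balancer.pool))
```

Halting at the first unhealthy server is one design choice among several; it limits the damage of a bad build to a single machine, which speaks to the confidence problem described in this answer.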
So there wasn’t a Devops epiphany where you suddenly realized the answer to your problems. It emerged organically?
It was certainly organic. If development came up with better ideas of how to deploy faster, operations would be like, “OK, but let’s also add more visibility over here, more graphs.” And there was no animosity between each other. It was just making things faster and better and stronger in a lot of ways. And as we did that the culture in the whole organization began to feel better. There was no distrust between people. You’re really talking about building trust and building friendships in a lot of ways, relationships between different groups, where it’s like, “Oh, yeah. I know this group. They can totally do this. That’s fine. I’ll back them up, no problem.” In a lot of organizations I’ve worked for in the past it was like, “These people? Absolutely not. They can’t do that. That’s absurd.”
And you have to remember this is in the early days where the site breaks often. So it was one of those things, like, OK, if it breaks, we fix it, but we want reliability and sustainability and uptime. So in a lot of ways it was a big leap of faith to try to create trust between each other and faith that other groups are not going to impact the rest of the people. A lot of that came from the leadership of the organization as well as the teams themselves believing we could do this. Again, we weren’t an IBM. We were a small shop. We all sat very close to one another. We all knew when people were coming and leaving so it made it relatively easy to have that kind of faith in one another. I can’t recall a time where someone walked in and said, “Oh my God, that person deployed this and broke the site.” That never happened. People checked their egos at the door.
I was going to ask you about the physical proximity of folks. So the various teams were already sitting cheek by jowl?
In the early days we had people on the left coast and on the right coast, people in Minnesota and New York. But in 2009 we started to realize we needed to bring things back in-house to stabilize things, to make things a little more cohesive while we were creating those bonds of trust and faith. So if we had a new hire we would hire them in-house. It was more of a short-term strategy. Today we are more of a remote culture than we were in 2009.
But you didn’t actually integrate the development and operations teams?
In the early days it was very separate but there was no idea of separation. Depending upon what we were working on, we would inject ourselves into those teams, which led later to this idea of what we call designated operations. So when John Allspaw, SVP of Operations and Infrastructure, came on in 2010,
we were talking about better ways to collaborate and communicate with other teams and John says, “We should do this thing called designated operations.”
The idea of designated ops is it’s not dedicated. For example, if we have a search team, we don’t have a dedicated operations person who only works on search. We have a designated person who will show up for their meetings, will be involved in the development of a new feature that’s launching. They will be injecting themselves into everything the engineering team will do as early as possible in order to bring the mindset of, “Hey, what happens if that fails to this third-party provider? Oh, yeah. Well, that’s going to throw an exception. Oh, OK. Are we capturing it? Are we displaying a friendly error for an end user to see? Etc.”
And what we started doing with this idea of designated ops is educate a lot of developers on how operations works, how you build Ganglia graphs or Nagios alerts, and by doing that we actually started creating more allies for how we do things. A good example: the search team now handles all the on-call for the search infrastructure, and if they are unavailable it escalates to ops and then we take care of it.
So we started seeing some real benefits by using the idea of this designated ops person to do cross-team collaboration and communication on a more frequent basis, and that in turn gave us the ability to have more open conversations with people. So that way you remove a lot of the mentality of, “Oh, I’m going to need some servers. Let me throw this over the wall to ops.” Instead, what you have is the designated ops person coming back to the rest of the ops team saying, “We’re working on this really cool project. It’s going to launch in about three months. With the capacity planning we’ve done it is going to require X, Y and Z, so I’m going to order some more servers and we’ll have to get those installed and get everything up and running. I want to make everybody aware I’m also going to probably need some network help, etc.”
So what we started finding was the development teams actually had an advocate through the designated ops person coming back to the rest of the ops team saying, “I’ve got this.” And when you have all of your ops folks integrating themselves into these other teams, you start finding some really cool stuff, like people actually aren’t mad at developers. They understand what they’re trying to do and they’re extremely supportive. It was extremely useful for collaboration and communication.
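Editor’s sketch: the questions a designated ops person raises in this answer (“Are we capturing it? Are we displaying a friendly error for an end user to see?”) can be made concrete with a few lines of code. The Python below is a minimal illustration, not Etsy’s code; ProviderError, search_listings, and the provider callback are hypothetical names invented for this example.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("search")

FRIENDLY_MESSAGE = "Search is temporarily unavailable. Please try again in a few minutes."


class ProviderError(Exception):
    """Raised when the (hypothetical) third-party provider call fails."""


def search_listings(query: str, provider: Callable[[str], List[str]]) -> Dict[str, object]:
    """Call the provider, capture its failure, and return a user-safe response."""
    try:
        results = provider(query)
    except ProviderError:
        # Capture the failure for operations: the stack trace goes to the log,
        # where a graph or alert can pick it up.
        logger.exception("provider call failed for query %r", query)
        # Show the end user a friendly message instead of a raw exception.
        return {"ok": False, "results": [], "message": FRIENDLY_MESSAGE}
    return {"ok": True, "results": results, "message": ""}


if __name__ == "__main__":
    def broken_provider(query: str) -> List[str]:
        raise ProviderError("upstream timeout")

    print(search_listings("handmade mugs", broken_provider))
```

Only the provider’s own failure type is caught here; any other exception still propagates, so unexpected bugs are not silently hidden behind the friendly message.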
So Devops for you is more just a method of work.
Correct. There is no Devops group at Etsy.
How many people are involved at this point?
Product engineering is north of 200 people. That includes tech ops, development, product folks, and so on.
How do you measure success? Is it the frequency of deployments or some other metric?
Success is a really broad term. I consider failure success, as well. If
we’re testing a new type of server and it bombs, I consider that a success because we learned something. We really changed over to more of a learning culture. There are many, many success metrics and some of those successes are actually failures. So we don’t have five key graphs we watch at all times. We have millions of graphs we watch.
Do you pay attention to how often you deploy?
We do. I could tell you we’re deploying over 60 times a day now, but we don’t say, “Next year we want to deploy 100 times a second.” We want to be able to scale the number of deploys we’re doing with how quickly the rest of the teams are moving. So if a designated ops or development team starts feeling some pain, we’ll look at how we can improve the process. We want to make sure we’re getting the features out we want to get out and if that means we have to deploy faster, then we’re going to solve that problem. So it’s not around the number of deploys.
I presume you had to standardize on your tool sets as you scaled.
We basically chose a LAMP stack: Linux, Apache, MySQL and PHP. A lot of people were like, “Oh, I want to use CoffeeScript or I want to use Tokyo Cabinet or I want to use this or that,” and it’s not about restricting access to languages, it’s about creating a common denominator so everyone can share experiences and collaborate.
And we wrote Deployinator, which is our in-house tool that we use to deploy code, and we open-sourced it because one of our principles is we want to share with the community. Rackspace at one point took Deployinator and rewrote a bunch of stuff and they were using it as their own deploying tool. I don’t know if they still are today, but that was back in the early days when it first launched.
We use Chef for configuration management, which is spread throughout our infrastructure; we use it all over the place. And we have a bunch of homegrown tools that help us with a variety of things. We use a lot of Nagios and Graphite and Ganglia for monitoring. Those are open-source tools that we contribute back to. I’d say that’s the vast majority of the tooling that ops uses at this point. Development obviously uses standard languages and we built a lot of tooling around that.
As other people are considering adopting these methods of work, what kind of questions should they ask themselves to see if it’s really for them?
I would suggest they ask themselves why they are doing it. How do they think they’re going to benefit? If they’re doing it to, say, attract talent, that’s a pretty terrible reason. If they’re doing it to improve the overall structure of the engineering culture, enable people to feel more motivated and ownership, or they think they can improve the community in which they’re responsible or the product they’re responsible for, that’s a really good reason to do it.
But they have to keep in mind it’s not going to be an overnight process. It’s going to take lots of time. On paper it looks really, really easy. We’ll just drop some Devops in there. No problem. Everybody will talk and it will be great. Well no. I didn’t marry my wife the first day I met her. It took me a long time to get to the point where I felt comfortable in a relationship to go beyond just dating. It takes longer than people think and they need to be aware of that because, if it doesn’t work after a quarter or it doesn’t work after two quarters, people can’t just abandon it. It takes a lot of time. It takes effort from people at the top and it takes effort from people on the bottom as well. It’s not just the CEO saying, “Next year we’re going to be Devops.” That doesn’t work. It has to be a cultural change in the way people are interacting.
That doesn’t mean everybody has to get along every step of the way. People certainly will have discussions and disagreements about how they should do this or that, and that’s OK.
www.infoworld.com
www.networkworld.com
InfoWorld
501 Second St. San Francisco, CA 94107 415.978.3200
EDITORIAL
Editor in Chief Eric Knorr
Executive Editor Galen Gruman
Executive Editor, Test Center Doug Dineley
Managing Editor Uyen Phan
Senior Editor Jason Snyder
Editor at Large Paul Krill
Senior Writer Serdar Yegulalp
East Coast Site Editor Caroline Craig
Newsletter Editor Lisa Schmeiser
Associate Editor Pete Babb
Senior Online Production Editor Lisa Blackwelder
Network World
492 Old Connecticut Path, P.O. Box 9002 Framingham, MA, 01701-9002 508.766.5301
EDITORIAL
Editor in Chief John Dix
Online Executive Editor, News Bob Brown
Executive Features Editor Neal Weinberg
Community Editor Colin Neagle
Multimedia Programming Director Keith Shaw
Online News Editor Michael Cooney
Online News Editor Paul McNamara
Online Associate News Editor Ann Bednarz
Managing Editor Jim Duffy
Senior Editor Tim Greene
Senior Writer Brandon Butler
Staff Writer Jon Gold
Web Production Managing Editor Ryan Francis
SALES
Senior Vice President Digital / Publisher Sean Weglage 508-820-8246
Vice President, Digital Sales Farrah Forbes 508-202-4468
Account Coordinator Christina Donahue 508-620-7760
East, Southeast, IL and MI Chip Zaboroski 508-820-8279
East, New England, New York Chris Rogers 603.583.5044
West / Central Becky Bogart 949.713.5153
N. CA / OR / WA Kristi Nelson 415.978.3313
E-mail: [email protected]
DESIGN
Art Director Stephen Sauer
NETWORK WORLD LAB ALLIANCE
Joel Snyder, Opus One; John Bass, Centennial Networking Labs; Barry Nance, independent consultant; Thomas Henderson, ExtremeLabs; David Newman, Network Test; James Gaskin, Gaskin Computing Services; Craig Mathias, FarPoint Group
Images by Shutterstock
© IDG Communications Inc. 2014
SPONSORED BY:
Resources
EMA: Ten Factors Shaping the Future of Application Delivery
In a recent research study on DevOps and Continuous Delivery, EMA discovered there is a strong correlation between a company’s software delivery speed and its revenue growth. This report can help organizations build a case for Continuous Delivery adoption.
Download
Using Continuous Delivery to Improve Software Delivery
Learn more about the challenges impacting organizations and how continuous delivery processes can be a key success factor in accelerating software delivery.
Download
Forrester: How to Accelerate Innovation with Continuous Delivery
In this on-demand webcast with Forrester analyst Kurt Bittner, explore how you can increase quality, break down silos and maximize productivity for developers, QA and operations teams. Learn more about the business value of Continuous Delivery with Jenkins.
On-demand webcast
Global Bank Improves Quality of Application Development
Lack of centralized management of the process and sporadic access to development build assets were hurting development cycles. Read how this financial institution centralized build assets, cut development time in half and added additional security controls.
Download
Agile Development: How to Release Apps at the Speed of Business
Many companies plan for changes and improvements in software releases several times a year. Some – like McKesson Health Solutions – have a business model that requires their IT team to respond to hundreds of releases every year. Learn how your IT team can release apps at the speed of your business, efficiently and effectively – every day, every quarter.
On-demand webcast
DevOps: Culture or Tools? It’s Both
Alan Shimel, Co-Founder & Editor-in-Chief of DevOps.com, discusses five key steps for developing a culture and assessing tools that can help you deliver software faster, more efficiently, and with great quality.
On-demand webcast
Mastering Performance and Collaboration Through DevOps
DevOps is a hot topic within the IT community and is quickly becoming the standard for high performing and decidedly collaborative organizations. Join Gene Kim, co-author of The Phoenix Project and DevOps researcher, as he discusses performance metrics, as well as the cultural and technical practices, that enable high performance.
On-demand webcast