Data-centric and IoT applications are not typical, and neither are the environments they are deployed into. They are most often deployed into large, distributed networks of machines, which raises the question of whether DevOps practices and tools can be applied to these applications.
DevOps for data-centric applications must work smoothly when the production environment is a distributed network. Developers and testers must define best practices for DevOps and continuous deployment into these distributed environments.
Let’s say you have adopted a distributed network environment for your data-centric production systems. Unless your data-centric applications can be developed successfully and rapidly for that environment, the adoption will be wasted effort. Clearly, DevOps is the answer, but how many DevOps tools are capable of enabling application development for distributed network environments?
Three critical steps are required to ensure success with DevOps in a distributed network environment:
- Containerize applications
- Implement DevOps tools to enable a continuous DevOps lifecycle
- Use sandboxes throughout the DevOps lifecycle
Think of these three steps as addressing the “what,” “how,” and “where” elements of the data-centric DevOps challenge:
Containers (the “what”). Putting your applications into containers allows them to look uniform as they move between non-production and production environments, and between on-premises and cloud.
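As a minimal sketch of this idea (the base image, file names, and port are placeholders, not from the article), a data service might be containerized with a Dockerfile like this, so the exact same image runs unchanged in the dev lab, the test lab, and production:

```dockerfile
# Hypothetical example: package a Python data service into a container
# image that looks identical in every environment it crosses into.
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8080
CMD ["python", "service.py"]
```

Because the image carries its own dependencies, the application no longer depends on what happens to be installed on a given host, on-premises or in the cloud.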
DevOps Tools (the “how”). DevOps tools automate the steps from programming to testing to production deployment.
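To illustrate what that automation looks like (the stage names, image registry, and scripts below are hypothetical, written in a generic GitLab-CI-style syntax), a toolchain that carries every commit from build through test to deployment might be declared as:

```yaml
# Hypothetical pipeline: each commit is built, tested, and deployed
# automatically; no stage is run by hand.
stages:
  - build
  - test
  - deploy

build-image:
  stage: build
  script:
    - docker build -t registry.example.com/data-service:$CI_COMMIT_SHA .
    - docker push registry.example.com/data-service:$CI_COMMIT_SHA

integration-test:
  stage: test
  script:
    - ./run_tests.sh registry.example.com/data-service:$CI_COMMIT_SHA

deploy-production:
  stage: deploy
  script:
    - ./deploy.sh registry.example.com/data-service:$CI_COMMIT_SHA
```

The point is not the particular tool: any CI/CD system that chains these stages automatically fills the “how” role.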
Sandboxes (the “where”). Sandboxes address “where” you develop your application, making the environment and infrastructure it runs on look the same from the development lab to the test lab to the production datacenter or large distributed IoT network.
Two of these three steps are common to both data-centric and non–data-centric environments. Containerizing applications is definitely important to data-centric DevOps. In addition, data-centric requirements seem to affect the DevOps toolchain very little – the DevOps practices still apply fully. But the data-centric environment is critically different from a more typical application environment. This makes sandboxing, which encapsulates the distributed network environment typical of data-centric applications, an essential tool for enterprise DevOps.
Sandboxes (aka Uber Containers) are self-contained infrastructure environments that can be configured to look exactly like the final target deployment environment, but can be created and run anywhere. For example, developers can create a sandbox that looks like the production environment – from network and hardware to OS versions and software to cloud APIs. They do their development in that sandbox for a short period of time, and when they are done they tear down the sandbox. Testers can do the same thing. In addition, testers can run a set of tests with the sandbox configured to look like their internal IT environment, automatically reconfigure the sandbox on the fly to look like the external cloud environment, and run more tests. This allows them to test all of the possible environments that the application could run in without disrupting the actual production infrastructure.
Technically, what is a sandbox? A physical sandbox is a protected space where you have complete control and others are allowed in only if you invite them. You can bring in your own toys to the sandbox and make anything you want in the sand. If you don’t like it, just stomp it out and start over. Technological sandboxes follow these same rules. A number of vendors are now providing sandbox solutions (some are called “Environment as a Service”) that have a simple interface for creating any target infrastructure environment and configuring it with as much control as you want. They allow you to bring applications, tools, tests, and automated processes into that sandbox. They provide protections so that others cannot interfere with any infrastructure that you are currently using in your sandbox. They provide reservation and scheduling for many people so that whole teams of developers and/or testers can share physical and virtual infrastructure on the fly for hours, days, or weeks at a time. Finally, a good sandbox solution can be triggered from the outside (for example, from a DevOps tool).
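The sandbox workflow described above – create an environment from a blueprint, reconfigure it on the fly, and always tear it down when done – can be sketched in code. The `Sandbox` class below is an illustrative stand-in for a vendor’s Environment-as-a-Service API, not a real library; the blueprint names are made up:

```python
# Minimal sketch of the sandbox lifecycle: create, use, reconfigure,
# tear down. Sandbox is a hypothetical stand-in, not a vendor API.
from contextlib import contextmanager


class Sandbox:
    def __init__(self, blueprint):
        self.blueprint = blueprint   # e.g. "internal-it" or "public-cloud"
        self.active = False

    def setup(self):
        # A real provider would reserve machines and wire up networks here.
        self.active = True

    def reconfigure(self, blueprint):
        # Re-shape the running environment without starting from scratch.
        self.blueprint = blueprint

    def teardown(self):
        # Release the reserved infrastructure so others can schedule it.
        self.active = False


@contextmanager
def sandbox(blueprint):
    box = Sandbox(blueprint)
    box.setup()
    try:
        yield box
    finally:
        box.teardown()   # always release, even if the tests fail


def run_tests(box):
    # Placeholder for a real test suite executed inside the environment.
    return f"tests passed against {box.blueprint}"


with sandbox("internal-it") as box:
    print(run_tests(box))             # first pass: internal IT profile
    box.reconfigure("public-cloud")   # switch profiles on the fly
    print(run_tests(box))             # second pass: external cloud profile
```

The context manager captures the key discipline: the sandbox is reserved for a bounded time and is torn down no matter what, so shared infrastructure is never left locked.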
In the world of data-centric and IoT applications, software needs to be deployable on distributed network infrastructure. Sandboxes allow developers to mimic these complex, large-scale environments and build applications that can run successfully on this type of infrastructure. Sandboxes can also be used to test in these distributed network environments.
Of course, in a perfect world, containers, DevOps tools, and sandboxes can be combined to enable continuous deployment in a distributed network environment. Package your applications in containers, use DevOps tools to manage and automate the process of moving through the development cycle, and create sandboxes for each step in the development cycle that mimic the actual target production infrastructure(s) on which those applications need to run.
Joan Wrabetz is the Chief Technology Officer for Quali. Prior to her current role, she was the Vice President and Chief Technology Officer for the Emerging Product Division of EMC.
Ms. Wrabetz has over 20 years of executive management experience at public and privately held technology companies. She has been an executive at a number of startup technology companies, has been a Venture Partner with BlueStream Ventures, and has been on the board of directors or advisory board of many early stage technology companies. She was the founder and CEO of Aumni Data, a developer of big data analytics technology. She was the CEO of Tricord Systems (acquired by Adaptec), one of the first companies to introduce a commercial product based on distributed file system technology. Earlier, Ms. Wrabetz was the Vice President and General Manager for SAN operations at StorageTek, a $650M revenue business, responsible for all open systems products, including tape libraries, disk systems, and SAN switches. Prior to joining StorageTek, Ms. Wrabetz was the founder and CEO of Aggregate Computing, a grid computing software company, acquired by Platinum Technologies. Prior to Aggregate, Ms. Wrabetz held management and senior technical positions at Control Data Corporation and SRI International.
Ms. Wrabetz holds an MBA from the University of California, Berkeley, an MSEE from Stanford University, and a BSEE from Yale University. She has taught as an adjunct faculty member at the University of St. Thomas, St. Mary’s University, and the Carlson School of Business at the University of Minnesota. She holds patents in load balancing, distributed systems, and machine learning classification and analytics.