Standard Bank: Our DevOps Journey (Part 5)
// Chef Blog
This is the fifth entry in our ongoing, bi-weekly series examining our customer Standard Bank's DevOps journey. You can read the first entry here, the second entry here, the third entry here, and the fourth entry here. Continue below for part five.
In this blog post, we talk to several members of the Chop Chop team. Derek Chung is the iteration manager and manages the deliverables. Mark Figueira works in Quality Assurance. Marcus Talken is the technical lead. Their discussion revolves around change—changes in process, changes in approaches to testing, changes in tools and changes in culture.
To set the stage, Mark described the waterfall approach that Standard Bank has traditionally used to develop applications.
"Business had its requirements. Those got handed to a business analyst who drafted an FSS (functional system specifications). The FSS went to the technical teams. Depending on the organization, one team would deliver the infrastructure and the other would deliver the application. In parallel, someone would write the test cases based on the requirements within the functional spec."
"It would get to a point where development would complete some form of unit testing. Then, the application would be handed off to another organization for component integration testing. When that phase was complete, another organization performed system integration testing."
"There were three testing cycles and we were always picking up bugs, throwing the application back over the fence to development or, if there were other requirements, back to the business analyst who would then confirm the requirements with business, update the functional spec, and update the test cases. You could be working on a project for five months and still hit a bug that delayed the whole process."
In contrast, the Chop Chop team uses a DevOps approach. Everyone involved sits together and shares information directly rather than by passing paperwork from one group to another. Mark continues, "The product guy is on board, the technical guy, the QA guy, we all understand what we're going to deliver. You pick up issues up front rather than three months, six months down the line. It's a good change for us."
Other changes include what is tested and how testing is done. The Chop Chop team's pilot project, the prepaid feature, is actually a part of a larger initiative that has adopted agile methods and testing strategies, so there are existing application tests that the team can use and adapt. What is new is test-driven infrastructure.
Derek says, "Historically, we've never actually done any tests on infrastructure. We'd build a server, we'd deliver it, we'd build a database, we'd deliver it – we would subsequently get come backs around the quality of the server builds; they were not consistent. We never actually had the concept of testing an infrastructure component. On top of that, we're dealing with the whole concept of shifting left when you write your infrastructure tests."
Marcus adds, "For all of us, writing test cases up front was different. I'm from the traditional infrastructure world where testing comes last, if at all. We were really struggling with the concept of getting the test cases sorted out. It took a while to figure out the test cases first, do a build, and then run the tests again."
The team had to learn several new technologies. Along with Chef, they learned Test Kitchen and Serverspec. Mark describes what it was like when they first started to use Test Kitchen. "In kitchen, it was a case of not really knowing what we don't know. Initially, we'd map out what we felt would be the tests we should pass. We started with a manual process and then from that, we automated it. By the end, what we delivered was not what we expected up front but it was a lot better."
They also learned how to use Bamboo as their continuous integration server. The team took the Serverspec tests they had developed to run in Test Kitchen and incorporated them into their Bamboo runs.
Marcus described the continuous integration process. "We've got a continuous integration cycle for infrastructure that runs hourly. Coupled with that, we've embedded some of the infrastructure and application tests. Every hour we recreate the box and it runs all the infrastructure tests and application tests to make sure that any code changes we've made (the box is always built from the latest version of source available, not necessarily the latest release available) is then run through all of those tests to make sure that they still pass."
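The cycle Marcus describes (rebuild the box, then run every test suite against it) can be sketched roughly as follows. This is only an illustration in Python, not the team's Bamboo configuration: the `run_cycle` function and the stage names are hypothetical, and stopping at the first failing stage is an assumption. The hourly scheduling itself would be left to the CI server.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_cycle(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run the stages in order and stop at the first failure.

    Returns (overall_ok, log), where log has one line per stage that ran.
    """
    log: List[str] = []
    for name, stage in stages:
        passed = stage()
        log.append(f"{name}: {'PASS' if passed else 'FAIL'}")
        if not passed:
            # Assumption: later stages are skipped once one stage fails.
            return False, log
    return True, log
```

In a sketch like this, the first stage would rebuild the box from the latest source, and the following stages would run the infrastructure tests and then the application tests, so a red result always points at the first stage that broke.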
The Bamboo dashboard shows the status of the continuous integration build as well as the system integration tests and the production builds.
"The infrastructure tests make sure that, for example, the correct ports are open, the correct services are running, and the files are in the correct places. We then have contract tests, which are basically back-end server tests that exercise the service calls. The last test is a front-end functionality test that uses a tool to test the actual UI by mimicking what a user would do. We also check that, if you create a payment at the front end, all the necessary back-end calls happen to ensure that the payment goes through."
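To make the first category concrete, here is a minimal sketch of two such infrastructure checks, written in plain Python rather than Serverspec (which the team actually used). The function names are hypothetical and chosen only for illustration. A check that a service is running is left out because it depends on the platform's service manager.

```python
import os
import socket
from typing import Callable, List, Tuple


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the port as closed.
        return False


def file_is_present(path: str) -> bool:
    """Return True if path exists and is a regular file."""
    return os.path.isfile(path)


def run_checks(checks: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run every named check and return the names of those that failed."""
    return [name for name, check in checks if not check()]
```

A freshly built box would be handed a list of named checks, for example one per expected port and one per expected file, and the build would count as good only when `run_checks` returns an empty list.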
Here is an example of what the Chop Chop team sees when tests fail.
Here is an example of what they see once all the tests pass.
Mark says, "When the build comes out the other side, we're very confident that what was requested is what was delivered."
It takes time to completely adopt a new approach. Marcus notes, "Officially we write the tests before, unofficially it's a bit of a mix and match. There are tests that are written before but there are times when the cookbook gets written and then we need to write the tests to make sure it works." Writing tests up front is still a challenge, but the team recognizes it as a very important aspect of DevOps and a key driver of the future success of Chop Chop.
Derek points out that change doesn't happen immediately. "It's a behavioral thing, to get the guys to change from the usual way. It's something that we need to practice and get a little more discipline with so we're doing it all the time."
Another challenge was deciding who should actually write the tests. Derek says, "Just figuring out the roles and responsibilities is something new. Security might think that the server team should be writing the security tests to make sure the server is compliant. The server team might say it's the security team's requirements so they should be writing the tests. It got us talking. It's the whole point of DevOps. We're all debating it. We haven't fully answered that question."
Many factors have contributed to the team's success. On the technical side, Marcus cites automation as extremely important. Derek agreed and said, "I can't imagine if one of us broke a machine and had to wait two weeks for another one to be spun up. It's literally a one-button click to get things right."
Autonomy and a measured approach also mattered. Marcus says, "I think you need not take the enterprise's view on everything. I won't say we ignored enterprise standards, but we did things in a way that would work for us. As an example, we chose Red Hat as our operating system over SUSE. Although the bank standard generally uses SUSE, Red Hat was just a better fit for us."
"Also, most organizations try and do everything perfectly from day one. We did it in baby steps. Can I build a vanilla server with nothing on it? Yes. Now, can I build a server with a little bit of security on it? Yes. Next, can I automatically deploy Chef? Yes. Now, can I automatically deploy Chef with a role? You can't do everything in one go. You've got to implement something, fix it, make sure it's working and then try and make it a little bit better."
On the non-technical side, everyone stressed the importance of teamwork, transparency and executive support. Derek said that the people on the team had more than expertise in their fields. They had a can-do attitude and worked together. Marcus said, "Everyone was core. There was no hero business where one person was doing everything. Everyone had a job to do, they knew what the goal was and they got together and delivered. If anyone deserves to get rewarded, it's the whole team."
Also important was support from the executives. Marcus said, "I think transparency and exec support was very important. They believed that we could do it and never actually told us how to do it. They said, 'Don't burn the bank down, but go do it.'"
In terms of transparency, Marcus made sure that anyone who was interested knew what the team was doing. "With every team we engaged with, we tried to impart as much information as possible, even if they weren't doing exactly what we were doing. We tried to get everyone on board."
Finally, a blameless culture was essential. Marcus said, "When someone did something that broke a box, broke an environment, we didn't go on a witch hunt. If something didn't work, there was no finger pointing. It was, 'OK, this is what broke. What can we learn from it? What can we do differently so we don't have this problem next time?' That was a really big change from the way I saw the organization work previously. Any environment where you're going to experiment, you need to be able to make mistakes and not get into trouble for that."
Marcus summed up the experience. "It's about having the right culture. With it, everything is possible."