Powering 3 million requests an hour with open source software
Open source technology continues to drive innovation, but it’s not always obvious when it’s used behind the scenes. We spoke to Tony Gorman from ASOS to learn how open source software is being used to power services that handle upwards of 3 million customer requests an hour.
Can you tell us a little about yourself?
My name is Tony Gorman and I’m one of the Engineering leads at ASOS. I work with the Principal Software Engineer, Principal Test Q/A and Principal Platform Engineer groups, who all work across ASOS on a range of different platforms. Part of my role is working with a small core team who create, maintain, manage and curate pipelines for deploying AKS (Azure Kubernetes Service), with all the bells, whistles and restrictions that ASOS needs in order to run a secure platform. I spend a lot of time with those folks working on these pipelines and on AKS itself, and much of the rest of my time goes to teams that are adopting AKS.
What were your objectives in moving from classic cloud services to .NET Core, containerising on Linux and moving more into open source?
The long-term goals were, first of all, to get off cloud services because of the various restrictions we had with them. We also wanted to take the opportunity to improve our cost base, so we were looking at compute density, the DevOps experience and the security that would wrap around all of that. We looked at AKS among a few other options, and we decided that it was a good way to approach these goals.
Which Azure technologies are you currently using?
We use a lot of different technologies, as you would expect from an e-commerce operation: databases, IaaS, PaaS and, as with AKS, some things that sit in between. We also use various serverless and event-driven offerings provided by Azure.
How are ASOS currently using Kubernetes?
We currently use Kubernetes mostly to run microservices. We use a microservice architecture extensively across our estate, with services written in a variety of languages and running on different application tiers. We also have in-house Data Science and Integration teams that use Kubernetes to run their workloads.
I saw that you’re also using Redis. How are you using it?
We use Redis in a few places at ASOS, usually as a side cache for applications across our estate that handle large volumes of data and critical workloads. There are a lot of people within ASOS who are quite committed to the open source movement, and we prefer and encourage people to code in the open. We are more active on the open source front than we used to be, and we hope to continue that trend.
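For readers unfamiliar with the term, a "side cache" usually means the cache-aside pattern: the application checks the cache first and, on a miss, loads from the underlying store and writes the result back with an expiry. The sketch below is illustrative only and is not ASOS's implementation; `InMemoryCache` is a hypothetical stand-in for a Redis client so the example runs without a server (with redis-py, the equivalent calls would be `get` and `set(key, value, ex=ttl_seconds)`), and `get_with_cache_aside` and the 300-second TTL are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a Redis client: string values with a TTL in seconds."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like a cache miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def get_with_cache_aside(
    cache: InMemoryCache,
    key: str,
    load_from_source: Callable[[str], str],
    ttl_seconds: float = 300.0,  # assumed expiry, tune per workload
) -> str:
    """Cache-aside read: try the cache, fall back to the source, then populate the cache."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_source(key)
    cache.set(key, value, ttl_seconds)
    return value
```

On the second read of the same key within the TTL, the source loader is not called again, which is what takes read pressure off the backing data store. Because the cache sits beside the application rather than in front of the store, a cache outage degrades performance but the source remains the system of record.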
Why were you considering AKS to begin with, and what’s important about AKS specifically for how ASOS uses it?
One of the things we were trying to do was “shift left”, particularly in the DevOps space, so we wanted to streamline and enhance our CI/CD process. That led us down the containerisation route, and once we decided that containers were the way forward, it was a case of choosing an orchestrator. We are heavily invested in Microsoft, and we arrived at Kubernetes via containers and then looked at how best to run them. We considered Azure Container Service early on, and then AKS became a product. We fundamentally favour a more PaaS-like experience over an IaaS experience, and AKS dovetailed nicely with that preference, so we said, OK, let’s start the journey with AKS and see if it progresses to a point where it meets our needs.
What has been the return on investment in making this move from a business perspective?
It’s saved us quite a lot of money on compute and a lot of time in CI/CD: because of how we’ve engineered things to run on it, some of our deployments now happen much faster. We’re getting faster releases and we have fewer incidents because we feel that the compute platform is a lot more tractable.
There are also some indirect benefits. AKS itself is an interesting solution for our engineers and prospective new engineers to work on. We spend a lot of time training people on AKS, and the opportunity to be trained on a new and interesting technology means the return on investment isn’t just about compute.
How did you get people on board with the move to AKS?
At the start we had already recognised that cloud services had a shelf life, so we had some impetus behind making our compute choice anyway. I think it was relatively easy because we did a lot of proofs of concept very early on while working with Microsoft. The first platform we actually used was a Data Science platform, and we got really good value out of that straight away. Most people looked at that, saw that it was working, and gave us the green light to carry on. We also planned our way into it pretty well and didn’t have many hiccups along the way, so it worked out well for us.
Has moving to AKS influenced how your cloud strategy looks? If it has, how would you say it’s influenced it?
I think it has reminded everyone that the cloud has come of age. It’s a mature, secure, stable, reliable place to do your business, and any qualms we had in the past have long since been dispelled. I think AKS has also shone a light on how many cloud services we have, how many challenges they can bring and how much effort is required to maintain and release them.
There’s also been a shift in the marketplace in terms of skillsets. Like any company, we have to keep recruiting, and it’s now much easier to recruit people who either have experience with, or want to get involved in, something like Kubernetes, which plays in our favour.
We are quietly and incrementally moving off cloud services. We have a compute strategy that offers simple choices between AKS, App Services and something like serverless functions, based on the workload, but a lot of our workloads just fit naturally into the AKS slot. Because we’ve done a lot of automation behind the scenes and the Microsoft product group has done a lot to make life easier, it’s definitely focused us on containers as the way forward.
Was there anything that tripped you up during the move to AKS? Is there anything you’d recommend to people thinking about their own moves to AKS?
Number one is to make sure you understand the security boundary and that you have a clear understanding of what your security needs are. We had a lot of challenges at the start, and in some respects we actually over-engineered for our particular situation. Some of that was down to upstream Kubernetes, some to the way AKS started off, and some to how we structure our network topology at ASOS. So make sure you understand your security perimeter properly and how you can apply it within AKS.
The second thing I would say is to make sure you understand the learning curve, and have some form of training programme in place. It doesn’t have to be radical, but we recognised very early on that we needed one, so we worked with Microsoft on a set of training courses that we run internally. These are very popular, and teams must attend them before they start using AKS.
The third thing is that it’s easy to get up and running with AKS, but you need to be clear about all the extra things you need. You need VNets and NSGs. You need key vaults and a worked-out CI/CD strategy. Understand that there’s far more to running an AKS cluster than running an “az aks” command: there’s a lot of plumbing that needs to be put in place. We decided very early on to wrap all of that up in a reusable pipeline, and I think it’s paid dividends. We run a lot of clusters and every one of them uses our internal pipeline, because it saves people a lot of time, effort and money.
One of the things I’ve valued on this journey is having really good contacts within Microsoft to work with. Even in the earlier days, when we spoke to the product group more directly, it was extremely valuable to have that level of contact and to get our point of view across. We’ve had lots of calls with lots of people on various subjects, and I think we would have had a much harder time without that. We meet regularly with cloud solution architects (CSAs) from Microsoft, and they have really helped us improve how we run AKS.
Will there be more coming to AKS from ASOS in the future?
We’re moving more workloads over to AKS, but there isn’t a one-size-fits-all answer for choosing how to run an application, so we’re always looking at options that offer a sensible default for the type of workload under consideration. In tandem, we’re continually improving how we use AKS, with a particular focus on ease of use. This is the general direction of travel for Microsoft as well, so it aligns nicely. Ultimately, we want AKS to be invisible to our engineers and trusted by our Security and Platform community.