This article was originally posted on Medium. Archived here to maintain open access.
Trade Me engineering is a medium-sized team of about 200 engineers, spread across predominantly platform and stream-aligned agile squads. About 80% of our active development and user sessions are across our four major monolithic platforms: the iOS and Android apps, the front-end app, and the API.
While monolithic applications come with their baggage, they have worked well for us. Over the years, these platforms secured significant investments and underwent a few evolutions to respond to our developers’ and customers’ needs. To put it in perspective, they handle about 24 million weekly sessions.
Photo courtesy of Marco Bianchetti
The advent of machine learning services, data streaming pipelines, functions, and the need to harness cloud-native power led to a relative increase in the number of services we had to build and maintain. While 80% of our development effort goes into the four monoliths, the other 20% goes into tens of microservices and their underlying components. We have started initiatives to modularise and extract domains from our monoliths, aiming to shift this ratio to 70/30 in the mid-term.
Here Comes the Army of Terraforms
As a company with a long history of building and maintaining monolithic applications, moving to a more service-friendly environment was a learning curve. The change was not only technical; it also required cultural and structural changes.
Observability, security, and governance are far more manageable for four monolithic applications with four streamlined build pipelines and release processes than for a growing number of independent services. Within a year, we ended up with an explosion of Terraform projects and services delivering value to our customers, from recommendation engines to integration middle layers. While this was great, we were facing a few problems:
Local Optimisation and Lack of Knowledge Management: Our engineers spent a good part of their time in the discovery phase, understanding the pros and cons of each cloud service. Typical criteria were cost, lock-in effect, migration cost, scalability, and available community knowledge. While we encouraged cross-company knowledge sharing, we observed different teams repeatedly conducting the same type of evaluations.
Impact on Lead Time to Customer Value: Engineers in our stream-aligned teams spent a great deal of their time understanding the ups and downs of cloud environments and dealing with edge cases of Terraform providers. In a nutshell, the ops side of DevOps was more prevalent.
Impact on Cognitive Load: The differences between the service architectures and cloud components our engineers were using (even on rare occasions) created an operability risk and added to our teams’ cognitive load. This issue is more pressing given the size of our agile teams (squads), which usually have only 4 or 5 engineers each.
One Platform to Lure them All
TVP (Thinnest Viable Platform) is our approach to building the future of cloud-native applications at Trade Me. The name and idea came from the Team Topologies book.
While we considered changing the platform name to something else (e.g. Phoenix), we noticed that we like the three components of this acronym: 1) Thinnest; 2) Viable; and 3) Platform.
It started with a series of wiki pages highlighting the characteristics of a production-ready application and the definitive list of must-haves an application needs in order to meet our stream-aligned teams’ needs. We used user story mapping to identify the Musts.
Subsequently, it evolved into a templated infrastructure-as-code project with almost fully automated provisioning pipelines. To understand TVP in the context of the Trade Me Engineering Platform, we should first define its characteristics.
It’s a Platform
We use TVP to define our approach to building future platforms. In our definition, Platforms are building blocks that abstract away the complexity of infrastructure and make it easier and faster to deliver products and value to our customers.
It’s a Product
TVP follows the Platform-as-Product mindset. As a product, the main customers of TVP are our developers. The main goal of this product is to improve DevEx (Developer Experience). Our main measures of success (MoS) are:
- Reducing developers’ cognitive load (qualitative MoS)
- Time to First Hello World (TTFHW)
Following good product practices, it has a pull model instead of a push. Instead of building features into the platform without explicit requirements from our customers (stream-aligned teams), our platform teams maintain a close relationship with them, understand their needs, and prioritise those features in their roadmap based on the value.
TVP has Product Managers who define the long-term platform strategy as a product, and a Product Owner who builds the roadmap and manages stakeholders. Our platform teams reserve a portion of the roadmap to maintain and support the artefacts and, ideally, to sunset and retire features as the product evolves.
It’s Just Big Enough
TVP intends to ensure the absolute minimum requirements we expect from our production systems are implemented and abstracted away from our developers. The intention behind this is to keep the platform as simple as possible to cater to one of its primary purposes: reducing developer cognitive load. As the “Thinnest” in the acronym implies, it is just big enough.
It’s Opinionated but Sensible
TVP’s technical governance is very strict about the WHAT and HOW of TVP implementation. The minimalist attribute of TVP allows it to be opinionated. After creating a project with the strict TVP templates as its foundation, the development team can tweak the underlying implementation to fit their specific use cases. These changes 1) are not supposed to be applied upstream (to the TVP foundation/templates) and 2) should follow the technical governance/architecture sign-off guidelines applied to the rest of our Engineering projects. In a nutshell, TVP follows the Sensible Defaults principle.
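As a rough illustration of the Sensible Defaults idea, the sketch below layers project-specific overrides on top of template defaults without modifying the template itself. It is only a sketch under stated assumptions: the setting names, the values, and the apply_overrides helper are hypothetical and are not taken from the actual TVP templates.

```python
from copy import deepcopy
from typing import Any, Mapping

# Hypothetical template defaults; keys and values are illustrative only.
TEMPLATE_DEFAULTS: dict[str, Any] = {
    "logging": {"level": "INFO", "structured": True},
    "autoscaling": {"min_instances": 2, "max_instances": 10},
    "alerts_enabled": True,
}


def apply_overrides(defaults: Mapping[str, Any],
                    overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new config with project overrides layered over the defaults.

    The defaults are deep-copied first, so the upstream template is never
    mutated by a project-level tweak.
    """
    merged = deepcopy(dict(defaults))
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A team tweaks one setting for its own project; everything else is inherited.
project_config = apply_overrides(
    TEMPLATE_DEFAULTS, {"autoscaling": {"max_instances": 4}}
)
```

The point of the design is that a team only states what differs from the defaults, which keeps project-level changes small and easy to review during sign-off.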
It’s the Golden Path to Production
Following the Spotify terminology, TVP is the Trade Me Engineering Golden Path to a production-ready application. As customers of TVP, developers can be confident that all of the MUST guardrails are implemented and, ideally, automated. In cases where a team decides not to use TVP, they need to complete the production-readiness checklist manually.
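The difference between the two routes can be sketched as a small readiness check. This is a minimal illustration, not our actual tooling: the guardrail names, the Service type, and both functions are hypothetical, and it assumes that a project built from the TVP templates has every MUST guardrail covered.

```python
from dataclasses import dataclass, field

# Hypothetical guardrail names; the real list lives in the checklist itself.
MUST_GUARDRAILS = ("observability", "security_scanning", "automated_deploy")


@dataclass
class Service:
    name: str
    uses_tvp: bool
    # Guardrails a team has confirmed by hand (only relevant off the path).
    manually_verified: set[str] = field(default_factory=set)


def outstanding_guardrails(service: Service) -> list[str]:
    """List the MUST guardrails that still need manual sign-off."""
    if service.uses_tvp:
        # Assumption: the platform templates already implement every MUST.
        return []
    return [g for g in MUST_GUARDRAILS if g not in service.manually_verified]


def is_production_ready(service: Service) -> bool:
    """A service is ready once no MUST guardrail is outstanding."""
    return not outstanding_guardrails(service)


on_path = Service(name="recommendations", uses_tvp=True)
off_path = Service(name="custom-service", uses_tvp=False,
                   manually_verified={"observability"})
```

Here on_path is ready immediately, while off_path still has two guardrails to verify by hand before it can go to production.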
Open Source Model for Contribution
While we have a dedicated team to build and maintain TVP, it is open for contribution by everyone across the company. If a feature does not exist in TVP, any team can develop it and contribute it back after consultation with the platform maintainers. The main questions to answer are whether the feature belongs in TVP and what the design should look like from an architectural perspective.
Conclusion & What’s Next
TVP could have become just one more competing platform in our company. There were a couple of reasons it became the TVP instead: 1) we had the sponsorship of our senior leadership, and 2) it is open for contribution. At this stage, we have released version 1.0, and we are planning what’s next. There are a few things on our roadmap specifically addressing service-to-service integration patterns as well as data federation of TVP services. These features will enable us to modularise and extract domains from our monoliths into microservices.