Machine Learning Operations

Wayfair MLOps Acceleration: Powering Great Experiences at Scale

Machine Learning (ML) is an integral part of everything we do at Wayfair to support each of the 30 million active customers on our website. It enables us to make context-aware, real-time, and intelligent decisions across every aspect of our business. We use ML models to forecast product demand across the globe, ensuring our customers can quickly access what they’re looking for. Natural language processing (NLP) models analyze chat messages on our website so customers can be redirected to the appropriate customer support team as quickly as possible, without having to wait for a human assistant to become available. Our commitment to MLOps acceleration is evident across our operations.

Integrating Machine Learning at Wayfair

ML is integral to Wayfair’s operations, improving customer experiences and informing business decisions.

Leveraging ML for Demand Forecasting and Customer Support

ML models forecast product demand and analyze chat messages so that customers reach the right support team more efficiently.

Commitment to MLOps Acceleration

Wayfair is committed to accelerating Machine Learning Operations (MLOps) across the business.

Technology Strategy for Competitiveness

ML is central to Wayfair’s technology strategy for staying competitive, supporting many of its eCommerce engineering processes.

Evolution of Infrastructure and Tools

Wayfair’s journey has included migrating to Google Cloud, adopting tools such as Apache Airflow, and working through challenges inherited from legacy infrastructure.

Adoption of Vertex AI for ML

Wayfair adopted Google Cloud’s Vertex AI in 2021, showing its readiness to embrace new solutions that advance its ML work.

One AI Platform with all the ML Tools Needed

Vertex AI brings the ML tools we need together in one platform, which enables us to build software that runs on any infrastructure. We liked how the tool looks, feels, and operates. Within six months, we moved from manually configuring our infrastructure, to a proof of concept (POC), to a first production release.

Next on our priority list was using Vertex AI Feature Store to serve and use ML features in real time, or in batch, with a single line of code. Vertex AI Feature Store fully manages and scales its underlying infrastructure, such as storage and compute resources. That means our data scientists can now focus on feature computation logic instead of worrying about how to store features for offline and online use.
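The distinction between online and offline use can be shown with a minimal in-memory sketch. The class `ToyFeatureStore` and its methods are hypothetical and are not the Vertex AI Feature Store API. They only illustrate the two read paths a feature store typically offers:

- an online read that returns the latest value for low-latency serving;
- an offline, point-in-time read that returns the value known at a past timestamp, so training data does not leak future information.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Hypothetical in-memory feature store (not the Vertex AI Feature Store API)."""

    def __init__(self):
        # entity_id -> feature name -> list of (timestamp, value), kept sorted by timestamp
        self._history = defaultdict(lambda: defaultdict(list))

    def ingest(self, entity_id, feature, timestamp, value):
        series = self._history[entity_id][feature]
        series.append((timestamp, value))
        series.sort(key=lambda tv: tv[0])

    def read_online(self, entity_id, features):
        """Return the latest value of each feature, as a serving path would."""
        result = {}
        for name in features:
            series = self._history[entity_id][name]
            result[name] = series[-1][1] if series else None
        return result

    def read_offline(self, entity_id, features, as_of):
        """Return each feature's value as it was known at `as_of` (point-in-time lookup)."""
        result = {}
        for name in features:
            series = self._history[entity_id][name]
            timestamps = [t for t, _ in series]
            idx = bisect_right(timestamps, as_of)  # observations at or before as_of
            result[name] = series[idx - 1][1] if idx > 0 else None
        return result


# Usage: one hypothetical customer feature observed at two timestamps.
store = ToyFeatureStore()
store.ingest("customer_42", "orders_30d", timestamp=1, value=2)
store.ingest("customer_42", "orders_30d", timestamp=5, value=3)
print(store.read_online("customer_42", ["orders_30d"]))            # {'orders_30d': 3}
print(store.read_offline("customer_42", ["orders_30d"], as_of=4))  # {'orders_30d': 2}
```

A managed feature store does this bookkeeping at scale, which is what lets data scientists concentrate on computing the features themselves.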

While our data scientists are proficient in building and training models, they are less comfortable setting up infrastructure and bringing models to production. So when we embarked on an MLOps transformation, it was important to let data scientists use the platform as seamlessly as possible, without needing to know all about its underlying infrastructure. To that end, our goal was to build an abstraction on top of Vertex AI. Our simple Python-based library interacts with Vertex AI Pipelines and Vertex AI Feature Store, so a typical data scientist can use this setup without knowing how Vertex AI works in the backend. That’s the vision we’re marching towards, and we’ve already started to notice its benefits.
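The post doesn’t describe the library’s interface, so the sketch below only illustrates the general pattern. A thin facade (`MLPlatform`, hypothetical) exposes task-level methods and delegates to swappable backends. The in-memory fakes stand in for clients that would call Vertex AI Pipelines and Feature Store. Every class and method name here is an assumption, not Wayfair’s code or a Google Cloud API. It requires Python 3.9+ for the `list[str]` annotations.

```python
from typing import Protocol

# Hypothetical names throughout; not Wayfair's library or a Google Cloud API.


class PipelineBackend(Protocol):
    """Anything that can launch a named pipeline run and return a job ID."""

    def submit(self, pipeline_name: str, parameters: dict) -> str: ...


class FeatureBackend(Protocol):
    """Anything that can return feature values for a list of entities."""

    def fetch(self, entity_ids: list[str], features: list[str]) -> dict: ...


class MLPlatform:
    """Facade: data scientists call these methods and never touch a backend directly."""

    def __init__(self, pipelines: PipelineBackend, features: FeatureBackend):
        self._pipelines = pipelines
        self._features = features

    def train(self, model_name: str, **hyperparameters) -> str:
        # The pipeline naming convention lives here, so callers don't need to know it.
        return self._pipelines.submit(f"train-{model_name}", dict(hyperparameters))

    def get_features(self, entity_ids: list[str], features: list[str]) -> dict:
        return self._features.fetch(entity_ids, features)


# In-memory stand-ins; a production version would wrap the managed services instead.
class FakePipelineBackend:
    def __init__(self):
        self.submitted = []

    def submit(self, pipeline_name, parameters):
        self.submitted.append((pipeline_name, parameters))
        return f"job-{len(self.submitted)}"


class FakeFeatureBackend:
    def __init__(self, table):
        self._table = table  # entity_id -> {feature name: value}

    def fetch(self, entity_ids, features):
        return {e: {f: self._table.get(e, {}).get(f) for f in features} for e in entity_ids}


pipelines = FakePipelineBackend()
platform = MLPlatform(pipelines, FakeFeatureBackend({"sku_1": {"price": 199.0}}))
print(platform.train("demand_forecast", learning_rate=0.1))  # job-1
print(platform.get_features(["sku_1"], ["price"]))           # {'sku_1': {'price': 199.0}}
```

Because callers depend only on the facade, the backends can change, or be faked in tests, without touching data scientists’ code.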

Reducing Hyperparameter Tuning From Two Weeks to Under One Hour

While we enjoy using open-source tools such as Apache Airflow, the way we were using Airflow was creating issues for our data scientists. We also frequently ran into infrastructure challenges carried over from our legacy technologies, such as support issues and failed jobs. So we built a CI/CD pipeline using Vertex AI Pipelines, which is based on Kubeflow, to remove the complexity of model maintenance.
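Vertex AI Pipelines runs workflows defined with the Kubeflow Pipelines SDK. In those workflows, each step is a containerized component and execution order follows the dependencies between steps. The dependency-free sketch below illustrates only that ordering idea, using Python’s standard `graphlib` (3.9+). It is not the Kubeflow or Vertex AI API, and the step names and deployment threshold are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step once all of its upstream steps have finished.

    steps: step name -> callable that receives {upstream name: upstream output}
    dependencies: step name -> set of upstream step names
    """
    graph = {name: set(dependencies.get(name, ())) for name in steps}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: outputs[dep] for dep in graph[name]}
        outputs[name] = steps[name](upstream)
    return outputs


# Hypothetical steps: load data, validate it, "train" a mean predictor, evaluate, gate deployment.
steps = {
    "load": lambda up: [1.0, 2.0, 3.0, 4.0, -1.0],
    "validate": lambda up: [x for x in up["load"] if x >= 0],  # drop invalid rows
    "train": lambda up: sum(up["validate"]) / len(up["validate"]),
    "evaluate": lambda up: sum(abs(x - up["train"]) for x in up["validate"]) / len(up["validate"]),
    "deploy": lambda up: up["evaluate"] <= 1.5,  # ship only if mean absolute error is acceptable
}
dependencies = {
    "validate": {"load"},
    "train": {"validate"},
    "evaluate": {"train", "validate"},
    "deploy": {"evaluate"},
}
results = run_pipeline(steps, dependencies)
print(results["train"], results["evaluate"], results["deploy"])  # 2.5 1.0 True
```

Declaring steps and their dependencies, rather than scripting the order by hand, is what makes such a pipeline easy to test and rerun in CI/CD.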

Now everything is documented, scalable, easy to test, and organized according to best practices. This encourages people to adopt a new, standardized way of working, which brings benefits of its own. Hyperparameter tuning, an essential part of controlling how a machine learning model behaves, is a good example.

In machine learning, hyperparameter tuning (or optimization) is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value controls the learning process and is set before training begins, such as a learning rate. Different models have different hyperparameters, and a good choice of values can substantially improve how well an algorithm performs.
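Grid search is one simple tuning strategy: train once per combination of hyperparameters and keep the combination with the best validation score. In this toy sketch, the "model" is a one-feature linear regression trained by gradient descent. The tuned hyperparameters are the learning rate and number of epochs, both fixed before training starts. The data, grid, and function names are hypothetical. Managed services such as Vertex AI hyperparameter tuning can run trials like these in parallel and offer search strategies beyond an exhaustive grid.

```python
import itertools
import math


def train_linear(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)


def grid_search(train, val, grid):
    """Try every hyperparameter combination; keep the one with the lowest validation MSE."""
    best_loss, best_params = math.inf, None
    for lr, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
        w, b = train_linear(*train, learning_rate=lr, epochs=epochs)
        loss = mse(*val, w, b)
        if math.isfinite(loss) and loss < best_loss:  # runs that diverged are skipped
            best_loss, best_params = loss, {"learning_rate": lr, "epochs": epochs}
    return best_loss, best_params


# Toy data drawn from y = 3x + 1; the grid values are illustrative assumptions.
train_x = [i / 10 for i in range(10)]
val_x = [0.05 + i / 10 for i in range(10)]
train = (train_x, [3 * x + 1 for x in train_x])
val = (val_x, [3 * x + 1 for x in val_x])
grid = {"learning_rate": [0.001, 0.1, 0.5, 5.0], "epochs": [10, 200]}
loss, params = grid_search(train, val, grid)
print(params)  # {'learning_rate': 0.5, 'epochs': 200}
```

The grid illustrates the trade-off. A learning rate that is too small barely moves the weights in the allotted epochs, while one that is too large (5.0 here) diverges. Each combination needs a full training run, so the number of trials grows multiplicatively with every hyperparameter added.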

On our legacy infrastructure, hyperparameter tuning in Python took a data scientist two weeks on average. With more than 100 data scientists at Wayfair, standardizing this practice and making it more efficient was a priority for us.

With a standardized way of working on Vertex AI, all our data scientists can now use our code to access CI/CD, monitoring, and analytics out of the box, and conduct hyperparameter tuning in just one day.

Powering Great Customer Experiences with More ML-Based Functionalities

Next, we’re working on a Docker container template that will enable data scientists to deploy a running ‘hello world’ Vertex AI pipeline. On average, it can take a data science team more than two months to get an ML model fully operational. With Vertex AI, we expect to cut that time down to two weeks. Like most of what we do, this will have a direct impact on our customer experience.

It’s important to remember that some ML models are more complex than others. A customer-facing model, for example, must be accurate, and its results must appear on-screen extremely quickly while customers browse the website. Such models have the highest requirements and are the most difficult to publish to production.

We’re actively building tools that streamline and enable continuous monitoring of our data and models in production, and we want to integrate them with Vertex AI. We also believe AutoML can help us build models faster, so our goal is to evaluate these services on Google Cloud and then find ways to use them internally.
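Continuous data monitoring often starts by comparing a feature’s distribution in production against the distribution seen at training time. The sketch below uses the population stability index (PSI), one common drift statistic. The choice of PSI, the bin edges, and the 0.2 alert level (a widely used rule of thumb) are assumptions for illustration. They are not Wayfair’s tooling, nor necessarily what Vertex AI Model Monitoring computes.

```python
import math
from bisect import bisect_right


def bin_proportions(values, edges, floor=1e-6):
    """Share of values in each bin; out-of-range values are clamped into the edge bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = min(max(bisect_right(edges, v) - 1, 0), n_bins - 1)
        counts[idx] += 1
    # A small floor keeps log() defined when a bin is empty.
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(baseline, production, edges):
    """PSI = sum over bins of (p - q) * ln(p / q); q = baseline share, p = production share."""
    q = bin_proportions(baseline, edges)
    p = bin_proportions(production, edges)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical feature values; edges and the 0.2 threshold are illustrative assumptions.
edges = [0, 25, 50, 75, 100]
baseline = list(range(100))     # values seen at training time
stable = list(range(100))       # production values with the same distribution
shifted = list(range(50, 150))  # production values that have drifted upward
print(population_stability_index(baseline, stable, edges))         # 0.0
print(population_stability_index(baseline, shifted, edges) > 0.2)  # True
```

In a monitoring job, a check like this would run on a schedule for each feature and raise an alert when the score crosses the chosen threshold.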

And it’s already clear that the new ways of working enabled by Vertex AI not only make our data scientists’ lives easier but also have a ripple effect on the experience of the millions of shoppers who visit our website daily. They’re all experiencing better technology and more functionality, faster.

For a deeper dive into how our data scientists are using Vertex AI, look for part two of this blog, coming soon.
