In an increasingly data-driven world, innovation doesn’t have to rely on gut instincts or hunches. It can be guided by a systematic, test-and-learn approach: experimentation.
The culture of experimentation has helped tech companies and traditional businesses alike to refine user experiences, optimize internal processes, and unlock new revenue streams. Inspired by Airbnb's transparent and rigorous experimentation ethos, this article explores five compelling examples from various companies where experimentation didn't just guide improvement—it sparked transformation.
1. Booking.com's relentless A/B testing culture
Booking.com is often considered a gold standard when it comes to A/B testing. The company runs more than 1,000 concurrent experiments on any given day, testing everything from button color to algorithm tweaks.
One standout example is an experiment in how Booking.com displays hotel prices. The company ran an A/B test comparing a variant that showed the final price, taxes included, upfront against its usual pricing display. Surprisingly, showing the final price upfront slightly decreased short-term conversion but significantly increased customer satisfaction and trust. In the long term, this led to higher retention and more loyal users.
What made Booking.com successful was not just the scale of their tests but their cultural commitment to experimentation. Engineers, designers, and product managers are empowered to run tests independently. Data is democratized, and results are shared openly.
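To make the mechanics of a conversion A/B test concrete, here is a minimal sketch of a two-proportion z-test in Python. It is one common way to compare conversion rates, not a description of Booking.com's actual tooling, and the visitor and conversion counts are hypothetical numbers chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the variant.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the variant converts slightly worse than the control.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=460, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small short-term drop like the one in this made-up data may not even be statistically distinguishable from noise, which is one reason teams pair conversion metrics with longer-term measures such as retention.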
2. LinkedIn's newsfeed algorithm revamp
LinkedIn has invested heavily in machine learning and user behavior analytics to drive engagement. In 2016, the company ran a large-scale experiment to improve the relevance of the posts users see in their feeds.
The hypothesis was simple: posts from closer connections and timely updates would increase engagement. But instead of deploying a new algorithm wholesale, LinkedIn conducted a series of A/B tests using small changes to the ranking logic, testing things like recency bias, relationship strength, and post type prioritization.
One key experiment altered how LinkedIn weighted comments and likes. They discovered that users preferred posts with a high number of thoughtful comments over viral posts with lots of likes but no depth. This insight helped them tune the algorithm toward quality interactions, resulting in a 15% increase in time spent on the feed and a 12% increase in sessions per week.
3. Netflix's thumbnail testing engine
Netflix has mastered the art of micro-experiments, especially in the realm of content discovery. One of their most well-known experimentation practices revolves around thumbnails (also called artwork or cover images).
Each movie or show on Netflix can have multiple thumbnails, and the platform dynamically serves different versions to users based on past behavior, genre preferences, and engagement patterns. Through controlled experiments, Netflix found that the right image can increase click-through rates by 20-30%.
A famous example involves the show "Unbreakable Kimmy Schmidt." When Netflix tested a thumbnail with a goofy facial expression versus one with a group cast shot, the version with the exaggerated face drove significantly more engagement. This experiment validated the theory that emotional cues in faces catch users' attention faster.
"People are better at judging something in relation to something else than evaluating it in absolute terms."
In other words, we tend to understand the value, quality, or impact of something more clearly when we compare it to another option, rather than assessing it in isolation.
It’s easier for someone to decide whether a price is high or low when they see two products side by side (one for $100 and another for $200), than when they see just one product priced at $100 with nothing to compare it to.
4. Microsoft Bing's ranking algorithm tuning
Microsoft's Bing search engine may not dominate market share like Google, but it has invested deeply in experimentation to stay competitive. One noteworthy experiment, described by Ronny Kohavi (former head of Microsoft’s experimentation platform), involved tuning Bing’s search ranking algorithm.
In 2012, Bing ran a seemingly minor experiment: changing how fast search results loaded. The test version returned results 100 milliseconds faster. Though imperceptible to most users, this tiny speed improvement led to a 0.6% increase in revenue per user.
The reason? Faster response time encouraged more searches per session, thus generating more ad impressions and clicks. This tiny tweak, validated by an A/B test, ultimately translated into millions of dollars in additional annual revenue.
5. Spotify's personalized playlist engine: Discover Weekly
Spotify’s Discover Weekly feature is a masterclass in data science experimentation. Before launching the personalized playlist in 2015, Spotify ran months of controlled tests.
The team hypothesized that a weekly playlist, generated from collaborative filtering and natural language processing, could boost user satisfaction and time spent listening. To test this, they deployed Discover Weekly to a small user group, measuring increases in engagement, playlist saves, and repeat listens.
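Exposing a feature to a small user group is often done with deterministic, hash-based bucketing, so the same user always lands in the same group. The sketch below shows that general technique; it is an assumption about how such a rollout could work, not Spotify's implementation, and the function name, experiment name, and 5% exposure are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, experiment: str, exposure: float) -> bool:
    """Deterministically decide whether a user is inside a partial rollout.

    exposure is the fraction of users who should see the feature (0.0 to 1.0).
    """
    # Salting with the experiment name keeps buckets independent across tests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < exposure


# Hypothetical pilot: roughly 5% of users see the new playlist.
users = [f"user-{i}" for i in range(10_000)]
exposed = [u for u in users if in_rollout(u, "weekly_playlist_pilot", 0.05)]
print(f"{len(exposed)} of {len(users)} users are in the pilot group")
```

Because the assignment depends only on the user ID and the experiment name, no lookup table is needed, and engagement metrics for the exposed group can be compared against everyone else.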
Early results were overwhelmingly positive: users played over 60% of the songs, and the feature rapidly became a weekly habit. Over time, Spotify improved the recommendation algorithm with continuous experiments—testing playlist length, update timing, and genre diversity.
Today, Discover Weekly is credited with cementing Spotify’s brand identity as a platform that "gets you," and it plays a vital role in user retention.
Fostering a culture of experimentation
Across these examples, one theme remains consistent: the best companies treat experimentation not just as a process, but as a cultural foundation.
Here’s how they do it:
Empowerment: Teams are trusted to run tests autonomously.
Infrastructure: Companies such as Airbnb, LinkedIn, and Microsoft invest in robust testing platforms.
Transparency: Results are shared openly, even when experiments fail.
Patience: Tests are allowed to run their full course to gather meaningful data.
Learning over winning: Failure is seen as informative, not shameful.
Companies like Airbnb have even developed dynamic p-value thresholds and run dummy experiments (A/A tests) to validate their testing systems. This level of rigor ensures that experiments are not just frequent but scientifically sound.
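The idea behind an A/A test can be illustrated with a small simulation: both groups receive the identical experience, so any "significant" result is a false positive, and a healthy testing system should flag roughly the share of runs given by its significance threshold. This is a minimal sketch under assumed parameters (a 10% conversion rate, 2,000 users per group, a fixed 0.05 threshold), not Airbnb's actual method.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # No variation at all, so no evidence of a difference.
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(runs=500, n=2000, rate=0.1, alpha=0.05, seed=0):
    """Simulate A/A tests and return the share flagged as significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both groups are drawn from the same conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(n))
        conv_b = sum(rng.random() < rate for _ in range(n))
        if p_value(conv_a, n, conv_b, n) < alpha:
            false_positives += 1
    return false_positives / runs


print(f"A/A false positive rate: {aa_false_positive_rate():.3f}")
```

If the observed rate sits far above the chosen threshold, something in the assignment or analysis pipeline is likely biased, which is exactly what A/A tests are meant to catch before real experiments rely on the system.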
When leaders embrace a culture of experimentation, they create environments where curiosity, data, and humility combine to spark innovation.