Deep-Dive: Lessons Learned on Product Experimentation

One of my favorite people in all of history is the American inventor and businessman Thomas Edison. In fact, I admire his entrepreneurial zeal and inventiveness so much that my wife and I named our firstborn child after him! (Hey, Edison!)

It was Thomas Edison who said the following:

“When there’s no experimenting there’s no progress. Stop experimenting and you go backward. If anything goes wrong, experiment until you get to the very bottom of the trouble.”

Experimenting is a critical part of the invention process – and it’s just as important within product management. With the boundaries of what’s possible constantly being pushed in the fast-paced world of tech and software, product experimentation stands as a beacon, guiding companies through the fog of uncertainty toward breakthroughs and enhancements. It’s the silent engine powering the rapid evolution of products we’ve come to rely on, from the apps that organize our lives to the platforms that connect us across continents. Yet, for all its significance, the path of experimentation is strewn with trials and tribulations, demanding not just creativity but a disciplined approach to navigating the unknown.

Over the years, we’ve had quite a few of today’s top product minds talk about experimentation at our conferences and within our community. This essay aims to compile some of the best insights that have been shared within Product Collective and to be your compass when it comes to product experimentation. What makes product experiments especially valuable in today’s tech landscape is not just the successes that are celebrated but also the failures, which should be equally embraced and which we’ll dig into as well. In software and tech, where the only constant is change, the ability to learn from missteps and pivot accordingly is just as crucial as the drive to innovate.

Finally, as we stand on the cusp of a new era shaped by artificial intelligence and machine learning, the concept of product experimentation is evolving as I type. The future of product experimentation is not just about testing faster or more frequently but about testing smarter. It’s about leveraging the vast oceans of data at our disposal to make informed decisions, reduce the inherent risks, and, ultimately, deliver experiences that resonate deeply with users.

So, whether you’re a seasoned product manager, a fledgling entrepreneur, or simply a tech enthusiast eager to understand the gears that drive innovation, this journey through the lessons learned in the trenches of product experimentation is meant to illuminate, inspire, and perhaps even transform the way you approach the art and science of creating something new.

Let’s dive in…

A brief history

It may be useful for extra context to rewind way back and talk through the history of product experimentation – and in many ways, the history of software product experimentation is baked into the annals of broader computing history. This piece of history won’t take us to Silicon Valley or even Cambridge – but, instead, will lead us to the industrious heart of Manchester in the United Kingdom, where a small-scale experimental machine, affectionately dubbed ‘Baby,’ was born. Far from being a mere footnote, Baby represents a seismic leap in computing, laying down the foundational principles that underpin even the most advanced machines of our contemporary digital landscape.

According to the Science and Industry Museum, it was in this post-war calm of 1948 that Manchester became the crucible for a revolution that would redefine the boundaries of computing. Sir Freddie Williams, along with his adept comrades Tom Kilburn and Geoff Tootill, embarked on a quest to transcend the limitations of their era’s computing behemoths. These machines, though formidable, were shackled by the need for physical reprogramming to undertake new tasks—a laborious endeavor that stifled their versatility.

The trio’s ambition was audacious yet simple: to forge a computer endowed with memory, capable of seamlessly transitioning between tasks as commanded. Their ingenuity birthed the Williams-Kilburn tube, an inventive application of the cathode ray tube, which, through a ballet of electrons and phosphor, harbored the potential to encode operations into memory.

And so, Baby was conceived – not as a towering colossus but as a testament to minimalist ingenuity. Its modest frame, spanning 17 feet and weighing just shy of a ton, was a marvel of efficiency, especially when juxtaposed against its American contemporary, the ENIAC, which dwarfed Baby in both size and mass.

From Baby, the Manchester Mark 1 was then developed, which was instrumental in the transition from the theoretical concepts of computing, as envisaged by pioneers like Alan Turing, to the practical realization of these ideas in working machines. The machine’s ability to execute a broad set of instructions, including hardware-based multiplication, and its use of a two-level storage system, with both random-access memory and a magnetic drum for secondary storage, were groundbreaking. These features not only demonstrated the feasibility of Turing’s theoretical work but also provided a platform for further software experiments, influencing the design and development of subsequent generations of computers. The success and lessons learned from Mark 1 underscored the importance of software in the computing ecosystem, heralding a new era where software experimentation became integral to technological advancement and innovation.

An overview of various experiments

Navigating the intricate maze of product development, product teams wield experimentation as their compass, guiding their steps towards innovation and continual enhancement of the user experience. The landscape of product experimentation is vast and varied, each technique offering its unique lens through which we can scrutinize and refine our products. We’ll cover some of the most common types of experiments, which the team at UserPilot did a great job summarizing in this post. This is, by no means, an exhaustive list.

1. A/B Testing: The Duel of Variants

Perhaps the most renowned of all, A/B testing pits two variants against each other in a controlled environment. This method allows teams to make direct comparisons, whether it’s between two landing page designs, onboarding flows, or call-to-action buttons, to discern which variant resonates more effectively with users. By randomly assigning users to either Group A or Group B, product teams can analyze the performance based on conversion rates, user engagement, or other relevant metrics, thereby making data-driven decisions about which version to implement.

A/B testing is most valuable when you need to make data-backed decisions that drive quantifiable results, particularly in improving customer experience (UI/UX) and optimizing marketing campaigns. For instance, you might want to test different UI elements to minimize user friction or evaluate various marketing messages to see which one better fuels conversions.
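To make the mechanics concrete, here is a minimal sketch of the two steps described above: splitting users between Group A and Group B, and then comparing conversion rates. The function names, the hash-based split, and the choice of a two-proportion z-test are my own assumptions for illustration; your experimentation platform may well do this differently.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically place a user in group A or B (hypothetical helper).

    Hashing the experiment name together with the user id gives a stable,
    roughly 50/50 split, so a returning user always sees the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 120 of 1,000 users converted on A, 150 of 1,000 on B.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the gap between the variants is unlikely to be noise alone, but the threshold you act on is a judgment call that is best made before the experiment starts.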

2. Multivariate Testing: The Symphony of Variables

When the question isn’t just whether A is better than B but how various elements interact and influence user behavior, multivariate testing comes into play. This method extends beyond the simplicity of A/B testing by altering multiple variables simultaneously to understand their combined effects. It’s akin to conducting an orchestra, where each instrument’s contribution to the harmony is assessed. Multivariate testing is particularly valuable when optimizing complex pages or workflows, allowing teams to pinpoint the most effective combination of elements.

Multivariate testing is most useful when you need to understand how different elements of a webpage or a digital product work together to influence user behavior and conversion rates. Unlike A/B testing, which tests one variable at a time, multivariate testing allows for the examination of multiple variables simultaneously, providing insights into the best combination of elements that drive the desired outcome.
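One way to picture “multiple variables simultaneously” is to list every combination a full multivariate test would need to serve. The sketch below does that for three hypothetical page elements; the element names and options are invented for the example.

```python
from itertools import product

# Hypothetical page elements and the options being tested for each.
factors = {
    "headline": ["Save time", "Save money"],
    "cta_color": ["green", "blue"],
    "hero_image": ["photo", "illustration"],
}


def build_variants(factors):
    """Return every combination of options as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


variants = build_variants(factors)
for index, variant in enumerate(variants, start=1):
    print(index, variant)
```

Three elements with two options each already produce 2 × 2 × 2 = 8 variants, and every variant needs enough traffic to measure. That multiplication is why multivariate tests tend to suit higher-traffic pages.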

3. Fake Door Testing: Peeking into Potential Futures

Fake door testing, or “pretend” features, offers a glimpse into user interest without the full commitment of development resources. By introducing a feature in the UI that isn’t fully implemented, teams can measure engagement and interest based on user interactions. It’s like setting a stage and watching to see if the audience leans in, captivated by the promise of what’s behind the “fake door.” This low-fidelity, high-insight method is a powerful way to validate demand for new ideas before diving into the deep end of development.

Fake door testing is particularly useful when you want to gauge interest in a new product, feature, or service without actually building it first – whether the goal is to gauge overall demand, test pricing, or explore other factors.

4. Funnel Testing: Streamlining the User Journey

Every product has a journey, a path that users travel from their initial encounter to achieving their goals within the application. Funnel testing dissects this journey into distinct stages, identifying points where users falter or disengage. By methodically analyzing each segment of the funnel, product teams can remove obstacles, enhance flow, and ensure that the user’s journey is not just a path but a smooth, inviting road that leads to satisfaction and success.

Funnel testing is particularly useful in understanding user behaviors and identifying obstacles throughout the customer journey, from awareness and interest through to the action stage of making a purchase or subscribing. It can also help identify problematic areas in your customer journey, such as confusing steps that cause users to abandon the process.
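As a rough sketch of what “dissecting the journey into stages” can look like in practice, the snippet below takes the number of users who reached each stage and reports the conversion and drop-off between consecutive stages. The stage names and counts are made up, and the helper names are hypothetical.

```python
def funnel_report(stages):
    """stages: ordered list of (stage_name, user_count) pairs.

    Returns one (step_label, conversion_rate, drop_off_rate) tuple
    for each pair of consecutive stages.
    """
    report = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rate = count / prev_count if prev_count else 0.0
        report.append((f"{prev_name} -> {name}", rate, 1 - rate))
    return report


def worst_step(stages):
    """Return the label of the step where the largest share of users drop off."""
    return max(funnel_report(stages), key=lambda row: row[2])[0]


# Invented numbers for illustration.
stages = [
    ("visited", 1000),
    ("signed_up", 400),
    ("activated", 300),
    ("purchased", 60),
]

for label, rate, drop_off in funnel_report(stages):
    print(f"{label}: {rate:.0%} continue, {drop_off:.0%} drop off")
print("Biggest leak:", worst_step(stages))
```

The step with the largest drop-off is usually a sensible place to aim the next experiment, though the numbers alone won’t tell you why users are leaving.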

5. Session Replays and Click Tracking: The Microscope on User Interaction

Sometimes, understanding user behavior requires a closer look, a microscopic view of interactions within the product. Session replays serve this purpose by recording user actions and offering a playback of their experience to identify usability issues, confusion, or moments of delight. Complementing this, click tracking through heatmaps provides a visual representation of engagement, highlighting areas of high activity and zones that might be overlooked. Together, these methods offer a granular view of the user experience, unearthing insights that might otherwise remain buried.

Session replays and click tracking are most useful for understanding user behavior, including revealing moments of user frustration such as rage clicks or dead clicks, and showing when and where users face challenges.
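Rage clicks are a good example of a signal you can pull out of raw click data. The sketch below flags any element that was clicked several times in quick succession. The thresholds (three clicks within one second) and the event format are assumptions made for this example, not a standard definition, and session replay tools each apply their own rules.

```python
def find_rage_clicks(clicks, min_clicks=3, window_seconds=1.0):
    """Flag elements that received a rapid burst of clicks.

    clicks: list of (timestamp_in_seconds, element_id) pairs.
    Returns the set of element ids that got at least `min_clicks`
    clicks within any `window_seconds` span.
    """
    times_by_element = {}
    for timestamp, element in clicks:
        times_by_element.setdefault(element, []).append(timestamp)

    flagged = set()
    for element, times in times_by_element.items():
        times.sort()
        # Slide over each run of `min_clicks` consecutive clicks.
        for start in range(len(times) - min_clicks + 1):
            if times[start + min_clicks - 1] - times[start] <= window_seconds:
                flagged.add(element)
                break
    return flagged


# Invented session: three fast clicks on "save", two slow clicks on "menu".
session = [
    (0.0, "save"),
    (0.2, "save"),
    (0.5, "save"),
    (3.0, "menu"),
    (9.0, "menu"),
]
print(find_rage_clicks(session))
```

A flagged element is only a prompt to go and watch the replay; the burst of clicks shows that something felt broken, not what the cause was.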

Each of these experimentation techniques offers a unique vantage point from which product teams can observe, learn, and iterate. By adeptly applying these methods, teams can ensure that every decision is informed, every change is deliberate, and every innovation brings them closer to the heart of their users’ needs.

The Importance of Setting the Right Metrics

As I’ve written previously, the metrics we lean on are more than just numbers – they’re actionable data points that help us know where to steer. In a fireside chat I hosted with Kris McKee of Optimizely, she advocated for being thoughtful about the metrics you develop before even starting an experiment. No experiment should go live unless it’s being measured. By being thoughtful about the metrics you use to judge the success of your experiment, you can steer clear of the treacherous “HiPPO effect,” where decisions are swayed by the highest-paid person’s opinion rather than solid, actionable insights. This shift towards a more democratic decision-making process grounded in data marks a departure from tradition, empowering teams to make choices rooted in reality rather than hierarchy.

In his talk at INDUSTRY: The Product Conference, Jon Noronha recounted the case of Bing. Initially, Bing’s focus on maximizing queries per user was logical on the surface but inadvertently led the team down a misguided path, prioritizing quantity over user satisfaction. This realization sparked a significant recalibration towards metrics that genuinely reflected user engagement and satisfaction, such as queries per session and sessions per user. This wasn’t just a change in measurement—it was a strategic pivot that placed user experience at the forefront, contributing to Bing’s transformation into a profitable entity.
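One way to see how a headline metric like queries per user can mislead is to notice that it is simply queries per session multiplied by sessions per user. The sketch below splits it into those two parts. The numbers are invented purely for illustration and are not Bing’s data.

```python
def decompose(queries, sessions, users):
    """Split queries per user into its two component metrics."""
    return {
        "queries_per_user": queries / users,
        "queries_per_session": queries / sessions,
        "sessions_per_user": sessions / users,
    }


# Invented numbers for illustration only; these are not Bing's data.
before = decompose(queries=600, sessions=300, users=100)
after = decompose(queries=750, sessions=250, users=100)

for name in before:
    print(f"{name}: {before[name]:.1f} -> {after[name]:.1f}")
```

In this made-up scenario, queries per user rises from 6 to 7.5 even though people come back less often (3 sessions down to 2.5) and need more searches per visit (2 up to 3). The top-line number looks like a win while both underlying signals point the other way.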

The insights from McKee and Noronha converge on a fundamental principle: the critical importance of aligning metrics with both user aspirations and business goals. Striking this balance is akin to walking a tightrope, where metrics must encapsulate immediate business value while resonating with the underlying needs and behaviors of the users. Misaligned metrics can lead to a misdirection of efforts, squandered opportunities, and innovation that misses the mark. However, when metrics are chosen with care, they can lead us down the right path.

Good metrics should inspire action and measure progress, meaning they should clearly indicate areas for improvement and drive decision-making processes. For instance, seeing a drop in conversion rates should prompt questions about potential causes, such as recent site changes or new acquisition channels, and lead to immediate action to resolve any issues.

It’s essential to use a combination of leading and lagging indicators. Relying solely on lagging indicators may prevent you from making timely adjustments for growth, while focusing only on leading indicators might not clearly demonstrate which activities lead to goal achievement. Regularly revisiting and adjusting your KPIs in response to changing circumstances or insights gained from ongoing analysis will keep your strategy aligned with your business objectives.

Embracing Failure

When it comes to experimentation, failure isn’t just a possibility — it’s a vital step on the path to innovation. Kris McKee’s insights in our fireside chat illuminate the inherent value in experiments that don’t go as planned, reinforcing the idea that every misstep is a treasure trove of learning. This perspective reshapes our relationship with failure, transforming it from a dreaded outcome to a catalyst for growth.

In his talk at INDUSTRY: The Product Conference, Andres Glusman of DoWhatWorks reiterated this point as well – going as far as to say that a high failure rate in experimentation isn’t just normal; it’s a sign of a team pushing the boundaries. The notion that only a fraction of experiments will move the needle doesn’t signal defeat but rather the relentless pursuit of excellence.

Embracing failure and iteration is akin to adopting the scientific method and applying it to the world of product development. Each hypothesis tested and every assumption challenged brings us closer to the underlying truths that govern user behavior and preferences. Claire Vo, CPO at LaunchDarkly, contends that applying the scientific method and hypothesis testing to product development isn’t just a strategy; it’s a mindset. It encourages teams to view every experiment, successful or not, as a step closer to a more profound understanding of their product and its place in the users’ lives.

In essence, the path to creating products that resonate deeply with users is paved with trials, errors, and relentless curiosity. Embracing failure and iteration isn’t just about being resilient in the face of setbacks; it’s about fostering a culture of exploration and learning. Failure really shouldn’t be seen as a setback in the first place. Each experiment, regardless of its outcome, brings us closer to creating experiences that delight, engage, and truly meet the needs of our users.

Collaborative Experimentation

The trajectory from solitary guesswork to collaborative experimentation is a transformative journey that reshapes not just our approach to product development but the very fabric of innovation itself. In the early stages of his career, Andres Glusman, like many product managers and innovators, relied heavily on a blend of intuition and educated guesses to steer the direction of product development. This approach, while grounded in experience and a keen understanding of the market, inherently carries a degree of uncertainty. Decisions made in this manner are akin to navigating through a foggy landscape, where each step forward is tentative, and the path ahead is unclear.

As Glusman progressed in his career, he experienced a pivotal shift in his approach to product experimentation and development. He moved from relying solely on internal conjectures and team brainstorming sessions to embracing a broader, more collaborative form of insight gathering. This evolution marks a significant transformation in his methodology, akin to the transition from navigating by the stars to using a compass and map.

This collaborative approach to insight gathering involved engaging with a community of peers, including other product managers, industry experts, and even competitors. By sharing experiences, successes, and failures within this community, Glusman and his peers were able to pool their collective knowledge, thereby illuminating the path ahead with a brighter, more reliable light.

This shift towards collaborative insight gathering allowed for a more data-driven and evidence-based approach to product development. It reduced the reliance on guesswork and increased the chances of success for new features and innovations. Each shared experience, whether a triumph or a setback, added to the collective wisdom of the group, making each subsequent decision more informed and grounded in reality.

The Future of Experimentation

As we look ahead to the future of product experimentation, especially as the contours of an AI-enhanced landscape emerge, we can expect things to evolve quickly. Artificial intelligence and machine learning will not merely augment our experimentation processes but likely redefine them.

As Glusman astutely observes, the true potency of AI in the realm of product experimentation hinges on the foundation upon which it is built—the quality and relevance of the data that fuels it. It’s a poignant reminder that before we leap into the arms of our AI counterparts, we must first ensure that the insights we provide them are not just voluminous but valuable.

AI, with its ability to sift through vast datasets and identify nuanced correlations, holds the promise of elevating our experimentation strategies from the realm of educated guesses to predictive precision.

Yet, amidst this enthusiasm, we must tread with discernment. The allure of AI’s capabilities must not blind us to the importance of the human element—the creative spark that ignites groundbreaking ideas and the empathetic understanding that ensures our innovations resonate with human needs and aspirations. As we integrate AI into our experimentation ecosystems, we must strive for a symbiosis where AI amplifies our strengths and compensates for our limitations, all while keeping the user experience at the forefront.

Moreover, the advent of AI in product experimentation underscores the imperative for high-quality data. As the mantra goes, “Garbage in, garbage out.” AI-driven insights are only as reliable as the data they analyze – so our commitment to gathering, curating, and sharing robust datasets becomes even more critical. This dedication to data quality not only enhances the accuracy of AI’s predictions but also fosters an environment of trust and reliability in the insights we derive.

Summing it all up

From setting the right metrics to embracing the inevitability of failure – and from harnessing the power of community insights to preparing for the AI-driven future – the path to impactful product development is multifaceted and rich with opportunity. Setting the right metrics emerges not just as a task but as an art form, requiring a nuanced understanding of both user goals and business objectives.

Embracing failure is not a concession but a strategy—a means to unearth deeper understanding and refine our products with resilience and agility. It’s through the crucible of failed experiments that the most profound insights often emerge, shaping our products into better versions that resonate more deeply with our users.

The power of community and collaborative experimentation serves as a testament to the collective wisdom that surrounds us. By sharing our successes and failures, we not only accelerate our learning but also contribute to an ecosystem of knowledge that uplifts the entire field of product development.

As we stand on the brink of an AI-enhanced future, it’s important to approach this new frontier with a blend of enthusiasm and caution. The promise of AI in product experimentation is vast, but its true potential can only be realized on the bedrock of high-quality, meaningful data—a reminder that the human element, with its creativity and empathy, remains irreplaceable.

A big thank you to all of the product people I respect very much whose insights helped inform this essay – including Kris McKee, Jon Noronha, Claire Vo, Andres Glusman, and the team at UserPilot. Keep those experiments running, as your insights are certainly valuable to the entire product community!

Mike Belsito

About the author

Mike Belsito is a startup product and business developer who loves creating something from nothing. Mike is the Co-Founder of Product Collective which organizes INDUSTRY, one of the largest product management summits anywhere in the world. For his leadership at Product Collective, Mike was named one of the Top 40 influencers in the field of Product Management. Mike also serves as a Faculty member of Case Western Reserve University in the department of Design and Innovation, and is Co-Host of one of the top startup podcasts online, Rocketship.FM. Prior to Product Collective, Mike spent 12 years in startup companies as an early employee, Co-Founder, and Executive. Mike's businesses and products have been featured in national media outlets such as the New York Times, The Atlantic, CNN, NPR, and elsewhere. Mike is also the Author of Startup Seed Funding for the Rest of us, one of the top startup books on Amazon.