
Machine learning is no longer confined to the realm of mathematics and coding experts. Its influence permeates every corner of modern business, from optimising supply chains to personalising customer experiences. But while leaders are adept at leveraging AI as a tool, they often overlook the powerful mental models that drive these systems.
Always fascinated by the intersection of code and creativity, I built my own AI from scratch last year. I was keen to better understand what it can offer the organisations and leaders we work with. The biggest learning was not so much about its strengths and limitations. Instead, it was how its underlying principles – mathematics developed to solve complex computational problems – offer a surprisingly fresh and effective lens through which to view and solve persistent organisational challenges.
The core concepts of machine learning are not just technical building blocks for algorithms; they are elegant strategies for navigating uncertainty, finding signals in noise and fostering adaptive learning. When translated from code to the conference room, they provide a new vocabulary for leadership and a robust framework for improving decision-making, clarifying strategy and intentionally building a stronger culture.
Here are four fundamental concepts from machine learning that can help us all tackle complexity with greater clarity and impact…
1. Cross-entropy: Clarity requires contrast
In machine learning, models are trained to make predictions – is this image a cat or a dog? Is this transaction fraudulent or legitimate? A key function used to measure and correct a model’s performance is called cross-entropy loss. It heavily penalises ambiguity. If a model is uncertain, predicting a 50% chance of a cat and a 50% chance of a dog, it receives a high loss score. The model is pushed towards decisive predictions, because even a confidently wrong answer generates a strong, unambiguous error signal – and clear feedback is what enables learning.
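For readers curious about the mechanics, here is a minimal sketch of binary cross-entropy (a toy illustration, not how production frameworks implement it). It shows why a hedged 50/50 prediction scores worse than a confident, correct one:

```python
import math

def cross_entropy(predicted_prob: float, true_label: int) -> float:
    """Binary cross-entropy loss for a single prediction.

    predicted_prob: the model's predicted probability that the label is 1.
    true_label: the actual class, 0 or 1.
    """
    # Clamp the probability to avoid log(0) for fully confident predictions.
    p = min(max(predicted_prob, 1e-12), 1 - 1e-12)
    return -(true_label * math.log(p) + (1 - true_label) * math.log(1 - p))

# Suppose the true label is 1 ('cat').
hedged = cross_entropy(0.5, 1)      # a 50/50 guess: loss ~0.693
confident = cross_entropy(0.95, 1)  # a decisive, correct call: loss ~0.051
```

Note that the hedged guess is penalised regardless of the true answer – sitting on the fence guarantees a mediocre score, whereas a decisive prediction at least produces a clear signal to learn from.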
Business leaders often face a similar challenge. When confronted with ambiguity, the natural tendency is to hedge. They launch products with fuzzy positioning to appeal to everyone, deliver vague feedback to avoid discomfort or create strategies that try to be all things to all people. This approach feels safe, but like the uncertain machine learning model, it generates high organisational ‘loss’. Ambiguous messaging confuses customers, and middle-of-the-road decisions demotivate teams by failing to provide a clear direction.
Adopting a cross-entropy mindset means committing to clarity through contrast. It’s a mental model for forcing a choice between competing options and communicating that choice with conviction. When deciding on market positioning, don’t just describe what your product is; define what it is not. When articulating a new strategy, clarify which initiatives you will stop doing to create focus for the ones you are starting. This decisiveness can feel risky, as it closes doors to other possibilities. However, it provides the clarity necessary for teams to align and execute effectively.
2. Attention mechanism: Weight what matters most in the current context
Large language models like ChatGPT and advanced image recognition systems rely on a concept called ‘attention’. In a long block of text or a complex image, not all information is equally important for the task at hand. Attention mechanisms allow a model to dynamically weigh the significance of different parts of the input, focusing on specific words, pixels or data points that are most relevant in that particular context. This ability to selectively prioritise is what enables nuanced understanding and sophisticated outputs.
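The core idea can be sketched in a few lines of toy code (a simplified illustration – real models use learned projections and many ‘attention heads’). A query is compared against every item, the comparison scores are turned into weights that sum to one, and the output is a blend dominated by the most relevant items:

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Minimal dot-product attention over a handful of items."""
    d = len(query)
    # Score each key by its similarity to the query, scaled for stability.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output: a weighted mix of the values, dominated by relevant items.
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# Toy example: the query closely matches the first key, so the first
# value dominates the output.
context, weights = attention(query=[1.0, 0.0],
                             keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                             values=[[10.0], [5.0], [1.0]])
```

The weights always sum to one: attending more to one signal necessarily means attending less to the others – which is precisely the trade-off leaders face with their own focus.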
In business, leaders are bombarded with a constant stream of information: market trends, competitor moves, internal metrics, customer feedback and team concerns. A common mistake is to treat all these signals with equal importance, leading to strategic paralysis or a reactive, unfocused agenda. A recent study found that 85% of business leaders have suffered from ‘decision distress’ – questioning decisions they have made in the past year – with 72% admitting the sheer volume of data has stopped them from making any decision at all (Oracle, 2023).
In fact, not all signals should be treated equally. Great strategy means asking, “What should we be paying attention to right now?” It isn’t about seeing everything; it’s about knowing what to focus on and what to ignore.
Applying an attention mindset means shifting focus dynamically based on changing goals and context. For example, during a product launch, customer acquisition metrics might receive the most weight. During a financial downturn, attention might shift to operational efficiency. This isn’t about abandoning other metrics but about intentionally giving more influence to the signals that matter most for the immediate challenge. It also enables a form of organisational multitasking, allowing different teams to apply different ‘attention heads’ to the same set of company-wide data, focusing on what’s most relevant to their function.
3. Gradient descent: Small steps down the slope beat big leaps off cliffs
At the heart of how most machine learning models learn is an optimisation algorithm called gradient descent. The goal is to minimise a ‘loss function’, which measures how inaccurate the model’s predictions are. Instead of attempting a single, perfect leap to the lowest point of error, gradient descent works by taking small, iterative steps. At each point, it calculates the ‘gradient’ – the direction of steepest descent – and takes a tiny step in that direction. This process is repeated thousands or millions of times, gradually guiding the model toward a minimum error state.
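As a toy sketch of the idea (real optimisers add refinements like momentum and adaptive step sizes), here is gradient descent minimising a simple loss function. Each step moves a small distance downhill, and repetition does the rest:

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Take many small steps against the gradient to minimise a loss."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step in the direction of steepest descent
    return x

# Loss: f(x) = (x - 3)^2, whose gradient is 2(x - 3).
# Starting far from the minimum, small repeated steps converge to x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
```

No single step is dramatic; a tenth of the gradient at a time. Yet after a hundred iterations the result sits essentially at the optimum – the algorithmic case for incrementalism.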
The business world, in contrast, is often addicted to the idea of massive, transformational change. We pursue large-scale, multi-year ‘digital transformation’ projects, sweeping reorganisations, and big-bang product launches. An article in the Harvard Business Review recently called this ‘The Transformation Treadmill’ (Harvard Business Review, 2026). These initiatives are the equivalent of trying to leap across a valley to find the lowest point. They are expensive, risky and often fail to deliver the expected results because the business environment changes before they are complete. Often, a shift in mindset from large-scale transformation to continuous, incremental improvement is preferable: small steps down the slope are safer and more effective than big leaps off cliffs.
Adopting a gradient descent model means breaking down large goals into small, testable experiments. Instead of a complete website overhaul, you might run dozens of A/B tests on the existing site. Instead of a massive corporate restructuring, you could pilot a new team structure in one department. The key is to have a clear ‘loss function’ – a specific metric you are trying to improve, like customer churn, employee engagement or conversion rate. Each small experiment provides feedback, telling you whether you are moving in the right direction.
Leaders can foster this approach by celebrating learning over flawless execution. Frame new initiatives as hypotheses, not foregone conclusions. Ask yourself, “What is the smallest, fastest experiment we can run to test this idea?” This approach reduces risk, as the cost of any single failed experiment is low. More importantly, it accelerates learning and builds a culture of adaptation, allowing the organisation to navigate complex challenges by continuously adjusting its path based on real-world feedback.
4. Self-attention: The parts of the system must understand each other’s relevance
Another key innovation behind the power of modern large language models is a mechanism called self-attention. It allows each element in an input sequence – for instance, every word in a sentence – to look at all the other elements in that same sequence and weigh their importance relative to itself. The word ‘it’ in a sentence gains its meaning by attending to the noun it refers to earlier in the text. This process creates a rich, context-aware understanding, where the meaning of each part is defined by its relationship to the whole.
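The mechanism can be sketched as follows (a toy version – real transformers use learned query, key and value projections). Every item in the sequence builds its new representation by attending to every item in the same sequence, itself included:

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Each item attends to all items in the same sequence.

    Returns a new sequence where every element is a context-weighted
    blend of the whole input.
    """
    d = len(vectors[0])
    output = []
    for query in vectors:
        # Score every element (including itself) by similarity to this one.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # The element's new representation mixes in its most relevant peers.
        output.append([sum(w * v[i] for w, v in zip(weights, vectors))
                       for i in range(d)])
    return output

# Three toy 'items' in a shared sequence: after self-attention, each one's
# representation reflects its relationships to the other two.
contextualised = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The organisational parallel is direct: each part of the system updates its own picture by weighing what every other part is doing, rather than operating in isolation.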
This can provide a powerful model for breaking down one of the most persistent organisational challenges: internal silos. In many companies, teams and departments operate with limited awareness of each other’s priorities, pressures and capabilities. The marketing team launches a campaign without fully understanding the product team’s roadmap; the engineering team makes a technical decision without grasping its impact on customer support. This lack of internal context leads to friction, duplicated work and missed opportunities. (A recent survey of HR leaders, for example, showed that more than 8 in 10 companies report critical misalignment between individual departments’ initiatives and those of the broader business – contributing to an estimated $8.9 trillion in annual economic losses – Eightfold AI, 2024.)
Applying a self-attention mindset means building systems for shared awareness. It’s about creating an environment where each team can effectively ‘attend’ to the work and context of other teams, leading to more intelligent, coordinated action. This goes beyond simple status updates; it requires building empathy and a deep understanding of how each function contributes to the larger mission. When the sales team understands the engineering constraints and the engineering team understands the market pressures, both can make smarter, more aligned decisions.
Leading with a new set of models
By borrowing from some of the logic that powers today’s most exciting technology, you can equip yourself and your organisation with a more robust, flexible and insightful way to navigate business challenges. The task for leaders is not to become coders or LLM experts, nor simply to explore and implement the best AI tools for the business, but to learn from the elegant strategies that make these algorithms so powerful.
