Why Visual Intelligence Is the New Frontier for AI
AI undoubtedly excels at specific tasks: logic, data processing, text generation, everyday assistance, and, above all, writing code. From a product perspective, however, visual design largely determines how people interact with technology.
Even flawless functionality can fall short if the design isn’t appealing. We have all seen AI-generated images that easily reveal their origin — they lack the visual expressiveness inherent in human work. Teaching AI to understand what “good design” looks like is one of the most important tasks of modern machine learning.
What is Visual Intelligence?
Visual intelligence refers to the capacity to imagine three-dimensional objects, understand spatial relationships, mentally transform shapes, and project them onto a flat surface. In simpler terms, it involves remembering, imagining, and constructing. In human perception, visual intelligence works together with visual acuity to make the world feel deeper and richer, helping not only in art but also in everyday tasks.
For AI, visual intelligence is the ability to perceive, interpret, and create visual solutions that match human aesthetic perception. It goes beyond recognizing objects in an image: it means making subtle design decisions about where to place accents, which colors and fonts to combine, and how to balance composition and visual rhythm. We are trying to teach machines to “feel” the structure of visual space the way humans do. This is the key to scalable, user-oriented design automation.
Why Teaching AI “Good Design” Is Difficult
Classification tasks in NLP (natural language processing) or CV (computer vision) are well-defined and easy to formalize: in texts and images, we can place precise labels marking where the cat is, where the dog is, which review is positive and which is negative. Design resists such clear categories: aesthetics are subjective and depend on context and audience.
We cannot simply collect a dataset of “good” and “bad” examples, because such judgments are subjective and inconsistent. Visual preferences also change over time: what caused a wow effect in 2008, such as skeuomorphism with its imitation of leather, metal, and paper, is now perceived as an overloaded, excessive, and visually outdated style. AI needs to learn not only structure but also nuance: not just what is on the screen, but what actually works, that is, what elicits the desired response from the user, such as trust, understanding, or the desire to click. This depends not only on structure but also on subtle parameters such as color temperature, block hierarchy, line spacing, and indents.
These nuances are difficult to digitize and rarely appear in traditional datasets. Even if AI learns from large amounts of data that buttons of a specific size are more popular with users and improve conversion, that will likely lead to thousands of identical buttons. It will not provide an understanding of how to create a truly expressive, aesthetic, and, most importantly, unique button for a specific project. This is a core limitation of AI: it can reproduce repetitive solutions, but it cannot replace a designer, especially when originality and creativity are required.
Perception also plays a role. The audience increasingly perceives content created with significant AI involvement as unnatural and soulless. A study published in Cognitive Research: Principles and Implications found that participants tend to rate AI-generated artworks more negatively than those claimed to be human-made. This is especially evident in assessments of emotional depth and the effort put into creating the artwork.
Context Matters: Niche Design, Not Template Design
In my work implementing AI systems for creating websites and individual pages, I have repeatedly seen how poorly rigid, general-purpose templates perform in real projects. Conversely, giving AI no design constraints at all produces equally poor results.
A local café needs a visual language that conveys coziness and warmth. A creative agency needs a bright identity, an unusual composition, and visual accents that reflect the brand’s character. A non-profit organization that helps animals needs a warm palette, friendly typography, emotional images, and an emphasis on trust. Even if the logic of the layout is similar, the emotional tone and visual presentation should be different.
I spent a considerable amount of time developing a working algorithm for creating a page template that combines pre-designed elements with flexible, AI-driven layout logic.
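To make that concrete, here is a minimal sketch of the idea, not the production algorithm: a library of pre-designed blocks, niche style presets like those above, and a selection step that an AI agent would normally drive. All names are hypothetical, and a trivial rule stands in for the model call so the example runs on its own.

```python
# Sketch: pre-designed blocks plus flexible, model-driven layout selection.
# All names (BLOCK_LIBRARY, STYLE_PRESETS, pick_blocks) are hypothetical.

BLOCK_LIBRARY = {
    "hero":     ["hero_photo", "hero_minimal", "hero_illustration"],
    "features": ["feature_grid", "feature_list"],
    "social":   ["testimonials", "press_logos"],
    "cta":      ["cta_banner", "cta_form"],
}

STYLE_PRESETS = {
    "local_cafe": {"palette": "warm_earth",    "typography": "rounded_serif", "tone": "cozy"},
    "agency":     {"palette": "high_contrast", "typography": "display_sans",  "tone": "bold"},
    "animal_ngo": {"palette": "soft_warm",     "typography": "friendly_sans", "tone": "trustful"},
}

def pick_blocks(niche: str, goals: list[str]) -> dict:
    """Choose one block variant per section. In the real system an AI agent
    makes this choice from the brief; a trivial rule stands in for it here."""
    preset = STYLE_PRESETS[niche]
    soft = preset["tone"] in ("cozy", "trustful")
    layout = {section: (variants[0] if soft else variants[-1])
              for section, variants in BLOCK_LIBRARY.items()}
    return {"style": preset, "layout": layout, "goals": goals}

print(pick_blocks("local_cafe", ["drive_visits"]))
```

The point of the structure is that the blocks themselves are human-designed and fixed, while the combination logic stays flexible, which is what keeps the output from collapsing into one template.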
Technologies Behind AI Design
Hybrid approaches, in which explicit rules and user feedback guide generative models, tend to be the most effective in practice. Transformer-based models, the architecture behind systems like ChatGPT, and diffusion models, which power image generators such as Midjourney, are great at creating text and images. On their own, however, they are too general-purpose for the specific needs of design tasks. For the results to be suitable for real-world design applications, the models need to be guided by predefined rules, including layout logic, brand restrictions, and visual hierarchy.
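As a minimal sketch of what “guided by predefined rules” can mean in code: candidate color-and-font combinations from a generative model are filtered against hard brand rules before anything ships. The generate_candidates() function below is a hypothetical stand-in that returns fixed values; the contrast formula follows WCAG 2.x.

```python
# Sketch: hard design rules filtering generative output.
# generate_candidates() is hypothetical; here it returns fixed pairs so the
# example is self-contained. The contrast math follows WCAG 2.x.

ALLOWED_FONTS = {"Inter", "Source Serif 4"}  # example brand restriction
MIN_CONTRAST = 4.5                           # WCAG AA threshold for body text

def luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, per WCAG 2.x."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]

def contrast(fg: str, bg: str) -> float:
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def generate_candidates():
    # Stand-in for a model proposing (text color, background, font) combos.
    return [("#777777", "#FFFFFF", "Inter"),          # fails contrast (~4.48)
            ("#1A1A1A", "#FFFFFF", "Inter"),          # passes both rules
            ("#1A1A1A", "#FAF3E0", "Comic Sans MS")]  # fails font whitelist

valid = [c for c in generate_candidates()
         if contrast(c[0], c[1]) >= MIN_CONTRAST and c[2] in ALLOWED_FONTS]
print(valid)  # only combinations that pass the brand and contrast rules
```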
When it comes to website or landing page design, AI must be able to select colors, fonts, text, and visual blocks that work together for a specific task and audience. The goal is not merely to assemble a layout but to create a design that is relevant and understandable to this particular business and its users.
Chains of specialized AI agents, where each is responsible for a separate stage, perform exceptionally well in solving this problem: one analyzes the target audience and page goals, another forms the text structure, and a third selects the visual style and arranges the key elements. The entire system is overseen by a final evaluation module — an AI agent that analyzes the overall result and compares it with the specified criteria.
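Schematically, such a chain can look like the sketch below. Each function stands in for an agent that would normally wrap a model call; all names are hypothetical, and the logic is deliberately trivial so the pipeline runs end to end.

```python
# Sketch of the agent chain described above. Each function is a hypothetical
# stand-in for an agent that would wrap a model call.

def analyze_audience(brief: dict) -> dict:
    # Agent 1: target audience and page goals.
    return {"audience": brief["audience"], "goal": brief["goal"]}

def structure_content(analysis: dict) -> list[str]:
    # Agent 2: text structure of the page.
    if analysis["goal"] == "convert":
        return ["hero", "benefits", "proof", "cta"]
    return ["hero", "story", "gallery", "contact"]

def apply_style(sections: list[str], analysis: dict) -> dict:
    # Agent 3: visual style and arrangement of key elements.
    palette = "warm" if "families" in analysis["audience"] else "neutral"
    return {"sections": sections, "palette": palette}

def evaluate(page: dict, criteria: dict) -> bool:
    # Final module: compare the assembled result against the given criteria.
    return set(criteria["required_sections"]) <= set(page["sections"])

brief = {"audience": "young families", "goal": "convert"}
analysis = analyze_audience(brief)
page = apply_style(structure_content(analysis), analysis)

if not evaluate(page, {"required_sections": ["hero", "cta"]}):
    # In a real pipeline, a failed check is fed back to the agents for a retry.
    raise ValueError("page rejected by the evaluation module")
print(page)
```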
This approach enables us to develop design solutions that perform reliably in real-world contexts — taking into account the task, audience, and business objectives.
Where AI Performs Well — and Where It Still Falls Short
AI tools are already quite effective at solving routine tasks — such as cropping or generating images for websites and social media. However, when analytical thinking, fact-checking, or creative interpretation is required, they still depend on human involvement.
We still see mistakes like jarring fonts or clashing color combinations, the kind even a junior designer would instinctively avoid. Most often, AI reproduces what it can find on code-sharing platforms like GitHub, such as palettes and styles, without considering the context, purpose, or audience. Original, meaningful creativity remains a human strength: AI can adapt, combine, and scale, but it ultimately builds on human-created solutions.
The Future: Collaborative Creativity Between Humans and AI
The future of design is closely tied to the automation of routine tasks: creating illustrations, generating landing pages, developing interface elements, and selecting references. This speeds up work and reduces costs, and in some cases AI can already handle tasks typically given to junior designers.
However, AI is not yet capable of completely taking over the creative process. A designer does not simply draw an interface; they consider how to make it user-friendly: what to hide to avoid overloading the user, and what to emphasize through contrast.
AI, on the other hand, will most likely display everything at once — it will create as many buttons as there are actions without setting priorities or considering what is truly important and convenient for the user. Technology can offer many ideas and help put together the “skeleton” of a design, but ultimately, it’s the human who decides what will work best for a particular project.
I am confident that AI will not replace mid-level and senior designers. First, the final result will always need human review and refinement. Second, AI still cannot replicate the way humans analyze, feel, and interpret. Moreover, AI-generated output tends to be repetitive, which is one of its main problems. Fundamentally new ideas come from people. AI excels at something else: quickly reproducing existing solutions and, most importantly, adapting them to new tasks. With code, for example, it can rewrite functionality for a different framework.
I don’t see this as a threat. On the contrary, AI is becoming a valuable tool that handles repetitive work, freeing us to focus on what matters more.
Conclusion: Can AI Achieve True Visual Intelligence?
Teaching AI visual intelligence is a significant engineering task. It is not just about generating something that looks good but about using form to convey meaning, style, and emotion.
In my experience, if AI has access to good data and feedback is structured correctly, it can approach that level. With a clearly defined technical task, or through a series of iterations, you can achieve a good result. In visual generation, however, this does not yet carry over well: each new attempt completely replaces the previous one, and errors we corrected a couple of steps back often return. Like humans, AI needs experience, trial and error, and constant learning through interaction with reality.
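One way to structure that feedback is to pin every accepted correction as an explicit, persistent constraint rather than hoping the model remembers it between attempts. The sketch below is hypothetical (every name is made up, and plain strings stand in for real model calls), but it shows the shape of the loop:

```python
# Sketch: pinning corrected errors as persistent constraints so that a
# regeneration cannot silently reintroduce them. All names are hypothetical.

def generate_page(brief: str, constraints: list[str]) -> str:
    # Stand-in for a generative call; a real prompt would prepend the
    # accumulated constraints to the brief on every attempt.
    return f"PAGE({brief}; rules={constraints})"

brief = "landing page for a local cafe"
constraints: list[str] = []

for attempt in range(3):
    page = generate_page(brief, constraints)
    # Imagine a reviewer (human or evaluator agent) flags an error once:
    if attempt == 0:
        constraints.append("no pure black text on saturated backgrounds")
        continue  # regenerate, now with the fix pinned as a hard rule
    print(f"attempt {attempt}: {page}")
    break
```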
Designers possess skills that cannot be reproduced algorithmically — such as empathy, critical thinking, intuition, and curiosity. That is why AI will not replace them but only transform the very nature of design work: routine tasks can be delegated to machines, leaving humans with the most critical task — to invent and find meaning.