A powerful framework to explain Intelligent Automation

The content of this article is inspired by the Amazon bestseller Intelligent Automation.

Intelligent Automation (IA), also known as hyperautomation, is a set of technologies and methods for automating the work of white-collar professionals and knowledge workers. Here, we present a framework for explaining its power in terms of four main capabilities—Vision, Execution, Language, and Thinking & Learning—and how they can enable business transformations, with people and business goals at the center.

Vision

Computer vision is an area of technology that is progressing extremely rapidly, with new breakthroughs coming all the time. Coupled with deep learning, it allows the computer to make sense of what it sees and make intelligent guesses about any missing visual information, much as the human brain does for the eyes.

Applications of computer vision in a physical environment include recognizing objects (for example, in a robot navigating its physical environment) and interpreting signs and road markings (for example, in a self-driving car). However, it has even more applications in a digital environment.

Computer vision is used for intelligent character recognition (ICR), the more advanced descendant of optical character recognition (OCR). ICR can be used to digitize documents, such as invoices, contracts, or IDs, and extract and interpret the information in them.
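Once ICR has converted a scanned document to text, the extracted fields still need to be located and interpreted. As a minimal sketch of that post-recognition step (the field names and regular expressions here are illustrative assumptions; production ICR systems use layout-aware machine learning models rather than fixed patterns):

```python
import re

def extract_invoice_fields(ocr_text: str) -> dict:
    """Pull a few common fields out of OCR'd invoice text.

    Patterns are hand-written examples, not a real ICR model.
    """
    patterns = {
        "invoice_number": r"Invoice\s*(?:No\.?|#)\s*[:\s]*([A-Z0-9-]+)",
        "date": r"Date\s*[:\s]*(\d{4}-\d{2}-\d{2})",
        "total": r"Total\s*[:\s]*\$?([\d,]+\.\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, ocr_text, re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields

sample = "Invoice No: INV-2041\nDate: 2024-03-15\nTotal: $1,250.00"
print(extract_invoice_fields(sample))
```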

Finally, computer vision can automate the analysis of images and videos. This has a vast range of applications. It can automate medical diagnostics, improving outcomes and freeing up doctors' time. It can provide retail store automation, such as Amazon Go, where cameras determine which items a customer has picked up and bill them accordingly, without the need for a human checkout assistant. It can be used in business process documentation, automating what is usually a lengthy and resource-intensive process by detecting the applications and objects that a computer user interacts with and then creating a flowchart of the process, complete with screenshots. Computer vision can also be used for biometrics, with applications in identification, access control, and surveillance.

Execution

Execution involves actually doing things—accomplishing tasks in digital environments. This can include clicking on buttons, typing text, logging in and out of systems, preparing reports, and sending emails.

The execution capability acts as a glue to connect the other capabilities together in a streamlined way. For example, it can collect sales data using the Vision or Language capabilities, automatically convey the data to the Thinking & Learning capability for analysis, and compile and send out a report about the findings, with the help of the Language capability.
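The sales-report example above can be sketched as a small pipeline. All function bodies here are stand-ins of my own invention: a real implementation would call ICR services (Vision), NLP models (Language), and machine learning models (Thinking & Learning), with Execution orchestrating the steps end to end.

```python
def collect_sales_data():
    # Vision/Language: e.g. extract figures from scanned order forms.
    return [1200, 950, 1480]

def analyze(data):
    # Thinking & Learning: e.g. aggregation, forecasting, anomaly detection.
    return {"total": sum(data), "average": sum(data) / len(data)}

def compile_report(analysis):
    # Language: e.g. natural-language generation of the summary.
    return (f"Sales report: total {analysis['total']}, "
            f"average {analysis['average']:.0f} per region.")

def run_pipeline():
    # Execution: glues the capabilities together, touchlessly.
    data = collect_sales_data()
    analysis = analyze(data)
    return compile_report(analysis)  # a real bot would email this out

print(run_pipeline())
```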

The key technologies supporting the Execution capability are smart workflow, low-code platforms, and robotic process automation (RPA). Smart workflow platforms help to automate predefined standard processes. (If existing processes have not yet been documented, this can be achieved using IA-powered business process documentation, as discussed in the previous section.) Low-code platforms allow business users without coding skills to develop automated programs. RPA, the most powerful of these technologies, is used to automate any task that a human can do on a computer, such as opening applications, clicking menu items, entering text, or copying and pasting. It learns by recording the actions of the human user and then automates them to save time.
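The record-and-replay idea behind RPA can be illustrated with a toy example. Real RPA tools capture actual UI events; here, a `FakeForm` class and abstract actions stand in for the user interface, purely for illustration.

```python
class FakeForm:
    """Stand-in for a real application window an RPA bot would drive."""
    def __init__(self):
        self.fields = {}
        self.submitted = None

    def type_text(self, field, text):
        self.fields[field] = text

    def click(self, button):
        if button == "submit":
            self.submitted = dict(self.fields)

class Recorder:
    """Records a human's actions, then replays them as a bot."""
    def __init__(self):
        self.script = []

    def record(self, action, *args):
        self.script.append((action, args))

    def replay(self, target):
        for action, args in self.script:
            getattr(target, action)(*args)

# "Recording" phase: capture the steps a human performs once.
recorder = Recorder()
recorder.record("type_text", "name", "Ada Lovelace")
recorder.record("type_text", "email", "ada@example.com")
recorder.record("click", "submit")

# "Replay" phase: the bot repeats the steps on a fresh form.
form = FakeForm()
recorder.replay(form)
print(form.submitted)
```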

Language

The Language capability enables machines to read, write, listen, speak, and interpret the meaning of natural human language. It’s used to extract useful information from unstructured documents, to categorize text (for example, in spam filters), and to perform sentiment analysis. It enables text-to-speech, speech-to-text, and predictive text keyboards. It’s also used to power chatbots, such as ANZ Bank’s Jamie, which is used to onboard new clients and guide them through the bank’s services, or Google Duplex, which can book restaurant tables and hair appointments over the phone; and machine translation, such as Google Translate, which is used by 500 million people each day to translate over 100 languages.

Natural language processing (NLP) used to be coded as a set of rules, but nowadays it works using deep learning: reading large amounts of text and noticing correlations and patterns, very much like how humans learn languages.
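As a toy illustration of learning word-label correlations from examples (a far simpler mechanism than deep learning, but the same underlying principle of pattern-finding rather than hand-written rules), consider this tiny sentiment classifier. The training data and scoring scheme are invented for illustration.

```python
from collections import Counter

# Tiny labeled training set: the "large amounts of text" in miniature.
train = [
    ("great service fast delivery", "positive"),
    ("love the product great value", "positive"),
    ("terrible support slow delivery", "negative"),
    ("awful experience terrible value", "negative"),
]

# Learn which words co-occur with which label.
counts = {"positive": Counter(), "negative": Counter()}
for text, label in train:
    counts[label].update(text.split())

def classify(text):
    # Score each label by how often its words appeared under that label.
    scores = {
        label: sum(counter[word] for word in text.split())
        for label, counter in counts.items()
    }
    return max(scores, key=scores.get)

print(classify("great product"))       # word statistics, not rules
print(classify("terrible delivery"))
```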

Thinking & Learning

The Thinking & Learning capability is about analyzing data, discovering insights, making predictions, and supporting decision-making. It can work autonomously, triggering automated process activities, or it can be used to augment human knowledge workers, providing them with insights to guide their decisions and actions.

The key technology behind this capability is machine learning—most of all, deep learning, which is the newest and most powerful component of machine learning. Deep learning uses neural networks with multiple layers, each one processing and interpreting the data at a different level, inspired by how the human brain works. It learns autonomously from large amounts of training data, spotting patterns and correlations without being explicitly taught or programmed with any rules. It excels when faced with complex, unstructured data with numerous features, so it’s used for image classification, natural language processing, and speech recognition.
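The "multiple layers, each processing the data at a different level" idea can be shown with a minimal forward pass through a two-layer network. The weights below are fixed by hand purely so the example runs; in real deep learning they are adjusted automatically from training data via backpropagation.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases, activation):
    """One layer: weighted sums of the inputs, then a nonlinearity."""
    return [
        activation(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]

def forward(inputs):
    # Hidden layer: 2 inputs -> 3 units (ReLU). Weights are arbitrary.
    hidden = layer(inputs,
                   weights=[[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]],
                   biases=[0.0, 0.1, -0.1],
                   activation=relu)
    # Output layer: 3 units -> 1 probability-like score (sigmoid).
    (output,) = layer(hidden,
                      weights=[[0.7, -0.5, 0.2]],
                      biases=[0.05],
                      activation=sigmoid)
    return output

score = forward([1.0, 2.0])
print(round(score, 3))
```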

The Thinking & Learning capability also covers data management: acquiring, validating, cleaning and storing the data needed for machine learning, and providing data visualizations to help guide human decision-makers.
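The validate-and-clean step that precedes model training can be sketched briefly. The record shape and cleaning rules here (drop unparseable or negative amounts, strip whitespace) are illustrative assumptions, not a complete data-management pipeline.

```python
def clean(records):
    """Keep only records with a valid, non-negative numeric 'amount'."""
    cleaned = []
    for rec in records:
        raw = str(rec.get("amount", "")).strip().replace(",", "")
        try:
            amount = float(raw)
        except ValueError:
            continue            # discard unparseable rows
        if amount < 0:
            continue            # discard out-of-range values
        cleaned.append({"customer": rec.get("customer", "").strip(),
                        "amount": amount})
    return cleaned

raw_records = [
    {"customer": " Acme ", "amount": "1,200.50"},
    {"customer": "Globex", "amount": "n/a"},   # invalid -> dropped
    {"customer": "Initech", "amount": "-5"},   # negative -> dropped
]
print(clean(raw_records))
```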

The impact of IA capabilities on your business

Our aim is for this conceptual framework to equip you with the information you need to select vendors, choose technologies for your IA journey, and fit them into your existing organizational IT landscape.

Beyond the value that these four capabilities can deliver individually, though, combining them unlocks more impact than the sum of the parts. Combining the technologies broadens the scope of automation, from isolated tasks to continuous, automated, touchless end-to-end processes.
