Understanding Azure Cognitive Services

A high-level introduction to the awesome ready-made AI solutions for text, vision, speech, and more

Microsoft Azure offers an umbrella service known as Cognitive Services. This service provides AI capabilities that you can integrate into your existing applications and manage in one place.

While you could build these capabilities yourself using machine learning, Azure Cognitive Services offers ready-made, polished versions that can be quickly added to your applications.

In this article we’ll explore each of these services at a high level.

This content is also available in video form on YouTube

Components of Cognitive Services

Cognitive Services allows you to manage all of your cognitive APIs in one place and under one billing banner. If you want to use different API keys for different capabilities, or track billing at a more granular level, you can instead work directly with the individual services without creating a Cognitive Services instance. That said, it’s often easier to manage everything through a single Cognitive Services instance.

Text Analysis

  • Entity Recognition - allows Azure to detect locations, people, dates, quantities, and other entities in text following the language’s grammatical rules
  • Sentiment Analysis - analyzes a piece of text to determine if the text is positive, negative, or somewhere in-between
  • Language Understanding (LUIS) - allows you to interpret text by taking an utterance (something the user entered) and finding a registered intent with specific entities that can help process that intent
  • Translation - translates text from one language to another
  • Question Answering - answers users’ questions from a knowledge base you provide; this is a component of conversational AI

Any application that allows users to enter commands can benefit from the text offerings of cognitive services.
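To make the text offerings a bit more concrete, here is a minimal sketch of what a Sentiment Analysis call might look like over REST. It assumes the v3.1 Text Analytics path and the `Ocp-Apim-Subscription-Key` header; the endpoint, key, and sample response values are placeholders I made up for illustration, and the request is only built, not sent, so the sketch runs without an Azure resource.

```python
import json
import urllib.request

# Assumed REST path for Sentiment Analysis (Text Analytics v3.1).
SENTIMENT_PATH = "/text/analytics/v3.1/sentiment"


def build_sentiment_request(endpoint, key, text, language="en"):
    """Build (but do not send) a POST request for sentiment analysis."""
    body = {"documents": [{"id": "1", "language": language, "text": text}]}
    return urllib.request.Request(
        endpoint.rstrip("/") + SENTIMENT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize_sentiment(response_json):
    """Return the overall label and confidence scores of the first document."""
    doc = response_json["documents"][0]
    return {"label": doc["sentiment"], "scores": doc["confidenceScores"]}


# Placeholder endpoint and key; substitute the values from your own resource.
req = build_sentiment_request(
    "https://example-resource.cognitiveservices.azure.com",
    "your-key-here",
    "The new dashboard is fantastic!",
)

# Sending is left commented out so the sketch runs without credentials:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)

# Hand-written sample in the shape I'd expect back (values are made up).
sample_response = {
    "documents": [
        {
            "id": "1",
            "sentiment": "positive",
            "confidenceScores": {"positive": 0.98, "neutral": 0.01, "negative": 0.01},
        }
    ],
    "errors": [],
}
print(summarize_sentiment(sample_response))
```

The same key and endpoint would work for the other capabilities if you created a single Cognitive Services instance; with individual services you would swap in the key for that specific resource.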

Vision

Azure’s vision APIs are maybe the most awe-inspiring aspect of the cognitive services offering. While you could build any of them yourself with deep learning, these APIs exist out of the box and work remarkably well. These APIs will allow you to describe images, detect objects, identify faces, and extract text from images.

Vision capabilities include:

  • Computer Vision - pre-trained image recognition capable of recognizing and interpreting a wide range of objects
  • Custom Vision - image recognition with images you provide
  • Text Extraction - optical character recognition via the OCR and Read APIs
  • Face API - an API for analyzing faces, recognizing emotion, attire, makeup, and facial landmarks

It’s not hard to imagine how any application that allows users to upload images could benefit from this service by generating accessible alt-text for users of screen readers, identifying people who appear frequently in images, and offering intelligent capabilities around user-provided content.
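As a rough illustration of the alt-text idea, the sketch below picks a caption out of a Computer Vision "describe image" response. It assumes the response contains a `description.captions` list with `text` and `confidence` fields; the `pick_alt_text` helper, the confidence threshold, and the sample response are my own hypothetical choices rather than anything the service prescribes.

```python
def pick_alt_text(describe_response, min_confidence=0.5, fallback="Image"):
    """Choose the most confident caption, or a generic fallback.

    The threshold guards against showing screen reader users a caption
    the service itself is unsure about.
    """
    captions = describe_response.get("description", {}).get("captions", [])
    if not captions:
        return fallback
    best = max(captions, key=lambda c: c.get("confidence", 0.0))
    if best.get("confidence", 0.0) < min_confidence:
        return fallback
    return best["text"]


# Hand-written sample in the assumed response shape (values are made up).
sample_response = {
    "description": {
        "tags": ["dog", "grass", "outdoor"],
        "captions": [
            {"text": "a dog running on grass", "confidence": 0.91},
            {"text": "a dog in a field", "confidence": 0.74},
        ],
    }
}

alt_text = pick_alt_text(sample_response)
low_confidence = pick_alt_text(
    {"description": {"captions": [{"text": "a blurry shape", "confidence": 0.2}]}}
)
print(alt_text)
print(low_confidence)
```

Falling back to a generic label when confidence is low is a design choice; depending on your app you might prefer to ask the uploader for a description instead.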

Speech

Azure can add speech synthesis, recognition, and translation capabilities to your application, as well as additional security through speaker identification and verification.

The speech offerings in Cognitive Services include:

  • Text to Speech - takes text content and converts it to spoken audio. Also called speech synthesis.
  • Speech to Text - recognizes spoken audio and extracts words from that audio. Also called speech recognition.
  • Speaker Recognition - detects or verifies the voice of known individuals you’ve enrolled into the service
  • Speech Translation - translate spoken audio from one language to another in real-time

It may be less common for apps to use speech capabilities, but even apps not designed to work with speech initially may benefit from adding options letting users talk to the app. Mixing speech to text and LUIS together, for example, would allow users to interact with the application in new ways.
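Here is a small sketch of how the speech-plus-LUIS combination might hang together once the audio has been transcribed. It assumes a LUIS v3-style prediction response with `prediction.topIntent`, `prediction.intents`, and `prediction.entities`; the `TurnOn` intent, the `room` entity, the handler function, and the score threshold are all hypothetical names I picked for the example.

```python
def route_prediction(response, handlers, min_score=0.6):
    """Dispatch a LUIS-style prediction to the handler for its top intent."""
    prediction = response["prediction"]
    intent = prediction["topIntent"]
    score = prediction["intents"].get(intent, {}).get("score", 0.0)
    handler = handlers.get(intent)
    # Unknown intents and low-confidence matches get a polite fallback.
    if handler is None or score < min_score:
        return "Sorry, I didn't understand that."
    return handler(prediction.get("entities", {}))


def turn_on_lights(entities):
    """Hypothetical handler: reads the first 'room' entity, if any."""
    rooms = entities.get("room", [])
    room = rooms[0] if rooms else "main"
    return f"Turning on the {room} lights"


handlers = {"TurnOn": turn_on_lights}

# 'query' stands in for the transcript that Speech to Text would produce.
sample_response = {
    "query": "turn on the kitchen lights",
    "prediction": {
        "topIntent": "TurnOn",
        "intents": {"TurnOn": {"score": 0.97}, "None": {"score": 0.02}},
        "entities": {"room": ["kitchen"]},
    },
}

reply = route_prediction(sample_response, handlers)
print(reply)
```

The nice part of this split is that the handlers never see raw audio or raw text, only an intent and its entities, so the same routing works whether the user typed or spoke.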

Decision-Making

Finally, Cognitive Services offers a trio of decision-making services that are a little different from the other offerings.

  • Anomaly Detection - detects unusual data entries that may represent fraudulent activity
  • Content Moderation - can detect and flag adult content in text and images
  • Content Personalization - a recommendation engine that personalizes the user experience based on user interaction

Each one of these services improves your application in subtle ways, making sure the data flowing through it is suitable for its users and enhancing the experience in ways users may not even notice.
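To give a feel for the anomaly detection piece, the sketch below prepares a time series in the shape I believe the Anomaly Detector batch endpoint expects (a `series` of `timestamp`/`value` points plus a `granularity`) and pairs the returned `isAnomaly` flags back up with the original points. The helper names, the data, and the sample response are invented for illustration, and no request is actually sent.

```python
from datetime import date, timedelta


def build_detect_body(points, granularity="daily"):
    """Turn (timestamp, value) pairs into an assumed detection request body."""
    return {
        "series": [{"timestamp": ts, "value": value} for ts, value in points],
        "granularity": granularity,
    }


def flagged_points(points, detect_response):
    """Return the points whose matching isAnomaly flag is true."""
    flags = detect_response["isAnomaly"]
    return [point for point, is_anomaly in zip(points, flags) if is_anomaly]


# Twelve made-up daily totals with one obvious spike on the eighth day.
values = [10.0] * 12
values[7] = 95.0
start = date(2021, 1, 1)
points = [
    ((start + timedelta(days=i)).isoformat() + "T00:00:00Z", value)
    for i, value in enumerate(values)
]

body = build_detect_body(points)

# Hand-written flags in the assumed response shape, one per input point.
sample_response = {"isAnomaly": [i == 7 for i in range(12)]}

suspicious = flagged_points(points, sample_response)
print(suspicious)
```

In a real app the flagged points would more likely feed a review queue or an alert than a print statement.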

Next Steps

Down the road I plan on writing more about the various cognitive services. In the meantime, I’d recommend checking out the excellent and free Microsoft Learn content on Cognitive Services.

Microsoft Learn Content on Cognitive Services

Studying for the Azure AI Fundamentals exam is also a great way to get exposure to everything in cognitive services and can be done in about a week.

Of course, you can also just create a new instance of cognitive services and play around. Whatever you choose to do, I hope you have fun exploring cognitive services.