Synapse Espresso: Data Quality Checking with OpenAI GPT-3 in Azure Synapse Spark!

Welcome to the 24th episode in our Azure Synapse Espresso series!

In this video, Thomas joins Stijn to show how we can clean our data using Azure OpenAI GPT-3 from within Spark in Synapse Analytics. GPT-3 is a natural language AI model: it interprets the text you feed into it and then generates new text based on that input.

We showcase how you can create user-defined functions to perform text-based data cleaning on dirty datasets. We clean up various kinds of data, covering common issues like data formatting, normalizing country names written in different languages, turning an address into a JSON field, and applying proper time formatting.
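As a rough illustration of the user-defined-function idea, here is a minimal Python sketch of a prompt-based cleaning function. It is not the code from the video: the prompt template, the `make_cleaner` helper, and the `fake_complete` stand-in for the model call are all hypothetical names introduced for this example, and a real notebook would replace `fake_complete` with a call to an Azure OpenAI deployment.

```python
from typing import Callable, Optional

# Hypothetical prompt template; the prompts used in the video are not
# given in this description.
PROMPT_TEMPLATE = (
    "Rewrite the following {field} in the format {target}.\n"
    "Input: {value}\n"
    "Output:"
)


def build_prompt(field: str, target: str, value: str) -> str:
    """Fill the template for a single dirty value."""
    return PROMPT_TEMPLATE.format(field=field, target=target, value=value.strip())


def make_cleaner(
    field: str, target: str, complete: Callable[[str], str]
) -> Callable[[Optional[str]], Optional[str]]:
    """Return a plain function that cleans one value via a completion call.

    `complete` is any callable mapping a prompt string to generated text.
    """

    def clean(value: Optional[str]) -> Optional[str]:
        # Skip nulls and blank strings instead of sending them to the model.
        if value is None or not value.strip():
            return None
        return complete(build_prompt(field, target, value)).strip()

    return clean


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call (assumption, for illustration only)."""
    lookup = {"Deutschland": "Germany", "Belgique": "Belgium"}
    raw = prompt.split("Input: ")[1].split("\n")[0]
    return " " + lookup.get(raw, raw)


clean_country = make_cleaner("country name", "English", fake_complete)
print(clean_country("Deutschland "))
```

In a Synapse Spark notebook, a function like `clean_country` could then be wrapped with `pyspark.sql.functions.udf` and applied to a DataFrame column. Keeping the model call behind a plain callable makes the cleaning logic testable without a Spark session or an API key.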

SynapseML: https://microsoft.github.io/SynapseML/
What is Azure OpenAI? https://learn.microsoft.com/en-us/azure/cognitive-services/openai/overview
