Workflow is a powerful iOS task automation application that can daisy-chain actions across different apps on your iPhone or iPad and execute the whole sequence with a single tap (think along the lines of Automator for the Mac or Logic Apps on Azure). Recently, I discovered that Workflow can also consume and parse JSON from a web response, which opens the door to a number of possibilities :) One of them is OCR via Cognitive Services.
Build Your Own OCR (Image to Text) App on iOS
- You will need an Azure account. If you don't have one, sign up for a free trial.
- Download and install Workflow on an iOS device via the App Store.
Note: The required Azure resource (Computer Vision) has a free tier (20 calls per minute, 5K calls per month) which is sufficient for this demo.
1. Create a Computer Vision resource via the Azure portal.
2. Navigate to the resource and copy the Computer Vision API key. Preferably, paste the key into a text editor on your iOS device (e.g. Notes), as we will need it later to update our workflow.
3. Download the pre-created workflow by tapping on this link via an iOS device (iPhone or iPad) that has Workflow installed. Once loaded, tap Open in "Workflow".
4. Replace the placeholder text with the Computer Vision API key (from step 2).
- If you created your Computer Vision resource in the West US region, you are done and can hit the play button to test the workflow. The Workflow app will present a one-time warning that "...this workflow was imported from Safari. Are you sure you want to run it?"; tap Run Workflow to dismiss this message.
- If you created your Computer Vision resource in a different region, you will need to perform an additional step.
5. (Only required if your Computer Vision resource was created outside the West US region.) Scroll down to the URL step and replace westus with your region. Note: You can check your resource endpoint via the Azure portal under the Overview section of your Computer Vision resource.
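To make the region swap in step 5 concrete, here is a minimal Python sketch of how the endpoint URL changes with the region. The host and path pattern (`<region>.api.cognitive.microsoft.com/vision/v1.0/ocr`) and the helper name `ocr_endpoint` are assumptions for illustration, not taken from the workflow itself; the endpoint shown in the Overview section of your resource is the one to trust.

```python
def ocr_endpoint(region: str) -> str:
    """Build an OCR endpoint URL for a given Azure region.

    Assumption: the URL follows the regional pattern below. Check the
    Overview section of your Computer Vision resource for the real value.
    """
    # Accept both the display name ("West US") and the short name ("westus").
    short_name = region.strip().lower().replace(" ", "")
    return f"https://{short_name}.api.cognitive.microsoft.com/vision/v1.0/ocr"


print(ocr_endpoint("West US"))
print(ocr_endpoint("westeurope"))
```

Only the region segment of the host changes; the rest of the URL stays the same, which is why the workflow needs a single text replacement.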
The illustration below provides a helicopter view of all steps encompassed in the workflow.
- The initial menu provides three choices (Take Photo, Latest Photo or Select Photo).
- Once the image has been selected, the editor is presented to crop the relevant section.
- The image is then converted to JPEG and sent to Cognitive Services for processing via an HTTP POST request.
- The response is then converted into a dictionary and ultimately parsed to retrieve the text.
- The combined text is then sent to the Notes app.
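For readers who think in code, the POST and parsing steps above can be sketched roughly as follows in Python. This is an illustration under assumptions, not the workflow's actual implementation: the function names are hypothetical, the image is assumed to be sent as raw bytes with the `Ocp-Apim-Subscription-Key` header, and the response is assumed to follow a regions, lines, words layout. The sample response is hand-written, not real service output.

```python
import json
from urllib.request import Request


def build_ocr_request(endpoint_url: str, api_key: str, jpeg_bytes: bytes) -> Request:
    """Prepare (but do not send) the HTTP POST that carries the JPEG image."""
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,        # key copied in step 2
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return Request(endpoint_url, data=jpeg_bytes, headers=headers, method="POST")


def extract_text(response_body: str) -> str:
    """Turn the JSON response into a dictionary and combine the text.

    Assumption: the response nests regions -> lines -> words, and each
    word carries a "text" field.
    """
    result = json.loads(response_body)
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return "\n".join(lines)


# Hand-written sample in the assumed response shape.
sample_response = json.dumps({
    "language": "en",
    "regions": [
        {"lines": [
            {"words": [{"text": "Hello"}, {"text": "world"}]},
            {"words": [{"text": "from"}, {"text": "Workflow"}]},
        ]}
    ],
})

print(extract_text(sample_response))
```

To actually call the service, the prepared request could be passed to `urllib.request.urlopen` and the decoded response body handed to `extract_text`; the sketch stops short of that so it runs without a key or network access.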
For those who are interested in understanding how to build this workflow manually, step-by-step instructions are provided in the video below.