Blog > Read Our Blog in English

How to Turn WhatsApp Voice Notes into AI-Generated Ad Images (Make.com + GPT 1.5)

Posted by Axel M | January 4, 2026

How to Turn WhatsApp Voice Notes into AI-Generated Ad Images (Make.com + GPT 1.5)

Transform your quick voice ideas into professional Facebook ad creatives using Make.com, OpenAI's GPT 1.5, and WhatsApp automation

Table of Contents

  1. Introduction
  2. What You'll Build
  3. Prerequisites
  4. Understanding the Workflow
  5. Step 1: Setting Up WhatsApp Connection
  6. Step 2: Receiving Voice Messages
  7. Step 3: Transcribing Audio to Text
  8. Step 4: Generating Image Prompts with AI
  9. Step 5: Creating Images with GPT 1.5
  10. Step 6: Saving to Google Drive
  11. Step 7: Sending Results Back to WhatsApp
  12. Optimizing Your Automation
  13. Advanced Use Cases
  14. Troubleshooting

Introduction

OpenAI just released GPT 1.5, their new image generation model that's getting rave reviews for text rendering and overall quality. But what if you could combine this powerful AI with the convenience of WhatsApp voice messages to create Facebook ad images on the go?

Imagine this: You're at a skateboard event, inspiration strikes, and you record a quick voice note on WhatsApp describing your ad idea. Within seconds, you receive a professionally generated ad image ready to use. That's exactly what we're building in this tutorial.

This isn't just another AI image generation tutorial. We're creating a complete automation that takes your spoken ideas and turns them into finished ad creatives, all while you're mobile.

Why This Automation Matters

For Marketers: Create ad concepts anywhere, anytime - no design software needed
For Agencies: Rapid prototyping with clients during meetings or site visits
For Social Media Managers: Quick content creation from voice ideas
For Business Owners: Turn spontaneous ideas into visual content instantly

What You'll Build

By the end of this tutorial, you'll have a fully automated system that:

  1. Receives voice messages on your WhatsApp Business account
  2. Downloads and transcribes the audio using OpenAI Whisper
  3. Transforms the transcription into an optimized image generation prompt
  4. Generates professional ad images using GPT 1.5
  5. Saves images to your Google Drive for organization
  6. Sends the final image back to you on WhatsApp

Example Workflow:

You send a voice note: "I want to show a super nice skate park and a paella stand in front of it. The event is called Skate Park Late into the New Year. Show a lively beach skate park with palm trees and a cool atmosphere."

Within 30-60 seconds, you receive a professionally generated event poster image matching your description.

Time to Build: 30-45 minutes
Technical Level: Intermediate (some Make.com experience helpful)

🎥 Watch the Complete Video Tutorial

Follow along with the video or use this written guide - both cover the complete automation setup!

Prerequisites

Before starting, make sure you have:

Required Accounts

  • WhatsApp Business Account - Set up at business.whatsapp.com
  • WhatsAble Account - Sign up at whatsable.app for Notifyer system
  • Make.com Account - Free or paid plan
  • OpenAI Account - With API access and credits
  • Google Drive Account - For storing generated images

API Keys Needed

  • WhatsAble API key (from your dashboard)
  • OpenAI API key (from platform.openai.com)
  • Google Drive connection (authorized through Make.com)

Technical Requirements

  • Basic understanding of Make.com scenarios
  • Familiarity with API concepts
  • WhatsApp Business number connected to WhatsAble

Understanding the Workflow

Let's break down what happens in this automation:

The Complete Flow

Voice Input: You send a WhatsApp voice message with your ad idea

Audio Download: Make.com receives the message and downloads the audio file

Transcription: OpenAI Whisper converts speech to text

Prompt Enhancement: Another AI layer transforms your casual description into a detailed image generation prompt

Image Generation: GPT 1.5 creates the image based on the enhanced prompt

Storage: Image is saved to Google Drive with a shareable link

Delivery: Final image is sent back to you on WhatsApp

Why This Approach Works

The key to quality results is the prompt enhancement step. Your raw voice transcription might say "show a nice skate park with food." The AI prompt enhancer transforms this into something like "Professional event poster featuring a vibrant beachside skate park at sunset, palm trees swaying, a colorful paella stand in the foreground, energetic atmosphere, modern design, high quality rendering."

This middle layer dramatically improves your final image quality.

Step 1: Setting Up WhatsApp Connection

Choosing Your WhatsApp Solution

WhatsAble offers two main products for WhatsApp automation:

Notifyer System - Use your own WhatsApp Business number with full control. This is what we'll use in this tutorial. Perfect if you want a professional setup with your brand's number.

WhatsApp Bot - Use WhatsAble's WhatsApp number for automation. Good for testing or if you don't have a Business account yet.

Both work with Make.com, but the module selection differs slightly.

Installing the WhatsAble Module in Make.com

  1. Log in to your WhatsAble account
  2. Navigate to Settings → Integrations
  3. Click "Connect to Make"
  4. Follow the authorization process with Make.com
  5. Select your workspace or organization
  6. The WhatsAble module is now available in Make.com

Connecting Your Account

In Make.com, when you add the WhatsAble trigger, you'll need to connect your account using your API key from the WhatsAble dashboard. The webhook is set up automatically - when you receive a WhatsApp message, Make.com is notified instantly.

Step 2: Receiving Voice Messages

Setting Up the Trigger

In Make.com, create a new scenario and add your trigger module:

For Notifyer System users: Search for "NotifierSystem by WhatsAble" and select "Watch Incoming Messages"

For WhatsApp Bot users: Search for "Notifyer Bot" and select the appropriate trigger

Understanding the Data

When a voice message arrives, WhatsAble provides:

  • Phone number of the sender
  • Message type (in this case, audio)
  • Attachment URL (link to the audio file)
  • Timestamp
  • Sender name
  • Conversation context

The attachment URL is what we need - it's a direct link to download the voice message audio file.

Testing Your Trigger

Before building the rest of the automation, test the trigger. Turn on your scenario, send yourself a voice message on WhatsApp, and verify that Make.com receives the data. You should see the attachment URL in the trigger output.

Step 3: Transcribing Audio to Text

Downloading the Audio File

Before transcribing, we need to download the audio file from WhatsAble's attachment URL.

Add an HTTP module after your trigger. Set it to GET request and use the attachment URL from the previous step. This downloads the complete audio file as base64 encoded data.

Using OpenAI Whisper for Transcription

Add the OpenAI module called "Generate a Transcription" (you'll find it by searching "transcribe" in Make.com).

Configuration:

File Name: You can name it anything, but the extension MUST be.ogg (this is the officially supported format)

File Data: Map the entire response data from the HTTP download module

Model: Choose the Whisper model (usually whisper-1)

The output will be your voice message converted to text. For example, if you said "I want to show a super nice skate park and a paella stand," that's exactly what you'll get in text format.

Step 4: Generating Image Prompts with AI

Why Enhance the Prompt?

Raw transcriptions are casual and often lack the descriptive detail needed for quality image generation. An enhancement layer transforms your casual speech into professional prompts.

Setting Up Prompt Enhancement

Add another OpenAI module called "Generate a Response" (found under generic OpenAI modules).

Your System Prompt:

You are an expert at creating detailed image generation prompts for advertising and marketing. Take the user's casual description and transform it into a vivid, detailed prompt that will generate professional-quality ad images. Focus on visual details, atmosphere, composition, and style.

User Input:

Map the transcription text from the previous step.

The AI will transform "show a nice skate park with food" into something like "Professional advertising photograph of a vibrant beachside skate park at golden hour, modern architecture, energetic atmosphere with skaters in action, colorful paella stand in foreground with steaming pans, palm trees, warm lighting, magazine quality, high resolution, suitable for event promotion."

Step 5: Creating Images with GPT 1.5

Why Use the Generic API Module

Make.com has a standard "Generate Image" module for OpenAI, but it hasn't been updated yet to include GPT 1.5. Don't worry - there's a simple workaround using the generic API call module.

Preparing the JSON Request

First, add a "Create JSON" module. This structures your API request properly.

Create a new data structure with these fields (all text except where noted):

  • model (text): "dall-e-3" or your chosen model identifier
  • prompt (text): Map your enhanced prompt from Step 4
  • n (number): 1 (how many images to generate)
  • quality (text): "hd" for high quality
  • response_format (text): "b64_json" (we want base64 for easy upload)
  • size (text): "1024x1024" or your preferred dimensions

Making the API Call

Add the OpenAI "Make an API Call" module.

Configuration:

URL Endpoint: /v1/images/generations

Method: POST

Headers: Content-Type = application/json

Body: Map the JSON output from your previous step

What you get back is a base64 encoded image file ready to be saved or sent.

Step 6: Saving to Google Drive

Why Save to Drive First

Saving to Google Drive gives you a shareable URL and organizes all your generated images in one place. It's also useful for reviewing and selecting which images to actually use in ads.

Uploading the Image

Add a Google Drive "Upload a File" module.

Configuration:

Select Folder: Choose or create a folder for your AI-generated ads

File Name: Use something dynamic like the timestamp or a unique ID from OpenAI's response, followed by.png

Convert Data: This is important! Use this formula to decode the base64 image data:

toBinary(base64(your_base64_data_field))

This converts the encoded image data into an actual PNG file that Google Drive can display properly.

The module outputs a "web content link" - this is the direct URL to your image that you can share or send via WhatsApp.

Step 7: Sending Results Back to WhatsApp

Composing Your Response

Add another WhatsAble module, this time "Send Message Without Template."

Configuration:

Connection: Use your existing WhatsAble connection

Recipient Number: Map the phone number from your original trigger (the person who sent the voice note)

Message Type: Image

Image URL: Use the web content link from Google Drive

Caption: Optional - you can add text like "Here's your generated ad image!"

Within seconds of sending your voice note, you'll receive the finished image back on WhatsApp, ready to review and use.

Testing the Complete Flow

  1. Turn on your Make.com scenario
  2. Send yourself a WhatsApp voice message with an ad description
  3. Wait 30-60 seconds
  4. Receive your AI-generated image
  5. Check your Google Drive folder for the saved file

Optimizing Your Automation

Improving Image Quality

Better Voice Descriptions: Be specific about style, mood, colors, and composition in your voice notes. Instead of "nice beach," say "golden hour beach with warm orange lighting."

Refine the Enhancement Prompt: Experiment with your system prompt in Step 4. Add style preferences like "photorealistic," "illustration style," or "modern minimalist design."

Adjust Image Parameters: Try different sizes, quality settings, and model versions to find what works best for your needs.

Handling Multiple Images

Modify the "n" parameter in your JSON to generate multiple variations. Set it to 3 or 4, then update your Google Drive step to handle multiple files, and send all options back to WhatsApp.

Adding Brand Elements

Consider adding a step that overlays your logo or brand colors onto generated images using an image editing API or service before sending back.

Quality Control Layer

Add a conditional filter that only sends images if they meet certain criteria, or add a manual approval step where images are saved to Drive but you choose which ones to send back.

Advanced Use Cases

Direct Facebook Ad Creation

Take this automation further by connecting to Facebook's Marketing API. After generating the image, automatically create a draft ad in your Ads Manager with the image and suggested copy.

Add Facebook Marketing API modules after the image generation step. Use the enhanced prompt to also generate ad copy, then create a complete ad draft including image, headline, description, and targeting suggestions.

Multi-Language Support

Add language detection to the transcription. Based on the detected language, adjust the prompt enhancement to generate region-appropriate imagery and save to different Drive folders for different markets.

Team Collaboration

Set up a shared WhatsApp group where team members can send voice ideas. The automation generates images and posts them to a team Slack channel or shared folder for review and voting.

A/B Test Variations

Generate multiple image variations with different styles automatically. One prompt might generate a photo-realistic version, another an illustration, another a minimalist design - all from the same voice note.

Template-Based Ads

Integrate with design templates. Instead of generating complete images, generate background images that are automatically placed into pre-designed ad templates with your branding, text overlays, and CTAs.

Troubleshooting

Voice Message Not Triggering Scenario

Check that your WhatsApp number is properly connected to WhatsAble. Verify the webhook is active in your WhatsAble dashboard. Ensure the Make.com scenario is turned ON.

Transcription Errors

Make sure the file extension is.ogg when you transcribe. Verify your OpenAI API key has sufficient credits. Check that the HTTP download module successfully retrieved the audio file.

Poor Quality Images

The issue is usually in the prompt enhancement step. Review what prompt is being sent to the image generator. Try being more descriptive in your voice notes. Adjust your enhancement system prompt to emphasize the visual qualities you want.

Image Not Appearing in WhatsApp

Verify the Google Drive link is publicly accessible or properly shared. Check that you're using the web content link, not the regular Drive link. Ensure the image URL field is correctly mapped.

GPT 1.5 Not Available

Remember we're using the generic API call method because Make.com hasn't updated their official module yet. Double-check your endpoint URL and model identifier. Ensure your OpenAI account has access to the model.

Base64 Conversion Errors

The formula for converting to binary must be exact. Make sure you're using toBinary and base64 functions correctly. Test with a small image first.

Getting Help

For WhatsApp connection issues, contact team@whatsable.app for support. For Make.com scenario problems, check their community forums. For OpenAI API questions, review their documentation at platform.openai.com.

Conclusion

You've just built a powerful automation that transforms spontaneous voice ideas into professional ad images. This is the kind of tool that seemed impossible just a few years ago - now it's something you can build in an afternoon.

What You've Accomplished

You created an end-to-end automation connecting WhatsApp voice messages, AI transcription, intelligent prompt enhancement, cutting-edge image generation, cloud storage, and instant delivery. That's five different technologies working seamlessly together.

The Bigger Picture

This tutorial demonstrates the principle of "voice-to-visual" automation. The same workflow can be adapted for product photography descriptions, social media posts, presentation slides, website mockups, or any visual content you need to create quickly.

Take It Further

Now that you understand the workflow, experiment with it. Try different image models, add variation generators, connect it to your ad accounts, or build a gallery of all your generated images. The foundation is there - now make it yours.

Ready to Start Creating?

Set up your automation today:

  1. Sign up for WhatsAble Notifyer to connect your WhatsApp Business
  2. Get your OpenAI API key and add credits to your account
  3. Create your Make.com scenario following this tutorial
  4. Send your first voice note and watch the magic happen
  5. Need help? Reach out to team@whatsable.app

Turn your voice into visuals - start automating your creative process today! 🚀

Frequently Asked Questions

Q: Do I need a WhatsApp Business account?
A: Yes, but WhatsAble makes it easy to set up. You can either use their Notifyer system with your own number or their Bot service with their number.

Q: How much does this cost to run?
A: You'll need WhatsAble subscription (starts around $29/month), Make.com (free plan works for testing), and OpenAI API credits (varies by usage, roughly $0.04-0.08 per image).

Q: Can I use this for client work?
A: Absolutely! This is perfect for agencies showing quick concepts to clients or creating multiple ad variations rapidly.

Q: How long does it take to generate an image?
A: Usually 30-60 seconds from sending your voice note to receiving the image, depending on API response times.

Q: What languages are supported for voice notes?
A: OpenAI Whisper supports 50+ languages for transcription, so you can speak in your preferred language.

Q: Can I customize the image style?
A: Yes! Modify the prompt enhancement system message to emphasize specific styles like "photorealistic," "illustration," "minimalist," etc.

Q: What if I want to generate multiple versions?
A: Change the "n" parameter in your JSON to 2, 3, or 4 to generate multiple variations from one voice note.

Q: Is there a limit to voice message length?
A: WhatsApp allows voice messages up to 15 minutes, but keep descriptions concise (30-60 seconds) for best results.

Last Updated: January 2025 | Contact Support | WhatsAble Documentation

SEO Keywords: #VoiceToImage #AIAdAutomation #MakeComTutorial #GPT15 #WhatsAppAutomation #WhatsAbleNotifyer #AdCreativeAutomation #OpenAIImageGen #VoiceNoteAds #CreativeAutomation

You might also enjoy