Transcription Service
Example Podcast Transcription
Response Formats
speaker_organized
single_paragraph
raw
Key Features
- 100% Serverless and Edge-Optimized: Pay only for what you use with CloudFront, CloudFront Key Value Store, AWS Lambda, and S3.
- Deepgram Integration: Utilize the powerful Speech-to-Text API with support for 36 languages.
- Speaker Diarization: Transcripts clearly identify speakers, ideal for meetings, interviews, and podcasts.
- Custom Domain Configuration: Set up a custom domain for a professional and accessible transcription service.
- Flexible Deployment Options: Choose local deployment for easy management or automate with GitHub Actions for seamless AWS integration.
- API Key Security: Secure access with API key generation and management using CloudFront Key Value Store.
- Synchronous and Asynchronous API Endpoints: Instant transcriptions for short audio files and background processing for longer ones.
Architecture
Helper Functions
- Makefile - Contains all commands mentioned in README
- Download YouTube video
- Convert MP4 to MP3
Prerequisites
Before you begin, ensure you have the following:
- An AWS account
- AWS CLI installed and configured - Install the AWS CLI
- AWS SAM CLI installed - Install the SAM CLI
- Python installed
- Docker installed - Install Docker
Deployment Options
- Deploy via GitHub Actions
- Deploy infrastructure from your local environment
GitHub Actions: Steps to Deploy
Step 1: Create a Deepgram Account and API Key
- Sign up for an account at Deepgram.
- Create an API key within the Deepgram dashboard.
Step 2: Upload Deepgram API Key to AWS Parameter Store
- Replace 'your-deepgram-api-key' with your actual Deepgram API key.
- Run the following command to store your API key in AWS Parameter Store:
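If you prefer Python to the CLI, the same step can be sketched with boto3. This is a minimal sketch under stated assumptions, not the command from the repository: the parameter name `/transcription-service/deepgram-api-key` and the `SecureString` type are hypothetical, so use whatever name and type the repository's Makefile and template expect. The function takes the SSM client as an argument so it can be exercised without AWS access.

```python
# Hypothetical parameter name; replace it with the one the stack actually reads.
DEFAULT_PARAMETER_NAME = "/transcription-service/deepgram-api-key"


def store_deepgram_key(ssm_client, api_key, name=DEFAULT_PARAMETER_NAME):
    """Store the Deepgram API key in AWS Systems Manager Parameter Store."""
    # Guard against running the step with the placeholder still in place.
    if not api_key or api_key == "your-deepgram-api-key":
        raise ValueError("Replace the placeholder with your actual Deepgram API key")
    # SecureString is an assumption; Overwrite lets the step be re-run safely.
    return ssm_client.put_parameter(
        Name=name,
        Value=api_key,
        Type="SecureString",
        Overwrite=True,
    )


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   store_deepgram_key(boto3.client("ssm"), "<your actual key>")
```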
Step 3: Fork & Clone Repository
- Navigate to https://github.com/whatthecloud-io/transcription-service, click "Fork", then create the repository under your account.
- Clone the repository
Step 4: Create GitHub OIDC Provider in Your AWS Account
- Replace YourGithubUserName with your GitHub username or org.
- Run the command below to create a GitHub OIDC Provider and an IAM Role named github-oidc-deploy-role, which has Administrator permissions (feel free to update to meet your security practices). github-oidc-deploy-role will be assumed by your GitHub workflow to deploy the infrastructure to your AWS Account.
Step 5: Update .github/workflows/pipeline.yaml
- Open .github/workflows/pipeline.yaml, uncomment #- main, and update aws-account-id with your AWS Account ID.
This will trigger a deployment. Navigate to the Actions tab in GitHub, click on the deployment, then click it again to see the list of deployment steps.
Once deployed, you will notice CloudFrontDistributionUrl or CustomDomainName at the bottom of the Deploy SAM application step. You will need one of them during Post Deployment Steps to Use the Transcription Service.
Optional - Configure Custom Domain
If you want to use a custom domain instead of the default CloudFront domain, configure the following parameters in .github/workflows/pipeline.yaml:
- hosted-zone-id: The ID of your hosted zone in Route 53.
- hosted-zone-name: The domain name (e.g., example.com).
- subdomain: The subdomain for your service (e.g., transcribe).
The above example would produce the domain https://transcribe.example.com
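To make the relationship between the parameters explicit, here is a small sketch of how the subdomain and hosted zone name combine into the final URL. The helper `custom_domain_url` is hypothetical and only illustrates the naming rule described above; the actual DNS record is created by the deployment.

```python
def custom_domain_url(subdomain, hosted_zone_name):
    """Combine a subdomain and a hosted zone name into the service URL."""
    # Route 53 zone names are sometimes written with a trailing dot; drop it.
    zone = hosted_zone_name.strip().strip(".")
    sub = subdomain.strip().strip(".")
    if not zone or not sub:
        raise ValueError("Both subdomain and hosted zone name are required")
    return f"https://{sub}.{zone}"


print(custom_domain_url("transcribe", "example.com"))
# prints https://transcribe.example.com
```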
Now go to Post Deployment Steps to Use the Transcription Service
Local: Steps to Deploy
Step 1: Create a Deepgram Account and API Key
- Sign up for an account at Deepgram.
- Create an API key within the Deepgram dashboard.
Step 2: Upload Deepgram API Key to AWS Parameter Store
- Replace 'your-deepgram-api-key' with your actual Deepgram API key.
- Run the following command to store your API key in AWS Parameter Store:
Step 3: Fork & Clone Repository
- Navigate to https://github.com/whatthecloud-io/transcription-service, click "Fork", then create the repository under your account.
- Clone the repository
Step 4: Deploy CloudFormation Stack
Run the following command to build and deploy the CloudFormation stack (Docker should be running):
Once deployed, you will notice an Outputs section in your terminal. You will need CloudFrontDistributionUrl or CustomDomainName in a later step.
Optional - Configure Custom Domain
If you want to use a custom domain instead of the default CloudFront domain, configure the following parameters:
- HostedZoneId: The ID of your hosted zone in Route 53. It looks like Z2FDTNDATAQYW2.
- HostedZoneName: The domain name (e.g., example.com).
- SubDomain: The subdomain for your service (e.g., transcribe).
The above example would produce the domain https://transcribe.example.com.
Now continue with Post Deployment Steps to Use the Transcription Service
Post Deployment Steps to Use the Transcription Service
Step 1: Install Python Dependencies
Ensure you have the necessary Python dependencies installed:
Step 2: Generate Test API Key
Run the following command to generate a test API key and insert it into the CloudFront Key Value Store:
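As an illustration of what this step involves, the sketch below generates a random key and writes it to a CloudFront Key Value Store through the boto3 `cloudfront-keyvaluestore` client. It is not the repository's script: storing the API key as the KVS key with a placeholder value of `"test"` is an assumption about how the service looks keys up, and `kvs_arn` is whatever ARN your stack created.

```python
import secrets


def generate_api_key(num_bytes=32):
    """Return a URL-safe random API key."""
    return secrets.token_urlsafe(num_bytes)


def store_api_key(kvs_client, kvs_arn, api_key, value="test"):
    """Insert the API key into a CloudFront Key Value Store."""
    # Writes require the store's current ETag for optimistic concurrency.
    etag = kvs_client.describe_key_value_store(KvsARN=kvs_arn)["ETag"]
    # Assumption: the API key is the KVS key and the value is only a label.
    return kvs_client.put_key(KvsARN=kvs_arn, Key=api_key, Value=value, IfMatch=etag)


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   client = boto3.client("cloudfront-keyvaluestore")
#   store_api_key(client, "<your KVS ARN>", generate_api_key())
```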
Step 3: Transcribe Audio Files
Synchronous Transcription
- Replace your-domain with either CloudFrontDistributionUrl or CustomDomainName from the CloudFormation stack outputs.
- Replace your-generated-api-key with the API key you generated in Step 2.
To transcribe a 5-minute Apple Intelligence MP3 file synchronously, run:
This will generate a transcript of the audio file and return the result:
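For callers who want to issue the synchronous request from Python, a request along these lines could be built with the standard library. Only the `/transcribe/sync` path and the `response_format` parameter come from this guide; the `x-api-key` header name and the `url` field in the JSON body are assumptions, so check them against the repository's own example before relying on this sketch.

```python
import json
import urllib.request


def build_sync_request(domain, api_key, audio_url, response_format="speaker_organized"):
    """Build (but do not send) a POST request to the synchronous endpoint."""
    # Accept the stack output with or without a scheme or trailing slash.
    host = domain.removeprefix("https://").rstrip("/")
    # Assumed body shape: the audio is referenced by URL.
    payload = {"url": audio_url, "response_format": response_format}
    return urllib.request.Request(
        url=f"https://{host}/transcribe/sync",
        data=json.dumps(payload).encode("utf-8"),
        # Assumed header name for the API key.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Usage: the sync endpoint times out after 30 seconds, so allow a little more.
#   req = build_sync_request("your-domain", "your-generated-api-key", "<mp3 url>")
#   with urllib.request.urlopen(req, timeout=35) as resp:
#       print(json.load(resp))
```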
Asynchronous Transcription
- Replace your-domain and your-generated-api-key again.
To transcribe a 19-minute Total-Microsoft-Recall MP3 file asynchronously, run:
This will return a job ID for the transcription task:
Step 4: Retrieve Asynchronous Transcription Result
- Replace your-domain and your-generated-api-key again.
- Replace job-id with job_id from the previous request response.
This will return the job status and, once the job is complete, the transcription result for the specified job ID:
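Because the result is only available once the background job finishes, a client typically polls the result endpoint. The loop below is a minimal sketch of that pattern: `fetch_result` is a hypothetical callable that performs the `/transcribe/result` request for a `job_id` and returns the parsed JSON, and the status values `"completed"` and `"failed"` are assumptions about the response shape.

```python
import time


def wait_for_transcript(fetch_result, job_id, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job completes, fails, or the attempt budget runs out."""
    for attempt in range(max_attempts):
        result = fetch_result(job_id)
        status = result.get("status")
        if status == "completed":  # assumed status value
            return result
        if status == "failed":  # assumed status value
            raise RuntimeError(f"Transcription job {job_id} failed")
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop easy to test; in normal use the default `time.sleep` is fine.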
[!NOTE]
- Requests made to the /transcribe/sync endpoint must complete within 30 seconds; otherwise, they will time out. If you expect the transcription to take longer than 30 seconds, use the /transcribe/async and /transcribe/result endpoints.
- The response_format parameter supports the following options: speaker_organized, single_paragraph, raw.
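A client can encode both rules from the note before sending anything. The helpers below are a hypothetical sketch: `choose_endpoint` takes your own estimate of processing time (not audio length) and `validate_response_format` rejects values outside the three documented options.

```python
RESPONSE_FORMATS = ("speaker_organized", "single_paragraph", "raw")
SYNC_TIMEOUT_S = 30  # sync requests must complete within this many seconds


def choose_endpoint(expected_processing_s):
    """Pick the sync endpoint only when the request should finish in time."""
    if expected_processing_s < SYNC_TIMEOUT_S:
        return "/transcribe/sync"
    return "/transcribe/async"


def validate_response_format(value):
    """Return the value unchanged if it is a supported response_format."""
    if value not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {', '.join(RESPONSE_FORMATS)}")
    return value
```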
Wrap up
By following the steps outlined above, you should be able to deploy and use the transcription service. Update the parameters and variables in the Makefile as needed. If you encounter any issues, refer to the AWS and Deepgram documentation for further assistance.
License
This template is a commercial product and is licensed under the WhatTheCloud License
transcription-service
Serverless, edge-optimized transcription service leveraging AWS CloudFront, Lambda, and Deepgram. Implements API Key authentication via CloudFront Key Value Store and Functions. Provides synchronous and asynchronous endpoints for MP3 transcription, with results stored in S3. Ideal for processing podcasts, meetings, and interviews.
$149
one-time payment
- Automated AWS Deployments via GitHub Actions Workflow
- Full Access to GitHub Repository
- CloudFormation Scripts
- Deployment Guides