A video transcription service converts the audio track of a video into text. Cloud providers such as AWS and Google have made this process easy: you upload your video and use their captioning tool to get the corresponding transcript.
Amazon:
Amazon Transcribe analyzes audio files that contain speech and uses advanced machine learning (ML) techniques to transcribe them. To learn more, see here.
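As a rough illustration of the upload-then-transcribe flow, here is a minimal Python sketch using the boto3 Transcribe client. It assumes the video has already been uploaded to S3 and that AWS credentials are configured. The bucket, object key, job name, and the helper names build_job_request and transcribe are hypothetical and chosen only for this example.

```python
import time


def build_job_request(job_name, media_uri, media_format="mp4", language_code="en-US"):
    """Build the keyword arguments for Transcribe's start_transcription_job call."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


def transcribe(client, job_name, media_uri, poll_seconds=5, max_polls=120):
    """Start a job, poll until it finishes, and return the transcript file URI."""
    client.start_transcription_job(**build_job_request(job_name, media_uri))
    for _ in range(max_polls):
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)  # still QUEUED or IN_PROGRESS
    raise TimeoutError(f"job {job_name} did not finish in time")


def main():
    import boto3  # imported here so the helpers above work without boto3 installed

    client = boto3.client("transcribe")
    # Hypothetical bucket and object key; replace with your own upload.
    uri = transcribe(client, "training-session-1", "s3://my-bucket/training-session.mp4")
    print("Transcript JSON is at:", uri)
```

Calling main() would submit the job and print where the transcript JSON can be downloaded. The client is passed in as a parameter so the polling logic can be exercised with a stub, without touching AWS.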
Google:
Google calls its service Cloud Speech-to-Text; you can find out more about it here.
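A comparable sketch for Google, using the google-cloud-speech Python client, might look like the following. It assumes the audio track has already been extracted from the video as a FLAC file and uploaded to a Cloud Storage bucket, and that application credentials are configured. The helper names collect_transcript and transcribe_gcs are hypothetical, as is any gs:// path you pass in.

```python
def collect_transcript(response):
    """Join the top alternative of every result into one transcript string."""
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    )


def transcribe_gcs(gcs_uri, language_code="en-US", timeout=600):
    """Run asynchronous recognition on an audio file stored in Cloud Storage."""
    from google.cloud import speech  # requires the google-cloud-speech package

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        # Assumption: the audio was exported as FLAC before upload.
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        language_code=language_code,
    )
    # Long-running recognition suits recordings longer than about a minute.
    operation = client.long_running_recognize(config=config, audio=audio)
    return collect_transcript(operation.result(timeout=timeout))
```

For example, transcribe_gcs("gs://my-bucket/training-session.flac") would return the full transcript as a single string, where the bucket and file name are placeholders.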
Code Sample
A Django training session that uses video transcription can be found on GitLab here.