

Assuming each participant agrees to the recording and transcription of the Skype call, is there a way to transcribe the meeting (live, offline, or both) so that it produces a text transcript in which each spoken passage is correctly attributed to its speaker? The transcript could then be fed into any variety of search or NLP algorithms.

The top 3 Google search hits for "automatically transcribe Skype" refer to apps which make manual transcription easier.

While it would be trivial to record the audio and send it to a speech-to-text engine, I doubt the result would be very high quality, because the best results usually come from speaker-dependent models (otherwise we wouldn't have to take time to train Dragon NaturallySpeaking). But before we can choose speaker-dependent transcription models, we need to know which segment of the audio belongs to which speaker. I see two ways to solve this:

1. There is an easy way to retrieve all the audio that came from each participant, e.g. you just record all the audio from each speaker's microphone during the call, so you don't have to do any segmentation.
2. If the first option isn't feasible or is prohibitive in some way, we have to use a speaker diarization algorithm, which segments the audio into N clusters/speakers (most algorithms can be told how many speakers are in the audio, but some can figure this out on their own). For a real-time transcript as the call goes on, I imagine we'd need some fancy real-time speaker diarization algorithm.

In any case, once the segmentation is solved, each participant has their own trained speaker model, which is then applied to their portions of the audio.

At the end of the day, everyone gets a nice conversation transcript, and later on we can do fancy things like topic analysis, or maybe Big Brother wants to sift through everyone's project meetings without having to listen to hours of audio.
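If each participant's microphone is recorded as a separate track (the first option), the speaker label comes for free and the remaining work is interleaving the per-speaker transcripts by time. A minimal Python sketch, assuming some speaker-dependent speech-to-text engine (not shown) has already turned each track into timestamped segments; the `Segment` type and the `merge_tracks`/`render` helpers are hypothetical names for illustration, not part of any Skype or STT API:

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcribed utterance from a single participant's track.

    Hypothetical format: assumed to be what a speaker-dependent STT engine returns.
    """
    start: float  # seconds from the start of the call
    end: float
    text: str


def merge_tracks(tracks: dict[str, list[Segment]]) -> list[tuple[float, str, str]]:
    """Interleave per-speaker segments into one transcript ordered by start time.

    Each track comes from one participant's microphone, so the speaker label
    is known from the track itself and no diarization is needed.
    """
    # Sort each track by start time, then k-way merge the sorted streams.
    streams = [
        sorted((seg.start, speaker, seg.text) for seg in segments)
        for speaker, segments in tracks.items()
    ]
    return list(heapq.merge(*streams))


def render(lines: list[tuple[float, str, str]]) -> str:
    """Format the merged transcript as '[seconds] speaker: text' lines."""
    return "\n".join(f"[{start:06.1f}] {speaker}: {text}" for start, speaker, text in lines)
```

For example, `render(merge_tracks({"alice": [...], "bob": [...]}))` yields one attributed line per utterance in call order, which is the kind of transcript that could then be handed to search or NLP tools.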

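For the real-time variant of the second option, one simple (and far from state-of-the-art) approach is online leader-follower clustering: each short audio window is turned into a speaker embedding, and that embedding either joins the most similar existing speaker or starts a new one. The sketch below assumes a speaker-embedding model (not shown) already produces one fixed-length vector per window; `OnlineDiarizer`, the default cosine `threshold` of 0.75 and the `max_speakers` cap are hypothetical choices, and real diarization systems add overlap handling and periodic re-clustering on top of something like this.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class OnlineDiarizer:
    """Toy online diarizer: assigns each incoming window embedding a speaker index.

    Hypothetical: embeddings come from an unspecified speaker-embedding model.
    A window joins the most similar speaker if similarity >= threshold, or if
    max_speakers (e.g. the known number of call participants) is reached;
    otherwise it starts a new speaker.
    """

    def __init__(self, threshold: float = 0.75, max_speakers: int | None = None) -> None:
        self.threshold = threshold
        self.max_speakers = max_speakers
        self.centroids: list[list[float]] = []
        self.counts: list[int] = []

    def assign(self, embedding: list[float]) -> int:
        if self.centroids:
            scores = [cosine(embedding, c) for c in self.centroids]
            best = max(range(len(scores)), key=scores.__getitem__)
            at_capacity = (
                self.max_speakers is not None and len(self.centroids) >= self.max_speakers
            )
            if scores[best] >= self.threshold or at_capacity:
                self._update(best, embedding)
                return best
        # No sufficiently similar speaker yet: open a new cluster.
        self.centroids.append(list(embedding))
        self.counts.append(1)
        return len(self.centroids) - 1

    def _update(self, idx: int, embedding: list[float]) -> None:
        # Running mean keeps the centroid representative of all windows seen so far.
        n = self.counts[idx] + 1
        self.centroids[idx] = [c + (e - c) / n for c, e in zip(self.centroids[idx], embedding)]
        self.counts[idx] = n
```

Setting `max_speakers` to the number of people on the call mirrors the case where the algorithm is told how many speakers there are; leaving it as `None` lets the diarizer discover speakers on its own, at the cost of sometimes splitting one person into two clusters. Each returned index would then select that participant's speaker-dependent model for the window.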