Abstract: Speech signals contain rich information, such as textual content, emotion, and speaker identity. To extract these features more efficiently, researchers are investigating joint training ...