Our paper with Alan Zhou on modeling speech recognition and synthesis simultaneously got into Interspeech 2022.

The model learns lexical and sub-lexical (phon/n-gram) information without direct access to the training data.

One important finding: binary codes encode holistic (lexical) info, while individual bits encode featural (sub-lexical) info in an interpretable way (tested with a causal technique).
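To make the idea of a causal test on individual bits concrete, here is a minimal toy sketch. It is not the paper's model or procedure: `toy_generator`, the feature names, and the 3-bit code size are all hypothetical stand-ins, and the bit-to-feature wiring is hard-coded by assumption. The sketch only shows the logic of the test: force one bit to 0 and to 1, hold the others fixed, and see which output properties change.

```python
from itertools import product

# Hypothetical toy setup, NOT the paper's model: 3-bit codes name 8 "words",
# and each bit is wired by hand to one made-up sub-lexical feature.
FEATURES = ("has_s", "has_t", "has_a")  # illustrative names only


def toy_generator(code):
    """Map a binary code to a set of features (stand-in for a trained generator)."""
    return {name for name, bit in zip(FEATURES, code) if bit == 1}


def intervene(code, index, value):
    """Return a copy of `code` with the bit at `index` forced to `value`."""
    forced = list(code)
    forced[index] = value
    return tuple(forced)


def effect_of_bit(index):
    """Collect the features that differ when bit `index` is forced to 1 vs 0,
    over every setting of the remaining bits."""
    changed = set()
    for code in product((0, 1), repeat=len(FEATURES)):
        on = toy_generator(intervene(code, index, 1))
        off = toy_generator(intervene(code, index, 0))
        changed |= on ^ off  # symmetric difference: what the intervention altered
    return changed


for i in range(len(FEATURES)):
    print(i, sorted(effect_of_bit(i)))
```

In this toy, the whole code picks out one of eight "words", while each bit on its own controls a single feature. With a real trained generator the output would be audio, and the comparison would have to be made on acoustic or phonetic measurements rather than on a set of labels.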

Paper: arXiv

Modeling speech recognition and synthesis simultaneously.