Synthetic data generation has drawn growing attention because training data is scarce in many application domains. It is useful for privacy-sensitive applications, e.g., digital health applications based on electronic medical records. It is also attractive for novel applications, e.g., multimodal applications in the metaverse, which have little data for training and evaluation. This project focuses on synthetic data generation for audio and for the corresponding multimodal applications, such as mental health chatbots and digital assistants for negotiation.
For most such applications, the key technical challenge is to create disentangled representations that separate paralinguistic information from the content of speech. In such a representation, each component contains only the information needed for its relevant attributes or properties. General composition mechanisms will be learned so that applications can combine the appropriate components to generate the desired data. For example, the script of an emotional support conversation could be combined with a desired emotion and voice pattern to generate natural and empathetic speech. However, current deep generative models perform poorly at compositional generalization [3]. Recent work shows that disentangled representations can be defined from a causal perspective [1]. This is also relevant to causal representation learning, which aims at robustness and strong out-of-distribution generalization [2]. If user-specific information can be identified and removed from the input data, the devised techniques can also be applied to privacy-sensitive applications, such as privacy-preserving automatic speech recognition (ASR).
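As a rough illustration of what combining disentangled components could look like once such a representation exists, the toy sketch below assumes that an utterance is encoded into three separate latent vectors: content, emotion, and speaker. The class `SpeechLatent` and the functions `compose` and `anonymize` are hypothetical names introduced for illustration; the encoders and the speech decoder are not shown. Composition here is plain slot swapping, which is only meaningful if each component really carries only its own attribute; the project itself aims to learn more general composition mechanisms.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class SpeechLatent:
    """Hypothetical factorized latent code for one utterance.

    Assumption: each field comes from a separate (not shown) encoder and
    holds only the information for its own attribute.
    """
    content: Vector  # what is said (linguistic content)
    emotion: Vector  # paralinguistic: emotional state
    speaker: Vector  # paralinguistic: voice / identity characteristics

    def to_vector(self) -> Vector:
        # A (not shown) decoder would consume the concatenated code.
        return self.content + self.emotion + self.speaker


def compose(content_src: SpeechLatent, emotion_src: SpeechLatent,
            speaker_src: SpeechLatent) -> SpeechLatent:
    """Take each component from a different utterance and recombine them."""
    return SpeechLatent(content=content_src.content,
                        emotion=emotion_src.emotion,
                        speaker=speaker_src.speaker)


def anonymize(z: SpeechLatent, neutral_speaker: Vector) -> SpeechLatent:
    """Swap the user-specific component for a fixed neutral code."""
    if len(neutral_speaker) != len(z.speaker):
        raise ValueError("speaker code has the wrong dimension")
    return replace(z, speaker=neutral_speaker)


# Usage: script content + calm emotion + a chosen voice.
script = SpeechLatent(content=(0.1, 0.7), emotion=(0.0, 0.0), speaker=(0.5, 0.5))
calm = SpeechLatent(content=(0.9, 0.2), emotion=(0.8, -0.3), speaker=(0.1, 0.4))
voice = SpeechLatent(content=(0.3, 0.3), emotion=(0.2, 0.2), speaker=(-0.6, 0.9))

z = compose(script, calm, voice)
print(z.to_vector())  # (0.1, 0.7, 0.8, -0.3, -0.6, 0.9)

private = anonymize(z, (0.0, 0.0))
print(private.speaker)  # (0.0, 0.0)
```

Replacing the speaker slot with a neutral code mirrors the privacy-preserving case: if user-specific information is confined to one component, it can be removed without touching the content. Actually achieving that confinement is the hard part, and this toy does not address it.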
[1] Wang, Yixin, and Michael I. Jordan. "Desiderata for representation learning: A causal perspective." arXiv preprint arXiv:2109.03795 (2021).
[2] Schölkopf, Bernhard, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. "Toward causal representation learning." Proceedings of the IEEE 109, no. 5 (2021): 612-634.
[3] Hupkes, Dieuwke, Verna Dankers, Mathijs Mul, and Elia Bruni. "Compositionality decomposed: how do neural networks generalise?" Journal of Artificial Intelligence Research 67 (2020): 757-795.