A strong and vibrant research team with a steady stream of publications in high-calibre venues is looking for a postdoctoral researcher (36-month fixed term) to develop novel approaches to multi-modal audio-visual understanding in conversational settings. The position is funded by the EU project WeHear, a collaboration led by the Technical University of Denmark (DTU). The project will focus on audio-visual understanding for smart hearing aids, with low latency (real-time and predictive), on smart glasses.
You will be working closely with Dima on her active research. Check Dima’s research interests and projects at: http://dimadamen.github.io/
Prior expertise in audio-visual perception and deep learning methods with a strong publication track record is expected, including first-author publications in CVPR/ICCV/ECCV/ICASSP/PAMI/IJCV/NeurIPS/ICLR.
Over the period of 36 months, you will be:
- Conducting novel research in multimodal audio-visual understanding: designing, training and evaluating approaches to audio-visual understanding in conversational settings. This will include hands-on research using the latest deep learning approaches.
- Preparing low-latency API packages that will be integrated with partner demonstrations on a quarterly basis.
- Presenting your work in regular meetings, taking feedback and integrating the goals of the project into your individual research directions.
- Publishing in top-tier venues (conferences and journals). Communicating your work to the best possible audience.
- Collaborating with other researchers (postdocs and faculty) in the WeHear project.
- Co-advising junior PGR students
You should have:
- PhD [near submission, submitted or graduated] in Multimodal Understanding, preferably with expertise in audio understanding, video understanding or multimodal visual models.
- Prior degree in computer science, engineering or mathematics
- Detailed knowledge of video understanding state-of-the-art, approaches, datasets and problems, preferably with expertise in egocentric datasets.
- Prior knowledge of egocentric audio-visual devices that work in real time, such as Meta Aria Glasses (Gen1 or Gen2) and Apple Vision Pro.
- Experience in handling audio-video data, for learning and inference
- Experience in modelling with deep learning approaches
- Experience and evidence of publishing at high-calibre conferences and journals (at least one first-author paper in the past 3 years in a major venue: CVPR/ICCV/ECCV/ICASSP/NeurIPS/PAMI/IJCV/ICLR).
- Excellent programming skills (Python)
- Proficiency in deep learning frameworks (PyTorch)
For informal queries, please contact Gozde Burger, Senior Research Administrator. Email: [email protected]
To find out more about what it's like to work in the Faculty of Engineering, and how the Faculty supports people to achieve their potential, please see our staff blog:
https://engineering.blogs.bristol.ac.uk/category/engineering-includes-me/
Contract type: Open ended with fixed funding until 31/08/2029
Work pattern: Full time
Grade: J/Pathway 2
Salary: £43,482 - £50,253 per annum
School/Unit: School of Computer Science
This advert will close at 23:59 UK time on 08/07/2026
Interviews are anticipated to take place on 16/07/2026
We recently launched our strategy to 2030 tying together our mission, vision and values.