
Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems)

Tencent
Bellevue, WA Full Time
POSTED ON 1/28/2026 CLOSED ON 3/28/2026

What are the responsibilities and job description for the Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems) position at Tencent?

Onsite | US-Washington-Bellevue | Full time | Posted 4 Days Ago | R105611

What the Role Entails

Job Responsibilities:

We are building large-scale, native multimodal model systems that jointly support vision, audio, and text to enable comprehensive perception and understanding of the physical world. You will join the core research team focused on speech and audio, contributing to the following key research areas:

  • Develop general-purpose, end-to-end large speech models covering multilingual automatic speech recognition (ASR), speech translation, speech synthesis, paralinguistic understanding, and general audio understanding.
  • Advance research on speech representation learning and encoder/decoder architectures to build unified acoustic representations for multi-task and multimodal applications.
  • Explore representation alignment and fusion mechanisms between audio/speech and other modalities in large multimodal models, enabling joint modeling with image and text.
  • Build and maintain high-quality multimodal speech datasets, including automatic annotation and data synthesis technologies.

Who We Look For

  • Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Linguistics, or a related field; or Master’s degree with several years of relevant experience.
  • Solid understanding of speech and audio signal processing, acoustic modeling, language modeling, and large model architectures.
  • Proficient in one or more core speech system development pipelines such as ASR, TTS, or speech translation; experience with multilingual, multitask, or end-to-end systems is a plus.
  • Candidates with in-depth research or practical experience in the following areas are strongly preferred:
      • Speech representation pretraining (e.g., HuBERT, Wav2Vec, Whisper)
      • Multimodal alignment and cross-modal modeling (e.g., audio-visual-text)
      • Experience driving state-of-the-art (SOTA) performance on audio understanding tasks with large models
  • Proficient in deep learning frameworks such as PyTorch or TensorFlow; experience with large-scale training and distributed systems is a plus.
  • Familiar with Transformer-based architectures and their applications in speech and multimodal training/inference.

Location State(s)

US-Washington-Bellevue

The expected base pay range for this position in the location(s) listed above is $122,500.00 to $229,700.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. Employees hired for this position may be eligible for a sign on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis.

Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan. The Employee is also eligible for up to 15 to 25 days of vacation per year (depending on the employee’s tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year. Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Salary: $122,500 - $229,700


This job has expired.

Job openings at Tencent

  • Tencent Bellevue, WA
  • Research Scientist - Speech & Audio Understanding (Speech Generation) Onsite US-Washington-Bellevue Full time Posted 4 Days Ago R105612 Business Unit What ... more
  • 4 Months Ago

  • Tencent Bellevue, WA
  • AGI Model Architect / Research Scientist in AGI Model Architecture Onsite US-Washington-Bellevue Full time Posted Today R105610 Business Unit What the Role... more
  • 4 Months Ago

  • Tencent Bellevue, WA
  • Research Internship- Multimodal LLM (Speech/Music/Audio/Vision/Language) Onsite US-Washington-Bellevue Full time Posted 4 Days Ago R106334 Business Unit Te... more
  • 4 Months Ago

  • Tencent Bellevue, WA
  • Hunyuan AIGC Algorithm Researcher (World Model Foundation Direction) Onsite US-Washington-Bellevue Full time Posted Yesterday R106612 Business Unit What th... more
  • 4 Months Ago


Not the job you're looking for? Here are some other Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems) jobs in the Bellevue, WA area that may be a better fit.

  • Tencent Bellevue, WA
  • Research Scientist - Speech & Audio Understanding (Speech Generation) Onsite US-Washington-Bellevue Full time Posted 4 Days Ago R105612 Business Unit What ... more
  • 4 Months Ago

  • Meta Seattle, WA
  • The GenAI Speech team at Meta is currently looking for Research Scientist interns. Our team creates spoken language technology to make it faster and easier... more
  • 4 Months Ago
