Research Internship- Multimodal LLM (Speech/Music/Audio/Vision/Language)

Tencent
Bellevue, WA Full Time
POSTED ON 1/28/2026 CLOSED ON 3/28/2026

What are the responsibilities and job description for the Research Internship- Multimodal LLM (Speech/Music/Audio/Vision/Language) position at Tencent?

Onsite, US-Washington-Bellevue, Full time, Posted 4 Days Ago, R106334

Business Unit

Technology Engineering Group (TEG) is responsible for supporting the company and its business groups on technology and operational platforms, as well as the construction and operation of R&D management and data centers. TEG provides users with a full range of customer services. As the operator of the largest networking, devices, and data center in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open source collaboration, constructing new platforms and supporting business innovation.

What the Role Entails

About Tencent AI Lab at Seattle Area

Tencent is a leading internet company in China. Tencent AI Lab at Seattle Area was established in May 2017. The lab strives to continuously improve AI's capability in perception, cognition, and creativity. Researchers there aim to solve challenging real-world problems with advanced technologies and publish extensively at top conferences and journals.

Research Internship: Multimodal LLM (Speech/Music/Audio/Vision/Language)

Tencent AI Lab is dedicated to advancing cutting-edge AI technologies, with a particular focus on innovative breakthroughs in large foundation models. The lab's long-term ambition is to drive the development of Artificial General Intelligence (AGI) and, ultimately, Artificial Superintelligence (ASI).

We are seeking research interns who are interested in developing novel speech/music/audio/vision/language processing techniques and large multimodal models for our Seattle area office, located in Bellevue, WA, for the year 2026. Every research intern will work with researchers on a research project aimed at attacking one of the core problems by inventing cutting-edge techniques. We encourage discussions and collaborations between researchers and interns. Interns are also encouraged to publish the results from the internship.

Our projects span a wide range of areas, including developing more effective multimodal pretraining and post-training strategies for audio, speech, music, image, and video understanding and generation. We aim to enable fully duplex conversations, design more efficient large-model architectures, enhance multimodal memory and reasoning capabilities, and advance novel audio, speech, music, image, and video processing techniques (such as encoding, tokenization, and representation learning), with a focus on multimodal applications and end-to-end large models.

Who We Look For

Requirements & Qualifications

The ideal intern candidates are those who:

  • are Ph.D. students in computer science, electrical engineering, mathematics, or a related field
  • are self-motivated and excited about developing novel techniques
  • have research experience in natural language processing; speech, audio, and music processing; computer vision; dialog systems; or machine learning
  • have good publication track records and a history of creativity and intellectual flexibility
  • can program skillfully in Python and/or C++ and have experience using one of the leading deep learning toolkits

Intern duration: 3 months (with the possibility of extension). Can start any time in the year 2026.

Location State(s)

US-Washington-Bellevue

The expected base pay range for this position in the location(s) listed above is $80,168.40 to $124,800.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. This position will be eligible for 1 hour of paid sick leave for every 30 hours worked and up to 13 paid holidays throughout the calendar year. Subject to the terms and conditions of the applicable plans then in effect, full-time interns are also eligible to enroll in the Company-sponsored medical plan.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Salary: $80,168 - $124,800


This job has expired.

Job openings at Tencent

  • Tencent Bellevue, WA
  • Research Scientist - Speech & Audio Understanding (Speech Generation) Onsite US-Washington-Bellevue Full time Posted 4 Days Ago R105612 Business Unit What ... more
  • 4 Months Ago

  • Tencent Bellevue, WA
  • AGI Model Architect / Research Scientist in AGI Model Architecture Onsite US-Washington-Bellevue Full time Posted Today R105610 Business Unit What the Role... more
  • 4 Months Ago

  • Tencent Bellevue, WA
  • Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems) Onsite US-Washington-Bellevue Full time Posted 4 Days Ago R105611 B... more
  • 4 Months Ago

  • Tencent Bellevue, WA
  • Hunyuan AIGC Algorithm Researcher (World Model Foundation Direction) Onsite US-Washington-Bellevue Full time Posted Yesterday R106612 Business Unit What th... more
  • 4 Months Ago

