What are the responsibilities and job description for the Senior Software Engineer, Data position at Jobright.ai?
Job Summary:
The Allen Institute for AI (Ai2) is hiring a Senior Software Engineer specializing in Data to help integrate a large U.S. patent corpus into the Semantic Scholar platform. The role involves building scalable data pipelines, developing and deploying machine learning models, and contributing to data quality evaluation tools.
Responsibilities:
• Build scalable data pipelines (Airflow) for citation resolution and corpus integration
• Develop and deploy lightweight ML models for inventor disambiguation and author linking
• Train or adapt a topic model to classify patents using titles, abstracts, claims, and specs
• Extend REST APIs to expose linked metadata and topic classifications
• Contribute to dashboards and tools for evaluating data quality and model precision
• Collaborate with Ai2 engineers to ensure maintainability, test coverage, and robust deployment
• Produce reliable, well-documented code and contribute technical designs that support long-term maintainability
Qualifications:
Required:
• Bachelor's degree and 8 years of technical experience; relevant experience may substitute for education.
• Strong Python engineering skills, especially for building and maintaining data pipelines
• Experience with SQL and schema design in production settings (PostgreSQL preferred)
• Familiarity with common ML workflows (training classifiers, tuning models, and deploying for inference), particularly for large-scale or ambiguous structured datasets
• Comfortable working with structured datasets (XML/JSON/Parquet) and writing ETL code
• Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (e.g., AWS, S3, Docker)
• Strong communication skills and a strong sense of ownership for results
Preferred:
• Experience with author disambiguation, entity resolution, or record linkage problems
• Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
• Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
• Comfort building internal APIs and dashboards to support ML and data quality review
Company:
We are a Seattle-based non-profit AI research institute founded in 2014 by the late Paul Allen. The company is headquartered in Seattle, Washington, USA, with a team of 201-500 employees, and is currently in the growth stage.