What are the responsibilities and job description for the AI Engineer position at Yochana?
Role: AI Engineer
Location: Troy, MI (Onsite)
Duration: Contract or Full time
Visa: Dependent Visa
Major Responsibilities
- Develop, integrate, and deploy AI components including RAG pipelines, vector search, embeddings workflows, vector stores, and AI agents (tool/function calling, multi-step reasoning, workflow orchestration) using modern frameworks and best practices for production systems.
- Implement LLM applications using LangChain and LangGraph (or equivalent orchestration frameworks), including tool/function calling, multi-step workflows, guardrails, and evaluation.
- Evaluate and select models, embedding strategies, vector database options, and deployment approaches based on performance, cost, latency, reliability, and scalability requirements.
- Build and maintain model lifecycle capabilities including experimentation, versioning, packaging, deployment, and monitoring using MLflow (or equivalent) and robust MLOps/LLMOps practices.
- Integrate AI services into enterprise platforms using cloud-native patterns on Azure/AWS, including containerized deployments and scalable inference endpoints.
- Design and operate CI/CD pipelines for AI services, ensuring code quality via automated testing, reproducible builds, and environment parity.
- Verify, monitor, and optimize the performance of AI agents and chatbot/GenAI applications (model quality, drift, latency, retrieval relevance, agent success rates), and implement remediation strategies (retraining, prompt/model updates, rollback).
- Establish evaluation for RAG/agents (retrieval quality, grounding, hallucination rate, task success rate) and continuously improve performance through iterative experimentation.
- Design, build, and operate agentic solutions for enterprise and manufacturing use cases, including retrieval, planning, tool execution, and safe-action constraints (guardrails, allowlists, human-in-the-loop where needed).
- Work closely with data engineers and the cloud and DevOps teams to productionize, scale, and operate AI services.
- Collaborate with enterprise architects and domain experts to align AI/ML solutions with business needs and technical strategy.
- Ensure compliance with internal governance and IT security standards and apply Responsible AI principles (privacy, traceability, human-in-the-loop where appropriate, auditability).
Knowledge and Education
- Degree in Computer Science, Data Science, or a similar field.
- Strong fundamentals in software engineering, data structures/algorithms, statistics, and machine learning principles.
Work Experience
- 6–7 years of overall experience.
- 3–4 years of hands-on experience delivering AI/ML and/or GenAI solutions (training/inference pipelines, RAG/agents, deployment, monitoring, iteration) in production.
- Hands-on experience with cloud-native platforms (Azure/AWS) and container orchestration (Kubernetes, Docker) is an advantage.
- Hands-on experience with Databricks (Azure Databricks and/or Databricks on AWS) for large-scale data processing, model training, and production pipelines.
- Experience with vector databases (Azure AI Search, OpenSearch, pgvector, DBX Vector Search, or equivalent).
- Experience with CI/CD, automated testing, and code quality best practices.
Skills and Competencies
- Strong Python engineering skills; ability to build clean, testable, production-grade services and libraries.
- Generative AI: LangChain, LangGraph, prompt engineering, retrieval-augmented generation, embeddings, and LLM evaluation/observability concepts.
- Deep learning/ML frameworks: PyTorch and/or TensorFlow; applied experience with model training, fine-tuning, evaluation, and optimization.
- Vector databases and search: experience with vector indexing/retrieval concepts and platforms such as Azure AI Search, OpenSearch, pgvector, or equivalent.
- MLOps/LLMOps: MLflow (or equivalent), model registry/versioning, reproducibility, monitoring, and operational readiness.
- Cloud: hands-on implementation on Azure, AWS, and Databricks (IAM/security, networking basics, managed compute, storage, logging/monitoring).
- Containers and orchestration: Docker, Kubernetes (or equivalent deployment tooling).
- CI/CD: Git-based workflows, automated testing, build pipelines, release management, and quality gates.
- Strong communication skills and cross-functional collaboration; ability to translate ambiguous business needs into clear technical deliverables.