Project Details
- The team has implemented an internal chatbot for the bank and built an interface through which users can now interact with it.
- It is built on a RAG (retrieval-augmented generation) framework, so instead of fine-tuning the model for each application, you prompt it and it draws on pre-vectorized content.
- Users can feed it documents even if it does not know what their team is or who they are. It should have 1,000 users by the end of the year and another 2,000 next year.
Qualifications
Required Skills:
- Experience with major LLMs and inference workflows (examples: Llama 3, Mistral).
- Direct experience with vLLM or similar inference engines to handle batched requests.
- Strong Python development skills (Python 3.10+; team uses 3.12).
- Experience building APIs/endpoints with Flask or FastAPI to host/serve models.
- Experience with vector databases (e.g., Redis) and working knowledge of RAG pipelines.
- SQL skills for data management and querying.
- Ability to translate product/business requirements into technically feasible solutions (strong critical thinking, with pushback where appropriate).
- Container and orchestration experience (Kubernetes/OpenShift) for CI/CD and deployment.
- Familiarity with CI/CD tools used by the team (XLR and Datical) or equivalent pipeline deployment experience.
- Solid understanding of GPUs and hardware constraints for model serving and scaling.
- Experience working in Agile teams and delivering in iterative cycles.
Preferred / Nice-to-Have Skills:
- Experience with Nvidia Triton or other specific model-serving infrastructures (considered a strong plus).
- Familiarity with Java for building REST services that integrate with front-end UIs.
- Experience with additional inference tools or engines (e.g., experimental projects referenced as "Quinn").
- Prior work on scaling LLM services tied to constrained hardware budgets and efficient resource management.
- Scaling: The project's growth is tied to hardware availability. The initial deployment will be capped at 1,000 users, and further scaling depends on additional budget, so efficient resource management is essential.