HealthEdge commits to building an environment and culture that supports the diverse representation of our teams. We aspire to have an inclusive workplace where all employees have the opportunity to belong, make an impact and deliver excellent software and services to our customers.
The Data Science team is looking for a Machine Learning Engineer who will work closely with the Data Platform and DevOps teams to enable faster ML-based application development and deployment. The ideal candidate will bring expertise in state-of-the-art tools and frameworks to build scalable and efficient solutions for data management, data pre-processing and data set building. The candidate will also have experience deploying data and ML models into production environments.
What you will do:
• Create data pipelines and testing frameworks for research and production use.
• Create efficient and scalable solutions to support data management, data pre-processing and data set building.
• Contribute to the architecture and implementation of an efficient data ingestion, processing and storage pipeline.
• Help with data curation, data annotation and data quality efforts.
• Test and deploy data and ML products into production environments and support them after release.
What you bring:
• 5+ years of experience with a bachelor’s degree, or 3+ years of experience with a master’s degree, in CS, data engineering or a related field.
• 3+ years of experience writing maintainable, testable, production-grade Python code. Knowledge of PySpark is desirable.
• Proficiency in SQL and in creating ETL processes.
• Previous experience building efficient large-scale data collection, storage and processing pipelines.
• Knowledge of database systems, big data concepts and cluster computing frameworks (e.g. Spark, Hadoop, or other tools).
• Experience working in a cloud-based machine learning environment, including deployment of models to production.
• Experience with Agile, continuous integration, continuous deployment, test-driven development and Git.
• Understanding of time, RAM, and I/O scalability aspects of data science applications (e.g. CPU and GPU acceleration, operations on sparse arrays, model serialization and caching).
• Experience in a health care and/or insurance setting is preferred but not required.