Careers
About OMSF
The Open Molecular Software Foundation (OMSF) is a US-based 501(c)(3) nonprofit organization focused on facilitating collaboration and software development for molecular sciences. OMSF aims to create a cost-effective organization to support an ecosystem of open source projects in molecular sciences, build expertise around managing and governing these projects, explore pathways to sustainability, and accelerate innovation by sharing our knowledge and tools under open licenses.
All OMSF positions are currently remote, and we generally accept applications worldwide, provided that OMSF is able to enter into an appropriate working agreement with the applicant in their designated country of residence. Some positions may be restricted to the US, so please make sure to read the full job description before applying. Please note that you will need a valid work permit, as OMSF has limited ability to sponsor relocation visas at this stage. Please reach out to discuss visa options, if needed. Depending on your location, you might occasionally need to participate in meetings outside of normal business hours due to time zone differences.
When applying, please use the application forms linked below. Please reach out to careers@omsf.io if you have any additional questions or need help.
OpenADMET Science Lead – Machine Learning Models and Active Learning
Job Description
The Open Molecular Software Foundation is seeking a Science Lead for the OpenADMET project, specializing in Machine Learning Models and Active Learning Strategies. The Science Lead will be responsible for overseeing the selection, implementation, and refinement of machine learning models and active learning strategies to drive the development of open source ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) models built from both public datasets and new open ADMET datasets generated by the collaborative OpenADMET project. This position supervises and collaborates with scientific roles on the OMSF OpenADMET team, collaborates closely with the OpenADMET technical team to aid the development of robust training and inference infrastructure, and provides domain expertise for optimizing machine learning model development and deployment. This role is also instrumental in recruiting and liaising with a Technical Advisory Committee consisting of experts from academia and industry to advise on model strategies that maximize the utility of new open ADMET data generated by the project.
The Science Lead will work remotely in collaboration with diverse teams, primarily based in Europe, Australia, and the United States, and the role may require occasional travel. This fully remote, grant-funded position has a minimum duration of 2 years, with potential for extension based on available funding. The Science Lead will report directly to the OpenADMET Project Director. Learn more about OpenADMET here.
Key Responsibilities
Machine Learning Model Development and Oversight
- Lead, manage, and contribute to efforts to automate training, assessment, and deployment of high-quality ligand-based (2D) ADMET models, ensuring state-of-the-art predictive capabilities.
- Oversee the implementation of advanced structure-based (3D) ADMET models to leverage protein-ligand structural data, enabling accurate prediction of ADMET properties and enhancing generalizability across species and human genetic variations. This will involve collaboration with the OMSF OpenFold team (OpenFold.io).
- Develop multitask models, uncertainty prediction approaches (e.g., model ensembling, Gaussian Process regression), and mechanistic models to optimize predictive performance and data utilization.
- Drive the integration of new machine learning architectures and active learning strategies, leveraging recent innovations and literature as well as blind community challenges to ensure models remain at the cutting edge of accuracy and reliability.
Active Learning and Data Curation
- Establish and manage a continuous, automated active learning service to prioritize compounds for ADMET assays and structural biology, guiding data acquisition efforts for optimal model enhancement.
- Develop strategies for rapid model re-training to incorporate new data, maximizing model accuracy on chemical spaces of interest while maintaining cost-efficiency.
- Collaborate with internal and external stakeholders to refine active learning pipelines, ensuring model feedback loops are effectively managed.
Scientific Leadership and Collaboration
- Supervise scientific roles on the OpenADMET team, providing guidance on machine learning model design, data management, and active learning best practices.
- Work closely with the technical team to ensure infrastructure supports efficient model training, deployment, validation, and active learning cycles.
- Work closely with experimental groups
- Engage with the Technical Advisory Committee and other experts to refine project strategies, keeping OpenADMET’s machine learning approaches aligned with cutting-edge practices.
- Engage with the Industry Advisory Board to integrate feedback on model performance, impact, and utility from pharma, biotech, and software vendors using OpenADMET models.
Documentation and Dissemination
- Document best practices in a “Living Best Practices Review” and other OMSF publications, contributing to a foundation for reproducible, scalable machine learning in drug discovery.
- Maintain and expand the software documentation for OpenADMET models and active learning processes, supporting both internal teams and external collaborators.
Qualifications
- Advanced degree in a relevant field such as data science, ML/AI, bioinformatics, or a related discipline (PhD preferred, but candidates with a Master’s degree and significant experience are welcome).
- Proven expertise in machine learning model development for small-molecule properties, particularly for 2D ADMET applications or drug discovery.
- Strong background in active learning methodologies, model uncertainty assessment, and model deployment for scientific research.
- Familiarity with FAIR data principles, and experience in building reproducible data and model repositories.
- Experience leading scientific teams or managing multi-faceted research projects, ideally with interdisciplinary and remote collaboration.
- Excellent communication skills with the ability to engage cross-functional teams and advisory committees.
Preferred Qualifications
- Technical expertise in Python and machine learning frameworks (e.g., PyTorch, JAX, or another deep learning framework).
- Experience with cloud computing for machine learning (e.g., AWS SageMaker, AWS Batch, Modal, Lambda, Anyscale, GCP Vertex, Azure ML).
- Experience with molecular data processing and modeling tools, such as RDKit, DeepChem, or similar.
- Understanding of the drug discovery lifecycle and the role of machine learning in drug discovery.
- Ability to understand and tune machine learning model performance.
- Familiarity with continuous integration and automated deployment tools (e.g., GitHub Actions, conda-forge, CI/CD for model pipelines).
- Knowledge of containerization and orchestration tools like Docker and Kubernetes for scalable model deployment.
- Experience contributing to or leading open source projects and publications in computational molecular sciences or related fields.
- Proven expertise in machine learning model development for small-molecule properties, using 3D representations for ADMET applications or drug discovery.
- Comfortable working in a distributed global team.
Compensation
This is a full-time, fixed-term contract position with an expected gross salary range of $150,000-$165,000, depending on qualifications, experience, and location. OMSF provides standard benefits, including healthcare, retirement contributions, paid time off, and other employer contributions per local regulations. Salary will be negotiated in the local currency and may vary by location due to differences in mandatory employer contributions.
Location
OMSF is a fully remote organization. For this role, we will accept only applications from candidates who are based in the US due to federal funding restrictions.
How to Apply
Please apply using this form.
OpenADMET Senior Scientist – Data Curation and Pipeline Integration
Job Description
The Open Molecular Software Foundation is seeking a Senior Scientist specializing in Data Curation and Pipeline Integration for the OpenADMET project. This role focuses on ensuring the accuracy, quality, and accessibility of ADMET data used for machine learning applications, while developing and optimizing data workflows for seamless data ingestion, processing, and dissemination. The Senior Scientist will collaborate with experimental scientists at Octant/UCSF and the OpenADMET team to establish best practices for data curation and workflow integration. This includes preparing data in machine-learning-ready formats, managing data deposition into FAIR (Findable, Accessible, Interoperable, Reusable) repositories, and ensuring the quality and reliability of all ADMET dataset classes.
This fully remote, grant-funded position has a minimum duration of 2 years, with potential for extension depending on available funding. Due to time zone differences, occasional meetings outside standard business hours may be necessary. Our teams are primarily based in Europe, Australia, and the United States, and some travel may occasionally be required.
The Senior Scientist – Data Curation and Pipeline Integration will report directly to the OpenADMET project lead, while closely collaborating with the OpenADMET ML Infrastructure team, project stakeholders, and external partners. This role is ideal for a resourceful individual eager to shape data-driven projects that advance ADMET research in a collaborative, open-source environment. Learn more about OpenADMET here.
Key Responsibilities
Data Curation and Quality Assurance
- Manage, curate, and maintain high-quality ADMET datasets in alignment with FAIR principles, ensuring accuracy and accessibility for machine learning applications.
- Collaborate with experimental scientists to establish and document best practices for data curation, validation, and dissemination.
- Develop quality checks and validation mechanisms to ensure data reliability across various ADMET dataset classes.
Pipeline Development and Optimization
- Design, implement, and optimize data ingestion and processing workflows using coding best practices (e.g., CI/CD, automated documentation generation, version control) for OpenADMET activities, with a focus on automation and efficiency.
- Coordinate with OpenADMET team members to extract, process, and integrate data from public databases and FAIR repositories.
- Implement strategies to store data in a performant and reliable data warehouse or data lake.
- Manage public releases of data into industry-standard public databases (e.g., ChEMBL, PubChem, Therapeutic Data Commons).
- Develop best practice SOWs for automated data workflows, including ingestion, preparation, and deposition into FAIR repositories, in collaboration with stakeholders.
Collaboration and Dissemination
- Engage with cross-functional and cross-institutional teams to align data workflows with machine learning and data analysis requirements.
- Assist with onboarding and training team members in data curation and workflow management best practices.
- Support dissemination of data to public repositories and other channels to maximize impact and usability.
Qualifications
- Advanced degree in a relevant field such as computational chemistry, data science, bioinformatics, or a related discipline (PhD preferred, but candidates with a Master’s degree and significant experience are welcome).
- Proven experience with data curation, validation, and dissemination practices, ideally within scientific or ADMET-focused datasets.
- Strong understanding of data workflows and pipeline optimization for scientific research applications, particularly with experience in automated data ingestion and FAIR data principles.
- Familiarity with remote work environments and tools (Google Workspace, Slack, GitHub, etc.), and experience working with distributed teams.
- Excellent communication skills and the ability to work across interdisciplinary teams to address complex data challenges.
Preferred Qualifications
- Technical expertise in Python, SQL (and its variants), containerization (e.g., Docker, Kubernetes), and Git.
- Experience with cloud computing (e.g., AWS, Azure).
- Experience with molecular data processing and modeling tools, such as RDKit, DeepChem, or similar.
- Experience developing and managing FAIR data repositories or similar data-sharing infrastructures.
- Background in data quality assurance, especially in ensuring accuracy for machine learning training and benchmarking.
- Previous involvement in open-source, collaborative, and remote scientific projects.
- Comfortable working in a distributed global team.
Compensation
This is a full-time, fixed-term contract position with an expected gross salary range of $100,000-$125,000, depending on qualifications, experience, and location. OMSF offers standard benefits, including healthcare, retirement contributions, paid time off, and other employer contributions per local regulations. Salary will be negotiated in the local currency and may vary by location due to differences in mandatory employer contributions.
How to Apply
Please apply using this form.
OMSF Expression of Interest
We are always looking for enthusiastic candidates to contribute to open source projects in molecular sciences, even if we don't have any available openings. If you are interested in building your career as a software scientist, project or product manager, documentation writer, etc., please fill out this form or reach out to careers@omsf.io.