[Remote] Staff Data Engineer - Emerald

Remote | Full-time | Now Hiring

Note: This is a remote position open to candidates in the USA. H1 is dedicated to providing optimal healthcare information access and is seeking a Staff Data Engineer for its Emerald team. The role involves leading the architecture and scalability of H1’s healthcare entity resolution platform while managing a small team and collaborating with various stakeholders to improve the platform’s efficiency and accuracy.

Responsibilities

Lead the design, optimization, and scalability of distributed Spark/PySpark pipelines powering entity resolution and large-scale healthcare data processing
Own systems supporting automatching, identity mapping, grouping logic, deduplication, enrichment, and auto-approval workflows across healthcare provider and organization datasets
Build and maintain scalable processing frameworks for PubMed, clinical trial, ct.gov, conference, and other healthcare data sources
Drive infrastructure optimization initiatives focused on improving throughput, runtime, observability, and cloud compute cost efficiency
Partner closely with AI/ML teams to integrate matching and resolution models into Emerald and improve matching precision and recall
Lead complex technical initiatives from architecture and design through deployment, monitoring, and long-term production support
Serve as a technical leader and mentor across the team through code reviews, technical guidance, and engineering best practices
Collaborate directly with Product and business stakeholders to align technical solutions with operational and customer needs
Support production operations, incident response, troubleshooting, and ongoing platform reliability

Skills

8+ years of experience building and maintaining large-scale distributed data systems and pipelines
Demonstrated technical leadership experience mentoring engineers and driving complex technical initiatives
Extensive experience with Apache Spark and AWS-based big data technologies including EMR, S3, and distributed compute environments
Strong coding experience in Python (PySpark), Scala, Java, or equivalent languages used for distributed processing systems
Experience optimizing large-scale Spark workloads for performance, scalability, and infrastructure cost efficiency
Experience with streaming and event-driven architectures using technologies such as Kafka or Spark Streaming
Experience with orchestration and lakehouse technologies such as Argo and Hudi or comparable platforms
Experience with containerization and infrastructure technologies such as Docker, Kubernetes, and Terraform
Experience working with relational or distributed databases such as PostgreSQL or Redshift
Proven ability to operate effectively within highly scalable, production-grade distributed systems
Deep expertise with distributed data processing frameworks such as Apache Spark and Hadoop, particularly within AWS environments
Strong proficiency in Python (PySpark), Scala, Java, or other modern programming languages used for large-scale distributed processing
Experience building scalable ETL/ELT frameworks across both batch and streaming architectures
Strong understanding of distributed file formats including Apache Parquet and Apache Avro
Experience with streaming technologies such as Kafka, Spark Streaming, or KSQL
Strong grasp of software engineering fundamentals including distributed systems, data structures, concurrency, and system design
Experience performing root cause analysis across large-scale distributed systems and complex data pipelines
Ability to write clean, maintainable, modular, and production-grade code
Experience improving performance, scalability, observability, and infrastructure efficiency within distributed systems
Strong communication and collaboration skills across both technical and non-technical stakeholders
Familiarity with modern development and infrastructure tooling including Git, CI/CD pipelines, Docker, Kubernetes, Terraform, Argo, Hudi, and Jira
Experience with entity resolution, identity mapping, automatching, deduplication, or large-scale matching systems is strongly preferred
Experience working with healthcare, life sciences, Real World Evidence (RWE), or large-scale healthcare datasets is strongly preferred

Benefits

Stock options
Full suite of health insurance options
Generous paid time off
Pre-planned company-wide wellness holidays
Retirement options
Health & charitable donation stipends
Impactful Business Resource Groups
Flexible work hours & the opportunity to work from anywhere
The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

Company Overview

H1 is on a mission to connect the world with the right doctors. It was founded in 2017 and is headquartered in New York, New York, USA, with a workforce of 201-500 employees. Its website is https://www.h1.co.

Company H1B Sponsorship

H1 has a track record of offering H1B sponsorships, with 5 in 2025, 6 in 2024, 4 in 2023, 9 in 2022, and 7 in 2021. Please note that this does not guarantee sponsorship for this specific role.
