Software Engineer
Source: RemoteOK
AI Summary Powered by Gemini
This remote Software Engineer role focuses on designing and scaling data pipelines for foundation models, processing time series, logs, and event streams. Key requirements include deep experience in distributed systems and data engineering, with a focus on Python and Spark workflows for machine learning. The opportunity is interesting as it directly impacts the success of next-generation AI models by building robust data infrastructure.
Job Description
itD is seeking a Software Engineer to design and scale the data pipelines that power next-generation foundation models for machine-generated data, including time series, logs, and large-scale event streams. This role contributes directly to the success of model training and production systems by enabling reliable, high-performance data infrastructure at scale. The ideal candidate will bring deep experience in distributed systems and data engineering, along with a proven track record of delivering scalable, production-ready data pipelines that support machine learning workflows.

Location: Remote (U.S.-based; time zone alignment with Pacific or Central preferred)

We provide comprehensive medical benefits, a 401(k) plan, paid holidays, and more. Please note that we are only considering direct W2 candidates at this time, as we are unable to offer sponsorship.

Responsibilities:
• Build and scale distributed data pipelines for large-scale time series, log data, and high-volume event streams.
• Design and maintain reliable, high-performance Spark and Python workflows to support model training datasets.
• Analyze and resolve performance bottlenecks related to latency, memory utilization, data skew, and throughput.
• Improve data quality, validation processes, and reproducibility for machine learning workloads.
• Partner with machine learning engineers and researchers to

Please mention the word UNDAUNTED and tag RMTMwLjYxLjMzLjkz when applying to show you read the job post completely (#RMTMwLjYxLjMzLjkz). This is a beta feature to avoid spam applicants. Companies can search these words to find applicants that read this and see they're human.