Tbwa Chiat/Day Inc

Member of Technical Staff - Image / Video Data Engineer
Remote | Germany | USA

Tbwa Chiat/Day Inc, MS, United States

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently looking for a strong candidate to join us in developing large-scale data pipelines for training frontier models.

Role:

  • Develop and maintain scalable infrastructure for large-scale image and video data acquisition
  • Manage and coordinate data transfers from various licensing partners
  • Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation
  • Implement scalable and efficient tools to visualize, cluster, and deeply understand the data
  • Optimize and parallelize data processing workflows to handle billion-scale datasets efficiently
  • Ensure data quality, diversity, and proper annotation (including captioning) for training readiness
  • Convert training data from alternative sources, such as user preferences, into a trainable format
  • Work closely in the model development loop to update data as necessitated by the training trajectory

Ideal Experience:

  • Proficiency in Python and familiarity with various file systems for data-intensive manipulation and analysis
  • Familiarity with cloud computing platforms (AWS, GCP, or Azure) and Slurm/HPC environments for distributed data processing
  • Experience with image and video processing libraries (e.g., OpenCV, FFmpeg)
  • Demonstrated ability to optimize and parallelize data processing workflows across CPUs and GPUs
  • Familiarity with data annotation and captioning processes for ML training datasets
  • Knowledge of machine learning techniques for data cleaning and preprocessing

Nice to have:

  • Background or keen interest in developing large-scale data acquisition systems
  • Experience with natural language processing for image/video captioning
  • Experience with data deduplication techniques at scale
  • Experience with big data processing frameworks (e.g., Apache Spark, Hadoop)
  • Understanding of ethical considerations in data collection and usage
