Are you an experienced developer with a 'can do' attitude and enthusiasm that inspires others? Do you enjoy being part of a team that works with a diverse range of technology?
About the Team: LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX (www.relx.com), a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model from today's top model creators for each individual legal use case.
The company employs over 2,000 technologists, data scientists, and experts to develop, test, and validate solutions in line with RELX Responsible AI Principles (https://stories.relx.com/responsible-ai-principles/index.html).
About the Role: Join our team to help build state-of-the-art research tools. Our Data Science teams focus on extracting key information such as entity mentions, sentiment, data enrichments, predictive insights, and more to build best-in-class data and news streams relied on by our global customer base. This role leads multimodal model strategy (vision + language + layout) and multi‑agent collaboration (task decomposition, verification, conflict reconciliation, feedback loops), and plans future customized training and ongoing optimization of models.
Responsibilities: - Pipelines & preprocessing: scalable cleaning, OCR / layout normalization, early quality gating.
- Labeling + active learning loop: strategic sampling, quality scoring, continuous feedback integration.
- Training & inference engineering: sample automation, feature generation, resource orchestration, reliability & monitoring.
- Serving & optimization: multi‑model routing, caching / indexing, elastic scaling, performance & cost efficiency.
Requirements: - Bachelor's or above in Computer Science, Software Engineering, Information Systems, Data Engineering or related.
- 3-6 years in data / platform or backend engineering; practical ML or multimodal data project exposure.
- Strong experience with data modeling, batch / streaming processing, distributed systems fundamentals.
- Experienced with data cleaning & format transformation; multimodal sample construction & efficient storage.
- Strong understanding of multimodal training data patterns: balancing, segmentation, structural tagging, negative samples & quality metrics.
- Experienced with observability: integrated logs / metrics / tracing closed loop. SQL, data warehousing, object storage, columnar & vector index structures.
- Demonstrates robust Python experience (data processing, concurrency / async, performance profiling, packaging & environment isolation). Linux CLI & bash scripting: files / permissions / processes, network & IO diagnostics, automation and troubleshooting.
Nice to Have: - Experience with cloud-based data platforms (e.g., AWS, GCP, Azure) for large-scale machine learning workflows.
- Familiarity with MLOps tools and practices (e.g., MLflow, Kubeflow, Airflow) for
U.S. National Base Pay Range: $95,300 - $158,800. Geographic differentials may apply in some locations to better reflect local market rates.
This job is eligible for an annual incentive bonus.
We know your well-being and happiness are key to a long and successful career. We are delighted to offer country-specific benefits.