Data Architect

Luxoft - London

Qualifications

Data modelling
Azure
Computer Science
Relational databases
Big data
Data structures
NoSQL
English
Google Cloud Platform
SQL
Database design
AWS
Bachelor's degree
Machine learning
Distributed systems
Data pipeline
APIs
Scalability
Apache
Kafka
Metadata
AI
Communication skills
Graph databases
Python

Full job description

Project description

We are seeking a Data Architect to join our AI project team. In this role, you will design and implement the data architecture needed to support machine learning and AI solutions, including defining data models, storage patterns, and governance frameworks. You will ensure that data from various sources is well-organised, accessible, and AI-ready, working closely with data engineers and ML engineers to build robust data pipelines and maintain high data quality for analytics and model development.

Responsibilities

Data Modelling & Schema Design: Develop and maintain data models (conceptual, logical, and physical) that define how data is stored and related. This includes designing relational schemas, graph data models for knowledge graphs, and time-series data structures as needed, ensuring they accurately represent business entities and relationships. You will continually refine these models to meet AI use cases and evolving business requirements.

Data Storage Architecture: Define and implement data storage and management patterns that optimise data retrieval and analytics performance. This involves selecting or designing appropriate storage solutions (e.g. relational databases, NoSQL/graph databases, data warehouses, data lakes) and structuring them for scalability and fast access to large datasets used in AI projects. Ensure the architecture can handle structured and unstructured data and is cloud-ready for elasticity.

Data Pipelines & Integration: Build and oversee robust data pipelines (ETL/ELT processes) to integrate data from multiple sources into centralised platforms. You will design workflows to collect, transform, and load data into analytics repositories or feature stores, guaranteeing that AI models have consistent, well-prepared data to work with. This includes setting up stream processing for real-time data when required and automating pipeline orchestration for efficiency.

Data Governance & Quality: Establish and enforce data governance policies and standards. This means defining practices for data quality, data cleaning, and master data management, as well as setting security and privacy controls to protect sensitive information. You will ensure compliance with relevant data regulations and implement data security measures (e.g. access controls, encryption) and validation rules so that the data used in AI is trustworthy and compliant.

Metadata Management & Lineage: Implement frameworks for metadata management and lineage tracking. This includes maintaining data catalogues or dictionaries that describe what the data means (possibly leveraging ontologies), along with tools or processes that trace how data flows through pipelines and transformations. By providing transparency into data origins and transformations, you support model interpretability and enable troubleshooting of data issues, which is critical in AI development.

Collaboration with Engineering Teams: Work closely with data engineers, ML engineers, and data scientists to ensure the data architecture meets their needs. You will collaborate on designing data interfaces (e.g. APIs or query endpoints) and assist in shaping how data is used for features in machine learning. This role requires translating requirements between data teams and ML teams, and jointly resolving issues to streamline the path from raw data to AI insights.

Performance Optimisation & Scaling: Monitor the performance and scalability of the data infrastructure, and tune it as the AI project grows. Optimise database queries, indexing, and storage layouts for faster data access during model training and inference. Plan for scale by leveraging cloud capabilities (compute, storage) and manage costs effectively, adjusting architectures (partitioning, caching, etc.) to maintain efficient, cost-effective operations as data volumes increase. You may also evaluate new technologies (e.g. distributed computing frameworks or new databases) and incorporate them to continually improve the architecture.

Skills

Must have

Education: Bachelor's degree in Computer Science, Information Systems, or a related field (or equivalent professional experience). An advanced degree is a plus but not required.

Experience: Approximately 3-5 years of experience in data architecture, data engineering, or a related data management role. A proven track record in designing data solutions and managing data schemas is expected.

Data Modelling & Databases: Strong proficiency in data modelling and database design. You should be comfortable creating ER diagrams and defining relational schemas, as well as working with NoSQL databases (e.g. document or graph databases). Practical experience with SQL and at least one relational database is required, and familiarity with other data store types (such as graph or time-series databases) is highly valued.

Data Pipeline Development: Hands-on experience developing data pipelines and integration workflows. This includes proficiency in ETL/ELT tools or frameworks (or custom scripting with Python/SQL) to gather and transform data. You should understand how to optimise data flow and have experience with batch processing; experience with real-time streaming data (e.g. using Kafka or equivalent) is a plus.

Cloud Data Platforms: Experience working with cloud-based data platforms or big data technologies. While our approach is cloud-agnostic, you should be familiar with concepts like data lakes, data warehouses, and distributed computing in a cloud environment (e.g. using AWS, Azure, or GCP services). The ability to design solutions that leverage cloud scalability and tools for storage and processing is important.

Data Governance & Security: Solid understanding of data governance principles and best practices. You should be knowledgeable about data privacy regulations and data protection techniques, ensuring compliance in how data is stored and used. Experience implementing data quality checks, defining data standards, and using or setting up metadata management tools will be useful.

Communication & Teamwork: Excellent communication skills with the ability to collaborate in cross-functional teams. You should be able to translate complex data architecture concepts into clear terms for project managers or stakeholders, and work closely with engineering teams to guide implementation. Problem-solving aptitude and a willingness to mentor junior data team members are also important in our collaborative environment.

Nice to have

AI/ML Project Involvement: Experience working on projects that involve AI or machine learning, where you partnered with data scientists or ML engineers. For example, you may have supported an ML model deployment by providing well-structured data and ensuring data reliability. This background will help you anticipate the needs of AI initiatives and design data architectures that facilitate model training and inference.

Data Governance Tools: Familiarity with data governance or data cataloguing tools (such as Collibra, Alation, or Apache Atlas) and lineage-tracking systems. Hands-on experience setting up or maintaining a data catalogue, documenting data definitions, or automating data lineage capture is a strong plus, as it shows an ability to operationalise governance and transparency in data ecosystems.

Ontologies & Knowledge Graphs: Exposure to semantic data modelling, ontologies, or knowledge graph construction. Experience in structuring data with ontologies (e.g. using RDF/OWL standards) or implementing a knowledge graph to link datasets can be very beneficial, since it helps in creating a unified data vocabulary and enriches the context for AI models.

Modern Data Architecture Patterns: Experience with modern data architecture concepts and patterns. This could include implementing or working with data lakehouse architectures (combining data lake flexibility with data warehouse performance), data mesh principles (decentralising data ownership to domain teams), or event-driven/streaming architectures. Familiarity with these approaches demonstrates adaptability and knowledge of cutting-edge solutions for handling complex data workflows.

Certifications: Relevant industry certifications are advantageous. AWS, Azure, or GCP data engineering certifications, Certified Data Management Professional (CDMP), or other credentials in data architecture and cloud services show validated expertise and a commitment to staying current with technology developments. While not mandatory, they could strengthen your candidacy.

Other

Languages

English: C1 Advanced

Seniority

Senior

London, United Kingdom of Great Britain and Northern Ireland

Req. VR-123709

AI/ML

BCM Industry

26/06/2026
