Our Sr. Data Engineer role builds and maintains the platform that delivers accessible data to power decision-making at Thorlabs. This position is focused on making it really simple for our users to answer three questions: What happened in the past? What is happening now? And, what will happen in the future?
The ideal candidate:
- High-energy self-starter with experience in, and passion for, data and big-data-scale processing. You enjoy working in fast-paced environments and love making an impact.
- Exceptional communicator with the ability to translate technical concepts into easy-to-understand language for our stakeholders.
- Enthusiastic team player; you value collaborating on problems, asking questions, delivering feedback, and supporting others in their goals, whether they are nearby or cities away.
Essential job functions include, but are not limited to, the following:
- Organize data in our data lake in highly optimized formats for fast query processing, and maintain the security and quality of our datasets.
- Build reporting products that leverage the primitives to deliver simple and useful visualizations and datasets that power scalable transformation of data.
- Build and support end-user experiences for experimentation, data discovery, and business intelligence reporting.
- Operate and manage the data platform efficiently, consistently, and reliably. Build tools that other teams can use to encourage consistency and champion reliability across the platform.
- Build fail-proof data pipelines (ETL) that move data between different systems while maintaining integrity, using Azure tools and cloud-based ETL development processes.
A Sr. Data Engineer at Thorlabs typically has 4-6 years of experience in one or more of the following areas:
- Working with the internals of a distributed compute engine (Spark, Presto, DBT, or Flink/Beam)
- Query optimization, resource allocation and management, and data lake performance (Presto, SQL)
- Cloud infrastructure (Azure, Google Cloud, Kubernetes, Terraform)
- Security products and methods (Apache Ranger, Apache Knox, OAuth, IAM, Kerberos)
- Deploying and scaling ML solutions using open-source frameworks (MLflow, TFX, H2O, etc.)
- Background and practical experience in statistics and/or computational mathematics (Bayesian and Frequentist approaches, NumPy, PyMC3, etc.)
- Modern Big-Data storage technologies (Iceberg, Hudi, Delta)
- Experience connecting to varied data sources, on-prem and cloud.
- Excellent SQL coding experience with performance optimization for data queries.
- Understanding of different data models, such as normalized, denormalized, star, and snowflake schemas. Working experience with transactional, temporal, time-series, structured, and unstructured data; Azure Data Lake experience is a plus.
- Working experience with an MPP data warehouse (DWH) system such as Azure Synapse is highly desirable.
- Experience in deployment and maintenance of ETL Jobs.
- Familiarity with reporting tools such as Power BI, SSRS, or similar.
Thorlabs values its diverse environment and is proud to be an Equal Employment Opportunity/Affirmative Action Employer. All qualified individuals will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age or veteran status. Job descriptions are not intended as and do not create employment contracts. The organization maintains its status as an at-will employer. Employees can be terminated for any reason not prohibited by law.