Interested in becoming a Data Engineer?
- Are you into crunching tons of data?
- Can you architect a petabyte-scale data lake?
- Do you care about sub-second query latency over billions of records?
- Interested in stream processing?
If so, you are in the right place. I’ve collected everything you need to jump-start your career as a data engineer.
But let’s start with the basics.
What is Data Engineering?
Data Engineering is the practice of collecting, storing, processing and enabling data.
A Data Platform is a collection of tools assembled to carry out this practice, and it is the main product of the data engineer.
- Collecting – as our lives become more and more digitized, collecting data from different sensors for further processing is the most basic part of the data platform.
- Storing – in the past, a relational database stored the organization’s transactions. As data became bigger and faster, new tools for storing and maintaining it were created.
- Processing – data rarely arrives in the optimal structure; it usually needs manipulation, integration with other data sources, cleansing, deduplication, etc.
- Enabling – collecting, storing and processing the data is meaningless if you can’t expose it to clients. The client can be a fellow engineer, the data science team, a BI tool or a consumer of an API.
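To make the Processing step above more concrete, here is a minimal sketch of cleansing and deduplicating records in plain Python. The field names (`user_id`, `email`), the sample data and the rule of keeping the first occurrence are assumptions made for illustration; a real platform would more likely run this kind of logic in a distributed engine.

```python
from typing import Dict, Iterable, List


def clean_and_dedupe(records: Iterable[Dict]) -> List[Dict]:
    """Normalize raw records and keep the first occurrence of each user_id."""
    seen = set()
    result = []
    for record in records:
        user_id = record.get("user_id")
        # Cleansing: trim whitespace and lowercase the email (hypothetical rule).
        email = (record.get("email") or "").strip().lower()
        if user_id is None or not email:
            continue  # drop records missing a required field
        if user_id in seen:
            continue  # deduping: skip ids we have already kept
        seen.add(user_id)
        result.append({"user_id": user_id, "email": email})
    return result


# Hypothetical raw input, as it might arrive from a collector.
raw_events = [
    {"user_id": 1, "email": " Alice@Example.com "},
    {"user_id": 1, "email": "alice@example.com"},
    {"user_id": 2, "email": None},
    {"user_id": 3, "email": "bob@example.com"},
]

cleaned = clean_and_dedupe(raw_events)
print(cleaned)
```

The duplicate record for user 1 and the record with a missing email are dropped, leaving two clean records.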
Here are some examples of the daily tasks you might be expected to do in your work:
- Architect, design and maintain highly scalable data management systems – you’ll need to decide when to use which NoSQL databases, distributed file systems, analytic databases or highly available transaction processing systems.
- Develop data processing jobs – including ETLs, data algorithms and real-time stream processing.
- Design highly effective data pipelines, including data collection mechanisms, data middleware and log shipping.
- Feel comfortable leveraging different programming languages for different tasks, mainly Python, Java and Scala.
- Design data synchronization tasks across different technologies and geographic locations.
- Research data acquisition from third-party vendors.
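As a rough idea of what a small ETL job from the list above might look like, here is a sketch that uses only the Python standard library. The CSV layout, the `orders` table and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for whatever storage a real platform would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second row is missing its amount.
RAW_CSV = """order_id,amount,currency
1001,19.99,usd
1002,,usd
1003,5.00,USD
"""


def extract(text):
    """Extract: parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize types and casing."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return out


def load(conn, rows):
    """Load: write the rows into the target table, replacing existing ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(row_count, total)
```

Keeping extract, transform and load as separate functions makes each stage easy to test on its own and to swap out later, for example replacing the CSV reader with a message queue consumer.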