Be a Data Engineer

Interested in becoming a Data Engineer?

Open Source Big Data Technologies
  • Are you into crunching tons of data?
  • Can you architect a petabyte-scale data lake?
  • Do you care about sub-second query latency over billions of records?
  • Interested in stream processing?

If so, you are in the right place. I’ve collected everything you need to jump-start your career as a data engineer.
But let’s start with the basics.

What is Data Engineering?

Data Engineering is the practice of collecting, storing, processing and enabling data.

A Data Platform is a collection of tools assembled to carry out this practice, and it is the main product of the data engineer.

  • Collecting: as our lives become more and more digitized, collecting data from different sensors for further processing is the most basic part of the data platform.
  • Storing: in the past, a relational database stored the organization’s transactions. As data became bigger and faster, new tools for storing and maintaining it were created.
  • Processing: data rarely arrives in the optimal structure. It usually needs manipulation, integration with other data sources, cleansing, deduplication, etc.
  • Enabling: collecting, storing and processing the data is meaningless if you can’t expose it to clients. The client can be a fellow engineer, the data science team, a BI tool or an API.
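To make the four stages concrete, here is a minimal sketch of a toy pipeline that uses only the Python standard library. Everything in it is an assumption made for illustration: the sample sensor records, the `readings` table, and the function names `collect`, `process`, `store` and `enable` are hypothetical, and an in-memory SQLite database stands in for whatever storage a real platform would use.

```python
import json
import sqlite3

# Hypothetical raw input: JSON lines as they might arrive from sensors.
RAW_LINES = [
    '{"id": 1, "sensor": " Temp-A ", "value": "21.5"}',
    '{"id": 1, "sensor": " Temp-A ", "value": "21.5"}',  # duplicate record
    'not json at all',                                   # malformed record
    '{"id": 2, "sensor": "temp-b", "value": 19}',
]


def collect(raw_lines):
    """Collecting: parse raw lines, skipping anything that is not a JSON object."""
    events = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            events.append(record)
    return events


def process(events):
    """Processing: cleanse fields and drop duplicates by record id."""
    seen, cleaned = set(), []
    for event in events:
        event_id = event.get("id")
        if event_id is None or event_id in seen:
            continue
        try:
            value = float(event["value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without a usable numeric value
        seen.add(event_id)
        sensor = str(event.get("sensor", "")).strip().lower()
        cleaned.append((event_id, sensor, value))
    return cleaned


def store(rows):
    """Storing: persist cleaned rows (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def enable(conn):
    """Enabling: expose the data to clients, here as an average per sensor."""
    cursor = conn.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    rows = process(collect(RAW_LINES))
    print(enable(store(rows)))
```

In a real platform each function would likely be a separate system (a collector, a storage layer, a processing engine and a serving layer), but the flow of data between the stages is the same.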

Here are some examples of the daily tasks you might be expected to do in your work:

  • Architect, design and maintain highly scalable data management systems: you’ll need to decide when to use NoSQL databases, distributed file systems, analytic databases or highly available transaction processing systems
  • Develop data processing jobs, including ETLs, data algorithms and real-time stream processing
  • Design highly effective data pipelines, including data collection mechanisms, data middleware and log shipping
  • Feel comfortable using different programming languages for different tasks, mainly Python, Java and Scala
  • Design data synchronization tasks across different technologies and geographic locations
  • Research data acquisition from third-party vendors

 
