Role & responsibilities
- Evaluate the domain, financial, and technical feasibility of solution ideas with the help of all key stakeholders
- Design, develop, and maintain highly scalable data processing applications
- Write efficient, reusable, and well-documented code
- Deliver big data projects using Spark, Scala, Python, and SQL
- Maintain and tune existing Spark applications
- Find opportunities to optimize existing Spark applications
- Work closely with QA, Operations, and other teams to deliver error-free software on time
- Actively lead or participate in daily Agile/Scrum meetings
- Take responsibility for Apache Spark development and implementation
- Translate complex technical and functional requirements into detailed designs
- Investigate alternatives for data storage and processing to ensure the most streamlined solutions are implemented
- Serve as a mentor for junior staff members by conducting technical training sessions and reviewing project outputs
Qualifications
- Engineering graduate with a computer science background preferred, with 8+ years of software development experience with Hadoop framework components (HDFS, Spark, Scala, PySpark)
- Excellent verbal, written, and presentation skills
- Ability to present and defend a solution with technical facts and business proficiency
- Understanding of data-warehousing and data-modeling techniques
- Strong data engineering skills
- Knowledge of Core Java, Linux, SQL, and any scripting language
- At least 6 years of experience using Python/Scala, Spark, and SQL
- Knowledge of shell scripting is a plus
- Knowledge of Core and Advanced Java is a plus
- Experience in developing and tuning Spark applications
- Excellent understanding of Spark architecture, DataFrames, and Spark tuning
- Strong knowledge of database concepts, systems architecture, and data structures is a must
- Process-oriented with strong analytical and problem-solving skills
- Experience writing standalone Python applications that use the PySpark API
- Knowledge of the Delta Lake (delta.io) package is a plus