Sr. Site Reliability Engineer (Big Data)

Location: Remote

Our client is a digital media company headquartered in NYC looking to grow the engineering team that supports its Exchange platform by adding developers.

Some major projects the Exchange team is currently working on: continuing to scale the core exchange platform, honing the intelligence of its optimization, cutting feedback time for business intelligence, and aggressive automation. Currently, the exchange:

  • handles millions of transactions per second (hundreds of billions each day)
  • evaluates, selects, and optimizes ad serving based on advanced statistics and machine learning
  • returns responses collected from dozens of parties in milliseconds
  • constantly evolves to meet market demands that change in days and weeks, not months or years
  • factors thousands of data points into every serving decision


What you’ll be doing:

  • Deploying, configuring, monitoring, and maintaining multiple big data stores across multiple datacenters, including planning, configuration, deployment, and maintenance work relevant to the environment. Managing large-scale Linux infrastructure to ensure maximum uptime.
  • Developing and documenting system configuration standards and procedures. 
  • Performance and reliability testing. This may include reviewing configuration, software choices/versions, hardware specs, etc. 
  • Advancing our technology stack with innovative ideas and new creative solutions. 

Who you are:

  • Collaboration is in your DNA. You enjoy contributing to a shared cause; you know that when the team succeeds, you succeed.
  • You are always looking for ways to grow your skills. You are hungry to learn new technologies and share your insights with your team. 
  • You like the big-picture perspective but also enjoy digging into the fine details. You can think strategically while diving into complex systems, breaking them down, and building them back better.
  • You are a proactive problem solver. You are irked by unreliable infrastructure, and your first instinct is to find ways to fix it.

What you'll need:

  • Multi-faceted understanding of Alluxio and Hadoop (including Kerberos) for data storage, and of Trino, Hive, and Impala for data retrieval.
  • Experience managing Kafka clusters on Linux.
  • Thorough understanding of Linux (we use CentOS in production). 
  • Experience administering SQL/NoSQL databases (we use MySQL, PostgreSQL, MongoDB).
  • Proficiency in at least one scripting language (Python, Ruby, Shell, etc.).
  • Understanding of basic networking concepts (TCP/IP stack, DNS, CDN, load balancing).
  • Must be willing and able to work East Coast U.S. hours (9am-6pm EST).

Bonus, but not required:

  • Ability to work with Cassandra clusters, from installation through troubleshooting and maintenance.
  • Experience with the Puppet configuration management tool.
  • Experience with scalable infrastructure monitoring solutions such as Icinga, Prometheus, Graphite, Grafana and ELK.
  • Experience with container technologies such as Docker and Kubernetes. 
  • Experience training and mentoring junior-level staff.
  • Experience in AdTech or High-Frequency Trading.
  • Experience with security-related best practices.

We look forward to receiving your CV!
