Site Reliability Operations Engineer
If you are passionate about working on technologies and design approaches that are disruptive to incumbent Enterprise players, like to leverage Open Source, tooling, automation and agile concepts, and are interested in opportunities to work at massive scale, we may just have the job for you. Site Reliability Operations (SRO) is not in itself a new discipline, but at Box we like to think that we approach it in a way that makes it one of the most important roles within our Technology organization. As an SRO Engineer at Box you will be responsible for managing our production services and will work very closely with developers and other Ops teams to ensure the reliability, scalability and performance of the next generation of systems. As such, you will be a core driver of operational excellence for all of Box's production services. We are looking for smart Operations Engineers with strong acumen for troubleshooting technical issues and tuning distributed systems, ideally gained in a highly scalable, consumer-facing web service. Experience with best practices around monitoring, deployment and configuration management is a must, as are strong fundamentals in systems administration. If this sounds like you, please read on for a more detailed description of the scope and requirements for
RESPONSIBILITIES
- Manage the overall operations of all production services at Box.
- Collaborate with developers and other internal groups to identify, prioritize and develop service reliability and manageability improvements.
- Work with the NOC and other Ops teams to troubleshoot site issues.
- Be a change management guru! Change is the only constant in our team, and you will be responsible for walking the fine line between failing fast and maintaining SLAs.
- Develop tools/scripts to improve our ability to rapidly deploy and effectively monitor production applications in a large-scale Linux environment.
- Participate in a 24x7 on-call rotation for second-tier escalations.
QUALIFICATIONS
- 1+ years of Unix/Linux systems administrator-level experience.
- Proven production service troubleshooting skills that span applications, systems and network.
- Demonstrated programming skills in one or more of: Bash, Python, Perl, PHP, Ruby, Java, C.
- 1+ years of experience working in a consumer web-scale Technical Operations role.
- Solid understanding of operational principles, such as capacity planning, monitoring and incident handling.
- CS or equivalent university degree.
- Nice to have: committer status on relevant Open Source projects and prior experience working in a DevOps capacity.
- Passion for cloud technologies.
About Box: Box provides a secure way to share content and improve collaboration on any device: desktop, tablet or mobile. From huge corporations to mom-and-pop stores, Box believes technology should never limit anything you do. Businesses of any size can be more productive, inventive and powerful on Box. The company is well funded by top VC firms like Andreessen Horowitz, Draper Fisher Jurvetson and U.S. Venture Partners. Box is proud to be on Forbes’ list of America’s Most Promising Companies, is used in 240,000 businesses, including 99% of the Fortune 500, and is the go-to product of 27 million people.