Limeade is a software company that elevates the employee experience and helps build great places to work. Limeade ONE brings together employee well-being, engagement, inclusion and communications solutions in a mobile-first experience. Recognized for its own award-winning culture, Limeade helps every employee know their company cares. To learn more, visit www.limeade.com.
We’re committed to creating a mission-driven, positive culture of improvement made up of the best and brightest people in the business. And we’ve got the awards to back it up: Puget Sound Business Journal ranked us #1 Best Workplace in Washington, and Seattle Business ranked us one of the top three Best Companies to Work for in Washington State. We’re one of the fastest-growing companies in North America (Deloitte’s Technology Fast 500™), and Fortune magazine recognized us as a Best Workplace for Women.
About the role:
The Cloud Site Reliability Engineer will be a key player in our migration from monolithic datacenter applications to microservices on Azure. This role combines systems engineering and software engineering to build a distributed system using cloud-native services, working in concert with our software development team. The Site Reliability Engineer will have a breadth of knowledge in high-performance web applications, microservices, and REST architecture to help ensure security, availability, reliability, scalability, and high performance. You will work with other engineers who are database, cloud, networking, storage, and DevOps experts to support all Limeade products, and you will work closely with our development teams. This role reports to the IT Manager responsible for production operations on the IT Operations team. Your skills and experience will help shape the work we do as an operational team supporting a product that is HIPAA and GDPR compliant.
- Develop live site monitoring and alerting strategy together with developers, QA and security teams
- Create and maintain dashboards to show current and historical state of the site
- Participate in incident response activities, troubleshooting, and root cause analysis (RCA)
- Drive performance, reliability and security improvements
- Participate in other team projects as needed
- Ability to demonstrate our values in an ongoing and consistent way
- Excellent troubleshooting skills – able to read code and debug live servers to find underlying issues quickly
- Knowledge of the full networking stack to troubleshoot connectivity issues and to ensure security
- Good understanding of distributed applications and their implementations (Docker, Kubernetes, Service Fabric and similar)
- Passionate about web architecture and able to guide a team successfully to the right solution
- Strong understanding of website metrics, monitoring methods, third-party tools, and custom instrumentation
- Experience with DevOps environments and practices, infrastructure as code
- Expertise with one or more common scripting languages: PowerShell or Python preferred
- Comfortable leading projects and influencing others without direct authority
- 3+ years of experience with the Microsoft web development stack, supporting high-transaction, distributed web platforms
- 3+ years using and supporting website monitoring tools like Application Insights, New Relic, and Azure Monitor
- 1+ years using and supporting log aggregation tools like Splunk and/or Log Analytics
- 1+ years of production support experience in a cloud environment, preferably Azure
Limeade provides equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, pregnancy, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, Limeade will provide reasonable accommodations for qualified individuals with disabilities.