Software / Systems Engineer - Site Reliability Engineering

Job description

Location: Oxygen House, Exeter

Term: Full time, Permanent

Salary: Competitive, dependent on experience


Site Reliability Engineering (SRE) is a set of engineering principles for efficiently building and operating large production services, following a DevOps mindset. Our SRE team works at the core of our software platform, ensuring these services – whether used internally or externally – are reliable and have the capacity to scale effortlessly. We value automation over manual effort, data-driven decision making, and a pragmatic approach to risk – the fundamentals of providing a comprehensive and stable platform to our customers.

 

As a member of the SRE team, you will work tirelessly to ensure our teams have the capability to ship their software frequently and with high confidence. You’re a big-picture thinker, capable of quickly gaining familiarity with complex systems at any stage of their development lifecycle and identifying opportunities for iterative improvement. Working with software engineers and product owners, you will play a pivotal role in defining measures of reliability for our services to ensure they meet the performance and uptime requirements of their users, and you will provide consultative support to help teams achieve and maintain those objectives.

 

We encourage a culture of experimentation, innovation and collaboration. We take calculated risks in search of better outcomes, supported by a blame-free culture of learning from our mistakes when things break. You might already be an SRE, a software developer looking for a new challenge, or a systems engineer with some software expertise. Whoever you are, you are intellectually curious, eager to learn, and excited by the prospect of making a meaningful difference.


About Sparx

Sparx is about learning. We work with schools, in schools and for schools because we understand the challenges faced by educators, teachers and students on a day-to-day basis, and we are passionate about finding a way to help.


Our mission is to empower educators and motivate young people to improve educational outcomes, with the aim of improving life opportunities for over 5 million learners by 2030 (#Ed5M).


For over 7 years we have been investigating how young people learn. We have gathered data, listened to teachers and students, and overlaid these insights with our own observations, drawing on the expertise of our team. The result is not an off-the-shelf software product; it is an approach to learning. Our unique blended learning methodology combines traditional teaching methods with sophisticated technology, our own high-quality content and unparalleled levels of support.


Our Team


Our diverse and vibrant team is growing fast! We’re not just developers, data scientists, product owners, content or education experts – we’re also cyclists, DJs, triathletes, published scholars, parents, pet owners, musicians, adventurers, and more. We bring our whole selves to work, and we’re united by our belief in the transformative power of learning and excitement about tackling the hard problems in technology.


Based in one of the best workspaces in the country, we set clear and challenging goals, collaborate, celebrate success and have the opportunity to learn from each other every day. We encourage everyone to stretch beyond their roles and deliver on our mission.

Requirements

Essential

  • A strong technical background and a firm grasp of the fundamentals of running systems at scale. You can see the big picture, rapidly assimilate the architecture of large systems, and understand that perfect is the enemy of good – real systems often involve trade-offs in implementation.
  • Several years' experience building and supporting services throughout their whole lifecycle – from inception and design, through deployment, operation and refinement.
  • You’re adept in at least one modern programming language (e.g. Go, C, Python) and at building systems in a public cloud environment (e.g. Google Cloud Platform, Kubernetes).
  • Familiarity with quantifying the health of your services through measuring and monitoring key service level indicators, such as their availability and latency.
  • Experience scaling systems sustainably through mechanisms like automation, and evolving them by pushing for changes that improve reliability and velocity (e.g. CI/CD).
  • We succeed when everyone succeeds. You naturally explore and advocate for your role, and feel comfortable guiding other development and management teams, giving them the skills and confidence that enable us to move quickly.


Nice to have, but not essential

  • You understand the engineering trade-offs associated with distributed system architectures, the emergent behaviours such systems commonly exhibit, and mechanisms for mitigating their effects.
  • Experience with Unix/Linux operating systems, in areas such as system internals or administration (e.g. filesystems, inodes, system calls, kernel subsystems), networking (e.g. TCP/IP, routing, load balancing), or a related discipline.
  • A systematic problem-solving approach, strong verbal and written communication skills, and the motivation to own and drive change.
  • Comfortable taking initiative, maintaining accountability, and sometimes making decisions with limited information.


Unfortunately, we cannot currently accept applications from candidates who require sponsorship for an employment visa.