Job Description for Site Reliability Engineer - Blockfrost at IO Global - Remote (Global)
Who are we?
IOG is a technology company focused on blockchain research and development. We are renowned for our scientific approach to blockchain development, emphasizing peer-reviewed research and formal methods to ensure security, scalability, and sustainability.
Our projects include the Cardano blockchain, as well as other products in the areas of decentralized finance (DeFi), governance, and identity management, aiming to advance the capabilities and adoption of blockchain and Web3 technology globally.
We invest in the unknown, applying our curiosity and desire for positive change to everything we do. By fueling creativity, innovation, and progress within our teams, our products and services are designed for people to be fearless, to be changemakers.
About Blockfrost:
Blockfrost is an instant, highly optimized, public, and freely accessible API as a Service that provides alternative access to the Cardano blockchain and related networks. It offers extra features and removes the need to run and maintain additional infrastructure and tooling yourself.
Blockfrost.io is also an IPFS provider, allowing you to add and pin resources as well as access the IPFS network through our IPFS Gateway clusters.
From an infrastructure perspective, we are a highly available and resilient multi-network platform hosted in multiple data centers around the globe.
What the role involves:
The Site Reliability Engineer (SRE) is an integral part of our open-source project, ensuring the reliability, availability, and performance of our production systems. This role combines service operation, systems engineering, and software engineering principles to operate and monitor services and to create or maintain the tools, automation, and infrastructure code that bolster the efficiency and resilience of our platform.
- Design, write, and deliver tools and software, depending on product focus, using technologies such as Bash, Terraform, Nix, Haskell, or Rust to enhance the availability, scalability, and efficiency of our services.
- Engage in and refine the whole lifecycle of services, from inception and design, through deployment, operation, and continuous improvement.
- Practice sustainable incident response and promote blameless postmortems.
- Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind.
- Analyze system performance and reliability, offering recommendations for enhancement.
- Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services.
- Participate in on-call rotations, responding to and mitigating service interruptions and technical challenges.