Monitoring Specialist (Mid-Level)

SECTOR 1, Romania

At Spearhead Systems we believe in delivering professional IT services that help companies focus on their core business, not on technology. We are passionate about IT services that help people and companies get things done.

Spearhead Systems is at the forefront of IT service delivery and digital transformation, acting as a trusted technology partner in the evolving world of cloud and digital platforms. We are the engineers, developers and operators of spearhead.cloud, a specialised high-performance cloud, and we specialise in IT infrastructure and application monitoring and observability.

We are seeking a Mid-Level Monitoring Specialist to own and evolve our monitoring practice. In this dedicated monitoring role, you will configure and manage the Checkmk monitoring platform across multiple customer environments, ensuring proactive issue detection and delivering customer-centric reporting on system health. You will work closely with clients and internal teams to anticipate problems before they impact users and to communicate insights clearly.

If you are passionate about keeping systems running flawlessly and enjoy combining hands-on technical work with customer-facing responsibilities, we'd love to hear from you.

Customer Relationship
Personal Evolution
Autonomy
Administrative Work
Technical Expertise

Responsibilities

  • Configure and Manage Checkmk
    Install, configure, and maintain the Checkmk monitoring system for both our internal infrastructure and client environments. This includes deploying agents on customer servers, setting up monitoring plugins for various applications, and ensuring all systems are correctly enrolled in monitoring.
  • Monitor Systems Proactively
    Continuously watch dashboards and alerts to detect issues early. Tune thresholds and alert settings so that we catch performance degradations or failures in their infancy, allowing for rapid response before they escalate into customer-impacting incidents.
  • Thresholds and Notifications
    Define and adjust alert thresholds for metrics (CPU, memory, disk, network, etc.) based on each application's normal behaviour. Configure notifications (email, SMS, etc.) so that the right on-call engineers and customer contacts are alerted with the appropriate urgency. Regularly test and refine the notification processes to avoid alert fatigue without missing critical events.
  • Application & Log Monitoring
    Extend monitoring beyond basic server metrics to include application-level performance (e.g., databases, web services, APIs) and log monitoring. Use Checkmk (and related tools) to track application-specific KPIs and to scan logs for errors or anomalies. Ensure that important application events and log warnings trigger alerts or are included in reports for review. 
  • Incident Investigation & Resolution
    Investigate alerts and anomalies to identify root causes. Work alongside system administrators or cloud engineers to resolve underlying issues (e.g., restarting services, adjusting resource allocations, patching software). If an alert indicates a broader problem, coordinate with the support team to ensure timely resolution and document the findings.
  • Client Consulting & Support
    Act as a monitoring subject-matter expert in client engagements. During customer onboarding, gather monitoring requirements and configure Checkmk to meet each client’s needs. Provide professional services by advising clients on best practices for monitoring their specific applications and workloads. Be responsive to client inquiries related to monitoring and assist with any custom checks or reporting they request.
  • Customer-Centric Reporting
    Prepare and deliver regular monitoring reports for clients (weekly, monthly, or as needed). These reports should highlight key metrics, uptime/downtime statistics, trend analyses, and any notable incidents or near-misses. Translate technical data into clear insights and actionable recommendations that non-technical stakeholders can understand. During review meetings or calls, clearly communicate how the infrastructure is performing and where improvements could be made.
  • Documentation & Knowledge Sharing
    Document monitoring configurations, custom plugins, and standard operating procedures for common issues. Maintain an up-to-date knowledge base or runbook for the monitoring environment. Share knowledge with internal team members and run brief training sessions on Checkmk usage or alert handling as needed, so that our 24/7 support team can work with the monitoring system effectively.
  • Continuous Improvement
    Continually enhance our monitoring capabilities. Evaluate new features in Checkmk, consider integration with other monitoring or AIOps tools, and contribute ideas to improve efficiency (for example, scripting repetitive tasks or automating agent deployment). Help shape our monitoring roadmap and influence how we deliver high-value managed monitoring services to clients.

Must Have

  • Experience
    3+ years of experience in IT infrastructure monitoring, systems administration, or a related role. A solid background in managing servers and understanding system health indicators is essential for this mid-level position.
  • Monitoring Tools
    Hands-on experience with monitoring platforms like Checkmk is highly preferred. Exposure to similar tools such as Nagios, Zabbix, Icinga, or Prometheus/Grafana is acceptable if you can quickly learn Checkmk. You should understand how to configure checks, set thresholds, and manage alert rules in at least one monitoring system.
  • System Administration Skills
    Strong sysadmin skills on Linux (and preferably Windows) systems. You are comfortable installing software agents, editing configuration files, managing services, and using the command line. Familiarity with managing plugins or extensions for monitoring various applications (databases, web servers, etc.) is important. Basic networking knowledge (TCP/IP, ping, firewall rules) is needed to troubleshoot connectivity between monitored nodes and the monitoring server.
  • Analytical & Troubleshooting Ability
    Proven skills in diagnosing system issues. When an alert comes in, you can interpret what the metric means, correlate it with other data (logs, recent changes, etc.), and pinpoint potential causes. You have a methodical approach to problem-solving and can handle incidents under time pressure, restoring services quickly when failures occur.
  • Attention to Detail
    Ability to fine-tune monitoring settings and notice patterns: for example, spotting that a service regularly spikes at a particular time and then adjusting thresholds or investigating the cause. You ensure monitoring coverage is complete (no important system left unmonitored) and that false positives are kept to a minimum.
  • Communication Skills
    Excellent written and verbal communication skills. You must be able to write clear reports for customers and explain technical issues and solutions in plain language. Within the team, you document your work and can guide others on how to interpret monitoring data or handle alerts. Customer-facing experience is a big plus, as this role involves walking clients through reports and possibly teaching them how to use their dashboards.
  • Customer Focus
    A customer-centric mindset. Our monitoring service is a product we deliver to clients, so understanding their business needs and tailoring the monitoring to meet those needs is crucial. You take pride in keeping clients informed and happy by ensuring their systems are stable and by being proactive in preventing issues.
  • Education
    A bachelor's degree in Computer Science, Information Systems, or a related field is preferred; equivalent experience in the field is also accepted. Relevant certifications (systems administration or cloud) can substitute for formal education.

Nice to have

  • Scripting & Automation
    Ability to write basic scripts in Python, Bash, or PowerShell to automate monitoring tasks. This might include automating agent deployments, writing custom Checkmk local checks or plugins, or parsing log files. While not mandatory, scripting skills can greatly enhance efficiency in this role.
  • Cloud Platform Knowledge
    Familiarity with public cloud environments (AWS, Azure, or Google Cloud). Since we operate a managed public cloud, knowing the services of cloud platforms and their native monitoring tools (such as Amazon CloudWatch or Azure Monitor) will help you integrate cloud resource monitoring with Checkmk and better understand our clients' cloud infrastructure.
  • Log Management Tools
    Experience with log analysis or SIEM tools (e.g., ELK/Elastic Stack, Splunk, Graylog). This complements Checkmk's capabilities and is a plus for the log monitoring side of the role, especially if you have configured log-based monitoring or alerting in those systems.
  • Monitoring Certifications
    Any certification or formal training in monitoring or IT operations (for example, a Checkmk certification, if available, or the Nagios Certified Professional) is a bonus. Likewise, Linux or cloud certifications (e.g., RHCSA, AWS SysOps) can be advantageous.
  • DevOps and IaC
    Exposure to DevOps practices and Infrastructure-as-Code tools (Ansible, Terraform, etc.) is nice to have. We may use automation to roll out monitoring across many servers, so understanding these concepts helps when scaling our monitoring solution.
  • Client-Facing Experience
    Prior experience in a consulting, professional services, or MSP (Managed Service Provider) context. If you have worked directly with external clients to deliver IT solutions, you’ll fit right in with our customer-focused culture.

What's great about the job?


  • Impactful Role: Take ownership as the dedicated Monitoring Specialist in a small, agile team, where your work directly improves service quality and customer satisfaction.

  • Leadership & Growth: Work closely with the CTO, influence key decisions and grow into a lead or architect role as the company evolves.

  • Diverse Tech Exposure: Gain hands-on experience with varied technologies across different client environments, promoting continuous learning.

  • Training & Development: Access training, certifications and industry events to support your professional growth and skill development.

  • Collaborative Culture: Join a supportive, team-oriented environment where knowledge sharing and cross-functional collaboration are the norm.

  • Competitive Package: Receive a fair salary with health benefits, PTO, flexible work hours and reasonable on-call rotations with compensatory time.

  • Customer Engagement: Interact directly with clients and see the real-world impact of your work, which customers often recognise and appreciate.

Join us to take our monitoring services to the next level!

What We Offer


Each employee has the chance to see the impact of their work and make a real contribution to the company's success.
We organise activities throughout the year, such as weekly breakfasts, team-building events, amazing coffee and much more.

Perks

A full-time position.
Attractive salary package.

Trainings

External and internal trainings, depending on your needs and expertise.

Eat & Drink

Fruit, coffee and snacks provided.