Taking a Stab at Enhanced Monitoring

While it is fun to opine on issues that affect the industry, we seldom take the time in this blog to talk about actual technical bits, or about ourselves. Fair warning: this is one of those times. Even for the non-technical folks out there, though, the story of Roberto should be interesting. It illustrates how, in today’s world of APIs and open source projects, tools and applications can be customized to make them more useful.

Much like your dial tone or internet access, the infrastructure of a SaaS or enterprise digital signage platform is often taken for granted. Behind the scenes, it is essential to monitor key metrics on all production, staging and enterprise servers so that bottlenecks and potential issues are identified proactively. To do this at Real Digital Media, we use Scout, a SaaS monitoring application whose agent is installed on each server and reports to Scout’s hosted management application. Our engineers choose the metrics they want to monitor from a set of plug-ins provided with Scout. When a metric exceeds a defined level, Scout sends an email alert, and we have learned that such alerts almost always give early warning that internal users or customers are about to report something out of whack. Experience has taught us to act on those alerts and resolve issues before the calls come, which makes for happy users. Scout also lets our engineers write custom plug-ins to augment the built-in metrics. The ability to write custom plug-ins without revising code on the remote server comes in very handy from time to time, especially when deploying a new enterprise install, where new metrics can be devised and watched to explain or head off typical issues.
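To make the threshold idea concrete, here is a minimal Python sketch of the kind of check an alerting system performs on each batch of readings. It is not Scout’s plug-in API or its trigger logic: the `Trigger` type, the `check_metrics` function, the metric names and the simple “greater than” rule are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical alert rule: fire when `metric` rises above `threshold`."""
    metric: str
    threshold: float


def check_metrics(readings: dict[str, float], triggers: list[Trigger]) -> list[str]:
    """Return one alert message for each trigger whose metric exceeds its threshold.

    `readings` maps metric names (hypothetical examples) to their latest values.
    """
    alerts = []
    for trigger in triggers:
        value = readings.get(trigger.metric)
        # A metric with no reading is skipped rather than treated as a failure.
        if value is not None and value > trigger.threshold:
            alerts.append(
                f"ALERT: {trigger.metric} = {value} exceeds {trigger.threshold}"
            )
    return alerts


if __name__ == "__main__":
    latest = {"cpu_load": 7.5, "disk_used_pct": 62.0}
    rules = [Trigger("cpu_load", 4.0), Trigger("disk_used_pct", 90.0)]
    for message in check_metrics(latest, rules):
        print(message)  # prints "ALERT: cpu_load = 7.5 exceeds 4.0"
```

In a real deployment each returned message would be delivered somewhere, by email or into a chat room. Skipping missing readings keeps the sketch simple; a production system would more likely raise a separate alert when an agent stops reporting.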

In daily use, we’ve tied the most critical metrics watched by Scout to a dashboard that the support team can keep on a large display or pull up on their desktop screens. That capability, combined with the email alerts, is useful, but it does not mesh with our work styles and requirements. It is rare for all the members of our development and support teams to be assembled in the same place. Geography notwithstanding, they need to share ideas, issues and information continuously. They have found that the most effective way to communicate during a typical day is a chat room powered by 37signals’ Campfire. To fully integrate Scout with Campfire, we needed to work a little bit of magic.

The problem was well defined. We needed Scout alerts routed into the Campfire chat room so that the whole team could get immediate feedback, decide among themselves on next steps, and agree on who should handle which tasks. Beyond that, the chat room could serve as a record of previous incidents and the actions taken, making our responses more efficient. Every team member would have visibility into issues and resolutions, as well as the option to make clever, motivational comments to co-workers as appropriate.

To pull this off, we had to write a new plug-in with some extra capability. Essentially, we created a “bot” that sits in our Campfire chat room and uses the Scout webhook API to receive and display alerts. The bot is based on GitHub’s open source Hubot, and the script is now part of GitHub’s hubot-scripts repository of third-party scripts. The team named the bot Roberto, after the knife-wielding, insane criminal robot of fleeting Futurama fame. He was going to spur us to action, one way or another.

Roberto was originally going to use Scout’s API to paste critical alert information into the Campfire chat room. But the team wanted more: we wanted the graphs that Scout’s in-app pages show users, so that we could see each metric’s historical norms without leaving the chat room. A call to Scout support got the graphs added to the API in less than a day, and we were off and running. Full, customized metrics on all of our critical servers were now available to the entire team inside a collaborative environment. Our work styles were accommodated, and both Hubot and Scout users gained new tools.
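The flow Roberto handles (an alert arrives by webhook, the bot reformats it, a message lands in the chat room) can be sketched with Python’s standard library. This is a hypothetical illustration, not Roberto’s actual code, which is a Hubot script: the payload fields `server`, `title` and `graph_url`, the port, and the `post_to_chat` stub are all assumptions, and a real bot would replace the stub with a call to the chat service’s API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alert(payload: dict) -> str:
    """Turn a (hypothetical) alert payload into a chat message.

    The field names `server`, `title` and `graph_url` are assumptions,
    not Scout's documented webhook format.
    """
    server = payload.get("server", "unknown server")
    title = payload.get("title", "alert")
    message = f"[{server}] {title}"
    graph_url = payload.get("graph_url")
    if graph_url:
        # Put the graph link on its own line so chat clients that expand
        # image links can show the historical graph inline.
        message += f"\n{graph_url}"
    return message


def post_to_chat(message: str) -> None:
    """Hypothetical stub: a real bot would send `message` via the chat service's API."""
    print(message)


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accepts alert webhooks as JSON POST bodies and relays them to chat."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        post_to_chat(format_alert(payload))
        # 204: accepted, nothing to send back to the caller.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for webhooks on `port` (an arbitrary, assumed choice) until interrupted."""
    HTTPServer(("", port), AlertWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; a monitoring service configured to POST its webhooks to that address would then have each alert relayed through `post_to_chat`. Keeping `format_alert` separate from the HTTP handler makes the message format easy to test without a running server.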

Monitoring critical functions and performance on staging and production servers is an important task, seldom seen or appreciated by platform users. We’d just as soon keep it that way. In a world-class environment, good tools allow problems to be solved before they manifest themselves to users. In our shop, Roberto’s technical blade ensures that we stay on the cutting edge. And he made me write this… who am I to argue with an insane robot?

Ed. Note: Thanks to Gavin Stark for bringing Roberto to life and sharing the story.

August 29th, 2012 (updated January 5, 2017) | Digital Media Technology
