Should we consider developing an open agent? At BarCampESM we discussed the idea of having a standard open source agent that has all of the following traits:
Deployed on OS
Standard Agent API
Normalized View
The "Deployed on OS" trait concerns me (in my microcosm of Cisco), as we are very cautious about making a target system "unsupported" by installing non-Cisco-certified software on devices. Also, as Cisco moves to more appliance-like, Linux-based devices, they do not allow ANY access to the device other than a locked-down CLI shell to perform basic functions for the OS.
I will still rely heavily on probes, collectors, or off-box agents that access target devices remotely via SNMP, WMI, or Perfmon to extract the data required to monitor them.
I agree that an open, standardized agent would not be a one-size-fits-all solution, but it seems that it could simplify a lot of infrastructure and deployment issues.
This is something I've been thinking about for a while. I don't have much to add at this point, but I will say that I think one of the most important pieces will be a solid plugin/extension API. Really, at its core, an open agent could be just a very basic core that supports loading various plugins, which enable the particular features needed. That way, one agent could serve the needs of monitoring, deployment, remote control, etc.
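To make the "basic core plus plugins" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the AgentCore class, the register/dispatch methods, and the two example plugins are hypothetical names, not an existing agent API.

```python
from typing import Callable, Dict

# Hypothetical plugin signature: takes a request dict, returns a result dict.
Plugin = Callable[[dict], dict]


class AgentCore:
    """A deliberately tiny core: it only registers and dispatches plugins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # Refuse duplicate names so one plugin cannot silently replace another.
        if name in self._plugins:
            raise ValueError(f"plugin already registered: {name}")
        self._plugins[name] = plugin

    def dispatch(self, name: str, request: dict) -> dict:
        # Route a request to the plugin that owns the feature.
        try:
            plugin = self._plugins[name]
        except KeyError:
            raise LookupError(f"no such plugin: {name}") from None
        return plugin(request)


def monitoring_plugin(request: dict) -> dict:
    # Placeholder value; a real plugin would read the metric from the OS.
    return {"metric": request["metric"], "value": 0.0}


def remote_control_plugin(request: dict) -> dict:
    # Placeholder; a real plugin would validate and execute the command.
    return {"accepted": request["command"]}


core = AgentCore()
core.register("monitoring", monitoring_plugin)
core.register("remote_control", remote_control_plugin)
```

With this shape, core.dispatch("monitoring", {"metric": "load"}) goes to the monitoring plugin, and the core itself knows nothing about monitoring, deployment, or remote control.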
Isn't there one that is already shipped with every operating system on the planet? Net-SNMP? Why reinvent the wheel when we can perhaps contribute to that project and make the agent better or add functionality that is missing? It's a well-known protocol, and the agent specification is already extensible, so contributing subagents is a breeze.
I know, I know, most folks hear SNMP and run away, but there isn't a protocol out there that has stood the test of time like SNMP, and it is extremely scalable. Let's take our resources and put them into a project that has already been benefiting the open source community since long before we all decided open source was cool.
Dave, I think the key is that SNMP is limited to passive management. What about people who are installing agents for active management (configuration, etc.)? I asked Erik to summarize in detail why he thought this was a good idea, along with the discussion. The purpose of the question is to figure out whether this is the right question or whether, as you assert, we can just apply our resources to making what already exists better.
Well, I guess we need to give the folks who were part of these discussions a chance to explain the rationale for such a project.
What are the shortcomings of existing agents? Hyperic has Sigar, which is GPL licensed, and there may be others.
Some description of the perceived problem this project would be addressing would be very helpful.
I see several issues with yet another agent:
Deployment
How is the agent deployed to the target system?
Without a centralized software management system (which needs an agent ...), this means manual install on every client. Not very promising.
Security
Pull or push model?
With push, the agent needs to open a port and listen for incoming requests. Any open port is usually considered a security risk.
With pull, it must regularly check for management requests. One cannot make immediate requests but has to wait for the next pull. And with lots of clients, pulls will clutter the network.
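The pull trade-off can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic under my own assumptions (every client polls at the same fixed interval, polls are spread evenly over time); the function names are hypothetical.

```python
def pull_requests_per_second(clients: int, interval_s: float) -> float:
    """Average poll rate the management server sees from all clients."""
    if clients < 0 or interval_s <= 0:
        raise ValueError("clients must be >= 0 and interval_s must be > 0")
    return clients / interval_s


def worst_case_wait_s(interval_s: float) -> float:
    """A request queued right after a poll waits one full interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be > 0")
    return interval_s


def summarize(clients: int, interval_s: float) -> dict:
    # Both numbers move in opposite directions as the interval changes.
    return {
        "requests_per_second": pull_requests_per_second(clients, interval_s),
        "worst_case_wait_s": worst_case_wait_s(interval_s),
    }
```

For example, 6000 clients polling every 60 seconds means about 100 polls per second on the server while a management request can still wait up to a minute. Shortening the interval reduces the wait but raises the load proportionally.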
Discovery
How can a client (running the agent) be discovered?
Protocol
How should the agent talk to the management application?
REST would be nice (and maybe the only reason for a new agent).
But this still leaves resource representation open.
XML is considered bloated and not very useful for monitoring. Monitoring, as well as policy evaluation, must be done on the client (RRDtool?!). Policy violations (disk full!) can then be reported asynchronously.
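A minimal sketch of what client-side policy evaluation could look like for the "disk full" case. The 90% threshold, the function names, and the shape of the violation record are all assumptions made for illustration; how the record is actually sent to the management application is left open, as in the discussion above.

```python
import shutil
from typing import Optional


def evaluate_disk_policy(path: str, used: int, total: int,
                         max_used_fraction: float = 0.9) -> Optional[dict]:
    """Return a violation record if usage exceeds the threshold, else None."""
    if total <= 0:
        raise ValueError("total must be > 0")
    used_fraction = used / total
    if used_fraction <= max_used_fraction:
        return None  # Policy satisfied: nothing needs to be reported.
    return {
        "policy": "disk_full",
        "path": path,
        "used_fraction": round(used_fraction, 3),
    }


def check_disk(path: str, max_used_fraction: float = 0.9) -> Optional[dict]:
    # Read the current usage locally, so no raw samples cross the network.
    usage = shutil.disk_usage(path)
    return evaluate_disk_policy(path, usage.used, usage.total, max_used_fraction)
```

The agent would call check_disk on a schedule and only talk to the management application when the result is not None, which is the asynchronous reporting described above.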
Data Model
How should managed entities and their relations be represented?
The model should be object oriented and allow discovery of relations. If a network switch fails, you want to know which services are affected. The model must support discovering the services routed over the failed switch.
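As a rough illustration of relation discovery, the sketch below walks a dependency graph to find the services affected by a failed switch. The entity names, the DEPENDENTS mapping, and the "svc-" naming convention are made up for the example; a real model (CIM, for instance) expresses such relations through its own association classes.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical relations: each key is a managed entity, each value lists
# the entities that depend on it directly.
DEPENDENTS: Dict[str, List[str]] = {
    "switch-1": ["host-a", "host-b"],
    "host-a": ["svc-web"],
    "host-b": ["svc-db"],
    "svc-db": ["svc-web"],
}


def affected_by(failed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk over everything that depends on the failed entity."""
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def affected_services(failed: str, dependents: Dict[str, List[str]],
                      prefix: str = "svc-") -> List[str]:
    # Assumption: services are recognizable by a name prefix in this toy model.
    return sorted(n for n in affected_by(failed, dependents) if n.startswith(prefix))
```

Here affected_services("switch-1", DEPENDENTS) reports both svc-db and svc-web, including the indirect dependency of svc-web on svc-db.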
In-band / out-of-band
Management of clients in case of critical hardware failures
If the power supply burns out, the CPU overheats, or the kernel crashes, the system must still be manageable. This requires a Baseboard Management Controller (BMC), ideally speaking the same language as the agent.
Reusability
Writing management instrumentation is hard. How can existing instrumentation code be reused?
Interoperability
Mixed Windows/Unix/Linux environments are a fact today.
Who's going to develop and support the agent across these environments?
So what's wrong with existing standards and solutions? CIM and WS-Management have solved most of the above-mentioned problems.
Instead of reinventing the wheel, effort is better spent on making these standards more mature and interoperable. Interestingly enough, Microsoft opted for WS-Management because customers asked them for interoperable management.
OMC should jump onto this bandwagon and push Microsoft further down this road.