Expositus Procuratio

2 Posts tagged with the network_monitoring tag

 

In "Network monitoring is a commodity myth" I argued that network monitoring is far from being a commodity and, on the contrary, needs innovation to cope with increasing complexity.

 

 

As cote mentioned in the comments of that post, there has been some fresh blood in the IT management industry. Several open source companies and projects are tackling the monitoring problem, which is a good thing, yet I feel we're still missing some pieces. AFAIK, most monitoring solutions follow one of two existing paradigms:

 

  • monitoring the devices (nodes) through an SNMP agent

  • synthetic transactions to determine the status of services running on nodes

 

The understanding of the network topology is missing in both paradigms. In other words, nodes are what's being monitored, not the network. The network topology (except at layer 3) is largely unknown, which limits the effectiveness of the monitoring. Monitoring tools (or rather the functionality they offer) can be broadly categorized as follows:

 

  • Polling the devices: The most common approach in IP networks. Most IP networking devices have an SNMP agent that supports at least MIB-II, so basic availability and performance information can be obtained. For more detailed information, however, proprietary MIBs are needed. Many IT management folks have spent long hours trying to understand these MIBs, working out which data is where, compiling them for use by their monitoring tools, etc.

  • Listening for exceptions: Not every network device has an agent that can be polled, especially in the layers below IP. Even when one is available, the ability to listen for information is useful because it can be more immediate. In IP networks, these notifications are typically SNMP traps or syslog events. In other networks, there are often element managers that convey messages. Again, IT management folks have spent countless, often frustrating hours trying to make sense of the traps and syslog events: normalizing them, translating them into human language, identifying what is important and what is not, etc.

  • Listening to the pipes: It is possible to learn a lot by listening to what goes on in the network. Flow tools (NetFlow and its kin cFlow, J-Flow, NetStream, sFlow, etc.) generate end-to-end traffic statistics based on the flow of data through the network devices that support them. Another approach is to analyze the traffic going through a device using a SPAN port, although this method seems to be popular mainly for analyzing application traffic. I don't have a lot of personal experience with these tools, so I'll leave it to others to explain them better or correct me. From what I see, these tools often require hardware distributed throughout the network to get full visibility, which may be a hurdle for adoption.
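
To make the polling approach a little more concrete, here is a minimal sketch of what a tool typically does with two successive samples of the MIB-II `ifInOctets` counter. The sample values, the polling interval and the function name are made up for illustration, and actually fetching the counters over SNMP is left out.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets is a 32-bit counter that wraps to zero


def utilization_percent(prev_octets, curr_octets, interval_s, if_speed_bps):
    """Estimate inbound link utilization between two polls of ifInOctets.

    Assumes the counter wrapped at most once between the two samples.
    """
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    # Modular subtraction also covers the case where the counter wrapped.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / if_speed_bps


# Hypothetical samples taken 300 seconds apart on a 100 Mbit/s interface.
print(utilization_percent(1_000_000, 376_000_000, 300, 100_000_000))
```

Note that the result only describes one interface on one node; it says nothing about where that interface sits in the network.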

 

IMHO, all of the approaches I've tried to summarize above have some shortcomings. As far as I can see, the situation may improve in two ways:

 

  • Someone may come up with a new technology, a clever way to monitor the network and identify the problems, maybe discover and represent the network, etc. IMO, this can only happen if some of the investment and attention going to tools that target “business users” with sexy, shiny UIs flows back to the muck. When the payoff is so low (who wants to tackle a “commodity” problem?), significant investment is not likely.

  • The power of the community is harnessed to solve tedious problems once and share the results, rather than each user struggling to solve the same problems over and over independently. There are already some examples of this: Splunk is attempting to create a repository of log events and what they mean, and the ZipTie open source project is working on solving device configuration through collaboration of vendors and customers (how come they are not a member?).

 

There is a lot more that can be done in the monitoring realm if we can manage to set up the right collaboration platform (commercially and legally as well as technically) to facilitate sharing, which is sorely lacking in IT management for whatever reason.

 

 

From what I can see, the ZipTie model is particularly interesting and suitable. The ability to collaborate and share is potentially a major competitive advantage for open source projects. I believe there are opportunities here for collaboration among open source projects/companies and their users/customers.

 

 

For example, in the case of discovery and representation of the network topology, the know-how for getting topology data out of the vast number of different device types can be shared. If a common model can be defined to represent the topology, adapters that populate the model can be developed for each device.
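
A minimal sketch of what such a common model and its adapters might look like is below. Everything in it is an assumption made for illustration: the `Link` and `Topology` classes, the two "vendor" input formats and the device names are invented, and a real model would need far more (layers, VLANs, link state and so on).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """An undirected adjacency between two device ports."""
    a_device: str
    a_port: str
    b_device: str
    b_port: str

    def key(self):
        # Order the endpoints so a link reported from both ends matches itself.
        return tuple(sorted([(self.a_device, self.a_port),
                             (self.b_device, self.b_port)]))


@dataclass
class Topology:
    """The shared model that every adapter populates."""
    devices: set = field(default_factory=set)
    links: dict = field(default_factory=dict)

    def add_link(self, link):
        self.devices.update([link.a_device, link.b_device])
        self.links[link.key()] = link


def adapt_vendor_a(device, neighbor_rows, topology):
    """Hypothetical adapter: rows are (local_port, remote_device, remote_port)."""
    for local_port, remote_device, remote_port in neighbor_rows:
        topology.add_link(Link(device, local_port, remote_device, remote_port))


def adapt_vendor_b(device, neighbor_text, topology):
    """Hypothetical adapter: lines look like 'eth0 -> sw2:ge-0/0/1'."""
    for line in neighbor_text.strip().splitlines():
        local_port, remote = [part.strip() for part in line.split("->")]
        remote_device, remote_port = remote.split(":", 1)
        topology.add_link(Link(device, local_port, remote_device, remote_port))


topology = Topology()
adapt_vendor_a("sw1", [("Gi0/1", "sw2", "ge-0/0/1"),
                       ("Gi0/2", "rtr1", "eth0")], topology)
adapt_vendor_b("rtr1", "eth0 -> sw1:Gi0/2\neth1 -> sw2:ge-0/0/2", topology)
print(sorted(topology.devices), len(topology.links))
```

The point of the sketch is the division of labor: the model is defined once, and each adapter only has to know how one kind of device reports its neighbors, which is exactly the part that could be contributed by whoever has already figured it out.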

 

 

In the case of trap and event log processing, the know-how of what each trap may mean and what its varbinds are can be shared. And again, if a common model can be defined to represent the traps/events, adapters to convert the traps into the common model can be developed.
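
As a rough sketch of the same idea for events, the code below normalizes a link-down trap and a link-down syslog message into one common record. The `Event` class, the adapter functions and the sample syslog text are assumptions made for illustration; the OIDs are the standard linkDown/linkUp notifications and the IF-MIB `ifDescr` column.

```python
import re
from dataclasses import dataclass

# Standard linkDown / linkUp notification OIDs (snmpTraps 3 and 4).
TRAP_KINDS = {
    "1.3.6.1.6.3.1.1.5.3": "link_down",
    "1.3.6.1.6.3.1.1.5.4": "link_up",
}
IF_DESCR_PREFIX = "1.3.6.1.2.1.2.2.1.2."  # ifDescr.<ifIndex>


@dataclass(frozen=True)
class Event:
    """Hypothetical common model shared by all adapters."""
    source: str
    kind: str
    interface: str


def from_trap(source, trap_oid, varbinds):
    """Convert a decoded trap (OID plus varbind dict) into an Event, or None."""
    kind = TRAP_KINDS.get(trap_oid)
    if kind is None:
        return None
    # Use the ifDescr varbind if the agent sent one; not every agent does.
    interface = next((value for oid, value in varbinds.items()
                      if oid.startswith(IF_DESCR_PREFIX)), "unknown")
    return Event(source, kind, interface)


# Matches messages shaped like the illustrative sample used below.
SYSLOG_LINK = re.compile(r"Interface ([^,\s]+), changed state to (up|down)")


def from_syslog(source, message):
    """Convert a link state syslog message into an Event, or None."""
    match = SYSLOG_LINK.search(message)
    if match is None:
        return None
    return Event(source, "link_" + match.group(2), match.group(1))


trap_event = from_trap("sw1", "1.3.6.1.6.3.1.1.5.3", {
    "1.3.6.1.2.1.2.2.1.1.3": 3,
    "1.3.6.1.2.1.2.2.1.2.3": "GigabitEthernet0/1",
})
log_event = from_syslog(
    "sw1", "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down")
print(trap_event == log_event)
```

Once both sources land in the same shape, deduplication and correlation no longer care whether the news arrived as a trap or as a log line, and the per-device knowledge lives in small adapters that can be shared.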

 

 

I think these problems naturally lend themselves to collaboration, and life in the trenches would improve significantly if we were tackling them together instead of drowning in them alone.

 

 

 

 


There is a persistent meme in the industry that (network) monitoring is now a commodity. This meme is so persistent that it seems it's no longer even disputed. There are lots of different monitoring tools, many of them open source and/or free, and they've been around for a long time; hence, the thinking goes, monitoring is now a commodity.

 

It is quite puzzling to me how terribly wrong this meme is. How can we be so wrong? IMHO, network monitoring is not a commodity. Far from it. Network monitoring is still largely an unsolved problem. The tools we have to monitor the "network" are largely inadequate.

 

The network is a complex beast, and its complexity is increasing by leaps and bounds, as is its criticality. It has layers upon layers, and only a limited set of people understand it all. Our monitoring of the network is mostly limited to what we understand best: the nodes in the network. We don't really monitor the network itself, which is a complex distributed application running on these nodes.

 

This reminds me of a famous Nasreddin Hodja folk tale in which he loses his ring in the basement of his house, but people find him looking for it outside, on the road. When asked why he is looking for it outside, he says that the basement is too dark and he can't see anything there.

 

It seems to me that, somewhat like Hodja, we're monitoring the nodes in the network because we can, and not monitoring the network because, well, we can't. The problem is largely related to instrumentation. More or less standard instrumentation (SNMP MIB-II, etc.) to monitor the status of a device and its ports and interfaces has been available for quite some time, but very little instrumentation is available to determine the network topology, and whatever is available is not standard.

 

Without an understanding of the network topology and the role of the nodes in that topology, the value of monitoring the nodes is quite limited. We end up collecting a lot of information that does not necessarily help us determine what's wrong. This is also largely the cause of the disconnect between users and IT organizations when talking about availability reporting. IT reports on the availability of the nodes in the network, which does not necessarily equate to the availability of the services that run on the network.
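
A toy example shows how far apart the two numbers can be. The sketch below assumes a made-up ten-node network in which one core node is down; the node names, the links and the `reachable` helper are all hypothetical.

```python
from collections import deque


def reachable(links, up_nodes, start, goal):
    """Breadth-first search over links whose two endpoints are both up."""
    if start not in up_nodes or goal not in up_nodes:
        return False
    neighbors = {}
    for a, b in links:
        if a in up_nodes and b in up_nodes:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical network: the client can reach the server only through core1.
nodes = {"client", "access1", "access2", "core1", "server",
         "printer", "lab1", "lab2", "lab3", "lab4"}
links = [("client", "access1"), ("access1", "core1"), ("access2", "core1"),
         ("core1", "server"), ("access2", "printer"), ("access2", "lab1"),
         ("lab1", "lab2"), ("lab2", "lab3"), ("lab3", "lab4")]

up_nodes = nodes - {"core1"}  # a single node is down
node_availability = 100 * len(up_nodes) / len(nodes)
service_up = reachable(links, up_nodes, "client", "server")
print(node_availability, service_up)
```

Nine of ten nodes are up, so the node report looks healthy, yet the one service the user cares about is unreachable. The difference only becomes visible once the topology is part of the calculation.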

 

Alternatively, when the services are monitored directly, we may be able to determine whether a service is up or down, but we cannot determine the cause of a problem by looking at the monitoring tools.
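
To illustrate how little a direct service check reveals, here is a minimal sketch of a synthetic check that simply attempts a TCP connection. The `tcp_check` function is a hypothetical helper, and the demo runs against a throwaway local listener rather than a real service.

```python
import socket
import time


def tcp_check(host, port, timeout_s=2.0):
    """Synthetic check: try a TCP connect and report (is_up, elapsed_seconds)."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            is_up = True
    except OSError:
        # Refused, timed out, unreachable, bad name: all look the same here.
        is_up = False
    return is_up, time.monotonic() - started


# Demo against a throwaway local listener so the sketch runs anywhere.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

print(tcp_check("127.0.0.1", port)[0])  # the listener is accepting connections
listener.close()
print(tcp_check("127.0.0.1", port)[0])  # nothing is listening any more
```

All the check returns is a yes or no and a timing. Whether a failure comes from the server, a switch port, a routing change or a firewall rule somewhere along the path is not something it can tell you.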

 

The focus in the IT management market has moved up the stack, so to speak, to the “business level”, where tools with shiny user interfaces that provide “executive dashboards” are all the rage. IT departments have a hell of a time justifying an investment in better monitoring tools but have an easier time investing in tools that address the higher level. Ironically, the higher-level tools rely on the information provided by lower-level tools such as the monitoring tools; hence, without solving the monitoring problem, it's not feasible to have meaningful dashboards.

 

Beating up on IT organizations has become such a popular sport that no one seems to listen to what they have to say. As a result, IT management discussions increasingly risk losing touch with reality. I confess to being jealous of cote's blog byline, “one foot in the muck, the other in the utopia”, as I believe it is the right philosophy for solving any problem worth solving. Network monitoring is in desperate need of innovation and attention, but that is not likely to happen unless we start paying more attention to what the people in the muck are saying and kill this false meme that monitoring is a commodity.

 

I don't have the answer to how to solve this problem, but I think the community may well have. In the next post, I'll lay out not what I think may be an answer but what I hope may trigger some thoughts on what can be done to tackle the problem of “network” monitoring.


Berkay

Member since: Dec 31, 2007

Thoughts on IT management
