Network Cybersecurity Starts with Network Maintenance

Author: Joshua Frazee, Information Security Consultant

Healthcare organizations haven’t always given their Information Technology (IT) infrastructure, particularly their networks, the attention needed to ensure optimal performance and network cybersecurity. The issue is a lack of focus on one instrumental component: network maintenance. Network maintenance involves proactively managing and overseeing your organization’s network to keep operations running smoothly and to prevent problems before they arise. It encompasses testing, troubleshooting, monitoring, asset management, documentation, backups, operating system updates, patching, and lifecycle replacement. Network management and maintenance are foundational aspects of cybersecurity. Unfortunately, they often fall into a ‘set it and forget it’ approach backed by limited resources.

Organizations should evaluate their security devices and assess improvements that would elevate their cybersecurity posture. Additionally, network management that includes ongoing maintenance needs to be part of the organization’s daily cybersecurity operations and should be reviewed before adding new devices such as intrusion detection and prevention systems (IDS/IPS). The point isn’t that devices such as an IDS or IPS lack value; rather, ongoing oversight is necessary to maximize your infrastructure investment and ensure the health of your cybersecurity controls. If network maintenance isn’t properly performed on devices currently in the environment, what is most likely to happen when new equipment and technologies are introduced? They will likely fall into the same neglect or, worse, become an attack vector due to delayed vulnerability management, which is a big part of network maintenance.

A Tale of Uptime and Network Maintenance

Let’s consider this in a context critical to every organization’s operations: money. A healthcare organization purchases firewalls for its main hospital data center. However, they connect to an older core switch with an uptime of more than four years, meaning it has not been restarted or powered off in that time. This uptime also tells the network administrator that the operating system (OS) has not been updated in over four years. The new firewalls are installed, and everything seems to be running as expected. It is then decided that a small configuration change needs to be made on the core switch. The change is made, and the core switch, which provides connections to all aspects of the data center, locks up, preventing data from flowing as intended. The administrator first troubleshoots to see if it is just the connection, realizes it’s the switch, makes their way to the data center, locates the device, power cycles it, and then waits for it to come fully online.

To make matters worse, there is no redundancy. So, a failover to an adjacent switch is not feasible. It takes 30 minutes from the time the switch froze to it coming back online and becoming fully operational.

If this occurs during the hospital’s main service hours, that is 30 minutes during which the hospital cannot take new patients or create appointments and must delay patient care. With each minute of hospital downtime averaging approximately $7,900 in lost revenue, those 30 minutes cost about $237,000. Preventative maintenance could have reduced or even eliminated this cost.
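The arithmetic behind that figure can be written out as a small calculation. This is only a sketch: the per-minute cost and outage length come from the scenario above, while the function and constant names are illustrative assumptions.

```python
# Figures taken from the scenario above; names are illustrative only.
COST_PER_MINUTE_USD = 7_900  # approximate average lost revenue per minute of downtime
OUTAGE_MINUTES = 30          # time from the switch locking up to full recovery


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate lost revenue for an outage of the given length in minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


print(f"${downtime_cost(OUTAGE_MINUTES):,.0f}")  # prints $237,000
```

The same function can be reused to compare the cost of a planned maintenance window against the cost of an unplanned outage of similar length.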

A core switch failure that costs a hospital hundreds of thousands of dollars is a valid risk, and it is one that arises when proper management has lapsed.

Key Components of Network Cybersecurity

A Maintenance Program

The first thing any organization needs is a structured maintenance program. An organization must have documented processes and procedures for this to happen. A maintenance program provides a structured approach to managing the organization’s network and the interconnectivity of its IT infrastructure.

The maintenance program also allows an organization to conduct routine, scheduled, and emergency maintenance. Routine maintenance can be as simple as port activation for a new user. Scheduled maintenance could include OS upgrades, addition and testing of security configurations, and lifecycle replacement. Emergency maintenance would be necessary to apply a patch for a zero-day vulnerability or to add a firewall policy blocking a malicious internet protocol (IP) address. Each of these activities provides additional security, redundancy, and reliability to your network infrastructure.

Change Advisory and Change Control Boards (CAB/CCB) for Accountability

A key component of the maintenance program is a Change Control Board (CCB) or Change Advisory Board (CAB). These boards are essential because they ensure that maintenance of network devices is not just required but also brings value to the security, performance, and equipment it supports. Additionally, an organization needs to perform a cost analysis verifying that the maintenance provides an overall cost benefit. Multiple variables are considered when discussing cost-benefit, such as the staff hours required, the required network downtime, which will inhibit the use of revenue-generating assets such as MRIs or EMGs, and, of course, the cost-to-risk ratio.

A CAB is a group of individuals who formally meet to assess, prioritize, authorize, and schedule changes as part of the change control process. Smaller organizations may not have a Change Control Board, but they should have at least a Change Advisory group in which the IT staff and business owner discuss the impact of the change. Those affected by the change, along with key roles throughout the organization, can review and make the best overall decision concerning:

Change priorities and scope of impact
Timing and resources
Testing and roll-back options
Change validation and status

A Ticketing System and Workflow Management

The maintenance program needs a tracking system that not only tracks and documents all maintenance projects, plans, and procedures being completed but also serves as a fundamental way to request a change. The lifecycle of the change should be followed in the system, including the initial request, prior testing, and the outcome of the maintenance. While IT is usually on the hook for most of this information, there should be a consensus with the Change Advisory Group (even if this is just a manager) to validate and provide a closed-loop process for the maintenance and emergency patch changes.

There is no “one size fits all” ticketing system. It depends on the organization’s size and budget. However, because network change reviews are needed to ensure the organization’s resiliency, there must be a way to collect and find change requests and verified artifacts. This can even be managed manually, for example with a SharePoint site or similar means.

The ticketing system should contain sufficient documentation that can be archived for future reference or evidence, such as the change description, change testing, change implementation plan, change impact, change approval, and change results.
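One way to picture such a record, whatever tool holds it, is as a small structure with one field per artifact and a check for what is still missing before the ticket is archived. The sketch below is illustrative only: the class, field, and function names are assumptions and do not correspond to any particular ticketing product.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ChangeRecord:
    """One change ticket; each field mirrors an artifact listed above (names assumed)."""
    description: str = ""
    testing: str = ""
    implementation_plan: str = ""
    impact: str = ""
    approval: str = ""
    results: str = ""


def missing_artifacts(record: ChangeRecord) -> List[str]:
    """Return the names of artifacts that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def ready_to_archive(record: ChangeRecord) -> bool:
    """A ticket closes the loop only when every artifact has been recorded."""
    return not missing_artifacts(record)


# Hypothetical ticket: approved and implemented, but the outcome was never written up.
ticket = ChangeRecord(
    description="Upgrade core switch OS",
    testing="Validated on lab switch",
    implementation_plan="Saturday 02:00 maintenance window",
    impact="Data center connectivity during reboot",
    approval="Approved by change advisory group",
)
print(missing_artifacts(ticket))  # ['results']
```

Even a manually maintained list can apply the same completeness check before a change is considered closed.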

Asset Maintenance and Lifecycle Management

Many hospitals struggle with proper asset management. When thinking about network equipment maintenance, ensuring that all network and network-enabled devices are documented and accounted for is necessary. This will allow tracking of the current OS and additional software running on the device and assist in lifecycle management.

This type of asset management maintains a hardware and software list. Updating devices and keeping current with the vendor’s patches can enable new features, fix bugs, and guard against the latest malware, along with other possible risks. Frequent vulnerability scanning and monitoring of vendors for critical security patches should be part of the asset management process. Doing so lets organizations know of elevated risks and helps prioritize actions to mitigate them until a patch can be tested, implemented, and verified.

Another component of asset management is lifecycle management. By documenting the End of Life (EoL) and End of Support Life (EoSL) dates of devices and the software deployed on them, an organization can make appropriate plans before those dates arrive. EoL is when the vendor will no longer make the device or software. When a device or software is at EoL, there is most likely a device or software that has taken its place. The new device or software can provide additional capabilities that support security features and protocols, such as 802.1X for port-based network access control. EoSL is when the vendor will no longer provide software updates and customer support for that specific item. Documenting these dates also allows the IT department to ensure that funding for such upgrades is requested ahead of time and placed within the yearly budget.
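As a rough illustration of how documented lifecycle dates can feed budget planning, the sketch below flags assets whose support ends within a chosen lead time. Everything in it is assumed for the example: the device names, OS versions, dates, the 365-day lead time, and the function name.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Asset:
    name: str
    os_version: str
    end_of_life: Optional[date]      # EoL: vendor stops making the item
    end_of_support: Optional[date]   # EoSL: vendor stops updates and support


def needs_budget_request(asset: Asset, today: date, lead_time_days: int = 365) -> bool:
    """True if EoSL has passed or falls within the lead time (assumed to be one budget year)."""
    if asset.end_of_support is None:
        return False  # no announced date yet; nothing to plan against
    return asset.end_of_support - today <= timedelta(days=lead_time_days)


# Hypothetical inventory entries.
inventory = [
    Asset("core-sw-01", "12.2", date(2023, 1, 31), date(2025, 1, 31)),
    Asset("fw-dc-01", "7.4", None, None),
]

flagged = [a.name for a in inventory if needs_budget_request(a, date(2024, 6, 1))]
print(flagged)  # ['core-sw-01']
```

Running a check like this ahead of each budget cycle turns the asset list into a replacement schedule rather than a static record.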

Network Monitoring and Preventive Maintenance

Network monitoring is an essential component of network maintenance that goes beyond ensuring the device is alive and responding. Using technologies such as Simple Network Management Protocol (SNMP), an organization can monitor performance, interfaces, CPU utilization, memory usage, throughput, uptime, and sometimes even the environmental status of devices. With an SNMP manager and agent, traps can send messages showing that, for example, a specific port is overloaded with traffic or cannot pass traffic at all. Monitoring is also the first way to discover a denial of service (DoS) attack before a customer, clinician, or other user reports accessibility issues.

We will use availability from the confidentiality, integrity, and availability (CIA) cybersecurity triad to explain why this is important. You are monitoring the central processing unit (CPU) usage of a core switch, which provides a network with core capabilities and access to the intranet and internet. An alert has been triggered, and you notice that the CPU of this core switch is running at 95-100%, causing network congestion and packet errors. A core switch is typically set up in a high availability (HA) pair, but this one is not. Now, all internal and external communication is beginning to fail. You go through all configurations made previously by looking into the ticketing system for this device and uncover nothing of merit. You verify that no malicious actions are taking place, such as a DoS, and that there aren’t any bugs within the current OS. Then, to see if the switch is just hung up, you do what no network administrator wants to do in a production network during peak hours and restart the switch. It returns to normal but then goes right back to the high utilization. After additional troubleshooting, you discover that the CPU is at a normal operating percentage during non-peak hours, and the device functions normally. The problem is simple. The core switch’s CPU isn’t strong enough to meet the requirements of the network or the configurations within the switch, and the blade or switch itself needs to be upgraded.

If proper monitoring had been conducted and SNMP traps had been configured for a 50% CPU threshold, you would have been alerted when the CPU reached that mark. It would have been documented, and the alert threshold would then have been raised incrementally until it reached the operating level considered a security risk for the device. Along the way, proper planning and funding could have been allocated to upgrade the device or to purchase another device to integrate into the network, with the original switch serving as a backup if the new one failed. As a backup, the original switch still wouldn’t be able to handle all the traffic or networking capability, so there would still have to be some compromise on what it could or could not support.
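The escalating-threshold idea can be sketched in a few lines. This is an illustration only: the threshold steps (50, 65, 80), the function names, and the sample readings are assumptions, and the CPU values are presumed to come from whatever SNMP poller or trap receiver the organization already runs. No SNMP calls are made here.

```python
from typing import List, Optional, Tuple

# Assumed escalation steps: start at 50% and raise the bar after each documented alert.
THRESHOLDS = [50, 65, 80]


def next_alert(cpu_percent: float, acknowledged: List[int]) -> Optional[int]:
    """Return the highest threshold crossed that has not yet been documented, if any."""
    crossed = [t for t in THRESHOLDS if cpu_percent >= t and t not in acknowledged]
    return max(crossed) if crossed else None


def review(samples: List[float]) -> List[Tuple[float, int]]:
    """Walk through CPU readings in order and record each new threshold crossing."""
    acknowledged: List[int] = []
    events: List[Tuple[float, int]] = []
    for cpu in samples:
        level = next_alert(cpu, acknowledged)
        if level is not None:
            events.append((cpu, level))
            # Documenting an alert acknowledges that level and every level below it.
            acknowledged = [t for t in THRESHOLDS if t <= level]
    return events


# Hypothetical peak-hour readings collected over several months.
print(review([42, 51, 55, 67, 83]))  # [(51, 50), (67, 65), (83, 80)]
```

Each recorded event is a natural trigger for a ticket, which gives the trend a paper trail long before the device reaches 95-100%.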

Backups

Network device backups are critical to effective network management, ensuring resilience, security, and efficient operations in the face of various challenges. Their uses include:

Disaster recovery, recovering to a known working state and minimizing downtime
Fault troubleshooting, providing a reference point to compare the current configuration with the backup and reducing time to remediation
Roll-back points, serving as a recovery point in time so that, when change tickets are actioned for updates, there is a clean recovery from unforeseen or compatibility issues

The guidance is always to have a safe and central location for your backups, no matter the size of your organization. For larger organizations, this also means your network devices are mirrored and backed up at a disaster recovery site, as they are necessary to provide continuous patient care.
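The fault-troubleshooting use of a backup, comparing the current configuration against a known-good copy, can be sketched with Python's standard difflib module. The configuration text and function name below are made up for illustration, and retrieving the running configuration from a device is assumed to happen elsewhere.

```python
import difflib
from typing import List


def config_drift(backup: str, running: str) -> List[str]:
    """Return only the lines added (+) or removed (-) relative to the backup."""
    diff = list(
        difflib.unified_diff(
            backup.splitlines(),
            running.splitlines(),
            fromfile="backup",
            tofile="running",
            n=0,          # no context lines, only the changes themselves
            lineterm="",
        )
    )
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical configurations: someone shut down the uplink after the last backup.
backup_cfg = "hostname core-sw-01\ninterface Gi1/0/1\n description uplink\n"
running_cfg = "hostname core-sw-01\ninterface Gi1/0/1\n description uplink\n shutdown\n"

print(config_drift(backup_cfg, running_cfg))  # ['+ shutdown']
```

An empty result means the running configuration still matches the backup, which quickly rules out configuration drift as the cause of a fault.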

Network Maintenance Paves a Path Toward Better Security

These are instrumental components: conducting proper network and IT infrastructure maintenance will provide a better path for elevating your network cybersecurity posture. Creating a maintenance program and plan, incorporating change accountability, ensuring proper asset management, performing backups, and managing updates and critical patches are all time-consuming.