Build management is about ensuring that the required software packages, services and settings, both for individual programs and for the network, are always installed. Many people have heard of so-called "images": you install the operating system with all the programs needed and configure the network, then use an imaging program to make a copy of the hard disk. This "disk image" can then be copied to other computers.
For most situations, scripting is an easy way to "build" and roll out programs and configurations. But there are situations where building disk images may be the better solution, e.g. for installation on many laptops.
Fall-back solution
<table> <tbody>
<tr class="odd"> <td align="left">Prioritization of the release </td> <td align="left">Check that the necessary decisions have been made before a change or upgrade is deployed. </td> </tr>
<tr class="even"> <td align="left">Definitive Software Library </td> <td align="left">Ensure that the appropriate software packages to be installed are in place in the Definitive Software Library. </td> </tr>
<tr class="odd"> <td align="left">Configuration database </td> <td align="left">Make sure all configuration files are in place, both those currently in use and the new ones supplied with the systems to be changed or updated. </td> </tr>
<tr class="even"> <td align="left">Build management </td> <td align="left">All scripts and systems used to deploy or create disk images must be in place. </td> </tr>
<tr class="odd"> <td align="left">Testing </td> <td align="left">First, run trials on test equipment. When this works without problems, test it at one school. That school must be fully informed and fully involved in trying out the new software. Once you are sure everything works, you can upgrade all schools. </td> </tr>
<tr class="even"> <td align="left">Fall-back solution </td> <td align="left">Even with extensive testing, new releases may go wrong, so it is essential to have a fallback. The easiest solution is to keep the old installation, with its data, on a separate spare server machine that can be plugged in if the change or upgrade does not work. </td> </tr>
</tbody> </table>
Communication plan
As mentioned in the introduction, it is recommended to begin by establishing an office for centralized operations to allow you to manage tickets. The benefits of this come quickly and are visible, which is important for customer and user satisfaction.
After the office is up and running with a sensible workflow for tickets (user requests and troubleshooting), you will move on to the biggest challenge for the organisation. As a rule, this is either change management or problem solving. Organisations with "cowboy" system administrators, who come up with smart ideas and implement them without much testing, often begin with change management. For organisations suffering recurring outages, problem solving comes first.
Whatever you choose to start with, a certain amount of configuration management will be necessary. Managing configuration is critical to delivering the software and services for the user. Software must work as expected. In order to make beneficial changes, one must know the configuration of the different programs.
To manage configuration changes you may use a Configuration Management Database (CMDB). Few people use a database for all configurations, and you do not have to add all configurations to one single database. It is fine to place configurations in multiple smaller and partly independent repositories; some people, for example, store configurations and setups in version control. But even if you have different repositories, you may get greater benefits if you connect information from the different processes.
For users of Debian Edu, most service configurations lie within a specific directory (/etc). These may benefit from being collected and stored in a central version-controlled directory, which makes it easier to restore lost services and to set up machines again if they are reinstalled. This applies both to servers and to user laptops or workstations. As part of the backup system in Debian Edu, a backup is made of the setup directory /etc. But the backup system is not a database or a version-controlled directory for configurations.
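The idea of collecting /etc and noticing what has changed can be sketched in a few lines. This is a minimal illustration, assuming you only want to detect which files under a configuration directory were added, removed or modified since the last snapshot. The function names and the JSON snapshot format are hypothetical. On Debian, the existing etckeeper package does the real job of keeping /etc in a version control system such as git.

```python
import hashlib
import json
from pathlib import Path


def snapshot(config_dir: Path) -> dict[str, str]:
    """Map every regular file under config_dir (relative path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(config_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(config_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List the files added, removed or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def save_snapshot(digests: dict[str, str], target: Path) -> None:
    """Write a snapshot as JSON, e.g. into a directory kept under version control."""
    target.write_text(json.dumps(digests, indent=2, sort_keys=True))


# Hypothetical usage: snapshot(Path("/etc")) needs read access to all of /etc,
# so in practice it would run as root.
```

Comparing this week's snapshot with last week's shows which configuration files changed. That is the information you need when restoring a service or rebuilding a reinstalled machine. The digests only tell you *that* a file changed. Keeping the files themselves under version control also records *how* it changed.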
The Service Desk is where users submit questions or report errors. At school, the ICT contact often forwards operational events to the Service Desk. There may also be requests such as setting up a new PC or installing a program.
At school, the ICT contact is the link to the Service Desk and also responds to the most common questions. Questions and tasks that are too difficult or too extensive to handle at the school must be forwarded to the Service Desk, so good cooperation between the school's ICT contact and the Service Desk operators is important.
Users may also get direct answers from an operator at the Service Desk. All operational enquiries go to the Service Desk, where they are assigned a case number. Anyone who registers a case will receive an e-mail confirming that the enquiry has been received. While the case is being handled, the Service Desk staff may send status updates to the user.
This way, users get one point of contact, and Service Desk operators get an overview of all cases. Operations can be expected to troubleshoot across all parts of the organisation. Periodically, the team leader needs to go through all issues and solutions to prioritize debugging and prevent errors from recurring, so that schools have a stable operating environment.
Incidents can be reported by phone, fax, email or web form. More urgent incidents must be prioritized. Incidents that need to be resolved quickly are usually reported by telephone, while less important events are usually reported by, e.g., email. A member of the support staff should be assigned to the incident and will need to ask the user questions to investigate the problem.
- Remember to be an active listener, not a passive one.
All enquiries should be logged, and an email confirmation should be sent. It is important that the user feels reassured, and that information about what the problem might be is communicated to them. When an enquiry arrives at the service desk, whether from the school's ICT contact or from someone else with an agreement to use the service desk, a brief description of the incident should be logged as soon as possible and assigned a case number. The user should then receive an email confirming that the matter has been received, giving the case number.
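The registration step described above can be sketched as follows. Each enquiry gets the next case number, is logged, and a confirmation e-mail is composed for the sender. The `Ticket` and `ServiceDesk` classes, the numbering scheme and the addresses are hypothetical; a real request tracker such as RT handles this itself. The message is only built with Python's standard `email.message.EmailMessage`, not sent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from email.message import EmailMessage
from itertools import count


@dataclass
class Ticket:
    case_number: int
    requester: str
    summary: str
    received: datetime
    history: list[str] = field(default_factory=list)


class ServiceDesk:
    """Hypothetical minimal request log: registers enquiries and confirms them."""

    def __init__(self, desk_address: str, first_case: int = 1):
        self.desk_address = desk_address
        self._numbers = count(first_case)
        self.tickets: dict[int, Ticket] = {}

    def register(self, requester: str, summary: str) -> tuple[Ticket, EmailMessage]:
        # Log the enquiry as soon as it arrives and give it the next case number.
        ticket = Ticket(next(self._numbers), requester, summary, datetime.now())
        ticket.history.append("registered")
        self.tickets[ticket.case_number] = ticket
        return ticket, self._confirmation(ticket)

    def _confirmation(self, ticket: Ticket) -> EmailMessage:
        # Build (but do not send) the e-mail confirming receipt and the case number.
        msg = EmailMessage()
        msg["From"] = self.desk_address
        msg["To"] = ticket.requester
        msg["Subject"] = f"[Case #{ticket.case_number}] {ticket.summary}"
        msg.set_content(
            f"Your enquiry has been received and registered as case #{ticket.case_number}.\n"
            "Please quote this number in any further correspondence.\n"
        )
        return msg
```

Putting the case number in the subject line lets replies be matched to the right case.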
Previously, enquiries were written in paper logbooks. Today, software known as a request tracker is used to record them. Logging enquiries is crucial for operations: the log is the basis for error handling, user requests, and prioritizing the various incidents. Log entries are also important for preventing recurring errors. Because operational events are reviewed periodically, fixes and priorities can be assessed. The log also provides a basis for improving the service by debugging problem services and applications based on what users perceive as problematic.
Thus the log of requests is a basic and necessary tool both for users and for the service desk. There are several freely available, well-documented systems for logging requests. Skolelinux Drift uses RT <<FootNote(RT Essentials: http://www.oreilly.com/catalog/rtessentials/chapter/index.html )>> to handle requests.
When starting up support, it is important not to take on too much at the start. Do not try to achieve everything at once; aim instead for "quick wins" that keep users informed, and for quick response times. It is also important to clarify whom the service desk should forward events to if it cannot solve an issue itself. The service desk must also check whether there will be disruptions for users, which makes it quick and easy to give them feedback.
For the users it is important that incidents are dealt with. For the service office it is important that the incidents are handled correctly according to the service level agreement, and that work requested outside of what was agreed is handled between management at the school and the system administration organisation.
Tasks and roles
We recommend agreeing on the duties of the school's ICT contact and the responsibilities of those who work at the Service Desk. Schools often have few resources compared to what is common in municipal administrations or private companies. At the same time, schools usually have many more users, and often more client machines, than the rest of the municipality.
To distribute tasks, roles must be in place. Clearly established roles make it easier to distribute tasks and to determine the working capacity needed to resolve operational tasks. Operational experience in municipalities and professional organisations shows that the following roles are common:
- ICT contact at each school. This is often a teacher with an educational and/or technical ICT background.
- Operator(s) working in the central IT service. This is a person skilled in operations.
- ICT coordinator who organises the educational use of IT, and contributes towards plans for developmental, operational and educational use. Often this is a teacher.
- ICT responsible. This is usually the principal who is responsible for IT operations.
Here is an overview of the various everyday tasks, some of which are contracted out by the municipalities.
ICT contact(s) tasks at each school:
- Oversee the school's server room.
- Be the school's contact with the municipality: report errors and outages.
- Perform simple maintenance tasks such as replacing mice and keyboards, upgrading thin clients, and simple patching.
- Be the school's superuser: advise colleagues about the user interface, e-mail, video projectors and relevant applications.
- Participate in ICT gatherings.
- Create and administer local users.
- Perform simple maintenance of printers.
- Create and manage email accounts.
- Perform simple commands and operations under the guidance of an ICT tutor.
- Facilitate the use of ICT in teaching.
The operator's tasks:
- Receiving incidents and service requests.
- Mentor ICT contacts by telephone and e-mail.
- If agreed, visit the school for troubleshooting defects and malfunctions on computers, printers and servers.
- Install security updates on the school's computers (servers and clients).
ICT coordinator's duties:
- Assist school management and ICT contacts in drawing up technical and pedagogical ICT plans.
- Ensure that the service desk and the management get a good selection of software.
- Ensure that the schools have appropriate ICT tools for teaching, and that computers and networks are appropriate for school subjects.
- Provide advice and guidance to operational services on what the technical and pedagogical ICT requirements of the school are.
Duties of the ICT responsible (principal, headmaster, head of operational services):
- Make joint purchases of computer equipment and enter into joint agreements etc.
- Develop competence plans.
- Provide the schools with courses in the educational use of ICT.
- Provide courses in operations.
- Negotiate contracts for operations.
- Ensure that the IT contact and the IT service have the necessary resources.
The advantage of an agreement for these tasks is that expectations of each person are known, giving a good basis for planning and managing ICT services. Usually these ICT tasks are done only part-time by a teacher who also has teaching duties.
A business would often have two staff members working full time, operating 100 standard client machines with 100 users. In schools there may be a 30% position in total, divided among several persons, operating 100 client computers used by 320 students and teachers.
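To make the comparison concrete, the figures in the paragraph can be turned into clients and users per full-time position. The helper function below is just an illustration of the arithmetic; the inputs are the numbers quoted above.

```python
def per_full_time_position(positions: float, clients: int, users: int) -> tuple[float, float]:
    """Return (clients, users) served per full-time position (1.0 = a 100 % position)."""
    return clients / positions, users / positions


business = per_full_time_position(positions=2.0, clients=100, users=100)
school = per_full_time_position(positions=0.3, clients=100, users=320)

print(f"Business: {business[0]:.0f} clients and {business[1]:.0f} users per position")
print(f"School:   {school[0]:.0f} clients and {school[1]:.0f} users per position")
```

The business covers 50 client machines per position. The school covers about 333, roughly 6.7 times as many machines and over 1,000 users per position.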
When the school has so few resources for operations, good resource management is crucial. Making agreements for the tasks makes it easier to assess whether you need additional resources, or whether expectations of IT initiatives in schools must be reduced to fit the budget. With a good overview of the ICT tasks in the school, it is easier for IT administrators to ask for more resources when necessary, for example to implement ICT-based exams or to acquire new equipment such as whiteboards as teaching aids.
Expected time usage
We have created a table showing time spent on operations and maintenance. It is based on the experience of municipalities running a centrally operated Debian Edu installation for 9-10 schools with 250-500 client computers. Some work is not included in the table; for example, extra time is required for projects where schools develop their own ICT solutions with networking and more equipment.
<table> <tbody>
<tr class="odd"> <td align="left">Role </td> <td align="left">Operational responsibility </td> <td align="left">Time spent per school per week </td> <td align="left">Time spent in total for all schools </td> </tr>
<tr class="even"> <td align="left">Centralised operations staff </td> <td align="left">Monitoring, debugging and operation of 500 machines, for example, 10 schools with 3,200 students and teachers. </td> <td align="left">2-3 h (50 clients) </td> <td align="left">½ position (500 clients) </td> </tr>
<tr class="odd"> <td align="left">ICT contact at each school </td> <td align="left">Oversight of equipment, simple maintenance, and reporting of incidents and requests </td> <td align="left">3-4 h (50 clients) </td> <td align="left">1 position (10 schools / 500 clients) </td> </tr>
<tr class="even"> <td align="left">Central ICT-coordinator </td> <td align="left">Assist in planning and implementation of educational and technical ICT work in the school. </td> <td align="left">1-2 h </td> <td align="left">½ position </td> </tr>
<tr class="odd"> <td align="left">ICT manager (principal) </td> <td align="left">Make joint purchases and ensure compliance with the service level agreement. Schedule updates or develop solutions. </td> <td align="left">1 h </td> <td align="left">¼ position </td> </tr>
<tr class="even"> <td align="left">Overall for a school </td> <td align="left">50 client machines (concurrent users) </td> <td align="left">6 - 10 h </td> <td align="left"></td> </tr>
<tr class="odd"> <td align="left">Overall for all schools </td> <td align="left">10 schools, 500 client machines (concurrent users) </td> <td align="left"></td> <td align="left">2 ¼ position </td> </tr>
</tbody> </table>
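The weekly hours per school in the table can be cross-checked against the totals by converting them into positions for ten schools. This sketch assumes a 37.5-hour full-time working week, which is the usual full-time week in Norway; that figure is an assumption, not part of the table. The role names and hour ranges are copied from the table.

```python
WEEK_HOURS = 37.5  # assumed length of a full-time working week
SCHOOLS = 10

# Hours per school per week, as (low, high) ranges taken from the table.
HOURS_PER_SCHOOL = {
    "Centralised operations staff": (2, 3),
    "ICT contact at each school": (3, 4),
    "Central ICT-coordinator": (1, 2),
    "ICT manager (principal)": (1, 1),
}


def positions(hours_per_school: float, schools: int = SCHOOLS) -> float:
    """Convert weekly hours per school into full-time positions across all schools."""
    return hours_per_school * schools / WEEK_HOURS


for role, (low, high) in HOURS_PER_SCHOOL.items():
    print(f"{role}: {positions(low):.2f}-{positions(high):.2f} positions")

low_total = sum(low for low, _ in HOURS_PER_SCHOOL.values())
high_total = sum(high for _, high in HOURS_PER_SCHOOL.values())
print(f"Per school: {low_total}-{high_total} h per week")
```

Adding up the per-school rows gives 7-10 hours a week. Converted to positions, the four roles together come to roughly 1.9-2.7 positions, a range that includes the table's total of 2 ¼ positions.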
Experience shows that the ICT contact's workload is affected by the number of concurrent users. The term "concurrent users" is new to many, so here is an example: a school may have 250 students but no more than 50 computers. Then at most 50 students can use computers at the same time, far fewer than the 250 users who have an account on the system. It is these 50 logged-in users who create work for the IT service; the other 200, who are not logged in, create little extra work.
Therefore, it is common to calculate IT costs from the maximum number of concurrent users. Other calculation methods are also possible, for example when paying for proprietary software. But since Debian Edu has no license costs, the number of concurrent users is the most important figure for operating costs. Calculating costs from the number of user accounts makes little or no sense for a school.
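Since costs follow the maximum number of concurrent users, it helps to see how that maximum is found. This is a minimal sketch. It assumes you have a login and a logout time for each session, for example extracted from server logs; the input format is hypothetical. The function sorts all login and logout events by time and keeps a running count of open sessions.

```python
from datetime import datetime


def peak_concurrent_users(sessions: list[tuple[datetime, datetime]]) -> int:
    """Return the largest number of sessions open at the same moment.

    Each session is a (login, logout) pair. A logout at the same instant as
    another login is processed first, so back-to-back sessions do not overlap.
    """
    events = []
    for login, logout in sessions:
        events.append((login, 1))
        events.append((logout, -1))
    # Tuples sort by time first; at equal times -1 (logout) sorts before +1 (login).
    events.sort()
    current = peak = 0
    for _, change in events:
        current += change
        peak = max(peak, current)
    return peak
```

With 250 accounts but only 50 machines, the result can never exceed 50. That peak is the figure to base operating costs on.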
For Debian Edu users, the difference in cost between managing 100 and 250 user accounts is very small, with a few exceptions. With 250 students instead of 100, more students may repeatedly forget their passwords, so it is wise to let the teacher responsible for the class give these students new passwords.
If the school has 50 client machines, the ICT contact needs less time for operational tasks than if the school has 150 clients. With more clients, the overall time spent on operations increases, but the operating time per client machine goes down somewhat.
Several municipalities have set aside 3-4 hours a week for the ICT contact's tasks at each school with 30-70 client machines. The Education Department in Oslo has set aside half a weekday, or a 30% position, to follow up 150 client machines. Experience from other municipalities suggests that a 20% position is enough for the tasks of a local ICT contact when a school has 160 thin or diskless clients with Debian Edu.
In addition, there are associated costs for centralized operations, ICT management, and developing the educational use of ICT tools in school subjects. One position is probably sufficient for the operation of 1000 client machines. For educational support, several principals have a 50-100% position in the school for this work: there may be a 10-20% position as ICT contact and a 40-80% position for educational support of the teachers. Many teachers perceive IT tools in schools as something new, and some principals wish to give more backing to the educational side by making teachers more confident in using IT tools across the different subjects.
We have drawn up a list of tasks for setting up a new service desk.
- Assign people to different roles, such as IT manager, IT contact in schools, central operations and IT coordinator for all schools. It is important to distinguish between technical operations and maintenance on the one hand, and pedagogical work on the other.
- Establish the service desk so that every school has a service agreement regulating what counts as standard operating activities and what is extra. It is imperative that the principals responsible for ICT are part of this process.
- Establish a system for handling incoming requests (a request tracker). All enquiries by email need a case number. Almost all enquiries from users or IT contacts from schools also need a case number.
- Ensure that the ICT budget reflects what is necessary for proper operation of the school's computer equipment and networks. Today, ICT systems are expected to be used for national and local tests, with ICT tools and with or without Internet access.
- Use the standard edition of Debian Edu as a basis, with the same version at all schools, and make the changes you want from there. These changes must be recorded in a configuration database, with documentation of the changes made. Version control can be used to save the changes and documentation.
The purpose of the ICT service is to prevent disturbances such as shutdowns or software issues. Users will experience few problems with the ICT system if the ICT service has enough resources to handle operations, equipment and enquiries to the Service Desk. Problems, big or small, interrupt users, so good handling of incidents is necessary.
In parachuting, near-accidents are called "incidents". It is perhaps not quite the same in computer operations when something is not working. The purpose of incident handling is to restore services as quickly as possible so that everything works normally, and when something goes wrong, to keep the impact on users as small as possible. What counts as "normal service" is agreed in an operating agreement describing the service level.
Incident statistics are important, especially if several people work in the organisation, since it is easy to lose track of the work when several people work together. Statistics point out problem areas that must be addressed more thoroughly than with a quick fix from the service desk. For example, there may be many requests to replace forgotten passwords, so it may be wise to let teachers change the passwords of pupils in their class.
An operational disturbance is defined as:
- an event which is not part of normal operations and causes, or can cause, an interruption or reduction in the quality of the service.
Examples of operational disturbances may be:
- the office program (OpenOffice.org) does not start
- the web browser (Firefox) crashes
- the hard drive is full
- the server is down
- unable to print
- unable to log in
- requests for information, advice or documentation
- forgotten password
The examples show some of the most common operational issues. These are problems that prompt users to contact the school or the service desk. The ICT service must prioritize what must be handled straight away, and which problems need more time to resolve. To prioritize which problems need more comprehensive debugging, it is important to log all enquiries about malfunctions. Once one has an overview of the most common problems, appropriate actions can be taken.
We have made a short checklist to ensure that procedures and systems for good event handling are in place.
- The operator doing the debugging will report the status back to the ICT contact at the school and/or the user.
- The system for logging events must be available and working (both technically and functionally) for those working with event handling in schools and at the service desk.
- The event logging system must be used for virtually all operational events.
- Statistics of the log of events should be made periodically. The statistics can be used to identify and eliminate recurring problems, which are irritating to users.
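The periodic statistics can start as simply as counting logged events per category and listing the most frequent ones. This sketch assumes that each logged event has been given a category code. The codes and the month of data below are hypothetical.

```python
from collections import Counter


def top_categories(categories: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most common event categories with their counts."""
    return Counter(categories).most_common(n)


def share(categories: list[str], category: str) -> float:
    """Fraction of all logged events that fall in one category."""
    return categories.count(category) / len(categories) if categories else 0.0


month = ["password", "printing", "password", "login", "password", "printing", "disk-full"]
print(top_categories(month, 2))
print(f"{share(month, 'password'):.0%} of events were forgotten passwords")
```

If forgotten passwords dominate month after month, that is the kind of recurring problem that justifies letting teachers reset their pupils' passwords.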
Planning and implementation
Setting up a workable system for logging events requires more than installing the software. Everyone in the operations department must use the system, and those reporting errors must receive feedback by email with a ticket number. This requires significant effort in configuring the event logging system. In addition, those who receive the requests need basic user training.
Large and comprehensive plans are not required to implement proper event handling. Event handling is a completely standard task for those who work at the service desk or as ICT contacts at the schools. Setting up a computer tool for logging events may require up to a few weeks for a correct configuration, and users may also report events via e-mail and by phone.
The user interface to the logging system is relatively self-explanatory, so it should not take too long to get started. Daily use of the system will get users comfortable with what should be logged. It is crucial that everyone in the operations department uses the logging system for operational messages.
Activities related to operational events
To illustrate the activities that follow a reported event, here is an example.
A user contacts the service office with a problem, and reports that printing is not working. Operations logs the event immediately after the call is completed. A case is opened for the issue, and automatically given a case number.
Operations at the service desk make a quick analysis. Has the spooler stopped again, or is it something else? Is the paper or toner missing? The operator examines the spooler and sees that the queue has filled up. She clears the queue and tests whether the next job prints.
The print queue fills up again. Operations contact the school's ICT contact, asking them to check whether the paper tray is empty, and this is recorded in the event log. The ICT contact replies that they have refilled the paper tray and printing is back to normal. The case is closed, and this is noted in the event log.
If printing had not started again, the toner might have been missing or there might have been a printer fault. In the case of a fault, operations would have had to escalate the issue. This means that someone other than the operator or the ICT contact is needed to resolve the problem, in this example a technician who can fix printers.
This example shows the whole workflow needed to get a printer working again. If a printer still does not work after checking that paper and toner are available, the issue needs to be escalated, and the operations department must call in an expert, here a service technician for printers.
What was wrong and what the fix was are noted in the event logging system.
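The printing case passes through a small set of states: opened, under investigation, waiting for the ICT contact, escalated, and closed. The sketch below models such a workflow and writes every transition to the case log. The state names and the allowed transitions are an illustration of this example, not the states used by any particular request tracker.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    INVESTIGATING = "investigating"
    WAITING_FOR_SCHOOL = "waiting for ICT contact"
    ESCALATED = "escalated"
    CLOSED = "closed"


# Hypothetical rules for which state may follow which.
ALLOWED = {
    State.OPEN: {State.INVESTIGATING},
    State.INVESTIGATING: {State.WAITING_FOR_SCHOOL, State.ESCALATED, State.CLOSED},
    State.WAITING_FOR_SCHOOL: {State.INVESTIGATING, State.ESCALATED, State.CLOSED},
    State.ESCALATED: {State.INVESTIGATING, State.CLOSED},
    State.CLOSED: set(),
}


class Case:
    def __init__(self, number: int, summary: str):
        self.number = number
        self.summary = summary
        self.state = State.OPEN
        self.log = [f"#{number} opened: {summary}"]

    def move(self, new_state: State, note: str) -> None:
        """Change state if the rules allow it, and record what was done."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.log.append(f"#{self.number} {new_state.value}: {note}")


# The printing example from the text:
case = Case(1000, "Printing does not work")
case.move(State.INVESTIGATING, "print queue full; queue cleared and next job tested")
case.move(State.WAITING_FOR_SCHOOL, "queue filled again; ICT contact asked to check paper tray")
case.move(State.CLOSED, "paper tray refilled; printing normal")
```

Had the paper tray not been the cause, the case would have moved to `ESCALATED` and a printer technician would have been called in. The log keeps a record of each step.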
A variety of roles are involved when the ICT service deals with reported issues. In the example above, the school's ICT contact and the operator cooperate to solve the printing problem. Had the issue been more difficult, they would have had to call a technician. If the printer could not be fixed, a new one would have to be purchased. If the school needed to buy a new printer, the ICT managers might need to arrange payment. In many organisations, the principal has the last word.
In short, it is easy for many people to get involved when something does not work. If possible, problems should be solved on the spot, without involving unnecessary people; escalating problems that could have been solved locally quickly becomes costly. Many enquiries are easy to deal with there and then, but other requests involve more complex problems and more people. If additional or external help is needed to solve the problem, this must as a rule be cleared with the operations manager. The important thing is to keep these points in mind when handling operational events, so that resources are used appropriately.
We have drawn up some key points for handling incidents. Because they are measurable and well defined, these points can help in evaluating whether things are going well. Such measurement points are:
- Total number of operational incidents.
- Average time from receiving an inquiry to when the issue is resolved, classified with codes (a well organized operation department has codes for different types of events and errors).
- Percentage of incidents handled within agreed response time (as agreed in the service level agreement).
- Average cost for each event
- Percentage of incidents solved by the service desk without escalation
- Events per client machine (workplace)
- Number and percentage of incidents solved by the operations center without the need for visits to school
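Several of these measurement points can be computed directly from the event log. This sketch assumes that each closed incident record stores when it was received and resolved, whether it was escalated, and whether a school visit was needed. It also assumes each record carries an agreed handling time from the service level agreement and uses the time to resolution as the measure. The field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    received: datetime
    resolved: datetime
    escalated: bool
    school_visit: bool
    agreed_time: timedelta  # from the service level agreement (assumed per incident)


def kpis(incidents: list[Incident], client_machines: int) -> dict[str, float]:
    """Compute a few of the measurement points listed above."""
    n = len(incidents)
    if n == 0:
        return {"incidents": 0}
    durations = [i.resolved - i.received for i in incidents]
    return {
        "incidents": n,
        "avg_hours_to_resolve": sum(d.total_seconds() for d in durations) / n / 3600,
        "pct_within_agreed_time": 100 * sum(d <= i.agreed_time for d, i in zip(durations, incidents)) / n,
        "pct_without_escalation": 100 * sum(not i.escalated for i in incidents) / n,
        "pct_without_school_visit": 100 * sum(not i.school_visit for i in incidents) / n,
        "incidents_per_client": n / client_machines,
    }
```

Average cost per event is left out because it depends on how your organisation accounts for working time. Grouping the incidents by event code before calling `kpis` gives the per-code averages mentioned in the list.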
A number of tools can make it easier to handle operational incidents.
- Automatic logging
- Automatic routing of events to the right persons
- Automatic retrieval of data from the database for configuration management
- Phone and email used together with tools for registering requests and incidents
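Automatic routing can be as simple as a rule table that maps an event's category to the queue that should handle it. Urgent reports, such as those phoned in, are flagged for priority. The categories, queue names and rules in this sketch are hypothetical. A real request tracker has its own configuration for this.

```python
# Hypothetical routing rules: event category -> queue that should handle it.
ROUTES = {
    "password": "school-ict-contact",
    "printer-hardware": "printer-technician",
    "server-down": "central-operations",
    "software-request": "central-operations",
}
DEFAULT_QUEUE = "service-desk"
URGENT_CHANNELS = {"phone"}  # urgent incidents are usually reported by phone


def route(category: str, channel: str) -> tuple[str, str]:
    """Return (queue, priority) for a newly logged event."""
    queue = ROUTES.get(category, DEFAULT_QUEUE)
    priority = "high" if channel in URGENT_CHANNELS or category == "server-down" else "normal"
    return queue, priority
```

Anything that matches no rule lands in the general service desk queue, so no event is lost while the rule table is still incomplete.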
Problem management is an "investigative" process. Known bugs are most often handled directly by the service desk; this is the most common form of event handling. Investigating unknown errors requires both common sense and instinct. Good operators use instinct to go straight to the problem, find the solution and restore service as quickly as possible so that everything works normally.
Problem management includes:
- Problem control
- Error control
- Proactive control to prevent problems
- Identify error patterns, using information from, for example, event management
- Identify problems
- Classify problems
- Examine/research problems
- Identify and register known errors
- Find temporary solutions if possible
- Contact those responsible for Change Management to remove the error permanently
- Identify and solve problems and errors before the incident is reported by users.
- Use logs and information from event handling to see how problems may arise
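Error patterns in the event logs can be found with a simple rule: flag any combination of machine and event category that occurs at least a given number of times within a time window. Each flagged combination is a candidate known error. In this sketch, the threshold, the window, the machine names and the record layout are all assumptions to tune locally.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recurring_problems(
    events: list[tuple[datetime, str, str]],
    threshold: int = 3,
    window: timedelta = timedelta(days=7),
) -> set[tuple[str, str]]:
    """Return (machine, category) pairs seen at least `threshold` times within `window`.

    Each event is a (timestamp, machine, category) tuple.
    """
    times = defaultdict(list)
    for when, machine, category in events:
        times[(machine, category)].append(when)

    flagged = set()
    for key, stamps in times.items():
        stamps.sort()
        start = 0
        # Sliding window over the sorted timestamps for this machine and category.
        for end, when in enumerate(stamps):
            while when - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(key)
                break
    return flagged
```

A flagged pair, such as a print queue on one machine that fills up three times in a week, is a candidate for registration as a known error. It can then be handed to Change Management for a permanent fix.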
Procedures for problem management
The Skolelinux/Debian Edu manual is a comprehensive collection of solutions for solving problems and configuring systems. Everything is on the Debian wiki pages. Solutions are maintained with the help of staff in schools, municipal ICT services, professionals and volunteers. See the English pages: https://wiki.debian.org/!DebianEdu/Documentation/Manuals The pages are being translated to Norwegian Bokmål, and we are working on linking the pages to the Bokmål versions too.
Wiki technology has proven very successful for maintaining catalogued information on the internet. It is easy to contribute to, and all changes are logged. It is also possible to import OpenOffice.org documents and export documents as PDF.
The resources spent on IT systems in schools must be handled in a financially prudent manner in order to control the services used and the equipment and infrastructure. The equipment, software and services have a whole range of settings; together these make up the configuration, a logical model of how infrastructure and services are set up.
To manage configuration, it must be identified, saved and maintained. One must also be able to keep track of different versions of the configurations. We call each part of a setup a Configuration Item (CI). A configuration file may, for example, ensure that certain users have access to a few printers in the network. Another can make sure you get a buffer on diskless clients.
An updated database for configuration management is essential to ensure rapid and controlled handling of operational issues, or changes in the layout of machines, programs or services.
It takes planning to set up a database for configuration management. One must decide in which areas to use the system, the objective, policies and processes for storage and maintenance of configurations.
- Identify and select a structure for configuration according to the important parts of the ICT infrastructure. Configuration owners, name tags (attributes), dependencies, and relations between configurations all need to be considered.
- Only approved configurations are managed in the database throughout the lifetime of the system. Access to the configurations can be controlled with group permissions and through the Change Management process.
- Status logging: keep track of the condition and status of the various subsystems throughout the lifetime of the service, software or hardware. A configuration may, for example, be in production, disconnected or discontinued.
- Checking and revision. Each configuration must be checked to confirm that the correct information is stored in the configuration database (CMDB). This is followed up with periodic reviews to ensure that the database is up to date.
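The four activities above can be illustrated with a small data model. This is a sketch under stated assumptions, not a CMDB product: a configuration item has an owner, attributes (name tags) and dependencies, every status change is logged, and an audit compares the stored attributes with what is observed on the machines. The status "planned" is an added assumption; the other statuses are the ones named above.

```python
from dataclasses import dataclass, field
from datetime import date

# "planned" is an added assumption; the other statuses come from the text above.
STATUSES = {"planned", "in production", "disconnected", "discontinued"}


@dataclass
class ConfigurationItem:
    name: str
    owner: str
    attributes: dict = field(default_factory=dict)   # name tags, e.g. {"printers": 4}
    depends_on: list = field(default_factory=list)   # names of other CIs
    status: str = "planned"
    history: list = field(default_factory=list)      # (date, old status, new status)

    def set_status(self, new_status: str, when: date) -> None:
        """Change status and keep a log entry (status logging)."""
        if new_status not in STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        self.history.append((when, self.status, new_status))
        self.status = new_status


def audit(cmdb: dict, observed: dict) -> list:
    """Return names of CIs whose stored attributes are missing or differ from what is observed."""
    return [name for name, actual in observed.items()
            if name not in cmdb or cmdb[name].attributes != actual]


if __name__ == "__main__":
    ci = ConfigurationItem("print-server", "operations", {"printers": 4}, ["network"])
    ci.set_status("in production", date(2024, 1, 10))
    print(audit({"print-server": ci}, {"print-server": {"printers": 5}}))
```

How the observed attributes are collected from the machines is left open here; the periodic review described above would supply them.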
As we see, configuration management in an IT system requires a lot of planning. The purpose of this planning, as part of IT operations, is to ensure that systems are fixed quickly when they go down. With good configuration management, it is easy to replace a defective machine with a new one: the configurations can be quickly transferred to the new computer, and the IT system works just as well as before.
Management of Configuration Items (CI)
A configuration item is a part of the infrastructure, normally the configuration of a service or a program. Sometimes users want to change how a service works, and one needs to keep track of the configurations when changes are made.
To make this concrete, consider the configuration of the print server. You want to add a new printer to the computer network, and therefore add it to the printing system, CUPS. When you change the configuration through a web application or through the KDE configuration tools, the CUPS configuration file changes, and the print server must be restarted. The restart can also be done from the KDE tools or through the web application. The modified configuration file is then copied to a directory where it can be handled by a version-control system.
Of the many possible settings, a few are common: whether a service should run, stop, terminate, start, be interrupted or be taken out of service.
One should be cautious in changing configurations without a proper plan. It is easy to forget what you have done on a server or a PC. Therefore it is important to document the changes made in a change log.
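Copying a changed file into the version-controlled area and noting the change in a change log can be done in one step. A minimal sketch, assuming a hypothetical layout with one sub-directory per host and a plain-text `CHANGELOG.txt` at the top; for the printer example the file would typically be `/etc/cups/printers.conf`, where CUPS keeps its printer definitions. Committing the directory to Git or Subversion is left as a separate step.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_config(src: Path, repo_dir: Path, host: str, author: str, reason: str) -> Path:
    """Copy a changed configuration file into the version-controlled area and log the change."""
    dest = repo_dir / host / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also keeps the file's timestamps
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()[:12]  # short fingerprint of the content
    entry = f"{datetime.now():%Y-%m-%d %H:%M}\t{host}\t{src}\t{digest}\t{author}\t{reason}\n"
    with open(repo_dir / "CHANGELOG.txt", "a", encoding="utf-8") as log:
        log.write(entry)
    return dest


if __name__ == "__main__":
    # Demonstration in a temporary directory instead of the real /etc/cups.
    work = Path(tempfile.mkdtemp())
    conf = work / "printers.conf"
    conf.write_text("<Printer room12>\n</Printer>\n")
    archive_config(conf, work / "config-repo", "printserver", "ops", "added printer room12")
    print((work / "config-repo" / "CHANGELOG.txt").read_text())
```

Recording who made the change and why, next to a fingerprint of the file, makes it easier to see later what was done on a server.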
Planning and installation
The configuration of the computer network is tied to its architecture, and much of this planning has already been done in Debian Edu. Setting up servers with a corresponding service level on Windows Server, Red Hat or other GNU/Linux distributions may take 3 to 4 weeks; with Debian Edu it takes 1-2 hours. If you want a fixed IP address for the network, a professional needs about half an hour extra, because the web services are set up with reusable names.
What remains to be planned is which additional user programs to use, and which subsystems should interact with Debian Edu. For example, the school may have an electronic whiteboard.
We have made a list of activities and solutions that are important in good configuration management.
- Establish a version-controlled area for saving the configurations of all servers and of selected workstations and laptops. Git and Subversion (SVN) are often used for this. Remember to take a daily backup of the area, and make sure all configuration changes are saved there.
- Use an electronic system to store recipes that explain the configuration of the different machine types, the network and the services. Such recipes let others who help with or take over operations read up on what has been done. A wiki can be suitable for this.
- Use one specific version of the operating system and software on all machines, to avoid maintaining many different versions of the software. Make sure the software is well tested; it may therefore be wise to wait 6-12 months before adopting the latest edition of a program.
Relations to other processes
Configuration management is closely connected with problem handling and with the availability of the systems. If printing stops too often, a configuration change may solve the problem, for example establishing a routine for deleting the print queue and restarting the print service.
The aim of configuration changes is usually to increase the availability of services or programs. It may also be to restrict access to certain programs or services to specific times. To achieve this, one must reconfigure the service. In addition, such a change may cost money beyond what was agreed as the service level or capacity of the system.
The examples show that managing configurations involves a number of other areas. There is therefore much to gain by putting good practices in place for managing configuration changes. Automation is also advisable if you want greater stability, or want to control access to certain services in specific periods.
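The print-queue routine mentioned above could be scripted roughly as follows. This is a sketch, assuming a CUPS print server on a systemd-based Debian release: `cancel -a` is the CUPS command that cancels all queued jobs, and older releases restart CUPS with `/etc/init.d/cups restart` instead of `systemctl`. The script defaults to a dry run so that it can be inspected safely; running it from cron, for example at night, is one option.

```python
import subprocess

# 'cancel -a' cancels all queued jobs on all CUPS queues; 'systemctl restart cups'
# assumes a systemd-based release (older releases use /etc/init.d/cups restart).
CLEANUP_COMMANDS = [
    ["cancel", "-a"],
    ["systemctl", "restart", "cups"],
]


def reset_print_service(dry_run: bool = True) -> list:
    """Delete all queued print jobs and restart CUPS.

    With dry_run=True (the default) nothing is executed; the commands are only returned.
    """
    for cmd in CLEANUP_COMMANDS:
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first command that fails
    return [" ".join(cmd) for cmd in CLEANUP_COMMANDS]


if __name__ == "__main__":
    print("\n".join(reset_print_service()))
```

Since the routine is itself a configuration change, it belongs in the version-controlled area like any other set-up file.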
Tools for configuration management
As mentioned in the checklist above, one may use:
- A version-control system, for example Subversion, for saving the configuration files
- A wiki for storing documentation of setups and wizards
- A common directory on the internet for operational documentation, maintained by Skolelinux/Debian Edu staff in the schools
Many ICT services handle changes in ICT systems poorly, which leads to many disgruntled users. Surveys in the public sector in Denmark show that operating costs go down when changes are well controlled. It therefore pays to involve users through training and participation related to the changes made.
Change requests depend entirely on proper processes, regardless of whether the changes are small or big. It is therefore important to have the right people in place when making changes, both to give training and to answer questions. This becomes especially important when adopting new releases of software and services, whether one uses free or proprietary software.
Change Management should ensure that all changes are made in a standardised and correct manner. It is important to anchor the decision to make a change at the appropriate level in the organisation. Standard changes can often be pre-approved once they have been done a few times, but major changes will often require a decision at a higher level, between school management and the operator.
The reason management should be included is that an upgrade will often require user training. It may be an upgrade to a new browser or a new version of office software, which can quickly lead to half a day of training in what is new in a program. Such changes must be agreed with management. The changes must also be made without other parts of the system ceasing to work.
Those responsible for approving changes receive a so-called change request, or RFC (Request For Change). With an RFC in hand, you can assess whether the change should be carried out. Often you have to clarify with management whether optional changes should be made and, if so, when.
When making changes, one must also cooperate with the school's ICT contact and make sure the changes fit with the school's plans. Implementing significant changes without Change Management can lead to much dissatisfaction and additional inquiries to the Service Desk, which means significant unplanned extra work. The change may also soon have to be rolled back, so you quickly get twice as much work and end up back where you started. Had the necessary approvals been obtained, the change could have been made in a planned and straightforward manner.
Change Management is there to avoid unnecessary extra work. Planning changes obviously takes work, but planned changes cause less extra work afterwards. One also avoids having to roll back changes because problems arise when users are unprepared for substantial changes.
When you, for example, update the entire system to a new version, make sure that everyone is informed. One must consider whether those affected by the change need training. The right professionals must prepare it all, so that there are no surprises.
Not all responsibility should land on the person who manages software versions, the release manager. Release handling is a process best suited to changes that consist of many minor changes. This usually happens when rolling out new systems and services, or when upgrading the entire system to a new version.
- Receive the change request, or RFC (Request For Change), described above, and check that it has been given a unique number.
- Prioritize and categorize the changes
- Reject changes that are not possible, for example by marking them as not possible.
- Give feedback to the person who submitted the change request
- Make sure you have a Change Advisory Board, where the change is dealt with, discussed and evaluated. This advisory group can consist of selected ICT contacts and operations staff with long experience.
- Coordinate changes with Release Management, which handles the different versions of applications and services.
- Review and close the change request (RFC)
- Remember to save modified configurations in the repository for configuration files.
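The steps above can be sketched as a small RFC record. This is an illustration under stated assumptions, not the workflow of a particular tool: the priority and category scales are hypothetical, and the rule that only "standard" changes are pre-approved follows the remark earlier in this section that standard changes can often be pre-approved, while everything else goes to the Change Advisory Board.

```python
import itertools
from dataclasses import dataclass, field

_next_number = itertools.count(1)  # every RFC gets a unique, increasing number

PRIORITIES = ("urgent", "high", "normal", "low")   # hypothetical scales
CATEGORIES = ("standard", "minor", "major")


@dataclass
class ChangeRequest:
    summary: str
    submitter: str
    number: int = field(default_factory=lambda: next(_next_number))
    priority: str = "normal"
    category: str = "minor"
    status: str = "received"
    feedback: list = field(default_factory=list)  # messages back to the submitter

    def classify(self, priority: str, category: str) -> None:
        """Prioritise and categorise; standard changes are pre-approved, others go to the CAB."""
        if priority not in PRIORITIES or category not in CATEGORIES:
            raise ValueError("unknown priority or category")
        self.priority, self.category = priority, category
        self.status = "approved" if category == "standard" else "awaiting CAB"
        self.feedback.append(f"RFC {self.number}: {self.status}")

    def reject(self, reason: str) -> None:
        """Mark the change as not possible and tell the submitter why."""
        self.status = "not possible"
        self.feedback.append(f"RFC {self.number}: not possible ({reason})")

    def close(self) -> None:
        self.status = "closed"
        self.feedback.append(f"RFC {self.number}: closed")


if __name__ == "__main__":
    rfc = ChangeRequest("Upgrade the office suite", "School A")
    rfc.classify("normal", "major")
    print(rfc.number, rfc.status, rfc.feedback)
```

The feedback list stands in for whatever channel is used to report back to the person who submitted the request.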
Even what looks like a small, insignificant change request can have major consequences if the change is implemented. We have examples of schools with a stable Debian Edu network where all programs work. Then a constantly crashing test version of a popular program is installed, and Debian Edu gets the blame.
One example is schools that installed a test version of the latest OpenOffice.org before the program was finished. Several thought it would be fun to try it out. The problem is that test editions are usually released to find errors and instability in applications; they are not intended for production use.
In production, the general rule is that you don't install test versions of software. Most operators recommend using the next-to-latest version of a program intended for production. After 6-12 months, the worst errors in a new major version of an application have usually been fixed.
This means one often waits until summer before upgrading to a program released just before New Year, which fits well with the school year. The alternative may be instability and irritated users. The advisory group therefore plays a key role when small or large changes are made.
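The 6-12 month rule of thumb is easy to check mechanically. A sketch, assuming the release date of the new major version is known; `min_months` defaults to the lower end of the range and can be raised to 12 for more conservative operation.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Number of whole calendar months from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not yet complete
    return months


def ready_for_production(release_date: date, today: date, min_months: int = 6) -> bool:
    """Rule of thumb: only adopt a new major version after at least `min_months` months."""
    return months_between(release_date, today) >= min_months


if __name__ == "__main__":
    released = date(2023, 12, 15)  # a release just before New Year
    print(ready_for_production(released, date(2024, 3, 1)))  # False
    print(ready_for_production(released, date(2024, 7, 1)))  # True
```

For a release just before New Year, the check first passes in the summer, which matches the timing described above.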
Release handling consists of the management and planning activities that prepare for desired changes. The changes can be small or large, and large changes can consist of many smaller ones. Release management takes place before the actual work of installing software and hardware into production begins.
First, new releases are planned and tested. Then everything is rolled out into production. Deployment is part of infrastructure management: the procedure is to implement what has been planned, tested and made ready in the systems for Configuration Management. Only once everything is planned and tested, and the configurations are stored, is the solution rolled out into production.
Usually, many service providers and suppliers are involved. This applies both to the acquisition of machines, the software used, and the recommended configurations. Good resource planning is crucial to package and distribute a new release in a good way for users. Cutting corners in this area can lead to equipment that doesn't work, or that goes unused because of deficiencies in the installation.
Release Management takes a comprehensive approach to a change in a service, and ensures that all parts of a release are seen in context. This applies to both technical and non-technical factors.
As you can see, for computers, software and network to work as planned, release-management is crucial. Proper handling of releases prevents disruptions. New releases or changes can be introduced while operations continue as normal, without interruption or reduction in quality.
Implementing changes or new releases can be compared to building a new road. Cars must still get past even if you build a new road on top of the old. Good signs must be in place. One must also have the necessary resources to rebuild the road. If you lack the resources to make changes, it's better to let it be.
Some might think that proper release management is boring, as one doesn't get to install the latest version every time something new is released. But the operations department often lacks the resources to handle a flood of complaints should an upgrade fail. High uptime requires established technology, as Linux expert David Elboth wrote in Linux Magazine (1/2004):
- The more you demand of the system, the more stringent the requirements for the individual components. High uptime requirements also mean that the choices you are left with are old technology. Only empirical data over time can say anything about downtime. We have all noticed how far behind Red Hat and !SuSE are with their server products.
Getting few complaints, with a stable and reliable environment, requires solid release management. The alternative is a flood of complaints and dissatisfied users, caused by installing insufficiently tested cutting-edge software. Amateurs tend to underestimate the consequences of software upgrades. That something works fine on your home computer does not mean it will work in a large network with 500 client computers and 3,200 users.
Definitive Software Library (DSL)
A software archive in an operational context is a collection of original copies of the software in use. If you use Skolelinux 2.0, this is the software package. The phrase software archive is used differently in some other contexts, especially among programmers. When it comes to operations, we would be talking about the original software package of a particular version which is used for the installation.
When using free software, the software archive may be Skolelinux 2.0 plus the extra programs you have added from various sources. These may include specific versions of Macromedia Flash, Java and decoders that make it possible to run national tests in the browser, or to watch broadcasts from a national TV station.
If you plan to upgrade to the next version of Debian Edu when it is released, this new version becomes the main program archive. The new archive must also include appropriate versions of all additional applications beyond Debian Edu.
Set-up files customized or created locally by the operations department are not included in the main program archive. Configurations are saved separately in a version-control system or database.
Database for configurations and hardware
As mentioned in the chapter on configuration management, you must create a database or a version-controlled directory to take care of set-up files. One should also keep track of all computers: what kinds of machines are in use, their performance, and the unique hardware addresses of their network cards (MAC addresses).
There are many reasons to have an overview of the equipment. One of the main reasons is to keep track of how many machines are in operation, how many are not in use and how many are being repaired. Another reason is planning for upgrades.
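A hardware register keyed on MAC addresses makes it straightforward to count how many machines are in operation, not in use or being repaired. A minimal sketch, assuming the three statuses named above and MAC addresses written with colons or hyphens; in practice the register could just as well be a spreadsheet or a database table.

```python
import re
from collections import Counter
from dataclasses import dataclass

MAC_PATTERN = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}")
STATUSES = {"in operation", "not in use", "in repair"}


@dataclass
class Machine:
    mac: str
    model: str
    location: str
    status: str


def add_machine(inventory: dict, machine: Machine) -> None:
    """Register a machine keyed by its normalised MAC address (lower case, colon-separated)."""
    mac = machine.mac.strip().lower().replace("-", ":")
    if not MAC_PATTERN.fullmatch(mac):
        raise ValueError(f"invalid MAC address: {machine.mac}")
    if machine.status not in STATUSES:
        raise ValueError(f"unknown status: {machine.status}")
    if mac in inventory:
        raise ValueError(f"duplicate MAC address: {mac}")
    machine.mac = mac
    inventory[mac] = machine


def status_summary(inventory: dict) -> Counter:
    """Count machines per status, e.g. how many are in operation or in repair."""
    return Counter(m.status for m in inventory.values())


if __name__ == "__main__":
    inv = {}
    add_machine(inv, Machine("00-1A-2B-3C-4D-5E", "thin client", "room 12", "in operation"))
    add_machine(inv, Machine("00:1a:2b:3c:4d:5f", "laptop", "library", "in repair"))
    print(status_summary(inv))
```

Normalising the addresses before storing them prevents the same network card from being registered twice under different spellings.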
A variety of applications in addition to the browser and office suite are installed in schools: educational programs for learning, browser plug-ins and programs for multimedia. The systems also have network set-up and modified settings in specific programs. When you have many servers and perhaps thousands of clients, the need for effective deployment tools soon makes itself felt. Such tools are standard in Debian Edu.
Build management is about ensuring that you always install the required software packages, services and proper settings both of individual programs and for the network. Many people have heard about the so-called "images". One installs the operating system with all needed programs and configures the network. Then one uses an image program to make a copy of the hard disk. This "disk image" can then be copied to other computers.
It is not necessary to build such disk images. Debian Edu is based on Debian which has an excellent package management system. There is no need to compile applications, as ready-made packages can be installed directly from the Internet. It is enough to work out what changes you want to the default set-up of Debian Edu or the main program archive in use. Then you make one or more scripts to run on each machine that get everything installed and set up.
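A script of this kind can be very short, because Debian's package management does the heavy lifting. A minimal sketch, assuming the extra programs are available as Debian packages; the package list is illustrative only (check the names against the release you run), and the script defaults to a dry run that just prints the command.

```python
import os
import subprocess

# Example extra programs on top of the default Debian Edu set-up (illustrative only).
EXTRA_PACKAGES = ["vlc", "stellarium", "gcompris-qt"]


def install_command(packages) -> list:
    """Build a non-interactive apt-get command for the given packages (duplicates removed)."""
    return ["apt-get", "install", "--yes", *sorted(set(packages))]


def provision(packages=EXTRA_PACKAGES, dry_run: bool = True) -> list:
    """Install the extra packages on this machine; with dry_run=True only return the command."""
    cmd = install_command(packages)
    if not dry_run:
        env = {**os.environ, "DEBIAN_FRONTEND": "noninteractive"}  # no configuration prompts
        subprocess.run(["apt-get", "update"], check=True, env=env)
        subprocess.run(cmd, check=True, env=env)
    return cmd


if __name__ == "__main__":
    print(" ".join(provision()))
```

Settings that differ from the Debian Edu defaults could be copied into place by the same script, taken from the version-controlled configuration area.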
For most situations, scripting is an easy way to "build" and roll out programs and configurations. But there are situations where building disk images may be the solution, e.g. for installation on many laptops.
As we see, handling the build process is about facilitating deployment on many computers. In exceptional cases, this may involve building a tailor-made Debian package, but in most situations everything is ready-packaged. Then you have to put in place a script that installs additional programs and certain settings. You can also create disk images if you have many similar machines, such as laptops for all students.
It is essential to test new applications, configurations, and new services before they are put into production. Several schools have experienced instability when they have installed software without making the necessary adjustments. Therefore it is crucial to test changes in configurations or new versions of the software before the change is made on all machines.
Testing generally takes place in three steps.
- First, do an installation of the changes on a test network. This is technical testing to check that everything hangs together in a system with few users. Take care to include all changes in configuration files.
- When you are sure that everything works on the technical side, try installing the solution in one school. It is very important to agree about the testing with the school's ICT contact. Users must also be fully briefed on changes made for the sake of testing. Take care to preserve current adjustments in the set-up files, which may have been made in the course of normal maintenance.
- When you are sure everything works, you can roll out the solution to all schools. It is easiest to create a script that simplifies upgrading of software packages, services and configurations.
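The three steps above form a gate: a later stage starts only if the previous one succeeded. A minimal sketch, assuming hypothetical caller-supplied `deploy` and `verify` functions and made-up target names; in reality, verification at the pilot school also includes feedback from users and the ICT contact.

```python
# Hypothetical targets for the three test steps described above.
STAGES = [
    ("test network", ["test-lab"]),
    ("pilot school", ["school-a"]),
    ("all schools", ["school-b", "school-c"]),
]


def staged_rollout(stages, deploy, verify):
    """Deploy stage by stage and stop at the first target that fails verification.

    deploy(target) installs the change; verify(target) returns True if everything works.
    Returns (completed_targets, failure), where failure is None or (stage, target).
    """
    completed = []
    for stage, targets in stages:
        for target in targets:
            deploy(target)
            if not verify(target):
                return completed, (stage, target)  # later stages are never touched
            completed.append(target)
    return completed, None


if __name__ == "__main__":
    done, failure = staged_rollout(STAGES, deploy=print, verify=lambda t: t != "school-a")
    print(done, failure)
```

When a failure is reported, the affected target is the one that needs the fall-back solution.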
Much can go wrong during a new installation or upgrade. Therefore, one must have a fall-back solution ready, which lets one quickly get back to the system as it was before the upgrade. In technical terms, this is called roll-back.
When rolling back, it is absolutely essential to have the previous version of the software archive and the configuration files ready. Then you can, for example, install Edu 1.0 in under an hour and put the appropriate configuration files in place.
But roll-back takes time. Therefore, it may be prudent to have a server ready with the previous version of the software, the right configurations, and a recent copy of the users' home directories. This server can quickly replace any machines on which the upgrade does not go according to plan. Having server machines in reserve can ensure high availability even if something goes wrong.
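The configuration part of a roll-back can be automated. The sketch below, assuming a hypothetical caller-supplied `upgrade()` function that raises an exception on failure, takes a copy of a configuration directory before the upgrade and restores it if the upgrade fails. It does not cover software packages or home directories, which need the previous archive and a spare server as described above.

```python
import shutil
import tempfile
from pathlib import Path


def upgrade_with_rollback(config_dir: Path, upgrade) -> bool:
    """Run upgrade(); if it raises, restore config_dir to its state before the upgrade.

    Returns True if the upgrade succeeded and False if it was rolled back.
    """
    snapshot_root = Path(tempfile.mkdtemp())
    snapshot = snapshot_root / "snapshot"
    shutil.copytree(config_dir, snapshot)  # copy taken before anything is touched
    try:
        upgrade()
        return True
    except Exception:
        shutil.rmtree(config_dir)              # discard the half-upgraded configuration
        shutil.copytree(snapshot, config_dir)  # roll back to the saved copy
        return False
    finally:
        shutil.rmtree(snapshot_root)           # clean up the temporary snapshot
```

Catching every exception is deliberate here: whatever went wrong, the safe reaction is to return to the known-good configuration.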
Advantages and possible problems
The value of keeping records of the software in production cannot be overstated. Many rely on keeping the software on CDs and DVDs, which is an inefficient way to distribute it. To save time and trouble, all the software in Debian Edu is available online.
Your operations department can create a copy of the Debian Edu archive on a central server. From there, all the software can be installed quickly and smoothly on other machines. The advantage is that your ICT service always has an overview of the software versions it has made available to schools. This also prevents the installation of software that has not been considered by Change Management.
There may be considerable problems if you do not maintain the software archive and configurations. A mistake in a configuration or software package can then be rolled out to all machines. In addition, some schools may install insufficiently tested software or use beta releases in production. One must therefore have good processes, and someone who takes responsibility for maintaining the program archive and configurations.
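With a central copy of the archive, one way to detect software that has bypassed Change Management is to compare each machine's packages with the approved list. A sketch, assuming the approved list is a hypothetical `{package: version}` dictionary exported from the central archive; `dpkg-query --show --showformat` is the standard Debian way to list packages, and note that it also reports packages that were removed but left configuration files behind.

```python
import subprocess


def parse_package_list(text: str) -> dict:
    """Turn 'name version' lines into a {package: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            packages[parts[0]] = parts[1].strip()
    return packages


def installed_packages() -> dict:
    """List the packages dpkg knows about on this machine."""
    out = subprocess.run(
        ["dpkg-query", "--show", "--showformat=${Package} ${Version}\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_package_list(out)


def unapproved(installed: dict, approved: dict) -> dict:
    """Packages missing from the approved archive, or installed in a different version."""
    return {name: ver for name, ver in installed.items() if approved.get(name) != ver}
```

Run on each machine (or over ssh from the central server), `unapproved(installed_packages(), approved)` lists what should be reviewed or removed.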
It may seem like one needs a lot of extra things in place in order to install and maintain the services and programs that are in use. However, if you skip the tools that provide management of upgrades, you give yourself a lot of extra work. The ICT service must spend a lot of time on manual installation on each machine. The danger of making mistakes increases. When things do not work you get disgruntled users, and much time is spent fixing problems.
Many who operate large IT systems have inadequate plans for changes. Some have no plans at all and simply install new versions of software. Changes can be perceived as problematic by some users, because functions they are comfortable with move to a different place in the user interface. For operations, things can go completely wrong. For example, when Arendal municipality upgraded from an older to a newer version of Windows, most things stopped working. The ICT service said they had several computer programs that were held together with "wire and tape", and it took half a year to clean it up.
Planning and implementation
The reason for planning before implementing changes is to avoid weeks or months of delay due to problems. The time spent on planning is quickly regained because one avoids additional problems. There will always be people who say they have had no problems with ad hoc changes in their systems. Closer examination, however, reveals that problems do arise after such changes; they are simply not communicated.
In our view, ad hoc solutions are only a detour and an emergency measure. An ad hoc solution is like a temporary repair with "wire and tape". In due course, such solutions must be cleaned up to ensure stable operation without constant surprises. Skipping the planning phase leads to many more ad hoc solutions, and to more operational problems when changes or upgrades are made. It is therefore essential that professionals and management understand the value of a well-planned process for changes.
We therefore recommend that you convene a planning meeting and make a stepwise plan for changes in the system. A stepwise plan will naturally vary according to the change: upgrading the OpenOffice.org suite is quite different from upgrading the whole system. When upgrading to a new office application, a 2-3 hour tour of the office suite may be enough for the teachers in each school. When upgrading the entire system, one must both provide user training and test that the technical details work as intended.
The main point about planning and implementation is that there are few shortcuts. Studies show that those who plan properly and ensure that people have the right skills have lower operating costs.
It is crucial to plan new releases. Most modifications of the system should be clarified with management. The following list of activities is designed to support the upgrades in a planning and implementation phase.
<table> <tbody> <tr class="odd"> <td align="left">Tasks </td> <td align="left">Details </td> </tr> <tr class="even"> <td align="left">Prioritization of the release </td> <td align="left">Check that the necessary decisions have been made before a change or upgrade is deployed. </td> </tr> <tr class="odd"> <td align="left">Definitive Software Library </td> <td align="left">Ensure that the software packages to be installed are in place in the definitive software library. </td> </tr> <tr class="even"> <td align="left">Configuration database </td> <td align="left">Make sure all configuration files are in place. This applies both to those currently in use and to the new ones supplied with the systems to be changed or updated. </td> </tr> <tr class="odd"> <td align="left">Build management </td> <td align="left">All scripts and systems used to deploy software or create disk images must be in place. </td> </tr> <tr class="even"> <td align="left">Testing </td> <td align="left">First, run trials on test equipment. When this works without problems, the change can be tested at one school. The school must be fully informed about, and involved in, trying out the new software. When you are sure that everything works, you can upgrade all schools. </td> </tr> <tr class="odd"> <td align="left">Fall-back solution </td> <td align="left">Even with extensive testing, new releases may go wrong. Therefore it is essential to have a fall-back solution. The easiest solution is to keep the old installation, with data, on a spare server machine that can be plugged in if the change or upgrade does not work. </td> </tr> </tbody> </table>
As the activity list shows, one needs several tools to keep track of the different releases of software, services and hardware in the system. Some of these tools have been mentioned previously, but we list them again:
- Debian tools for the definitive software library
- Database for configurations and hardware (subversion setup files, spreadsheets detailing all hardware with physical location)
- Build management (the system which builds Debian packages)
- Hardware for testing and backup-solution
Relations to other processes
Release management goes directly to the core of the ICT service. It covers implementing appropriate security updates, changes in services and upgrades of software. Requests for new releases may be due to operational problems or a desire for new software. Before a new release, one assesses whether the change is necessary.
If the change is straightforward, one makes the necessary changes to configurations and prepares the application packages for rollout. These have been tested, and fall-back solutions are in place. When changes are made, parts of the operational activity may also have to change. It is easy to see that change management affects all parts of operational support.
Tools for operational support
The first thing you should ask yourself is: "Do we really need software tools?" If one does need tools, it is crucial to examine the options thoroughly.
Judging by glossy brochures and sales talk, we are totally dependent on such tools. But good people, good process descriptions, good procedures and job descriptions are the basis for good service management. How much one needs tools, and how complex they must be, depends on the organisation's need for computer systems and on its size.
In a small organisation, a single freely available database for logging and managing events (a request tracker) will be enough. Larger organisations, however, will almost certainly need sophisticated, distributed and integrated tools for service management, which means linking all processes to a system for event handling.
Although tools can be important, they are not important in themselves. What matters are the tasks and processes to be carried out, and the information needed. These determine which tools are best suited to support operations. Here are some reasons why one may use software for operational and service management:
- increased demands from users
- lack of ICT knowledge
- budget limitations
- organisations are entirely dependent on the quality of service
- integration of systems from multiple vendors
- increased complexity of ICT infrastructure
- emergence of international standards
- extended scope of, and changes in, ICT
Automatic tools allow:
- Centralisation of key functions
- Automation of functions in the service delivery
- Analysis of data
- Identification of trends
- Implementation of preventive measures
Types of tools
In this chapter we have proposed a number of tools to improve operational support. Here follows a summary of the tools:
- Debian tools for the definitive software library
- Database for configurations and hardware (subversion setup files, spreadsheets detailing all hardware with physical location)
- Build management (the system which builds Debian packages)
- Hardware for testing and backup-solution
- Request Tracker
- System for monitoring (Munin)
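As an illustration of the monitoring entry in the list above, here is a sketch of a small Munin plugin that graphs the number of jobs waiting in the CUPS print queues. It follows Munin's plugin convention: called with the argument `config` it prints the graph description, otherwise it prints the current value. Counting the lines printed by `lpstat -o` as jobs, the field name `jobs` and the category `printing` are our assumptions; plugins are typically made executable and linked into `/etc/munin/plugins/`.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch: graphs the number of jobs waiting in the CUPS print queues."""
import subprocess
import sys


def queued_jobs() -> int:
    # 'lpstat -o' prints one line per queued job.
    out = subprocess.run(["lpstat", "-o"], capture_output=True, text=True).stdout
    return len([line for line in out.splitlines() if line.strip()])


def config_text() -> str:
    """Graph description that Munin asks for when the plugin is run with 'config'."""
    return "\n".join([
        "graph_title Queued print jobs",
        "graph_vlabel jobs",
        "graph_category printing",
        "jobs.label queued jobs",
    ])


def fetch_text(count: int) -> str:
    """Current value in Munin's 'field.value N' format."""
    return f"jobs.value {count}"


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print(config_text())
    else:
        print(fetch_text(queued_jobs()))
```

A steadily growing queue in the graph is an early warning of the printing problems discussed under configuration management.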
As the operations department gains experience with systematic operation, it will develop or acquire more types of tools.
Evaluation criteria when selecting tools
Although a great deal of effort has gone into creating evaluation criteria for software, the result is only experience-based guidelines. There is no final answer to what is good or less good software; as with much else, it is partly a matter of taste. Different solutions may do the same job equally well, yet have quite different designs. Still, some rules of thumb may be useful here.
The main evaluation criterion is whether the job needs doing at all. Many IT tools are absolutely perfect and work without errors, but solve tasks that do not need solving. So the main criterion is whether the tool solves the right problem, and whether it is necessary to do anything at all.
- So the first thing you ask about is whether the tool is wanted.
If it turns out that a task does need doing, the solution may be as simple as running a few commands manually; the simplest way is best. But when one has many machines to operate, automation becomes crucial. It is too much work to log into 20 similar server machines to do a security upgrade. Then automation is the answer.
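The 20-server example can be automated with a loop over ssh. A minimal sketch, assuming hypothetical host names and that key-based root login over ssh is already set up; the remote command applies all pending upgrades, security updates included, and the script defaults to a dry run that only prints what it would do.

```python
import subprocess

# Hypothetical host names for 20 similar servers; replace with your own list.
SERVERS = [f"server{n:02d}" for n in range(1, 21)]
# Applies all pending upgrades, including security updates, without prompting.
REMOTE_UPGRADE = "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get --yes upgrade"


def upgrade_commands(hosts, remote_command: str = REMOTE_UPGRADE) -> list:
    """One ssh command per host; assumes key-based root login is already set up."""
    return [["ssh", f"root@{host}", remote_command] for host in hosts]


def run_upgrades(hosts, dry_run: bool = True) -> list:
    """Run the upgrade on each host in turn and return the hosts where it failed."""
    failed = []
    for cmd in upgrade_commands(hosts):
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1].split("@", 1)[1])
    return failed


if __name__ == "__main__":
    run_upgrades(SERVERS[:2])
```

Running the hosts one at a time keeps the sketch simple and makes it easy to stop after the first failures, in line with the testing steps described earlier.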
- So here one must ask whether the tool is useful to solve the task
- Then one must ask whether the tool is usable.
There are often a wide range of programs and procedures to solve a specific task. But some problems solved completely different when maintaining 500 computers and 11 servers, than when fixing your home PC. An example might be tools that allow the teacher to see the desktop of each student on his or hers client machine. The teacher can stop and start programs for all pupils, and prevent individual pupils to use for example IMs when this interferes with school work.
Regarding the choice of operating tools, it's about automation and simplification of operational tasks. It is about making and reduce manual work to a minimum. So the motivation is to just maintain automatic. Also here it is to make things easy, which can be a considerable job to get done.
As this shows, it is not easy to set up good criteria for selecting operational tools for large installations. This is mostly because software developers often lack experience in operating IT systems. They are accustomed to creating new things, but creating good, relevant tools for operations requires many years of experience.
Some general types of operational tools have not changed in the last 20 years, although the specific products used may have been replaced. Some programs may also become irrelevant within a few years. One must therefore allow for training in new versions of the applications used for operation, and in upgrades and changes to user programs.
Thorough user training means that much support can be handled informally, in direct conversation between users. Training often accounts for as little as 1% of total operating costs, so it is well worth spending a little more on it. The effect is very positive. The same applies to proper training for ICT contacts in schools and for operators. Training ICT contacts to use simple systems for password changes, error reporting, etc. will improve the quality of calls to the IT service.
In Norway, training and product practices are regulated by the Working Environment Act (§ 4-2):
- Employees and their representatives shall be kept informed of systems used in planning and carrying out the work. They shall be given the necessary training to familiarize themselves with these systems, and they shall take part in designing them.
In short, it can pay to increase the effort spent on training. It improves the ICT service and gives a significant cost reduction, because users and ICT contacts become more confident and better at helping each other. Note also that the transition to new software can be an opportunity to simplify some operating practices, and simplification can reduce the need for product training.
Planning the start-up of service support
A growing number of organisations see the need for service management. Decisions are often based on historical and political considerations rather than on the organisation's current needs. It is therefore important to ensure that management commits to taking part in, and understanding, the organisation's working methods. The existing processes should then be reviewed and compared with the organisation's needs and with "best practice".
Implementing service support
Determining the current situation
General guidelines for project planning
Business case for the project
Critical success factors and possible problems
Project review and reporting
Evaluation of the project
Review of compliance with quality parameters
Review against key factors