As mentioned in the introduction, we recommend beginning by establishing a centralized operations office to manage tickets. The benefits come quickly and are visible, which is important for customer and user satisfaction.
Once the office is up and running with a sensible workflow for tickets (user requests and troubleshooting), you can move on to the organization's biggest challenge. As a rule, this is either change management or problem management. Organizations with "cowboy" system administrators, who come up with smart ideas and implement them without much testing, often begin with change management. For organizations suffering recurring outages, problem management comes first.
Whatever you choose to start with, a certain amount of configuration management will be necessary. Managing configuration is critical to delivering software and services to users: software must work as expected, and to make beneficial changes you must know how the different programs are configured.
To manage configuration changes you may use a configuration management database (CMDB). Few organizations keep all their configurations in a database, and you do not have to put all configurations into one single database either. It is fine to place configurations in several smaller, partly independent repositories; some people, for example, store configurations and setups in version control. But even with different repositories, you may get greater benefits if you connect the information from the different processes.
For users of Debian Edu, most service configuration files live in the /etc directory. These may benefit from being collected and stored in a central version-controlled directory. This makes it easier to restore lost services and to set machines up again if they are reinstalled, both servers and user laptops or workstations. As part of the Debian Edu backup system, a backup is made of the /etc directory. In this sense, the backup system is itself a simple database, or version-controlled directory, for configurations.
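As a minimal sketch, assuming Python 3 and a hypothetical list of files you want to track, the function below copies selected configuration files from /etc into a separate directory, keeping their relative paths, so that the directory can then be placed under version control. The name `collect_configs` and the paths are illustrative only; they are not part of Debian Edu.

```python
import shutil
from pathlib import Path


def collect_configs(src_root, repo_root, relpaths):
    """Copy selected config files from src_root into repo_root (hypothetical helper).

    Relative paths are preserved, so <src_root>/cups/cupsd.conf ends up as
    <repo_root>/cups/cupsd.conf. Files that do not exist are skipped.
    Returns (copied, missing) as lists of relative paths.
    """
    src_root, repo_root = Path(src_root), Path(repo_root)
    copied, missing = [], []
    for rel in relpaths:
        src = src_root / rel
        if not src.is_file():
            missing.append(rel)
            continue
        dest = repo_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps and permissions
        copied.append(rel)
    return copied, missing
```

For example, `collect_configs("/etc", "/srv/config-repo/etc", ["hosts", "cups/cupsd.conf"])` would copy those two files into a working copy, after which they can be committed with your version control tool (with Subversion, `svn add` and `svn commit`). Some files under /etc are readable only by root, so the copy job would normally run with administrator rights.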
The Service Desk is where users submit questions or report errors. At school, the ICT contact often forwards operational events to the Service Desk. There may also be requests such as setting up a new PC or installing a program.
At school, the ICT contact is the link to the Service Desk and also answers the most common questions. Questions and tasks that are too extensive or too difficult to solve locally must be forwarded to the Service Desk, so good cooperation between the school's ICT contact and the Service Desk operators is important.
Users may also get direct answers from an operator at the Service Desk. All operational enquiries go to the Service Desk and are assigned a case number. Anyone who registers a case receives an email confirming that the enquiry has been received. While the case is being handled, the Service Desk staff working on it may send status updates to the user.
This way, users get a single point of contact, and Service Desk operators get an overview of all cases, so operations staff can troubleshoot across the whole organization. Periodically, the team leader should review all issues and their solutions to prioritize debugging and prevent errors from recurring, so that schools get a stable operating environment.
Incidents can be reported by phone, fax, email or web form. Urgent incidents must be prioritized; incidents that need to be resolved quickly are usually reported by telephone, while less important ones usually arrive by email, for example. A member of the support staff should be assigned to the incident and will need to ask the user questions to investigate the problem.
- Remember to be an active listener, not a passive one.
All enquiries should be logged as soon as possible after they arrive at the Service Desk, with a brief description of the incident, and assigned a case number. The enquiry may come from the ICT contact at a school or from someone else with an agreement to use the Service Desk. The user should receive an email confirming that the matter has been received, with its case number. It is important that users feel safe, so information about what the problem might be should be communicated to them.
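The logging steps above (register the enquiry, assign a case number, confirm by email, record status updates) can be sketched in a few lines. This is a minimal in-memory illustration under assumed names (`EnquiryLog`, `log_enquiry`, the example addresses); it is not the API of RT or any other request tracker, which handle this themselves.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime
from email.message import EmailMessage


@dataclass
class Case:
    number: int
    sender: str
    description: str
    received: datetime
    status: str = "new"
    history: list = field(default_factory=list)


class EnquiryLog:
    """Minimal in-memory enquiry log (hypothetical, for illustration only)."""

    def __init__(self, desk_address, first_number=1):
        self.desk_address = desk_address
        self._numbers = itertools.count(first_number)  # automatic case numbers
        self.cases = {}

    def log_enquiry(self, sender, description, received=None):
        """Register an enquiry; return the case and its confirmation email."""
        case = Case(next(self._numbers), sender, description,
                    received or datetime.now())
        self.cases[case.number] = case
        return case, self._confirmation(case)

    def _confirmation(self, case):
        # Build (but do not send) the confirmation email for the user.
        msg = EmailMessage()
        msg["From"] = self.desk_address
        msg["To"] = case.sender
        msg["Subject"] = f"[Case #{case.number}] Enquiry received"
        msg.set_content(
            f"Your enquiry has been registered as case #{case.number}.\n\n"
            f"Summary: {case.description}\n"
        )
        return msg

    def update(self, number, status, note):
        """Record a status change so the user can be kept informed."""
        case = self.cases[number]
        case.status = status
        case.history.append((datetime.now(), status, note))
        return case
```

In a real setup the confirmation would be sent, for instance with `smtplib.SMTP(host).send_message(msg)`, and the log would be persistent; the sketch only shows the flow.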
Previously, enquiries were written in paper logbooks. Today, software is used to record them; such a tool is often called a request tracker. Logging enquiries is crucial for operations: it is the basis for error handling, handling user requests and prioritizing incidents. Because logged operational events are reviewed periodically, fixes and priorities can be assessed, which helps prevent recurring errors. The log also provides a basis for improving the service by debugging the services and applications that users perceive as problematic.
The request log is thus a basic and necessary tool for both users and the Service Desk. Several freely available request logging systems have good documentation <ref>RT Essentials: http://www.oreilly.com/catalog/rtessentials/chapter/index.html </ref>. Skolelinux Drift uses RT <ref>RT Essentials: http://www.oreilly.com/catalog/rtessentials/chapter/index.html </ref> to handle requests.
When starting up support, it is important not to take on too much at the outset. Do not try to achieve everything at once; focus instead on "quick wins" that keep users informed, and aim for short response times. It is also important to clarify whom the Service Desk should forward events to if it cannot solve an issue itself. The Service Desk must also check whether users will be affected by disruptions, so that feedback can be given quickly and easily.
For users, what matters is that incidents are dealt with. For the Service Desk, what matters is that incidents are handled correctly according to the service level agreement, and that requests for work outside the agreement are settled between school management and the system administration organization.
Tasks and roles
We recommend agreeing on the duties of the school's ICT contact and the responsibilities of those who work at the Service Desk. Schools often have few resources compared to what is common in municipal administrations or private companies. At the same time, schools usually have many more users, and often more client machines, than are used in the rest of the municipality.
To distribute tasks, roles must be in place. Clearly established roles make it easier to distribute tasks and to estimate the capacity needed to resolve operational tasks. Operational experience in municipalities and professional organizations shows that these roles are common:
- ICT contact at each school. This is often a teacher with an educational and/or technical ICT background.
- Operator(s) working in the central IT service. This is a person skilled in operations.
- ICT coordinator who organises the educational use of IT, and contributes towards plans for developmental, operational and educational use. Often this is a teacher.
- ICT responsible. This is usually the principal, who is responsible for IT operations.
Here is an overview of the various everyday tasks, some of which are contracted out by the municipalities.
Tasks of the ICT contact(s) at each school:
- Oversee the school's server room.
- Act as the school's contact with the municipality: report errors and outages.
- Perform simple maintenance tasks such as replacing mice and keyboards, upgrading thin clients, and simple patching.
- Act as the school's superuser: advise colleagues on the user interface, email, video projectors and relevant applications.
- Participate in ICT gatherings.
- Create and administrate local users.
- Perform simple maintenance of printers.
- Create and manage email accounts.
- Perform simple commands and operations under the guidance of an ICT tutor.
- Facilitate the use of ICT in teaching.
The operator's tasks:
- Receive incidents and service requests.
- Mentor ICT contacts by telephone and email.
- If agreed, visit the school to troubleshoot defects and malfunctions on computers, printers and servers.
- Install security updates on the school's computers (servers and clients).
ICT coordinator's duties:
- Assist school management and ICT contacts in developing technical and pedagogical ICT plans.
- Ensure that the service desk and the management get a good selection of software.
- Ensure that the schools have appropriate ICT tools for teaching, and that computers and networks are appropriate for school subjects.
- Provide advice and guidance to operational services on what the technical and pedagogical ICT requirements of the school are.
ICT responsible's duties (principal, headmaster, head of operational services):
- Make joint purchases of computer equipment and enter into joint agreements etc.
- Develop competence plans.
- Provide the schools with courses in the educational use of ICT.
- Provide operations courses.
- Negotiate contracts for operations.
- Ensure that the ICT contact and the IT service have the necessary resources.
The advantage of an agreement for these tasks is that what is expected of each person is clear, which gives a good basis for planning and managing ICT services. These ICT tasks are usually done part-time by a teacher who also has teaching duties.
A business would often have two staff members working full time, operating 100 standard client machines with 100 users. In schools there may be a 30% position in total, divided among several persons, operating 100 client computers used by 320 students and teachers.
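The difference becomes clearer when expressed per full-time position. The short calculation below uses only the figures given above and treats a 30% position as 0.3 of a full-time position; the helper name `per_position` is hypothetical.

```python
def per_position(count, positions):
    """Return how many units (users or clients) one full-time position covers."""
    if positions <= 0:
        raise ValueError("positions must be positive")
    return count / positions


# Figures from the text: a business with 2 full-time staff, 100 clients and
# 100 users, versus a school with a 30% position, 100 clients and 320 users.
business_users = per_position(100, 2.0)    # 50 users per position
school_users = per_position(320, 0.3)      # about 1067 users per position
business_clients = per_position(100, 2.0)  # 50 clients per position
school_clients = per_position(100, 0.3)    # about 333 clients per position

print(f"users per position:   business {business_users:.0f}, school {school_users:.0f}")
print(f"clients per position: business {business_clients:.0f}, school {school_clients:.0f}")
```

Measured per position, the school in this example operates roughly six to seven times as many clients, and about twenty times as many users, as the business.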
When a school has so few resources for operations, good resource management is crucial. Agreements on the tasks make it easier to assess whether you need additional resources, or whether expectations of ICT initiatives in schools must be scaled back to fit the budget. With a good overview of the ICT tasks in the school, it will be easier for IT administrators to ask for more resources when necessary, for example to implement ICT-based exams or to introduce new teaching aids such as electronic whiteboards.
Expected time usage
We have created a table showing the time spent on operations and maintenance. It is based on the experience of municipalities that run a centrally operated Debian Edu installation at 9-10 schools with 250-500 client computers. The table does not cover everything, so extra time is required for projects where schools develop their own ICT solutions with networking and more equipment.
| Role | Tasks | Time spent per school per week | Time spent in total for all schools |
|---|---|---|---|
| Centralised operations staff | Monitoring, debugging and operation of 500 machines, for example 10 schools with 3,200 students and teachers | 2-3 h (50 clients) | ½ position (500 clients) |
| ICT contact at each school | Oversight of equipment, easy maintenance, and reporting of incidents and requests | 3-4 h (50 clients) | 1 position (10 schools / 500 clients) |
| | Assist in planning and implementation of educational and technical ICT work in the school | | |
| ICT manager (principal) | Make joint purchases, and ensure compliance with the service level agreement. Schedule updates, or develop solutions | | |
| Overall for a school | 50 client machines (concurrent users) | 6-10 h | |
| Overall for all schools | 10 schools, 500 client machines (concurrent users) | | 2 ¼ positions |
Experience shows that the workload of the ICT contact depends on the number of concurrent users. The term "concurrent users" is new to many, so here is an example: a school may have 250 students but no more than 50 computers. Then at most 50 students can use computers at the same time, far fewer than the 250 users who have an account on the system. It is these 50 logged-in users who generate work for the IT service; the other 200, who are not logged in, add little extra work.
Therefore, it is common to calculate IT costs from the maximum number of concurrent users. Other calculation methods are possible, for example when paying for proprietary software. But since Debian Edu has no license costs, the number of concurrent users is the most important figure for operating costs. Calculating costs from the number of user accounts makes little or no sense for a school.
For users of Debian Edu, the cost difference between managing 100 and 250 user accounts is very small. There are a few exceptions: with 250 students instead of 100, more students may repeatedly forget their passwords. It is therefore wise to let the teacher responsible for the class give these students a new password.
If the school has 50 client machines, the ICT contact needs less time for operational tasks than if it has 150 clients. With more clients, the overall time spent on operations increases, but the time per client machine goes down somewhat.
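The reasoning about concurrent users can be written down as a small calculation. This is a sketch under stated assumptions: concurrent users are taken as the smaller of the number of accounts and the number of client machines, and the weekly hours for the ICT contact are scaled linearly from the table's 3-4 hours per 50 clients. Since the time per client falls as the number of clients grows, the linear figure is best read as a rough upper bound. The function names are hypothetical.

```python
def concurrent_users(accounts, client_machines):
    """At most one logged-in user per client machine."""
    return min(accounts, client_machines)


def ict_contact_hours(concurrent, low_per_50=3.0, high_per_50=4.0):
    """Naive weekly-hours range for the school's ICT contact.

    Scales the table's 3-4 h per week per 50 clients linearly. Because time
    per client falls somewhat as the number of clients grows, treat the
    result as a rough upper bound, not a staffing rule.
    """
    factor = concurrent / 50
    return low_per_50 * factor, high_per_50 * factor


users = concurrent_users(accounts=250, client_machines=50)
print(users)                     # 50: the other 200 accounts add little work
print(ict_contact_hours(users))  # (3.0, 4.0)
```

For 150 clients the linear estimate gives 9-12 hours a week. Assuming a 37.5-hour working week, the 20% position reported below for 160 clients corresponds to 7.5 hours, which is consistent with the time per client going down.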
Several municipalities have set aside 3-4 hours a week for the ICT contact's tasks at each school with 30-70 client machines. The Education Department in Oslo has set aside half a weekday, or a 30% position, to follow up 150 client machines. Experience from other municipalities suggests that a 20% position is enough for the tasks of a local ICT contact when a school has 160 thin or diskless clients with Debian Edu.
In addition, there are costs associated with centralized operations, ICT management, and building up the educational use of ICT tools in school subjects. One position is probably sufficient to operate 1000 client machines. When it comes to educational support, several principals have set aside a 50-100% position in the school for this work, for example a 10-20% position as ICT contact and a 40-80% position for educational support to teachers. Many teachers perceive IT tools in schools as something new, and some principals wish to strengthen the educational side by making teachers more confident in using IT tools across the different subjects.
Here is a list of tasks for setting up a new Service Desk:
- Arrange people in different roles like IT manager, IT contact in schools, central operations and IT coordinator for all schools. It is important to make a distinction between what is technical operations and maintenance, and what is pedagogical work.
- Establish the Service Desk so that every school has a service agreement stating what counts as standard operating activities and what is extra. It is imperative that the ICT-responsible principals take part in this process.
- Establish a system for handling incoming requests (a request tracker). All enquiries by email need a case number. Almost all enquiries from users or IT contacts from schools also need a case number.
- Ensure that the ICT budget covers what is needed for proper operation of the school's computer equipment and networks. Today, ICT systems are expected to be used for national and local tests with ICT tools, with or without Internet access.
- Use the standard edition of Debian Edu as the baseline, with the same version at all schools, and make the changes you want from there. Record these changes in a configuration database, with documentation of each change. Version control can be used to store the changes and the documentation.
The purpose of the ICT service is to prevent disturbances such as outages or software issues. Users will experience few problems with the ICT system if the ICT service has enough resources to handle operations, equipment and enquiries to the Service Desk. Problems, big or small, interrupt users' work, so good incident handling is necessary.
In parachuting, near-accidents are called "incidents". It is perhaps not quite the same in computer operations when something is not working. The purpose of incident handling is to restore services as quickly as possible so that everything works normally again; if something goes wrong, it must have the least possible impact on users. What counts as "normal service" is defined in an operating agreement that describes the service level.
Incident statistics are important, especially when several people work in the organization, since it is then easy to lose track of the work. Statistics point out problem areas that must be addressed more thoroughly than with a quick fix from the Service Desk. For example, if there are many requests to reset forgotten passwords, it may be wise to let teachers change the passwords of pupils in their class.
An operational disturbance is defined as:
- an event which is not part of normal operations and causes, or can cause, an interruption or reduction in the quality of the service.
Examples of operational disturbances may be:
- the office program (OpenOffice.org) does not start
- the web browser (Firefox) crashes
- the hard drive is full
- the server is down
- unable to print
- unable to log in
- requests for information, advice or documentation
- forgotten password
The examples show some of the common operational disturbances. These are problems that make users turn to the school's ICT contact or the Service Desk. The IT service must prioritize which problems must be dealt with right away and which need more time to resolve. To decide which problems need more thorough debugging, it is important to log all enquiries about malfunctions. This shows which disruptions are most frequent, which is necessary for targeting the areas with the most problems.
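As a minimal sketch of such statistics, assuming the event log can be exported as records with a category field (the record format and sample data below are made up), the most frequent disruptions can be counted like this:

```python
from collections import Counter


def incident_statistics(incidents, top=3):
    """Count logged incidents per category and return the most frequent ones.

    `incidents` is an iterable of dicts with at least a "category" key,
    e.g. as exported from the event log (hypothetical format).
    Returns a list of (category, count, percentage of all incidents).
    """
    counts = Counter(i["category"] for i in incidents)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common(top)]


log = [
    {"case": 1, "category": "forgotten password"},
    {"case": 2, "category": "unable to print"},
    {"case": 3, "category": "forgotten password"},
    {"case": 4, "category": "hard drive full"},
    {"case": 5, "category": "forgotten password"},
    {"case": 6, "category": "unable to print"},
]
for category, count, share in incident_statistics(log):
    print(f"{category:20} {count:3} {share:5}%")
```

With data like this, forgotten passwords would stand out as the largest category, which is the kind of finding that could justify letting teachers reset their pupils' passwords.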
Here is a short checklist to make sure that procedures and systems for good event handling are in place:
- The operator doing the debugging is the one who reports the status back to the ICT contact at the school and/or the user.
- The system for logging events must be in place, working both technically and functionally for those working with event handling in schools and at the Service Desk.
- The system for event logging must be used for nearly all operating events
- Statistics are compiled from the event log periodically and used to implement measures that eliminate recurring problems that irritate users.
Planning and implementation
Setting up a workable event logging system requires more than installing the software. Everyone in the operations department must use it, and those reporting errors must receive feedback by email with a ticket number. This requires significant effort in configuring the event logging system. In addition, those who receive the requests need basic user training.
Large and comprehensive plans are not needed to put proper event handling in place. Handling events is a completely standard task for those who work at the Service Desk or as ICT contacts at the schools. Configuring an event logging tool correctly may take up to a few weeks, and users can also report events by email and by phone.
The user interface of the logging system is fairly self-explanatory, so it takes only a few hours to get started. With daily use, one becomes increasingly comfortable with how to respond to logged messages. It is crucial that everyone in the operations department uses the system for logging operational messages.
Activities when an operational disturbance occurs
To illustrate the activities that follow a reported event, we use an example.
A user phones the Service Desk: printing does not work. Operations logs the event right after the call. The printing problem becomes a case with an automatically assigned case number.
The operator at the Service Desk makes a quick analysis. Has the spooler stopped again, or is it something else? Could the printer be out of paper or toner? Examining the spooler, the operator sees that the print queue is full. She deletes the queue and checks whether the next job prints.
This time the print queue fills up again. The operator contacts the school's ICT contact and asks them to check whether there is paper in the printer; this is noted in the event log. The ICT contact replies that paper has been added and printing is back to normal. The case is closed, which is also noted in the event log.
If printing had not resumed, the printer might have been out of toner or had a fault. In case of a fault, the operator would have to escalate the problem. Escalation means that someone other than the operator or the ICT contact resolves the problem; in this example, a printer technician.
The example shows that a whole chain of activities can be set in motion to get a printer working again. A printer may still not work after paper has been added, so the next thing to check is whether toner is missing. If everything seems to be in place but printing still does not work, the problem must be escalated: the operations department calls in an expert in that area to fix the error, in this case a printer service technician.
What was wrong and how it was fixed are noted in the event logging system.
Many roles are involved when the ICT service handles reports that something does not work. In the example above, the school's ICT contact and the operator cooperate to solve the printing problem. Had the issue been bigger, they would have had to call in a service technician. If the printer cannot be fixed, a new one must be bought, and then the ICT managers may need to be involved to find the money. In many places, the principal has the last word.
In short, many people quickly get involved when something does not work. As a rule, problems should be solved there and then, without involving people who cannot help solve them. Escalating problems that could have been solved locally quickly becomes costly, especially since many enquiries are easy to deal with on the spot. Other requests involve more complex problems that require more people. If additional or external help is needed, this should, as a rule, be cleared with the operations manager. The important thing is to handle operational events consciously and to use resources well.
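The escalation idea can be modelled as passing a case along an ordered chain of roles until someone resolves it, so that cheaper, closer help is tried first. The sketch below is only an illustration: the chain of roles is a hypothetical one modelled on the printer example, and `handle_case` is not part of any ticket system.

```python
def handle_case(chain, can_resolve):
    """Pass a case along an escalation chain until someone resolves it.

    chain: ordered list of roles, closest and cheapest first.
    can_resolve: function(role) -> bool, whether that role fixes the issue.
    Returns (roles_involved, resolved_by); resolved_by is None if the chain
    is exhausted, which would be a matter for the operations manager.
    """
    involved = []
    for role in chain:
        involved.append(role)
        if can_resolve(role):
            return involved, role
    return involved, None


# Hypothetical chain modelled on the printer example; the ICT manager comes
# last because buying a new printer needs money approved.
PRINTER_CHAIN = ["operator", "ICT contact", "printer technician", "ICT manager"]

# In the example, clearing the queue did not help, but adding paper did.
involved, resolved_by = handle_case(PRINTER_CHAIN, lambda role: role == "ICT contact")
print(involved, resolved_by)
```

Keeping the chain short and ordered reflects the point above: every extra role involved adds cost, so each level should be tried only when the previous one cannot solve the problem.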
We have set up some key indicators for incident handling. They help you assess, against measurable and well-defined requirements, whether you are doing a good job. Such indicators are:
- Total number of operating disturbances.
- Average time from receiving an enquiry until the issue is resolved, classified by codes (a well-organized operations department has codes for types of events and errors).
- Percentage of incidents handled within agreed response time (as agreed in the service level agreement).
- Average cost for each event
- Percentage of incidents solved by the Service Desk without escalating to the next level of operations
- Events per client machine (workplace)
- Number and percentage of incidents solved by the operations center without the need for visits to school
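Most of these indicators can be computed directly from logged incidents. The sketch below assumes each incident record carries received and resolved times plus a few flags and a cost figure; these field names, and the sample data, are hypothetical, and a real request tracker would need its own export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    received: datetime
    resolved: datetime
    within_sla: bool    # resolved within the agreed response time
    escalated: bool     # passed on beyond the Service Desk
    school_visit: bool  # required someone to visit the school
    cost: float         # hypothetical cost figure per incident


def incident_kpis(incidents, client_machines):
    """Compute the key indicators listed above from a list of incidents."""
    n = len(incidents)
    if n == 0:
        return {"total": 0}
    hours = [(i.resolved - i.received) / timedelta(hours=1) for i in incidents]
    return {
        "total": n,
        "avg_resolution_hours": sum(hours) / n,
        "pct_within_sla": 100 * sum(i.within_sla for i in incidents) / n,
        "avg_cost": sum(i.cost for i in incidents) / n,
        "pct_solved_without_escalation": 100 * sum(not i.escalated for i in incidents) / n,
        "events_per_client": n / client_machines,
        "pct_solved_without_visit": 100 * sum(not i.school_visit for i in incidents) / n,
    }


# Made-up sample data for one school with 50 client machines.
t0 = datetime(2024, 1, 8, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(hours=2), True, False, False, 100.0),
    Incident(t0, t0 + timedelta(hours=4), True, False, False, 200.0),
    Incident(t0, t0 + timedelta(hours=24), False, True, True, 900.0),
    Incident(t0, t0 + timedelta(hours=6), True, True, False, 400.0),
]
for key, value in incident_kpis(sample, client_machines=50).items():
    print(f"{key:32} {value:g}")
```

Grouping the same calculation by event code would give the per-code averages mentioned in the list.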
A number of tools can make it easier to handle operational disturbances.
- Automatic logging
- Automatic routing of events to the right persons
- Automatic retrieval of data from the configuration management database
- Phone and email are easy to use together with tools for registering requests and incidents.
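Automatic routing usually means mapping the category of an incoming case to the role that should handle it. Ticket systems such as RT have their own ways of configuring this; the sketch below only illustrates the idea with a hypothetical routing table, where anything unknown falls back to a Service Desk operator.

```python
# Hypothetical routing table: category -> role that should get the case.
ROUTES = {
    "forgotten password": "class teacher",
    "printer hardware": "printer technician",
    "server down": "central operations",
}
DEFAULT_ROLE = "Service Desk operator"


def route(incident_category, routes=ROUTES, default=DEFAULT_ROLE):
    """Return the role an incoming case should be routed to."""
    return routes.get(incident_category.strip().lower(), default)


print(route("Forgotten password"))  # class teacher
print(route("unable to log in"))    # Service Desk operator
```

Routing forgotten passwords to the class teacher follows the suggestion made earlier in this section; the other entries are examples only.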
Problem management is an "investigative" process. Known bugs are most often handled directly by the Service Desk; this is the most common form of event handling. With unknown errors, one must investigate what is wrong. This kind of debugging requires both common sense and a good nose for problems. Good operators use their instinct to go straight to the problem, find the solution and restore service as quickly as possible so that everything works normally again.
Problem management includes:
- Problem control
- Error control
- Proactive control to prevent problems
- Identify error patterns, using information from event management, for example
- Identify problems
- Classify problems
- Examine/research problems
- Identify and register known errors
- Find temporary solutions if possible
- Contacting those with responsibility for Change Management to remove the error permanently
- Identify and solve problems and errors before the incident is reported by users.
- Use logs and information from event handling to see how problems may arise
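The split between known errors, which the Service Desk handles directly, and unknown errors, which must be investigated, can be supported by a simple register of known errors with their workarounds. This is a hypothetical sketch (the class names and example entry are illustrative), showing registration, lookup and flagging an error for a permanent fix through Change Management.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    symptom: str
    cause: str
    workaround: str
    permanent_fix_requested: bool = False  # passed on to Change Management?


class KnownErrorRegister:
    """Hypothetical register of known errors and their temporary solutions."""

    def __init__(self):
        self._errors = {}

    def register(self, symptom, cause, workaround):
        key = symptom.strip().lower()
        self._errors[key] = KnownError(symptom, cause, workaround)
        return self._errors[key]

    def lookup(self, symptom):
        """Return the KnownError for a symptom, or None if it must be investigated."""
        return self._errors.get(symptom.strip().lower())

    def request_permanent_fix(self, symptom):
        """Mark a known error as handed over to Change Management."""
        err = self.lookup(symptom)
        if err is None:
            raise KeyError(symptom)
        err.permanent_fix_requested = True
        return err


reg = KnownErrorRegister()
reg.register("Print queue fills up", "spooler stops",
             "delete the queue and restart the print service")
print(reg.lookup("print queue fills up").workaround)
print(reg.lookup("browser crashes"))  # None: unknown error, investigate
```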
Procedures for problem management
We have attached a comprehensive collection of problem solutions and configuration recipes. During the summer of 2006 this will also be published on the Internet. The recipes will be maintained by professional operators at schools, in municipal IT services and in private operations companies. To make it easy to improve the documentation, everything has been put in a wiki hosted on a Skolelinux server.
Wiki technology has proven to be a great success for maintaining catalogued information on the Internet. It is easy to contribute, and all changes are logged. It is also possible to import OpenOffice.org documents and to export recipes as PDF.
The resources spent on IT systems in schools must be managed prudently. This requires control over the services used and over the equipment, or infrastructure as it is often called. The equipment, software and services have a whole range of settings. These are configurations, that is, a logical model of how the infrastructure and services are set up.
To control the configurations, they must be identified, saved and maintained. One must also be able to keep track of different versions of the configurations. We call each part of a setup a Configuration Item (CI). A configuration file may, for example, ensure that certain users have access to a few printers in the network. Another can make sure you get a buffer on diskless clients.
An up-to-date configuration management database is essential for rapid and controlled handling of operational disturbances, or of desired changes to the setup of machines, programs or services.
Establishing a configuration management database takes planning. One must decide where to use the system, and define its objectives and the policies and processes for storing and maintaining configurations:
- Identify and select a structure for the configurations of the important parts of the ICT infrastructure. This also covers the owners of each configuration, name tags (attributes), dependencies, and relations between configurations.
- Control configurations so that only approved ones are recorded in the database throughout the lifetime of the system. Access to the configurations can be controlled with group permissions, through the Change Management process.
- Status logging: keep track of the condition and status of the various subsystems throughout the lifetime of the service, software or hardware. A configuration may, for example, be in production, disconnected or discontinued.
- Checking and revision. Each configuration must be checked to confirm that the correct information is stored in the configuration management database (CMDB). This is followed up with periodic reviews to ensure that the database is kept up to date.
As we see, good configuration management of the ICT system requires a lot of planning. The purpose of planning this as part of ICT operations is to get systems back up quickly when they go down. With good tracking of configurations, it is easy to replace a defective machine with a new one: the configurations can be quickly transferred to the new computer, and the ICT system works just as well as before the old machine failed.
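The four activities above can be illustrated with a single record type: identification (name, owner, attributes, dependencies), control (recording the approved content), status logging (a lifecycle history) and checking (comparing the live file with the recorded version). This is a minimal sketch with hypothetical field names; it uses a SHA-256 checksum as one possible way to detect that a live configuration no longer matches the CMDB record.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PRODUCTION = "in production"
    DISCONNECTED = "disconnected"
    DISCONTINUED = "discontinued"


@dataclass
class ConfigurationItem:
    name: str
    owner: str
    attributes: dict = field(default_factory=dict)
    depends_on: list = field(default_factory=list)
    status: Status = Status.IN_PRODUCTION
    recorded_sha256: str = ""
    status_log: list = field(default_factory=list)

    def record(self, content: bytes):
        """Control: store a checksum of the approved configuration content."""
        self.recorded_sha256 = hashlib.sha256(content).hexdigest()

    def set_status(self, new_status: Status, note: str = ""):
        """Status logging: keep a history of lifecycle changes."""
        self.status_log.append((self.status, new_status, note))
        self.status = new_status

    def audit(self, actual_content: bytes) -> bool:
        """Checking and revision: does the live config match the CMDB record?"""
        return hashlib.sha256(actual_content).hexdigest() == self.recorded_sha256


ci = ConfigurationItem("print server config", owner="central operations",
                       attributes={"service": "CUPS"}, depends_on=["network"])
ci.record(b"<Printer room101>\n</Printer>\n")
print(ci.audit(b"<Printer room101>\n</Printer>\n"))  # True: matches the record
```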
Management of Configuration Items (CI)
A configuration item is a part of the infrastructure, normally the configuration of a service or a program. Sometimes users want to change how a service works, and one needs to keep track of the configurations when changes are made.
To make this concrete, consider the configuration of the print server. You want to add a new printer to the network, so you add it to the CUPS printing system. When you change the configuration through a web application or through the KDE configuration tools, the CUPS configuration file changes, and you must restart the print server; this can also be done with KDE tools or through a web application. The modified configuration file is then copied to a directory where it can be managed by a version control system.
Among the many possible configuration choices, a few are common: whether a service should run, stop, terminate, start, be interrupted or be taken out of service.
Be cautious about changing configurations without a proper plan. It is easy to forget what you have done on a server or a PC, so it is important to document the changes made in a change log.
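Once the file is in a version-controlled directory, the tool can show exactly what changed (with Subversion, `svn diff`). To illustrate what such a review looks like without depending on a particular tool, the sketch below uses Python's `difflib` on two versions of a simplified, made-up printers.conf; real CUPS files contain more settings.

```python
import difflib


def config_diff(old_text, new_text, name="printers.conf"):
    """Return a unified diff between two versions of a configuration file."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{name} (previous)",
            tofile=f"{name} (current)",
        )
    )


# Simplified, made-up configuration content before and after adding a printer.
old = "<Printer room101>\nDeviceURI socket://10.0.2.50\n</Printer>\n"
new = old + "<Printer library>\nDeviceURI socket://10.0.2.51\n</Printer>\n"
print(config_diff(old, new))
```

An empty diff means the versions are identical, which is a quick way to confirm that nothing changed unexpectedly.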
Planning and installation
The configuration of the computer network is tied to its architecture. Much of this planning has already been done in Debian Edu: setting up servers with a corresponding service level on Windows Server, Red Hat or other GNU/Linux distributions may take 3-4 weeks, while Debian Edu does it in 1-2 hours. If you want a fixed IP address range for the network, a professional needs about half an hour extra, because the services are set up with reusable names.
What remains to be planned is which additional user programs to use, and which subsystems should interact with Debian Edu. For example, the school may have an electronic whiteboard.
Here is a list of activities and solutions that need to be in place for good configuration management:
- Establish a version-controlled area for storing the configurations of all servers and selected workstations and laptops. The version control system Subversion can be used for this. Remember to back up the area daily, and make sure all configuration changes are saved.
- Use an electronic system for recipes explaining the configuration of different types of machines, the network and services. Such recipes let others who help with or take over operations read up on what has been done. A wiki can be suitable for this.
- Use one specific version of the operating system and software on all machines, to avoid maintaining many different versions of the software. Make sure the software is well tested; it may therefore be wise to wait 6-12 months before adopting the latest edition of a program.
Relations to other processes
Configuration management is closely connected with problem handling and system availability. If printing stops too often, a configuration change may solve the problem, for example a routine that deletes the print queue and restarts the print service.
The aim of configuration changes is usually to increase the availability of services or programs. It may also be to restrict access to certain programs or services to specific times. To achieve this, the service must be reconfigured, which may also cost money beyond what was agreed as the service level or capacity of the system.
The examples show that configuration management affects a number of other areas, so there is much to gain from good practices for managing configuration changes. Automation is also advisable if you want greater stability, or access to certain services in specific periods.
Tools for configuration management
As mentioned in the checklist, one may use:
- Saving the configuration files in a version-control system, for example subversion.
- A wiki for storing documentation of setups and recipes
- A common online repository of operations documentation, maintained by those who centrally operate Debian Edu at many schools.
Many ICT services are not good at handling changes in ICT systems, which leads to many disgruntled users. Surveys in the public sector in Denmark show that operating costs go down when changes are well controlled. It therefore pays to involve users through training and participation in the changes made.
Handling changes depends entirely on proper processes, regardless of whether the changes are small or big. It is therefore important to have the right people in place when making changes, both to give training and to answer questions. This becomes especially important when adopting new releases of software and services, whether one uses free or proprietary software.
Change Management should ensure that all changes are made in a standardized and correct manner. It is important to anchor the decision about a change at the appropriate level in the organization. Standard changes can often be pre-approved once they have been done a few times, but major changes will often require a decision at a higher level, between school management and the operator.
Management should be included because an upgrade will often require training of users. It may be an upgrade to a new browser or a new version of office software, which can quickly lead to half a day of training in what is new in a program. Such changes must be agreed with management. The changes must also be made without other parts of the system ceasing to work.
Those responsible for approving changes receive a so-called change message or RFC (Request For Change). With an RFC in hand, you can assess whether the change should be performed. Often you have to clarify with management whether optional changes should be made, and if so, when.
When making changes, one must also cooperate with the person responsible for ICT at the school, and ensure that changes happen when it fits with the school's plans. Implementing significant changes without Change Management can lead to much dissatisfaction and additional inquiries to the Service Desk. This creates significant unplanned extra work. In addition, it may lead to a change that soon has to be rolled back, so you quickly get twice as much work and end up back where you started. Had the necessary approvals been made, the change could have been done in a planned and straightforward manner.
Change Management is done to avoid more extra work than necessary. Planning changes obviously requires some work, but planned changes cause less extra work afterwards. One also avoids having to roll back changes because problems arise when users are unprepared for substantial changes.
When you, for example, update the entire system to a new version, make sure everyone is informed. One must look into whether those affected by the change need training. The right professionals must prepare everything, so there are no surprises.
The whole responsibility must not land on the person who manages software versions, the release manager. Release handling is a process that preferably should deal with changes consisting of many minor changes. This usually happens when rolling out new systems and services, or when upgrading the entire system to a new version.
- Review the change message, or RFC (Request For Change), and check that it has been given a unique number.
- Prioritize and categorize the changes
- Reject changes that are not possible, for example by marking them as not possible.
- Give feedback to the person who submitted the change message
- Make sure you have a Change Advisory Board where the change is dealt with, discussed and evaluated. This advisory group can consist of selected ICT contacts and experienced operations personnel.
- Coordinate changes with Release Management, which handles the different versions of applications and services.
- Review and close the change message (RFC)
- Remember to save modified configurations in the repository for configuration files.
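The checklist above can be modelled as a small record with a unique number and a limited set of allowed status changes. The Python sketch below is a hypothetical illustration; the status names and the `RFC` class are assumptions, not taken from any particular tool. Rejecting an RFC corresponds to marking it as not possible.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical status names; each status lists the statuses it may move to.
TRANSITIONS = {
    "submitted": {"rejected", "approved"},
    "approved": {"implemented"},
    "implemented": {"closed"},
    "rejected": set(),
    "closed": set(),
}

# Source of unique RFC numbers for this process.
_next_number = count(1)


@dataclass
class RFC:
    title: str
    requested_by: str
    priority: str = "normal"
    category: str = "standard"
    number: int = field(default_factory=lambda: next(_next_number))
    status: str = "submitted"
    history: list = field(default_factory=list)

    def move_to(self, new_status, note=""):
        """Change status, refusing transitions the process does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"RFC {self.number}: cannot go from {self.status} to {new_status}")
        self.history.append((self.status, new_status, note))
        self.status = new_status
```

The history list gives the Change Advisory Board a record of who moved each RFC and why.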
Even what looks like a small, insignificant change message can have major consequences if the change is implemented. We have examples of schools with a stable Debian Edu network where all the programs work. Then a test version of a popular program that crashes constantly is installed, and Debian Edu gets the blame.
One example is schools that installed the test version of the latest OpenOffice.org before the program was finished. Several thought it would be fun to try it out. The problem is that test editions are usually released to find errors and instability in applications. They are not intended for production use.
In production, the general rule is that you don't install test versions of software. Most operators recommend using the next-to-latest version of a program intended for production. After 6-12 months, the worst errors in a new major version of an application have usually been fixed.
This means one often waits until summer before updating to a program that was released just before New Year, which fits well with the school year. The alternative may be instability and irritated users. The advisory group therefore plays a key role when small or large changes are made.
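The 6-12 month rule of thumb can be expressed as a simple date check. The sketch assumes a minimum waiting period of six months (the lower end of the range); `add_months` and `ready_for_production` are hypothetical helper names.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamping the day to the month length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def ready_for_production(released, today, min_months=6):
    """True when a release has been public for at least `min_months` months."""
    return today >= add_months(released, min_months)
```

With `min_months=6`, a program released just before New Year becomes eligible around the following summer, which matches the school-year timing described above.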
Release handling is the management and planning activity that prepares for desired changes. The changes can be small or large, where large changes can consist of many smaller ones. Release management takes place before the actual work of installing software and hardware into production begins.
First, new releases are planned and tested. Then everything is rolled out into production. Deployment is part of infrastructure management. The procedure is to implement what has been planned, tested and made ready within the Configuration Management systems. Once everything is planned, tested and the configurations are stored, the solution is rolled out into production.
Usually, many service providers and suppliers are involved, both for the procurement of machines, the software used and the recommended configurations. Good resource planning is crucial for packaging and distributing a new release in a good way for users. Sloppiness in this area can leave equipment that doesn't work, or that is left unused because of deficiencies in the installation.
Release Management takes a comprehensive approach to a change in a service, and ensures that all parts of a release are seen in context. This applies to both technical and non-technical factors.
As you can see, release handling is fundamental for computers, software and network to work as planned. Proper handling of releases is done to prevent disruptions. With new releases or changes, operations are expected to continue as normal without interruption or reduction in quality.
Handling changes or new releases can be compared to building a new road. Cars must still get through even if you build the new road on top of the old one. Good signage must be in place. One must also have the necessary resources to rebuild the road. If the resources to make changes are missing, it is perfectly fine to leave things as they are.
For some, proper release management may seem boring, since you do not adopt the latest version every time something new comes out. But there is often no spare time in the operations department to handle a flood of complaints when new software fails. High uptime requires established technology, Linux expert David Elboth states in Linux Magazine (1/2004). He writes:
- The higher the requirements, the more stringent the requirements on the individual components. High uptime requirements also mean that the choices you are left with are old technology. It is only empirical data over time that can say something about downtime. We have all noticed how far Red Hat and SuSE lag behind with their server products.
Getting few complaints, with a stable and reliable environment, requires solid release handling. Otherwise, a bunch of complaints and dissatisfied users emerge when insufficiently tested cutting-edge software is installed. People with "boy's room skills" tend to underestimate the consequences of software upgrades. That something works fine on your home computer does not mean it will work in a wide network with 500 client computers and 3200 users.
Central program archives (DSL)
In an operational context, the program archive is a collection of the original editions of the software versions that are in production. If you use Skolelinux 2.0, this is the program archive. In the computer world, the term program archive is used in different contexts, especially in programming. When it comes to operations, we are talking about the original assembled software of a particular version, which is the basis for the installation.
When using free software, the program archive may be Skolelinux 2.0 plus the extra programs you have added from various sources. These may be certain versions of Macromedia Flash, Java and decoders that make it possible to run national tests in the browser or to watch broadcasts from NRK.
If you plan to upgrade to the next version of Debian Edu when it comes, that new version will be the main program archive. Here too, all additional applications beyond the new Debian Edu will be part of the archive.
Setup files adjusted or created locally by the operations department are not included in the main program archive. Configurations are saved in a separate version-controlled directory or database.
Database for configurations and hardware
As mentioned in the chapter about configuration management, you must create a database or a version-controlled directory to take care of the setup files. One must also keep track of all computers: what kind of machines are involved, their performance, and the unique hardware addresses of the network cards (MAC addresses).
There are many reasons to have an overview of the equipment. One of the main reasons is to keep track of how many machines are in operation, how many are not in use and how many are in repair. Another reason is planning upgrades. It is both the amount of.............?
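The hardware overview can start as simply as a list of records. The sketch below assumes the status values "in operation", "not in use" and "in repair", and shows how to count machines per status and spot MAC addresses registered twice. The field names are hypothetical; a spreadsheet serves the same purpose, as long as there is one authoritative list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    kind: str       # e.g. "server", "workstation", "laptop"
    mac: str        # MAC address of the network card
    location: str   # physical location
    status: str     # "in operation", "not in use" or "in repair"


def status_summary(machines):
    """Count machines per status."""
    return Counter(m.status for m in machines)


def duplicate_macs(machines):
    """MAC addresses registered on more than one machine (usually a data-entry error)."""
    counts = Counter(m.mac.lower() for m in machines)
    return sorted(mac for mac, n in counts.items() if n > 1)
```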
A variety of applications besides the browser and office suite are installed in schools. Educational programs, browser plug-ins and multimedia programs are needed. The systems also need network setup and adjusted settings in specific programs. If you have many servers and perhaps thousands of clients, the need for effective deployment tools quickly becomes apparent. Such tools are standard in Debian Edu.
Construction management is about getting the required software packages and services installed, with proper settings for both individual programs and the data network. Many people have heard of so-called "images". One installs the operating system and all the programs needed, and adjusts the network settings. Then an imaging program is used to copy what is installed on the hard drive, and this copy is transferred to the other computers.
It is not necessary to build "images" (disk images). Debian Edu is based on Debian, which has an excellent package management system. There is no need to compile applications, as they are precompiled and can be installed directly from the Internet. What one must keep in order are the desired changes to the default setup of Debian Edu or to the main program archive in use. Then you write one or more scripts that run on the different machines to get everything installed and set up.
For most situations, scripting is an easy way to "build" and roll out programs and setups. But there are situations where building disk images may be the solution, for example when installing many laptops.
As we see, handling the construction process is about facilitating deployment on many computers. In exceptional cases it involves building a tailor-made Debian package, but in most situations all the packaging is already done. Then you need a script that installs additional programs and certain settings. One can also create disk images if you have many similar machines, such as laptops for all students.
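A deployment script typically starts from the packages every machine needs and adds what each machine role needs. The sketch below generates an `apt-get install -y` command line per role. The package lists and role names are placeholders to be replaced by the contents of your own main program archive; settings would be handled separately, for example from the version-controlled configuration area.

```python
import shlex

# Hypothetical package selections; replace with what your main program archive contains.
COMMON_PACKAGES = ["vim", "rsync"]
ROLE_PACKAGES = {
    "workstation": ["gimp"],
    "laptop": ["network-manager"],
}


def packages_for(role):
    """Packages a machine with the given role should have, without duplicates."""
    wanted = COMMON_PACKAGES + ROLE_PACKAGES.get(role, [])
    return list(dict.fromkeys(wanted))


def install_command(role):
    """Build a non-interactive apt-get command line for the role."""
    return "apt-get install -y " + " ".join(shlex.quote(p) for p in packages_for(role))
```

Keeping the lists in one place means the same script can be rerun on every machine of a role, which is what makes scripting a practical alternative to disk images.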
It is essential to test new applications, configurations and services before they are put into production. Several schools have experienced instability because they installed software without making the necessary adjustments. It is therefore crucial to test configuration changes or new software versions before the change is made on all machines.
Testing generally takes place in three steps.
- First, install the changes on a test network. This is technical testing to ensure that everything connects and works in a system without users. Keep all changes to configuration files.
- When you are sure that everything works on the technical side, try installing the solution at one school. It is very important to agree on the testing with the school's ICT contact. Users must also be fully briefed on the changes, since testing is being performed. Keep the adjustments made to the setup files along the way in response to incoming operational messages.
- When you are sure everything works, you can roll out the solution to all schools. It is easiest to create a script that simplifies upgrading software packages, services and configurations.
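The three testing steps form a staged rollout in which each stage must succeed before the next begins. The Python sketch below is a hypothetical outline: `deploy` and `verify` are placeholders for whatever installs the change and checks it at each stage (at the pilot school, verification would include feedback from users and the ICT contact).

```python
STAGES = ["test network", "pilot school", "all schools"]


def roll_out(deploy, verify):
    """Deploy stage by stage and stop at the first stage that fails verification.

    deploy(stage) installs the change; verify(stage) returns True if it works.
    Returns the list of stages that were completed successfully.
    """
    completed = []
    for stage in STAGES:
        deploy(stage)
        if not verify(stage):
            break
        completed.append(stage)
    return completed
```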
Backup solution
Much can go wrong during a new installation or upgrade. One must therefore have a fallback solution ready, so that the system can quickly be used as it was before the upgrade. In technical terms, this is called rollback.
When rolling back, it is absolutely essential to have the previous version of the software archive and configuration files ready. This means that you can, for example, install Edu 1.0 in under an hour and put the appropriate configuration files in place.
But rollback takes time. It may therefore be a good idea to have a server ready with the previous version of the software, the right configurations and the users' home directories. This server can quickly replace machines that were upgraded but did not work according to plan. Keeping server machine(s) in reserve can ensure high availability even if something goes wrong.
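Rollback depends on the previous release, both software archive and configuration files, still being available. The sketch below keeps every deployed release in a hypothetical `ReleaseHistory` object so that the previous one can be restored. The labels and configuration values are placeholders; a real setup would point at archived package sets and version-controlled configuration rather than Python dictionaries.

```python
class ReleaseHistory:
    """Keeps every deployed release so the previous one can be restored."""

    def __init__(self):
        self._releases = []  # list of (label, config dict), oldest first

    def deploy(self, label, config):
        """Record a new release; a copy of the config is stored."""
        self._releases.append((label, dict(config)))

    @property
    def current(self):
        return self._releases[-1] if self._releases else None

    def rollback(self):
        """Drop the current release and return the one before it."""
        if len(self._releases) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self._releases.pop()
        return self._releases[-1]
```

For example, after deploying "Edu 1.0" and then "Edu 2.0", a call to `rollback()` returns the "Edu 1.0" entry with its stored configuration.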
Advantages and possible problems
The advantage of keeping records of the software in production cannot be overstated. Many keep the software on separate CDs, and some on DVDs, which is an inefficient means of distribution. To save time and trouble, all the software in Debian Edu is available online.
The operations department can create a copy of the Debian Edu archive on a central server. From there, all the software can be installed quickly and smoothly on other machines. The advantage is that the ICT service always has an overview of the software versions it has made available to schools. It also prevents installation of software that has not been considered by Change Management.
There may be considerable problems if you do not maintain program records and configurations. Mistakes may be made in a configuration or a software package, which is then rolled out to all machines. In addition, some schools may install little-tested software or beta programs in production. One must therefore have good processes and someone accountable for maintaining program records and configurations.
Is a lot of extra work needed to install and maintain services and software already in use? If you opt out of the tools that support upgrade management, you give yourself a lot of extra work. The ICT service must spend much time on manual installation on each machine, and the danger of making mistakes increases. When things do not work, you get disgruntled users, and much time is spent on debugging.
Many who operate major IT systems have inadequate plans for changes. Some have no plans at all and simply install new versions of software. Changes can be perceived as problematic by some users, because functions they are comfortable with move to a different place in the user interface. For operations, it can go completely wrong. For example, when Arendal municipality was to upgrade from an older to a newer version of Windows, most things stopped working. The ICT service said they had several computer programs that were held together with "wire and tape." It took half a year to clean it up.
Planning and implementation
The reason for planning before implementing changes is to prevent weeks or months of additional problems. Although extra time is spent on planning, it is quickly earned back because additional problems are avoided. There will always be people saying they have had no problems with ad hoc changes in the systems. But on closer examination, it turns out that there are problems after changes; the inquiries about them are simply not communicated.
In our view, ad hoc solutions are only a detour around proper change handling, and only an emergency measure. An ad hoc solution is comparable to a temporary repair with "wire and tape." Such solutions must eventually be cleaned up if you want stable operation without constant surprises. By skipping a planning phase you will get many more ad hoc solutions, and more operational problems when changes or upgrades are made. It is therefore essential that professionals and management understand the value of a well-planned change process.
We therefore recommend that you convene a planning meeting and make a stepwise plan for changes in the system. A stepwise plan will naturally vary according to the change. Upgrading the OpenOffice.org suite is something other than upgrading the whole system. When upgrading to a new office application, a 2-3 hour tour of the office suite may be enough for the teachers at each school. When upgrading the entire system, one must both provide user training and ensure that the technical side works as intended.
The main point about planning and implementation is that there are few shortcuts. Studies show that those who plan properly and ensure that people have the right skills have lower operating costs.
It is crucial to plan new releases. Most modifications of the system should be clarified with management. The following list of activities is designed to support upgrades in the planning and implementation phase.
Prioritization of the release:
Check that the necessary decisions have been made before a change or upgrade is rolled out.
Central program archives (DSL)
Ensure that the software packages to be installed are in place in the central program archive.
Make sure all configuration files are in place, both those currently in use and the new ones supplied with the systems to be changed or updated.
All scripts and systems used to roll out software or create disk images must be in place.
First, run trials on test equipment. When this works without problems, it can be tested at one school. The school must be fully informed about, and fully involved in, trying out the new software. When you are sure that everything works, you can upgrade for everyone.
Backup solution
Even with extensive testing, new releases may go wrong, so it is essential to have a fallback. The easiest solution is to keep the old installation, with data, on a separate spare server machine that can be plugged in if the change or upgrade does not work.
As the activity list shows, one needs several tools to keep track of the different releases of software, services and hardware in the system. Some of these tools have been mentioned previously, but we repeat them anyway:
- Debian tools for central program archives (DSL)
- Database for configurations and hardware (Subversion for setup files, spreadsheets detailing all hardware with physical location)
- System for construction management
- Hardware for testing and backup solution
Relations to other processes
Release management goes to the very core of the ICT service. It covers implementing appropriate security updates, changing services and upgrading software. Requests for new releases may stem from operational problems or a desire for new software. Before a new release, it is assessed whether the change is necessary.
If the change is straightforward, one makes the necessary configuration changes and prepares the application packages for rollout. These have been tested, and backup solutions are in place. Once the changes are made, parts of the operational activity may have changed. It is easy to see that change management affects all parts of operational support.
Tools for operational support
The first thing you should ask yourself is: "Do we really need software tools?" If tools are needed, it is crucial to examine the options thoroughly.
Judging by glossy brochures and sales talk, we are totally dependent on such tools. But good people, good process descriptions, good procedures and job descriptions are the basis for good service management. The need for tools, and how complicated they are, depends on the organization's dependence on computer systems and on its size.
In a small organization, a single freely available database for logging and managing events (such as Request Tracker) will be enough. But larger organizations will almost certainly need sophisticated, distributed and integrated service management tools, which means linking all processes to an event-handling system.
Although tools can be important, they are not important in themselves. What matters are the tasks and processes to be done, and the information needed. These determine which tools are best suited to support operations. Here are some reasons why one may use software for operational and service management:
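To show how little a small organization actually needs for event logging, the sketch below uses SQLite from the Python standard library as a stand-in for such a database. It is not Request Tracker and does not mimic its interface; the table layout and function names are assumptions.

```python
import sqlite3


def open_tracker(path=":memory:"):
    """Create (if needed) a minimal ticket table and return the connection."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS tickets (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               school TEXT NOT NULL,
               summary TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'open',
               created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return db


def log_ticket(db, school, summary):
    """Register a new request or fault report and return its ticket number."""
    with db:  # commits the insert
        cur = db.execute("INSERT INTO tickets (school, summary) VALUES (?, ?)", (school, summary))
    return cur.lastrowid


def close_ticket(db, ticket_id):
    """Mark a ticket as closed."""
    with db:
        db.execute("UPDATE tickets SET status = 'closed' WHERE id = ?", (ticket_id,))


def open_tickets_per_school(db):
    """Number of open tickets per school, as a dict."""
    rows = db.execute(
        "SELECT school, COUNT(*) FROM tickets WHERE status = 'open' GROUP BY school ORDER BY school"
    )
    return dict(rows.fetchall())
```

Counting open tickets per school is the kind of simple analysis that later helps identify trends and recurring problems.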
- increased demands from users
- lack of ICT knowledge
- budget limitations
- organizations being entirely dependent on the quality of service
- integration of systems from multiple vendors
- increased complexity of ICT infrastructure
- emergence of international standards
- Extended scope and changes in ICT
Automated tools allow:
- Centralization of key functions
- Automation of functions in the service delivery
- Analysis of data
- Identification of trends
- Implementation of preventive measures
Type of tool
In this chapter we have proposed a number of tools to improve operational support. Here follows a summary of the tools:
- Debian tools for central program archives (DSL)
- Database for configurations and hardware (Subversion for setup files, spreadsheets detailing all hardware with physical location)
- System for construction management
- Hardware for testing and backup solution
- Request Tracker
- System for monitoring (Munin)
As the operations department gains more experience with systematic operation, it will develop or obtain more types of tools.
Evaluation criteria when selecting tools
Although much effort is spent on creating evaluation criteria for software, the result is only experience-based guidelines. There is no final answer to what is good or less good software. As with much else, it is partly a matter of taste. Different solutions can do the same job just as well, but may have quite different designs. However, some rules of thumb may be useful here.
The main evaluation criterion is whether the job needs to be done at all. Many IT tools work perfectly and without error, but solve tasks that do not need solving. So the main criterion is whether the tool solves the right problem, and whether anything needs to be done at all.
- So the first thing you ask about is whether the tool is wanted.
If it turns out a task must be done, the solution may be as simple as running some commands manually. The simplest way is best. But when there are many machines to operate, automation becomes crucial. It is too much work to log into 20 similar server machines to do a security upgrade; then automation is the answer.
- So here one must ask whether the tool is useful for solving the task
- Then one must ask whether the tool is usable.
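Logging into 20 servers by hand to do a security upgrade is exactly the kind of repetitive task worth automating. The sketch below runs one command on many hosts in parallel over ssh and collects the exit codes. It assumes key-based ssh login as root and hypothetical host names; `runner` can be replaced by a function that only records the commands, for a dry run. The upgrade command uses standard `apt-get` options, but a real security upgrade may also need to handle configuration-file prompts.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Standard apt-get invocation; a real upgrade may need extra options.
UPGRADE = "apt-get update && apt-get -y upgrade"


def ssh_command(host, remote_command):
    """Argument list for running a command on a host as root over ssh."""
    return ["ssh", f"root@{host}", remote_command]


def run_with_subprocess(args):
    """Run a command locally and return its exit code."""
    return subprocess.run(args, capture_output=True, text=True).returncode


def run_everywhere(hosts, remote_command, runner=run_with_subprocess, max_workers=5):
    """Run remote_command on every host in parallel and return {host: exit code}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = pool.map(lambda host: runner(ssh_command(host, remote_command)), hosts)
        return dict(zip(hosts, codes))
```

Hosts with a non-zero exit code can then be followed up individually instead of checking all 20 machines by hand.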
There is often a wide range of programs and procedures for solving a specific task. But some problems are solved completely differently when maintaining 500 computers and 11 servers than when fixing your home PC. One example is tools that let the teacher see the desktop of each student's client machine. The teacher can stop and start programs for all pupils, and prevent individual pupils from using, for example, instant messaging when it interferes with school work.
When choosing operating tools, the point is automation and simplification of operational tasks, reducing manual work to a minimum. So the motivation is simply to keep maintenance automatic. Here too, the goal is to make things easy, which can be a considerable job to achieve.
As you can see, it is not easy to set up good criteria for selecting operating tools for large installations. Above all, this is because software developers often lack experience in operating IT systems. They are used to creating new things, but creating good and relevant tools for operation requires many years of experience.
Some general operational tools have not been replaced in the last 20 years, though the products used may have been. Also, some programs may become irrelevant in a few years. One must therefore rely on training in new editions of the applications used for operation, and in upgrades and changes in user programs.
Thorough user training means that much support can be handled informally, in direct conversation between users. Training often costs as little as 1% of the total operating costs. It is well worth spending a little more on training; the effect is very positive. The same applies to proper training for ICT contacts in schools and for operators. Training ICT contacts to use simple systems for password changes, error reports, etc. will improve the quality of calls to the IT service.
In Norway, training and product practices are regulated by the Working Environment Act (§ 4-2):
- Employees and their representatives shall be kept informed of systems used in planning and carrying out the work. They shall be given the necessary training to familiarize themselves with these systems, and they shall take part in designing them.
In short, it can be advantageous to increase efforts in training, which will improve the ICT service and provide a significant cost reduction, because users and ICT contacts become more confident and better at helping each other. It should also be noted that the transition to new software can provide an opportunity to simplify some operating practices. Simplification can reduce the need for product training.
Planning the start-up of service support
A growing number of organizations see the necessity of service management. Decisions are often based on historical and political considerations rather than on the organization's current needs. It is therefore important to ensure that management commits to participating in and understanding the organization's working methods, and to go through the existing processes and compare them with the organization's needs and "best practice".
Implementing service support
Determine current situation
General guidelines for project planning
Business case for the project
Critical success factors and possible problems
Plan for communication
Project review and reporting
Evaluation of the project
Review of compliance with quality parameters
Review against key factors