As mentioned in the introduction, it is recommended to begin by establishing an office for centralized operations to allow you to manage tickets. The benefits of this come quickly and are visible, which is important for customer and user satisfaction.
After the office is up and running with a sensible workflow for tickets (user requests and troubleshooting), you move on to the biggest challenge for the organization. As a rule, this is either change management or problem management. Organizations with "cowboy" system administrators, who come up with smart ideas and implement them without much testing, often begin with change management. For organizations suffering recurring outages, problem management comes first.
Whatever you choose to start with, a certain amount of configuration management will be necessary. Managing configuration is critical to delivering the software and services for the user. Software must work as expected. In order to make beneficial changes, one must know the configuration of the different programs.
To manage configuration changes you may use a Configuration Management Database (CMDB). Few people use one database for all configurations, and you do not have to put all configurations into a single database either. It is fine to place configurations in multiple smaller and partly independent repositories. Some people, for example, store configurations and setups in version control. But even with different repositories, you may get greater benefits if you connect information from the different processes.
For users of Debian Edu, most service configurations lie within one specific directory (/etc). These may benefit from being collected and stored in a central version-controlled directory. This makes it easier to restore lost services and to set up machines again if they are reinstalled. This applies both to servers and to user laptops or workstations. As part of the backup system in Debian Edu, a backup is made of the setup directory /etc. In this sense the backup system is nothing more than a database or a version-controlled directory for configurations.
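Collecting configuration files into a central directory can be sketched as follows. This is a minimal illustration, not part of Debian Edu: the function names, the repository layout (one subdirectory per host) and the example host name are assumptions. Temporary directories stand in for /etc and for the central directory, which would afterwards be committed with a version control tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def collect_configs(source: Path, repo: Path, host: str) -> list[str]:
    """Copy new or changed regular files from source into repo/host.

    Returns the relative paths that were copied, so the caller can
    commit exactly those files to version control afterwards.
    """
    changed = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = repo / host / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # unchanged since the last collection
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        changed.append(rel.as_posix())
    return changed


# Temporary directories stand in for /etc and the central repository.
with tempfile.TemporaryDirectory() as tmp:
    etc = Path(tmp) / "etc"
    repo = Path(tmp) / "config-repo"
    (etc / "cups").mkdir(parents=True)
    (etc / "cups" / "printers.conf").write_text("# printer setup\n")
    (etc / "hostname").write_text("tjener\n")

    first_run = collect_configs(etc, repo, "tjener")   # everything is new
    second_run = collect_configs(etc, repo, "tjener")  # nothing has changed
    (etc / "hostname").write_text("tjener2\n")
    third_run = collect_configs(etc, repo, "tjener")   # one file changed
    print(first_run, second_run, third_run)
```

On a real machine some files under /etc are readable only by root, so a collection like this would have to run with sufficient privileges.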
The Service Desk is where users submit questions or report errors. At school, the ICT contact often forwards operational events to the Service Desk. There may also be requests such as setting up a new PC or installing a program.
At school the ICT contact is the link to the Service Desk. The ICT contact also responds to the most common questions. Some questions are too difficult to manage at each school and must be forwarded to the Service Desk. It is important to have good cooperation between the school ICT contact and Service Desk operators. Tasks that are too extensive or too difficult to solve locally should be passed to the Service Desk.
Users may also get direct answers from an operator at the Service Desk. All operational enquiries go to the Service Desk. Enquiries will be assigned a case number. Anyone who has registered a case will receive an e-mail confirming that the inquiry has been received. During consideration of the case, those working with it at the Service Desk may send updated status to the user.
This way, users get one point of contact, and service desk operators get an overview of all of the cases. Operations can be expected to troubleshoot across all parts of the organization. Periodically the team leader needs to go through all issues and solutions in order to prioritize debugging and to prevent re-occurrence of errors, in order to provide schools with a stable operating environment.
Incidents can be reported by phone, fax, email or web form. More urgent incidents must be prioritized. Incidents that need to be resolved quickly are usually reported by telephone. Less important events are usually reported by email, for example. A member of the support staff should be assigned to the incident and will need to ask the user questions to investigate the problem.
- Remember to be an active listener, not a passive one.
All enquiries should be logged, and an email confirmation should be sent. It is important that the user feels safe, and information about what the problem might be should be communicated to them. When the enquiry arrives at the Service Desk, a brief description of the incident should be logged. The enquiry may come from the ICT contact at the school, or from someone with an agreement to use the Service Desk. The event should be logged as soon as possible and assigned a case number. The user should get a confirmation by email that the matter has been received and assigned a case number.
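The logging steps described above (brief description, automatic case number, email confirmation, status updates) can be sketched like this. The class and field names are hypothetical, and an in-memory list stands in for real email; an actual request tracker does considerably more.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Case:
    number: int
    reporter: str        # email address of the person who reported the event
    description: str     # brief description logged by the Service Desk
    notes: list = field(default_factory=list)


class CaseLog:
    """Hypothetical in-memory stand-in for a request tracking system."""

    def __init__(self):
        self._next_number = count(1)  # case numbers are assigned automatically
        self.cases = {}
        self.outbox = []              # (recipient, message) pairs instead of real email

    def register(self, reporter, description):
        """Log a new enquiry and queue the confirmation to the reporter."""
        case = Case(next(self._next_number), reporter, description)
        self.cases[case.number] = case
        self.outbox.append(
            (reporter, f"Your enquiry has been received as case #{case.number}.")
        )
        return case

    def add_status(self, number, note):
        """Record a status note on a case and queue it for the reporter."""
        case = self.cases[number]
        case.notes.append(note)
        self.outbox.append((case.reporter, f"Case #{number}: {note}"))


log = CaseLog()
first = log.register("ict-contact@school.example", "Printer in room 12 does not print")
second = log.register("teacher@school.example", "Forgotten password")
log.add_status(first.number, "Print queue cleared, please try again")
for recipient, message in log.outbox:
    print(recipient, "->", message)
```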
Previously, enquiries were written in paper logbooks. Today software is used to record the enquiries. Such software is known as a request tracker. It is crucial for operations to log enquiries. The log is the basis for error handling, user requests, and prioritization of the various incidents. Log entries are important for preventing recurring errors. Because operational events are periodically reviewed, an assessment of fixes and priorities can be made. The log also provides a basis for improving the service by debugging services and applications based on what users perceive as problematic.
Thus the log of requests is a basic and necessary tool both for users and the service desk. There are several freely available systems for logging requests with good documentation <ref>RT Essentials: http://www.oreilly.com/catalog/rtessentials/chapter/index.html </ref>. Skolelinux Drift uses RT <ref>RT Essentials: http://www.oreilly.com/catalog/rtessentials/chapter/index.html </ref> to handle requests.
One important thing when starting up support is not to make the start too hard. Do not try to achieve everything at once. Aim instead for "quick wins" such as keeping the user informed and short response times. It is also important to clarify to whom the Service Desk should forward events if the operators cannot resolve the inquiry themselves. Support staff must also be able to see whether there are disruptions affecting the user. This makes it quick and easy to give feedback.
For the users it is important that incidents are handled. For the service office it is important that the incidents are handled correctly according to the service level agreement, and that work requested outside what is already agreed upon is handled between leaders at the school and the system administration organisation.
Tasks and roles
We recommend agreeing on what duties the school's ICT contact has and what is the responsibility of those who work at the Service Desk. Schools often have few resources compared to what is common in municipal administrations or private companies. At the same time, schools usually have many more users, and often more client machines, than the rest of the municipality.
To distribute tasks, roles must be in place. With clearly established roles it is easier to distribute both the tasks and the working capacity needed to resolve operational tasks well. Experience from municipalities and professional operations organizations shows that the following roles are common.
- ICT contact at each school. This is often a teacher with an ICT educational and/or technical background.
- Operator(s) working in the central IT service. This is a person skilled in operations.
- ICT coordinator who organises the educational use of IT, and contributes towards plans for developmental, operational and educational use. Often this is a teacher.
- ICT responsible. This is usually the principal who is responsible for IT operations.
Here is an overview of the various everyday tasks, some of which are contracted out by the municipalities.
The ICT contact's tasks at each school:
- Oversee the school's server room.
- To be the school's contact at the municipality - report errors and outages.
- Perform simple maintenance tasks such as replacing mice and keyboards, upgrading thin clients, and simple patching.
- To be the school's superuser - to advise colleagues about: the user interface, e-mail, video projectors and relevant applications.
- Participate in ICT gatherings.
- Create and administrate local users.
- Perform simple maintenance of printers.
- Create and manage email accounts.
- Perform simple commands and operations under the guidance of an ICT tutor.
- Facilitate the use of ICT in teaching.
The operator's tasks:
- Receiving incidents and service requests.
- Mentor ICT contacts by telephone and e-mail.
- If agreed, visit the school for troubleshooting defects and malfunctions on computers, printers and servers.
- Install security updates on the school's computers (servers and clients).
ICT coordinator's duties:
- Assist school management and ICT contacts in expanding technical and pedagogical ICT plans.
- Ensure that the service desk and the management get a good selection of software.
- Ensure that the schools have appropriate ICT tools for teaching, and that computers and networks are appropriate for school subjects.
- Provide advice and guidance to operational services on what the technical and pedagogical ICT requirements of the school are.
ICT responsible's duties (principal, headmaster, head of operational services):
- Make joint purchases of computer equipment and enter into joint agreements etc.
- Develop competence plans.
- Provide the schools with courses in the educational use of ICT.
- Provide operations courses.
- Negotiate contracts for operations.
- Ensure that the IT contact and the IT service have the necessary resources.
The advantage of an agreement on these tasks is that one both knows what is expected of each individual and has a good basis for planning and managing ICT services. Usually these ICT tasks are only part of the job of a teacher who also has teaching duties.
A business would often have two staff members working full time operating 100 standard client machines with 100 users. In a school, perhaps a 30% position in total, divided among several persons, operates 100 client computers used by 320 students and teachers.
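The difference in staffing becomes clearer when the figures above are expressed per full-time position. The short calculation below uses only the numbers from this paragraph; the function name and the dictionary layout are illustrative.

```python
def load_per_position(org: dict) -> tuple[float, float]:
    """Return (client machines, users) handled per full-time position."""
    return org["clients"] / org["positions"], org["users"] / org["positions"]


# Figures taken from the paragraph above.
business = {"positions": 2.0, "clients": 100, "users": 100}
school = {"positions": 0.3, "clients": 100, "users": 320}

for name, org in (("business", business), ("school", school)):
    clients, users = load_per_position(org)
    print(f"{name}: {clients:.0f} clients and {users:.0f} users per full-time position")
```

With these figures, a full-time position in the business covers 50 client machines, while the school's staffing corresponds to more than 300 client machines and more than 1000 users per full-time position.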
When the school has so few resources for operations, it is crucial to manage them well. Making agreements about the tasks can make it easier to assess whether you need additional resources, or whether expectations of IT initiatives in schools must be reduced for budgetary reasons. With a good overview of the ICT tasks in the school, IT administrators can more easily ask for an increase in resources if necessary. Increased resources may be needed to implement ICT-based exams, or for new equipment such as a whiteboard as an aid in teaching.
Expected time usage
We have created a table showing time spent on operation and maintenance. The table is based on the experiences of municipalities that centrally operate Debian Edu for 9-10 schools with 250-500 client computers. Several things are not included in the table. One must therefore set aside extra time for projects where and when schools develop ICT solutions with networks and more equipment.
<table> <tbody>
<tr class="odd"> <td align="left">Role</td> <td align="left">Operational responsibility</td> <td align="left">Time spent per school per week</td> <td align="left">Time spent in total for all schools</td> </tr>
<tr class="even"> <td align="left">Central operations manager</td> <td align="left">Monitoring, debugging and operation of 500 machines, for example 10 schools with 3,200 students and teachers.</td> <td align="left">2-3 h (50 clients)</td> <td align="left">½ position (500 clients)</td> </tr>
<tr class="odd"> <td align="left">ICT contact at each school</td> <td align="left">Oversight of equipment, simple maintenance, and reporting of incidents and requests.</td> <td align="left">3-4 h (50 clients)</td> <td align="left">1 position (10 schools / 500 clients)</td> </tr>
<tr class="even"> <td align="left">Central ICT coordinator</td> <td align="left">Assist in planning and implementation of educational and technical ICT work in the school.</td> <td align="left">1-2 h</td> <td align="left">½ position</td> </tr>
<tr class="odd"> <td align="left">ICT manager (principal)</td> <td align="left">Make joint purchases and ensure compliance with the service level agreement. Schedule updates or the development of solutions.</td> <td align="left">1 h</td> <td align="left">¼ position</td> </tr>
<tr class="even"> <td align="left">Overall for school</td> <td align="left">50 client machines (concurrent users)</td> <td align="left">6-10 h</td> <td align="left"></td> </tr>
<tr class="odd"> <td align="left">Overall for all schools</td> <td align="left">10 schools, 500 client machines (concurrent users)</td> <td align="left"></td> <td align="left">2 ¼ positions</td> </tr>
</tbody> </table>
Experience shows that the scope of work of the ICT contact is affected by the number of concurrent users. The term "concurrent users" is new to many. To illustrate with an example: a school may have 250 students but no more than 50 computers. Then a maximum of 50 students can use computers at the same time. This is far fewer than the 250 users in total who have an account on the system. It is these 50 logged-in users that create work for the IT service. The other 200 people, who are not logged in, create little extra work.
Therefore, it is common to calculate IT costs from the maximum number of concurrent users. Other calculation methods are also possible, for example when paying for proprietary software. But since Debian Edu has no license costs, the number of concurrent users is the most important factor for operating costs. Calculating costs from the number of user accounts makes little or no sense in schools.
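The example with 250 accounts and 50 computers can be written out as a small calculation. It assumes one logged-in user per client machine, which is a simplification; the function name is illustrative.

```python
def max_concurrent_users(accounts: int, client_machines: int) -> int:
    """Nobody can log in without a machine, so the smaller number is the limit.

    Assumes one logged-in user per client machine.
    """
    return min(accounts, client_machines)


accounts = 250
client_machines = 50
concurrent = max_concurrent_users(accounts, client_machines)
not_logged_in = accounts - concurrent
print(f"{concurrent} concurrent users, {not_logged_in} accounts not logged in")
```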
For users of Debian Edu the difference in cost between managing 100 and 250 user accounts is very small. There are a few exceptions. With 250 students instead of 100, a few more students may keep forgetting their passwords. Therefore, it is wise to let the teacher responsible for the class give these students a new password.
If the school has 50 client machines, the ICT contact needs less time for operational tasks than if the school has 150 clients. With more clients, the overall time spent on operations increases, but the operating time per client machine goes down somewhat.
Several municipalities have set aside 3-4 hours a week for the ICT contact's tasks at each school where 30-70 client machines are installed. The Education Department in Oslo has set aside half weekday, or 30% position to follow up 150 client machines. Experience from other municipalities suggests that a 20% position is enough to solve the tasks of a local ICT contact when a school has 160 thin or diskless clients with Debian Edu.
In addition come the costs of centralized operations, ICT management, and building up the educational use of ICT tools in school subjects. One position is probably enough for the operation of 1000 client machines. When it comes to educational support, several principals have a 50-100% position in the school for this work. This may be split into a 10-20% position as ICT contact and a 40-80% position as educational support for the teachers. Many teachers perceive IT tools in schools as something new. Some principals wish to give more backing to the educational side by making teachers more confident in using IT tools in the different subjects.
We have set up a list of tasks to complete in order to put a proper Service Desk in place.
- Put people in place in the different roles: IT manager, IT contact at each school, central operator(s) and IT coordinator for all schools. It is important to distinguish between technical operations and maintenance on the one hand and pedagogical work on the other.
- Establish a Service Desk where every school has a service agreement regulating what counts as standard operating activities and what counts as extra. It is imperative that the ICT managers (principals) take part throughout this process.
- Establish a system for request tracking. All inquiries by email get a case number. Almost all inquiries phoned in by users or IT contacts at schools also get a case number.
- Ensure that the ICT budget reflects the contribution necessary for proper operation of school computer equipment and networks. The requirement today is that the ICT systems will be used for national and local tests using ICT tools, with or without the Internet.
- As a starting point, use the standard edition of Debian Edu with the same version at all schools. From there, make the changes you want. These changes must be kept in a configuration database together with documentation of the changes made. A version control system can be used to save the changes and the documentation.
The purpose of the ICT service is to prevent disturbances such as shutdowns or reduced quality when using computer programs. Users will experience few problems with the ICT system if the ICT service has enough resources for operations, equipment and inquiries to the Service Desk. Even so, small and large errors do occur and disturb users. Then good handling of incidents is needed.
Among parachutists, near accidents are called "incidents". That is perhaps not quite the same as when something is not working in computer operations. The purpose of dealing with incidents is to restore service as quickly as possible, so that everything works normally. If something goes wrong, it must have the least possible impact on users. What counts as normal service is agreed in an operating agreement describing the service level.
Statistics on incidents are important, especially if several people work in operations. When several people work together, it is easy to lose track of all the cases. Statistics will point out problem areas that must be addressed more thoroughly than with a quick fix from the Service Desk. For example, there may be many requests to change passwords for students who have forgotten them. Then it may be wise to let the teacher change passwords for the pupils in the class.
An operational disturbance is defined as:
- an event which is not part of normal operation and causes, or can cause, an interruption or reduction in the quality of the service.
Examples of operational disturbances may be:
- the office program (OpenOffice.org) does not start
- the web browser (Firefox) crashes
- the hard drive is full
- the server is down
- unable to print
- unable to log in
- requests for information, advice or documentation
- forgotten password
The examples show some of the common operational disturbances. These are problems that make users turn to the ICT contact at the school or to the Service Desk. The IT service must prioritize which problems must be treated right away and which need more time to resolve. To prioritize which problems need more comprehensive debugging, it is important to log all inquiries about malfunctions. This gives an overview of which disruptions occur most often, which is necessary for acting on the areas with the most problems.
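Once inquiries are logged with a category, ranking the disturbances by how often they occur is a short calculation. The sketch below uses made-up log entries with categories borrowed from the examples above; the function name is illustrative.

```python
from collections import Counter

# Made-up log entries; in practice these would come from the request log.
logged_inquiries = [
    "forgotten password",
    "unable to print",
    "forgotten password",
    "the server is down",
    "unable to print",
    "forgotten password",
]


def rank_disturbances(inquiries):
    """Return (category, count) pairs, most frequent first."""
    return Counter(inquiries).most_common()


ranking = rank_disturbances(logged_inquiries)
for category, number in ranking:
    print(f"{number:3d}  {category}")
```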
We have made a short checklist to ensure that procedures and systems for good event handling are in place:
- The operator doing the debugging is the one who reports the status back to the ICT contact at the school and/or the user.
- The system for logging events must be in place, working both technically and functionally for those working with event handling in schools and at the Service Desk.
- The system for event logging must be used for nearly all operating events.
- Statistics from the event log are produced periodically. The statistics are used to implement measures that eliminate recurring problems which irritate users.
Planning and implementation
Setting up a workable system for logging events requires more than installing the system. Everyone in the operations department must use it. Those reporting errors must also receive feedback by email with a ticket number. This requires significant effort in configuring the event logging system. In addition, one must ensure basic user training for those who receive the requests.
Large and comprehensive plans are not needed to put proper event handling in place. Handling events is a completely standard task for those who work at the Service Desk or as ICT contacts at each school. Setting up a software tool for logging events may require up to a few weeks for a correct configuration. Users may also report events by email and by phone.
The user interface of the log system is relatively self-explanatory, so it does not take many hours to start using it. Through daily use of the system one becomes more and more comfortable with how to reply to the logged messages. It is crucial that everyone in the operations department uses the system for logging operational messages.
Activities when an operational disturbance occurs
To get an idea of activities done in relation to a message of an event, we use an example.
A user contacts the Service Desk with a problem. Printing does not work, the user says on the telephone. Operations logs the event right after the call is completed. The printing problem becomes a case with a case number (assigned automatically).
The operator at the Service Desk makes a quick analysis. Has the spooler stopped again, or is it something else? Could paper or toner be missing? By examining the spooler, the operator sees that it has filled up. She deletes the queue and checks whether the next job is printed.
This time the print queue fills up again. The operator contacts the school's ICT contact and asks them to check whether there is paper in the printer. This is noted in the event log. The ICT contact replies that paper has been refilled and that printing is back to normal. The case is closed, which is also noted in the event log.
If the printer had not started again, toner might have been missing, or the printer might have had a fault. Had it been a fault, the operator would have had to escalate the problem. Escalating means that someone other than the operator or the ICT contact resolves the problem. In this example one would need help from a technician who repairs printers.
This example shows that a whole apparatus is set in motion to get a printer started again. A printer may still not work even after paper has been added to an empty printer, so one must then examine whether toner is missing. If everything seems to be in place but things still do not work, you must escalate the problem. The operations department calls in an expert in the particular area to fix the error. This time it was a service technician for printers.
What was wrong and how it was fixed are noted in the event logging system.
A variety of roles are involved when the ICT service processes a report that something does not work. In the example above, the school's ICT contact and the operator cooperate to solve the printing problem. Had the issue been greater, they would have had to summon a service technician. If the printer cannot be fixed, a new one has to be bought. If the school must obtain a new printer, the ICT managers may need to be involved to get the money. In many places it is the principal who has the last word.
In short, many people quickly become involved when something does not work. As a rule, problems should be solved there and then, without involving many people who cannot help solve them. Escalating problems that can be solved locally quickly becomes more costly, not least because many inquiries are easy to deal with there and then. Other requests involve more complex problems, and then more people must be involved. If additional or external help is needed to solve the problem, this must as a main rule be clarified with the operations manager. The important thing is to be deliberate when handling operating events and to use resources well.
We have set up some key measurement points for handling incidents. They help in assessing whether a good job is being done against measurable and well-defined requirements. The measurement points are:
- Total number of operating disturbances.
- Average time from receiving an inquiry until the issue is resolved, classified by codes (a well-organized operations department has codes for types of events and errors).
- Percentage of incidents handled within agreed response time (as agreed in the service level agreement).
- Average cost per event.
- Percentage of incidents solved by the Service Desk without passing them on to the next level of operational support.
- Events per client machine (workplace).
- Number and percentage of incidents solved by the operations center without the need for a visit to the school.
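Most of the measurement points above can be computed directly from the event log. The following is a minimal sketch with made-up incidents; the field names, the agreed response time of four hours and the 50 client machines are assumptions for illustration, and average cost is left out because it needs cost data the log may not contain.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    time_to_resolve: timedelta
    escalated: bool      # passed on beyond the Service Desk
    school_visit: bool   # someone had to travel to the school


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole, or 0.0 for an empty log."""
    return 100.0 * part / whole if whole else 0.0


def incident_metrics(incidents, agreed_time: timedelta, client_machines: int) -> dict:
    """Compute the measurement points that need only the event log."""
    total = len(incidents)
    total_time = sum((i.time_to_resolve for i in incidents), timedelta())
    return {
        "total": total,
        "average_time": total_time / total if total else timedelta(),
        "pct_within_agreed_time": percentage(
            sum(i.time_to_resolve <= agreed_time for i in incidents), total
        ),
        "pct_without_escalation": percentage(
            sum(not i.escalated for i in incidents), total
        ),
        "pct_without_visit": percentage(
            sum(not i.school_visit for i in incidents), total
        ),
        "incidents_per_client": total / client_machines,
    }


# Made-up incidents for illustration.
incidents = [
    Incident(timedelta(hours=1), escalated=False, school_visit=False),
    Incident(timedelta(hours=3), escalated=False, school_visit=True),
    Incident(timedelta(hours=8), escalated=True, school_visit=True),
    Incident(timedelta(hours=4), escalated=False, school_visit=False),
]
metrics = incident_metrics(incidents, agreed_time=timedelta(hours=4), client_machines=50)
for name, value in metrics.items():
    print(f"{name}: {value}")
```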
A number of tools can make it easier to handle operational disturbances.
- Automatic logging
- Automatic routing of events to the right persons
- Automatic retrieving of data from the database for configuration management
- Phone and email are easy to use together with tools for registering requests and incidents.
Problem management is an "investigative" process. Known errors are most often handled directly by the Service Desk. This is the most common form of event handling. With unknown errors, one must investigate what is wrong. This form of debugging requires both common sense and a good nose. Good operations people use this instinct to go straight to the problem, find the solution and restore service as quickly as possible so that everything works normally.
Problem management consists of:
- Problem control
- Checking errors
- Proactive control to prevent problems
- Identify error patterns, using information from for example event management
- Identify problems
- Classify problems
- Examine/research problems
- Identify and register known errors
- Find temporary solutions if possible
- Contacting those with responsibility for Change Management to remove the error permanently
- Identify and solve problems and errors before the incident is reported by users.
- Using logs, information from event handling to see how problems may arise
Procedures for problem management
We have attached a comprehensive collection of problem solutions and recipes for configuration. During the summer of 2006 this will also be published on the Internet. The recipes will be maintained by professional operators at schools, municipal IT services and private operations providers. To make it easy to improve the documentation, all of it has been placed in a wiki hosted on a Skolelinux server.
The Wiki technology has proven to be a great success for maintaining cataloged information on the Internet. It's easy to contribute and all changes are logged. It is also possible to import OpenOffice.org documents, and export recipes as pdf.
The resources spent on IT systems in schools must be handled in a financially prudent manner. To do so, you must have control of the services used and of the equipment, or infrastructure as it is often called. The equipment, software and services have a whole range of settings. These are the configurations: a logical model of how infrastructure and services are set up.
To control the configurations, they must be identified, saved and maintained. One must also be able to keep track of different versions of the configurations. We call each part of a setup a Configuration Item (CI). A configuration file may, for example, ensure that certain users have access to a few printers in the network. Another can make sure you get a buffer on diskless clients.
An updated database for configuration management is essential to ensure rapid and controlled treatment of operational disturbances, or wanted changes in the layout of machines, programs or services.
It takes planning to establish a database for configuration management. One must decide in which areas to use the system, and settle the objectives, policies and processes for storage and maintenance of configurations. The main activities are:
- Identify and select a structure for configurations on the important parts of the ICT infrastructure. It also applies to owners of the configuration, name tags (attributes), dependencies, and relations between configurations.
- Control configurations so that only approved configurations are kept in the database throughout the lifetime of the system. Access to the configurations can be controlled with group permissions. Approval can be handled through the Change Management process.
- Status logging. Keep track of the condition and status of the various subsystems. This applies throughout the lifetime of the service, software or hardware. A configuration may, for example, be in production, disconnected or discontinued.
- Checking and revision. Each configuration must be checked to confirm that the correct information is stored in the configuration database (CMDB). This is followed up with periodic reviews to ensure that the database is kept up to date.
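The elements listed above (owner, attributes, dependencies, status and status logging) can be sketched as a small data model. The class names, field names and example items are hypothetical, and a real CMDB would store this in a database or in version control instead of in memory.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PRODUCTION = "in production"
    DISCONNECTED = "disconnected"
    DISCONTINUED = "discontinued"


@dataclass
class ConfigurationItem:
    name: str
    owner: str
    attributes: dict = field(default_factory=dict)
    depends_on: list = field(default_factory=list)  # names of other items
    status: Status = Status.IN_PRODUCTION
    status_log: list = field(default_factory=list)

    def set_status(self, new_status: Status, approved_by: str) -> None:
        """Change status and record who approved it (status logging)."""
        self.status_log.append((self.status, new_status, approved_by))
        self.status = new_status


def dependents(items: dict, name: str) -> list:
    """Names of the items that directly depend on the named item."""
    return sorted(ci.name for ci in items.values() if name in ci.depends_on)


# Hypothetical example items.
cmdb = {
    ci.name: ci
    for ci in (
        ConfigurationItem("network", owner="operations"),
        ConfigurationItem("print-service", owner="operations", depends_on=["network"]),
        ConfigurationItem(
            "printer-room-12",
            owner="ICT contact",
            attributes={"location": "room 12"},
            depends_on=["print-service"],
        ),
    )
}

cmdb["printer-room-12"].set_status(Status.DISCONNECTED, approved_by="operations manager")
print(dependents(cmdb, "print-service"))
print(cmdb["printer-room-12"].status.value)
```

Recording dependencies makes it possible to see which other items may be affected before a configuration is changed.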
As we see, a good deal of planning is needed for good management of the configurations of the ICT system. The purpose of planning this as part of ICT operations is to ensure that systems are quickly back in service when they go down. With good track of the configurations, it is easy to replace a defective machine with a new one. The configurations can be quickly transferred to the new computer, and the ICT system is perceived to work just as well as before the old machine failed.
Management of Configuration Items (CI)
A configuration item is a part of the infrastructure. It is normally the configuration of a service or a program. Sometimes users want to change how a service works. One needs to keep track of the configurations when changes are made.
To make this concrete, consider the configuration of the print server. You want to add a new printer to the computer network and add it to the printing system CUPS. When you change the configuration through a web application or via the configuration tools in KDE, the CUPS configuration file changes, and you must restart the print server. This can be done with the KDE tools or through a web application. The modified setup file is then copied to a directory where it can be handled by a version control system.
Of the many possible choices for a service, a few are common: whether it should run, stop, terminate, start, be interrupted or be taken out of service.
One should be cautious in changing configurations without a proper plan. It is easy to forget what you have done on a server or a PC. Therefore it is important to document the changes made in a change log.
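A change log does not need to be elaborate. As a minimal sketch, each change can be appended as one line of JSON to a file that is kept together with the configurations. The file name, the field names and the example entries are assumptions for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_change(log_file: Path, host: str, item: str, description: str, author: str) -> dict:
    """Append one change entry to the log and return it."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "host": host,
        "item": item,
        "description": description,
        "author": author,
    }
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_changes(log_file: Path) -> list:
    """Return all logged changes, oldest first."""
    with log_file.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# A temporary directory stands in for the place where the log is kept.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "changelog.jsonl"
    record_change(log_file, "tjener", "print service", "Added printer in room 12", "operator")
    record_change(log_file, "tjener", "print service", "Restarted print service", "operator")
    changes = read_changes(log_file)
    for change in changes:
        print(change["time"], change["host"], change["item"], change["description"])
```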
Planning and installation
The configuration of the computer network is connected to the architecture. Much of the planning is already done in Debian Edu. It may take 3 to 4 weeks to set up servers with a corresponding service level using Windows Server, Red Hat or other GNU/Linux distributions. With Debian Edu this takes 1-2 hours. If you want a fixed IP address for the network, a professional spends an extra ½ hour on this. This is because web services are set up with reusable names.
What must then be planned is which additional user programs to use, and which subsystems should interact with Debian Edu. The school may, for example, have an electronic whiteboard.
We have made a list of activities and solutions that need to be in place if you are to have good management of the configurations.
- Establish a version-controlled area for saving configurations for all servers and selected workstations and laptops. The version control system Subversion can be used for this. Remember to take a daily backup of the area, and make sure that all changes to configurations are saved.
- Use an electronic system for keeping recipes that explain the configurations of different types of machines, the network and services. Such recipes allow others who help with or take over operations to read up on what has been done. A wiki can be suitable for this.
- Use one specific version of the operating system and software on all machines. This avoids having to maintain many different versions of the software. Ensure that the software is well tested. It may therefore be wise to wait 6-12 months before adopting the latest edition of a program.
Relations to other processes
Management of configurations is closely connected with the handling of problems and with the availability of the systems. If printing stops too often, a change of configuration may solve the problem. It may, for example, be to establish a routine for deleting the print queue and restarting the print service.
The aim of the changes you make to the configurations is usually to increase the availability of services or programs. It may also be to restrict access to certain programs or services to specific times. To achieve this, one must reconfigure the service. In addition, it may cost money beyond what was agreed as the service level or capacity of the system.
The examples show that managing configurations touches a number of other areas. There is therefore much to gain from putting good practices in place for managing changes to configurations. Automation is also advisable if you want greater stability, or access to certain services in specific periods.
Tools for configuration management
As mentioned in the checklist, one may use:
- A version control system, for example Subversion, for saving the configuration files.
- A wiki for storing documentation of setups and how-to guides.
- A common directory of operations documentation on the Internet, maintained by those who centrally operate Debian Edu in many schools.
Many ICT services are not good at handling changes in ICT systems, which leads to many disgruntled users. Surveys in the public sector in Denmark show that operating costs go down when changes are well controlled. It therefore pays to involve users through training and participation in the changes that are made.
Handling change requests depends entirely on proper processes. This applies whether the changes are small or big. It is therefore important to have the right people in place when making changes, both to give training and to answer questions. This becomes especially important when adopting new releases of software and services, regardless of whether one uses free or proprietary software.
Change Management should ensure that all changes are made in a standardized and correct manner. It is important to anchor the decision about a change at the appropriate level in the organization. Standard changes can often be pre-approved once they have been done a few times. Major changes, however, will often require a decision at a higher level, between school management and the operator.
The reason management should be involved is that an upgrade will often require user training. It may be an upgrade to a new browser or a new version of the office software. This can quickly lead to half a day of training in what is new in a program. Such changes must be agreed with management. The changes must also be made without other parts of the system ceasing to work.
Those responsible for approving changes receive a so-called change message or RFC (Request For Change). With an RFC in hand you can assess whether the change should be performed. You often have to clarify with management whether optional changes should be made and, if so, when.
When making changes, one must also cooperate with the school's ICT coordinator. One must ensure that changes occur when they fit the school's plans. Implementing significant changes without Change Management can lead to much dissatisfaction and additional inquiries to the Service Desk. This creates significant unplanned extra work. It may also lead to a change that soon has to be rolled back. You quickly get twice as much work and end up back where you started. Had the necessary approvals been obtained, the change could have been made in a planned and straightforward manner.
Change Management is done to avoid unnecessary extra work. Making changes obviously requires work, but planned changes cause less extra work. One also avoids having to roll back changes because problems arise when users are unprepared for substantial changes.
When you, for example, update the entire system to a new version, make sure that everyone is informed. Find out whether those affected by the change need training. The right professionals must prepare everything so that there are no surprises.
All responsibility must not land on the release manager, the person responsible for managing versions of software. Release handling is a process that should preferably deal with changes that consist of many minor changes. This usually happens when rolling out new systems and services, or when upgrading the entire system to a new version.
- Look at the change message, or RFC (Request For Change), described above, and check that it has been given a unique number.
- Prioritize and categorize the changes.
- Weed out changes that are not possible. This can be done by marking them as not possible.
- Give feedback to the person who submitted the change message.
- Make sure you have a Change Advisory Board where the change is handled, discussed and evaluated. This advisory group can consist of selected ICT contacts and operations personnel with long experience.
- Coordinate changes with Release Management, which handles the different versions of applications and services.
- Review and close the change message (RFC).
- Remember to save modified configurations in the repository for configuration files.
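The checklist above can be supported by a very small record per change message. The sketch below is only an illustration in Python, not a description of any particular ticket system. The class ChangeRequest, the Status values, the priority scale (1 is most urgent) and the categories "standard" and "major" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count

_next_number = count(1)  # source of unique RFC numbers (assumed scheme)


class Status(Enum):
    RECEIVED = "received"
    NOT_POSSIBLE = "not possible"
    APPROVED = "approved"
    CLOSED = "closed"


@dataclass
class ChangeRequest:
    summary: str
    submitter: str
    priority: int = 3            # assumed scale: 1 is most urgent
    category: str = "standard"   # assumed categories: "standard" or "major"
    status: Status = Status.RECEIVED
    feedback: list[str] = field(default_factory=list)
    number: int = field(default_factory=lambda: next(_next_number), init=False)

    def mark_not_possible(self, reason: str) -> None:
        """Weed out a change and record the feedback for the submitter."""
        self.status = Status.NOT_POSSIBLE
        self.feedback.append(reason)


def agenda(requests: list[ChangeRequest]) -> list[ChangeRequest]:
    """Requests still waiting for the advisory board, most urgent first."""
    waiting = [r for r in requests if r.status is Status.RECEIVED]
    return sorted(waiting, key=lambda r: (r.priority, r.number))
```

In a small organization a spreadsheet or a request tracker does the same job. The point is that every change message has a number, a priority, a category and a status that can be reported back to the submitter.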
Even what looks like a small, insignificant change message can have major consequences if the change is implemented. We have examples of schools with a stable Debian Edu network where all the programs work. A test version of a popular program is then installed, it crashes constantly, and Debian Edu gets blamed.
One example is schools that installed the test version of the latest OpenOffice.org before the program was finished. Several people thought it would be fun to try it out. The problem is that test editions are usually released in order to find errors and instability in applications. They are not intended for production use.
In production, the general rule is that you don't install test versions of software. Most operators recommend using the next-to-latest version of a program intended for production. After 6-12 months the worst errors have usually been weeded out of a new major version of an application.
This means one often waits until summer before updating to a program that was released just before New Year. This fits well with the school year. The alternative may be instability and irritated users. The advisory group therefore plays a key role when small or large changes are made.
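The waiting rule above is simple date arithmetic. As a minimal sketch in Python, assuming a waiting period of six whole calendar months as the lower bound, the check could look like this. The function names and the default of six months are choices made for the example.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not complete yet
    return months


def ready_for_production(released: date, today: date, min_months: int = 6) -> bool:
    """True once a release has been out for at least min_months."""
    return months_between(released, today) >= min_months
```

A program released on 20 December would, with these assumptions, become a candidate for rollout from 20 June, which matches the summer upgrade described above.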
Release handling covers the management and planning activities that prepare for wanted changes. The changes can be small or large, and large changes can consist of many smaller ones. Release management takes place before the actual job of installing software and hardware into production begins.
First, the planning and testing of new releases are carried out. Then everything is rolled out into production. Deployment is part of infrastructure management. The procedure is to implement what has been planned, tested and made ready within the systems for Configuration Management. Once everything is planned and tested and the configurations are stored, the solution is rolled out in production.
Usually, many service providers and suppliers are involved. This applies to the procurement of machines, the software used, and the recommended configurations. Good resource planning is crucial for packaging and distributing a new release in a way that works well for users. Sloppiness in this area can result in equipment that doesn't work, or that is left unused because of deficiencies in the installation.
Release Management takes a comprehensive view of a change to a service and ensures that all parts of a release are seen in context. This applies to both technical and non-technical factors.
As you can see, release handling is fundamental for computers, software and networks to work as planned. Proper handling of releases prevents disruptions. With new releases or changes, operations are expected to continue as normal without interruption or reduction in quality.
Handling changes or new releases can be compared to building a new road. Cars must still get past even while a new road is built on top of the old one. Good signage must be in place. One must also have the necessary resources to rebuild the road. If the resources to make changes are missing, it is perfectly fine to leave things as they are.
Some may find proper release management boring. You do not get the latest version every time something new comes out. But the operations department often has no spare time to handle a flood of complaints when new software fails. High uptime requires established technology, Linux expert David Elboth states in Linux Magazine (1/2004). He writes:
- The higher the requirements, the more stringent the requirements for the individual components. High requirements for uptime also mean that the choices you are left with are old technology. It is, after all, empirical data over time that can say something about downtime. We have all noticed how far behind Red Hat and SuSE are with their server products.
Getting few complaints, with a stable and reliable environment, requires solid release handling. The alternative is a pile of complaints and dissatisfied users when insufficiently tested cutting-edge software is installed. People with "boy's room skills" tend to underestimate the consequences of a software upgrade. That something works fine on your home computer does not mean it will work in a large network with 500 client computers and 3200 users.
Central program archives (DSL)
In an operational context, the program archive is the collection of original editions of the software versions that are in production. If you use Skolelinux 2.0, this is the program archive. In the computer world, the term program archive is used in different contexts, especially in programming. When it comes to operations, we are talking about the original software, as composed for a particular version, that is the base for the installation.
With free software, the program archive may be Skolelinux 2.0 plus the extra programs you have added from various sources. These may be certain versions of Macromedia Flash, Java and decoders that make it possible to run national tests in the browser, or to watch broadcasts from NRK.
If you plan to upgrade to the next version of Debian Edu when it arrives, the new version will become the main program archive. Here too, all additional applications beyond the new Debian Edu will be part of the archive.
Setup files adjusted or created locally by the operations department are not included in the main program archive. Configurations are saved in a separate version-controlled directory or database.
Database for configurations and hardware
As mentioned in the chapter on configuration management, you must create a database or a version-controlled directory to take care of the setup files. One must also keep track of all computers: what kinds of machines are involved, their performance, and the unique hardware addresses of the network cards (MAC addresses).
There are many reasons to keep an overview of the equipment. One of the main reasons is to keep track of how many machines are in operation, how many are not in use and how many are in repair. Another reason is the planning of upgrades. It is both the amount of.............?
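The hardware overview described above can be kept in a spreadsheet, but the same bookkeeping is easy to sketch in code. The following Python example is an illustration only. The Machine record, its fields and the three status texts are assumptions taken from the paragraph above, not an existing Debian Edu tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    mac: str
    model: str
    location: str
    status: str  # assumed values: "in operation", "not in use", "in repair"


def normalize_mac(mac: str) -> str:
    """Return the lower-case, colon-separated form of a MAC address."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def status_summary(machines: list[Machine]) -> Counter:
    """Count machines per status."""
    return Counter(m.status for m in machines)


def find_duplicates(machines: list[Machine]) -> list[str]:
    """MAC addresses that are registered more than once."""
    seen = Counter(normalize_mac(m.mac) for m in machines)
    return sorted(mac for mac, n in seen.items() if n > 1)
```

Normalizing the MAC address before counting means that the same network card written in two different styles is still recognized as one machine.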
A variety of applications in addition to the browser and office suite are being installed in schools. Educational programs, browser add-ons and multimedia programs are needed. The systems also have network setups and changed settings in specific programs. If you have many servers and perhaps thousands of clients, the need for effective deployment tools quickly becomes apparent. Such tools are standard in Debian Edu.
Construction management is about getting the required software packages, services and proper settings installed, both for individual programs and for the data network. Many people have heard about so-called "images". One installs the operating system and all the programs needed, and sets up the network. An imaging program is then used to make a copy of what is installed on the hard drive. This copy is then transferred to the other computers.
It is not necessary to build so-called "images", or disk images. Debian Edu is based on Debian, which has an excellent package management system. There is no need whatsoever to compile applications, as they come prebuilt and can be installed directly from the Internet. What one must have in order are the wanted changes to the default setup of Debian Edu or the main program archive in use. You then write one or more scripts that run on the different machines to get everything installed and set up.
In most situations, scripting is an easy way to "build" and roll out programs and setups. But there are situations where building disk images may be the solution, for example when installing on many laptops.
As we see, handling the construction process is about facilitating deployment on many computers. In exceptional cases, it means building a tailor-made Debian package. But in most situations all the packaging is already done. You then have to put in place a script that installs additional programs and certain settings. One can also create disk images if there are many similar machines, such as laptops for all students.
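A script of the kind described above can be very small. The sketch below, in Python, only builds the apt-get command for a given type of machine and does not run it. The mapping EXTRA_PACKAGES and the roles in it are hypothetical, and the package names are merely examples of what a school might add.

```python
import shlex

# Hypothetical mapping from machine role to extra packages; adjust to local needs.
EXTRA_PACKAGES = {
    "workstation": ["gcompris", "stellarium"],
    "laptop": ["gcompris"],
}


def install_command(role: str) -> list[str]:
    """Build the apt-get command that installs the extra packages for a role."""
    packages = EXTRA_PACKAGES[role]
    return ["apt-get", "install", "--yes", *packages]


def as_shell_line(command: list[str]) -> str:
    """Render a command list as one safely quoted shell line."""
    return shlex.join(command)
```

Keeping the list of extra packages in one place, under version control together with the configuration files, makes it clear what differs from the default Debian Edu installation.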
It is essential to test new applications, configurations, and new services before they are put into production. Several schools have experienced instability because they installed software without making the necessary adjustments. It is therefore crucial to test changes to configurations or new versions of software before the change is made on all machines.
Testing generally takes place in three steps.
- First, install the changes on a test network. This is technical testing that verifies that everything connects and works in a system without users. Keep all changes to the configuration files.
- When you are sure that everything works on the technical side, try installing the solution at one school. It is very important to agree on the testing with the school's ICT contact. Users must also be fully briefed on the changes that come with the testing. Keep the adjustments to the setup files that are made along the way in response to the incident reports that arrive.
- When you are sure everything works, you can roll out the solution to all schools. The easiest way is to create a script that simplifies the upgrading of software packages, services and configurations.
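The three steps above can be expressed as a staged rollout that stops as soon as a stage fails verification. The following Python sketch is an assumption about how such a script could be organized. The function staged_rollout and the deploy and verify callbacks are hypothetical, and what "verify" means in practice must be decided locally.

```python
from typing import Callable


def staged_rollout(
    stages: list[tuple[str, list[str]]],
    deploy: Callable[[str], None],
    verify: Callable[[str], bool],
) -> list[str]:
    """Deploy stage by stage and stop at the first stage that fails verification.

    Returns the names of the stages that were completed successfully.
    """
    completed = []
    for name, targets in stages:
        for target in targets:
            deploy(target)
        if not all(verify(target) for target in targets):
            break  # do not continue to the next, larger stage
        completed.append(name)
    return completed
```

With the stages "test network", "pilot school" and "all schools", a failure at the pilot school means the change never reaches the remaining schools.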
Fallback solution
Much can go wrong during a new installation or upgrade. One must therefore have a fallback solution ready. This means that the system can quickly be used as it was before the upgrade. In technical terms, this is called rollback.
When rolling back, it is absolutely essential to have the previous version of the software archive and the configuration files ready. This means that you can install, for example, Edu 1.0 in under an hour and put the appropriate configuration files in place.
But rolling back takes time. It can therefore be wise to have a server ready with the previous edition of the software, the right configurations, and the users' home directories. This server can quickly replace the machines that were upgraded but did not work as planned. By keeping one or more servers in reserve, one can ensure high availability even if something goes wrong.
Benefits and possible problems
The benefit of having an archive of the software that is in production should not be underestimated. Many rely on keeping the software on its respective CDs and the occasional DVD. This makes for inefficient distribution. To save time and trouble, all the software in Skolelinux is available on the Internet.
The operations department can make a copy of the Skolelinux archive on a central server. From there, all the software can be installed quickly and easily on the other machines. The benefit is that the IT service always knows which versions of the software it has made available to the schools. It also prevents the installation of software that has not been assessed by the steering group for changes.
Considerable problems can arise if the program archive and the configurations are not maintained. A mistake may also be made in a configuration file or a software package, and it is then rolled out to all machines. In addition, individual schools may install poorly tested software or beta programs and put them into production. One therefore needs good processes, and someone who is held responsible for maintaining the program archive and the configurations.
Does it take a lot of extra effort to install and maintain services and software already in use? If you choose to do without the tools that help manage upgrades, you give yourself a lot of extra work. The ICT service must then spend a lot of time on manual installation on each machine. The danger of making mistakes increases. When things do not work you get disgruntled users, and much time is spent on debugging.
Many who operate major IT systems have inadequate plans for changes. Some have no plans at all and simply install new versions of software. Changes can be perceived as problematic by some users, because functions they are comfortable with move to a different place in the user interface. For operations, things can go completely wrong. For example, when Arendal municipality was to upgrade from an older version of Windows to a newer one, most things stopped working. The ICT service said they had several computer programs that were held together with "wire and tape". It took half a year to clean it up.
Planning and implementation
The reason for planning before making changes is to prevent weeks or even months of extra problems. Even if some extra time is spent on planning, it is quickly earned back because extra problems are avoided. There will always be people who say they have had no problems with ad hoc changes to the systems. But closer investigation shows that there are problems after changes, and that inquiries about them were not passed on.
In our view, ad hoc solutions are only a detour when making changes, and only an emergency measure. An ad hoc solution can be compared to a temporary repair with "wire and tape". In the long run such solutions have to be cleaned up if one wants stable operations without constant surprises. Skipping the planning phase leads to many more ad hoc solutions, and more operational problems during changes or upgrades. It is therefore crucial that professionals and management understand the value of a good planning process.
We therefore recommend calling a planning meeting and drawing up a step-by-step plan when changing the system. A step-by-step plan will of course vary with what is to be changed. Upgrading the office program OpenOffice.org is something other than upgrading the whole system. When upgrading to a new office program, a 2-3 hour walk-through of the office suite for the teachers at each school may be enough. When the whole system is to be upgraded, one must ensure both user training and that the technical side works as intended.
The main point is that there are few shortcuts when it comes to planning and implementation. Studies show that those who plan properly and make sure people have the right skills have lower operating costs.
It is crucial to plan new releases. Most changes to the system must be cleared with management. The following list of activities is intended as support for upgrades during the planning and implementation phase.
<table> <tbody> <tr class="odd"> <td align="left">Tasks </td> <td align="left">Details </td> </tr> <tr class="even"> <td align="left">Prioritizing the release: </td> <td align="left">Check that the necessary decisions have been made before a change or upgrade is rolled out. </td> </tr> <tr class="odd"> <td align="left">Central program archive </td> <td align="left">Make sure that the software packages to be installed are in place in the central program archive. </td> </tr> <tr class="even"> <td align="left">Configuration database </td> <td align="left">Make sure all setup files are in place. This applies both to those in use and to the new ones that come with the systems being changed or updated. </td> </tr> <tr class="odd"> <td align="left">Construction management </td> <td align="left">All scripts and systems used for rollout or for making disk images must be in place. </td> </tr> <tr class="even"> <td align="left">Testing </td> <td align="left">First run trials on test equipment. When this works without problems, it can be tried out at one school. The school must be fully informed about, and have agreed to, trying out new software. When you are sure that everything works, you can upgrade everyone. </td> </tr> <tr class="odd"> <td align="left">Fallback solution </td> <td align="left">Even with extensive testing, new releases can go wrong. It is therefore crucial to have a fallback solution. The simplest fallback is to keep the old installation, with data, on a separate server. Such a machine can be plugged in if the change or upgrade does not work. </td> </tr> </tbody> </table>
As the list of activities shows, several tools are needed to keep track of the different releases of software, services and hardware in the system. Some of these tools have been mentioned earlier, but we repeat them here:
- Debian tools for the central program archive
- Database for configurations and hardware (Subversion for setup files, a spreadsheet listing all hardware with its physical location)
- System for construction management
- Hardware for testing and for the fallback solution
Relations to other processes
Release management goes right to the core of the IT service. It is about carrying out wanted security updates, changes to services, or upgrades of computer programs. Requests for new releases may stem from operational problems or from a wish for new software. Before a new release, an assessment has been made of whether the change is desirable.
If the change is acceptable, the necessary changes to configurations are made and software packages are prepared for rollout. This will have been tested, and fallback solutions will be in place. Once the changes have been made, parts of the operational activity may have to be reorganized. It is thus easy to see that change management affects all parts of service support.
Tools for service support
The first question to ask yourself is: "Do we really need software tools?" If tools are needed, it is crucial to examine the alternatives thoroughly.
If you pick up a glossy brochure and listen to the sales talk, you are completely dependent on such tools. But good people, good process descriptions, and good procedures and work instructions are the foundation of good service management. The need for tools, and how complex they must be, depends on the organization's need for computer systems and on its size.
In a small organization, a simple, freely available database will be enough for logging and managing incidents (request tracker). In larger organizations, however, there will almost certainly be a need for sophisticated, distributed and integrated tools for service management. This means that all processes are linked to one system for incident handling.
Although tools can be important, they are not important in themselves. The starting point is the tasks and processes that have to be carried out, and the information that is needed. This provides the information required to specify which tools are best suited to support operations. Here are some reasons why software may be used for service support and service management:
- increased demands from users
- lack of IT knowledge
- the organization depends entirely on the quality of the service
- integration of systems from several suppliers
- increased complexity of the IT infrastructure
- the emergence of international standards
- increased scope of, and changes within, IT
Automated tools allow:
- centralization of key functions
- automation of functions in service delivery
- analysis of data
- identification of trends
- implementation of preventive measures
In this chapter we have proposed a number of tools for improving service support. Here is a summary of the tools:
- Debian tools for the central program archive
- Database for configurations and hardware (Subversion for setup files, a spreadsheet listing all hardware with its physical location)
- System for construction management
- Hardware for testing and for the fallback solution
- Incident logging (Request Tracker)
- System for monitoring (Munin)
As the operations department gains more experience with systematic operations, more types of tools will be made or acquired.
Evaluation criteria when choosing tools
Although large sums have been spent on developing evaluation criteria for software, nothing exists beyond experience-based guidelines. There are no final answers as to what is good or less good software. As with much else, part of it is a matter of taste. Several solutions may do the same job equally well but be designed quite differently. There are, however, some rules of thumb that can be useful to keep in mind.
The most important evaluation criterion is whether the job needs doing at all. Many IT tools are perfect and solve their tasks flawlessly, but they solve tasks that nobody needs solved. So the most important criterion is whether the right problem is being solved, and whether anything needs to be done at all.
- So the first question to ask is whether the tool is wanted.
If it turns out that a task does need solving, the solution may be so simple that one might as well run a few commands by hand. The simplest is often the best. But when there are many machines to operate, automation becomes crucial. Logging in to 20 similar servers to carry out a security upgrade is too much work. That is where automation comes in.
- So here one must ask whether the tool is useful for solving the task.
- Then one must ask whether the tool is usable.
There is often a large selection of programs and methods for solving a particular task. But some problems are solved quite differently when maintaining 500 computers and 11 servers than when fixing the home PC. One example is a tool that lets the teacher see the desktop of each pupil on the teacher's own client machine. The teacher can stop and start programs for all pupils, and prevent individual pupils from using, for example, instant messaging when it disturbs the schoolwork.
Choosing operations tools is about automating and simplifying operational tasks. The aim is to reduce manual work to a minimum, so that only the automation itself has to be maintained. Here too it is about making things simple, which can take considerable work to achieve.
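As an illustration of such automation, the sketch below runs the same command on a list of servers instead of logging in to each one by hand. It is written in Python and assumes that the standard ssh client is installed and that key-based login to the servers is already set up. The function names and the idea of passing in a replaceable runner are choices made for this example only.

```python
import subprocess
from typing import Callable, Iterable


def run_over_ssh(host: str, command: str) -> bool:
    """Run a command on one host with the ssh client; True if it exits with status 0."""
    result = subprocess.run(["ssh", host, command], capture_output=True, text=True)
    return result.returncode == 0


def run_everywhere(
    hosts: Iterable[str],
    command: str,
    runner: Callable[[str, str], bool] = run_over_ssh,
) -> dict[str, bool]:
    """Run the same command on every host and record which hosts succeeded."""
    return {host: runner(host, command) for host in hosts}


def failed_hosts(results: dict[str, bool]) -> list[str]:
    """Hosts that need manual follow-up."""
    return sorted(host for host, ok in results.items() if not ok)
```

The runner parameter makes it possible to try out the script with a harmless stand-in before it is pointed at production servers. Only the hosts returned by failed_hosts then need a manual login.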
As can be seen, it is far from easy to set up good criteria for choosing operations tools for large installations. Most of all, this may be because software developers often lack experience of operating IT systems. They are only familiar with making new things, and making good and relevant tools for operations requires many years of experience.
Some general operations tools have not been replaced in the last 20 years. But the products in use may have been replaced. Some programs may also be obsolete in a few years. One must therefore be prepared for training in new editions of the programs used for operations, or when user programs are upgraded or changed.
Thorough user training means that a lot of support can be handled informally, in direct conversation between users. Training often costs as little as 1% of the total operating costs. It is well worth spending a little more on training; the effect is very positive. The same applies to proper training for ICT contacts in schools and for operators. Training ICT contacts to use simple systems for password changes, error reports and so on will improve the quality of the inquiries to the IT service.
Training and product training are regulated in the Norwegian Working Environment Act (Arbeidsmiljøloven, § 4-2):
- Employees and their elected representatives shall be kept continuously informed of systems used in planning and performing the work. They shall be given the training necessary to familiarize themselves with the systems, and they shall take part in designing them.
In short, it can be advantageous to increase the effort put into training, which will improve the ICT service and significantly reduce costs. This is because users and IT contacts become more confident and better at helping each other. It should also be noted that the transition to new software can provide an opportunity to simplify some operating practices. Simplification can reduce the need for product training.
Planning when starting up service support
A steadily growing number of organizations see the need for service management. In practice, decisions are often based on historical and political considerations rather than on the current needs of the organization. It is therefore important to ensure that management commits to taking part in, and understanding, the way the organization works, and to review existing processes and compare them with the organization's needs and with "best practice".
Introducing service support
Feasibility study
Establish the current situation
General guidelines for project planning
Business case for the project
Critical success factors and possible problems
Project review and reporting
Evaluation of the project
Review to check compliance with quality parameters
Review against key factors