Service Delivery

The main purpose of service delivery is to ensure proactive operation, so that the ICT service delivers appropriate support for users. Service delivery focuses on your organization's needs. For a school, that means active learning with ICT tools in the different subjects. This chapter describes, in order: service level management, financial management, capacity management, availability management and service continuity.

Service level management

Service Level Management is often shortened to the acronym SLM; the agreement it is built around is the Service Level Agreement (SLA). Managing the service level is about the quality of the operational services, measured against what is agreed in a contract. The contract contains concrete figures for availability, response times, support, error correction and so on.

The objective is to have control over the service level and to improve the quality of the operational services. Through repeated cycles the quality level is determined, monitored and reported. The purpose is to improve the contact between ICT administrators and users, so that an ICT service of the agreed quality is delivered.

It is important to take a considered approach to the different types of SLAs. One can choose from many types of agreements. Typically three types are used.

All SLAs have to be administered, reported on and maintained. With many agreements this quickly becomes confusing and creates much work that does not provide any particular benefit. The purpose is to get an agreement that helps to improve the quality of service, so it is useful to think carefully about this when the agreement is made. Here is an overview of what is important to check when you create an agreement for service level management.

General checklist

Planning

It is essential that the operations center has the technical capability to measure the values included in the SLA. This must be taken into account from the beginning.

Furthermore, it is important to define which services depend on subcontractors, so that the operations center either cannot guarantee the service or relies on a similar agreement with the subcontractor. Dependencies are defined so that it is clear who rectifies problems, and to avoid negotiations before an error can be corrected.

The level of service may be different for different user groups, or during different periods of the school year. For example, there may be a difference between teachers and students, or a higher service quality during exams. Dialogue with all relevant users is important to ensure that what is measured is what is most relevant for each user group.

Implementation

A service catalog with all services included in the SLA must be prepared. A service will often be an application or program in this catalog. There will often be different requirements for different services, and this will be reflected in different targets in the agreement.

The importance of establishing and continually adjusting the users' expectations can't be overstated. Users often have exaggerated expectations of the system and the services included. It is the ICT service's responsibility to adjust expectations down to realistic levels before the service level agreement (SLA) is signed. The operating management must also ensure that all users are actually notified of, and know about, the expected service level in the agreement.

For the structure of the SLA, see the section on the content of the Service Level Agreement.

The operational situation

Monitoring the service levels actually achieved, and reporting back to the customer, is essential to preserve a good relationship between the Service Desk and the users. The format and level of detail of the reporting should be dealt with in the SLA.

Periodic meetings with the customer must be held, for example quarterly or semiannually. These meetings should result in concrete plans for the next period and, possibly, agreed implementation of new services.

Content of the Service Level Agreement (SLA)

Introduction

Name and contact information for the Contracting Parties, description of the services included, duration of the agreement, responsibility between customer and supplier.

Service time

The times at which the agreement applies, for example Monday to Friday 8:00 a.m. to 4:00 p.m., any special requirements for certain times (for example exams), and routines for ordering extended hours.

Availability

Access to the services. It is best measured as the time one or more services have been unavailable during a period, for example a calendar month. Different levels may be agreed for different services, for example depending on their importance for users.

It is important to emphasize that this is availability within the agreed period of service, not overall availability all day, all week and all year round (called 24/7/365). For example, it may be agreed that the system should be available between 8 and 18 on workdays; after that and on weekends it is more uncertain whether one can use the computer system, unless otherwise agreed.

Availability also covers whether one can get support via phone or email. For example, can the Service Desk be reached between 08 and 16 during the day, or all day? Is support possible in the afternoon and evening, or on specific weekends?

Stability

It is often measured as the number of episodes of downtime in a period, or as the average time between episodes of downtime. One can also measure the time it takes for the system to come up again.
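The three stability measures mentioned here can be computed from a simple outage log. The following is a minimal sketch in Python, not a prescribed method: the function name, the log format and the example dates are assumptions made for illustration, and for simplicity it counts calendar time rather than only the agreed service hours.

```python
from datetime import datetime, timedelta

def stability_metrics(outages, period_start, period_end):
    """Return (episodes, mean time between failures, mean time to restore).

    outages: list of (down_at, up_at) datetime pairs within the period.
    Assumption: the whole period counts, not only agreed service hours.
    """
    episodes = len(outages)
    downtime = sum((up - down for down, up in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    if episodes == 0:
        return 0, None, None
    return episodes, uptime / episodes, downtime / episodes

# Hypothetical log: two outages during one week.
outages = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 30)),
    (datetime(2024, 3, 6, 13, 0), datetime(2024, 3, 6, 14, 30)),
]
episodes, mtbf, mttr = stability_metrics(
    outages, datetime(2024, 3, 4), datetime(2024, 3, 11)
)
print(episodes, mtbf, mttr)
```

Which of the three figures goes into the agreement is a matter for the parties; the sketch only shows that all of them follow from the same log of outages.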

Support

Often measured as response times by phone (for example 1 minute) or email (for example 30 minutes) for requests from users. When the operator gets a request for support, the message is categorized by severity, with a time guarantee for an answer. There may also be an agreement about how fast error correction will start, which will depend on the kind of inquiry received.
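Categorizing by severity with a time guarantee can be sketched as a lookup from severity to a response deadline. The severity names and time limits below are purely hypothetical; the real values are whatever the SLA specifies. The sketch also ignores the agreed support hours, which a real calculation would have to respect.

```python
from datetime import datetime, timedelta

# Hypothetical severity levels and guarantees; real values come from the SLA.
RESPONSE_GUARANTEE = {
    "critical": timedelta(minutes=30),
    "major": timedelta(hours=4),
    "minor": timedelta(days=2),
}

def response_deadline(received_at, severity):
    """Latest time at which the user must have received an answer."""
    return received_at + RESPONSE_GUARANTEE[severity]

def met_guarantee(received_at, answered_at, severity):
    """True if the answer came within the guaranteed time."""
    return answered_at <= response_deadline(received_at, severity)

received = datetime(2024, 3, 4, 10, 0)
print(response_deadline(received, "major"))
print(met_guarantee(received, datetime(2024, 3, 4, 10, 45), "critical"))
```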

Support is also about when during the day or night one can reach people. Should support be available during school hours between 08 and 16, or should there also be support in the evening or on weekends? Some will want support on certain holidays as well.

The period when support is available is usually stated in the SLA. It is also agreed what support will assist with at a fixed price, and what must be paid for additionally on an assignment basis. The agreement regulates the process of handling inquiries: both what to fix and when this will happen.

Capacity

Can be measured as the average response time of certain operations in specific applications. This measures the user experience of the system.

Change management

Targets for the time taken to handle, approve and implement change requests from users.

Security

Can be measured as the number of confirmed security incidents in a period. It is very important to be clear about each user's responsibility, so that the guarantees apply.

Billing

Prices, times for billing and settlement provisions.

Reporting and follow-up

Description of rules and periods for reporting measured service levels. Regular meetings, for example quarterly, are recommended to go through the report and plan ahead.

Sanctions and possible incentives

Rules for price reduction if the agreed service level is not met. Escalation procedures and rules for cancellation of the agreement after continuous violations of the guaranteed service level. Possible incentives for achieving, or exceeding, the expected service.

See Appendix A for SLA.

Financial Management

Organizations rarely have a full overview of their ICT spending. A 2001 survey of Norwegian municipalities showed that only 1 in 8 municipalities had an ICT budget. The situation is probably no better for schools. Putting an ICT budget in place is important. Users often think they pay too much for a service they are not happy with, and this often creates conflicts between users and the ICT department.

It is very useful for both the operations center and users to document the real ICT costs. Without this it is difficult to budget appropriately. Not least, it is difficult to make a cost/benefit assessment of existing ICT solutions. The principal should know the ICT budget as well as she knows the salary budget or the budget for teaching aids.

There are three major key processes related to financial management of ICT services:

  1. Budgeting
  2. Accounting
  3. Billing

Budgeting

The objective of the budget is to make a realistic estimate of the expected ICT costs. Budgeting usually covers various alternative solutions. This applies both to equipment and software, and to the level of service you want to aim for. The budget is the starting point for subsequent budget negotiations with the director of education and/or politicians.

The budget must include both personnel and equipment costs. Some organisations count only the cost of buying equipment, omitting the personnel costs of operating an ICT solution, which can make up as much as 60-70 % of the costs. One must also include all of the equipment.

There are examples of municipalities forgetting to count the cost of power outlets and computer networks in schools. That leaves out about 2000 NOK (10 NOK = 0.85 GBP/1.18 EUR) per client machine. If we put in place 70 new computers, we are soon talking about 140,000 NOK for computer networks and power.
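The arithmetic above is easy to keep in a small script so that the forgotten items always appear in the budget. This is only a sketch in Python: the function names are hypothetical, and the equipment and personnel amounts in the usage line are invented figures. Only the 2000 NOK per client machine and the 70 machines come from the text.

```python
def network_and_power_cost(new_clients, cost_per_client_nok=2000):
    """Cost of network and power outlets, the item that is easily forgotten."""
    return new_clients * cost_per_client_nok

def total_budget(equipment_nok, personnel_nok, new_clients):
    """Sum of equipment, personnel and network/power costs in NOK."""
    return equipment_nok + personnel_nok + network_and_power_cost(new_clients)

print(network_and_power_cost(70))         # 140000
print(total_budget(300000, 500000, 70))   # hypothetical equipment and personnel figures
```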

Alternative solutions are also important to include in the budget. This applies both to the operation and to the equipment. Today there are several vendors who specialize in the operation of computer equipment in schools, with varying prices and quality. The number of simultaneous users, and the types of machines and software to be maintained, also matter a lot.

If you want a laptop for every teacher and student, you will easily get 5-6 times higher costs than with desktops shared by three students per client machine.

Accounting

The accounts will mainly consist of invoices for purchased equipment, cabling, repair, operation and extra services. When the accounting period is over, it is important to go through the numbers and compare them with the budget.
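Comparing the accounts with the budget amounts to a per-category difference. A minimal sketch in Python, using the invoice categories named above; the function name and all amounts are hypothetical.

```python
def budget_variance(budget, accounts):
    """Return actual minus budgeted cost per category (positive = overspent)."""
    categories = sorted(set(budget) | set(accounts))
    return {c: accounts.get(c, 0) - budget.get(c, 0) for c in categories}

# Hypothetical figures in NOK.
budget = {"equipment": 300000, "cabling": 140000, "operation": 200000}
accounts = {"equipment": 320000, "cabling": 140000, "operation": 180000,
            "extra services": 15000}

for category, diff in budget_variance(budget, accounts).items():
    print(f"{category}: {diff:+d}")
```

Categories that appear only in the accounts, such as extra services here, show up as pure overspend, which is a useful prompt to include them in next year's budget.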

Planning accounting and billing

Not all municipalities have accounts that show ICT costs broken down by school. There may be practical reasons for this, such as discounts and the like that the municipality gets centrally. Therefore it is important to do some planning, so that you have an overview of the costs of operation and procurement when the accounts are to be assessed against the budget.

Some organizations have cumbersome and costly accounting procedures. Extra charges quickly accrue when bills are paid late, or when many people have to approve a payment. It is important to agree on good billing practices for procurement and operation, both to keep control and to handle payments on time without long decision paths.

Implementation

The payment method is regulated by the SLA. One must agree with the finance department on a convenient way to get reports from the accounts, so that the necessary overview of ICT costs can be produced without it taking a long time.

Daily operation

With contracts one will usually have monthly billing consisting of a fixed amount plus any additional services. Billing is done by the accounting office based on the current operating agreements and the extra services performed. It is important to have good and frequent contact with the accounting office about the tasks carried out for the customer.

Capacity Management

Capacity planning is used to ensure that all parts of the ICT solution have sufficient capacity to meet users' requirements. This includes:

Capacity planning is all about balance:

The objective of capacity planning is to avoid surprises.

Monitoring

It is essential for good capacity planning that the systems are continuously monitored to obtain the necessary data.

Typical data which are monitored are:

In Debian Edu, Nagios is used as the monitoring tool.

Analysis

On the basis of data collected from monitoring routines, one tries to identify any bottlenecks in the systems. Examples:

Configuration

If the data analysis uncovers bottlenecks, one needs to try to set up the system in a way that better caters for the users' needs.

Here is a list of commonly encountered bottlenecks and what to do to get rid of them:

Bottleneck: Missing sound, USB stick support and DVD on thin clients.
Action: Install diskless workstations (> 800 MHz processor, > 256 MB RAM).

Bottleneck: 60 thin clients are connected to the server and more PCs are wanted.
Action: Go for diskless workstations, or install another thin client server.

Bottleneck: Thin clients run slowly after an expansion with 20 more clients without acquiring a new server machine.
Action: Install 2 GB more memory in the server machine.

Bottleneck: Thin clients with 32 MB memory do not start after upgrading to Skolelinux 2.0.
Action: Turn on cache (swap) for the thin clients, or downgrade to LTSP 4.2, which is set up with swap.

Bottleneck: Flash animations make the thin clients slow when 50 students are logged into the same server machine.
Action: Install diskless workstations.

Implementation

Any changes to the system configuration must be implemented in accordance with the guidelines set for changes to the system. A well-planned test of function and performance must also be done before changes are made in the production system. Testing is done to avoid operational disturbances when changes are put into production.

Making of the capacity plan

A capacity plan is basically an investment plan for the ICT system based on knowledge of the users' current needs and future plans.

The capacity plan should be updated and processed once a year, normally in conjunction with the budget process. The plan should include the following themes:

Availability management

Good and stable availability of ICT services is obviously crucial for users.

Availability, seen from the user perspective, depends on the following assumptions:

Availability can be measured in several ways. Before we show examples, we point out what can be difficult about target figures. If we are to work systematically on availability, we have to clarify what the different terms mean. What, for example, does a percentage of availability mean?

Let's say a "computer with computer program" is a service. If the computer program does not work one day, is the service then unavailable even if all the other programs work fine? What if the computer program is unavailable in one classroom, but available in the rest of the school (because of an underlying service)? This is a difficult matter to clarify and work on in practice.

Measures for availability

Availability can be measured using several methods. Here are some examples:

Value: % available
Meaning: The value can be availability between 08:00 and 18:00. If the system is down 1 hour during one day, then the system is available 90% of the agreed time that day. If availability is measured over a month with 20 work days (200 agreed hours), then the system is available 99.5% of the time.

Value: % unavailable
Meaning: If the system is down one hour during an agreed uptime of, for example, 10 hours a day, the system is unavailable 10% of the time that day. Measured over 20 days, the system has been unavailable 0.5% of the time.

Value: Hours unavailable
Meaning: One can agree on the number of hours the system may be unavailable within, for example, one month (20 days). It can be a maximum of one hour of downtime in the period, between 08:00 and 18:00.

Value: Error frequency
Meaning: The error rate can also be measured per day or per month. An example is a maximum of 3 errors per month that bring the system down between 08:00 and 18:00.

Value: Consequences of errors
Meaning: Measured values are a common starting point for judging whether an error should have consequences beyond ordinary error correction. The customer, for example the school, may ask to pay less for the operating agreement for the current month.
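The percentage measures depend entirely on which period the downtime is measured against, which a short calculation makes concrete. This is a sketch in Python with a hypothetical function name; it assumes an agreed service window of 10 hours a day (08:00 to 18:00) and a month of 20 work days.

```python
def availability_percent(agreed_hours, downtime_hours):
    """Share of the agreed service time during which the system was available."""
    if agreed_hours <= 0:
        raise ValueError("agreed_hours must be positive")
    return 100.0 * (agreed_hours - downtime_hours) / agreed_hours

one_day = availability_percent(agreed_hours=10, downtime_hours=1)
one_month = availability_percent(agreed_hours=20 * 10, downtime_hours=1)

print(one_day)            # 90.0
print(one_month)          # 99.5
print(100.0 - one_month)  # 0.5 (% unavailable)
```

The same hour of downtime thus gives 90% for the day but 99.5% for the month, so the agreement should always state the measurement period along with the target figure.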

The most important thing is that your measure describes the user experience in the best possible way. Therefore, one should measure what is important for the user.

The feedback from schools is that printers give the most problems. This includes everything from a stalled print queue to missing paper or toner. Some have also experienced instability with the browser, and the OpenOffice.org suite hanging. This may happen when the broadband connection is unstable and documents contain links to the Internet.

Infrastructure

A stable computer system depends on good enough technical quality of the network. Several schools have experienced instability because the physical computer network is provisional and of poor quality.

Today many invest in wireless networks. When doing so, one must be aware that wireless networks have significant weaknesses. They have limited capacity: it can be quite choppy when about 30 students watch a film from the Internet simultaneously. Wireless networks also have shadows, meaning areas without coverage, so that some users end up in blind zones with a poor network connection or none at all.

If availability requirements are set, the operating company and ICT service will normally require a good-quality computer network at the school.

«Single points of failure»

Some parts of a computing solution simply must work. If, for example, a firewall fails, all traffic to the Internet stops. One may also have problems with the stability of the system that allocates network addresses using DHCP (Dynamic Host Configuration Protocol).

The operating department's responsibility is to know which parts can stop the entire solution. It is important to find these points and remove the sources of error one by one, if this is something you can afford. If one can't afford to remove sources of error that can stop, for example, the entire computer network, one must live with the risk that something suddenly stops working.

Sources of error that make everything stop may also be logical rather than physical. This is especially true for computer networks and databases. So it is important to have a broader perspective when it comes to such errors.

Risk management

One must consider which risks are acceptable in the network. Is it acceptable that users lose personal files and data when a hard drive fails? How quickly should broken equipment be replaced? Some schools have experienced that it takes several days to get the server up and running after a virus attack, because the municipality does not have resources to allocate to fixing errors.

Much of operations is about maintaining the agreed service level. It's about avoiding loss of confidence and user satisfaction. Risk management is about having the appropriate resources in place to keep the entire computer system running, and having resources ready if something goes wrong and needs to be fixed.

Testing

There is a big difference between installing equipment and software on a single PC and on hundreds, even thousands, of computers. With responsibility for hundreds of machines, a small error that one can live with on one PC means much instability and discontent when it affects hundreds of users.

To avoid mistakes during installation and to contribute to stability, it is essential to test the equipment and software to be used. This is how the expected quality is followed up. If you want stable operation, you must often choose the next-to-latest edition of equipment and software.

One should avoid adopting software whose version number ends with a zero. For example, you should avoid OpenOffice.org 4.0 and adopt the office suite when version 4.0.2 or later has arrived. By then several errors in the program have been fixed. The same applies to hardware.
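This rule of thumb can be written as a simple check on the version string. The sketch below is in Python and is only one possible reading of the rule: the function name is hypothetical, and the threshold of a second patch release (the third number being at least 2) is an assumption taken from the 4.0.2 example.

```python
def is_mature_release(version, min_patch=2):
    """True if the third version number is at least min_patch.

    Assumption: versions look like "4.0" or "4.0.2"; missing parts count as 0.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return parts[2] >= min_patch

for v in ["4.0", "4.0.1", "4.0.2", "4.1.0"]:
    print(v, is_mature_release(v))
```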

Server machines usually have a slightly older generation of processors, and more robust memory and hard drives. This is because many people use this hardware simultaneously. A small error that would not mean anything for one user can cause downtime for everyone if 30 users are logged into the machine.

So testing is about using proven equipment and editions of software that have run well for half a year or a year. Testing is also about trying out the different parts in a smaller but realistic context, to ensure that everything works. Adopting the latest version, or even beta versions, of software or the very latest hardware usually leads to much trouble and extra maintenance work. Putting systems into production without a small test in a realistic environment usually leads to significant firefighting and dissatisfied users.

When testing on a smaller scale on equipment in production, it is essential to arrange this with those affected. In addition, one must choose when to test. One should not test new things during, for example, examinations that use ICT tools.

Design improvements

An operations department is well served by correcting the systems that generate many operational messages. Users may be getting a lot of spam; then it may be sensible to install spam filters. There may be a lot of extra work with students who constantly forget their passwords, and teachers who send the inquiry on to the central operations staff. To avoid extra emailing and double work, the teacher can be allowed to give the student a new password.

These were some examples of design improvements that lighten the work of operations and make users more satisfied. A well-run operations department has a prioritized list of design improvements that make operation easier. The priorities are usually set based on an assessment of the inquiries to the service, stored in the message log, and an assessment of the work that must be done to handle the requests.
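Such a prioritized list can be drawn up by counting inquiries per category in the message log and weighing the count against the effort needed to remove the cause. The Python sketch below shows one hypothetical way to do this: the log format, the category names, the effort estimates and the scoring rule (inquiries per hour of effort) are all assumptions for illustration.

```python
from collections import Counter

def prioritize(message_log, effort_hours):
    """Rank categories by inquiries per estimated hour of improvement work."""
    counts = Counter(entry["category"] for entry in message_log)
    scored = [(category, n / effort_hours[category]) for category, n in counts.items()]
    # Highest score first; ties broken alphabetically for a stable result.
    return [category for category, _ in sorted(scored, key=lambda s: (-s[1], s[0]))]

# Hypothetical message log and effort estimates.
message_log = (
    [{"category": "password reset"}] * 5
    + [{"category": "spam"}] * 3
    + [{"category": "printer"}] * 2
)
effort_hours = {"password reset": 2, "spam": 1, "printer": 4}

print(prioritize(message_log, effort_hours))
```

With these invented numbers the spam filter comes first, since it removes three inquiries for one hour of work, ahead of the password routine and the printer problems.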

Planning for availability

This means having realistic expectations of the ICT service based on what operation costs. Plan for the expected availability. For example, when schools require that the system be up and running less than one hour after a server crash, one must have a pre-installed machine standing in reserve, to be inserted as a replacement for the faulty machine. Copying the backup files to the reserve machine can then be done within the hour.

If a diskless workstation or thin client breaks, a small prepared stock of machines and monitors is needed at the school. The school's ICT contact can then fetch and install a replacement machine. This can be done easily, without waiting days for ordered equipment.

Planning for recovery

Just as equipment stands ready to replace defective equipment, it is also expected that lost files and data can be retrieved. Therefore it is crucial to have a backup of user data and a copy of the configuration files. One must also have drawings of the computer architecture and descriptions of the system, so that ICT staff can quickly reinstall systems when something goes wrong.

It is crucial to schedule backups of user data and settings. One must plan ahead in order to have proper equipment and appropriate services. Routines must be planned for what to do when certain error situations occur and systems must be restored.

Service Continuity

Operating continuity, or continuity management, is often the most costly part of the work. High demands on operational continuity require huge investments, which must be agreed when making the SLA. For example, it can be agreed that there is no disaster plan for certain services. If you have a disaster plan, its value is very low unless it is tested once in a while, and this is usually expensive. There are examples where customers and management have locked the machine room and turned off the power to test the readiness of the IT department.

Operating continuity may be appropriate in certain periods, such as during examinations. Then there may be extra requirements to have equipment with a backup ready in case a hard disk on the server fails. But even this will require considerable additional work for the operational staff.

An IT coordinator told us that it might be just as well to postpone the exam by one day if something went wrong with the computer system. This costs a lot less than having double the number of servers at each school. There are examples of schools that have had water leaks. Then it is usual to defer the examination a day or two to repair the damage. One might think the same way about the school's computing solution. If you have a backup of the home directories of pupils and teachers, you have time to recover without duplicating systems at each school. Then it is sufficient to have one or two servers in reserve at the municipality building, which can quickly be moved and connected at the school if something goes wrong.