Concepts: Metrics
We measure primarily to gain control of a project, and therefore to be able
to manage it. We measure to evaluate how close we are to the objectives set
in our plan, in terms of completion, quality, compliance with requirements,
and so on.
We also measure to be able to estimate effort, cost, and quality for new
projects more accurately, based on past experience. Finally, we measure to
evaluate how key aspects of process performance improve over time, and to see
the effects of changes.
Measuring key aspects of a project adds a non-negligible cost, so we do not
measure things simply because we can. We must set very precise goals for
this effort, and collect only the metrics that allow us to satisfy those goals.
There are two kinds of goals:
- Knowledge goals: these are expressed by verbs such as evaluate,
predict, and monitor. You want to understand your development
process better. For example, you may want to assess product quality, obtain
data to predict testing effort, monitor test coverage, or track requirements
changes.
- Change or achievement goals: these are expressed by verbs such as
increase, reduce, improve, or achieve. You are usually
interested in seeing how things change or improve over time, from one
iteration to the next, or from one project to the next.
Examples
- Monitor progress relative to plan
- Improve customer satisfaction
- Improve productivity
- Improve predictability
- Increase reuse
These general management goals do not translate readily into metrics. We have
to break them down into smaller subgoals (or action goals), which
identify the actions project members must take to achieve the goal. And
we have to make sure that the people involved understand the benefits.
Examples
The goal to "improve customer satisfaction" would decompose into:
- Define customer satisfaction
- Measure customer satisfaction, over several releases
- Verify that satisfaction improves
The goal to "improve productivity" would decompose into:
- Measure effort
- Measure progress
- Calculate productivity over several iterations or projects.
- Compare the results
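The productivity decomposition above can be sketched as a small calculation. This is an illustrative sketch only; the iteration names, effort figures, and the choice of "size points" as the progress measure are hypothetical assumptions, not prescribed by the process.

```python
# Productivity per iteration: progress (here, hypothetical "size points"
# produced) divided by effort expended. All figures are invented.

iterations = [
    {"name": "I1", "effort_person_days": 120, "size_points": 30},
    {"name": "I2", "effort_person_days": 110, "size_points": 36},
    {"name": "I3", "effort_person_days": 115, "size_points": 44},
]

# Calculate productivity for each iteration.
values = [it["size_points"] / it["effort_person_days"] for it in iterations]
for it, v in zip(iterations, values):
    print(f'{it["name"]}: {v:.2f} points/person-day')

# Compare the results: is productivity strictly increasing across iterations?
improving = all(b > a for a, b in zip(values, values[1:]))
print("Productivity improving:", improving)
```

The comparison step deliberately looks at the trend across iterations rather than any single value, matching the "compare the results" subgoal.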
Some of these subgoals (but not all) then require metrics to be
collected.
Example
"Measure customer satisfaction"
can be derived from
- Customer survey (where customer would give marks for different aspects)
- Number and severity of calls to a customer support hotline.
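One way such primitive data might be combined into a single satisfaction metric is sketched below. The weighting scheme, the normalization, and all the figures are assumptions made for illustration; any real definition would come out of the "define customer satisfaction" subgoal.

```python
# Hypothetical customer-satisfaction index for one release, combining
# survey marks (1-5 per aspect) with support-hotline call data.
# Weights and scaling below are illustrative assumptions.

survey_marks = {"ease_of_use": 4, "reliability": 3, "support": 5}
hotline_calls = [("severity_1", 2), ("severity_2", 5), ("severity_3", 11)]

# Normalize survey marks to the range 0..1.
survey_score = sum(survey_marks.values()) / (5 * len(survey_marks))

# Weight calls by severity; fewer and lighter calls give a higher score.
severity_weight = {"severity_1": 5, "severity_2": 2, "severity_3": 1}
call_burden = sum(severity_weight[s] * n for s, n in hotline_calls)
call_score = 1 / (1 + call_burden / 10)  # also in 0..1

# Blend the two sources (weights are arbitrary assumptions).
satisfaction = 0.7 * survey_score + 0.3 * call_score
print(f"Release satisfaction index: {satisfaction:.2f}")
```

Computed per release, an index like this supports the "measure customer satisfaction over several releases" and "verify that satisfaction improves" subgoals.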
For more information, consult [AMI95].
A useful way to categorize these goals is by organizational, project, and
technical need. This gives a framework for the refinement discussed above.
An organization needs to know, and perhaps improve, its cost per "item" and
its build times (time-to-market), while delivering product of known quality
(objective and subjective) with acceptable maintenance demands. An
organization may from time to time (or even continuously) need to improve its
performance to remain competitive. To reduce its risks, an organization needs
to know the skill and experience levels of its staff, and to ensure it has
the other resources and capabilities needed to compete in its chosen sphere.
An organization must be able to introduce new technology and determine the
cost-benefit of that technology. The following table lists examples of the
kinds of metrics relevant to these needs for a software development
organization.
Concern and example metrics:
Item Cost: Cost per line of code, cost per function point, or cost per use
case. Normalized effort (across a defined portion of the life cycle,
programming language, staff grade, and so on) per line of code, function
point, or use case. Note that these metrics are not usually simple numbers:
they depend on the size of the system to be delivered and on whether the
schedule is compressed.
Construction Time: Elapsed time per line of code or per function point. Note
that this will also depend on system size. The schedule can also be
shortened by adding staff, but only up to a point; an organization's
management ability will determine exactly where the limit is.
Defect Density in Delivered Product: Defects (discovered after delivery) per
line of code or per function point.
Subjective Quality: Ease of use, ease of operation, customer acceptance.
Although these are fuzzy, ways of attempting quantification have been
devised.
Ease of Maintenance: Cost per line of code or function point per year.
Skills Profile, Experience Profile: The Human Resources group would
presumably keep some kind of skills and experience database.
Technology Capability:
- Tools: an organization should know which are in general use, and the
extent of expertise for those not regularly used.
- Process Maturity: where does the organization rate on the SEI CMM
scale, for example?
- Domain Capability: in which application domains is the organization
capable of performing?
Process Improvement Measures:
- Process execution time and effort.
- Defect rates, causal analysis statistics, fix rates, scrap and rework.
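The first two concerns in the table reduce to simple ratios once the primitive data have been collected. The sketch below computes cost per function point and post-delivery defect density; every figure is invented for illustration.

```python
# Cost per function point and post-delivery defect density, computed from
# hypothetical primitive data for one delivered release.

total_cost_usd = 840_000
function_points = 560
delivered_kloc = 72.0          # size in thousands of lines of code
post_delivery_defects = 36     # defects discovered after delivery

cost_per_fp = total_cost_usd / function_points
defect_density = post_delivery_defects / delivered_kloc  # defects per KLOC

print(f"Cost per function point: ${cost_per_fp:,.0f}")
print(f"Defect density: {defect_density:.2f} defects/KLOC")
```

As the table notes, such ratios are only comparable across projects when normalized for system size, language, and schedule compression.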
A project must typically be delivered:
- with required functional and non-functional capabilities;
- under certain constraints;
- to a budget and in a certain time;
- delivering a product with certain transition (to the customer),
operational and maintenance characteristics.
The Project Manager must be able to see whether the project is tracking
toward these goals. The following table expands them to give some idea of
what to consider when thinking about project measurements:
Project Effort and Budget
- How is the project tracking on effort and cost against plan?
Project Schedule
- Is the project meeting its milestones?
Transition/Installation
- Are the predicted effort, cost, and skills requirements acceptable?
Operation
- Are the predicted effort and skills requirements supportable by the
customer?
Maintenance/Supportability
- Are the predicted effort and skills requirements acceptable to the
customer?
Functional Requirements
- Are the requirements valid and complete?
- Are the requirements allocated to an iteration?
- Are the requirements being realized according to plan?
Non-Functional Requirements
- Performance: is the system meeting requirements for responsiveness,
throughput, and recovery time?
- Capacity: can the system handle the required number of simultaneous
users? Can the web site handle the required number of hits per second? Is
there sufficient storage for the required number of customer records?
- Quality Factors
  - Reliability: how often are system failures allowed, and what
  constitutes a system failure?
  - Usability: is the system easy and pleasant to use? How long does it
  take to learn to use it, and what skills are required?
  - Fault tolerance/robustness/resilience/survivability: can the system
  continue to function if failures occur? Can the system cope with bad
  input? Is the system capable of automatic recovery after failure?
- Specialty Engineering Requirements
  - Safety: can the system perform without risk to life or property
  (tangible and intangible)?
  - Security/privacy: does the system protect sensitive data from
  unauthorized access? Is the system secure from malicious access?
  - Environmental impact: does the system meet environmental requirements?
- Other Regulatory or Legal Requirements
- Constraints
  - External environment: is the system capable of operation in the
  prescribed environment?
  - Resources, host, target: does the system meet its CPU, memory,
  language, and hardware/software environment constraints?
  - Use of commercial-off-the-shelf (COTS) or other existing software: is
  the system meeting its reuse constraints?
  - Staff availability and skills: can the system be built with the number
  and type of staff available?
  - Interface support/compatibility: can the system support required
  access to and from other systems?
  - Reusability: what provisions are made for the system to be reusable?
  - Imposed standards: are the system and the development method
  compliant?
  - Other design constraints (architectural or algorithmic, for example):
  is the system using the required architectural style? Are the prescribed
  algorithms being used?
This is an extensive, but not exhaustive, list of concerns for the Project
Manager. Many will require the collection and analysis of metrics; some will
also require the development of specific tests (to derive measurements) to
answer the questions posed.
Many of the project needs will not have direct measures, and even for those
that do, it may not be obvious what should be done or changed to improve
them. Lower-level quality-carrying attributes can be used to build in quality
against higher-level quality attributes such as those identified in ISO
Standard 9126 (Software Quality Characteristics and Metrics) and those
mentioned above under project needs. These technical measures are of
engineering (structural and behavioral) characteristics and effects (covering
process and product) that contribute to project-level metric needs. The
attributes in the following table have been used to derive a sample set of
metrics for the Rational Unified Process artifacts and process. These may be
found in Guidelines: Metrics.
Quality and associated attributes:
Goodness of Requirements
- Volatility: frequency of change, and rate of introduction of new
requirements.
- Validity: are these the right requirements?
- Completeness: are any requirements missing?
- Correctness of expression: are the requirements properly stated?
- Clarity: are the descriptions understandable and unambiguous?
Goodness of Design
- Coupling: how extensive are the connections between system elements?
- Cohesion: do the components each have a single, well-defined purpose?
- Primitiveness: can the methods or operations of a class be constructed
from other methods or operations of the class? If so, they are not
primitive (primitiveness being the desirable characteristic).
- Completeness: does the design completely realize the requirements?
- Volatility: frequency of architectural change.
Goodness of Implementation
- Size: how close is the implementation to the minimal size needed to solve
the problem? Will the implementation meet its constraints?
- Complexity: is the code algorithmically difficult or intricate? Is it
difficult to understand and modify?
- Completeness: does the implementation faithfully realize all of the
design?
Goodness of Test
- Coverage: how well do the tests exercise the software? Are all
instructions executed by a set of tests? Do the tests exercise many paths
through the code?
- Validity: are the tests themselves a correct reflection of the
requirements?
Goodness of Process (at the lowest level)
- Defect rate and defect cause: what is the incidence of defects in an
activity, and what are the causes?
- Effort and duration: what duration and how much human effort does an
activity require?
- Productivity: per unit of human effort, what does an activity yield?
- Goodness of artifacts: what is the level of defects in the outputs of an
activity?
Effectiveness of Process/Tool Change (as for Goodness of Process, but
percentage changes rather than absolute values)
- Defect rate and defect cause
- Effort and duration
- Productivity
- Goodness of artifacts
For a deep treatment of metrics concepts, see [WHIT97].
We distinguish two kinds of metrics:
- A metric is a measurable attribute of an entity. For
example, project effort is a measure (that is, a metric) of project size. To
calculate this metric you would sum all the time-sheet bookings for the
project.
- A primitive metric is a raw data item that is used to
calculate a metric. In the example above, the time-sheet bookings are the
primitive metrics. A primitive metric is typically a metric that exists in a
database but is not interpreted in isolation.
Each metric is made up of one or more primitive metrics. Consequently, each
primitive metric has to be clearly identified and its collection procedure
defined.
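The effort example can be made concrete as follows; the record layout of the bookings is an assumption made for illustration.

```python
# Project effort (the metric) computed from individual time-sheet bookings
# (the primitive metrics). The booking records below are hypothetical.

timesheet_bookings = [
    {"person": "ana",  "activity": "design", "hours": 6.5},
    {"person": "ben",  "activity": "code",   "hours": 8.0},
    {"person": "ana",  "activity": "test",   "hours": 4.0},
    {"person": "carl", "activity": "code",   "hours": 7.5},
]

# The metric is the sum of the primitive metrics.
project_effort_hours = sum(b["hours"] for b in timesheet_bookings)
print(f"Project effort: {project_effort_hours} hours")
```

Note that a single booking is not interpreted in isolation; only the aggregation carries meaning as a metric.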
Metrics that support change or achievement goals are often
"first-derivative" over time (or over iterations or projects): we are
interested in a trend, not in the absolute value. To "improve quality", we
need to check that the residual level of known defects diminishes over time.
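This first-derivative view can be sketched as follows, assuming a per-iteration count of residual known defects is already being collected; the counts themselves are invented.

```python
# A change goal is checked on the trend, not the absolute value: here,
# the per-iteration change in residual known defects. Counts are
# hypothetical, taken at the end of iterations I1..I4.

residual_defects = [120, 96, 81, 70]

# First derivative: iteration-to-iteration change.
deltas = [b - a for a, b in zip(residual_defects, residual_defects[1:])]
quality_improving = all(d < 0 for d in deltas)

print("Iteration-to-iteration change:", deltas)
print("Residual defects diminishing:", quality_improving)
```

A project with a higher absolute defect count but consistently negative deltas is meeting this change goal; one with a low but rising count is not.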
Template for a metric
Name: The name of the metric and any known synonyms.
Definition: The attributes of the entities that are measured using this
metric, how the metric is calculated, and which primitive metrics it is
calculated from.
Goals: The list of goals and questions related to this metric, with some
explanation of why the metric is being collected.
Analysis procedure: How the metric is intended to be used; preconditions for
the interpretation of the metric (for example, valid ranges of other
metrics); target values or trends; models of analysis techniques and tools
to be used; implicit assumptions (for example, about the environment or
models); calibration procedures; storage.
Responsibilities: Who will collect and aggregate the measurement data,
prepare the reports, and analyze the data.
Template for a primitive metric
Name: The name of the primitive metric.
Definition: An unambiguous description of the metric in terms of the
project's environment.
Collection procedure: A description of the collection procedure: the data
collection tool and form to be used; the points in the lifecycle at which
data are collected; the verification procedure to be used; and where the
data will be stored, in what format, and with what precision.
Responsibilities: Who is responsible for collecting the data, and who is
responsible for verifying it.
There are two activities:
- Define measurement plan
- Collect measures
Define measurement plan is done once per development cycle: in the inception
phase, as part of the general planning activity, or sometimes as part of the
configuration of the process in the development case. The measurement plan
may be revisited, like any other section of the software development plan,
during the course of the project.
Collect measures is done repeatedly, at least once per iteration and
sometimes more often; for example, weekly on an iteration spanning many
months. The metrics collected are part of the Status Assessment document and
are used to assess the progress and health of the project. They may also be
accumulated for later use in project estimation and in tracking trends
across the organization.
Estimation
The project manager in particular is faced with having to plan: to assign
resources to activities, with budgets and schedules. Either effort and
schedule are estimated from a judgment of what is to be produced, or, in the
inverse case, resources and schedule are fixed and an estimate of what can
be produced is needed. Estimation typically concerns the calculation of
resource needs based on other factors, typically size and productivity, for
planning purposes.
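Both directions of estimation can be sketched with the same underlying relationship, effort = size / productivity. The historical productivity figure and the sizes below are hypothetical values standing in for past-project data.

```python
# Two directions of estimation from the relationship
# effort = size / productivity. The productivity figure (function points
# per person-month) is a hypothetical value from past-project data.

historical_productivity = 4.0  # function points per person-month

# Direction 1: size is judged, effort is estimated.
estimated_size_fp = 360
estimated_effort_pm = estimated_size_fp / historical_productivity

# Direction 2: resources and schedule are fixed; estimate what can be built.
team_size = 6
schedule_months = 10
feasible_size_fp = team_size * schedule_months * historical_productivity

print(f"Effort for {estimated_size_fp} FP: "
      f"{estimated_effort_pm:.0f} person-months")
print(f"Feasible size for {team_size} people over {schedule_months} months: "
      f"{feasible_size_fp:.0f} FP")
```

Real estimation models add adjustment factors for team, product, and schedule compression; this linear form only illustrates the role of the collected productivity metric.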
Prediction
Prediction is only slightly different from estimation; it is usually the
calculation of the future value of some factor based on today's value of
that factor and of other influencing factors. For example, given a sample of
performance data, it is useful to predict from it how the system will
perform under full load, or in a resource-constrained or degraded
configuration. Reliability prediction models use defect-rate data to predict
when the system will reach certain reliability levels. Having planned an
activity, the project manager will need data from which to predict
completion dates and effort at completion.
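The reliability-prediction idea can be sketched with a simple exponential defect-discovery model in the spirit of reliability-growth models. The model form and both parameter values below are illustrative assumptions; in practice the parameters would be fitted to the observed defect-rate data.

```python
import math

# Sketch of reliability prediction with an exponential defect-discovery
# model: m(t) = N * (1 - exp(-b * t)) gives cumulative defects found by
# time t. N (total latent defects) and b (detection rate per week) would
# be fitted from observed data; the values here are hypothetical.

N = 200.0   # assumed total latent defects
b = 0.05    # assumed per-week detection rate

def defects_found(t_weeks: float) -> float:
    """Cumulative defects predicted to be found by week t_weeks."""
    return N * (1.0 - math.exp(-b * t_weeks))

# Predict the week by which 95% of latent defects will have been found,
# by inverting the model.
target_fraction = 0.95
t_target = -math.log(1.0 - target_fraction) / b

print(f"Defects found by week 20: {defects_found(20):.0f}")
print(f"95% of latent defects found by week {t_target:.1f}")
```

The predicted week would feed directly into the project manager's completion-date planning for the test activity.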
Assessment
Assessment is used to establish the current position: for comparison with a
threshold, for the identification of trends, for comparison between
alternatives, or as the basis for estimation or prediction.
For more on metrics in project management, read [ROY98].