| | Account Number (range) | PIN number (integer) | Account Balance (decimal) | Account Type (string) |
|---|---|---|---|---|
| Acceptable values | (S) 0812 0000 0000 to 0812 9999 9999, (C) 0829 0000 0000 to 0829 9999 9999, (X) 0799 0000 0000 to 0799 9999 9999 | 0000 - 9999 | -999,999.99 to 999,999.99 | S, C, X |
| record 1 | 0812 0837 0293 | 8493 | -3,123.84 | S |
| record 2 | 0812 6493 8355 | 3558 | 8,438.53 | S |
| record 3 | 0829 7483 0462 | 0352 | 673.00 | C |
| record 4 | 0799 4896 1893 | 4896 | 493,498.49 | X |
The above matrix contains the minimum number of records that would physically represent the acceptable data values. For the Account Number, there is one record for each of the three ranges; all the PIN numbers are within the range specified; there are several different Account Balances, including one that is negative; and there are records for each of the different Account Types. The matrix above is the minimum data; best practice would be to have data values at the limits of each range as well as inside the range (see Guidelines: Test Case).
The advantage of physical representation is that the Test Data is limited in size and manageable, focused on and targeting the acceptable values. The disadvantage, however, is that actual, real-world data is not completely random. Real data tends to have statistical profiles that may affect performance, and these profiles would not be observed when using physical representation.
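As an illustration, the following sketch builds the minimal, physically representative data set from the matrix above and extends it with boundary records at the limits of each range, as the best practice above suggests. The column names and ranges come from the matrix; the record layout and helper names are hypothetical.

```python
# Hypothetical sketch: a minimal, physically representative Test Data set.
from decimal import Decimal

ACCOUNT_RANGES = {
    "S": ("0812 0000 0000", "0812 9999 9999"),
    "C": ("0829 0000 0000", "0829 9999 9999"),
    "X": ("0799 0000 0000", "0799 9999 9999"),
}
PIN_RANGE = ("0000", "9999")
BALANCE_RANGE = (Decimal("-999999.99"), Decimal("999999.99"))

# One record per account-number range (the minimum physical representation) ...
minimal_records = [
    {"account": "0812 0837 0293", "pin": "8493", "balance": Decimal("-3123.84"), "type": "S"},
    {"account": "0812 6493 8355", "pin": "3558", "balance": Decimal("8438.53"), "type": "S"},
    {"account": "0829 7483 0462", "pin": "0352", "balance": Decimal("673.00"), "type": "C"},
    {"account": "0799 4896 1893", "pin": "4896", "balance": Decimal("493498.49"), "type": "X"},
]

# ... plus boundary records at the limits of each range (best practice).
boundary_records = [
    {"account": low, "pin": PIN_RANGE[0], "balance": BALANCE_RANGE[0], "type": acct_type}
    for acct_type, (low, _high) in ACCOUNT_RANGES.items()
] + [
    {"account": high, "pin": PIN_RANGE[1], "balance": BALANCE_RANGE[1], "type": acct_type}
    for acct_type, (_low, high) in ACCOUNT_RANGES.items()
]

test_data = minimal_records + boundary_records
for record in test_data:
    print(record)
```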
Statistical Test Data representation is Test Data that reflects a statistical sampling (at the same percentages) of the production data. For example, using the same data elements as above, if we analyzed the production database and found the account types distributed as shown below, our Test Data, based upon statistical sampling, would include 294 records (as compared to the four we noted above):
| Test Data (at 0.1 percent of production) | Number of Records | Percent |
|---|---|---|
| Total number of records | 294 | 100 |
| Account numbers (S) 0812 0000 0000 to 0812 9999 9999 | 141 | 48 |
| Account numbers (C) 0829 0000 0000 to 0829 9999 9999 | 144 | 49 |
| Account numbers (X) 0799 0000 0000 to 0799 9999 9999 | 9 | 3 |
The above matrix addresses only the account types. In developing the best Test Data based upon statistical representation, you would include all of the significant data elements; in the above example, that would also mean reflecting the actual distribution of account balances.
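A minimal sketch of how such a sample might be drawn, assuming the production data is available as a list of records and that the sampling is stratified by account type. The 0.1 percent rate and the percentages come from the example above; the function and field names are hypothetical.

```python
import random
from collections import defaultdict

SAMPLE_RATE = 0.001  # 0.1 percent of production, as in the example above

def stratified_sample(production_records, key="type", rate=SAMPLE_RATE, seed=42):
    """Draw a sample that preserves the production percentages for `key`."""
    by_stratum = defaultdict(list)
    for record in production_records:
        by_stratum[record[key]].append(record)

    rng = random.Random(seed)  # fixed seed so the Test Data is reproducible
    sample = []
    for stratum, records in by_stratum.items():
        count = round(len(records) * rate)
        sample.extend(rng.sample(records, count))
    return sample

# Hypothetical production profile matching the percentages above: 48% S, 49% C, 3% X.
production = (
    [{"type": "S", "balance": 100.00}] * 141_000
    + [{"type": "C", "balance": 200.00}] * 144_000
    + [{"type": "X", "balance": 300.00}] * 9_000
)
test_data = stratified_sample(production)
print(len(test_data))  # roughly 294 records, matching the matrix above
```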
A disadvantage of the statistical representation is that it may not reflect the full range of acceptable values.
Typically, both methods of identifying Test Data are used to ensure that the Test Data addresses all values and performance / population issues.
Test Data breadth is relevant to the Test Data used as input as well as the Test Data used to support testing (in pre-existing data).
Scope is the relevancy of the Test Data to the test objective, and it is related to depth and breadth. Having a lot of data does not mean it is the right data. As with the breadth of Test Data, we must ensure that the Test Data is relevant to the test objective, that is, that there is Test Data to support our specific test objective.
For example, in the matrix below, the first four Test Data records address the acceptable values for each data element. However, there are no records to evaluate negative balances for account types C and X. Therefore, although this Test Data correctly includes a negative balance (valid breadth), it would be insufficient in scope to support any testing that uses negative account balances for each account type. Expanding this data to include additional records, with negative balances for each of the different account types, would be necessary to address this oversight (a simple coverage check, like the sketch following the matrix, can reveal such a gap).
| | Account Number (range) | PIN number (integer) | Account Balance (decimal) | Account Type (string) |
|---|---|---|---|---|
| Acceptable values | (S) 0812 0000 0000 to 0812 9999 9999, (C) 0829 0000 0000 to 0829 9999 9999, (X) 0799 0000 0000 to 0799 9999 9999 | 0000 - 9999 | -999,999.99 to 999,999.99 | S, C, X |
| record 1 | 0812 0837 0293 | 8493 | -3,123.84 | S |
| record 2 | 0812 6493 8355 | 3558 | 8,438.53 | S |
| record 3 | 0829 7483 0462 | 0352 | 673.00 | C |
| record 4 | 0799 4896 1893 | 4896 | 493,498.49 | X |
| New Record 1 | 0829 3491 4927 | 0352 | -995,498.34 | C |
| New Record 2 | 0799 6578 9436 | 4896 | -64,913.87 | X |
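As a sanity check on scope, a short sketch along these lines could confirm that every account type has at least one negative-balance record before testing begins. The field names follow the matrix above; the check itself is a hypothetical illustration.

```python
# Hypothetical scope check: is there a negative balance for every account type?
from decimal import Decimal

test_data = [
    {"account": "0812 0837 0293", "balance": Decimal("-3123.84"), "type": "S"},
    {"account": "0812 6493 8355", "balance": Decimal("8438.53"), "type": "S"},
    {"account": "0829 7483 0462", "balance": Decimal("673.00"), "type": "C"},
    {"account": "0799 4896 1893", "balance": Decimal("493498.49"), "type": "X"},
    {"account": "0829 3491 4927", "balance": Decimal("-995498.34"), "type": "C"},
    {"account": "0799 6578 9436", "balance": Decimal("-64913.87"), "type": "X"},
]

required_types = {"S", "C", "X"}
types_with_negative = {r["type"] for r in test_data if r["balance"] < 0}
missing = required_types - types_with_negative
if missing:
    print(f"Scope gap: no negative-balance record for account type(s) {missing}")
else:
    print("Every account type has at least one negative-balance record.")
```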
Test Data scope is relevant to the Test Data used as input as well as the Test Data used to support testing (in pre-existing data).
The physical structure of Test Data is relevant only to any pre-existing data used by the target-of-test to support testing, such as an application's database or rules table.
Testing is not executed once and finished. Testing is repeated within and between iterations. In order to consistently, confidently, and efficiently execute testing, the Test Data should be returned to its initial state prior to the execution of test. This is especially true when the testing is to be automated.
Therefore, to ensure the integrity, confidence, and efficiency of testing, it is critical that the Test Data be free of all external influences, and that its state be known at the start of, during, and at the end of test execution. There are two issues that must be addressed in order to achieve this test objective: keeping the Test Data stable and isolated from outside influences, and establishing the initial state of the Test Data at the start of test execution.
Each of these issues will affect how you manage your test database, design your test model, and interact with other roles.
Test Data may become unstable when it is exposed to influences outside the test effort, for example when other testers, developers, or systems can access and modify the same data. To maintain the confidence and integrity of testing, the Test Data should be highly controlled and isolated from these influences; strategies to ensure the Test Data is isolated include dedicating a separate test database or test environment to the test effort.
The other Test Data architecture issue that must be addressed is the initial state of the Test Data at the beginning of test execution. This is especially true when test automation is being used. Just as the target-of-test must begin test execution in a known, desired state, so too must the Test Data. This contributes to repeatability and to the confidence that each test execution starts from the same state as the previous one.
Four strategies are commonly used to address this issue: data refresh, data re-initialization, reversing the changes, and data roll forward.
Each is described in greater detail below.
The method used will depend upon several factors, including the physical characteristics of the database, the technical competence of the testers, the availability of external (non-test) roles, and the target-of-test.
The most desirable method of returning Test Data to its initial state is Data Refresh. This method involves creating a copy of the database in its initial state and storing it. Upon the completion of test execution (or prior to the execution of the test), the archived copy of the test database is copied into the test environment for use. This ensures that the initial state of the Test Data is the same at the start of each test execution.
An advantage of this method is that data can be archived in several different initial states. For example, Test Data may be archived at its end-of-day state, end-of-week state, end-of-month state, and so on. This gives the tester a method of quickly refreshing the data to a given state to support a test, such as testing of the end-of-month use case(s).
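As a rough sketch, if the test database were a single file (a SQLite database, say), a data refresh could be as simple as copying an archived baseline over the working copy before each test run. The file names, paths, and state names below are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical archive of baselines, one per desired initial state
# (end-of-day, end-of-week, end-of-month, ...).
ARCHIVE_DIR = Path("test_data_archive")
WORKING_DB = Path("test_env/accounts.db")

def refresh_test_data(initial_state: str = "end_of_day") -> None:
    """Overwrite the working test database with an archived baseline copy."""
    baseline = ARCHIVE_DIR / f"{initial_state}.db"
    if not baseline.exists():
        raise FileNotFoundError(f"No archived baseline for state {initial_state!r}")
    shutil.copyfile(baseline, WORKING_DB)

# Before (or after) each test run:
# refresh_test_data("end_of_month")
```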
If data cannot be refreshed, the next best method is to restore the data to its initial state through some programmatic means. Data re-initialization relies on special use cases and tools to return the Test Data to its initial values.
Care must be taken to ensure all data, relationships, and key values are returned to their appropriate initial value to ensure that no errors are introduced into the data.
One advantage of this method is that it can support the testing of invalid values in the database. Under normal conditions, invalid data values would be trapped and not allowed entry into the data (for example, by a validation rule in the client). However, another actor may affect the data (for example, an electronic update from another system). Testing needs to verify that invalid data is identified and handled appropriately, independent of how it occurs.
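A minimal sketch of programmatic re-initialization, assuming a SQLite test database and an `accounts` table (both hypothetical, as are the initial values): the script empties the affected table and reloads the known initial values, which is also how a deliberately invalid value could be planted for the kind of testing described above.

```python
import sqlite3

# Hypothetical initial values; note the deliberately invalid account type "Z",
# which a client-side validation rule would normally have trapped.
INITIAL_ROWS = [
    ("0812 0837 0293", "8493", "-3123.84", "S"),
    ("0829 7483 0462", "0352", "673.00", "C"),
    ("0799 4896 1893", "4896", "493498.49", "X"),
    ("0812 1111 2222", "0000", "10.00", "Z"),  # invalid on purpose
]

def reinitialize(db_path: str = "accounts.db") -> None:
    """Return the accounts table to its known initial values."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts "
            "(account TEXT PRIMARY KEY, pin TEXT, balance TEXT, type TEXT)"
        )
        conn.execute("DELETE FROM accounts")  # drop whatever the last test left behind
        conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", INITIAL_ROWS)

# Before each test execution:
# reinitialize()
```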
A simple method of returning data to its initial state is to "reverse the changes" made to the data during the test. This method relies upon using the target-of-test to enter reversing entries, that is, re-adding records or values that were deleted, restoring modified records or values to their original values, and deleting data that was added.
There are risks associated with this method, however. In particular, reversing entries may not restore system generated values, such as database keys, indices, and pointers, to their original values.
If this is the only method available in your test environment, avoid using database keys, indices and pointers as the primary targets for verification. That is, for example, use the Patient Name field to determine if the patient was added to the database instead of using a system generated Patient ID number.
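One way to picture the "reverse the changes" approach: the test records each change it makes, and afterwards a compensating entry is applied for each one in reverse order. The sketch below is hypothetical and, following the advice above, keys the records on the account number (a natural field) rather than on a system generated identifier.

```python
# Hypothetical sketch of "reversing the changes" against an in-memory stand-in
# for the test database, keyed by account number.
accounts = {
    "0829 7483 0462": {"pin": "0352", "balance": "673.00", "type": "C"},
    "0799 4896 1893": {"pin": "4896", "balance": "493498.49", "type": "X"},
}

change_log = []  # filled in by the test as it runs

# --- during the test --------------------------------------------------------
accounts["0812 5555 6666"] = {"pin": "1234", "balance": "0.00", "type": "S"}
change_log.append({"op": "add", "account": "0812 5555 6666"})

change_log.append({"op": "modify", "account": "0829 7483 0462",
                   "old": dict(accounts["0829 7483 0462"])})
accounts["0829 7483 0462"]["balance"] = "0.00"

change_log.append({"op": "delete", "account": "0799 4896 1893",
                   "old": accounts.pop("0799 4896 1893")})

# --- after the test: apply compensating entries in reverse order ------------
for entry in reversed(change_log):
    if entry["op"] == "add":        # delete what the test added
        del accounts[entry["account"]]
    elif entry["op"] == "modify":   # restore the modified record
        accounts[entry["account"]] = entry["old"]
    elif entry["op"] == "delete":   # re-add what the test deleted
        accounts[entry["account"]] = entry["old"]

print(accounts)  # back to the initial two records with their original values
```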
Data roll forward is the least desirable method of addressing the initial state of the Test Data. In fact, it doesn't really address the issue. Instead, the state of the data at the completion of test execution becomes the new initial state of the Test Data. Typically, this requires modifying the Test Data used for input and / or the Test Cases and Test Data used for the evaluation of the results.
There are some instances when this is necessary, for example at month-end. If there is no archive of the data just prior to month-end, then the Test Data and Test Scripts from each day and week must be executed to "roll forward" the data to the state needed for the test of the month-end processing.
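A rough sketch of a roll-forward, assuming the daily processing performed by the Test Scripts can be replayed programmatically (all names and the interest calculation below are hypothetical): the previously executed processing is repeated in order until the data reaches the state needed for the month-end test.

```python
# Hypothetical roll-forward: replay each day's processing so the data ends up
# in the state needed for the month-end test.

def run_daily_processing(accounts):
    """Stand-in for the real daily Test Script; here it just posts daily interest."""
    for account in accounts.values():
        account["balance"] = round(account["balance"] * 1.0001, 2)
    return accounts

def roll_forward(accounts, days_in_month=30):
    # The resulting end-of-month state becomes the new "initial" Test Data.
    for _ in range(days_in_month):
        accounts = run_daily_processing(accounts)
    return accounts

test_data = {
    "0812 0837 0293": {"balance": -3123.84},
    "0829 7483 0462": {"balance": 673.00},
}
month_end_data = roll_forward(test_data)
print(month_end_data)
```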
Risks associated with this method include the ongoing effort of modifying the Test Data, Test Cases, and Test Scripts to match each new initial state, and the loss of a known, repeatable starting point for test execution.