From: FLAIRS-01 Proceedings. Copyright © 2001, AAAI (www.aaai.org). All rights reserved.

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation*

Valerie Barr
Department of Computer Science
Hofstra University
Hempstead, NY 11550
vbarr@hofstra.edu
Abstract

Software engineering literature presents multiple definitions for the terms verification, validation and testing. The ensuing difficulties carry into research on the verification and validation (V&V) of intelligent systems. We explore both these areas and then address the additional terminology problems faced when attempting to carry out V&V work in a new domain such as natural language processing (NLP).

Introduction

Historically, verification and validation (V&V) researchers have labored under multiple definitions of key terms within the field. In addition, the terminology used by V&V researchers working with intelligent systems can differ from that used by software engineers and software testing researchers. As a result, many V&V research efforts must begin with a (re)definition of the terms that will be used. The need to establish working definitions becomes more pressing if we try to apply verification, validation, and testing (VV&T) theory and practice to fields in which developers do not normally carry out formal VV&T activities. This paper starts with a review of terminology that is used in the software engineering/software testing areas. It then discusses the terminology issues that exist among V&V researchers in the intelligent systems community and between them and the software engineering/software testing communities. Finally, it explores the terminology issues that can arise when we attempt to apply VV&T to other domain areas, such as natural language processing systems.

Terminology Conflicts - First View

The first term to tackle in the terminology of software testing is the term testing itself. Unfortunately this word is used to refer to several activities that take place at very different levels in the software development process. In one usage, the term refers to testing in the small, the exercise of program code with test cases, with a goal of uncovering faults in code by exposing failures. In another usage, the term refers to testing in the large, the entire overall process of verification, validation, and quality analysis and assurance.

The term V&V, for verification and validation, is also used in both high level and low level ways. In a high level sense, it is used synonymously with testing in the large. V&V can refer to a range of activities that include testing in the small and software quality assurance. More specifically, V&V can be used as an umbrella term for activities such as formal technical reviews, quality and configuration audits, performance monitoring, simulation, feasibility study, documentation review, database review, algorithm analysis, development testing, qualification testing, and installation testing (Wallace & Fujii 1989; Pressman 2001). This is consistent with the ANSI definition of verification as the process of determining whether or not an object in a given phase of the software development process satisfies the requirements of previous phases ((ANSI/IEEE 1983b), as cited in (Beizer 1990)). In this view, V&V activities can take place during the entire life-cycle, at each stage of the development process, starting with requirements reviews, continuing through design reviews and code inspection, and finally product testing (Sommerville 2001). In this sense, software testing in the small is one activity of the V&V process. Similarly, the National Institute of Standards and Technology (NIST, formerly National Bureau of Standards) defines the high level view of VV&T as the procedure of review, analysis, and testing throughout the software life cycle to discover errors, determine functionality, and ensure the production of quality software (NBS 1981).

*Copyright © 2001, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
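The low-level usage above, testing in the small, can be made concrete with a minimal sketch: exercising program code with test cases so that a mismatch between actual and expected output exposes a failure. The `median` function and the test driver below are hypothetical illustrations, not artifacts from the paper.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def run_test_cases(func, cases):
    """Exercise func on (input, expected) pairs; each mismatch is an
    exposed failure pointing at a fault in the code under test."""
    failures = []
    for args, expected in cases:
        actual = func(args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


cases = [
    ([3, 1, 2], 2),        # odd-length input
    ([4, 1, 3, 2], 2.5),   # even-length input
    ([5], 5),              # single element
]
print(run_test_cases(median, cases))  # an empty list means no failure exposed
```

An empty failure list only shows that these particular test cases exposed no failure; it does not demonstrate the absence of faults, which is why the higher-level review and analysis activities above complement it.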
Verification & Validation

In a low level sense, each of the terms verification and validation has very specific meaning and refers to various activities that are carried out during software development. In an early definition, verification was characterized as determining if we "are building the product right" (Boehm 1981). In more current characterizations, the verification process ensures that the software correctly implements specific functions (Pressman 2001), characteristics of good design are incorporated, and the system operates the way the designers intended (Pfleeger 1998).

Note the emphasis in these definitions on aspects of specification and design. The definition of verification used by the National Bureau of Standards (NBS) also focuses on aspects that are internal to the system itself. They define verification as the demonstration of consistency, completeness, and correctness of the software at each stage and between each stage of the development life cycle (NBS 1981).

Validation, on the other hand, was originally characterized as determining if we "are building the right product" (Boehm 1981). This has been taken to have various meanings related to the customer or ultimate end-user of the system. For example, in one definition validation is seen as ensuring that the software, as built, is traceable to customer requirements (Pressman 2001) (as contrasted with the designer requirements specifications used in verification). Another definition more vaguely requires that the system meet the expectations of the customer buying it and be suitable for its intended purpose (Sommerville 2001). Pfleeger adds the notion (Pfleeger 1998) that the system implements all of the requirements, creating a two-way relationship between requirements and system code (all code is traceable to requirements and all requirements are implemented). Pfleeger further distinguishes requirements validation, which makes sure that the requirements actually meet the customers' needs. These various definitions generally comply with the ANSI standard definition (ANSI/IEEE 1983a) of validation (as cited in (Beizer 1990)) as the process of evaluating software at the end of the development process to ensure compliance with requirements. The National Bureau of Standards definition agrees in large part with these user-centered definitions of validation, saying that it is the determination of the correctness of the final program or software with respect to the user needs and requirements.

As other terms within software engineering are more carefully defined, there is a subsequent impact on definitions of V&V. For example, the "requirements phase" often refers to the entire process of determining both user requirements and additional requirements that are necessary for actual system development. However, in new texts on software development (for example (Hamlet & Maybee 2001)) this process is broken into two phases: the requirements phase is strictly user centered, while the specification phase adds the additional requirements information that is needed by developers. This leads to confusing definitions of V&V which necessitate that first the terms "requirements" and "specifications" be well defined. In (Hamlet & Maybee 2001) the issue is addressed directly by defining verification as "checking that two independent representations of the same thing are consistent in describing it." They propose comparing the requirements document and the specification document for consistency, then the specification document and the design document, continuing through all the phases of software development.

Testing

We next return to various attempts in the literature to define testing. Most software engineering texts do not give an actual definition of testing and do not distinguish between testing in the large and testing in the small. Rather, they simply launch into lengthy discussion of what activities fall under the rubric of testing. For example, Pfleeger (Pfleeger 1998) states that the different phases of testing lead to a validated and verified system. The closest we get to an actual definition of testing (Pressman 2001) is that it is an "ultimate review of specification, design, and code generation". Generally, discussions of testing divide it into several phases, such as the following (Pressman 2001):

• unit testing, to verify that components work properly with expected types of input

• integration testing, to verify that system components work together as indicated in system specifications

• validation testing, to validate that software conforms to the requirements and functions in the way the end user expects it to (also referred to as function test and performance test (Pfleeger 1998))

• system testing, in which software and other system elements are tested as a complete entity in order to verify that the desired overall function and performance of the system is achieved (also called acceptance testing (Pfleeger 1998))

Rather than actually define testing, Sommerville (Sommerville 2001) presents two techniques within the V&V process. The first is software inspections which
are static processes for checking requirements documents, design diagrams, and program source code. The second is what we consider testing in the small, which involves executing code with test data and looking at output and operational behavior.

Pfleeger breaks down the testing process slightly differently, using three phases (Pfleeger 1998):

• testing programs,

• testing systems,

• evaluating products and processes.

The first two of these phases are equivalent to Pressman's four phases listed above. However, Pfleeger's third phase introduces a new concept, that of evaluation. In the context of software engineering and software testing, evaluation is designed to determine if goals have been met for productivity of the development group, performance of the system, and software quality. In addition, the evaluation process determines if the project under review has aspects that are of sufficient quality that they can be reused in future projects. The overall purpose of evaluation is to improve the software development process so that future development efforts will run more smoothly, cost less, and lead to greater return on investment for the entity funding the software project.

Peters and Pedrycz (Peters & Pedrycz 2000) present one of the vaguer sets of definitions. They define validation as occurring "whenever a system component is evaluated to ensure that it satisfies system requirements". They then define verification as "checking whether the product of a particular phase satisfies the conditions imposed at the beginning of that phase". There is no discussion of the source of the requirements and the source of the conditions, so it is unclear which step involves comparison to the design and which involves comparison to the customer's needs. Their discussion of testing provides no clarification, as they simply state that testing determines when a software system can be released and gauges future performance.

This brief discussion indicates that there is a fair amount of agreement, within the software engineering community, on what is meant by verification and validation. Verification refers, overwhelmingly, to checking and establishing the relationship between the system and its specification (created during the design process), while validation refers to the relationship between the system's functionality and the needs and expectations of the end user. However, there are some authors whose use of the terms is not consistent with this usage. In addition, all of the key terms (testing, verification, validation, evaluation, specification, requirements) are overloaded. Every effort must be made in each usage to provide sufficient context and indicate whether a high-level or low-level usage is intended.

V&V of Intelligent Systems

The quagmire of terminology continues when we focus on the development of intelligent systems. As discussed in (Gonzalez & Barr 2000), a similarly varied set of definitions exists. Many of the definitions are derived from Boehm's original definitions (Boehm 1981) of verification and validation, although conflicting definitions do exist. It is also the case that, in this area, the software built is significantly different from the kinds of software dealt with in conventional software development models. Intelligent systems development deals with more than just the issues of specifications and user needs and expectations.

The chief distinction between "conventional" software and intelligent systems is that construction of an intelligent system is based on our (human) interpretation or model of the problem domain. The systems built are expected to behave in a fashion that is equivalent to the behavior of an expert in the field. Gonzalez and Barr argue, therefore, that it follows that human performance should be used as the benchmark for performance of an intelligent system. Given this distinction, and taking into account the definitions of other V&V researchers within the intelligent systems area, they propose definitions of verification and validation of intelligent systems as follows:

• Verification is the process of ensuring 1) that the intelligent system conforms to specifications, and 2) that its knowledge base is consistent and complete within itself.

• Validation is the process of ensuring that the outputs of the intelligent system are equivalent to those of human experts when given the same inputs.

The proposed definition of verification essentially retains the standard definition used in software engineering, but adds to it the requirement that the knowledge base be consistent and complete (that is, free of internal errors). The proposed definition of validation is consistent with the standard definition if we consider human performance as the standard for the "customer requirements" or user expectations that must be satisfied by the system's performance.

Therefore, we can apply the usual definitions of V&V to intelligent systems with slight modifications to take into account the presence of a knowledge base and the necessity of comparing system performance to that of humans in the problem domain.
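The two proposed definitions can be illustrated with a deliberately tiny sketch. The rule base, its "specification" (the sets of known symptoms and diagnoses), and the expert's answers below are all hypothetical, and the consistency and completeness checks are crude stand-ins for the real knowledge-base analyses used in intelligent systems V&V.

```python
# Hypothetical one-symptom-per-rule knowledge base.
RULES = {
    "fever": "flu",
    "rash": "measles",
    "cough": "cold",
}


def verify_rule_base(rules, known_symptoms, known_diagnoses):
    """Verification: every rule relates a known symptom to a known diagnosis
    (a crude stand-in for internal consistency), and every known symptom has
    a rule (a crude stand-in for completeness)."""
    consistent = all(s in known_symptoms and d in known_diagnoses
                     for s, d in rules.items())
    complete = all(s in rules for s in known_symptoms)
    return consistent and complete


def diagnose(symptom):
    """The intelligent system's output for one input."""
    return RULES.get(symptom, "unknown")


def validate_against_expert(inputs, expert_answers):
    """Validation: system output equivalent to the human expert's answers
    on the same inputs."""
    return all(diagnose(i) == expert_answers[i] for i in inputs)


symptoms = {"fever", "rash", "cough"}
diagnoses = {"flu", "measles", "cold"}
expert = {"fever": "flu", "cough": "cold"}  # hypothetical expert judgments

print(verify_rule_base(RULES, symptoms, diagnoses))        # True for this base
print(validate_against_expert(["fever", "cough"], expert))  # matches the expert
```

Note how the two checks are independent, mirroring the definitions: the rule base could pass verification while still disagreeing with the expert, and vice versa.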
Applying V&V in a New Area

As shown, the area of VV&T is based on overloaded terminology, with generally accepted definitions as well as conflicting definitions throughout the literature, both in the software engineering field and in the intelligent systems V&V community. The questions then arise: how should we proceed, and what difficulties might be encountered in an attempt to apply VV&T efforts in a new problem domain? In this section we discuss the difficulties that arose, and the specific terminology issues, in a shift into the area of natural language processing (NLP) systems.

Language, as a research area, is studied in many contexts. Of interest to us is the work that takes place at the intersection of linguistics and computer science. The overall goal (Allen 1995) is to develop a computational theory of language, tackling areas such as speech recognition, natural language understanding, natural language generation, speech synthesis, information retrieval, information extraction, and inference (Jurafsky & Martin 2000).

We subdivide language processing activities into two categories: those in which text and components of text are analyzed, and those in which the analysis mechanisms are applied to solve higher level problems. For example, text analysis methods include morphology, part of speech tagging, phrase chunking, parsing, semantic analysis, and discourse analysis. These analysis methods are in turn used in application areas such as machine translation, information extraction, question and answer systems, automatic indexing, text summarization, and text generation.

Many NLP systems have been built to date, both for research purposes and for actual use in application domains. However, the literature indicates (Sundheim 1989; Jones & Galliers 1996; Hirschman & Thompson 1998) that these systems are typically subjected to an evaluation process using a test suite that is built to maximize domain coverage. This immediately raises the questions of what is meant by the term evaluation as it is used in the NLP community, whether it is equivalent to testing in the small or to testing in the large, and where it fits in the VV&T terminology quagmire.

NLP systems have largely been evaluated using a black-box, functional, approach, often supplemented with an analysis of how acceptable the output is to users (Hirschman & Thompson 1998; White & Taylor 1998). The evaluation process must determine whether the system serves the intended function in the intended environment. There are several evaluation taxonomies (Cole et al. 1998; Jones & Galliers 1996), but the common goals are to determine if the system meets objectives, identify areas in which the system does not perform as well as predicted or desired, and compare different approaches for solving a single problem.

What becomes apparent is that there are several key differences between testing and evaluation. One obvious difference is that evaluation takes place late in the development life cycle, after a system is largely complete. On the other hand, many aspects of testing (such as requirements analysis and inspection, unit testing and integration testing) are undertaken early in the life cycle. A second difference is that evaluation data is based on domain coverage, whereas some of the data used in systematic software testing is based on code coverage.

The perspective from which a system is either tested or evaluated is also very important in this comparison. In systematic software testing a portion of testing involves actual code coverage, which is determined based on the implementation paradigm. For example, there are testing methods for systems written in procedural languages such as C, in object oriented languages such as C++ and Java, and developed using UML. However, NLP systems are evaluated based on the application domain. For example, a speech interface will be evaluated with regard to accuracy, coverage, and speed (James, Rayner, & Hockey 2000) regardless of its implementation language.

Finally, we contrast the respective goals of testing and evaluation. As stated above, the goal of program level testing is to ultimately identify and correct faults in the system. The goal of evaluation of an NLP system is to determine how well the system works, and determine what will happen and how the system will perform when it is removed from the development environment and put into use in the setting for which it is intended. Evaluation is user-oriented, with a focus on domain coverage. Given its focus on the user, evaluation is most like the validation aspect of VV&T.

As part of evaluation work, organized (competitive) comparisons are carried out of multiple systems which perform the same task. For example, the series of Message Understanding Conferences (MUC) involved the evaluation of information extraction systems. Similarly, the Text Retrieval Conferences (TREC) carry out large-scale evaluation of text retrieval systems. These efforts allow for comparison of different approaches to particular language processing problems.

Functional, black-box, evaluation is a very important and powerful analysis method, particularly because it works from the perspective of the user, without concern for implementation. However, a more complete methodology would also take into account implementation details and conventional program based testing. Without this we can not be sure that the