University of Nebraska Medical Center
DigitalCommons@UNMC
Journal Articles: College of Nursing
4-1996

Basics of research (Part 6): Quantitative data analysis

Cheryl Thompson, University of Nebraska Medical Center, cbthompson@unmc.edu
Robert Schwartz, University of Pittsburgh
Eric Davis, Strong Memorial Hospital
Edward A. Panacek, University of California, Davis

Follow this and additional works at: https://digitalcommons.unmc.edu/con_articles
Part of the Nursing Commons

Recommended Citation
Thompson, Cheryl; Schwartz, Robert; Davis, Eric; and Panacek, Edward A., "Basics of research (Part 6): Quantitative data analysis" (1996). Journal Articles: College of Nursing. 19. https://digitalcommons.unmc.edu/con_articles/19

This Article is brought to you for free and open access by the College of Nursing at DigitalCommons@UNMC. It has been accepted for inclusion in Journal Articles: College of Nursing by an authorized administrator of DigitalCommons@UNMC. For more information, please contact digitalcommons@unmc.edu.

SPECIAL COMMUNICATION

Basics of Research (Part 6): Quantitative Data Analysis

Cheryl Bagley Thompson, PhD, RN, CS,¹ Robert Schwartz, MD, MPH,² Eric Davis, MD,³ Edward A. Panacek, MD⁴

1. University of Utah College of Nursing, Salt Lake City, Utah
2. Department of Emergency Medicine, Center for Injury Research and Control, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
3. Department of Emergency Medicine, Strong Memorial Hospital, Rochester, New York
4. Division of Emergency Medicine and Clinical Toxicology, University of California, Davis, Medical Center, Sacramento, California

Key Words: clinical research, research, statistical analysis

Address for correspondence and reprints: Cheryl Bagley Thompson, PhD, RN, CS, University of Utah College of Nursing, 25 South Medical Drive, Salt Lake City, UT 84112

Copyright © 1996 by the Air Medical Journal Associates.
Reprint no. 74llff2811
Air Medical Journal 15:2 April-June 1996 73

Introduction

You can do it early or you can do it late, but eventually all investigators using quantitative methods have to deal with statistical analysis. The purpose of this part of the Basics of Research series is to try to take some of the fear out of approaching your data analysis by providing an introduction to basic statistical concepts and the process of data analysis. To help accomplish this, definitions will be provided for uncommon concepts and examples will be used to demonstrate key points.

The discussion will proceed through the chronologic steps used for a statistical analysis: planning, preparation, and statistical concepts. The final section will include a discussion on interpreting your statistical analysis.

Planning

The most important part of any research project is the planning process. The more complete the planning, the smoother the project generally runs. This includes data collection and analysis. The development of a statistical analysis should not be delayed until after the data is in hand. Rather, the investigator should have a specific plan for data analysis before initiating the study.

Investigators uncomfortable with statistical analysis should consult a statistician early in the planning phase. A statistician will help the investigator determine what statistical analysis is most appropriate for answering the research question considering the type of data collected. Statisticians, however, can be intimidating to some individuals. This should not be the case. The investigator should consider the statistician an important consultant and collaborator and should avail himself or herself of the statistician's expertise.

To help diminish the stress of a statistical consultation, the investigator should prepare a list of questions before meeting with the statistician. In creating the list of questions the investigator needs to start with the research question.¹ If the investigator has an idea of what statistics to use, the questions for the statistician are related to whether the proposed analyses are appropriate and what other statistics should be considered. If the investigator has no idea of what statistics to use, the first question should be, "What statistics are appropriate for the research questions being addressed?"

The investigator should take advantage of the meeting with the statistician to find out why the analysis is appropriate and to increase his or her knowledge of statistics. The investigator needs to be able to defend his or her choice of statistics at presentations and within publications. Saying merely that the statistician said so is not sufficient. Consequently, this understanding of the analysis is essential.

Several advantages result from having a plan for data analysis before beginning the study. The most obvious is that the investigator is not left perplexed about what to do with all the data now in the computer. A plan speeds the process of data analysis. If a computer program will be used, the commands for the analysis can even be written before data collection is complete. In this case, as soon as all the data are entered, the investigator runs the predetermined programs and the analysis is ready for interpretation.

The second advantage of planning the statistical analysis before the study is an increase in scientific integrity. The investigator who has a plan ahead of time is less likely to bend the analysis to suit the purpose. A plan also prevents the process of doing repeated analyses until something is found that is statistically significant. A post hoc (after the fact) approach to statistical analysis is inappropriate and increases the chance of making a type I statistical error (see following section on hypothesis testing for discussion of type I errors). If post hoc analyses are used, a technique such as Bonferroni adjustment is needed to decrease the chance of a type I error.²

Before beginning a study, the investigator also should know what computer hardware and what statistical and data entry software are available for data entry and data analysis. The investigator should know the capabilities of the statistics program to be used and should determine whether his or her computer is powerful enough to run the planned analyses. The investigator should not plan for analyses that he or she has no way of conducting. The investigator also should spend time during the early phases of the project becoming familiar with the statistical package to be used. Data analysis will proceed more smoothly if the investigator does not need to stop and ask for technical assistance.

Data Management

Data Storage

Data, once collected, should be stored carefully to prevent damage or violations of subject privacy. Having a plan for organizing the data before data collection begins is also advisable. A commonly used method is manila file folders. Data should be sorted into appropriate file folders and placed into a locked file cabinet. If data is not to be entered into a computer file but will be tabulated by hand, it is advisable to make a complete copy of the data and then store the complete copy at another site.

Security of data remains an issue even after data is stored within a computer file. The investigator must ensure that unauthorized access to the files is not possible. Data files need to be stored in a computer that can only be accessed by the research team or must be placed into files with password protection. In addition, the investigator must have backups in case of catastrophic loss. Backups should be made at frequent intervals during data entry to prevent loss caused by power failure or other mechanical difficulty. The investigator needs to mark the disk(s) carefully so that the most recent disk is used for the next data entry session. Data can be inadvertently lost if some subjects are entered on one disk and the next set of subjects added to a backup that does not contain a complete set of data.

Data backups should be made at the end of each day and can consist of the entire computer system or of individual data files for the study. Backups can be made onto tapes or onto floppy disks. The best backup plan will provide for storage of backup tapes at a site remote from the location of the original data. This plan will protect against major loss of data in case of fire or other disaster that might destroy data residing in a desk, as well as on the computer itself.

Data entry

Data can be entered into a computer file for analysis using one of several data entry methods. Although not common with air medical research, data can be collected on forms that are automatically read into a computer data file. Because of the expense involved, this is not a common method for data entry.

A second method entails the development of a computerized data entry form that resembles the original form onto which the data is hand written by the data collector or subject. This method is user friendly because the screen resembles the paper form and generally is easy to understand. In addition, the individual entering data can see that the data they are entering matches the variable name on the screen. The disadvantages of this method are that data entry takes longer because the enter, tab, or arrow key must be pressed with each data element and time is needed to create and debug the input form.

Data also can be directly entered onto a spreadsheet. With spreadsheet data entry the names of the variables are placed across the top row and the individual subjects are listed down the side. This data format takes less time to create than a data input screen but still requires the enter, tab, or arrow key to be pressed after each data element. Most current spreadsheets allow for a variety of simple statistics to be calculated without using a statistical analysis program. Investigators needing a sophisticated statistical analysis may need more power than is available with a spreadsheet. However, statistical programs such as SPSS for Windows provide a spreadsheet-type data entry format that then allows complex data analysis using SPSS.³

A final method for data entry is entering data directly into a flat text file. This type of data entry is demonstrated in Table 1. The main advantage of flat files is the speed of data entry. A competent typist can enter data quickly because few if any characters need to be typed between variables. Spaces are left occasionally to allow the eye to follow columns of values, but the spaces are much fewer than needed to enter data into a spreadsheet. The disadvantage of this method of data entry is that it is easier to make a mistake. Numbers once entered are not associated directly with a variable name. Consequently, finding errors in data entry may be more difficult.

Table 1. Flat File Data Entry

101120195 3455 11122012
102120395 3545 11021121
103120695 4443 01011022
104121095 3343 10112222

Table 2. Misaligned Data Indicating Data Entry Error

101120195 3455 11122012
1022120395 3545 11021121*
103120695 4443 01011022
104121095 3343 10112222

*The longer line is because an extra 2 was entered with the subject number of 102 in the first three columns.

Data coding is the process of assigning a numerical value to a qualitative response. For example, an investigator may ask for the professional background of a transport team member. The responses may be physician, nurse, paramedic, respiratory therapist, pilot, or other. To facilitate the statistical analysis and to save space in the data file, the investigator may assign a numerical value to each possible response such that all physicians would be 0, all nurses 1, all paramedics 2, all respiratory therapists 3, all pilots 4, and all others 5. Although numbers are easier to deal with, the investigator must be careful that he or she does not attribute ordinal or ratio level characteristics to data that were originally nominal in nature. For example, a correlation requires that the level of measurement for both of the variables be at the interval or ratio level of measurement. Coding profession as a numerical value does not then allow the investigator to do correlations between profession and another variable. The data remain at the nominal level and thus are inappropriate for inclusion in a correlation analysis.

Data transformation is a variation of data coding and is the process of changing the numerical representation of a quantitative value to another value. Data can be changed to reflect a new measurement scale or to make alterations in the distribution of the data. For example, data on neonate weight may be obtained in grams. However, if older subjects also are included in the dataset, the neonate weights may need to be transformed into kilograms from the original grams. In the case of family income level an uneven distribution of subjects is usually found at the low end of the scale. Economists often use a logarithmic transformation so that the data are more evenly distributed.
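The two kinds of transformation just described, a change of measurement scale and a logarithmic transformation, can be sketched in a few lines of Python. This is only an illustration: the function names and the sample weights and incomes are invented for the example, and the base-10 logarithm is an assumption, since the article does not name a base.

```python
import math


def grams_to_kilograms(weight_g):
    """Change the measurement scale from grams to kilograms."""
    return weight_g / 1000.0


def log_transform(values):
    """Return the base-10 logarithm of each value (all values must be positive)."""
    return [math.log10(v) for v in values]


# Hypothetical data, invented for illustration only.
neonate_weights_g = [2950, 3400, 4100]
family_incomes = [10_000, 20_000, 40_000, 1_000_000]

weights_kg = [grams_to_kilograms(w) for w in neonate_weights_g]
log_incomes = log_transform(family_incomes)

print(weights_kg)
print([round(x, 2) for x in log_incomes])
```

In this made-up income list the largest raw value is 100 times the smallest, while the logged values differ by only 2 units, which is the compression of the high end that the text describes.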
If the income data are reduced to their logarithmic value, the high incomes are brought closer to the lower end of the scale and provide a distribution closer to a normal curve.

Physically coding the data can be accomplished by one of several methods. Some researchers like to evaluate each data collection form before data entry and place the appropriate code next to the response on the form. Others prefer to put the appropriate code on the data collection form before distribution so that the code can be read from the form at the time of data entry. A final way to code data is to have a separate code book and to use this as a reference during data entry into the computer. The first method requires two passes through the data collection forms. Consequently, one of the latter two approaches often is preferred. If the investigator does not wish the subject to be aware of the coding scheme or does not have room for it on the data collection form, the last option may be the best.

Data cleaning

Data entered into the computer will inevitably contain mistakes. Data cleaning is the process of trying to find errors and to correct them before data analysis. Several techniques can be used for data cleaning.

Programs are available that allow the investigator to enter the data twice. The computer then checks for inconsistencies between the two sets of data and notifies the individual that a mistake has been made. The primary disadvantage of this method is the increased time needed for data entry. Another disadvantage is that commonly the same mistake will be made twice, thus concealing the mistake.

Another method for cleaning data as it is entered is by using controls in the data entry program. If a form is constructed for use with data entry, value checking rules can be created. Once a rule is created, the computer will check all data entered into that field to make sure that it meets the established criteria.

Other methods of data cleaning include a variety of ways for examining the raw data after computer entry. The first approach is to use the appearance of the data. One of the most common errors with data entry is skipping variables or double entry of a value. If periodic spaces are planned into data entry, either of these errors will cause the data to become misaligned (Table 2).

The method of data cleaning that catches most errors is the process of reading the data back. With this method one investigator reads the data elements from the data entry form and the second checks the values entered by reading a copy of the data file. Inconsistencies between what the data should be and the data that was entered become obvious and errors can be corrected. Some investigators elect not to read the data back, believing that their data entry was adequate and that obvious errors can be found with other methods.

A final method for data checking is to obtain a frequency distribution for all variables. The investigator can check for obvious errors by comparing the data obtained with the data that should have been obtained. For example, if gender is coded as 0 for males and 1 for females, no 2's or 3's should appear for the value of gender. If the frequency distribution shows an erroneous value, the investigator can search the data set for the erroneous value, find the subject number for the subject where the error was made, find the correct value on the original data collection form, and reenter the data.

[Figure: population and sample, each divided into Group 1 and Group 2]

Statistical Analysis

Overview

Understanding the concepts of a statistical analysis before actually running any statistical tests is extremely important. On a basic level, statistics (specifically inferential statistics) try to determine whether a difference exists between measured values and how confident you can be that the difference is, in fact, a real difference. The measured values for a study are derived from a sample, which is some subset of a population.⁴ Statistical tests can be used to determine whether the sample is representative of the total