ANALYSIS OF DATA AND THEIR PRESENTATION
3.0 Aims and Objectives
3.1 Introduction
3.2 Meaning of Data Analysis
3.3 Elementary Analysis of Data
3.4 Coding and Tabulation
3.5 Statistical Analysis of Data
3.6 Presentation of Data
3.7 Let Us Sum Up
3.8 Key Words
3.9 Suggested Readings
3.10 Model Answers
3.0 AIMS AND OBJECTIVES
The aim of this unit is to acquaint you with the basic aspects of the analysis of data and their presentation. Analysis of data in the social sciences is not a simple exercise; rather, it involves complex and cumbersome techniques and procedures. Therefore, when you actually take up a research project, you may require further reading in this area. However, after having studied this unit you should be able to:
describe the fundamental aspects pertaining to the analysis of data;
identify the major steps involved in the processing and interpretation of data;
describe some of the important characteristics of tabulation and statistical analysis; and
state some of the basic elements related to the presentation of data.
3.1 INTRODUCTION
In Unit 1 of this block, we discussed some of the important aspects concerning the designing of a research study. Unit 2 dealt with the nature and selection of research tools and their application in the process of data collection. By now you know that preparation of an adequate research design requires a lot of advance thinking and planning. The execution of a research design with the help of appropriate research tools culminates in the collection of data. Significant meanings can be drawn through the analysis and interpretation of such data.
The data collected do not automatically provide answers to research questions. It is the systematic analysis of data that helps to build an intellectual edifice in which properly sorted and sifted observations and facts locate their true meaning. In fact, the function of analysis is to summarize the data in such a manner that they yield answers to research questions. The process of data collection also helps to reveal the broader significance of a research study by linking it to other available knowledge. Hence, in order to draw meaningful conclusions from a research venture, the data need to be analysed and presented in an appropriate manner.
3.2 MEANING OF DATA ANALYSIS
Scientific knowledge is a means to understand the empirical world. Its progress involves a repetitive interplay between theoretical ideas and empirical evidence. In most research, theoretical formulations guide the process of data collection. Data and facts, in turn, help to generate tentative theories. Deductions from a tentative theory are compared with new facts. The theory is upheld, modified or discarded in the light of such facts. This constant interplay between theory and data is a continuous process that helps in the progress of scientific knowledge.
Analysis and interpretation of data form a vital link between theory and empirical evidence. Data analysis takes place whenever theory and data are compared. This comparison begins in field research when an investigator struggles to bring order to, or to make sense of, his observations. In surveys and experiments, the researcher typically begins to bring theory and data together when testing the validity of a hypothesis. On the whole, then, the demand for this continued interaction between theory and data sets the stage for data analysis.
Data analysis involves processing and interpretation of data. Processing of data involves the transformation of categories into coding schemes that are amenable to quantitative treatment. It demands organization, reorganization and understanding of data so as to reach certain conclusions in the light of the theoretical concerns of the study. Interpretation of data means that certain inferences are drawn from the study and the findings of research are linked to the already existing state of knowledge.
The problems raised by analysis of data are directly related to the complexity of the hypothesis in studies where the purpose is hypothesis-testing. Under ideal conditions of precision and simplicity, data analysis presents very few problems. In such a situation, the statements of the hypothesis or research questions automatically provide an adequate basis for the analysis of data. For instance, in an experimental study, the researcher is supposed to collect such data as fall within the experimental design. In other words, to depict a causal relationship the focus of data collection is usually on the variables involved in the study. Once the data about the independent and dependent variables become available, the process of data analysis turns out to be, more or less, an operation which follows a pre-determined path.
Thus, when relevant factors are known in advance, no serious problems are encountered in data analysis. However, such an ideal situation occurs relatively infrequently in social research. Many a time, the situation demands that data analysis must cope with diverse factors not anticipated in the original research design. In such studies, the process of data analysis is cumbersome and the researcher must have thorough knowledge of the different steps involved in it.
Now, if data are analysed to answer specific questions, it is termed primary analysis. On the other hand, if data collected for a particular purpose are later examined in terms of some new objective or for resolving another problem, then it is called secondary analysis. When the object of data analysis is to determine the causal relationship between two variables, this is done through bivariate analysis. However, when the relationship among more than two variables is investigated, the statistical tool of multivariate analysis is used.
Two aspects need special attention in the context of data analysis. These are measurement and evaluation. Measurement involves: 1) identification of the quality or attribute that is to be measured; 2) determination of a set of operations by which the attribute may be made manifest and perceivable; 3) establishment of certain procedures for translating observations into quantitative statements of some degree or amount. Evaluation concerns itself with the nature and quality of measurement and the determination of their reliability and validity.
Now, in everyday life, we all are engaged in measurement at various levels. We generally make statements like the soil is very fertile, the road is extremely bad, the room is very big and so on. While making such statements, we are measuring some attribute or quality. What we actually do in this process is assign labels such as ‘very fertile’, ‘extremely bad’ and ‘very big’ to objects such as soil, road and room so as to represent their conceptual properties. Thus, whenever we rate something, we are engaged in measurement.
There is a difference, of course, between such everyday examples of measurement and the process of measurement in social research. In the former cases, the rules for assigning labels or numbers to objects are more or less intuitive, whereas in the social sciences these rules must be spelled out in detail. Scientific norms require that we specify the guidelines and follow procedures to obtain a measurement so that others can repeat our observations and judge the quality of information yielded by our measurement procedures.
In the social sciences, the measurement process consists of moving from the abstract to the concrete. This process begins as soon as the researcher formulates the research problem. Every research problem or hypothesis contains concepts or variables that refer to aspects of reality in which the researcher is interested. Thus, at the time of formulation of questions, the researcher must think about which terms are to be used and what they mean in both an abstract and an empirical sense. In fact, the ultimate goal of measurement is to specify clearly the observable referents of terms contained in a research hypothesis.
The process of measurement can be broken into three steps: conceptualization, specification of variables and indicators, and operationalization. The development and clarification of concepts is called conceptualization. We can identify and observe the actual existence of a concept only if its meaning is made clear. The initial step is to clarify the mental construct conveyed by the concepts. In other words, we must arrive at precise definitions of concepts. Suppose you are interested in testing the hypothesis that “education reduces poverty”; then you would begin by defining the meaning of ‘education’ and ‘poverty’. The usual practice recommended for the beginning researcher is to rely on existing definitions in social science literature and carefully consider which one is suitable for his work. Investigators, however, sometimes formulate their own definitions, especially when disagreement exists in the literature over the exact meaning of the concept.
Measurement assumes the possibility of assigning values to concepts. But we measure concepts that vary, which we refer to as variables. Now most concepts in social sciences are not directly observable. For instance, we cannot see education in the same sense as we can see a table or fan. While we cannot see education, we can observe what knowledgeable people are like and how much formal schooling they have. Thus after conceptualization, the important aspect is to identify the varying manifestations of concepts in real life situations. It is at this point that we move from the language of concepts to the language of variables.
Researchers often use the terms ‘concept’ and ‘variable’ interchangeably. But these terms connote different levels of abstraction and different stages in the measurement process. While a concept is a ‘mental construct’, a variable usually signifies some observable event that represents the underlying concept. Thus, we must specify in which sense a given concept may vary or assume different meanings in observable terms. For instance, the concept of education can have varying manifestations in real social life. Education can be in the field of medicine, engineering, social sciences and so on. For purposes of measurement, it is of paramount importance that we become fully aware of the variable character of concepts and specify their meaning and sense as employed in a study.
Measurement operations aim to estimate the existing values of variables. But to achieve this objective, the main problem is how to assign different values, numbers or labels to the variables. This problem is resolved by the operationalization of concepts or variables. The detailed description of the research operations or procedures necessary to assign various units (e.g. persons) to the categories of variables is called an operational definition. We must choose or develop operational definitions which correspond reasonably closely to the concept or variable in question. The gap between the concept (mental construct) and the observable reality represented by an operational definition should be the minimum possible.
For example, suppose we are interested in determining the ‘educational status’ of the people belonging to a particular area. After taking care of research design, sampling etc., we will investigate how many persons have acquired formal education. We will further investigate how many years they have spent at school, college etc. and what type of certificates, diplomas or degrees they have obtained. Thus, by linking the concept of education with ‘years of schooling’ or degrees, we can know how many people have studied up to primary, middle, higher secondary, Bachelor’s degree, Master’s degree and so on. We can further calculate the percentage in each category and can produce a quantitative statement about the ‘educational status’ of the population under study. Similarly, the concept of ‘poverty’ can be operationalized by linking it with ‘income’. A particular household monthly income (say less than Rs 500/-) can be taken as representing the presence of poverty. Thus, all households whose monthly income falls below Rs 500 will be considered as poor.
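The two operational definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample incomes and education levels are hypothetical, and only the Rs 500 threshold comes from the text.

```python
from collections import Counter

# Threshold taken from the text: monthly household income below Rs 500.
POVERTY_LINE_RS = 500


def is_poor(monthly_income_rs):
    """Operational definition of 'poverty': income strictly below the line."""
    return monthly_income_rs < POVERTY_LINE_RS


def percentage_by_category(levels):
    """Percentage of respondents in each education category."""
    counts = Counter(levels)
    total = len(levels)
    return {level: 100 * n / total for level, n in counts.items()}


# Hypothetical sample of five households and five respondents.
incomes = [320, 750, 480, 1200, 500]
education = ["primary", "middle", "primary", "bachelor", "primary"]

poor_flags = [is_poor(income) for income in incomes]
shares = percentage_by_category(education)
```

Note that a household earning exactly Rs 500 is not counted as poor here, because the text says "falls below Rs 500". A different reading of the threshold would change the comparison operator.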
No matter how simple or how elaborate an investigator’s formal definitions of the concepts, he must find some way of translating them into observable events so as to carry out the process of measurement in quantitative terms. The investigator must devise some operations that will produce data as an indicator of the concept. Only then are the measurement and, consequently, the analysis of data possible. This requires considerable ingenuity, especially if the theoretical constructs are far removed from everyday events and if little research using such constructs has been done in the past.
Operational definitions are adequate if they help in the gathering of such data as adequately represent the observable manifestations of the concept. However, the extent to which this aim is achieved generally remains a matter of judgement. It must be borne in mind that operational definitions, so essential to social research, are necessarily somewhat arbitrary and restricted expressions of what a concept really means.
Now, for any concept a large number of operational definitions are possible. Creative insight, good judgement and relevant theory aid in the development of operational definitions. However, such aids are to some extent subjective in nature. For instance, an investigator may feel that his data provide reasonably good indicators of his concepts, but a critic of the study may feel they do not. It frequently happens that even a researcher himself becomes aware of the fact that his data constitute only a limited reflection of the concepts employed. In this context, an important question arises: how to evaluate the soundness of specific operational definitions in an objective manner?
Evaluation is basically concerned with the nature and quality of measurements. Measurement in data analysis is greatly influenced by the quality of the operational definition. Social scientists use the terms ‘reliability’ and ‘validity’ to describe issues involved in evaluating the quality of operational definitions. Reliability is concerned with questions of stability and consistency. Is the operational definition measuring an attribute or ‘quality’ consistent and dependable? Do repeated applications of the operational definition under similar conditions yield consistent results?
For example, a steel tape is a highly reliable measuring instrument. A piece of wood (say 20 cm long) will measure 20 cm every time a measurement is taken with a steel tape. But the measurement may vary if a cloth tape is used. Elasticity of the cloth used may affect the length of the cloth tape. Variations may occur depending on how loosely or tightly the tape is stretched. In other words, the cloth tape is less reliable than the steel tape.
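One simple way to see the difference is to compare the spread of repeated readings. The sketch below uses the population standard deviation as a rough indicator of consistency; the readings are invented for illustration, and this is not a full procedure for assessing reliability.

```python
from statistics import pstdev

# Hypothetical repeated readings (in cm) of the same 20 cm piece of wood.
steel_tape_cm = [20.0, 20.0, 20.0, 20.0]
cloth_tape_cm = [19.6, 20.3, 20.1, 19.8]


def spread(readings):
    """Population standard deviation of repeated readings.

    A smaller value means the instrument gave more consistent results.
    """
    return pstdev(readings)


steel_spread = spread(steel_tape_cm)
cloth_spread = spread(cloth_tape_cm)
more_reliable = "steel" if steel_spread < cloth_spread else "cloth"
```

In this made-up data the steel tape shows no variation at all, so it comes out as the more consistent instrument, in line with the example in the text.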
Validity in measurement refers to the extent of matching, congruence or ‘goodness of fit’ between an operational definition and the concept it is purported to measure. In other words, an operational definition must truly reflect what the concept means. The researcher must ensure that he is measuring what he intends to measure. For instance, the concept of ‘education’ may be operationalized by linking it with the years of schooling. Now if the educational standards (syllabi, teaching methods, examination system etc.) are uniform in different schools, then ‘years of schooling’ may be considered as a valid measure. Twelve years of schooling and the attendant percentage of marks in the school leaving certificate will reflect the performance and knowledge of the students. However, if the educational standards vary from region to region, then years of schooling cannot be considered as a valid measure. The procedures for assessing validity and reliability are complex and cumbersome in the social sciences. Many research investigators prefer to borrow from prior studies measures that have established records of validity and reliability.
Check Your Progress I
Indicate whether the following statements are true or false. Insert a tick mark (✓) in the relevant box.
1) The continuous interplay between theory and data enhances the growth of scientific knowledge. True [ ] False [ ]
2) Data analysis forms a vital link between theory and empirical evidence. True [ ] False [ ]
3) Social facts are usually simple and pose no difficulties in data analysis. True [ ] False [ ]
4) Processing and interpretation of data are the two most important dimensions of data analysis. True [ ] False [ ]
5) Two aspects, viz. measurement and evaluation, require special attention in the context of data analysis. True [ ] False [ ]
6) Measurement process means a movement from concrete reality to abstract ideas. True [ ] False [ ]
7) Measurement process involves conceptualization, specification of variables and indicators, and operationalization of concepts. True [ ] False [ ]
8) Operational definitions are not arbitrary expressions of what a concept means. True [ ] False [ ]
9) Evaluation determines the nature and quality of measurement. True [ ] False [ ]
10) The procedures for assessing reliability and validity are very simple in social research. True [ ] False [ ]
3.3 ELEMENTARY ANALYSIS OF DATA
Data analysis is a fundamental requirement of social research. Its main function is to impart order to the whole body of information gathered during the course of study. This demands a systematic treatment of the data collected. In other words, it means grouping, regrouping, sorting and sifting of data, keeping in mind the overall needs of the research design.
The ultimate aim of data analysis is to provide answers to the already formulated research questions. The information is systematically organized to perform this function. Particular types of information are grouped together. Data are presented in such a way that the relationship among different types of information becomes clear and the data become amenable to interpretation. This is then organised systematically to take the form of a report, book or article which can be shared with others.
We will now discuss some of the major steps involved in the analysis of data.
3.3.1 Scrutiny of the Assembled Data
An early but important step in data analysis is a critical examination of the assembled material. The researcher should make every effort to search for the contrasts or similarities contained in the data. Adequate attention should be given to all the little and big events which symbolize the field situation so as to reveal their comparative significance. While conducting this preliminary exercise, the researcher should keep in mind the key elements of the study, viz., nature of the research problem, hypothesis and research design.
In fact, the researcher needs to cultivate the habit of asking himself questions once he confronts the data. For instance, are the data complete enough to reveal patterns of behaviour, sequences, and relationships which might explain the research problem under investigation? Are the data objectively recorded and are they reproducible? Are the data susceptible to quantitative treatment? Are the data collected with the help of adequate sampling techniques? Many more such questions may be raised while scrutinizing the data.
Most of the questions should relate to the research project and the data collected. But the researcher should not refrain from asking even those questions which may look unimportant in the beginning. This procedure invigorates the imagination and opens up new patterns for the resolution of problems. It also helps to develop a critical attitude, which is most essential for knowing the real depths of the data.
Data analysis requires total involvement in the data. The researcher should make every effort to become aware of each and every facet of the data. It is just like putting an object in front of one’s eyes and closely looking at it until one becomes thoroughly acquainted with all its essential features. Reading, re-reading, examining and re-examining the data helps to reveal their true nature. This process also helps to eliminate a host of problems from the data. The whole exercise of scrutinizing data accomplishes several purposes in the sense that the researcher:
gets acquainted with the intrinsic complexity of data to be analysed;
is able to perceive the essential relationships, similarities and differences contained in the data;
is able to know about the internal consistency and completeness of various aspects of data;
learns about the relative significance of each and every bit of recorded information.
3.3.2 Classification of Data
By now, it is obvious that data in social research are constituted through the collection of accurate social facts. But social facts become meaningful only when they are logically connected with other relevant facts and are sorted according to their essential nature. Similar types of facts must form a chain of evidence so that they are able to mutually explain each other. This process, which is aimed at the orderly compilation of facts, is termed classification of data.
It should not be assumed that classification is undertaken only after the data have been collected. The careful researcher is always a classifier and organiser of data. As he observes and gathers his data, he inevitably adopts some scheme (howsoever rudimentary it may be) for classifying and coordinating social facts. For instance, the initial selection of variables (dependent, independent variable etc.) provides a broader scheme for classification even during the course of data collection.
Major work regarding classification is carried out only after the completion of data collection. To achieve this, the researcher needs to select, adopt or develop some classification system. Its suitability is determined by the nature of the study (hypothesis testing, diagnostic study etc.) and the completeness or accuracy of data. The classification further depends upon the researcher’s insight into data and the extent of sophistication attained by him in the concerned discipline.
The main value of any classification system lies in its potential for grouping together masses of comparable facts into relatively few classes or categories. For instance, a given bulk of information, which may otherwise display a diverse range of characteristics to a casual observer, can be grouped under a suitably adopted or evolved category. In this way, a decrease in the number of units (classes, categories etc.) makes the research material more readily comprehensible. The essence of the data is also more easily grasped.
In addition to some common sense classification the researcher may begin to sort his data in terms of:
essential similarities and dissimilarities in them;
clusters of related facts which can be observed repeatedly with consistent regularity; and
recurring sequence of events.
It should be clearly borne in mind that the aim of classification is to discover such series, sequences and relationships as may throw light on the uniformities in one group of data and on the differences in another.
3.3.3 Establishment of Categories
You have learned that data analysis involves classification of research materials. For the classification of data, a fundamental requirement is the establishment of categories. To reiterate, the data in social research are usually constituted by a variety of information gathered during the course of study. In order to provide answers to research questions, the loose data need to be organised and classified. This means that diverse kinds of information (field observations, responses through questionnaires/schedules and interviews) need to be grouped into a limited number of classes or categories. Similar types of responses or observations must be grouped together to form a particular category. But how does one select, adopt or develop a category or categories?
It is important to note that decisions regarding the selection or adoption of categories cannot be arrived at in an arbitrary fashion. In order to decide about the relevant categories under which research data may be classified, some principle of classification must be adopted or evolved. Now, when a study is conducted within a particular theoretical framework, the categories usually get established prior to data collection. Furthermore, if structured tools of data collection are employed, then the appropriate principles required for classification of data are clearly prescribed by the nature of questions and their attendant responses. Firstly, in such cases, only those data are collected which are relevant to the hypothesis or the variables involved in the study. Secondly, structured questions themselves provide clear-cut options for their answers. For instance, in order to determine the demographic profile of Scheduled Castes in a rural area, the researcher may ask the question: “Do you belong to a Scheduled Caste?” Now, the answer can be of the yes/no type. All respondents will automatically get categorised as belonging to Scheduled Castes or not belonging to Scheduled Castes. Sometimes, a number of choices are given for answering a question. Barring some exceptions, almost all the responses usually fall under one or the other choice. In this way the problem of developing categories becomes simplified. The information gets more or less precategorized during the process of data collection. Thus, in a study conducted within a particular theoretical premise and with the help of structured tools (schedule), each question provides a natural unit for categorization.
When the researcher approaches the field situation without the help of existing theories the problem of developing categories becomes cumbersome. In such cases, the researcher usually makes efforts to move from data towards theory. His aim is to gather each and every bit of information during fieldwork which may later on be used to produce a theoretical statement or generate some hypotheses. Categories are mainly developed only after completing the process of data collection. The researcher may continue to sort or sift his data even during the course of data collection under tentative categories. But the main task of establishing categories is taken up only at the time of data analysis.
For instance, in exploratory studies, establishment of principles of classification is especially difficult. It is absolutely impossible to select categories in advance. Such studies by definition do not start with explicit hypotheses. At the time of data collection, the investigator is generally not clear which information might prove useful. He simply collects each and every bit of information about the community under study. The ‘wording’ and ‘sequence’ of questions usually vary from person to person. Consequently, the responses also do not display a uniform pattern. The data in such studies turn out to be complex. There are usually no convenient ‘natural units’ (as in the case of structured questions) under which data may be classified. Hence, the researcher faces crucial problems while selecting or developing categories at the time of data analysis. The researcher, in such a situation, is supposed to deal with a diverse range of unstructured information. But a substantial part of the information may turn out to be irrelevant for the study. Therefore, the foremost requirement is to identify that part of the data which is relevant to the research investigation. After having done this, the researcher should then move on to think about the classificatory principles needed for the establishment of categories.
In an exploratory study, the initial concern should be to group similar information under some suitable heading. The process can even be started during the time of data collection. For example, all information regarding ‘family structure’ can be grouped together. More observations can be grouped together under the heading of ‘kinship structure’, ‘marriage’ and so on. Similarly, observations relating to ‘political institutions’ and ‘economic structure’ can be grouped separately.
Now, if you as a researcher are interested in some particular aspect of a community, you need to develop particular categories for the classification of data. Suppose you are interested in studying the problem of alcoholism among landless labour in a rural area. You may categorize your data as follows:
Alcoholism among landless labour: Present …. Absent ….
In this classification, you have used only two categories, i.e., presence or absence. These two categories form a category set. However, a category set may also consist of more than two categories. Assuming that you wish to study the degree of alcoholism among the workers, you may decide to use six categories: very high, high, medium, low, very low and absent. Each category set (irrespective of the number of categories involved) is subject to certain basic rules. It must be:
derived from a single classification principle;
sufficiently exhaustive to include all possible responses related to a certain item in one of the categories within a set;
mutually exclusive, i.e. responses must fit into one and only one category within the set; and
clearly defined, so that it is understood alike by all the investigators who would code the data.
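The rules of exhaustiveness and mutual exclusiveness can be checked mechanically once a category set is written down. The following Python sketch is one possible way to do so; the function name, the raw response labels and the assignment of labels to categories are all hypothetical.

```python
def check_category_set(categories, responses):
    """Check a category set against a list of raw responses.

    categories maps each category name to the set of raw responses it covers.
    Returns (unclassified, overlapping):
      unclassified - responses that fit no category (set is not exhaustive)
      overlapping  - responses that fit more than one category
                     (set is not mutually exclusive)
    """
    unclassified = []
    overlapping = []
    for response in responses:
        matches = [name for name, members in categories.items()
                   if response in members]
        if not matches:
            unclassified.append(response)
        elif len(matches) > 1:
            overlapping.append(response)
    return unclassified, overlapping


# Hypothetical two-category set for 'alcoholism among landless labour'.
categories = {
    "present": {"drinks daily", "drinks weekly"},
    "absent": {"never drinks"},
}
responses = ["drinks daily", "never drinks", "drinks occasionally"]

unclassified, overlapping = check_category_set(categories, responses)
```

Here the response "drinks occasionally" fits neither category, which signals that the set is not exhaustive and needs to be revised. The remaining rules (a single classification principle and clear definitions) call for the researcher's judgement and cannot be verified this way.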
Categorization of data is not as simple as might be inferred from the above discussion. In actual data analysis, establishment of categories requires considerable research experience and thorough knowledge of the concerned discipline and of the problem being studied.
Check Your Progress II
Note: a) Use the space provided below for your answer.
1) How does scrutiny of data help a researcher? (Hint: see the text.)
…………………………………………………………………………………………………………………………….
……………………………………………………………………………………………………………………………..
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
2) Which principles need to be followed in establishing categories for the classification of data?
(Hint: see the text.)
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
3.4 CODING AND TABULATION
In the following sub-sections we will acquaint you with the basic aspects of coding and tabulation.
3.4.1 Coding
Coding is an operation by which numbers or symbols are assigned to all items, according to the class or category in which they fall. The basic purpose is to facilitate greatly the task of tabulation of the responses. Assigning the symbol or number to a given datum is more or less a mechanical procedure. The basic operation is mainly that of classification. For instance, in surveys, the categories are usually answers to questions; and the common practice which simplifies data entry and analysis is to use numerical codes only.
Now if the answers to questions are already numbered, then there is no need to code the data further. But if the answers are not expressed numerically, then numbers must be assigned to each answer. Suppose, for a moment, that the possible answers to a particular question are ‘yes’, ‘no’, ‘don’t know’ and ‘no answer’. The codes would be 1 for ‘yes’, 2 for ‘no’, 3 for ‘don’t know’, and 4 for ‘no answer’.
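The coding scheme just described amounts to a lookup table from answers to numbers. A minimal Python sketch is given below; the code values are those in the text, while the function name and the sample answers are hypothetical.

```python
# Codes as given in the text.
ANSWER_CODES = {
    "yes": 1,
    "no": 2,
    "don't know": 3,
    "no answer": 4,
}


def code_answer(answer):
    """Return the numerical code for a recorded answer.

    Spacing and capitalisation are normalised first. An answer outside the
    scheme raises KeyError, so that uncoded responses are not silently lost.
    """
    return ANSWER_CODES[answer.strip().lower()]


# Hypothetical answers as recorded on five schedules.
answers = ["Yes", "no", "Don't know", "no answer", " yes "]
coded = [code_answer(a) for a in answers]
```

Letting an unexpected answer raise an error, rather than assigning it a default code, is a design choice made here so that gaps in the coding scheme come to notice early.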
For structured questions coding is a straightforward operation. Generally, there are relatively few categories and you simply need to assign a different code to each category. However, for unstructured questions the number of responses may be large. In such a case you need to develop a coding scheme that does not require a separate code for each and every response. A particular code should adequately cover a full range of similar responses. The whole idea is to put data in a manageable form.
The nature of coding operations depends mainly on three factors:
the number of respondents or sources of data;
the number of questions asked;
the number and complexity of statistical operations planned for the study.
If the number of respondents is large then any kind of tabulation becomes difficult unless the data are coded. There is no easy way to carry out complex cross tabulation without some form of coding. Furthermore, a statistical operation involves manipulation of numbers, which in turn must represent the data from the schedule. The researcher should assign numbers to the answers and summarize them separately on sheets or cards. The operations then become much simpler. The more complex the data, the more useful is some form of coding.
Depending upon the nature of the study, coding can be done either at the stage of finalising the schedule, or at the data collection stage (where a schedule/mailed questionnaire is not being used), or at a time just prior to tabulation of data. In a study employing closed-ended questions, the numerals may be assigned to answers in advance. This operation is called precoding. For example, take the question:
Do you belong to a Scheduled Caste?
(Circle the answer) Yes 1 No 2
This way the data get automatically coded. These precoded responses can be tabulated very easily by hand or they may be transferred directly on to cards for machine tabulation. Another procedure is that the researcher may do the coding work as soon as he hears the answers. But this requires considerable research experience and should be done in a fairly cautious manner.
Precoding a schedule has some obvious advantages. For instance, it makes possible the elimination of all or a major part of the work required during the office coding operation. However, one of the possible disadvantages is that the categories selected (and consequently coded) may prove to be too general for revealing inconsistencies in reporting. In fact, the tendency among inexperienced researchers is to use a few categories which generally gloss over potentially meaningful differences. Another limitation of precoding may stem from ‘catch-all’ categories like ‘Others’ or ‘Miscellaneous’ offered as one of the possible responses to a question. Such categories may conceal meaningful information unless explained by the interviewer. If the number of responses in the ‘Others’ category is unusually large it implies that much useful information is being lost. Once data are coded and data analysis is under way, it is still possible to combine code categories for purposes of analysis; but the reverse operation cannot be done.
When coding is carried out after data collection (just before tabulation), the coding operation may be organised in a central office. In such a situation, much depends upon the nature of data and the number of respondents. It is useful to handle relatively simple factual items separately. The more complicated and special information may be grouped into items subject-wise. The most important requirement of any coding scheme is that it must cover each and every variable or question involved.
There are some useful aids to coding. A heavy coloured pencil can be used to mark code symbols opposite the answers. If the number of respondents is small, summary sheets may be attached to each schedule. A large summary sheet can also be designed so as to take care of all the cases. This is quite useful for a quick overview of data. If the number of cases goes beyond 150, the student may use some form of punch-cards. Data transferred to punch-cards facilitate analysis, particularly when mechanical aids are employed.
While carrying out coding of schedules, instructions should be prepared in written form. Such instructions should be followed during classification. Now, it is possible to make errors no matter how simple the problems of classification and coding may be. Therefore, it is advisable to take a definite sample of the coded data and check it so as to detect errors. The procedure is called ‘spot checking’ and it acts as a good precautionary measure against errors in the mechanics of coding.
Once codes are established for all the items and the data are ready for tabulation, the researcher should prepare a code-book. A code-book is like a dictionary in the sense that it defines the meaning of the numerical codes used for each answer. The code-book should contain the coding rules employed in the organization of data. A well-prepared code-book helps to locate each item in the data-file.
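A code-book can be pictured as a small lookup structure. The sketch below is purely illustrative: the item labels Q1 and Q2, the name CODE_BOOK and the helper decode are hypothetical, and the two questions are borrowed from examples used elsewhere in this unit.

```python
# A hypothetical code-book: for each item, the question text and the
# meaning of every numerical code used for its answers.
CODE_BOOK = {
    "Q1": {
        "question": "Do you belong to a Scheduled Caste?",
        "codes": {1: "Yes", 2: "No"},
    },
    "Q2": {
        "question": "Attitude towards dowry system",
        "codes": {1: "Favour", 2: "Do not favour"},
    },
}

def decode(item, code):
    """Look up the meaning of a code, much as one looks up a word in a dictionary."""
    return CODE_BOOK[item]["codes"][code]

print(decode("Q1", 2))  # No
print(decode("Q2", 1))  # Favour
```

Keeping the question text next to its codes means that anyone reading the data-file later can recover what each number stands for.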
3.4.2 Tabulation
After having completed the process of classification and coding, the next step involved in the analysis is the tabulation of data. Tabulation is mainly a part of the technical process in the analysis of data. The essential operation in tabulation is counting so as to determine the number of cases that fall into various categories. In other words, the individual cases or items in the assembled data are sorted and counted according to various categories of classification. The whole aim of tabulation is to arrange research material into some kind of concise and logical order.
Tabulation may be done entirely by hand or it may be done by machine. In other words, the sorting and counting process can be carried out by hand or with the help of mechanical aids, depending upon the number of cases or items to be tabulated as well as upon the availability of staff and money. If the number of cases is small, the tabulating work can be completed by hand. In case the data have been transferred to cards or small sheets, they can be sorted into piles and counted directly. This system was generally used for large scale tabulation before the advent of mechanical tabulation.
A common method in hand tabulation is to enter each case in the appropriate compartment of a tally-sheet by means of a line, dot or some other mark. The tally-sheet is a blank table with properly labelled columns and rows. Table 3.1 below is an example of a tally-sheet for a frequency distribution. It shows the distribution of 65 executives according to their weight.
Table 3.1: Weight of executives
Weight (kg.) | Tabulation | Frequency |
45-49 | I | 1 |
50-54 | IIII | 4 |
55-59 | IIII/ IIII | 9 |
60-64 | IIII/ IIII/ IIII/ IIII/ II | 22 |
65-69 | IIII/ IIII/ IIII/ I | 16 |
70-74 | IIII/ IIII/ | 10 |
75-79 | III | 3 |
Total | | 65 |
Four tally marks which have been crossed denote a frequency count of 5. This way counting becomes simpler. In this case, the information from schedules in regard to the question on weight of the respondent has been tabulated. There are 65 schedules. The weight of each person could have ranged from 45 kg. to 79 kg. or any figure in between. If the data were recorded in whole kg. (i.e. decimal points are ignored), one possibility could have been to tabulate the weight by 1 kg. intervals. That would have made the table very long and would not have served any useful purpose from the point of view of tabulation. Similarly, class intervals of 15 (i.e. 45-59, 60-74, 75 and above) or 10 (i.e. 45-54, 55-64, 65-74, 75 and above) would be too large. Therefore, tabulation of the data by 5 kg. intervals is far more satisfactory in this case. The coding plan would have recognised this and the codes would have been put against each entry accordingly.
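The sorting of individual weights into 5 kg. class intervals, followed by counting, can be sketched as follows. The ten weights are invented for illustration and the function name class_interval is hypothetical; only the interval scheme (starting at 45 kg. with a width of 5 kg.) comes from Table 3.1.

```python
from collections import Counter

def class_interval(weight, lower=45, width=5):
    """Return the label of the class interval containing a whole-kg weight."""
    start = lower + ((weight - lower) // width) * width
    return f"{start}-{start + width - 1}"

# Hypothetical weights (kg.) taken from ten schedules.
weights = [47, 52, 58, 61, 63, 64, 66, 68, 72, 78]

# Tabulation: count how many cases fall into each class interval.
frequency = Counter(class_interval(w) for w in weights)

# Print a small tally-sheet: interval, tally marks, frequency.
for label in sorted(frequency):
    print(label, "I" * frequency[label], frequency[label])
```

Changing the width argument to 1, 10 or 15 shows directly how the table becomes too long or too coarse.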
The term cross tabulation is employed to refer to the tabulation of the number of cases that occur jointly in two or more categories. Cross tabulation is an essential step in the discovery or testing of relationships among the variables in one’s data. A cross tabulation requires a table with rows representing the categories of one variable and columns representing the categories of another. When a dependent variable can be identified, it is customary to make this the row variable and to treat the independent variable as the column variable, although this is only an arbitrary convention.
Let us consider the cross tabulation of two variables, i.e., ‘attitude towards dowry’ and ‘gender’. In the table below the row variable is ‘Attitude towards dowry system’. In other words, it indicates whether the respondents favour or do not favour the dowry system. The column variable is ‘gender’. Gender is here the independent variable. Notice that the last column and the bottom row, each labelled ‘Total’, show the total number of respondents with each single characteristic. Because the four numbers (112, 88, 109 and 91) are along the right and bottom margins of the table, they are called marginal frequencies.
Table 3.2: Attitude towards dowry by gender
Attitude towards dowry system | Male | Female | Total |
Favour | 78 | 34 | 112 |
Do not favour | 31 | 57 | 88 |
Total | 109 | 91 | 200 |
Each intersection in the body of the table where the categories of two variables intersect is called a cell. The number in each cell is called the cell frequency. The cell frequencies in this table are 78, 34, 31 and 57. For example, the 78 men in favour of the dowry system constitute one of the cell frequencies. Because this table has two rows and two columns, it is referred to as a 2×2 table.
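A cross tabulation of this kind amounts to counting the cases that fall jointly into each pair of categories, and then adding along rows and columns to obtain the marginal frequencies. The following sketch uses the cell frequencies of Table 3.2; the variable names are hypothetical, and the list of individual cases is rebuilt from the cell frequencies only to imitate what coded schedules would look like.

```python
from collections import Counter

# Cell frequencies from Table 3.2, keyed by (attitude, gender).
cells = {
    ("Favour", "Male"): 78,
    ("Favour", "Female"): 34,
    ("Do not favour", "Male"): 31,
    ("Do not favour", "Female"): 57,
}

# One (attitude, gender) pair per respondent, imitating coded schedules.
cases = [pair for pair, n in cells.items() for _ in range(n)]

# Cross tabulation: count the cases in each joint category.
table = Counter(cases)

# Marginal frequencies: row totals (attitude) and column totals (gender).
row_totals = Counter()
col_totals = Counter()
for (attitude, gender), n in table.items():
    row_totals[attitude] += n
    col_totals[gender] += n

print(dict(row_totals))     # {'Favour': 112, 'Do not favour': 88}
print(dict(col_totals))     # {'Male': 109, 'Female': 91}
print(sum(table.values()))  # 200
```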
Both machine and hand tabulation presuppose that data have been coded and that the coding has been checked. Now hand tabulation can be quite fast and accurate if proper techniques are employed. One of the most efficient techniques is to use small code cards which can be easily sorted and counted. Further, with the use of such devices as colours, heavy lines etc., the codes can be easily distinguished and the cards efficiently sorted. Counting is also rapid if small cards are used. On the whole, manual tabulation is less expensive and less time consuming when the data and category sets to be counted are small and also when not too many cross tabulations are required.
In machine tabulation, scores for each case are usually transcribed on to a sheet from which they can be punched on to cards by a card-puncher. In fact, machine tabulation involves operations like card punching, the checking of machine tabulations, the transposition of the results from machine tabulation forms to tables etc. None of these steps is required in hand tabulation. If the data are large or many cross tabulations are involved, then the efficiency of machines compensates for the time consumed in the preparatory operations. Many types of machines are available which can be used for tabulation. Developments in this field have been extremely rapid in recent years. Some machines simply sort and count cards; others sort, count and print the results; still others are capable of performing the most complicated statistical operations. The machines of the latter variety are extremely complex and they need to be ‘programmed’ for a given operation by a specialist in the field. The cost of using machine tabulation is usually high. But as the number of cases or the number of cross-tabulations increases, the use of machine tabulators becomes progressively more economical. The number of cross-tabulations is the most important factor in determining the relative preference for one rather than the other mechanical device in tabulation. With the advent of the computer, tabulation has become a quick exercise. The selection of appropriate software permits not only the tabulation of data but also their statistical analysis. For instance, computation of the correlations among a large number of variables is a statistical job that can now be accomplished within a very short time.
Check Your Progress III
Note: a) Use the space provided below for your answer.
1) What are the main factors which affect the nature of coding operations? (Hint: see the text.)
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
2) What is the main aim of tabulation of data? (Hint: see the text.)
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
…………………………………………………………………………………………………………………………….
Activity I
Frame a question to be asked of the head of a household whose child below 10 years of age has dropped out of school. Put the question to some 25 respondents. Try the categorisation, coding and tabulation of the data. Show the result to your academic counsellor.
3.5 STATISTICAL ANALYSIS OF DATA
In the previous sections, we have discussed some of the important steps involved in the analysis of data. In this section, we will give you some preliminary idea about the statistical analysis of data. We will also acquaint you with the elementary aspects involved in the presentation of data. Towards the end we will furnish a brief outline for the preparation of a research report.
After having taken care of coding, tabulation etc., the next crucial decision pertains to the appropriate statistical analysis of data. But how does one decide on the relevant kind of analysis? Obviously, such a decision depends on what you want to know. Let us suppose that you want to know about the causal relationship between the independent and dependent variables involved in your study. Now in order to establish such a causal relationship, the foremost requirement would be to demonstrate that the two variables are associated. In other words, you must demonstrate that changes in one variable are accompanied by changes in the other.
At this stage you might be thinking that one may simply examine the raw data, respondent by respondent, to reveal whether changes in the independent variable are associated with changes in the dependent variable. But such an exercise is extremely cumbersome. When you have to examine a large number of respondents the task becomes incredibly tedious and also not very reliable. Many a time, it is not at all feasible to carry out such an operation. Therefore, what the researcher actually needs is a fast and efficient means of summarizing the association between variables for the entire sample of respondents. This is precisely where statistics comes into the picture.
Statistics as a discipline provides techniques of organizing and analysing data. Statistics serves the function of summarizing the available data to make them more intelligible. For instance, with the help of statistics we can summarize very fast the association between variables in a given sample. Statistics facilitates the drawing of inferences from a given set of data in the form of generalizations. It also makes possible the testing of the reliability of research findings. Based on probability theory, statistics can be used to estimate population characteristics from a sample survey and to test hypotheses. In sum, statistics are designed to summarise data and to reveal the extent to which one may generalize beyond the data at hand.
There are various statistical operations which help to achieve the objectives of data analysis. Most of the procedures are described fully in statistics text books (or in books on research methodology). We shall give here only a preliminary introduction. Statistical analysis sometimes begins with the inspection of each variable separately. This is called univariate analysis (i.e., single variable analysis). Univariate analysis is conducted as a prelude to more complex analyses or as a part of a basic descriptive study. In any case, the goal is to get a clear picture of the data by examining one variable at a time. Univariate analysis helps to reveal the nature of the variation in the variables under study. The purpose is to see if there is sufficient variation in responses to warrant the inclusion of the variable in the analysis.
The variations in responses are revealed by statistically organizing the data in terms of frequency distribution and percentage distribution. A frequency distribution is created by first listing all of the response categories and then adding up the number of cases that fall into each category. A frequency distribution presents a clearer picture than a case by case listing of responses. However, it only serves the purpose of a preliminary organisation of data. To get a clearer picture of the data, researchers often compute the percentage of respondents in each category. Percentage distributions provide a comparative framework for interpreting responses and depict more clearly the relative difference among responses. They also make visible the size of a particular category in relation to the size of the sample. To create a percentage distribution, you divide the number of cases in each category by the total number of cases and multiply by 100.
Statistics helps to reveal various statistical properties of univariate distributions. The notable properties examined by statistical analysis are central tendency, dispersion and shape. The determination of central tendency characterizes what is ‘typical’ in the data. In the terminology of statistics the three main measures of central tendency are the mean, the median and the mode.
The mean is the arithmetical average, calculated by adding up all the responses and dividing by the total number of respondents. It is the ‘balancing’ point in a distribution, because the sum of the differences of all the values from the mean is exactly equal to zero. The median is the midpoint in a distribution (the value of the middle response): half of the responses are above it and half are below. You find the median by arranging the values of a variable from low to high (or high to low) and then counting up until you find the middle value. The mode is the value or category with the highest frequency.
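The three measures can be computed with Python's standard statistics module. The nine weights below are invented for illustration; the sketch also checks the ‘balancing’ property of the mean, namely that the deviations from it add up to zero.

```python
import statistics

# Hypothetical weights (kg.) of nine respondents.
weights = [47, 52, 58, 61, 61, 63, 66, 72, 78]

mean = statistics.mean(weights)      # sum of the values divided by their number
median = statistics.median(weights)  # middle value after sorting
mode = statistics.mode(weights)      # value with the highest frequency

# The mean is the balancing point: deviations from it sum to zero.
sum_of_deviations = sum(w - mean for w in weights)

print(mean, median, mode, sum_of_deviations)
```

For this list the values add up to 558, so the mean is 62, while the middle (fifth) value and the most frequent value are both 61.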
Let us go back to Table 3.1, which is a univariate table. We have the frequency distribution by weight, i.e., the number of executives whose weight falls in each category. Now we can calculate the percentage distribution by dividing the frequency in each category by the total, i.e. 65, and multiplying it by 100. The percentages will add up to 100.
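The calculation for Table 3.1 can be written out as a short sketch. The frequencies are those given in the table; the variable names are hypothetical.

```python
# Frequency distribution from Table 3.1 (weight in kg. -> number of executives).
frequencies = {
    "45-49": 1, "50-54": 4, "55-59": 9, "60-64": 22,
    "65-69": 16, "70-74": 10, "75-79": 3,
}
total = sum(frequencies.values())  # 65 executives in all

# Percentage in each category = frequency / total * 100.
percentages = {label: 100 * f / total for label, f in frequencies.items()}

for label, pct in percentages.items():
    print(f"{label}: {frequencies[label]:>2}  {pct:5.1f}%")
```

The exact percentages add up to 100, but when each is rounded to one decimal place for presentation the rounded figures may differ slightly from 100 in total (here they come to 99.9).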
The second property that can be summarized statistically is the degree of variability or dispersion among a set of values. The simplest dispersion measure is the range. Statistically, this is the difference between the lowest and the highest values and is usually reported by identifying the end points. For instance, in Table 3.1, if the least weight of a person is 47 kg. and the highest 78 kg., the range will be 78 - 47, or 31. The most commonly used measure of dispersion is the standard deviation. This is a measure of the ‘average’ spread of observations around the mean.
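Both measures of dispersion can be sketched with a few lines of Python. The weights are invented, chosen only so that the lowest and highest values match the 47 kg. and 78 kg. of the example; statistics.pstdev treats the list as the whole group being described, whereas statistics.stdev (dividing by n - 1) would be the usual choice when the list is a sample from a larger population.

```python
import statistics

# Hypothetical weights (kg.); lowest 47 and highest 78 as in the example.
weights = [47, 52, 58, 61, 61, 63, 66, 72, 78]

# Range: difference between the highest and the lowest value.
value_range = max(weights) - min(weights)  # 78 - 47 = 31

# Standard deviation: root of the mean squared deviation from the mean.
sd = statistics.pstdev(weights)

print(value_range)
print(round(sd, 2))
```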
The third statistical property of univariate distributions is their shape. This property is made apparent through a graphic presentation called a frequency or percentage polygon. The combination of these three statistical properties, i.e. central tendency, dispersion and shape, provides a good picture of quantitative data.
Bivariate analysis examines the nature of the relationship between two variables. Such analysis begins with the construction of cross-tabulations. There are rules for percentaging cross-tabulations and also for making the necessary comparisons. Statistically, bivariate analysis amounts to calculating the degree of association between variables. Causal inferences are based not only on association but also on theoretical assumptions and empirical evidence about the direction of influence and non-spuriousness.
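The text does not spell out the rules for percentaging. One widely used convention, adopted here as an assumption, is to compute percentages within each category of the independent variable and then compare them across categories. The sketch applies this to the figures of Table 3.2, with gender as the independent (column) variable; the variable names are hypothetical.

```python
# Cell frequencies from Table 3.2, keyed by (attitude, gender).
cells = {
    ("Favour", "Male"): 78,
    ("Favour", "Female"): 34,
    ("Do not favour", "Male"): 31,
    ("Do not favour", "Female"): 57,
}

# Column totals: number of respondents of each gender.
col_totals = {}
for (attitude, gender), n in cells.items():
    col_totals[gender] = col_totals.get(gender, 0) + n

# Column percentages: share of each attitude within each gender.
col_pct = {
    (attitude, gender): 100 * n / col_totals[gender]
    for (attitude, gender), n in cells.items()
}

for key, pct in col_pct.items():
    print(key, round(pct, 1))

# Comparing across columns: difference in the percentage favouring dowry.
difference = col_pct[("Favour", "Male")] - col_pct[("Favour", "Female")]
print(round(difference, 1))  # 34.2
```

About 71.6 per cent of the men against 37.4 per cent of the women favour the dowry system in this table, a difference of roughly 34 percentage points. Such a difference indicates association in these data; by itself it does not establish a causal relationship.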
However, let us briefly indicate what is involved in any statistical analysis. In giving an adequate description of the mass of data, we usually wish to do one or more of the following things :
to characterize what is ‘typical’ in the group;
to indicate how widely individuals in the group vary;
to show other aspects of how individuals are distributed with respect to the variable being measured;
to show the relationship of the different variables in the data to one another;
to describe the differences between two or more groups of individuals.
We also wish to draw generalisations applicable to populations from which samples were taken for study and to infer causal relationship among variables.
Since statistical statements are statements of probability, we can never rely on statistical evidence alone for a judgement of whether or not we will accept a hypothesis as true. Confidence in the interpretation of a research result requires not only statistical confidence in the reliability of the findings but, in addition, it also requires evidence about the validity of the presuppositions of the research venture.
3.6 PRESENTATION OF DATA
For anyone seriously interested in social research, knowledge about the presentation of data is indispensable. In the following we will give you a preliminary idea of data presentation through statistical tables and diagrammatic presentation of data.
3.6.1 Statistical Tables
No matter what type of problem one is investigating, there will almost invariably be a need to arrange data (or at least part of them) in statistical tables. It is, therefore, important for a student of social research to understand the basic principles involved in the construction of tables. We state them below:
A table must have internal logic and order. For instance, if you are to present a variable such as the height of individuals in tabular form, then you must order the values either from the tallest to the shortest or vice versa. Similarly, in the case of qualities where order may not be so obvious, the need for some kind of logical treatment is equally important.
The units entered in the left hand column describing the qualities or values of the variable must be mutually exclusive. However, they must cover or include the vast majority of observations. For instance, if you wish to present your data in terms of religion then the mutually exclusive categories will be Hindus, Muslims, Jains, Sikhs, Christians, Buddhists and ‘Others’.
If the left hand column of a table is a quantitative variable, then the class interval must be carefully and reasonably chosen. While this would depend to some extent on the range and the pattern of distribution of the data, the usual practice is that the number of categories should not be less than 5 and not more than 12. There would of course be exceptions. Care must be taken to construct intervals of uniform size. After having decided about the size of class intervals, it is important to designate them clearly. Each interval must have definite lower and upper limits.
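The rule of thumb on the number of categories can be tried out on the weight data of Table 3.1, which run from 45 kg. to 79 kg. The sketch below is only illustrative: the function names make_intervals and is_reasonable are hypothetical, and the limits of 5 and 12 classes are the usual practice mentioned above, not a strict requirement.

```python
def make_intervals(lowest, highest, width):
    """Return (lower, upper) limits of uniform class intervals covering lowest..highest."""
    intervals = []
    lower = lowest
    while lower <= highest:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals

def is_reasonable(intervals, min_classes=5, max_classes=12):
    """Check the rule of thumb: not fewer than 5 and not more than 12 classes."""
    return min_classes <= len(intervals) <= max_classes

# Try several interval widths for weights between 45 kg. and 79 kg.
for width in (1, 5, 10, 15):
    classes = make_intervals(45, 79, width)
    print(width, len(classes), is_reasonable(classes))
```

With a width of 1 kg. there are 35 classes, with 10 kg. only 4 and with 15 kg. only 3; a width of 5 kg. gives 7 classes, each with definite lower and upper limits, which falls within the usual range.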
In any discussion of the rules and practices which should be followed in the construction of statistical tables, it is important to make a distinction between the general purpose table and the special purpose table. The former is also known as an original, primary or reference table and the latter has been often referred to as an analytical, summary, interpretative or secondary table.
The general purpose table is constructed to include a large amount of source data in convenient and accessible form. The special purpose table is designed to illustrate or demonstrate certain aspects of statistical analysis or to emphasize a significant relationship pertaining to data. For example, tables in the reports on population, or vital statistics are of the general purpose type. They are usually extensive repositories of statistical information. Examples of the special purpose tables may be found in monographs and articles. In such cases, a table is fundamentally an instrument of
Now the procedures for constructing statistical tables are not entirely standardized, but there are certain generally accepted practices. We state below usages which need to be followed:
Every table should have a title. The title should briefly describe the contents of the table and give an idea about the type of data it contains. This is especially important when there is a list of tables which the reader goes through before actually looking at the table;
Every table should be identified by a number to facilitate easy reference;
The headings of columns and rows of the table should be clear and brief;
Any explanatory footnotes concerning the table must be placed beneath the table;
The arrangement of the categories in a table may be chronological, geographical, alphabetical or according to magnitude. In other words, such an arrangement should fulfil the demands related to the nature of data;
The unit of reporting (where applicable) should be stated. For instance, if the population given in the table is in millions then this should be stated at the top of the table. Sometimes presentation of numbers in such summary form becomes necessary;
If the table is from some other study or statistical compilation, the source should invariably be given.
The main advantages of presenting data in the form of statistical tables are:
Tables conserve space and reduce explanatory and descriptive statements to a minimum;
The visualization of relations among variables and the process of comparison is greatly enhanced by tables;
Data presented in tables can be more easily remembered;
Statistical tables provide basis for computations.
preceding pages. Discuss with your academic counsellor.
3.6.2 Diagrammatic Presentation
Now it is quite possible that the meaning of a series of figures in a textual or tabular form may be difficult for the mind to grasp or retain. In order to overcome such a limitation, the data is usually presented in the form of diagrams. Diagrams are basic aids in research and help in the translation of complex statistical data into a form which can be easily grasped by the mind. Diagrammatic presentations make large masses of data clear and comprehensible.
Diagrams help in the depiction of facts concisely, logically and in a simple manner. For instance, properly constructed graphs relieve the mind of burdensome details and serve to indicate the conclusions of a study. In fact, the graph is one of the most commonly used devices for showing a trend or relationship between variables.
In social research different types of diagrams are used to present data. There are dimensional diagrams which may further be characterized as one dimensional, two dimensional and three dimensional diagrams. One dimensional diagrams are in the shape of vertical or horizontal lines or bars. The length of the lines or bars is made in proportion to the different figures which they are supposed to represent. Such diagrams are also in the form of bar diagrams and the width of bars varies in accordance with the nature and value of data. These bar diagrams can further be simple bar diagrams, multiple bar diagrams and sub-divided bar diagrams.
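The idea of a simple bar diagram, in which the length of each bar is proportional to the figure it represents, can be sketched in plain text. The frequencies are those of Table 3.1; the function name bar_diagram and the choice of one ‘#’ per case are assumptions made for this illustration, and in practice a charting program would normally be used.

```python
# Frequency distribution from Table 3.1 (weight in kg. -> number of executives).
frequencies = {
    "45-49": 1, "50-54": 4, "55-59": 9, "60-64": 22,
    "65-69": 16, "70-74": 10, "75-79": 3,
}

def bar_diagram(data, symbol="#"):
    """Return a horizontal bar diagram with one symbol per case."""
    width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # The bar length is proportional to the frequency of the category.
        lines.append(f"{label.ljust(width)} | {symbol * value} {value}")
    return "\n".join(lines)

print(bar_diagram(frequencies))
```

Even in this crude form the diagram shows at a glance that the weights cluster in the 60-64 kg. interval.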
Two dimensional diagrams are in the shape of rectangles, squares or circles wherein a specific area represents the size of an item or the characteristics of a particular data. Three dimensional diagrams are in the form of cubes, blocks or cylinders. Again the cubes or blocks are constructed in such a way that their volume is in proportion to the given value of data.
Diagrammatic presentation of data can also be in the form of graphs or curves. Graphs are a good aid in almost all types of studies. For instance, analysis of frequency distributions generally requires the use of some kind of graphic presentation. The three main types of graphs which are used to represent frequency distributions are the histogram, the frequency polygon and the smoothed curve. Graphs depict the data in a comprehensible form and help the reader to understand them in a simpler manner.
Now, certain basic considerations must be kept in mind while presenting the data in the form of diagrams. The researcher should pose the following questions to himself: What is the purpose of the diagram? What facts are to be emphasized? What is the educational level of the audience? How much time is available for the preparation of the diagram? What type of diagram can have the greatest appeal in a given situation? What type of diagram will portray the data most clearly and accurately?
Like all social research techniques, diagrammatic presentation must be used with care and discrimination. Poorly constructed diagrams may seriously damage the prospects of an otherwise good research work since they may actually misrepresent or distort the facts. Since the main objective of diagrammatic presentation is to clarify data, it is important not to make diagrams complicated. Diagrams crowded with too many facts may cause confusion. Extreme care must be exercised while selecting a particular type of diagram. It should be the one most appropriate to the problem at hand. For instance, there must be some sound reason for preferring a graph to some other type of diagram.
Accuracy is just as important in the construction of diagrams as it is in other aspects related to social research. It is most desirable to be extremely meticulous in the actual construction of diagrams.
In our discussion we have just furnished a preliminary view of diagrammatic presentation of data. However, much more is involved in the actual construction of diagrams. Before you actually enter into any research venture, you should acquire thorough knowledge about the techniques involved in the construction of diagrams. For more knowledge in this area you should consult some textbook on statistics and a manual on diagrammatic presentation.
3.6.3 Outline of a Research Report
The preparation of a report is the final stage of the research. Its purpose is to convey the results of study in sufficient detail. It enables others to comprehend the data. It is also a statement about the validity of conclusions.
Now an obvious question that may arise in your mind is: What should be the final form of a research report? In the following, we will suggest some of the major features which a good research report should contain.
- Introduction
This should give a concise and clear cut statement as to the nature of the study, aims and objectives and scope of the study.
- Design of the Study
This should cover: Concepts and definitions; Character of the research design; Sampling design; Tools of data collection employed in the study and the conditions under which they were used; The period of data collection; Limitations of the data (if any).
- Statistical procedures used in the analysis of data
- Findings of the study
- Conclusions
- Bibliographical references
- Appendices
In general, this is a broad outline for the preparation of a research report. However, it is subject to change and additions or deletions can be made depending upon the nature of a particular research study.
Check Your Progress IV
Note: a) After carefully reading the preceding text complete the following statements.
b) Compare your answer with the model answer given at the end of the Unit.
a) ……………… provides techniques for the organization and analysis of data.
b) With the help of statistics we can summarize very fast the …………… between variables in a given sample.
c) Statistical organization of data reveals variations in responses in terms of ………………… and …………………
d) The notable properties of univariate distributions revealed by statistical analysis are 1……………….. 2……………….. 3…………………
e) Three main measures of central tendency in the language of statistics are 1…………….. 2……………… 3…………………
f) The general purpose table is constructed to include …………………… in convenient and accessible form.
g) The special purpose table is designed to illustrate a ……………………… pertaining to data.
h) ………………… help in the depiction of facts concisely, logically and in a simple manner.
i) The different types of diagrams used to present data in social research are 1……………………. 2……………………. 3………………………..
3.7 LET US SUM UP
In this Unit we have discussed some of the fundamental aspects related to the analysis of data and their presentation. At the very outset we have explained that analysis and interpretation of data form a vital link between theory and empirical evidence. We have explained that processing and interpretation of data demand organization, reorganization and understanding of data so as to reach certain conclusions in the light of theoretical concerns of the study. We have also noted that measurement and evaluation and the attendant issues of reliability and validity are important aspects which need special attention in the context of data analysis.
We have given considerable attention to the various steps involved in the systematic treatment of data. We have explained what is involved in the scrutiny and classification of data, the establishment of categories, the coding operation and the basic elements of tabulation. Towards the end of this Unit we have discussed preliminary aspects related to the statistical analysis of data. However, for elaborate statistical operations you will require further reading in this area. We have also discussed the form in which data are to be presented, along with some idea of the outline of a research report.
3.8 KEY WORDS
Univariate : involving use of one variable.
Bivariate : involving use of two variables.
Multivariate : involving use of more than two variables.
Mean : A measure of central tendency that indicates the average value of a univariate distribution of interval or ratio scale data. The mean or arithmetic average is calculated by adding up the individual values and dividing by the total number of cases.
Median : A measure of central tendency indicating the midpoint in a univariate distribution of interval or ratio scale data. The median indicates the point below and above which 50 per cent of the values fall.
Mode : A measure of central tendency representing the value or category of a frequency distribution having the highest frequency; the most typical value.
Standard Deviation : A univariate measure of variability or dispersion that indicates the average “spread” of observations about the mean.
Frequency Distribution : A tabulation of the number of cases falling into each category of a variable.
Percentage Distribution : A norming operation that facilitates interpreting and comparing frequency distributions by transforming each to a common yardstick of 100 units (percentage points). The number of cases in each category is divided by the total and multiplied by 100.
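The measures defined above can be tried out on a small set of numbers. The following is only a minimal sketch in Python, using the standard library; the sample values (`ages`, `responses`) and the helper names `frequency_distribution` and `percentage_distribution` are hypothetical and are introduced purely for illustration. The standard deviation is computed here with the population formula, which is an assumption; a sample-based formula would give a slightly larger value.

```python
from collections import Counter
import statistics

# Hypothetical interval-scale data: ages of nine respondents (illustrative only).
ages = [21, 23, 23, 25, 27, 23, 30, 25, 28]

# Mean: add up the individual values and divide by the number of cases.
mean = sum(ages) / len(ages)
# Median: the midpoint, with half of the values below it and half above.
median = statistics.median(ages)
# Mode: the value with the highest frequency.
mode = statistics.mode(ages)
# Standard deviation: spread of observations about the mean
# (population formula assumed for this sketch).
sd = statistics.pstdev(ages)


def frequency_distribution(values):
    """Count the number of cases falling into each category."""
    return dict(Counter(values))


def percentage_distribution(values):
    """Divide each category count by the total and multiply by 100."""
    total = len(values)
    return {cat: 100 * n / total for cat, n in Counter(values).items()}


# Hypothetical categorical data: answers of ten respondents to one question.
responses = ["agree", "agree", "disagree", "undecided", "agree",
             "disagree", "agree", "agree", "undecided", "agree"]

freq = frequency_distribution(responses)
pct = percentage_distribution(responses)

print("mean:", mean, "median:", median, "mode:", mode, "sd:", round(sd, 2))
print("frequency distribution:", freq)
print("percentage distribution:", pct)
```

For these illustrative values the mean and median are both 25 and the mode is 23, while six of the ten responses (60 per cent) fall into the "agree" category. Because percentages are expressed on a common base of 100, two samples of different sizes can be compared directly.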
3.9 SUGGESTED READINGS
Davis, J.A. 1971 Elementary Survey Analysis, Prentice-Hall: Englewood Cliffs, New Jersey.
Goode, W.J. and Hatt, P.K. 1952 Methods in Social Research, McGraw-Hill: London.
Hyman, H.H. 1955 Survey Design and Analysis, The Free Press: Glencoe, Illinois.
Jahoda, M., Deutsch, M. and Cook, S.W. 1951 Research Methods in Social Relations, Dryden: New York.
3.10 MODEL ANSWERS
Check Your Progress I
1) True 2) True 3) False 4) True
5) True 6) True 7) True 8) False
9) True 10) False
Check Your Progress IV
(a) statistics (b) association (c) frequency distribution and percentage distribution
(d) (1) central tendency (2) dispersion (3) shape (e) (1) mean (2) median (3) mode
(f) large amount of source data (g) significant relationship (h) diagrams (i) (1) one dimensional (2) two dimensional (3) three dimensional.