
























































Typology: Lecture notes
The term “statistics” is used in two senses. First, in the plural sense, it means a collection of numerical facts or estimates, the figures themselves. It is in this sense that the public usually thinks of statistics, e.g., figures relating to population, profits of different units in an industry, etc. Secondly, as a singular noun, the term ‘statistics’ denotes the various methods adopted for the collection, analysis and interpretation of the facts numerically represented. In the singular sense, the term ‘statistics’ is better described as statistical methods. In our study of the subject, we shall be more concerned with the second meaning of the word ‘statistics’.
Statistics has been defined differently by different authors, and each author has assigned new limits to the field which should be included in its scope. We can do no better than give selected definitions of statistics by some authors and then come to a conclusion about the scope of the subject.
A.L. Bowley defines, “Statistics may be called the science of counting”. At another place he defines, “Statistics may be called the science of averages”. Both these definitions are narrow and throw light on only one aspect of Statistics.
According to King, “The science of statistics is the method of judging collective, natural or social, phenomena from the results obtained from the analysis or enumeration or collection of estimates”.
According to Wallis and Roberts, “Statistics is a body of methods for making wise decisions in the face of uncertainty.”
According to Edward N. Dubois, “Statistics is a body of methods for obtaining and analyzing numerical data in order to make better decisions in an uncertain world.”
In the words of Croxton & Cowden, “Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data”.
Horace Secrist has given an exhaustive definition of the term statistics in the plural sense. According to him: “By statistics we mean aggregates of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other”.
Some of the most important characteristics of statistics are as follows:
1. It consists of aggregates of facts: In the plural sense, statistics refers to data, but data to be called statistics must consist of an aggregate of certain facts.
A single and isolated fact or figure, like the 60 Kg. weight of a student or the death of a particular person on a day, does not amount to statistics. For data to amount to statistics, it must be in the form of a set or aggregate of certain facts, viz. the weights of 50, 65 and 70 Kg. of students in a class, or the profits of a firm over different periods. Such aggregates are liable to be affected by a multiplicity of causes.
2. It is affected by many causes: It is not easy to study the effect of one factor alone by ignoring the effects of other factors. Here we have to consider the effects of all the factors on the phenomenon separately as well as collectively, because the effects of the factors can change with a change of place, time or situation. The overall effect is taken, and not that of one factor only as in other natural sciences. For example, the result of class XII in the board examination does not depend on any single factor but collectively on the standard of teachers, teaching methods, teaching aids, the practical performance of students, and the standard of question papers as well as of evaluation.
3. It should be numerically expressed: Data to be called statistics should be numerically expressed so that counting or measurement of the data is possible. It means that the data or facts that constitute statistics must be capable of being expressed in some quantitative form, such as weights of 60, 70, 100 and 90 Kg. or profits of Rs. 10,000, Rs. 20,000, etc. Thus these data must contain numerical figures so that they may be called numerical statements of facts.
4. It must be enumerated or estimated accurately: As stated above, the statements should be precise and meaningful. To achieve a reasonable standard of accuracy, the field of enquiry should not be very large. If it is infinite or very large, even enumeration of the data is impossible and a reasonable standard of accuracy may not be achieved. In such cases we have to make an estimate according to a reasonable standard of accuracy, depending upon the nature and purpose of the collection of data. E.g., we may measure the height of buildings in metres, but we cannot measure the length of small things like bricks in the same unit of metres.
5. It should be collected in a systematic manner: Another characteristic of statistics is that the data should be collected in a systematic manner. Data collected in a haphazard manner will lead to difficulties in the process of analysis and to wrong conclusions. A proper plan should be made and trained investigators should be used to collect the data. If this is not done, the reliability of the data decreases. So, to get correct results, the data must be collected in a precise manner.
6. It should be collected for a predetermined purpose: Before we start the collection of data, we must be clear about the purpose for which we are collecting the data. If we have no information about its purpose, we may not collect data according to our needs, and we may miss some relevant data required to achieve that purpose.
(v) To provide guidance in the formulation of policies: The purpose of statistics is to enable correct decisions, whether they are taken by a businessman or by the Government. In fact, statistics is a great servant of business in management, governance and development. Sampling methods are employed in industry in tackling the problem of standardisation of products. Big business houses maintain a separate department for statistical intelligence, the work of which is to collect, compare and coordinate figures for formulating the future policies of the firm regarding production and sales.
(vi) To enable measurement of the magnitude of a phenomenon: But for the development of statistical science, it would not be possible to estimate the population of a country or to know the quantity of wheat, rice and other agricultural commodities produced in the country during any year.
These days statistical methods are applicable everywhere. There is no field of work in which statistical methods are not applied. According to A.L. Bowley, “A knowledge of statistics is like a knowledge of foreign languages or of Algebra, it may prove of use at any time under any circumstances”. The importance of statistical science is increasing in almost all spheres of knowledge, e.g., astronomy, biology, meteorology, demography, economics and mathematics. Economic planning without statistics is bound to be baseless. Statistics serve in administration and facilitate the formulation of new policies. Financial institutions and investors utilise statistical data to summarise past experience. Statistics are also helpful to an auditor, when he uses sampling techniques or test checking to audit the accounts of his client.
LIMITATIONS OF STATISTICS:
1. The use of statistics is limited to numerical studies: Statistical methods cannot be applied to study the nature of all types of phenomena. Statistics deals only with such phenomena as are capable of being quantitatively measured and numerically expressed. For example, the health, poverty and intelligence of a group of individuals cannot be quantitatively measured, and thus are not suitable subjects for statistical study.
University teachers. Whereas the University had only three girl students at that time and one of them was married to a teacher.
Distrust of Statistics: Due to the limitations of statistics, an attitude of distrust towards it has developed. There are some people who place statistics in the category of lying and maintain that “there are three degrees of comparison in lying: lies, damned lies and statistics”. But this attitude is not correct. The person who is handling statistics may be a liar or inexperienced, but that would be the fault not of statistics but of the person handling them. The person using statistics should not take them at their face value. He should check the result against an independent source. Also, only experts should handle statistics, otherwise they may be misused. It may be noted that the distrust of statistics is due more to insufficient knowledge of the nature, limitations and uses of statistics than to any fundamental inadequacy in the science of statistics. Medicines are meant for curing people, but if they are unscientifically handled by quacks, they may prove fatal to the patient. In both cases the medicine is the same; its usefulness or harmfulness depends upon the man who handles it. We cannot blame the medicine for such a result. Similarly, if a child cuts his finger with a sharp knife, it is not the knife that is to be blamed, but the person who kept the knife in a place the child could reach. These examples emphasise that if statistical facts are misused by some people, it would be wrong to blame statistics as such. It is the people who are to be blamed. In fact, statistics are like clay, which can be moulded in any way.
Data Collection: Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes.
Data collection is a component of research in all fields of study including physical and social sciences, humanities, and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed.
There are two types of data: Primary Data and Secondary Data.
1. Primary Data → Raw data, or primary data, is data collected at source. This type of information is obtained directly from first-hand sources by means of surveys, observations and experimentation, and has not been subjected to any processing or manipulation.
2. Secondary Data → It refers to data collected by someone other than the user, i.e. the data is already available and has been analysed by someone else. Common sources of secondary data include various published or unpublished data, books, magazines, newspapers, trade journals, etc.
2. Interview Method: This method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. It can be carried out in two ways:
(A) Personal Interview – It requires a person known as the interviewer to ask questions, generally in face-to-face contact with the other person. It can take the following forms:
⦁ Direct personal investigation – The interviewer has to collect the information personally from the sources concerned.
⦁ Indirect oral examination – The interviewer has to cross-examine other persons who are supposed to have knowledge about the problem.
⦁ Structured interviews – Interviews involving the use of predetermined questions and of highly standardised techniques of recording.
⦁ Unstructured interviews – These do not follow a system of predetermined questions and are characterised by flexibility of approach to questioning.
⦁ Focused interview – It is meant to focus attention on the given experience of the respondent and its effect. The interviewer may ask questions in any manner or sequence with the aim of exploring the reasons and motives of the respondent.
⦁ Clinical interview – It is concerned with broad underlying feelings and motives, or with the individual's life experience. The method of eliciting information is left to the interviewer's discretion.
⦁ Non-directive interview – The interviewer's function is to encourage the respondent to talk about the given topic with a bare minimum of direct questioning.
Advantages:
⦁ More information, and in greater depth, can be obtained
⦁ Samples can be controlled
⦁ There is greater flexibility under this method
⦁ Personal information can be obtained as well
⦁ Misinterpretation can be avoided by unstructured interviews
Limitations:
⦁ It is an expensive method
⦁ Possibility of bias on the part of the interviewer or the respondent
⦁ More time-consuming
⦁ Possibility of imaginary information and less frank responses
⦁ A highly skilled interviewer is required
(B) Telephonic Interviews – The interviewer collects information by contacting respondents on the telephone and asking questions or opinions orally.
Advantages:
⦁ It is flexible, fast and cheaper than other methods
⦁ Recall is easy and there is a higher rate of response
⦁ No field staff is required
Limitations:
⦁ The interview period is unlikely to exceed five minutes, which is short
⦁ Restricted to people with telephone facilities
⦁ Questions have to be short and to the point
⦁ Less information can be collected
3. Questionnaire: In this method a questionnaire is sent (mailed) to the respondents concerned, who are expected to read, understand and reply on their own and return the questionnaire. It consists of a number of questions printed or typed in a definite order on a form or set of forms. It is advisable to conduct a ‘pilot study’, which is a rehearsal of the main survey by experts, to test the questionnaire for weaknesses in the questions and the techniques used.
Essentials of a good questionnaire:
Limitations:
⦁ Low rate of return of duly filled questionnaires
⦁ Control over the questions is lost once the questionnaire is sent
⦁ It is inflexible once sent
⦁ Possibility of ambiguous replies or omission of replies
⦁ Time-consuming and slow process
4. Schedules: This method of data collection is similar to the questionnaire method, with the difference that schedules are filled in by enumerators specially appointed for the purpose. Enumerators explain the aims and objects of the investigation, remove any misunderstanding and help the respondents to record their answers. Enumerators should be well trained to perform their job, and should be honest, hard-working and patient. This method is helpful in extensive enquiries; however, it is very expensive.
Sources of Secondary Data: There are a number of sources from which secondary data may be obtained. They may be classified as follows:
(i) Series of individual observations, (ii) Discrete frequency distribution, and (iii) Continuous frequency distribution.
Series of individual observations is a simple listing of the items of each observation. If the marks of 14 students of a class in statistics are given individually, they form a series of individual observations.
Marks obtained in Statistics:
Roll Nos.:   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Marks:      60  71  80  41  81  41  85  35  98  52  50  91  30  88

Marks in Ascending Order    Marks in Descending Order
          30                          98
          35                          91
          41                          88
          41                          85
          50                          81
          52                          80
          60                          71
          71                          60
          80                          52
          81                          50
          85                          41
          88                          41
          91                          35
          98                          30

Discrete Frequency Distribution: In a discrete series, the data are presented in such a way that exact measurements of the units are indicated. In a discrete frequency distribution, we count the number of times each value of the variable occurs in the given data. This is facilitated through the technique of tally bars. In the first column, we write all the values of the variable. In the second column, we put a vertical bar, called a tally bar, against a value each time it occurs. When a particular value has occurred four times, for the fifth occurrence we put a cross tally mark ( / ) on the four tally bars to make a block of 5. The technique of putting cross tally bars at every fifth repetition facilitates the counting of the number of occurrences of the value. After putting tally bars for all the values in the data, we count the number of times each value is repeated and write it against the corresponding value of the variable in the third column, entitled frequency. This type of representation of the data is called a discrete frequency distribution.
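The arrangement and tallying described above can be sketched in a few lines of Python. This is only an illustration using the marks of the 14 students given earlier; the helper names `tally` and `discrete_frequency`, and the choice to write a crossed block of five as `||||/`, are assumptions made for this sketch.

```python
from collections import Counter

# Marks of the 14 students listed in the series of individual observations.
marks = [60, 71, 80, 41, 81, 41, 85, 35, 98, 52, 50, 91, 30, 88]

# A series of individual observations arranged in ascending / descending order.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)


def tally(count: int) -> str:
    """Render a count as tally bars; each block of five is written '||||/'."""
    blocks, rest = divmod(count, 5)
    return " ".join(["||||/"] * blocks + (["|" * rest] if rest else []))


def discrete_frequency(values):
    """Return (value, tally bars, frequency) rows in ascending order of value."""
    counts = Counter(values)
    return [(v, tally(c), c) for v, c in sorted(counts.items())]


if __name__ == "__main__":
    print("Ascending: ", ascending)
    print("Descending:", descending)
    for value, bars, freq in discrete_frequency(marks):
        print(f"{value:>5}  {bars:<10} {freq:>3}")
```

Because only the mark 41 is repeated in this small series, the resulting table is hardly shorter than the original list, which illustrates why a discrete distribution condenses data only when values recur.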
We are given the marks of 42 students:
55 51 57 40 26 43 46 41 46 48 33 40 26 40
40 41 43 53 45 53 33 50 40 33 40 26 53 59
33 39 55 48 15 26 43 59 51 39 15 45 26 15
We can construct a discrete frequency distribution from the above marks.

Marks of 42 Students
------------------------------------------
Marks    Tally Bars    Frequency
------------------------------------------
15       |||           3
26       ||||/         5
33       ||||          4
39       ||            2
40       ||||/         5
41       ||            2
43       |||           3
45       ||            2
46       ||            2
48       ||            2
50       |             1
51       ||            2
53       |||           3
55       |||           3
57       |             1
59       ||            2
------------------------------------------
Total                  42
------------------------------------------

The presentation of the data in the form of a discrete frequency distribution is better than merely arranging the values in order, but it does not condense the data as much as needed and is still quite difficult to grasp and comprehend. This distribution is quite simple when the values of the variable are repeated; otherwise there will be hardly any condensation.
Continuous Frequency Distribution: If neither the identity of the units about which information is collected nor the order in which the observations occur is relevant, then the first step of condensation is to classify the data into different classes by dividing the entire group of values of the variable into a suitable number of groups and then recording the number of observations in each group. Thus, we divide the total range of values of the variable (marks of 42
------------------------------------
Income (Rs.)      No. of Persons
------------------------------------
200 – 250              50
250 – 300             100
300 – 350              70
350 – 400             130
400 – 450              50
450 – 500             100
------------------------------------
Total                 500
------------------------------------
It is clear from the example that the exclusive method ensures continuity of the data, inasmuch as the upper limit of one class is the lower limit of the next class. Therefore, 50 persons have incomes between Rs. 200 and Rs. 249.99, and a person whose income is Rs. 250 is included in the next class, 250 – 300.
According to the inclusive method, an item equal to the upper limit of a class is included in that class itself. The following table demonstrates this method.
------------------------------------
Income (Rs.)      No. of Persons
------------------------------------
200 – 249              50
250 – 299             100
300 – 349              70
350 – 399             130
400 – 449              50
450 – 499             100
------------------------------------
Total                 500
------------------------------------
Hence in the class 200 – 249, we include persons whose income is between Rs. 200 and Rs. 249.
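The difference between the two methods can be made concrete with a small Python sketch. It assumes the income classes of the tables above (lowest limit Rs. 200, width Rs. 50, six classes) and, for the inclusive method, incomes recorded in whole rupees; the function names are hypothetical and chosen only for this illustration.

```python
def exclusive_class(value, lower=200, width=50, n_classes=6):
    """Exclusive method: return (lo, hi) such that lo <= value < hi."""
    index = int((value - lower) // width)
    if not 0 <= index < n_classes:
        raise ValueError(f"{value} lies outside the classes")
    lo = lower + index * width
    return lo, lo + width


def inclusive_class(value, lower=200, width=50, n_classes=6):
    """Inclusive method: return (lo, hi) such that lo <= value <= hi.

    Assumes whole-rupee values, so the classes run 200-249, 250-299, ...
    """
    lo, hi = exclusive_class(value, lower, width, n_classes)
    return lo, hi - 1


if __name__ == "__main__":
    # An income equal to an upper limit moves to the next class (exclusive).
    print(exclusive_class(249.99))  # class 200 - 250
    print(exclusive_class(250))     # class 250 - 300
    # Under the inclusive method the upper limit belongs to its own class.
    print(inclusive_class(249))     # class 200 - 249
    print(inclusive_class(250))     # class 250 - 299
```

In this sketch an income of exactly Rs. 500 raises an error under the exclusive method, since the last class 450 – 500 excludes its upper limit.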
Principles for Constructing Frequency Distributions: In spite of the great importance of classification in statistical analysis, no hard and fast rules are laid down for it. A statistician uses discretion in classifying the data, and sound experience, wisdom, skill and aptness are needed for an appropriate classification. However, the following guidelines must be considered when constructing a frequency distribution:
1. Type of classes: The classes should be clearly defined and should not lead to any ambiguity. They should be exhaustive and mutually exclusive so that any value of the variable corresponds to only one class.
2. Number of classes: The choice of the number of classes into which a given frequency distribution should be divided depends upon the following:
(i) The total frequency, which means the total number of observations in the distribution.
(ii) The nature of the data, which means the size or magnitude of the values of the variable.
(iii) The desired accuracy.
(iv) The convenience of computing the various descriptive measures of the frequency distribution, such as the mean, variance, etc.
The number of classes should be neither too small nor too large. If the classes are few, the classification becomes very broad and rough, which might obscure some important features and characteristics of the data. The accuracy of the results decreases as the number of classes becomes smaller. On the other hand, too many classes will result in few observations in each class. This gives an irregular pattern of frequencies across the classes and makes the frequency distribution irregular. Moreover, a large number of classes will render the distribution too unwieldy to handle. The computational work for further processing of the data will become quite tedious and time-consuming without any proportionate gain in the accuracy of the results. Hence a balance should be maintained between the loss of information in the first case and the irregularity of the frequency distribution in the second case, to arrive at a suitable number of classes. Normally, the number of classes should not be less than 5 nor more than 20. Prof. Sturges has given a formula:
k = 1 + 3.322 log n
where k refers to the number of classes, n refers to the total frequency or number of observations, and log is the logarithm to base 10.
The value of k is rounded to the nearest integer:
If n = 100, k = 1 + 3.322 log 100 = 1 + 6.644 = 7.644 ≈ 8
If n = 10,000, k = 1 + 3.322 log 10,000 = 1 + 13.288 = 14.288 ≈ 14
However, this rule should be applied only when the number of observations is not very small.
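Sturges' formula is easy to check numerically. The sketch below assumes a base-10 logarithm and rounds to the nearest integer, which reproduces both worked values in the notes (8 and 14); some texts round up instead, which would give 15 for n = 10,000. The function name `sturges_classes` is an assumption made for this illustration.

```python
import math


def sturges_classes(n: int) -> int:
    """Number of classes k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return round(1 + 3.322 * math.log10(n))


if __name__ == "__main__":
    for n in (100, 10_000):
        k_exact = 1 + 3.322 * math.log10(n)
        print(f"n = {n:>6}: k = {k_exact:.3f}, rounded to {sturges_classes(n)}")
```

The constant 3.322 is approximately 1 / log10(2), so the formula is equivalent to k = 1 + log2(n).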
5. Mid-value or Class Mark: The mid-value or class mark is the value of the variable which lies exactly at the middle of the class. The mid-value of any class is obtained by dividing the sum of the upper and lower class limits by 2.
Mid-value of a class = 1/2 [Lower class limit + Upper class limit]
The class limits should be selected in such a manner that the observations in any class are evenly distributed throughout the class interval, so that the actual average of the observations in any class is very close to the mid-value of the class.
6. Open End Classes: The classification is termed open end classification if the lower limit of the first class or the upper limit of the last class, or both, are not specified. Classes in which one of the limits is missing are called open end classes, for example, classes like marks less than 20 or age above 60 years. As far as possible, open end classes should be avoided because in such classes the mid-value cannot be accurately obtained. But if open end classes are inevitable, it is customary to estimate the class mark or mid-value of the first class with reference to the succeeding class. In other words, we assume that the magnitude of the first class is the same as that of the second class.
Example: Construct a frequency distribution from the following data by the inclusive method, taking 4 as the class interval:
10 17 15 22 11 16 19 24 29 18
25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 15 21 28 33 38
34 13 10 16 20 22 29 19 23 31
Solution: Because the minimum value of the variable is 10, which is a very convenient figure for the lower limit of the first class, and the magnitude of the class interval is given to be 4, the classes for preparing the frequency distribution by the inclusive method will be 10 – 13, 14 – 17, 18
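------------------------------------------
Class      Tally Bars      Frequency
------------------------------------------
10 – 13    ||||/           5
14 – 17    ||||/ |||       8
18 – 21    ||||/ |||       8
22 – 25    ||||/ ||        7
26 – 29    ||||/           5
30 – 33    ||||            4
34 – 37    ||              2
38 – 41    |               1
------------------------------------------
Example: Prepare a statistical table from the following:
Weekly wages (Rs.) of 100 workers of Factory A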
88 23 27 28 86 96 94 93 86 99 82 24 24 55 88 99 55 86 82 36 96 39 26 54 87 100 56 84 83 46 102 48 27 26 29 100 59 83 84 48 104 46 30 29 40 101 60 89 46 49 106 33 36 30 40 103 70 90 49 50 104 36 37 40 40 106 72 94 50 60 24 39 49 46 66 107 76 96 46 67 26 78 50 44 43 46 79 99 36 68 29 67 56 99 93 48 80 102 32 51
Solution: The lowest value is 23 and the highest is 107. The difference between the lowest and highest values is 84. If we take a class interval of 10, nine classes would be made. The first class should be taken as 20 – 30 instead of 23 – 33, as per the guidelines of classification.
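The grouping proposed in this solution can be sketched in Python as follows. The sketch uses the 100 wage values as printed above (on those values the minimum comes out as 23 and the maximum as 107), assumes the exclusive method so that a wage of 30 falls in the class 30 – 40, and attaches the mid-value of each class using the class mark formula given earlier. The variable names are assumptions made for this illustration.

```python
from collections import Counter

# Weekly wages (Rs.) of 100 workers of Factory A, as printed in the example.
wages = [
    88, 23, 27, 28, 86, 96, 94, 93, 86, 99,
    82, 24, 24, 55, 88, 99, 55, 86, 82, 36,
    96, 39, 26, 54, 87, 100, 56, 84, 83, 46,
    102, 48, 27, 26, 29, 100, 59, 83, 84, 48,
    104, 46, 30, 29, 40, 101, 60, 89, 46, 49,
    106, 33, 36, 30, 40, 103, 70, 90, 49, 50,
    104, 36, 37, 40, 40, 106, 72, 94, 50, 60,
    24, 39, 49, 46, 66, 107, 76, 96, 46, 67,
    26, 78, 50, 44, 43, 46, 79, 99, 36, 68,
    29, 67, 56, 99, 93, 48, 80, 102, 32, 51,
]

WIDTH = 10
# Start the first class at a convenient multiple of the width (23 -> 20).
start = (min(wages) // WIDTH) * WIDTH
n_classes = (max(wages) - start) // WIDTH + 1

# Exclusive method: a wage w belongs to class i when
# start + i*WIDTH <= w < start + (i + 1)*WIDTH.
counts = Counter((w - start) // WIDTH for w in wages)

# Each row: (lower limit, upper limit, mid-value, frequency).
table = []
for i in range(n_classes):
    lower = start + i * WIDTH
    upper = lower + WIDTH
    table.append((lower, upper, (lower + upper) / 2, counts[i]))

if __name__ == "__main__":
    print("Wages (Rs.)   Mid-value   No. of workers")
    for lower, upper, mid, freq in table:
        print(f"{lower:>3} - {upper:<3}     {mid:>6.1f}      {freq:>4}")
    print(f"Total                     {sum(row[3] for row in table):>4}")
```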