Monday, September 30, 2019

Beautiful Boy and Tweak Essay

Reading two different accounts of the story of a drug addict allows much room for comparison between the two. In the case of David and Nic Sheff's books, I was surprised at how much similarity there was between the two; they agreed on most points and there was no striking discontinuity in their stories. There is, however, a significant difference in the perspectives from which the two are told. Naturally, Nic, as the addict son, takes on a more self-centered view. In David's book it is clear that Nic's addiction is the one central driving force in their family life, especially in David's daily life. In Nic's book, however, his relationship with his father and the rest of his family is only one of several focal points; Nic is also preoccupied with girlfriends, friends, and his sponsor.

David Sheff's book is a very self-reflective account. He is constantly analyzing the past and the decisions he has made with Nic, putting it all together in a desperate attempt to find answers to his son's downfall. He struggles with a constant mental conflict: "What did I do wrong?" His preoccupation with Nic became an addiction in itself, and the obsessive worry and stress took a tremendous toll, to the point where he suffered a massive hemorrhage. David's view of the progression of Nic's addiction reflects the Social Learning and Psychoanalytic explanations of American drug use. At first, David views the onset of Nic's addiction as a consequence of childhood lifestyle factors he lived through (such as the divorce). Later on, however, David realizes that thousands of teenagers are reeled into the dark world of drugs without necessarily coming from traumatic backgrounds; the two things are not always directly linked. Their drug use is simply reinforced by others, usually drug-using friends, regardless of how they were raised. Ronald Akers built on this idea of operant conditioning by pointing out that drug-using behavior is reinforced socially more than physiologically. This is exactly what happened to Nic as he surrounded himself more and more with friends and girlfriends who fed his addiction. As evident as it may have seemed, it took David a while to shift his focus from reflecting on the past to what was quickly making Nic's situation worse.

Nic, on the other hand, is not very psychoanalytic about his addiction. David has hope for his son, while Nic has very little hope for himself. While telling his story, David is trying to discover and unveil what led to all of this misery in his family. He longs to find answers and causes for all of it. Nic, on the other hand, doesn't focus on the "why." Although he has moments where he confesses he never thought he would turn out this way, he doesn't spend much time dwelling on what led him to his addiction. Instead, Nic just tells his story and focuses on the very near future. This is typical of an addict's mentality: each day is so uncertain and unstable that they can't manage to plan more than one or two days ahead. The deeper he sinks into his addiction, the more surprised he is to find that he wakes up alive each day. Rather than fight it, Nic accepts the fact that he is a hardcore addict and that his life will never be the same. It took his father a much longer time to realize this and fully accept it. I was genuinely shocked at how honest Nic is throughout the whole book while telling his story.
He admits that his parents are forcing him to go into a treatment center and that he has "fucked everything up beyond repair." Most addicts make themselves seem like the victim and leave out a lot of information about their bad habits. Nic openly shares everything, even his darkest moments of intoxication and suicidal depression. He looks for ways to support the high demands of his druggie lifestyle and does whatever is necessary, even if it means stealing from his own family while they are desperately trying to help him. Nic's selfishness, however, turns into feelings of deep guilt toward the end of the book, when he is on the road to recovery and back with his two parents. When his mind clears up, he realizes how badly he has torn everyone apart, especially his mom and dad. These emotional realizations are part of his recovery.

David, however, experiences the exact opposite. In the early stages of his son's addiction, he dedicated all his time and energy to the matter, to the point where he forgot about his own health and happiness. As Nic's addiction progressed, David shifted focus to himself and stopped obsessing over everything that had to do with his son's addiction. David's road to recovery meant almost the exact opposite of Nic's: dedicating more time and energy to himself rather than taking others into primary consideration. David Sheff tells his family's story from the very early happy days and takes his readers all the way through Nic's descent into his darkest moments, while Tweak begins with Nic already deep into his addiction. Nic Sheff's Tweak is the dark counterpoint to Beautiful Boy. The elder writer's grief-filled memoir glows dimly like a distant planet of despair, while the son's account of the same events burns like an angry Mars.

Sunday, September 29, 2019

Forest Conservation Essay

1. INTRODUCTION

The June 1992 United Nations Conference on Environment and Development (UNCED) underlined that harmonised approaches by all countries to the management, conservation and sustainable development of global forests are essential to meet the socio-economic and environmental needs of present and future generations. To achieve this goal, the UNCED also recognised, among others, the need to sustain the multiple roles and functions of all types of forests, as well as the need to enhance forest conservation, management and global forest cover, as outlined in Programmes A and B of Chapter 11 under Agenda 21, respectively. In addition, the need to ensure the conservation and sustainable utilisation of biological diversity is also emphasised under Chapter 15 of Agenda 21. While all these are now being recognised, the priority is to operationalise and implement the UNCED programmes, bearing in mind that the full implementation of the adopted Statement of Forest Principles and the various forestry programme areas under Agenda 21 is feasible only on the basis of international efforts towards attaining concrete goals. Hence, this paper is intended to provide a basis for discussion on the implementation of specific aspects of these programmes, particularly forest conservation, the enhancement of forest cover and the roles of forests, as well as to suggest possible areas of collaboration for national and international actions.

2. FOREST CONSERVATION

Forests are influenced by climate, landform and soil composition, and they exist in a wide variety of forms in the tropical, temperate and boreal zones of the world. Each forest type, evergreen and deciduous, coniferous and broadleaved, wet and dry, as well as closed and open canopy, has its own uniqueness, and together these forests complement one another and perform various socio-economic, ecological, environmental, cultural and spiritual functions. Recent surveys on a global basis suggest that there are about 1.4 million documented species, and the general consensus is that this is an underestimate: perhaps 5 to 50 million species exist in the natural ecosystems of forests, savannas, pastures and rangelands, deserts, tundra, lakes and seas. Farmers' fields and gardens are also important repositories of biological resources. In this context, it has been acknowledged that forests are rich in biological resources. Though covering only 13.4 per cent of the Earth's land surface, these forests contain half of all vertebrates, 60 per cent of all known plant species, and possibly 90 per cent of the world's total species. However, recent studies have shown that temperate and boreal forests, with their extremely varied ecosystems, especially those in climatic and geographical areas where old-growth forests still occur, may be even more diverse than tropical forests in terms of variation within some species. Even though temperate and boreal forests generally have far fewer tree species than tropical forests, often a tenth or less in total, certain temperate and boreal forests are now thought to be as diverse as, or even more diverse than, their tropical counterparts. For example, old-growth forests in Oregon, U.S.A. have been found to have arthropods in leaf litter approaching 250 different species per square metre, with 90 genera being found in the H.J. Andrews Memorial Forest research area alone (Lattin, 1990).
It has been suggested that a network of 500 protected and managed areas, with an average size of 200,000 hectares, covering 10 per cent of the remaining old-growth/primary forests, be the minimum acceptable target (Anon, 1991; IUCN/UNEP/WWF, 1991). To enhance this networking and to optimise the global representativeness of these biogeographic areas for the conservation of biological diversity, a list of these areas based on terms mutually agreed by national governments should be formulated. This should also include the identification of these biogeographic areas and the development of joint mechanisms, as well as the quantification of the costs involved and the identification of the sources of funds needed to manage and conserve these areas. Joint mechanisms for possible international cooperation to establish transboundary biogeographic areas should also be implemented. However, it has been recognised that totally protected areas can never be sufficiently extensive to provide for the conservation of all ecological processes and all species. Nonetheless, there is a need to establish a minimum acceptable national target for the area to be designated as forest conservation areas in each country. This effort could be further enhanced by establishing buffer zones of natural forests around the protected area, where an inner buffer zone is devoted to basic and applied research, environmental monitoring, traditional land use, recreation and tourism or environmental education and training, and an outer buffer zone where research is applied to meet the needs of the local communities. Such management practices are in consonance with Principle 8(e) of the Forest Principles.

Besides the need to set aside conservation areas, it is increasingly being realised that the sustainable production of wood through environmentally sound selective harvesting practices is one of the most effective ways of ensuring in-situ conservation of the biological diversity of forest ecosystems. Such selectively harvested and managed forests will retain most of the diversity of the old-growth/primary forests, both in terms of the number and the population of species. The economic value of the wood and the environmental benefits produced would fully justify the investments made in maintaining the forest cover and ensuring its sustainability. The implementation of environmentally sound selective harvesting practices would go a long way in promoting in-situ conservation of biological diversity and the sustainable utilisation of forest resources. In this regard, the establishment of tree plantations would alleviate the pressure of over-harvesting the natural forests in view of the increasing demand for wood. The sustainable production of forest goods and services and the conservation of biological diversity in forest ecosystems, as well as the equitable sharing of benefits arising from the utilisation of genetic resources, would require concrete actions at both the national and international levels. In this context, it is imperative that national policies and strategies, among others, set targets for the optimum forest area for forest conservation and for the sustainable production of goods and services, as well as outline relevant measures to enhance both ex-situ and in-situ forest conservation during forest harvesting. In some cases, long-term measures may include the rehabilitation and re-creation of old-growth/primary forests.
In this connection, it is imperative that countries having a high proportion of their land areas under forest cover, especially the developing countries, have access to new and additional financial resources and the "transfer of environmentally sound technologies and corresponding know-how on favourable terms, including on concessional and preferential terms", as reflected in Principles 10 and 11 respectively of the Forest Principles, in order to ensure the sustainable management, conservation and development of their forest resources. Moreover, "trade in forest products should be based on non-discriminatory and multilaterally agreed rules and procedures consistent with international trade law and practices" and "unilateral measures, incompatible with international obligations or agreements, to restrict and/or ban international trade in timber or other forest products should be removed or avoided", as called for in Principles 13(a) and 14 respectively of the Forest Principles, should be respected by the international community in order to attain long-term sustainable forest conservation and management.

3. ENHANCEMENT OF FOREST COVER

Enhancement of forest cover is to be viewed as a proactive measure taken to arrest and reverse the current trend of forest decline and degradation. In this context, the world's forests have been under threat and are declining. It is estimated that forests covered four-fifths of the existing area at the beginning of the eighteenth century. Of this total, approximately half were in tropical regions and half in temperate and boreal regions. However, these forests are declining as a result of deforestation. By the mid-nineteenth century, it was estimated that global forest cover had decreased to 3,900 million hectares, or 30 per cent of the world's land area. The latest figure by the Food and Agriculture Organisation of the United Nations, as reflected in the Forest Resources Assessment 1990, estimated that global forest cover at the end of 1990 had further decreased to 3,188 million hectares, or about 24.4 per cent of the world's land area. Processes of reduction and degradation of forest cover have led to an average annual loss of 0.6 per cent. Although the annual loss of temperate and boreal forests is said to be negligible in recent times, historically, large-scale deforestation took place in Europe during the Industrial Revolution to cater for the needs of agricultural expansion, building materials and industrial development (Hinde, 1985). In fact, it is estimated that almost 200 million hectares, or more than 50 per cent, of the original forest cover had been lost (UN, 1991). On the other hand, deforestation in the developing world is a rather recent phenomenon, driven by poverty, indebtedness and the increasing need for food, shelter and energy to cater for a growing population. In this regard, the four main causes of deforestation in developing countries are shifting cultivation, conversion to agriculture and pasture, wood removals for fuelwood and inappropriate timber utilisation, and the need for infrastructural development. For example, 39.5 per cent of the 1.54 million hectares of closed forest deforested between 1981 and 1990 in Africa was due to agricultural fallow and shifting cultivation, 35.1 per cent was due to conversion to mainly permanent agriculture, and the remaining 25.4 per cent was due to over-exploitation and over-grazing (FAO, 1993a).
However, as a result of improved socio-economic development in Africa, the share of deforestation due to agricultural fallow and shifting cultivation had in fact decreased by 27.2 percentage points from the 66.7 per cent recorded during the period 1976-1980 (UN, 1991). Besides the loss of forest cover through deforestation, there has been a general degradation in the quality and health of global forests due to acid rain and other atmospheric pollutants, especially in developed countries, as well as through forest fires and unsustainable use resulting from inappropriate logging and fuelwood exploitation. The depletion of global forests and their degradation are causes for concern, as they involve not only the loss of forest areas but also the ultimate quality of the forests. If this trend remains unchecked, the implications for the world would be catastrophic. Not only would the existence of all forest types be threatened, but the capability of these forests to perform their various roles and functions in perpetuity would also be seriously undermined. Hence, the need to address the decline in global forest area and its degradation through enhancing forest cover is immediate.

In this context, is the current global forest cover of 24.4 per cent sufficient? If not, what level of forest cover should we aim for in order to ensure that forest resources and forest lands are sustainably managed to meet the needs of present and future generations? At the Ministerial Conference on Atmospheric Pollution and Climate Change held in the Netherlands in November 1989, the Noordwijk Declaration on Climate Change advocated a world net forest growth of 12 million hectares per year by the turn of the century, while a global forest cover of 30 per cent by the year 2000 was proposed at the Second Ministerial Conference of Developing Countries on Environment and Development held in Malaysia in April 1992. There is every indication that the existing global forest cover should be enhanced through a greening of the world. In this connection, restoration of all deforested lands in the industrialised world to close to the original levels of forest coverage is improbable, but this does not mean significant reforestation and afforestation are impossible. All countries which aim for a sound environmental future should set themselves a target of a minimum level of forest cover to be maintained in perpetuity. Countries having more than 30 per cent of their land areas under forest cover after taking into account their socio-economic development needs, particularly the developing countries, should be given incentives to improve the quality of their forests, as well as assistance to reduce their dependence on wood, especially as fuel. On the other hand, countries having less than 30 per cent of their land areas under forest cover, but which have the means, must increase and enhance their forest cover through rehabilitation and afforestation, which may include, in some cases, the conversion of heavily subsidised farms back to forests. As for those countries which are rich but are constrained by physical and climatic conditions from growing trees because of their geographic locations, they could play their part by assisting the poorer countries in increasing and enhancing their forest cover.
As the future of forests depends not only on their quantity but on their quality as well, it is pertinent that all forests, especially the temperate and boreal forests of the developed countries, be protected against airborne pollutants, particularly acid deposition, which are harmful to the health of forest ecosystems. Appropriate measures should also be taken to protect forests from fire.

4. ROLES OF FORESTS

A well-managed forest is a constantly self-renewing resource and provides a wide range of benefits at local, national and global levels. Some of these benefits depend on the forest being left untouched or subject to minimal interference, while others can only be realised by harvesting the forest. Among the most important roles of forests are the sustainable production of wood and timber products, the provision of food, shelter and energy, the mitigation of climate change, the conservation of water and soil, as well as recreation and ecotourism. Forests are also important repositories of biological diversity. In this regard, wood is of major economic importance: in 1990 the world's production of industrial timber was about 1,600 million cubic metres, of which about 75 per cent came from the developed countries, while international trade in wood and wood products, as well as paper and pulp, is estimated to be worth US$96,000 million a year, of which about US$12,500 million comes from developing country exports (FAO, 1993b). Moreover, fuelwood currently comprises about 85 per cent of the wood consumed in the developing countries and accounts for more than 75 per cent of total energy consumption in the poorest countries, and over 2,000 million people use fuelwood as their primary source of fuel (UN, 1991). In recent years, attention has also been focused on the importance of non-wood forest products, which include plants for food and medicinal purposes, fibres, dyes, animal fodder and other necessities. Indonesia, for example, earns an estimated US$120 million a year from rattans, resins, sandalwood, honey, natural silk and pharmaceutical and cosmetic compounds (FAO, 1990), while the local production of bidi cigarettes from the tendu leaf (Diospyros melanoxylon) in India provides part-time employment for up to half a million women (FAO, 1993b). In this connection, it has been estimated that more than 200 million people in the tropics live in the forests (FAO, 1993b), and in some parts of Africa as much as 70 per cent of animal protein comes from forest game such as birds and rodents (FAO, 1990). The economic value of forests in relation to floods and soil conservation is that they may allow for agricultural and even industrial development on floodplains, because they contribute to mitigating the effects of floods and minimising soil erosion, especially in mountainous and hilly areas. In fact, a well-managed forest would provide a number of goods and services to meet basic human needs, as outlined in Annex I.

5. RECOMMENDATIONS

5.1 Forest Conservation

(a) To strengthen efforts in forest conservation and the sustainable management of forest resources, it is imperative to ensure the participation of local communities, and all national policies and strategies must indicate the forest area set aside for forest conservation and for the sustainable production of forest goods and services. In this context, developing countries must have access to new and additional financial resources and the transfer of environmentally sound technologies.
(b) To further ensure sustainable forest conservation and sustainable forest management, the prices of timber and timber products in the market place must fully reflect both their replacement and environmental costs, trade in forest products should be non-discriminatory, and any unilateral measures to restrict and/or ban their trade should be removed or avoided. Moreover, the expenses needed for sustainable forest management, including reforestation and afforestation, must be included in the cost of all kinds of production obtained from forest resources.

(c) A global network of well-managed and adequately funded protected areas should be established. In this regard, a list of biogeographic areas mutually agreed by national governments should be prepared to ensure the global representativeness of forest conservation areas.

(d) In order to ensure the sharing, on mutually agreed terms, of benefits and profits, including biotechnology products, derived from the utilisation of biological diversity, efficient and cost-effective methodologies should be developed to assess the biological resources of forests at the genetic, species and ecosystem levels, including the development of techniques to ascribe economic values to these resources.

(e) In the light of the agreement at UNCED and in accordance with the requirements of the Convention on Biological Diversity, existing forest harvesting practices should be critically reviewed to ensure effective in-situ conservation of biological diversity during forest utilisation. Countries should also endeavour to identify forest ecosystems or even landscapes that are threatened with irreversible changes, as well as their causes, so as to enable prompt action to be taken to arrest them.

5.2 Enhancement of Forest Cover

(a) Maintaining and enhancing forest cover, reforestation or afforestation will incur costs, either from opportunities foregone for alternative uses or from benefits lost from existing land uses. Policy responses must take this into account. The legitimate rights of countries over their natural resources must be upheld. An equitable framework must be found to provide adequate compensation to those countries which undertake action to sustainably manage their forests in the wider interests of global environmental enhancement.

(b) All countries should work towards increasing their level of forest cover over a specified time-frame, and actions should be taken to prepare and implement national forestry action programmes and/or plans for the management, conservation and sustainable development of forests, as called for in para 11.12(b) of Chapter 11 under Agenda 21. Countries having less than 30 per cent of their land areas under forest cover, but which have the means, must undertake concerted efforts to increase their forest cover, while rich countries which are constrained by physical and climatic factors from increasing their forest cover could assist the poorer nations in increasing and enhancing theirs. Countries having more than 30 per cent of their land areas under forest cover after taking into account their socio-economic development needs should be recognised, and appropriate incentives should be given to encourage them to improve the quality of their forests.
5.3 Roles of Forests

(a) To effectively enhance the roles of forests in meeting basic human needs, it is extremely important that the underlying causes of deforestation, such as poverty, population pressures, the need for food, shelter and fuel, as well as indebtedness, particularly in the developing countries, be critically addressed. A consultative and participatory approach should be adopted involving all stakeholders.

(b) For the development of management measures to be effective, full knowledge of the distribution and values of non-wood forest resources should be made available at a level comparable to that currently available for wood resources.

(c) At the landscape level, each territory should set a minimum area of forest land to safeguard the climate- and water-related characteristics of the forest and to ensure that the integrity of the forest ecosystem is protected.

(d) Public awareness of the roles of forests should be strengthened at the level of social and professional groups, as well as at the family level, so as to ensure that the important ecological and environmental functions of forests are further enhanced for both present and future generations.

6. CONCLUSION

The above recommendations are some of the possible options that could be considered for the effective implementation of specific UNCED programmes, particularly those on forest conservation, the enhancement of forest cover and the roles of forests in meeting basic human needs. Concrete actions at both the national and international levels are imperative for their effective implementation.

Saturday, September 28, 2019

Case Study Of Shakira Suffering from Rheumatic Heart Disease

1. In the present case study, Shakira is suffering from rheumatic heart disease (RHD), which refers to the condition of damaged heart valves resulting from episodes of acute rheumatic fever (ARF). ARF leads to inflammation of the heart, as a result of which normal blood flow is restricted. The complications arising from this condition include endocarditis and stroke (Rothenbühler et al., 2014). Social determinants of health influence an individual's health outcomes, and for Shakira the two significant social determinants are unemployment and social isolation. These two factors have led to inadequate access to healthcare. Living in an isolated rural area and the poor economic condition have restricted the access to healthcare that could have prevented the occurrence of RHD. Unemployment has a negative impact on the decision-making process around the care provided to an individual. In the present case, Shakira's mother's poor economic condition has led to an improper care approach towards Shakira (Roberts et al., 2015). As per reports, Aboriginal individuals have a far greater chance of developing RHD than the non-Indigenous population; the risk is 64 times greater (rhdaustralia.org.au, 2017).

2. Cultural awareness refers to the capability of a healthcare professional to be aware and knowledgeable about the cultural beliefs, values and traditions of other individuals that are distinctly different from their own. A nurse needs to carry out research to become aware of Shakira's cultural background and have a successful interaction (Holland, 2017).

3. Cultural sensitivity refers to the ability of a healthcare professional to perceive the cultural similarities and dissimilarities between two different individuals in a positive way, without disrespecting the other individual. A nurse needs to acknowledge Shakira's cultural beliefs and values and not underestimate them while communicating with her (Norton & Marks-Maran, 2014).

4. The Aboriginal and Torres Strait Islander Act 2005 was established to promote the self-sufficiency and independence of the Aboriginal and Torres Strait Islander population. The Act has been significant for focusing on the development of the economic as well as the cultural status of this population through different programs (Willis et al., 2016).

5. The impact of colonisation on the health outcomes of the Aboriginal population is noteworthy. The reduced life expectancy of the population and the high prevalence of a number of health conditions can be linked to the suffering and turmoil experienced by this population as a result of colonisation. Colonisation brought chaos and disturbance that ultimately led to disputes and poor economic growth. Development and growth in different domains have been restricted to a considerable extent. The Aboriginal population has therefore suffered physical and mental health concerns arising from loss, abuse and anguish. Disconnection from the mainland and from the non-Indigenous population has added to the issues (Griffiths et al., 2016).

6. Consultation with community representatives would be a key approach for the enrolled nurse in establishing effective communication and building rapport that is culturally safe and appropriate. A community representative would be in a better position to understand the ethnic and cultural beliefs and systems of the Aboriginal patient.
A consultation with the representative would ensure that his or her advice is taken while communicating with the Aboriginal individual. The representative would be helpful in guiding the manner in which cultural beliefs are to be respected and acknowledged during communication. In this way, the preferences of the patient would be included in the care plan (Willis et al., 2016).

7. Since Shakira and her family live in a remote rural area with a mostly Indigenous population, there may well be a language barrier between the nurse and them. To avoid any issues while consulting for Shakira and to engage in effective communication, an interpreter is required who can facilitate verbal communication. The second method that would support effective communication is understanding the level of education of the individuals concerned. A lower level of education is associated with poorer knowledge of healthcare; this is to be acknowledged, and communication should take this factor into account (Daly et al., 2017).

8. Displaying Aboriginal or Torres Strait Islander art and posters that are visible from the entrance to the building would help Shakira and her family feel comfortable, since such an approach would indicate a culturally safe and sensitive environment. Shakira and her family would feel that their culture is being valued and respected by the caregivers (Norton & Marks-Maran, 2014).

9. I understand that Indigenous culture and history play an important role in shaping the interactions Indigenous people have with their counterparts. The culture and the value system that they uphold are held responsible for creating a disconnection between the Indigenous and non-Indigenous populations. The non-Indigenous population does not perceive the views of the Indigenous population positively and thus isolates them from the mainstream population. As a result, the latter have been socially excluded and live in remote rural areas, further aggravating the concern of insufficient communication between the two groups (Holland, 2017).

10. Insufficient use of healthcare services is the first indicator of culturally unsafe practice. In such a situation, the individual might not be provided with adequate care resources. The second indicator would be situations in which the healthcare professional does not acknowledge the concerns of the Indigenous patient in relation to any health complication (Ray, 2016).

References

Burden of Disease. (2017). Rheumatic Heart Disease Australia. Retrieved 19 October 2017, from https://www.rhdaustralia.org.au/burden-disease

Daly, J., Speedy, S., & Jackson, D. (2017). Contexts of nursing: An introduction. Elsevier Health Sciences.

Griffiths, K., Coleman, C., Lee, V., & Madden, R. (2016). How colonisation determines social justice and Indigenous health—a review of the literature. Journal of Population Research, 33(1), 9-30.

Holland, K. (2017). Cultural awareness in nursing and health care: An introductory text. CRC Press.

Norton, D., & Marks-Maran, D. (2014). Developing cultural sensitivity and awareness in nursing overseas. Nursing Standard, 28(44), 39-43.

Ray, M. A. (2016). Transcultural caring dynamics in nursing and health care. FA Davis.

Roberts, K. V., Maguire, G. P., Brown, A., Atkinson, D. N., Remenyi, B., Wheaton, G., ... & Carapetis, J. (2015). Rheumatic heart disease in Indigenous children in northern Australia: differences in prevalence and the challenges of screening. The Medical Journal of Australia, 203(5), 221.

Rothenbühler, M., O'Sullivan, C. J., Stortecky, S., Stefanini, G. G., Spitzer, E., Estill, J., ... & Pilgrim, T. (2014). Active surveillance for rheumatic heart disease in endemic regions: a systematic review and meta-analysis of prevalence among children and adolescents. The Lancet Global Health, 2(12), e717-e726.

Willis, E., Reynolds, L., & Keleher, H. (Eds.). (2016). Understanding the Australian health care system. Elsevier Health Sciences.

Friday, September 27, 2019

The role of race and class in the antebellum south Research Paper

Elite white southerners viewed the change as an abolition of slavery (Fertig, 95). They believed that slavery was necessary to promote the new economy they had established. As such, they implemented codes that prohibited African-Americans from owning or leasing land, signing labor contracts, serving on juries, voting, and testifying against whites in a court of law. African-Americans did not have access to public schools, whereas orphans were returned to their native countries. The elite southerners attempted to create a new economy and society because they had a comparative advantage in the production of cotton. The slaves, free blacks, and poor whites felt inferior after such a change. They believed that their providence of habits did not match that of the elite whites. As such, the notion of being inferior implied a permanent defect of character that would gradually enslave them if they were to remain in such a state (Valdez,

Thursday, September 26, 2019

Multiple Regression Empirical Project Essay Example | Topics and Well Written Essays - 2500 words

A billion-dollar increase in net exports, holding consumption and foreign direct investment constant, leads to a 0.47 billion dollar increase in GDP. Considering consumption alone, it was found that a billion-dollar increase in consumption leads to a 1.43 billion dollar increase in GDP, and that consumption levels explain 99.84% of the total variation in GDP [r2 (60) = 0.9984]. Further, taking foreign direct investment alone, it was found that a billion-dollar increase in foreign direct investment leads to a 5.33 billion dollar increase in GDP; this model was found to be significant at the 5% level of significance, and FDI explains 96.39% of the total variation in GDP. Lastly, a billion-dollar increase in net exports led to a 17.47 billion dollar decrease in GDP; the model with NE alone was found to be significant at the 5% level of significance, and NE explains 54.41% of the total variation in GDP.

This study aimed at determining the impact of responsible consumption, foreign direct investment and net exports on GDP, and employed secondary data to test these objectives. Different writers have argued that consumption and investment are the key variables on which GDP depends most. However, other variables such as irresponsible consumption, political unrest, environmental degradation, and a lack of government priorities that translates into irresponsible spending are further factors which should be taken care of for GDP to grow. GDP is the cumulative amount of goods and services which a country produces within a given year (Hall and Mishkin 1982; Hill 1992). When GDP changes, a country is said to have experienced economic growth (if the change is positive) or an economic melt-down (if growth is negative, i.e. the previous year's performance is better than the current year's). Factors like the level of consumption, foreign direct investment and net exports are among those which contribute to positive GDP growth, hence economic growth (Haron 2005). High direct foreign
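The simple and multiple regressions summarised above could be reproduced along the following lines. This is a minimal, hypothetical sketch rather than the study's actual code: the file name macro_data.csv and the column names gdp, consumption, fdi and net_exports are assumptions for illustration, with all values taken to be in billions of dollars.

```python
# Sketch of the regressions described in the excerpt (assumed file and column names).
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("macro_data.csv")  # hypothetical file with gdp, consumption, fdi, net_exports

# Multiple regression: GDP on consumption, FDI and net exports together.
X = sm.add_constant(df[["consumption", "fdi", "net_exports"]])
multi = sm.OLS(df["gdp"], X).fit()
print(multi.summary())  # coefficients, t-tests (5% level) and R-squared

# Simple regressions: GDP on each predictor alone, as in the r-squared figures quoted.
for predictor in ["consumption", "fdi", "net_exports"]:
    simple = sm.OLS(df["gdp"], sm.add_constant(df[[predictor]])).fit()
    print(predictor, round(simple.params[predictor], 2), round(simple.rsquared, 4))
```

Under these assumptions, the printed slopes and R-squared values correspond to the kind of figures reported in the excerpt, for example a slope of about 1.43 and an r2 of 0.9984 for consumption alone.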

Speech Essay Example | Topics and Well Written Essays - 4250 words

The three of you spend the first several minutes discussing what you did over the weekend. You all laugh over the exploits of classmate #1, who spent all night Friday and Saturday at several nightclubs. You and classmate #2 take your work out and place it on the table. Classmate #1 does not have any work completed and explains that with finals and the beginning of a head-cold, there just was not enough time. You get angry and frustrated; after all, you spent all weekend in your dorm working on this project. You look across the table at classmate #1, who is sitting with arms crossed, glaring at you.

I want Classmate #1 to understand that he has not been fair, having spent the weekend nights at clubs while Classmate #2 and I had to forgo a lot of things just to be able to work on the group project. After all, the project is supposed to be the result of all our efforts. The three of us would be getting equal credit based on the overall quality of the project that we will submit. I will convey my displeasure by knocking some sense into his head. I will not be affected by his defiant stance, and I will go ahead and tell him that he has no right to glare at me with his arms crossed, for he is the one who has wronged me and Classmate #2. I will talk in a calm but firm manner and will unflinchingly look Classmate #1 straight in the eye. This way, I will show him that I am not intimidated by his stance. I will further drive home the point that he has to shape up and make up for his lack of output so far by taking the biggest share of the work to be done before Wednesday.

B. Using the example above, describe each component of the Speech Communication Process.

a. The speaker is a college student who is enrolled in a speech class. Filled with anger and frustration about the receiver's irresponsible ways, the speaker earnestly tries to make the receiver comprehend the message that he wants to bring across.

b. The receiver is one of the speaker's classmates in the speech class. Together with yet another classmate, the three of them formed a group for the completion of a certain project due in class. The receiver is the co-member who seemed to have no intention of contributing anything to their group project. Instead of being apologetic, this receiver takes on a defiant stance when he sees the speaker's reaction to his having done no part of his individual work during the weekend.

c. The message is centered on their urgent need to finish their group project before the deadline. It pertains to the prevailing situation of the three-member group. The speaker means to convey the importance of fairness and of getting credit only when it is deserved.

d. The channels include the speaker's initially angry and frustrated reaction, and his subsequent calm and firm manner. They also cover the clarity of the speaker's message.

2. How can you control you

Wednesday, September 25, 2019

Death Penalty Essay Example | Topics and Well Written Essays - 2000 words - 1

Our main points for the abolition of the death penalty are morality and technicality. The original arguments for capital punishment, such as deterrence and retribution, no longer apply and are outdated. The teachings of the Bible, the Old and the New Testaments, tell us one important aspect of creation: protect life and do not allow vengeance. God did not kill Cain for killing his own brother Abel but instead sent him into exile and put a mark on him so that no one would kill him. A passage in the Bible states: 'an eye for an eye, a tooth for a tooth'. This teaching does not mean taking the life of a murderer or someone who has committed a heinous crime; it means limiting the retribution for an offense. When Jesus was presented with the woman accused of adultery, he did not condemn the woman but told those present "to cast the first stone", which means we should not condemn anyone but allow a sinner to reform.

Another argument is on the ground of technicality. The criminal justice system, a human system, is flawed; it is not always perfect despite all the bright legal minds the world has ever produced. Rulings are not perfect. The Supreme Court ruled that the death penalty violated the Eighth and Fourteenth Amendments. Then, in another ruling in 1976, Gregg v. Georgia, the Court contradicted itself by ruling that the death penalty per se was not unconstitutional (Bedau 2005, p. 23). In Furman, it was ruled that some state statutes were unconstitutional, which meant that death penalty statutes had to be rewritten. Advocates of the death penalty began proposing new laws for capital punishment. In other words, advocates of the death penalty interpreted this as an opportunity to write new laws so that there would be no more doubt about retaining the death penalty. It was reported that there were 35 states that rewrote their

Tuesday, September 24, 2019

Reproductive Rights Essay Example | Topics and Well Written Essays - 250 words

II. Identification and Evaluation of Ethical Principles of Reproductive Rights

Reproductive rights are controversial for several reasons. According to Bellieni and Buonocore (2006), some ethical principles which can be evaluated include: the potential abuse of women's bodies in a male-dominated medical profession; the debate over the validity of the embryo being seen as a 'person' with regard to state and federal law; and the finding that in vitro fertilization carries higher risks of birth defects, such as cerebral palsy, in children conceived as a result of the procedure (pp. 93). Abortion is legal in the U.S. according to federal law.

III. The Application of Principles to Ethical Issues with Various Implications

One of the main arguments that naysayers usually make with regard to embryonic procedures is that embryos are actually people, and that scientists are 'playing God' by creating children in a scientific fashion, sometimes discarding embryos that are damaged, which to some people is unethical because a person's life is being discontinued.

Monday, September 23, 2019

Media industry Essay Example | Topics and Well Written Essays - 1500 words

(Ross 2004). The situation is not very different in the advertising industry, as we will see in the discussion on the industry.

Women in British Media: Many media commentators described the 1990s as the decade of women. The (then) Conservative government in Britain initiated affirmative action to employ women in large numbers in all sectors, including the media industry. There was an apprehension that there would not be enough female graduates to fill job demand. In 1990, three women were actually appointed as editors of national newspapers. This is not to say that the atmosphere of gender discrimination has totally changed.

Soft Jobs and the Fair Sex: Even in the allocation of routine jobs, there is a gender divide. By convention, men journalists covered news stories concerning politics, crime, finance, education and upbringing; women covered 'human interest', consumer news, culture and social policy. Men covered the 'facts', 'sensation' and 'male' angles, while women covered the 'background and effects', 'compassion', 'general' and 'female' angles. The sources for men's stories were men, and for women's stories, women. The ethics behind men's stories were detached, while those behind women's were based on social needs (Carter 1998, 36 - adapted from table).

Is it different in the U.S. ... Broadcast journalism employs 26% women as local TV news directors, 17% as local TV managers and 13% as general managers in radio stations. The silver lining of the situation, according to Byerly, is that many male journalists today identify with feminism and "indeed, have taken their feminist colleagues' side to protest sexist news coverage as well as to demand more egalitarian newsroom policies" (Byerly 2004, 113).

And Internationally: The situation is not much different internationally. A survey by the International Federation of Journalists (IFJ), the world's largest journalists' association, shows that the number of women journalists varies widely: from 6% in Sri Lanka to around 50% in Finland. But at higher, decision-making levels, such as editors, department heads and media owners, the numbers are very low and account for barely 6%. The percentage is highest (10-20%) in Cyprus, Costa Rica, Mexico and Sweden (Byerly 2004, 109-126).

Social Milieu and Women in Media: When we seek to analyze the levels at which women are employed by the media industry, it is important to consider the socio-political milieu in which the industry functions and the power the industry wields. This is because the media too are a product of the socio-political milieu and cannot be considered in isolation. The social power of the communication (media) industries has long been recognised "as being at the heart of contemporary societies industrially, economically, politically and culturally" (Marris et al. (eds.) 1999, 13). The electronic age and the advent of the internet have added a powerful new thrust to traditional media, which primarily consisted of print media, radio and motion pictures and

Saturday, September 21, 2019

The significance of the post-war settlement Essay Example for Free

The significance of the post-war settlement Essay After World War 2, the extent to which participating countries had lost acted as an eye opener. What these countries expected never came to be. Having fought in war 1, and then war 2, most resources were running towards depletion and worst of all masses of people lost lives, economies, weakened, infrastructure destroyed and industrial development totally demerited. Post war era was the start to a new world order mainly to be characterized by peace, potency, mutuality and prosperity. After World War 2, every country that had taken part in it was left on a stand still. Having invested most of the resources in the war that turned out to be fruitless, it was a turning point for a majority of them. Governments had to work out new sources and systems of accumulating capital, re-institutional arrangements. The period around and after World War 2 led to advancement and fundamental industrial relations re-birth. The development at that time reflected positive future stability and durability. In the whole continent, only Sweden and Switzerland experienced tranquility. All the other countries were faced with war, colonization and enmity. Post war reconstruction of national boundaries, economical atmosphere and political systems facilitated unprecedented significance in development. The change in economical, institutional perception and perspectives is a niche definition of my understanding of post war settlement. (Fulcher,1987). In my discussion, I will scrutinize two global economical giant countries, which are Germany and Russia. To start, after World War 2 life was indeed very tough in the Federal Republic of Germany, with almost all systems in a shattered condition. Very hard living standards came up as a result of food shortage, diseases, instability, lack of job opportunities and unemployment. As a result activism and demonstrations took center stage, this was made possible by the Germany’s unionification. All employees were under the same confederation, which pushed for further reforms putting better living standards in their priority. This move turned out successive due to the fact that employees and unions existed on a mutually potent relationship, that is, one could not live without the other. Citizen thought that separation of politics and industrial issues would be a merit to their welfare. Another development that followed in turn was economical stability and expansion. To boost their progress, Britain sent food relief in 1946. Later on, Germany saw itself join membership of EEC in 1957 through the help of a strategic plan known as the `Marshal Plan’. Old industrious organizations, firms and companies remain stable as growth began to be felt. Post wartime is seen as a stability period in most countries that experienced the effects of war- Volkswagen, the automobile manufacturer is one of the companies that lived on and continue to today. Germany was forced to concentrate all it effort in policies and strategies towards economic growth. They had to halt active political presence for decades. With serious considerations being put into practice, Germany woke up to an economic excellence. As an advantage of post war activity, Germany became, and still is, among the giant economical strengths in Europe and universally. In a bid to bounce back, post wartime witnessed the practice of mass production, industries embarked on manufacturing of goods in surplus. As a result it attracted German citizens to mass consumption. 
This was a great move, since the more they produced, the more was consumed. Gross profits grew drastically, and so did the economy. Life felt more comfortable as job opportunities increased because of the mushrooming of many industries. Politically, the country was reshaped completely for quite a long time, and at one point Germany had to lie low. It had been toppled completely and its Nazi regime wrecked. This turn of events saw its leaders tried at Nuremberg for crimes against humanity; they had to face justice in their own homeland, or rather the site of their propaganda brilliance (Skidelsky, 1979). The tyrant leader Adolf Hitler escaped trial and execution, having committed suicide in Berlin at the end of the war; he felt intimidated by the counts he would have been charged with. World War 2 left many German capital towns in ruins from the massive bombings carried out in it. Germany was segregated into zones by the victorious powers, and this in turn resulted in a lasting political stability and settlement.

The European Union that stands out so strongly today has its roots in the World War 2 period. In fact, it grew from the European Coal and Steel Community (ECSC), whose primary intention was to develop the steel and coal resources of member states and to support and boost their economies. The ECSC facilitated the diffusion of the tensions that had built up between enemy countries in World War 2. With time, this economic co-operation grew enormously, adding new members and broadening its scope; later, the European Union was formed from the EEC, the European Economic Community. Many other prominent organizations today have histories dating back to the World War 2 era: for example, the World Trade Organization (WTO), the United Nations, the International Monetary Fund (IMF) and the World Bank, in whose formation the then West Germany, now Germany, took part and of which it remains a member.

Another very important consequence was the continuation of the technological progress that had captivated interest during the war. This development was reflected in almost all industrial fields, including electronics and computers. These advancements helped Germany create the foundation for further development and finally transformed into what was referred to as the post-World War 2 world. The new technology assisted in the efforts to fight the diseases that had erupted during the war. Massive research, monitoring, evaluation and development quickly attained nuclear power, which had far-reaching impacts on the scientific fraternity, creating a network of laboratories across the country thereafter. In addition, the struggle to crack codes initiated computer technology. There were social effects too, which significantly changed almost all war participants. One of them was the increased involvement of women in the workforce, filling the place of the many men taken by the war; this trend was reduced in the following years, as a fast-changing society pushed women back towards home and family matters (Rowthorn et al., 1992).

In military terms, World War 2 established the beginning of air power. Germany could not be left out; this was an opportunity to draft an active self-defence system, and it concentrated heavily on projects it had established earlier. Advanced aircraft, comprising aeroplanes, jet fighters and missiles developed earlier, saw further development.
Battleships and tanks entered the ever-growing competition, with guns and ammunition reaching lethal new dynamics. Air power capability is today a constituent of any military operation or mission (Seidman, 1950).

Russia, on the other hand, is said to have been the main beneficiary when borders were revised: Poland, Finland, Romania and other countries saw territory pushed inside the Russian boundary in its favour, while Germany was not considered in the process. As a result, Russia's land area became larger, creating room for development. Infrastructural systems remained much as they had been in Russia; only a few places had their systems completely damaged. Compared to Germany this was an advantage, since repairs would cost less, whereas the post-war period saw Germany's roads and communication networks left completely shattered. Though very many people lost their lives during World War 2, more so in Russia than in Germany, at some point Russia's demographic figures took a new turn. This was also facilitated by increased agricultural activity. Most of Russia's citizens depended on agriculture, fishing, forestry or craftwork, and as the agricultural industry grew, the country's economic strength also found stable ground; this is a major similarity experienced in almost all countries that had been through the war (Gourevitch, 1985).

Mortality rates decreased, and in turn the working population could expect a future. Hunger was driven out and fertility rose dramatically. Having food and medicine eradicated the diseases that had become threatening. Politically, Russia remained constantly active compared to Germany, which had to go slow for decades. This was a crafty thing to concentrate on after the war, and the approach taken was also notable in trying to balance business and politics. Like their German counterparts, Russians were hit badly by unemployment at the beginning of World War 2, and this led to workers' demonstrations all over the country. Trade unions at the time wanted drastic change to help improve living standards. This, though, was not allowed to spread like a bush fire; heavy police and army contingents ensured that a running battle existed to keep the country tranquil and the demonstrators at bay (Taylor, 1989).

Russia had severe problems following its decision to turn captured prisoners of war into plantation slave labourers. This is another reason that led to activism for human rights. Every significant development found its relation with the industrial settlement. Looking at its military state, Russia also developed in terms of strategic ideas and policies. Industrial innovations led to further outstanding developments in the manufacturing industry. This move also resulted in an interest in nuclear power; in fact, Russia invested heavily in the project. Facing challenges, though, the work had to be carried out in top secrecy because of the effects that had been seen at Hiroshima and Nagasaki (Ferner et al., 1994). The world's superpower countries felt empathy for what had happened and so took it as a mandate to control nuclear power: countries would only be allowed to use it for developmental purposes and not for destruction. Russia was the most highly sought-after provider of nuclear energy. It made tremendous sales, which in turn aided it in developing the state and also provided job opportunities for its people. Peace settlements characterised both Germany and Russia, and this positive move pointed to a brighter future.
Since Russia was facing the challenges of the cold war because of several staunch stands it had taken, it saw an opportunity, together with Germany, to reconcile with former enemy states. The negotiations indicated that this was to be the last war and a new beginning of everlasting peace. At the peace summit meetings held in Paris it was agreed that idealism would take over from rivalry. This was further pushed by the existence of the League of Nations, and expectations reached as far as the internationalization of waterways, a step towards cohesive industrial relations. Independence was being experienced in countries that had been colonized by Germany, which had decided to let self-rule and democracy prevail. By contrast, Russia on its side continued to occupy countries such as Latvia and others with iron and steel deposits or resources. Though they remained under Russia, these countries witnessed new traditions that were indeed very good: there was personal freedom for everyone and movement was generally safe. As a result people developed a degree of self-determination which helped industrialize Russia in particular; everyone was busy earning a living, a condition that turned out very well. (Ryder, 2008) Preparation for compensation was another significant effort, launched in Germany more than in Russia; Germany appeared to do a great deal to address the legacy of enmity, crisis and conflict. Russia became lax on this front, which damaged its relationship with most countries, and in turn its industrial relations and development faced difficulties and setbacks in the run-up to the cold war period thereafter. Property ownership was also a prominent issue after World War 2. Public resources had normally been subject to seizure or confiscation by the victorious powers, and many industries had seen ownership and income channelled to their banks, but after the war the rules changed. Native citizens came to own properties, industries and plantations, a clear sign that the world was changing, and this brought understanding between warring communities of different and diverse cultures. Although Russia as well as Germany felt they had lost, or had unfortunately settled for nothing compared with the first world war, they later realized that peace was even sweeter and priceless. The World War 1 settlement had not been perfect, which is why a second war broke out, but this time around it would be the last time blood was spilled. People were in fact turning closer to God at the time; in Germany, for example, the demise of the brutal Nazi regime saw churches established in the follow-up. Stability in the capability to keep and maintain order is something that came with the post-World-War-2 settlement. As a result of prevailing peace, all energies were shifted to industrial innovation and development in both Germany and Russia. In conclusion, it can be said that the significance of the post-World-War-2 period was almost the same in these two countries. The extent to which a state experienced the war was relative to how hard it would be forced to work in order to achieve stability. For instance, Germany had suffered badly in its most important industries, and the flow of capital from its colonies in Europe and on other continents stopped in the wake of freedom and independence.
Although it seemed almost incapable of bouncing back, Germany had undying determination. (Theory, 2008) In Russia's case the story was much the same across industrial development, except that it had resources and capital that made it easier to progress. Politically, too, Russia saw major and notable change, though not on the scale of Germany, where the end of the Nazi regime and Hitler's death became the starting point for humanity, democracy and, of course, its core economic booster: industrial stability.

References:
Fulcher, J. (1987). "Labour Movement Theory versus Corporatism", Sociology, Vol. 21, No. 2, pp. 231-252.
Hyman, R., in Hyman, R. and Ferner, A. (1994). New Frontiers in European Industrial Relations, Blackwell, London, Chapter 1.
Pekkarinen, J., Pohjola, M. and Rowthorn, B. (1992). Social Corporatism: A Superior Economic System?, Clarendon Press.
Seidman, J. (1950). Industrial and Labor Relations Review, Vol. 4, pp. 55-69, Cornell University School of Industrial and Labor Relations.
Gourevitch, P. (1985). Unions and Economic Change: Britain, West Germany and Sweden.
Ryder (2008). Post-war economy. Retrieved from the World Wide Web at http://www.geocities.com/Athens/Rhodes/6916/ww2.htm
Skidelsky, R. (1979). The Decline of Keynesian Politics: State and Economy in Contemporary Capitalism, Croom Helm, London.
Taylor, A. J. (1989). Trade Unions and Politics: A Comparative Introduction, Macmillan, Basingstoke.
Theory (2008). The world since 1945. Retrieved from the World Wide Web at http://www.flowofhistory.com/units/etc/19/FC128

Friday, September 20, 2019

Data Pre-processing Tool

Chapter 2

Real-life data rarely comply with the requirements of the various data mining tools. They are usually inconsistent and noisy, and may contain redundant attributes, unsuitable formats and so on. Hence data have to be prepared vigilantly before data mining actually starts. It is a well-known fact that the success of a data mining algorithm depends very much on the quality of data pre-processing, which is one of the most important tasks in data mining. Data pre-processing is a complicated task involving large data sets, and it sometimes takes more than 50% of the total time spent in solving a data mining problem. It is crucial for data miners to choose an efficient pre-processing technique for a specific data set, one that not only saves processing time but also retains the quality of the data for the data mining process. A data pre-processing tool should help miners with many data mining activities. For example, data may be provided in different formats, as discussed in the previous chapter (flat files, database files etc.); data files may also use different formats of values, and there may be calculation of derived attributes, data filters, joined data sets and so on. The data mining process generally starts with an understanding of the data, and in this stage pre-processing tools may help with data exploration and data discovery tasks. Data pre-processing involves a lot of tedious work and generally consists of data cleaning, data integration, data transformation and data reduction. In this chapter we study all these data pre-processing activities.

2.1 Data Understanding
In the data understanding phase the first task is to collect the initial data and then proceed with activities that make the analyst familiar with the data, reveal data quality problems, give first insights into the data, or identify interesting subsets that can form hypotheses about hidden information. The data understanding phase according to the CRISP model consists of the following tasks.

2.1.1 Collect Initial Data
The initial collection of data includes loading the data if this is required for data understanding. For instance, if a specific tool is applied for data understanding, it makes great sense to load the data into this tool. This attempt possibly leads to initial data preparation steps. However, if data are obtained from multiple data sources, integration is an additional issue.

2.1.2 Describe Data
Here the gross or surface properties of the gathered data are examined.

2.1.3 Explore Data
This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include: the distribution of key attributes, for instance the goal attribute of a prediction task; relations between pairs or small numbers of attributes; results of simple aggregations; properties of important sub-populations; and simple statistical analyses.

2.1.4 Verify Data Quality
In this step the quality of the data is examined. It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate, or does it contain errors, and if there are errors how common are they? Are there missing values in the data? If so, how are they represented, where do they occur, and how common are they?
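As a rough illustration of the exploration and quality-verification tasks above, the following is a minimal pandas sketch; the file name and the column names (income, region) are hypothetical, not part of the chapter.

```python
import pandas as pd

# Collect initial data: load it into the exploration tool (here, pandas).
df = pd.read_csv("customer_sales.csv")

# Describe data: gross/surface properties of every attribute.
print(df.describe(include="all"))

# Explore data: distribution of a key attribute and a simple aggregation.
print(df["income"].value_counts(bins=5))
print(df.groupby("region")["income"].mean())

# Verify data quality: how many values are missing, and in which columns?
print(df.isna().sum())
print(f"Complete rows: {len(df.dropna())} of {len(df)}")
```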
2.2 Data Preprocessing
The data preprocessing phase focuses on the pre-processing steps that produce the data to be mined. Data preparation, or preprocessing, is one of the most important steps in data mining. Industrial practice indicates that once data are well prepared, the mined results are much more accurate; this step is therefore also critical for the success of a data mining method. Among other things, data preparation mainly involves data cleaning, data integration, data transformation, and data reduction.

2.2.1 Data Cleaning
Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to obtain better quality data. When a single data source such as a flat file or database is used, data quality problems arise from misspellings during data entry, missing information or other invalid data. When data are taken from the integration of multiple sources such as data warehouses, federated database systems or global web-based information systems, the need for data cleaning increases significantly, because the multiple sources may contain redundant data in different formats. Consolidation of the different data formats and elimination of redundant information become necessary in order to provide access to accurate and consistent data. Good quality data must pass a set of quality criteria, which include:
Accuracy: an aggregated value over the criteria of integrity, consistency and density.
Integrity: an aggregated value over the criteria of completeness and validity.
Completeness: achieved by correcting data containing anomalies.
Validity: approximated by the amount of data satisfying integrity constraints.
Consistency: concerns contradictions and syntactical anomalies in the data.
Uniformity: directly related to irregularities in the data.
Density: the quotient of missing values in the data over the total number of values that ought to be known.
Uniqueness: related to the number of duplicates present in the data.

2.2.1.1 Terms Related to Data Cleaning
Data cleaning: the process of detecting, diagnosing, and editing damaged data.
Data editing: changing the value of data which are incorrect.
Data flow: the passing of recorded information through succeeding information carriers.
Inliers: data values falling inside the projected range.
Outliers: data values falling outside the projected range.
Robust estimation: estimation of statistical parameters using methods that are less sensitive to the effect of outliers than conventional methods.

2.2.1.2 Definition: Data Cleaning
Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then to improve the quality through correction of detected errors and omissions. The process may include format checks, completeness checks, reasonableness checks, limit checks, review of the data to identify outliers or other errors, and assessment of the data by subject area experts (e.g. taxonomic specialists). Through this process suspected records are flagged, documented and subsequently checked, and finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning is: define and determine error types; search and identify error instances; correct the errors; document error instances and error types; and modify data entry procedures to reduce future errors.
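To make the density and uniqueness criteria defined above a little more concrete, here is a minimal sketch that computes simple per-table figures with pandas; the tiny example table and its columns are hypothetical.

```python
import pandas as pd

# A tiny hypothetical table with missing values and one duplicated record.
df = pd.DataFrame({
    "name":  ["Anna", "Ben", "Ben", None],
    "phone": ["12345 678901", "23456 789012", "23456 789012", None],
})

total_values = df.size
missing_values = int(df.isna().sum().sum())

# Density, as defined above: the quotient of missing values in the data
# over the total number of values that ought to be known.
density = missing_values / total_values

# Uniqueness: how many records are duplicates of an earlier record.
duplicate_records = int(df.duplicated().sum())

print(f"missing quotient (density) = {density:.2f}")
print(f"duplicate records = {duplicate_records}")
```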
The data cleaning process is referred to by different people using a number of terms, and it is a matter of preference which one is used. These terms include: error checking, error detection, data validation, data cleaning, data cleansing, data scrubbing and error correction. We use data cleaning to encompass three sub-processes: data checking and error detection; data validation; and error correction. A fourth, improvement of the error prevention processes, could perhaps be added.

2.2.1.3 Problems with Data
Here we note some key problems with data.
Missing data: this problem occurs for two main reasons: data are absent from the source where they are expected to be present, or data are present but not available in an appropriate form. Detecting missing data is usually straightforward and simple.
Erroneous data: this problem occurs when a wrong value is recorded for a real-world value. Detection of erroneous data can be quite difficult (for instance the incorrect spelling of a name).
Duplicated data: this problem occurs for two reasons: repeated entry of the same real-world entity with somewhat different values, or a real-world entity that has different identifications. Repeated records are common and frequently easy to detect; different identifications of the same real-world entity can be a very hard problem to identify and solve.
Heterogeneities: when data from different sources are brought together in one analysis, heterogeneity may occur. Structural heterogeneity arises when the data structures reflect different business usage; semantic heterogeneity arises when the meaning of the data is different in each system being combined. Heterogeneities are usually very difficult to resolve because they involve a lot of contextual data that is not well defined as metadata.
Information dependencies in the relationships between the different sets of attributes are commonly present, and wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways. Commercial offerings are available that assist the cleaning process, but these are often problem specific. Uncertainty in information systems is a well-recognized hard problem. (A very simple example of missing and erroneous data would normally be shown in an accompanying figure.)
Extensive support for data cleaning must be provided by data warehouses. Data warehouses have a high probability of "dirty data" since they load and continuously refresh huge amounts of data from a variety of sources. Since these data warehouses are used for strategic decision making, the correctness of their data is important in order to avoid wrong decisions. In the ETL (Extraction, Transformation, and Loading) process for building a data warehouse, data transformations deal with schema and data translation and integration, and with filtering and aggregating the data to be stored in the warehouse. All data cleaning is classically performed in a separate staging area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain.
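Duplicated records of the kind described above are often the easiest of these problems to screen for. The sketch below is one possible pandas approach; the table, its values and the choice of the phone field as a matching key are hypothetical.

```python
import pandas as pd

# A hypothetical table where one real-world person was entered twice,
# once with a slightly different spelling of the name.
df = pd.DataFrame({
    "name":  ["John Smith", "J. Smith", "Mary Jones"],
    "city":  ["Berlin", "Berlin", "Moscow"],
    "phone": ["12345 678901", "12345 678901", "98765 432109"],
})

# Exact duplicates are easy: identical rows can simply be dropped.
exact = df.drop_duplicates()

# Near-duplicates need a key that ignores the inconsistently entered field;
# here two rows that share a phone number are treated as the same entity.
deduplicated = df.drop_duplicates(subset=["phone"], keep="first")
print(deduplicated)
```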
A data cleaning method should ensure the following: it should detect and eliminate all major errors and inconsistencies, both in individual data sources and when integrating multiple sources; it should be supported by tools that limit manual examination and programming effort, and it should be extensible so that it can cover additional sources; it should be performed in association with schema-related data transformations based on metadata; and data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources.

2.2.1.4 Data Cleaning: Phases
1. Analysis: to identify errors and inconsistencies in the database a detailed analysis is needed, involving both manual inspection and automated analysis programs. This reveals where (most of) the problems are.
2. Defining transformation and mapping rules: after discovering the problems, this phase is concerned with defining the way in which we will automate the solutions that clean the data. The analysis phase yields a list of concrete activities, for example: remove all entries for "J. Smith" because they are duplicates of "John Smith"; find entries with "bule" in the colour field and change them to "blue"; find all records where the phone number field does not match the pattern (NNNNN NNNNNN). Further steps for cleaning the data are then applied.
3. Verification: in this phase we check and assess the transformation plans made in phase 2. Without this step we may end up making the data dirtier rather than cleaner. Since data transformation is the main step that actually changes the data, we need to be sure that the applied transformations will do so correctly; therefore the transformation plans must be tested and examined very carefully. For example, suppose we have a very thick C++ book that says "strict" in all the places where it should say "struct".
4. Transformation: once it is certain that cleaning will be done correctly, apply the transformations verified in the last step. For large databases this task is supported by a variety of tools.
Backflow of cleaned data: in data mining the main objective is to convert and move clean data into the target system, which creates a requirement to purify legacy data. Cleansing can be a complicated process depending on the technique chosen, and it has to be designed carefully to achieve the objective of removing dirty data. Methods for cleansing a legacy system include automated data cleansing, manual data cleansing, and a combined cleansing process.

2.2.1.5 Missing Values
Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values are one important problem to be addressed. The missing value problem occurs because many tuples may have no recorded value for several attributes. For example, consider a customer sales database consisting of a large number of records (say around 100,000) where some of the records have certain fields missing; say customer income in the sales data may be missing. The goal is to find a way to predict what the missing data values should be (so that they can be filled in) based on the existing data.
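As a first illustration of filling such gaps (the standard strategies are described in detail below), here is a minimal pandas sketch; the customer table, its segment and income columns, and the values are hypothetical.

```python
import pandas as pd
import numpy as np

# Hypothetical customer sales records with some missing income values.
df = pd.DataFrame({
    "segment": ["luxury", "luxury", "budget", "budget", "budget"],
    "income":  [90_000, np.nan, 30_000, 32_000, np.nan],
})

# Strategy: fill with a global constant.
const_filled = df["income"].fillna(-1)

# Strategy: fill with the overall attribute mean.
mean_filled = df["income"].fillna(df["income"].mean())

# Strategy: fill with the mean of the same class (segment), which is
# usually closer to the truth than the global mean.
class_filled = df["income"].fillna(
    df.groupby("segment")["income"].transform("mean")
)
print(class_filled)
```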
Missing data may occur for the following reasons: equipment malfunction; data deleted because they were inconsistent with other recorded data; data not entered due to misunderstanding; certain data not considered important at the time of entry; or failure to register the history or changes of the data.
How to handle missing values? Dealing with missing values is a recurring question that has to do with the actual meaning of the data. There are various methods for handling missing entries:
1. Ignore the data row. One solution is simply to ignore the entire data row. This is generally done when the class label is missing (assuming the data mining goal is classification), or when many attributes are missing from the row (not just one). If the percentage of such rows is high, however, performance will definitely suffer.
2. Use a global constant to fill in missing values. We can fill in a global constant such as "unknown", "N/A" or minus infinity. This is done because at times it simply does not make sense to try to predict the missing value. For example, if the office address is missing for some customers in a sales database, filling it in does not make much sense. This method is simple but not foolproof.
3. Use the attribute mean. If, say, the average income of a family is X, that value can be used to replace missing income values in the customer sales database.
4. Use the attribute mean for all samples belonging to the same class. Suppose a car pricing database classifies cars as "luxury" or "low budget" and there are missing values in the cost field. Replacing the missing cost of a luxury car with the average cost of all luxury cars is probably more accurate than the value obtained by also factoring in the low-budget cars.
5. Use a data mining algorithm to predict the value. The value can be determined using regression, inference-based tools using a Bayesian formalism, decision trees, clustering algorithms, and so on.

2.2.1.6 Noisy Data
Noise can be defined as random error or variance in a measured variable. Because of this randomness it is very difficult to follow a fixed strategy for removing noise from data. Real-world data are not always faultless; they can suffer from corruption which may affect the interpretation of the data, the models created from the data, and the decisions made based on them. Incorrect attribute values may be present for the following reasons: faulty data collection instruments; data entry problems; duplicate records; incomplete data; inconsistent data; incorrect processing; data transmission problems; technology limitations; inconsistency in naming conventions; and outliers.
How to handle noisy data? The methods for removing noise from data are as follows:
1. Binning: first sort the data and partition it into (equal-frequency) bins, then smooth by bin means, bin medians, or bin boundaries.
2. Regression: smoothing is done by fitting the data to regression functions.
3. Clustering: clustering detects, and can remove, outliers from the data.
4. Combined computer and human inspection: the computer detects suspicious values which are then checked by human experts (e.g. to deal with possible outliers).
These methods are explained in detail below. Binning is a data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values; a short sketch follows.
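A minimal sketch of equal-depth binning and bin-mean smoothing, using the small price list that is worked through by hand later in this section; pandas' qcut is one way to form the equal-frequency bins.

```python
import pandas as pd

# The sorted price values from the worked example below (in dollars).
prices = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Equal-depth (equal-frequency) partitioning into 3 bins of 4 values each.
bins = pd.qcut(prices, q=3, labels=False)

# Smoothing by bin means: every value is replaced by the mean of its bin.
smoothed = prices.groupby(bins).transform("mean")

for b in sorted(bins.unique()):
    members = prices[bins == b].tolist()
    print(f"Bin {b + 1}: {members} -> smoothed to {smoothed[bins == b].iloc[0]:.0f}")
```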
For instance, age can be converted to bins such as "20 or under", "21-40", "41-65" and "over 65". Binning methods smooth a sorted data set by consulting the values around each value; this is therefore called local smoothing. Consider the following binning methods.
Equal-width (distance) partitioning divides the range into N intervals of equal size (a uniform grid): if A and B are the lowest and highest values of the attribute, the width of the intervals will be W = (B - A)/N. This is the most straightforward approach, but outliers may dominate the result and skewed data are not handled well.
Equal-depth (frequency) partitioning divides the range of a given attribute into N intervals, each containing approximately the same number of samples. It gives good data scaling, although managing categorical attributes can be tricky.
Smoothing by bin means: each bin value is replaced by the mean of the values in the bin. Smoothing by bin medians: each bin value is replaced by the median. Smoothing by bin boundaries: each bin value is replaced by the closest boundary value.
Example. Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34.
Partition into equal-frequency (equi-depth) bins: Bin 1: 4, 8, 9, 15; Bin 2: 21, 21, 24, 25; Bin 3: 26, 28, 29, 34.
Smoothing by bin means: Bin 1: 9, 9, 9, 9 (the mean of 4, 8, 9, 15 is 9); Bin 2: 23, 23, 23, 23; Bin 3: 29, 29, 29, 29.
Smoothing by bin boundaries: Bin 1: 4, 4, 4, 15; Bin 2: 21, 21, 25, 25; Bin 3: 26, 26, 26, 34.
Regression: regression is a data mining technique used to fit an equation to a data set. The simplest form is linear regression, which uses the formula of a straight line (y = b + wx) and determines suitable values for b and w to predict the value of y from a given value of x. More sophisticated techniques, such as multiple regression, permit more than one input variable and allow the fitting of more complex models, such as a quadratic equation. Regression is described further in a subsequent chapter when predictions are discussed.
Clustering: clustering is a method of grouping data into different groups so that the data in each group share similar trends and patterns. Clustering algorithms constitute a major class of data mining algorithms; they automatically partition the data space into a set of regions or clusters, and the goal of the process is to find all sets of similar examples in the data in some optimal fashion. For example, the data may form three clusters, and values that fall outside every cluster are outliers.
Combined computer and human inspection: these methods find suspicious values using computer programs, which are then verified by human experts. By this process all outliers are checked.

2.2.1.7 Data Cleaning as a Process
Data cleaning is the process of detecting, diagnosing, and editing data; it is a three-stage method involving a repeated cycle of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected incidentally during study activities, but it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always immediately clear whether a data point is erroneous; many times careful examination is required, and missing values likewise require additional checks. Therefore, predefined rules for dealing with errors and with true missing and extreme values are part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data, for example with simple range checks as sketched below.
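One possible form of such screening is to flag values that fall outside a predefined expected range so that a human expert can review them; in the sketch below the table, the age field and the limits are hypothetical.

```python
import pandas as pd

# Hypothetical survey data with an age field that should lie in [0, 120].
df = pd.DataFrame({"record_id": [1, 2, 3, 4],
                   "age": [34, 210, -5, 67]})

expected_min, expected_max = 0, 120

# Screening: mark records whose age is outside the expected range.
suspect = df[(df["age"] < expected_min) | (df["age"] > expected_max)]

# Diagnosis and editing are then left to a human expert; here we only
# flag and report the suspicious records rather than changing them.
print(suspect)
```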
In small studies, with the examiner intimately involved at all stages, there may be little or no difference between a database and an analysis data set. During as well as after treatment, the diagnostic and treatment phases of cleaning require insight into the sources and types of errors at all stages of the study, and the concept of data flow is therefore crucial in this respect. After measurement, research data go through repeated steps of being entered into information carriers, extracted, transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself, and most of these problems are due to human error. Inaccuracy of a single data point or measurement may be tolerable and related to the inherent technical error of the measurement device. The process of data cleaning must therefore focus on those errors that go beyond small technical variations and that form a major shift within or beyond the population distribution; in turn, it must be based on an understanding of technical errors and of the expected ranges of normal values. Some errors deserve higher priority, but which ones are most significant is highly study-specific. For instance, in most medical epidemiological studies, errors that need to be cleaned at all costs include missing gender, gender misspecification, birth date or examination date errors, duplication or merging of records, and biologically impossible results. Another example: in nutrition studies, date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressure or if resources for data cleaning are limited.

2.2.2 Data Integration
Data integration is the process of taking data from one or more sources and mapping it, field by field, onto a new data structure; the idea is to combine data from multiple sources into a coherent form. Many data mining projects require data from multiple sources because data may be distributed over different databases or data warehouses (for example an epidemiological study that needs information about both hospital admissions and car accidents); data may be required from different geographic locations, or historical data may be needed (e.g. integrating historical data into a new data warehouse); or the data may need to be enhanced with additional external data (to improve data mining precision).

2.2.2.1 Data Integration Issues
There are a number of issues in data integration. Imagine two database tables, Table 1 and Table 2. In integrating these two tables a variety of issues are involved, such as:
1. The same attribute may have different names (for example "Name" and "Given Name" may be the same attribute under different names).
2. An attribute may be derived from another (for example an "Age" attribute derived from a "DOB" attribute).
3. Attributes might be redundant (for example a redundant "PID" attribute).
4. Values in attributes might differ (for example, for PID 4791 the values in the second and third fields differ between the two tables).
5. Duplicate records may exist under different keys (the same record may be replicated with different key values).
Therefore schema integration and object matching can be tricky. The question is how equivalent entities from different sources are matched; this is known as the entity identification problem. Conflicts have to be detected and resolved. Integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (metadata for each attribute might include its name, meaning, data type and the range of values permitted).

2.2.2.2 Redundancy
Redundancy is another important issue in data integration. Two attributes (such as DOB and age in the example table) may be redundant if one can be derived from the other attribute or from a set of attributes. Inconsistencies in attribute or dimension naming can also lead to redundancies in the data sets.
Handling redundant data. Data redundancy problems can be handled in the following ways: use correlation analysis; consider different codings or representations (e.g. metric versus imperial measures); integrate the data carefully (manually) to reduce or prevent redundancies and inconsistencies; perform de-duplication (also called internal data linkage) when no unique entity keys are available, by analysing attribute values to find duplicates; and process redundant and inconsistent data (easy if the values are the same) by deleting one of the values, averaging the values (only for numerical attributes), or taking the majority value (if there are more than two duplicates and some values agree).
Correlation analysis (based on Pearson's product moment coefficient) is explained in detail here. Some redundancies can be detected using correlation analysis: given two attributes, such analysis can measure how strongly one attribute implies the other. For numerical attributes we can compute the correlation coefficient of two attributes A and B to evaluate the correlation between them:

r(A,B) = ( Σ(AB) − n·mean(A)·mean(B) ) / ( n·σA·σB )

where n is the number of tuples, mean(A) and mean(B) are the respective means of A and B, σA and σB are the respective standard deviations of A and B, and Σ(AB) is the sum of the AB cross-products.
a. If r(A,B) is greater than zero, A and B are positively correlated; the higher the value, the stronger the correlation.
b. If r(A,B) is equal to zero, A and B are independent of each other and there is no correlation between them.
c. If r(A,B) is less than zero, A and B are negatively correlated: when the value of one attribute increases, the value of the other decreases, so one attribute discourages the other.
It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not necessarily mean that A causes B or that B causes A. For example, in analysing a demographic database we may find that the attributes representing the number of accidents and the number of car thefts in a region are correlated; this does not mean that one causes the other, as both may be related to a third attribute, namely population.
For discrete (categorical) data, a correlation relationship between two attributes can be discovered by a χ² (chi-square) test. Let A have c distinct values a1, a2, ..., ac and B have r distinct values b1, b2, ..., br. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows. The χ² statistic over the (Ai, Bj) cells of this table is

χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (O_{i,j} − E_{i,j})² / E_{i,j}
where O_{i,j} is the observed frequency (i.e. the actual count) of the joint event (Ai, Bj) and E_{i,j} is the expected frequency, which can be computed as

E_{i,j} = ( count(A = a_i) × count(B = b_j) ) / N

where N is the number of data tuples, count(A = a_i) is the number of tuples having value ai for A, and count(B = b_j) is the number of tuples having value bj for B. The larger the χ² value, the more likely it is that the variables are related; the cells that contribute most to the χ² value are those whose actual count is very different from the expected count.
Chi-square calculation: an example. Suppose a group of 1,500 people were surveyed. The gender of each person was noted, and each person was polled on whether their preferred type of reading material was fiction or non-fiction. The observed frequency of each possible joint event is summarized in the following table (the numbers in parentheses are the expected frequencies). Calculate chi-square.

              | Male      | Female      | Sum (row)
Fiction       | 250 (90)  | 200 (360)   | 450
Non-fiction   | 50 (210)  | 1000 (840)  | 1050
Sum (col.)    | 300       | 1200        | 1500

For example, E11 = count(male) × count(fiction) / N = 300 × 450 / 1500 = 90, and so on. For this 2×2 table the degrees of freedom are (2−1)(2−1) = 1. For 1 degree of freedom, the χ² value needed to reject the hypothesis at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, available in any statistics textbook). Since the computed value is above this, we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group.
Duplication must also be detected at the tuple level. The use of denormalized tables is another source of redundancy, and redundancies may further lead to data inconsistencies (due to updating some copies but not others).

2.2.2.3 Detection and Resolution of Data Value Conflicts
Another significant issue in data integration is the discovery and resolution of data value conflicts. For the same entity, attribute values from different sources may differ; for example, weight may be stored in metric units in one source and in British imperial units in another. For instance, for a hotel cha
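A small numeric check of the worked example above, computing the expected counts and the χ² statistic directly with numpy (the observed table is the one given above).

```python
import numpy as np

# Observed counts: rows = fiction / non-fiction, columns = male / female.
observed = np.array([[250, 200],
                     [50, 1000]])

n = observed.sum()                  # 1500 people in total
row_totals = observed.sum(axis=1)   # [450, 1050]
col_totals = observed.sum(axis=0)   # [300, 1200]

# Expected counts under independence: E_ij = row_total_i * col_total_j / N.
expected = np.outer(row_totals, col_totals) / n   # [[90, 360], [210, 840]]

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi2 = ((observed - expected) ** 2 / expected).sum()

print(f"chi-square = {chi2:.2f}")   # about 507.9, far above 10.828
```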
In this stage pre-processing tools may help with data exploration and data discovery tasks. Data processing includes lots of tedious works, Data pre-processing generally consists of Data Cleaning Data Integration Data Transformation And Data Reduction. In this chapter we will study all these data pre-processing activities. 2.1 Data Understanding In Data understanding phase the first task is to collect initial data and then proceed with activities in order to get well known with data, to discover data quality problems, to discover first insight into the data or to identify interesting subset to form hypothesis for hidden information. The data understanding phase according to CRISP model can be shown in following . 2.1.1 Collect Initial Data The initial collection of data includes loading of data if required for data understanding. For instance, if specific tool is applied for data understanding, it makes great sense to load your data into this tool. This attempt possibly leads to initial data preparation steps. However if data is obtained from multiple data sources then integration is an additional issue. 2.1.2 Describe data Here the gross or surface properties of the gathered data are examined. 2.1.3 Explore data This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include: Sharing of key attributes, for instance the goal attribute of a prediction task Relations between pairs or small numbers of attributes Results of simple aggregations Properties of important sub-populations Simple statistical analyses. 2.1.4 Verify data quality In this step quality of data is examined. It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate or does it contains errors and if there are errors how common are they? Are there missing values in the data? If so how are they represented, where do they occur and how common are they? 2.2 Data Preprocessing Data preprocessing phase focus on the pre-processing steps that produce the data to be mined. Data preparation or preprocessing is one most important step in data mining. Industrial practice indicates that one data is well prepared; the mined results are much more accurate. This means this step is also a very critical fro success of data mining method. Among others, data preparation mainly involves data cleaning, data integration, data transformation, and reduction. 2.2.1 Data Cleaning Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to get better quality data. While using a single data source such as flat files or databases data quality problems arises due to misspellings while data entry, missing information or other invalid data. While the data is taken from the integration of multiple data sources such as data warehouses, federated database systems or global web-based information systems, the requirement for data cleaning increases significantly. This is because the multiple sources may contain redundant data in different formats. Consolidation of different data formats abs elimination of redundant information becomes necessary in order to provide access to accurate and consistent data. Good quality data requires passing a set of quality criteria. Those criteria include: Accuracy: Accuracy is an aggregated value over the criteria of integrity, consistency and density. 
Integrity: Integrity is an aggregated value over the criteria of completeness and validity. Completeness: completeness is achieved by correcting data containing anomalies. Validity: Validity is approximated by the amount of data satisfying integrity constraints. Consistency: consistency concerns contradictions and syntactical anomalies in data. Uniformity: it is directly related to irregularities in data. Density: The density is the quotient of missing values in the data and the number of total values ought to be known. Uniqueness: uniqueness is related to the number of duplicates present in the data. 2.2.1.1 Terms Related to Data Cleaning Data cleaning: data cleaning is the process of detecting, diagnosing, and editing damaged data. Data editing: data editing means changing the value of data which are incorrect. Data flow: data flow is defined as passing of recorded information through succeeding information carriers. Inliers: Inliers are data values falling inside the projected range. Outlier: outliers are data value falling outside the projected range. Robust estimation: evaluation of statistical parameters, using methods that are less responsive to the effect of outliers than more conventional methods are called robust method. 2.2.1.2 Definition: Data Cleaning Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then improving the quality through correction of detected errors and omissions. This process may include format checks Completeness checks Reasonableness checks Limit checks Review of the data to identify outliers or other errors Assessment of data by subject area experts (e.g. taxonomic specialists). By this process suspected records are flagged, documented and checked subsequently. And finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning given as: Define and determine error types; Search and identify error instances; Correct the errors; Document error instances and error types; and Modify data entry procedures to reduce future errors. Data cleaning process is referred by different people by a number of terms. It is a matter of preference what one uses. These terms include: Error Checking, Error Detection, Data Validation, Data Cleaning, Data Cleansing, Data Scrubbing and Error Correction. We use Data Cleaning to encompass three sub-processes, viz. Data checking and error detection; Data validation; and Error correction. A fourth improvement of the error prevention processes could perhaps be added. 2.2.1.3 Problems with Data Here we just note some key problems with data Missing data : This problem occur because of two main reasons Data are absent in source where it is expected to be present. Some times data is present are not available in appropriately form Detecting missing data is usually straightforward and simpler. Erroneous data: This problem occurs when a wrong value is recorded for a real world value. Detection of erroneous data can be quite difficult. (For instance the incorrect spelling of a name) Duplicated data : This problem occur because of two reasons Repeated entry of same real world entity with some different values Some times a real world entity may have different identifications. Repeat records are regular and frequently easy to detect. The different identification of the same real world entities can be a very hard problem to identify and solve. 
Heterogeneities: When data from different sources are brought together in one analysis problem heterogeneity may occur. Heterogeneity could be Structural heterogeneity arises when the data structures reflect different business usage Semantic heterogeneity arises when the meaning of data is different n each system that is being combined Heterogeneities are usually very difficult to resolve since because they usually involve a lot of contextual data that is not well defined as metadata. Information dependencies in the relationship between the different sets of attribute are commonly present. Wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways. Commercial offerings are available that assist the cleaning process, but these are often problem specific. Uncertainty in information systems is a well-recognized hard problem. In following a very simple examples of missing and erroneous data is shown Extensive support for data cleaning must be provided by data warehouses. Data warehouses have high probability of â€Å"dirty data† since they load and continuously refresh huge amounts of data from a variety of sources. Since these data warehouses are used for strategic decision making therefore the correctness of their data is important to avoid wrong decisions. The ETL (Extraction, Transformation, and Loading) process for building a data warehouse is illustrated in following . Data transformations are related with schema or data translation and integration, and with filtering and aggregating data to be stored in the data warehouse. All data cleaning is classically performed in a separate data performance area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain. A data cleaning method should assure following: It should identify and eliminate all major errors and inconsistencies in an individual data sources and also when integrating multiple sources. Data cleaning should be supported by tools to bound manual examination and programming effort and it should be extensible so that can cover additional sources. It should be performed in association with schema related data transformations based on metadata. Data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources. 2.2.1.4 Data Cleaning: Phases 1. Analysis: To identify errors and inconsistencies in the database there is a need of detailed analysis, which involves both manual inspection and automated analysis programs. This reveals where (most of) the problems are present. 2. Defining Transformation and Mapping Rules: After discovering the problems, this phase are related with defining the manner by which we are going to automate the solutions to clean the data. We will find various problems that translate to a list of activities as a result of analysis phase. Example: Remove all entries for J. Smith because they are duplicates of John Smith Find entries with `bule in colour field and change these to `blue. Find all records where the Phone number field does not match the pattern (NNNNN NNNNNN). Further steps for cleaning this data are then applied. Etc †¦ 3. Verification: In this phase we check and assess the transformation plans made in phase- 2. 
Without this step, we may end up making the data dirtier rather than cleaner. Since data transformation is the main step that actually changes the data itself so there is a need to be sure that the applied transformations will do it correctly. Therefore test and examine the transformation plans very carefully. Example: Let we have a very thick C++ book where it says strict in all the places where it should say struct 4. Transformation: Now if it is sure that cleaning will be done correctly, then apply the transformation verified in last step. For large database, this task is supported by a variety of tools Backflow of Cleaned Data: In a data mining the main objective is to convert and move clean data into target system. This asks for a requirement to purify legacy data. Cleansing can be a complicated process depending on the technique chosen and has to be designed carefully to achieve the objective of removal of dirty data. Some methods to accomplish the task of data cleansing of legacy system include: n Automated data cleansing n Manual data cleansing n The combined cleansing process 2.2.1.5 Missing Values Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values is one important problem to be addressed. Missing value problem occurs because many tuples may have no record for several attributes. For Example there is a customer sales database consisting of a whole bunch of records (lets say around 100,000) where some of the records have certain fields missing. Lets say customer income in sales data may be missing. Goal here is to find a way to predict what the missing data values should be (so that these can be filled) based on the existing data. Missing data may be due to following reasons Equipment malfunction Inconsistent with other recorded data and thus deleted Data not entered due to misunderstanding Certain data may not be considered important at the time of entry Not register history or changes of the data How to Handle Missing Values? Dealing with missing values is a regular question that has to do with the actual meaning of the data. There are various methods for handling missing entries 1. Ignore the data row. One solution of missing values is to just ignore the entire data row. This is generally done when the class label is not there (here we are assuming that the data mining goal is classification), or many attributes are missing from the row (not just one). But if the percentage of such rows is high we will definitely get a poor performance. 2. Use a global constant to fill in for missing values. We can fill in a global constant for missing values such as unknown, N/A or minus infinity. This is done because at times is just doesnt make sense to try and predict the missing value. For example if in customer sales database if, say, office address is missing for some, filling it in doesnt make much sense. This method is simple but is not full proof. 3. Use attribute mean. Let say if the average income of a a family is X you can use that value to replace missing income values in the customer sales database. 4. Use attribute mean for all samples belonging to the same class. Lets say you have a cars pricing DB that, among other things, classifies cars to Luxury and Low budget and youre dealing with missing values in the cost field. 
Replacing missing cost of a luxury car with the average cost of all luxury cars is probably more accurate then the value youd get if you factor in the low budget 5. Use data mining algorithm to predict the value. The value can be determined using regression, inference based tools using Bayesian formalism, decision trees, clustering algorithms etc. 2.2.1.6 Noisy Data Noise can be defined as a random error or variance in a measured variable. Due to randomness it is very difficult to follow a strategy for noise removal from the data. Real world data is not always faultless. It can suffer from corruption which may impact the interpretations of the data, models created from the data, and decisions made based on the data. Incorrect attribute values could be present because of following reasons Faulty data collection instruments Data entry problems Duplicate records Incomplete data: Inconsistent data Incorrect processing Data transmission problems Technology limitation. Inconsistency in naming convention Outliers How to handle Noisy Data? The methods for removing noise from data are as follows. 1. Binning: this approach first sort data and partition it into (equal-frequency) bins then one can smooth it using- Bin means, smooth using bin median, smooth using bin boundaries, etc. 2. Regression: in this method smoothing is done by fitting the data into regression functions. 3. Clustering: clustering detect and remove outliers from the data. 4. Combined computer and human inspection: in this approach computer detects suspicious values which are then checked by human experts (e.g., this approach deal with possible outliers).. Following methods are explained in detail as follows: Binning: Data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. For instance, age can be changed to bins such as 20 or under, 21-40, 41-65 and over 65. Binning methods smooth a sorted data set by consulting values around it. This is therefore called local smoothing. Let consider a binning example Binning Methods n Equal-width (distance) partitioning Divides the range into N intervals of equal size: uniform grid if A and B are the lowest and highest values of the attribute, the width of intervals will be: W = (B-A)/N. The most straightforward, but outliers may dominate presentation Skewed data is not handled well n Equal-depth (frequency) partitioning 1. It divides the range (values of a given attribute) into N intervals, each containing approximately same number of samples (elements) 2. Good data scaling 3. Managing categorical attributes can be tricky. n Smooth by bin means- Each bin value is replaced by the mean of values n Smooth by bin medians- Each bin value is replaced by the median of values n Smooth by bin boundaries Each bin value is replaced by the closest boundary value Example Let Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34 n Partition into equal-frequency (equi-depth) bins: o Bin 1: 4, 8, 9, 15 o Bin 2: 21, 21, 24, 25 o Bin 3: 26, 28, 29, 34 n Smoothing by bin means: o Bin 1: 9, 9, 9, 9 ( for example mean of 4, 8, 9, 15 is 9) o Bin 2: 23, 23, 23, 23 o Bin 3: 29, 29, 29, 29 n Smoothing by bin boundaries: o Bin 1: 4, 4, 4, 15 o Bin 2: 21, 21, 25, 25 o Bin 3: 26, 26, 26, 34 Regression: Regression is a DM technique used to fit an equation to a dataset. 
The simplest form of regression is linear regression which uses the formula of a straight line (y = b+ wx) and determines the suitable values for b and w to predict the value of y based upon a given value of x. Sophisticated techniques, such as multiple regression, permit the use of more than one input variable and allow for the fitting of more complex models, such as a quadratic equation. Regression is further described in subsequent chapter while discussing predictions. Clustering: clustering is a method of grouping data into different groups , so that data in each group share similar trends and patterns. Clustering constitute a major class of data mining algorithms. These algorithms automatically partitions the data space into set of regions or cluster. The goal of the process is to find all set of similar examples in data, in some optimal fashion. Following shows three clusters. Values that fall outsid e the cluster are outliers. 4. Combined computer and human inspection: These methods find the suspicious values using the computer programs and then they are verified by human experts. By this process all outliers are checked. 2.2.1.7 Data cleaning as a process Data cleaning is the process of Detecting, Diagnosing, and Editing Data. Data cleaning is a three stage method involving repeated cycle of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected by the way during study activities. However, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always right away clear whether a data point is erroneous. Many times it requires careful examination. Likewise, missing values require additional check. Therefore, predefined rules for dealing with errors and true missing and extreme values are part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data. In small studies, with the examiner intimately involved at all stages, there may be small or no difference between a database and an analysis dataset. During as well as after treatment, the diagnostic and treatment phases of cleaning need insight into the sources and types of errors at all stages of the study. Data flow concept is therefore crucial in this respect. After measurement the research data go through repeated steps of- entering into information carriers, extracted, and transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself. Most of these problems are due to human error. Inaccuracy of a single data point and measurement may be tolerable, and associated to the inherent technological error of the measurement device. Therefore the process of data clenaning mus focus on those errors that are beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on understanding of technical errors and expected ranges of normal values. Some errors are worthy of higher priority, but which ones are most significant is highly study-specific. For instance in most medical epidemiological studies, errors that need to be cleaned, at all costs, include missing gender, gender misspecification, birth date or examination date errors, duplications or merging of records, and biologically impossible results. 
2.2.1.7 Data Cleaning as a Process

Data cleaning is the process of detecting, diagnosing, and editing faulty data. It is a three-stage method involving a repeated cycle of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected incidentally during study activities; however, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always immediately clear whether a data point is erroneous, and careful examination is often required. Likewise, missing values require additional checks. Predefined rules for dealing with errors, true missing values, and extreme values are therefore part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis datasets. In small studies, with the examiner intimately involved at all stages, there may be little or no difference between a database and an analysis dataset.

During as well as after treatment of the data, the diagnostic and editing phases of cleaning require insight into the sources and types of errors at all stages of the study. The concept of data flow is therefore crucial. After measurement, the research data go through repeated steps: they are entered onto information carriers, extracted, transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself. Most of these problems are due to human error. Inaccuracy of a single data point or measurement may be tolerable and attributable to the inherent technological error of the measurement device. The process of data cleaning must therefore focus on errors that go beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on an understanding of technical errors and of the expected ranges of normal values.

Some errors deserve higher priority, but which ones are most significant is highly study-specific. For instance, in most medical epidemiological studies, errors that need to be cleaned at all costs include missing gender, gender misspecification, birth date or examination date errors, duplication or merging of records, and biologically impossible results. Another example comes from nutrition studies, where date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressure or if resources for data cleaning are limited.

2.2.2 Data Integration

Data integration is the process of taking data from one or more sources and mapping it, field by field, onto a new data structure. The idea is to combine data from multiple sources into a coherent form. Many data mining projects require data from multiple sources because:
- Data may be distributed over different databases or data warehouses (for example, an epidemiological study that needs information about both hospital admissions and car accidents).
- Data may be required from different geographic locations, or historical data may be needed (e.g., integrating historical data into a new data warehouse).
- The data may need to be enhanced with additional (external) data, for example to improve data mining precision.

2.2.2.1 Data Integration Issues

There are a number of issues in data integration. Imagine two database tables, Table 1 and Table 2, that are to be combined (the tables themselves are not reproduced here). In integrating these two tables, a variety of issues arise, such as:
1. The same attribute may have different names (for example, Name and Given Name are the same attribute under different names).
2. An attribute may be derived from another (for example, the attribute Age is derived from the attribute DOB).
3. Attributes may be redundant (for example, the attribute PID is redundant).
4. Values of an attribute may differ between sources (for example, for PID 4791 the values of the second and third fields differ between the two tables).
5. The same record may be duplicated under different keys.

Schema integration and object matching can therefore be tricky. The question is how equivalent entities from different sources are matched; this is known as the entity identification problem. Conflicts have to be detected and resolved. Integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (metadata for each attribute includes, for example, the name, meaning, data type, and range of values permitted for the attribute). A small illustration of these mapping steps is sketched below.
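To make issues 1, 2, and 4 above concrete, here is a minimal Python sketch of a field-by-field integration keyed on PID. The records, the reference date, and the conflict-handling choices are invented for illustration; only the attribute names (PID, Name, Given Name, DOB, Age) come from the discussion above.

```python
from datetime import date

REFERENCE_DATE = date(2020, 1, 1)   # assumed snapshot date, purely for the example

# Hypothetical records from two sources, using the attribute names discussed above.
table1 = [
    {"PID": 4791, "Name": "A. Kumar", "DOB": date(1985, 4, 12)},
    {"PID": 5102, "Name": "B. Rao",   "DOB": date(1990, 1, 3)},
]
table2 = [
    {"PID": 4791, "Given Name": "A. Kumar", "Age": 30},   # Age conflicts with the DOB above
    {"PID": 6230, "Given Name": "C. Mehta", "Age": 41},
]

def age_from_dob(dob, today=REFERENCE_DATE):
    """Derive Age from DOB so the two schemas become directly comparable."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

def integrate(t1, t2):
    """Map both sources, field by field, onto a common schema keyed on PID."""
    merged = {}
    for rec in t1:
        merged[rec["PID"]] = {"Name": rec["Name"], "Age": age_from_dob(rec["DOB"])}
    for rec in t2:
        row = merged.setdefault(rec["PID"], {})
        row.setdefault("Name", rec["Given Name"])   # "Given Name" maps onto "Name"
        if "Age" in row and row["Age"] != rec["Age"]:
            print(f"Value conflict for PID {rec['PID']}: Age {row['Age']} vs {rec['Age']}")
        row.setdefault("Age", rec["Age"])
    return merged

print(integrate(table1, table2))
```

The sketch resolves the naming mismatch, derives Age from DOB, and reports (rather than silently resolves) a value conflict for PID 4791; in practice the resolution policy would be decided with the help of metadata and domain experts.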
2.2.2.2 Redundancy

Redundancy is another important issue in data integration. Two attributes (such as DOB and Age in the example above) may be redundant if one can be derived from the other attribute or from a set of attributes. Inconsistencies in attribute or dimension naming can also lead to redundancies in the data sets.

Handling redundant data. Data redundancy problems can be handled in the following ways:
- Use correlation analysis.
- Consider different codings and representations (e.g. metric versus imperial measures).
- Careful (manual) integration of the data can reduce or prevent redundancies (and inconsistencies).
- De-duplication (also called internal data linkage), used when no unique entity keys are available: the values of the attributes are analysed to find duplicates.
- Process redundant and inconsistent data (easy if the values are the same): delete one of the values, average the values (only for numerical attributes), or take the majority value (if there are more than two duplicates and some values agree).

Correlation analysis (also called Pearson's product-moment coefficient) is explained in detail here. Some redundancies can be detected by correlation analysis: given two attributes, such analysis measures how strongly one attribute implies the other. For numerical attributes we can compute the correlation coefficient of two attributes A and B to evaluate the correlation between them:

r_{A,B} = \frac{\sum_{i=1}^{n}(a_i - \bar{A})(b_i - \bar{B})}{n \, \sigma_A \, \sigma_B} = \frac{\sum(AB) - n\bar{A}\bar{B}}{n \, \sigma_A \, \sigma_B}

where
- n is the number of tuples,
- \bar{A} and \bar{B} are the respective means of A and B,
- \sigma_A and \sigma_B are the respective standard deviations of A and B, and
- \sum(AB) is the sum of the AB cross-products.

The value of r_{A,B} lies between -1 and +1:
a. If r_{A,B} is greater than zero, A and B are positively correlated; the higher the value, the stronger the correlation.
b. If r_{A,B} is equal to zero, A and B are independent of each other and there is no correlation between them.
c. If r_{A,B} is less than zero, A and B are negatively correlated: as the value of one attribute increases, the value of the other decreases. This means that one attribute discourages the other.

It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not necessarily mean that A causes B or that B causes A. For example, in analysing a demographic database, we may find that the attributes representing the number of accidents and the number of car thefts in a region are correlated. This does not mean that one causes the other; both may be related to a third attribute, namely population.
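This redundancy check is easy to automate. The sketch below is a minimal plain-Python implementation of the formula above (using population standard deviations, i.e. dividing by n); the two toy attribute columns are invented solely to illustrate spotting a redundant attribute.

```python
import math

def pearson_r(a, b):
    """Pearson product-moment correlation between two numeric attributes."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Population standard deviations, matching the n (not n-1) in the formula above.
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((x - mean_b) ** 2 for x in b) / n)
    cross = sum(x * y for x, y in zip(a, b))          # sum of the AB cross-products
    return (cross - n * mean_a * mean_b) / (n * sd_a * sd_b)

# Toy check: an age column derived from DOB versus a separately recorded age column
# should give r close to +1, suggesting one of the two attributes can be dropped.
age_from_dob = [23, 31, 45, 52, 38]
age_recorded = [23, 30, 45, 53, 38]
print(round(pearson_r(age_from_dob, age_recorded), 3))
```

A value near +1 or -1 signals a likely redundant attribute, while the causality caveat above still applies to any correlation found this way.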
For discrete (categorical) data, a correlation relationship between two attributes A and B can be discovered by a χ² (chi-square) test. Suppose A has c distinct values a_1, a_2, ..., a_c and B has r distinct values b_1, b_2, ..., b_r. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows. Each (A_i, B_j) cell of the table contributes to

\chi^2 = \sum_{i=1}^{c} \sum_{j=1}^{r} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}

where
- o_{ij} is the observed frequency (i.e. the actual count) of the joint event (A_i, B_j), and
- e_{ij} is the expected frequency, which can be computed as

e_{ij} = \frac{count(A = a_i) \times count(B = b_j)}{N}

where N is the number of data tuples, count(A = a_i) is the number of tuples having value a_i for A, and count(B = b_j) is the number of tuples having value b_j for B. The larger the χ² value, the more likely the variables are related. The cells that contribute most to the χ² value are those whose actual count is very different from the expected count.

Chi-square calculation: an example. Suppose a group of 1,500 people was surveyed. The gender of each person was noted, and each person was polled on whether their preferred type of reading material was fiction or non-fiction. The observed frequency of each possible joint event is summarized in the following table (the numbers in parentheses are the expected frequencies). Calculate chi-square.

                 male        female       Sum (row)
  fiction        250 (90)     200 (360)      450
  non-fiction     50 (210)   1000 (840)     1050
  Sum (col.)     300         1200           1500

For example, e_11 = count(male) * count(fiction) / N = 300 * 450 / 1500 = 90, and so on. Then

\chi^2 = \frac{(250-90)^2}{90} + \frac{(50-210)^2}{210} + \frac{(200-360)^2}{360} + \frac{(1000-840)^2}{840} = 284.44 + 121.90 + 71.11 + 30.48 \approx 507.93

Since the table is 2x2, the degrees of freedom are (2-1)(2-1) = 1. For 1 degree of freedom, the χ² value needed to reject the hypothesis of independence at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution available in any statistics textbook). Since the computed value is well above this threshold, we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group.

Duplication must also be detected at the tuple level. The use of denormalized tables is another source of redundancy, and redundancies may further lead to data inconsistencies (due to updating some occurrences of a value but not others).

2.2.2.3 Detection and Resolution of Data Value Conflicts

Another significant issue in data integration is the detection and resolution of data value conflicts: for the same entity, attribute values from different sources may differ. For example, weight may be stored in metric units in one source and in British imperial units in another. For instance, for a hotel cha