Data preparation is often cited as the most time-consuming phase of a Knowledge Discovery and Data Mining (KDDM) process, because it depends heavily on the expertise of the analyst. Although process models exist for KDDM, their descriptions of the phases focus on outlining what must be done but often do not detail how it should be done. While...
This paper presents a generalized approach to the simple linear data transformation, Y=bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding an additional ‘Dummy Dimension’ to the n-dimensional data, which helps plot two-dimensional component-wise straight lines on pairs of dimensions. The end result is a...
India produces 1.5 million engineers every year. Identifying the significant factors that influence the salary and the jobs these engineers are offered can help us understand the inefficiencies or skill gaps in the labour market, which will be extremely useful for policy making and constructive interventions. Predictive modelling of salary was performed using different machine learning techniques...
With the massive increase in the amount of data being generated each day, we need automated tools to oversee the evolution of the web and to quantify global effects like the PageRank of webpages. Search engines crawl the web periodically to build web graphs, which store information about the structure of the web. This is an expensive and error-prone process. Central to this problem is the notion of graph...
Online Social Networks (OSNs) provide the fastest way to communicate and spread information, influencing users in the network. Blog sites allow users to reflect on and share opinions about various topics in the form of blogs or online journals, and let readers comment on their posts. In this work, a novel method to profile the Top Most Influential Blogger (TMIB) is proposed based on content...
The functioning of the brain is mainly controlled by the nervous system. It sends out stimuli in the form of electrical pulses to different parts of the brain. These electrical pulses are recorded by a procedure known as electroencephalography (EEG). EEG signal acquisition usually lasts for 2–8 hours, or can be extended over a week. This poses a bigger problem for storage requirements as well as transmission...
A statistical analysis was performed on data from 3,800 soil samples from Thrissur district. Data on soil pH, electrical conductivity, organic carbon, phosphorus, potassium, calcium, magnesium, sulfur, zinc, boron, iron, copper and manganese were analyzed. Correlation analysis, ANOVA and principal component analysis were performed on the data set. The analysis indicates that different...
Microfinance institutions aim to offer financial services to people in the low-income category, who typically lack access to traditional banking systems. To date, more than 15 billion U.S. dollars have been infused into microfinancing, assisting more than 160 million people in developing countries. With the tremendous growth of the World Wide Web, a number of microfinance institutions have recently...
Breast cancer is the development of a malignant tumor in the breast, most notably in women. No proven cure is yet known for breast cancer, except when it is detected at an initial stage. This paper presents an innovative approach to the diagnosis of breast cancer using two proposed variants of Genetic Algorithms, the Inter-Genetic Algorithm and the Intra-Genetic Algorithm, that evolve an ensemble of...
One of the objectives of network security is to control the use of shared resources among users. In this regard, knowing the actual identity of network users is quite valuable to the intermediate nodes. The dynamic allocation of IP addresses and Network Address Translation (NAT) make it difficult in practice, without significant modifications to end-user protocols and applications, to identify...
Erasure-coded storage schemes offer a promising future for cloud storage: they provide the same level of fault tolerance as replication at a lower storage footprint. In the big data era, cloud storage systems based on data replication are of dubious usability because of the 200% storage overhead that replication incurs. This has prompted storage...
An ontology is an agreed conceptualisation of the concepts in a context that can be shared or reused, and it thereby aids interoperability. Ontology evaluation aims either to assess an ontology or to grade the accessible ontologies in order to decide on the most appropriate of the available alternatives for reuse. The literature contains extensive material related...
Many Natural Language Processing applications are underpinned by word comparison algorithms. Often, such comparison algorithms have been designed to determine how related given words are or how similar they are. In some applications, however, it may be important to distinguish between simply not being similar and oppositeness. In this paper we present work in progress on an algorithm that accounts...
Cloud computing is a leading-edge technology that provides resources as a service to users. Cloud services follow a pay-per-use strategy: most cloud service providers charge their users according to their needs. The ultimate aim of service providers is to satisfy their customers with excellent service. Given this, scalability comes into the picture. So close...
Distributed and parallel computing are the best alternatives for scalable clustering of huge amounts of data with moderate to high dimensions, together with improved speed-up. In this paper we address the problem of k-medoid clustering using the MapReduce framework for distributed computing on commodity machines to evaluate its efficacy. There are mainly two issues to be tackled. The first one is how to distribute...