According to a report online [34], more than 200 million unique users search for jobs online every month. This incredibly large and fast-growing demand has enticed software giants such as Google and Facebook to enter this space, which was previously dominated by companies such as LinkedIn, Indeed, Dice and CareerBuilder. Recently, Google released their “AI-powered Jobs Search Engine”, “Google For Jobs”...
Highly imbalanced datasets continue to be a challenge in many data mining applications. It is surprising that state-of-the-art techniques countering class imbalances are usually very computationally expensive and therefore unscalable. Most research effort has been directed into enhancing those techniques, e.g., by focusing on borderline examples or combining multiple techniques. This is usually accompanied...
A scalable method for mining graph patterns stable under subsampling is proposed. The existing subsample stability and robustness measures are not antimonotonic according to definitions known so far. We study a broader notion of antimonotonicity for graph patterns, so that measures of subsample stability become antimonotonic. Then we propose gSOFIA for mining the most subsample-stable graph patterns...
Due to the rapid increase in the number of users owning location-based devices, there is a considerable amount of geo-tagged data available on social media websites, such as Twitter and Facebook. This geo-tagged data can be useful in a variety of ways to extract location-specific information, as well as to comprehend the variation of information across different geographical regions. A lot of techniques...
Due to the recent vast availability of transportation traffic data, major research efforts have been devoted to traffic prediction, which is useful in many applications such as urban planning, traffic management and navigation systems. Current prediction methods that independently train a model per traffic sensor cannot accurately predict traffic in every situation (e.g., rush hours, constructions...
The mining of software repositories has provided significant advances in a multitude of software engineering fields, including defect prediction. Several studies show that the performance of a software engineering technology (e.g., prediction model) differs across different project repositories. Thus, it is important that the project selection is replicable. The aim of this paper is to present STRESS,...
With the rapid development of information technology, the campus card has become an important part of building a digital campus. A user behavior analysis system built on campus card data can help the school understand students' learning, consumption and rest behavior. This article discusses the specific behavior of campus card consumption and the data mining and analysis...
Because crises arise from unexpected events, their data sources are complex and diverse. Applying phrase weight measurement and free marking by network users, both drawn from big data technology, transforms multimodal crisis information into a single information source, and an integrated model for the extraction of crisis information is established. The integration process includes...
Mining high utility patterns, a topic that has attracted many data mining researchers, is the process of discovering patterns whose utility satisfies a predetermined minimum threshold. Many studies have been performed, but finding a suitable minimum utility threshold is problematic, because users cannot predict the appropriate threshold, which affects mining performance. To solve this problem,...
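As a rough illustration of the threshold-based definition in the abstract above, the following minimal sketch enumerates itemsets by brute force and keeps those whose utility reaches a minimum threshold. It assumes the commonly used utility definition (purchased quantity times unit profit, summed over the transactions containing the whole itemset). The function names, the toy data and the threshold value are hypothetical, and this is not the method proposed in the paper.

```python
from itertools import combinations

def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:  # t maps item -> purchased quantity
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total

def high_utility_itemsets(transactions, profit, min_util):
    """Brute-force search: keep itemsets with utility >= min_util."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = itemset_utility(combo, transactions, profit)
            if u >= min_util:
                result[combo] = u
    return result

# Hypothetical toy data: unit profits and three transactions.
profit = {"a": 5, "b": 2, "c": 1}
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"b": 1, "c": 4}]
hui = high_utility_itemsets(transactions, profit, min_util=10)
```

The result depends entirely on `min_util`: a slightly different value can return far more or far fewer itemsets, which is the difficulty the abstract points to.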
Ethical guidelines of software engineering journals require authors to provide statements related to the conflict of interest and the process of obtaining consent (if human subjects are involved). The objective of this study is to review the reporting of the ethical considerations in Empirical Software Engineering - An International Journal. The results indicate that two out of seven studies reported...
Currently, open source projects receive various kinds of issues daily because of the extreme openness of the Issue Tracking System (ITS) in GitHub. Categorizing issues in the ITS is a labor-intensive and time-consuming task for project managers. However, a contributor is only required to provide a short textual abstract to report an issue in GitHub. Thus, most traditional classification approaches based on detailed...
The campus card system generates a lot of data during its operation, but the system itself cannot analyze these data. How to learn from these massive, outdated data to assist decision-making in student management has become a very practical question. This paper takes the transaction data of the campus card as the research object, and uses the comprehensive application of data warehouse, online analysis...
To avoid severe malfunctions, save cost and reduce risks that could seriously affect an operating subway system, applying proper maintenance modes to all installed equipment is critical. A literature review of maintenance strategies shows that applying data mining algorithms and computer-assisted systems would be a good way to improve maintenance efficiency. The article...
This project uses association rule mining to explore relationships among potential factors related to Skin Melanoma occurrence. The goal is to see if there are any environmental or demographic factors, such as age, education, poverty, UV exposure, or others that can be identified using their spatial relationship. By analyzing data from 2004 and 2014, this study can investigate decadal trends and differences...
Cataract is a cloudiness of the eye lens, and studies have reported many risk factors for its development. However, the cumulative effect of multiple factors along with clinical and systemic disease conditions has not been adequately tested due to a limitation in methodology. The collection of a large volume of Electronic Health Records (EHR) offers an opportunity to apply computational tools...
Thyroid nodules are common findings, and thyroid cancer is projected to be one of the leading causes of cancer in women. The EHR includes the data needed to connect clinical research with patient outcomes. The objective of this project was to develop and validate a usable informatics tool for clinicians and researchers to record, analyze, and be able to manipulate the clinical and research...
An ontology is a framework for describing domain-specific knowledge in a structured format. It comprises a set of terms as nodes and a set of relationships between terms as directed edges, forming a directed acyclic graph. Gene Ontology (GO) and Human Phenotype Ontology (HPO) are widely referenced biological and biomedical ontology databases. They also provide extensive annotations of human genes...
Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportion (LLP) setting,...
Contrast patterns are itemsets that frequently occur in one dataset while not in another. These patterns have been successfully applied to many data mining domains, such as prediction, classification and clustering. However, none of the previous studies has considered extracting contrast patterns from different types of datasets. In this paper, we introduce a new type of contrast pattern, Conditional...
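The basic definition of a contrast pattern given in the abstract above (frequent in one dataset, not in another) can be sketched as follows. This is a minimal brute-force illustration, not the new pattern type introduced in the paper. The support thresholds `min_sup` and `max_sup`, the length cap `max_len`, and the toy datasets are assumptions made for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def contrast_patterns(d1, d2, min_sup=0.6, max_sup=0.2, max_len=2):
    """Itemsets with support >= min_sup in d1 and <= max_sup in d2."""
    items = sorted(set().union(*d1)) if d1 else []
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(s, d1) >= min_sup and support(s, d2) <= max_sup:
                found.append(combo)
    return found

# Hypothetical toy datasets: each transaction is a set of items.
d1 = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
d2 = [{"c"}, {"b", "c"}, {"c", "d"}]
patterns = contrast_patterns(d1, d2)
```

Here item `a` occurs in every transaction of `d1` and in none of `d2`, so `a` and its frequent supersets are reported, while `c` is frequent in both datasets and is therefore not a contrast pattern.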
Forecasting models that utilize multiple predictors are gaining popularity in a variety of fields. In some cases they allow constructing more precise forecasting models, leveraging the predictive potential of many variables. Unfortunately, in practice we do not know which observed predictors have a direct impact on the target variable. Moreover, adding unrelated variables may diminish the quality...