GitHub, one of the most popular social coding platforms, is the platform of reference when mining Open Source repositories to learn from past experiences. In recent years, a number of research papers have been published reporting findings based on data mined from GitHub. As the community deepens its understanding of software engineering through analyses performed on this platform,...
Exception handling is a technique that addresses exceptional conditions in applications, allowing the normal flow of execution to continue in the event of an exception and/or to report on such events. Although exception handling techniques, features and bad coding practices have been discussed both in developer communities and in the literature, there is a marked lack of empirical evidence on how...
LSB substitution steganography takes only the least significant bits of the carrier into account, which leads to low security and poor robustness. This paper proposes a self-contained steganography scheme combining MSB matching and LSB substitution. It contains two types of encoding rules to define the matching result between the secret-information binary stream and the two most significant bit...
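The baseline technique this abstract critiques can be made concrete with a minimal sketch of plain LSB substitution (not the paper's MSB-matching scheme); the sample values and bit stream below are illustrative assumptions:

```python
def embed_lsb(samples, bits):
    """Embed a bit list into the least significant bit of each sample."""
    out = list(samples)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b  # clear bit 0, then set it to the secret bit
    return out

def extract_lsb(samples, n_bits):
    """Recover the first n_bits embedded bits from the stego samples."""
    return [s & 1 for s in samples[:n_bits]]

carrier = [200, 13, 77, 64, 129, 54]
secret = [1, 0, 1, 1, 0, 0]
stego = embed_lsb(carrier, secret)
assert extract_lsb(stego, len(secret)) == secret
```

Because each sample changes by at most 1, the carrier is barely distorted; but because the embedding position is fixed and sequential, the hidden stream is trivial to locate, which is the weakness the abstract highlights.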
In order to simulate this feature and detect the salient region rapidly, we propose the Spatial-Temporal Feature in Compressed Domain (STFCD) model. Using the H.264 residual coding length and motion vector coding length respectively, we simulate the salient stimulus intensity and obtain video saliency features. Finally, we use a linear weighted fusion algorithm to produce the final video saliency maps...
LSB techniques generally embed data in the same LSB position of consecutive samples, which helps intruders extract secret information easily. This paper solves this problem by introducing a robust audio steganography technique where data is embedded in multiple LSB layers chosen randomly and in non-consecutive samples. The choice of random LSB layers and non-consecutive samples for embedding increases...
A+, a.k.a. Adjusted Anchored Neighborhood Regression, is a state-of-the-art method for exemplar-based single image super-resolution with low time complexity at both train and test time. By robustly training a clustered regression model over a low-resolution dictionary, its performance keeps improving with the dictionary size, even when using tens of thousands of regressors. However, this can pose a...
Convolutive non-negative matrix factorization (CNMF) is a promising method for extracting features from sequential multivariate data. Conventional algorithms for CNMF require that the structure, or the number of bases for expressing the data, be specified in advance. We are concerned with the issue of how we can select the best structure of CNMF from given data. We first introduce a framework of probabilistic...
HEVC (H.265) has brought significant improvements in coding efficiency. However, the reduction in bitrate comes with an increase in computational complexity. This paper presents a data mining approach to reducing the complexity of inter partition modes in HEVC. Determining CU splitting in inter partition modes requires substantial resources, so the goal of this work is to terminate...
Dynamic Time Warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation, DTW is extremely inefficient at comparing long sparse time series containing mostly zeros and some unevenly spaced non-zero observations. The original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively...
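For reference, the original quadratic formulation the abstract refers to can be sketched as a standard dynamic program (this shows the baseline, not the paper's sparsity-aware variant):

```python
def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = minimal warping cost aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

assert dtw([1, 2, 3], [1, 2, 3]) == 0.0
# A repeated zero can be absorbed by warping, so the distance stays 0:
assert dtw([0, 0, 1, 0], [0, 1, 0]) == 0.0
```

Every cell of the full `n × m` table is filled even when both series are almost entirely zeros, which is exactly the redundancy the abstract points out.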
Key foundational components of Big Data frameworks include efficient large-scale storage and high-performance linear algebra. This paper discusses efficient implementations that utilize compression techniques inspired by columnar relational databases for improving space and time profiles for vector and matrix operations. In addition, linear algebra operations are integrated with columnar relational...
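One way columnar-database compression can speed up linear algebra, as the abstract describes, is to run-length encode sparse vectors and operate on the runs directly. A minimal sketch under that assumption (the paper's actual encodings and operator set are not shown here):

```python
from itertools import groupby

def rle_encode(v):
    """Run-length encode a vector as (value, run_length) pairs."""
    return [(val, len(list(g))) for val, g in groupby(v)]

def rle_dot(enc_a, b):
    """Dot product of an RLE-compressed vector with a dense one.
    Zero runs are skipped entirely, so work scales with the number
    of runs rather than the vector length."""
    total, pos = 0, 0
    for val, run in enc_a:
        if val != 0:
            total += val * sum(b[pos:pos + run])
        pos += run
    return total

a = [0, 0, 0, 5, 5, 0, 0, 2]
b = [1, 2, 3, 4, 5, 6, 7, 8]
assert rle_dot(rle_encode(a), b) == sum(x * y for x, y in zip(a, b))
```

The same idea generalizes to matrix-vector products column by column, which is where integrating the operations with columnar storage pays off.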
In recent years, with the popularity of running and other sports applications, trajectory-based friend recommendation has gradually become a hot research topic. In this paper, the θ-ADBSCAN algorithm is used to mine hot trails and resident points from users' trajectories; a trajectory segmentation algorithm is then described, and the trajectory is replaced by the MTR, which is composed...
In the airline industry, a Passenger Name Record (PNR) stores the travel itinerary of an individual or group of passengers travelling together. A PNR always contains all the flight information regarding each segment of a journey, and may contain additional important information such as nationality, gender and age of the passengers. From a commercial point of view, these passenger attributes are of...
Record linkage (RL) is a task in data integration that aims to identify matching records that refer to the same entity from different databases. When records from more than two databases are to be linked, RL is significantly challenged by the intrinsic exponential growth in the number of potential record comparisons to be conducted. We propose a scalable meta blocking protocol to be used for Multi-Database...
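The comparison-explosion problem that blocking addresses can be illustrated with a plain single-key blocking sketch (standard blocking, not the meta-blocking protocol the abstract proposes; records and the blocking key are invented for illustration):

```python
from collections import defaultdict
from itertools import combinations

def block(records, key):
    """Group records by a blocking key so only records that share
    a block are ever compared."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[key(rec)].append(rid)
    return blocks

def candidate_pairs(blocks):
    """Candidate comparisons: all pairs within each block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

records = {
    1: {"name": "John Smith", "zip": "90210"},
    2: {"name": "Jon Smith",  "zip": "90210"},
    3: {"name": "Ada Lovelace", "zip": "10115"},
}
pairs = candidate_pairs(block(records, key=lambda r: r["zip"]))
assert pairs == {(1, 2)}  # 1 comparison instead of the 3 exhaustive ones
```

With D databases the exhaustive comparison count grows roughly with the product of their sizes, which is why the abstract calls the growth exponential in the multi-database setting and why blocking must itself stay scalable.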
Generally, different websites have different web page structures, which heavily affects extraction quality when web content is collected automatically. Based on a statistical analysis of the content features and structural characteristics of news-domain web pages, this paper proposes a maximum continuous sum of text density (MCSTD) method to extract web content efficiently and effectively...
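One plausible reading of "maximum continuous sum of text density" is a maximum-subarray (Kadane-style) scan over per-line text densities; the sketch below follows that reading with an invented threshold and density values, and the paper's exact density definition may differ:

```python
def max_density_segment(densities, threshold):
    """Kadane's algorithm over (density - threshold): the contiguous
    run of lines with the maximum sum is taken as the main content."""
    best_sum, best = float("-inf"), (0, 0)
    cur_sum, start = 0.0, 0
    for i, d in enumerate(densities):
        cur_sum += d - threshold
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (start, i)
        if cur_sum < 0:          # a negative prefix can never help;
            cur_sum, start = 0.0, i + 1  # restart after it
    return best  # inclusive (start, end) line indices

# Density = text characters per line; navigation and boilerplate
# lines are short, the article body is long.
densities = [2, 1, 0, 40, 55, 38, 60, 1, 0, 2]
assert max_density_segment(densities, threshold=10) == (3, 6)
```

The appeal of this formulation is that it needs no site-specific templates: the dense contiguous block is found in one linear pass regardless of the page's tag structure.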
The paper presents several variations of fuzzy extractors for generating cryptographic keys and passwords based on keystroke dynamics parameters. A series of simulation experiments was run to estimate the efficiency of these methods, and the best fuzzy extractor parameters were found. The best result was FRR = 0.061 and FAR = 0.023 with a key length of 192 bits.
We introduce a preferences-based itemset mining framework. Preferences are encoded by a penalty function over the transactions in a database. We define an itemset mining problem where we associate a penalty value with each transaction. This problem consists of generating the frequent itemsets under a maximum penalty threshold. We then provide a propositional-satisfiability-based encoding. We extend the...
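Under one reading of the problem statement — an itemset qualifies when its support meets a minimum and the summed penalty of its supporting transactions stays under a ceiling — a naive enumeration can be sketched as follows (the paper's exact semantics and its SAT encoding are not shown; the thresholds and data are illustrative):

```python
from itertools import combinations

def frequent_itemsets(db, penalties, min_support, max_penalty):
    """Naive enumeration: an itemset qualifies if enough transactions
    contain it AND the summed penalty of those transactions stays
    under the threshold."""
    items = sorted({i for t in db for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            covered = [k for k, t in enumerate(db) if set(cand) <= t]
            if (len(covered) >= min_support
                    and sum(penalties[k] for k in covered) <= max_penalty):
                result[cand] = len(covered)
    return result

db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
penalties = [5, 1, 1, 1]  # transaction 0 is strongly dis-preferred
out = frequent_itemsets(db, penalties, min_support=2, max_penalty=4)
assert ("a", "c") in out       # supported by two low-penalty transactions
assert ("a", "b") not in out   # frequent, but its support incurs penalty 6
```

Note how the penalty constraint can reject an itemset that is frequent in the classical sense, which is precisely what makes the problem a preference-aware variant of frequent itemset mining.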
Discovering useful patterns plays an essential role in data management and data mining. Frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied on (standard) precise transaction databases. Uncertain transaction databases consist of sets of existentially uncertain items. The uncertainty of items in transactions makes traditional...
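The semantic difference the abstract mentions is usually handled through expected support: items carry existence probabilities, and an itemset's support becomes the sum over transactions of the probability that all its items exist. A minimal sketch assuming item independence (the uncertain database below is invented):

```python
from math import prod

def expected_support(itemset, uncertain_db):
    """Expected support under item independence: for each transaction
    (a dict mapping item -> existence probability), add the probability
    that every item in the itemset is present."""
    total = 0.0
    for trans in uncertain_db:
        if all(i in trans for i in itemset):
            total += prod(trans[i] for i in itemset)
    return total

db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4},
    {"a": 1.0, "b": 0.8},
]
assert abs(expected_support({"a"}, db) - 2.3) < 1e-9
assert abs(expected_support({"a", "b"}, db) - 1.25) < 1e-9
```

Because support is now fractional and probabilistic, the anti-monotonicity tricks of classical Apriori-style miners must be re-justified, which is what makes the uncertain setting computationally distinct.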
In order to reduce the pressure of data storage and transmission on satellites, researchers have implemented a method for extracting object-region data from remote sensing images in orbit. This method stores and downloads the pixels of a region of interest via region-of-interest labeling. However, encoding data volume (EDV), hardware scale, and real-time property (RTP) are difficult to balance. To solve this...
Grounded theory is an approach that can be used to analyse qualitative data. It is a systematic approach to data collection, handling and analysis. The objective of this paper is to present an adapted grounded theory approach as a data analysis strategy for identifying value-based factors in software development. The grounded theory procedure started with data extraction and initial coding, memo writing and...
We are concerned with the issue of discovering behavioral patterns on the web. When a large amount of web access logs are given, we are interested in how they are categorized and how they are related to activities in real life. In order to conduct that analysis, we develop a novel algorithm for sparse non-negative matrix factorization (SNMF), which can discover patterns of web behaviors. Although...