State-of-the-art schedulers for containerized cloud services consider load balance as the only criterion and neglect many others, such as application performance. In the era of Big Data, however, applications have become highly data-intensive and thus perform poorly in existing systems. This particularly holds for Platform-as-a-Service environments that encourage an application model of stateless...
Over the past years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications. However, jobs in these frameworks are loosely defined and packaged as executable jars without any functionality being exposed or described. This means that deployed jobs are not natively composable and reusable for subsequent development. Besides,...
Nowadays, most metro advertising systems schedule advertising slots on digital advertising screens to achieve maximum exposure to passengers by exploiting passenger demand models. However, our empirical results show that these passenger demand models exhibit uncertainty at fine temporal granularity (e.g., per minute). As a result, for fine-grained advertisements (shorter than one minute), a scheduling...
Volatility analysis plays a major role in finance and economics. It is the key input for many financial topics, including risk management and option and derivative pricing. One pressing computational hurdle in high-frequency financial statistics is the tremendous amount of data and the optimization procedures that require computing power beyond currently available desktop systems. In this article,...
Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and aviation. Support vector regression (SVR) is a popular technique for modeling the input-output relations of a set of variables under the added constraint of maximizing the margin, thereby leading to a very generalizable and regularized model. However, for a dataset...
We propose a community detection method based on K-shell. Our method selects core nodes of the graph according to their K-shell values. These core nodes constitute a subgraph on which we run a community detection algorithm to divide the core nodes into communities. Compared with classical methods, our proposed method first removes the non-core nodes that may impact the...
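The K-shell (k-core) decomposition that this abstract builds on can be sketched in plain Python. The iterative-pruning routine below is a generic textbook implementation, not the paper's code: a node's shell index is the largest k such that it survives the removal of all nodes of degree below k.

```python
def k_shell_decomposition(adjacency):
    """Assign each node its k-shell index by iteratively pruning low-degree nodes."""
    adj = {u: set(vs) for u, vs in adjacency.items()}
    shell = {}
    k = 0
    while adj:
        # repeatedly remove nodes of degree <= k; removals can cascade
        pruned = True
        while pruned:
            pruned = False
            for u in [n for n, vs in adj.items() if len(vs) <= k]:
                for v in adj.pop(u):
                    if v in adj:
                        adj[v].discard(u)
                shell[u] = k
                pruned = True
        k += 1
    return shell
```

Core nodes in the sense used above would then be those with the maximal shell index, e.g. `[n for n, s in shell.items() if s == max(shell.values())]`.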
Multi-tenant storage management environments typically manage multiple enterprise accounts with heterogeneous storage footprints. Identifying and grouping accounts with similar storage footprints into clusters reduces account management overhead, and provides a framework for data-driven storage recommendation services. This paper describes a method for the clustering of accounts in multi-tenant storage...
In this paper, we propose an architectural design and software framework for the fast development of descriptive, diagnostic, predictive, and prescriptive analytics solutions for dynamic production processes. The proposed architecture and framework will support the storage of a modular, extensible, and reusable Knowledge Base (KB) of process performance models. The approach requires the development of automatic...
Recent advancements in sensor technology offer opportunities to manage business processes in a proactive manner. To enable effective, real-time monitoring, sensor data have to be treated and processed as events. Complex Event Processing is an efficient technology that detects useful complex events by matching primitive sensor events against event patterns. Event patterns can...
Geoscience gives insights into our surroundings and benefits many aspects of our life. Nowadays, with massive numbers of sensors deployed to sense all kinds of environmental parameters, tens of billions, or even trillions, of sensor readings are collected and need to be analyzed for surveillance or other purposes. Users typically issue queries according to specific spatial and temporal predicates...
Automated inspection plays a critical role in many industrial processes, including modern assembly lines. In these processes, components are inspected to ensure adherence to design specifications. Components that are determined to be out-of-specifications are rejected. The benefits of inspection are two-fold. First, defects can be removed early in the process, preventing higher costs incurred in detecting...
As urban population grows, cities face many challenges related to transportation, resource consumption, and the environment. Ride sharing has been proposed as an effective approach to reduce traffic congestion, gasoline consumption, and pollution. Despite great promise, researchers and policy makers lack adequate tools to assess tradeoffs and benefits of various ride-sharing strategies. Existing approaches...
Graphical Models (GMs) have provided a popular framework for big data analytics because they often lend themselves to distributed and parallel processing by utilizing graph-based ‘local’ structures. They model correlated random variables, and in particular, max-product Belief Propagation (BP) is the most popular heuristic for computing the most likely assignment in a GM. In the past years, it has been proven...
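On a chain-structured GM, max-product BP is exact and reduces to the Viterbi recursion. The sketch below is a generic illustration of that special case (the unary/pairwise potentials in the test are made up, not from the paper): messages carry the best score reachable for each state, and backpointers recover the most likely joint assignment.

```python
def max_product_chain(unary, pairwise):
    """Max-product BP on a chain MRF (= Viterbi): return the most likely assignment.

    unary[t][j]   : potential of state j at position t
    pairwise[i][j]: potential of transitioning from state i to state j
    """
    n, k = len(unary), len(unary[0])
    msg = list(unary[0])          # best score ending in each state so far
    back = []                     # backpointers for the argmax
    for t in range(1, n):
        new, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: msg[i] * pairwise[i][j])
            new.append(msg[best_i] * pairwise[best_i][j] * unary[t][j])
            ptr.append(best_i)
        msg = new
        back.append(ptr)
    # backtrack from the best final state
    j = max(range(k), key=lambda i: msg[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```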
The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data and take advantage of parallelism, we have implemented Apache VXQuery, an open-source scalable XQuery processor. The system builds upon two other open-source frameworks: Hyracks, a parallel execution engine, and Algebricks, a language agnostic...
Truth table optimization is of great importance for the simplification of combinational logic circuits. In this paper, Granular Computing (GrC) and statistical methods are combined to convert the traditional large truth table optimization problem into a minimal-rule-discovery problem over the logic information system. The proposed method improves on our former work. Possible solutions were searched in...
We consider the decentralized consensus optimization problem arising from in-situ seismic tomography in large-scale sensor networks. Unlike traditional seismic imaging performed in a centralized location, each node in this setting privately holds an objective function and partial data. The goal of each node is to obtain the optimal solution of the whole seismic image, while communicating only with...
A Deep Neural Network (DNN) that uses the same activation function for all hidden neurons is limited in optimization by its single mathematical functionality. To address this, a new DNN with different activation functions is designed to globally optimize both the parameters (weights and biases) and the function selections. In addition, a novel Genetic Deep Neural Network (GDNN) with different activation functions...
In this paper we present a novel clustering approach based on the stochastic learning paradigm and regularization with l1-norms. Our approach is an extension of the widely acknowledged K-Means algorithm. We introduce a simple regularized dual averaging scheme for learning prototype vectors (centroids) with l1-norms in a stochastic mode. In our approach we distribute the learning of individual prototype...
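As an illustration of the idea, not the paper's exact regularized dual averaging scheme, one can alternate a stochastic K-Means step with the l1 proximal operator (soft-thresholding), which shrinks centroid coordinates toward zero and thereby promotes sparse prototypes. All names and hyperparameters below are assumptions for the toy example:

```python
import random

def soft_threshold(vec, t):
    """Proximal operator of the l1-norm: shrink every coordinate toward zero by t."""
    return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in vec]

def stochastic_l1_kmeans(data, k, lam=0.01, lr=0.1, epochs=20, seed=0):
    """Toy stochastic K-Means with an l1 proximal step after each centroid update."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(data, k)]
    for _ in range(epochs):
        for x in data:
            # assign the sample to its nearest centroid
            j = min(range(k),
                    key=lambda i: sum((c - a) ** 2 for c, a in zip(centroids[i], x)))
            # stochastic gradient step moving the centroid toward the sample
            centroids[j] = [c + lr * (a - c) for c, a in zip(centroids[j], x)]
            # l1 proximal step promotes sparse centroid coordinates
            centroids[j] = soft_threshold(centroids[j], lr * lam)
    return centroids
```

Since the proximal step touches only the centroid that was just updated, each prototype vector can be learned independently, which is what makes this family of updates easy to distribute.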
Most machine learning algorithms involve solving a convex optimization problem. Traditional in-memory convex optimization solvers do not scale well as data grows. This paper identifies a generic convex problem underlying most machine learning algorithms and solves it using the Alternating Direction Method of Multipliers (ADMM). Finally, such an ADMM problem transforms into an iterative system of...
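A minimal scalar example of the ADMM splitting idea (an illustrative toy, not the paper's formulation): solve min_x ½(x − a)² + λ|x| by introducing a copy z of x, alternating a quadratic x-update, an l1 z-update, and a dual update on the residual x − z. For a = 3 and λ = 1 the solution is the soft-thresholded value 2.

```python
def soft_threshold(v, t):
    """Scalar l1 proximal operator."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)

def admm_scalar_lasso(a, lam, rho=1.0, iters=100):
    """Solve min_x 0.5*(x - a)**2 + lam*|x| via the ADMM splitting x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # x-update: quadratic prox
        z = soft_threshold(x + u, lam / rho)   # z-update: l1 prox
        u = u + x - z                          # dual ascent on the residual x - z
    return z
```

Each update is a cheap closed-form step, which is why the overall scheme reduces to an iterative system well suited to large data.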
The detection of outliers in time series data is a core component of many data-mining applications and is broadly applied in industry. For large data sets, algorithms that are efficient in both time and space are required. One way to reduce both runtime and storage costs is symbolization as a pre-processing step, which additionally opens up the use of an array of discrete algorithms...
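A common symbolization scheme of this kind is SAX-style discretization: z-normalize the series, average it over fixed-length segments, and map each segment mean to a letter via breakpoints. The sketch below is a generic illustration (the two breakpoints chosen here give a three-letter alphabet), not any particular paper's implementation:

```python
import statistics

def symbolize(series, n_segments, breakpoints=(-0.43, 0.43)):
    """SAX-style symbolization: z-normalize, piecewise-aggregate, discretize."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series) or 1.0
    z = [(v - mu) / sd for v in series]
    seg = len(z) // n_segments
    symbols = []
    for i in range(n_segments):
        avg = sum(z[i * seg:(i + 1) * seg]) / seg
        # count how many breakpoints the segment mean exceeds -> symbol index
        idx = sum(avg > b for b in breakpoints)
        symbols.append(chr(ord('a') + idx))
    return ''.join(symbols)
```

After symbolization, discrete-sequence techniques (e.g., pattern counting over the symbol string) become applicable to the original real-valued series.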