In the last decade, the availability of massive amounts of new data, the development of new machine learning technologies, and the availability of scalable computing infrastructure have given rise to a new class of computing systems. These “Cognitive Systems” learn from data, reason from models, and interact naturally with us to perform complex tasks better than either humans or machines can do...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
This talk will introduce NSF's vision for moving beyond initial, isolated approaches for data science research, services, and infrastructure, towards a cohesive, federated, national-scale approach to harness the data revolution and transform US science, engineering, and education over the next decade and beyond.
Technological advances and novel applications, such as sensors, cyber-physical systems, smart mobile devices, cloud systems, data analytics, and social networks, are making it possible to capture, and to quickly process and analyze, huge amounts of data from which to extract information critical for security-related tasks. In the area of cyber security, such tasks include user authentication, access control,...
Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to turn such massive unstructured data into structured data, and then into structured networks and actionable knowledge. We propose a data-intensive text mining approach that requires only distant supervision or minimal supervision but relies on massive data. We...
Manufacturing is a critical component of the U.S. economy, responsible for 12.5% of GDP, direct employment for over 12 million people, and close to 75% of U.S. exports of goods. The U.S. manufacturing sector, while it produces 17% of the world's manufacturing output, also represents a quarter of the country's energy consumption. On the R&D side, it is responsible for 70% of all private sector...
The traditional wisdom for designing database schemas is to use a design tool (typically based on a UML or E-R model) to construct an initial data model for one's data. When one is satisfied with the result, the tool will automatically construct a collection of 3rd normal form relations for the model. Then applications are coded against this relational schema. When business circumstances change (as...
Modern microprocessors offer a rich memory hierarchy including various levels of cache and registers. Some of these memories (like main memory and L3 cache) are big but slow and shared among all cores. Others (registers, L1 cache) are fast and exclusively assigned to a single core but small. Only if the data accesses have high locality can we avoid excessive data transfers between the memory hierarchy...
Parallelizing data mining algorithms has become a necessity as we try to mine ever-increasing volumes of data. Spatial data mining algorithms like DBSCAN, OPTICS, SLINK, etc. have been parallelized to exploit a cluster infrastructure. The efficiency achieved by existing algorithms can be attributed to spatial locality preservation using spatial indexing structures like k-d-tree, quad-tree, grid files,...
Component-centric distributed graph processing models that use bulk synchronous parallel (BSP) execution have grown popular. These overcome shortcomings of Big Data platforms like Hadoop for processing large graphs. However, literature on formal analysis of these component-centric abstractions for different graphs, graph partitioning, and graph algorithms is lacking. Here, we propose a coarse-grained...
Bayesian networks are probabilistic graphical models often used in big data analytics. The problem of Bayesian network exact structure learning is to find a network structure that is optimal under certain scoring criteria. The problem is known to be NP-hard and the existing methods are both computationally and memory intensive. In this paper, we introduce a new approach for exact structure learning...
This paper presents the first Datalog evaluation engine for executing graph analytics over BSP-style graph processing engines. Building on recent advances in Datalog that support efficient evaluation of aggregate functions, it is now easy for data scientists to author many important graph algorithms succinctly. Without the burden of low-level parallelization and optimization, data scientists can...
Many studies have shown that Deep Convolutional Neural Networks (DCNNs) achieve high accuracy in image recognition tasks given large training datasets. The optimization technique known as asynchronous mini-batch Stochastic Gradient Descent (SGD) is widely used for deep learning because it gives fast training speed and good recognition accuracy, although it may increase generalization error if training...
We analyze the convergence of a decentralized consensus algorithm with delayed gradient information across the network. The nodes in the network privately hold parts of the objective function and collaboratively solve for the consensus optimal solution of the total objective while they can only communicate with their immediate neighbors. In real-world networks, it is often difficult and sometimes...
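The setting described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the ring topology, the quadratic local objectives, the fixed delay, the diminishing step size, and the function name `delayed_decentralized_gd` are all assumptions chosen to keep the example small.

```python
import numpy as np

def delayed_decentralized_gd(targets, delay=2, steps=5000):
    """Toy decentralized gradient descent with stale gradients (illustrative only).

    Assumptions (not from the abstract): nodes sit on a ring, node i privately
    holds f_i(x) = 0.5 * (x - targets[i])**2, and every gradient is evaluated
    at the iterate from `delay` rounds earlier.
    """
    targets = np.asarray(targets, dtype=float)
    n = len(targets)

    # Doubly stochastic mixing matrix: each node averages its own value
    # with those of its two immediate ring neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i, (i - 1) % n, (i + 1) % n):
            W[i, j] += 1.0 / 3.0

    x = np.zeros(n)            # one scalar estimate per node
    history = [x.copy()]       # past iterates, used to look up stale values
    for k in range(steps):
        stale = history[max(0, k - delay)]   # delayed iterate
        grad = stale - targets               # local gradients at the stale point
        alpha = 1.0 / (k + 1)                # diminishing step size
        x = W @ x - alpha * grad             # neighbor averaging + local step
        history.append(x.copy())
    return x

estimates = delayed_decentralized_gd([1.0, 2.0, 3.0, 4.0, 5.0])
print(estimates)
```

For these quadratic objectives the minimizer of the total objective is the mean of the targets, so each node's estimate should approach 3.0 even though no node ever sees another node's target directly.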
The explosion of Web information has led to a boom in massive collections of web documents such as news webpages, online literature, etc. The latent topics behind the documents spread by self-evolution and mutual transition. Understanding how topics in documents evolve and transition is an important and challenging problem. Topic models are a powerful set of tools for modeling document generation to find their...