Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. TaaS must satisfy a wide range of customers who lack the experience and/or resources to tune DL hyper-parameters (e.g., mini-batch size and learning rate), and meticulous tuning for each user's dataset is prohibitively expensive. Therefore, TaaS hyper-parameters must be fixed with values that are applicable...
Deep learning is nowadays one of the most popular research topics in computer science. In recent years, the extensive application of convolutional neural networks has made them a rapidly developing direction in computer architecture research. Currently, there is growing demand for deploying deep learning networks offline on embedded mobile systems. However, how to balance...
The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. A comparative analysis was performed, and conclusions were drawn as to the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out on the H2O framework for...
Demand is mounting in the industry for scalable GPU-based deep learning systems. Unfortunately, existing training applications built atop popular deep learning frameworks, including Caffe, Theano, and Torch, are incapable of conducting distributed GPU training over large-scale clusters. To remedy this situation, this paper presents Nexus, a platform that allows existing deep learning frameworks...
Deep Learning over Big Data (DLoBD) is becoming one of the most important research paradigms to mine value from the massive amount of gathered data. Many emerging deep learning frameworks start running over Big Data stacks, such as Hadoop and Spark. With the convergence of HPC, Big Data, and Deep Learning, these DLoBD stacks are taking advantage of RDMA and multi-/many-core based CPUs/GPUs. Even though...
Deep learning methods have resulted in effective strategies for improving performance in a large number of applications, becoming one of the most used strategies by developers and researchers. In order to facilitate the implementation of those approaches, a set of software frameworks have been developed and are currently available. Selection of a specific framework is an important task, especially...
On-road obstacle detection and classification is one of the key tasks in the perception system of self-driving vehicles. Since vehicle tracking involves localization and association of vehicles between frames, detection and classification of vehicles is necessary. Vision-based approaches are popular for this task due to cost-effectiveness and usefulness of appearance information associated with the...
According to estimates by the World Health Organization (WHO), in 2014 more than 1.9 billion adults aged 18 years and older were overweight. Overall, about 13% of the world's adult population (11% of men and 15% of women) were obese, and 39% of adults aged 18 years and over (38% of men and 40% of women) were overweight. The worldwide prevalence of obesity more than doubled between 1980 and 2014. The...
Recent advances in deep learning have enabled researchers across many disciplines to uncover new insights about large datasets. Deep neural networks have shown applicability to image, time-series, textual, and other data, all of which are available in a plethora of research fields. However, their computational complexity and large memory overhead require advanced software and hardware technologies...
Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications such as computer vision, in both academia and multiple industry areas. The progress made in recent years cannot be understood without taking into account the technological advancements seen in key domains such as High Performance Computing, more specifically in the Graphic Processing...
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these...
Estimating human age from brain MR images is useful for early detection of Alzheimer's disease. In this paper we propose a fast and accurate method based on deep learning to predict a subject's age. Compared with previous methods, our algorithm achieves comparable accuracy using fewer input images. With our GPU version of the program, the time needed to make a prediction is 20 ms. We evaluate our methods using...
Deep learning is a model of machine learning loosely based on our brain. Artificial neural networks have been around since the 1950s, but recent advances in hardware such as graphical processing units (GPUs), software such as cuDNN, TensorFlow, Torch, Caffe, Theano, and Deeplearning4j, and new training methods have made training artificial neural networks fast and easy. In this paper, we are comparing some...
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015, that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second...
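The 8-bit MAC scheme this abstract refers to can be illustrated with a small quantized matrix-vector product: weights and activations are mapped to 8-bit integers, products are accumulated in wider integers, and the result is rescaled to floating point. This is a minimal sketch of the general technique, not the TPU's actual pipeline; the shapes, per-tensor max-abs scales, and values below are illustrative assumptions.

```python
# Sketch of 8-bit multiply-accumulate (MAC) arithmetic: quantize to int8,
# accumulate integer products in a wide accumulator, then dequantize.

def quantize(vec, scale):
    """Map floats to the int8 range [-127, 127] given a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in vec]

def int8_matvec(matrix_q, vec_q):
    """Integer matrix-vector product; accumulation stays in (wide) integers,
    mirroring the 32-bit accumulators that feed an 8-bit MAC array."""
    return [sum(w * x for w, x in zip(row, vec_q)) for row in matrix_q]

# Example: a 2x3 weight matrix and a length-3 activation vector.
W = [[0.5, -1.0, 0.25], [1.5, 0.75, -0.5]]
x = [1.0, 2.0, -1.0]

w_scale, x_scale = 1.5 / 127, 2.0 / 127   # max-abs scaling per tensor
W_q = [quantize(row, w_scale) for row in W]
x_q = quantize(x, x_scale)

acc = int8_matvec(W_q, x_q)               # wide integer accumulation
y = [a * w_scale * x_scale for a in acc]  # dequantize back to float

print(y)  # approximates the exact float result [-1.75, 3.5]
```

The small quantization error visible in the output is the usual price paid for the large area and energy savings of integer MAC units.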
Many studies have shown that Deep Convolutional Neural Networks (DCNNs) exhibit high accuracy on image recognition tasks given large training datasets. An optimization technique known as asynchronous mini-batch Stochastic Gradient Descent (SGD) is widely used for deep learning because it gives fast training speed and good recognition accuracies, although it may increase generalization error if training...
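The asynchronous mini-batch SGD mentioned in this abstract can be sketched in a few lines: several workers draw mini-batches and update a shared parameter without coordination. The lock-free (Hogwild-style) update, the toy one-dimensional regression, and the learning-rate and thread-count choices below are illustrative assumptions, not the setup of the paper.

```python
# Minimal sketch of asynchronous mini-batch SGD: worker threads update a
# shared weight without locking, trading gradient staleness for speed.
import random
import threading

random.seed(0)

# Toy data on the line y = 3x; we fit a single weight w.
data = [(x, 3.0 * x) for x in (i / 10 for i in range(1, 101))]

w = [0.0]          # shared parameter, updated lock-free by every worker
LR = 0.001         # small step size keeps the racy updates stable
BATCH = 8

def worker(steps):
    for _ in range(steps):
        batch = random.sample(data, BATCH)
        # Gradient of the mean squared error 0.5*(w*x - y)^2 w.r.t. w
        grad = sum((w[0] * x - y) * x for x, y in batch) / BATCH
        w[0] -= LR * grad   # racy read-modify-write: the "asynchronous" part

threads = [threading.Thread(target=worker, args=(300,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"learned w = {w[0]:.3f} (true slope is 3.0)")
```

On this noise-free toy problem the races are harmless; the generalization-error concern the abstract raises arises at scale, where stale gradients effectively enlarge the mini-batch.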
With recent advances in deep convolutional neural networks (CNN), deep learning has brought significant quality improvement and flexibility on single image super resolution (SR). In this paper, we describe how CNN based SR can be accelerated on integrated GPUs. To this end, we employ a CNN model from an existing single image SR approach, and develop the model within a well-known deep learning framework...
Deep learning has been shown to be a successful machine learning method for a variety of tasks, and its popularity has resulted in numerous open-source deep learning software tools becoming publicly available. Training a deep network is usually a very time-consuming process. To address the huge computational challenge in deep learning, many tools exploit hardware features such as multi-core CPUs and many-core GPUs to...
In recent years convolutional neural networks (CNNs) have been successfully applied to various applications that are appropriate for deep learning, from image and video processing to speech recognition. The advancements in both hardware (e.g. more powerful GPUs) and software (e.g. deep learning models, open-source frameworks and supporting libraries) have significantly improved the accuracy and training...
The stacked autoencoder is a deep learning model that consists of multiple autoencoders. This model has been widely applied in numerous machine learning applications. A significant amount of effort has been made to increase the size of deep learning models, in terms of both the training dataset and the number of model parameters, to improve performance. However, training a large deep learning...
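The stacking described in this abstract is commonly realized by greedy layer-wise pretraining: each autoencoder is trained to reconstruct its own input, and the next one trains on the codes of the previous. The sketch below uses linear layers, full-batch gradient descent, and a toy rank-1 dataset purely for clarity; real stacked autoencoders use nonlinear layers and far larger data.

```python
# Sketch of greedy layer-wise pretraining for a stacked (linear) autoencoder.

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def train_autoencoder(data, in_dim, code_dim, lr=0.05, epochs=3000):
    """Train encoder E (code_dim x in_dim) and decoder D (in_dim x code_dim)
    by full-batch gradient descent on squared reconstruction error."""
    E = [[0.1 + 0.01 * (i + j) for j in range(in_dim)] for i in range(code_dim)]
    D = [[0.1 + 0.01 * (i + j) for j in range(code_dim)] for i in range(in_dim)]
    n = len(data)
    for _ in range(epochs):
        gE = [[0.0] * in_dim for _ in range(code_dim)]
        gD = [[0.0] * code_dim for _ in range(in_dim)]
        for x in data:
            h = matvec(E, x)                      # code
            err = [r - xi for r, xi in zip(matvec(D, h), x)]
            for i in range(in_dim):
                for j in range(code_dim):
                    gD[i][j] += err[i] * h[j]
            for i in range(code_dim):
                for j in range(in_dim):
                    gE[i][j] += sum(D[k][i] * err[k] for k in range(in_dim)) * x[j]
        for i in range(in_dim):
            for j in range(code_dim):
                D[i][j] -= lr * gD[i][j] / n
        for i in range(code_dim):
            for j in range(in_dim):
                E[i][j] -= lr * gE[i][j] / n
    return E, D

# Data on a 1-D line in 3-D space, so a 3 -> 2 -> 1 stack can be near-lossless.
data = [[t, 0.5 * t, -t] for t in (i / 5 - 1 for i in range(11))]

E1, D1 = train_autoencoder(data, 3, 2)            # first autoencoder
codes = [matvec(E1, x) for x in data]
E2, D2 = train_autoencoder(codes, 2, 1)           # second, trained on codes

# Reconstruction through the full stack: decode in reverse order of encoding.
recon = [matvec(D1, matvec(D2, matvec(E2, matvec(E1, x)))) for x in data]
mse = sum(sum((a - b) ** 2 for a, b in zip(r, x))
          for r, x in zip(recon, data)) / len(data)
print(f"stacked reconstruction MSE = {mse:.4f}")
```

The greedy schedule is what makes the model a *stacked* autoencoder: each layer's weights are fit before the next layer ever sees data, which is also why training cost grows with model size, the scaling problem the abstract goes on to address.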
Recently, convolutional networks have achieved great success in the field of computer vision. In order to improve the efficiency of convolutional networks, a large number of solutions focusing on training algorithms and parallelism strategies have been proposed. In this paper, a novel algorithm based on a look-up table is proposed to speed up convolutional networks with small filters by applying GPU...
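The look-up-table idea for small filters can be sketched as follows: when inputs are quantized to a small value set (here 8-bit, 0..255), every product of a filter weight with a possible input value can be precomputed once, so the convolution replaces multiplications with table lookups. This is a generic illustration under those assumptions, not the paper's algorithm; on a GPU the tables would live in fast shared memory.

```python
# Look-up-table convolution sketch: precompute weight*value products so the
# inner loop does table lookups and additions instead of multiplications.

def build_tables(filt):
    # tables[k][v] == filt[k] * v for every possible 8-bit input value v
    return [[w * v for v in range(256)] for w in filt]

def conv1d_lut(signal, filt):
    """Valid 1-D convolution (correlation form) using precomputed tables."""
    tables = build_tables(filt)
    k = len(filt)
    out = []
    for i in range(len(signal) - k + 1):
        acc = 0
        for j in range(k):
            acc += tables[j][signal[i + j]]   # lookup instead of multiply
        out.append(acc)
    return out

signal = [10, 0, 255, 3, 7, 128]   # 8-bit quantized input
filt = [1, -2, 1]                  # a small filter, as in the abstract
print(conv1d_lut(signal, filt))    # identical to the direct computation
```

The table costs 256 entries per filter weight, so the scheme only pays off for small filters reused across many positions, exactly the regime the abstract targets.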