The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
One challenge in big data analytics is the lack of tools to manage the complex interactions among code, data and parameters, especially in the common situation where all these factors can change a lot. We present our preliminary experience with DataLab, a system we build to manage the big data workflow. DataLab improves big data analytical workflow in several novel ways. 1) DataLab manages the revision...
In this paper we present a novel software analytics infrastructure supporting for a combination of three requirements to serve software practitioners in utilising data-driven decision making: (1) Real-time insight: streaming software analytics unify static historical and current event-stream data enabling for immediate, nearly real-time insight into software quality, processes and users; (2) Query...
While the domain of big data is anticipated to affect many aspects of human endeavour, there are numerous challenges in building big data applications among which is how to address big data characteristics in quality requirements. In this paper, we propose a novel, unified, approach for specifying big data characteristics (e.g., velocity of data arrival) in quality requirements (i.e., those requirements...
As businesses become increasingly reliant on big data analytics, it becomes increasingly important to {\em test} the choices made within the data miners. This paper reports lessons learned from the {\em BigSE Lab}, an industrial/university collaboration that augments industrial activity with low-cost testing of data miners (by graduate students). BigSE is an experiment in academic/ industrial collaboration...
In big data software engineering, the schema flexibility of NoSQL document stores is a major selling point: When the document store itself does not actively manage a schema, the data model is maintained within the application. Just like object-relational mappers for relational databases, object-NoSQL mappers are part of professional software development with NoSQL document stores. Some mappers go...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
The need for application-level intelligence cannot be easily satisfied with existing architectures or methodologies that separate methods and tools for application developers and data scientists. We aim, therefore, to develop a framework (an architecture and a methodology) to make it possible to add intelligence capabilities to existing applications (decision-enablement) and to facilitate building...
The number and variety of cyber-attacks is rapidly increasing, and the rate of new software vulnerabilities is also rising dramatically. The cybersecurity community typically reacts to attacks after they occur. Being reactive is costly and can be fatal, where attacks threaten lives, important data, or mission success. Taking a proactive approach, we are: (I) identifying potential attacks before they...
Big Data technologies are rapidly becoming a key enabler for modern industries. However, the entry costs inherent to ``going Big" are considerable, ranging from learning curve, renting/buying infrastructure, etc. A key component of these costs is the time spent on learning about and designing with the many big data frameworks (e.g., Spark, Storm, HadoopMR, etc.) on the market. To reduce said...
This article articulates the requirements for an effective big data value engineering method. It then presents a value discovery method, called Eco-ARCH (Eco-ARCHitecture), tightly integrated with the BDD (Big Data Design) method for addressing these requirements, filling a methodological void. Eco-ARCH promotes a fundamental shift in design thinking for big data system design – from "bounded...
Online-activity-generated digital traces provide opportunities for novel services and unique insights as demonstrated in, for example, research on mining software repositories. The inability to link these traces within and among systems, such as Twitter, GitHub, or Reddit, inhibit the advances in this area. Furthermore, no single approach to integrate data from these disparate sources is likely to...
Elasticity is a key component of modern cloud environments and monitoring is an essential part of this process. Monitoring demonstrates several challenges including gathering metrics from a variety of layers (infrastructure, platform, application), the need for fast processing of this data to enable efficient elasticity and the proper management of this data in order to facilitate analysis of current...
Acquirers, system builders, and other stakeholders of big data systems need to define requirements, develop and evaluate solutions, and integrate systems together. A reference architecture enables these software engineering activities by standardizing nomenclature, defining key solution elements and their relationships, collecting relevant solution patterns, and classifying existing technologies....
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.