New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that...
This paper proposes a space-efficient, discriminatively enhanced topic model: a V-structured topic model with an embedded log-linear component. The discriminative log-linear component reduces the number of parameters to be learnt while outperforming baseline generative models. At the same time, the explanatory power of the generative component is not compromised. We establish its superiority over...
Data quality is a perennial problem for many enterprise data assets. To improve data quality, businesses often employ rule-based data standardization systems in which domain experts code rules for handling important and prevalent patterns. Finding these patterns is laborious and time-consuming, particularly for noisy or highly specialized data sets. It is also subjective to the persons determining...
Enterprise datasets are often noisy. Several columns can have non-standard, erroneous or missing information. Poor-quality data can lead to incorrect reporting and wrong conclusions being drawn. Data cleansing involves standardizing such data to improve its quality. Often, data cleansing tasks involve writing rules manually. This step involves understanding the data quality issues and then writing data...
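The manual rule-writing described above can be pictured as a small sketch. The rules and field values here are hypothetical examples (address abbreviations), not taken from the abstracts: each rule maps a noisy pattern to a canonical form, as a domain expert might hand-code it.

```python
import re

# Hypothetical standardization rules, applied in order.
# Each pair maps a noisy pattern to its canonical replacement.
RULES = [
    (re.compile(r"\bst\b\.?", re.IGNORECASE), "Street"),
    (re.compile(r"\bave\b\.?", re.IGNORECASE), "Avenue"),
    (re.compile(r"\s{2,}"), " "),  # collapse repeated whitespace
]

def cleanse(value: str) -> str:
    """Standardize one field value by applying every rule in sequence."""
    for pattern, replacement in RULES:
        value = pattern.sub(replacement, value)
    return value.strip()

print(cleanse("221B Baker  st."))  # -> 221B Baker Street
```

Real systems layer many such rules per column, which is exactly the laborious, subjective step these papers aim to reduce.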
Record Linkage is an essential but expensive step in enterprise data management. In most deployments, blocking techniques are employed, which reduce the number of record-pair comparisons and hence the computational complexity of the task. Blocking algorithms require a careful selection of the column(s) to be used for blocking. Selection of an appropriate blocking column is critical to the accuracy and...
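Blocking as described above can be sketched in a few lines: records are grouped by a chosen blocking column, and only pairs within the same group are compared. The records and the `zip` blocking key below are illustrative assumptions, not from the paper.

```python
from collections import defaultdict
from itertools import combinations

def block_records(records, key):
    """Group records by the value of a blocking column."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[key]].append(rec)
    return blocks

def candidate_pairs(blocks):
    """Yield only record pairs that share a block."""
    for group in blocks.values():
        yield from combinations(group, 2)

records = [
    {"name": "Acme Corp",        "zip": "10001"},
    {"name": "ACME Corporation", "zip": "10001"},
    {"name": "Globex",           "zip": "94105"},
]
pairs = list(candidate_pairs(block_records(records, "zip")))
# 1 candidate pair instead of the 3 an all-pairs comparison would need.
```

Choosing `zip` here illustrates the point the abstract makes: a poorly chosen blocking column (say, an error-prone one) would scatter true matches across blocks and hurt accuracy.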
Enterprises today accumulate huge quantities of data, often noisy and unstructured in nature, making data cleansing an important task. Data cleansing refers to standardizing data from different sources to a common format so that the data can be better utilized. Most enterprise data cleansing models are rule-based, involving a lot of manual effort. Writing data quality rules is a tedious task...
Data quality improvement is an important aspect of enterprise data management. Data characteristics can change with customer, domain, and geography, making data quality improvement a challenging task. Data quality improvement is often an iterative process which mainly involves writing a set of data quality rules for standardization and elimination of duplicates that are present within the data...