In the past decade, AI/ML technologies have become pervasive in academia and industry, finding utility in newer and more challenging applications. While there has been a focus on building better, smarter and more automated ML models, little work has been done to systematically understand the challenges in the data and assess its quality issues before it is fed to an ML pipeline. Issues such as incorrect labels, synonymous categories in a categorical variable, and heterogeneity in columns, which might go undetected by the standard pre-processing modules in ML frameworks, can lead to sub-optimal model performance. Although some systems are able to generate comprehensive reports with details of the ML pipeline, a lack of insight and explainability with respect to data quality issues leads to data scientists spending ~80% of their time on data preparation before employing AutoML solutions. This is why data preparation has been called out as one of the most time-consuming steps in the AI lifecycle. Since the quality of the data is not known at Step 0, when the data is acquired, data preparation becomes an iterative debugging process and more of an art that relies on the experience of the data scientist. Because the performance of an ML model is only as good as the training data it sees, a systematic analysis of data quality before building AI/ML models is of utmost importance.
The goal of this workshop is to bring together researchers working on data acquisition, data labeling, data quality, data preparation and AutoML to understand how detecting and remediating data issues helps build better models. With a focus on different modalities such as structured data, time series data, text data and graph data, this workshop invites researchers from academia and industry to submit novel propositions for systematically identifying and mitigating data issues in order to make data AI-ready.
Methods of data assessment can change depending on the modality of the data. This workshop invites submissions on data quality assessment for different modalities: structured (or tabular) data, unstructured data (such as text, logs and images), graph-structured (relational, network) data, time series data, spatio-temporal data, etc. We would like to explore state-of-the-art deep learning and AI concepts such as deep reinforcement learning, graph neural networks, self-supervised learning, capsule networks and adversarial learning to address the problems of data quality assessment for ML. The following is a (non-exhaustive) list of topics that are of interest to this workshop:
We solicit submission of papers of 4 to 10 pages representing reports of original research, preliminary research results, case studies, proposals for new work and position papers.
All papers will be peer reviewed, single blind (i.e. author names and affiliations should be listed). If accepted, at least one of the authors must attend the workshop to present the work. The submitted papers must be written in English and formatted in the double column standard according to the ACM Proceedings Template, Tighter Alternate style. The papers should be in PDF format and submitted via the EasyChair submission site. The workshop website will archive the published papers.
The submitted papers must not be previously published anywhere and must not be under consideration by any other conference or journal during the workshop review process.
Submissions should be made via the EasyChair system through the submission page available here: https://easychair.org/my/conference?conf=datareadinesskdd2021#
1st International Workshop on Data Assessment and Readiness for AI @ Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2021
For any queries, reach out to us at email@example.com