Content analysis for regulatory data mining and beyond

Designed specifically for Life Sciences, cune-Distiller is built on Cunesoft's Content Analysis Framework and is optimized to enable comprehensive data extraction from large volumes of documents.

For IDMP and xEVMPD readiness, we have a configuration designed to automatically extract data from submission documents, SmPCs and xEVPRM messages.


Extract Data Elements

Map Vocabularies

Compare Versions

Check Compliance

Multiple Document Types

Data mining, mapping, consolidation, curation, maintenance and export for SmPCs, eCTD Module 3, xEVPRM, SPL and more


Compare SmPC versions – auto-update xEVPRM messages

Data-mine current SmPCs and automatically receive notifications of changes and business rules from newer SmPC versions. Available for 26 languages.
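
As a rough illustration of the version comparison idea, the sketch below lists the lines that differ between two SmPC text versions using Python's standard difflib module. It is not cune-Distiller's actual implementation; the function name changed_lines and the sample texts are hypothetical.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return the removed ("- ") and added ("+ ") lines between two versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; keep only changes.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical SmPC excerpts, for illustration only.
old_smpc = "Shelf life: 2 years\nStore below 25 C"
new_smpc = "Shelf life: 3 years\nStore below 25 C"

for change in changed_lines(old_smpc, new_smpc):
    print(change)
```

Each reported change could then drive a notification or a review task for the corresponding xEVPRM message.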


Automated Document Redaction

EMA Policy 0070 requires the redaction of confidential information within clinical study reports, protocols, etc.

Database Quality Assurance

Identify data inconsistencies across multiple data stores and clean data automatically.
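
A minimal sketch of what such a consistency check could look like, assuming each data store can be exported as a dictionary of records keyed by a shared product ID. The function name find_inconsistencies, the field names and the sample records are hypothetical and do not reflect how cune-Distiller works internally.

```python
def find_inconsistencies(store_a: dict, store_b: dict) -> list[tuple]:
    """Report fields whose values differ for records present in both stores."""
    issues = []
    # Only compare records (and fields) that exist in both stores.
    for key in sorted(store_a.keys() & store_b.keys()):
        rec_a, rec_b = store_a[key], store_b[key]
        for field in sorted(rec_a.keys() & rec_b.keys()):
            if rec_a[field] != rec_b[field]:
                issues.append((key, field, rec_a[field], rec_b[field]))
    return issues


# Hypothetical records from two systems, for illustration only.
rim_system = {"P1": {"ma_holder": "Acme Pharma", "shelf_life": "24 months"}}
safety_db = {"P1": {"ma_holder": "Acme Pharma GmbH", "shelf_life": "24 months"}}

for issue in find_inconsistencies(rim_system, safety_db):
    print(issue)
```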


Automated eTMF Metadata Tagging

A seamless eTMF process that extracts data from clinical trial documents (e.g. physician CVs).


Safety Data Analysis

Automatically analyzes safety reports, adverse event reports and other safety-related documentation to automate safety updates, adverse event reporting, etc.


Artificial intelligence software

In the past, many were skeptical about automated text mining. Today, the common belief that text mining can handle the easy 20% of data extraction but does not help with the difficult 80% is outdated, thanks to significant advances in technology and modern data mining algorithms.

Modern technologies like cune-Distiller use several different extraction strategies in parallel: neural networks, natural language processing, fuzzy logic, and deep learning. When combined the right way, information can be extracted from virtually any electronic document. The result is quite impressive. Moreover, it will become even better over time. Artificial intelligence software learns what result is expected and improves its settings. Of course, these techniques can be used for many additional use cases.

Some use cases for cune-Distiller:

  • Extract data from submission documents, SmPCs and xEVPRMs
  • Automatically map MedDRA, GSRS and WHO codes
  • Generate IDMP ready data
  • Data quality assurance workflows
  • Easy to configure
  • Export your data into your RIM or IDMP system

Best performance by combined strategies

cune-Distiller uses many techniques including artificial intelligence (AI) to reach its full potential. Depending on the specific needs for each task, one or multiple techniques get applied.

AI technology is very complex and difficult to develop. Cunesoft invested many resources to develop an algorithm that serves the needs of the life science industry. There are many possible use cases for this multipurpose tool. Below you see the six steps of the data mining process applicable in IDMP iteration 1.

IDMP Artificial Intelligence

Learn more about cune-Distiller

Data Categories


Clinical Particulars

Extracted Information:

Medicinal Product Name

Extracted Information:


ATC Code


Product Form

Package Description

Package Medicinal Product

Extracted Information:
Dosage Forms
Shelf Life
Storage Conditions
Active Substance
Package Item

Marketing Authorization

Extracted Information:
MA Holder
MA Number
Authorization Procedure
EU Authorization#
Legal Basis
Intensive Monitoring
Orphan Drug
Enquiry E-Mail
Enquiry Phone
MFL Code
Product Type
Authorization Date

Extracted Data Elements

cune-Distiller automatically extracts various data types from PDF documents including unstructured information such as clinical particulars.

  1. Medicinal Product Name data entities
  2. Clinical Particulars data entities
  3. Package Medicinal Product data entities
  4. Marketing Authorization data entities

Actual Data Mining POC Summary report (fully redacted):

In this 26-page report you will learn about the overall project approach as well as the results. To receive the full POC summary report please apply here.

  • Combine SmPC and xEVPRM data into one set of data elements
  • Multiple products per SmPC are extracted into separate data elements
  • Powerful quality assurance capabilities
  • Review data and make manual modifications if necessary

Controlled Vocabulary Mapping (EV Codes, GSRS) and MedDRA Coding Concept

The system automatically maps extracted data entities to the regulatory databases GiNAS or EV Codes. Furthermore, identified indications are mapped to MedDRA codes automatically. At any point in time, users can manually override the system's suggestions.
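
The mapping-with-override idea can be sketched as follows. This is a simplified illustration, not the product's algorithm: it uses fuzzy string matching from Python's standard difflib module, and the vocabulary, the codes, and the names suggest_code and map_terms are all made up for the example.

```python
import difflib

# Hypothetical controlled vocabulary: term -> code (codes are invented).
VOCABULARY = {
    "film-coated tablet": "PDF-0001",
    "solution for injection": "PDF-0002",
    "oral suspension": "PDF-0003",
}


def suggest_code(extracted_term, vocabulary=VOCABULARY, cutoff=0.8):
    """Suggest a code for an extracted term, or None if no term is close enough."""
    matches = difflib.get_close_matches(
        extracted_term.lower().strip(), list(vocabulary), n=1, cutoff=cutoff
    )
    return vocabulary[matches[0]] if matches else None


def map_terms(terms, overrides=None):
    """Map each term to a suggested code; a manual override always wins."""
    overrides = overrides or {}
    return {term: overrides.get(term, suggest_code(term)) for term in terms}


# The user overrides the suggestion for one term.
print(map_terms(["Film coated tablet", "eye drops"],
                overrides={"eye drops": "PDF-0099"}))
```

Terms without a sufficiently close match map to None, which a reviewer could then resolve manually.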

  • Reduce data extraction work from hours to minutes
  • Increase data accuracy and data quality
  • Reuse data within your RIM or IDMP system

Self-learning natural language processing (NLP) approach

The platform can be configured to extract information from document types beyond those found in Module 3 of the eCTD. Find out how Cunesoft can help you find more possibilities in your content.

Contact us for further information