
The I-FIND project

An intelligence solution that analyses large structured and unstructured data sets so that users can retrieve information and receive alerts

Giorgio Gentile

19 Feb 2018

The difficulty in analysing unstructured data

The I-FIND project aims to build an intelligence solution that analyses, in a structured and incremental way, large structured and unstructured data sets from multiple sources and domains, and that lets the user retrieve information and receive alerts.
In the monitoring and analysis of online content, most effort has historically gone into multimedia, while text has received less attention. This is largely because texts (e.g. communications, data, conversations, news, web sites, blogs) are mostly unstructured data. As a consequence, analysing and processing such information is largely inefficient, onerous and time-consuming. Currently, the most common way to identify threat content is keyword search combined with manual analysis by operators. This approach is imprecise and error-prone, slow to respond and quickly outdated, which makes it impractical to examine the thousands of new files that appear on the web every day.

The advantages offered by I-FIND

The I-FIND project aims to take advantage of semantics and other methods to analyse large amounts of unstructured data by means of concepts, categories, entities of different types and facts, not only keywords. The objective of the project is to develop a tool that can:

collect heterogeneous data from different sources (Data Gatherer): it will provide interfaces to the web and related services (e.g. Facebook, Twitter, forums, Google, blogs, web sites) from which data is collected. The data will be collected through secure communication channels.
reduce the size of the data gathered in the previous phase through a Semantic Data Tagger and a Data Indexer. The Semantic Data Tagger will provide a taxonomy of up to fifty categories, including People, Organizations, Locations and user-defined categories based on the specific domain (e.g. Drugs, Weaponry, Military Devices and Weapons, Criminal associations, Public Building, phone numbers, Date, Addresses, URL, e-mail), in order to catalogue and classify the information according to its content. The Data Indexer will process the information coming from the Semantic Data Tagger, then classify and index it according to the policies defined by the user.
correlate and analyse the information through a set of logical rules that can be defined by the users. The correlation step will retrieve the relevant items from the available information and, whenever needed, will ask the Data Gatherer for more specific information from the relevant sources. The Data Analyser will process the information so that the user receives only relevant data.
present the results of the analysis to end users in a user-friendly graphical form. The user interface will be easy to use for non-technical users.
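
To make the flow between these components more concrete, the following is a minimal Python sketch of a gather, tag, index and analyse pipeline. It is an illustration only and not the I-FIND implementation: the function names, the regular expressions, the word list for the "Weaponry" category, the sample documents and the rule "weapon-sale-contact" are all hypothetical, and simple pattern matching stands in for the semantic analysis the project describes.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical patterns for three of the categories named above.
PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "e-mail": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Hypothetical user-defined category, expressed as a plain word list.
LEXICONS = {"Weaponry": {"rifle", "pistol", "ammunition"}}


@dataclass
class Document:
    doc_id: str
    source: str
    text: str
    tags: dict = field(default_factory=dict)  # category -> list of matches


def gather():
    """Stand-in for the Data Gatherer: returns hard-coded sample documents."""
    return [
        Document("d1", "forum",
                 "Selling a rifle, write to seller@example.org before 2018-02-19."),
        Document("d2", "blog",
                 "Our new recipe is online at https://example.org/recipes"),
    ]


def tag(doc):
    """Stand-in for the Semantic Data Tagger: attach category -> matches."""
    for category, pattern in PATTERNS.items():
        found = pattern.findall(doc.text)
        if found:
            doc.tags[category] = found
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    for category, lexicon in LEXICONS.items():
        hits = sorted(words & lexicon)
        if hits:
            doc.tags[category] = hits
    return doc


def build_index(docs):
    """Stand-in for the Data Indexer: category -> set of document ids."""
    index = defaultdict(set)
    for doc in docs:
        for category in doc.tags:
            index[category].add(doc.doc_id)
    return index


def analyse(index, rules):
    """Stand-in for the Data Analyser.

    A rule is a non-empty list of categories; it fires for every document
    that carries all of them.
    """
    alerts = []
    for name, required in rules.items():
        matching = set.intersection(*(index.get(c, set()) for c in required))
        alerts.extend((name, doc_id) for doc_id in sorted(matching))
    return alerts


# Hypothetical user-defined rule: weapon vocabulary plus a contact address.
RULES = {"weapon-sale-contact": ["Weaponry", "e-mail"]}

docs = [tag(d) for d in gather()]
index = build_index(docs)
alerts = analyse(index, RULES)
for rule, doc_id in alerts:
    print(f"ALERT {rule}: {doc_id}")
```

In this sketch only the first sample document satisfies the rule, so the second one never reaches the user. A real system would replace each stand-in with the corresponding component, for example by collecting data over secure channels and by using semantic tagging instead of word lists.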