Organizers
- Maxime Crochemore (King's College London, UK)
- Costas S. Iliopoulos (King's College London, UK)
Aim and Scope
This session will bring together researchers and practitioners in all areas of algorithms and data processing related to the management of massive data.Typical topics of interest include (but are not limited to):
- String processing
- Space-efficient data structures
- Coding and data compression
- External memory data processing
- Combinatorics on words
- Data mining
- Information retrieval
- Genome analysis
- Sequence assembly
Publications
- A short abstract will appear on the conference web page as soon as your paper is accepted, and a post-conference proceedings will be published by Springer LNCS .
- A journal special issue consisting of full papers will be organized for consideration in the journal Mathematics in Computer Science, published by Birkhauser/Springer.
Talks/Abstracts
-
SHORT-SS7: Compressing Big Data: When the Rate of Convergence to the Entropy Matters
Alessio Langiu (King's College London, United Kingdom), Francesca Marzi, Filippo Mignosi, Giulio Nazzicone, (Università Degli Studi Di L'Aquila, Italy)Abstract: It is well known from a theoretical point of view that LZ78 have an asymptotic convergence to the entropy faster than LZ77. A faster rate of convergence to the theoretical compression limit should lead to a better compression ratio. In effect, early LZ78-like and LZ77-like compressors behave accordingly to the theory.
On the contrary, it seems that most of the recent commercial LZ77-like compressors perform better than the other ones. Probably this is due to a strategy of optimal parsing, which is used to factorize the text and can be applied to both LZ77 and LZ78 cases, as recent results suggest. To our best knowledge there are no theoretical results concerning the rate of convergence to the entropy of both LZ77-like and LZ78-like case when a strategy of optimal parsing is used. In this paper we investigate how an optimal parsing affect the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results on LZ78-like compressors and we consider the ratio between the speed of convergence to the entropy of a compressor with optimal parsing and the speed of convergence to the entropy of a classical LZ78-like compressor. This ratio presents a kind of wave effect that become bigger and bigger as the entropy of the memoryless source decreases but it seems always to slowly converge to one. These results suggest that for non-zero entropy sources the optimal parsing does not improve the speed of convergence to the entropy in the case of LZ78-like compressors. -
SHORT-SS7: Compressing Massive Data on a Distributed System
Sergio De Agostino (University of Rome La Sapienza)Abstract: We discuss the issue of massive data compression and decompression in the context of distributed computing. Adaptive and non-adaptive approaches are both considered.
-
SHORT-SS7: Using Statistical Search to Discover Semantic Relations of Political Lexica – Evidences from Bulgarian-Slovak EUROPARL 7 Corpus
Velislava Stoykova (Bulgarian Academy of Sciences, Institute of the Bulgarian Language)Abstract: The paper presents statistical approach to discover semantic relations of political lexica using parallel Bulgarian-Slovak EUROPARL 7 Corpus. It employs statistical properties incorporated in the Sketch Engine software to generate concordances, co-occurrences and collocations. A comparative analysis of semantic structure of political lexica investigating synonymic, attributive and reciprocal semantic relations of most frequent key words from two parallel corpora – for both Bulgarian and Slovak languages is offered. The paper address some issue related to correct terms discovery, their translations and use in political speech. Finally, more general conclusions about semantic properties of political lexica are presented.
-
REGULAR-SS7: Subquadratic-Time Algorithms for Abelian Stringology Problems
Tomasz Kociumaka, Jakub Radoszewski, Bartlomiej Wisniewski (University of Warsaw, Poland)
Abstract: We propose the first subquadratic-time algorithms to a number of natural problems in Abelian pattern matching (also called jumbled pattern matching) for strings over a constant-sized alphabet. Two strings are considered equivalent in this model if the numbers of occurrences of respective symbols in both of them, specified by their so-called Parikh vectors, are the same.
-
REGULAR-SS7: Reconstructing a Sparse Solution from a Compressed Support Vector Machine
Joachim Giesen, Soeren Laue, Jens K. Mueller, (University of Jena, Germany)
Abstract: A support vector machine is a means for computing a binary classifier from a set of observations. Here we assume that the observations are $n$ feature vectors each of length $m$ together with $n$ binary labels, one at each observed feature vector. The feature vectors can be combined into a $n\times m$ feature matrix. The classifier is computed via an optimization problem that depends on the feature matrix. The solution of this optimization problem is a vector of dimension $m$ from which a classifier with good generalization properties can be computed directly. Here we show that the feature matrix can be replaced by a compressed feature matrix that comprises $n$ feature vectors of length $\ell
-
REGULAR-SS7: Temporal Reasoning: Constraints and Graphs
Jackie Daykin (Royal Holloway College, United Kingdom), Mirka Miller, Joe Ryan, (Newcastle University, Australia),Extended abstract Abstract: Temporal reasoning finds many applications in numerous fields of artificial intelligence { frameworks for representing and analyzing temporal information are therefore important. Allen's interval algebra is a calculus for temporal reasoning that was introduced in 1983. Reasoning with qualitative time in Allen's full interval algebra is NP-complete. Research since 1995 identified maximal tractable subclasses of this algebra via exhaustive computer search and also other ad-hoc methods. In 2003, the full classification of complexity for satisfiability type problems over constraints in Allen's interval algebra was established algebraically. In this survey we review the method for deciding tractability of temporal constraint satisfaction problems based on the theory of algebraic closure operators for constraints. Additionally, we discuss graph-based temporal representations such as interval and sequence graphs.