Internet-scale multimedia retrieval
Transcript
RNDr. Jakub Lokoč, Ph.D.
Siret Research Group (www.siret.cz)
Department of SW Engineering
Faculty of Mathematics and Physics
Charles University in Prague
27.11.2012, Bezpečnostní seminář BIG DATA (Security Seminar BIG DATA), Policejní akademie ČR v Praze (Police Academy of the Czech Republic in Prague)

Statistics for 2011 (http://royal.pingdom.com)
• 2.1 billion – Internet users worldwide
• 3.146 billion – number of email accounts worldwide
• 800+ million – number of users on Facebook
• 555 million – number of websites (+300 million in 2011)
• 1 trillion – number of video playbacks on YouTube
• 48 hours – amount of video uploaded to YouTube every minute
MM data
• 100 billion – estimated number of photos on Facebook
• 4.5 million – number of photos uploaded to Flickr each day

Storage, scalability, searching, security, …, accessibility

Text-based techniques
• Advantage – scalable retrieval by inverted files
• Problem – missing or misleading annotations
Content-based techniques
• Advantage – no annotation needed, visual similarity
• Problem – slow retrieval for complex similarity models
Hybrid techniques
• Text-based query + content-based reranking/exploration
• Content-based query + text-based filtering
• Adapting content-based data for inverted files

Document vector model
• User issues a keyword query (Google, Bing, …)
• Efficient query evaluation using inverted files
Problems
• Manual annotation is feasible only for small data
• Subjectivity of the annotation
• Homonyms, etc.
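The point about efficient query evaluation using inverted files can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names, the tiny annotation set, and the AND semantics of the query are assumptions chosen for the example. The "jaguar" keyword also shows the homonym problem listed above.

```python
from collections import defaultdict


def build_inverted_index(annotations):
    """Map each keyword to the set of ids of objects annotated with it."""
    index = defaultdict(set)
    for obj_id, keywords in annotations.items():
        for keyword in keywords:
            index[keyword.lower()].add(obj_id)
    return index


def keyword_query(index, query_keywords):
    """Return ids of objects annotated with ALL query keywords."""
    postings = [index.get(kw.lower(), set()) for kw in query_keywords]
    if not postings:
        return set()
    # Intersect starting from the shortest posting list to keep
    # intermediate results small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical annotations; "jaguar" is a homonym (car vs. animal).
annotations = {
    "img1": ["jaguar", "car"],
    "img2": ["jaguar", "animal", "jungle"],
    "img3": ["car", "street"],
}
index = build_inverted_index(annotations)
print(sorted(keyword_query(index, ["jaguar"])))
print(sorted(keyword_query(index, ["Jaguar", "car"])))
```

Only the posting lists of the query keywords are touched, which is why this style of retrieval scales well; the single-keyword query, however, returns both the car and the animal, since the index knows nothing about the image content.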
Automatic annotation
• Surrounding text + linguistic methods + ontologies
• Content-based keyword assignment
• Still a lot of problems to solve…

Text-based retrieval

• All objects transformed into a similarity model
• Objects represented by descriptors (histograms, signatures)
• Descriptors compared by a distance measure d (Lp, SQFD, EMD)
• User issues an example object as a query q
• Objects x sorted according to the visual similarity d(q, x)
[Figure labels: query object, feature extraction, similarity evaluation]
How to solve the efficiency problem?
• Hybrid techniques – not the whole DB is searched in the content-based (CB) way
• Distance-based indexes or filter-and-refine methods
• Distributed architectures needed (storage, throughput, …)

Hybrid techniques – reranking page 1

Hybrid techniques – reranking page 2

Hybrid techniques – exploration
J. Lokoč, T. Grošup, T. Skopal: Image Exploration using Online Feature Extraction and Reranking. ICMR 2012, Hong Kong, China, ACM
J. Lokoč, T. Grošup, T. Skopal: SIR: The Smart Image Retrieval Engine. SISAP 2012, Toronto, Canada, Springer

When a distance measure is a metric, we can employ metric indexes for fast query processing
• Ball partitioning: M-Tree, PM-Tree, LoC
• Hyperplane partitioning: GNAT, M-Index
• Mapping methods: LAESA, Omni family
P. Zezula, G. Amato, V. Dohnal, M. Batko: Similarity Search: The Metric Space Approach. Springer, 2006
J. Lokoč, P. Čech, J. Novák, T. Skopal: Cut-region: A Compact Building Block For Hierarchical Metric Indexing. SISAP 2012, Toronto, Canada, Springer
D. Novak, M. Batko, P. Zezula: Metric Index: An efficient and scalable solution for precise and approximate similarity search. Information Systems, 2011, Elsevier

Efficiency depends mainly on the distance distribution in the distance space
• Indicator of data "indexability": intrinsic dimensionality
• iDIM = μ² / (2σ²), where μ is the mean and σ² the variance of the distance distribution
• High iDIM = bad indexability (curse of dimensionality)
[Figure labels: q, o1, p1, o2, p2]
E. Chavez, G. Navarro, R. Baeza-Yates, J. L. Marroquin: Searching in Metric Spaces. ACM Computing Surveys, 2001

• Relaxing precision
• Approximate search
• Distance space transformation
• Synergistic modeling
• Distributed computing (brute force)
• Peer-to-peer architecture
• Parallel processing on local nodes

Approximate search is based on various ideas
• Early termination for good results
• Reducing the query radius
• Terminating when a time limit elapses
• Terminating after accessing a given % of the DB
• Also distance modifications
However, for fast retrieval, the quality deteriorates rapidly
P. Zezula, G. Amato, V. Dohnal, M. Batko: Similarity Search: The Metric Space Approach. Springer, 2006

Nonlinear transformations of the distance space
• Monotonic (increasing) transformation = same similarity ordering
• Problems with metric properties
  ▪ If t(x) = x², then 2 + 2 ≥ 4 but 2² + 2² < 4² (the triangle inequality is violated)
  ▪ Approximate search with MAMs (metric access methods)
T. Skopal: Unified framework for fast exact and approximate search in dissimilarity spaces. ACM Transactions on Database Systems, 2007
T. Skopal, J. Lokoč: NM-tree: Flexible Approximate Similarity Search in Metric and Non-metric Spaces. DEXA 2008, Turin, Italy, LNCS 5181, Springer

Design an indexable space (not only precision)
• Join the world of the domain experts and focus also on iDIM
• Many factors influence iDIM
• Extracted features
  ▪ Sampled points
  ▪ Quantization
  ▪ Clustering
• Similarity measure
  ▪ Linear combinations
  ▪ Inner parameters
[Diagram label: Indexable space]
Let us also remember the MAP graphs
Ch. Beecks, J. Lokoč, T. Seidl, T. Skopal: Indexing the Signature Quadratic Form Distance for Efficient Content-Based Multimedia Retrieval. ICMR 2011, Trento, Italy, ACM
J. Lokoč, Ch. Beecks, T. Seidl, T. Skopal: Parameterized Earth Mover's Distance for Efficient Metric Space Indexing. SISAP 2011, Lipari, Italy, ACM

Peer-to-peer architecture
• Chord protocol (efficient routing)
• M-Chord, M-Index
  ▪ Map objects from U to the real domain R
  ▪ Use the Chord protocol for object distribution
  ▪ A query is translated into interval queries; the results are merged
D. Novak, P. Zezula: M-Chord: a scalable distributed similarity search structure. InfoScale 2006, ACM
D. Novak, M. Batko, P. Zezula: Large-scale similarity data management with distributed Metric Index. Information Processing & Management

• Synergistic modeling
• Distance modifications
• Distributed index
• Approximate search – limit routing
• Local node index
• Approximate search in local nodes
• Parallel processing

… any questions?
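As a worked illustration of the indexability indicator from the talk (intrinsic dimensionality, the squared mean of the distance distribution divided by twice its variance), the following minimal Python sketch estimates it from pairwise distances. The function names, the uniformly random test vectors, and the choice of the Euclidean distance are assumptions made for this example only; they do not come from the slides.

```python
import itertools
import math
import random
import statistics


def euclidean(a, b):
    """L2 distance between two equally long vectors."""
    return math.dist(a, b)


def intrinsic_dimensionality(objects, distance):
    """Estimate iDIM = mean^2 / (2 * variance) over all pairwise distances."""
    distances = [distance(a, b) for a, b in itertools.combinations(objects, 2)]
    mean = statistics.fmean(distances)
    variance = statistics.pvariance(distances, mu=mean)
    return mean * mean / (2.0 * variance)


def random_vectors(count, dim, rng):
    """Hypothetical test data: vectors drawn uniformly from the unit cube."""
    return [[rng.random() for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
low = intrinsic_dimensionality(random_vectors(200, 2, rng), euclidean)
high = intrinsic_dimensionality(random_vectors(200, 64, rng), euclidean)
print(f"iDIM, 2-D uniform data:  {low:.1f}")
print(f"iDIM, 64-D uniform data: {high:.1f}")
```

In the 64-dimensional sample the distances concentrate around their mean, so the variance is small relative to the mean and the estimate is much higher than in the 2-dimensional sample. This is the situation the talk describes as bad indexability: when all distances look alike, a metric index can exclude few objects.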