The SAMAR Project by French Business Consortium 'Cap Digital': a Platform for Managing Arabic Language Multimedia Information
TEMIS Text Analytics Technology Automates the Analysis of Arabic Language Information
NEW YORK, March 16, 2010 /PRNewswire/ -- TEMIS, the leader in Text Analytics solutions for the Enterprise, today announced it is playing an active role in the SAMAR Project, a government-funded multimedia content enrichment initiative of Cap Digital, the French consortium for digital content research and development.
Low volume of Arabic language content in North Africa
The information industry is still developing in North African countries, and the volume of content authored in Arabic is low. Newspapers play a key role in the development of the Arabic-language internet, accounting for 40% of its content[1]. However, content in Arabic faces steady demand, driven by the growing number of Arabic-speaking web users in these countries. Outside North Africa, press agencies are trying to increase their range of Arabic information sources.
Opening new horizons for Arabic content
The SAMAR project was initiated by Agence France-Presse (AFP), the Paris-based international news service. AFP plans to expand its online information portal to multilingual content, including Arabic. The structure of the Arabic language is extremely complex, and current technologies do not allow for optimal semantic tagging. Connecting Arabic content to information in other languages is also difficult.
A semantic analysis is necessary to index Arabic language content and make it accessible and findable through online search.
SAMAR, the platform for Arabic multimedia information management
The SAMAR project team will apply new technology to the wealth of Arabic language output of AFP, around 1 million news articles totalling about 150 million words, as well as to a wide set of radio and TV multimedia channels.
The Arabic language challenge
The complexity of the Arabic language poses a number of challenges. The SAMAR project team has to resolve many techno-linguistic issues, such as:
- The transcription of Arabic vowels in text for search
- The transcription of speech to text in Arabic, a language with many distinct dialects
- The cross-match of French and Arabic entities (e.g. People, Places, Companies, etc.)
A team of leading experts
The SAMAR project gathers several complementary and strategic partners:
- AFP, who provides multimedia content and radio/TV output in Arabic language
- VECSYS, specialized in text conversion from audiovisual contents ('speech to text')
- VECSYS RESEARCH, expert in literary and dialectal Arabic language processing
- TEMIS, specialized in knowledge extraction, information analysis and discovery
- NUXEO, specialized in multimedia content management
- ANTIDOT, expert in cross-lingual search (French<->Arabic; English<->Arabic)
- MONDECA, expert in ontology management and taxonomies
- CNRS LLACAN (African languages and cultures), expert in the analysis of literary and dialectal Arabic
- LIMSI, specialized in automated translation based on training
- INALCO CERMOM (Research Center - Middle East and Mediterranean), expert in Arabic language and validating results
- GREYC UMR CNRS 6072, specialized in automated translation (Arabic->English; Arabic->French)
TEMIS for Arabic text analysis
In this project, TEMIS provides its core Text Mining technology to analyze content in the Arabic language. TEMIS' software solution, Luxid(R), understands Arabic syntax to extract relevant entities, topics, facts and relationships. Luxid(R)'s powerful analysis is based on the use of domain and language-specific annotators. The SAMAR project benefits from TEMIS' years of experience working with the Arabic language, making it possible to design powerful and efficient annotators.
Emerging markets
This platform could ultimately be used by virtually all Arabic media for organizing and enriching their information production. The platform also represents a unique source of strategic information for companies expanding their business operations into the promising markets of the Middle East and North Africa.
About Cap Digital
Cap Digital is the French business cluster for digital content and services in Paris and the Ile de France region. Cap Digital is a non-profit organization. Cap Digital's 500 members are primarily innovative SMEs but also count major universities, higher education establishments, research labs, and corporations.
Cap Digital's members represent the digital industry's most active players in digital content. Nine vibrant member communities make a vital contribution to the strategy and direction of the cluster: Image, Sound and Interactivity; Video Games; Knowledge Engineering; Culture, Press, and Media; e-Learning and e-Training; Collaborative Technology & Intelligence; Mobile Lifestyle & Services; Robotics and Communicating Objects; and Digital Design.
Cap Digital provides members with essential information, networks, and resources. These include ongoing competitive intelligence, training, partnerships, funding solutions, and project reviews. Partnerships with other leading European clusters, at both a structural and a project level, are an essential element of Cap Digital's strategic activities.
About TEMIS
TEMIS is the leading provider of Text Analytics software solutions for the Enterprise. Its cutting-edge solution Luxid(R) addresses the needs of Life Sciences, Enterprise, Publishing and Homeland Security. Its powerful information intelligence capabilities power strategic corporate activities such as Competitive Intelligence, Scientific Discovery, Opinion Mining, Voice of the Customer and Content Publishing. Luxid(R) turns unstructured data into actionable knowledge, enabling advanced content analysis and strategic information discovery.
Founded in 2000, TEMIS operates in the United States, France and Germany, and is represented worldwide through its network of certified partners.
TEMIS' innovative solutions have attracted the business of leading organizations such as Agence France-Presse, BASF, Bayer Schering Pharma, BNP Paribas, Boehringer Ingelheim, CARMA International, Convera, Editions Lefebvre-Sarrut, Elsevier, EMC, Europol, French Ministry of Defence, French Ministry of Finance, Ingenuity, Invest in France Agency, Liquid Campaign, Merck Serono, Nature Publishing Group, Novartis, Philip Morris International, PSA Peugeot-Citroen, Roche Diagnostics, Sanofi-Aventis, Solvay Pharmaceuticals, Springer Science+Business Media, The McGraw-Hill Companies, and Thomson Reuters.
---------------------------------
[1] « Internet en langue arabe : espace de liberté ou fracture sociale ? », Aïta S., revue trimestrielle MAGHREB-MACHREK, no. 178, 2003-2004.
SOURCE TEMIS