
How machine translation can help bring Covid-19 info to the masses

After obtaining his PhD from Dublin City University (DCU) in 2011, Dr Rejwanul Haque entered the language technology industry, where he worked on industrial machine translation (MT) solutions for seven years.

He re-joined DCU’s MT team in 2018 and worked as an industry-oriented postdoc at SFI’s Adapt research centre. Since January 2019, he has been working as a research fellow with a Marie Skłodowska-Curie Fellowship.

His research is supported by the Euraxess Hosting Agreement Scheme, which enables approved research enterprises to recruit experts from outside the European Economic Area for their R&D departments in Ireland.

‘In cases where language is a barrier to accessing pertinent information, MT may help people assimilate information published in different languages’
– DR REJWANUL HAQUE

What inspired you to become a researcher?

Before my PhD, I obtained my degree from Jadavpur University, India, where I worked for two years as a research engineer with the Ministry of Communication and Information Technology on ‘cross-lingual information access’ (CLIA), a sponsored, consortium-based project.

During that time, I faced many profound challenges on the project and worked on different natural language processing (NLP) problems such as part-of-speech tagging, named entity recognition and MT. My interest in this area of research grew over the course of my tenure on the CLIA project.

Can you tell us about the research you’re currently working on?

I primarily work on MT, which is arguably one of the most difficult problems scientists could ever attempt to solve on a computer.

Within MT, I work on terminology translation, knowledge distillation, interactive MT, low-resource MT, data selection and domain adaptation. Although my primary research area is MT, my interests also include other NLP problems such as question-answering, social media analytics and information extraction.

In your opinion, why is your research important?

Every day, more people across the world are becoming infected and dying due to the Covid-19 pandemic. In cases where language is a barrier to accessing pertinent information, MT may help people assimilate information published in different languages.

As part of the DCU MT team, we recently built eight multilingual MT engines specifically trained to translate Covid-19 material from German, French, Italian and Spanish into English, and from English into each of those four languages.

We have made the systems publicly available online: users select their source and target languages from a drop-down menu and paste their text into the source panel. The MT server for that language pair carries out the translation, which appears almost instantly in the target panel.
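The workflow above amounts to a lookup from the chosen language pair to one of the eight engines. The Python sketch below illustrates only that routing step; the base URL, path scheme, language codes and function names are hypothetical and are not taken from the DCU system.

```python
# Hypothetical sketch of routing a request to one of eight language-pair engines.
# The base URL and path scheme are placeholders, not the real DCU endpoints.

NON_ENGLISH = ["de", "fr", "it", "es"]  # German, French, Italian, Spanish


def build_engine_table(base_url: str = "http://mt.example.org") -> dict:
    """Map every (source, target) pair to an engine endpoint, in both directions."""
    table = {}
    for lang in NON_ENGLISH:
        table[(lang, "en")] = f"{base_url}/{lang}-en"  # into English
        table[("en", lang)] = f"{base_url}/en-{lang}"  # out of English
    return table


ENGINES = build_engine_table()  # 4 languages x 2 directions = 8 engines


def select_engine(source: str, target: str) -> str:
    """Return the endpoint for the drop-down choices, or raise if unsupported."""
    try:
        return ENGINES[(source, target)]
    except KeyError:
        raise ValueError(f"no engine for {source}->{target}") from None


print(select_engine("de", "en"))  # http://mt.example.org/de-en
```

Keeping one engine per direction means a pair such as German to French has no engine of its own and is rejected here rather than silently pivoted through English.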

We have already published this research on arXiv, in the hope of contributing to the fight against Covid-19 and having a direct impact on society.

What commercial applications do you foresee for your research?

Current state-of-the-art neural approaches to MT typically require millions of parallel sentences and powerful large-scale clusters or GPUs for training, which is why they have been viewed as a ‘non-green’ technology. GPUs are expensive, leaving many SMEs unable to deploy this cutting-edge innovation in their translation pipelines.

Using large datasets also increases training and experimentation time, which creates difficulties for MT users such as translation service companies and MT researchers.

Our technology would help SMEs and individual users select a small but representative subset of training data for building [neural machine translation] systems on resource-limited devices that still provide high-quality services. In other words, our research helps reduce MT training costs.
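One common way to pick a small but representative subset is to favour sentences that cover the vocabulary of the target domain. The sketch below greedily selects sentences from a large candidate pool that add the most not-yet-covered in-domain unigrams and bigrams. It is an illustrative coverage heuristic under our own assumptions (whitespace tokenisation, no length normalisation, hypothetical function names and toy data), not the selection method used by the DCU team.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return the set of all n-grams of length 1..n_max in a token list."""
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def select_subset(candidates, in_domain, k, n_max=2):
    """Greedily pick up to k candidate sentences that add the most
    not-yet-covered in-domain n-grams (hypothetical coverage heuristic)."""
    # Weight each n-gram by how many in-domain sentences contain it.
    weights = Counter()
    for sent in in_domain:
        weights.update(ngrams(sent.lower().split(), n_max))

    pool = [(s, ngrams(s.lower().split(), n_max)) for s in candidates]
    covered = set()
    chosen = []
    for _ in range(min(k, len(pool))):
        # Gain = summed in-domain weight of the n-grams this sentence would newly cover.
        best = max(range(len(pool)),
                   key=lambda i: sum(weights[g] for g in pool[i][1] - covered))
        sent, grams = pool.pop(best)
        chosen.append(sent)
        covered |= grams
    return chosen


large_corpus = [
    "the virus spreads quickly",
    "stock prices fell today",
    "wash your hands to stop the virus",
]
covid_sample = ["the virus spreads", "wash your hands"]
print(select_subset(large_corpus, covid_sample, k=2))
# ['wash your hands to stop the virus', 'the virus spreads quickly']
```

Because longer sentences naturally cover more n-grams, practical selection methods usually normalise the gain or decay the value of n-grams that are already well represented; this sketch omits that for brevity.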

What are some of the biggest challenges you face as a researcher in your field?

In recent years, we have witnessed MT shift from statistical methods to deep learning methods, which place higher demands on computing and data resources, such as powerful hardware and massive amounts of parallel data.

For example, Google recently built a massive multilingual neural MT system with 25bn-plus sentence pairs and 50bn-plus model parameters.
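A back-of-the-envelope calculation shows why a model of that size is out of reach for most organisations. The sketch below assumes exactly 50bn parameters stored as 32-bit or 16-bit floats and a hypothetical 16 GB accelerator; it counts the weights only, so real training, which also stores gradients, optimiser state and activations, needs considerably more.

```python
import math


def weight_memory_gb(num_params, bytes_per_param=4):
    """Memory for the model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def devices_needed(num_params, device_gb=16, bytes_per_param=4):
    """Minimum number of devices of size device_gb just to hold the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / device_gb)


params = 50e9  # the quoted "50bn-plus" figure, rounded down to exactly 50bn
print(weight_memory_gb(params))     # 200.0 (32-bit floats)
print(weight_memory_gb(params, 2))  # 100.0 (16-bit floats)
print(devices_needed(params))       # 13 (hypothetical 16 GB devices, 32-bit weights)
```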

Many SMEs cannot afford such computing resources, which prevents them from deploying this technology in production. This is also a problem in academia, as most research institutes cannot afford such resources either.

Are there any common misconceptions about this area of research?

Nowadays, there are many concerns that MT poses a threat to the services professional translators currently offer. However, MT systems will never generate error-free translations. They will always make some mistakes and will never replace professional translators, who will remain an essential part of industrial translation workflows.

What are some of the areas of research you’d like to see tackled in the years ahead?

Terminology translation is a well-known problem in MT research. A suitable way to integrate terminology into MT would certainly have an impact on the translation industry and would be a breakthrough in MT research.
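Progress on this problem is often measured by checking whether approved glossary terms actually appear in the MT output. The sketch below is a deliberately simple exact-match check, assuming a hypothetical German-English glossary, an invented MT output and case-insensitive substring matching; it measures term accuracy rather than integrating terminology into the MT system itself.

```python
from typing import Dict, Optional


def term_accuracy(source: str, translation: str, glossary: Dict[str, str]) -> Optional[float]:
    """Share of glossary terms present in the source whose approved target
    term also appears in the MT output (case-insensitive substring match)."""
    src, hyp = source.lower(), translation.lower()
    relevant = [(s, t) for s, t in glossary.items() if s.lower() in src]
    if not relevant:
        return None  # no glossary term occurs in this source sentence
    hits = sum(1 for _, t in relevant if t.lower() in hyp)
    return hits / len(relevant)


# Hypothetical German->English glossary and MT output.
glossary = {"Atemschutzmaske": "respirator", "Abstand": "distance"}
source = "Tragen Sie eine Atemschutzmaske und halten Sie Abstand."
mt_output = "Wear a face mask and keep your distance."
print(term_accuracy(source, mt_output, glossary))  # 0.5
```

Substring matching misses inflected or reordered forms, which is one reason terminology translation is harder to evaluate, and to solve, than a glossary lookup suggests.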

Neural MT training benefits from large-scale data, but this has several downsides. It relies on powerful large-scale hardware such as GPUs, and the cost of such hardware is high enough that many SMEs cannot afford it.

Selecting a smaller, representative subset of large-scale training data would speed up training and lower computation costs, which would especially benefit SMEs that use neural MT in production but have limited computational resources.

Are you a researcher with an interesting project to share? Let us know by emailing editorial@siliconrepublic.com with the subject line ‘Science Uncovered’.

The post How machine translation can help bring Covid-19 info to the masses appeared first on Silicon Republic.


