First Monday

The need for addressing multilingualism, ambiguity and interoperability for visual resources management across metadata platforms
by Denise Russo and Abebe Rorissa

The digitization of visual resources and the creation of corresponding metadata that meets the criteria of clarity and interoperability, while also approaching the needs of the multilingual Web, are pressing concerns. Because visual resources make up a significant percentage of digital information, this paper focuses on the aforementioned concerns and proposes ways to address them, including swift progression and adoption of cohesive, multi-user, multilingual metadata standardization to improve digital access and to allow all descriptive image metadata to be approachable and translatable. We offer some recommendations such as those involved in visual resource management moving away from using primarily the English writing system based metadata schemas in order to provide flexible lexicon in non-Roman languages, which can easily be recognized and interpreted by both monolingual and multilingual users alike as well as facilitate digital metadata interoperability.


The difficulties in creating digitized versions of physical works
The potential for lexical ambiguity of the visual medium
Automatic digital image captioning
Mapping into non-Roman script
Conversion from one metadata platform to another
Selecting the metadata schema




In a portrait, you have a point of view. The image may not be literally what’s going on, but it’s representative.
— Annie Leibovitz (Source: Weich, 2006).

The challenges in using metadata schema for images rather than printed information sources are embedded within the properties of the visual medium. Generally, content has leeway over the visual resource(s) (Wootton, 2007). In addition, the lexical ambiguity of visual data makes it difficult at times to fully comprehend and label the classes, parts and properties (e.g., connection to physical items) of a picture (Rueschemeyer and Gaskell, 2018; Neugebauer, 2005). There is often confusion caused by semantic ambiguity among elements in different metadata schemas (Weagley, et al., 2010). Work on multilingualism in the metadata process has begun (Elliott, et al., 2015; Tsutsui and Crandall, 2017); however, it is yet to be fully realized (Hovy, et al., 1999; Sasaki, 2018). In addition, metadata standards such as MAchine Readable Cataloging (MARC), which have traditionally been used for encoding and sharing metadata for visual and other information resources, are being superseded by more user-friendly, yet primarily Anglo-based, metadata schemas. Consequently, it is paramount for the visual resources management community to center its focus on reducing ambiguity, advancing interoperability and increasing multilingual lexicons across metadata platforms by adopting emerging technologies and standards, such as the Resource Description Framework (RDF), eXtensible Markup Language (XML), Text Encoding Initiative (TEI), Multimodal Machine Translation, and Dublin Core (Bia, et al., 2005; Bray, et al., 2008; Chan and Zeng, 2006a, 2006b; Dublin Core Metadata Initiative [DCMI], 2017; Jaffe, 2017; Specia, et al., 2016).

In this paper, we examine some of the major obstacles to addressing interoperability, ambiguity and multilingualism for visual resource management; specifically, the difficulties in creating digitized versions of physical works, the potential for lexical ambiguity of the visual medium, the use of non-Roman script versus English orthography, automated multilingual image captioning, and the conversion from one metadata platform to another for visual resources.



The difficulties in creating digitized versions of physical works

From the perspective of artists and historians

One of the challenges of transforming art, print and other resources into digitized representations is image quality, which affects how the image is described, how it may be accessed or retrieved, and how its metadata is shared, thereby increasing ambiguity and decreasing interoperability. For artists, art historians, scholars and others, the visual image serves as both a source of inspiration and an information resource. Therefore, the image must be of excellent quality for it to serve a spectrum of information needs of the various groups of potential users. It is quite common for both artists and art historians to use visual imagery to illustrate a particular style or technique (Beaudoin and Brady, 2011). For instance, a survey conducted by Visick, et al. (2006) revealed that artists rely more heavily on visual than text-based materials with Google as the primary go-to source for visual resources for art studio faculty (Gregory, 2007). However, in line with our arguments above about image quality, Elam (2007) found that some art historians felt that the visual images presented online were substandard, which caused ambiguity in both how the images were described and how they were accessed. In addition, poor image quality for online art history journal articles “may hinder his or her comprehension of the content, which leads to frustration, and causes the researcher to seek alternatives in the form of better-quality images in order to understand the text” [1]. Furthermore, the evidence of poor image quality was prevalent across art journal databases for online scholarly publications (McCann and Ravas, 2010). For instance, the following image represents an example of how screen pixel densities and compression can affect the quality, clarity, and definition of an image (Smus, 2012).


Figure 1: Comparison of screen pixel densities and compression on picture quality and clarity. Source: Smus, 2012.


As one can see, the images in the bottom row in Figure 1, which demonstrate a higher pixel rate and more compressed images, are the most precise in clarity. In addition, one’s choice to use JPEG, PNG, WebP or GIF will further affect the quality and clarity of the visual image, so one must determine the ‘rate of trade-off’ between all available formats and methods in order to ensure the highest picture quality while maintaining the integrity of the image or art piece being digitized (Smus, 2012).

Image use in healthcare

For those in healthcare, the visual image is the key to diagnosis, treatment and education. If the image is not properly illuminated or not in crisp focus, the information that a medical professional can glean from it can be life-changing for a patient. As Rorissa (2007) stated, “Medical researchers and practitioners rely on these multimedia records to make critical and, literally, life and death decisions” [2]. Others (Flagler, et al., n.d.) demonstrate the effect of image quality and clarity, as illustrated by the following vignette of original medical images supplied by Dr. Tim Leclair, Director Interventional Pulmonary, Intermountain Medical Center:


Figure 2: Comparison CAT Scans of Lungs, Anterior View. Source: Leclair, 2020.


To illustrate the importance of high-quality medical images on interoperability and semantics, Flagler, et al. (n.d.) randomly selected 10 multinational medical students at a local urban university to view medical slides, similar to the above vignette. Each student was asked to examine each image and answer the following questions:

  1. Which of these photos demonstrates higher clarity required for diagnosis?
  2. How would you label/describe each of these images?
  3. What diagnosis would you assign to each of these photos?

Along with an academic medical doctor, one of the authors analyzed the results from the above informal survey and determined the following:

  1. For question number 1, only 50 percent of the medical students chose the correct photo (i.e., Image B in Figure 2) when asked to determine the image with the highest level of clarity required for diagnosis.
  2. When asked to provide labels/descriptors for each of these images, there was some overlap in terms (30 percent term match), but overall, each image gleaned different descriptive terms from the students.
  3. In determining the medical diagnosis of each of the images, the image with the lower image quality (i.e., Image A in Figure 2) generated more types and descriptions of medical diagnoses than the images with higher image quality (i.e., Image B in Figure 2).
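The source does not state how the 30 percent term match was calculated. As a minimal sketch of one plausible measure, the Jaccard overlap below compares the descriptive terms assigned by two viewers of the same image. The function name and the sample labels are hypothetical and are not taken from the survey.

```python
def jaccard_overlap(terms_a, terms_b):
    """Share of distinct terms that two labelers have in common (0.0 to 1.0)."""
    # Normalize case and surrounding whitespace so "Nodule" matches "nodule".
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical labels; these values are illustrative, not survey data.
student_1 = ["opacity", "left lung", "nodule"]
student_2 = ["Nodule", "mass", "left lung", "shadow"]
overlap = jaccard_overlap(student_1, student_2)
print(round(overlap, 2))
```

A low score under a measure of this kind would indicate that viewers of the same image are drawing on different descriptive vocabularies.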

These issues clearly call for standard guidelines for “acceptable images” across fields and uses to solidify the descriptive lexicon and to heighten interoperability. For medical images, this already appears to be the case at some medical journals, such as Toxicologic Pathology. To address this need, efforts are underway. For instance, the BCR’s CDP Digital Imaging Best Practices Working Group (2008) created a guide for library, museum and archival practitioners. In addition, Kenney and Reiger’s (2000) book Moving theory into practice: Digital imaging for libraries and archives, research studies by Iyer (2007) and Levy (2003), as well as the Cornell University Library’s Moving theory into practice: Digital imaging tutorial (Meyer, 2008), may prove invaluable to those professionals involved in digitizing text and other materials. Also, to emphasize the importance of quality in digital imaging, Joseph (2013) argues that “publishers and librarians should work together to ensure that print journals converted to digital format are of acceptable quality” [3].



The potential for lexical ambiguity of the visual medium

In addition to the quality of the digitized representation of the image(s), as previously mentioned, the lexical (also referred to as “semantic”) vagueness of the visual medium makes it difficult at times to fully understand and name all of the properties of a visual image (Neugebauer, 2005); this is particularly true for non-native English-language users accessing English-based metadata schemas and records for visual resources. The visual medium does not group the properties of a language, for example, when stating what a picture denotes (symbolic) versus what the picture physically represents, especially when “nonverbal symbols are involved” [4] and meanings may be implied. Some specific examples of lexical ambiguity are as follows (Oluga, 2010):

  1. The use of inherently ambiguous lexemes, such as the use of the adjective “old” in these two sentences: “We are old friends” and “Old friends are good friends”. In this case, “old” refers to friends that have known each other for a long time, rather than friends who are no longer young.

  2. The use of some verbs with multiple class membership, illustrated by the use of “found” in the following sentence: “She found him a reliable partner”. This sentence could have two meanings: The first meaning is that “she” realized that he was a reliable partner; the second, that “she” located herself a partner who was reliable.

To cast light on the first lexical ambiguity listed above, the use of the word “old”, some may have captioned the photo below as “Old friends eating lunch together.”


Figure 3: WWII Era Image: “Wipers eat lunch at the Clinton, Iowa roundhouse”. Source:, 2018.


One may wonder what a multilingual individual may discern from the use of the word “old” to describe the above image scenario. Under “Subject Matter” in the Categories for the Description of Works of Art (CDWA), what indexing terms might someone who speaks many languages use for this image? Would they vary much from those of an American-born English speaker?

Further, although Uniform Resource Identifiers (URIs) can be considered language-independent, they are not free of language-related challenges via their concept schemes and in their construction (Niininen, et al., 2017). When considering multilingualism, whether it be an information professional or a potential user, it is important to incorporate correct character sets for encoding non-Roman languages (Oracle, 2005). In addition, as current efforts to obtain multilingual data are insufficient to meet the requirements of international users from various cultural and linguistic heritages, it is apparent that typical Anglo-based thesauri must undergo a metamorphosis to meet the new polyglot inquiries (Jorna and Davies, 2001).
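The two points above, character sets and URI construction, can be illustrated with a minimal Python sketch that uses only the standard library. The Arabic label, the example.org base address and the variable names are illustrative assumptions, not values from any cited system.

```python
from urllib.parse import quote, unquote

# Illustrative subject label in Arabic script (assumed sample value).
label = "صورة"

# UTF-8 can represent the label and round-trips without loss.
utf8_bytes = label.encode("utf-8")
round_trip_ok = utf8_bytes.decode("utf-8") == label

# A Western European character set such as Latin-1 cannot encode it at all.
try:
    label.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A URI built from the label must percent-encode the UTF-8 bytes;
# "http://example.org/subject/" is a hypothetical namespace.
base = "http://example.org/subject/"
uri = base + quote(label, safe="")
decoded = unquote(uri)

print(round_trip_ok, latin1_ok)
print(uri)
```

The resulting URI is plain ASCII and therefore portable, but it is no longer readable by a speaker of the language, which is one of the construction-related challenges noted above.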



Automatic digital image captioning


Figure 4: Can machines think? Turing (1950).


A brief discussion of automatic digital image captioning as a potential means for reducing ambiguity and improving interoperability across metadata platforms is necessary. Automatic image captioning is a recent phenomenon with some promise, but it also comes with its own challenges (Bai and An, 2018; Khurram, et al., 2020). While automatic image captioning involves a number of methods in computer science and computer vision, natural language processing (NLP) is the primary artificial intelligence (AI) method that helps computers analyze human language and generate descriptions of contents of digital images (Bai and An, 2018; Casey, 2019). NLP techniques could, in theory, provide automated captioning, but while humans have the ability to derive meaning easily using their acquired knowledge base, much research has shown that both semantic and lexical ambiguity continue to plague machine learning (Jackson, 2020; Sharif, et al., 2020; Jusoh, 2018; Brownlee, 2019; Rodriguez, 2017; Anjali and Babu, 2014; Alfawareh and Jusoh, 2011).

As previously mentioned, what appears as a simple image to caption by humans is replete with many competing thoughts, many of which contain lexica other than American English. For “morphologically rich” languages such as Japanese and Arabic, the sparsity of relevant image descriptions is vexing [5].

Automatic image captioning systems utilize English lexical databases such as WordNet. WordNet connects words using “morphosemantic” links to form a semantic web of same-meaning words (Princeton University, 2010). It was found to be more accurate in bridging the language gap between two languages (Balamurali, et al., 2011). WordNet, along with two classes of deep neural networks, specifically, convolutional neural networks (CNNs/ConvNets) and recurrent neural networks (RNNs), may be viable for training the automation of multilingual image descriptions (Al-muzaini, et al., 2018; Simonyan and Zisserman, 2015; Karpathy and Li, 2013; Krizhevsky, et al., 2012). In addition, spaCy, which is an open-source NLP library in Python, provides a succinct, end-user focused API (Singh, n.d.), and could also be a basis for training multilingual image caption generation as it allows users to add a new language by modifying the spaCy library’s code (spaCy, n.d.).

Deep neural network-based models can be applicable to multilingual vocabularies, hence potentially used for automatic captioning of digital images, as long as the lexicon is extensive, a task that can be accomplished utilizing Cross-lingual sentiment analysis (CLSA) (Abdalla and Hirst, 2017), multilingual bidirectional encoder representations from Transformers (m-BERT) (Gupta and Khade, 2020; Wu and Dredze, 2019) or XLM, which could optimize BERT for multilingual training (Horev, 2019). However, while the promise of automatic image captioning is real and building systems that will come close to what humans can perceive in a visual resource or image and be able to describe it for human consumption remains the ultimate goal, it will take much more effort, research, and development to be fully realized (Bai and An, 2018).



Mapping into non-Roman script

Another set of issues that vex the cataloguer and multilingual researcher alike, in terms of interoperability and lexical cohesiveness, is the cataloguing and searching for visual resources using non-Roman script. The provision of concurrent access to non-Roman subject searches has long been acknowledged and addressed by the Association for Library Collections and Technical Services (ALCTS) Task Force on non-English Access beginning in 2007 (ALCTS, 2007). However, per Yale University’s General Recommendations for Searching the Yale University Library, for example, “for research in languages using non-Roman scripts, such as Arabic, Chinese, Hebrew, Japanese, Korean, and Russian, searching by Romanization is usually the most effective and reliable way to find non-Roman language materials” (Yale University Library, n.d.). In addition, the Library of Congress has a set of Romanization tables for 75 languages and dialects (ALA-LC Romanization Tables, 2017).
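Searching by Romanization introduces a practical matching problem, because romanized forms often carry diacritics that users omit when typing. The following is a minimal sketch, assuming a catalogue that folds diacritics and case before comparison; the function names and the sample strings are hypothetical, and the script check is only a rough heuristic based on Unicode character names.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks and case so romanized forms compare loosely."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def scripts_used(text):
    """Rough script detection from the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


# Illustrative strings: a romanized heading with a breve, what a user might
# type without it, and the same name in Cyrillic script.
catalogue_form = "Dostoevskiĭ"
typed_query = "dostoevskii"
original_script = "Достоевский"

matches = fold_diacritics(catalogue_form) == fold_diacritics(typed_query)
print(matches)
print(scripts_used(catalogue_form), scripts_used(original_script))
```

Folding of this kind helps a romanized query find a romanized heading, but it does nothing to connect the Latin and Cyrillic forms, which is why romanized and original-script access points are usually recorded side by side.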

Multilingual nations

For those in countries such as India, which has about 23 official languages and 11 written script forms, the phonetic transliteration schemas for mapping one language into another lend themselves to more frustration for the polyglot researcher (Raj and Maganti, 2009) and digital cataloger. Add into the mix the fact that many languages and dialects ‘borrow’ words from English and other languages, which creates yet another layer of frustration for the multilingual searchers of visual resources. For Arabic and Hebrew searches, for example, Yale University Library (n.d.) recommends that the user “pay special attention to the word division rules; word division can be challenging, both for non-native speakers and native speakers of each language”.

El-Sherbini (2017) suggests that linking Library of Congress Subject Headings (LCSH) terms with non-Roman subject terms using Faceted Application of Subject Terminology (FAST) terms may present improved access to non-Roman script collections. It would be reasonable to assume that users would prefer to perform a subject search in their native language. In this vein, the Ohio State University Library began to add other non-Roman languages, beginning with Arabic, into their bibliographic records. However, when the subject term is not found in the thesaurus, the Arabic language cataloguer assigns a local subject term and indicates in the subject field in the bibliographic record that the term was created locally. Once again, the concerns of interoperability and lexical clarity remain across metadata platforms. This issue will no doubt continue to be debated as the need for internationalization, illustrated by the Resource Description and Access (RDA), is compelling (Dunsire, 2016).



Conversion from one metadata platform to another

Another complex matter related to addressing ambiguity and interoperability in visual resource management is the conversion of materials, whether it be art, antique books or artifacts, into digitized, online versions. In these situations, accuracy and consistency are prioritized as the main criteria for metadata quality (Park and Tosaka, 2010). In addition, the semantics of metadata schemas greatly affect the consistency and accuracy of metadata in the conversion process.

Consequently, the conversion of one metadata platform to another for online visual resource collections adds even more challenges related to ambiguity and interoperability. For example, many in the information science field are now finding that the MAchine Readable Cataloging (MARC) format no longer quite suits their needs for the description of digital, visual and other non-print materials; therefore, the conversion of MARC metadata for online visual resources into Dublin Core Elements proves a viable path (Mooney-Gonzales, 2014), but it comes with its own caveats (Shreeves, et al., 2006), such as the need for metadata interoperability standards. Caplan and Guenther (1996) state, “The standard(s) would ensure that a common core set of elements could be understood across communities, even if more specific information was required within a particular interest group” [6].
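A minimal sketch of such a conversion is given below. The tag-to-element table is a hypothetical, deliberately simplified mapping chosen for illustration and should not be read as an official MARC to Dublin Core crosswalk; the record values are placeholders.

```python
# Hypothetical, simplified mapping: MARC tag -> Dublin Core element.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "520": "description",
    "650": "subject",
}


def marc_to_dc(marc_fields):
    """Convert (tag, value) pairs into a dict of DC element -> list of values.

    Returns the Dublin Core record and the list of MARC tags left unmapped.
    """
    dc = {}
    unmapped = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)
            continue
        # Dublin Core elements are repeatable, so values accumulate in a list.
        dc.setdefault(element, []).append(value)
    return dc, unmapped


# Placeholder record; the values are illustrative only.
record = [
    ("100", "Example, Photographer"),
    ("245", "Untitled portrait"),
    ("650", "Portraits"),
    ("650", "Workers"),
    ("583", "Digitized from original negative"),
]
dc_record, unmapped_tags = marc_to_dc(record)
print(dc_record)
print(unmapped_tags)
```

Reporting the unmapped tags rather than silently dropping them makes the information lost in conversion visible to the cataloguer.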

Input issues

With budget and staffing issues facing cultural heritage institutions such as libraries, the quality of metadata input by nonprofessionals is of concern to the maintenance of interoperability and reduction of semantic ambiguity. To this end, Stony Brook University has developed their Stony Brook University Library Strategic Design 2015–2018, which outlines the training and mentoring to be provided for metadata input, conversions, and standards (Stony Brook University Libraries, 2018). Further, it would be crucial to continue to study the successes of the few digital libraries that provide multilingual access and services (Wu and Chen, 2019).



Selecting the metadata schema

Determining which metadata schema to use in the conversion process is a current concern for information professionals. For example, a number of workshops addressed this topic as part of digital asset management (DAM) (Digital Clarity Group, 2019). Greenberg (2001) completed a quantitative analysis of metadata platforms that can be used for the various elements of image retrieval, including Dublin Core (DC), VRA Core, REACH, and EAD. The listed metadata schemas were contrasted with “regard to granularity and the distribution of types of elements into the four identified classes: discovery, use, authentication and administration” [7]. It was determined that in the “discovery” class, which included such elements as creator, title and subject while aiding in the identification and retrieval process, Dublin Core was one of the most successful schemas (90 percent and over) (Greenberg, 2001).

Since MARC records are generally more thorough and include a larger amount of information than DC elements, the conversion process of visual resources is relatively simple (Mooney-Gonzales, 2014). Barta-Norton (2004) pointed out that users seeking out visual resources often use “straightforward descriptors such as proper names, time, and place” [8]. However, per Park (2006), some ambiguities are inherent among some DC metadata elements, such as the Elements of Type, Format, Source and Relation, which may limit semantic interoperability. Despite this, as visual image repositories strive to both maintain and increase usage, Dublin Core remains a strong contender for visual resource conversions.

The move towards XML

In addition to Dublin Core, there have been arguments for using XML to transition out of MARC and into other standards (Guenther, 2004; Zeng and Qin, 2008). The Multilingual Web-LT Working Group successfully created the Internationalization Tag Set, Version 2.0 (World Wide Web Consortium [W3C], 2013). ITS 2.0 generally focuses on HTML and XML-based formats, and offers added concepts that have been created to facilitate the automated development and processing of multilingual Web content, which will most likely play a part in MARC visual resource conversions. Guenther (2004) states that “XML allows for an easy path for converting existing records and flexibility in display and further transformations” [9]. Further, Zeng and Qin (2008) suggest the usage of MARC XML to transform MARC records into XML. In addition, it has been noted that MARC’s “inflexible output process” is one of several of its limitations whilst, “in contrast, XML-based processing can easily produce different output forms” [10].
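As a minimal sketch of how an XML encoding can carry parallel descriptions in several languages, the Python example below attaches the standard xml:lang attribute to Dublin Core elements. The record wrapper element, the function name and the sample titles are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("dc", DC_NS)


def build_record(titles, subjects):
    """Build a record from lists of (language_tag, text) pairs."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element_name, values in (("title", titles), ("subject", subjects)):
        for lang, text in values:
            node = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
            # xml:lang states the language of this element's content.
            node.set(f"{{{XML_NS}}}lang", lang)
            node.text = text
    return root


# Illustrative values only.
record = build_record(
    titles=[("en", "Workers at lunch"), ("es", "Trabajadores almorzando")],
    subjects=[("en", "Railroad workers")],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because each value declares its own language, a display layer can select the title that matches the user's preference without altering the underlying record.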


However, the challenges of transitioning metadata from XML into the Resource Description Framework (RDF) “point to the differences in perspective and the change in thinking that is required to manage such a move” [11]. Farnel (2015) notes that moving from XML to RDF is not simply a conversion between encoding formats; it is a translation between two different ways of organizing knowledge. It is a labor-intensive process requiring proper skills and training; therefore, this type of transition is ripe for further issues relating to interoperability and ambiguity.
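The change in thinking can be illustrated with a minimal sketch. Where XML nests values inside a record, RDF states each fact as a separate subject, predicate and object statement. The Python example below writes language-tagged titles in the N-Triples syntax; the image URI and the sample values are hypothetical, and the escaping covers only the most common cases.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri, predicate_uri, values):
    """Return one N-Triples line per (language_tag, text) pair."""
    return [
        f'<{subject_uri}> <{predicate_uri}> "{escape_literal(text)}"@{lang} .'
        for lang, text in values
    ]


# Hypothetical resource URI and illustrative titles.
lines = to_ntriples(
    "http://example.org/image/1",
    DC_TITLE,
    [("en", 'The "old" friends'), ("es", "Los viejos amigos")],
)
for line in lines:
    print(line)
```

Each statement stands alone and names its subject explicitly, so the decisions about what the subject is and how it is identified, which XML leaves implicit in the record structure, must be made deliberately during conversion.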

To illustrate this area of need, with little training or supervision, while still a graduate student, one of the authors of the current work assisted in the conversion process of long-term, digitized archives from one metadata platform to another; specifically, CDWA to Dublin Core. As a burgeoning information professional and archivist, finding crosswalks was the first order of business (Getty Research Institute, 2017). What was noted during the conversion process was that elements that one may view as similar were actually not exact matches. For example, in CDWA, the creation-creator-identity-nationality/culture/race elements did not align with the subject element in Dublin Core. This realization, in turn, resulted in some confusion and frustration. Additionally, she observed that several elements from CDWA were condensed into one element or repeated in Dublin Core. At times, there were simply no direct matches for many CDWA elements in Dublin Core, such as many CDWA elements in the Title or Names and Creation sections.
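The two difficulties described above, several source elements condensed into one target and source elements with no target at all, can be surfaced before conversion begins by auditing the crosswalk itself. The sketch below assumes a hypothetical and much simplified CDWA to Dublin Core table; the element names are abbreviated for illustration and do not reproduce the published crosswalk.

```python
from collections import defaultdict

# Hypothetical, simplified crosswalk; None marks a source element
# with no direct Dublin Core target.
CDWA_TO_DC = {
    "Title Text": "title",
    "Title Type": None,
    "Creator Description": "creator",
    "Creator Identity": "creator",
    "Creator Nationality/Culture/Race": None,
    "Creation Date": "date",
    "Subject Matter Indexing Terms": "subject",
    "Descriptive Note Text": "description",
}


def audit_crosswalk(crosswalk):
    """Return (targets fed by several sources, sources with no target)."""
    sources_by_target = defaultdict(list)
    unmatched = []
    for source, target in crosswalk.items():
        if target is None:
            unmatched.append(source)
        else:
            sources_by_target[target].append(source)
    collapsed = {
        target: sources
        for target, sources in sources_by_target.items()
        if len(sources) > 1
    }
    return collapsed, unmatched


collapsed, unmatched = audit_crosswalk(CDWA_TO_DC)
print(collapsed)
print(unmatched)
```

An audit of this kind does not resolve the mismatches, but it gives the person doing the conversion a list of decisions to document rather than a series of surprises.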

More questions than answers

Throughout the conversion process, although converting the CDWA records into VRA Core had been suggested, it became apparent that Dublin Core has broader uses, both in institutions and businesses. As the process progressed, one of the authors was confronted with several questions: How much should and could someone manipulate the wording that was in the initial record to fit best into the DC record? Furthermore, is there a kind of provenance to the original descriptive wording as there is in archiving? In other words, if there is chaos, should one try to impose order on it? If there is order, wouldn’t one try to keep the files or organization set upon the materials intact? Finally, for the CDWA category of 18.1, “Descriptive Note Text”, how might one describe the image? Would the same descriptive wording be used, even if the wording does not perhaps describe the item well? How might a polyglot individual describe the image? How might the semantic or lexical differences present themselves in the narrative text? Lastly, would those differences change the discussion of the work or provide a different perspective based on his/her cultural identity?




Due to the many-faceted qualities of visual data, it appears to be unlikely that we will be able to create a solitary “perfect” metadata schema for visual materials of different genres; therefore, improving descriptive clarity and interoperability are the only viable options for ensuring effective and efficient user access to visual resources. As “well-structured and carefully mapped metadata plays a fundamental role in reaching this goal” (Woodley, 2016), the work of the Internationalization Tag Set 2.0 may play a strong role in this work (World Wide Web Consortium [W3C], 2013). As Mandal (2018) suggests, perhaps open source software, such as Omeka, would help to provide better access to multilingual images and documents.

In addition, despite the fact that visual images have little text associated with them, the majority of human users of visual information sources prefer text to search and retrieve them (O’Connor, et al., 1999; Roberts, 2001; Rorissa, 2010; Tjondronegoro, et al., 2009). Therefore, ultimately, addressing multilingualism is of strong concern as clearly the need has fast surpassed present capabilities. In our view, the primary focus of visual resource management should be on addressing multilingualism across metadata platforms; otherwise, all of the efforts and standards to improve interoperability and reduce lexical/semantic/orthographic ambiguity may not be fruitful.

In summary, the current work fills a crucial gap in the literature as there is an apparent lack of research on the ambiguity and interoperability of metadata schemas for visual resources. It is our hope that the paper communicates the unique needs of the visual resource management community and its users to be addressed by researchers and professionals whose focus and/or responsibilities involve a diverse set of visual resource users. Further, we suggest ongoing discussion and research to fully investigate and examine possibilities that would allow for a richer, more comprehensive, and global visual resource experience.


About the authors

Denise Russo is an Instructional Content Developer in the Department of Distance and Online Learning at Hudson Valley Community College, SUNY. She has taught at the graduate level at UNC-Greensboro, Adelphi University and St. John’s University as a Clinical Supervisor and Lecturer with expertise in linguistics and English as a Foreign Language. She is also serving as a Trustee for the Board of the Farmingdale-Bethpage Historical Society. In addition, she is a Project Archivist via her company, InfoSpeak Consulting. Her research interests focus on the historical, multilingual and legal aspects of metadata creation and usage.
E-mail: d [dot] russo [at] hvcc [dot] edu

Dr. Abebe Rorissa is an Associate Professor and Associate Dean for Faculty Development in the College of Emergency Preparedness, Homeland Security, and Cybersecurity, University at Albany, SUNY. Prior to his current position, he worked in Ethiopia, Lesotho, and Namibia as a lecturer and systems/automation librarian. He has also consulted for academic institutions, national governments, and international organizations on various topics including information and communication technologies as well as information organization. He has published extensively in leading international journals such as the Journal of the Association for Information Science & Technology — JASIS&T, Information Processing & Management, and Government Information Quarterly. He was a member of the Board of the Association for Information Science and Technology (ASIS&T) and its Executive Committee.
Direct comments to: arorissa [at] albany [dot] edu



Notes

1. McCann and Ravas, 2010, p. 41.

2. Rorissa, 2007, p. 16.

3. Joseph, 2013, p. 162.

4. Goodman, 1997, p. 57.

5. Al-muzaini, et al., 2018, p. 69.

6. Caplan and Guenther, 1996, p. 46.

7. Greenberg, 2001, p. 918.

8. Barta-Norton, 2004, p. 27.

9. Guenther, 2004, slide 38.

10. Zeng and Qin, 2008, p. 25.

11. Hardesty, 2016, p. 51.



References

M. Abdalla and G. Hirst, 2017. “Cross-lingual sentiment analysis without (good) translation,” Proceedings of the Eighth International Joint Conference on Natural Language Processing, pp. 506–515, and at, accessed 14 September 2020.

ALA-LC Romanization Tables, 2017, at, accessed 13 October 2019.

H.M. Alfawareh and S. Jusoh, 2011. “Resolving ambiguous entity through context knowledge and fuzzy approach,” International Journal on Computer Science and Engineering, volume 3, number 1, pp. 410–422, and at, accessed 27 December 2020.

H.A. Al-muzaini, T.N. Al-yahya and B. Hafida, 2018. “Automatic Arabic image captioning using RNN-LSTM-based language model and CNN,” International Journal of Advanced Computer Science and Applications, volume 9, number 6.
doi:, accessed 27 December 2020.

M.K. Anjali and A.P. Babu, 2014. “Ambiguities in natural language processing,” International Journal of Innovative Research in Computer and Communication Engineering, volume 2, number 5, pp. 392–394, and at, accessed 13 September 2020.

S. Bai and S. An, 2018. “A survey on automatic image caption generation,” Neurocomputing, volume 311, pp. 291–304.
doi:, accessed 27 December 2020.

N. Barta-Norton, 2004. “MARC applications for description of visual materials,” Journal of Educational Media and Library Science, volume 42, number 1, pp. 21–36, and at, accessed 27 December 2020.

J.E. Beaudoin and J.E. Brady, 2011. “Finding visual information: A study of image resources used by archaeologists, architects, art historians, and artists,” Art Documentation, volume 30, number 2, pp. 24–36.
doi:, accessed 27 December 2020.

BCR’s CDP Digital Imaging Best Practices Working Group, 2008. “BCR’s CDP digital imaging best practices,” version 2.0, at, accessed 26 November 2019.

A.R. Balamurali, A. Joshi and P. Bhattacharyya, 2011. “Harnessing WordNet senses for supervised sentiment classification,” Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1,081–1,091, and at, accessed 14 September 2020.

A. Bia, J. Malonda and J. Gómez, 2005. “Automating multilingual metadata vocabularies,” DCMI International Conference on Dublin Core and Metadata Applications, at, accessed 11 September 2020.

T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau (editors), 2008. “Extensible markup language (XML) 1.0,” Fifth edition (26 November), at, accessed 11 September 2020.

J. Brownlee, 2019. “Promise of deep learning for natural language processing” (7 August), at, accessed 14 September 2020.

P. Caplan and R. Guenther, 1996. “Metadata for Internet resources: The Dublin Core metadata elements set and its mapping to USMARC,” Cataloging & Classification Quarterly, volume 22, numbers 3–4, pp. 43–58.
doi:, accessed 27 December 2020.

K. Casey, 2019. “How to explain natural language processing (NLP) in plain English” (17 September), at, accessed 13 September 2020.

L.M. Chan and M.L. Zeng, 2006a. “Metadata interoperability and standardization — A study of methodology, part I: Achieving interoperability at the schema level,” D-Lib Magazine, volume 12, number 6, at, accessed 26 November 2019.

L.M. Chan and M.L. Zeng, 2006b. “Metadata interoperability and standardization — A study of methodology, part II: Achieving interoperability at the record and repository levels,” D-Lib Magazine, volume 12, number 6, at, accessed 27 November 2019.

Digital Clarity Group, 2019. “Speaking engagements, webinars, and webcasts,” at, accessed 13 October 2019.

Dublin Core Metadata Initiative (DCMI), 2017. “Advancing metadata practice: Quality, openness, interoperability: 2017 Proceedings of the International Conference on Dublin Core and Metadata Applications,” at, accessed 13 October 2019.

G. Dunsire, 2016. “Towards an internationalization of RDA management and development,” Italian Journal of Library, Archives and Information Science, volume 7, number 2, pp. 308–331, and at, accessed 13 October 2019.

B. Elam, 2007. “Readiness or avoidance: E-resources and the art historian,” Collection Building, volume 26, number 1, pp. 4–6.
doi:, accessed 27 December 2020.

D. Elliott, S. Frank and E. Hasler, 2015. “Multilingual image description with neural sequence models,” arXiv:1510.04709 (18 November), at, accessed 11 September 2020.

M. El-Sherbini, 2017. “Improving resource discoverability for non-Roman language collections” (22 October), at, accessed 13 October 2019.

S. Farnel, 2015. “Metadata at a crossroads: Shifting ‘from strings to things’ for Hydra North,” slideshow presented at the Open Repositories, at, accessed 13 October 2019.

N.D. Flagler, E. Ney, B.W. Mahler and R.R. Maronpot, n.d. “Publication images: The good, the bad and the impossible,” National Institute of Environmental Health Sciences, at, accessed 28 November 2019.

Getty Research Institute, 2017. “Metadata standards crosswalk,” at, accessed 13 October 2019.

N. Goodman, 1997. Languages of art: An approach to a theory of symbols. Second edition. Indianapolis, Ind.: Hackett.

J. Greenberg, 2001. “A quantitative categorical analysis of metadata elements in image-applicable metadata schemas,” Journal of the American Society for Information Science and Technology, volume 52, number 11, pp. 917–924.
doi:, accessed 27 December 2020.

T.R. Gregory, 2007. “Under-served or under-surveyed: The information needs of studio art faculty in the southwestern United States,” Art Documentation, volume 26, number 2, pp. 57–66.
doi:, accessed 27 December 2020.

R. Guenther, 2004. “New and traditional descriptive formats in the library environment,” presentation at International Conference on Dublin Core and Metadata Applications, at, accessed 27 December 2020.

S. Gupta and N. Khade, 2020. “BERT Based Multilingual Machine Comprehension in English and Hindi,” arXiv:2006.01432 (2 June), at, accessed 14 September 2020.

J.L. Hardesty, 2016. “Transitioning from XML to RDF: Considerations for an effective move towards linked data and the semantic Web,” Information Technology and Libraries, volume 35, number 1, pp. 51–64.
doi:, accessed 27 December 2020.

R. Horev, 2019. “XLM — Enhancing BERT for cross-lingual language model: Cross-lingual language model pretraining” (11 February), at, accessed 14 September 2020.

E. Hovy, N. Ide, R. Frederking, J. Mariani and A. Zampolli (editors), 1999. “Multilingual information management: Current levels and future abilities,” at, accessed 27 December 2020.

H. Iyer, 2007. “Core competencies for visual resources management,” at, accessed 13 October 2019.

P. Jackson, 2020. “Understanding understanding and ambiguity in natural language,” Procedia Computer Science, volume 169, pp. 209–225.
doi:, accessed 14 September 2020.

A. Jaffe, 2017. “Generating image descriptions using multilingual data,” Proceedings of the Conference on Machine Translation (WMT), Volume 2: Shared Task Papers, pp. 458–464, and at, accessed 11 September 2020.

K. Jorna and S. Davies, 2001. “Multilingual thesauri for the modern world — no ideal solution?” Journal of Documentation, volume 57, number 2, pp. 284–295.
doi:, accessed 14 September 2020.

L.E. Joseph, 2013. “Image and figure quality: A study of Elsevier’s Earth and planetary sciences electronic journal back file package,” Library Collections, Acquisitions, & Technical Services, volume 30, numbers 3–4, pp. 162–168.
doi:, accessed 13 October 2019.

S. Jusoh, 2018. “A study on NLP applications and ambiguity problems,” Journal of Theoretical and Applied Information Technology, volume 96, number 6, pp. 1,486–1,499, and at, accessed 27 December 2020.

A. Karpathy and F-F. Li, 2013. “Automated image captioning with ConvNets and recurrent nets,” slides at, accessed 14 September 2020.

A.R. Kenney and O.Y. Rieger, 2000. Moving theory into practice: Digital imaging for libraries and archives. Mountain View, Calif.: Research Libraries Group.

I. Khurram, M.M. Fraz, M. Shahzad and N.M. Rajpoot, 2020. “Dense-CaptionNet: A sentence generation architecture for fine-grained description of image semantics,” Cognitive Computation.
doi:, accessed 12 September 2020.

A. Krizhevsky, I. Sutskever and G.E. Hinton, 2012. “ImageNet classification with deep convolutional neural networks,” at, accessed 27 December 2020.

T. Leclair, 2020. “CAT scan of lungs,” image, Interventional Pulmonary, Intermountain Medical Center, Murray, Utah.

T. Levy, 2003. “Moving theory into practice: Digital imaging for libraries and archives,” Journal of Documentation, volume 59, number 4, pp. 497–498.
doi:, accessed 13 October 2019.

S. Mandal, 2018. “Designing and developing digital content management system through open source software and standards,” International Journal of Next Generation Library and Technologies, volume 4, number 1, at, accessed 13 October 2019.

S. McCann and T. Ravas, 2010. “Impact of image quality in online art history journals: A user study,” Art Documentation, volume 29, number 1, pp. 41–48.
doi:, accessed 27 December 2020.

L. Meyer, 2008. “Moving theory into practice: Digital imaging tutorial, Cornell University Library/Department of Preservation & Conservation 2000,” Microform & Imaging Review, volume 30, number 1.
doi:, accessed 27 December 2020.

B. Mooney-Gonzales, 2014. “The conversion of MARC metadata for online visual resource collections: A case study of tactics, challenges and results,” Library Philosophy and Practice, at, accessed 28 November 2019.

T. Neugebauer, 2005. “Metadata for image resources,” at, accessed 28 November 2019.

S. Niininen, S. Nykyri and O. Suominen, 2017. “The future of metadata: Open, linked, and multilingual — the YSO case,” Journal of Documentation, volume 73, number 3, pp. 451–465.
doi:, accessed 27 December 2020.

B.C. O’Connor, M.K. O’Connor and J.M. Abbas, 1999. “User reactions as access mechanism: An exploration based on captions for images,” Journal of the American Society for Information Science and Technology, volume 50, number 8, pp. 681–697.

S.O. Oluga, 2010. “Ambiguity in human communication: Causes, consequences and resolution,” Malaysian Journal of Communication, volume 26, number 1, pp. 37–46, and at, accessed 29 November 2019.

Oracle, 2005. “Comparing Unicode character sets for database and datatype solutions,” at, accessed 29 November 2019.

J.-R. Park, 2006. “Semantic interoperability and metadata quality: An analysis of metadata item records of digital image collections,” Knowledge Organization, volume 33, number 1, pp. 20–33.

J.-R. Park and Y. Tosaka, 2010. “Metadata quality control in digital repositories and collections: Criteria, semantics, and mechanisms,” Cataloging & Classification Quarterly, volume 48, number 8, pp. 696–715.
doi:, accessed 27 December 2020.

Princeton University, 2010. “What is WordNet?” at, accessed 11 September 2020.

A.A. Raj and H. Maganti, 2009. “Transliteration based search engine for multilingual information access,” Proceedings of CLIAWS3, Third International Cross Lingual Information Access Workshop, pp. 12–20, and at, accessed 29 November 2019.

H.E. Roberts, 2001. “A picture is worth a thousand words: Art indexing in electronic databases,” Journal of the American Society for Information Science and Technology, volume 52, number 11, pp. 911–916.
doi:, accessed 27 December 2020.

J. Rodriguez, 2017. “Ambiguity in natural language processing” (23 March), at, accessed 13 September 2020.

A. Rorissa, 2010. “A comparative study of Flickr tags and index terms in a general image collection,” Journal of the American Society for Information Science and Technology, volume 61, number 11, pp. 2,230–2,242.
doi:, accessed 27 December 2020.

A. Rorissa, 2007. “Image retrieval: Benchmarking visual information indexing and retrieval systems,” Bulletin of the American Society for Information Science and Technology, volume 33, number 3, pp. 15–17.
doi:, accessed 29 November 2019.

S.-A. Rueschemeyer and M.G. Gaskell (editors), 2018. Oxford handbook of psycholinguistics. Second edition. Oxford: Oxford University Press.
doi:, accessed 27 December 2020.

F. Sasaki, 2018. “Metadata for the multilingual Web,” In: G. Rehm, F. Sasaki, D. Stein and A. Witt (editors). Language technologies for a multilingual Europe: TC3 III. Berlin: Language Science Press, pp. 43–53, and at, accessed 13 October 2019.

N. Sharif, U. Nadeem, S.A.A. Shah, M. Bennamoun and W. Liu, 2020. “Vision to language: Methods, metrics and datasets,” In: G.A. Tsihrintzis and L.C. Jain (editors). Machine learning paradigms: Advances in deep learning-based technological applications. Cham, Switzerland: Springer, pp. 9–62.
doi:, accessed 17 September 2020.

S.L. Shreeves, J. Riley and L. Milewicz, 2006. “Moving towards shareable metadata,” First Monday, volume 11, number 8, at, accessed 5 December 2019.
doi:, accessed 5 December 2019.

K. Simonyan and A. Zisserman, 2015. “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (10 April), at, accessed 14 September 2020.

T. Singh, n.d. “Natural language processing with spaCy in Python,” at, accessed 14 September 2020.

B. Smus, 2012. “High DPI images for variable pixel densities” (22 August), at, accessed 13 October 2019.

spaCy, n.d. “Adding languages,” at, accessed 12 September 2020.

L. Specia, S. Frank, K. Sima’an and D. Elliott, 2016. “A shared task on multimodal machine translation and crosslingual image description,” Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pp. 543–553, and at, accessed 12 September 2020.

Stony Brook University Libraries, 2018. “Research & subject guides: 2016–2017 strategic plan goals, cataloging & metadata,” at, accessed 13 October 2019.

S. Tsutsui and D. Crandall, 2017. “Using artificial tokens to control languages for multilingual image caption generation,” arXiv:1706.06275 (20 June), at, accessed 11 September 2020.

D. Tjondronegoro, A. Spink and B.J. Jansen, 2009. “A study and comparison of multimedia Web searching: 1997–2006,” Journal of the American Society for Information Science and Technology, volume 60, number 9, pp. 1,756–1,768.
doi:, accessed 5 December 2019.

A.M. Turing, 1950. “Computing machinery and intelligence,” Mind, volume 59, number 236, pp. 433–460.
doi:, accessed 14 September 2020.

R. Visick, J. Hendrickson and C. Bowman, 2006. “Seeking information during the creative process — A pilot study of artists,” University of Washington, unpublished research paper.

J. Weagley, E. Gelches and J.-R. Park, 2010. “Interoperability and metadata quality in digital video repositories: A study of Dublin Core,” Journal of Library Metadata, volume 10, number 1, pp. 37–57.
doi:, accessed 27 December 2020.

D. Weich, 2006. “Annie Leibovitz puts down camera, talks,” Powell’s (10 October), at, accessed 12 September 2020.

M.S. Woodley, 2016. “Introduction to metadata: Connecting people and information,” at, accessed 13 October 2019.

C. Wootton, 2007. Developing quality metadata: Building innovative tools and workflow solutions. Boston, Mass.: Focal Press.

World Wide Web Consortium (W3C), 2013. “Internationalization tag set (ITS) version 2.0” (29 October), at, accessed 13 October 2019.

A. Wu and J. Chen, 2019. “Sustaining multilinguality: Case studies of two American multilingual libraries,” iConference 2019 Proceedings, at, accessed 13 October 2019.
doi:, accessed 27 December 2020.

S. Wu and M. Dredze, 2019. “Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT,” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the Ninth International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), at, accessed 12 September 2020.

Yale University Library, n.d. “Searching in non-Roman script,” at, accessed 13 October 2019.

M.L. Zeng and J. Qin, 2008. Metadata. New York: Neal-Schuman.


Editorial history

Received 6 December 2019; revised 19 September 2020; accepted 24 September 2020.

To the extent possible under law, this work is dedicated to the public domain.

The need for addressing multilingualism, ambiguity and interoperability for visual resources management across metadata platforms
by Denise Russo and Abebe Rorissa.
First Monday, Volume 26, Number 1 - 4 January 2021