Seven application scenarios of NLP annotation

1. What is NLP annotation

NLP (Natural Language Processing) acts as a translator between machine language and human language, building a bridge that makes human-computer communication possible.

From the perspective of natural language, NLP can be divided into two parts: natural language understanding and natural language generation, that is, understanding text and generating text.

Natural language understanding is the study of language, its context, and its forms, covering phonology, morphology, syntax, semantics, and pragmatics. In practice, the understanding process often has to overcome difficulties such as linguistic diversity, ambiguity, robustness, knowledge dependence, and the need to take context into account.

Natural language generation automatically produces text from structured data, following a text planning → sentence planning → realization pipeline. That is, it first plans the content and structure of the text to be generated, then assembles sentences according to acquired or learned sentence models, refines them according to the grammar of the target language, and finally outputs the generated text.
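The three stages above can be sketched as a tiny template-based generator. This is only an illustration under stated assumptions: the weather record, the function names, and the sentence templates are all hypothetical, and real systems typically learn their sentence models rather than hard-coding them.

```python
# Hypothetical structured input: a single weather record.
record = {"city": "Shanghai", "high_c": 31, "low_c": 24, "condition": "cloudy"}


def plan_text(data):
    """Text planning: choose which facts to mention and in what order."""
    return [
        ("condition", data["city"], data["condition"]),
        ("temperature", data["low_c"], data["high_c"]),
    ]


def plan_sentences(messages):
    """Sentence planning: map each planned message onto a sentence template."""
    templates = {
        "condition": "the weather in {0} is {1}",
        "temperature": "temperatures range from {0} to {1} degrees Celsius",
    }
    return [templates[kind].format(*args) for kind, *args in messages]


def realize(clauses):
    """Realization: apply surface rules (capitalization, final punctuation)."""
    return " ".join(c[0].upper() + c[1:] + "." for c in clauses)


def generate(data):
    """Run the full pipeline: text planning, sentence planning, realization."""
    return realize(plan_sentences(plan_text(data)))


print(generate(record))
# The weather in Shanghai is cloudy. Temperatures range from 24 to 31 degrees Celsius.
```

Keeping each stage in its own function mirrors the pipeline: the content selection can change without touching the templates, and the surface rules can be swapped for another target language.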

2. Application scenarios of NLP

2.1 Information extraction: Extracting important information, such as time, place, person, and event, from a specified range of text saves people considerable time and cost. For example, automatic summarization uses a computer to extract a summary from the original document, with the aim that the result completely and accurately reflects the document's central content.
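As a rough illustration of extracting time, place, and person from text, the sketch below uses a date pattern and small lookup lists. The lists `PLACES` and `PEOPLE`, the date format, and the sample sentence are assumptions made for the example; a production system would more likely rely on a trained named-entity recognizer built from annotated data.

```python
import re

# Hypothetical gazetteers standing in for a trained entity recognizer.
PLACES = {"Shanghai", "Beijing"}
PEOPLE = {"Li Lei", "Han Meimei"}

# Assumes dates are written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract(text):
    """Return the dates, places, and people mentioned in the text."""
    return {
        "time": DATE_PATTERN.findall(text),
        "place": sorted(p for p in PLACES if p in text),
        "person": sorted(p for p in PEOPLE if p in text),
    }


sample = "Li Lei met Han Meimei in Shanghai on 2023-05-01."
print(extract(sample))
# {'time': ['2023-05-01'], 'place': ['Shanghai'], 'person': ['Han Meimei', 'Li Lei']}
```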

2.2 Text generation: Given different constraints or input content, generate text from data (data-to-text) or from other text (text-to-text).

2.3 Intelligent Q&A: A question expressed in natural language is first analyzed to some degree (for example, entity linking, relation identification, and construction of a logical form). Once the analysis is complete, candidate answers are retrieved from the knowledge base and ranked, and the best one is returned as the reply. For example, the automated customer service widely used in e-commerce answers many basic, repetitive questions, filtering them out so that human agents can serve customers better.
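The retrieve-and-rank step can be sketched with a small FAQ lookup. Note that this example swaps the semantic analysis described above for plain token overlap (Jaccard similarity), which is far simpler than entity linking or logical forms; the knowledge base entries, the function names, and the threshold are hypothetical.

```python
import re

# Hypothetical FAQ knowledge base: stored question -> answer.
KNOWLEDGE_BASE = {
    "How long does shipping take?": "Orders usually ship within 3 days.",
    "How do I return an item?": "Open your order page and request a return.",
    "Which payment methods are accepted?": "We accept cards and bank transfer.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, candidate):
    """Jaccard overlap between the token sets of two questions."""
    q, c = tokenize(query), tokenize(candidate)
    return len(q & c) / len(q | c) if q | c else 0.0


def answer(query, threshold=0.2):
    """Rank stored questions by similarity; return the best answer or None."""
    ranked = sorted(KNOWLEDGE_BASE, key=lambda k: score(query, k), reverse=True)
    best = ranked[0]
    return KNOWLEDGE_BASE[best] if score(query, best) >= threshold else None


print(answer("how can I return my item"))
# Open your order page and request a return.
```

Returning `None` below the threshold models the hand-off to a human agent when no stored question is close enough.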

2.4 Machine translation: Automatically translating input text in a source language into text in another language. This is the best-known application of natural language processing, with examples such as Baidu Translate and Google Translate.

2.5 Text mining: Includes text clustering, classification, and sentiment analysis, with the mined information and knowledge presented through a visual, interactive interface.

2.6 Public opinion analysis: By collecting and processing massive amounts of information, the system automatically analyzes online public opinion, identifies which topics are currently hot, and assesses how those topics spread and where they are heading, so that online public opinion can be responded to in a timely manner.
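A minimal sketch of the hot-topic and trend part of this analysis is shown below. It assumes posts have already been tagged with topics (which is exactly what annotation would provide) and only compares mention counts between two periods; it does not model propagation paths. The data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical posts grouped by day; each post is a list of topic tags.
posts_by_day = {
    "day1": [["price"], ["delivery"], ["price", "quality"]],
    "day2": [["delivery"], ["delivery", "price"], ["delivery"], ["quality"]],
}


def topic_counts(posts):
    """Count how many posts mention each topic (once per post)."""
    return Counter(topic for post in posts for topic in set(post))


def hot_topics(posts, top_n=1):
    """Return the top_n most frequently mentioned topics."""
    return [topic for topic, _ in topic_counts(posts).most_common(top_n)]


def trend(previous, current):
    """Change in mention count per topic between two periods."""
    before, after = topic_counts(previous), topic_counts(current)
    return {t: after[t] - before[t] for t in sorted(set(before) | set(after))}


print(hot_topics(posts_by_day["day2"]))
# ['delivery']
print(trend(posts_by_day["day1"], posts_by_day["day2"]))
# {'delivery': 2, 'price': -1, 'quality': 0}
```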

2.7 Knowledge map: Also known as a scientific knowledge map, and called knowledge domain visualization or knowledge domain mapping in library and information science. It is a series of graphics that show the development process and structural relationships of knowledge, using visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and its interrelationships.

3. NLP text annotation method

As you might expect, the first stage, obtaining the data, is a headache. There are many dimensions and types, and the comments on each product may differ. We therefore need to step back and work out the commonalities and basic processing principles from a higher level.

We can approach this from three dimensions.

1. General principles: the basic principles that must be followed throughout the labeling process.

For example: the principle of simplicity (minimum principle) means using the smallest-granularity segmentation during word segmentation. "Peace Hotel" can be kept whole as Peace Hotel or split into Peace/Hotel; under this principle we split it into Peace/Hotel.

2. Special definitions: how special cases are handled during labeling.

For example: certain proper nouns encountered during word segmentation are not split.

3. Labeling requirements: a description of the specific labeling procedure.

Within the labeling requirements, we consider two kinds of distinction.

a. Part of speech.

For example: decide which categories to label so that they best fit the task. In this case we want to analyze the user's experience across the whole process of using the product, so what might be involved, and what information will the comments contain? First, sentiment is a category that must exist. Next, which feature words express the customer's situation? This leads to the two core categories: feature words and sentiment words.
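The three dimensions can be sketched together in a toy annotator: whitespace splitting stands in for smallest-granularity segmentation, a protected list models the special definition for proper nouns, and two small lexicons assign the feature-word and sentiment-word labels. Every list, label name, and example sentence here is a hypothetical stand-in for what a real annotation guideline would define.

```python
# Hypothetical guideline resources; a real project would define its own.
PROTECTED_TERMS = ["New York"]  # special definition: never split these
FEATURE_WORDS = {"battery", "screen", "delivery"}
SENTIMENT_WORDS = {"great": "positive", "slow": "negative", "dim": "negative"}

JOINER = "\x00"  # placeholder assumed not to occur in normal text


def segment(text):
    """Split at the smallest granularity, keeping protected terms whole."""
    for term in PROTECTED_TERMS:
        text = text.replace(term, term.replace(" ", JOINER))
    return [tok.replace(JOINER, " ") for tok in text.split()]


def annotate(text):
    """Label each token as a feature word, a sentiment word, or other (O)."""
    labels = []
    for token in segment(text):
        key = token.lower().strip(".,!?")
        if key in FEATURE_WORDS:
            labels.append((token, "FEATURE"))
        elif key in SENTIMENT_WORDS:
            labels.append((token, "SENTIMENT:" + SENTIMENT_WORDS[key]))
        else:
            labels.append((token, "O"))
    return labels


print(segment("Peace Hotel"))
# ['Peace', 'Hotel']
print(segment("New York hotel"))
# ['New York', 'hotel']
print(annotate("The battery is great"))
# [('The', 'O'), ('battery', 'FEATURE'), ('is', 'O'), ('great', 'SENTIMENT:positive')]
```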
