Despite the heavy workload, it is still necessary to take time to learn new techniques and upgrade myself so that I will not be fooled easily. Today, I followed a tutorial on data processing for NLP. It is mainly about using Spyder to extract words from the conversations of more than 1000 movies.

This is my first time using Spyder. In the past, I usually used Jupyter Notebook because school assessments required it. Spyder feels much more powerful than Jupyter because it shows the variables in a pane at the top right-hand corner, which means you do not need to print variables to check whether your code is correct. The only drawback is that you need to select the part of the code you want to run, whereas running code piece by piece is built into Jupyter.

For the data processing, there are many steps to implement, and they are difficult to absorb in a short period. I believe I need to watch the video again to understand more. The main idea is to first separate each conversation into question and answer parts, then use a dictionary and a list to collect the keys and the content, and then remove special characters to make the classification process easier. Also, some special tokens are added to mark which group a word belongs to.
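To help myself remember the steps described above, here is a minimal sketch of how they might look in Python. It is not the tutorial's actual code: the sample lines, the `clean_text` and `build_pairs` helpers, and the token names `<SOS>` and `<EOS>` are all my own assumptions for illustration.

```python
import re

# Hypothetical sample data: line id -> spoken text (not from the real dataset).
id_to_line = {
    "L1": "Hi, how are you?",
    "L2": "I'm fine... thanks!",
    "L3": "Where are you going?",
}

# Each conversation is an ordered list of line ids (also made up).
conversations = [["L1", "L2", "L3"]]


def clean_text(text):
    """Lowercase the text, expand one contraction, and drop special characters."""
    text = text.lower()
    text = re.sub(r"i'm", "i am", text)
    text = re.sub(r"[^a-z0-9\s]", "", text)   # keep only letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


def build_pairs(conversations, id_to_line):
    """Treat every line as a question and the line after it as its answer."""
    questions, answers = [], []
    for conv in conversations:
        for i in range(len(conv) - 1):
            questions.append(clean_text(id_to_line[conv[i]]))
            answers.append(clean_text(id_to_line[conv[i + 1]]))
    return questions, answers


# Assumed token names that mark the start and end of an answer.
SOS, EOS = "<SOS>", "<EOS>"

questions, answers = build_pairs(conversations, id_to_line)
tagged_answers = [f"{SOS} {a} {EOS}" for a in answers]

for q, a in zip(questions, tagged_answers):
    print(q, "->", a)
```

In this sketch the dictionary looks up the text by line id, the lists hold the aligned question and answer pairs, and the tokens are added only to the answers. Whether the tutorial does it in the same way is something I still need to check.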

I will spend more time understanding the reasons behind this process.

The following links are very useful for understanding RNN & LSTM.

LSTM RNN