Introduction: This article continues from the previous one. If you haven't read the first part, please click "10 minutes to show you the differences and connections between databases, data warehouses, data lakes, and data centers (1)" first; then we will start the second part. Please correct me if there are any inaccuracies.

1. Data Lake

So far, the data warehouse and the data lake have been described and compared in terms of order and openness. Now let's take a closer look at the data lake.

1. The origin of the data lake

The data lake exists mainly to solve the problem of storing raw data from every domain of the enterprise.
The word "lake" in its name vividly expresses what a data lake is. Enterprise production data (both structured and unstructured), historical business data, temporary data, and data returned from IoT devices, mobile applications, third parties, and traditional devices can all flow into the data lake through the "water pipes" formed by ETL tools. Examples include the mobile phone signaling data and GPS positioning data that the author has worked with in the past.
Such data does not come with a predefined data structure, which means it can be stored first, without any structured processing and without specifying in advance what analysis will be performed; data practitioners explore and experiment with it in follow-up work. Structured and unstructured data were mentioned above, so what exactly are they? Below we explain the difference and connection between the two.
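
Before moving on, the "store first, decide on structure later" idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular data lake product works: the "lake" is just a temporary local directory, and the helper names `land_raw` and `read_json_lines`, the file names, and the sample GPS records are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, name: str, payload: str) -> Path:
    """Store a payload exactly as received: no parsing, no validation."""
    target_dir = lake_dir / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_text(payload, encoding="utf-8")
    return target


def read_json_lines(path: Path, fields: list[str]) -> list[dict]:
    """Apply a structure only at read time by keeping the requested fields."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing
        rows.append({field: record.get(field) for field in fields})
    return rows


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)

    # Hypothetical GPS records; the second one carries an extra field
    gps_payload = (
        '{"device": "a1", "lat": 31.2, "lon": 121.5}\n'
        '{"device": "b2", "lat": 39.9, "lon": 116.4, "speed": 12}\n'
    )
    gps_path = land_raw(lake, "gps", "sample.jsonl", gps_payload)

    # Unstructured free text lands next to it, also unchanged
    land_raw(lake, "support", "note.txt", "Customer reported a weak signal.")

    # Two different analyses choose two different structures afterwards
    positions = read_json_lines(gps_path, ["device", "lat", "lon"])
    speeds = read_json_lines(gps_path, ["device", "speed"])
    print(positions)
    print(speeds)
```

Nothing about the records had to be declared when they were written; the position view and the speed view were each decided only when the file was read, which is the kind of later exploration the paragraph above refers to.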