Data mining
Data mining is the extraction of previously unknown, useful information from large datasets. With the rise of internet phishing, the propagation of confidential information over the web faces critical threats. Consequently, privacy-preserving data publishing (PPDP) has become a significant concern when exchanging sensitive information over a network, and scientists have developed various privacy preservation approaches to address these issues. One such technique is k-anonymity, designed to avert the disclosure of confidential information and thereby assist in user de-identification. This paper provides an analysis of an article on PPDP and the k-anonymity approach it proposes to mitigate data mining privacy concerns.
Selected Article
Background of the article
The article “On-line spatial and temporal k-anonymity” focuses on the security of users' locations over the web. It notes that the increasing use of GPS-equipped smartphones has popularized location-based services (LBS), which have, in turn, raised concerns about location privacy (Zhang, 2017). Correspondingly, spatial and temporal k-anonymity has often been applied as a privacy preservation approach to mitigate these concerns. However, it cannot prevent inference attacks based on broad analyses of temporal k-anonymity datasets (Zhang, 2017). The book of Romans 14:1, “As for the weak in faith, accept them, but not to argue over opinions,” teaches embracing the weaknesses that people may have (Longenecker, 2016). Similarly, the scientists accepted the limitation of the approach and continued to refine it, which led to the creation of NOSTK.
Preliminary Information
Spatial-temporal anonymity is a type of k-anonymization that cloaks the identity of the requestor along with the location and time of the request. Accordingly, pseudo-identifiers are typically substituted for real identifiers to conceal the requestor's information (Aggarwal & Philip, 2008; Fung et al., 2010; Zhang, 2017). The technique is broadly employed in LBS snapshot and continuous queries. A snapshot query SnAS is expressed as ⟨CR, TC, UP⟩, where UP = ⟨U1, …, Uk⟩ is a set of k false requestor names (Zhang, 2017). That is, if a record contains a value U, then at least k−1 other records also include the same value (Fung et al., 2010). Besides, CR = ⟨Cell1, …, Cellm⟩ is the cloaking region comprising the m grid cells surrounding the k users' locations, and TC = ⟨TI1, …, TIn⟩ is the temporal cloak spanning n intervals.
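The k-anonymity property described above can be sketched in a few lines. This is a minimal illustration, not the article's implementation; the class and function names are assumptions:

```python
from collections import Counter
from typing import NamedTuple

class SnapshotQuery(NamedTuple):
    """Model of a snapshot anonymity set SnAS = (CR, TC, UP)."""
    cr: tuple  # cloaking region: m grid cells (Cell1, ..., Cellm)
    tc: tuple  # temporal cloak: n intervals (TI1, ..., TIn)
    up: tuple  # pseudo-identifiers: k false names (U1, ..., Uk)

def is_k_anonymous(values, k):
    """Check the k-anonymity property: every quasi-identifier value
    appears in at least k records, so any record shares its value
    with at least k-1 others."""
    return all(count >= k for count in Counter(values).values())

published = ["cell_3", "cell_3", "cell_3", "cell_7", "cell_7", "cell_7"]
print(is_k_anonymous(published, 3))  # True: every value occurs three times
print(is_k_anonymous(published, 4))  # False
```

The check mirrors the definition directly: each equivalence class of identical values must hold at least k records for the dataset to satisfy k-anonymity.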
A continuous query is given by CoAS = ⟨SnAS1, …, SnASs⟩, where SnASi (1 ≤ i ≤ s) is a snapshot query dataset. A sequence rule extracted from the sequence of LBS cloaking areas is defined as SeR = {⟨gc1 → … → gcn⟩, ⟨Supp, Conf⟩}, where gci (1 ≤ i ≤ n) is the grid cell of a request (Zhang, 2017). The grid cells form the antecedents and descendants of SeR, while Supp and Conf denote its support and confidence, respectively. The intersection of gcn with a privacy-sensitive region (PSR) yields a privacy-sensitive sequence rule (PSSR).
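Support, confidence, and the PSSR condition can be illustrated with a short sketch. The helper names and the standard support/confidence definitions here are assumptions drawn from general sequence-rule mining, not from the article itself:

```python
def contains_subsequence(traj, seq):
    """True if seq appears as an ordered (not necessarily contiguous)
    subsequence of the trajectory traj."""
    it = iter(traj)
    return all(cell in it for cell in seq)

def support(trajectories, seq):
    """Supp: fraction of trajectories containing the cell sequence."""
    hits = sum(contains_subsequence(t, seq) for t in trajectories)
    return hits / len(trajectories)

def confidence(trajectories, antecedent, consequent):
    """Conf: Supp(antecedent + consequent) / Supp(antecedent)."""
    return support(trajectories, antecedent + consequent) / support(trajectories, antecedent)

def is_pssr(consequent, psr_cells):
    """A rule is privacy-sensitive when its final cell intersects a PSR."""
    return any(cell in psr_cells for cell in consequent)

trajs = [["g1", "g2", "g5"], ["g1", "g2", "g3"], ["g4", "g2", "g5"]]
print(support(trajs, ["g1", "g2"]))             # 2 of 3 trajectories
print(confidence(trajs, ["g1", "g2"], ["g5"]))  # 0.5
print(is_pssr(["g5"], {"g5", "g6"}))            # True
```

An adversary who can mine a high-confidence rule whose consequent lies in a PSR is exactly the threat that NOSTK's hiding step is meant to block.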
The proposed method
NOSTK comprises two aspects. The first is the system architecture, which consists of two phases: the on-line approach and the off-line evaluation (Zhang, 2017). In the on-line phase, the anonymity application continuously generates anonymized datasets on the LBS anonymous-service command until all the confidential sequence rules are hidden (Zhang, 2017), thereby preventing an adversary from mining sensitive sequence rules. The final published sequence will thus consist only of the recently generated anonymity database and the initial sequence database, eliminating the destination-location prediction attack.
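The on-line loop can be summarized as a simple control structure. Everything here is an illustrative assumption; the article does not specify these function names:

```python
def online_phase(incoming_queries, generate_anonymous, sensitive_rules_hidden):
    """Sketch of the on-line phase: keep generating anonymity datasets
    for incoming LBS queries until every sensitive sequence rule is
    hidden in the published output."""
    published = []
    for query in incoming_queries:
        published.append(generate_anonymous(query))  # per-query anonymization
        if sensitive_rules_hidden(published):        # stop once all PSSRs are hidden
            break
    return published

# Toy stand-ins: anonymization is a tagging step, and hiding is declared
# complete after two published datasets.
result = online_phase(["q1", "q2", "q3"],
                      lambda q: "anon_" + q,
                      lambda pub: len(pub) >= 2)
print(result)  # ['anon_q1', 'anon_q2']
```

The point of the sketch is the termination condition: publication stops as soon as the sensitive rules can no longer be mined from the output, rather than after a fixed number of queries.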
On the other hand, the off-line analysis stores the anonymity databases that continuous LBS queries generate. It also collects privacy-sensitive rules by intersecting the published temporal sequence rules with specified privacy-sensitive areas (Zhang, 2017). The roles of the on-line service and the off-line examination in restricting inference attacks on destination locations follow a biblical concept of shielding against attack. The book of Psalms 32:7, “You are my hiding place; you will shield me from trouble and surround me with deliverance,” informs individuals that God is their protector and therefore no harm will come to them (Anderson, 2020). Accordingly, the two phases help hide the location of requestors, preserving user identity and assuring users of communication privacy.
The other aspect involves avoidance and generalization. The avoidance, or suppression, concept generates anonymized on-line datasets by removing an attribute value entirely (Aggarwal & Philip, 2008; Zhang, 2017). Avoiding grid cells near a PSSR is therefore significant when establishing the cloaking section of an anonymity dataset. Conversely, the generalization principle comprises three sub-types, categorized according to the type of grid cell containing the requestor's location. First, if the cell is a non-PSSR (nPSSR) grid, then minimal generalization is applied (Zhang, 2017): the generated cloaking section should include as few nearby nPSSR grid cells as possible. Second, if the current cell grid includes privacy-susceptible regions, it is considered a PSR, and maximum generalization is applied. Third, if the current cell is a PSSR, then conventional generalization is implemented (Zhang, 2017): the cloaking region is generated while ignoring PSSR grid cells near the current cell. Generally, the principle maps attribute values to a range to reduce the representation granularity (Aggarwal & Philip, 2008).
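The three-way choice of generalization sub-type reduces to a lookup on the requestor's cell type. A minimal sketch, with the enum and function names assumed for illustration:

```python
from enum import Enum

class CellType(Enum):
    NPSSR = "nPSSR"  # cell with no privacy-sensitive sequence rule
    PSR = "PSR"      # cell inside a privacy-sensitive region
    PSSR = "PSSR"    # cell of a privacy-sensitive sequence rule

def pick_generalization(cell_type: CellType) -> str:
    """Select the generalization sub-type for the requestor's cell."""
    return {
        CellType.NPSSR: "minimal",      # as few nearby nPSSR cells as possible
        CellType.PSR: "maximum",        # expand the cloaking region widely
        CellType.PSSR: "conventional",  # grow while ignoring nearby PSSR cells
    }[cell_type]

print(pick_generalization(CellType.PSR))  # maximum
```

Keeping the mapping explicit makes the policy easy to audit: each cell type has exactly one generalization strategy.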
How NOSTK Addresses and Mitigates Data Mining Privacy Concerns
The mitigation approach follows a workflow for generating anonymity datasets. First, when an LBS user requests anonymity, step S-1 acknowledges the request, after which S-2 determines the user's latest location through a spatial matching technique (Zhang, 2017). A check, C-1, then counts other requestors in the current cell to test whether the k-value is satisfied; if so, the process proceeds directly to S-8, which generates the anonymity dataset. Otherwise, the process moves to S-3, which inspects the grid cells adjacent to the current cell in the anticlockwise direction. S-4 then removes any PSSR cells from the result, which is where the avoidance approach is applied (Zhang, 2017). The succeeding step assesses whether the current cell grid is a PSR or PSSR cell and then picks the corresponding generalization approach.
The neighboring grid cells selected under that principle are then sorted in ascending, descending, or regular order at S-6; these orders correspond to the maximum, minimum, and normal generalization principles, respectively (Zhang, 2017). A second check, C-2, then examines whether all the sorted cells have been applied. If not, the process moves to a first S-7 and back to S-3, which adds the sorted cells one after the other to grow the cloaking section, each time re-checking whether the k-value is satisfied. If the outcome at C-2 is positive, the process goes to S-8, where the anonymized dataset is generated (Zhang, 2017). However, if all the sorted grid cells are exhausted without generating an anonymity dataset, the process moves to a second S-7, which creates a failure result, and S-9 returns the outcome. The organized anonymity generation process above conforms to the biblical notion of order. The book of 1st Corinthians 14:40, “however, all things should be done in order and decently,” illuminates that individuals need to be organized in their work, which generates a sense of control (Ciampa & Rosner, 2020). Correspondingly, the organized anonymity generation process ensures a controlled approach to data anonymization.
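The S-1 through S-9 workflow above can be condensed into a single routine. This is a hypothetical sketch of the control flow only; the parameter names and data layout are assumptions, and the neighbor list is taken as already sorted by the chosen generalization order:

```python
def generate_anonymity(user_cell, requestor_counts, sorted_neighbors, pssr_cells, k):
    """Sketch of the S-1..S-9 workflow.
    requestor_counts: mapping cell -> number of requestors in it.
    sorted_neighbors: neighboring cells in anticlockwise, generalization-sorted order.
    Returns the cloaking region as a cell list, or None on failure."""
    cloak = [user_cell]  # S-1/S-2: accept the request and locate the user

    def k_satisfied():
        return sum(requestor_counts.get(c, 0) for c in cloak) >= k

    if k_satisfied():                                            # C-1
        return cloak                                             # S-8: anonymity dataset
    candidates = [c for c in sorted_neighbors if c not in pssr_cells]  # S-4: avoidance
    for c in candidates:                                         # S-6/S-7: add cells one by one
        cloak.append(c)
        if k_satisfied():                                        # C-2 + k-value check
            return cloak                                         # S-8
    return None                                                  # second S-7/S-9: failure

counts = {"c0": 1, "c1": 1, "c2": 2, "c3": 1}
print(generate_anonymity("c0", counts, ["c1", "c2", "c3"], {"c3"}, 4))
# ['c0', 'c1', 'c2']: 1 + 1 + 2 = 4 requestors, and PSSR cell c3 is avoided
print(generate_anonymity("c0", counts, ["c1"], set(), 5))  # None: failure result
```

Note how the two exits mirror the workflow: success (S-8) fires as soon as the k-value is met, while exhausting the sorted cells without meeting it produces the failure result returned at S-9.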
Summary
Data mining is the extraction of unknown or useful information from large datasets. Various privacy preservation approaches have been designed to mitigate data mining privacy issues, and k-anonymity is one of them. The limitation of spatial-temporal k-anonymity in handling inference attacks on large spatial-temporal datasets led to the creation of NOSTK. The article under review describes how NOSTK can mitigate data mining privacy issues, comprehensively illustrating how the approach provides data anonymity upon a user's request through a step-by-step process.