# 350. Intersection of Two Arrays II

## Description

Given two integer arrays nums1 and nums2, return an array of their intersection. Each element in the result must appear as many times as it shows in both arrays and you may return the result in any order.

Example 1:

Input: nums1 = [1,2,2,1], nums2 = [2,2]
Output: [2,2]


Example 2:

Input: nums1 = [4,9,5], nums2 = [9,4,9,8,4]
Output: [4,9]
Explanation: [9,4] is also accepted.


Constraints:

• 1 <= nums1.length, nums2.length <= 1000
• 0 <= nums1[i], nums2[i] <= 1000

• What if the given array is already sorted? How would you optimize your algorithm?
• What if nums1's size is small compared to nums2's size? Which algorithm is better?
• What if elements of nums2 are stored on disk, and the memory is limited such that you cannot load all elements into the memory at once?

## Solutions

### HashSet

The previous expansion of 349 Intersection of Two Arrays, the difference is that this question allows to return repeated numbers, and returns as many as possible.

HashMap to establish the mapping between the characters in nums1 and the number of occurrences, and then traverse the nums2 array.

If the number of the current character in the HashMap is greater than 0, add this character to the result, and then the corresponding value of the HashMap will be decremented by 1.

### Two pointers

Sort the two arrays first, and then use two pointers to point to the starting positions of the two arrays,

• If the numbers pointed to by the two pointers are equal, it is stored in the result, and both pointers are incremented by 1,
• If the number pointed to by the first pointer is large, the second pointer is incremented by 1, and vice versa

### 1. If the given array is already sorted:

If both nums1 and nums2 are sorted, you can use a two-pointer technique to improve the time complexity to O(n + m), where n and m are the lengths of the two arrays. This approach eliminates the need for additional space required for hash maps in the unsorted case.

### 2. If nums1’s size is small compared to nums2’s size:

If one array is significantly smaller than the other, it’s more efficient to use a hash map for the smaller array. This reduces the space complexity and potentially the time complexity since you iterate over and hash only the smaller array and then iterate over the larger array to check for intersections.

### 3. If elements of nums2 are stored on disk, and the memory is limited:

In this scenario, external sorting and the two-pointer technique can be used.

Process nums1 into a hash map or sort it if not already sorted. For nums2, use external sorting algorithms that break the data into chunks that fit into memory, sort each chunk, and then merge them. This way, you can sort nums2 without loading it entirely into memory. After sorting, you can stream nums2 from disk and use the two-pointer technique or binary search (if nums1 is hashed) to find intersections without exceeding memory limits.

For massive datasets that cannot fit into memory (like nums2 in this case), consider using database solutions designed for handling large-scale data or big data technologies like Hadoop or Spark, which provide mechanisms for distributed data processing.

# 3.1 More on external sorting - memory is limited

External sorting is a class of algorithms used to handle massive amounts of data that do not fit into a computer’s main memory (RAM) but instead are stored on a slower, external memory such as a hard disk drive (HDD) or solid-state drive (SSD). This is particularly useful for sorting large files that exceed the capacity of the system’s RAM. The most common external sorting algorithm is the external merge sort, which involves the following steps:

### 1. Divide:

• The large dataset is divided into smaller chunks, each of which can fit into the available RAM.
• These chunks are read into memory one at a time.

### 2. Sort:

• Each chunk is sorted in memory using an efficient internal sorting algorithm, like quicksort or mergesort.
• After sorting, the sorted chunks are written back to the external storage.

### 3. Merge:

• The sorted chunks are then merged into a single, fully sorted file.
• This merge process is done in a way that always maintains a manageable number of chunks in memory, regardless of the total size of the dataset.

### Example of External Merge Sort:

Assume you have 1GB of data to sort, but only 256MB of RAM available.

Divide:

• Break the 1GB file into 4 chunks of 256MB each.
• Read the first 256MB chunk into RAM.

Sort:

• Sort this chunk using an in-memory sorting algorithm (like mergesort).
• Write the sorted chunk back to disk.
• Repeat this process for the remaining chunks.

Merge:

• Read the first part of each sorted chunk into RAM and merge them, creating a new sorted sequence of data. Given the RAM limitation, you might only be able to read a small portion of each chunk at a time.
• Write the merged sequence back to disk.
• If there was more than one value from each chunk in RAM, repeat the merge process, reading the next values from the chunks as needed until all chunks are fully merged into a single sorted sequence.

Efficiency Tips:

• To reduce disk I/O, which is typically the bottleneck in external sorting, it’s essential to optimize how data is read from and written to disk. For example, using larger buffers can reduce the number of disk accesses.
• Multiway merges, where more than two chunks are merged at once, can significantly reduce the total number of merging passes needed.

Applications:

• External sorting is crucial in databases for sorting large tables and in big data applications where datasets far exceed the available memory.
• Tools and frameworks that handle big data, like Hadoop and Spark, implement their versions of external sorting to distribute the sorting process across multiple machines, further scaling the ability to sort massive datasets efficiently.
• class Solution {
public int[] intersect(int[] nums1, int[] nums2) {
Map<Integer, Integer> counter = new HashMap<>();
for (int num : nums1) {
counter.put(num, counter.getOrDefault(num, 0) + 1);
}
List<Integer> t = new ArrayList<>();
for (int num : nums2) {
if (counter.getOrDefault(num, 0) > 0) {
counter.put(num, counter.get(num) - 1);
}
}
int[] res = new int[t.size()];
for (int i = 0; i < res.length; ++i) {
res[i] = t.get(i);
}
return res;
}
}

• class Solution {
public:
vector<int> intersect(vector<int>& nums1, vector<int>& nums2) {
unordered_map<int, int> counter;
for (int num : nums1) ++counter[num];
vector<int> res;
for (int num : nums2) {
if (counter[num] > 0) {
--counter[num];
res.push_back(num);
}
}
return res;
}
};

• class Solution:
def intersect(self, nums1: List[int], nums2: List[int]) -> List[int]:
counter = Counter(nums1)
res = []
for num in nums2:
if counter[num] > 0:
res.append(num)
counter[num] -= 1
return res

############

class Solution(object):
def intersect(self, nums1, nums2):
"""
:type nums1: List[int]
:type nums2: List[int]
:rtype: List[int]
"""
ans = []
nums1.sort()
nums2.sort()
i = j = 0
while i < len(nums1) and j < len(nums2):
if nums1[i] < nums2[j]:
i += 1
elif nums1[i] > nums2[j]:
j += 1
else:
ans.append(nums1[i])
i += 1
j += 1

return ans


• func intersect(nums1 []int, nums2 []int) []int {
counter := make(map[int]int)
for _, num := range nums1 {
counter[num]++
}
var res []int
for _, num := range nums2 {
if counter[num] > 0 {
counter[num]--
res = append(res, num)
}
}
return res
}

• function intersect(nums1: number[], nums2: number[]): number[] {
const map = new Map<number, number>();
for (const num of nums1) {
map.set(num, (map.get(num) ?? 0) + 1);
}

const res = [];
for (const num of nums2) {
if (map.has(num) && map.get(num) !== 0) {
res.push(num);
map.set(num, map.get(num) - 1);
}
}
return res;
}


• /**
* @param {number[]} nums1
* @param {number[]} nums2
* @return {number[]}
*/
var intersect = function (nums1, nums2) {
const counter = {};
for (const num of nums1) {
counter[num] = (counter[num] || 0) + 1;
}
let res = [];
for (const num of nums2) {
if (counter[num] > 0) {
res.push(num);
counter[num] -= 1;
}
}
return res;
};


• class Solution {
/**
* @param Integer[] $nums1 * @param Integer[]$nums2
* @return Integer[]
*/
function intersect($nums1,$nums2) {
$rs = []; for ($i = 0; $i < count($nums1); $i++) {$hashtable[$nums1[$i]] += 1;
}
for ($j = 0;$j < count($nums2);$j++) {
if (isset($hashtable[$nums2[$j]]) &&$hashtable[$nums2[$j]] > 0) {
array_push($rs,$nums2[$j]);$hashtable[$nums2[$j]] -= 1;
}
}
return \$rs;
}
}

• public class Solution {
public int[] Intersect(int[] nums1, int[] nums2) {
HashSet<int> hs1 = new HashSet<int>(nums1.Concat(nums2).ToArray());
Dictionary<int, int> dict = new Dictionary<int, int>();
List<int> result = new List<int>();

foreach (int x in hs1) {
dict[x] = 0;
}

foreach (int x in nums1) {
if (dict.ContainsKey(x)) {
dict[x] += 1;
} else {
dict[x] = 1;
}
}

foreach (int x in nums2) {
if (dict[x] > 0) {
dict[x] -=1;
}
}

return result.ToArray();
}
}


• use std::collections::HashMap;
impl Solution {
pub fn intersect(nums1: Vec<i32>, nums2: Vec<i32>) -> Vec<i32> {
let mut map = HashMap::new();
for num in nums1.iter() {
*map.entry(num).or_insert(0) += 1;
}

let mut res = vec![];
for num in nums2.iter() {
if map.contains_key(num) && map.get(num).unwrap() != &0 {
map.insert(num, map.get(&num).unwrap() - 1);
res.push(*num);
}
}
res
}
}