HDU2328 Corporate Identity (String Hashing)

2023-10-30

Problem Description

Beside other services, ACM helps companies to clearly state their “corporate identity”, which includes company logo but also other signs, like trademarks. One of such companies is Internet Building Masters (IBM), which has recently asked ACM for a help with their new identity. IBM do not want to change their existing logos and trademarks completely, because their customers are used to the old ones. Therefore, ACM will only change existing trademarks instead of creating new ones. After several other proposals, it was decided to take all existing trademarks and find the longest common sequence of letters that is contained in all of them. This sequence will be graphically emphasized to form a new logo. Then, the old trademarks may still be used while showing the new identity. Your task is to find such a sequence.

Input

The input contains several tasks. Each task begins with a line containing a positive integer N, the number of trademarks (2 ≤ N ≤ 4000). The number is followed by N lines, each containing one trademark. Trademarks will be composed only from lowercase letters, the length of each trademark will be at least 1 and at most 200 characters. After the last trademark, the next task begins. The last task is followed by a line containing zero.

Output

For each task, output a single line containing the longest string contained as a substring in all trademarks. If there are several strings of the same length, print the one that is lexicographically smallest. If there is no such non-empty string, output the words “IDENTITY LOST” instead.

Sample Input

3
aabbaabb
abbababb
bbbbbabb
2
xyz
abc
0

Sample Output

abb
IDENTITY LOST

Problem Summary

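Given n strings, find their longest common substring. If several common substrings share the maximum length, output the lexicographically smallest one.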
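
As a point of comparison for small inputs, the statement can be checked directly without any hashing. The following is only a reference sketch: the function name bruteForceLcs is hypothetical and does not appear in the original solution, and it is far too slow for the full limits (N up to 4000).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reference solver for small inputs (hypothetical helper, not the original code).
// Tries every substring of the first string and keeps the longest one that
// occurs in all strings, breaking ties by lexicographic order.
// Returns "" when no non-empty common substring exists.
std::string bruteForceLcs(const std::vector<std::string>& words) {
    assert(!words.empty());
    const std::string& first = words[0];
    std::string best;
    for (std::size_t i = 0; i < first.size(); ++i) {
        for (std::size_t len = 1; i + len <= first.size(); ++len) {
            std::string cand = first.substr(i, len);
            bool inAll = true;
            for (const std::string& w : words) {
                if (w.find(cand) == std::string::npos) { inAll = false; break; }
            }
            // If cand is missing somewhere, every longer substring starting at i is too.
            if (!inAll) break;
            if (cand.size() > best.size() ||
                (cand.size() == best.size() && cand < best)) {
                best = cand;
            }
        }
    }
    return best;
}
```

On the first sample task this returns abb, and on the second it returns an empty string, which corresponds to printing IDENTITY LOST.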

Analysis

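This problem can be solved with string hashing.

Feasibility is monotone in the length: if the strings share a common substring of length L, they also share one of every shorter length. The answer length can therefore be found by binary search, and each step only has to decide whether a common substring of the candidate length exists.

To decide that, enumerate every substring of s1 with the candidate length and compare its hash with the hashes of all substrings of the same length in each of the other n-1 strings. If some substring of s1 finds a match in every one of the other n-1 strings, the candidate length is feasible.

Each such substring is then compared with the current answer: if the lengths differ, the longer one is kept; if they are equal, the lexicographically smaller one is kept.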
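
The two building blocks of this approach, substring hashes from prefix hashes and binary search on a monotone predicate, can be sketched in isolation. This is a minimal illustration under a few assumptions: the names PrefixHash and largestFeasible are hypothetical, the base 131 matches the solution's constant P, and arithmetic is taken modulo 2^64 through unsigned overflow, so two different substrings could in principle collide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Prefix polynomial hash with base 131 (hypothetical wrapper for illustration).
struct PrefixHash {
    std::vector<std::uint64_t> h;  // h[i]: hash of the first i characters
    std::vector<std::uint64_t> p;  // p[i]: 131^i modulo 2^64
    explicit PrefixHash(const std::string& s)
        : h(s.size() + 1, 0), p(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            h[i] = h[i - 1] * 131 + static_cast<unsigned char>(s[i - 1]);
            p[i] = p[i - 1] * 131;
        }
    }
    // Hash of the 1-indexed inclusive range [l, r].
    std::uint64_t get(std::size_t l, std::size_t r) const {
        assert(l >= 1 && l <= r && r < h.size());
        return h[r] - h[l - 1] * p[r - l + 1];
    }
};

// Largest len in [0, hi] with feasible(len) true, assuming length 0 counts as
// feasible and feasibility is monotone (true up to some length, false after).
template <class Pred>
int largestFeasible(int hi, Pred feasible) {
    int lo = 0;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;  // round up so that lo = mid makes progress
        if (feasible(mid)) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

Equal substrings produce equal hashes no matter which string or position they come from, which is what allows a substring of s1 to be compared against windows of the other strings in constant time per comparison.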

Code
#include <cstdio>
#include <cstring>
#include <string>

#define ULL unsigned long long
using namespace std;

const int N = 5e3 + 5, M = 205, P = 131;

int n;
char s[N][M];   // the n input strings, stored 1-indexed
ULL h[N][M];    // h[k][j]: hash of the first j characters of string k
ULL p[M];       // p[i] = P^i, modulo 2^64 via unsigned overflow
string ans;     // best answer found so far

// Hash of characters [l, r] of string k (standard prefix-hash formula).
ULL get(int k, int l, int r)
{
    return h[k][r] - h[k][l - 1] * p[r - l + 1];
}

// Returns true if some length-mid substring of s[1] occurs in every string.
// Matches are decided by hash equality only.
bool check(int mid)
{
    bool found = false;                        // is there any common substring of this length?
    int s1 = strlen(s[1] + 1);
    for (int i = 1; i + mid - 1 <= s1; i++)    // every length-mid substring of s[1]
    {
        bool st = true;                        // does this substring occur in all strings?
        ULL key = get(1, i, i + mid - 1);
        for (int j = 2; j <= n; j++)
        {
            bool flag = false;                 // does string j contain it?
            int len = strlen(s[j] + 1);
            for (int k = 1; k + mid - 1 <= len; k++)
                if (get(j, k, k + mid - 1) == key)
                {
                    flag = true;
                    break;
                }
            if (!flag) { st = false; break; }
        }
        if (st)
        {
            found = true;
            string ss(s[1] + i, mid);          // the matching substring of s[1]
            if (ss.size() > ans.size()) ans = ss;                        // keep the longer one
            else if (ss.size() == ans.size() && ans > ss) ans = ss;      // tie: keep the smaller one
        }
    }
    return found;
}

int main()
{
    // scanf is used for input because cin >> (s[i] + 1) on a char* no longer compiles in C++20.
    while (scanf("%d", &n) == 1 && n)
    {
        ans.clear();
        for (int i = 1; i <= n; i++) scanf("%203s", s[i] + 1);

        p[0] = 1;
        for (int i = 1; i < M; i++) p[i] = p[i - 1] * P;

        // Prefix hashes of every string.
        for (int i = 1; i <= n; i++)
        {
            int len = strlen(s[i] + 1);
            for (int j = 1; j <= len; j++) h[i][j] = h[i][j - 1] * P + s[i][j];
        }

        // Binary search on the length of the longest common substring.
        int l = 0, r = M - 1;
        while (r > l)
        {
            int mid = (l + r + 1) >> 1;
            if (check(mid)) l = mid;
            else r = mid - 1;
        }

        if (l) puts(ans.c_str());
        else puts("IDENTITY LOST");            // l == 0: the n strings share no common substring
    }
    return 0;
}
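
The check above rescans every other string for each substring of s[1]. One possible variant, which is not part of the original solution, is to collect the window hashes of each other string once per candidate length, sort them, and look up each window of the first string by binary search. The sketch below assumes the same base 131 with overflow arithmetic; the names windowHashes and commonOfLength are hypothetical, and like the original it trusts hash equality without verifying the characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hashes of all length-len windows of s, in position order (rolling hash, base 131).
std::vector<std::uint64_t> windowHashes(const std::string& s, std::size_t len) {
    std::vector<std::uint64_t> out;
    if (len == 0 || len > s.size()) return out;
    std::uint64_t top = 1;  // 131^(len-1), weight of the character leaving the window
    for (std::size_t i = 1; i < len; ++i) top *= 131;
    std::uint64_t cur = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i >= len) cur -= static_cast<unsigned char>(s[i - len]) * top;
        cur = cur * 131 + static_cast<unsigned char>(s[i]);
        if (i + 1 >= len) out.push_back(cur);
    }
    return out;
}

// Lexicographically smallest length-len substring of words[0] whose hash occurs
// in every other word; returns "" if there is none.
std::string commonOfLength(const std::vector<std::string>& words, std::size_t len) {
    assert(!words.empty());
    std::string best;
    if (len == 0 || len > words[0].size()) return best;
    std::vector<std::vector<std::uint64_t>> sorted;
    for (std::size_t j = 1; j < words.size(); ++j) {
        sorted.push_back(windowHashes(words[j], len));
        std::sort(sorted.back().begin(), sorted.back().end());
    }
    const std::vector<std::uint64_t> mine = windowHashes(words[0], len);
    for (std::size_t i = 0; i < mine.size(); ++i) {
        bool inAll = true;
        for (const auto& table : sorted) {
            if (!std::binary_search(table.begin(), table.end(), mine[i])) {
                inAll = false;
                break;
            }
        }
        if (!inAll) continue;
        std::string cand = words[0].substr(i, len);
        if (best.empty() || cand < best) best = cand;
    }
    return best;
}
```

A non-empty result from commonOfLength plays the role of check(mid) returning true, so it could be plugged into the same binary search on the length. Each lookup then costs a logarithmic number of comparisons instead of a scan over all windows of the other string.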