This worksheet is work-in-progress relating to an investigation into corporate governance pursued by the Open Knowledge Foundation's School Of Data, Global Witness, and OpenCorporates.
The investigation seeks to explore the extent to which:
The Global Witness report, Grave Secrecy, identifies several red flag indicators for identifying nominee companies:
In turn, this begs the question: which officers are, or are likley to be, nominees? That is, what red flags do we have for identifying an officer as a nominee? We also need to ask: are all (or is the majority of) current officers nominees?
The report also suggests ways of identifying company networks (p20). In particular, companies may be linked through:
In this worksheet, we will review some of the database queries and other computer assisted journalism techniques that might help us pursue these sorts of question. The queries have been tested over a subset of the OpenCorporates UK companies and officers data tables. The intention is to identify what sorts of queries we might be able to ask of the data and what sorts of tools we might be able to develop in order to support, or even to automate, the detection and investigative analysis of subjects of interest.
Given the size of the companies of officers datasets, there may be some trade-offs between running queries over complete datasets versus running specific queries that are rooted on a specific company or officer name, for example. This is particularly true when it comes to looking for partial matches or creating datasets that are based on partial matching. A data cleaning phase that introduces columns based on normalised names is one way of addressing deficiencies that are likely to arise when making use of ad hoc partial matching approaches.
Note that none of the queries have been optimised and some of them may benefit from the creation of new indexes. For running data mining/fishing queries, it may be possible to cast some of the queries into a map-reduce form if we move the data into a database that supports such operations.
Some fishing queries around the structure of company and officer networks may also be better cast as queries onto the data stored in a graph database.
Part of the challenge on the day will be to come up with some sort of process that describes the steps required for each particular type of investigation and chains together appropriate tools, techniques and queries in order to support that investigation.
We need to decide how the investigation will be run.
The approach taken in this worksheet has been to try to identify some query based tools that would be useful when running an interactive investigation, where we start to run queries and then pursue possibly interesting lines of inquiry in an interactive, conversation-with-the-data manner. This approach requires that we have access to a full data dump for the territories of interest or a database that allows us to run queries over that data.
A downside of having access to the data rather than a database is that we will need to provision a database and set up access/permissions to it. (This worksheet makes use of read permissions for running queries and whatever permissions are required to create, delete and update views and create indexes. It also makes use of fuzzystrmatch and trgm extensions.) We need to decide who will provision the database if we are going to support interactive data conversations.
Another approach would be to run queries in advance in order to identify companies of officers that are worthy of further investigation. Again, the queries introduced in this worksheet may contribute to scripts for identifying such likely subjects, although we need to be clear what sorts of criteria make subjects interesting, and how we build up these criteria profiles. However, this approach begs the question of what form that investigation will take if it is not conversational with the data. (An intermediate approach might make use of relatively small number of calls to the OC API to ask very specific questions returning not much data at each step. If we take this approach, we need to decide what mode of access to the API will be required: for example, via a programming lanugauge, via OpenRefine, etc. Also, what sorts of data top-ups would we require?) If we take this approach - of identifying subjects in advance - we need to clearly identify what queries we need to run in order to identify those subjects, what criteria make subjects possibly of interest, and so on. Some of the queries described in this worksheet may lead to automated flagging procedures (for example, identifying popular registered addresses) which in turn feed other automated reports (for example, identifying officers commonly associated with companies registrered at those addresses, along with the average duration they remain an officer). This approach boils down to coming up woth some intelligence rules for classifying/categorising subjects of interest, and possibly chaining them together. Do we have the inclination and time/resource to do this in time? And if so: who will run the queries (i.e. who will provision the database); who will write the query scripts; who will decide what feature profiles define possibly interesting subjects and who will come up with the recipes that can be used to define the query scripts that will identify subjects corresponding to these feature profiles? Or am I just overcomplicating things?!;-) At the moment I feel as if I am coming up with jigsaw pieces but arenlt sure how to put them together. And once they are put together, I have nowhere to run them (and they may be too inefficient to run on the full datasets anyway?)
(Only read this section if the technical backend stuff interests you... Otherwise, move straight on to the next section: Company Investigations - Overexposed Registered Addresses)
This iPython notebook contains a set of working notes for exploring company data exported from OpenCorporates. This section describes something of the technical setup used to support the queries used in the investigative part of the worksheet.
The data is loaded into a PostgreSQL database and then accessed from queries made via this notebook.
The PostgreSQL magic used to support queries requires:
pip install psycopg2
pip install ipython-sql
Now we can load the sql magic...
%load_ext sql
PostgreSQL can be configured with an additional user that provides access from iPython. For example, Creating a User With pgAdmin (use the Grant Wizard to add permissions at the schema level too...).
Create a user ipython with password ipython as a quick placeholder user role, within a group role rstudio which has been previously granted the appropriate permissions to. The database we want to access is okf_gw.
%sql postgresql://ipython:ipython@localhost/okf_gw
u'Connected: ipython@okf_gw'
Let's see what schemata are available in this database:
# For some reason, the following query (which should work?) returns an empty set for me?
# %sql select schema_name from information_schema.schemata;
%sql SELECT * from pg_namespace ;
7 rows affected.
nspname | nspowner | nspacl |
---|---|---|
pg_toast | 10 | None |
pg_temp_1 | 10 | None |
pg_toast_temp_1 | 10 | None |
pg_catalog | 10 | {postgres=UC/postgres,=U/postgres} |
public | 10 | {postgres=UC/postgres,=UC/postgres} |
information_schema | 10 | {postgres=UC/postgres,=U/postgres} |
ukcompanies | 10 | {postgres=UC/postgres,rstudio=U/postgres} |
The schema I'm interested in is ukcompanies. Let's see what tables it contains:
%sql SELECT table_name FROM information_schema.tables WHERE table_schema = 'ukcompanies';
3 rows affected.
table_name |
---|
basic |
company_export |
officer_export |
(This should also work? dbListTables(con))
company_export is a table exported from OpenCorporates that includes information about company registrations.
officer_export is a table exported from OpenCorporates that includes information about company officers.
Note: I have also loaded a table basic* that contains data from the Companies House current companies opendata download as used in Working With Large Text Files – Finding UK Companies by Postcode or Business Area.*
Let's look at what columns we have available to play with, by table:
%sql SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='ukcompanies';
76 rows affected.
table_name | column_name |
---|---|
basic | companyname |
basic | companynumber |
basic | regaddresscareof |
basic | regaddresspobox |
basic | regaddressaddressline1 |
basic | regaddressaddressline2 |
basic | regaddressposttown |
basic | regaddresscounty |
basic | regaddresscountry |
basic | regaddresspostcode |
basic | companycategory |
basic | companystatus |
basic | countryoforigin |
basic | dissolutiondate |
basic | incorporationdate |
basic | accountsaccountrefday |
basic | accountsaccountrefmonth |
basic | accountsnextduedate |
basic | accountslastmadeupdate |
basic | accountsaccountcategory |
basic | returnsnextduedate |
basic | returnslastmadeupdate |
basic | mortgagesnummortcharges |
basic | mortgagesnummortoutstanding |
basic | mortgagesnummortpartsatisfied |
basic | mortgagesnummortsatisfied |
basic | siccodesictext_1 |
basic | siccodesictext_2 |
basic | siccodesictext_3 |
basic | siccodesictext_4 |
basic | limitedpartnershipsnumgenpartners |
basic | limitedpartnershipsnumlimpartners |
basic | uri |
basic | previousname_1condate |
basic | previousname_1companyname |
basic | previousname_2condate |
basic | previousname_2companyname |
basic | previousname_3condate |
basic | previousname_3companyname |
basic | previousname_4condate |
basic | previousname_4companyname |
basic | previousname_5condate |
basic | previousname_5companyname |
basic | previousname_6condate |
basic | previousname_6companyname |
basic | previousname_7condate |
basic | previousname_7companyname |
basic | previousname_8condate |
basic | previousname_8companyname |
basic | previousname_9condate |
basic | previousname_9companyname |
basic | previousname_10condate |
basic | previousname_10companyname |
company_export | id |
company_export | company_number |
company_export | name |
company_export | company_type |
company_export | current_status |
company_export | incorporation_date |
company_export | dissolution_date |
company_export | ra_street_address |
company_export | ra_locality |
company_export | ra_region |
company_export | ra_postal_code |
company_export | ra_country |
officer_export | id |
officer_export | company_id |
officer_export | name |
officer_export | position |
officer_export | start_date |
officer_export | end_date |
officer_export | date_of_birth |
officer_export | inactive |
officer_export | occupation |
officer_export | nationality |
officer_export | address |
Let's run a simple query on the basic table:
result = %sql SELECT companyname,companynumber,regaddressaddressline1,regaddresspostcode, \
companycategory,companystatus,dissolutiondate,incorporationdate FROM ukcompanies.basic LIMIT 2;
result
2 rows affected.
companyname | companynumber | regaddressaddressline1 | regaddresspostcode | companycategory | companystatus | dissolutiondate | incorporationdate |
---|---|---|---|---|---|---|---|
! LTD | 08209948 | METROHOUSE 57 PEPPER ROAD | LS10 2RU | Private Limited Company | Active | 11/09/2012 | |
!BIG IMPACT GRAPHICS LIMITED | 07382019 | 335 ROSDEN HOUSE | EC1V 9AV | Private Limited Company | Active | 21/09/2010 |
We can get the data into a pandas dataframe to make manipulation a bit easier...
If we aren't running the notebook in an appropriate mode (I am running this as ipython notebook --pylab under Anaconda), we may need to bring in pandas explicitly:
#import pandas
Then we can invoke the dataframe method:
df=result.DataFrame()
df
companyname | companynumber | regaddressaddressline1 | regaddresspostcode | companycategory | companystatus | dissolutiondate | incorporationdate | |
---|---|---|---|---|---|---|---|---|
0 | ! LTD | 08209948 | METROHOUSE 57 PEPPER ROAD | LS10 2RU | Private Limited Company | Active | 11/09/2012 | |
1 | !BIG IMPACT GRAPHICS LIMITED | 07382019 | 335 ROSDEN HOUSE | EC1V 9AV | Private Limited Company | Active | 21/09/2010 |
For more powerful searching, include a couple of contributed extensions (in pgAdmin, for the appropriate database, select Extensions then New Extension): fuzzystrmatch and pg_trgm. For example, we could create a trigram index using something like this:
and then do a trigram based name search using the % similarity operator. The fuzzystrmatch library lets us use the lexeme based similarity operator @@.
To run clustering operations, we could try to use OpenRefine as a client using the Google/OpenRefine client that can be found at https://github.com/PaulMakepeace/refine-client-py. (See also Trevor Muñoz's tutorial posted at http://nbviewer.ipython.org/gist/trevormunoz/6265360.)
from google.refine import refine, facet
If the OpenRefine server is running on the traditional port, we can connect to it as follows:
server = refine.RefineServer()
orefine = refine.Refine(server)
For graphics, we'll probably use matplotlib:
import matplotlib.pyplot as plt
We might also experiment with the ggplot python package:
from ggplot import *
For UK companies, in many cases the postcode part of the address is available in a distinct column. Postcodes cover geographical areas, which means that several companies at distinct addresses may share the same address. But as a first pass, we can run a simple query to find postcodes within which there is a large number of registered companies.
In the US, zip codes represent much larger geographical areas. The zip+4 code gives better resolution, with the postal bar code (zip+4 plus two additional digits) giving resolution to the level of a delivery point. The way a zip+4 code is constructed for post office boxes, using digits from the box number to form the +4 characters, suggests we may be able to improve the resolution of postcodes by adding in digits that correspond to house number or PO Box number, for example.
Find the most popular UK postcodes used to register companies along with how many companies are registered with that postcode:
top50pc=%sql SELECT ra_postal_code, COUNT(*) FROM ukcompanies.company_export \
GROUP BY ra_postal_code ORDER BY COUNT(*) DESC;
top50pc.DataFrame()[:5]
78461 rows affected.
ra_postal_code | count | |
---|---|---|
0 | 3141 | |
1 | B18 6EW | 1141 |
2 | EC1V 4PY | 651 |
3 | RH16 3BW | 489 |
4 | RG7 8NN | 402 |
Just knowing that a particular postcode is associated with a large number of company registrations is one thing, but how useful is it? We also need to consider:
We might also want to limit searches based on companies with a specific activity type. Here are the possible company activity status types:
%sql SELECT DISTINCT current_status FROM ukcompanies.company_export
18 rows affected.
current_status |
---|
Liquidation |
current_status |
Removed |
Live But Receiver Manager On At Least One Charge |
In Administration/Receiver Manager |
Administrative Receiver |
Receivership |
Active Proposal To Strike Off |
Converted / Closed |
Administration Order |
Receiver Manager / Administrative Receiver |
Active |
Inliq |
Dissolved |
In Administration |
In Administration/Administrative Receiver |
Voluntary Arrangement |
For companies registered at a given postcode, which are the most common addresses associated with that postcode? (Might we be able to use these as a basis for clustering?)
postcode='B18 6EW'
addr = %sql SELECT * FROM (\
SELECT ra_street_address, COUNT(*) FROM ukcompanies.company_export\
WHERE ra_postal_code = :postcode GROUP BY ra_street_address \
ORDER BY COUNT(*) DESC ) AS tmp WHERE count >1;
addr
4 rows affected.
ra_street_address | count |
---|---|
69 GREAT HAMPTON STREET | 869 |
69 GREAT HAMPTON STREET, BIRMINGHAM | 200 |
69 GREAT HAMPTON ST | 55 |
69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS | 14 |
One thing to note is that in this case, these first address lines seem to represent the same place, even though they don't match exactly as strings. PostgreSQL lets us do fuzzy matching, but what if we want to make a start on cleaning the data and working with a cleaned up/normalised version of it?
Let's see if we can cluster these using OpenRefine. For the moment, I'm going to get the data into an OpenRefine project via a saved CSV file. But there may be a better way? Also, we probably need to be mindful about the projects that are created each time we run this scripts!
csv_file='test.csv'
addr.DataFrame().to_csv(csv_file,index=False)
p=orefine.new_project(project_file=csv_file)
fr=p.compute_facets(facet.TextFacet('ra_street_address'))
facets = fr.facets[0]
for k in sorted(facets.choices, key=lambda k: facets.choices[k].count, reverse=True):
print facets.choices[k].count, k
1 69 GREAT HAMPTON STREET 1 69 GREAT HAMPTON STREET, BIRMINGHAM 1 69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS 1 69 GREAT HAMPTON ST
pr=p.get_rows(limit=10)
pr.rows[0].row
[u'69 GREAT HAMPTON STREET', 869]
cluster_response = p.compute_clusters('ra_street_address',function='metaphone3')
for cluster in cluster_response[:5]:
print '\n'
for line in cluster:
print line
{'count': 1, 'value': u'69 GREAT HAMPTON ST'} {'count': 1, 'value': u'69 GREAT HAMPTON STREET'} {'count': 1, 'value': u'69 GREAT HAMPTON STREET, BIRMINGHAM'} {'count': 1, 'value': u'69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS'}
I'm not sure what the best way is of using this clustering information? We could try a mass edit?
p.mass_edit('ra_street_address', [{
'from': ['69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS', '69 GREAT HAMPTON STREET, BIRMINGHAM'],
'to': '69 GREAT HAMPTON STREET'
}])
{u'code': u'ok', u'historyEntry': {u'description': u'Mass edit 2 cells in column ra_street_address', u'id': 1389011017772, u'time': u'2014-01-06T12:23:22Z'}}
If we now look at the rows in the project file, we should see that some of them have been rewritten...
pr=p.get_rows(limit=10)
for r in pr.rows: print r.row
[u'69 GREAT HAMPTON STREET', 869] [u'69 GREAT HAMPTON STREET', 200] [u'69 GREAT HAMPTON ST', 55] [u'69 GREAT HAMPTON STREET', 14]
One thing to note here is that we have moved into the OpenRefine data structure space and away from the datastructures returned from the sql magic queries and the pandas representations derived therefrom; which could all get a little confusing?
Another difficulty comes in thinking clearly enough as to the automated steps we may want to take to run these clustering operations before further analysis, along with which dataspace to operate in if we do run such cleaning operations.
For companies registered at a given postcode, is there a preponderance of a particular active officer associated with these companies? To check this we need to get a list of the active companies registered at a particular postcode and use this list of companies as the basis of a search on active officers. Let's further limit ourselves to company secretaries.
os = %sql SELECT name, COUNT(*) FROM ukcompanies.officer_export AS o WHERE o.company_id IN (\
SELECT id FROM ukcompanies.company_export\
WHERE ra_postal_code = :postcode ) AND position LIKE '%secretary%' AND inactive !='true' \
GROUP BY name ORDER BY COUNT(*) DESC LIMIT 5
os
5 rows affected.
name | count |
---|---|
GO AHEAD SERVICE LIMITED | 209 |
GO AHEAD SERVICE LTD | 31 |
ON BEHALF SERVICE LIMITED | 9 |
GO AHEAD SERVICES LIMITED | 8 |
URSULA TUERKSCH | 2 |
What this report shows us is that a large number of companies registered at this particular postcode have the same active company secretary.
If we were to try to automate the detection of common officer names fulfilling a particular role at addresses associated with a postcode where there are a large number of companies registered, our algorithm may look something like this:
The aim of this automated approach might be to try to identify an agent acting on behalf of the majority of companies registered at a particular address.
If we know the name of an officer, it may be instructive to look for (common) registration addresses (or variants of a particular address) for companies associated with that officer. In order to do that, we need to get the company ids associated with an officer and then group by the registered addresses of those companies.
secretary='GO AHEAD SERVICE L%'
oa= %sql SELECT c.ra_street_address, COUNT(*) FROM ukcompanies.company_export AS c WHERE id IN \
(SELECT company_id AS id FROM ukcompanies.officer_export \
WHERE name LIKE :secretary AND position LIKE '%secretary%' AND inactive !='true' ) \
GROUP BY c.ra_street_address ORDER BY COUNT(*) DESC LIMIT 10
oa
5 rows affected.
ra_street_address | count |
---|---|
69 GREAT HAMPTON STREET | 177 |
69 GREAT HAMPTON STREET, BIRMINGHAM | 38 |
69 GREAT HAMPTON ST | 19 |
69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS | 4 |
ROMAN HOUSE, 69 GREAT HAMPTON STREET | 1 |
This table shows that a particular company secretary is classed as active in this role in their association with a large number of companies registered at the same address. Appropriate clustering operations are also likely to identify the different variants of this address within the same cluster.
It's easy enough to find the officers associated with a particular company by Open Corporates id:
occid='116469'
officersByCompany = %sql SELECT * FROM ukcompanies.officer_export WHERE company_id= :occid ;
officersByCompany.DataFrame()[:3]
57 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 33175015 | 116469 | ANDREW JAMES TAYLOR | secretary | 2000-10-18 | 1959-12-02 | true | PROPERTY MANAGER | BRITISH | 103 SANDY LANE, ST IVES, RINGWOOD, HAMPSHIRE B... | |
1 | 33175018 | 116469 | HILDE GILBERT | director | 1992-10-12 | 1918-09-21 | true | RETIRED | BRITISH | ALINGTON 25 MARLBOROUGH ROAD, BOURNEMOUTH, DOR... | |
2 | 33175019 | 116469 | NICHOLAS ROBERT HARDING | director | 1992-03-10 | 1960-04-04 | true | ACTUARY | BRITISH | 19 CRANESFIELD, SHERBORNE ST JOHN, BASINGSTOKE... |
It's a little more involved if we want to find the officers given the company number. (Note - from the tables we have, the territory/jurisdiction isn't listed - so what happens if there are clashes between company numbers in different jusrisdictions?)
companyNumber ='00804867'
officersByCompanyID = %sql SELECT o.name, o.id AS "OfficerID", c.name, c.company_number, c.id AS "OCID" \
FROM ukcompanies.officer_export AS o JOIN ukcompanies.company_export AS c ON c.id=o.company_id \
WHERE c.company_number = :companyNumber
officersByCompanyID.DataFrame()[:10]
57 rows affected.
name | OfficerID | name | company_number | OCID | |
---|---|---|---|---|---|
0 | ANDREW JAMES TAYLOR | 33175015 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
1 | HILDE GILBERT | 33175018 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
2 | NICHOLAS ROBERT HARDING | 33175019 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
3 | VALERIE ANN KEY | 33175022 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
4 | IVAN MARTCH | 33175023 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
5 | ELIZABETH PALMER | 33175025 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
6 | GORDON LESLIE SIMPSON | 33175027 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
7 | JANET TAYLOR | 33175028 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
8 | DAVE TREVETT | 33175030 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
9 | MARGARET CHRISTINA LENTON | 60970304 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 00804867 | 116469 |
However, suppose we want to run a more general query to find the number of officers who have had an association at any time with each company (for example, to find companies that appear to have had a large number of officers pass through their books):
gps=%sql SELECT c.company_number, c.name, o.company_id, COUNT(*) \
FROM ukcompanies.officer_export AS o JOIN ukcompanies.company_export AS c \
ON o.company_id=c.id \
GROUP BY c.company_number, c.name, o.company_id ORDER BY count DESC LIMIT 10;
gps
10 rows affected.
company_number | name | company_id | count |
---|---|---|---|
00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | 116469 | 57 |
01744056 | AMBERLEY GROUP LIMITED | 144680 | 42 |
OC322684 | ACUMENTIA LLP | 49180 | 37 |
02940803 | ACAMBIS (UK) LIMITED | 28579 | 36 |
00102488 | AIR LEAGUE ENTERPRISES LIMITED | 89171 | 35 |
00133676 | THE ABBEY SCHOOL, READING | 14142 | 35 |
00169831 | AMEC BKW LIMITED | 147749 | 35 |
00123458 | AAH PHARMACEUTICALS LIMITED | 1483 | 35 |
00073396 | ALLIANCE ASSURANCE COMPANY LIMITED | 122259 | 35 |
00026688 | ACTION FOR BLIND PEOPLE | 45023 | 35 |
We can also list the officers by company along with a count of the number of officers associated with that company:
gps=%sql SELECT company_id, name, COUNT(*) OVER (PARTITION BY company_id) \
FROM ukcompanies.officer_export ORDER BY count DESC LIMIT 10;
gps
10 rows affected.
company_id | name | count |
---|---|---|
116469 | IVAN MARTCH | 57 |
116469 | JANET TAYLOR | 57 |
116469 | NICHOLAS ROBERT HARDING | 57 |
116469 | VALERIE ANN KEY | 57 |
116469 | ELIZABETH PALMER | 57 |
116469 | GORDON LESLIE SIMPSON | 57 |
116469 | ANDREW JAMES TAYLOR | 57 |
116469 | HILDE GILBERT | 57 |
116469 | MARGARET CHRISTINA LENTON | 57 |
116469 | DAVE TREVETT | 57 |
Note that the company_id* is the OpenCorporates ID and not the company number.* To find the company number, we need to JOIN on the companies table:
gps=%sql SELECT o.company_id, c.company_number, c.name, o.name, o.count FROM ( \
SELECT company_id, name, COUNT(*) OVER (PARTITION BY company_id) \
FROM ukcompanies.officer_export ) AS o \
JOIN ukcompanies.company_export AS c ON c.id = o.company_id ORDER BY o.count DESC LIMIT 10
gps
10 rows affected.
company_id | company_number | name | name_1 | count |
---|---|---|---|---|
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | IVAN MARTCH | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | JANET TAYLOR | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | NICHOLAS ROBERT HARDING | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | VALERIE ANN KEY | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | ELIZABETH PALMER | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | GORDON LESLIE SIMPSON | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | ANDREW JAMES TAYLOR | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | HILDE GILBERT | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | MARGARET CHRISTINA LENTON | 57 |
116469 | 00804867 | ALINGTON (WESTBOURNE) MANAGEMENT COMPANY LIMITED | DAVE TREVETT | 57 |
We might also limit the results to list the officers of companies that have more than a minimum specified number of active officers along with the associated company ID:
gps=%sql SELECT company_id, name, count FROM (\
SELECT company_id, name, COUNT(*) OVER (PARTITION BY company_id) \
FROM ukcompanies.officer_export WHERE end_date!='' LIMIT 100 \
) AS multidir WHERE count>8 ORDER BY count DESC LIMIT 20;
gps
20 rows affected.
company_id | name | count |
---|---|---|
100025 | YASHWANT PATEL | 27 |
100025 | MONICA DAWN BURNETT | 27 |
100025 | GAYATRI DESAI | 27 |
100025 | JOE DOKU | 27 |
100025 | MARK ABAYOMI ESHO | 27 |
100025 | FLORENCE ADZOTER APALOO HAMIEL | 27 |
100025 | HARPREET KAUR | 27 |
100025 | PAUL MATTHEW | 27 |
100025 | PAUL MATTHEW | 27 |
100025 | PAULINE WILUETH MUNROE | 27 |
100025 | DANIEL MWAMBA | 27 |
100025 | MAUREEN DENISE DOVER | 27 |
100025 | GENETTEE HILL | 27 |
100025 | PRAKASHKUMAR CHAMAKLAL PANCHAL | 27 |
100025 | JAGJIT SINGH BAINS | 27 |
100025 | FRANCES BEAZER | 27 |
100025 | JANET COULOUTE | 27 |
100025 | ANDREA NADINE DAVIS | 27 |
100025 | DEVENDRA DUDHAREJIA | 27 |
100025 | ANDY MATTHEWS | 27 |
Look for companies that have had an association at any time with at least two officers who have had an association with a specified company at any time:
test=%sql SELECT company_id,COUNT(*) FROM ukcompanies.officer_export WHERE name IN \
( SELECT name FROM ukcompanies.officer_export WHERE company_id='100025') \
GROUP BY company_id HAVING COUNT(*)>1;
test
2 rows affected.
company_id | count |
---|---|
100025 | 32 |
75047 | 4 |
We can join on the company table to see which these companies are:
cid='100025'
%sql SELECT c.name, o.company_id, COUNT(*) FROM ukcompanies.officer_export o JOIN ukcompanies.company_export c \
ON c.id=o.company_id WHERE o.name IN \
( SELECT name FROM ukcompanies.officer_export WHERE company_id= :cid) \
GROUP BY c.name, o.company_id HAVING COUNT(*)>1;
2 rows affected.
name | company_id | count |
---|---|---|
AFRICAN CARIBBEAN CITIZENS FORUM | 75047 | 4 |
AKWAABA AYEH MENTAL HEALTH ADVOCACY PROJECT | 100025 | 32 |
This table suggests that for company with OpenCorporates internal ID 100025, there is one other company (75047) that has, over all time, had at least two officer positions filled by officers sharing the same name as officers who have filled roles at any time for the company with ID 100025. In particular, 100025 has had 32 officers, and of those 4 officer names have also been associated with * 75047*.
The following query allows us to see which officer names the two companies have in common.
ts1=%sql select DISTINCT name FROM ukcompanies.officer_export where company_id='75047'\
AND name IN (SELECT DISTINCT name FROM ukcompanies.officer_export where company_id='100025') ORDER BY name;
ts1
4 rows affected.
name |
---|
DAVID BENJAMIN MCFARLANE |
JOHN BARRY OMARA |
MARK ABAYOMI ESHO |
MAUREEN DENISE DOVER |
How about if we look for companies that have had an association at any time with at least two officers who currently have an active association with a specified company:
test=%sql SELECT company_id,COUNT(*) FROM ukcompanies.officer_export WHERE name IN \
( SELECT name FROM ukcompanies.officer_export WHERE company_id='100025' AND end_date!='') \
GROUP BY company_id HAVING COUNT(*)>1;
test
2 rows affected.
company_id | count |
---|---|
100025 | 27 |
75047 | 2 |
That is, 100025 has 27 active officers, of which 2 have had an association at some time with 75047.
Look for companies that have at least two active officers who are also active officers of a specified company:
test=%sql SELECT company_id,count(*) FROM ukcompanies.officer_export WHERE name IN \
( SELECT name FROM ukcompanies.officer_export WHERE company_id='100025' AND end_date!='') \
AND end_date !='' GROUP BY company_id HAVING count(*)>1;
test
2 rows affected.
company_id | count |
---|---|
100025 | 27 |
75047 | 2 |
That is, 100025 has 27 active officers, of which 2 have an active association with 75047.
What does the activity profile of a particular person appear to look like (assuming we can reconcile different officer positions with the same human being)? That is, what does the pattern of officer appointments and terminations look like?
eos= %sql SELECT name,position, COUNT(*) FROM ukcompanies.officer_export \
GROUP BY name, position ORDER BY COUNT(*) DESC LIMIT 10
eos
10 rows affected.
name | position | count |
---|---|---|
SWIFT INCORPORATIONS LIMITED | nominated secretary | 3689 |
TEMPLE SECRETARIES LIMITED | nominated secretary | 2773 |
COMPANY DIRECTORS LIMITED | nominated director | 2743 |
FORM 10 DIRECTORS FD LTD | nominated director | 2707 |
FORM 10 SECRETARIES FD LTD | nominated secretary | 2706 |
INSTANT COMPANIES LIMITED | nominated director | 1639 |
WATERLOW SECRETARIES LIMITED | nominated secretary | 1433 |
WATERLOW NOMINEES LIMITED | nominated director | 1378 |
LONDON LAW SERVICES LIMITED | nominated director | 915 |
LONDON LAW SECRETARIAL LIMITED | nominated secretary | 911 |
As we might expect, common officers tend to be corporate officers. In a later section, we will explore a way of filtering out possible corporate officers in order to find frequent officers who are more likely to be human persons.
ods= %sql SELECT o.name AS "officer_name", o.start_date, o.end_date, o.inactive, c.name AS "company_name", c.company_number \
FROM ukcompanies.officer_export o JOIN ukcompanies.company_export c ON o.company_id=c.id \
WHERE o.name='YOMTOV ELIEZER JACOBS' \
AND o.position='director'
ods.DataFrame()[:10]
613 rows affected.
officer_name | start_date | end_date | inactive | company_name | company_number | |
---|---|---|---|---|---|---|
0 | YOMTOV ELIEZER JACOBS | 2010-02-23 | 2010-02-24 | true | AAA FOODS LTD | 07166510 |
1 | YOMTOV ELIEZER JACOBS | 2009-10-16 | 2009-10-21 | true | A A & A AUTOS LTD | 07045770 |
2 | YOMTOV ELIEZER JACOBS | 2009-03-25 | 2009-03-26 | true | AAA PROPERTIES (YORKSHIRE) LTD | 06859220 |
3 | YOMTOV ELIEZER JACOBS | 2010-06-30 | 2010-06-30 | true | AAA TAXIS (PRESTON) LTD | 07300199 |
4 | YOMTOV ELIEZER JACOBS | 2009-07-28 | 2009-07-29 | true | AAA TYRES LTD | 06974840 |
5 | YOMTOV ELIEZER JACOBS | 2009-12-22 | 2009-12-23 | true | AA AUTOS LIMITED | 07111244 |
6 | YOMTOV ELIEZER JACOBS | 2009-01-21 | 2009-01-21 | true | AAB COMMUNICATIONS LIMITED | 06797040 |
7 | YOMTOV ELIEZER JACOBS | 2009-03-27 | 2009-03-30 | true | A A H WELL SERVICES LTD | 06861751 |
8 | YOMTOV ELIEZER JACOBS | 2010-06-23 | 2010-06-23 | true | A A KNIGHT LIMITED | 07293231 |
9 | YOMTOV ELIEZER JACOBS | 2009-08-25 | 2009-08-26 | true | A ALI & CO. LTD | 07000156 |
If we order the positions by date and bring in an acculumaltor count, we can generate a chart that shows the total number started associations a director has had with different companies.
(Note - need to do better accumulators; accumulator should be count on ordered values that exist. Would make sense to have a count of start and end dates each ordered over time. TO DO)
ods_df=ods.DataFrame()
ods_df['start_date'] = ods_df['start_date'].astype('datetime64[ns]')
ods_df['end_date'] = ods_df['end_date'].astype('datetime64[ns]')
ods_df=ods_df.sort(['start_date','end_date'])
ods_df['acc']=range(1,len(ods_df['company_number'])+1)
ods_df[:3]
officer_name | start_date | end_date | inactive | company_name | company_number | acc | |
---|---|---|---|---|---|---|---|
499 | YOMTOV ELIEZER JACOBS | 2008-08-20 00:00:00 | 2008-10-07 00:00:00 | true | ALLENFARE LTD | 06647039 | 1 |
126 | YOMTOV ELIEZER JACOBS | 2008-08-20 00:00:00 | 2008-10-24 00:00:00 | true | ACAMBA SYSTEMS LTD | 06597345 | 2 |
590 | YOMTOV ELIEZER JACOBS | 2008-08-21 00:00:00 | 2009-01-26 00:00:00 | true | AMAXA DESIGN LTD | 06668429 | 3 |
We can do a simple chart to show how starts have progressed over time.
g= ggplot(ods_df, aes('start_date','acc')) + \
geom_line(color='black')
g
<ggplot: (285005821)>
For each company, we can also plot how long each position lasted.
fig=plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(ods_df['start_date'],ods_df['acc'],c='r',marker='x',s=1)
tmp=ods_df[ods_df['end_date'].notnull()]
ax1.scatter(tmp['end_date'],tmp['acc'],c='b',marker='+',s=1)
ax1.figure.show()
(This chart would be better as a series of horizontal lines, with y* equal to accession number, start_date as xstart and end_date as xend.) I can do this in R/ggplot easy enough but not sure how to here?!*)
ods_df['duration']=ods_df['end_date']-ods_df['start_date']
#The difference is provided in nanoseconds. Not sure how to convert this to days?
ods_df[:5]
officer_name | start_date | end_date | inactive | company_name | company_number | acc | duration | |
---|---|---|---|---|---|---|---|---|
499 | YOMTOV ELIEZER JACOBS | 2008-08-20 00:00:00 | 2008-10-07 00:00:00 | true | ALLENFARE LTD | 06647039 | 1 | 48 days, 00:00:00 |
126 | YOMTOV ELIEZER JACOBS | 2008-08-20 00:00:00 | 2008-10-24 00:00:00 | true | ACAMBA SYSTEMS LTD | 06597345 | 2 | 65 days, 00:00:00 |
590 | YOMTOV ELIEZER JACOBS | 2008-08-21 00:00:00 | 2009-01-26 00:00:00 | true | AMAXA DESIGN LTD | 06668429 | 3 | 158 days, 00:00:00 |
185 | YOMTOV ELIEZER JACOBS | 2008-08-26 00:00:00 | 2008-08-27 00:00:00 | true | ACORN 2008 LTD | 06680988 | 4 | 1 days, 00:00:00 |
452 | YOMTOV ELIEZER JACOBS | 2008-08-27 00:00:00 | 2008-08-28 00:00:00 | true | ALEXANDER NOBLE FURNISHINGS LTD | 06682800 | 5 | 1 days, 00:00:00 |
ods_df.groupby('duration').size()
duration 0 312 86400000000000 153 172800000000000 15 259200000000000 53 345600000000000 4 432000000000000 5 518400000000000 5 604800000000000 1 691200000000000 1 777600000000000 4 864000000000000 6 950400000000000 2 1036800000000000 2 1123200000000000 2 1209600000000000 1 1296000000000000 2 1382400000000000 2 1728000000000000 1 1814400000000000 2 1987200000000000 1 2160000000000000 1 2246400000000000 1 2332800000000000 3 2678400000000000 2 2851200000000000 2 2937600000000000 1 3024000000000000 1 3110400000000000 2 3974400000000000 1 4060800000000000 1 4147200000000000 1 4492800000000000 1 5097600000000000 1 5529600000000000 1 5616000000000000 1 5961600000000000 1 6652800000000000 1 6739200000000000 1 6825600000000000 1 6912000000000000 1 7257600000000000 1 8208000000000000 1 9158400000000000 1 9936000000000000 1 10281600000000000 1 13132800000000000 1 13651200000000000 1 13824000000000000 1 14688000000000000 1 dtype: int64
ods_df.groupby('duration').size().plot()
<matplotlib.axes.AxesSubplot at 0x113474090>
In the above case, we note that the officer tends to only hold their role for a very short time. Presumably, they are just acting as an agent in the creation of the corresponding companies.
(A corollary to this might be to consider companies that are only in existence for a short period of time?)
Let's have a little think about what question we may ask around short-lived officer positions. To begin with create a view with date types for start and end dates for officer positions where both start and end dates exist.
%sql DROP VIEW officerdates
%sql CREATE VIEW officerdates AS \
SELECT *,to_date(start_date,'YYYY-MM-DD') startdate, to_date(end_date,'YYYY-MM-DD') enddate, \
to_date(end_date,'YYYY-MM-DD')-to_date(start_date,'YYYY-MM-DD') diffdate \
FROM ukcompanies.officer_export WHERE start_date ~ '[0-9]*-[0-9]*-[0-9]*' AND end_date~ '[0-9]*-[0-9]*-[0-9]*'
Done. Done.
[]
%sql DROP VIEW companydates
%sql CREATE VIEW companydates AS \
SELECT *,to_date(incorporation_date,'YYYY-MM-DD') startdate, to_date(dissolution_date,'YYYY-MM-DD') enddate,\
to_date(dissolution_date,'YYYY-MM-DD')-to_date(incorporation_date,'YYYY-MM-DD') diffdate \
FROM ukcompanies.company_export WHERE incorporation_date ~ '[0-9]*-[0-9]*-[0-9]*' \
AND dissolution_date ~ '[0-9]*-[0-9]*-[0-9]*'
(ProgrammingError) view "companydates" does not exist 'DROP VIEW companydates' {} Done.
[]
Then we should be able to run a query to see which officers are commonly associated with holding positions for less than five days, for example.
%sql SELECT * FROM companydates LIMIT 3
3 rows affected.
id | company_number | name | company_type | current_status | incorporation_date | dissolution_date | ra_street_address | ra_locality | ra_region | ra_postal_code | ra_country | startdate | enddate | diffdate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 05438136 | A LIMITED | Private Limited Company | Dissolved | 2005-04-27 | 2012-12-04 | 60 WINDSOR AVENUE | LONDON | NA | SW19 2RR | United Kingdom | 2005-04-27 | 2012-12-04 | 2778 |
4 | SC322354 | @AA LIMITED | Private Limited Company | Dissolved | 2007-04-26 | 2009-06-26 | 31 MILVERTON AVENUE, BEARSDEN | GLASGOW | NA | G61 4BE | 2007-04-26 | 2009-06-26 | 792 | |
6 | 04264801 | AAA LIMITED | Private Limited Company | Dissolved | 2001-08-03 | 2009-03-17 | 5 KINGS COURT, HARWOOD ROAD, HORSHAM | WEST SUSSEX | NA | RH13 5UR | 2001-08-03 | 2009-03-17 | 2783 |
Let's see which officers have lot's of positions with start and termination dates on the same date. (This could be an error in the data, or an indication that the officer played a role simple in setting up the company):
%sql SELECT name, COUNT(*) FROM officerdates WHERE diffdate = 0 GROUP BY name ORDER BY count DESC LIMIT 10
10 rows affected.
name | count |
---|---|
SWIFT INCORPORATIONS LIMITED | 3040 |
TEMPLE SECRETARIES LIMITED | 2670 |
COMPANY DIRECTORS LIMITED | 2613 |
WATERLOW SECRETARIES LIMITED | 1320 |
WATERLOW NOMINEES LIMITED | 1227 |
INSTANT COMPANIES LIMITED | 984 |
LONDON LAW SECRETARIAL LIMITED | 859 |
LONDON LAW SERVICES LIMITED | 819 |
BRIAN REID LTD. | 729 |
STEPHEN MABBOTT LTD. | 600 |
Or how about companies that have incorrporation and dissolution dates within 150 days (a little bit less than six months) of each other?
%sql SELECT * FROM companydates WHERE diffdate < 150 ORDER BY diffdate LIMIT 10
10 rows affected.
id | company_number | name | company_type | current_status | incorporation_date | dissolution_date | ra_street_address | ra_locality | ra_region | ra_postal_code | ra_country | startdate | enddate | diffdate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20847 | 07391969 | ABISKO TESTING LIMITED | Private Limited Company | Dissolved | 2010-09-29 | 2011-02-01 | SOMERSET HOUSE, 6070 BIRMINGHAM BUSINESS PARK, BIRMINGHAM, WEST MIDLANDS | UNITED KINGDOM | NA | B37 7BF | 2010-09-29 | 2011-02-01 | 125 | |
33239 | 07168189 | ACCOUNTAX CONSULTANTS (CAMBERLEY) LTD | Private Limited Company | Dissolved | 2010-02-24 | 2010-06-29 | 44 UPPER BELGRAVE ROAD, BRISTOL | ENGLAND | NA | BS8 2XN | 2010-02-24 | 2010-06-29 | 125 | |
60349 | 07391696 | ADRIAN MEDICAL SERVICES LIMITED | Private Limited Company | Dissolved | 2010-09-29 | 2011-02-01 | 141 FERME PARK ROAD, TOTTENHAM, LONDON | ENGLAND | NA | N8 9SG | 2010-09-29 | 2011-02-01 | 125 | |
17183 | 07279212 | A B DESIGN SYSTEMS LIMITED | Private Limited Company | Dissolved | 2010-06-09 | 2010-10-12 | WINNINGTON HOUSE, 2 WOODBERRY GROVE, LONDON | UNITED KINGDOM | NA | N12 0DR | 2010-06-09 | 2010-10-12 | 125 | |
66556 | 07268961 | ADVENT INFORMATION LTD | Private Limited Company | Dissolved | 2010-06-01 | 2010-10-05 | 21 STEMBRIDGE RD, ANERLEY, LONDON | UNITED KINGDOM | NA | SE20 7UE | 2010-06-01 | 2010-10-05 | 126 | |
70645 | 06833232 | AERO OPTIMAIL REC LIMITED | Private Limited Company | Dissolved | 2009-03-02 | 2009-07-07 | CHESS HOUSE, GREENS FARM, NEWDIGATE, DORKING | SURREY | NA | RH5 5DD | 2009-03-02 | 2009-07-07 | 127 | |
117265 | 07340325 | ALITE INDUSTRIES (UK) LIMITED | Private Limited Company | Dissolved | 2010-08-09 | 2010-12-14 | LIFFORD HALL LIFFORD LANE, KINGS NORTON, BIRMINGHAM, WEST MIDLANDS | UNITED KINGDOM | NA | B30 3JN | United Kingdom | 2010-08-09 | 2010-12-14 | 127 |
16842 | 06800356 | ABC SAFETY TRAINING LIMITED | Private Limited Company | Dissolved | 2009-01-23 | 2009-06-02 | 8 LAMMAS GARDENS, HUNTINGDON | CAMBRIDGESHIRE | NA | PE29 7LJ | 2009-01-23 | 2009-06-02 | 130 | |
72061 | 07364179 | AFAN MEDIA LIMITED | Private Limited Company | Dissolved | 2010-09-02 | 2011-01-11 | 2ND FLOOR, 43 BROOMFIELD ROAD, CHELMSFORD, ESSEX | UNITED KINGDOM | NA | CM11SY | United Kingdom | 2010-09-02 | 2011-01-11 | 131 |
68232 | 06950899 | A E ALLEN LIMITED | Private Limited Company | Dissolved | 2009-07-02 | 2009-11-10 | 22 PARSONAGE ROAD, TUNBRIDGE WELLS, KENT | UNITED KINGDOM | NA | TN4 8TA | 2009-07-02 | 2009-11-10 | 131 |
How might we identify an officer as a nominee?
One immediate way in is to search on the officer position type:
officerType= %sql SELECT DISTINCT position FROM ukcompanies.officer_export ORDER BY position;
officerType
7 rows affected.
position |
---|
director |
llp designated member |
llp member |
nominated director |
nominated secretary |
position |
secretary |
Trivially, we can search for nominated directors or secretaries who are also active:
tmp = %sql SELECT * FROM ukcompanies.officer_export \
WHERE position LIKE 'nominated%' AND inactive !='true' LIMIT 3;
tmp
3 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address |
---|---|---|---|---|---|---|---|---|---|---|
79846141 | 12269 | HERTFORD COMPANY SECRETARIES LIMITED | nominated secretary | 2007-05-23 | BRITISH | RMG HOUSE ESSEX ROAD, HODDESDON, HERTFORDSHIRE EN11 0DR, UNITED KINGDOM | ||||
74772891 | 12347 | IGP CORPORATE NOMINEES LTD. | nominated secretary | 2002-04-17 | BRITISH | 19 KATHLEEN ROAD, LONDON SW11 2JR, ENGLAND | ||||
82162533 | 12941 | HERTFORD COMPANY SECRETARIES LIMITED | nominated secretary | 2002-06-27 | BRITISH | RMG HOUSE ESSEX ROAD, HODDESDON, HERTFORDSHIRE EN11 0DR, UNITED KINGDOM |
The next thing we might want to do is to try to find companies that only have nominated positions.
One way to do this might be to count the number of active officers a company has, and also count the number of nominated officers it has, and if they are the same, declare the company a likely shell company.
Let's create a view which contains just active officers with a column that counts how many there are:
tmp = %sql SELECT * FROM currentOfficers LIMIT 5;
tmp
5 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address | activeofficers |
---|---|---|---|---|---|---|---|---|---|---|---|
45927742 | 1 | ELIZABETH DAVIES | director | 2006-10-05 | 1962-01-28 | FINANCIAL ADVISER | BRITISH | 12 PONTYMASON RISE, ROGERSTONE, NEWPORT, GWENT NP10 9GJ | 2 | ||
59614127 | 1 | SALLY-ANN MICHELLE BROMLEY | director | 2012-12-21 | 1966-08-25 | FINANCE DIRECTOR | BRITISH | THE OLD BARN HEOL DDU, CASTELLAU, LLANTRISANT, MID GLAMORGAN CF72 8LQ, UNITED KINGDOM | 2 | ||
51730586 | 1000 | TAJAMAL AHMED | director | 2010-08-26 | 1973-01-05 | MANGER | PAKISTANI | 127 LONGDEN ROAD, MANCHESTER M12 5SJ, UNITED KINGDOM | 1 | ||
33498345 | 10000 | RAYMOND DUCKWORTH | secretary | 1992-08-03 | 1951-04-03 | BRITISH | 14 CONWAY DRIVE, OSWALDTWISTLE, ACCRINGTON, LANCASHIRE BB5 4QH | 3 | |||
33498347 | 10000 | RAYMOND DUCKWORTH | director | 1992-08-03 | 1951-04-03 | ADMIN MANAGER | BRITISH | 14 CONWAY DRIVE, OSWALDTWISTLE, ACCRINGTON, LANCASHIRE BB5 4QH | 3 |
%sql SELECT * FROM ( \
SELECT *, COUNT(*) OVER (PARTITION BY company_id) FROM currentOfficers \
WHERE position LIKE 'nominated%' ) as tmp \
WHERE count=activeofficers ORDER BY company_id;
3 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address | activeofficers | count |
---|---|---|---|---|---|---|---|---|---|---|---|---|
56060806 | 21057 | EWAN CRAIG NEILSON | nominated director | 2007-01-05 | 1958-11-03 | BRITISH | 16 EARLSPARK ROAD, BIELDSIDE, ABERDEEN AB15 9BZ | 2 | 2 | |||
83375365 | 21057 | STRONACHS | nominated secretary | 2007-01-05 | BRITISH | 34 ALBYN PLACE, ABERDEEN, ABERDEENSHIRE AB10 1FW | 2 | 2 | ||||
82314041 | 80927 | GO AHEAD SERVICE LIMITED | nominated secretary | 2005-07-25 | BRITISH | 69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS B18 6EW | 1 | 1 |
We can also run a query to just pull out the companies that only have nominees as active officers, joining the result on the company table to get the company names:
shells = %sql SELECT DISTINCT o.company_id, c.name, c.current_status FROM ( \
SELECT *, COUNT(*) OVER (PARTITION BY company_id) FROM currentOfficers \
WHERE position LIKE 'nominated%' ) AS o JOIN ukcompanies.company_export c\
on c.id = o.company_id WHERE count=activeofficers ORDER BY company_id LIMIT 10;
shells
2 rows affected.
company_id | name | current_status |
---|---|---|
21057 | ABK HARRISON COMPANY LIMITED | Active |
80927 | AGORA CONSULTING LIMITED | Active |
Let's just do a quick check on one of those to see if the result seems to make sense:
%sql SELECT * FROM ukcompanies.officer_export WHERE company_id='80927'
3 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address |
---|---|---|---|---|---|---|---|---|---|---|
43995593 | 80927 | HELMA BUCHHOLZ | director | 2006-02-01 | 2012-12-03 | 1960-12-10 | true | DIRECTOR | GERMAN | FRIEDRICHSTR. 41, 10969 BERLIN, GERMANY |
59858474 | 80927 | SEBASTIAN JOSEF HILGER | director | 2005-07-25 | 2005-12-31 | 1963-04-12 | true | DIRECTOR | GERMAN | HANNS-EISLER-STRASSE 4, BERLIN 10409, GERMANY |
82314041 | 80927 | GO AHEAD SERVICE LIMITED | nominated secretary | 2005-07-25 | BRITISH | 69 GREAT HAMPTON STREET, BIRMINGHAM, WEST MIDLANDS B18 6EW |
One of the things we are looking for is to see whether or not we can identify shell companies. Red flags associated with shell companies include the presence of nominee officers and corporate officers (for example, officers that are themselves companies). In this section, we start to investigate whether we can identify corporate officers as such.
One heuristic for identifying whether an officer is a company is to look for company designators in the officer name. If we pull out the "lastname" of the officer, we can look up whether it is a company type.
Create a view that includes a lastname column - that is, the last lexeme from the name:
We can then pull out popular lastnames, noting that many of them appear to be companies.
cln=%sql SELECT lastname, COUNT(*) FROM officers2 GROUP BY lastname ORDER BY COUNT(*) DESC LIMIT 10;
cln
10 rows affected.
lastname | count |
---|---|
LIMITED | 70754 |
LTD | 16175 |
SMITH | 4814 |
JONES | 3534 |
BROWN | 2594 |
WILLIAMS | 2400 |
LTD. | 2399 |
TAYLOR | 2271 |
DAVIES | 2118 |
ALLEN | 1796 |
Also note: we may be able to use lastname to identify correspondences between different variants of the same name (JOHN SMITH =? JOHN ARCHIBALD SMITH) if, for example, the directors share other common attributes, such as the same birthday. Surname may also help identify directorships in a company shared across family members.
Note that variants of the same company type exist - eg. LTD, LIMITED, LTD.
%sql SELECT lastname, COUNT(*) FROM companies2 GROUP BY lastname HAVING COUNT(*) > 20 ORDER BY count DESC;
34 rows affected.
lastname | count |
---|---|
LIMITED | 109644 |
LTD | 29315 |
LTD. | 3417 |
LLP | 2012 |
PLC | 450 |
PARTNERSHIP | 235 |
COMPANY | 205 |
LIMITED(THE) | 200 |
INC. | 198 |
(THE) | 134 |
TRUST | 132 |
LP | 118 |
FOUNDATION | 107 |
L.P. | 85 |
CORPORATION | 61 |
CENTRE | 57 |
INC | 55 |
LLC | 51 |
CO.LIMITED | 46 |
ASSOCIATION | 46 |
CIC | 46 |
B.V. | 44 |
UK | 43 |
THE | 32 |
GMBH | 27 |
INTERNATIONAL | 27 |
PROJECT | 26 |
SERVICES | 26 |
FUND | 24 |
CYF | 24 |
SOCIETY | 23 |
UNLIMITED | 22 |
(UK) | 21 |
C.I.C. | 21 |
We can compare these designators with the "lastname" equivalent pulled from the companies table.
Again, let's create a view that includes this extra column:
Now we can also see how companies of different company_type are named:
%sql SELECT company_type, lastname, COUNT(*) FROM companies2 \
GROUP BY company_type,lastname HAVING COUNT(*) > 20 ORDER BY company_type;
44 rows affected.
company_type | lastname | count |
---|---|---|
(THE) | 51 | |
LIMITED | 304 | |
LIMITED(THE) | 26 | |
Community Interest Company | C.I.C. | 21 |
Community Interest Company | CIC | 46 |
Community Interest Company | COMPANY | 93 |
Converted/Closed | LIMITED | 116 |
Industrial and Provident Society | (THE) | 28 |
Industrial and Provident Society | LIMITED | 263 |
Limited Liability Partnership | LLP | 2012 |
Limited Liability Partnership | PARTNERSHIP | 81 |
Limited Partnership | L.P. | 85 |
Limited Partnership | LP | 118 |
Limited Partnership | PARTNERSHIP | 144 |
Other company type | B.V. | 31 |
Other company type | COMPANY | 32 |
Other company type | CORPORATION | 51 |
Other company type | GMBH | 23 |
Other company type | INC | 53 |
Other company type | INC. | 177 |
Other company type | LIMITED | 469 |
Other company type | LLC | 40 |
Other company type | LTD | 32 |
Other company type | LTD. | 42 |
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | ASSOCIATION | 40 |
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | CENTRE | 55 |
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | FOUNDATION | 103 |
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | PROJECT | 22 |
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | TRUST | 123 |
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | UK | 30 |
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | (THE) | 41 |
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | LIMITED | 1684 |
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | LIMITED(THE) | 89 |
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | LTD | 366 |
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | LTD. | 48 |
Private Limited Company | CO.LIMITED | 44 |
Private Limited Company | CYF | 22 |
Private Limited Company | LIMITED | 106804 |
Private Limited Company | LIMITED(THE) | 61 |
Private Limited Company | LTD | 28902 |
Private Limited Company | LTD. | 3322 |
Private Unlimited Company | UNLIMITED | 21 |
Public Limited Company | COMPANY | 24 |
Public Limited Company | PLC | 435 |
One thing we might do is generate a view with a column that says whether or not an officer is potentially a corporate officer by virtue of having a "last name" that is a popular "last name" in the companies table?
Alternatively, we can create a view on the officers table that just contains rows where the last name of the officer is a common last name in the companies table.
We can use this view to investigate officers that are themselves companies.
In an earlier section, we generated a report showing common officers. This report was dominated by corporate officers. Let's try to find non-corporate (i.e. more likely human person) officers that hold or have held many positions.
%sql SELECT name,position, COUNT(*) FROM officers2 \
WHERE lastname NOT IN (SELECT lastname FROM companies2 GROUP BY lastname ORDER BY COUNT(*) DESC LIMIT 10) \
GROUP BY name, position ORDER BY COUNT(*) DESC LIMIT 10
10 rows affected.
name | position | count |
---|---|---|
YOMTOV ELIEZER JACOBS | director | 613 |
BARBARA KAHAN | director | 473 |
LESLEY JOYCE GRAEME | nominated director | 325 |
DOROTHY MAY GRAEME | nominated secretary | 323 |
ASHOK BHARDWAJ | nominated secretary | 284 |
HOWARD THOMAS | nominated secretary | 259 |
KEVIN BREWER | nominated director | 250 |
SUZANNE BREWER | nominated secretary | 242 |
IRENE LESLEY HARRISON | nominated secretary | 214 |
WILLIAM ANDREW JOSEPH TESTER | nominated director | 192 |
Let's remove the explicitly stated nominee positions:
%sql SELECT name,position, COUNT(*) FROM officers2 \
WHERE lastname NOT IN (SELECT lastname FROM companies2 GROUP BY lastname ORDER BY COUNT(*) DESC LIMIT 10) \
AND position NOT ILIKE '%nominated%' GROUP BY name, position ORDER BY COUNT(*) DESC LIMIT 10
10 rows affected.
name | position | count |
---|---|---|
YOMTOV ELIEZER JACOBS | director | 613 |
BARBARA KAHAN | director | 473 |
STEPHEN GEORGE MABBOTT | director | 178 |
ADERYN HURWORTH | director | 130 |
DOROTHY MAY KANE | secretary | 128 |
GRAHAM ROBERTSON STEPHENS | director | 121 |
GRAHAM MICHAEL COWAN | director | 119 |
PETER VALAITIS | director | 110 |
IRENE LESLEY HARRISON | secretary | 96 |
DUNSTANA ADESHOLA DAVIES | director | 87 |
Now let's look for currently overexposed officers (that is, officers with not inactive positions (i.e. with active positions)) on a similar basis:
%sql SELECT name,position, COUNT(*) FROM officers2 \
WHERE lastname NOT IN (SELECT lastname FROM companies2 GROUP BY lastname ORDER BY COUNT(*) DESC LIMIT 10) \
AND inactive !='true' AND position NOT ILIKE '%nominated%' \
GROUP BY name, position ORDER BY COUNT(*) DESC LIMIT 10
10 rows affected.
name | position | count |
---|---|---|
ROSALYN SHARON SCHOFIELD | secretary | 43 |
TAGUMA NGONDONGA | secretary | 41 |
DAVID MARTIN GREEN | director | 38 |
FRANK STANDISH | secretary | 35 |
HERVE DENIS MICHEL FETTER | director | 31 |
SHAUN PATRICK COLES | director | 29 |
STUART MACNAB | director | 29 |
GILES PATRICK CYRIL MACKAY | director | 28 |
DAVID ALEXANDER ORCHIN | director | 27 |
CHARLES RODERICK SPENCER FOWLER | secretary | 27 |
For human directors of multiple companies, one of the things we are interested in is a situation whereby the director is acting as a nominee of lots of different companies, rather than for example a 'legitimate' director of lots of branch offices of the same corporate group. (So to this extent, we want to find people who are directors of large numbers of companies where those companies appear not to be part of a large company group.)
There are a couple of ways we might start to consider this:
As a first approximation, might we use the first lexeme of a company name to act as a key?
Let's start by having a quick look at companies associated with ROSALYN SHARON SCHOFIELD.
%sql SELECT c.name, c.ra_postal_code, o.name FROM ukcompanies.company_export c, ukcompanies.officer_export o \
WHERE o.name = 'ROSALYN SHARON SCHOFIELD' AND o.company_id = c.id AND \
position ='secretary' AND inactive !='true'
43 rows affected.
name | ra_postal_code | name_1 |
---|---|---|
AB AGRI LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
A.B. EXPLORATION LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF FINANCE LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF FUNDING | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF GRAIN PRODUCTS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF GREEN PARK LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF GROCERY LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
A.B.F.HOLDINGS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF INGREDIENTS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF INVESTMENTS PLC | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF JAPAN LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF LUX LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
A.B.F. NOMINEES LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF (NO.1) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF (NO.3) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF (NO.2) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF OVERSEAS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
A.B.F. PROPERTIES LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF (UK) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF UK FINANCE LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF US FINANCE LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF US INVESTMENTS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABF VICTORIA PARK | W1K 4QY | ROSALYN SHARON SCHOFIELD |
AB INGREDIENTS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
AB MAURI EUROPE LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
AB MAURI (UK) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABNA LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABNA FEED COMPANY LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABN (OVERSEAS) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ABN (SCOTLAND) LIMITED | G22 7UP | ROSALYN SHARON SCHOFIELD |
AB TECHNOLOGY LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
AB WORLD FOODS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
AB WORLD FOODS (HOLDINGS) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
AGRILINES LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
THE AGRONOMY PARTNERSHIP LIMITED | WC1B 4JA | ROSALYN SHARON SCHOFIELD |
ALLIED BAKERIES LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLIED GRAIN LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLIED GRAIN (SCOTLAND) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLIED GRAIN (SOUTH) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLIED GRAIN (SOUTHERN) LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLIED MILLS LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLIED TECHNICAL CENTRE LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
ALLINSON LIMITED | W1K 4QY | ROSALYN SHARON SCHOFIELD |
This suggests it may make sense to look for postcodes?
%sql SELECT c.ra_postal_code, o.name, COUNT(*) FROM ukcompanies.company_export c, ukcompanies.officer_export o \
WHERE o.name = 'ROSALYN SHARON SCHOFIELD' AND o.company_id = c.id AND \
position ='secretary' AND inactive !='true' \
GROUP BY c.ra_postal_code, o.name ORDER BY count DESC
3 rows affected.
ra_postal_code | name | count |
---|---|---|
W1K 4QY | ROSALYN SHARON SCHOFIELD | 41 |
WC1B 4JA | ROSALYN SHARON SCHOFIELD | 1 |
G22 7UP | ROSALYN SHARON SCHOFIELD | 1 |
So is this an address used by a nominee, or the registered address at the heart of a large company network?
Let's check to see if there are any PLC's registered at that address?
%sql SELECT name FROM ukcompanies.company_export WHERE ra_postal_code ='W1K 4QY' AND name ILIKE '%PLC%'
1 rows affected.
name |
---|
ABF INVESTMENTS PLC |
One way of trying to reconcile officers is to group by name and date of birth.
Note that the following counts do not necessarily reflect the number of companies an officer is associated with - an officer may be counted across several rows for the same company (for example, if they hold multiple roles, such as secretary and director, or if they held a position at different times).
%sql SELECT name,date_of_birth, COUNT(*) FROM ukcompanies.officer_export \
WHERE date_of_birth !='' GROUP BY name, date_of_birth ORDER BY count DESC LIMIT 10;
10 rows affected.
name | date_of_birth | count |
---|---|---|
YOMTOV ELIEZER JACOBS | 1970-10-18 | 603 |
BARBARA KAHAN | 1931-06-07 | 466 |
LESLEY JOYCE GRAEME | 1953-12-30 | 371 |
DOROTHY MAY GRAEME | 1919-05-11 | 346 |
IRENE LESLEY HARRISON | 1946-08-05 | 307 |
COMBINED NOMINEES LIMITED | 1990-08-24 | 281 |
KEVIN BREWER | 1952-04-21 | 266 |
HOWARD THOMAS | 1945-05-03 | 263 |
FIRST DIRECTORS LIMITED | 1989-08-07 | 194 |
WILLIAM ANDREW JOSEPH TESTER | 1962-06-29 | 192 |
Let's start to think through what we need to do to try to find near matches. This might involve finding near matches and then looking for ones that share the same birthdate or same address?
Let's start with near matches:
cname='JOHN SMITH'
%sql SELECT DISTINCT name FROM ukcompanies.officer_export WHERE name @@ :cname ORDER BY name LIMIT 5;
5 rows affected.
name |
---|
ADAM JOHN SMITH |
ADRIAN JOHN RUSSELL SMITH |
ADRIAN JOHN SCOTT SMITH |
ALAN JOHN SMITH |
ALEXANDER JOHN SMITH |
One possible next step is to identify birth dates that are repeated for different names in this set:
cbdate='1932-09-18'
%sql SELECT * FROM (SELECT *, COUNT(*) OVER (PARTITION BY date_of_birth) FROM (SELECT DISTINCT name, date_of_birth \
FROM ukcompanies.officer_export WHERE name @@ :cname AND date_of_birth!='' ) AS tmp ) AS tmp2\
WHERE count > 1 ORDER BY count DESC;
8 rows affected.
name | date_of_birth | count |
---|---|---|
JOHN JAYCOT SMITH | 1943-10-02 | 2 |
JOHN JACOT SMITH | 1943-10-02 | 2 |
ANDREW JOHN SMITH | 1967-06-12 | 2 |
ROBERT JOHN RITCHIE SMITH | 1967-06-12 | 2 |
MICHAEL JOHN SMITH | 1967-08-22 | 2 |
JOHN SEAN SMITH | 1967-08-22 | 2 |
MATHHEW JOHN BERGMANN -SMITH | 1972-07-30 | 2 |
MATTHEW JOHN BERGMANN -SMITH | 1972-07-30 | 2 |
The count value lets us filter the search to look for exact matching names that have more than one birthdate associated with them. The results table suggests that there may be some cleaning to be done around a couple of the names?
Can we also try a similar approach with addresses, looking for directors of a similar name that share the same address? Again, we can call on a count to let us filter results to only show directors with a name that is similar to a specified name that are associated with multiple addresses.
%sql SELECT * FROM (SELECT *, COUNT(*) OVER (PARTITION BY address) FROM (SELECT DISTINCT name, address \
FROM ukcompanies.officer_export WHERE name @@ :cname ) AS tmp ) AS tmp2\
WHERE count > 1 ORDER BY count DESC;
6 rows affected.
name | address | count |
---|---|---|
MATTHEW JOHN STUART-SMITH | ALBANY COTTAGE, WEST DENE WAY, WEYBRIDGE, SURREY KT13 9RG | 2 |
JOHN STUART SMITH | ALBANY COTTAGE, WEST DENE WAY, WEYBRIDGE, SURREY KT13 9RG | 2 |
JOHN JAYCOTT ANTHONY SMITH | HAWK RIDGE 48 LITTLEDEAN HILL ROAD, CINDERFORD, GLOUCESTERSHIRE GL14 2BJ, GREAT BRITAIN | 2 |
JOHN JACOT SMITH | HAWK RIDGE 48 LITTLEDEAN HILL ROAD, CINDERFORD, GLOUCESTERSHIRE GL14 2BJ, GREAT BRITAIN | 2 |
MATHHEW JOHN BERGMANN -SMITH | THE COURTYARD HALL LANE, WINCHAM, NORTHWICH, CHESHIRE CW9 6DG | 2 |
MATTHEW JOHN BERGMANN -SMITH | THE COURTYARD HALL LANE, WINCHAM, NORTHWICH, CHESHIRE CW9 6DG | 2 |
This table shows that there may be a few more opportunities for cleaning the data...
Indeed, it may make sense to try to cluster some of the addresses. For example, from the birth date table we notice in the address table that there's a JOHN JAYCOT SMITH missing? Querying on this name specifically, we see a variant of the address that is shown in the table above:
%sql SELECT * FROM ukcompanies.officer_export WHERE name='JOHN JAYCOT SMITH';
1 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address |
---|---|---|---|---|---|---|---|---|---|---|
52905062 | 91340 | JOHN JAYCOT SMITH | director | 2010-10-30 | 1943-10-02 | MANAGING DIRECTOR | BRITISH | HAWK RIDGE 48 LITTLEDEAN HILL ROAD, CINDERFORD, GLOS GL14 2BJ, ENGLAND |
How about if we try on a name similarity search based around trigrams?
%sql SELECT * FROM (SELECT *, COUNT(*) OVER (PARTITION BY address) FROM (SELECT DISTINCT name, address \
FROM ukcompanies.officer_export WHERE name % :cname ) AS tmp ) AS tmp2\
WHERE count > 1 ORDER BY count DESC LIMIT 20;
20 rows affected.
name | address | count |
---|---|---|
TONY SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
LEIGH SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
PATRICIA SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
JAMIE SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
SHANE SMITH | VICTORIA HOUSE 50 ALEXANDRA STREET, SOUTHEND-ON-SEA, ESSEX SS1 1BN | 3 |
EDWARD SMITH | SOVEREIGN HOUSE BAREHILL STREET, LITTLEBOROUGH, LANCASHIRE OL15 9BL, ENGLAND | 3 |
DAVID SMITH | SOVEREIGN HOUSE BAREHILL STREET, LITTLEBOROUGH, LANCASHIRE OL15 9BL, ENGLAND | 3 |
GLENDA SMITH | SOVEREIGN HOUSE BAREHILL STREET, LITTLEBOROUGH, LANCASHIRE OL15 9BL, ENGLAND | 3 |
ADRIAN SMITH | VICTORIA HOUSE 50 ALEXANDRA STREET, SOUTHEND-ON-SEA, ESSEX SS1 1BN | 3 |
DAREN SMITH | VICTORIA HOUSE 50 ALEXANDRA STREET, SOUTHEND-ON-SEA, ESSEX SS1 1BN | 3 |
MARK JOHN SMITH | 1 THE MOUNT, BILLERICAY, ESSEX CM11 1HD | 2 |
ISABELLA SMITH | 10 SCHOOL LANE, FORMBY, LIVERPOOL, MERSEYSIDE L37 3LN | 2 |
ROY SMITH | 10 SCHOOL LANE, FORMBY, LIVERPOOL, MERSEYSIDE L37 3LN | 2 |
GEOFFREY SMITH | 103 ROCKBANK ROAD, LIVERPOOL, MERSEYSIDE L13 7BG, UNITED KINGDOM | 2 |
PAUL SMITH | 10 PENTLEY CLOSE, WELWYN GARDEN CITY, HERTFORDSHIRE AL8 7SH | 2 |
ALAN SMITH | 1 ROSE LEIGH WAY, FERN PARK, SPALDING, LINCOLNSHIRE PE11 1GW | 2 |
MARIA SMITH | 1 HOLBURN PARK, STOCKTON ON TEES, CLEVELAND, TS19 8BH | 2 |
MARY SMITH | 10 PENTLEY CLOSE, WELWYN GARDEN CITY, HERTFORDSHIRE AL8 7SH | 2 |
SAMUEL SMITH | 10 OXCLIFFE AVENUE, HEYSHAM, MORECAMBE, LANCASHIRE, LA3 1PT | 2 |
PAULA SMITH | 1 ROSE LEIGH WAY, FERN PARK, SPALDING, LINCOLNSHIRE PE11 1GW | 2 |
It seems that the trigram based search is much looser. This may be one way of helping identify familial relationships?
Skimming this results table, it looks as if there may also be opportunities for running the OpenRefine clustering tools over the addresses to try to normalise some of them?
How about if we try a different ordering? For example, by address? (We could also try by name.)
%sql SELECT * FROM (SELECT *, COUNT(*) OVER (PARTITION BY address) FROM (SELECT DISTINCT name, address \
FROM ukcompanies.officer_export WHERE name % :cname ) AS tmp ) AS tmp2\
WHERE count > 2 ORDER BY address;
10 rows affected.
name | address | count |
---|---|---|
TONY SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
JAMIE SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
PATRICIA SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
LEIGH SMITH | 29 SCHOLEY CLOSE, LOWER HALLING, KENT ME2 1JZ | 4 |
DAVID SMITH | SOVEREIGN HOUSE BAREHILL STREET, LITTLEBOROUGH, LANCASHIRE OL15 9BL, ENGLAND | 3 |
GLENDA SMITH | SOVEREIGN HOUSE BAREHILL STREET, LITTLEBOROUGH, LANCASHIRE OL15 9BL, ENGLAND | 3 |
EDWARD SMITH | SOVEREIGN HOUSE BAREHILL STREET, LITTLEBOROUGH, LANCASHIRE OL15 9BL, ENGLAND | 3 |
ADRIAN SMITH | VICTORIA HOUSE 50 ALEXANDRA STREET, SOUTHEND-ON-SEA, ESSEX SS1 1BN | 3 |
DAREN SMITH | VICTORIA HOUSE 50 ALEXANDRA STREET, SOUTHEND-ON-SEA, ESSEX SS1 1BN | 3 |
SHANE SMITH | VICTORIA HOUSE 50 ALEXANDRA STREET, SOUTHEND-ON-SEA, ESSEX SS1 1BN | 3 |
Quick test to see if we can fudge postcode extraction...
pc=%sql SELECT DISTINCT pc[1] FROM \
(SELECT regexp_matches(address, ' ([A_Z][A-Z]?[0-9]+[A-Z]? [0-9][A-Z]+)', 'g') pc FROM ukcompanies.officer_export) \
AS tmp LIMIT 5;
pc
5 rows affected.
pc |
---|
AB35 5WT |
AB24 5AU |
AL5 1JW |
AB22 8QY |
AB15 4EP |
Create a view that includes fudged postcodes and lastnames (work in progress - not all rows are preserved atm):
Quick test to see how the new view works out:
%sql SELECT address,lastname,pc FROM officers3 LIMIT 3;
3 rows affected.
address | lastname | pc |
---|---|---|
6 UPPER CULVER ROAD, ST. ALBANS, HERTFORDSHIRE, AL1 4EE | KLUG | AL1 4EE |
ABBEYFIELD HOUSE 53 VICTORIA STREET, ST. ALBANS, HERTFORDSHIRE AL1 3UW | SINGARAYER | AL1 3UW |
53 VICTORIA STREET, ST ALBANS, HERTFORDSHIRE AL1 3UW | ALLEN | AL1 3UW |
If we do a row count on the table, or a query to count rows by "lastname", we see the officers3 table has lost some rows:-(
%sql SELECT lastname, COUNT(*) FROM officers3 GROUP BY lastname ORDER BY count DESC LIMIT 10;
10 rows affected.
lastname | count |
---|---|
LIMITED | 569 |
LTD | 172 |
SMITH | 125 |
LLP | 101 |
ROBERTSON | 71 |
REID | 62 |
BROWN | 60 |
ROSS | 57 |
PLC | 54 |
MILNE | 53 |
One of the key red flag indicators are company registrations or officer addresses located in territories that are not transparent as far as company reporting goes.
So what territories do our company and officer registries know about...?
It's easy enough to use the ra_country field to pull out companies with registered addresses that aren't in the UK, but there is no equivalent filed for officer addresses - in that case, we would have to either split the address field or use country names to search within addresses.
Let's have a quick look at companies registered outside the UK. Firstly, what countries are involved?
%sql SELECT ra_country, COUNT(*) FROM ukcompanies.company_export GROUP BY ra_country ORDER BY count DESC
89 rows affected.
ra_country | count |
---|---|
United Kingdom | 113986 |
34491 | |
UNITED STATES | 331 |
Jersey | 43 |
Great Britain | 28 |
ISLE OF MAN | 25 |
CAYMAN ISLANDS | 24 |
INDIA | 23 |
CANADA | 23 |
Guernsey | 17 |
VIRGIN ISLANDS | 17 |
AUSTRALIA | 17 |
NETHERLANDS | 16 |
IRELAND | 13 |
GREAT BRITAIN | 12 |
LUXEMBOURG | 10 |
SOUTH AFRICA | 10 |
GIBRALTAR | 10 |
JAPAN | 10 |
WEST GERMANY | 9 |
SWEDEN | 9 |
UNITED KINGDOM | 9 |
FRANCE | 9 |
NORWAY | 8 |
SWITZERLAND | 8 |
PAKISTAN | 8 |
BAHAMAS | 7 |
ITALY | 7 |
BELGIUM | 6 |
NORTHERN IRELAND | 6 |
HONG KONG | 6 |
Ireland | 5 |
BERMUDA | 5 |
GERMANY | 4 |
TURKEY | 4 |
SRI LANKA | 4 |
SPAIN | 4 |
CHINA | 4 |
LIECHTENSTEIN | 3 |
NIGERIA | 3 |
MALAYSIA | 3 |
CYPRUS | 3 |
KAZAKHSTAN | 3 |
BRAZIL | 3 |
ZIMBABWE | 3 |
OMAN | 2 |
SOUTH KOREA | 2 |
JAMAICA | 2 |
RUSSIA | 2 |
FIJI | 2 |
DENMARK | 2 |
MONGOLIA | 2 |
SEYCHELLES | 2 |
UNITED ARAB EMIRATES | 2 |
NETHERLANDS ANTILLES | 2 |
ALGERIA | 2 |
ALBANIA | 2 |
IVORY COAST | 2 |
LITHUANIA | 2 |
KUWAIT | 2 |
ENGLAND & WALES | 2 |
UKRAINE | 2 |
NEW ZEALAND | 2 |
MALTA | 1 |
SLOVAKIA | 1 |
PORTUGAL | 1 |
GREECE | 1 |
HOLLAND | 1 |
SINGAPORE | 1 |
KENYA | 1 |
ISRAEL | 1 |
British Virgin Islands | 1 |
ra_country | 1 |
MALAWI | 1 |
REPUBLIC OF IRELAND | 1 |
LIBERIA | 1 |
PANAMA | 1 |
MACAU | 1 |
CZECHOSLOVAKIA | 1 |
UK | 1 |
United States of America | 1 |
ARGENTINA | 1 |
CHANNEL ISLANDS | 1 |
Belgium | 1 |
POLAND | 1 |
JERSEY | 1 |
MAURITIUS | 1 |
JORDAN | 1 |
SOUTH-WEST AFRICA | 1 |
Let's have a peek at some companies registered in the (British) Virgin Islands:
%sql SELECT * FROM ukcompanies.company_export WHERE ra_country ILIKE '%virgin%' LIMIT 3
3 rows affected.
id | company_number | name | company_type | current_status | incorporation_date | dissolution_date | ra_street_address | ra_locality | ra_region | ra_postal_code | ra_country |
---|---|---|---|---|---|---|---|---|---|---|---|
32649 | BR003134 | ACCOTEC LIMITED | Other company type | Converted / Closed | AKARA BUILDING, SUITE 8, CAY 1 ROAD TOWN TORTOLA, BRITISH VIRGIN ISLANDS, VIRGIN ISLANDS | NA | VIRGIN ISLANDS | ||||
38268 | NF003336 | ACHILL ISLAND HOLDINGS LIMITED | Other company type | Active | 1997-05-06 | C/0 AMS FINANCIAL SERVICES LIMITED, THE CREQUE BUILDING, ROAD TOWN, TORTOLA B.V.1 | NA | British Virgin Islands | |||
62371 | FC023985 | ADVANCED ANALOGIC TECHNOLOGIES HOLDINGS, INC. | Converted/Closed | Converted / Closed | C/O OFFSHORE INCORPORATIONS LTD, PO BOX 957,, OFFSHORE INCORPORATIONS CENTRE, ROAD TOWN TORTOLA BVI, VIRGIN ISLANDS | NA | VIRGIN ISLANDS |
How about officers that appear many times in the officers table with an address in the (British) Virgin Islands?
%sql SELECT name, count(*) FROM ukcompanies.officer_export WHERE address ILIKE '%virgin%' \
GROUP BY name ORDER BY count DESC limit 10
10 rows affected.
name | count |
---|---|
PEMEX SERVICES LIMITED | 55 |
AMERSHAM SERVICES LIMITED | 53 |
CHARLES PETER BITHELL | 10 |
EDWARD PETER GOSTLING | 8 |
MARK STEPHEN CHARLES FULLER | 7 |
IMEX EXECUTIVE, LTD. | 7 |
GILLIAN EILEEN SWITALSKI | 6 |
CLS SECRETARIES, LTD. | 6 |
MILLTOWN CORPORATE SERVICES LIMITED | 5 |
ANN DRAGOUMIS MURTLOW | 5 |
Let's twist these ideas around a bit. For companies that have registered addresses in the British Virgin Islands, are there any officers that show up a lot?
%sql SELECT name, COUNT(*) FROM ukcompanies.officer_export WHERE\
company_id IN ( SELECT id FROM ukcompanies.company_export WHERE ra_country ILIKE '%virgin%' )\
GROUP BY name ORDER BY count DESC LIMIT 10
10 rows affected.
name | count |
---|---|
SARAH FEZER | 2 |
ORLY LAX | 2 |
ALEJANDRO SAN MIGUEL | 1 |
EVA LANDI | 1 |
CODAN TRUST COMPANY LIMITED | 1 |
HENRY MALCOLM ENDELMAN | 1 |
JAMES KILBANE | 1 |
TELE2 UK SERVICES LIMITED | 1 |
RICHARD BENN | 1 |
PER BORGKLINT | 1 |
sql SELECT * FROM ukcompanies.officer_export WHERE name='SARAH FEZER'
2 rows affected.
id | company_id | name | position | start_date | end_date | date_of_birth | inactive | occupation | nationality | address |
---|---|---|---|---|---|---|---|---|---|---|
54994234 | 91525 | SARAH FEZER | secretary | 2002-07-04 | 1965-12-12 | true | MANAGER | IRISH | 2 CONSORT ROAD, COWES, ISLE OF WIGHT, PO31 7SQ | |
54994236 | 91525 | SARAH FEZER | director | 2002-07-04 | 1965-12-12 | true | MANAGER | IRISH | 2 CONSORT ROAD, COWES, ISLE OF WIGHT, PO31 7SQ |
sql SELECT * FROM ukcompanies.company_export WHERE \
id IN (SELECT DISTINCT company_id FROM ukcompanies.officer_export WHERE name='SARAH FEZER')
1 rows affected.
id | company_number | name | company_type | current_status | incorporation_date | dissolution_date | ra_street_address | ra_locality | ra_region | ra_postal_code | ra_country |
---|---|---|---|---|---|---|---|---|---|---|---|
91525 | FC023972 | AISLEAWAY.COM LIMITED | Other company type | Converted / Closed | AKARA BUILDING, 24 DE CASTRO STREET, WILKINS CAY 1 ROAD TOWN, TORTOLA BRITISH VIRGIN ISLANDS, VIRGIN ISLANDS | NA | VIRGIN ISLANDS |
For one of the directors who appears a lot in the officers table with an address in the (British) VIrgin Islands, what companies is he associated with?
sql SELECT * FROM ukcompanies.company_export WHERE \
id IN (SELECT DISTINCT company_id FROM ukcompanies.officer_export WHERE name='CHARLES PETER BITHELL')
6 rows affected.
id | company_number | name | company_type | current_status | incorporation_date | dissolution_date | ra_street_address | ra_locality | ra_region | ra_postal_code | ra_country |
---|---|---|---|---|---|---|---|---|---|---|---|
28521 | 02008246 | ACAL PLC | Public Limited Company | Active | 1986-04-09 | 2 CHANCELLOR COURT, OCCAM ROAD SURREY RESEARCH PARK, GUILDFORD | SURREY | NA | GU2 7AH | United Kingdom | |
28535 | 01882927 | ACAL ELECTRONICS LIMITED | Private Limited Company | Active | 1985-02-05 | 2 CHANCELLOR COURT, OCCAM ROAD SURREY RESEARCH PARK, GUILDFORD | SURREY | NA | GU2 7AH | United Kingdom | |
148013 | 02158386 | AMEGA GROUP LIMITED | Private Limited Company | Active | 1987-08-27 | 2 CHANCELLOR COURT OCCAM ROAD, SURREY RESEARCH PARK, GUILDFORD | SURREY | NA | GU2 7AH | United Kingdom | |
28527 | 00967025 | ACAL BFI CENTRAL PROCUREMENT UK LIMITED | Private Limited Company | Active | 1969-11-26 | 3 THE BUSINESS CENTRE, MOLLY MILLARS LANE, WOKINGHAM | BERKSHIRE | NA | RG41 2EY | United Kingdom | |
148011 | 02179018 | AMEGA ELECTRONICS LIMITED | Private Limited Company | Active | 1987-10-15 | 2 CHANCELLOR COURT OCCAM ROAD, SURREY RESEARCH PARK, GUILDFORD | SURREY | NA | GU2 7AH | United Kingdom | |
28528 | 01618416 | ACAL ELECTRONICS HOLDINGS LIMITED | Private Limited Company | Active | 1982-03-01 | 2 CHANCELLOR COURT, OCCAM ROAD SURREY RESEARCH PARK, GUILDFORD | SURREY | NA | GU2 7AH | United Kingdom |
Through working with the data, it is clear that it may benefit from some amount of cleaning in order to smooth the process of company investigations. For example, (the following is not a complete list):