Illinois DOC labeled faces dataset
1040 views. Published by dataju on 2021-08-16.
Dataset overview

https://academictorrents.com/details/4b9b7e449aa732842aea1a7d4e6413f4507aea99

Tags: machine learning, Dataset, images, prisoners

Abstract:

This is a dataset of prisoner mugshots and associated data (height, weight, etc). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.

The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.

It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.

Here is the readme file:

---BEGIN README---
Scraped from the Illinois DOC.

https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=

paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showside.asp\?idoc\=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d '\n' > showside.txt
paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showfront.asp\?idoc\=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d '\n' > showfront.txt
paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/inms_print.asp\?idoc\=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.html/g') -d '\n' > inmates_print.txt

aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt
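The paste/sed pipelines above build aria2c input files in which each URL line is followed by an indented ` out=` option line naming the output file. A minimal Python sketch of the same idea follows. The function names and the sample ID in the usage note are hypothetical, and the base URL is copied from the commands above.

```python
from pathlib import Path

# Base URL as it appears in the README's sed commands.
BASE = "http://www.idoc.state.il.us/subsections/search/"


def aria2_input_lines(ids, page, ext):
    """Return aria2c input-file lines: a URL line, then an indented ' out=' line."""
    lines = []
    for inmate_id in ids:
        lines.append(f"{BASE}{page}?idoc={inmate_id}")
        # aria2c reads option lines that start with whitespace as options
        # for the URL on the preceding line.
        lines.append(f" out={inmate_id}{ext}")
    return lines


def write_input_file(ids_path, out_path, page, ext):
    """Read one ID per line from ids_path and write the aria2c input file."""
    text = Path(ids_path).read_text()
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    Path(out_path).write_text("\n".join(aria2_input_lines(ids, page, ext)) + "\n")
```

For example, `write_input_file("ids.txt", "showside.txt", "pub_showside.asp", ".jpg")` would be the counterpart of the first paste command, assuming ids.txt holds one ID per line.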

Then use htmltocsv.py to get the CSV. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant. Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any CSV files in the csv directory if there are any.

front.7z contains mugshots from the front
side.7z contains mugshots from the side
inmates.7z contains all the html files
csv contains the html files converted to CSV

The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.

All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.

Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
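If editing htmltocsv.py is not convenient, the trailing semicolon can also be handled when reading. The following is a rough sketch using Python's csv module. The function name and the sample rows are hypothetical, and it assumes every row, including the header, ends with exactly one trailing semicolon, as the README describes.

```python
import csv
import io


def read_semicolon_csv(text):
    """Parse semicolon-delimited text whose rows end with a trailing semicolon.

    Returns a list of dicts keyed by the header row.
    """
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        # The trailing semicolon produces one extra empty field; drop only that one,
        # so a genuinely empty last column is preserved.
        if row[-1] == "":
            row = row[:-1]
        rows.append(row)
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]
```

Because the id is unique only in person.csv, rows read from marks.csv or sentencing.csv should be grouped by id rather than indexed by it.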

There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.

The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in." Some inmates were marked "Not Available"; this has been replaced with "N/A". Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
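The height and weight normalization described above can be sketched as follows. This is not the code in htmltocsv.py. The regular expressions are assumptions derived from the quoted formats "X ft YY in." and "XXX lbs.", and the function names are hypothetical.

```python
import re

# Assumed raw formats, based on the README's "X ft YY in." and "XXX lbs.".
HEIGHT_RE = re.compile(r"^(\d+)\s*ft\.?\s*(\d+)\s*in\.?$")
WEIGHT_RE = re.compile(r"^(\d+)\s*lbs\.?$")
MISSING = "N/A"


def height_to_inches(raw):
    """Convert 'X ft YY in.' to total inches (as a string), or 'N/A' if unavailable."""
    value = raw.strip()
    if value == "Not Available":
        return MISSING
    match = HEIGHT_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized height: {raw!r}")
    feet, inches = (int(group) for group in match.groups())
    return str(feet * 12 + inches)


def weight_to_pounds(raw):
    """Strip the 'lbs.' suffix from 'XXX lbs.', or return 'N/A' if unavailable."""
    value = raw.strip()
    if value == "Not Available":
        return MISSING
    match = WEIGHT_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized weight: {raw!r}")
    return match.group(1)
```

Unrecognized values raise an error instead of being silently mapped to "N/A", so that unexpected formats in the scraped HTML are noticed.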

The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
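When loading this column, both missing markers need to be handled before parsing the date. A small sketch, assuming the values are exactly "Not Available", "", or a valid MM/DD/YYYY string (the function name is hypothetical):

```python
from datetime import date, datetime

# The two markers the README reports for a missing date of birth.
MISSING_DOB = {"", "Not Available"}


def parse_date_of_birth(raw):
    """Return a datetime.date for MM/DD/YYYY input, or None if the value is missing."""
    value = raw.strip()
    if value in MISSING_DOB:
        return None
    return datetime.strptime(value, "%m/%d/%Y").date()
```

Returning None for both markers treats them as equivalent. Keep the raw string as well if the distinction between "Not Available" and "" matters for your analysis.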

The "weight" column is often rounded to the nearest 5 lbs.

Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown

Statistics for sex:
63409 Male
4740 Female

Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4 ""

Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown
---END README---
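Per-column counts like the ones in the README can be recomputed from the CSV with a simple tally. The sketch below is one possible way to do it, not necessarily how the original figures were produced, and the function name is hypothetical.

```python
from collections import Counter


def column_statistics(values):
    """Return (count, label) pairs sorted by descending count."""
    return [(count, label) for label, count in Counter(values).most_common()]


def format_statistics(name, values):
    """Format the tally as a 'Statistics for <name>:' block, one 'count label' per line."""
    lines = [f"Statistics for {name}:"]
    lines.extend(f"{count} {label}" for count, label in column_statistics(values))
    return "\n".join(lines)
```

Each column's counts should sum to the number of rows, which for person.csv is 68149.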

Here is a formal summary:

---BEGIN SUMMARY---
Documentation:

  1. Title: Illinois DOC dataset
  2. Source Information
     -- Creators: Illinois DOC
     -- Illinois Department of Corrections, 1301 Concordia Court, P.O. Box 19277, Springfield, IL 62794-9277, (217) 558-2200 x 2008
     -- Donor: Anonymous
     -- Date: 2019
  3. Past Usage:
     -- None
  4. Relevant Information:
     -- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
     -- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
     -- The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
     -- Some inmates were marked "Not Available"; this has been replaced with "N/A".
     -- Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
     -- The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
     -- The "weight" column is often rounded to the nearest 5 lbs.
  5. Number of Instances: 68149
  6. Number of Attributes: 30 (in some instances, information is missing. If so, it should be treated as unknown or undefined information)
  7. Attribute Information:
     ID: Alphanumeric internal ID (string)
     mark: Human-readable string describing marks and scars. May have zero, one, or multiple entries for one ID. (string)
     name: First and last name in format "SURNAME, GIVEN" - upper case. Redacted in provided copy, script must be executed to regenerate column. (string/void)
     date_of_birth: Date of birth in format MM/DD/YYYY. Some inmates are marked as "Not Available" and some inmates are marked as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. (date OR enumeration)
     weight: Physical weight in pounds OR "N/A". Often rounded to 5 lb increments. It may be related to the institution they are kept in. (integer OR void)
     hair: Hair color. One of ("Black", "Brown", "Blonde or Strawberry", "Gray or Partially Gray", "Red or Auburn", "Bald", "Not Available", "Salt and Pepper", "White", "Sandy", "Unknown") (enumeration)
     sex: Sex. One of ("Male", "Female") (enumeration)
     height: Height in inches. (integer)
     race: Race. One of ("Black", "White", "Hispanic", "Asian", "Amer Indian", "Unknown", "Bi-Racial", "") (enumeration)
     eyes: Eye color. One of ("Brown", "Blue", "Hazel", "Green", "Black", "Not Available", "Gray", "Maroon", "Unknown") (enumeration)
     admission_date: Date of admission in format MM/DD/YYYY. (date)
     projected_parole_date: Projected parole date in format MM/DD/YYYY OR one of ("TO BE DETERMINED", "Sexually D", "3yrs---Lif", "3yrs---Lif", "TO BE DETERMINED BY COMMITTING COURT") OR "" (if none projected) (date OR enumeration OR void)
     last_paroled_date: Last paroled date in format MM/DD/YYYY OR "" (if not paroled). (date OR void)
     projected_discharge_date: Projected discharge date in format MM/DD/YYYY OR one of ("TO BE DETERMINED", "3 YRS TO LIFE - TO BE DETERMINED", "INELIGIBLE", "SEXUALLY D", "TO BE DETERMINED BY COMMITTING COURT", "PENDING", "3 YRS TO L") OR "". (date OR enumeration OR void)
     parole_date: Parole date in format MM/DD/YYYY OR "". (date OR void)
     electronic_detention_date: Electronic detention date in format MM/DD/YYYY OR "". (date OR void)
     discharge_date: Date of discharge from institution. Always "", since discharged offenders are not included in the data set. (void)
     parent_institution: Institution at which offender is kept, or "PAROLE" if parole. One of ("STATEVILLE CORRECTIONAL CENTER", "SHERIDAN CORRECTIONAL CENTER", "PINCKNEYVILLE CORRECTIONAL CENTER", "MENARD CORRECTIONAL CENTER", "LOGAN CORRECTIONAL CENTER", "ILLINOIS RIVER CORRECTIONAL CENTER", "DIXON CORRECTIONAL CENTER", "VANDALIA CORRECTIONAL CENTER", "GRAHAM CORRECTIONAL CENTER", "LAWRENCE CORRECTIONAL CENTER", "EAST MOLINE CORRECTIONAL CENTER", "SHAWNEE CORRECTIONAL CENTER", "JACKSONVILLE CORRECTIONAL CENTER", "DANVILLE CORRECTIONAL CENTER", "VIENNA CORRECTIONAL CENTER", "HILL CORRECTIONAL CENTER", "BIG MUDDY CORRECTIONAL CENTER", "CENTRALIA CORRECTIONAL CENTER", "ROBINSON CORRECTIONAL CENTER", "WESTERN ILLINOIS CORRECTIONAL CENTER", "LINCOLN CORRECTIONAL CENTER", "TAYLORVILLE CORRECTIONAL CENTER", "SOUTHWESTERN CORRECTIONAL CENTER", "PONTIAC CORRECTIONAL CENTER", "CONCORDIA", "DECATUR CORRECTIONAL CENTER", "KEWANEE LIFE SKILLS RE-ENTRY CENTER", "JOLIET TREATMENT CENTER", "PAROLE") (enumeration)
     offender_status: Status of offender. One of ("CUSTODY", "PAROLE", "ABSCONDER", "RECEPTION", "WORK RELEASE CUSTODY", "TEMP RESIDENT", "NON-IDOC CUSTODY", "WRIT", "BOND", "HOME CUSTODY", "DETAINER", "MEDICAL FURLOUGH", "ESCAPE") (enumeration)
     location: Location. One of ("PAROLE DISTRICT 1", "PAROLE DISTRICT 2", "PAROLE DISTRICT 3", "MENARD", "INTERSTATE COMPACT", "PINCKNEYVILLE", "LAWRENCE CORRECTIONAL CENTER", "PAROLE DISTRICT 4", "ILLINOIS RIVER", "DANVILLE", "HILL", "SHAWNEE", "DIXON", "SHERIDAN", "BIG MUDDY RIVER", "LOGAN", "PAROLE", "GRAHAM", "CENTRALIA", "EAST MOLINE", "NORTHERN RECEPTION CENTER", "VANDALIA", "ROBINSON", "STATEVILLE", "WESTERN ILLINOIS", "VIENNA", "TAYLORVILLE", "LINCOLN", "JACKSONVILLE", "PAROLE DISTRICT 5", "PONTIAC", "DIXON CORRECTIONAL CENTER", "SOUTHWESTERN ILLINOIS", "DECATUR", "", "MENARD MEDIUM SECURITY UNIT", "PONTIAC MEDIUM SECURITY", "GRAHAM R&C", "CROSSROADS CCC", "KEWANEE", "ILL/OTH STATE/FED CONCURR", "PEORIA CCC", "NORTH LAWNDALE ADULT TRANSITI", "STATEVILLE FARM", "GREENE COUNTY WORK CAMP", "COURT", "PITTSFIELD WORK CAMP", "FOX VALLEY CCC", "BOND", "SOUTHWESTERN IL WORK CAMP", "MENARD R&C", "ELECTRONIC DETENTION", "CLAYTON WORK CAMP", "DIXON SPRINGS BOOT", "DUQUOIN IMPACT INCARCERATION P", "DETAINER", "PAROLE DISTRICTS", "FURLOUGH", "ESCAPE", "DEPT. OF HUMAN SERVICES", "FED/STATE/TRANSFER OTH ST", "WOMENS TREATMENT CENTER", "JAIL", "CONCORDIA") (enumeration)
     sex_offender_registry_required: Whether the offender is required to register as a sex offender. One of ("true", "") (boolean)
     alias: Aliases, separated by pipe sign OR one of ("", "None Reported") (string OR enumeration)
     mittimus: Mittimus ID (string)
     class: Class of offender. One of ("4", "2", "3", "X", "1", "M", "U", "A", "B", "C") (enumeration)
     count: Count of offenses (?) (integer)
     offense: Offense. One of 1576 values. Appears to have been keyed in by hand. (enumeration/string)
     custody_date: Date at which offender was taken into custody. (date)
     sentence: Duration of sentence in format "X Years Y Months Z Days", where Y and Z may exceed 12 and 31 respectively OR one of ("DEATH", "LIFE", "SDP") (int[3] OR enumeration)
     county: County or "out-of-state". One of ("COOK", "WILL", "WINNEBAGO", "KANE", "DUPAGE", "MADISON", "MACON", "LAKE", "PEORIA", "ST-CLAIR", "CHAMPAIGN", "MCLEAN", "SANGAMON", "KANKAKEE", "VERMILION", "LA SALLE", "TAZEWELL", "ADAMS", "LIVINGSTON", "STEPHENSON", "MCHENRY", "COLES", "WHITESIDE", "JEFFERSON", "MARION", "KENDALL", "ROCK-ISLAND", "KNOX", "HENRY", "DEKALB", "BOONE", "JACKSON", "MONTGOMERY", "MACOUPIN", "SALINE", "FRANKLIN", "LOGAN", "ROCK ISLAND", "CHRISTIAN", "FAYETTE", "CLINTON", "MORGAN", "WILLIAMSON", "JERSEY", "WHITE", "LEE", "MASON", "PIKE", "EDGAR", "RANDOLPH", "WOODFORD", "OGLE", "EFFINGHAM", "FULTON", "GRUNDY", "BOND", "IROQUOIS", "SHELBY", "UNION", "CRAWFORD", "LAWRENCE", "BUREAU", "CLAY", "MCDONOUGH", "DEWITT", "JOHNSON", "PERRY", "WAYNE", "MASSAC", "RICHLAND", "CLARK", "CASS", "HANCOCK", "ALEXANDER", "DOUGLAS", "WABASH", "HAMILTON", "GREENE", "WARREN", "FORD", "EDWARDS", "MONROE", "WASHINGTON", "MOULTRIE", "CUMBERLAND", "MERCER", "MENARD", "CARROLL", "GALLATIN", "SCHUYLER", "JASPER", "BROWN", "CALHOUN", "PIATT", "JO-DAVIESS", "POPE", "HARDIN", "PULASKI", "MARSHALL", "HENDERSON", "ST CLAIR", "PUTNAM", "SCOTT", "STARK", "OUT-OF-STATE", "OUT OF STATE", "JO DAVIESS") OR "" (enumeration or void)
     sentence_discharged: Whether the sentence has been discharged. One of ("YES", "NO") (boolean)
  8. Missing Attribute Values: See values marked "void" above.
  9. Class Distribution:

Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown

Statistics for sex:
63409 Male
4740 Female

Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4 ""

Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown

Summary Statistics:
median weight: 185
median height: 69
---END SUMMARY---
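The "sentence" attribute described in the summary mixes a three-part duration with a few special values, so it needs a small parser before it can be used numerically. The following is a sketch under the assumption that durations always match "X Years Y Months Z Days" exactly; the function name is hypothetical.

```python
import re

# Duration format quoted in the attribute description.
SENTENCE_RE = re.compile(r"^(\d+)\s+Years\s+(\d+)\s+Months\s+(\d+)\s+Days$")
# Special values listed in the attribute description.
SPECIAL_SENTENCES = {"DEATH", "LIFE", "SDP"}


def parse_sentence(raw):
    """Return (years, months, days) as ints, or the special value as a string."""
    value = raw.strip()
    if value in SPECIAL_SENTENCES:
        return value
    match = SENTENCE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized sentence: {raw!r}")
    # Months and days are returned as given; they may exceed 12 and 31.
    return tuple(int(group) for group in match.groups())
```

The parts are deliberately not normalized into a single unit, since converting months and days into years requires a calendar convention the source does not specify.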



License: Public Domain

